Secure RAG infrastructure for document intelligence, citations, access control, evaluation, and observability.
ForgeRAG is not a "chat with PDF" toy. It is a production-style Retrieval Augmented Generation platform where an organization can upload documents, index them asynchronously, ask grounded questions, receive cited answers, restrict access by workspace and document permissions, inspect failures, and evaluate answer quality over time.
Teams want AI search over private documents, but a useful enterprise system needs more than a prompt and a file upload. It needs durable ingestion, tenant isolation, authorization, source citations, audit trails, evaluation, observability, and deployment discipline.
ForgeRAG is designed to show the infrastructure behind enterprise AI search:
- Documents are ingested asynchronously.
- Answers must be grounded in retrieved chunks.
- Answers must include citations.
- Retrieval must be observable.
- Every query must be logged.
- Ingestion failures must be visible.
- Access control matters from the start.
The current implementation includes:
- Go API service
- Workspace owner registration
- Workspace-scoped API key authentication
- Role-based access control for document APIs
- PDF, text, and markdown upload
- Local document storage with checksum and file metadata
- Durable ingestion jobs with worker leasing, retries, backoff, attempt history, and dead-letter handling
- Worker-based text extraction for PDF, text, and markdown files
- Cleaned document chunks with overlap, page metadata, and token estimates
- Chunk embeddings generated during ingestion
- Local deterministic embedding provider for development and tests
- OpenAI-compatible embedding provider configuration
- Tenant-filtered pgvector semantic search endpoint
- Persisted document_chunks records with a workspace-scoped inspection endpoint
- Postgres connection and readiness check
- SQL migrations with pgvector enabled and HNSW indexing
- Docker Compose stack
- Structured JSON logs
- Request IDs
- Consistent API error responses
- Workspace and tenant-scoped document metadata endpoints
- Architecture and deployment documentation
Future milestones add answer generation, citations, query history, evaluation, observability dashboards, and frontend screens.
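The retry, backoff, and dead-letter behavior listed above can be pictured with a small sketch. It is illustrative only: the `retryPolicy` type, its field names, the doubling schedule, and the cap are assumptions, not ForgeRAG's actual job code.

```go
package main

import (
	"fmt"
	"time"
)

// retryPolicy is a hypothetical policy; names and values are assumptions.
type retryPolicy struct {
	baseBackoff time.Duration // delay before the first retry
	maxBackoff  time.Duration // upper bound on any single delay
	maxAttempts int           // attempts allowed before dead-lettering
}

// next returns the delay before the next attempt, or dead=true when the
// job has used all of its attempts and should move to the dead-letter state.
func (p retryPolicy) next(attemptsMade int) (delay time.Duration, dead bool) {
	if attemptsMade >= p.maxAttempts {
		return 0, true
	}
	delay = p.baseBackoff
	// Double the delay for every attempt after the first, up to the cap.
	for i := 1; i < attemptsMade && delay < p.maxBackoff; i++ {
		delay *= 2
	}
	if delay > p.maxBackoff {
		delay = p.maxBackoff
	}
	return delay, false
}

func main() {
	p := retryPolicy{baseBackoff: 5 * time.Second, maxBackoff: time.Minute, maxAttempts: 5}
	for attempt := 1; attempt <= p.maxAttempts; attempt++ {
		delay, dead := p.next(attempt)
		fmt.Printf("attempts_made=%d delay=%s dead_letter=%t\n", attempt, delay, dead)
	}
}
```

Capping the delay keeps a long-failing job from waiting indefinitely between attempts, while the attempt limit guarantees that it eventually becomes visible as a dead-lettered job.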
flowchart LR
User[User or API Client] --> API[Go API Service]
API --> DB[(Postgres + pgvector)]
API --> Embed[Embedding Provider]
API --> Storage[Document Storage]
API --> Jobs[Durable Job Queue]
Jobs --> Worker[Go Ingestion Worker]
Worker --> Storage
Worker --> Extract[Text Extraction]
Worker --> Chunk[Chunking]
Worker --> Embed
Worker --> DB
API --> LLM[Chat Model Provider]
API --> Telemetry[Logs, Metrics, Traces]
- API service: authentication, workspaces, documents, search, ask, feedback, admin inspection
- Worker service: document extraction, chunking, embedding, indexing, retries, failure classification
- Postgres: transactional data, pgvector semantic index, query history, audit logs
- Storage: original uploaded documents and extracted artifacts
- Evaluation runner: repeatable quality tests for retrieval and answers
- Dashboard: document status, query history, citations, feedback, evaluation, health
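As a rough illustration of how the API could query the pgvector index while staying inside one tenant, the sketch below pairs a parameterized SQL statement with a helper that renders an embedding in pgvector's text form. The column names (`content`, `embedding`, `workspace_id`) and the helper `vectorLiteral` are assumptions; only the `document_chunks` table name and the use of pgvector come from this README. `<=>` is pgvector's cosine distance operator.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// searchSQL is a hypothetical tenant-filtered query. The workspace filter is
// part of the statement itself, so a caller cannot omit it by accident.
const searchSQL = `
SELECT id, document_id, content, embedding <=> $1::vector AS distance
FROM document_chunks
WHERE workspace_id = $2
ORDER BY embedding <=> $1::vector
LIMIT $3`

// vectorLiteral renders an embedding as a pgvector text literal such as
// [0.5,1,0.25], suitable for binding to the $1 parameter above.
func vectorLiteral(v []float32) string {
	parts := make([]string, len(v))
	for i, x := range v {
		parts[i] = strconv.FormatFloat(float64(x), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

func main() {
	fmt.Println(vectorLiteral([]float32{0.5, 1, 0.25})) // prints [0.5,1,0.25]
	fmt.Println(searchSQL)
}
```

Binding the workspace ID as a query parameter, rather than filtering results in application code, keeps tenant isolation enforced at the database layer.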
GET /health
GET /ready
POST /auth/register
POST /api/v1/workspaces
POST /api/v1/documents
GET /api/v1/documents
GET /api/v1/documents/{id}
GET /api/v1/documents/{id}/chunks
POST /api/v1/search
GET /api/v1/admin/jobs
GET /api/v1/admin/jobs/{id}
POST /api/v1/admin/jobs/{id}/retry
Planned API:
POST /auth/login
GET /workspaces/{id}
DELETE /documents/{id}
POST /ask
GET /queries
GET /queries/{id}
POST /queries/{id}/feedback
POST /eval/datasets
POST /eval/cases
POST /eval/runs
GET /eval/runs/{id}
Core tables:
users
workspaces
workspace_members
api_keys
documents
document_versions
document_files
document_chunks
document_permissions
document_collections
collection_members
jobs
job_attempts
rag_queries
rag_query_retrievals
rag_feedback
eval_datasets
eval_cases
eval_runs
eval_results
audit_logs
ForgeRAG treats failures as product-visible infrastructure events, not hidden logs.
Current ingestion and retrieval failure classes:
document_parse_failed
storage_failed
embedding_failed
chunk_persist_failed
search_failed
database_unavailable
job_lease_expired
Planned provider and answer failure classes:
provider_timeout
llm_failed
permission_denied
rate_limited
insufficient_context
Requirements:
- Docker
- Go 1.24 or newer
Start the full local stack:
docker compose up --build
Run the API directly:
cd backend
go run ./cmd/api
Run the ingestion worker directly:
cd backend
go run ./cmd/worker
The API listens on http://localhost:8080 by default.
Health checks:
curl http://localhost:8080/health
curl http://localhost:8080/ready
Register an owner, workspace, and first API key:
curl -X POST http://localhost:8080/auth/register \
-H "Content-Type: application/json" \
-d "{\"email\":\"owner@example.com\",\"name\":\"Acme Owner\",\"workspace_name\":\"Acme Finance\"}"
The api_key.token value is shown only once. Use it as a bearer token for workspace APIs.
Create another workspace:
curl -X POST http://localhost:8080/api/v1/workspaces \
-H "Authorization: Bearer <api_key_token>" \
-H "Content-Type: application/json" \
-d "{\"name\":\"Acme Legal\"}"
Create a document metadata record:
curl -X POST http://localhost:8080/api/v1/documents \
-H "Authorization: Bearer <api_key_token>" \
-H "Content-Type: application/json" \
-d "{\"workspace_id\":\"<workspace_id>\",\"title\":\"Refund Policy\",\"source_type\":\"manual\"}"
Upload a PDF, text, or markdown document:
curl -X POST http://localhost:8080/api/v1/documents \
-H "Authorization: Bearer <api_key_token>" \
-F "workspace_id=<workspace_id>" \
-F "title=Refund Policy" \
-F "file=@./refund-policy.pdf"
Uploaded documents are stored locally under FORGERAG_STORAGE_DIR, and the database records file name, storage URI, content type, size, checksum, version, document status, and queued ingestion job. The worker resolves the stored file URI, extracts text, chunks it with overlap, embeds each chunk, persists document_chunks, and updates document status based on the job result.
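The "chunks it with overlap" step can be sketched as a sliding window. The version below windows over whitespace-separated words, which is an assumption: the README does not say whether ForgeRAG chunks by characters, words, or tokens, and the page metadata and token estimates it records are omitted here. The `chunk` type and `chunkWords` function are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk is a hypothetical record holding one window of text.
type chunk struct {
	Index int
	Text  string
}

// chunkWords splits text into windows of at most size words. Each window
// starts size-overlap words after the previous one, so consecutive windows
// share overlap words.
func chunkWords(text string, size, overlap int) []chunk {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil // invalid settings would never advance the window
	}
	words := strings.Fields(text)
	step := size - overlap
	var chunks []chunk
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, chunk{
			Index: len(chunks),
			Text:  strings.Join(words[start:end], " "),
		})
		if end == len(words) {
			break // the last window already reached the end of the text
		}
	}
	return chunks
}

func main() {
	for _, c := range chunkWords("a b c d e f g", 4, 1) {
		fmt.Printf("%d: %q\n", c.Index, c.Text)
	}
	// Output:
	// 0: "a b c d"
	// 1: "d e f g"
}
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of storing and embedding some text twice.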
Inspect extracted chunks for a document:
curl http://localhost:8080/api/v1/documents/<document_id>/chunks \
-H "Authorization: Bearer <api_key_token>"
Search indexed chunks semantically:
curl -X POST http://localhost:8080/api/v1/search \
-H "Authorization: Bearer <api_key_token>" \
-H "Content-Type: application/json" \
-d "{\"query\":\"What is the refund window?\",\"top_k\":5}"
Inspect ingestion jobs as an admin or owner:
curl http://localhost:8080/api/v1/admin/jobs \
-H "Authorization: Bearer <api_key_token>"
ForgeRAG defaults to FORGERAG_EMBEDDING_PROVIDER=local, which uses deterministic hash embeddings. This keeps local development, tests, and demos working without paid credentials.
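One way a deterministic hash embedding can work is the "hashing trick": each token is hashed into a bucket of a fixed-size vector, and the vector is then normalized. The sketch below assumes FNV-1a hashing, lowercased whitespace tokens, and L2 normalization; the actual local provider may differ in all three. The `hashEmbed` name is hypothetical, and the 1536 dimensions mirror the pgvector column size mentioned in this README.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"strings"
)

// dimensions matches the vector(1536) column described in this README.
const dimensions = 1536

// hashEmbed maps text to a unit-length vector. The same text always yields
// the same vector, and no network call or API key is involved.
func hashEmbed(text string) []float32 {
	vec := make([]float32, dimensions)
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		h := fnv.New32a()
		h.Write([]byte(tok)) // hash.Hash writes never return an error
		vec[h.Sum32()%dimensions]++
	}
	var sumSquares float64
	for _, x := range vec {
		sumSquares += float64(x) * float64(x)
	}
	if sumSquares == 0 {
		return vec // empty input: leave the zero vector as is
	}
	norm := float32(math.Sqrt(sumSquares))
	for i := range vec {
		vec[i] /= norm
	}
	return vec
}

func main() {
	a := hashEmbed("What is the refund window?")
	b := hashEmbed("what is the REFUND window?")
	same := true
	for i := range a {
		if a[i] != b[i] {
			same = false
		}
	}
	fmt.Println(len(a), same) // prints 1536 true
}
```

Vectors like these capture token overlap rather than meaning, which is enough to exercise ingestion and search end to end but not to judge retrieval quality.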
For production-style embeddings, set:
FORGERAG_EMBEDDING_PROVIDER=openai
FORGERAG_EMBEDDING_MODEL=text-embedding-3-small
FORGERAG_OPENAI_API_KEY=<your_api_key>
The current pgvector column is vector(1536), so FORGERAG_EMBEDDING_DIMENSIONS must remain 1536 unless you add a matching database migration.
Copy .env.example and adjust values as needed.
Important variables:
FORGERAG_ENV
FORGERAG_HTTP_ADDR
FORGERAG_DATABASE_URL
FORGERAG_MIGRATIONS_DIR
FORGERAG_STORAGE_DIR
FORGERAG_MAX_UPLOAD_BYTES
FORGERAG_EMBEDDING_PROVIDER
FORGERAG_EMBEDDING_MODEL
FORGERAG_EMBEDDING_DIMENSIONS
FORGERAG_OPENAI_API_KEY
FORGERAG_OPENAI_BASE_URL
FORGERAG_WORKER_POLL_INTERVAL
FORGERAG_JOB_LEASE_DURATION
FORGERAG_JOB_RETRY_BACKOFF
FORGERAG_LOG_LEVEL
Milestone 2 uses workspace-scoped API keys tied to users. Registration creates a user, workspace, owner membership, and API key in one transaction. Document and search APIs require:
- an Authorization: Bearer <api_key_token> header
- membership in the API key workspace
- role checks for write operations
- tenant-scoped document and retrieval queries
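The requirements above could be enforced by a small piece of `net/http` middleware. This is a sketch under assumptions, not ForgeRAG's handler code: `bearerToken`, `roleAllows`, `keyLookup`, and `requireAPIKey` are hypothetical names, the "member" role is invented for illustration (the README only mentions owners and admins), and a real lookup would query the `api_keys` and `workspace_members` tables instead of a map.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// bearerToken extracts the token from an "Authorization: Bearer <token>" value.
func bearerToken(header string) (string, bool) {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return "", false
	}
	token := strings.TrimSpace(strings.TrimPrefix(header, prefix))
	return token, token != ""
}

// roleAllows is a hypothetical policy: owners and admins may do anything,
// members may only read, and unknown roles are rejected.
func roleAllows(role, method string) bool {
	isRead := method == http.MethodGet || method == http.MethodHead
	switch role {
	case "owner", "admin":
		return true
	case "member":
		return isRead
	default:
		return false
	}
}

// keyLookup resolves a token to the caller's role in the key's workspace.
type keyLookup func(token string) (role string, ok bool)

// requireAPIKey rejects requests without a valid key (401) or with a role
// that may not perform the requested operation (403).
func requireAPIKey(lookup keyLookup, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token, ok := bearerToken(r.Header.Get("Authorization"))
		if !ok {
			http.Error(w, "missing bearer token", http.StatusUnauthorized)
			return
		}
		role, ok := lookup(token)
		if !ok {
			http.Error(w, "unknown API key", http.StatusUnauthorized)
			return
		}
		if !roleAllows(role, r.Method) {
			http.Error(w, "role may not perform this operation", http.StatusForbidden)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	keys := map[string]string{"key-owner": "owner", "key-member": "member"}
	lookup := func(token string) (string, bool) {
		role, ok := keys[token]
		return role, ok
	}
	h := requireAPIKey(lookup, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	}))

	// A member attempting a write is authenticated but not authorized.
	req := httptest.NewRequest(http.MethodPost, "/api/v1/documents", nil)
	req.Header.Set("Authorization", "Bearer key-member")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	fmt.Println(rec.Code) // prints 403
}
```

Separating authentication failures (401) from authorization failures (403) keeps the two cases distinguishable in logs and client error handling.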
The production path still adds:
- document permissions
- secret redaction
- rate limiting
- request and upload size limits
Planned evaluation dimensions:
- answer correctness
- groundedness
- citation presence
- retrieval relevance
- refusal correctness
- latency
- token usage and cost
The API already emits structured request logs with request IDs. Worker job attempts record ingestion failures and retry behavior. Search responses include embedding, retrieval, and total latency fields. Later milestones add OpenTelemetry-style spans and metrics across:
- HTTP requests
- ingestion jobs
- text extraction
- chunking
- embedding calls
- vector search
- LLM generation
- database operations
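The structured request logs with request IDs that the API already emits could look roughly like the following middleware, built on the standard `log/slog` JSON handler. The `X-Request-ID` header name, the log field names, and the functions `newRequestID` and `withRequestLog` are assumptions for illustration; spans and metrics from later milestones are out of scope here.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"log/slog"
	"net/http"
	"net/http/httptest"
	"os"
	"time"
)

// newRequestID returns 16 random bytes as a 32-character hex string.
func newRequestID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "unknown"
	}
	return hex.EncodeToString(b)
}

// withRequestLog reuses an incoming request ID or creates one, echoes it on
// the response, and writes one structured log line per request.
func withRequestLog(logger *slog.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			id = newRequestID()
		}
		w.Header().Set("X-Request-ID", id)

		start := time.Now()
		next.ServeHTTP(w, r)
		logger.Info("request",
			"request_id", id,
			"method", r.Method,
			"path", r.URL.Path,
			"duration_ms", time.Since(start).Milliseconds(),
		)
	})
}

func main() {
	logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
	h := withRequestLog(logger, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))

	// Exercise the middleware in-process; one JSON log line goes to stdout.
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/health", nil))
}
```

Echoing the ID on the response lets a client quote it in a bug report, which ties a user-visible failure back to the matching log line.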
Milestone 1 is Docker Compose ready. Production deployment will add:
- GitHub Actions CI
- migration step
- container health checks
- secrets management notes
- backup and restore notes
- cloud deployment guide
- optional Prometheus and Grafana
- Milestone 0: product boundary and architecture
- Milestone 1: backend foundation
- Milestone 2: auth, tenants, and workspaces
- Milestone 3: document upload and storage
- Milestone 4: async ingestion queue
- Milestone 5: text extraction and chunking (implemented)
- Milestone 6: embeddings and vector search (implemented)
- Milestone 7: RAG answer generation with citations
- Milestone 8: query history, audit logs, and feedback
- Milestone 9: document-level access control
- Milestone 10: evaluation system
- Milestone 11: observability and tracing
- Milestone 12: frontend dashboard
- Milestone 13: DevOps and deployment
- Milestone 14: security and reliability hardening
- Milestone 15: public proof package