Fast RAG is a local, privacy-focused Retrieval-Augmented Generation system. It is built on FastAPI and uses PostgreSQL with pgvector for dense vector search, complemented by pg_trgm for trigram-based lexical matching. The combination yields a hybrid search engine that catches both semantically related passages and exact terms, which either method alone can miss.
Docling manages the heavy lifting of document parsing, supporting PDFs, DOCX files, images, and PPTX decks. LangGraph provides the structure for the ingestion and query pipelines, ensuring a reliable flow of data. For the LLM layer, you can run local models via Ollama or plug in an OpenAI API key if preferred. The system supports real-time response streaming through Server-Sent Events (SSE). While a React-based frontend is included in the repository, the core is designed to be flexible and open to custom modifications.
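To make the streaming part concrete, here is a minimal sketch of how a client could read a Server-Sent Events stream. It follows the general SSE wire format (events separated by a blank line, payload carried in `data:` fields) and is not taken from the Fast RAG codebase. The function name `parse_sse` and the sample payloads are hypothetical, and the exact shape of the data Fast RAG emits is an assumption.

```python
def parse_sse(lines):
    """Yield the data payload of each event from an iterable of SSE lines.

    Assumption: only the `data:` field matters to the caller; other
    fields such as `event:` or `id:` are ignored in this sketch.
    """
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith(":"):
            # Lines starting with a colon are comments (often keep-alives).
            continue
        elif line.startswith("data:"):
            value = line[5:]
            # A single leading space after the colon is not part of the value.
            if value.startswith(" "):
                value = value[1:]
            data.append(value)
    # An event that is not terminated by a blank line is dropped.


# Hypothetical stream, as it might arrive line by line from the server.
stream = [
    "data: Hello\n",
    "\n",
    ": keep-alive\n",
    "data: wor\n",
    "data: ld\n",
    "\n",
]
events = list(parse_sse(stream))
```

In a real client, the lines would come from an HTTP response read incrementally rather than from a list.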
Runs Locally. With a local model served through Ollama, your data and models never leave your hardware, so there are no cloud dependencies. Only the optional OpenAI backend sends data to an external service.
Hybrid Retrieval. By merging vector similarity with keyword search, the system identifies both conceptual matches and exact terminology within a single query.
Broad File Support. You can process PDFs, Word documents, PowerPoint decks, or images. Docling converts these various formats into plain text before the chunking process begins.
Fast Setup. A single Docker command initializes the entire stack, including the pgvector database. You also have the option to run it directly on your host machine.
Clean API. The system offers standard REST endpoints and SSE streaming, making it simple to build a custom UI or integrate the backend into your existing toolset.
Optional Frontend. The bundled React application provides a functional, ready-to-use interface right out of the box.
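The hybrid retrieval idea can be illustrated with a short sketch. The source does not say how Fast RAG merges the two result sets, so the example below uses reciprocal rank fusion (RRF), a common technique swapped in for illustration. The function name, the constant `k=60`, and the chunk IDs are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each list contributes 1 / (k + rank) for every ID it contains, so
    chunks that rank well in both lists rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results: one list ordered by vector similarity,
# one ordered by trigram similarity.
vector_hits = ["c3", "c1", "c7"]
trigram_hits = ["c1", "c9", "c3"]

fused = reciprocal_rank_fusion([vector_hits, trigram_hits])
```

Rank-based fusion avoids having to put cosine distances and trigram similarities on a common scale. A weighted sum of normalized scores is another reasonable choice.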
1. Install Python packages and download models
pip install -r requirements.txt
docling-tools models download
2. Configure environment variables
cp env.example .env
# Define your database connection strings in the .env file
3. Initialize the database (Skip this step if you are using Docker)
python scripts/init_db.py
4. Launch the server
python main.py
Once the server is active, you can access these addresses:
• Application: http://localhost:8000
• API Documentation: http://localhost:8000/docs
• Redoc: http://localhost:8000/redoc
The repository includes a comprehensive Docker configuration with pgvector pre-configured. A single command stands up both the database and the API server, allowing you to begin uploading documents immediately.
To run the React interface separately:
cd frontend-app
npm install
npm run dev
The development server will start at http://localhost:5173, communicating with the FastAPI backend running on port 8000.
Fast RAG operates using a modular pipeline:
Document Flow: Upload → Convert → Split → Embed → Store.
Search Layer: The system combines vector similarity scores and lexical keyword hits to return the most contextually relevant document chunks.
Answer Generation: The retrieved context is passed to your selected LLM to generate a final answer. Streaming is enabled by default to reduce perceived latency.
LangGraph Integration: The orchestration layer remains transparent. You can inspect every step of the process or extend the graph with custom nodes to suit your specific workflow.
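As an illustration of the Split step in the document flow, here is a minimal sketch of word-based chunking with overlap. The source does not describe the chunking strategy Fast RAG uses, so the function name, the default sizes, and the word-based approach are assumptions.

```python
def split_into_chunks(text, chunk_size=200, overlap=40):
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Tiny example with ten placeholder words.
sample = " ".join(f"w{i}" for i in range(10))
chunks = split_into_chunks(sample, chunk_size=4, overlap=1)
```

Each chunk would then be embedded and stored alongside its text, so that both the vector index and the trigram index cover the same unit of retrieval.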