DeepDoc is a local research tool designed to analyze and extract insights from your private files. It processes a variety of formats (PDFs, Word documents, scans, and text files) by splitting the content into chunks and indexing them in a vector database. When you submit a query in plain English, DeepDoc generates a content outline and deploys a multi-agent AI system to search your local data, refine the findings, and synthesize the information. The result is a structured Markdown report. Your files are indexed and searched on your own hardware, and you keep full control over the LLM configuration. Note that the default configuration shown later in this article uses a hosted provider (OpenAI), so model calls go to that API.
How It Works Under the Hood
The Workflow Pipeline
The Local DeepResearcher module operates on two inputs: your files and your instructions.
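To make the chunking step concrete, here is a minimal sketch of splitting text into overlapping character windows before indexing. It is not DeepDoc's actual implementation: the function name `chunk_text` and the `chunk_size` and `overlap` values are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    chunk_size and overlap are illustrative values, not DeepDoc's defaults.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. Real pipelines often split on sentences or tokens rather than raw characters.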
Deploying DeepDoc Locally
Prerequisite: Install uv
DeepDoc uses uv to manage virtual environments and dependencies. Download it from the official uv GitHub repository and follow the installation instructions for your operating system.
1. Clone the Repository
git clone https://github.com/Datalore-ai/deepdoc.git
cd deepdoc
2. Create a Virtual Environment
uv venv
3. Activate the Environment
On Windows:
.venv\Scripts\activate
On macOS or Linux:
source .venv/bin/activate
4. Set Environment Variables
Copy the template file and configure your keys:
cp .env.example .env
You must add your API keys for the tool to function:
MISTRAL_API_KEY=
TAVILY_API_KEY=
OPENAI_API_KEY=
# Default settings
QDRANT_URL=http://localhost:6333
COLLECTION_NAME=knowledge_base
EMBEDDING_MODEL=BAAI/bge-small-en-v1.5
QDRANT_DISABLE_THREADING=true
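Before moving on, it can help to confirm that the keys were actually filled in. The sketch below parses a `.env` file in the current directory and reports blank keys. The helper names `parse_env` and `missing_keys` are hypothetical, and treating all three API keys as required is an assumption based on the wording above. The parser handles only plain `KEY=VALUE` lines, not quoting or variable expansion.

```python
from pathlib import Path

# Key names come from the .env template above; treating all three as
# required is an assumption, not something verified against DeepDoc's code.
REQUIRED_KEYS = ("MISTRAL_API_KEY", "TAVILY_API_KEY", "OPENAI_API_KEY")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    missing = missing_keys(parse_env(text))
    if missing:
        print("Missing keys:", ", ".join(missing))
    else:
        print("All required keys are set.")
```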
5. Install Dependencies
uv pip install -r requirements.txt
6. Start Qdrant with Docker
Ensure Docker and Docker Compose are running, then start the services:
docker-compose up --build
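This starts the Qdrant vector database. As written, the command stays attached to your terminal, so either run the next step in a new terminal or add the -d flag (docker-compose up --build -d) to run the services in the background.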
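If you want to confirm that Qdrant is reachable before launching the application, a small check like the following may help. It is a sketch, not part of DeepDoc: the function name `qdrant_is_up` is hypothetical, and it assumes Qdrant is listening at the QDRANT_URL from the .env defaults with no API key set. It calls Qdrant's REST endpoint for listing collections.

```python
import urllib.request


def qdrant_is_up(base_url="http://localhost:6333", timeout=3.0,
                 opener=urllib.request.urlopen):
    """Return True if GET <base_url>/collections answers with HTTP 200.

    The opener parameter exists only so the check can be exercised
    without a live server.
    """
    url = base_url.rstrip("/") + "/collections"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError and HTTPError are OSError subclasses, so this
        # covers refused connections, timeouts, and non-2xx responses.
        return False
```

Calling `qdrant_is_up()` after the containers start should return `True` once the service is ready. A `False` result usually means Docker is still starting or the port mapping differs from the default.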
7. Launch the Application
python main.py
The CLI will guide you through the dataset creation process. Completed reports are saved in the output_files folder.
Customizing Behavior
You can modify DeepDoc’s operational logic in configuration.py. Two main blocks control the system's behavior:
import uuid

# LLM settings
LLM_CONFIG = {
    "provider": "openai",
    "model": "gpt-4o-mini",
    "temperature": 0.5,  # Adjust for creativity vs. precision
}

# Research loop parameters
THREAD_CONFIG = {
    "configurable": {
        "thread_id": str(uuid.uuid4()),
        "max_queries": 3,
        "search_depth": 2,
        "num_reflections": 2,
        "n_points": 1,
    }
}
You can switch models, adjust the "temperature" to control randomness, or increase num_reflections to force the agents to perform more rigorous quality checks on the data they retrieve.
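As an illustration of adjusting these values without editing the defaults in place, the sketch below builds a per-run copy of the research-loop settings. The helper `make_thread_config` is hypothetical and does not exist in DeepDoc. The default values are copied from the excerpt above, and rejecting unknown keys is a design choice made here to catch typos.

```python
import copy
import uuid

# Defaults copied from the configuration.py excerpt above.
DEFAULT_THREAD_CONFIG = {
    "configurable": {
        "max_queries": 3,
        "search_depth": 2,
        "num_reflections": 2,
        "n_points": 1,
    }
}


def make_thread_config(**overrides):
    """Return a new config with a fresh thread_id and the given overrides."""
    config = copy.deepcopy(DEFAULT_THREAD_CONFIG)  # never mutate the defaults
    settings = config["configurable"]
    unknown = set(overrides) - set(settings)
    if unknown:
        raise KeyError(f"Unknown setting(s): {sorted(unknown)}")
    settings.update(overrides)
    settings["thread_id"] = str(uuid.uuid4())
    return config


if __name__ == "__main__":
    # Ask for more reflection passes on this run only.
    print(make_thread_config(num_reflections=4))
```

Generating the `thread_id` inside the helper gives each run its own identifier, whereas a module-level `uuid.uuid4()` call is evaluated once at import time.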