Graph-Code is a multilingual, graph-based Retrieval-Augmented Generation (RAG) system that makes complex codebases searchable through natural language. It uses Tree-sitter to parse repositories written in Python, JavaScript, TypeScript, Rust, Go, Scala, and Java into Abstract Syntax Trees (ASTs). Because the parsing pipeline is language-agnostic, it extracts structural data, functional relationships, and external dependencies consistently and stores them in a Memgraph database as a unified knowledge graph.
By integrating Google Gemini or local models via Ollama, Graph-Code translates natural language questions into precise Cypher queries. This allows developers to explore internal code relationships intuitively, retrieve specific source snippets, and navigate nested logic without manual grep-ing or constant context switching. The system maintains a consistent schema across all supported languages, ensuring a uniform experience regardless of the stack.
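To make the translation step concrete, the sketch below pairs one question with the kind of Cypher query a model might produce for it. Only the DEFINES_METHOD relationship is taken from this document's graph schema. The node labels (Class, Method), the name property, the build_methods_query helper, and the UserService class name are assumptions made for illustration and may not match the project's actual schema or generated queries.

```python
from typing import NamedTuple


class CypherQuery(NamedTuple):
    """A parameterized Cypher statement plus its parameters."""
    text: str
    params: dict


def build_methods_query(class_name: str) -> CypherQuery:
    """Hypothetical translation of "Which methods does class X define?".

    Assumed for illustration: node labels `Class` and `Method` and a
    `name` property. Only DEFINES_METHOD comes from the document.
    """
    text = (
        "MATCH (c:Class {name: $class_name})-[:DEFINES_METHOD]->(m:Method) "
        "RETURN m.name AS method ORDER BY method"
    )
    # The value travels as a parameter, so user input is never spliced
    # into the query string itself.
    return CypherQuery(text=text, params={"class_name": class_name})


# "UserService" is a placeholder class name, not part of any real repository.
query = build_methods_query("UserService")
print(query.text)
print(query.params)
```

Keeping the query text and its parameters separate is a common way to run LLM-generated or user-influenced Cypher more safely, since the database driver handles quoting.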
Extensive Language Support: Compatible with Python, JavaScript, TypeScript, Rust, Go, Scala, and Java repositories.
Tree-sitter Parsing: Utilizes reliable, language-agnostic AST extraction to map code structure.
Knowledge Graph Storage: Stores code elements and their interconnections as nodes and edges within Memgraph.
Natural Language Interface: Allows users to interrogate a codebase using plain English queries.
Automated Cypher Generation: Converts English prompts into graph queries using cloud models (Google Gemini) or local alternatives (Ollama).
Source Snippet Retrieval: Directly fetches the implementation code for identified functions, methods, or classes.
Dependency Mapping: Analyzes pyproject.toml and similar files to map external package dependencies.
Nested Logic Handling: Accurately represents complex class hierarchies and nested function definitions.
Unified Schema: All supported languages are mapped to a standardized graph model.
The system is divided into two primary modules:
codebase_rag/: An interactive command-line interface for querying the knowledge graph.
Tree-sitter Integration: Performs language-agnostic parsing through dedicated grammars.
Graph Database: Memgraph serves as the storage layer for code nodes and their relationships.
LLM Integration: Supports Google Gemini for cloud-based processing and Ollama for local execution.
Code Analysis Engine: Traverses each AST to identify and link code elements across different languages.
Query Utilities: Specialized tools for executing graph searches and retrieving source code.
Configurable Mappings: Language-specific parsing rules are fully configurable.
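The AST traversal described above can be sketched in a few lines. To stay runnable without extra packages, this example uses Python's built-in ast module as a stand-in for Tree-sitter, so it handles Python source only. The DEFINES and DEFINES_METHOD names come from this document's graph schema. The dotted qualified-name scheme, and the choice to record nested functions with DEFINES, are assumptions made for illustration.

```python
import ast

# Small sample with a class, a method, a nested function, and a top-level function.
SOURCE = '''
class Parser:
    def parse(self):
        def visit(node):
            return node
        return visit

def main():
    pass
'''


def extract_edges(source: str, module: str) -> list[tuple[str, str, str]]:
    """Walk a Python AST and emit (parent, relationship, child) triples."""
    edges = []
    definition_types = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

    def walk(node, parent, parent_is_class):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, definition_types):
                qualified = f"{parent}.{child.name}"
                is_class = isinstance(child, ast.ClassDef)
                # A function directly inside a class is a method; every other
                # definition is recorded with the generic DEFINES relationship.
                if parent_is_class and not is_class:
                    relationship = "DEFINES_METHOD"
                else:
                    relationship = "DEFINES"
                edges.append((parent, relationship, qualified))
                walk(child, qualified, is_class)
            else:
                # Keep descending so definitions inside if/try blocks are found.
                walk(child, parent, parent_is_class)

    walk(ast.parse(source), module, False)
    return edges


for edge in extract_edges(SOURCE, "demo"):
    print(edge)
```

Carrying the parent's qualified name down the recursion is what lets nested functions and class hierarchies keep distinct identities in the graph.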
uv package manager for dependency handling.
Clone the repository:
git clone https://github.com/vitali87/code-graph-rag.git
cd code-graph-rag
Install dependencies:
For standard Python support:
uv sync
For full multilingual support:
uv sync --extra treesitter-full
This installs Tree-sitter grammars for: Python (.py), JavaScript (.js, .jsx), TypeScript (.ts, .tsx), Rust (.rs), Go (.go), Scala (.scala, .sc), and Java (.java).
Configure environment variables:
cp .env.example .env
# Edit .env with your specific configuration
Model Configuration Options:
Option 1: Cloud Model (Gemini)
# .env file
LLM_PROVIDER=gemini
GEMINI_API_KEY=your_gemini_api_key_here
You can obtain a free API key from Google AI Studio.
Option 2: Local Model (Ollama)
# .env file
LLM_PROVIDER=local
LOCAL_MODEL_ENDPOINT=http://localhost:11434/v1
LOCAL_ORCHESTRATOR_MODEL_ID=llama3
LOCAL_CYPHER_MODEL_ID=llama3
LOCAL_MODEL_API_KEY=ollama
To set up Ollama:
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh
# Download a model
ollama pull llama3
# Other options include: llama3.1, mistral, or codellama
Local models ensure data privacy and eliminate API costs, though accuracy may vary compared to Gemini.
Start Memgraph:
docker-compose up -d
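With the .env file in place, the settings could be read and validated at startup roughly as follows. This is a minimal sketch, not the project's actual configuration code: the Settings class and load_settings function are hypothetical, while the variable names and default values are the ones listed in this document. The document gives no default for LLM_PROVIDER, so the sketch treats it as required.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a subset of the documented variables."""
    llm_provider: str
    local_model_endpoint: str
    local_orchestrator_model_id: str
    local_cypher_model_id: str
    memgraph_host: str
    memgraph_port: int


def load_settings(env: Mapping[str, str]) -> Settings:
    """Read settings from a mapping of environment variables."""
    provider = env.get("LLM_PROVIDER", "")
    if provider not in ("gemini", "local"):
        raise ValueError('LLM_PROVIDER must be "gemini" or "local"')
    # The document states that GEMINI_API_KEY is required for Gemini access.
    if provider == "gemini" and not env.get("GEMINI_API_KEY"):
        raise ValueError("GEMINI_API_KEY is required when LLM_PROVIDER=gemini")
    return Settings(
        llm_provider=provider,
        local_model_endpoint=env.get("LOCAL_MODEL_ENDPOINT", "http://localhost:11434/v1"),
        local_orchestrator_model_id=env.get("LOCAL_ORCHESTRATOR_MODEL_ID", "llama3"),
        local_cypher_model_id=env.get("LOCAL_CYPHER_MODEL_ID", "llama3"),
        memgraph_host=env.get("MEMGRAPH_HOST", "localhost"),
        memgraph_port=int(env.get("MEMGRAPH_PORT", "7687")),
    )


# An explicit dict keeps the demo deterministic.
settings = load_settings({"LLM_PROVIDER": "local"})
print(settings)
```

In real use the mapping would be os.environ, populated from .env beforehand. Failing early on a missing key gives a clearer message than an authentication error in the middle of a query.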
Ingest a codebase into the knowledge graph.
To initialize a new graph:
python -m codebase_rag.main --repo-path ../../ai-engineering-hub/video-rag-gemini --update-graph --clean
To add more repositories to the existing graph:
python -m codebase_rag.main --repo-path /path/to/repo2 --update-graph
python -m codebase_rag.main --repo-path /path/to/repo3 --update-graph
The system automatically identifies languages based on file extensions.
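Extension-based detection can be pictured as a simple lookup. The sketch below uses the extensions listed in this document; the lowercase language keys and both function names are hypothetical, and the project's own detection logic may differ.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, Optional

# Extension-to-language table built from the extensions named in this document.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript", ".jsx": "javascript",
    ".ts": "typescript", ".tsx": "typescript",
    ".rs": "rust",
    ".go": "go",
    ".scala": "scala", ".sc": "scala",
    ".java": "java",
}


def detect_language(path: str) -> Optional[str]:
    """Return the language for a file path, or None if it is not supported."""
    return EXTENSION_TO_LANGUAGE.get(PurePosixPath(path).suffix.lower())


def count_by_language(paths: Iterable[str]) -> Counter:
    """Count supported files per language, skipping everything else."""
    detected = (detect_language(path) for path in paths)
    return Counter(language for language in detected if language is not None)


# Placeholder paths standing in for a repository listing.
files = ["src/app.py", "web/index.tsx", "core/lib.rs", "README.md", "build.sc"]
print(count_by_language(files))
```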
Launch the interactive RAG command-line tool:
python -m codebase_rag.main --repo-path /path/to/your/repo
You can toggle between cloud and local providers via CLI arguments:
Run with a local model:
python -m codebase_rag.main --repo-path /path/to/your/repo --llm-provider local
Run with a cloud model:
python -m codebase_rag.main --repo-path /path/to/your/repo --llm-provider gemini
Specify custom model IDs:
# Using specific local models
python -m codebase_rag.main --repo-path /path/to/your/repo \
--llm-provider local \
--orchestrator-model llama3.1 \
--cypher-model codellama
# Using specific Gemini models
python -m codebase_rag.main --repo-path /path/to/your/repo \
--llm-provider gemini \
--orchestrator-model gemini-2.0-flash-thinking-exp-01-21 \
--cypher-model gemini-2.5-flash-lite-preview-06-17
--llm-provider: Select either gemini or local.
--orchestrator-model: The model responsible for RAG orchestration.
--cypher-model: The model dedicated to generating Cypher queries.
The knowledge graph is organized using specific node types and relationships.
__init__.py).
Python: function_definition, class_definition
JavaScript/TypeScript: function_declaration, arrow_function, class_declaration
Rust: function_item, struct_item, enum_item, impl_item
Go: function_declaration, method_declaration, type_declaration
Scala: function_definition, class_definition, object_definition, trait_definition
Java: method_declaration, class_declaration, interface_declaration, enum_declaration
CONTAINS_PACKAGE / CONTAINS_MODULE / CONTAINS_FILE / CONTAINS_FOLDER: Define the file system hierarchy.
DEFINES: Indicates that a module contains a specific class or function.
DEFINES_METHOD: Indicates that a class contains a specific method.
DEPENDS_ON_EXTERNAL: Links the project to its external dependencies.
System settings are managed through environment variables in the .env file.
LLM_PROVIDER: Set to "gemini" for cloud or "local" for local models.
GEMINI_API_KEY: Required for Gemini access.
GEMINI_MODEL_ID: The primary orchestrator (Default: gemini-2.5-pro-preview-06-05).
MODEL_CYPHER_ID: The Cypher generation model (Default: gemini-2.5-flash-lite-preview-06-17).
LOCAL_MODEL_ENDPOINT: Ollama API URL (Default: http://localhost:11434/v1).
LOCAL_ORCHESTRATOR_MODEL_ID: Orchestration model (Default: llama3).
LOCAL_CYPHER_MODEL_ID: Cypher generation model (Default: llama3).
LOCAL_MODEL_API_KEY: API key for local usage (Default: ollama).
MEMGRAPH_HOST: Database hostname (Default: localhost).
MEMGRAPH_PORT: Database port (Default: 7687).
TARGET_REPO_PATH: The default repository path (Default: .).
tree-sitter: The core parsing engine.
tree-sitter-{language}: Individual grammars for supported languages.
pydantic-ai: The framework used for AI agent orchestration.
pymgclient: Python client for Memgraph.
loguru: For structured logging.
python-dotenv: For managing environment variables.

| Language | Extensions | Functions | Classes/Structs | Modules | Package Detection |
|---|---|---|---|---|---|
| Python | .py | ✅ | ✅ | ✅ | __init__.py |
| JavaScript | .js, .jsx | ✅ | ✅ | ✅ | - |
| TypeScript | .ts, .tsx | ✅ | ✅ | ✅ | - |
| Rust | .rs | ✅ | ✅ (struct/enum) | ✅ | - |
| Go | .go | ✅ | ✅ (struct) | ✅ | - |
| Scala | .scala, .sc | ✅ | ✅ (class/obj) | ✅ | Package decl. |
| Java | .java | ✅ | ✅ (class/intf) | ✅ | Package decl. |
# Base Python support only
uv sync
# Full multilingual support (recommended)
uv sync --extra treesitter-full
# Add specific languages manually
uv add tree-sitter-python tree-sitter-rust tree-sitter-go
The system is configuration-driven. Each language is defined in codebase_rag/language_config.py, which specifies file extensions, node types, and naming conventions. Adding support for a new language typically only requires updates to this configuration file.
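To show what a configuration-driven design of this kind can look like, here is a minimal sketch of a per-language entry and of registering a new one. It is not the contents of codebase_rag/language_config.py: the LanguageConfig class, its field names, and classify_node are hypothetical. The Python and Rust node types are those listed in this document, and "demo-lang" with its node types is a made-up placeholder, not a supported language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LanguageConfig:
    """Hypothetical shape of one language entry; not the project's real class."""
    name: str
    file_extensions: tuple[str, ...]
    function_node_types: tuple[str, ...]
    class_node_types: tuple[str, ...]
    package_indicators: tuple[str, ...] = ()


LANGUAGE_CONFIGS = {
    "python": LanguageConfig(
        name="python",
        file_extensions=(".py",),
        function_node_types=("function_definition",),
        class_node_types=("class_definition",),
        package_indicators=("__init__.py",),
    ),
    "rust": LanguageConfig(
        name="rust",
        file_extensions=(".rs",),
        function_node_types=("function_item",),
        class_node_types=("struct_item", "enum_item"),
    ),
}

# Adding a language then means registering one more entry. Everything in
# this entry is a placeholder invented for the example.
LANGUAGE_CONFIGS["demo-lang"] = LanguageConfig(
    name="demo-lang",
    file_extensions=(".demo",),
    function_node_types=("demo_function",),
    class_node_types=("demo_class",),
)


def classify_node(language: str, node_type: str) -> Optional[str]:
    """Map an AST node type to "function" or "class" using the config alone."""
    config = LANGUAGE_CONFIGS[language]
    if node_type in config.function_node_types:
        return "function"
    if node_type in config.class_node_types:
        return "class"
    return None


print(classify_node("rust", "struct_item"))
print(classify_node("demo-lang", "demo_function"))
```

Because classify_node consults only the configuration, the traversal code needs no language-specific branches, which is the property the paragraph above describes.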
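Check that Memgraph is running: docker-compose ps.
Confirm that port 7687 is reachable.
Open http://localhost:3000 to inspect the graph directly.
List the installed Ollama models: ollama list.
Download a missing model: ollama pull llama3.
Test the Ollama API: curl http://localhost:11434/v1/models.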
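The port checks above can also be scripted. The sketch below only tests whether a TCP connection succeeds on the default ports given in this document (7687 for Memgraph, 11434 for Ollama). It assumes both services run on localhost, and the port_is_open helper is hypothetical rather than part of the project.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Default ports from this document; localhost is an assumption.
CHECKS = {
    "Memgraph (MEMGRAPH_PORT)": ("localhost", 7687),
    "Ollama (LOCAL_MODEL_ENDPOINT)": ("localhost", 11434),
}

for label, (host, port) in CHECKS.items():
    status = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"{label}: {status}")
```

An open port shows only that something is listening. The docker-compose ps and curl commands above remain the way to confirm that the right service is answering.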