CodeIndexer equips AI-powered IDEs with advanced indexing and deep context awareness. By combining vector databases like Milvus with industry-standard embedding models, it indexes your entire codebase to enable natural language discovery. This eliminates the limitations of traditional keyword searches and the context constraints of LLMs when working on large-scale projects. Rather than hunting for specific terms, you can describe the functionality you need and locate it instantly. The toolkit features context-aware discovery, AST-based smart chunking to preserve code structure, incremental file synchronization, and compatibility with various embedding providers. It operates through a core engine, a dedicated VS Code extension, and an MCP server.
• LLMs possess limited context windows, often failing to process large codebases effectively.
• Regex and keyword searches overlook the structural relationships between code components.
• Many IDEs lack genuine context awareness, failing to recognize how disparate parts of a project connect.
• Developers frequently lose time navigating complex repositories manually.
• Traditional search tools cannot bridge the gap between human intent and the underlying source code.

• Context awareness – It recognizes the relationships between different modules and functions across the codebase.
• Semantic search – You can use plain English queries, such as “find the function that handles user authentication,” to locate relevant code.
• AI-driven logic – The system understands code intent and structural connectivity rather than just matching text strings.
• Cross-platform utility – Support for the Model Context Protocol (MCP) and a VS Code extension ensures it fits into various development environments.
Semantic code search – Locate code based on its purpose. For example, search for “functions that interact with vector databases” without needing to know specific variable or function names.
Smart indexing – Automatically processes the entire codebase to build a semantic vector database enriched with structural context.
Context-aware discovery – Identifies related code snippets based on functional meaning rather than literal text matches.
Incremental file sync – Utilizes Merkle trees for efficient change detection, ensuring that only modified files are re-indexed.
Smart chunking – Employs AST-based splitting to maintain code logic and context, with an automatic fallback mechanism for unsupported formats.
Accelerated development – Reduces the time spent searching for existing logic, allowing more focus on active feature development.
Multiple embedding services – Compatible with OpenAI, VoyageAI, Ollama, and other leading providers.
Vector storage – Optimized for use with Milvus or Zilliz Cloud (fully managed).
VS Code integration – Includes a native extension designed to integrate into your existing coding workflow.
MCP support – Features a Model Context Protocol server to facilitate interactions with AI agents.
Progress tracking – Provides real-time status updates throughout the indexing process.
Customizable – Offers granular control over file extensions, ignore patterns, and choice of embedding models.
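To make the incremental sync idea concrete, here is a minimal sketch of hash-based change detection in the spirit of a Merkle tree. It is not CodeIndexer's actual implementation: the names `Snapshot`, `buildSnapshot`, and `diffSnapshots` are hypothetical, the tree has only two levels (file hashes and one root), and files are passed in as an in-memory map rather than read from disk.

```typescript
import { createHash } from "node:crypto";

// Hypothetical types for illustration; not part of CodeIndexer's API.
type Snapshot = { root: string; leaves: Map<string, string> };

const sha256 = (data: string): string =>
  createHash("sha256").update(data).digest("hex");

// Build a two-level hash tree: one leaf hash per file, one root over all leaves.
function buildSnapshot(files: Map<string, string>): Snapshot {
  const leaves = new Map<string, string>();
  // Sort paths so the root hash does not depend on insertion order.
  for (const path of [...files.keys()].sort()) {
    leaves.set(path, sha256(files.get(path)!));
  }
  const root = sha256([...leaves].map(([p, h]) => `${p}:${h}`).join("\n"));
  return { root, leaves };
}

// Return paths to re-index (added or modified) and paths to drop from the index.
function diffSnapshots(
  prev: Snapshot,
  next: Snapshot
): { changed: string[]; removed: string[] } {
  const changed: string[] = [];
  const removed: string[] = [];
  // Fast path: identical roots mean nothing changed, so no leaf is compared.
  if (prev.root === next.root) return { changed, removed };
  for (const [path, hash] of next.leaves) {
    if (prev.leaves.get(path) !== hash) changed.push(path);
  }
  for (const path of prev.leaves.keys()) {
    if (!next.leaves.has(path)) removed.push(path);
  }
  return { changed, removed };
}
```

A fuller Merkle tree would add one intermediate hash per directory, so that an unchanged directory can be skipped without visiting its files. The two-level version above keeps only the root comparison as a shortcut.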
Available in three formats: a Chrome extension (for specific workflows), a VS Code extension, and an MCP server.
CodeIndexer uses a monorepo architecture consisting of three primary packages:
• @code-indexer/core – The central engine that manages embeddings and vector database integration.
• VS Code extension – Provides semantic search capabilities directly within Visual Studio Code.
• @code-indexer/mcp – A Model Context Protocol server designed for AI agent communication.
Embedding services: OpenAI, VoyageAI, Ollama.
Vector databases: Milvus or Zilliz Cloud (fully managed).
Code splitters: AST-based (with automatic fallback) and LangChain character splitters.
Languages: TypeScript, JavaScript, Python, Java, C++, C#, Go, Rust, PHP, Ruby, Swift, Kotlin, Scala, and Markdown.
Dev tools: VS Code, Model Context Protocol (MCP).
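The "automatic fallback" behavior of the code splitters can be pictured with the sketch below. It is an illustration under assumptions, not the project's code: `Chunk`, `Splitter`, `splitByCharacters`, and `splitWithFallback` are hypothetical names, and the AST splitters are passed in as plain functions because a real one would depend on a parser library. Only the fallback path, a line-preserving character splitter, is implemented here.

```typescript
// Hypothetical types for illustration; not part of CodeIndexer's API.
type Chunk = { content: string; startLine: number; endLine: number };
type Splitter = (code: string) => Chunk[];

// Fallback splitter: groups whole lines into chunks of roughly maxChars characters.
// A single line longer than maxChars is kept whole in its own chunk.
function splitByCharacters(code: string, maxChars = 1000): Chunk[] {
  const lines = code.split("\n");
  const chunks: Chunk[] = [];
  let buffer: string[] = [];
  let size = 0;
  let startLine = 1; // 1-based line number of the first buffered line
  lines.forEach((line, i) => {
    if (buffer.length > 0 && size + line.length + 1 > maxChars) {
      // Flush before adding line i; the buffer ends at 1-based line i.
      chunks.push({ content: buffer.join("\n"), startLine, endLine: i });
      buffer = [];
      size = 0;
      startLine = i + 1;
    }
    buffer.push(line);
    size += line.length + 1; // +1 accounts for the newline
  });
  if (buffer.length > 0) {
    chunks.push({ content: buffer.join("\n"), startLine, endLine: lines.length });
  }
  return chunks;
}

// Try a structure-aware splitter first; fall back when the language is
// unsupported, the splitter throws, or it returns no chunks.
function splitWithFallback(
  code: string,
  language: string,
  astSplitters: Map<string, Splitter>,
  maxChars = 1000
): Chunk[] {
  const astSplit = astSplitters.get(language);
  if (astSplit) {
    try {
      const chunks = astSplit(code);
      if (chunks.length > 0) return chunks;
    } catch {
      // Parse failure: fall through to the character splitter.
    }
  }
  return splitByCharacters(code, maxChars);
}
```

Keeping line numbers on every chunk matters for either path, since search results are reported as a file plus a line range.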
• Node.js ≥ 20.0.0
• pnpm ≥ 10.0.0
• An active Milvus database
• An API key from OpenAI or VoyageAI
# Using npm
npm install @code-indexer/core
# Using pnpm
pnpm add @code-indexer/core
# Using yarn
yarn add @code-indexer/core
OpenAI API key – Obtain your key from the OpenAI dashboard and set:
OPENAI_API_KEY=your-openai-api-key
Milvus configuration – For Zilliz Cloud (the fully managed version of Milvus, which offers a free tier):
MILVUS_ADDRESS = your Zilliz Cloud instance’s public endpoint
MILVUS_TOKEN = your Zilliz Cloud token
MILVUS_ADDRESS=https://xxx-xxxxxxxxxxxx.serverless.gcp-us-west1.cloud.zilliz.com
MILVUS_TOKEN=xxxxxxx
If you are hosting your own Milvus instance, configure the address and token to match your local setup.
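One way to turn these environment variables into connection settings is sketched below. The helper `resolveMilvusConfig` and the `MilvusConfig` type are hypothetical, and the rule that a `cloud.zilliz.com` address requires a token is an assumption made for this example rather than documented behavior. The `localhost:19530` default mirrors the one used in the core usage example.

```typescript
// Hypothetical helper for illustration; not part of @code-indexer/core.
type MilvusConfig = { address: string; token?: string };
type Env = Record<string, string | undefined>;

function resolveMilvusConfig(env: Env): MilvusConfig {
  // Fall back to a local Milvus instance when no address is set.
  const address = env["MILVUS_ADDRESS"]?.trim() || "localhost:19530";
  const token = env["MILVUS_TOKEN"]?.trim() || undefined;
  // Assumption: Zilliz Cloud endpoints always need a token, so fail early.
  if (address.includes("cloud.zilliz.com") && !token) {
    throw new Error("MILVUS_TOKEN is required for a Zilliz Cloud endpoint");
  }
  return token ? { address, token } : { address };
}
```

In a Node.js script this would typically be called as `resolveMilvusConfig(process.env)`, so that a missing token surfaces at startup rather than during the first indexing call.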
The @code-indexer/core package serves as the primary engine for embeddings, vector storage, and search operations.
import { CodeIndexer, MilvusVectorDatabase, OpenAIEmbedding } from '@code-indexer/core';

// Embedding provider: converts code chunks into vectors
const embedding = new OpenAIEmbedding({
    apiKey: process.env.OPENAI_API_KEY || 'your-openai-api-key',
    model: 'text-embedding-3-small'
});

// Vector store: a self-hosted Milvus instance or a Zilliz Cloud endpoint
const vectorDatabase = new MilvusVectorDatabase({
    address: process.env.MILVUS_ADDRESS || 'localhost:19530',
    token: process.env.MILVUS_TOKEN || ''
});

const indexer = new CodeIndexer({ embedding, vectorDatabase });

// Index the codebase and report progress for each phase
const stats = await indexer.indexCodebase('./your-project', (progress) => {
    console.log(`${progress.phase} - ${progress.percentage}%`);
});
console.log(`Indexed ${stats.indexedFiles} files, ${stats.totalChunks} chunks`);

// Run a natural language query and print the top 5 matches
const results = await indexer.semanticSearch('./your-project', 'vector database operations', 5);
results.forEach(result => {
    console.log(`File: ${result.relativePath}:${result.startLine}-${result.endLine}`);
    console.log(`Score: ${(result.score * 100).toFixed(2)}%`);
    console.log(`Content: ${result.content.substring(0, 100)}...`);
});
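Search results often contain several chunks from the same file. The sketch below shows one possible way to post-process them, using only the result fields that appear in the example above. The `SearchResult` type and the `groupByFile` helper are hypothetical, and the default threshold assumes scores fall between 0 and 1, which the percentage formatting above suggests but does not guarantee.

```typescript
// Hypothetical shape, limited to the fields used in the usage example.
type SearchResult = {
  relativePath: string;
  startLine: number;
  endLine: number;
  score: number;
  content: string;
};

// Drop low-scoring matches, then group the rest by file, ordered by line.
function groupByFile(
  results: SearchResult[],
  minScore = 0.5 // assumed 0..1 scale
): Map<string, SearchResult[]> {
  const grouped = new Map<string, SearchResult[]>();
  for (const result of results) {
    if (result.score < minScore) continue;
    const bucket = grouped.get(result.relativePath) ?? [];
    bucket.push(result);
    grouped.set(result.relativePath, bucket);
  }
  for (const bucket of grouped.values()) {
    bucket.sort((a, b) => a.startLine - b.startLine);
  }
  return grouped;
}
```

Grouping by file makes it easier to show one entry per file in a UI, or to pass an agent whole-file context instead of scattered snippets.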
The following packages extend the functionality of @code-indexer/core. Each package includes comprehensive documentation and usage examples.
This is a Model Context Protocol server that allows AI assistants and agents to communicate with CodeIndexer. It exposes indexing and search capabilities as standard MCP tools.
Cursor – Add the following to ~/.cursor/mcp.json (global) or your project-specific .cursor/mcp.json:
{
  "mcpServers": {
    "code-indexer": {
      "command": "npx",
      "args": ["-y", "@code-indexer/mcp@latest"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key",
        "MILVUS_ADDRESS": "localhost:19530"
      }
    }
  }
}
Claude Desktop – Add this to the configuration file:
{
  "mcpServers": {
    "code-indexer": {
      "command": "npx",
      "args": ["@code-indexer/mcp@latest"],
      "env": {
        "OPENAI_API_KEY": "your-openai-api-key",
        "MILVUS_ADDRESS": "localhost:19530"
      }
    }
  }
}
Other tools (Claude Code, Windsurf, VS Code, Cherry Studio, etc.) – Follow a similar configuration pattern. The universal execution command is:
npx @code-indexer/mcp@latest
You can install the extension directly from the VS Code Marketplace. Search for “Semantic Code Search” in the Extensions view (Ctrl+Shift+X or Cmd+Shift+X) and select Install.
git clone https://github.com/zilliztech/CodeIndexer.git
cd CodeIndexer
pnpm install
pnpm build
pnpm dev
pnpm build # Build all packages
pnpm build:core # Build core engine only
pnpm build:vscode # Build VS Code extension only
pnpm build:mcp # Build MCP server only
cd examples/basic-usage
pnpm dev
Languages: .ts, .tsx, .js, .jsx, .py, .java, .cpp, .c, .h, .hpp, .cs, .go, .rs, .php, .rb, .swift, .kt, .scala, .m, .mm
Documentation: .md, .markdown
CodeIndexer automatically excludes the following directories and files:
node_modules/**, dist/**, build/**, .git/**, .vscode/**, .idea/**, *.log, *.min.js, *.map
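The combined effect of the supported extensions and the default ignore patterns can be sketched as a single file filter. This is an illustration, not the project's matcher: `isIgnored` and `shouldIndex` are hypothetical names, only the two pattern shapes listed above (`dir/**` and `*.suffix`) are handled, and directory patterns are assumed to match at any depth.

```typescript
// Extension and pattern lists copied from the defaults listed above.
const SUPPORTED_EXTENSIONS = new Set([
  ".ts", ".tsx", ".js", ".jsx", ".py", ".java", ".cpp", ".c", ".h", ".hpp",
  ".cs", ".go", ".rs", ".php", ".rb", ".swift", ".kt", ".scala", ".m", ".mm",
  ".md", ".markdown",
]);
const IGNORE_PATTERNS = [
  "node_modules/**", "dist/**", "build/**", ".git/**", ".vscode/**",
  ".idea/**", "*.log", "*.min.js", "*.map",
];

// Hypothetical matcher covering only "dir/**" and "*.suffix" patterns.
function isIgnored(relativePath: string, patterns: string[] = IGNORE_PATTERNS): boolean {
  const segments = relativePath.split("/");
  const fileName = segments[segments.length - 1] ?? "";
  const directories = segments.slice(0, -1);
  return patterns.some((pattern) => {
    if (pattern.endsWith("/**")) {
      // Assumption: the directory name matches at any depth in the path.
      return directories.includes(pattern.slice(0, -3));
    }
    if (pattern.startsWith("*.")) {
      return fileName.endsWith(pattern.slice(1));
    }
    return relativePath === pattern;
  });
}

// A file is indexed when it is not ignored and has a supported extension.
function shouldIndex(relativePath: string): boolean {
  if (isIgnored(relativePath)) return false;
  const fileName = relativePath.split("/").pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  return SUPPORTED_EXTENSIONS.has(extension);
}
```

Checking ignore patterns before extensions matters for cases such as `app.min.js`, which has a supported `.js` extension but should still be skipped.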