Microsoft has open-sourced NLWeb, a project designed to simplify adding conversational interfaces to websites. It combines natural language interaction with the structured data that sites already publish (Schema.org and similar formats) and defines a straightforward protocol on top of it. Both human users and AI agents can query a website in plain language, and the backend returns results as JSON that uses the Schema.org vocabulary, so the output is easy for machines and developers alike to interpret.
1. Interaction Protocol: This defines the fundamental rules for querying a website via natural language. Responses use Schema.org to standardize data structures, so the output stays readable for developers and recognizable to automated parsers.
2. Implementation Toolset: The project provides ready-to-use components for handling structured content such as product catalogs, recipes, or travel listings. It also includes UI widgets, allowing developers to deploy a conversational endpoint on a site in minutes rather than days.
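To make the Schema.org point concrete, here is a minimal sketch of how a client might pick results of one type out of a response. The payload shape (a `results` list whose items carry a `schema_object`) is an assumption made up for illustration, not NLWeb's documented response; only `@type` and `name` are standard JSON-LD/Schema.org keys.

```python
import json

# Hypothetical response body: "results", "url" and "schema_object" are assumed
# field names; "@type" and "name" are standard JSON-LD / Schema.org keys.
SAMPLE_RESPONSE = """
{
  "results": [
    {"url": "https://example.com/r/1",
     "schema_object": {"@type": "Recipe", "name": "Lemon Tart"}},
    {"url": "https://example.com/p/2",
     "schema_object": {"@type": "Product", "name": "Tart Pan"}}
  ]
}
"""


def names_by_type(payload: str, schema_type: str) -> list[str]:
    """Return the names of all result items whose Schema.org @type matches."""
    data = json.loads(payload)
    names = []
    for item in data.get("results", []):
        obj = item.get("schema_object", {})
        # JSON-LD allows @type to be a single string or a list of strings.
        types = obj.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if schema_type in types:
            names.append(obj.get("name", ""))
    return names


print(names_by_type(SAMPLE_RESPONSE, "Recipe"))
```

Because every item declares its type, the same few lines work whether the site returns recipes, products, or travel listings.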
An NLWeb instance functions as a Model Context Protocol (MCP) server, exposing a core ask method to handle natural language queries. In this ecosystem, think of MCP as the transport protocol (analogous to HTTP) and NLWeb as the data format (analogous to HTML). While one defines the communication rules, the other carries the specific content.
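The split between transport and content can be sketched as follows. MCP frames tool invocations as JSON-RPC 2.0 messages with the method `tools/call`; the sketch below wraps a natural language question in such a message. That the tool is registered under the name `ask` follows the text above, but the argument name `query` is an assumption, so check the NLWeb repository before relying on it.

```python
import json


def mcp_ask_request(query: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 "tools/call" message, the envelope MCP uses for tool calls."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "ask",                  # tool name, as described in the text
            "arguments": {"query": query},  # "query" is an assumed argument name
        },
    }
    return json.dumps(message)


print(mcp_ask_request("vegetarian dinner ideas"))
```

The envelope is the MCP part; what comes back inside the result (Schema.org-typed items) is the NLWeb part.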
Clone the Repository
git clone https://github.com/microsoft/NLWeb
cd NLWeb
Create and Activate a Virtual Environment
python -m venv myenv
# Linux/macOS:
source myenv/bin/activate
# Windows:
myenv\Scripts\activate
Install Dependencies
cd code
pip install -r requirements.txt
Configure Environment Variables
Copy the environment template:
cp .env.template .env
Edit the .env file to include your LLM API key (e.g., Azure OpenAI).
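Before moving on, it can help to confirm that the keys you need are actually filled in. The following is a minimal sketch of such a check using a simple KEY=VALUE parser; the key names in the usage line are hypothetical placeholders, so substitute whatever names your .env.template actually contains.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(text: str, required: list[str]) -> list[str]:
    """Return the required keys that are absent or have an empty value."""
    env = parse_env(text)
    return [key for key in required if not env.get(key)]


# Hypothetical key names, for illustration only.
sample = 'EXAMPLE_API_KEY="abc123"\nEXAMPLE_ENDPOINT=\n'
print(missing_keys(sample, ["EXAMPLE_API_KEY", "EXAMPLE_ENDPOINT"]))
```

To check a real file, pass `open(".env").read()` as the first argument.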
Next, adjust the configuration files as needed:
config_llm.yaml: Select your LLM provider and model (defaults to Azure OpenAI 4.1 series).
config_embedding.yaml: Define the embedding model (defaults to Azure OpenAI text-embedding-3-small).
config_retrieval.yaml: Set this to qdrant_local for a local vector database.
Load Test Data
Use the tools.db_load utility to import RSS feeds. For example:
python -m tools.db_load https://feeds.libsyn.com/121695/rss Behind-the-Tech
python -m tools.db_load https://feeds.megaphone.fm/recodedecode Decoder
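If you want to load several feeds in one go, the two commands above can be wrapped in a short script. This is only a convenience sketch: the helper names are made up here, it assumes you run it from the same directory as the commands above, and it simply shells out to the same `python -m tools.db_load` invocation.

```python
import subprocess
import sys

# (feed URL, site name) pairs taken from the commands above.
FEEDS = [
    ("https://feeds.libsyn.com/121695/rss", "Behind-the-Tech"),
    ("https://feeds.megaphone.fm/recodedecode", "Decoder"),
]


def db_load_command(feed_url: str, site_name: str) -> list[str]:
    """Build the argument list for one `python -m tools.db_load` run."""
    # sys.executable keeps the call inside the active virtual environment.
    return [sys.executable, "-m", "tools.db_load", feed_url, site_name]


def load_feeds(feeds=FEEDS, run=subprocess.run) -> None:
    """Load each feed in turn, stopping at the first failing command."""
    for feed_url, site_name in feeds:
        run(db_load_command(feed_url, site_name), check=True)
```

Calling `load_feeds()` imports both podcasts; the `run` parameter exists only so the function can be exercised without actually launching a process.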
Launch the Service
python app-file.py
Navigate to http://localhost:8000 to test the conversational endpoint. You can also explore various UI examples at http://localhost:8000/static/.
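For a quick check from the terminal instead of the browser, a small reachability probe is enough. The sketch below uses only the standard library and reports whether each URL answers with a 2xx status; it says nothing about whether the conversational endpoint itself is configured correctly.

```python
from urllib.error import URLError
from urllib.request import urlopen


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET request to the URL succeeds with a 2xx status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


if __name__ == "__main__":
    for url in ("http://localhost:8000", "http://localhost:8000/static/"):
        print(url, "->", "up" if is_reachable(url) else "down")
```

If the first URL reports "down", confirm that the server process is still running and that nothing else is bound to port 8000.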