Microsoft’s NLWeb: Converting Any Website into a Conversational Interface

May 20 · Published in Developer Tools

Microsoft has open-sourced NLWeb, a project designed to simplify adding conversational interfaces to websites. It combines natural language interaction with structured web data (Schema.org and similar formats) and defines a straightforward protocol on top of them. Both human users and AI agents can query a website in plain language, and the backend returns results as JSON using the Schema.org vocabulary, so they are easy for machines and developers to interpret.

Two Core Components

1. Interaction Protocol: Defines the fundamental rules for querying a website via natural language. Responses use Schema.org to standardize data structures, so output stays readable for developers and recognizable to automated parsers.

2. Implementation Toolset: Provides ready-to-use components for handling structured content such as product catalogs, recipes, or travel listings. It also includes UI widgets, allowing developers to deploy a conversational endpoint on a site in minutes rather than days.
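To make the interaction protocol more concrete, here is a minimal Python sketch of how a client might read a response. The response shape used here (a results list whose entries carry url, name, score, and a schema_object holding the Schema.org data) is an assumption for illustration, as are the sample values; check the project's REST API documentation for the actual field names.

```python
# Hypothetical response shape; field names are assumptions, not confirmed API.
sample_response = {
    "query_id": "demo-1",
    "results": [
        {
            "url": "https://example.com/recipes/lentil-soup",
            "name": "Spicy Lentil Soup",
            "score": 87,
            "schema_object": {
                "@type": "Recipe",
                "name": "Spicy Lentil Soup",
                "recipeYield": "4 servings",
            },
        }
    ],
}


def summarize(response):
    """Return one 'Type: name <url>' line per result in the response."""
    lines = []
    for item in response.get("results", []):
        schema_object = item.get("schema_object", {})
        schema_type = schema_object.get("@type", "Thing")
        lines.append(f"{schema_type}: {item.get('name', '')} <{item.get('url', '')}>")
    return lines


for line in summarize(sample_response):
    print(line)
```

Because each result carries its Schema.org type, the same loop works for recipes, products, or travel listings without per-site parsing code.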

Cross-Platform Support

  • Operating Systems: Fully tested on Windows, macOS, and Linux.
  • Vector Stores: Compatible with Qdrant, Snowflake, Milvus, Azure AI Search, and other leading providers.
  • LLMs: Supports OpenAI, DeepSeek, Gemini, Anthropic, and most major language models.

Integrating NLWeb with MCP

An NLWeb instance functions as a Model Context Protocol (MCP) server, exposing a core ask method to handle natural language queries. In this ecosystem, think of MCP as the transport protocol (analogous to HTTP) and NLWeb as the data format (analogous to HTML). While one defines the communication rules, the other carries the specific content.
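As a rough illustration of that split, the sketch below builds the kind of message an MCP client might send to invoke ask. MCP messages are JSON-RPC 2.0, and tools are invoked with the tools/call method; the argument name query is an assumption here, and build_ask_call is a hypothetical helper, not part of NLWeb.

```python
import json


def build_ask_call(query, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' request targeting the 'ask' tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "ask",
            # Assumption: the tool takes the question in a 'query' argument.
            "arguments": {"query": query},
        },
    }


payload = build_ask_call("vegetarian recipes under 30 minutes")
print(json.dumps(payload, indent=2))
```

The envelope (jsonrpc, id, method, params) belongs to MCP, while the content of the answer that comes back is where NLWeb's Schema.org-based format applies.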

Repository Structure

  • Core Service: Manages the primary query logic and is designed for easy extension and customization.
  • Connectors: Provides plug-and-play modules for various LLMs and vector databases.
  • Data Tools: Includes utilities to import Schema.org JSONL files, RSS feeds, and other data types into a vector database.
  • Frontend Service: Features a lightweight web server and a basic query UI. For production environments, this can be replaced with a custom UI that embeds the API directly.
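For sites that do not yet have a feed to import, a Schema.org JSONL file can be produced with a few lines of Python. This is only a sketch: the items below are made up, to_jsonl and write_jsonl are hypothetical helpers, and the exact per-line layout the NLWeb loader expects should be confirmed against the project's documentation before importing.

```python
import json

# Made-up catalog entries expressed with Schema.org types.
items = [
    {
        "@context": "https://schema.org",
        "@type": "Recipe",
        "name": "Spicy Lentil Soup",
        "url": "https://example.com/recipes/lentil-soup",
        "recipeYield": "4 servings",
    },
    {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": "Cast-Iron Skillet",
        "url": "https://example.com/products/skillet",
    },
]


def to_jsonl(objects):
    """Serialize each object as one JSON document per line."""
    return "".join(json.dumps(obj, ensure_ascii=False) + "\n" for obj in objects)


def write_jsonl(objects, path):
    """Write the JSONL text to a UTF-8 file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_jsonl(objects))


print(to_jsonl(items), end="")
```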

Documentation & Support

  • Quickstart: Guides for local deployment and cloud-based setups (such as Azure).
  • Technical Documentation: Covers REST API specifications, the query lifecycle, prompt tuning, and UI customization.
  • License: Released under the MIT License, allowing for free modification and commercial use.

Local Deployment Guide (Python 3.10+)

  1. Clone the Repository

    git clone https://github.com/microsoft/NLWeb
    cd NLWeb
    
  2. Create and Activate a Virtual Environment

    python -m venv myenv
    # Linux/macOS:
    source myenv/bin/activate
    # Windows:
    myenv\Scripts\activate
    
  3. Install Dependencies

    cd code
    pip install -r requirements.txt
    
  4. Configure Environment Variables

  Copy the environment template:

    cp .env.template .env

  Edit the .env file to include your LLM API key (e.g., Azure OpenAI). Next, adjust the configuration files as needed:

    • config_llm.yaml: Select your LLM provider and model (defaults to Azure OpenAI with the GPT-4.1 series).
    • config_embedding.yaml: Define the embedding model (defaults to Azure OpenAI text-embedding-3-small).
    • config_retrieval.yaml: Set this to qdrant_local for a local vector database.
  5. Load Test Data

  Use the tools.db_load utility to import RSS feeds. For example:

    python -m tools.db_load https://feeds.libsyn.com/121695/rss Behind-the-Tech
    python -m tools.db_load https://feeds.megaphone.fm/recodedecode Decoder

  6. Launch the Service

    python app-file.py

  Navigate to http://localhost:8000 to test the conversational endpoint. You can also explore various UI examples at http://localhost:8000/static/.
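Once the service is running, the endpoint can also be exercised from a script instead of the browser. The sketch below uses only the Python standard library. The /ask path and the query and streaming parameter names are assumptions based on the project's REST API, so verify them against the technical documentation; build_ask_url and ask are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"


def build_ask_url(query, base_url=BASE_URL):
    """Build the request URL for a natural language query.

    Assumption: the service exposes GET /ask with 'query' and 'streaming'
    parameters; streaming is disabled so the reply is a single JSON document.
    """
    params = urllib.parse.urlencode({"query": query, "streaming": "false"})
    return f"{base_url}/ask?{params}"


def ask(query, base_url=BASE_URL, timeout=30):
    """Send the query to a running local service and return the parsed JSON."""
    url = build_ask_url(query, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from step 6 running and the podcast feeds loaded, calling ask("episodes about open source") should return the parsed response, which can then be printed with json.dumps(result, indent=2) for inspection.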