Fay is an open-source digital human framework that integrates large language models with interactive digital characters. It is available in three specialized versions—Retail, Assistant, and Agent—to suit different project requirements. Typical applications include virtual shopping guides, broadcast hosts, service staff, tutors, voice assistants, and mobile text-based helpers.
The framework allows developers to assemble digital humans or voice assistants without the complications of tightly coupled systems. Its architecture cleanly separates core concerns: audio input, speech recognition (ASR), sentiment analysis, NLP processing, emotional speech synthesis (TTS), audio output, and facial expression control. Each module operates independently.
Model Adapter Layer: Connects upward to digital human models, including photorealistic 3D drivers and Live2D anime-style characters, and downward to mainstream large language models, including DeepSeek and other reasoning-focused LLMs.
Function Component Layer: Manages ASR, TTS, NLP, and expression control. These components are modular, allowing developers to swap individual elements to meet specific needs.
Terminal Adapter Layer: Provides standardized interfaces for connecting the framework to microcontrollers, mobile applications, websites, and large-scale displays without requiring a major codebase overhaul.
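The idea of swappable components can be illustrated with a minimal sketch. The class and method names below (`ASR.transcribe`, `NLP.reply`, `TTS.synthesize`, `Pipeline`) are hypothetical and are not taken from Fay's codebase; they only show how independent stages might sit behind small interfaces so that one can be replaced without touching the others.

```python
from dataclasses import dataclass
from typing import Protocol


# Hypothetical interfaces; Fay's real module APIs may look different.
class ASR(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class NLP(Protocol):
    def reply(self, text: str) -> str: ...


class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...


@dataclass
class Pipeline:
    """Chains the three stages; any stage can be replaced independently."""
    asr: ASR
    nlp: NLP
    tts: TTS

    def handle(self, audio: bytes) -> bytes:
        text = self.asr.transcribe(audio)
        answer = self.nlp.reply(text)
        return self.tts.synthesize(answer)


# Toy stand-ins so the sketch runs without any speech or LLM backend.
class EchoASR:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class UpperNLP:
    def reply(self, text: str) -> str:
        return text.upper()


class BytesTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


pipeline = Pipeline(asr=EchoASR(), nlp=UpperNLP(), tts=BytesTTS())
```

Swapping the NLP stage for a different class with the same `reply` method leaves the ASR and TTS stages untouched, which is the property the layered design aims for.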
• Supports full offline operation for environments without network access.
• Enables continuous streaming interactions for low-latency voice conversations.
• Manages multi-user concurrency to handle high-load scenarios.
• Supports custom knowledge bases via a simple qa.csv file.
• Provides configurable wake words and interaction rules.
• Features an agent decision system built on the Model Context Protocol (MCP).
• Offers APIs for both text and voice interaction.
• Includes a dedicated control interface for digital human drivers.
• Provides an interface for automated broadcast tasks.
• Supports real photo driving via the "xuniren" method.
• Compatible with Live2D character models.
• Integrates with Unreal Engine 5 (UE5) and Unity 3D models.
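As an illustration of the qa.csv knowledge base mentioned above, here is a minimal lookup sketch. It assumes each row holds a question in the first column and an answer in the second, with no header row; the actual column layout and matching rules Fay uses are not specified here, so treat the format and the function names as assumptions.

```python
import csv
import io
from typing import Dict, Optional


def load_qa(csv_text: str) -> Dict[str, str]:
    """Parse question,answer rows into a lookup keyed by lowercased question."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        question, answer = row[0].strip(), row[1].strip()
        if question:
            table[question.lower()] = answer
    return table


def answer_for(table: Dict[str, str], utterance: str) -> Optional[str]:
    """Return the stored answer on an exact, case-insensitive match, else None."""
    return table.get(utterance.strip().lower())


# Sample rows standing in for the contents of a qa.csv file (assumed layout).
SAMPLE = (
    "What are your opening hours?,We are open 9am to 6pm.\n"
    "Where is the store?,Second floor.\n"
)
qa = load_qa(SAMPLE)
```

When no entry matches, `answer_for` returns `None`, at which point a real system would presumably fall back to the language model.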
License: MIT. Open for commercial use.
Environment: Lightweight footprint. Built on Python 3.12. Dependencies are managed via a standard requirements.txt.
Startup Options:
• Source Code: Execute main.py as the primary controller.
• Docker: GPU-accelerated images are available for containerized deployment.
Remote Communication: Integrated Ngrok support facilitates cross-device messaging, allowing smartphones, PCs, and wearables to connect to the core framework.
| Directory/File | Description |
|---|---|
| Core Modules | |
| ai_module | Contains AI algorithms and LLM integration logic |
| core | The framework engine; manages interaction flow and state |
| genagents | Manages agents using React-based decision logic |
| simulation_engine | Controls digital human motion and facial expressions |
| Functional Components | |
| asr | Handles speech-to-text recognition |
| tts | Manages speech synthesis; supports voices like Azure's Xiaoxiao |
| gui | Graphical configuration panel for persona settings (name, role, wake word) |
| Tools and Configs | |
| utils | General utilities, including config_util.py for parsing settings |
| config.json | Stores core parameters like model paths and API endpoints |
| system.conf.bak | Backup file for system configurations |
| Deployment | |
| requirements.txt | Lists Python dependencies, such as PyAudio |
| fay_booter.py | Boot script for silent background startup |
| main.py | The main controller entry point |
Download and install Python 3.12 from [python.org](https://www.python.org/downloads/release/python-3120), selecting the version appropriate for your system.
On Windows, install the Visual Studio Build Tools from [Microsoft](https://learn.microsoft.com/zh-cn/visualstudio/releases/2022/release-notes). During setup, select the "Desktop development with C++" workload to install the MSVC compiler.
Clone the repository.
git clone https://github.com/xszyou/Fay.git
cd Fay
Install the required dependencies.
pip install -r requirements.txt
On Ubuntu, install the build tools first: sudo apt install build-essential portaudio19-dev.
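After installing, a quick sanity check can confirm that the interpreter version and the audio dependency are in place. The sketch below uses only the standard library; the function name `check_environment` is hypothetical, and checking for the `pyaudio` module is an assumption based on PyAudio being listed as a dependency.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 12), modules=("pyaudio",)):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_version))
        problems.append(f"Python {wanted}+ expected, found {found}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"module not importable: {name}")
    return problems
```

Calling `check_environment()` with its defaults and printing the returned list gives a short report before the first launch.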
Configure the system.
• Copy the system.conf.bak file and rename it to system.conf.
• Update the following key settings in the configuration file:
"llm_model_path": "path/to/your/local/LLM", // Required for offline model usage
"asr_provider": "local ASR service address", // Can be swapped for Alibaba, Tencent, etc.
"tts_voice": "Xiaoxiao" // Select any supported Azure voice
Launch the application.
python main.py
To run in background mode on Linux or macOS, use: nohup python main.py &
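Before launching, it can help to confirm that the configuration step was completed. The following sketch copies system.conf.bak to system.conf only when the latter is missing, and reports which of the three settings named above are empty. The helper names are hypothetical, and `missing_keys` works on an already-parsed mapping because the on-disk format of system.conf is not assumed here.

```python
import shutil
from pathlib import Path

# Keys taken from the configuration step in this article.
REQUIRED_KEYS = ("llm_model_path", "asr_provider", "tts_voice")


def ensure_config(directory, template="system.conf.bak", target="system.conf"):
    """Copy the template to the live config once; never overwrite existing edits."""
    directory = Path(directory)
    dst = directory / target
    if dst.exists():
        return False  # keep the user's existing settings
    shutil.copyfile(directory / template, dst)
    return True


def missing_keys(settings):
    """List required keys that are absent or empty in a settings mapping."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]
```

Returning `False` when system.conf already exists means the helper is safe to call on every startup without clobbering manual edits.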
1. Photorealistic 3D Avatars
Use the [fay-ue5](https://github.com/xszyou/fay-ue5) integration for Unreal Engine 5 to generate and drive lifelike models created from a single photograph.
2. Anime-Style Characters
Load Live2D models and manage their expressions using facial capture technology. See the project's Live2D integration documentation for details.
Sign up for an Ngrok account and obtain your authentication token.
Initialize the tunnel.
ngrok http 6000
This command maps local port 6000 to a public URL.
Configure your mobile app or smart device to connect to the framework using the generated Ngrok address.
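A tunnel is only useful if something is actually listening on the local port. As a small pre-flight sketch (the function name is hypothetical, and port 6000 is simply the value used in the command above), the standard `socket` module can test whether the port accepts TCP connections before you share the Ngrok address:

```python
import socket


def is_port_open(host="127.0.0.1", port=6000, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If `is_port_open()` returns `False`, start the framework first and then re-run the Ngrok command.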
The GUI allows for the adjustment of various identity settings:
Identity Basics: Set the name (e.g., "Faye"), assign a role (e.g., Assistant), and select a gender.
Interaction Logic:
• Wake Word: Set a trigger phrase like "Hello" (supports prefix wake mode).
• Sensitivity: Adjust the responsiveness of the assistant to user input.
Media Settings:
• Voice: Select a preferred voice, such as Azure's "Xiaoxiao."
• Auto-broadcast URL: Set to http://127.0.0.1:6000 for local testing purposes.
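The persona settings and prefix wake mode can be pictured with a short sketch. The `Persona` fields mirror the examples in this section, but the class itself and the `strip_wake_word` helper are hypothetical, and reading "prefix wake mode" as "the wake word must open the utterance" is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Persona:
    # Default values echo the examples given in this section.
    name: str = "Faye"
    role: str = "Assistant"
    wake_word: str = "Hello"
    voice: str = "Xiaoxiao"


def strip_wake_word(utterance: str, wake_word: str) -> Optional[str]:
    """Prefix mode: return the text after the wake word, or None if absent."""
    text = utterance.strip()
    if not text.lower().startswith(wake_word.lower()):
        return None
    # Drop the wake word plus any separating punctuation or spaces.
    # Note: this simple check does not enforce a word boundary.
    return text[len(wake_word):].lstrip(" ,.!?:;")


persona = Persona()
```

An utterance that does not begin with the wake word yields `None`, so the caller can ignore it instead of forwarding it to the language model.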
By configuring these components, developers can deploy a voice-interactive digital human system capable of listening, speaking, and operating across multiple devices. This framework is well-suited for intelligent customer service, virtual training, brand marketing, or any application where a human face and voice enhance the user experience.