Vosk is an open-source, offline speech recognition toolkit designed for versatile deployment across Android, iOS, Raspberry Pi, and server environments. It offers broad compatibility, supporting development in Python, Java, C#, Node.js, and several other languages.
The toolkit supports more than 20 languages and dialects, including English, Hindi, German, French, Spanish, Portuguese, Chinese, and Russian. Despite their capabilities, the models are remarkably lightweight, typically around 50 MB each. Vosk excels at continuous, large-vocabulary transcription and audio streaming with near-zero latency. It also allows for on-the-fly vocabulary reconfiguration and supports speaker identification. Whether you are targeting resource-constrained devices like a Raspberry Pi or an Android phone, or scaling up to large server clusters, Vosk is built to handle the load.
To integrate Vosk into your Android project, add the following dependency from Maven Central to your Gradle build configuration:
repositories {
    mavenCentral()
}
dependencies {
    implementation group: 'com.alphacephei', name: 'vosk-android', version: '0.3.32+'
}
If you prefer to build the library from source, use the following commands:
export ANDROID_SDK_HOME=...
cd vosk-api/android/lib
./build-vosk.sh
gradle build
To obtain the iOS version, please send an email to [email protected] including your organization name and relevant project details.
For Python, the most straightforward way to install Vosk is via pip, which installs a prebuilt package so nothing needs to be compiled.
Ensure your Python environment is between version 3.5 and 3.9, and that pip is version 20.3 or higher. Then, execute:
pip3 install vosk
For specific architectures such as riscv64, you can install directly from a wheel:
pip3 install https://github.com/alphacep/vosk-api/releases/download/v0.3.42/vosk-0.3.42-py3-none-linux_riscv64.whl
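The version requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names (`python_supported`, `parse_pip_version`, `pip_supported`) are illustrative and not part of Vosk, and the bounds are simply the ones quoted in this article.

```python
import re
import subprocess
import sys

# Bounds quoted in the text above: Python 3.5 to 3.9, pip 20.3 or newer.
MIN_PYTHON = (3, 5)
MAX_PYTHON = (3, 9)
MIN_PIP = (20, 3)


def python_supported(version=None):
    """Return True if (major, minor) falls inside the quoted range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


def parse_pip_version(output):
    """Extract (major, minor) from the text printed by `pip --version`."""
    match = re.search(r"pip (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unrecognised pip version string: %r" % output)
    return int(match.group(1)), int(match.group(2))


def pip_supported(output):
    """Return True if the reported pip version is at least 20.3."""
    return parse_pip_version(output) >= MIN_PIP


if __name__ == "__main__":
    print("Python OK:", python_supported())
    try:
        # Ask the pip that belongs to this interpreter for its version.
        pip_output = subprocess.check_output(
            [sys.executable, "-m", "pip", "--version"],
            universal_newlines=True,
        )
        print("pip OK:", pip_supported(pip_output))
    except (OSError, subprocess.CalledProcessError, ValueError) as error:
        print("could not determine pip version:", error)
```

If either check reports a problem, upgrading pip or switching to an interpreter inside the quoted range is the likely fix before retrying the install.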
Vosk offers both WebSocket and gRPC server options. The simplest way to deploy the server is with Docker; the following command starts the English model and exposes it on port 2700:
docker run -d -p 2700:2700 alphacep/kaldi-en:latest
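Once the container is running, a client can stream audio to it over WebSocket. The sketch below assumes the message sequence used by the example client in the vosk-server repository (a JSON config message carrying the sample rate, raw PCM chunks as binary frames, then a JSON `eof` message); treat that sequence as an assumption and check it against the server version you run. The function names are illustrative, the third-party `websockets` package is required for the network part, and the input is assumed to be a mono 16-bit PCM WAV file.

```python
import json
import wave


def build_messages(wav_path, chunk_seconds=0.2):
    """Yield what the client sends: config, audio chunks, then EOF."""
    with wave.open(wav_path, "rb") as wf:
        rate = wf.getframerate()
        # Announce the sample rate before any audio is sent.
        yield json.dumps({"config": {"sample_rate": rate}})
        frames_per_chunk = int(rate * chunk_seconds)
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data  # raw PCM bytes, sent as a binary frame
    # Signal end of stream so the server can return its final result.
    yield json.dumps({"eof": 1})


async def transcribe(uri, wav_path):
    """Stream a WAV file to the server and collect the JSON replies."""
    import websockets  # third-party: pip3 install websockets

    replies = []
    async with websockets.connect(uri) as ws:
        for message in build_messages(wav_path):
            await ws.send(message)
            is_config = isinstance(message, str) and "config" in message
            if not is_config:
                # Assumed: one reply per audio chunk and one for EOF.
                replies.append(json.loads(await ws.recv()))
    return replies


# Example, with the Docker container above listening on port 2700:
#   import asyncio
#   print(asyncio.run(transcribe("ws://localhost:2700", "test.wav")))
```

Keeping message construction separate from the network code makes the chunking logic easy to inspect without a running server.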
Building from source is a more involved process that requires a specific version of Kaldi. Follow these steps:
cd <KALDI_ROOT>
git clone -b v1 --single-branch --depth=1 https://github.com/alphacep/kaldi
cd kaldi/tools
make openfst cub
./extras/install_openblas_clapack.sh
cd ../src
./configure --mathlib=OPENBLAS_CLAPACK --shared
make -j 10 online2 lm rnnlm
cd ../..
git clone https://github.com/alphacep/vosk-api --depth=1
cd vosk-api/src
KALDI_ROOT=<KALDI_ROOT> make
You can transcribe files directly using the provided command-line utility:
vosk-transcriber -i test.mp4 -o test.txt
vosk-transcriber -i test.mp4 -t srt -o test.srt
vosk-transcriber -l fr -i test.m4a -t srt -o test.srt
vosk-transcriber --list-languages
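For more than a handful of files, the same commands can be generated from a script. This is a small sketch that relies only on the flags shown above (`-i`, `-o`, `-t`, `-l`); the helper names and the choice to derive the output name from the input name are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def transcriber_command(input_path, output_format="txt", language=None):
    """Build the vosk-transcriber argument list for one input file."""
    source = Path(input_path)
    command = ["vosk-transcriber"]
    if language is not None:
        command += ["-l", language]
    command += ["-i", str(source)]
    if output_format != "txt":
        # Plain text needs no -t flag in the examples above.
        command += ["-t", output_format]
    # Write the result next to the input, swapping the file extension.
    command += ["-o", str(source.with_suffix("." + output_format))]
    return command


def transcribe_all(paths, output_format="srt", language=None):
    """Run vosk-transcriber once per file, stopping at the first failure."""
    for path in paths:
        subprocess.run(
            transcriber_command(path, output_format, language),
            check=True,
        )
```

Calling `transcribe_all(["a.mp4", "b.mp4"], "srt", "fr")` would then produce `a.srt` and `b.srt`, provided `vosk-transcriber` is on the PATH.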
Alternatively, you can run the Python demonstration script:
git clone https://github.com/alphacep/vosk-api
cd vosk-api/python/example
python3 ./test_simple.py test.wav
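The demo script boils down to a short loop over the audio file. The following is a condensed sketch of that pattern rather than the script itself: it uses the `Model` and `KaldiRecognizer` classes from the `vosk` package, while the helper names, the default `model` directory, and the read size are assumptions. It expects an unpacked model directory and a mono 16-bit PCM WAV file.

```python
import json
import wave


def is_supported_wav(wf):
    """True for mono, 16-bit, uncompressed PCM audio."""
    return (
        wf.getnchannels() == 1
        and wf.getsampwidth() == 2
        and wf.getcomptype() == "NONE"
    )


def transcribe_wav(wav_path, model_path="model", frames_per_read=4000):
    """Return the recognized text for one WAV file."""
    from vosk import Model, KaldiRecognizer  # from: pip3 install vosk

    with wave.open(wav_path, "rb") as wf:
        if not is_supported_wav(wf):
            raise ValueError("audio must be a mono 16-bit PCM WAV file")
        recognizer = KaldiRecognizer(Model(model_path), wf.getframerate())
        pieces = []
        while True:
            data = wf.readframes(frames_per_read)
            if not data:
                break
            # AcceptWaveform is truthy once an utterance is complete.
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result())["text"])
        # Flush whatever audio is still buffered in the recognizer.
        pieces.append(json.loads(recognizer.FinalResult())["text"])
    return " ".join(piece for piece in pieces if piece)


# Example, with a model unpacked into ./model:
#   print(transcribe_wav("test.wav"))
```

Feeding audio in small reads is what allows the same loop to work on a live stream: results arrive as each utterance completes instead of after the whole file has been read.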
For Java projects built with Gradle, include the following dependencies from Maven Central:
repositories {
    mavenCentral()
}
dependencies {
    implementation group: 'net.java.dev.jna', name: 'jna', version: '5.7.0'
    implementation group: 'com.alphacephei', name: 'vosk', version: '0.3.31+'
}
For C# projects, installation is handled via NuGet:
dotnet add package Vosk
To run the C# demo:
git clone https://github.com/alphacep/vosk-api
cd vosk-api/csharp/demo
dotnet run
For Node.js, install the Vosk module using npm:
npm install vosk
Vosk provides a robust framework for implementing offline speech recognition across mobile, desktop, and server-side applications. Its streamlined installation and configuration process allows you to integrate voice transcription into your projects quickly.