Vosk is an open-source, offline speech recognition toolkit designed for versatile deployment across Android, iOS, Raspberry Pi, and server environments. It offers broad compatibility, supporting development in Python, Java, C#, Node.js, and several other languages.
The toolkit supports more than 20 languages and dialects, including English, Hindi, German, French, Spanish, Portuguese, Chinese, and Russian. The portable models are small, typically around 50 MB each, yet they provide continuous, large-vocabulary transcription and a streaming API with near-zero latency. Vosk also supports on-the-fly vocabulary reconfiguration and speaker identification. It scales from resource-constrained devices such as a Raspberry Pi or an Android phone up to large server clusters.
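As a rough illustration of the vocabulary reconfiguration mentioned above, the Python bindings accept an optional JSON list of phrases as the third argument to `KaldiRecognizer`. The sketch below assumes the `vosk` package is installed and a model has been unpacked to a local directory. The helper names `build_grammar` and `make_command_recognizer` are hypothetical, and lowercasing the phrases is an assumption that suits the English models.

```python
import json


def build_grammar(phrases, allow_unknown=True):
    """Serialize phrases into the JSON list used to restrict the vocabulary."""
    # Assumption: the model's vocabulary is lowercase, as in the English models.
    items = [p.strip().lower() for p in phrases if p.strip()]
    if allow_unknown:
        # "[unk]" lets the recognizer label out-of-grammar speech as unknown.
        items.append("[unk]")
    return json.dumps(items)


def make_command_recognizer(model_dir, sample_rate, phrases):
    """Create a recognizer limited to the given phrases (requires vosk)."""
    # Imported lazily so build_grammar works without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)
    return KaldiRecognizer(model, sample_rate, build_grammar(phrases))
```

A call such as `make_command_recognizer("model", 16000, ["turn on", "turn off"])` would then be fed audio in the usual way. Grammar restriction is generally expected to work with the small models rather than the large server models.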
To integrate Vosk into your Android project, add the following dependency from Maven Central to your Gradle build configuration:
repositories {
    mavenCentral()
}
dependencies {
    implementation group: 'com.alphacephei', name: 'vosk-android', version: '0.3.32+'
}
If you prefer to build the library from source, use the following commands:
export ANDROID_SDK_HOME=...
cd vosk-api/android/lib
./build-vosk.sh
gradle build
To obtain the iOS version, please send an email to [email protected] including your organization name and relevant project details.
The most straightforward way to install Vosk for Python is via pip. Ensure your Python version is between 3.5 and 3.9 and that pip is version 20.3 or higher, then execute:
pip3 install vosk
For specific architectures such as riscv64, you can install directly from a wheel:
pip3 install https://github.com/alphacep/vosk-api/releases/download/v0.3.42/vosk-0.3.42-py3-none-linux_riscv64.whl
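The version requirements above can be checked before installing. The following is a minimal sketch using only the standard library. The function names `parse_version` and `environment_ok` are hypothetical, and the ranges are simply the ones stated in this article.

```python
def parse_version(text):
    """Turn '20.3.1' into (20, 3, 1), stopping at any non-numeric suffix."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def environment_ok(python_version, pip_version):
    """Check the ranges stated above: Python 3.5 to 3.9 and pip >= 20.3."""
    py = parse_version(python_version)[:2]
    pip = parse_version(pip_version)[:2]
    return (3, 5) <= py <= (3, 9) and pip >= (20, 3)
```

In practice the two arguments could come from `platform.python_version()` and `pip.__version__`.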
Vosk offers both WebSocket and gRPC server options. The simplest way to run the WebSocket server is through Docker, here exposed on port 2700:
docker run -d -p 2700:2700 alphacep/kaldi-en:latest
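Once the container is running, a client can stream audio to it over WebSocket. The sketch below follows the pattern of the sample client in the vosk-server repository: send a config message with the sample rate, send binary audio chunks, then send an end-of-stream marker. It assumes the third-party `websockets` package, a server at `ws://localhost:2700`, and a mono 16-bit PCM WAV file. The function names are hypothetical.

```python
import json
import wave


def config_message(sample_rate):
    """JSON text frame announcing the audio sample rate to the server."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


def frames_per_chunk(sample_rate, seconds=0.2):
    """Number of audio frames to send in each binary message."""
    return int(sample_rate * seconds)


async def stream_wav(path, uri="ws://localhost:2700"):
    """Send a WAV file to the server and collect the decoded JSON replies."""
    # Third-party dependency, imported lazily: pip3 install websockets
    import websockets

    replies = []
    with wave.open(path, "rb") as wf:
        async with websockets.connect(uri) as ws:
            await ws.send(config_message(wf.getframerate()))
            chunk = frames_per_chunk(wf.getframerate())
            while True:
                data = wf.readframes(chunk)
                if not data:
                    break
                await ws.send(data)
                # The server answers each chunk with a partial or full result.
                replies.append(json.loads(await ws.recv()))
            # Signal end of stream; the server answers with the final result.
            await ws.send('{"eof" : 1}')
            replies.append(json.loads(await ws.recv()))
    return replies
```

On Python 3.7 or later this could be driven with `asyncio.run(stream_wav("test.wav"))`.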
Building from source is a more involved process that requires a specific version of Kaldi. Follow these steps:
cd <KALDI_ROOT>
git clone -b v1 --single-branch --depth=1 https://github.com/alphacep/kaldi
cd kaldi/tools
make openfst cub
./extras/install_openblas_clapack.sh
cd ../src
./configure --mathlib=OPENBLAS_CLAPACK --shared
make -j 10 online2 lm rnnlm
cd ../..
git clone https://github.com/alphacep/vosk-api --depth=1
cd vosk-api/src
KALDI_ROOT=<KALDI_ROOT> make
You can transcribe files directly using the provided command-line utility:
vosk-transcriber -i test.mp4 -o test.txt
vosk-transcriber -i test.mp4 -t srt -o test.srt
vosk-transcriber -l fr -i test.m4a -t srt -o test.srt
vosk-transcriber --list-languages
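To run the utility over several files, the commands above can be assembled from a script. This is a small sketch that uses only the flags shown above (`-i`, `-o`, `-t`, `-l`). The helper names `transcriber_command` and `transcribe_all` are hypothetical, and it assumes `vosk-transcriber` is on the `PATH`.

```python
import os
import subprocess


def transcriber_command(input_path, output_path, output_type=None, lang=None):
    """Build a vosk-transcriber argument list from the flags shown above."""
    cmd = ["vosk-transcriber"]
    if lang:
        cmd += ["-l", lang]
    cmd += ["-i", input_path]
    if output_type:
        cmd += ["-t", output_type]
    cmd += ["-o", output_path]
    return cmd


def transcribe_all(paths, output_type="srt", lang=None):
    """Run vosk-transcriber once per file, writing <name>.<output_type>."""
    for path in paths:
        out = "{}.{}".format(os.path.splitext(path)[0], output_type)
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(transcriber_command(path, out, output_type, lang), check=True)
```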
Alternatively, you can run the Python demonstration script:
git clone https://github.com/alphacep/vosk-api
cd vosk-api/python/example
python3 ./test_simple.py test.wav
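The demonstration script boils down to a short loop over audio chunks. The following is a simplified sketch of that pattern, not the script itself. It assumes the `vosk` package is installed, a model is unpacked in a directory named `model`, and the input is a mono 16-bit PCM WAV file. The function names `check_wav` and `transcribe` are hypothetical.

```python
import json
import wave


def check_wav(path):
    """Return the sample rate if the file is mono 16-bit PCM, else raise."""
    with wave.open(path, "rb") as wf:
        if (wf.getnchannels() != 1 or wf.getsampwidth() != 2
                or wf.getcomptype() != "NONE"):
            raise ValueError("audio must be mono 16-bit PCM WAV")
        return wf.getframerate()


def transcribe(path, model_dir="model", chunk_frames=4000):
    """Feed a WAV file to a recognizer in chunks and return the full text."""
    # Imported lazily so check_wav works without vosk installed.
    from vosk import Model, KaldiRecognizer

    rate = check_wav(path)
    rec = KaldiRecognizer(Model(model_dir), rate)
    pieces = []
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            # AcceptWaveform returns True once an utterance has been finalized.
            if rec.AcceptWaveform(data):
                pieces.append(json.loads(rec.Result())["text"])
    # FinalResult flushes whatever audio is still buffered.
    pieces.append(json.loads(rec.FinalResult())["text"])
    return " ".join(p for p in pieces if p)
```

Between finalized utterances, `rec.PartialResult()` can be polled to show interim text while streaming.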
For Java projects built with Gradle, include the following dependencies from Maven Central:
repositories {
    mavenCentral()
}
dependencies {
    implementation group: 'net.java.dev.jna', name: 'jna', version: '5.7.0'
    implementation group: 'com.alphacephei', name: 'vosk', version: '0.3.31+'
}
For C#, installation is handled via NuGet:
dotnet add package Vosk
To run the demo:
git clone https://github.com/alphacep/vosk-api
cd vosk-api/csharp/demo
dotnet run
For Node.js, install the Vosk module using npm:
npm install vosk
Vosk provides a framework for implementing offline speech recognition across mobile, desktop, and server-side applications. Because installation on each platform takes only a few commands, you can add voice transcription to a project quickly.