How to Install and Use Vosk Offline Speech Recognition

June 13. Published in Voice & Speech Tools

Vosk is an open-source, offline speech recognition toolkit designed for versatile deployment across Android, iOS, Raspberry Pi, and server environments. It offers broad compatibility, supporting development in Python, Java, C#, Node.js, and several other languages.

The toolkit supports more than 20 languages and dialects, including English, Hindi, German, French, Spanish, Portuguese, Chinese, and Russian. The models are compact, typically around 50 MB each, yet they handle continuous, large-vocabulary transcription. A streaming API returns results with very low latency while audio is still arriving, and the recognizer's vocabulary can be reconfigured at runtime. Speaker identification is also supported. The same toolkit scales from resource-constrained devices such as a Raspberry Pi or an Android phone up to large server clusters.

Installing Vosk

On Android

To integrate Vosk into your Android project, add the following dependency, resolved from Maven Central, to your Gradle build file:

repositories {
    mavenCentral()
}

dependencies {
    implementation group: 'com.alphacephei', name: 'vosk-android', version: '0.3.32+'
}

If you prefer to build the library from source, use the following commands:

export ANDROID_SDK_HOME=...
cd vosk-api/android/lib
./build-vosk.sh
gradle build

On iOS

To obtain the iOS version, please send an email to [email protected] including your organization name and relevant project details.

Using Python

The most straightforward way to install Vosk is via pip. It is compatible with:

  • Linux x86_64
  • Raspbian for Raspberry Pi 3/4
  • Linux arm64
  • macOS (both x86 and M1 chips)
  • Windows (32-bit and 64-bit x86)

Vosk requires Python 3.5 through 3.9 and pip 20.3 or newer. Once both are in place, run:

pip3 install vosk
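
Before installing, it can help to confirm that the interpreter and pip meet the requirements listed above. The following is a minimal sketch: the version bounds are copied from this article and may not match newer Vosk releases, and the helper names (python_supported, pip_supported, vosk_installed) are hypothetical, not part of Vosk.

```python
import importlib.util
import re
import sys

# Bounds as stated in this article; treat them as assumptions for newer releases.
MIN_PYTHON = (3, 5)
MAX_PYTHON = (3, 9)
MIN_PIP = (20, 3)


def python_supported(version=None):
    """Return True if (major, minor) lies inside the article's stated range."""
    major, minor = (version or sys.version_info)[:2]
    return MIN_PYTHON <= (major, minor) <= MAX_PYTHON


def pip_supported(version_string):
    """Return True if a pip version string such as '23.0.1' is at least 20.3."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        return False
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    return (major, minor) >= MIN_PIP


def vosk_installed():
    """Return True if a package named vosk is visible to this interpreter."""
    return importlib.util.find_spec("vosk") is not None
```

Calling python_supported() with no argument checks the running interpreter. The string passed to pip_supported can be taken from the output of pip3 --version.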

For specific architectures such as riscv64, you can install directly from a wheel:

pip3 install https://github.com/alphacep/vosk-api/releases/download/v0.3.42/vosk-0.3.42-py3-none-linux_riscv64.whl

Server Deployment

Vosk offers both WebSocket and gRPC server options. The simplest way to run the server is through Docker:

docker run -d -p 2700:2700 alphacep/kaldi-en:latest
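
Once the container is running, a client can stream audio to it over WebSocket. The sketch below assumes the server listens on ws://localhost:2700 (matching the port mapping above), that the input is a mono 16-bit PCM WAV file, and that the third-party websockets package is installed. The message shapes follow the sample client in the vosk-server repository; the function names are hypothetical.

```python
import asyncio
import json
import wave


def config_message(sample_rate):
    """First text frame: tells the server the sample rate of the audio."""
    return json.dumps({"config": {"sample_rate": sample_rate}})


def eof_message():
    """Last text frame: asks the server to flush and send the final result."""
    return json.dumps({"eof": 1})


async def transcribe(wav_path, uri="ws://localhost:2700"):
    # Third-party dependency (pip3 install websockets), imported lazily so the
    # helpers above can be used without it.
    import websockets

    results = []
    with wave.open(wav_path, "rb") as wf:
        async with websockets.connect(uri) as ws:
            await ws.send(config_message(wf.getframerate()))
            chunk_frames = int(wf.getframerate() * 0.2)  # about 0.2 s per chunk
            while True:
                data = wf.readframes(chunk_frames)
                if len(data) == 0:
                    break
                await ws.send(data)  # raw PCM bytes as a binary frame
                results.append(json.loads(await ws.recv()))
            await ws.send(eof_message())
            results.append(json.loads(await ws.recv()))
    return results


# Example call (requires a running server and a test.wav file):
# results = asyncio.run(transcribe("test.wav"))
```

Each reply is a JSON object. Intermediate replies usually carry a partial hypothesis, and the reply to the end-of-file message carries the final text.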

Building from Source

Building from source is a more involved process that requires a specific version of Kaldi. Follow these steps:

# <KALDI_ROOT> is the directory that will hold both checkouts
cd <KALDI_ROOT>
git clone -b v1 --single-branch --depth=1 https://github.com/alphacep/kaldi
cd kaldi/tools
make openfst cub
./extras/install_openblas_clapack.sh
cd ../src
./configure --mathlib=OPENBLAS_CLAPACK --shared

make -j 10 online2 lm rnnlm

cd ../..
git clone https://github.com/alphacep/vosk-api --depth=1

cd vosk-api/src
KALDI_ROOT=<KALDI_ROOT>/kaldi make

Using Vosk: Code Examples

Python Example

You can transcribe files directly using the provided command-line utility:

vosk-transcriber -i test.mp4 -o test.txt
vosk-transcriber -i test.mp4 -t srt -o test.srt
vosk-transcriber -l fr -i test.m4a -t srt -o test.srt
vosk-transcriber --list-languages
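
To transcribe many files from a script, the same utility can be driven through subprocess. This is a minimal sketch that uses only the flags shown above (-i, -o, -t, -l). The function names are hypothetical, and it assumes vosk-transcriber is on the PATH.

```python
import subprocess


def transcriber_command(input_path, output_path, output_type=None, language=None):
    """Build an argv list using only the flags shown in the article."""
    command = ["vosk-transcriber"]
    if language is not None:
        command += ["-l", language]
    command += ["-i", str(input_path)]
    if output_type is not None:
        command += ["-t", output_type]
    command += ["-o", str(output_path)]
    return command


def transcribe_file(input_path, output_path, output_type=None, language=None):
    """Run vosk-transcriber; raises CalledProcessError on a non-zero exit."""
    command = transcriber_command(input_path, output_path, output_type, language)
    subprocess.run(command, check=True)
```

For example, transcribe_file("test.m4a", "test.srt", output_type="srt", language="fr") runs the third command listed above.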

Alternatively, you can run the Python demonstration script:

git clone https://github.com/alphacep/vosk-api
cd vosk-api/python/example
python3 ./test_simple.py test.wav
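
The recognizer can also be called directly from your own code. The sketch below is a simplified illustration of that pattern, not a copy of the demo script. It assumes a mono 16-bit PCM WAV input and a recent vosk release in which Model(lang="en-us") fetches a model automatically; with older releases, pass the path of an unpacked model directory to Model instead. The helper names are hypothetical.

```python
import json
import wave


def is_supported_format(channels, sample_width, comp_type):
    """Mono, 16-bit, uncompressed PCM is the format this sketch assumes."""
    return channels == 1 and sample_width == 2 and comp_type == "NONE"


def join_text(result_jsons):
    """Concatenate the non-empty "text" fields of recognizer JSON results."""
    texts = (json.loads(r).get("text", "") for r in result_jsons)
    return " ".join(t for t in texts if t)


def transcribe_wav(wav_path, model_lang="en-us", chunk_frames=4000):
    # Imported lazily so the helpers above work without vosk installed.
    from vosk import KaldiRecognizer, Model

    with wave.open(wav_path, "rb") as wf:
        if not is_supported_format(
            wf.getnchannels(), wf.getsampwidth(), wf.getcomptype()
        ):
            raise ValueError("audio must be a mono 16-bit PCM WAV file")
        model = Model(lang=model_lang)  # assumption: may download on first use
        recognizer = KaldiRecognizer(model, wf.getframerate())
        results = []
        while True:
            data = wf.readframes(chunk_frames)
            if len(data) == 0:
                break
            # AcceptWaveform returns True once an utterance has ended
            if recognizer.AcceptWaveform(data):
                results.append(recognizer.Result())
        results.append(recognizer.FinalResult())
    return join_text(results)
```

Feeding the audio in small chunks is what makes streaming possible: between chunks, recognizer.PartialResult() can be read to show a hypothesis that is still in progress.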

Java Example

For Gradle projects, add the following dependencies, which are resolved from Maven Central:

repositories {
    mavenCentral()
}

dependencies {
    implementation group: 'net.java.dev.jna', name: 'jna', version: '5.7.0'
    implementation group: 'com.alphacephei', name: 'vosk', version: '0.3.31+'
}

C# Example

Installation is handled via NuGet:

dotnet add package Vosk

To run the demo:

git clone https://github.com/alphacep/vosk-api
cd vosk-api/csharp/demo
dotnet run

Node.js Example

Install the Vosk module using npm:

npm install vosk

Vosk provides a practical framework for offline speech recognition across mobile, desktop, and server-side applications. With packages available through pip, Maven Central, NuGet, and npm, adding voice transcription to a project takes only a few commands.