Turn eBooks & PDFs into Audio with Abogen – Fast TTS Tool

Published May 9 in Voice & Speech Tools

Abogen converts ePub, PDF, and plain text files into high-quality audio in seconds. The tool also generates synchronized subtitles automatically, making it an ideal choice for creating audiobooks or voiceovers for platforms like Instagram, YouTube, and TikTok. Powered by the Kokoro-82M model, Abogen produces speech that sounds both natural and fluid.

Abogen Features

  1. Broad File Support – Works with ePub, PDF, and TXT formats.
  2. Voice Mixer – Blend multiple voice models to create a custom profile. You can adjust the weight of each voice and save your settings for future use.
  3. Chapter Management – Select specific chapters when processing ePub files, or choose specific chapters and pages for PDFs. Abogen automatically inserts markers such as <<CHAPTER_MARKER:Chapter Title>>. These markers can also be added manually to text files, allowing you to split audio by chapter or reprocess a single section if an error occurs.
  4. Additional Utilities – Options include replacing single line breaks with spaces, setting a maximum word count per subtitle line, and creating desktop shortcuts. Users can also quickly access configuration and temporary folders, clear temporary files, and enable automatic updates on startup.
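
To illustrate how the chapter markers described above could be used to split a text file, here is a minimal Python sketch. It relies only on the marker format quoted in this article; the function name split_chapters and the "Untitled" fallback title are hypothetical, and this is not Abogen's own implementation.

```python
import re

# Marker format taken from the article: <<CHAPTER_MARKER:Chapter Title>>
MARKER = re.compile(r"<<CHAPTER_MARKER:(.*?)>>")


def split_chapters(text, default_title="Untitled"):
    """Return a list of (title, body) pairs, one per chapter marker."""
    # With one capture group, re.split yields:
    # [text before first marker, title1, body1, title2, body2, ...]
    parts = MARKER.split(text)
    chapters = []
    preamble = parts[0].strip()
    if preamble:
        # Text that appears before any marker gets a placeholder title.
        chapters.append((default_title, preamble))
    for title, body in zip(parts[1::2], parts[2::2]):
        chapters.append((title.strip(), body.strip()))
    return chapters


sample = (
    "<<CHAPTER_MARKER:Chapter 1>>\nIt was a quiet morning.\n"
    "<<CHAPTER_MARKER:Chapter 2>>\nThe rain began at noon.\n"
)
for title, body in split_chapters(sample):
    print(f"{title}: {len(body.split())} words")
```

Splitting on the marker keeps each chapter's text separate, which is what makes it possible to reprocess a single section without redoing the whole book.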

How to Install Abogen

  1. Windows – Start by downloading the latest .msi file from the espeak-ng releases page and running the installer. If you are using an NVIDIA GPU, run the command: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128. Next, install Abogen by running pip install abogen. Alternatively, download the repository, unzip the files, and run WINDOWS_INSTALL.bat. This script handles all dependencies, including CUDA, within a dedicated environment, though espeak-ng must still be installed manually.

  2. Mac – Open the terminal and run brew install espeak-ng, followed by pip install abogen. Note that this installation method has not yet undergone extensive testing.

  3. Linux – Install espeak-ng using your distribution's package manager: sudo apt install espeak-ng (Ubuntu/Debian), sudo pacman -S espeak-ng (Arch), or sudo dnf install espeak-ng (Fedora). Then, run pip install abogen. If you encounter a “No matching distribution found” error, ensure you are using a supported Python version (3.10 to 3.12). You can use pyenv to manage multiple Python versions.

  4. Docker – Download and unzip the repository or clone it using Git. Navigate to the abogen directory containing the Dockerfile. Open a terminal and build the image with the following command (the trailing dot is part of the command and tells Docker to use the current directory as the build context): docker build --progress plain -t abogen . Once the build is complete, launch the container using the command specific to your OS:

    • Windows: docker run --name abogen -v %cd%:/shared -p 5800:5800 -p 5900:5900 --gpus all abogen
    • Linux: docker run --name abogen -v $(pwd):/shared -p 5800:5800 -p 5900:5900 --gpus all abogen
    • macOS: docker run --name abogen -v $(pwd):/shared -p 5800:5800 -p 5900:5900 abogen

    You can access the Abogen interface at http://localhost:5800 via your web browser or connect a VNC client to localhost:5900. Use the /shared directory to transfer files between the host machine and the container. Future sessions can be managed with docker start abogen and docker stop abogen. Note: Inside the Docker container, the audio preview feature is currently unavailable due to ALSA errors, and the options to open temporary or configuration directories will not function.
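
The “No matching distribution found” error mentioned in the Linux step usually comes down to the Python version. The following sketch checks whether the running interpreter falls in the 3.10 to 3.12 range quoted above. The function name is_supported is hypothetical, and the range is taken from this article, so it may differ for newer Abogen releases.

```python
import sys

# Range quoted in the article; newer Abogen releases may support other versions.
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 12)


def is_supported(version=None):
    """True if the (major, minor) version lies within the supported range."""
    # Only major and minor matter, so patch releases such as 3.12.4 still pass.
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


current = ".".join(str(n) for n in sys.version_info[:3])
status = "supported" if is_supported() else "outside the 3.10 to 3.12 range"
print(f"Python {current} is {status}")
```

Run it with the same interpreter you plan to use for pip install abogen; if it reports a version outside the range, switch interpreters with pyenv before retrying.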

How to Use Abogen

Launch Abogen and drag your ePub, PDF, or text file into the target area, or click to browse your files. You can also input text directly into the built-in editor. Configure your preferences using the following settings:

  1. Speed – Adjust playback from 0.1x to 2.0x.
  2. Voice Selection – The first letter indicates the language (e.g., “a” for American English, “b” for British English), and the second letter indicates the gender (“m” for male, “f” for female). Use the voice mixer to create custom blends.
  3. Subtitle Style – Choose from options like Disabled, Sentence, Sentence + Comma, or specific word counts (1 word, 2 words, etc.) to determine the length of each subtitle line. Currently, subtitle generation is only supported for English, as Kokoro only provides timestamps for English text. Support for other languages can be requested through the Kokoro project.
  4. Output Format – Select from .WAV, .FLAC, .MP3, or .M4B (which supports chapters).
  5. Save Location – Choose to save the output next to the source file, on the desktop, or in a specific folder.

Click “Start” to begin the conversion. Once the process is finished, you can open the file, navigate to the output folder, or start a new project.
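
To make the word-count subtitle styles more concrete, here is a small Python sketch of how text could be grouped into lines of a fixed number of words. It only illustrates the grouping; real subtitles also need the per-word timestamps that Kokoro provides, which are not modeled here. The function name chunk_words is hypothetical and is not part of Abogen.

```python
def chunk_words(text, words_per_line):
    """Group the words of `text` into lines of at most `words_per_line` words."""
    if words_per_line < 1:
        raise ValueError("words_per_line must be at least 1")
    words = text.split()
    # Step through the word list in fixed-size slices; the last line may be shorter.
    return [
        " ".join(words[i:i + words_per_line])
        for i in range(0, len(words), words_per_line)
    ]


for line in chunk_words("Abogen turns books into audio with subtitles", 3):
    print(line)
```

Smaller word counts give fast-changing captions suited to short videos, while sentence-based styles read more comfortably for long audiobooks.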

Supported Languages

Abogen supports American English (code “a”), British English (“b”), Spanish (“e”), French (“f”), Hindi (“h”), Italian (“i”), Japanese (“j”, requires misaki[ja]), Brazilian Portuguese (“p”), and Chinese (“z”, requires misaki[zh]).
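
The language codes combine with the voice naming convention from the Voice Selection setting: the first letter of a voice name is the language code and the second is the gender. The sketch below decodes that two-letter prefix. The codes “j” and “z” follow Kokoro's language codes, the function name describe_voice is hypothetical, and the voice IDs in the usage lines are examples chosen for illustration.

```python
# Language codes as used by Kokoro; "j" and "z" are Kokoro's codes for
# Japanese and Chinese.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "e": "Spanish",
    "f": "French",
    "h": "Hindi",
    "i": "Italian",
    "j": "Japanese",
    "p": "Brazilian Portuguese",
    "z": "Chinese",
}
GENDERS = {"m": "male", "f": "female"}


def describe_voice(voice_id):
    """Decode the two-letter prefix of a voice ID into (language, gender)."""
    if len(voice_id) < 2:
        raise ValueError(f"voice ID too short: {voice_id!r}")
    lang_code, gender_code = voice_id[0], voice_id[1]
    # Unknown letters are reported rather than raising, since new voices may appear.
    language = LANGUAGES.get(lang_code, "unknown language")
    gender = GENDERS.get(gender_code, "unknown gender")
    return language, gender


print(describe_voice("af_heart"))    # example ID, used for illustration
print(describe_voice("bm_example"))  # hypothetical ID
```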

MPV Configuration

For the best experience, we recommend using the MPV player to play the generated audio together with its subtitles. MPV can display subtitles even when no video track is present. Below is a sample mpv.conf:

save-position-on-quit
keep-open=yes
audio-device=openal
sub-margin-x=235
sub-pos=60
# --- Audio quality ---
audio-spdif=ac3,dts,eac3,truehd,dts-hd
audio-channels=auto
audio-samplerate=48000
volume-max=100

Common Issues & Fixes

If Abogen fails to launch or operate correctly, run abogen-cli from your command line to start the application in terminal mode. This will provide detailed error logs. If the issue persists, open a new issue on the project’s Issues page and include the error log along with a description of the problem.