AI Podcast Transcriber Turns Audio Into Clean Text and Smart Summaries

September 4 · Published in Audio Tools

AI Podcast Transcriber converts spoken audio into clean, readable text. It accepts episodes from Apple Podcasts, Xiaoyuzhou FM, standard RSS feeds, or direct audio URLs. The system downloads the audio, transcribes it locally with Faster-Whisper (an open-source reimplementation of OpenAI's Whisper model), and then refines the output. The refinement pass tightens rambling passages and removes filler words such as "um" and "uh" along with verbal false starts, while preserving the speaker's original voice. If you want a summary in a language other than that of the source audio, the tool uses GPT-4o to translate it. The final output is a set of structured notes ready for research, content creation, accessibility needs, or internal knowledge bases.

Speech recognition runs entirely on your machine. Because Faster-Whisper handles it locally, the audio itself never leaves your hardware; only transcript text goes to OpenAI, and only for the optional refinement, summary, and translation steps. The first run downloads a 75 MB model file (for the default base model), and later runs reuse it, so they start faster. You can choose between a plain transcript or a transcript paired with a summary. When the summary language differs from the detected source language, GPT-4o handles the translation.
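
As a rough sketch of the local step described above, the snippet below transcribes a file with the faster-whisper Python package and decides whether a translation pass would be needed. The function names, the `audio_path` argument, and the `int8` compute type are illustrative assumptions, not the project's actual code.

```python
def needs_translation(detected_lang: str, summary_lang: str) -> bool:
    """Return True when the requested summary language differs from the audio's."""
    def primary(code: str) -> str:
        # Compare only the primary subtag, so "en-US" and "en" count as the same.
        return code.strip().lower().replace("_", "-").split("-")[0]

    return primary(detected_lang) != primary(summary_lang)


def transcribe_locally(audio_path: str, model_size: str = "base"):
    """Transcribe audio on this machine; returns (text, detected_language)."""
    # Imported lazily so needs_translation() works without faster-whisper installed.
    from faster_whisper import WhisperModel

    # The first call downloads the model weights; later calls reuse the local cache.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_path)
    # `segments` is a generator, so iterating over it is what runs the decoding.
    text = " ".join(segment.text.strip() for segment in segments)
    return text, info.language
```

With helpers like these, a caller would contact GPT-4o for translation only when `needs_translation(detected, requested)` returns True.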

The interface is built with a mobile-first approach. HTML, Tailwind CSS, and vanilla JavaScript keep the front end lightweight, allowing it to run on phones, tablets, or desktops without a dedicated client.

The architecture prioritizes local processing, calling external APIs only when necessary for advanced summarization.

  • Front end: Tailwind handles the styling. Native JavaScript manages the UI, including link input, language selection, and one-click downloads.
  • Back end: Node.js and Express serve the application. The server parses podcast links, triggers Python scripts for audio processing, and manages temporary file cleanup.
  • AI and Local Processing: Python scripts load Faster-Whisper to run transcriptions locally, ensuring privacy. OpenAI's GPT models handle the optional text refinement and translation.
  • Supporting Tools: ffmpeg is used for audio manipulation. A Python virtual environment isolates dependencies to ensure Faster-Whisper runs without version conflicts.
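
To illustrate how ffmpeg could fit between the download and transcription steps, here is a minimal sketch that builds a command to convert an episode to 16 kHz mono WAV. The article does not document the project's actual ffmpeg invocation, so the sample rate, file names, and function names are assumptions.

```python
import subprocess
from typing import List


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that writes a mono, resampled copy of src to dst."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file
        "-vn",                    # ignore any video or cover-art stream
        "-ac", "1",               # downmix to one audio channel
        "-ar", str(sample_rate),  # resample to the target rate
        dst,
    ]


def convert_audio(src: str, dst: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
```

Passing the command as a list rather than a shell string avoids quoting problems when episode file names contain spaces.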

Getting Started

This setup requires basic familiarity with the terminal.

1. Prerequisites

  • Node.js 16 or later
  • Python 3.8 or later
  • ffmpeg (brew install ffmpeg on macOS, sudo apt install ffmpeg on Debian or Ubuntu)
  • An OpenAI API key (required for summaries and translation)
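
Before moving on, a short script can confirm that the prerequisites above are present. This is only a sketch: the helper names are made up for illustration, and it checks that `node` and `ffmpeg` are on the PATH without verifying the Node.js version.

```python
import shutil
import sys

REQUIRED_TOOLS = ("node", "ffmpeg")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the names of required command-line tools not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info, minimum=(3, 8)):
    """Return True when the running interpreter is at least the minimum version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    print("Python 3.8+:", "yes" if python_ok() else "no")
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```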

2. Deployment Steps

# Clone the repo
git clone https://github.com/wendy7756/podcast-transcriber
cd podcast-transcriber

# Install Node dependencies
npm install

# Create and activate a Python virtual environment (in the project root)
python3 -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate

# Install Python packages
pip install --upgrade pip
pip install faster-whisper

# Set up environment variables
cp .env.example .env
# Edit .env: add your OPENAI_API_KEY
# Optional: change PORT or WHISPER_MODEL

# Start the app
npm start
# For development with auto‑restart: npm run dev

Open http://localhost:3000 and paste a podcast link to begin.

3. Key Environment Variables

  • OPENAI_API_KEY: Required for summarization and translation.
  • USE_LOCAL_WHISPER: Defaults to true. Set to false only if using an external service (not recommended for local privacy).
  • WHISPER_MODEL: base (75 MB) is the default. small (400 MB) offers higher accuracy but requires more resources.
  • PORT: Defaults to 3000.
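
The defaults listed above can be summarized in code. The application itself reads these values in its Node.js server, so the Python sketch below only illustrates how the defaults resolve; the `Settings` class and `load_settings` function are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass
class Settings:
    openai_api_key: str
    use_local_whisper: bool
    whisper_model: str
    port: int


def load_settings(env=None) -> Settings:
    """Resolve settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        # Anything other than an explicit "false" keeps local transcription on.
        use_local_whisper=env.get("USE_LOCAL_WHISPER", "true").strip().lower() != "false",
        whisper_model=env.get("WHISPER_MODEL", "base"),
        port=int(env.get("PORT", "3000")),
    )
```

An empty `openai_api_key` here would mean summaries and translation are unavailable, while plain local transcription does not depend on it.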

Troubleshooting

Most technical issues involve the Python virtual environment or missing dependencies.

1. "/bin/sh: .../venv/bin/python: No such file or directory"

This indicates the virtual environment was not created or activated correctly. Rebuild it using these commands:

rm -rf venv
python3 -m venv venv
source venv/bin/activate
pip install faster-whisper
which python   # This should return a path inside your project's venv folder
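
If `which python` is ambiguous (for example on Windows), the interpreter can report for itself whether it is running inside a virtual environment. The sketch below relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the function name is an illustrative choice.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None) -> bool:
    """Return True when the interpreter is running inside a venv."""
    prefix = sys.prefix if prefix is None else prefix
    if base_prefix is None:
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base Python installation.
    return prefix != base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
```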

2. Transcription hangs or fails

Check the following:

  • Confirm the terminal prompt shows the (venv) prefix.
  • Ensure faster-whisper is installed in the active environment (pip list | grep faster-whisper).
  • Verify you have at least 4 GB of available RAM.
  • Confirm ffmpeg is installed (which ffmpeg).
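
The package check can also be run from Python, which avoids depending on `grep` being available. This sketch uses the standard `importlib.metadata` module (Python 3.8+); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package: str, lookup=metadata.version):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return lookup(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("faster-whisper")
    if version is None:
        print("faster-whisper is not installed in this environment")
    else:
        print("faster-whisper", version)
```

Run it with the same interpreter the server uses; a package installed globally will not be visible from inside the venv, and vice versa.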

3. Slow initial transcription

Faster-Whisper must download the model file (75 MB for the base model) the first time it runs. Subsequent transcriptions skip this download and start sooner.

4. 500 Internal Server Error

This typically points to a local environment configuration issue rather than a network error.

  • Confirm the podcast link is valid and accessible in a browser.
  • Recreate the virtual environment as shown in troubleshooting item 1.
  • If the problem persists, review the terminal logs for specific error messages and open an issue on GitHub.
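
Part of the link check can be done offline before a URL reaches the server. The sketch below is a rough classifier for the supported source types; the host names, the extension list, and the category labels are assumptions made for illustration, and the project's own parser may work differently.

```python
from urllib.parse import urlparse

# Assumed set of direct-audio extensions; extend as needed.
AUDIO_EXTENSIONS = (".mp3", ".m4a", ".wav")


def classify_link(link: str) -> str:
    """Return a coarse category for a podcast link, or "invalid"."""
    parsed = urlparse(link.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return "invalid"
    host = parsed.hostname or ""
    if host == "podcasts.apple.com":
        return "apple"
    if host == "xiaoyuzhoufm.com" or host.endswith(".xiaoyuzhoufm.com"):
        return "xiaoyuzhou"
    # The query string is excluded from parsed.path, so "?token=..." is ignored.
    if parsed.path.lower().endswith(AUDIO_EXTENSIONS):
        return "audio"
    return "feed-or-unknown"
```

A result of "invalid" points to a malformed link rather than an environment problem, which narrows down where to look before rebuilding the virtual environment.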