AI Podcast Transcriber converts spoken audio into clean, readable text. It accepts episodes from Apple Podcasts, Xiaoyuzhou FM, standard RSS feeds, or direct audio URLs. The system downloads the audio, transcribes it locally with Faster-Whisper (a reimplementation of OpenAI’s Whisper model), and then refines the output. The refinement step tightens rambling passages while preserving the speaker’s original voice, and it removes filler words such as "um," "uh," and verbal false starts. If you want a summary in a language other than that of the source audio, the tool uses GPT-4o to translate it. The result is structured notes ready for research, content creation, accessibility needs, or internal knowledge bases.
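The project describes its cleanup as AI-driven. The snippet below swaps in a much simpler, rule-based stand-in to show what stripping filler words does to a transcript line. The filler list and the `strip_fillers` function are hypothetical and are not taken from the project.

```python
import re

# Hypothetical filler list for illustration; the project's own cleanup is AI-driven.
FILLERS = ("um", "uh", "erm")

# A filler as a whole word, together with a comma before and/or after it,
# so "we, uh, launched" becomes "we launched" rather than "we, launched".
_FILLER_RE = re.compile(
    r"(?:,\s*)?\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b,?",
    flags=re.IGNORECASE,
)

def strip_fillers(text: str) -> str:
    """Remove standalone filler words, then tidy spacing and capitalization."""
    cleaned = _FILLER_RE.sub("", text)
    cleaned = re.sub(r"\s{2,}", " ", cleaned).strip()  # collapse leftover gaps
    # Re-capitalize the first letter in case a leading filler was removed.
    return cleaned[:1].upper() + cleaned[1:]

print(strip_fillers("Um, so we, uh, launched the product last spring."))
# → So we launched the product last spring.
```

A regex like this is crude. For example, it would also drop the list comma in "apples, um, pears". It cannot tighten rambling passages or repair false starts either, which is why a language-model pass is the better fit for this step.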
Speech recognition runs entirely on your machine: because Faster-Whisper transcribes locally, the audio is never uploaded to a transcription service. The first run downloads the model file (75 MB for the default base model); later sessions reuse it and start faster. You can request either a plain transcript or a transcript paired with a summary. Summaries are generated through the OpenAI API, and when the summary language differs from the detected source language, GPT-4o also handles the translation.
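The local-first flow described above can be expressed as a small planning function. Transcription always runs locally, summarization is optional, and translation is added only when the requested summary language differs from the detected source language. This is a hypothetical sketch: `plan_steps`, the step names, and the summarize-then-translate order are illustrative assumptions, not the project's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Step:
    name: str
    uses_openai: bool  # True = calls the OpenAI API; False = no OpenAI call

def plan_steps(want_summary: bool, source_lang: str,
               summary_lang: Optional[str] = None) -> List[Step]:
    """Return the ordered processing steps for one episode (illustrative only)."""
    steps = [
        Step("download_audio", uses_openai=False),
        Step("transcribe_faster_whisper", uses_openai=False),  # always local
    ]
    if want_summary:
        steps.append(Step("summarize", uses_openai=True))
        # Translation is conditional: added only when a summary language
        # was given and it differs from the detected source language.
        target = summary_lang or source_lang
        if target != source_lang:
            steps.append(Step("translate", uses_openai=True))
    return steps

plan = plan_steps(want_summary=True, source_lang="zh", summary_lang="en")
print([s.name for s in plan if s.uses_openai])  # → ['summarize', 'translate']
```

A plain transcript request produces only the two local steps. That is why `OPENAI_API_KEY` matters only when you ask for a summary.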
The interface is built with a mobile-first approach. HTML, Tailwind CSS, and vanilla JavaScript keep the front end lightweight, allowing it to run on phones, tablets, or desktops without a dedicated client.
The architecture prioritizes local processing, calling external APIs only when necessary for advanced summarization.
Getting Started
This setup requires basic familiarity with the terminal.
1. Prerequisites
Node.js (with npm)
Python 3
FFmpeg (brew install ffmpeg on macOS, apt install ffmpeg on Linux)
2. Deployment Steps
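Before running the commands below, a quick check that the required command-line tools are on your PATH can save a failed install. The tool list is an assumption inferred from the commands in this guide. On Windows, the Python launcher may be named `python` or `py` rather than `python3`.

```python
import shutil
from typing import Iterable, List

# Assumed tool list, inferred from the setup commands in this guide.
REQUIRED_TOOLS = ("git", "node", "npm", "python3", "ffmpeg")

def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the tools that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All prerequisites found.")
```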
# Clone the repo
git clone https://github.com/wendy7756/podcast-transcriber
cd podcast-transcriber
# Install Node dependencies
npm install
# Create and activate a Python virtual environment (in the project root)
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install Python packages
pip install --upgrade pip
pip install faster-whisper
# Set up environment variables
cp .env.example .env
# Edit .env: add your OPENAI_API_KEY
# Optional: change PORT or WHISPER_MODEL
# Start the app
npm start
# For development with auto‑restart: npm run dev
Open http://localhost:3000 and paste a podcast link to begin.
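The `.env` values edited above are read when the server starts. The sketch below shows one way to resolve them using the defaults documented under Key Environment Variables (`USE_LOCAL_WHISPER=true`, `WHISPER_MODEL=base`, `PORT=3000`). The project's server is Node.js, so this Python version, the `load_settings` name, and its validation rules are illustrative assumptions. It also assumes the `.env` entries have already been loaded into the process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass(frozen=True)
class Settings:
    openai_api_key: Optional[str]
    use_local_whisper: bool
    whisper_model: str
    port: int

def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from environment variables, applying the documented defaults."""
    port_raw = env.get("PORT", "3000")
    if not port_raw.isdigit():
        raise ValueError(f"PORT must be an integer, got {port_raw!r}")
    return Settings(
        # Optional here: only summarization and translation need the key.
        openai_api_key=env.get("OPENAI_API_KEY") or None,
        # Anything other than "false" keeps local transcription on.
        use_local_whisper=env.get("USE_LOCAL_WHISPER", "true").strip().lower() != "false",
        whisper_model=env.get("WHISPER_MODEL", "base"),
        port=int(port_raw),
    )

s = load_settings({"OPENAI_API_KEY": "sk-test", "WHISPER_MODEL": "small"})
print(s.whisper_model, s.port, s.use_local_whisper)  # → small 3000 True
```

Treating only an explicit `false` as "off" means a typo in `USE_LOCAL_WHISPER` falls back to local transcription rather than to an external service.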
3. Key Environment Variables
OPENAI_API_KEY: Required for summarization and translation.
USE_LOCAL_WHISPER: Defaults to true. Set it to false only if you use an external transcription service (not recommended if you want to keep audio on your machine).
WHISPER_MODEL: base (75 MB) is the default. small (400 MB) offers higher accuracy but requires more resources.
PORT: Defaults to 3000.
Troubleshooting
Most technical issues involve the Python virtual environment or missing dependencies.
1. "/bin/sh: .../venv/bin/python: No such file or directory"
This error means the virtual environment was not created or activated correctly. Rebuild it with these commands:
rm -rf venv
python3 -m venv venv
source venv/bin/activate
pip install faster-whisper
which python # This should return a path inside your project's venv folder
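The `which python` check can also be run from inside Python, which additionally confirms that `faster-whisper` is importable from the same interpreter. `check_venv` is a hypothetical helper, not part of the project. It relies on two assumptions: `sys.prefix` differs from `sys.base_prefix` inside a venv, and the package's import name is `faster_whisper`.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Dict

def check_venv(project_dir: str = ".") -> Dict[str, bool]:
    """Report whether this interpreter is the project's venv and has faster-whisper."""
    venv_dir = (Path(project_dir) / "venv").resolve()
    return {
        # True inside any virtual environment created with `python -m venv`.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # True only if that environment is the project's ./venv folder.
        "is_project_venv": Path(sys.prefix).resolve() == venv_dir,
        # find_spec returns None when the module cannot be found.
        "faster_whisper_installed": importlib.util.find_spec("faster_whisper") is not None,
    }

if __name__ == "__main__":
    for check, ok in check_venv().items():
        print(f"{check}: {'OK' if ok else 'FAIL'}")
```

Run it from the project root after `source venv/bin/activate`. All three checks should report OK.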
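2. Transcription hangs or fails
Check the following:
The virtual environment is active (your terminal prompt shows a (venv) prefix).
faster-whisper is installed in the active environment (pip list | grep faster-whisper).
FFmpeg is installed and on your PATH (which ffmpeg).
3. Slow initial transcription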
On its first run, Faster-Whisper downloads the model file (75 MB for the base model). Later transcriptions reuse the downloaded model and skip this wait.
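Whether a run will incur the download delay depends on whether the model is already cached. The sketch below rests on three assumptions:
- faster-whisper fetches models from the Hugging Face Hub repository `Systran/faster-whisper-<size>`.
- The files land in the default Hub cache (`~/.cache/huggingface/hub`, overridable with `HF_HUB_CACHE` or `HF_HOME`).
- The app does not pass its own download directory. If it does, the path will differ.

`hub_cache_dir` and `model_is_cached` are hypothetical helpers.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

def hub_cache_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Locate the Hugging Face Hub cache, honoring the HF_HUB_CACHE / HF_HOME overrides."""
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "hub"

def model_is_cached(size: str = "base", cache_dir: Optional[Path] = None) -> bool:
    """True if at least one snapshot of the (assumed) model repo exists locally."""
    root = cache_dir if cache_dir is not None else hub_cache_dir()
    # Assumed Hub cache layout: models--<org>--<name>/snapshots/<revision>/
    snapshots = root / f"models--Systran--faster-whisper-{size}" / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())

if __name__ == "__main__":
    status = "cached" if model_is_cached("base") else "not cached; the first run will download it"
    print(f"faster-whisper base model: {status}")
```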
4. 500 Internal Server Error
This usually points to a local environment problem, such as a broken virtual environment or a missing dependency, rather than a network error.