Qwen3-ASR-Studio is a high-performance web application designed as a streamlined interface for Alibaba Cloud’s Qwen ASR model. Its primary objective is simple: to convert speech into text with minimal friction.
The application supports direct uploads of common audio formats (including WAV, MP3, FLAC, and M4A) as well as live microphone recording. During recording, a real-time waveform visualizer provides immediate feedback, while the underlying Qwen ASR model handles the transcription itself.
To improve accuracy in specialized domains, users can add context hints such as specific names or technical terminology. The app automatically detects multiple languages, including Chinese, English, and Japanese. Furthermore, enabling Inverse Text Normalization (ITN) allows the system to convert spoken phrases like "January fifth" into concise written formats like "Jan 5."
For a more efficient workflow, users can hold the spacebar to record and release it to stop and initiate transcription. To reduce wait times on slower internet connections, audio files are compressed locally on the user's machine before being processed.
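The hold-to-record behavior can be sketched as a small state machine. This is a minimal illustration, not the application's actual code: the `PushToTalk` class and the `RecorderHooks` callbacks are hypothetical names, and the sketch assumes the browser reports the spacebar as `"Space"` in `KeyboardEvent.code` and flags auto-repeated key events through `KeyboardEvent.repeat`.

```typescript
// Hypothetical names throughout; not taken from the Qwen3-ASR-Studio source.
type RecorderHooks = {
  start: () => void; // e.g. begin microphone capture
  stop: () => void; // e.g. end capture and submit the audio for transcription
};

class PushToTalk {
  private recording = false;
  private readonly hooks: RecorderHooks;

  constructor(hooks: RecorderHooks) {
    this.hooks = hooks;
  }

  // Call from a keydown handler. `repeat` mirrors KeyboardEvent.repeat,
  // which is true for the auto-repeated events fired while a key is held.
  keyDown(code: string, repeat: boolean): void {
    if (code !== "Space" || repeat || this.recording) return;
    this.recording = true;
    this.hooks.start();
  }

  // Call from a keyup handler. Only a release that ends an active
  // recording triggers the stop hook.
  keyUp(code: string): void {
    if (code !== "Space" || !this.recording) return;
    this.recording = false;
    this.hooks.stop();
  }

  get isRecording(): boolean {
    return this.recording;
  }
}
```

In a browser, the controller would typically be wired up with `window.addEventListener("keydown", (e) => ptt.keyDown(e.code, e.repeat))` and a matching `keyup` listener. Keeping the key logic separate from the recording hooks makes it possible to exercise the behavior without a microphone.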
The most distinctive feature is Picture-in-Picture (PiP) mode. This creates a floating window that stays on top of other applications. As the user speaks, the transcribed text can be sent directly into any active text field, effectively serving as a global voice input method across the system.
Two distinct editing modes are available. Single-pass mode is designed for processing one audio file at a time to maintain a clean workspace. Notes mode aggregates multiple transcriptions into a single editable area, which is ideal for documenting long meetings or lectures.
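The difference between the two modes comes down to what happens to the editor text when a new transcription arrives. The sketch below is one plausible reading of that behavior, not the application's implementation: the `EditingMode` type, the `applyTranscript` function, and the blank-line separator between note entries are all assumptions made for illustration.

```typescript
// Hypothetical model of the two editing modes described above.
type EditingMode = "single" | "notes";

// Returns the new editor content after a transcription finishes.
// - "single": the new result replaces whatever was there.
// - "notes": the new result is appended, separated by a blank line
//   (the separator is an assumption, not documented behavior).
function applyTranscript(
  mode: EditingMode,
  current: string,
  incoming: string,
): string {
  if (mode === "single") return incoming;
  return current.trim() === "" ? incoming : `${current}\n\n${incoming}`;
}
```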
Persistence and privacy are both handled through automatic local saving. Transcripts, audio files, notes, and settings are stored within the browser’s IndexedDB rather than on a central server. This local-first approach keeps saved data on the user's machine and ensures that previously processed files do not need to be re-transcribed.
The History tab allows users to revisit previous transcriptions, while the Notes section keeps important results organized and separate from daily logs. A one-click option is available to clear all history when necessary.
Personalization options include a choice between light and dark themes, as well as an option to automatically copy results to the clipboard as soon as transcription finishes. The application remembers these preferences locally for a consistent experience across sessions.
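Remembering preferences across sessions usually amounts to serializing a small settings object and validating it when it is read back. The following is a minimal sketch under stated assumptions: the `Preferences` shape, the `STORAGE_KEY` value, and the default values are hypothetical, and the sketch uses a synchronous Web Storage-style interface as a simpler stand-in for the IndexedDB storage the application is described as using.

```typescript
// Minimal subset of the Web Storage API (the browser's localStorage
// satisfies it), so the sketch also runs outside a browser.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical settings shape based on the options described above.
type Preferences = { theme: "light" | "dark"; autoCopy: boolean };

const DEFAULTS: Preferences = { theme: "light", autoCopy: false }; // assumed defaults
const STORAGE_KEY = "asr-studio-preferences"; // hypothetical key name

// Reads preferences, falling back to defaults for missing,
// malformed, or unexpected values.
function loadPreferences(storage: KeyValueStorage): Preferences {
  const raw = storage.getItem(STORAGE_KEY);
  if (raw === null) return { ...DEFAULTS };
  try {
    const parsed = JSON.parse(raw) as Partial<Preferences>;
    return {
      theme:
        parsed.theme === "dark" || parsed.theme === "light"
          ? parsed.theme
          : DEFAULTS.theme,
      autoCopy:
        typeof parsed.autoCopy === "boolean"
          ? parsed.autoCopy
          : DEFAULTS.autoCopy,
    };
  } catch {
    // Covers invalid JSON and non-object values such as "null".
    return { ...DEFAULTS };
  }
}

function savePreferences(storage: KeyValueStorage, prefs: Preferences): void {
  storage.setItem(STORAGE_KEY, JSON.stringify(prefs));
}
```

Validating each field on load means a corrupted or outdated entry degrades to the defaults instead of breaking the interface.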
Tech stack
Local deployment
Requires Node.js v18 or higher. Use pnpm (recommended), npm, or yarn.
Clone the repo:
git clone https://github.com/yeahhe365/Qwen3-ASR-Studio.git
cd Qwen3-ASR-Studio
Install dependencies:
pnpm install (or npm install)
Start dev server:
pnpm dev (or npm run dev)
Open http://localhost:5173 in your browser.