
Qwen3-ASR-Toolkit: Transcribe Long Audio Files Beyond the 3-Minute Limit

Published September 18 in Video Tools

Qwen3-ASR-Toolkit is a Python-based command-line utility that extends the Qwen-ASR API. It uses Voice Activity Detection (VAD) to split long audio or video files into chunks shorter than three minutes, then sends those chunks to the API concurrently on multiple threads. This works around the official API's three-minute duration limit and makes transcribing hours of content much faster than processing the chunks one at a time.
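To make the splitting step concrete, here is a minimal Python sketch of one way to choose cut points once silent intervals are known. It assumes VAD has already produced a list of candidate silence timestamps in seconds; the function names and the greedy "latest silence that still fits" rule are illustrative assumptions, not the toolkit's actual code.

```python
MAX_SEGMENT_SECONDS = 180.0  # the three-minute limit described above


def choose_cut_points(silences, total_duration, max_len=MAX_SEGMENT_SECONDS):
    """Greedily pick cut points (in seconds) from candidate silence timestamps.

    Each cut is the latest silence that keeps the current segment at most
    max_len seconds long. If no silence fits, fall back to a hard cut at
    max_len so that no segment exceeds the limit.
    """
    candidates = sorted(s for s in silences if 0.0 < s < total_duration)
    cuts = []
    start = 0.0
    while total_duration - start > max_len:
        limit = start + max_len
        fitting = [s for s in candidates if start < s <= limit]
        cut = fitting[-1] if fitting else limit
        cuts.append(cut)
        start = cut
    return cuts


def segments_from_cuts(cuts, total_duration):
    """Turn cut points into (start, end) pairs covering the whole file."""
    bounds = [0.0] + list(cuts) + [total_duration]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    # Hypothetical silence timestamps for a 400-second recording.
    cuts = choose_cut_points([50, 120, 170, 200, 290, 350], 400)
    print(segments_from_cuts(cuts, 400))
```

For that hypothetical 400-second file, the cuts land at 170 s and 350 s, giving segments (0, 170), (170, 350) and (350, 400). To stay strictly below three minutes, pass a slightly smaller `max_len`.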

The toolkit supports nearly any audio or video format through its FFmpeg integration. It handles technical requirements automatically, such as resampling audio to 16kHz mono to ensure compatibility with the API. With a straightforward command-line interface, users only need to provide a DashScope API Key to access its full range of features.
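The resampling requirement corresponds to a standard FFmpeg conversion. The sketch below builds that command from Python; `-vn`, `-ac 1` and `-ar 16000` are standard FFmpeg options, but the helper names and output file names are hypothetical and not taken from the toolkit.

```python
import shlex
import subprocess


def build_resample_cmd(src, dst, sample_rate=16000):
    """Build an FFmpeg command that converts an audio/video input to mono WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input: local path or URL
        "-vn",                    # drop any video stream
        "-ac", "1",               # downmix to one (mono) channel
        "-ar", str(sample_rate),  # resample to 16 kHz by default
        dst,                      # a .wav extension selects the WAV container
    ]


def resample(src, dst):
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_resample_cmd(src, dst), check=True, capture_output=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without FFmpeg.
    print(shlex.join(build_resample_cmd("long_lecture.mp4", "long_lecture_16k.wav")))
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems with file names that contain spaces.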

Key Features

Bypass the 3-minute limit – Process files of any duration without interruption.
VAD-based segmentation – Splitting at natural pauses helps keep sentences intact across segment boundaries.
Concurrent processing – Multi-threaded uploads and processing significantly reduce total wait times.
Automated post-processing – Identifies and removes common transcription "hallucinations" and repetitive phrases automatically.
Automatic resampling – Converts any sample rate or channel configuration to the required 16kHz mono format.
Extensive format support – Compatible with mp4, mov, mkv, mp3, wav, m4a, and various other media types.
User-friendly interface – Initiate complex transcription tasks with a single command.
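The article does not describe how the post-processing works. As one illustration of removing repetitive phrases, the sketch below collapses any word sequence (up to `max_n` words) that repeats back to back more than `max_repeats` times into a single copy. Both thresholds are assumptions, and because it splits on whitespace it would need a different tokenizer for languages written without spaces, such as Chinese.

```python
def collapse_repeated_ngrams(text, max_n=5, max_repeats=2):
    """Collapse word n-grams that repeat consecutively more than max_repeats times."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        collapsed = False
        # Try longer phrases first so "thank you" beats a single repeated word.
        for n in range(max_n, 0, -1):
            unit = words[i:i + n]
            if len(unit) < n:
                continue
            # Count back-to-back copies of `unit` starting at position i.
            reps = 1
            while words[i + reps * n:i + (reps + 1) * n] == unit:
                reps += 1
            if reps > max_repeats:
                out.extend(unit)  # keep a single copy of the phrase
                i += reps * n
                collapsed = True
                break
        if not collapsed:
            out.append(words[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(collapse_repeated_ngrams("thank you thank you thank you thank you for watching"))
```

With the defaults, a phrase that appears twice in a row is left alone, which reduces the risk of deleting legitimate emphasis.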

How It Works

Media Input – The tool reads a local file or fetches data from a remote URL.
VAD Analysis – It scans the audio to identify silent intervals.
Intelligent Splitting – The file is cut at silent points to ensure every segment is under the three-minute threshold.
Parallel API Calls – A thread pool manages multiple simultaneous requests to process all segments at once.
Data Aggregation – The tool collects, sequences, and cleans the individual transcriptions.
Output Generation – The final consolidated transcript is displayed in the console and saved as a text file.
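Steps 4 and 5 can be sketched with Python's standard thread pool. The worker below is a hypothetical placeholder that returns a label instead of calling Qwen-ASR, so the example runs offline; the point is that results are reassembled in segment order even when requests finish out of order.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe_segment(index, segment):
    """Hypothetical stand-in for one API call.

    A real implementation would upload the audio for `segment` (and might use
    `index` to name temporary files) before calling Qwen-ASR; here we just
    return a label so the example runs without network access.
    """
    start, end = segment
    return f"[{start:.0f}-{end:.0f}s]"


def transcribe_all(segments, num_threads=4, worker=transcribe_segment):
    """Run `worker` on every segment in parallel and join results in order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map returns results in input order, even when later
        # segments finish before earlier ones.
        results = list(pool.map(worker, range(len(segments)), segments))
    # Drop empty results and join the rest into one transcript.
    return " ".join(r.strip() for r in results if r.strip())


if __name__ == "__main__":
    print(transcribe_all([(0, 170), (170, 350), (350, 400)]))
```

Threads suit this workload because each worker spends most of its time waiting on the network, so Python's GIL is not the bottleneck. The default of 4 workers mirrors the documented default for `-j`.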

Installation

Prerequisites

Python 3.8 or higher
FFmpeg (for media processing)
- Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg
- macOS: brew install ffmpeg
- Windows: Download the binary and add it to your system PATH
DashScope API Key (available via Alibaba Cloud)

Setting your API Key as an environment variable is recommended:

# Linux/macOS
export DASHSCOPE_API_KEY="your_api_key_here"

# Windows (PowerShell)
$env:DASHSCOPE_API_KEY="your_api_key_here"
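A tool that accepts the key either on the command line or from the environment typically resolves it along these lines. This is a hypothetical sketch; in particular, the precedence (an explicit key wins over `DASHSCOPE_API_KEY`) is an assumption, not documented behavior.

```python
import os


def resolve_api_key(cli_key=None):
    """Return the DashScope key, preferring an explicit value over the env var.

    The precedence order here is an assumption made for illustration.
    """
    key = cli_key or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise SystemExit("No API key: pass -key or set DASHSCOPE_API_KEY")
    return key


if __name__ == "__main__":
    # Quick check that the variable exported above is visible to Python.
    print("DASHSCOPE_API_KEY set:", bool(os.environ.get("DASHSCOPE_API_KEY")))
```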

Install

Option 1: Via PyPI (recommended)

pip install qwen3-asr-toolkit

Option 2: From source

git clone https://github.com/QwenLM/Qwen3-ASR-Toolkit.git
cd Qwen3-ASR-Toolkit
pip install .

Usage

The basic command syntax is as follows:

qwen3-asr -i <input_file_or_url> [-key <api_key>] [-j <num_threads>] [-c <context>] [-t <tmp_dir>] [-s]

Parameters

Parameter	Short	Description	Required
`--input-file`	`-i`	Path to a local file or a remote URL	Yes
`--context`	`-c`	Provide context/keywords to improve recognition of specific terms	No
`--dashscope-api-key`	`-key`	DashScope API key	No (if `DASHSCOPE_API_KEY` is set)
`--num-threads`	`-j`	Number of concurrent threads (default: 4)	No
`--tmp-dir`	`-t`	Directory for temporary files (default: `~/qwen3-asr-cache`)	No
`--silence`	`-s`	Silent mode – suppresses progress information	No
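For readers who want to wrap or reimplement the interface, the parameter table maps directly onto Python's `argparse`. This is a sketch reconstructed from the table, not the toolkit's actual source; only the option names and defaults come from the documentation.

```python
import argparse
import os


def build_parser():
    """Hypothetical argparse mirror of the documented qwen3-asr options."""
    p = argparse.ArgumentParser(prog="qwen3-asr")
    p.add_argument("-i", "--input-file", required=True,
                   help="Path to a local file or a remote URL")
    p.add_argument("-c", "--context", default="",
                   help="Context/keywords to improve recognition of specific terms")
    # argparse accepts single-dash multi-letter options such as -key.
    p.add_argument("-key", "--dashscope-api-key", default=None,
                   help="DashScope API key (otherwise read from the environment)")
    p.add_argument("-j", "--num-threads", type=int, default=4,
                   help="Number of concurrent threads")
    p.add_argument("-t", "--tmp-dir", default=os.path.expanduser("~/qwen3-asr-cache"),
                   help="Directory for temporary files")
    p.add_argument("-s", "--silence", action="store_true",
                   help="Suppress progress information")
    return p


if __name__ == "__main__":
    print(build_parser().parse_args(["-i", "podcast.wav", "-j", "8", "-s"]))
```

Because each option also has a long form, the destination names follow the long spelling (`input_file`, `dashscope_api_key`, `num_threads`, and so on).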

Examples

1. Transcribe a local file

qwen3-asr -i "/path/to/my/long_lecture.mp4"

2. Transcribe a remote audio file

qwen3-asr -i "https://somewebsite.com/audios/podcast_episode.mp3"

3. Increase concurrency and provide an API key manually

qwen3-asr -i "/path/to/my/podcast.wav" -j 8 -key "your_api_key_here"

4. Improve accuracy with context hints

qwen3-asr -i "/path/to/my/tech_talk.mp4" -c "Qwen-ASR, DashScope, FFmpeg, VAD"

5. Execute in silent mode

qwen3-asr -i "/path/to/my/meeting_recording.m4a" -s
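Because the tool is a single command, batch jobs can be scripted around it. The following hypothetical Python helper runs `qwen3-asr` once per file in a folder, using only the documented flags (`-i`, `-j`, `-c`, `-s`), and assumes the API key is supplied through `DASHSCOPE_API_KEY`.

```python
import subprocess
from pathlib import Path


def build_command(path, num_threads=4, context=None, silent=True):
    """Assemble one qwen3-asr invocation from the documented options."""
    cmd = ["qwen3-asr", "-i", str(path), "-j", str(num_threads)]
    if context:
        cmd += ["-c", context]  # optional keyword hints
    if silent:
        cmd.append("-s")        # suppress progress output
    return cmd


def transcribe_folder(folder, pattern="*.mp4", **kwargs):
    """Transcribe every matching file in `folder`, one after another."""
    for path in sorted(Path(folder).glob(pattern)):
        subprocess.run(build_command(path, **kwargs), check=True)


if __name__ == "__main__":
    print(build_command("tech_talk.mp4", num_threads=8, context="Qwen-ASR, DashScope"))
```

Files are processed sequentially here on purpose: each invocation already runs its own thread pool, so launching many at once would multiply the number of simultaneous API requests.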
