Tiny Qwen is a streamlined PyTorch implementation of the Qwen3 and Qwen2.5-VL models. It supports text processing, visual understanding—incorporating the image-referencing capabilities of Qwen2.5-VL—and both dense and Mixture-of-Experts (MoE) architectures. By stripping away the overhead found in standard Hugging Face libraries, the code is significantly more readable and easier to modify. It features an interactive chat interface and straightforward Python examples designed for quick integration.
Tiny Qwen also includes pretraining and instruction-tuning scripts specifically for the Qwen3V-4B-Preview projection layer, allowing you to replicate the model's training pipeline from scratch.
Quick tips for the chat interface:
Use /help to view available commands.
Use /exit or press Ctrl+C to close the session.
To begin, select the Qwen3 model, then choose the Qwen3-4B-Instruct-2507 variant. Once the "Model loaded successfully" message appears, you can start the conversation. For instance, a simple "hello?" should return: "Hello! How can I assist you today?"
We recommend using uv to manage your virtual environment. Follow these steps for a clean installation:
Install uv and create the environment:
pip install uv && uv venv
Activate the environment:
source .venv/bin/activate (Linux/macOS)
.venv\Scripts\activate (Windows)
Install dependencies:
uv pip install -r requirements.txt
Launch the chat interface:
python run.py
Note on Multimodal Inputs:
While Qwen3 is text-only, Qwen2.5-VL supports images. To reference an image in the chat, use the @ symbol followed by the file path. For example:
@data/test-img-1.jpg tell me what you see in this image?
The system will confirm: ✓ Found image: data/test-img-1.jpg, then provide a description: The image shows a sunflower field with a close-up of a sunflower...
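The @ convention can be pictured as a small parsing step that separates image paths from the prompt text. The sketch below is only an assumption about how such parsing might work, not the project's actual code; `split_image_refs` and its regular expression are hypothetical names introduced for illustration.

```python
import re

# Hypothetical pattern: "@" at the start of the message or right after
# whitespace, followed by a run of non-whitespace characters (the file path).
IMAGE_REF = re.compile(r"(?:(?<=\s)|^)@(\S+)")


def split_image_refs(message: str) -> tuple[list[str], str]:
    """Return (image_paths, remaining_text) for a chat message."""
    paths = IMAGE_REF.findall(message)
    # Remove the references, then collapse the whitespace they leave behind.
    text = " ".join(IMAGE_REF.sub("", message).split())
    return paths, text


paths, text = split_image_refs(
    "@data/test-img-1.jpg tell me what you see in this image?"
)
print(paths)
print(text)
```

Requiring whitespace (or the start of the message) before the @ keeps strings such as e-mail addresses from being treated as image references.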
# Describe an image with Qwen2.5-VL.
from PIL import Image

from model.model import Qwen2VL
from model.processor import Processor

model_name = "Qwen/Qwen2.5-VL-3B-Instruct"
model = Qwen2VL.from_pretrained(repo_id=model_name, device_map="auto")
processor = Processor(repo_id=model_name, vision_config=model.config.vision_config)

# The context interleaves prompt text with the image, wrapped in vision markers.
context = [
    "<|im_start|>user\n<|vision_start|>",
    Image.open("data/test-img-1.jpg"),
    "<|vision_end|>What's on this image?<|im_end|>\n<|im_start|>assistant\n",
]
inputs = processor(context, device="cuda")

# With stream=True, generate() yields token ids one at a time.
generator = model.generate(
    input_ids=inputs["input_ids"],
    pixels=inputs["pixels"],
    d_image=inputs["d_image"],
    max_new_tokens=64,
    stream=True,
)
for token_id in generator:
    token_text = processor.tokenizer.decode([token_id])
    print(token_text, end="", flush=True)
print()
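Decoding one token id at a time can print replacement characters when a single character spans several tokens, which may happen with byte-level tokenizers. Whether that occurs with this tokenizer is an assumption. The sketch below shows one common workaround: decode the accumulated ids and emit only the newly completed text. `stream_text` and `byte_decode` are hypothetical helpers, and the byte-based decoder only stands in for a real tokenizer.

```python
from typing import Callable, Iterable, Iterator

REPLACEMENT = "\ufffd"  # what a lenient decoder emits for an incomplete character


def stream_text(
    token_ids: Iterable[int],
    decode: Callable[[list[int]], str],
) -> Iterator[str]:
    """Yield newly completed text as token ids arrive."""
    seen: list[int] = []
    emitted = 0  # number of characters already yielded
    for token_id in token_ids:
        seen.append(token_id)
        text = decode(seen)
        # Hold back while the tail is still an incomplete character.
        if text.endswith(REPLACEMENT):
            continue
        yield text[emitted:]
        emitted = len(text)


# Stand-in decoder for the demo: treats each "token id" as one UTF-8 byte.
def byte_decode(ids: list[int]) -> str:
    return bytes(ids).decode("utf-8", errors="replace")


pieces = list(stream_text("héllo".encode("utf-8"), byte_decode))
print("".join(pieces))
```

With the loop above, the call might look like `for piece in stream_text(generator, processor.tokenizer.decode): print(piece, end="", flush=True)`. Re-decoding the full id list on every step costs more than decoding a single id, which is usually acceptable for short generations such as `max_new_tokens=64`.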
Follow these two steps to replicate the projection layer training for Qwen3V-4B-Preview.
Step 1: Pretraining (Using the LLaVA-595K dataset)
PYTHONPATH=. python train/s2_1_qwen3v_pretrain.py \
    --devices 8 \
    --batch_size 8 \
    --epochs 1 \
    --grad_accum 2 \
    --max_seq_len 1024 \
    --lr 5e-4 \
    --weight_decay 0 \
    --num_workers 4 \
    --precision bf16-mixed \
    --proj_out projection-pretrained.safetensors \
    --cache_dir ./cache
Step 2: Instruction Tuning (Using the LLaVA-150K dataset)
PYTHONPATH=. python train/s2_2_qwen3v_instruct.py \
    --devices 8 \
    --batch_size 2 \
    --epochs 3 \
    --grad_accum 8 \
    --max_seq_len 1024 \
    --lr 5e-4 \
    --weight_decay 0 \
    --num_workers 4 \
    --precision bf16-mixed \
    --proj_out projection-instruct.safetensors \
    --cache_dir ./cache \
    --pretrained_proj projection-pretrained.safetensors \
    --freeze_llm
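The two commands use different batch and accumulation settings. If --batch_size is a per-device value (an assumption; the source does not say), both work out to the same number of samples per optimizer step. The helpers below are illustrative and not part of the repository; they may be useful when changing the device count.

```python
def effective_batch_size(devices: int, batch_size: int, grad_accum: int) -> int:
    """Samples contributing to one optimizer step, assuming per-device batches."""
    return devices * batch_size * grad_accum


pretrain = effective_batch_size(devices=8, batch_size=8, grad_accum=2)
instruct = effective_batch_size(devices=8, batch_size=2, grad_accum=8)
print(pretrain, instruct)  # 128 128


def grad_accum_for(target: int, devices: int, batch_size: int) -> int:
    """Accumulation steps needed to reach `target` with a given device count."""
    per_step = devices * batch_size
    if target % per_step:
        raise ValueError("target is not a multiple of devices * batch_size")
    return target // per_step


# Example: keeping 128 samples per step on 2 devices with batch_size 8.
print(grad_accum_for(128, devices=2, batch_size=8))  # 8
```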
Training Technicalities:
Adjust the --devices flag to fit your specific setup.
The --freeze_llm flag ensures the base language model parameters remain unchanged.

# Text-only generation with Qwen3.
from model.model import Qwen3MoE
from model.processor import Processor

model_name = "Qwen/Qwen3-4B-Instruct-2507"
model = Qwen3MoE.from_pretrained(repo_id=model_name)
processor = Processor(repo_id=model_name)

# A text-only prompt needs no vision markers.
context = [
    "<|im_start|>user\nExplain reverse linked list<|im_end|>\n<|im_start|>assistant\n",
]
inputs = processor(context, device="cuda")

# With stream=True, generate() yields token ids one at a time.
generator = model.generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=64,
    stream=True,
)
for token_id in generator:
    token_text = processor.tokenizer.decode([token_id])
    print(token_text, end="", flush=True)
print()
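Both Python examples write the chat markers by hand. A small helper can assemble the same single-turn context. `build_context` is a hypothetical name; the marker layout is copied from the image example in this article, and the text-only form, which simply omits the vision markers, is an assumption rather than something taken from the project's processor code.

```python
from typing import Any, Optional


def build_context(user_text: str, image: Optional[Any] = None) -> list:
    """Single-turn context list: prompt strings, with an optional image between them."""
    head = "<|im_start|>user\n"
    tail = f"{user_text}<|im_end|>\n<|im_start|>assistant\n"
    if image is None:
        # Text only: one string, no vision markers.
        return [head + tail]
    # Image turn: the image sits between the vision start and end markers.
    return [head + "<|vision_start|>", image, "<|vision_end|>" + tail]


placeholder = object()  # stands in for a PIL image in this demo
print(build_context("Explain reverse linked list"))
print(len(build_context("What's on this image?", image=placeholder)))
```

The returned list could then be passed to the processor in place of the hand-written `context`, for example `processor(build_context("hello?"), device="cuda")`.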