Tiny Qwen is a streamlined PyTorch implementation of the Qwen3 and Qwen2.5-VL models. It supports text processing, visual understanding—incorporating the image-referencing capabilities of Qwen2.5-VL—and both dense and Mixture-of-Experts (MoE) architectures. By stripping away the overhead found in standard Hugging Face libraries, the code is significantly more readable and easier to modify. It features an interactive chat interface and straightforward Python examples designed for quick integration.
Tiny Qwen also includes pretraining and instruction-tuning scripts specifically for the Qwen3V-4B-Preview projection layer, allowing you to replicate the model's training pipeline from scratch.
Quick tips for the chat interface:
Type /help to view available commands.
Type /exit or press Ctrl+C to close the session.
To begin, select the Qwen3 model, then choose the Qwen3-4B-Instruct-2507 variant. Once the "Model loaded successfully" message appears, you can start the conversation. For instance, a simple "hello?" should return: "Hello! How can I assist you today?"
We recommend using uv to manage your virtual environment. Follow these steps for a clean installation:
Install uv and create the environment:
pip install uv && uv venv
Activate the environment.
On Linux or macOS:
source .venv/bin/activate
On Windows:
.venv\Scripts\activate
Install dependencies:
uv pip install -r requirements.txt
Launch the chat interface:
python run.py
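If the chat interface fails to start, a quick sanity check of the environment can help. The sketch below is not part of Tiny Qwen; the helper names are hypothetical, and the package list is an assumption based on the project being a PyTorch codebase that imports PIL. It only checks that a virtual environment is active and that the named packages can be found.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; adjust to whatever requirements.txt actually lists.
    print("virtualenv active:", in_virtualenv())
    print("missing packages:", missing_packages(["torch", "PIL"]))
```

An empty "missing packages" list together with an active virtual environment suggests the installation steps completed.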
Note on Multimodal Inputs:
While Qwen3 is text-only, Qwen2.5-VL supports images. To reference an image in the chat, use the @ symbol followed by the file path. For example:
@data/test-img-1.jpg tell me what you see in this image?
The system will confirm: ✓ Found image: data/test-img-1.jpg, then provide a description: The image shows a sunflower field with a close-up of a sunflower...
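To illustrate the @ convention, here is a minimal sketch of how a chat message could be split into image paths and prompt text. It is an assumption about the behavior, not the code used by run.py; the function name and the regular expression are hypothetical.

```python
import re

# Hypothetical rule: "@" at the start of a whitespace-separated token marks a file path.
# The lookbehind keeps strings such as "user@example.com" from being treated as paths.
IMAGE_REF = re.compile(r"(?<!\S)@(\S+)")


def split_image_refs(message: str) -> tuple[list[str], str]:
    """Return (image paths, remaining prompt text) for one chat message."""
    paths = IMAGE_REF.findall(message)
    text = IMAGE_REF.sub("", message)
    text = " ".join(text.split())  # collapse whitespace left behind by removed references
    return paths, text


paths, prompt = split_image_refs("@data/test-img-1.jpg tell me what you see in this image?")
print(paths)   # ['data/test-img-1.jpg']
print(prompt)  # tell me what you see in this image?
```

A real interface would also check that each path exists before loading it, which is presumably what the "Found image" confirmation reflects.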
Python example (Qwen2.5-VL, image input):
from PIL import Image

from model.model import Qwen2VL
from model.processor import Processor

model_name = "Qwen/Qwen2.5-VL-3B-Instruct"
model = Qwen2VL.from_pretrained(repo_id=model_name, device_map="auto")
processor = Processor(repo_id=model_name, vision_config=model.config.vision_config)

# Text and images are interleaved; the image sits between the vision markers.
context = [
    "<|im_start|>user\n<|vision_start|>",
    Image.open("data/test-img-1.jpg"),
    "<|vision_end|>What's on this image?<|im_end|>\n<|im_start|>assistant\n",
]
inputs = processor(context, device="cuda")

generator = model.generate(
    input_ids=inputs["input_ids"],
    pixels=inputs["pixels"],
    d_image=inputs["d_image"],
    max_new_tokens=64,
    stream=True,
)
# Print each token as soon as it is generated.
for token_id in generator:
    token_text = processor.tokenizer.decode([token_id])
    print(token_text, end="", flush=True)
print()
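The loop above prints tokens as they arrive. When the full reply is needed as a string, one option is to collect the token ids and decode them together, since byte-level tokenizers can split a single character across several tokens. The helper below is a hypothetical sketch, not part of the repository; the stop_id parameter is an assumption for cases where the generator does not stop on its own.

```python
from typing import Callable, Iterable, Optional


def collect_stream(
    token_ids: Iterable[int],
    decode: Callable[[list[int]], str],
    stop_id: Optional[int] = None,
) -> str:
    """Gather streamed token ids and decode them in one call."""
    ids: list[int] = []
    for token_id in token_ids:
        if stop_id is not None and token_id == stop_id:
            break  # hypothetical end-of-turn token, excluded from the output
        ids.append(token_id)
    return decode(ids)


# Stand-in vocabulary so the sketch runs without loading a model.
vocab = {0: "Hel", 1: "lo", 2: "!", 3: "<|im_end|>"}
fake_decode = lambda ids: "".join(vocab[i] for i in ids)
print(collect_stream(iter([0, 1, 2, 3, 0]), fake_decode, stop_id=3))  # Hello!
```

With the objects from the example, the call would look like collect_stream(generator, processor.tokenizer.decode), assuming the tokenizer's decode accepts a list of ids as it does in the loop.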
Follow these two steps to replicate the projection layer training for Qwen3V-4B-Preview.
Step 1: Pretraining (Using the LLaVA-595K dataset)
PYTHONPATH=. python train/s2_1_qwen3v_pretrain.py \
--devices 8 \
--batch_size 8 \
--epochs 1 \
--grad_accum 2 \
--max_seq_len 1024 \
--lr 5e-4 \
--weight_decay 0 \
--num_workers 4 \
--precision bf16-mixed \
--proj_out projection-pretrained.safetensors \
--cache_dir ./cache
Step 2: Instruction Tuning (Using the LLaVA-150K dataset)
PYTHONPATH=. python train/s2_2_qwen3v_instruct.py \
--devices 8 \
--batch_size 2 \
--epochs 3 \
--grad_accum 8 \
--max_seq_len 1024 \
--lr 5e-4 \
--weight_decay 0 \
--num_workers 4 \
--precision bf16-mixed \
--proj_out projection-instruct.safetensors \
--cache_dir ./cache \
--pretrained_proj projection-pretrained.safetensors \
--freeze_llm
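Both commands above combine --devices, --batch_size, and --grad_accum. Assuming --batch_size is per device and training is plain data parallel (an assumption, not confirmed by the source), the number of samples per optimizer step is their product. The sketch below works through that arithmetic and shows how --grad_accum could be rescaled on fewer GPUs; the function names are hypothetical.

```python
def effective_batch_size(devices: int, batch_size: int, grad_accum: int) -> int:
    """Samples per optimizer step, assuming a per-device batch size and data parallelism."""
    return devices * batch_size * grad_accum


def grad_accum_for(target: int, devices: int, batch_size: int) -> int:
    """Accumulation steps needed to reach `target` samples per optimizer step."""
    per_step = devices * batch_size
    if target % per_step != 0:
        raise ValueError(f"{target} is not a multiple of devices * batch_size = {per_step}")
    return target // per_step


print(effective_batch_size(devices=8, batch_size=8, grad_accum=2))  # 128 (Step 1)
print(effective_batch_size(devices=8, batch_size=2, grad_accum=8))  # 128 (Step 2)
print(grad_accum_for(128, devices=2, batch_size=8))                 # 8
```

Under these assumptions both steps use 128 samples per optimizer step, so a 2-GPU run of Step 1 would need --grad_accum 8 to match.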
Training Technicalities:
Adjust the --devices flag to fit your specific setup.
The --freeze_llm flag ensures the base language model parameters remain unchanged.
Python example (Qwen3, text-only input):
from model.model import Qwen3MoE
from model.processor import Processor

model_name = "Qwen/Qwen3-4B-Instruct-2507"
model = Qwen3MoE.from_pretrained(repo_id=model_name)
processor = Processor(repo_id=model_name)

context = [
    "<|im_start|>user\n<|vision_start|>",
    "<|vision_end|>Explain reverse linked list<|im_end|>\n<|im_start|>assistant\n",
]
inputs = processor(context, device="cuda")

generator = model.generate(
    input_ids=inputs["input_ids"],
    max_new_tokens=64,
    stream=True,
)
# Print each token as soon as it is generated.
for token_id in generator:
    token_text = processor.tokenizer.decode([token_id])
    print(token_text, end="", flush=True)
print()