ShareGPT-4o-Image is a high-quality dataset built from GPT-4o’s image generation outputs. It contains 92,000 samples intended to help open-source multimodal models approach GPT-4o’s image generation quality. (Note: the dataset captures the visual quality of GPT-4o’s outputs; it does not replicate the underlying model architecture.)
The dataset is divided into two primary categories, matching the two generation tasks described below: text-to-image and text-and-image-to-image.
Janus-4o is a multimodal large language model capable of both text-to-image and text-and-image-to-image generation. It is built on the Janus-Pro architecture and fine-tuned on the ShareGPT-4o-Image dataset.
After fine-tuning, Janus-4o shows measurable improvements in image quality and gains the ability to process combined text and image inputs. However, its overall performance remains slightly below that of GPT-4o’s native image generation.
1. Setup
Clone the official Janus repository and install the necessary dependencies:
git clone https://github.com/deepseek-ai/Janus.git
cd Janus
pip install -e .
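After installation, it can help to confirm that the package is importable before loading any model weights. The sketch below assumes the editable install exposes a top-level package named `janus` (an assumption, not stated above); `is_installed` is a hypothetical helper written for this illustration.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "janus" is the assumed package name for the cloned repository.
    if is_installed("janus"):
        print("janus package found")
    else:
        print("janus package not found; re-run the install step from the repository root")
```

Checking with `find_spec` avoids importing heavy dependencies just to verify the setup.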
2. Run Inference
Text-to-Image Generation
Load the model and processor, then define a generation function with parameters such as temperature and parallel size. Execute the function with your desired prompt and output path.
# Loading model and processor (partial code shown)
prompt = "A stunning princess from Kabul in red and white traditional clothing, blue eyes, brown hair"
image_output_path = "./test.png"
text_to_image_generate(prompt, image_output_path, vl_chat_processor, vl_gpt, parallel_size=2)
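To give a rough sense of what the temperature and parallel size parameters mentioned above usually control in autoregressive generation, here is a toy sketch. It is not the Janus-4o implementation: the function names and the three-element logits are made up for illustration, and it assumes that parallel size means the number of independent samples drawn for the same prompt.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_parallel(logits, temperature=1.0, parallel_size=2, seed=None):
    """Draw `parallel_size` independent token ids from the same distribution."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    token_ids = range(len(probs))
    return [rng.choices(token_ids, weights=probs, k=1)[0] for _ in range(parallel_size)]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
    print(softmax_with_temperature(toy_logits, temperature=0.5))
    print(softmax_with_temperature(toy_logits, temperature=2.0))
    print(sample_parallel(toy_logits, temperature=1.0, parallel_size=2, seed=0))
```

In this sketch, a lower temperature concentrates probability on the highest-scoring token, while a larger parallel size simply yields more independent candidates to choose from.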
Text-and-Image-to-Image Generation
After loading the model and processor, call the generation function with the prompt, the source image path, and the destination output path.
prompt = "Turn the image into a nighttime scene."
input_image_path = "./test_input.png"
image_output_path = "./test_output.png"
text_and_image_to_image_generate(prompt, input_image_path, image_output_path, vl_chat_processor, vl_gpt, parallel_size=2)
A Gradio-based web interface is also available for easier testing:
pip install -e ".[gradio]"
python demo/app_janus4o.py
To reproduce the Janus-4o results, use the provided training script. Training starts from Janus-Pro and fine-tunes the model on the ShareGPT-4o-Image dataset for both generation tasks.
accelerate launch --config_file configs/sft.yaml \
--num_processes 8 \
--num_machines 1 \
--machine_rank 0 \
--deepspeed_multinode_launcher standard train_janus.py \
--model_path deepseek-ai/Janus-Pro-7B \
--data_path FreedomIntelligence/ShareGPT-4o-Image \
--n_epochs 3 \
--train_bsz_per_gpu 1 \
--learning_rate 5e-6 \
--gradient_accumulation_steps 8
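The flags in the command above imply an effective batch size that is easy to overlook. The following back-of-the-envelope sketch works it out, assuming one process per GPU, that all 92,000 samples are used for training, and that a partial final batch still counts as a step. These are assumptions for illustration, not details confirmed by the training script.

```python
import math

# Values taken from the launch command above.
num_processes = 8
train_bsz_per_gpu = 1
gradient_accumulation_steps = 8
n_epochs = 3

# Assumption: the full dataset (about 92,000 samples) is used for training.
num_samples = 92_000

effective_batch_size = num_processes * train_bsz_per_gpu * gradient_accumulation_steps
steps_per_epoch = math.ceil(num_samples / effective_batch_size)
total_optimizer_steps = steps_per_epoch * n_epochs

print(f"effective batch size: {effective_batch_size}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_optimizer_steps}")
```

When running on fewer GPUs, raising `--gradient_accumulation_steps` proportionally keeps the effective batch size the same, so the learning rate of 5e-6 remains comparable.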