DeepSeek-OCR WebUI provides a streamlined, responsive interface for the DeepSeek-OCR model, optimized for high-speed text and table recognition. The tool returns extracted tables directly in Markdown format. It also features an interactive HTML overlay that shows exactly where text was detected on your image, complete with color-coded bounding boxes and real-time coordinates.
You can upload images in batches and steer each recognition task with a natural-language prompt, either by choosing a built-in preset or by writing a custom instruction. The interface can be switched between English and Chinese, and the application can run locally or inside a Docker container.
Batch Processing
Upload multiple images simultaneously for single-pass processing. The tool handles the entire queue automatically, so there is no need to process files one by one.
Preset Prompts
Common tasks such as general OCR, Markdown conversion, and table extraction are available as one-click presets, so you do not have to type the prompt by hand.
Bilingual UI
The interface supports both English and Chinese, allowing you to switch languages instantly based on your preference.
Docker Support
Deploy the environment with a single command using Docker. This is ideal for server-side deployments or for those who want a quick, isolated setup for testing.
Live Progress Tracking
The UI provides real-time feedback on the processing status. A visual progress bar indicates exactly how far along the task is and how much time remains.
Table Rendering
Extracted tables are not just raw text; they are rendered as formatted Markdown tables directly within the output panel, making them ready for immediate copying or editing.
Visual Bounding Boxes
An integrated HTML annotation system highlights every detected text region. Each box is assigned a unique color, and coordinates are displayed in real time, allowing you to pinpoint the exact location of text within the original image.
Multiple Output Formats
Export your results as Markdown, HTML, or annotated images. You can select a single format or download all three versions at once.
Responsive Design
The layout is fully responsive, automatically adjusting for optimal use on desktops, tablets, or smartphones.
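The Visual Bounding Boxes feature listed above can be pictured with a small sketch. It turns a list of detected regions into absolutely positioned HTML boxes, each with its own color and a hover tooltip showing its coordinates. The `Box` type, the `box_color` and `render_overlay` helpers, and the pixel-coordinate input are assumptions made for illustration; the WebUI's actual annotation code may work differently.

```python
import colorsys
from dataclasses import dataclass
from html import escape


@dataclass
class Box:
    """Hypothetical detected region, in pixel coordinates of the source image."""
    label: str
    x1: int
    y1: int
    x2: int
    y2: int


def box_color(index: int, total: int) -> str:
    """Spread hues evenly around the color wheel so each box gets a distinct color."""
    hue = index / max(total, 1)
    r, g, b = colorsys.hsv_to_rgb(hue, 0.9, 0.9)
    return f"#{int(r * 255):02x}{int(g * 255):02x}{int(b * 255):02x}"


def render_overlay(image_src: str, boxes: list[Box]) -> str:
    """Return an HTML fragment that draws every box on top of the image.

    Assumes the image is displayed at its natural size, so pixel
    coordinates map directly to CSS pixels.
    """
    parts = [
        '<div style="position:relative;display:inline-block">',
        f'<img src="{escape(image_src, quote=True)}">',
    ]
    for i, box in enumerate(boxes):
        color = box_color(i, len(boxes))
        coords = f"({box.x1}, {box.y1}) - ({box.x2}, {box.y2})"
        style = (
            f"position:absolute;left:{box.x1}px;top:{box.y1}px;"
            f"width:{box.x2 - box.x1}px;height:{box.y2 - box.y1}px;"
            f"border:2px solid {color}"
        )
        # The title attribute shows the label and coordinates on hover.
        parts.append(
            f'<div style="{style}" title="{escape(box.label, quote=True)} {coords}"></div>'
        )
    parts.append("</div>")
    return "\n".join(parts)
```

Calling `render_overlay("page.png", [Box("title", 10, 20, 110, 60)])` yields a fragment that can be saved as an HTML file and opened in a browser next to the image.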
Upload Images
Click the "Upload Images" area and select one or more image files from your device.
Set a Prompt
Select a preset button or type a custom instruction to tell the model what to do (e.g., "Convert to markdown").
Start Recognition
Click "Start Recognition" to begin processing the queue.
Review Results
The right-hand panel displays the output and a summary. Click "View full result" to see the complete extracted text. The total character count is displayed next to the results link.
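The workflow above (queue several images, process them in one pass, report progress, then show a character count) can be sketched roughly as follows. The `process_batch` function, its `recognize` callback, and the `on_progress` signature are hypothetical names chosen for this example, and the time estimate is a simple average over finished files rather than whatever the WebUI computes internally.

```python
import time
from typing import Callable, Iterable


def process_batch(
    paths: Iterable[str],
    recognize: Callable[[str], str],
    on_progress: Callable[[int, int, float], None] = lambda done, total, eta: None,
) -> dict[str, str]:
    """Run `recognize` on each path in order and report progress after each file."""
    queue = list(paths)
    results: dict[str, str] = {}
    start = time.monotonic()
    for done, path in enumerate(queue, start=1):
        results[path] = recognize(path)
        elapsed = time.monotonic() - start
        # Estimate remaining seconds from the average time per finished file.
        eta = elapsed / done * (len(queue) - done)
        on_progress(done, len(queue), eta)
    return results


def total_characters(results: dict[str, str]) -> int:
    """Character count across all recognized texts."""
    return sum(len(text) for text in results.values())
```

In a real setup, `recognize` would wrap the model call and `on_progress` would update the progress bar; here both are left as plain callbacks so the loop stays easy to test.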
Obtain the DeepSeek-OCR model from one of its official repositories and download it to a local directory.
Prepare the Environment
Create and activate a conda environment, then install the necessary dependencies:
conda create -n deepseek-ocr python=3.12.9 -y
conda activate deepseek-ocr
pip install -r requirements.txt
pip install flash-attn==2.7.3 --no-build-isolation
Configure the Model Path
Open start_ocr_webui.py and update line 26 to point to your local model directory:
# Original line
self.model_path = '/path/to/your/DeepSeek-OCR'
# Example edit
self.model_path = 'D:/models/DeepSeek-OCR'
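A wrong path is an easy mistake at this step, so it can be worth checking the directory before launching. The sketch below is one way to do that. `check_model_dir` is a hypothetical helper, not part of start_ocr_webui.py, and the expectation that the model directory contains a `config.json` is an assumption based on the usual Hugging Face checkpoint layout.

```python
from pathlib import Path


def check_model_dir(model_path: str) -> list[str]:
    """Return a list of problems found with the model directory (empty if none)."""
    problems = []
    root = Path(model_path)
    if not root.is_dir():
        problems.append(f"{root} is not a directory")
        return problems
    # Assumption: a Hugging Face style checkpoint ships a config.json at its root.
    if not (root / "config.json").is_file():
        problems.append(f"{root} has no config.json")
    return problems


if __name__ == "__main__":
    # Example path taken from the edit shown above; replace with your own.
    for problem in check_model_dir("D:/models/DeepSeek-OCR"):
        print("Problem:", problem)
```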
Launch the Application
Run the following command within your activated environment:
python start_ocr_webui.py
Access the interface by opening your browser to http://localhost:7860.
General OCR
Prompt: Free OCR.
Performs standard plain text extraction without specific formatting.
Markdown Conversion
Prompt: <|grounding|>Convert the document to markdown.
Converts the document into clean Markdown for easy editing and formatting.
Table Extraction
Prompt: <|grounding|>Extract all tables and convert to markdown format.
Identifies all tables within the image and renders them as Markdown-formatted tables, ready for use in documentation.
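The three presets amount to a lookup from a button label to a prompt string. A minimal sketch of that mapping is shown below, using the prompt texts listed above. The `PRESET_PROMPTS` and `resolve_prompt` names are hypothetical, and letting a non-empty custom instruction take priority over the selected preset is an assumption about the behavior, not something confirmed by the application's source.

```python
from typing import Optional

# Prompt strings as listed in the preset descriptions above.
PRESET_PROMPTS = {
    "General OCR": "Free OCR.",
    "Markdown Conversion": "<|grounding|>Convert the document to markdown.",
    "Table Extraction": "<|grounding|>Extract all tables and convert to markdown format.",
}


def resolve_prompt(preset: Optional[str] = None, custom: str = "") -> str:
    """Prefer a non-empty custom instruction; otherwise fall back to the preset."""
    if custom.strip():
        return custom.strip()
    if preset in PRESET_PROMPTS:
        return PRESET_PROMPTS[preset]
    raise ValueError(f"Unknown preset: {preset!r}")
```

For example, `resolve_prompt("Table Extraction")` returns the table prompt, while `resolve_prompt("General OCR", custom="Convert to markdown")` returns the custom text.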