DeepSeek-OCR WebUI: Batch OCR with Markdown Tables and Visual Bounding Boxes

Published November 1 in Text Tools

DeepSeek-OCR WebUI provides a streamlined, responsive interface for the DeepSeek-OCR model, optimized for high-speed text and table recognition. The tool returns extracted tables directly in Markdown format. It also features an interactive HTML overlay that shows exactly where text was detected on your image, complete with color-coded bounding boxes and real-time coordinates.

Users can upload images in batches and control recognition tasks using natural language prompts. You can choose from built-in presets, write custom instructions, or toggle between English and Chinese settings. The application can be hosted locally or deployed within a Docker container.

Batch Processing
Upload multiple images simultaneously for single-pass processing. The tool handles the entire queue automatically, so there is no need to process files one by one.
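
The queue behaviour can be pictured with a short Python sketch. It is not taken from the WebUI source: `run_ocr` is a hypothetical stand-in for the model call, and the set of accepted file extensions is an assumption made for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable

# Assumed set of image extensions, for illustration only.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}


def process_batch(paths: Iterable[str], run_ocr: Callable[[str], str]) -> dict[str, str]:
    """Run OCR over every image path in order and collect results by file name."""
    results: dict[str, str] = {}
    for raw in paths:
        path = Path(raw)
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files instead of failing the whole batch
        try:
            results[path.name] = run_ocr(str(path))
        except Exception as exc:  # one bad image should not stop the queue
            results[path.name] = f"ERROR: {exc}"
    return results
```

Recording a failure as a result, rather than raising, lets the rest of the queue finish.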

Preset Prompts
Common tasks such as general OCR, Markdown conversion, and table extraction are accessible via one-click presets, significantly reducing manual input.

Bilingual UI
The interface supports both English and Chinese, allowing you to switch languages instantly based on your preference.

Docker Support
Deploy the environment with a single command using Docker. This is ideal for server-side deployments or for those who want a quick, isolated setup for testing.

Live Progress Tracking
The UI provides real-time feedback on the processing status. A visual progress bar indicates exactly how far along the task is and how much time remains.

Table Rendering
Extracted tables are not just raw text; they are rendered as formatted Markdown tables directly within the output panel, making them ready for immediate copying or editing.
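
If you want to reuse a copied table in a script, a small parser is usually enough. The sketch below assumes simple pipe-delimited Markdown tables with no escaped `|` characters inside cells; `parse_markdown_table` is an illustrative helper, not part of the WebUI.

```python
def parse_markdown_table(text: str) -> list[list[str]]:
    """Split a pipe-delimited Markdown table into rows of cell strings."""
    rows: list[list[str]] = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore any prose around the table
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Drop the header separator row, e.g. "| --- | ---: |".
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        rows.append(cells)
    return rows
```

The first returned row is the header; the remaining rows are the table body.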

Visual Bounding Boxes
An integrated HTML annotation system highlights every detected text region. Each box is assigned a unique color, and coordinates are displayed in real time, allowing you to pinpoint the exact location of text within the original image.
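
The idea behind such an overlay can be sketched in a few lines of Python. This is not the WebUI's implementation: the box format `(label, x1, y1, x2, y2)` in pixels, the helper names, and the colour scheme are all assumptions chosen for illustration.

```python
import colorsys
from html import escape


def box_color(index: int) -> str:
    """Pick a distinct hue per box by stepping around the colour wheel."""
    hue = (index * 0.618033988749895) % 1.0  # golden-ratio step keeps neighbouring hues apart
    r, g, b = colorsys.hsv_to_rgb(hue, 0.85, 0.95)
    return f"#{int(r * 255):02x}{int(g * 255):02x}{int(b * 255):02x}"


def render_overlay(image_url: str, width: int, height: int, boxes: list[tuple]) -> str:
    """Build HTML that draws one absolutely positioned outline per text region.

    boxes: (label, x1, y1, x2, y2) tuples in image pixels (assumed format).
    """
    parts = [
        f'<div style="position:relative;width:{width}px;height:{height}px">',
        f'<img src="{escape(image_url, quote=True)}" width="{width}" height="{height}">',
    ]
    for i, (label, x1, y1, x2, y2) in enumerate(boxes):
        # The tooltip carries the label and coordinates shown on hover.
        title = escape(f"{label} ({x1}, {y1}, {x2}, {y2})", quote=True)
        parts.append(
            f'<div title="{title}" style="position:absolute;left:{x1}px;top:{y1}px;'
            f'width:{x2 - x1}px;height:{y2 - y1}px;border:2px solid {box_color(i)}"></div>'
        )
    parts.append("</div>")
    return "\n".join(parts)
```

Hovering over a box in the generated page shows its label and coordinates through the `title` attribute.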

Multiple Output Formats
Export your results as Markdown, HTML, or annotated images. You can select a single format or download all three versions at once.

Responsive Design
The layout is fully responsive, automatically adjusting for optimal use on desktops, tablets, or smartphones.

How to Use DeepSeek-OCR WebUI

Upload Images
Click the "Upload Images" area and select one or more image files from your device.

Set a Prompt
Select a preset button or type a custom instruction to tell the model what to do (e.g., "Convert to markdown").

Start Recognition
Click "Start Recognition" to begin processing the queue.

Review Results
The right-hand panel displays the output and a summary. Click "View full result" to see the complete extracted text. The total character count is displayed next to the results link.

Model Download and Setup

1. Download the Model

Obtain the DeepSeek-OCR model from one of the following repositories:

2. Local Installation

Prepare the Environment
Create and activate a conda environment, then install the necessary dependencies:

conda create -n deepseek-ocr python=3.12.9 -y
conda activate deepseek-ocr
pip install -r requirements.txt
pip install flash-attn==2.7.3 --no-build-isolation

Configure the Model Path
Open start_ocr_webui.py and update line 26 to point to your local model directory:

# Original line
self.model_path = '/path/to/your/DeepSeek-OCR'

# Example edit
self.model_path = 'D:/models/DeepSeek-OCR'
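
A wrong path typically only surfaces when the model starts loading, so it can help to check the directory first. The helper below is a hypothetical sketch, not part of the WebUI; it assumes the model folder follows the usual Hugging Face layout with a `config.json` at its root.

```python
from pathlib import Path


def check_model_dir(model_path: str) -> list[str]:
    """Return a list of problems found with the model directory (empty if none)."""
    problems: list[str] = []
    root = Path(model_path)
    if not root.is_dir():
        problems.append(f"not a directory: {root}")
        return problems
    # Assumption: a Hugging Face style model folder contains config.json.
    if not (root / "config.json").is_file():
        problems.append("config.json not found in model directory")
    return problems


if __name__ == "__main__":
    for problem in check_model_dir("D:/models/DeepSeek-OCR"):
        print("Model path issue:", problem)
```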

Launch the Application
Run the following command within your activated environment:

python start_ocr_webui.py

Access the interface by opening your browser to http://localhost:7860.

System Requirements

Python: Version 3.12 or newer.
Hardware: A CUDA-capable GPU with at least 16 GB of VRAM (an RTX 4080 or higher is recommended).
Software: PyTorch installed with CUDA support.
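
These requirements can be checked before launching. The script below is a rough sketch: the thresholds mirror the list above, the function names are illustrative, and VRAM is compared in decimal gigabytes because a card sold as 16 GB usually reports slightly less than 16 GiB.

```python
import sys

MIN_PYTHON = (3, 12)
MIN_VRAM_GB = 16


def python_ok(version: tuple = tuple(sys.version_info[:2])) -> bool:
    """True if the interpreter meets the minimum Python version."""
    return tuple(version) >= MIN_PYTHON


def vram_ok(total_bytes: int) -> bool:
    """True if the GPU memory meets the guideline, measured in decimal GB."""
    return total_bytes / 1e9 >= MIN_VRAM_GB


def gpu_report() -> str:
    """Describe the first CUDA device, or explain why none is usable."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "PyTorch cannot see a CUDA device"
    props = torch.cuda.get_device_properties(0)
    verdict = "meets" if vram_ok(props.total_memory) else "is below"
    return f"{props.name}: {props.total_memory / 1e9:.1f} GB, {verdict} the {MIN_VRAM_GB} GB guideline"


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print(gpu_report())
```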

Useful Prompt Examples

General OCR
Free OCR.
Performs standard plain text extraction without specific formatting.

Markdown Conversion
<|grounding|>Convert the document to markdown.
Converts the document into clean Markdown for easy editing and formatting.

Table Extraction
<|grounding|>Extract all tables and convert to markdown format.
Identifies all tables within the image and renders them as Markdown-formatted tables, ready for use in documentation.

Related Tools

Paperless GPT: Smarter OCR and Auto-Tagging for Paperless-NGX

Mantis: A Smarter Vision-Language-Action Model for Robots

MiMo-Audio: 100M-Hour Pretrained Model for Few-Shot Speech Tasks