Zotero PDF2zh: Translate Academic PDFs Directly Within Zotero

May 21. Published in PowerPoint Tools

Zotero PDF2zh is an open-source plugin designed to translate academic PDFs while maintaining their original formatting. By integrating directly into Zotero, it provides a streamlined workflow for researchers, eliminating the need for external software or the tedious process of copy-pasting text into web translators.

Environment Setup

If you prefer not to use Docker, start by creating an isolated Python environment (the commands below use conda). While this is optional, it ensures a clean installation and prevents dependency conflicts.

conda create -n zotero-pdf2zh python=3.12
conda activate zotero-pdf2zh

Install the required dependencies. Note that the plugin is designed for pdf2zh version 1.9.6; using this specific version is necessary for full compatibility.

python -m pip install pdf2zh==1.9.6 flask pypdf
python -m pip install pdfminer.six==20250416
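
Because the version pins matter here, it can help to confirm what actually got installed in the active environment. The following is a minimal sketch using only the standard library; the helper name check_pins is hypothetical, and the pinned versions are simply copied from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip commands in this article.
REQUIRED = {"pdf2zh": "1.9.6", "pdfminer.six": "20250416"}


def check_pins(required=REQUIRED, get_version=version):
    """Return {package: problem}; an empty dict means every pin matches."""
    problems = {}
    for name, wanted in required.items():
        try:
            found = get_version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if found != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins()
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("All pinned packages match.")
```

Run it inside the activated zotero-pdf2zh environment; any line it prints names a package to reinstall with the pinned version.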

Starting the Service

Option 1: Command Line

Download the server script directly:

wget https://github.com/guaguastandup/zotero-pdf2zh/raw/refs/heads/main/server.py

Launch the service on a port of your choice. This example uses port 8888:

python server.py 8888

If you choose a different port, remember to update the plugin settings within Zotero to match.
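
If the plugin cannot connect, a quick first check is whether anything is listening on the chosen port at all. The sketch below only tests for an open TCP port on the local machine; it does not confirm that the listener is the pdf2zh server, and the function name server_reachable is a hypothetical helper, not part of the plugin.

```python
import socket


def server_reachable(host="127.0.0.1", port=8888, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("server reachable:", server_reachable())
```

Pass the same port number you gave to server.py, for example server_reachable(port=9000), and make sure it matches the value in the Zotero plugin settings.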

Option 2: Docker

Build the Docker image using the following command:

docker build --build-arg ZOTERO_PDF2ZH_FROM_IMAGE=byaidu/pdf2zh:1.9.6 --build-arg ZOTERO_PDF2ZH_SERVER_FILE_DOWNLOAD_URL=https://github.com/guaguastandup/zotero-pdf2zh/raw/refs/heads/main/server.py -t zotero-pdf2zh .

Run the container:

docker run zotero-pdf2zh

Option 3: Docker Compose

If you have a docker-compose.yaml file prepared, use these commands:

docker compose build
docker compose up -d

Configuration and Parameters

Create a config.json file in the same directory as server.py. This file defines your translation engine preferences and font paths.

Example configuration:

{
    "USE_MODELSCOPE": "0",
    "PDF2ZH_LANG_FROM": "English",
    "PDF2ZH_LANG_TO": "Simplified Chinese",
    "NOTO_FONT_PATH": "./LXGWWenKai-Regular.ttf",
    "translators": [
        {
            "name": "deepseek",
            "envs": {
                "DEEPSEEK_API_KEY": "sk-xxxxxxx",
                "DEEPSEEK_MODEL": "deepseek-chat"
            }
        },
        {
            "name": "zhipu",
            "envs": {
                "ZHIPU_API_KEY": "xxxxxx",
                "ZHIPU_MODEL": "glm-4-flash"
            }
        }
    ]
}

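Fonts: LXGW WenKai is highly recommended for readability. WeChat Reading AI Kai is another excellent alternative. If you are using Docker, remember to mount the font file into the container, for example by adding this entry under volumes in docker-compose.yaml: - ./zotero-pdf2zh/LXGWWenKai-Regular.ttf:/app/LXGWWenKai-Regular.ttf. With docker run, pass the equivalent host:container pair (using an absolute host path) to the -v option.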

Engines: The plugin supports DeepSeek, Zhipu AI, and several other providers. Standard services like Bing and Google Translate are available out of the box and do not require an API key.
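
A malformed config.json or a leftover placeholder key is easy to miss, so a small pre-flight check can save a failed translation run. This is a sketch under stated assumptions: it treats the three top-level keys from the example as expected (the server may well accept defaults), it resolves the font path relative to the config directory, and check_config is a hypothetical helper rather than anything shipped with the plugin.

```python
import json
from pathlib import Path

# Assumption: these keys are expected because they appear in the example config.
EXPECTED_KEYS = ("PDF2ZH_LANG_FROM", "PDF2ZH_LANG_TO", "NOTO_FONT_PATH")


def check_config(config, base_dir="."):
    """Return a list of human-readable problems found in a parsed config.json."""
    problems = []
    for key in EXPECTED_KEYS:
        if not config.get(key):
            problems.append(f"missing or empty key: {key}")
    font = config.get("NOTO_FONT_PATH")
    if font and not (Path(base_dir) / font).is_file():
        problems.append(f"font file not found: {font}")
    for i, entry in enumerate(config.get("translators", [])):
        name = entry.get("name") if isinstance(entry, dict) else None
        if not name:
            problems.append(f"translators[{i}] has no name")
            continue
        for env_key, value in entry.get("envs", {}).items():
            # Flag values such as "sk-xxxxxxx" copied straight from the example.
            if "xxxx" in str(value).lower():
                problems.append(f"{name}: {env_key} still looks like a placeholder")
    return problems


if __name__ == "__main__":
    path = Path("config.json")
    if path.is_file():
        loaded = json.loads(path.read_text(encoding="utf-8"))
        for problem in check_config(loaded, base_dir=path.parent):
            print(problem)
    else:
        print("config.json not found in the current directory")
```

Run it from the directory that contains server.py and config.json; no output means none of these particular checks failed.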

Key Features

PDF Translation

Right-click any PDF entry or attachment in Zotero and select PDF2zh: Translate PDF. Depending on your configuration, the plugin will generate one of two output types:

  • Mono: A clean, Chinese-only version. Select "Generate single-column mono file" for an optimized reading experience on mobile devices.
  • Dual: A bilingual version. This keeps the original structure intact, providing either side-by-side columns or a stacked comparison.

Cropping and Reformatting

Two-column to single-column conversion: Select PDF2zh: Crop PDF. This produces a new file with -cut added to the filename, making papers significantly easier to read on small screens.

Bilingual layout options:

  • Dual-column compare: Right-click a dual file and select PDF2zh: Bilingual Compare (Two Columns). This organizes the text into distinct left and right columns.
  • Single-column compare: Select PDF2zh: Bilingual Compare (Single Column) to place English and Chinese paragraphs side-by-side within a single column. The resulting filename ends with single-compare.

Performance Tips

Skip reference pages: You can reduce token consumption by enabling "Skip last pages" in the Zotero preferences, which prevents the AI from translating bibliographies.
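
To get a feel for what skipping trailing pages saves, the arithmetic can be sketched as follows. This is an illustration of the idea, not the plugin's implementation: the helper names are hypothetical, and the saving estimate assumes every page carries roughly the same amount of text. The optional count_pages helper relies on pypdf, which the setup section already installs.

```python
def pages_to_translate(total_pages, skip_last):
    """Return the 1-based page numbers left after dropping the last pages."""
    if total_pages < 0 or skip_last < 0:
        raise ValueError("page counts must be non-negative")
    keep = max(total_pages - skip_last, 0)
    return list(range(1, keep + 1))


def saved_fraction(total_pages, skip_last):
    """Rough share of pages (and, by assumption, tokens) that is skipped."""
    if total_pages == 0:
        return 0.0
    return min(skip_last, total_pages) / total_pages


def count_pages(pdf_path):
    """Count pages in a PDF; imported lazily so the helpers above work without pypdf."""
    from pypdf import PdfReader

    return len(PdfReader(pdf_path).pages)


if __name__ == "__main__":
    total, skip = 12, 2  # example: a 12-page paper with 2 pages of references
    pages = pages_to_translate(total, skip)
    print(f"translate pages {pages[0]}-{pages[-1]}, skipping {saved_fraction(total, skip):.0%}")
```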

Increase thread count: If your hardware allows, increasing the thread count to 20 or higher will significantly speed up batch translation tasks.

Off-peak discounts: DeepSeek users can take advantage of substantial off-peak price discounts (often up to 50%) between 00:30 and 08:00 Beijing time. Consider scheduling large translation jobs during these hours.
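
For anyone outside China, converting the off-peak window to local time by hand is error-prone. The sketch below checks whether a given moment falls inside the window and how long remains until it opens. The window bounds are copied from the paragraph above and should be verified against the provider's current pricing page; the function names are hypothetical.

```python
from datetime import datetime, time, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # China Standard Time has no DST
WINDOW_START = time(0, 30)  # bounds as quoted in this article
WINDOW_END = time(8, 0)


def in_off_peak(now=None):
    """True if `now` (a timezone-aware datetime, default: current time) is in the window."""
    now = (now or datetime.now(timezone.utc)).astimezone(BEIJING)
    return WINDOW_START <= now.time() < WINDOW_END


def seconds_until_off_peak(now=None):
    """Seconds until the window next opens; 0.0 if it is open right now."""
    now = (now or datetime.now(timezone.utc)).astimezone(BEIJING)
    if in_off_peak(now):
        return 0.0
    start = datetime.combine(now.date(), WINDOW_START, tzinfo=BEIJING)
    if now >= start:  # today's window has already closed
        start += timedelta(days=1)
    return (start - now).total_seconds()


if __name__ == "__main__":
    wait = seconds_until_off_peak()
    print("off-peak now" if wait == 0 else f"off-peak window opens in {wait / 3600:.1f} h")
```

A batch script could sleep for seconds_until_off_peak() before submitting a large job, though that scheduling step is left to the reader.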

Output Examples

Original Text:

The memory demand of virtual machines (VMs) is increasing, while traditional DRAM-only memory systems have limited capacity and high power consumption.

Translated Result:

虚拟机(VM)的内存需求不断增加,而传统的纯DRAM内存系统容量有限且功耗较高。

Formatting Options:

  • Mono file: Provides a pure Chinese version that reads like a native document.
  • Dual file: Displays English on the left and Chinese on the right, preserving the original page layout.
  • Single-column compare: Aligns English and Chinese paragraphs side-by-side, which is ideal for close reading on wide monitors.

Zotero PDF2zh removes the friction from academic reading. By right-clicking a paper and selecting a menu item, you get a translated version without leaving your existing reference manager.