Skill Seeker: Convert Any Documentation Site Into Claude AI Skills

October 24 | Published in Automation Tools

Skill Seeker is an automation tool designed to convert any documentation website into a production-ready Claude AI skill. Developers, game creators, and technical teams can rapidly build high-quality AI skills for any framework, API, or tool, reducing manual effort to nearly zero.

The tool scrapes documentation, categorizes the content, and performs local AI enhancement using Claude Code Max, allowing you to avoid external API fees. It extracts the most relevant code examples and core concepts before bundling the data into a .zip file ready for upload to Claude.

Skill Seeker is built to manage massive documentation sets, ranging from 10,000 to over 40,000 pages. It intelligently splits large projects into focused sub-skills and generates a central router hub to connect them. Integration with Claude Code is extensive, enabling you to manage the entire workflow using natural language. Additionally, progress checkpoints and a resume function ensure that long-running scraping jobs can recover from interruptions.

The system automatically detects programming languages, classifies content, and includes eight ready-to-use presets. Comprehensive customization options are available, alongside multiple upload paths—including automated API uploads and manual drag-and-drop.

1. Document Scraping

  • Universal Scraper: Compatible with any documentation website.
  • Smart Categorization: Automatically organizes content by topic.
  • Language Detection: Identifies Python, JavaScript, C++, GDScript, and other common languages.
  • Built-in Presets: Out-of-the-box support for Godot, React, Vue, Django, FastAPI, and more.

2. PDF Support (New in v1.2.0)

  • Extraction Engine: Pulls text, code, and images from PDF files.
  • OCR Integration: Extracts content from scanned, image-based documents.
  • Encryption Handling: Supports password-protected PDF files.
  • Table Extraction: Extracts complex tables from PDF sources.
  • Parallel Processing: Processes large PDFs up to 3x faster.
  • Smart Caching: Makes subsequent runs about 50% faster.

3. AI and Content Enhancement

  • AI-Driven Refinement: Transforms basic templates into detailed technical guides.
  • Cost-Free Enhancement: Utilizes Claude Code Max locally to eliminate Anthropic API charges.
  • MCP Server Integration: Invoke Skill Seeker using natural language commands within Claude Code.

4. Performance and Scalability

  • High-Volume Support: Capable of processing documentation sets exceeding 40,000 pages.
  • Router/Hub Architecture: Requests are automatically routed to the most relevant specialized sub-skill.
  • Concurrent Scraping: Build multiple skills simultaneously.
  • Checkpoint System: Allows long scraping tasks to resume after a crash or manual stop.
  • Efficient Caching: Scrape once and rebuild the skill package instantly.

5. Quality Assurance

  • Comprehensive Testing: Ships with 142 tests, all passing.

Quick Setup

Option 1: Integration via Claude Code (Recommended)

# One-time setup (approx. 5 minutes)
./setup_mcp.sh

# Interact with Claude Code using natural language:
"Generate a React skill from https://react.dev/"
"Scrape docs/manual.pdf and create a skill"

Time: Automated | Quality: Production-ready | Cost: Free

Option 2: Direct CLI (HTML Documentation)

# Install dependencies
pip3 install requests beautifulsoup4

# Generate a React skill with a single command
python3 cli/doc_scraper.py --config configs/react.json --enhance-local

# Upload the resulting output/react.zip to Claude.

Time: ~25 minutes | Quality: Production-ready | Cost: Free

Option 3: CLI for PDF Documentation

# Install PDF support
pip3 install PyMuPDF

# Basic PDF extraction
python3 cli/pdf_scraper.py --pdf docs/manual.pdf --name myskill

# Advanced extraction options
python3 cli/pdf_scraper.py --pdf docs/manual.pdf --name myskill \
    --extract-tables \
    --parallel \
    --workers 8

# Scanned PDF with OCR (Requires: pip install pytesseract Pillow)
python3 cli/pdf_scraper.py --pdf docs/scanned.pdf --name myskill --ocr

# Password-protected PDF
python3 cli/pdf_scraper.py --pdf docs/encrypted.pdf --name myskill --password mypassword

# Upload the resulting output/myskill.zip to Claude.

Time: 5–15 minutes (2–5 minutes with parallel processing) | Quality: Production-ready | Cost: Free
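
As a rough illustration of why the --parallel and --workers options shorten PDF runs, the sketch below fans per-page work out to a thread pool. It is not Skill Seeker's implementation: extract_pages and the extract_page callback are hypothetical names, and a real extractor would call into a PDF library rather than a lambda.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_pages(page_numbers, extract_page, workers=8):
    """Run extract_page for every page number on a pool of worker threads.

    pool.map returns results in the same order as page_numbers,
    so the pages come back in document order even when they finish out of order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_page, page_numbers))


if __name__ == "__main__":
    # Stand-in extractor (assumption): a real one would read the PDF page.
    pages = extract_pages(range(1, 6), lambda n: f"text of page {n}", workers=4)
    print(pages)
```

Pages are independent of one another, which is why this kind of work splits across workers so easily.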


How Skill Seeker Works

graph LR
    A[Documentation Site] --> B[Skill Seeker]
    B --> C[Scraper]
    B --> D[AI Enhancer]
    B --> E[Packager]
    C --> F[Categorized Reference Files]
    D --> F
    F --> E
    E --> G[Claude Skill .zip File]
    G --> H[Upload to Claude AI]

  1. Scrape: Extracts all page content from the target documentation.
  2. Categorize: Organizes content into topics such as APIs, guides, and tutorials.
  3. Enhance: The AI analyzes the data and generates a comprehensive SKILL.md with functional examples.
  4. Package: Compiles all resources into a .zip file formatted specifically for Claude.
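
To make the Package step concrete, here is a minimal sketch of zipping a skill folder with Python's standard library. The function name package_skill and the folder layout are assumptions for illustration; the real cli/package_skill.py may work differently.

```python
import zipfile
from pathlib import Path


def package_skill(skill_dir: str, zip_path: str) -> list[str]:
    """Zip every file under skill_dir, storing paths relative to that folder.

    Returns the archive member names so the caller can check what was packed.
    The zip file should live outside skill_dir so it does not include itself.
    """
    root = Path(skill_dir)
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                archived.append(arcname)
    return archived
```

For example, package_skill("output/react", "output/react.zip") would produce an archive with SKILL.md and the reference files at its top level, assuming that folder exists.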

Getting Started in Detail

Option 1: MCP Server (Best for Claude Code users)

Interact with Skill Seeker using natural language directly within Claude Code:

# Clone the repository
git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers

# One-time setup (5 minutes)
./setup_mcp.sh

# Restart Claude Code and use commands such as:
"List all available configs"
"Generate a Tailwind config from https://tailwindcss.com/docs"
"Scrape with configs/react.json"
"Package the skill in output/react/"

Advantages:

  • Eliminates the need to memorize CLI commands.
  • Uses natural language interaction.
  • Integrates directly into existing developer workflows.
  • Provides nine tools out of the box, including automated uploading.

Resources:

  • MCP Setup Guide
  • MCP Testing Guide (covers all nine tools)
  • Large Documentation Guide (handling 10k–40k+ pages)
  • Upload Guide (importing skills into Claude)

Option 2: CLI (Traditional workflow)

Initial Setup: Virtual Environment

git clone https://github.com/yusufkaraaslan/Skill_Seekers.git
cd Skill_Seekers

python3 -m venv venv
source venv/bin/activate  # For macOS/Linux
# For Windows: venv\Scripts\activate

pip install requests beautifulsoup4 pytest
pip freeze > requirements.txt

Ensure the virtual environment is active before running Skill Seeker: source venv/bin/activate

Using Presets

# Estimate the page count (takes 1–2 minutes)
python3 cli/estimate_pages.py configs/godot.json

# Use the Godot preset
python3 cli/doc_scraper.py --config configs/godot.json

# Use the React preset
python3 cli/doc_scraper.py --config configs/react.json

# List all available presets
ls configs/

Interactive Mode

python3 cli/doc_scraper.py --interactive

Quick Mode

python3 cli/doc_scraper.py \
  --name react \
  --url https://react.dev/ \
  --description "React framework for UI development"

Uploading Skills to Claude

There are three methods for importing your packaged skill into Claude:

Option 1: Automated Upload (API-based)

# Set your API key
export ANTHROPIC_API_KEY=sk-ant-...

# Package and automatically upload
python3 cli/package_skill.py output/react/ --upload

# Alternatively, upload an existing .zip file
python3 cli/upload_skill.py output/react.zip

Benefits: Fully automated and ideal for CLI-based workflows. Requires an Anthropic API key from https://console.anthropic.com/.

Option 2: Manual Upload (No API key required)

# Package the skill
python3 cli/package_skill.py output/react/

# This command:
# 1. Generates output/react.zip
# 2. Automatically opens the output folder
# 3. Displays upload instructions

# Manual steps:
# 1. Navigate to https://claude.ai/skills
# 2. Select "Upload Skill"
# 3. Choose output/react.zip

Benefits: Does not require an API key; accessible to all users.

Option 3: Claude Code (MCP) — Smart Upload

Within Claude Code, simply state: "Package and upload the React skill"

If an API key is detected, it handles the upload automatically. Otherwise, it packages the file and provides the directory path for manual upload.
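
The fallback described above amounts to a small decision on the environment. The sketch below shows that logic only; choose_upload_path and its return shape are hypothetical, and no upload is actually performed.

```python
import os


def choose_upload_path(zip_path: str, env=None) -> dict:
    """Decide between API upload and manual upload for a packaged skill.

    env defaults to os.environ; passing a plain dict makes the function easy to test.
    """
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY"):
        # A key is present: the caller can go on to upload via the API.
        return {"mode": "api", "zip": zip_path}
    # No key: hand back the folder so the user can upload the .zip by hand.
    return {"mode": "manual", "zip": zip_path, "folder": os.path.dirname(zip_path)}


if __name__ == "__main__":
    print(choose_upload_path("output/react.zip"))
```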


Key Features

1. Page Estimation

python3 cli/estimate_pages.py configs/react.json

Use this to assess the scope of a project and validate URL patterns before committing to a full scrape.
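
Validating URL patterns comes down to asking, for each discovered link, whether it is in scope. A minimal sketch follows, assuming patterns are plain substrings of the URL path and that an empty include list means "everything under the base URL". Both are assumptions, not confirmed behavior of estimate_pages.py.

```python
from urllib.parse import urlparse


def url_in_scope(url: str, base_url: str, include: list[str], exclude: list[str]) -> bool:
    """Return True if url should be counted toward the page estimate."""
    if not url.startswith(base_url):
        return False  # link leaves the documentation site
    path = urlparse(url).path
    if any(pattern in path for pattern in exclude):
        return False  # explicitly excluded section
    # With no include patterns, anything under base_url that is not excluded qualifies.
    return not include or any(pattern in path for pattern in include)


if __name__ == "__main__":
    base = "https://react.dev/"
    for link in ["https://react.dev/learn/state", "https://react.dev/blog/2024"]:
        print(link, url_in_scope(link, base, ["/learn", "/reference"], ["/blog"]))
```

Running a handful of known URLs through a check like this before a full scrape shows quickly whether the patterns are too broad or too narrow.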

2. Existing Data Detection

The tool identifies previously scraped data to prevent redundant work. You can choose to reuse existing data or start a fresh scrape using the --fresh flag.
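
The reuse-or-rescrape decision can be sketched as a check on the raw data folder. The helper below is hypothetical: it assumes raw pages are stored as .json files under output/<name>_data/, and it mirrors the --fresh flag with a fresh argument.

```python
from pathlib import Path


def should_scrape(name: str, output_root: str = "output", fresh: bool = False) -> bool:
    """Return True when a scrape is needed: no cached pages exist, or fresh was requested."""
    data_dir = Path(output_root) / f"{name}_data"
    has_data = data_dir.is_dir() and any(data_dir.rglob("*.json"))
    return fresh or not has_data


if __name__ == "__main__":
    print(should_scrape("react"))              # depends on whether output/react_data/ exists
    print(should_scrape("react", fresh=True))  # always True
```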

3. Knowledge Generation

The system extracts code patterns, detects programming languages, and builds a reference guide with practical examples. The resulting SKILL.md includes language-annotated code blocks and common usage patterns.

4. Smart Categorization

Categories are determined by analyzing URL structures, page titles, and content keywords, backed by a confidence scoring system.
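
One way to picture keyword-based categorization with a confidence score is sketched below. The weights, the "other" fallback, and the categorize function itself are illustrative assumptions; the article does not describe the actual scoring formula.

```python
def categorize(url: str, title: str, text: str,
               categories: dict[str, list[str]]) -> tuple[str, float]:
    """Score each category by keyword hits and return (best_category, confidence).

    Confidence is the winner's share of all points, so 1.0 means no other
    category matched at all.
    """
    weights = {"url": 3, "title": 2, "text": 1}  # assumed: URL hits count most
    fields = {"url": url.lower(), "title": title.lower(), "text": text.lower()}
    scores = {
        name: sum(weights[field] for field, value in fields.items()
                  for keyword in keywords if keyword in value)
        for name, keywords in categories.items()
    }
    total = sum(scores.values())
    if total == 0:
        return "other", 0.0  # nothing matched
    best = max(scores, key=scores.get)
    return best, scores[best] / total


if __name__ == "__main__":
    cats = {"api": ["api", "reference"], "guides": ["guide", "tutorial"]}
    print(categorize("https://docs.example.com/api/hooks", "Hooks API Reference",
                     "This guide-free page lists hooks.", cats))
```

A low confidence value is a useful signal that a page may belong in more than one category or that the keyword lists need tuning.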

5. Code Language Detection

Automatically identifies Python, JavaScript, GDScript, C++, and others by scanning for specific syntax markers.
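
Marker-based detection can be approximated with a few regular expressions per language, as in the sketch below. The marker lists are deliberately tiny and are assumptions for illustration; they are not the patterns Skill Seeker uses.

```python
import re

# Assumed syntax markers; a real detector would need many more per language.
MARKERS = {
    "python": [r"^\s*def \w+\(.*\):", r"^\s*import \w+", r"^\s*from \w+ import "],
    "javascript": [r"\b(const|let)\s+\w+\s*=", r"=>", r"\bfunction\s+\w*\("],
    "gdscript": [r"^\s*func \w+\(", r"^\s*extends \w+", r"^\s*var \w+"],
    "cpp": [r"#include\s*<", r"\bstd::", r"\bint main\("],
}


def detect_language(code: str) -> str:
    """Return the language whose markers match most often, or "unknown"."""
    scores = {
        lang: sum(1 for pattern in patterns if re.search(pattern, code, re.MULTILINE))
        for lang, patterns in MARKERS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


if __name__ == "__main__":
    print(detect_language("extends Node2D\n\nfunc _ready():\n    var speed = 10\n"))
```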

6. Rebuild Support

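If the documentation has already been scraped, you can rebuild the skill package from the cached data with the --skip-scrape flag. This skips the scraping phase entirely, so a rebuild takes minutes rather than a full scrape run.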

7. AI-Driven Enhancement

You can enhance your SKILL.md through four different methods:

  • During Scrape (API): Uses the Anthropic API (requires a key).
  • During Scrape (Local): Uses Claude Code Max locally (no API cost).
  • Post-Scrape (API): Enhances an existing output folder via API.
  • Post-Scrape (Local): Enhances an existing output folder locally.

Local enhancement typically takes 30–60 seconds and produces a high-quality guide on par with the API version.

8. Large Documentation Support (10k–40k+ Pages)

For massive libraries like Godot or AWS:

  1. Estimate: Use estimate_pages.py to gauge size.
  2. Split: Use split_config.py with the router strategy to divide the project.
  3. Parallel Scrape: Run multiple configurations simultaneously to save hours of processing time.
  4. Router Generation: Create a central hub that directs user queries to the correct sub-skill.
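
The router idea in step 4 can be sketched as keyword overlap between the user's query and each sub-skill. Everything here is hypothetical, including the sub-skill names and the route_query function; the generated router skill may use a different mechanism.

```python
def route_query(query: str, sub_skills: dict[str, list[str]], default: str) -> str:
    """Pick the sub-skill whose keywords overlap most with the words of the query.

    Falls back to default when no keyword matches.
    """
    words = set(query.lower().split())
    best, best_hits = default, 0
    for name, keywords in sub_skills.items():
        hits = len(words & {keyword.lower() for keyword in keywords})
        if hits > best_hits:
            best, best_hits = name, hits
    return best


if __name__ == "__main__":
    subs = {
        "godot-2d": ["sprite", "tilemap", "2d"],
        "godot-physics": ["collision", "rigidbody", "physics"],
    }
    print(route_query("How do I detect collision with a rigidbody", subs, "godot-core"))
```

Splitting by topic keeps each sub-skill small, while the hub only needs enough information to choose among them.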

9. Checkpoint Resume

Long scraping tasks are protected against crashes. If interrupted, run the command with the --resume flag to continue from the last saved page.
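
A minimal sketch of the checkpoint idea follows: progress is written to disk after every page, so a rerun with resume enabled skips what is already done. The file format, the scrape_with_checkpoint name, and the fetch callback are assumptions for illustration only.

```python
import json
from pathlib import Path


def scrape_with_checkpoint(urls, fetch, checkpoint_path, resume=False):
    """Fetch urls in order, recording each finished URL in a JSON checkpoint file.

    If fetch raises, the checkpoint on disk still lists every page completed
    before the failure, so a later call with resume=True continues from there.
    """
    checkpoint = Path(checkpoint_path)
    done = []
    if resume and checkpoint.exists():
        done = json.loads(checkpoint.read_text())["done"]
    finished = set(done)
    for url in urls:
        if url in finished:
            continue  # already scraped in an earlier run
        fetch(url)
        done.append(url)
        finished.add(url)
        checkpoint.write_text(json.dumps({"done": done}))
    return done
```

Writing the checkpoint after every page trades a little disk I/O for the guarantee that at most one page of work is lost on a crash.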


Workflow Examples

Initial Build (Scrape + Local Enhance)

python3 cli/doc_scraper.py --config configs/godot.json --enhance-local

Builds a production-ready .zip with an AI-enriched guide in roughly 20–40 minutes.

Fast Rebuild (Using Cached Data)

python3 cli/doc_scraper.py --config configs/godot.json --skip-scrape

Generates a new package in under 4 minutes by skipping the scraping phase.


Available Presets

| Config File       | Framework    | Category             |
|-------------------|--------------|----------------------|
| godot.json        | Godot Engine | Game Development     |
| react.json        | React        | UI Development       |
| vue.json          | Vue.js       | Web Development      |
| django.json       | Django       | Python Web Framework |
| fastapi.json      | FastAPI      | API Development      |
| ansible-core.json | Ansible Core | Automation           |

Creating Custom Configurations

Interactive Mode: Run python3 cli/doc_scraper.py --interactive and follow the prompts.

Manual Configuration: Copy an existing config (e.g., cp configs/react.json configs/myskill.json) and edit the base_url, selectors, and url_patterns to match your target documentation.
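
The copy-and-edit approach can also be scripted. The sketch below derives a new config from an existing one and caps max_pages for a first test run. The exact config schema is an assumption: the article names base_url, selectors, url_patterns, main_content, and max_pages, but the nesting shown here (main_content inside selectors, include/exclude inside url_patterns) should be checked against a real file in configs/.

```python
import copy
import json


def derive_config(base: dict, name: str, base_url: str,
                  main_content: str, max_pages: int = 20) -> dict:
    """Return a modified copy of base; the original dict is left untouched."""
    config = copy.deepcopy(base)
    config["name"] = name
    config["base_url"] = base_url
    config.setdefault("selectors", {})["main_content"] = main_content
    config["max_pages"] = max_pages  # small cap for a quick trial scrape
    return config


if __name__ == "__main__":
    # Stand-in for the contents of configs/react.json (assumed shape).
    react = {
        "name": "react",
        "base_url": "https://react.dev/",
        "selectors": {"main_content": "article"},
        "url_patterns": {"include": [], "exclude": []},
    }
    mine = derive_config(react, "myskill", "https://docs.example.com/", "div.content")
    print(json.dumps(mine, indent=2))
```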


Output Structure

  • output/<name>_data/: Contains raw JSON data for every scraped page, where <name> is the skill name.
  • output/<name>/: Contains the final skill files, including SKILL.md and categorized references (for example, output/react/).

Troubleshooting and Pro Tips

  • Testing: Always run a small test by setting max_pages: 20 in your config before scraping a massive site.
  • Content Issues: If no text is extracted, verify your main_content CSS selector (e.g., article, main, or div.content).
  • Optimization: Use the --parallel flag for PDF processing to significantly reduce wait times.
  • Fresh Starts: If the data becomes corrupted or outdated, delete the _data folder or use the --fresh flag to start over.