2025-01-27

MCP tool for Claude, working on Windows, that downloads an entire website (pages and assets) from a URL and saves it to a local library for AI use.


MCP Website Downloader

Simple MCP server for downloading documentation websites and preparing them for RAG indexing.

Features

  • Downloads complete documentation sites, or at least large chunks of them.
  • Aims to maintain link structure and navigation, though this does not work reliably yet.
  • Downloads and organizes assets (CSS, JS, images). The output is not really AI-friendly as-is and probably needs further parsing or vectorizing into a database.
  • Creates an index for RAG systems. It currently seems to write an index in each folder, and the result has not been reviewed.
  • Simple single-purpose MCP interface.

Installation

Fork and clone the repository, then cd into it.

uv venv
.venv/Scripts/activate
uv pip install -e .

Put this in your claude_desktop_config.json, inside the "mcpServers" section, with your own paths:

   "mcp-windows-website-downloader": {
     "command": "uv",
     "args": [
       "--directory",
       "F:/GithubRepos/mcp-windows-website-downloader",
       "run",
       "mcp-windows-website-downloader",
       "--library",
       "F:/GithubRepos/mcp-windows-website-downloader/website_library"
     ]
   },
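
If you would rather not edit the JSON by hand, the entry above can be merged into the config file with a small script. This is only a sketch: the helper name add_server_entry is made up for illustration, and the location of claude_desktop_config.json on your machine is something you have to supply yourself.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path, repo_dir: str, library_dir: str) -> dict:
    """Merge the downloader entry into the "mcpServers" section of a config file.

    add_server_entry is a hypothetical helper, not part of this repository.
    """
    # Load the existing config if there is one; otherwise start from an empty dict.
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}

    # Keep any servers that are already configured and add (or overwrite) ours.
    servers = config.setdefault("mcpServers", {})
    servers["mcp-windows-website-downloader"] = {
        "command": "uv",
        "args": [
            "--directory", repo_dir,
            "run", "mcp-windows-website-downloader",
            "--library", library_dir,
        ],
    }

    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Call it with the path to your own config file, the repository folder, and the library folder. Restart Claude Desktop afterwards so it picks up the change.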


Other usage (you don't need to worry about this, and it may be hallucinated):

  1. Start the server:
python -m mcp_windows_website_downloader.server --library docs_library
  2. Use through Claude Desktop or other MCP clients:
result = await server.call_tool("download", {
    "url": "https://docs.example.com"
})

Output Structure

docs_library/
  domain_name/
    index.html
    about.html
    docs/
      getting-started.html
      ...
    assets/
      css/
      js/
      images/
      fonts/
    rag_index.json
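
To make the layout above concrete, here is a rough sketch of how a URL could be mapped to a path inside the library. It is not taken from core.py: the function local_path, the ASSET_DIRS table, and the rule that extensionless pages get an .html suffix are all assumptions for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension-to-folder table, mirroring the assets/ layout above.
ASSET_DIRS = {
    ".css": "css",
    ".js": "js",
    ".png": "images", ".jpg": "images", ".jpeg": "images",
    ".gif": "images", ".svg": "images",
    ".woff": "fonts", ".woff2": "fonts", ".ttf": "fonts",
}


def local_path(url: str, library: str = "docs_library") -> PurePosixPath:
    """Map a URL to a location under library/domain_name/ (illustrative only)."""
    parsed = urlparse(url)
    domain_dir = PurePosixPath(library) / parsed.netloc
    suffix = PurePosixPath(parsed.path).suffix.lower()

    # Assets are grouped by type, regardless of where they lived on the site.
    if suffix in ASSET_DIRS:
        name = PurePosixPath(parsed.path).name
        return domain_dir / "assets" / ASSET_DIRS[suffix] / name

    # Pages keep the site's own directory structure.
    rel = parsed.path.strip("/")
    if not rel:
        return domain_dir / "index.html"
    if parsed.path.endswith("/"):
        return domain_dir / rel / "index.html"
    if not suffix:
        # Assumption: extensionless pages are saved with an .html suffix.
        return domain_dir / (rel + ".html")
    return domain_dir / rel
```

Under these assumptions, https://docs.example.com/docs/getting-started would land at docs_library/docs.example.com/docs/getting-started.html, and a stylesheet from anywhere on the site would land under assets/css/.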

Development

The server follows standard MCP architecture:

src/
  mcp_windows_website_downloader/
    __init__.py
    server.py    # MCP server implementation
    core.py      # Core downloader functionality
    utils.py     # Helper utilities

Components

  • server.py: Main MCP server implementation that handles tool registration and requests
  • core.py: Core website downloading functionality with proper asset handling
  • utils.py: Helper utilities for file handling and URL processing

Design Principles

  1. Single Responsibility

    • Each module has one clear purpose
    • Server handles MCP interface
    • Core handles downloading
    • Utils handles common operations
  2. Clean Structure

    • Maintains original site structure
    • Organizes assets by type
    • Creates clear index for RAG systems
  3. Robust Operation

    • Proper error handling
    • Reasonable depth limits
    • Asset download verification
    • Clean URL/path processing
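
The "reasonable depth limits" and "clean URL/path processing" points can be sketched as a breadth-first crawl that stops following links past a fixed depth. This is not the actual code in core.py: the names crawl and LinkParser, the default depth of 2, and the injected fetch callable (any function that takes a URL and returns HTML) are assumptions made so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags; html.parser is lenient with bad markup."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_depth: int = 2) -> list[str]:
    """Breadth-first crawl of one domain, following links at most max_depth hops."""
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited: list[str] = []

    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # Skip pages that fail to download instead of aborting.
        visited.append(url)

        if depth >= max_depth:
            continue  # Depth limit reached: keep the page, ignore its links.

        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so pages are not repeated.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append((absolute, depth + 1))

    return visited
```

Passing the fetcher in as a parameter keeps the traversal logic separate from networking, which matches the single-responsibility idea above and makes the depth limit easy to test against a dictionary of fake pages.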

RAG Index

The rag_index.json file contains:

{
  "url": "https://docs.example.com",
  "domain": "docs.example.com", 
  "pages": 42,
  "path": "/path/to/site"
}
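
A file with these four fields could be produced along the following lines. This is a sketch, not the repository's code: write_rag_index is a hypothetical name, and counting every .html file under the site folder as a "page" is an assumption about what the pages field means.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def write_rag_index(site_dir: Path, url: str) -> dict:
    """Write rag_index.json into site_dir and return its contents (illustrative)."""
    index = {
        "url": url,
        "domain": urlparse(url).netloc,
        # Assumption: a "page" is any .html file found under the site folder.
        "pages": sum(1 for _ in site_dir.rglob("*.html")),
        "path": str(site_dir),
    }
    index_file = site_dir / "rag_index.json"
    index_file.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index
```

A RAG pipeline could then read rag_index.json first to find the site root and decide whether the download is worth indexing.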

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

License

MIT License - See LICENSE file

Error Handling

The server handles common issues:

  • Invalid URLs
  • Network errors
  • Asset download failures
  • Malformed HTML
  • Deep recursion
  • File system errors

Error responses follow the format:

{
  "status": "error",
  "error": "Detailed error message"
}

Success responses:

{
  "status": "success",
  "path": "/path/to/downloaded/site",
  "pages": 42
}
