2025-03-26

FastAPI server implementing the MCP protocol. Browser automation via the browser-use library.

MCP server w/ Browser Use

MCP server for browser-use.

Overview

This repository contains an MCP server for the browser-use library, a browser automation system that lets AI agents interact with web browsers through natural language. The server is built on Anthropic's Model Context Protocol (MCP) and exposes browser-use to MCP clients such as Claude Desktop.

Features

  1. Browser Control
  • Automated browser interactions via natural language
  • Navigation, form filling, clicking, and scrolling capabilities
  • Tab management and screenshot functionality
  • Cookie and state management
  2. Agent System
  • Custom agent implementation in custom_agent.py
  • Vision-based element detection
  • Structured JSON responses for actions
  • Message history management and summarization
  3. Configuration
  • Environment-based configuration for API keys and settings
  • Chrome browser settings (debugging port, persistence)
  • Model provider selection and parameters

Dependencies

This project relies on the following Python packages:

| Package | Version | Description |
| --- | --- | --- |
| Pillow | >=10.1.0 | Python Imaging Library (PIL) fork that adds image processing capabilities to Python. |
| browser-use | ==0.1.19 | Browser automation system that enables AI agents to interact with web browsers through natural language. The core library behind this project's browser automation. |
| fastapi | >=0.115.6 | Web framework for building APIs with Python 3.7+ based on standard Python type hints. Used to create the server that exposes the agent's functionality. |
| fastmcp | >=0.4.1 | A high-level framework for building MCP (Model Context Protocol) servers. |
| instructor | >=1.7.2 | Library for structured output prompting and validation with OpenAI models. Used to extract structured data from model responses. |
| langchain | >=0.3.14 | Framework for developing applications with large language models (LLMs). Provides tools for chaining language model components and interacting with various APIs and data sources. |
| langchain-google-genai | >=2.1.1 | LangChain integration for Google GenAI models. |
| langchain-openai | >=0.2.14 | LangChain integration for OpenAI models (such as GPT-4). Used in this project to interact with OpenAI's language and vision models. |
| langchain-ollama | >=0.2.2 | LangChain integration for Ollama, enabling local execution of LLMs. |
| openai | >=1.59.5 | Official Python client library for the OpenAI API. Used to interact directly with OpenAI's models where needed, in addition to LangChain. |
| python-dotenv | >=1.0.1 | Reads key-value pairs from a .env file and sets them as environment variables. Simplifies local development and configuration management. |
| pydantic | >=2.10.5 | Data validation and settings management using Python type annotations. Used to define the agent's structured data models. |
| pyperclip | >=1.9.0 | Cross-platform Python module for clipboard copy and paste. |
| uvicorn | >=0.22.0 | ASGI web server for Python. Used to serve the FastAPI application. |

Components

Resources

The server implements a browser automation system with:

  • Integration with browser-use library for advanced browser control
  • Custom browser automation capabilities
  • Agent-based interaction system with vision capabilities
  • Persistent state management
  • Customizable model settings

Requirements

  • Operating system: Linux, macOS, or Windows (not tested on Docker or Microsoft WSL)
  • Python 3.11 or higher
  • uv (fast Python package installer)
  • Chrome/Chromium browser
  • Claude Desktop

Quick Start

Claude Desktop

Configuration file location:

  • On macOS: ~/Library/Application\ Support/Claude/claude_desktop_config.json
  • On Windows: %APPDATA%/Claude/claude_desktop_config.json
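
The two locations above can be resolved programmatically. The following is a minimal sketch, not part of this project: the function name `claude_config_path` is hypothetical, and it only covers the two platforms listed here, raising an error for anything else rather than guessing a path.

```python
import os
import sys
from typing import Mapping, Optional

CONFIG_NAME = "claude_desktop_config.json"


def claude_config_path(
    platform: str = sys.platform,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the Claude Desktop config path for the platforms listed above.

    Hypothetical helper for illustration; only macOS and Windows are handled.
    """
    env = os.environ if env is None else env
    if platform == "darwin":
        home = env.get("HOME", "~")
        # No backslash before the space: that escape is only needed in a shell.
        return f"{home}/Library/Application Support/Claude/{CONFIG_NAME}"
    if platform.startswith("win"):
        appdata = env.get("APPDATA")
        if not appdata:
            raise RuntimeError("APPDATA is not set")
        return f"{appdata}/Claude/{CONFIG_NAME}"
    raise NotImplementedError(f"no documented config location for {platform!r}")
```

Calling `claude_config_path()` with no arguments uses the current platform and environment; passing both explicitly makes the function easy to test.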

Installing via Smithery

To install Browser Use for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @JovaniPink/mcp-browser-use --client claude

Development Configuration

Add the following entry to claude_desktop_config.json. Strict JSON does not allow comments or trailing commas, so the optional settings are described here instead: set ANONYMIZED_TELEMETRY to "false" to disable anonymized telemetry, set CHROME_PERSISTENT_SESSION to "true" to keep the browser open between AI tasks, and add DEEPSEEK_ENDPOINT ("https://api.deepseek.com") and DEEPSEEK_API_KEY to use DeepSeek.

{
  "mcpServers": {
    "mcp_server_browser_use": {
      "command": "uvx",
      "args": [
        "mcp-server-browser-use"
      ],
      "env": {
        "OPENAI_ENDPOINT": "https://api.openai.com/v1",
        "OPENAI_API_KEY": "",
        "ANTHROPIC_API_KEY": "",
        "GOOGLE_API_KEY": "",
        "AZURE_OPENAI_ENDPOINT": "",
        "AZURE_OPENAI_API_KEY": "",
        "ANONYMIZED_TELEMETRY": "false",
        "CHROME_PATH": "",
        "CHROME_USER_DATA": "",
        "CHROME_DEBUGGING_PORT": "9222",
        "CHROME_DEBUGGING_HOST": "localhost",
        "CHROME_PERSISTENT_SESSION": "false",
        "MCP_MODEL_PROVIDER": "anthropic",
        "MCP_MODEL_NAME": "claude-3-5-sonnet-20241022",
        "MCP_TEMPERATURE": "0.3",
        "MCP_MAX_STEPS": "30",
        "MCP_USE_VISION": "true",
        "MCP_MAX_ACTIONS_PER_STEP": "5",
        "MCP_TOOL_CALL_IN_CONTENT": "true"
      }
    }
  }
}
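
If claude_desktop_config.json already lists other servers, the new entry has to be merged in rather than pasted over the file. A minimal sketch of that merge, assuming the file is plain JSON with a top-level "mcpServers" object; `add_mcp_server` and `BROWSER_USE_ENTRY` are hypothetical names, and the entry shown is a shortened version of the configuration above:

```python
import json
from pathlib import Path

# Shortened, illustrative entry; use the full "env" block in practice.
BROWSER_USE_ENTRY = {
    "command": "uvx",
    "args": ["mcp-server-browser-use"],
    "env": {"MCP_MODEL_PROVIDER": "anthropic", "ANTHROPIC_API_KEY": ""},
}


def add_mcp_server(config_path, name, entry):
    """Insert or replace one entry under "mcpServers", keeping everything else."""
    path = Path(config_path)
    # Start from the existing config if there is one, otherwise from an empty object.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = entry
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_mcp_server(path, "mcp_server_browser_use", BROWSER_USE_ENTRY)` leaves any other configured servers and top-level settings untouched. Because `json.loads` rejects comments and trailing commas, this also surfaces a malformed config file early.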

Environment Variables

Key environment variables:

# API Keys
ANTHROPIC_API_KEY=anthropic_key

# Chrome Configuration
# Optional: Path to Chrome executable
CHROME_PATH=/path/to/chrome
# Optional: Chrome user data directory
CHROME_USER_DATA=/path/to/user/data
# Default: 9222
CHROME_DEBUGGING_PORT=9222
# Default: localhost
CHROME_DEBUGGING_HOST=localhost
# Keep browser open between tasks
CHROME_PERSISTENT_SESSION=false

# Model Settings
# Options: anthropic, openai, azure, deepseek
MCP_MODEL_PROVIDER=anthropic
# Model name
MCP_MODEL_NAME=claude-3-5-sonnet-20241022
MCP_TEMPERATURE=0.3
MCP_MAX_STEPS=30
MCP_USE_VISION=true
MCP_MAX_ACTIONS_PER_STEP=5
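
Environment variables are always strings, so the numeric and boolean settings above need to be parsed before use. The sketch below shows one way to do that with the standard library. It is not the project's actual loader: `ServerSettings`, `load_settings`, and `parse_bool` are hypothetical names, and apart from the documented defaults for the debugging host and port, the fallback values are simply the example values shown above, used here as assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def parse_bool(value: str) -> bool:
    """Interpret common truthy spellings; anything else is False."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ServerSettings:  # hypothetical container, not from the project
    chrome_debugging_host: str
    chrome_debugging_port: int
    chrome_persistent_session: bool
    model_provider: str
    model_name: str
    temperature: float
    max_steps: int
    use_vision: bool
    max_actions_per_step: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> ServerSettings:
    """Read the variables listed above, falling back to the example values."""
    env = os.environ if env is None else env
    return ServerSettings(
        chrome_debugging_host=env.get("CHROME_DEBUGGING_HOST", "localhost"),
        chrome_debugging_port=int(env.get("CHROME_DEBUGGING_PORT", "9222")),
        chrome_persistent_session=parse_bool(env.get("CHROME_PERSISTENT_SESSION", "false")),
        model_provider=env.get("MCP_MODEL_PROVIDER", "anthropic"),
        model_name=env.get("MCP_MODEL_NAME", "claude-3-5-sonnet-20241022"),
        temperature=float(env.get("MCP_TEMPERATURE", "0.3")),
        max_steps=int(env.get("MCP_MAX_STEPS", "30")),
        use_vision=parse_bool(env.get("MCP_USE_VISION", "true")),
        max_actions_per_step=int(env.get("MCP_MAX_ACTIONS_PER_STEP", "5")),
    )
```

Passing a plain dict instead of relying on `os.environ` keeps the parsing testable, and an invalid number such as `MCP_MAX_STEPS=many` fails immediately with a `ValueError` instead of later during a browser task.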

Development

Setup

  1. Clone the repository:
git clone https://github.com/JovaniPink/mcp-browser-use.git
cd mcp-browser-use
  2. Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:
uv sync
  4. Start the server:
uv run mcp-browser-use

Debugging

For debugging, use the MCP Inspector:

npx @modelcontextprotocol/inspector uv --directory /path/to/project run mcp-server-browser-use

The Inspector will display a URL for the debugging interface.

Browser Actions

The server supports various browser actions through natural language:

  • Navigation: Go to URLs, back/forward, refresh
  • Interaction: Click, type, scroll, hover
  • Forms: Fill forms, submit, select options
  • State: Get page content, take screenshots
  • Tabs: Create, close, switch between tabs
  • Vision: Find elements by visual appearance
  • Cookies & Storage: Manage browser state

Security

Note that some Chrome settings are configured to allow the server to control the browser. This is a security risk, so use the server with caution. It is not intended for use in a production environment.

Security Details: SECURITY.MD
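
One precaution that follows from the note above is to keep the Chrome debugging endpoint bound to the local machine. The sketch below checks whether a CHROME_DEBUGGING_HOST value refers to a loopback address. It is an illustration only: `is_loopback_host` is a hypothetical helper, and nothing here implies the server performs this check itself.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True for "localhost" or a literal loopback IP address."""
    candidate = host.strip().lower()
    if candidate == "localhost":
        return True
    try:
        return ipaddress.ip_address(candidate).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name); treat it as non-local.
        return False
```

A startup script could refuse to continue, or at least log a warning, when `is_loopback_host(os.environ.get("CHROME_DEBUGGING_HOST", "localhost"))` is false, since a debugging port reachable from other machines gives them control of the browser.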

Contributing

We welcome contributions to this project. Please follow these steps:

  1. Fork this repository.
  2. Create your feature branch: git checkout -b my-new-feature.
  3. Commit your changes: git commit -m 'Add some feature'.
  4. Push to the branch: git push origin my-new-feature.
  5. Submit a pull request.

For major changes, open an issue first to discuss what you would like to change. Please update tests as appropriate to reflect any changes made.
