MCP Server Whisper
A Model Context Protocol (MCP) server for advanced audio transcription and processing using OpenAI's Whisper and GPT-4o models.
Overview
MCP Server Whisper provides a standardized way to process audio files through OpenAI's latest transcription and speech services. By implementing the Model Context Protocol, it enables AI assistants like Claude to seamlessly interact with audio processing capabilities.
Key features:
- 🔍 Advanced file searching with regex patterns, file metadata filtering, and sorting capabilities
- 🔄 Parallel batch processing for multiple audio files
- 🔄 Format conversion between supported audio types
- 📦 Automatic compression for oversized files
- 🎯 Multi-model transcription with support for all OpenAI audio models
- 🗣️ Interactive audio chat with GPT-4o audio models
- ✏️ Enhanced transcription with specialized prompts and timestamp support
- 🎙️ Text-to-speech generation with customizable voices, instructions, and speed
- 📊 Comprehensive metadata including duration, file size, and format support
- 🚀 High-performance caching for repeated operations
Installation
# Clone the repository
git clone https://github.com/arcaputo3/mcp-server-whisper.git
cd mcp-server-whisper
# Using uv
uv sync
# Set up pre-commit hooks
uv run pre-commit install
Environment Setup
Create a `.env` file with the following variables:
OPENAI_API_KEY=your_openai_api_key
AUDIO_FILES_PATH=/path/to/your/audio/files
Usage
Starting the Server
To run the MCP server in development mode:
mcp dev src/mcp_server_whisper/server.py
To install the server for use with Claude Desktop or other MCP clients:
mcp install src/mcp_server_whisper/server.py [--env-file .env]
Exposed MCP Tools
Audio File Management
- `list_audio_files` - Lists audio files with comprehensive filtering and sorting options:
  - Filter by regex pattern matching on filenames
  - Filter by file size, duration, modification time, or format
  - Sort by name, size, duration, modification time, or format
  - All operations support parallelized batch processing
- `get_latest_audio` - Gets the most recently modified audio file with model support info
Audio Processing
- `convert_audio` - Converts audio files to supported formats (mp3 or wav)
- `compress_audio` - Compresses audio files that exceed size limits
Transcription
- `transcribe_audio` - Advanced transcription using OpenAI's models:
  - Supports `whisper-1`, `gpt-4o-transcribe`, and `gpt-4o-mini-transcribe`
  - Custom prompts for guided transcription
  - Optional timestamp granularities for word and segment-level timing
  - JSON response format option
- `chat_with_audio` - Interactive audio analysis using GPT-4o audio models:
  - Supports `gpt-4o-audio-preview-2024-10-01`, `gpt-4o-audio-preview-2024-12-17`, and `gpt-4o-mini-audio-preview-2024-12-17`
  - Custom system and user prompts
  - Provides conversational responses to audio content
- `transcribe_with_enhancement` - Enhanced transcription with specialized templates:
  - `detailed` - Includes tone, emotion, and background details
  - `storytelling` - Transforms the transcript into a narrative form
  - `professional` - Creates formal, business-appropriate transcriptions
  - `analytical` - Adds analysis of speech patterns and key points
Text-to-Speech
- `create_claudecast` - Generates text-to-speech audio using OpenAI's TTS API:
  - Supports `gpt-4o-mini-tts` (preferred) and other speech models
  - Multiple voice options (alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer)
  - Speed adjustment and custom instructions
  - Customizable output file paths
  - Handles texts of any length by automatically splitting and joining audio segments
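For programmatic use outside of Claude, any MCP client can call these tools directly. Below is a minimal sketch using the official MCP Python SDK over stdio; the tool names come from the list above, but the exact argument schemas are assumptions, so check the server's tool listing before relying on them.

import asyncio
import os
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the server as a subprocess, the same way an MCP client would.
    # AUDIO_FILES_PATH here is a placeholder; OPENAI_API_KEY is taken from the parent environment.
    params = StdioServerParameters(
        command="uvx",
        args=["mcp-server-whisper"],
        env={**os.environ, "AUDIO_FILES_PATH": "/path/to/your/audio/files"},
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools described above
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # Call a tool; get_latest_audio is assumed here to take no arguments
            result = await session.call_tool("get_latest_audio", arguments={})
            print(result.content)

asyncio.run(main())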
Supported Audio Formats
| Model | Supported Formats |
| --- | --- |
| Transcribe | flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm |
| Chat | mp3, wav |
Note: Files larger than 25MB are automatically compressed to meet API limits.
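Conceptually, the compression step is a re-encode to a smaller format before upload. A minimal sketch of that idea with pydub is shown below; it is illustrative only, not the server's actual implementation, which also handles format detection and caching.

import os
from pydub import AudioSegment

MAX_BYTES = 25 * 1024 * 1024  # OpenAI's per-file upload limit

def compress_if_needed(path: str) -> str:
    """Re-encode an oversized file as a lower-bitrate mp3 and return the new path."""
    if os.path.getsize(path) <= MAX_BYTES:
        return path
    audio = AudioSegment.from_file(path)
    out_path = os.path.splitext(path)[0] + "_compressed.mp3"
    audio.export(out_path, format="mp3", bitrate="64k")
    return out_path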
Example Usage with Claude
Basic Audio Transcription
Claude, please transcribe my latest audio file with detailed insights.
Claude will automatically:
- Find the latest audio file using `get_latest_audio`
- Determine the appropriate transcription method
- Process the file with `transcribe_with_enhancement` using the "detailed" template
- Return the enhanced transcription
Advanced Audio File Search and Filtering
Claude, list all my audio files that are longer than 5 minutes and were created after January 1st, 2024, sorted by size.
Claude will:
- Convert the date to a timestamp
- Use `list_audio_files` with appropriate filters:
  - `min_duration_seconds: 300` (5 minutes)
  - `min_modified_time: <timestamp for Jan 1, 2024>`
  - `sort_by: "size"`
- Return a sorted list of matching audio files with comprehensive metadata
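The only non-obvious step is turning the human-readable date into the numeric timestamp the filter expects. A quick sketch (the parameter names follow the example above; the tool's exact schema may differ):

from datetime import datetime, timezone

# January 1st, 2024 (UTC) expressed as a UNIX timestamp
jan_1_2024 = datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp()

filters = {
    "min_duration_seconds": 300,       # 5 minutes
    "min_modified_time": jan_1_2024,   # files modified after this moment
    "sort_by": "size",
}
print(filters)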
Batch Processing Multiple Files
Claude, find all MP3 files with "interview" in the filename and create professional transcripts for each one.
Claude will:
- Search for files using `list_audio_files` with:
  - `pattern: ".*interview.*\\.mp3"`
  - `format: "mp3"`
- Process all matching files in parallel using `transcribe_with_enhancement` with:
  - `enhancement_type: "professional"`
  - `model: "gpt-4o-mini-transcribe"` (for efficiency)
- Return all transcriptions in a well-formatted output
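Behind the scenes this is the classic fan-out pattern: one async transcription request per file, gathered concurrently. A standalone sketch of that pattern using the OpenAI Python SDK directly (not the server's own code, and with a placeholder audio directory) might look like this:

import asyncio
from pathlib import Path
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def transcribe(path: Path) -> str:
    with path.open("rb") as f:
        result = await client.audio.transcriptions.create(
            model="gpt-4o-mini-transcribe",  # cheaper model for large batches
            file=f,
        )
    return result.text

async def main() -> None:
    files = sorted(Path("/path/to/your/audio/files").glob("*interview*.mp3"))
    transcripts = await asyncio.gather(*(transcribe(p) for p in files))
    for path, text in zip(files, transcripts):
        print(f"--- {path.name} ---\n{text}\n")

asyncio.run(main())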
Generating Text-to-Speech with Claudecast
Claude, create a claudecast with this script: "Welcome to our podcast! Today we'll be discussing artificial intelligence trends in 2025." Use the shimmer voice.
Claude will:
- Use the `create_claudecast` tool with:
  - `text_prompt` containing the script
  - `voice: "shimmer"`
  - `model: "gpt-4o-mini-tts"` (default high-quality model)
  - `instructions: "Speak in an enthusiastic, podcast host style"` (optional)
  - `speed: 1.0` (default, can be adjusted)
- Generate the audio file and save it to the configured audio directory
- Provide the path to the generated audio file
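Under the hood, a request like this maps onto OpenAI's speech endpoint. A rough, direct-API sketch of what `create_claudecast` produces (assuming a recent openai package; the tool itself also handles long-text splitting and output-path management, which are omitted here):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",
    voice="shimmer",
    input=(
        "Welcome to our podcast! Today we'll be discussing "
        "artificial intelligence trends in 2025."
    ),
    instructions="Speak in an enthusiastic, podcast host style",
    speed=1.0,
)
speech.write_to_file("claudecast.mp3")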
Configuration with Claude Desktop
Add this to your `claude_desktop_config.json`:
UVX
{
  "mcpServers": {
    "whisper": {
      "command": "uvx",
      "args": [
        "--with",
        "aiofiles",
        "--with",
        "mcp[cli]",
        "--with",
        "openai",
        "--with",
        "pydub",
        "mcp-server-whisper"
      ],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key",
        "AUDIO_FILES_PATH": "/path/to/your/audio/files"
      }
    }
  }
}
Recommendation (macOS Only)
- Install Screen Recorder By Omi (free)
- Set `AUDIO_FILES_PATH` to `/Users/<user>/Movies/Omi Screen Recorder`, replacing `<user>` with your username
- As you record audio with the app, you can transcribe large batches directly with Claude
Development
This project uses modern Python development tools including `uv`, `pytest`, `ruff`, and `mypy`.
# Run tests
uv run pytest
# Run with coverage
uv run pytest --cov=src
# Format code
uv run ruff format src
# Lint code
uv run ruff check src
# Run type checking (strict mode)
uv run mypy --strict src
# Run the pre-commit hooks
pre-commit run --all-files
CI/CD Workflow
The project uses GitHub Actions for CI/CD:
- Lint & Type Check: Ensures code quality with ruff and strict mypy type checking
- Tests: Runs tests on multiple Python versions (3.10, 3.11)
- Build: Creates distribution packages
- Publish: Automatically publishes to PyPI when a new version tag is pushed
To create a new release version:
git checkout main
# Make sure everything is up to date
git pull
# Create a new version tag
git tag v0.1.1
# Push the tag
git push origin v0.1.1
How It Works
For detailed architecture information, see Architecture Documentation.
MCP Server Whisper is built on the Model Context Protocol, which standardizes how AI models interact with external tools and data sources. The server:
- Exposes Audio Processing Capabilities: Through standardized MCP tool interfaces
- Implements Parallel Processing: Using asyncio and batch operations for performance
- Manages File Operations: Handles detection, validation, conversion, and compression
- Provides Rich Transcription: Via different OpenAI models and enhancement templates
- Optimizes Performance: With caching mechanisms for repeated operations
Under the hood, it uses:
- `pydub` for audio file manipulation
- `asyncio` for concurrent processing
- OpenAI's latest transcription models (including gpt-4o-transcribe)
- OpenAI's GPT-4o audio models for enhanced understanding
- OpenAI's gpt-4o-mini-tts for high-quality speech synthesis
- FastMCP for simplified MCP server implementation
- Type hints and strict mypy validation throughout the codebase
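FastMCP keeps the tool surface declarative: each tool is a function whose signature and docstring become the MCP schema. A toy, self-contained sketch in that style (a stand-in example, not one of the server's actual tools):

import os
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("whisper-sketch")

@mcp.tool()
async def audio_file_size(path: str) -> str:
    """Report a file's size in bytes (a placeholder for a real transcription tool)."""
    return f"{path}: {os.path.getsize(path)} bytes"

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for MCP clients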
Contributing
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a new branch for your feature (`git checkout -b feature/amazing-feature`)
- Make your changes
- Run the tests and linting (`uv run pytest && uv run ruff check src && uv run mypy --strict src`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Model Context Protocol (MCP) - For the protocol specification
- pydub - For audio processing
- OpenAI Whisper - For audio transcription
- FastMCP - For MCP server implementation
- Anthropic Claude - For natural language interaction