Website | Documentation | Blog | Get API Key
Free and Open Source Software: A modern voice-controlled AI interface powered by Google Gemini and Anthropic's MCP (Model Context Protocol). Transform how you interact with AI through natural speech and multimodal inputs.
If you like this project, please consider starring it on GitHub and sharing it. It helps the project gain visibility and support, and keeps the lights on.
A modern Vite + TypeScript application that enables voice-controlled AI workflows through MCP (Model Context Protocol). This project revolutionizes how you interact with AI systems by combining Google Gemini's multimodal capabilities with MCP's extensible tooling system.
- Demo & Showcase
- Why Systemprompt MCP?
- Core Features
- Architecture
- Getting Started
- Tech Stack
- Testing & Quality
- Version History
- Contributing
- Security
- Support
- License
- Acknowledgments
- Resources
- Extensions
- Voice-controlled AI interactions
- Multimodal input processing
- Tool execution and workflow automation
- Real-time voice synthesis
Watch our video demonstrations to see Systemprompt MCP Client in action:
- Voice CMS and Agent creation: Watch Demo Video
- Voice Agent with Google: Watch Demo Video
Transform your AI interactions with a powerful voice-first interface that combines the best of:
- Google Gemini's Multimodal AI: Understand and process text, voice, and visual inputs naturally
- MCP (Model Context Protocol): Execute complex AI workflows with a robust tooling system
- Voice-First Design: Control everything through natural speech, making AI interaction more intuitive
Perfect for:
- Developers building voice-controlled AI applications
- Teams needing a flexible AI workflow orchestration system
- Organizations wanting to leverage Google Gemini's capabilities with extensible tooling
- Natural Voice Control: Speak naturally to control AI workflows and execute commands
- Multimodal Understanding: Process text, voice, and visual inputs simultaneously
- Real-time Voice Synthesis: Get instant audio responses from your AI interactions
- Extensible Tool System: Add custom tools and workflows through MCP
- Workflow Automation: Chain multiple AI operations with voice commands
- State Management: Robust handling of complex, multi-step AI interactions
- Modern Tech Stack: Built with Vite, React, TypeScript, and NextUI
- Type Safety: Full TypeScript support with comprehensive type definitions
- Hot Module Replacement: Fast development with instant feedback
- Comprehensive Testing: Built-in testing infrastructure with high coverage
- Secure: Built-in security best practices for API key management
- Scalable: Modular architecture supporting multiple LLM providers
- Configurable: Extensive configuration options for different environments
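Under the hood, voice input and synthesis rest on the browser's Web Speech API (see the prerequisites below). Here is a rough, framework-free sketch of that layer only; it is illustrative and not the project's actual implementation:

```typescript
// Minimal speech-in / speech-out loop using the browser's Web Speech API.
// Chrome exposes the recognition API under a webkit prefix, hence the fallback.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.lang = "en-US";
recognition.interimResults = false;

recognition.onresult = (event: any) => {
  const transcript: string = event.results[0][0].transcript;
  console.log("Heard:", transcript);

  // Echo a spoken response back through the same API.
  const utterance = new SpeechSynthesisUtterance(`You said: ${transcript}`);
  window.speechSynthesis.speak(utterance);
};

recognition.start();
```

In the real client, this kind of recognition and synthesis handling lives inside the Multimodal Agent module described below, which forwards transcripts to Gemini and MCP tools.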
The system follows a modular, feature-based architecture:
```mermaid
graph TD
    A[Web Interface] --> B[Feature Modules]
    B --> C[Multimodal Agent]
    B --> D[LLM Registry]
    B --> E[Server Management]
    C --> F[Voice Control]
    C --> G[Prompt Execution]
    D --> H[Model Configuration]
    D --> I[LLM Integration]
    E --> J[Server Config]
    E --> K[Prompt Management]
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style C fill:#ddf,stroke:#333,stroke-width:2px
    style D fill:#ddf,stroke:#333,stroke-width:2px
    style E fill:#ddf,stroke:#333,stroke-width:2px
```
- Multimodal Agent: Handles voice recognition, synthesis, and multimodal processing
- LLM Registry: Manages different language models and their configurations
- Server Management: Handles MCP server connections and tool orchestration
- Voice Control: Processes natural language commands and converts them to actions
- Prompt Management: Handles system prompts and their execution
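To make the Server Management piece more concrete, here is a minimal sketch of connecting to an MCP server and discovering its tools with `@modelcontextprotocol/sdk`. The transport and server command are placeholders; in the actual client these details come from `config/mcp.config.json`:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Placeholder server command; real connections are defined in config/mcp.config.json.
const transport = new StdioClientTransport({
  command: "node",
  args: ["path/to/mcp-server.js"],
});

const client = new Client(
  { name: "multimodal-mcp-client", version: "0.3.6" },
  { capabilities: {} }
);

await client.connect(transport);

// Discover the tools the server exposes so they can be offered to the agent.
const { tools } = await client.listTools();
console.log(tools.map((tool) => tool.name));
```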
- Node.js 16.x or higher
- npm 7.x or higher
- A modern browser with Web Speech API support
1. Clone the repository:

   ```bash
   git clone https://github.com/Ejb503/multimodal-mcp-client.git
   cd multimodal-mcp-client
   ```

2. Install dependencies:

   ```bash
   npm install
   ```
3. Set up configuration files:

   ```bash
   # Navigate to config directory
   cd config

   # Create local configuration files from templates
   cp mcp.config.default.json mcp.config.json
   cp agent.config.default.json agent.config.json
   cp llm.config.default.json llm.config.json
   ```

   Required Configuration:

   - Get a Gemini API key from Google AI Studio
   - Add it to `llm.config.json` in the `apiKey` field
   - The app will not start without a valid API key in `llm.config.json`

   Edit the other configuration files to add your specific settings:

   - `mcp.config.json`: Configure MCP server connections
   - `agent.config.json`: Set up agent configurations

   Optional: You can get a free Systemprompt API key from systemprompt.io/console or configure any custom MCP server of your choice in `mcp.config.json`. With an API key, you can also use the systemprompt-mcp-core extension, which provides additional agent management and prompt versioning capabilities.
4. Start the development server:

   ```bash
   npm run dev
   ```

   The development server will be available at `http://localhost:5173`.

5. Build for production:

   ```bash
   npm run build
   npm run preview # Preview the production build locally
   ```
- Frontend: React 18, TypeScript, Vite 6
- UI Components: NextUI, Tailwind CSS, Framer Motion
- State Management: Zustand
- Testing: Vitest, Testing Library
- AI Integration: Google Generative AI SDK
- MCP Protocol: @modelcontextprotocol/sdk
- Development: ESLint, TypeScript 5.6
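For orientation, a minimal stand-alone sketch of the Google Generative AI SDK listed above. The API key placeholder, model name, and prompt are illustrative; the actual client reads its key from `llm.config.json` and sends multimodal input rather than a single string:

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai";

// Placeholder key; the real client loads this from llm.config.json.
const genAI = new GoogleGenerativeAI("YOUR_GEMINI_API_KEY");
const model = genAI.getGenerativeModel({ model: "gemini-1.5-flash" });

const result = await model.generateContent("Summarize today's meetings in one sentence.");
console.log(result.response.text());
```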
- Natural language command processing
- Real-time voice synthesis
- Multi-language support
- Voice activity detection
- Google Gemini integration
- Multimodal input processing
- Real-time AI responses
- Custom prompt management
- SSE and stdio server support
- Custom tool creation
- Workflow automation
- State persistence
- Secure API key management
- Multiple server configurations
- Extensible architecture
- Comprehensive logging
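As a complement to the stdio example earlier, here is a hedged sketch of invoking a custom tool over the SSE transport; the endpoint URL, tool name, and arguments are invented for illustration, so list the server's tools to find the real ones:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

// Hypothetical SSE endpoint; actual endpoints are configured in mcp.config.json.
const transport = new SSEClientTransport(new URL("http://localhost:3000/sse"));

const client = new Client(
  { name: "sse-example", version: "0.0.1" },
  { capabilities: {} }
);
await client.connect(transport);

// "create_prompt" is a placeholder tool name with made-up arguments.
const result = await client.callTool({
  name: "create_prompt",
  arguments: { title: "Demo", content: "Created from a voice command" },
});
console.log(result);
```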
```bash
# Run tests
npm test

# Watch mode
npm run test:watch

# Coverage report
npm run test:coverage
```
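A minimal Vitest example, just to show the shape of a test in this stack; the helper under test is hypothetical and not part of the codebase:

```typescript
import { describe, expect, it } from "vitest";

// Hypothetical helper; substitute a real module from src/ when writing actual tests.
function normalizeCommand(input: string): string {
  return input.trim().toLowerCase();
}

describe("normalizeCommand", () => {
  it("lowercases and trims voice input", () => {
    expect(normalizeCommand("  Open The Dashboard ")).toBe("open the dashboard");
  });
});
```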
- v0.3.6 - Current release
  - Enhanced voice processing
  - Updated to Vite 6
  - Improved TypeScript support
  - New UI components
We welcome contributions! See our Contributing Guide for details.
- Secure API key handling
- Environment-based configuration
- Regular security updates
- Protected server endpoints
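One common Vite pattern for environment-based configuration is reading keys from `VITE_`-prefixed environment variables at build time. This is a general illustration of that pattern, not necessarily how this project wires its config files:

```typescript
// .env.local (git-ignored) might contain:
//   VITE_GEMINI_API_KEY=your-key-here
//
// Vite only exposes variables prefixed with VITE_ to client code.
const apiKey = import.meta.env.VITE_GEMINI_API_KEY;

if (!apiKey) {
  throw new Error("Missing VITE_GEMINI_API_KEY; check your .env.local file.");
}
```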
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Gemini team for their powerful multimodal AI capabilities
- Model Context Protocol (MCP) community
- React and TypeScript communities
- NextUI and Tailwind CSS teams
- All contributors and maintainers
This project is proudly sponsored and maintained by Systemprompt. We're committed to advancing the field of AI tooling and making powerful AI interfaces accessible to everyone.
We're actively working on expanding the capabilities of Systemprompt MCP Client with exciting extensions:
- Custom Tool Builder: Create and deploy your own MCP tools
- Enterprise Workflow Templates: Pre-built workflows for common business scenarios
- Advanced Voice Processing: Enhanced voice recognition and synthesis capabilities
- Team Collaboration Features: Multi-user support and shared workflows
Stay tuned for updates and new releases! Follow us on GitHub or join our Discord community for the latest news.
To install extensions, follow these steps:
1. Navigate to the `extensions` folder:

   ```bash
   cd extensions
   ```

2. Clone the desired extension repository:

   ```bash
   git clone <repository-url>
   ```

3. Follow the installation instructions provided in the cloned repository.

4. Update the configuration:

   - Add an entry for the extension's Node/Python entry point in `config/mcp.config.json` or `config/mcp.config.default.json`.
A specialized Model Context Protocol (MCP) server that enables you to create, manage, and extend AI agents through a powerful prompt and tool management system. This server integrates with systemprompt.io to provide seamless creation, management, and versioning of system prompts through MCP. It works in conjunction with the multimodal-mcp-client to provide a complete voice-powered AI workflow solution.
An API key is required to use this server. It is currently free, although this may change in the future. You can get one here.
A specialized Model Context Protocol (MCP) server that integrates Google services (Gmail, Calendar, etc.) into your AI workflows. This server enables seamless access to Google services through MCP, allowing AI agents to interact with Gmail, Google Calendar, and other Google services.
- Systemprompt API Key: Sign up at systemprompt.io/console and create a new API key.
- MCP-Compatible Client: Use the Systemprompt MCP Client or any other MCP-compatible client.
- Google Cloud Project: Set up a Google Cloud account, enable API access, and configure OAuth2 credentials.
1. Google Cloud Setup:

   - Create a project in Google Cloud Console.
   - Enable Gmail, Calendar, and Drive APIs.
   - Create OAuth2 credentials and download the JSON file as `credentials/google-credentials.json`.

2. Server Configuration:

   - Install the package: `npm install systemprompt-mcp-google`
   - Create the credentials directory: `mkdir -p credentials`
   - Run the authentication script: `npm run auth-google`
- Gmail Integration: Read, send, and manage emails.
- Calendar Integration: Create and manage events.
- MCP Integration: Standard MCP interface with structured command responses.
- Through MCP Client: Use any MCP client to send commands to this server for Gmail and Calendar operations.
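For example, a client could launch the server over stdio and call one of its tools roughly like this. The launch command, tool name, and arguments below are placeholders; query the server's tool list for the real names and schemas:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Assumption: the installed package can be launched as a stdio server this way;
// adjust command/args to however you actually start systemprompt-mcp-google.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["systemprompt-mcp-google"],
});

const client = new Client(
  { name: "google-tools-example", version: "0.0.1" },
  { capabilities: {} }
);
await client.connect(transport);

// Placeholder tool name and arguments.
const result = await client.callTool({
  name: "gmail_send_email",
  arguments: { to: "someone@example.com", subject: "Hello", body: "Sent via MCP" },
});
console.log(result);
```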
For detailed setup and usage instructions, refer to the systemprompt-mcp-google documentation.