This OpenAI-compatible MCP server bridges AI assistants and large language models, with support for both OpenAI and Anthropic models. It provides prompt templating, streaming responses, response caching, and error handling. The server exposes endpoints for health checks, context generation, and prompt management, and it also tracks token usage and integrates with Prometheus for metrics. It is suited to applications that need reliable, high-performance access to LLMs along with customizable prompts and control over response caching.
arthurcolle