---
title: Inference Providers MCP Server
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.34.2
app_file: app.py
pinned: false
---

# πŸ€– Inference Providers MCP Server

A streamlined **Model Context Protocol (MCP) Server** that provides LLMs with access to Hugging Face Inference Providers through a single, focused tool.

## ✨ What is this?

This MCP server exposes a `chat_completion` tool that allows LLMs and AI assistants to chat with language models across 14+ inference providers including Cerebras, Cohere, Fireworks, Groq, and more.

**Why use this?** Instead of manually switching between different AI providers, your LLM can automatically access the best model for each task through a unified interface.

## πŸš€ Supported Providers

| Provider | Chat | Vision | Provider | Chat | Vision |
|----------|------|--------|----------|------|--------|
| Cerebras | βœ… | ❌ | Nebius | βœ… | βœ… |
| Cohere | βœ… | βœ… | Novita | βœ… | βœ… |
| Fal AI | βœ… | βœ… | Nscale | βœ… | βœ… |
| Featherless AI | βœ… | βœ… | Replicate | βœ… | βœ… |
| Fireworks | βœ… | βœ… | SambaNova | βœ… | βœ… |
| Groq | βœ… | ❌ | Together | βœ… | βœ… |
| HF Inference | βœ… | βœ… | Hyperbolic | βœ… | βœ… |

## πŸ› οΈ Quick Setup

### 1. Get HF Token
1. Visit [HF Settings](https://huggingface.co/settings/tokens)
2. Create a token with the **Inference Providers** scope
3. Copy the token (starts with `hf_`)

### 2. Configure Your MCP Client

#### Cursor IDE
Add to `.cursor/mcp.json`:
```json
{
  "mcpServers": {
    "inference-providers": {
      "url": "YOUR_URL/gradio_api/mcp/sse"
    }
  }
}
```

#### Claude Desktop
Add to MCP settings:
```json
{
  "mcpServers": {
    "inference-providers": {
      "command": "npx",
      "args": ["mcp-remote", "YOUR_URL/gradio_api/mcp/sse", "--transport", "sse-only"]
    }
  }
}
```


### 3. Server URLs

**HF Spaces:** `https://username-spacename.hf.space/gradio_api/mcp/sse`

**Local:** `http://localhost:7860/gradio_api/mcp/sse`

## 🎯 How to Use

Once configured, your LLM can use the tool:

> "Use chat completion with Groq and Llama to explain Python best practices"

> "Chat with DeepSeek V3 via Novita about machine learning concepts"

## πŸ› οΈ Available Tool

**`chat_completion`** - Generate a chat response from any supported provider and model

**Parameters:**
- `provider`: Provider name (novita, groq, cerebras, etc.)
- `model`: Model ID (e.g., `deepseek-ai/DeepSeek-V3-0324`)
- `messages`: A plain prompt string, or a JSON array of `{role, content}` message objects
- `temperature`: Response randomness (0.0-2.0, default: 0.7)
- `max_tokens`: Max response length (1-4096, default: 512)

**Environment:** Requires `HF_TOKEN` environment variable
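
For intuition, the tool is essentially a thin wrapper over Hugging Face's Inference Providers API. A direct call through `huggingface_hub`'s `InferenceClient` looks roughly like the sketch below; this illustrates the equivalent request and is not a copy of `app.py`.

```python
# Roughly the request chat_completion makes on your behalf, sketched with
# huggingface_hub's InferenceClient. Illustrative only; the server's actual
# implementation lives in app.py.
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="groq",                 # any provider from the table above
    api_key=os.environ["HF_TOKEN"],  # same token the server requires
)

response = client.chat_completion(
    model="meta-llama/Llama-3.1-70B-Instruct",
    messages=[{"role": "user", "content": "Explain Python best practices."}],
    temperature=0.7,  # tool default
    max_tokens=512,   # tool default
)
print(response.choices[0].message.content)
```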

## 🎯 Popular Models

**Text Models:**
- `deepseek-ai/DeepSeek-V3-0324` (Novita)
- `meta-llama/Llama-3.1-70B-Instruct` (Groq)
- `mistralai/Mixtral-8x7B-Instruct-v0.1` (Together)

**Vision Models:**
- `meta-llama/Llama-3.2-11B-Vision-Instruct` (Together)
- `microsoft/Phi-3.5-vision-instruct` (HF Inference)

## πŸ’» Local Development

```bash
# Clone and setup
git clone <repository-url>
cd inference-providers-mcp
pip install -r requirements.txt

# Set token and run
export HF_TOKEN=hf_your_token_here
python app.py
```

## πŸ”§ Technical Details

- **Built with:** Gradio + MCP support (`gradio[mcp]`)
- **Protocol:** Model Context Protocol (MCP) via Server-Sent Events
- **Security:** Environment-based token management
- **Compatibility:** Works with Cursor, Claude Desktop, and other MCP clients
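
For context, `gradio[mcp]` turns an ordinary function into an MCP tool when the app is launched with `mcp_server=True`. A deliberately simplified sketch follows; the real `app.py` performs the actual provider call and parameter validation:

```python
# Simplified sketch of how gradio[mcp] exposes a function as an MCP tool.
# The real app.py performs the actual inference call; this stub only
# illustrates the wiring.
import gradio as gr

def chat_completion(provider: str, model: str, messages: str) -> str:
    """Generate a response via the chosen Inference Provider."""
    return f"[{provider}/{model}] would answer: {messages}"  # stub

demo = gr.Interface(
    fn=chat_completion,
    inputs=["text", "text", "text"],
    outputs="text",
)

# mcp_server=True also serves MCP at /gradio_api/mcp/sse
demo.launch(mcp_server=True)
```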

## πŸ”— Resources

- [Cursor MCP Docs](https://docs.cursor.com/context/model-context-protocol)
- [Gradio MCP Guide](https://huggingface.co/blog/gradio-mcp)
- [Inference Providers Docs](https://huggingface.co/docs/inference-providers)
- [Get HF Token](https://huggingface.co/settings/tokens)

## πŸ“ License

MIT License - see the code for details.