MCP (Model Context Protocol)

Definition

MCP (Model Context Protocol) is Anthropic's open standard for connecting LLMs to tools. Instead of hard-coding every tool into the model, an MCP server describes its capabilities with JSON Schema; an MCP client relays JSON-RPC messages between the LLM host and the server. Anthropic published MCP openly in late 2024; OpenAI, Google, and others have since adopted it as a de facto standard.

Mechanics

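Three components:

Host: the LLM application (e.g. Claude Code, Cursor, or an inference engine of your choice) that embeds the model (Claude, GPT, ...). The model decides whether and which tool to call, based on the tool descriptions reported by the server.
Client: the connector inside the host that maintains the connection to one server and exchanges the JSON-RPC messages with it.
Server: provides tools, resources, or prompts; runs locally as a stdio subprocess or remotely over HTTP+SSE (newer revisions of the specification call the remote transport Streamable HTTP).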

Standard protocol methods:

initialize: handshake and capability exchange
tools/list: the server reports its tools, each with a JSON Schema definition
tools/call: the client invokes a tool with concrete arguments
resources/list / resources/read: the server exposes data
prompts/list / prompts/get: reusable prompt templates
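As a rough illustration of the methods above, the sketch below builds the JSON-RPC 2.0 requests a client might send for initialize, tools/list, and tools/call. The helper make_request, the calculator tool, the clientInfo values, and the protocol version string are assumptions chosen for illustration; the MCP specification defines the exact fields for each protocol revision.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids only need to be unique per connection

def make_request(method, params=None):
    """Build a JSON-RPC 2.0 request object (hypothetical helper)."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg

handshake = make_request("initialize", {
    "protocolVersion": "2024-11-05",  # assumed revision string
    "capabilities": {},
    "clientInfo": {"name": "demo-client", "version": "0.1"},  # example values
})
list_tools = make_request("tools/list")
call_tool = make_request("tools/call", {
    "name": "calculator",  # example tool name
    "arguments": {"a": 137, "b": 42, "op": "*"},
})

# Each message is serialized as one JSON object.
for msg in (handshake, list_tools, call_tool):
    print(json.dumps(msg))
```

The server answers each request with a response carrying the same id, which is how the client matches results to calls.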

The tool-calling loop in practice: the model receives the tool descriptions as part of the system prompt and returns a structured tool request (the format is model-specific: XML tags, a JSON wrapper, or function-call tokens). The client parses the request, calls the server, and passes the result back as the next user message (or as a dedicated tool-role message where the chat format provides one). The model then formulates the final answer.
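The loop can be sketched without a real model. In the snippet below, fake_model is a scripted stand-in that returns a tool request first and a final answer second; the <tool_call> tag format, the TOOLS table, and all function names are assumptions for illustration and not part of MCP itself.

```python
import json
import re

# Hypothetical tool table standing in for an MCP server.
TOOLS = {"add": lambda a, b: a + b}

def fake_model(messages):
    """Scripted stand-in for an LLM: requests a tool once, then answers."""
    last = messages[-1]["content"]
    if last.startswith("Tool result:"):
        return f"The answer is {last.split(':', 1)[1].strip()}."
    return '<tool_call>{"name": "add", "arguments": {"a": 2, "b": 3}}</tool_call>'

def run_loop(question):
    messages = [{"role": "user", "content": question}]
    reply = fake_model(messages)
    match = re.search(r"<tool_call>(.*?)</tool_call>", reply, re.DOTALL)
    if match is None:
        return reply  # the model answered directly, no tool involved
    call = json.loads(match.group(1))
    result = TOOLS[call["name"]](**call["arguments"])
    # Feed the tool result back as the next message and ask again.
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": f"Tool result: {result}"})
    return fake_model(messages)

print(run_loop("What is 2 plus 3?"))  # prints "The answer is 5."
```

Swapping fake_model for a real chat call and TOOLS for a server lookup gives the same two-round structure.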

Try it yourself

This demo uses Llama-3.1-8B-Instruct (Meta) and implements a miniature version of the MCP protocol in Python. You see the complete two-turn tool-call loop: question -> tool selection -> tool execution -> final answer.

⚠ Note on download size: Llama-3.1-8B in Q4 quantization is about 4-5 GB. On the first click, the browser downloads the model once into the OPFS cache (1-5 min, depending on your connection). On later visits it is available instantly. Use Wi-Fi or a wired connection, not mobile data.

💡 Why 8B and not a smaller model? Phi-3.5-mini (3.8B parameters, about 2 GB, which we used here previously) fails reproducibly in three ways: it trusts itself more than the tool result (for 137 x 42 it computed the wrong product and rejected the correct tool result); it calls tools when none are needed (the knowledge question "capital of France"); and in round 2 it drifts into fictional follow-up conversations. Llama-3.1-8B-Instruct is one of the smallest models with dedicated tool-calling fine-tuning.

import json
import re
from pyground import llm

# Llama-3.1-8B-Instruct: ~4-5 GB on the first download (cached once in the browser).
# Later runs start instantly. Loaded via the alias: pyground.llm.load("llama-8b").
print("Loading Llama-3.1-8B-Instruct (~4-5 GB the first time)...")
m = await llm.load("llama-8b")
print("Model ready.")
print()


# ---------- MCP server (mini implementation) ----------

class MCPServer:
    # In-memory MCP server. Real servers communicate via JSON-RPC
    # over stdio or HTTP. Here everything runs in the same Pyodide process.

    def __init__(self):
        self.tools = {}

    def register(self, name, description, params_schema, func):
        self.tools[name] = {
            "name": name,
            "description": description,
            "inputSchema": params_schema,
            "_func": func,
        }

    def list_tools(self):
        return [
            {"name": t["name"], "description": t["description"],
             "inputSchema": t["inputSchema"]}
            for t in self.tools.values()
        ]

    def call_tool(self, name, arguments):
        if name not in self.tools:
            return {"error": f"Tool '{name}' unknown"}
        try:
            result = self.tools[name]["_func"](**arguments)
            return {"content": [{"type": "text", "text": str(result)}]}
        except Exception as e:
            return {"error": str(e)}


# ---------- Register tools (English descriptions) ----------

server = MCPServer()

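server.register(
    "calculator",
    "Computes a op b. Op can be '+', '-', '*', '/'.",
    {"type": "object", "properties": {
        "a": {"type": "number"},
        "b": {"type": "number"},
        "op": {"type": "string", "enum": ["+", "-", "*", "/"]}
    }, "required": ["a", "b", "op"]},
    # Map each operator to a function so that only the requested operation
    # runs; a dict of precomputed results would evaluate a/b for every call
    # and raise ZeroDivisionError whenever b == 0, even for '+'.
    lambda a, b, op: {
        "+": lambda: a + b,
        "-": lambda: a - b,
        "*": lambda: a * b,
        "/": lambda: a / b,
    }[op]()
)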

server.register(
    "word_count",
    "Counts the words in a given text string.",
    {"type": "object", "properties": {
        "text": {"type": "string"}
    }, "required": ["text"]},
    lambda text: len(str(text).split())
)


# ---------- System Prompt ----------

tools_json = json.dumps(server.list_tools(), indent=2, ensure_ascii=False)

system_prompt = (
    "You are an assistant with access to the following tools:\n\n"
    + tools_json
    + "\n\nWhen you need a tool, respond EXACTLY in this format and NOTHING else:\n"
    + "<tool_call>\n"
    + '{"name": "TOOL_NAME", "arguments": {...}}\n'
    + "</tool_call>\n\n"
    + "RULES:\n"
    + "- ALWAYS use calculator for ANY arithmetic.\n"
    + "- ALWAYS use word_count for ANY word counting.\n"
    + "- If no tool fits the question (general knowledge, geography, "
    + "history, definitions), answer directly in prose. Do NOT call a "
    + "tool 'just to demonstrate'.\n"
    + "- The tool result is AUTHORITATIVE. Do NOT compute things "
    + "yourself. Do NOT contradict the tool.\n\n"
    + "Examples:\n\n"
    + "User: What is 5 plus 7?\n"
    + 'Assistant: <tool_call>\n{"name": "calculator", "arguments": {"a": 5, "b": 7, "op": "+"}}\n</tool_call>\n\n'
    + "User: How many words are in 'hello world'?\n"
    + 'Assistant: <tool_call>\n{"name": "word_count", "arguments": {"text": "hello world"}}\n</tool_call>\n\n'
    + "User: What is the capital of Italy?\n"
    + "Assistant: Rome is the capital of Italy.\n"
)


async def ask(question):
    # Sends one question to the LLM and runs the tool-calling loop.
    print(f"User: {question}")
    print("-" * 60)

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

    # 1. LLM round 1: the model decides whether a tool is needed
    reply = await m.chat(messages)
    print(f"LLM (Round 1): {reply.strip()}")

    # 2. Parse the tool call
    pattern = r"<tool_call>\s*(\{.*?\})\s*</tool_call>"
    match = re.search(pattern, reply, re.DOTALL)

    if not match:
        # No tool needed: the LLM answered directly
        print()
        print(f"=> Answer: {reply.strip()}")
        return

    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError as e:
        print()
        print(f"Invalid JSON from the LLM: {e}")
        return

    # Use .get so a missing "name" becomes a server-side error, not a KeyError.
    name = call.get("name", "")
    arguments = call.get("arguments", {})
    print()
    print(f"Tool call: {name}({arguments})")
    result = server.call_tool(name, arguments)
    print(f"Tool result: {result}")

    # 3. LLM round 2: the model formulates the final answer from the tool result
    messages.append({"role": "assistant", "content": reply})
    messages.append({
        "role": "user",
        "content": (
            f"Tool result: {json.dumps(result, ensure_ascii=False)}. "
            "Use ONLY this result to formulate the final answer to my "
            "original question in plain prose. Do not add commentary "
            "about whether the result is correct."
        ),
    })
    final = await m.chat(messages)
    print()
    print(f"LLM (Round 2 - final): {final.strip()}")


# ---------- Demo: three questions ----------

print("=" * 60)
await ask("What is 137 times 42?")
print()

print("=" * 60)
await ask("How many words are in 'The quick brown fox jumps over the lazy dog'?")
print()

print("=" * 60)
await ask("What is the capital of France?")

What to expect in the output with Llama-3.1-8B (unlike the earlier Phi-3.5-mini version):

Math question: tool call -> 5754 -> the model accepts the tool result -> final prose answer
Word count: tool call -> 9 -> the model accepts the result -> answer
Knowledge question: direct answer "Paris is the capital of France." with no tool call

Llama-3.1-8B-Instruct was explicitly fine-tuned on tool-calling examples (Meta documents this in the Llama 3.1 paper) and therefore shows noticeably better tool discipline than Phi-3.5-mini.

In practice

Real MCP servers communicate over stdio (subprocess communication) or HTTP+SSE (Server-Sent Events). Examples from the Anthropic repo: the GitHub MCP server reads and writes issues; the filesystem server reads local files; the Postgres server executes SQL.
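To make the stdio transport more concrete, here is a minimal sketch of a server loop that reads newline-delimited JSON-RPC requests from one stream and writes responses to another. In-memory streams stand in for the subprocess pipes; serve, HANDLERS, and the echo tool are hypothetical names, and a real server would also handle initialize, notifications, and more error cases.

```python
import io
import json

# Hypothetical method table: a single tool that echoes its input text.
HANDLERS = {
    "tools/list": lambda params: {"tools": [{
        "name": "echo",
        "description": "Returns the given text.",
        "inputSchema": {"type": "object",
                        "properties": {"text": {"type": "string"}},
                        "required": ["text"]},
    }]},
    "tools/call": lambda params: {
        "content": [{"type": "text", "text": params["arguments"]["text"]}],
    },
}

def serve(inp, out):
    """Answer one JSON-RPC request per input line until the stream ends."""
    for line in inp:
        if not line.strip():
            continue  # skip blank lines
        request = json.loads(line)
        handler = HANDLERS.get(request["method"])
        if handler is None:
            # -32601 is the JSON-RPC 2.0 code for "Method not found".
            response = {"jsonrpc": "2.0", "id": request["id"],
                        "error": {"code": -32601, "message": "Method not found"}}
        else:
            response = {"jsonrpc": "2.0", "id": request["id"],
                        "result": handler(request.get("params", {}))}
        out.write(json.dumps(response) + "\n")

requests = [
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
    {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
     "params": {"name": "echo", "arguments": {"text": "hi"}}},
]
inp = io.StringIO("".join(json.dumps(r) + "\n" for r in requests))
out = io.StringIO()
serve(inp, out)
responses = [json.loads(line) for line in out.getvalue().splitlines()]
print(responses[1]["result"]["content"][0]["text"])  # prints "hi"
```

Passing sys.stdin and sys.stdout instead of the StringIO objects would turn the same loop into a subprocess-style server.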

In Claude Desktop, Claude Code, or Cursor, you configure MCP servers in a file such as claude_desktop_config.json or mcp.json; the application then launches each server as a subprocess and routes tool calls automatically.
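A configuration entry might look roughly like the one generated below. The mcpServers layout with command and args follows the format used by Claude Desktop; the label "filesystem", the npx invocation, and the directory path are example values, and other clients may use a slightly different schema.

```python
import json

config = {
    "mcpServers": {
        "filesystem": {        # arbitrary label chosen by the user
            "command": "npx",  # executable the application launches as a subprocess
            "args": [
                "-y",
                "@modelcontextprotocol/server-filesystem",
                "/path/to/allowed/dir",  # placeholder path, adjust to your system
            ],
        }
    }
}

# Print the JSON text that would go into the configuration file.
print(json.dumps(config, indent=2))
```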
