MCP Developer Integration Guide
For LLM Developers and Tool Builders
This guide explains how to connect your LLM application to Qubinode Navigator’s MCP servers for infrastructure automation.
Table of Contents
- Overview
- Available MCP Servers
- Client Configuration
- Tool Reference
- LLM Interaction Patterns
- Authentication
- Troubleshooting
Overview
Qubinode Navigator exposes two MCP (Model Context Protocol) servers that enable LLMs to manage infrastructure:
| Server | Port | Purpose | Tools |
|---|---|---|---|
| Airflow MCP | 8889 | Workflow orchestration, VM management, RAG queries | 25 |
| AI Assistant MCP | 8081 | Documentation search, AI chat | 4 |
- Protocol: SSE (Server-Sent Events) over HTTP
- Authentication: API key via the X-API-Key header
- Transport: mcp-remote npm package or a direct SSE client
Available MCP Servers
Airflow MCP Server (Port 8889)
The primary server for infrastructure operations:
Endpoint: http://<YOUR_SERVER>:8889/sse
Health: http://<YOUR_SERVER>:8889/health
Capabilities:
- DAG management (list, trigger, status)
- VM lifecycle (create, delete, start, stop)
- RAG knowledge base queries
- Troubleshooting history
- Data lineage (OpenLineage/Marquez)
AI Assistant MCP Server (Port 8081)
Secondary server for documentation and chat:
Endpoint: http://<YOUR_SERVER>:8081/sse
Health: http://<YOUR_SERVER>:8081/health
Capabilities:
- RAG document search
- Context-aware AI chat
- Project status
Client Configuration
Claude Desktop
Location: ~/.config/claude/claude_desktop_config.json (Linux/Mac) or %APPDATA%\Claude\claude_desktop_config.json (Windows)
{
"mcpServers": {
"qubinode-airflow": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"http://YOUR_SERVER_IP:8889/sse",
"--header",
"X-API-Key:${MCP_API_KEY}"
],
"env": {
"MCP_API_KEY": "YOUR_AIRFLOW_API_KEY"
}
},
"qubinode-ai": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"http://YOUR_SERVER_IP:8081/sse",
"--header",
"X-API-Key:${MCP_API_KEY}"
],
"env": {
"MCP_API_KEY": "YOUR_AI_ASSISTANT_API_KEY"
}
}
}
}
After saving: Restart Claude Desktop completely.
Claude Code (CLI)
Add to your project’s .mcp.json or ~/.claude/mcp_servers.json:
{
"mcpServers": {
"qubinode-airflow": {
"command": "npx",
"args": ["-y", "mcp-remote", "http://YOUR_SERVER:8889/sse", "--header", "X-API-Key:YOUR_API_KEY"]
},
"qubinode-ai": {
"command": "npx",
"args": ["-y", "mcp-remote", "http://YOUR_SERVER:8081/sse", "--header", "X-API-Key:YOUR_API_KEY"]
}
}
}
Or configure via CLI:
claude mcp add --transport sse qubinode-airflow http://YOUR_SERVER:8889/sse --header "X-API-Key: YOUR_API_KEY"
Cursor IDE
Add to Cursor’s MCP settings (~/.cursor/mcp.json):
{
"mcpServers": {
"qubinode": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"http://YOUR_SERVER:8889/sse",
"--header",
"X-API-Key:YOUR_API_KEY"
]
}
}
}
Continue.dev
Add to ~/.continue/config.json:
{
"experimental": {
"modelContextProtocolServers": [
{
"transport": {
"type": "sse",
"url": "http://YOUR_SERVER:8889/sse",
"headers": {
"X-API-Key": "YOUR_API_KEY"
}
}
}
]
}
}
Custom Python Client
Using the official MCP Python SDK:
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client
async def connect_to_qubinode():
"""Connect to Qubinode MCP server."""
url = "http://YOUR_SERVER:8889/sse"
headers = {"X-API-Key": "YOUR_API_KEY"}
async with sse_client(url, headers=headers) as (read, write):
async with ClientSession(read, write) as session:
# Initialize connection
await session.initialize()
# List available tools
tools = await session.list_tools()
print(f"Available tools: {[t.name for t in tools.tools]}")
# Call a tool
result = await session.call_tool("list_dags", {})
print(f"DAGs: {result.content}")
# Call VM listing
vms = await session.call_tool("list_vms", {})
print(f"VMs: {vms.content}")
if __name__ == "__main__":
asyncio.run(connect_to_qubinode())
Install dependencies:
pip install mcp httpx-sse
Custom Node.js Client
Using @modelcontextprotocol/sdk:
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";
async function main() {
const transport = new SSEClientTransport(
new URL("http://YOUR_SERVER:8889/sse"),
{
requestInit: {
headers: {
"X-API-Key": "YOUR_API_KEY"
}
}
}
);
const client = new Client({
name: "my-qubinode-client",
version: "1.0.0"
}, {
capabilities: {}
});
await client.connect(transport);
// List tools
const tools = await client.listTools();
console.log("Available tools:", tools.tools.map(t => t.name));
// Call a tool
const result = await client.callTool({
name: "list_dags",
arguments: {}
});
console.log("DAGs:", result.content);
await client.close();
}
main().catch(console.error);
Install dependencies:
npm install @modelcontextprotocol/sdk
Tool Reference
DAG Management (3 tools)
| Tool | Description | Parameters |
|---|---|---|
| list_dags | List all Airflow DAGs with schedules and metadata | None |
| get_dag_info | Get detailed info about a specific DAG | dag_id: string |
| trigger_dag | Execute a DAG with optional config | dag_id: string, conf?: object |
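As an illustration, here is a minimal sketch of the DAG tools in use, reusing the ClientSession from the Custom Python Client section above; the dag_id and conf values are hypothetical placeholders:

```python
from mcp import ClientSession

async def trigger_dag_example(session: ClientSession) -> None:
    # Confirm the DAG exists before triggering it (dag_id is a placeholder).
    info = await session.call_tool("get_dag_info", {"dag_id": "deploy_vm"})
    print(info.content)

    # Trigger with an optional conf payload (keys shown are hypothetical).
    result = await session.call_tool(
        "trigger_dag",
        {"dag_id": "deploy_vm", "conf": {"vm_name": "test-vm-01"}},
    )
    print(result.content)
```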
VM Operations (5 tools)
| Tool | Description | Parameters |
|---|---|---|
| list_vms | List all VMs managed by kcli/virsh | None |
| get_vm_info | Get details about a specific VM | vm_name: string |
| create_vm | Create a new VM | name: string, image?: string, memory?: int, cpus?: int, disk_size?: int |
| delete_vm | Delete a VM | name: string |
| preflight_vm_creation | CALL BEFORE create_vm - Validates resources and checks image availability so creation is likely to succeed | name: string, image?: string, memory?: int, cpus?: int, disk_size?: int |
RAG Operations (6 tools)
| Tool | Description | Parameters |
|---|---|---|
| query_rag | Semantic search in knowledge base | query: string, doc_types?: list, limit?: int, threshold?: float |
| ingest_to_rag | Add documents to RAG | content: string, doc_type: string, source?: string, metadata?: object |
| manage_rag_documents | Manage document ingestion | operation: string, params?: object |
| get_rag_stats | Get RAG statistics | None |
| compute_confidence_score | Assess task confidence | task_description: string, doc_types?: list |
| check_provider_exists | Check for Airflow provider | system_name: string |
Troubleshooting (2 tools)
| Tool | Description | Parameters |
|---|---|---|
| get_troubleshooting_history | Retrieve past solutions | error_pattern?: string, component?: string, only_successful?: bool, limit?: int |
| log_troubleshooting_attempt | Log a troubleshooting attempt | task: string, solution: string, result: string, error_message?: string, component?: string |
Lineage (4 tools)
| Tool | Description | Parameters |
|---|---|---|
| get_dag_lineage | Get DAG dependencies | dag_id: string, depth?: int |
| get_failure_blast_radius | Analyze failure impact | dag_id: string, task_id?: string |
| get_dataset_lineage | Get dataset producers/consumers | dataset_name: string |
| get_lineage_stats | Get lineage system stats | None |
System (2 tools)
| Tool | Description | Parameters |
|---|---|---|
| get_airflow_status | Get Airflow health status | None |
| get_system_info | Get comprehensive system info | None |
Workflow Orchestration (2 tools) - NEW
These tools dramatically improve multi-step operation success rates by providing structured guidance.
| Tool | Description | Parameters |
|---|---|---|
| get_workflow_guide | CALL FIRST for multi-step tasks - Returns step-by-step execution plan with tool sequence | workflow_type?: string, goal_description?: string |
| diagnose_issue | Structured troubleshooting with component-specific diagnostic checks | symptom: string, component?: string, error_message?: string, affected_resource?: string |
Available workflow types:
- create_openshift_cluster - Full OpenShift deployment (45-90 min)
- setup_freeipa - FreeIPA identity management (20-30 min)
- deploy_vm_basic - Simple VM creation with validation (5-10 min)
- troubleshoot_vm - Systematic VM troubleshooting (10-20 min)
LLM Interaction Patterns
Pattern 1: Discovery First
Always start by understanding the system:
1. Call get_system_info() to understand capabilities
2. Call get_airflow_status() to verify system health
3. Call list_dags() to see available workflows
4. Call list_vms() to see existing infrastructure
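A minimal sketch of this discovery sequence, assuming an initialized ClientSession as created in the Custom Python Client section:

```python
from mcp import ClientSession

async def discover(session: ClientSession) -> None:
    # Run the four discovery calls in the recommended order.
    for tool in ("get_system_info", "get_airflow_status", "list_dags", "list_vms"):
        result = await session.call_tool(tool, {})
        print(f"--- {tool} ---\n{result.content}")
```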
Pattern 2: Check Before Acting
Before modifying infrastructure:
1. Call compute_confidence_score(task_description)
- If < 0.6: STOP and request documentation
- If >= 0.6: Proceed with caution
- If >= 0.8: Proceed confidently
2. Call get_troubleshooting_history(error_pattern)
- Check if similar issues were solved before
3. Call check_provider_exists(system_name)
- Use native providers instead of BashOperator
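The confidence gate can be expressed as a small helper. This sketch assumes the tool returns a JSON text block containing a numeric score field; adjust the parsing to the actual response shape:

```python
import json
from mcp import ClientSession

async def confidence_gate(session: ClientSession, task: str) -> bool:
    result = await session.call_tool(
        "compute_confidence_score", {"task_description": task}
    )
    # Assumption: the first content item is JSON text with a "score" field.
    score = json.loads(result.content[0].text).get("score", 0.0)
    if score < 0.6:
        print(f"Confidence {score:.2f} < 0.6: stop and request documentation.")
        return False
    print(f"Confidence {score:.2f}: proceed "
          + ("confidently." if score >= 0.8 else "with caution."))
    return True
```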
Pattern 3: Learn From Failures
After any operation:
1. If successful:
log_troubleshooting_attempt(task, solution, "success")
2. If failed:
log_troubleshooting_attempt(task, solution, "failed", error_message)
get_troubleshooting_history(error_message) # Check for solutions
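One way to make this pattern hard to forget is to wrap operations so both outcomes are logged. A sketch, with the wrapped trigger_dag call standing in for any operation:

```python
from mcp import ClientSession

async def run_and_log(session: ClientSession, task: str, solution: str) -> None:
    try:
        # Any infrastructure operation goes here (this one is a placeholder).
        await session.call_tool("trigger_dag", {"dag_id": "deploy_vm"})
    except Exception as exc:
        await session.call_tool("log_troubleshooting_attempt", {
            "task": task, "solution": solution,
            "result": "failed", "error_message": str(exc),
        })
        # Check whether a similar failure was solved before.
        history = await session.call_tool(
            "get_troubleshooting_history", {"error_pattern": str(exc)}
        )
        print(history.content)
    else:
        await session.call_tool("log_troubleshooting_attempt", {
            "task": task, "solution": solution, "result": "success",
        })
```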
Pattern 4: VM Lifecycle (High Success Rate)
IMPORTANT: Always use preflight_vm_creation() before create_vm() for 80%+ success rate:
1. list_vms() → Check existing VMs and naming conventions
2. preflight_vm_creation(name, cpus=4, memory=8192, disk_size=50)
→ If any check fails: FIX FIRST before proceeding
→ Shows: memory available, disk space, image status, libvirt health
3. create_vm(name, cpus=4, memory=8192, disk_size=50)
4. get_vm_info(name) → Verify creation
5. [Do operations via DAGs]
6. delete_vm(name) → Cleanup when done
7. log_troubleshooting_attempt() → Record for future reference
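Steps 1-4 of the lifecycle as a sketch (the VM spec values are examples, not requirements):

```python
from mcp import ClientSession

async def provision_vm(session: ClientSession, name: str) -> None:
    spec = {"name": name, "cpus": 4, "memory": 8192, "disk_size": 50}

    await session.call_tool("list_vms", {})  # check existing names first

    # Preflight: review the returned checks and fix failures before creating.
    preflight = await session.call_tool("preflight_vm_creation", spec)
    print(preflight.content)

    await session.call_tool("create_vm", spec)

    info = await session.call_tool("get_vm_info", {"vm_name": name})
    print(info.content)  # verify creation
```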
Pattern 5: Lineage Awareness
Before retrying failed tasks:
1. get_failure_blast_radius(dag_id, task_id)
- Understand downstream impact
- Check severity rating
2. If severity is HIGH:
- Get human approval before retrying
- Consider partial recovery strategies
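A sketch of the retry gate; it assumes the blast-radius response is JSON text with a severity field:

```python
import json
from mcp import ClientSession

async def safe_to_retry(session: ClientSession, dag_id: str, task_id: str) -> bool:
    result = await session.call_tool(
        "get_failure_blast_radius", {"dag_id": dag_id, "task_id": task_id}
    )
    # Assumption: first content item is JSON text with a "severity" field.
    severity = json.loads(result.content[0].text).get("severity", "UNKNOWN")
    if severity == "HIGH":
        print("HIGH blast radius: get human approval before retrying.")
        return False
    return True
```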
Pattern 6: Multi-Step Workflows (80%+ Success Rate)
NEW: Use get_workflow_guide() for complex operations:
1. get_workflow_guide(workflow_type="deploy_vm_basic")
OR
get_workflow_guide(goal_description="I need to deploy OpenShift")
→ Returns ordered steps with exact tool calls
→ Shows required vs optional steps
→ Includes failure recovery guidance
2. Follow the returned steps IN ORDER
→ Each step shows the exact tool call to make
→ Required steps must pass before continuing
3. If any step fails:
diagnose_issue(symptom="<what went wrong>", component="<vm|dag|network>")
→ Gets component-specific diagnostic checks
→ Links to historical solutions
4. After completion:
log_troubleshooting_attempt(task, solution, "success")
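A sketch of driving a workflow from the guide. It assumes the guide's response is JSON text with an ordered steps list in which each step names a tool and its arguments; check the actual response shape before relying on it:

```python
import json
from mcp import ClientSession

async def run_guided_workflow(session: ClientSession) -> None:
    guide = await session.call_tool(
        "get_workflow_guide", {"workflow_type": "deploy_vm_basic"}
    )
    # Assumption: JSON text with a "steps" list of {"tool", "arguments"} entries.
    steps = json.loads(guide.content[0].text)["steps"]
    for step in steps:
        result = await session.call_tool(step["tool"], step.get("arguments", {}))
        print(f"{step['tool']}: {result.content}")
```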
Pattern 7: Structured Troubleshooting (80%+ Success Rate)
NEW: Use diagnose_issue() for systematic problem resolution:
1. diagnose_issue(
symptom="VM won't boot",
component="vm",
error_message="libvirt error code 1",
affected_resource="my-test-vm"
)
2. Follow the diagnostic plan in order:
- Step 1: Check historical solutions
- Step 2: Search knowledge base
- Step 3: Run component-specific checks
3. When resolved:
log_troubleshooting_attempt(task, solution, "success", component="vm")
→ Builds knowledge base for future issues
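The same flow as a sketch, using the symptom from the example above:

```python
from mcp import ClientSession

async def diagnose_vm_boot(session: ClientSession) -> None:
    plan = await session.call_tool("diagnose_issue", {
        "symptom": "VM won't boot",
        "component": "vm",
        "error_message": "libvirt error code 1",
        "affected_resource": "my-test-vm",
    })
    print(plan.content)  # follow the returned diagnostic steps in order
```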
Authentication
Generating API Keys
API keys are generated during setup:
cd /opt/qubinode_navigator/airflow
./setup-mcp-servers.sh
Or generate manually:
# Generate a secure key
openssl rand -hex 32
# Add to environment
export AIRFLOW_MCP_API_KEY="your-generated-key"
export MCP_API_KEY="your-generated-key"
Key Storage
Keys are stored in:
- /opt/qubinode_navigator/airflow/.env.mcp
- Environment variables in docker-compose.yml
Security Best Practices
- Never commit keys to git - Use .env files or secrets management
- Rotate keys regularly - Regenerate every 90 days
- Use read-only mode in production - Set AIRFLOW_MCP_TOOLS_READ_ONLY=true
- Restrict network access - Only expose MCP ports to trusted networks
Troubleshooting
Connection Issues
# Check if server is running
curl http://YOUR_SERVER:8889/health
# Expected response:
# {"status": "healthy", "features": {...}}
# Check with API key
curl -H "X-API-Key: YOUR_KEY" http://YOUR_SERVER:8889/sse
Tools Not Appearing
- Verify the server is running: podman ps | grep mcp
- Check logs: podman logs airflow_airflow-mcp-server_1
- Restart the client completely (Claude Desktop, Cursor, etc.)
- Verify API key is correct
Permission Errors
If tools return permission errors:
# Check read-only mode
grep READ_ONLY /opt/qubinode_navigator/airflow/.env
# Disable read-only for write operations
export AIRFLOW_MCP_TOOLS_READ_ONLY=false
SSE Connection Drops
If connections drop frequently:
- Check server logs for errors
- Increase timeout in client config
- Verify network stability between client and server
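For long-lived clients, a reconnect loop with backoff helps ride out drops. A sketch using the same Python SDK as above; raising sse_read_timeout is an assumption about where your timeouts occur:

```python
import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def connect_with_retry(url: str, headers: dict, attempts: int = 5) -> None:
    for attempt in range(attempts):
        try:
            # Raise the SSE read timeout (seconds) if idle streams are dropping.
            async with sse_client(url, headers=headers, sse_read_timeout=600) as (r, w):
                async with ClientSession(r, w) as session:
                    await session.initialize()
                    # ... do work here; returning ends the session cleanly
                    return
        except Exception as exc:
            delay = 2 ** attempt
            print(f"Connection dropped ({exc}); retrying in {delay}s")
            await asyncio.sleep(delay)
```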
Quick Reference
Endpoints
| Service | SSE Endpoint | Health Check |
|---|---|---|
| Airflow MCP | http://SERVER:8889/sse | http://SERVER:8889/health |
| AI Assistant MCP | http://SERVER:8081/sse | http://SERVER:8081/health |
Environment Variables
# Airflow MCP Server
AIRFLOW_MCP_ENABLED=true
AIRFLOW_MCP_PORT=8889
AIRFLOW_MCP_API_KEY=<key>
AIRFLOW_MCP_TOOLS_READ_ONLY=false
# AI Assistant MCP Server
MCP_SERVER_ENABLED=true
MCP_SERVER_PORT=8081
MCP_API_KEY=<key>
Test Commands
After configuring your client, try these prompts:
1. "List all available Airflow workflows"
2. "Show me the running VMs"
3. "Search documentation for kcli usage"
4. "What's the confidence score for deploying FreeIPA?"
5. "Create a test VM with 2 CPUs and 4GB RAM"
Support
- Documentation: docs/MCP-SERVER-DESIGN.md
- Quick Start: MCP-QUICK-START.md
- Issues: GitHub Issues
- ADR Reference: docs/adrs/adr-0038-fastmcp-framework-migration.md
Last Updated: December 2025