ADR-0027: CPU-Based AI Deployment Assistant Architecture

Status

Accepted - Implemented (2025-11-11)

The AI Assistant has been fully implemented, with containerized deployment, RAG system integration, and terminal-based interaction through deploy-qubinode.sh.

Context

The PRD analysis identified a significant opportunity to integrate a CPU-based AI deployment assistant to enhance user experience and reduce deployment complexity. Current challenges include:

  • Complex deployment troubleshooting: Users struggle with error diagnosis and resolution
  • High barrier to entry: Non-experts find infrastructure automation intimidating
  • Manual support overhead: Common issues require human intervention
  • Limited real-time guidance: Static documentation is insufficient for dynamic scenarios

The solution must run locally without GPU dependencies, ensure data privacy, and integrate seamlessly with the existing container-first execution model. Modern small language models such as IBM Granite-4.0-Micro enable CPU-based inference with sufficient capability for technical assistance.

Decision

Implement a CPU-based AI deployment assistant using the following architecture:

Core Components

  1. Inference Engine: llama.cpp for high-performance CPU inference (a minimal sketch follows this list)
  2. Language Model: IBM Granite-4.0-Micro (3B parameters, GGUF format)
  3. Agent Framework: LangChain or LlamaIndex for orchestration
  4. Knowledge Base: RAG over project documentation and best practices
  5. Tool Integration: System diagnostics and configuration validation
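
As a concrete illustration of components 1 and 2, a minimal inference sketch using the llama-cpp-python bindings is shown below. The model path, thread count, and prompts are illustrative assumptions, not the project's actual configuration.

    # Minimal CPU inference sketch. Assumes the llama-cpp-python package
    # and a locally downloaded GGUF quantization of Granite-4.0-Micro;
    # the path and parameters are illustrative, not project configuration.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/granite-4.0-micro.Q4_K_M.gguf",  # hypothetical path
        n_ctx=4096,    # context window sized for deployment transcripts
        n_threads=8,   # match available CPU cores
    )

    result = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are a Qubinode deployment assistant."},
            {"role": "user", "content": "kcli fails with 'no default storage pool'. What should I check?"},
        ],
        max_tokens=256,
    )
    print(result["choices"][0]["message"]["content"])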

Deployment Architecture

┌─────────────────────────────────────────────────────────┐
│                 Qubinode Navigator                       │
│                   (Main System)                          │
└────────────────────┬────────────────────────────────────┘
                     │ REST API / CLI Interface
                     │
┌────────────────────▼────────────────────────────────────┐
│            AI Deployment Assistant                       │
│                (Podman Container)                        │
│                                                          │
│  ┌─────────────────────────────────────────────────┐   │
│  │  llama.cpp + Granite-4.0-Micro (3B) GGUF        │   │
│  └─────────────────────────────────────────────────┘   │
│                                                          │
│  ┌─────────────────────────────────────────────────┐   │
│  │  Agent Framework (LangChain/LlamaIndex)         │   │
│  │  - RAG over Qubinode documentation              │   │
│  │  - Tool-calling for system diagnostics          │   │
│  └─────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘

Integration Points

  • CLI Interface: qubinode-ai command for interactive assistance
  • Ansible Integration: Callback plugins for real-time monitoring (see the plugin sketch after this list)
  • Setup Script Hooks: Pre/post deployment validation and guidance
  • Log Analysis: Automated error pattern recognition and resolution
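
A hedged sketch of the Ansible integration point follows. The plugin class layout uses Ansible's standard callback API (CallbackBase and the v2_runner_on_failed hook); the assistant's /analyze endpoint and the payload fields are assumptions for illustration.

    # callback_plugins/qubinode_ai.py -- sketch of a notification callback
    # that forwards failed tasks to the assistant's REST API. The /analyze
    # endpoint and payload shape are assumed, not the project's actual API.
    import json
    import urllib.request

    from ansible.plugins.callback import CallbackBase


    class CallbackModule(CallbackBase):
        CALLBACK_VERSION = 2.0
        CALLBACK_TYPE = "notification"
        CALLBACK_NAME = "qubinode_ai"

        def v2_runner_on_failed(self, result, ignore_errors=False):
            payload = json.dumps({
                "task": result._task.get_name(),
                "host": result._host.get_name(),
                "error": result._result.get("msg", ""),
            }).encode()
            req = urllib.request.Request(
                "http://localhost:8080/analyze",  # assumed assistant endpoint
                data=payload,
                headers={"Content-Type": "application/json"},
            )
            try:
                urllib.request.urlopen(req, timeout=5)
            except OSError:
                pass  # assistant downtime must never fail the play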

Consequences

Positive

  • Enhanced User Experience: Interactive guidance and troubleshooting
  • Reduced Barrier to Entry: Guides non-experts through complex configurations
  • Operational Efficiency: Automated diagnostics and error resolution
  • Scalable Support: Handles common issues without human intervention
  • Competitive Differentiation: Unique AI-powered infrastructure automation
  • Data Privacy: All processing remains local with no external dependencies

Negative

  • Resource Requirements: Additional CPU and memory overhead for inference
  • Container Orchestration Complexity: Managing AI service alongside existing components
  • Model Maintenance: Need for periodic model updates and retraining
  • Integration Complexity: Coordinating AI service with existing workflows
  • Performance Impact: Inference latency during real-time assistance

Risks

  • Model Accuracy Limitations: AI may provide incorrect guidance for edge cases
  • Integration Stability: Potential issues with Ansible callback integration
  • Resource Contention: AI inference competing with deployment processes
  • Maintenance Overhead: Additional component to monitor and update

Alternatives Considered

  1. Cloud-based AI services: Rejected due to data privacy and external-dependency concerns
  2. GPU-accelerated inference: Rejected due to the hardware-requirements constraint
  3. Rule-based expert system: Rejected due to insufficient flexibility for complex scenarios
  4. External chatbot integration: Rejected due to the lack of deep system integration capabilities
  5. Human-only support: Rejected because it does not scale with user growth

Implementation Plan

Phase 1: Core AI Service (Weeks 1-4)

  • Build llama.cpp-based container with Granite-4.0-Micro model
  • Implement basic REST API for inference (see the endpoint sketch after this list)
  • Create CLI interface (qubinode-ai) for testing
  • Validate CPU performance and resource requirements
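
A minimal sketch of the Phase 1 REST surface, wrapping the llama.cpp model in a single FastAPI endpoint. The /ask route, request schema, and model path are illustrative assumptions.

    # Sketch of the inference API (run with: uvicorn main:app --port 8080).
    # Route name, schema, and model path are assumptions for illustration.
    from fastapi import FastAPI
    from llama_cpp import Llama
    from pydantic import BaseModel

    app = FastAPI()
    # Model is loaded once at startup to avoid per-request load latency.
    llm = Llama(model_path="models/granite-4.0-micro.Q4_K_M.gguf", n_ctx=4096)


    class Ask(BaseModel):
        question: str


    @app.post("/ask")
    def ask(body: Ask):
        out = llm.create_chat_completion(
            messages=[{"role": "user", "content": body.question}],
            max_tokens=256,
        )
        return {"answer": out["choices"][0]["message"]["content"]}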

Phase 2: Knowledge Integration (Weeks 5-8)

  • Structure existing documentation for RAG embedding
  • Implement vector database (ChromaDB) for knowledge retrieval (see the pipeline sketch after this list)
  • Create tool-calling framework for system diagnostics
  • Test knowledge accuracy and retrieval performance
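
A hedged sketch of the Phase 2 knowledge pipeline: embed documentation chunks into a persistent ChromaDB collection and retrieve context for a question. The storage path, collection name, and one-chunk-per-file indexing are simplifying assumptions; real chunking would be finer-grained.

    # RAG indexing/retrieval sketch. ChromaDB's default embedding function
    # (a small sentence-transformers model) runs on CPU, consistent with
    # the no-GPU constraint. Paths and names are illustrative assumptions.
    import pathlib

    import chromadb

    client = chromadb.PersistentClient(path="data/knowledge")  # assumed path
    docs = client.get_or_create_collection("qubinode-docs")

    # Index: one chunk per Markdown file, keyed by its path.
    for md in pathlib.Path("docs").glob("**/*.md"):
        docs.add(ids=[str(md)], documents=[md.read_text()])

    # Retrieve: top-3 chunks to prepend to the model prompt as context.
    hits = docs.query(query_texts=["How do I configure the libvirt storage pool?"], n_results=3)
    context = "\n\n".join(hits["documents"][0])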

Phase 3: Deployment Integration (Weeks 9-12)

  • Integrate with setup scripts for pre/post deployment guidance
  • Implement Ansible callback plugins for real-time monitoring
  • Create automated log analysis and error resolution (a pattern-matching sketch follows this list)
  • Test comprehensively across deployment scenarios
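
A minimal sketch of the log-analysis step: match known error patterns first and escalate anything unrecognized to the model via the REST API. The patterns and hints below are illustrative, not the project's actual rule set.

    # Pattern-first diagnosis: cheap regex matches handle common failures;
    # unmatched lines are candidates for model-based analysis.
    import re

    KNOWN_PATTERNS = {
        r"permission denied": "Check SELinux contexts and file ownership.",
        r"no space left on device": "Free disk space under /var/lib/libvirt.",
        r"failed to connect to .*libvirt": "Verify libvirtd is running (systemctl status libvirtd).",
    }


    def diagnose(log_line: str) -> str | None:
        """Return a remediation hint for a known pattern, else None."""
        for pattern, hint in KNOWN_PATTERNS.items():
            if re.search(pattern, log_line, re.IGNORECASE):
                return hint
        return None  # unmatched: escalate to the assistant's /ask endpoint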

Phase 4: Production Readiness (Weeks 13-16)

  • Performance optimization and resource tuning
  • Security hardening and access controls
  • Monitoring and observability integration
  • Documentation and user training materials

Technical Requirements

Hardware Requirements

  • CPU: x86_64 with AVX2 support (the minimum baseline for llama.cpp's optimized CPU kernels; a preflight check is sketched after this list)
  • Memory: Additional 4-8GB RAM for model loading and inference
  • Storage: 2-4GB for model files and knowledge base
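
Since AVX2 is the stated baseline, a quick preflight check can fail fast before the model is pulled. This is a Linux-only sketch that reads CPU flags from /proc/cpuinfo.

    # Preflight CPU-feature check; Linux-only, reads /proc/cpuinfo flags.
    def has_avx2() -> bool:
        with open("/proc/cpuinfo") as f:
            return any("avx2" in line for line in f if line.startswith("flags"))

    if not has_avx2():
        raise SystemExit("CPU lacks AVX2; llama.cpp inference will be slow or unsupported.")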

Software Dependencies

  • Container Runtime: Podman (existing requirement)
  • Python Libraries: langchain, chromadb, fastapi
  • Model Format: GGUF-quantized Granite-4.0-Micro
  • Vector Database: ChromaDB for document embeddings

Related ADRs

  • ADR-0001: Container-First Execution Model (containerized AI service)
  • ADR-0004: Security Architecture (local processing, no external APIs)
  • ADR-0007: Bash-First Orchestration (CLI integration points)
  • ADR-0011: Comprehensive Platform Validation (enhanced with AI analysis)

Date

2025-11-07

Stakeholders

  • AI/ML Engineering Team
  • DevOps Team
  • UX/Product Team
  • Infrastructure Team
  • Security Team