Running AI Locally: Lessons from The Coe Lab
Practical insights from running a self-hosted AI assistant, including hardware challenges, memory architecture, and when to choose local vs cloud AI.
After months of running OpenClaw as my personal AI assistant, I've learned a lot about what works—and what doesn't—when hosting AI infrastructure locally. Here are the key lessons.
Hardware Reality Check
The biggest misconception about local AI is that you can just run any model on any hardware. You can't. I learned this the hard way when I tried to run a 70B parameter model on my home server and watched it crash repeatedly.
The reality: Local inference requires serious hardware. For anything beyond small models (7B-13B parameters), you need:
- GPU with 24GB+ VRAM for decent-sized models
- 64GB+ system RAM as a fallback
- Fast NVMe storage for model loading
My solution? Use cloud models via Ollama Cloud. The local Ollama server acts as a relay, but the actual inference happens on cloud GPUs. This gives me access to powerful models like Qwen3.5 (397B parameters) without needing a $10,000 GPU setup.
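The nice part of this relay setup is that the client code doesn't change: requests go to the local Ollama endpoint either way, and the model name decides where inference actually runs. A minimal sketch against Ollama's standard /api/chat endpoint (the model name here is illustrative, not a specific recommendation):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # local relay endpoint

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an Ollama /api/chat request body. Whether inference runs
    locally or on Ollama Cloud depends only on the model name."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model: str, prompt: str) -> str:
    """POST the request to the local Ollama server and return the reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

Point the same `chat()` call at a small local model or a large cloud one; nothing else in the pipeline needs to know the difference.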
Memory Architecture Matters
An AI assistant without memory is just a chatbot. The key insight: memory should be external to the model. I use Neo4j as a graph database to store:
- People, projects, and relationships
- Goals, tasks, and their status
- Services, infrastructure, and configurations
- Events, notes, and decisions
This allows the AI to have contextual conversations. When I ask "How's the media server doing?", it can query Neo4j and tell me about Radarr, Sonarr, and any stalled downloads—not just give a generic response.
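The queries behind that kind of answer are plain Cypher. Here is a sketch of two of them, one to upsert a service node and one to find stalled downloads; the labels, relationship types, and properties are illustrative of the schema described above, not the exact production schema:

```python
def upsert_service(name: str, status: str) -> tuple[str, dict]:
    """Cypher to MERGE a service node and refresh its status.
    Returns the query plus its parameter map."""
    query = (
        "MERGE (s:Service {name: $name}) "
        "SET s.status = $status, s.updated = datetime() "
        "RETURN s"
    )
    return query, {"name": name, "status": status}

def stalled_downloads_query() -> str:
    """Cypher the assistant can run to answer 'how's the media server?'"""
    return (
        "MATCH (s:Service)-[:MANAGES]->(d:Download) "
        "WHERE d.status = 'stalled' "
        "RETURN s.name AS service, d.title AS title, d.age_hours AS age"
    )
```

Because the memory lives outside the model, the same graph survives model swaps: upgrade the LLM and the assistant still knows who Radarr is.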
When to Go Local vs Cloud
After experimenting with both, here's my framework:
Use Local Models When:
- Privacy is critical (sensitive data never leaves your network)
- Latency matters (no round-trip to cloud APIs)
- You're doing simple tasks (classification, extraction, basic Q&A)
- Cost is a concern (local inference has no per-token API fees)
Use Cloud Models When:
- You need advanced reasoning (complex problem-solving)
- Vision capabilities are required (image analysis)
- You want the latest models (cloud providers update frequently)
- Hardware is a limitation (most home setups)
The Hybrid Approach
My current setup uses both:
- Ministral 3 (8B) for heartbeats and simple cron jobs—fast, cheap, runs locally
- GLM-5 (cloud) for heavy tasks like blog writing and complex analysis
- Qwen3.5 (397B, cloud) as the primary model—handles vision, reasoning, and general tasks
This balances cost, performance, and capability. Simple tasks stay local; complex work goes to the cloud.
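The routing behind that split is simple enough to sketch. The model names mirror the setup above, but the task attributes and selection rules here are a simplified illustration, not the actual routing code:

```python
from dataclasses import dataclass

@dataclass
class Task:
    is_heartbeat: bool = False      # status pings, simple cron work
    is_heavy_writing: bool = False  # blog posts, complex analysis

def pick_model(task: Task) -> str:
    """Route a task to the cheapest model that can handle it."""
    if task.is_heartbeat:
        return "ministral-3"  # local: fast, cheap
    if task.is_heavy_writing:
        return "glm-5"        # cloud: heavy writing and analysis
    return "qwen3.5"          # cloud primary: vision, reasoning, general
```

The order of the checks encodes the cost preference: fall through to the expensive model only when nothing cheaper qualifies.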
Automation is Where It Shines
The real power of a self-hosted AI isn't chat—it's automation. OpenClaw runs scheduled jobs that:
- Post hourly status updates to Discord
- Monitor media downloads and clean up stalled torrents
- Check infrastructure health and reboot offline nodes
- Generate daily reports and flush memory before reboots
These aren't scripted commands—they're AI-driven decisions. The media manager, for example, analyzes torrent health, decides what's dead, removes it, and triggers re-searches. All autonomously.
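The "decide what's dead" step boils down to a judgment over a few signals per torrent. A simplified stand-in for that decision (the thresholds and field names are illustrative, not the production values):

```python
from dataclasses import dataclass

@dataclass
class Torrent:
    name: str
    progress: float       # completed fraction, 0.0-1.0
    hours_stalled: float  # hours since any new data arrived

# Illustrative thresholds: little progress plus a long stall means dead.
STALL_HOURS = 12
MIN_PROGRESS = 0.05

def is_dead(t: Torrent) -> bool:
    """Flag torrents that barely started and then stopped moving."""
    return t.hours_stalled >= STALL_HOURS and t.progress < MIN_PROGRESS

def triage(torrents: list[Torrent]) -> list[str]:
    """Return the names of torrents to remove and re-search."""
    return [t.name for t in torrents if is_dead(t)]
```

In the real system the AI supplies the judgment and this kind of logic supplies the guardrails, so a healthy-but-slow download never gets nuked by an overeager model.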
Lessons Learned
- Start simple. Don't try to build everything at once. Begin with one use case (like status updates) and expand.
- Invest in memory. An AI without context is frustrating; a graph database makes people, projects, and services queryable, which is what turns chat into meaningful conversation.
- Embrace hybrid. Local + cloud gives you the best of both worlds.
- Automation > Chat. The most valuable use case isn't answering questions—it's doing things autonomously.
- Monitor everything. If your AI assistant goes down, you want to know. Health checks and alerts are essential.
What's Next
The roadmap includes expanding automation (email triage, calendar management), improving memory (better entity extraction from conversations), and exploring multi-agent workflows (specialized AI assistants for different domains).
If you're building your own AI infrastructure, I'm happy to share what I've learned. Reach out via the contact page or connect on Discord.