
Running AI Locally: Lessons from The Coe Lab

Practical insights from running a self-hosted AI assistant, including hardware challenges, memory architecture, and when to choose local vs cloud AI.

After months of running OpenClaw as my personal AI assistant, I've learned a lot about what works—and what doesn't—when hosting AI infrastructure locally. Here are the key lessons.

Hardware Reality Check

The biggest misconception about local AI is that you can just run any model on any hardware. You can't. I learned this the hard way when I tried to run a 70B parameter model on my home server and watched it crash repeatedly.
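In hindsight, that crash was predictable with back-of-the-envelope arithmetic: a model's weights alone take roughly parameters times bytes-per-parameter. A rough sketch (the 20% overhead factor for the KV cache and activations is my own crude assumption, not a benchmark):

```python
def estimate_vram_gb(params_billion: float, bits: int = 16, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: parameter count * bytes per parameter,
    plus ~20% overhead for KV cache and activations (rule of thumb)."""
    bytes_per_param = bits / 8
    return params_billion * bytes_per_param * overhead

print(round(estimate_vram_gb(70, bits=16)))  # ~168 GB: hopeless on a home server
print(round(estimate_vram_gb(70, bits=4)))   # ~42 GB: still beyond a single consumer GPU
print(round(estimate_vram_gb(13, bits=4)))   # ~8 GB: feasible locally
```

Even aggressively quantized, a 70B model wants more memory than most home servers can offer, which matches what I saw.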

The reality: local inference requires serious hardware. For anything beyond small models (7B-13B parameters), you need far more GPU memory than a typical home server has, which is exactly why my 70B experiment kept crashing.

My solution? Use cloud models via Ollama Cloud. The local Ollama server acts as a relay, but the actual inference happens on cloud GPUs. This gives me access to powerful models like Qwen3.5 (397B parameters) without needing a $10,000 GPU setup.
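From the client's side, nothing changes when inference happens remotely: the request to the local Ollama server has the same shape either way. A minimal sketch of the `/api/chat` request body (the model tag and prompt here are illustrative, not my exact configuration):

```python
import json

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local Ollama endpoint

def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint.
    If `model` names a cloud-hosted model, the local server relays
    the request; the call shape is identical for local models."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = build_chat_request("qwen3.5", "Summarize today's server status.")
print(json.dumps(payload, indent=2))
```

That transparency is the whole appeal: tools built against the local endpoint don't care where the GPUs actually live.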

Memory Architecture Matters

An AI assistant without memory is just a chatbot. The key insight: memory should be external to the model. I use Neo4j as a graph database to store the services running in my homelab, the relationships between them, and context from past conversations.

This allows the AI to have contextual conversations. When I ask "How's the media server doing?", it can query Neo4j and tell me about Radarr, Sonarr, and any stalled downloads—not just give a generic response.
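To make the idea concrete, here's a toy in-memory stand-in for that graph (the node names, edge label, and status values are hypothetical; the real setup queries Neo4j with Cypher):

```python
# Toy graph: services as nodes, a DOWNLOADS_VIA edge, status as a property.
graph = {
    "nodes": {
        "Radarr":      {"type": "service", "status": "healthy"},
        "Sonarr":      {"type": "service", "status": "healthy"},
        "qBittorrent": {"type": "service", "status": "degraded"},
    },
    "edges": [
        ("Radarr", "DOWNLOADS_VIA", "qBittorrent"),
        ("Sonarr", "DOWNLOADS_VIA", "qBittorrent"),
    ],
}

def media_server_report(g: dict) -> list[str]:
    """Answer 'How's the media server doing?' by walking the graph
    instead of relying on whatever the model happens to remember."""
    report = []
    for name, props in g["nodes"].items():
        deps = [dst for src, rel, dst in g["edges"] if src == name]
        line = f"{name}: {props['status']}"
        if deps:
            line += f" (depends on {', '.join(deps)})"
        report.append(line)
    return report

for line in media_server_report(graph):
    print(line)
```

The point isn't the data structure; it's that the answer comes from queryable state, so it stays accurate even when the conversation doesn't.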

When to Go Local vs Cloud

After experimenting with both, here's my framework:

Use Local Models When:

  - The task is simple and routine (status updates, quick lookups)
  - The data shouldn't leave your network
  - You want to avoid per-request costs

Use Cloud Models When:

  - The task needs the reasoning power of a large model
  - The model simply won't fit on your hardware
  - Output quality matters more than keeping everything in-house

The Hybrid Approach

My current setup uses both: local models handle quick, routine tasks, while complex work is relayed to cloud models through Ollama Cloud.

This balances cost, performance, and capability. Simple tasks stay local; complex work goes to the cloud.
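The routing itself can be embarrassingly simple. This is a deliberately naive sketch (the keywords and the `needs_reasoning` flag are placeholders for whatever signals your dispatcher actually has):

```python
def pick_backend(task: str, needs_reasoning: bool = False) -> str:
    """Route a task to a local or cloud model: anything routine
    stays local, anything open-ended goes to the cloud."""
    routine = any(k in task.lower() for k in ("status", "health", "restart", "list"))
    if routine and not needs_reasoning:
        return "local"
    return "cloud"

print(pick_backend("post the nightly status summary"))                            # local
print(pick_backend("plan a migration of the media stack", needs_reasoning=True))  # cloud
```

A keyword heuristic is crude, but because both backends speak the same API, swapping in a smarter classifier later costs nothing downstream.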

Automation is Where It Shines

The real power of a self-hosted AI isn't chat; it's automation. OpenClaw runs scheduled jobs that check service health, post status updates, and keep the media stack tidy.

These aren't scripted commands—they're AI-driven decisions. The media manager, for example, analyzes torrent health, decides what's dead, removes it, and triggers re-searches. All autonomously.
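The "decide what's dead" step boils down to a judgment over a few signals per torrent. A minimal sketch of that decision (the fields and the 24-hour threshold are illustrative, not OpenClaw's exact values):

```python
from dataclasses import dataclass

@dataclass
class Torrent:
    name: str
    progress: float       # 0.0 - 1.0
    speed_kbps: float     # current download speed
    hours_stalled: float  # time with no progress

def is_dead(t: Torrent, stall_limit_hours: float = 24.0) -> bool:
    """Flag a torrent as dead when it is incomplete, not moving,
    and has been stalled longer than the limit."""
    return t.progress < 1.0 and t.speed_kbps == 0 and t.hours_stalled > stall_limit_hours

queue = [
    Torrent("show.s01e01", 0.42, 0.0, 36.0),  # stalled for a day and a half
    Torrent("movie.2024", 0.91, 850.0, 0.0),  # still downloading fine
]
dead = [t.name for t in queue if is_dead(t)]
print(dead)  # ['show.s01e01'] -> remove it and trigger a re-search
```

The AI's contribution is choosing and weighing these signals in context rather than applying one hard-coded rule forever.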

Lessons Learned

  1. Start simple. Don't try to build everything at once. Begin with one use case (like status updates) and expand.
  2. Invest in memory. An AI without context is frustrating. Graph databases make it possible to have meaningful conversations.
  3. Embrace hybrid. Local + cloud gives you the best of both worlds.
  4. Automation > Chat. The most valuable use case isn't answering questions—it's doing things autonomously.
  5. Monitor everything. If your AI assistant goes down, you want to know. Health checks and alerts are essential.
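On that last point, the alerting layer doesn't need to be clever. A sketch of turning raw check results into alerts (the service names are placeholders for whatever you monitor):

```python
def evaluate_checks(results: dict[str, bool]) -> list[str]:
    """Turn raw health-check results into alert messages.
    How you probe each service (HTTP ping, systemd, etc.) is up to you."""
    return [f"ALERT: {svc} failed its health check"
            for svc, ok in results.items() if not ok]

checks = {"ollama": True, "neo4j": True, "openclaw": False}
for alert in evaluate_checks(checks):
    print(alert)  # one line per failing service
```

Wire the output to whatever notification channel you already read, and you'll know your assistant is down before you ask it something.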

What's Next

The roadmap includes expanding automation (email triage, calendar management), improving memory (better entity extraction from conversations), and exploring multi-agent workflows (specialized AI assistants for different domains).

If you're building your own AI infrastructure, I'm happy to share what I've learned. Reach out via the contact page or connect on Discord.