I Told My AI to Monitor Everything. It Built a Better Stack Than I Would Have.

Disclosure: I’m a Cloudflare employee and shareholder. Opinions expressed here are my own and do not represent the views of Cloudflare.

Editor’s note (March 2026): Since writing this, I’ve replaced the monitoring container with a VM. What I described as a peripheral SSH quirk turned out to be a symptom of a deeper LXC limitation — unprivileged containers have unreliable SSH access and systemd dropouts that make them the wrong foundation for always-on infrastructure. If you’re building this stack, start with a VM. Everything else in this post holds.

The Prompt That Started It

I didn’t plan to have a comprehensive observability stack at home. I had several Linux containers running on a virtualisation host, AI agents generating logs around the clock, and a growing realisation that I was flying blind.

The prompt was deliberately vague — something about needing observability of the infra and systems that run my house and homelab. I didn’t name any tools. I didn’t suggest an approach. I wanted to see what would happen if I gave the OpenClaw agent that I’d designated my personal assistant the problem and nothing else.

No architecture diagram. No tool selection meeting. No Jira ticket. Just a goal and some trust.

What happened next took under 2 hours and minimal intervention on my part — I had to step in a couple of times to escalate permissions and work around an SSH issue, but the agent drove the process end to end.


Why I Didn’t Pick the Tools

This is the part that feels counterintuitive. I’m an infrastructure guy. I could have specified tooling that I already know; I have opinions about observability stacks, though I’m 2 years out of working directly in the space and a lot can change in that time.

But that would have missed the point.

The interesting question isn’t whether an AI agent can follow a runbook. It’s whether it can make reasonable technical decisions when given a goal instead of a checklist. Can it survey an environment, understand what’s running, choose appropriate tools, and wire everything together?

So I stepped back and let it work.


What It Found

The agent started by querying the hypervisor’s API — it already had access from earlier work. Within minutes, it had a complete inventory of the virtualisation host and every container and VM running on it.

A mix of workloads — AI agents, a home security camera recorder, utility boxes — each with different resource profiles. All needing visibility.


What It Chose

The agent came back with a proposal before touching anything:

  • Prometheus for metrics collection
  • Node Exporter for host and container metrics
  • Grafana for dashboards
  • Loki for log aggregation
  • Promtail for log shipping

Nothing exotic. Nothing bleeding-edge. Just tools that work. It picked battle-tested software that composes well, is well-documented, and runs comfortably on modest hardware. Not the stack I would have chosen — I’d never tried Loki, Promtail, or Node Exporter before.


How It Built Everything

First, it needed a home. The agent called the hypervisor API and provisioned a new container — 2GB RAM, dedicated to monitoring. A separate box, isolated from the workloads it watches.
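The post never names the hypervisor, so as a purely illustrative sketch, assume a Proxmox-style REST API: creating the container boils down to one authenticated POST. The endpoint, VMID, hostname, and template below are my assumptions, not the agent's actual call.

```python
# Hypothetical sketch: the request body for creating an unprivileged LXC
# container via a Proxmox-style API (POST /api2/json/nodes/<node>/lxc).
# All values are illustrative assumptions.
import json

def lxc_create_payload(vmid: int, hostname: str, memory_mb: int, template: str) -> str:
    """Build the JSON body for a create-container call."""
    return json.dumps({
        "vmid": vmid,
        "hostname": hostname,
        "memory": memory_mb,        # the monitoring box got 2GB
        "ostemplate": template,
        "unprivileged": 1,
    })

payload = lxc_create_payload(110, "monitoring", 2048, "local:vztmpl/debian-12-standard.tar.zst")
```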

Then the deployment sequence:

1. Node Exporter everywhere. The agent connected to each container and installed Node Exporter. It also deployed it on the hypervisor host itself — a must.

2. Prometheus on the monitoring container. Configured with scrape targets for every machine it discovered, plus itself. All reporting UP on the first try.

3. Grafana alongside Prometheus. Imported the Node Exporter Full dashboard, and created a service account with an API key so it could manage dashboards programmatically going forward.

4. Loki for logs. This is where it got interesting. The agent didn’t just install Loki — it configured 90-day retention and set up Promtail on every container running OpenClaw. Agent logs, structured and searchable, flowing into a central store.

5. Promtail on each agent container. AI agents got Promtail instances, configured to ship OpenClaw logs to Loki with appropriate labels.
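The heart of steps 1 and 2 fits in a few lines of prometheus.yml. This is a minimal sketch, not the agent's actual config: the hostnames and job names are illustrative, and Node Exporter's default port 9100 and Prometheus's 9090 are assumed.

```yaml
# Minimal prometheus.yml sketch (target names are illustrative).
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "node"
    static_configs:
      - targets:
          - "hypervisor.lan:9100"    # Node Exporter on the host itself
          - "agent-1.lan:9100"       # one entry per container
          - "nvr.lan:9100"
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]  # Prometheus scraping itself
```

Once every target reports UP on the Prometheus targets page, Grafana only needs this one instance as a data source.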

The whole thing was done in a single session. I stepped in twice — once to upgrade the agent’s API token permissions so it could provision containers, and once to help work around the SSH issue on the monitoring box. Both times, the agent told me exactly what to type. I copied and pasted its suggested commands into the shell. I didn’t need to think about what I was running — though I’ll admit I read each one before hitting enter. Old habits. Everything else was autonomous. The agent used the hypervisor API for container management and direct SSH for in-container work.


The SSH Problem

Not everything was smooth.

SSH into the monitoring container turned out to be unreliable — some interaction between authentication methods and DNS resolution that made connections flaky. Rather than spending hours debugging a peripheral issue, the agent documented the workaround: use the hypervisor’s exec command to shell into the container, or manage Grafana through its API.
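The second half of that workaround, driving Grafana through its HTTP API rather than a shell, can be sketched like this. The host, port, and token value are illustrative assumptions; Grafana service-account tokens go in a standard Bearer header.

```python
# Hypothetical sketch of the SSH workaround: manage Grafana over HTTP
# instead of shelling in. Host, port, and token are illustrative.
import urllib.request

def grafana_request(path: str, token: str, base: str = "http://monitoring.lan:3000"):
    """Build an authenticated request against Grafana's HTTP API."""
    req = urllib.request.Request(base + path)
    req.add_header("Authorization", f"Bearer {token}")  # service-account token
    return req

# e.g. list all dashboards without ever touching SSH
req = grafana_request("/api/search?type=dash-db", "glsa_example_token")
```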

This is the kind of pragmatic decision I appreciate. It didn’t block on a perfect solution. It found a path that works, documented why, and moved on.


What We Can See Now

[Screenshot: Grafana dashboard showing CPU, memory, and system metrics across the homelab]

The difference between “probably fine” and actual visibility is stark.

From a single Grafana dashboard, I can see CPU, memory, disk, and network across every container and the host. I can see that the NVR is chewing through its dedicated storage volume. I can see that the OpenClaw agents are surprisingly modest in their resource usage. I can see the host’s load patterns throughout the day.

And with Loki, I can search agent logs. What did the agent do at 3am? Pull up the logs. Did the backup job succeed? Check the structured output. When something goes wrong at 2am, I don’t have to SSH into boxes and grep through files — it’s all in one place.


What This Actually Demonstrates

I want to be careful about claims here. This isn’t artificial general intelligence. It’s not even particularly novel — plenty of DevOps engineers could set this up in an afternoon.

But that’s exactly the point.

The agent did competent infrastructure work. It assessed an environment, made reasonable tool choices, handled a deployment across multiple machines, worked around problems, and documented everything. It didn’t need a playbook. It needed a goal.

The shift isn’t from “impossible” to “possible.” It’s from “I need to find an afternoon” to “it’s already done.” For a solo operator managing a homelab — or a small team managing production infrastructure — that shift matters.


The Trust Gradient

There’s a spectrum here that I’ve been thinking about more and more:

  1. Agent follows exact instructions — basically a script with better error handling
  2. Agent makes tactical decisions — chooses tools, handles edge cases
  3. Agent makes strategic decisions — decides what needs doing, not just how

This project sat firmly at level two. I provided the strategic direction (we need monitoring), and the agent handled the tactics (which tools, which targets, which configuration). I didn’t have to review Prometheus scrape configs or debate Loki retention periods.

Level three is where it gets genuinely interesting. An agent that notices its own observability gap and proposes filling it. Looking beyond infra and systems, Alex Finn claims that his agent fleet has been proactively driving revenue growth for his business — as he put it on X: “My autonomous agent has now shipped multiple new features to Creator Buddy by itself”.


Cost

The monitoring container uses about 2GB of RAM and minimal CPU. Prometheus, Grafana, Loki, and Promtail are all free, open-source software. The only real cost is the disk space for metrics retention (90 days) and log storage.
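For what it's worth, the 90-day figure maps to a small Loki config fragment. This is a sketch of one way to express it (retention in Loki is enforced by the compactor); the paths are illustrative and I'm assuming filesystem storage:

```yaml
# Loki retention sketch: 90 days = 2160h. Paths and storage are assumptions.
limits_config:
  retention_period: 2160h

compactor:
  working_directory: /loki/compactor
  retention_enabled: true          # compactor actually deletes expired chunks
  delete_request_store: filesystem
```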

On a homelab that was already running, the marginal cost is effectively zero.


What I’d Do Differently

Honestly? Not much. But there are things I want to add:

  • Alerting rules — Grafana can push notifications when thresholds are breached. Right now we’re dashboard-only.
  • Cross-agent health checks — each agent monitoring the other, so there’s no single point of failure in the observation layer.
  • Trend analysis — the data’s there. Feeding it back to the agents for capacity planning is an obvious next step.
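The first item on that list is a small lift from here. One way to do it is a Prometheus rule file that Grafana (or Alertmanager) can route to a notifier; the threshold, duration, and group name below are illustrative, but the metric names are standard Node Exporter output.

```yaml
# One possible alerting rule (threshold and timings are illustrative).
groups:
  - name: homelab
    rules:
      - alert: HostLowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% disk left on {{ $labels.instance }}"
```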

The Bigger Picture

If this works for a solo homelab operator, imagine what it means for teams stuck in month-long implementation cycles.

The barriers between intention and execution are falling fast. But autonomy without accountability is just risk with extra steps. The challenge ahead isn’t whether to trust agents — it’s how we introduce governance and guardrails without stifling the autonomy that makes them useful in the first place.

What becomes the bottleneck when the doing happens at the speed of thinking?


Agentically co-authored.