My Bare-Metal Kubernetes Home Lab

Running Kubernetes on a cloud provider is easy. Running it on your own hardware at home, with real networking constraints, real storage, and real TLS certificates, is where you actually learn how it works.

This is a rundown of the cluster I've been running and iterating on for the past year.

The Cluster

Three nodes, all bare metal:

k8s-master — control plane
k8s-worker1 and k8s-worker2 — worker nodes

All three run Kubernetes v1.29, provisioned with kubeadm on Ubuntu.

Networking

CNI: Flannel for pod networking
LoadBalancer: MetalLB, which assigns addresses from my local subnet to Services of type LoadBalancer, so they get their own IPs instead of NodePort hacks
Ingress: nginx-ingress-controller handles all inbound HTTP/HTTPS routing

This means I can hit any service at a real hostname on my network, with proper routing.
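
MetalLB hands out addresses from a pool you configure, so that pool has to sit inside the LAN subnet. Here is a minimal Python sketch for sanity-checking a pool before applying it. The range 192.168.1.240-192.168.1.250 and the subnet 192.168.1.0/24 are hypothetical values chosen for illustration, not the ones used in this cluster.

```python
import ipaddress


def pool_addresses(start: str, end: str) -> list[str]:
    """Expand an inclusive start-end range into individual IPv4 addresses."""
    first = ipaddress.IPv4Address(start)
    last = ipaddress.IPv4Address(end)
    if last < first:
        raise ValueError("range end precedes range start")
    return [str(ipaddress.IPv4Address(i)) for i in range(int(first), int(last) + 1)]


def pool_fits_subnet(start: str, end: str, subnet: str) -> bool:
    """True if both ends of the range (and so everything between) are in the subnet."""
    net = ipaddress.ip_network(subnet)
    return ipaddress.IPv4Address(start) in net and ipaddress.IPv4Address(end) in net


if __name__ == "__main__":
    # Hypothetical pool and subnet, for illustration only.
    start, end, lan = "192.168.1.240", "192.168.1.250", "192.168.1.0/24"
    print(len(pool_addresses(start, end)), "addresses available")
    print("inside LAN subnet:", pool_fits_subnet(start, end, lan))
```

It is also worth keeping the pool outside the router's DHCP range so MetalLB and DHCP never hand out the same address.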

TLS Certificates

cert-manager with a Let's Encrypt ClusterIssuer, validated through Cloudflare DNS (the ACME DNS-01 challenge). Every service gets a real, trusted HTTPS certificate automatically, including wildcard certs for *.rfox.net, which Let's Encrypt only issues through DNS-01. No self-signed certs, no browser warnings.
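
For reference, here is a rough Python sketch of what a ClusterIssuer for this kind of setup can look like, emitted as JSON (which kubectl accepts as well as YAML). The issuer name, email address, and Secret name and key are hypothetical placeholders. The function defaults to the Let's Encrypt staging endpoint, which is a safer place to experiment because of production rate limits.

```python
import json

LETSENCRYPT_PROD = "https://acme-v02.api.letsencrypt.org/directory"
LETSENCRYPT_STAGING = "https://acme-staging-v02.api.letsencrypt.org/directory"


def cluster_issuer(name: str, email: str, token_secret: str,
                   token_key: str = "api-token", staging: bool = True) -> dict:
    """Build a cert-manager ClusterIssuer that solves DNS-01 via Cloudflare."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": LETSENCRYPT_STAGING if staging else LETSENCRYPT_PROD,
                "email": email,
                # Secret where cert-manager stores the ACME account key.
                "privateKeySecretRef": {"name": f"{name}-account-key"},
                "solvers": [{
                    "dns01": {
                        "cloudflare": {
                            # Secret holding a Cloudflare API token (hypothetical name).
                            "apiTokenSecretRef": {"name": token_secret, "key": token_key},
                        }
                    }
                }],
            }
        },
    }


if __name__ == "__main__":
    # All names below are placeholders, not the values used in this cluster.
    issuer = cluster_issuer("letsencrypt-dns", "admin@example.com", "cloudflare-api-token")
    print(json.dumps(issuer, indent=2))
```

The output can be piped straight into kubectl apply -f - once the referenced Secret exists in the cert-manager namespace.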

Storage

Longhorn for distributed block storage. It replicates volumes across nodes, which means if a worker goes down I don't lose data. It also has a solid UI for volume management and snapshots.
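
The replica count is the trade-off to think about here: each extra replica buys tolerance for one more node failure and costs a full copy of the data. A minimal Python sketch of that arithmetic follows, plus a StorageClass that pins the replica count. The disk sizes, replica count, and StorageClass name are hypothetical, and the capacity figure ignores Longhorn's reserved space and over-provisioning settings.

```python
import json


def tolerated_node_failures(replicas: int) -> int:
    """A volume stays available as long as at least one replica survives."""
    if replicas < 1:
        raise ValueError("a volume needs at least one replica")
    return replicas - 1


def usable_capacity_gib(raw_gib_per_node: float, nodes: int, replicas: int) -> float:
    """Rough usable space: every GiB stored is written once per replica."""
    return raw_gib_per_node * nodes / replicas


def longhorn_storage_class(name: str, replicas: int) -> dict:
    """StorageClass for the Longhorn CSI driver with a fixed replica count."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "driver.longhorn.io",
        # StorageClass parameters must be strings.
        "parameters": {"numberOfReplicas": str(replicas)},
    }


if __name__ == "__main__":
    # Hypothetical: two storage nodes with 500 GiB each, two replicas per volume.
    print("node failures tolerated:", tolerated_node_failures(2))
    print("usable GiB:", usable_capacity_gib(500, nodes=2, replicas=2))
    print(json.dumps(longhorn_storage_class("longhorn-2r", 2), indent=2))
```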

What's Running

Some of the services I self-host on the cluster:

AWX (the open-source upstream of Ansible Tower) — automation and playbook management
Home Assistant — smart home control
Excalidraw — local diagramming
Homebox — home inventory management
Minecraft Bedrock — family server
Kubernetes Dashboard — cluster visibility
Ollama — local LLM inference

Plus the monitoring stack: Prometheus, Grafana, and New Relic for APM and log aggregation.

Observability

New Relic collects logs from all pods and gives me a dashboard across the cluster. For metrics I use both Prometheus/Grafana (for in-cluster dashboards) and New Relic's Kubernetes integration. metrics-server is also running for kubectl top support.

CI/CD

GitHub Actions builds and pushes container images to GHCR (GitHub Container Registry), then deploys to the cluster using kubectl. For static sites I serve the content from ConfigMaps mounted into a stock nginx image, so no image builds are needed.
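
To make the static-site approach concrete, here is a minimal Python sketch that turns a handful of files into a ConfigMap manifest. It assumes the pattern is a ConfigMap mounted into an nginx container's web root. The ConfigMap name and file contents are hypothetical. Two constraints are worth encoding: ConfigMap keys cannot contain slashes, and the total data cannot exceed 1 MiB, so this only suits small, flat sites.

```python
import json
import re

MAX_CONFIGMAP_BYTES = 1024 * 1024  # ConfigMap data is limited to 1 MiB
VALID_KEY = re.compile(r"[-._a-zA-Z0-9]+")  # allowed characters in ConfigMap keys


def static_site_configmap(name: str, files: dict[str, str]) -> dict:
    """Build a ConfigMap whose keys are file names and values are file contents."""
    for key in files:
        if not VALID_KEY.fullmatch(key):
            raise ValueError(f"invalid ConfigMap key: {key!r}")
    total = sum(len(body.encode("utf-8")) for body in files.values())
    if total > MAX_CONFIGMAP_BYTES:
        raise ValueError(f"site is {total} bytes, over the 1 MiB ConfigMap limit")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": dict(files),
    }


if __name__ == "__main__":
    # Hypothetical site content, for illustration only.
    site = {
        "index.html": "<!doctype html><title>Home lab</title><h1>Hello</h1>",
        "style.css": "body { font-family: sans-serif; }",
    }
    print(json.dumps(static_site_configmap("static-site", site), indent=2))
```

Mounting the resulting ConfigMap as a volume at nginx's web root makes each key appear as a file, so updating the site is just a matter of applying a new ConfigMap.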

Lessons Learned

Flannel restart storms are real. The kube-flannel pods accumulate restarts over the life of the cluster — this is usually harmless but worth watching.

cert-manager-cainjector is chatty. It restarts more than it should; it's a known issue. As long as certificates are issuing fine, it's noise.
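
Restart counts like the ones in the two lessons above are easy to keep an eye on. Here is a minimal Python sketch that reads the JSON from kubectl get pods -A -o json and lists pods whose containers have restarted more than a threshold. The threshold of 10 and the pod names in the sample are hypothetical.

```python
import json


def restart_counts(pod_list: dict) -> dict[str, int]:
    """Map 'namespace/pod' to the total restart count across its containers."""
    counts = {}
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        statuses = pod.get("status", {}).get("containerStatuses", [])
        key = f'{meta["namespace"]}/{meta["name"]}'
        counts[key] = sum(s.get("restartCount", 0) for s in statuses)
    return counts


def noisy_pods(pod_list: dict, threshold: int = 10) -> list[tuple[str, int]]:
    """Pods at or above the threshold, most restarts first."""
    hits = [(k, v) for k, v in restart_counts(pod_list).items() if v >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Trimmed, made-up sample shaped like `kubectl get pods -A -o json` output.
    sample = json.loads("""
    {"items": [
      {"metadata": {"namespace": "kube-flannel", "name": "kube-flannel-ds-abcde"},
       "status": {"containerStatuses": [{"restartCount": 42}]}},
      {"metadata": {"namespace": "cert-manager", "name": "cert-manager-cainjector-xyz"},
       "status": {"containerStatuses": [{"restartCount": 17}]}},
      {"metadata": {"namespace": "default", "name": "homebox-0"},
       "status": {"containerStatuses": [{"restartCount": 1}]}}
    ]}
    """)
    for pod, restarts in noisy_pods(sample):
        print(f"{pod}: {restarts} restarts")
```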

Longhorn needs dedicated disk IOPS. Running storage replicas on the same drives as the OS creates contention under load. Separate disks help; separate partitions on the same drive isolate capacity but still share the drive's IOPS.

MetalLB + Cloudflare Proxy = home.rfox.net. I point all public subdomains to a single Cloudflare-proxied IP that routes to the cluster. This keeps real IPs hidden and gives me DDoS protection for free.

What's Next

Argo CD for proper GitOps instead of manual kubectl apply
Automated backups of Longhorn volumes to S3
More AI/LLM workloads on Ollama as local models improve

The home lab is never finished — that's kind of the point.