My Bare-Metal Kubernetes Home Lab

Tags: kubernetes, homelab, devops, selfhosted

Running Kubernetes in a cloud provider is easy. Running it on your own hardware at home, with real networking constraints, real storage, and real TLS certificates — that's where you actually learn how it works.

This is a rundown of the cluster I've been running and iterating on for the past year.

The Cluster

Three nodes, all bare metal:

  • k8s-master — control plane
  • k8s-worker1 and k8s-worker2 — worker nodes

All three run Kubernetes v1.29, provisioned with kubeadm, on Ubuntu as the base OS.

Networking

  • CNI: Flannel for pod networking
  • LoadBalancer: MetalLB — assigns real IP addresses from my local subnet so services get actual IPs, not just NodePort hacks
  • Ingress: nginx-ingress-controller handles all inbound HTTP/HTTPS routing

This means I can hit any service at a real hostname on my network, with proper routing.
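As a rough illustration of the MetalLB piece, here is a minimal sketch of an address pool plus a layer 2 advertisement. The resource names, the address range, and the choice of layer 2 mode are assumptions for the example, not the values from my cluster.

```yaml
# Sketch only: names and the address range are hypothetical.
# Pick a range on your subnet that sits outside the DHCP scope.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lan-pool                      # hypothetical pool name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250     # assumed free range on the local subnet
---
# Announce addresses from the pool on the LAN (layer 2 mode is assumed here).
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lan-l2                        # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
    - lan-pool
```

With something like this in place, any Service of type LoadBalancer (the ingress controller's Service, for example) is handed an address from the pool.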

TLS Certificates

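cert-manager issues the certificates, using a Let's Encrypt ClusterIssuer with DNS-01 validation through Cloudflare. Every service gets a real, trusted HTTPS certificate automatically, including wildcard certs for *.rfox.net (Let's Encrypt only issues wildcards via DNS-01, which is why the Cloudflare integration matters). No self-signed certs, no browser warnings.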
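For readers who want to see the shape of this setup, a minimal sketch follows. The issuer name, email address, Secret names, and namespace are placeholders I made up for the example; only the *.rfox.net wildcard comes from the actual cluster.

```yaml
# Sketch only: names, email, and Secret references are hypothetical.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod                    # hypothetical issuer name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com                # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key    # ACME account key is stored here
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token    # Secret in the cert-manager namespace
              key: api-token
---
# Request a wildcard certificate from the issuer above.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-rfox-net
  namespace: ingress-nginx                  # assumed namespace
spec:
  secretName: wildcard-rfox-net-tls         # resulting TLS Secret
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - "*.rfox.net"
```

The Cloudflare API token needs permission to edit DNS records for the zone, since cert-manager creates and removes the TXT records used by the DNS-01 challenge.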

Storage

Longhorn for distributed block storage. It replicates volumes across nodes, which means if a worker goes down I don't lose data. It also has a solid UI for volume management and snapshots.
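To make the replication concrete, here is a minimal sketch of a Longhorn-backed StorageClass and a claim that uses it. The class name, claim name, size, and the replica count of two (one per worker) are assumptions for illustration.

```yaml
# Sketch only: names, size, and replica count are hypothetical.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated         # hypothetical class name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  numberOfReplicas: "2"             # assumed: one replica on each worker
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: homebox-data                # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-replicated
  resources:
    requests:
      storage: 5Gi
```

With two replicas on different nodes, losing one worker leaves a healthy copy that Longhorn can attach elsewhere and rebuild from.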

What's Running

Some of the services I self-host on the cluster:

  • AWX (the open-source upstream of Ansible Tower) — automation and playbook management
  • Home Assistant — smart home control
  • Excalidraw — local diagramming
  • Homebox — home inventory management
  • Minecraft Bedrock — family server
  • Kubernetes Dashboard — cluster visibility
  • Ollama — local LLM inference

Plus the monitoring stack: Prometheus, Grafana, and New Relic for APM and log aggregation.

Observability

New Relic collects logs from all pods and gives me a dashboard across the cluster. For metrics I use both Prometheus/Grafana (for in-cluster dashboards) and New Relic's Kubernetes integration. metrics-server is also running for kubectl top support.

CI/CD

GitHub Actions builds container images, pushes them to GHCR (GitHub Container Registry), and then deploys to the cluster using kubectl. For static sites I serve the content from ConfigMaps mounted into a stock nginx container, so no image builds are needed.
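A pipeline along these lines could look roughly like the workflow below. It is a sketch under several assumptions: the deploy job runs on a self-hosted runner inside the home network that already has kubectl and a working kubeconfig, the Deployment and container are both called my-app (a hypothetical name), and the repository owner and name are lowercase, which image references require.

```yaml
# Sketch only: job layout, runner setup, and my-app are hypothetical.
name: build-and-deploy
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write          # lets GITHUB_TOKEN push to GHCR
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          # Tag by commit SHA so every deploy points at an exact image.
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}

  deploy:
    needs: build
    runs-on: self-hosted       # assumed runner with kubectl and cluster access
    steps:
      - name: Roll the Deployment to the new image
        run: |
          kubectl set image deployment/my-app my-app=ghcr.io/${{ github.repository }}:${{ github.sha }}
          kubectl rollout status deployment/my-app --timeout=120s
```

Tagging with the commit SHA instead of latest means the Deployment spec actually changes on each push, so Kubernetes performs a rollout without extra tricks.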

Lessons Learned

Flannel restart storms are real. The kube-flannel pods accumulate restarts over the life of the cluster — this is usually harmless but worth watching.

cert-manager-cainjector is chatty. It restarts more than it should; it's a known issue. As long as certificates are issuing fine, it's noise.

Longhorn needs dedicated disk IOPS. Running storage replicas on the same drives as the OS creates contention under load. Separate disks help; a separate partition on the same drive at least keeps Longhorn from filling the OS filesystem, but it still shares that drive's IOPS.

MetalLB + Cloudflare Proxy = home.rfox.net. I point all public subdomains to a single Cloudflare-proxied IP that routes to the cluster. This keeps real IPs hidden and gives me DDoS protection for free.
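Routing many subdomains through one IP works because the ingress controller dispatches on the Host header. A minimal sketch of one such route is below; the hostname draw.rfox.net, the namespace, the Service name and port, and the issuer name letsencrypt-prod are all hypothetical stand-ins.

```yaml
# Sketch only: hostname, namespace, Service, and issuer name are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: excalidraw
  namespace: excalidraw
  annotations:
    # Ask cert-manager to issue the certificate for the hosts listed under tls.
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - draw.rfox.net
      secretName: draw-rfox-net-tls
  rules:
    - host: draw.rfox.net        # matched against the request's Host header
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: excalidraw
                port:
                  number: 80
```

Each additional service gets its own Ingress with a different host, while the Cloudflare record and the cluster-facing IP stay the same.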

What's Next

  • ArgoCD for proper GitOps instead of manual kubectl apply
  • Automated backups of Longhorn volumes to S3
  • More AI/LLM workloads on Ollama as local models improve

The home lab is never finished — that's kind of the point.