My Bare-Metal Kubernetes Home Lab
Running Kubernetes in a cloud provider is easy. Running it on your own hardware at home, with real networking constraints, real storage, and real TLS certificates — that's where you actually learn how it works.
This is a rundown of the cluster I've been running and iterating on for the past year.
The Cluster
Three nodes, all bare metal:
- k8s-master — control plane
- k8s-worker1 and k8s-worker2 — worker nodes
All three run Kubernetes v1.29, provisioned with kubeadm, on Ubuntu.
Networking
- CNI: Flannel for pod networking
- LoadBalancer: MetalLB — assigns real IP addresses from my local subnet so services get actual IPs, not just NodePort hacks
- Ingress: nginx-ingress-controller handles all inbound HTTP/HTTPS routing
This means I can hit any service at a real hostname on my network, with proper routing.
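To make the MetalLB and ingress pieces concrete, here is a minimal sketch that builds the relevant manifests as Python dicts and prints them as JSON, which kubectl accepts. The address range, the pool name, the hostname, and the service name and port are all placeholders I made up for illustration, and it assumes MetalLB is running in L2 mode and that the ingress class is named nginx.

```python
import json


def metallb_pool(name, address_range, namespace="metallb-system"):
    """Build an IPAddressPool and an L2Advertisement that announces it."""
    pool = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "IPAddressPool",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"addresses": [address_range]},
    }
    advert = {
        "apiVersion": "metallb.io/v1beta1",
        "kind": "L2Advertisement",
        "metadata": {"name": f"{name}-l2", "namespace": namespace},
        "spec": {"ipAddressPools": [name]},
    }
    return [pool, advert]


def ingress(name, host, service, port, tls_secret, namespace="default"):
    """Build an Ingress that routes one hostname to one Service over HTTPS."""
    backend = {"service": {"name": service, "port": {"number": port}}}
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "ingressClassName": "nginx",  # assumed class name
            "tls": [{"hosts": [host], "secretName": tls_secret}],
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {"path": "/", "pathType": "Prefix", "backend": backend}
                        ]
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder values: pick a range outside your DHCP scope.
    items = metallb_pool("home-pool", "192.168.1.240-192.168.1.250")
    items.append(
        ingress("excalidraw", "excalidraw.rfox.net", "excalidraw", 80, "excalidraw-tls")
    )
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Saved under a hypothetical name such as manifests.py, the output can be piped straight into the cluster with `python manifests.py | kubectl apply -f -`.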
TLS Certificates
cert-manager with a Let's Encrypt ClusterIssuer and Cloudflare DNS validation. Every service gets a real, trusted HTTPS certificate automatically — including wildcard certs for *.rfox.net. No self-signed certs, no browser warnings.
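For reference, a ClusterIssuer and wildcard Certificate along these lines could be generated as below. This is a sketch, not a copy of the real config: the issuer name, email address, and Secret names are placeholders. Wildcard certificates from Let's Encrypt require the DNS-01 challenge, which is why the Cloudflare solver is used here rather than HTTP-01.

```python
import json

LETSENCRYPT_PROD = "https://acme-v02.api.letsencrypt.org/directory"


def cluster_issuer(name, email, token_secret, token_key="api-token"):
    """Build an ACME ClusterIssuer that solves DNS-01 through Cloudflare.

    token_secret is the name of a Secret holding a Cloudflare API token.
    For a ClusterIssuer it has to live in cert-manager's own namespace.
    """
    solver = {
        "dns01": {
            "cloudflare": {
                "apiTokenSecretRef": {"name": token_secret, "key": token_key}
            }
        }
    }
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "ClusterIssuer",
        "metadata": {"name": name},
        "spec": {
            "acme": {
                "server": LETSENCRYPT_PROD,
                "email": email,
                "privateKeySecretRef": {"name": f"{name}-account-key"},
                "solvers": [solver],
            }
        },
    }


def wildcard_certificate(name, namespace, domain, issuer):
    """Build a Certificate for *.domain signed by the given ClusterIssuer."""
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "secretName": f"{name}-tls",
            "dnsNames": [f"*.{domain}"],
            "issuerRef": {"name": issuer, "kind": "ClusterIssuer"},
        },
    }


if __name__ == "__main__":
    # All names and the email address are placeholders.
    issuer = cluster_issuer("letsencrypt-prod", "admin@example.com", "cloudflare-api-token")
    cert = wildcard_certificate("wildcard-rfox-net", "default", "rfox.net", "letsencrypt-prod")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [issuer, cert]}, indent=2))
```

While testing, it is worth pointing the issuer at the Let's Encrypt staging endpoint first, since the production endpoint is rate limited.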
Storage
Longhorn for distributed block storage. It replicates volumes across nodes, which means if a worker goes down I don't lose data. It also has a solid UI for volume management and snapshots.
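The replication is controlled per StorageClass. The sketch below builds a Longhorn StorageClass with an explicit replica count and a claim that uses it. The class name, the claim name and size, and the choice of two replicas (one per worker) are assumptions for illustration, not the settings from this cluster.

```python
import json


def longhorn_storage_class(name, replicas):
    """Build a StorageClass that asks Longhorn for a fixed number of replicas."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "driver.longhorn.io",
        "allowVolumeExpansion": True,
        "reclaimPolicy": "Delete",
        # StorageClass parameters are always strings, even for numbers.
        "parameters": {"numberOfReplicas": str(replicas)},
    }


def volume_claim(name, namespace, storage_class, size):
    """Build a ReadWriteOnce PersistentVolumeClaim against a StorageClass."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical names; two replicas assumes one per worker node.
    items = [
        longhorn_storage_class("longhorn-2r", 2),
        volume_claim("homebox-data", "default", "longhorn-2r", "5Gi"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Asking for more replicas than there are nodes with schedulable Longhorn disks leaves the volume degraded, so the count should match the number of storage nodes.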
What's Running
Some of the services I self-host on the cluster:
- AWX (the open-source upstream of Ansible Tower) — automation and playbook management
- Home Assistant — smart home control
- Excalidraw — local diagramming
- Homebox — home inventory management
- Minecraft Bedrock — family server
- Kubernetes Dashboard — cluster visibility
- Ollama — local LLM inference
Plus the monitoring stack: Prometheus, Grafana, and New Relic for APM and log aggregation.
Observability
New Relic collects logs from all pods and gives me a dashboard across the cluster. For metrics I use both Prometheus/Grafana (for in-cluster dashboards) and New Relic's Kubernetes integration. metrics-server is also running for kubectl top support.
CI/CD
GitHub Actions builds and pushes container images to GHCR (GitHub Container Registry), then deploys to the cluster using kubectl. For static sites I use nginx ConfigMaps — no image builds needed.
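The ConfigMap approach for static sites can be sketched roughly like this: the site's files go into a ConfigMap, and a stock nginx container mounts it as its web root. The site name, namespace, and the nginx:stable image tag are assumptions. The size check is approximate and only there because a ConfigMap cannot hold more than 1 MiB of data.

```python
import json

MAX_CONFIGMAP_BYTES = 1024 * 1024  # ConfigMaps are capped at 1 MiB


def static_site_configmap(name, namespace, files):
    """Build a ConfigMap from a {filename: contents} mapping.

    Keys become file names, so they cannot contain slashes; this only
    works for a flat directory of small text files.
    """
    size = sum(len(body.encode("utf-8")) for body in files.values())
    if size > MAX_CONFIGMAP_BYTES:
        raise ValueError(f"site is {size} bytes, too large for a ConfigMap")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": dict(files),
    }


def static_site_deployment(name, namespace, configmap):
    """Build a Deployment that serves the ConfigMap from nginx's web root."""
    labels = {"app": name}
    container = {
        "name": "nginx",
        "image": "nginx:stable",  # assumed tag
        "ports": [{"containerPort": 80}],
        "volumeMounts": [
            {"name": "site", "mountPath": "/usr/share/nginx/html", "readOnly": True}
        ],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    "volumes": [{"name": "site", "configMap": {"name": configmap}}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical site with a single page.
    files = {"index.html": "<h1>Hello from the home lab</h1>\n"}
    items = [
        static_site_configmap("blog-content", "default", files),
        static_site_deployment("blog", "default", "blog-content"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

In a GitHub Actions job, the deploy step would then be a single `kubectl apply -f -` fed from this script's output, with no image build involved.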
Lessons Learned
Flannel restart storms are real. The kube-flannel pods accumulate restarts over the life of the cluster — this is usually harmless but worth watching.
cert-manager-cainjector is chatty. It restarts more than it should; it's a known issue. As long as certificates are issuing fine, it's noise.
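A quick way to keep an eye on restart counts like these is to total them per pod from kubectl's JSON output. The sketch below does that; the function names and the threshold are my own choices, and fetch_pods assumes kubectl is on the PATH and pointed at the cluster.

```python
import json
import subprocess


def restart_counts(pod_list):
    """Map (namespace, pod name) to total container restarts.

    pod_list is the parsed output of `kubectl get pods -o json`.
    Pods with no container statuses yet count as zero restarts.
    """
    counts = {}
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        statuses = pod.get("status", {}).get("containerStatuses", [])
        total = sum(status.get("restartCount", 0) for status in statuses)
        counts[(meta["namespace"], meta["name"])] = total
    return counts


def noisy_pods(pod_list, threshold):
    """Return (namespace, name, restarts) for pods at or above the threshold,
    sorted with the most-restarted pod first."""
    hits = [
        (namespace, name, restarts)
        for (namespace, name), restarts in restart_counts(pod_list).items()
        if restarts >= threshold
    ]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def fetch_pods():
    """Fetch all pods in all namespaces via kubectl and parse the JSON."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `noisy_pods(fetch_pods(), 10)` lists every pod with ten or more restarts. A count that is high but flat is background noise; one that keeps climbing between runs deserves a look at the pod's logs.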
Longhorn needs dedicated disk IOPS. Running storage replicas on the same drives as the OS creates contention under load. Separate disks (or at least separate partitions) help.
MetalLB + Cloudflare Proxy = home.rfox.net. I point all public subdomains to a single Cloudflare-proxied IP that routes to the cluster. This keeps real IPs hidden and gives me DDoS protection for free.
What's Next
- ArgoCD for proper GitOps instead of manual kubectl apply
- Automated backups of Longhorn volumes to S3
- More AI/LLM workloads on Ollama as local models improve
The home lab is never finished — that's kind of the point.