A self-tuning homelab
How I built a self-tuning Kubernetes homelab workflow
In this article series (3 articles):
- 01 Building a homelab Kubernetes cluster
- 02 A self-tuning homelab (this article)
- 03 How I access the services I self-host in my cluster
Every now and then, I get a pull request like this one.

The PR usually changes a few Kubernetes resource requests and limits in the HelmRelease manifests. The description tells me why each change was picked: which containers look underfed, which ones are sitting on capacity they do not use, and whether the proposed state still fits on the nodes I actually have. All of that happens automatically. I review it, merge it, and Flux rolls it out.
In this post, I will go through how I achieve this.
Resource constraints
My home cluster is not large. It consists of a Lenovo m720q mini PC and a Raspberry Pi 4, which together run about 20-30 services. That gives me enough room to run the things I care about, but not enough room to be casual about resource requests.
Kubernetes resource sizing is usually a guess at first. I know roughly what an app does, I give it a round number, and I move on. After a while the cluster drifts. Some pods sit on request budget they do not use. Others are too tight and end up throttled or OOM-killed at the worst possible time. In a cloud setup, the lazy answer is to add more nodes. In a small homelab, the ceiling is the ceiling.
What I built
I already had Prometheus running, so I added a Python tool called resource-advisor. In a nutshell, it consists of two Kubernetes CronJobs. A daily job reads Prometheus data and writes recommendations into a ConfigMap. A weekly job then takes that report, checks whether the proposed changes still fit on the cluster, patches the HelmRelease files, and opens a GitHub PR. Kustomize packages the script into a ConfigMap, a stock python:3.12-alpine container runs it, and the rest is scheduling plus guardrails.
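The handoff between the two CronJobs is just data in a ConfigMap. As a rough sketch of what that handoff could look like, the daily job might serialize its recommendations as JSON inside a ConfigMap manifest, and the weekly job parses it back. The ConfigMap name, the report.json key, and the record fields are hypothetical, not resource-advisor's actual schema.

```python
import json
from datetime import datetime, timezone

REPORT_NAME = "resource-advisor-report"  # hypothetical ConfigMap name
REPORT_KEY = "report.json"               # hypothetical data key


def build_report_configmap(recommendations, namespace, now=None):
    """Wrap the daily recommendations in a ConfigMap manifest (a plain dict)."""
    now = now or datetime.now(timezone.utc)
    report = {"generated_at": now.isoformat(), "recommendations": recommendations}
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": REPORT_NAME, "namespace": namespace},
        # ConfigMap data values must be strings, so the report is stored as JSON.
        "data": {REPORT_KEY: json.dumps(report, sort_keys=True)},
    }


def read_report(configmap):
    """What the weekly job would do with the ConfigMap it reads back."""
    return json.loads(configmap["data"][REPORT_KEY])
```

Applying the manifest (with kubectl or a Kubernetes client) is left out. The useful property is that the weekly job only depends on the JSON inside the ConfigMap, not on how the daily job produced it.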
Reading the last couple of weeks
Collecting metrics
Collecting resource metrics from Prometheus comes down to building a quantile_over_time PromQL query for each container.
# p95 of per-second CPU usage (in cores), evaluated as a subquery over the window
cpu_query = (
    f'quantile_over_time(0.95, rate(container_cpu_usage_seconds_total'
    f'{{namespace="{namespace}",pod=~"{pod_regex}",'
    f'container="{container_name}",image!=""}}[5m])'
    f'[{metrics_window}:{metrics_resolution}])'
)
# p95 of working-set memory (in bytes) over the same window
mem_query = (
    f'quantile_over_time(0.95, container_memory_working_set_bytes'
    f'{{namespace="{namespace}",pod=~"{pod_regex}",'
    f'container="{container_name}",image!=""}}'
    f'[{metrics_window}:{metrics_resolution}])'
)
I use p95 instead of averages because averages lie in a very polite way. A service like Sonarr can sit idle for most of the day, then spike during an indexer sync. The average makes that look cheap. The p95 is closer to the shape I care about: what does this thing need when it is actually doing work?
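A quick worked example makes the gap concrete. The samples below are hypothetical: a container that idles at 10m CPU for 90% of the window and runs at 500m for the other 10%. The quantile function uses linear interpolation between the closest ranks, which to my understanding mirrors how Prometheus computes quantile_over_time.

```python
import math
import statistics


def prom_quantile(q, values):
    """Quantile with linear interpolation between the two closest ranks."""
    s = sorted(values)
    rank = q * (len(s) - 1)
    lower = math.floor(rank)
    upper = min(lower + 1, len(s) - 1)
    weight = rank - lower
    return s[lower] * (1 - weight) + s[upper] * weight


# Hypothetical CPU samples in millicores: idle most of the time, busy 10% of it.
samples = [10] * 90 + [500] * 10
print(statistics.fmean(samples))            # 59.0
print(round(prom_quantile(0.95, samples)))  # 500
```

A request sized from the mean (59m plus a buffer) would leave the busy periods badly throttled, while the p95 reflects what the container needs while it is actually working.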
The script also tracks restart counts over the same window. That is important because a container that recently restarted is not something I want to “optimize” downward.
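One common source for restart counts is kube-state-metrics. A sketch of the matching query, assuming kube-state-metrics is installed and exports kube_pod_container_status_restarts_total with the usual namespace, pod, and container labels. restart_query is a hypothetical helper, and its 14-day default mirrors the lookback window described in this article.

```python
def restart_query(namespace, pod_regex, container_name, metrics_window="14d"):
    """PromQL for total restarts of one container across matching pods."""
    return (
        f'sum(increase(kube_pod_container_status_restarts_total'
        f'{{namespace="{namespace}",pod=~"{pod_regex}",'
        f'container="{container_name}"}}[{metrics_window}]))'
    )
```

increase() extrapolates at the edges of the window, so the result can be fractional. For a guard like the one described here, any value above zero counts as a restart.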
Recommendations
Recommendations are pretty straightforward. Start with observed usage, add a buffer, and do not let one run move too far.
# Target = p95 usage + 30% buffer for requests
target_req_cpu = max(min_cpu_m, cpu_p95_m * 1.30)
target_req_mem = max(min_mem_mi, mem_p95_mi * 1.30)
# Limits get a wider buffer (60%) and a floor relative to requests
target_lim_cpu = max(target_req_cpu * 2.0, cpu_p95_m * 1.60)
target_lim_mem = max(target_req_mem * 1.5, mem_p95_mi * 1.60)
# Cap the per-run adjustment to 25% of current value
rec_req_cpu = recommend(cur_req_cpu, target_req_cpu, max_step_percent=25)
The recommend function clamps the target to within max_step_percent of the current value, in either direction:
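def clamp(value, low, high):
    # Keep value inside the closed interval [low, high].
    return max(low, min(value, high))

def recommend(current, target, max_step_percent):
    # Nothing set yet: take the target as-is.
    if current <= 0:
        return target
    step = max_step_percent / 100.0
    # Move at most +/- max_step_percent away from the current value per run.
    low = current * (1.0 - step)
    high = current * (1.0 + step)
    return clamp(target, low, high)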
So a container at 100m CPU with a 200m target does not jump straight to 200m. The first weekly run recommends 125m. The next run can move again if the data still supports it. I like that slower movement because the metrics need time to settle after a change lands.
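Running that clamp week after week shows how quickly a value converges. This sketch re-declares recommend with the clamp inlined so it runs on its own. weekly_path is a hypothetical helper, and it assumes the 200m target holds steady across runs, which real data will not guarantee.

```python
def recommend(current, target, max_step_percent=25):
    """Move toward target, but at most max_step_percent away from current."""
    if current <= 0:
        return target
    step = max_step_percent / 100.0
    low, high = current * (1.0 - step), current * (1.0 + step)
    return max(low, min(target, high))


def weekly_path(start, target, runs):
    """Value after each of `runs` weekly recommendations with a fixed target."""
    path, value = [], start
    for _ in range(runs):
        value = recommend(value, target)
        path.append(float(value))
    return path


print(weekly_path(100, 200, 4))  # [125.0, 156.25, 195.3125, 200.0]
```

In this idealized case, a 100m -> 200m change takes four weekly runs to land, and each intermediate value gets a week of fresh metrics before the next step.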
Small differences are filtered out. If a recommendation is less than 10% away from the current value, or less than 25m CPU / 64Mi memory in absolute terms, it gets ignored. Otherwise the system would happily open PRs that change a service from 47m to 52m, which is technically precise and completely useless to review.
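The noise filter fits in a single predicate. A minimal sketch, assuming the relative threshold is measured against the current value and that a change has to clear both thresholds to survive, which is how the rule above reads. is_significant and the unit keys are hypothetical names.

```python
MIN_RELATIVE_CHANGE = 0.10  # ignore changes smaller than 10% of the current value
MIN_ABSOLUTE_CHANGE = {"cpu_m": 25, "mem_mi": 64}  # 25m CPU, 64Mi memory


def is_significant(current, recommended, resource):
    """Return True if a recommendation is worth putting in a PR."""
    delta = abs(recommended - current)
    if delta < MIN_ABSOLUTE_CHANGE[resource]:
        return False
    if current > 0 and delta / current < MIN_RELATIVE_CHANGE:
        return False
    return True


print(is_significant(47, 52, "cpu_m"))       # False: only 5m apart
print(is_significant(100, 125, "cpu_m"))     # True: 25m and 25%
print(is_significant(1024, 1080, "mem_mi"))  # False: 56Mi is under 64Mi
```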
Guardrails
I also built in a few guardrails.
Restarts are the obvious example. If a container restarted during the lookback window, the script refuses to reduce memory for it:
if restart_lookback > 0:
    if rec_req_mem < cur_req_mem:
        rec_req_mem = cur_req_mem
    if rec_lim_mem < cur_lim_mem:
        rec_lim_mem = cur_lim_mem
That rule is intentionally blunt. A restart might have nothing to do with memory, but if it might be an OOM kill, shaving memory is the wrong bet. CPU is less scary, so the script can still adjust it.
I also treat a few workloads as bursty. Jellyfin transcoding and Immich machine-learning jobs spend plenty of time looking idle, but their idle shape is not the shape I care about. They can still be upsized; they just do not get downsized because they happened to have a quiet week.
New deployments get similar caution. Normal upsizes and downsizes need the full 14 days of data. The exception is memory upsizing after restarts, because waiting two more weeks for a pod that keeps dying is not discipline. It is just delay.
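Taken together, the restart, bursty, and new-deployment rules decide which directions a recommendation may move. This sketch folds them into one hypothetical function. The BURSTY container names, the argument names, and the rule order are assumptions chosen to match the behavior described above, not code from resource-advisor.

```python
BURSTY = {"jellyfin", "immich-machine-learning"}  # hypothetical container names
MIN_DATA_DAYS = 14


def apply_guardrails(container, resource, current, recommended, restarts, data_days):
    """Return the recommendation after guardrails, or `current` if blocked."""
    is_upsize = recommended > current
    is_downsize = recommended < current

    # Restart guard: never shrink memory after a restart in the window.
    if resource == "mem_mi" and restarts > 0 and is_downsize:
        return current
    # Bursty workloads may grow but never shrink.
    if container in BURSTY and is_downsize:
        return current
    # Young deployments need the full window, except memory upsizes after restarts.
    if data_days < MIN_DATA_DAYS:
        memory_rescue = resource == "mem_mi" and restarts > 0 and is_upsize
        if not memory_rescue:
            return current
    return recommended
```

Returning the current value rather than raising keeps the downstream logic simple: a blocked change becomes a zero-size change, which the noise filter then drops.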
Finally, the weekly PR stops at five changes. That keeps review boring in the good way. If something behaves differently after rollout, I only have a few suspects.
Node-fit simulation
My two nodes are not interchangeable. The mini PC has much more memory than the Pi, and most of the heavier workloads belong on the mini PC.
Before the script chooses what to put in a PR, it asks the Kubernetes API for live node capacity and current pod placement. Then it checks the projected request totals against a conservative budget: no more than 60% of allocatable CPU and 65% of allocatable memory, both for the cluster as a whole and for each node.
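def totals(projected_by_node):
    # Sum projected requests across all nodes.
    cpu = sum(n["cpu_m"] for n in projected_by_node.values())
    mem = sum(n["mem_mi"] for n in projected_by_node.values())
    return cpu, mem

def check_fit(projected_by_node):
    # cpu_budget_m, mem_budget_mi, node_alloc, node_cpu_budget and
    # node_mem_budget are derived beforehand from live node allocatable.
    total_cpu, total_mem = totals(projected_by_node)
    ok = True
    # Cluster-wide budget
    if total_cpu > cpu_budget_m or total_mem > mem_budget_mi:
        ok = False
    # Per-node budget
    for name in node_alloc:
        cpu = projected_by_node[name]["cpu_m"]
        mem = projected_by_node[name]["mem_mi"]
        if cpu > node_cpu_budget[name] or mem > node_mem_budget[name]:
            ok = False
    return ok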
That catches the cases where a change looks fine globally but makes one node too tight. The budget is conservative on purpose: Kubernetes still needs room for the kubelet, system daemons, and rollouts.
When an upsize does not fit, the planner looks for downsizes that would free enough room. It scores candidates by how much overage they remove, with extra weight for changes on the same node. If, for instance, Jellyfin needs more memory and the mini PC is already close to its budget, the PR might pair that upsize with safe reductions to other services like Radarr or Prowlarr. I wanted the PR to include the tradeoff context, not just say that more memory would be nice.
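The pairing step can be sketched as a greedy search. This is a minimal illustration, not the tool's planner. It assumes each candidate downsize records the node it runs on and how much CPU and memory it would free, measures overage as a fraction of each budget, and uses a hypothetical 1.5x weight for candidates on the tight node.

```python
SAME_NODE_WEIGHT = 1.5  # hypothetical bonus for freeing room on the tight node
RESOURCES = ("cpu_m", "mem_mi")


def overage(used, budget):
    """Normalized amount by which usage exceeds budget (0.0 when it fits)."""
    return sum(max(0.0, used[r] - budget[r]) / budget[r] for r in RESOURCES)


def total_overage(used, budgets):
    """Cluster-wide overage plus the overage of every individual node."""
    cluster_used = {r: sum(u[r] for u in used.values()) for r in RESOURCES}
    cluster_budget = {r: sum(b[r] for b in budgets.values()) for r in RESOURCES}
    per_node = sum(overage(used[n], budgets[n]) for n in used)
    return overage(cluster_used, cluster_budget) + per_node


def with_change(used, change):
    """Return a copy of `used` with one downsize's freed requests subtracted."""
    new = {n: dict(v) for n, v in used.items()}
    for r in RESOURCES:
        new[change["node"]][r] -= change[r]
    return new


def plan_offsets(tight_node, used, budgets, downsizes):
    """Greedily pick downsizes until everything fits; None if it never does.

    used: projected requests per node, already including the wanted upsize.
    downsizes: [{"name", "node", "cpu_m", "mem_mi"}], the amounts each frees.
    """
    remaining, chosen = list(downsizes), []
    while total_overage(used, budgets) > 0:
        current = total_overage(used, budgets)
        best, best_score = None, 0.0
        for change in remaining:
            gain = current - total_overage(with_change(used, change), budgets)
            weight = SAME_NODE_WEIGHT if change["node"] == tight_node else 1.0
            if gain * weight > best_score:
                best, best_score = change, gain * weight
        if best is None:
            return None  # nothing left helps, so the upsize has to wait
        remaining.remove(best)
        chosen.append(best)
        used = with_change(used, best)
    return chosen
```

Greedy selection is not guaranteed to find the smallest set of offsets, but with a handful of candidates and a five-change cap per PR it keeps the result easy to explain in the PR body.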
Patching and PR creation
Because almost everything in rangoonpulse is a Flux HelmRelease, the weekly job has only one place to touch: the YAML in Git. Resource requests and limits live under values.controllers.main.containers.<name>.resources, so the job creates a branch on GitHub, commits the patched files to it, and opens a PR.
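Once the YAML is parsed, the patch itself is a nested dictionary update. A minimal sketch, assuming the HelmRelease has already been loaded into a dict (for example with a YAML library, not shown) and that the chart values sit under spec.values, as they do in a Flux HelmRelease. The formatting helpers and patch_resources are hypothetical names.

```python
def format_cpu(millicores):
    """125.4 -> '125m'."""
    return f"{int(round(millicores))}m"


def format_memory(mebibytes):
    """409.6 -> '410Mi'."""
    return f"{int(round(mebibytes))}Mi"


def patch_resources(helmrelease, container, req_cpu_m, req_mem_mi,
                    lim_cpu_m, lim_mem_mi):
    """Update requests/limits of one container under the `main` controller.

    Mutates and returns the parsed HelmRelease dict. Other keys under
    requests/limits (for example a GPU limit) are left untouched.
    """
    values = helmrelease["spec"]["values"]
    # A KeyError here means the manifest does not follow the expected layout.
    container_spec = values["controllers"]["main"]["containers"][container]
    resources = container_spec.setdefault("resources", {})
    resources.setdefault("requests", {}).update(
        {"cpu": format_cpu(req_cpu_m), "memory": format_memory(req_mem_mi)})
    resources.setdefault("limits", {}).update(
        {"cpu": format_cpu(lim_cpu_m), "memory": format_memory(lim_mem_mi)})
    return helmrelease
```

Updating the existing requests and limits maps, instead of replacing them, avoids silently dropping extended resources that the advisor does not manage.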

The PR body includes the policy constraints, a node-fit table, the selected changes, and the candidates that were skipped. I wrote it for tired-maintenance-me, not for the script that generated it.
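A body like that is mostly string formatting. A minimal sketch that renders a node-fit table plus the selected and skipped changes as Markdown, which GitHub displays in the PR description. The field names and section titles are hypothetical.

```python
def render_pr_body(node_rows, changes, skipped):
    """Render a Markdown PR description.

    node_rows: [{"node", "cpu_m", "cpu_budget_m", "mem_mi", "mem_budget_mi"}]
    changes / skipped: [{"container", "resource", "current", "recommended", "reason"}]
    """
    lines = ["## Node fit", "",
             "| Node | CPU requests / budget | Memory requests / budget |",
             "|---|---|---|"]
    for r in node_rows:
        lines.append(f"| {r['node']} | {r['cpu_m']:.0f}m / {r['cpu_budget_m']:.0f}m "
                     f"| {r['mem_mi']:.0f}Mi / {r['mem_budget_mi']:.0f}Mi |")
    for title, rows in (("Selected changes", changes), ("Skipped", skipped)):
        lines += ["", f"## {title}", ""]
        if not rows:
            lines.append("_None_")
        for c in rows:
            lines.append(f"- `{c['container']}` {c['resource']}: "
                         f"{c['current']} -> {c['recommended']} ({c['reason']})")
    return "\n".join(lines)
```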

After I merge, Flux reconciles the new HelmRelease values and the pods roll out with the updated resources. The next daily report sees the new baseline, and the loop starts again.
Observability
There is also a small exporter deployment. It polls the report ConfigMap and serves a readable summary, along with Prometheus metrics for recommendation counts, upsize/downsize actions, and current request utilization as a percentage of allocatable capacity.

Those metrics feed a Grafana dashboard. I mostly care about the trend. If budget utilization keeps creeping up, I know I am running out of room before Kubernetes tells me in a more dramatic way.

Why not VPA?
Vertical Pod Autoscaler was the obvious thing to compare against, but it did not fit how I run this cluster.
In Auto mode, VPA restarts pods to apply changes, which is not great for stateful or interactive services. It also works at the workload level rather than planning around my cluster’s capacity budgets and node placement. Most importantly for this setup, it mutates live resources. That fights the GitOps model I use everywhere else.
I wanted the slower thing with a paper trail. A PR is less automatic than a controller, but it gives me the diff, the reason, and the option to say no.
Results
After a few weekly runs, the noisy parts became obvious. Some of the *arr services were carrying requests they never came close to using. Jellyfin and a smaller set of heavier services needed more care, and the restart guard caught those cases before the script tried to “optimize” them in the wrong direction. Because each change is capped, the values move over a few cycles instead of swinging around after one sample window.
The win is more boring than the name makes it sound: I now get a small PR before resource drift turns into a weekend cleanup.
The whole thing is one Python file, a few RBAC manifests, and two CronJobs. No operator, no CRDs, no extra platform to run. For this homelab, that is about the right amount of machinery.