Kubernetes Resource Rightsizer

What this skill does

Combines kubectl top snapshots with VerticalPodAutoscaler recommendations to suggest right-sized CPU and memory requests/limits per workload. Output is a set of YAML patches (one per workload) and a savings estimate.

Inputs

namespace: target namespace.
kubeconfig: path to a kubeconfig with read access.
Optional window_minutes: how long to sample kubectl top (default 30, sampled every 30s).
Optional safety_margin: percent above observed p95 (default 25).
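
As a rough illustration of how these inputs relate, the sketch below bundles them with their stated defaults and derives the number of samples and the margin multiplier. The class name RightsizerInputs and the field sample_interval_seconds are hypothetical names chosen for this example; the skill itself only names the four inputs above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RightsizerInputs:
    """Hypothetical container for the skill inputs (names are illustrative)."""
    namespace: str
    kubeconfig: str
    window_minutes: int = 30            # default sampling window
    safety_margin: float = 25.0         # percent above observed p95
    sample_interval_seconds: int = 30   # assumed fixed at 30s per the Inputs text

    @property
    def expected_samples(self) -> int:
        # Number of kubectl top snapshots taken over the whole window.
        return self.window_minutes * 60 // self.sample_interval_seconds

    @property
    def margin_factor(self) -> float:
        # Multiplier applied to the usage basis, e.g. 25 -> 1.25.
        return 1 + self.safety_margin / 100
```

With the defaults, expected_samples is 60 and margin_factor is 1.25.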

Steps

Confirm metrics-server is reachable: kubectl top pods -n <namespace> must return rows.
Sample kubectl top pod -n <namespace> --containers --no-headers every 30s for window_minutes. Persist to /tmp/top-samples.tsv.
Compute per container: p50, p95, max for CPU (millicores) and memory (Mi).
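Fetch any existing VPA recommendations: kubectl get vpa -n <namespace> -o json and pull status.recommendation.containerRecommendations (the target values) from each VPA object.
For each container, propose the following (when no VPA target exists, use the observed p95 alone):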
- requests.cpu = max(p95_cpu, vpa_target_cpu) * (1 + safety_margin/100).
- requests.memory = max(p95_mem, vpa_target_mem) * (1 + safety_margin/100).
- limits.cpu = requests.cpu * 2 (or omit the CPU limit if the team deliberately runs without CPU limits so containers can burst).
- limits.memory = requests.memory * 1.5.
Round CPU to nearest 50m and memory to nearest 32Mi.
Read each Deployment/StatefulSet manifest currently in the cluster: kubectl get deploy,statefulset -n <namespace> -o yaml.
Diff existing requests/limits vs proposed; emit a YAML patch per workload.
Estimate savings: sum (current - proposed) requests across workloads and translate to node-cost savings using a configurable hourly cost.
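
The percentile and proposal steps above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation. It assumes each sample line has four whitespace-separated fields (pod, container, CPU, memory), uses a nearest-rank percentile (one of several common definitions), and never rounds below one step so a request cannot become zero. All function names are hypothetical.

```python
import math
from collections import defaultdict


def parse_cpu(value: str) -> float:
    """Convert a CPU value such as '250m' or '2' to millicores."""
    return float(value[:-1]) if value.endswith("m") else float(value) * 1000


def parse_mem(value: str) -> float:
    """Convert a memory value such as '128Mi' or '1Gi' to Mi."""
    units = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0}
    for suffix, factor in units.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value) / (1024 * 1024)  # assume plain bytes


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(lines):
    """Group sample lines by (pod, container) and compute p50, p95, max."""
    cpu, mem = defaultdict(list), defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip malformed or empty lines
        pod, container, cpu_raw, mem_raw = fields
        cpu[(pod, container)].append(parse_cpu(cpu_raw))
        mem[(pod, container)].append(parse_mem(mem_raw))

    def stats(values):
        return {"p50": percentile(values, 50),
                "p95": percentile(values, 95),
                "max": max(values)}

    return {key: {"cpu": stats(cpu[key]), "mem": stats(mem[key])} for key in cpu}


def round_to(value, step):
    """Round to the nearest multiple of step, but never below one step."""
    return max(step, int(value / step + 0.5) * step)


def propose(p95_cpu, p95_mem, vpa_cpu=0.0, vpa_mem=0.0, safety_margin=25):
    """Apply the request/limit formulas; pass 0 when no VPA target exists."""
    factor = 1 + safety_margin / 100
    req_cpu = round_to(max(p95_cpu, vpa_cpu) * factor, 50)
    req_mem = round_to(max(p95_mem, vpa_mem) * factor, 32)
    return {
        "requests": {"cpu": f"{req_cpu}m", "memory": f"{req_mem}Mi"},
        "limits": {"cpu": f"{round_to(req_cpu * 2, 50)}m",
                   "memory": f"{round_to(req_mem * 1.5, 32)}Mi"},
    }
```

For example, propose(100, 200, vpa_cpu=120, vpa_mem=180) yields requests of 150m / 256Mi and limits of 300m / 384Mi: the VPA target wins for CPU (120 * 1.25 = 150), the observed p95 wins for memory (200 * 1.25 = 250, rounded to 256).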

Output

rightsize-report.md with a per-workload table (current vs proposed CPU/mem) and a YAML patches directory with one <workload>.patch.yaml per change. Stdout prints aggregate cluster savings estimate.
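
One possible shape for a patch file and for the savings figure is sketched below. It assumes the patches are partial strategic-merge patches that touch only container resources, and that cost is expressed as a per-core-hour and a per-GiB-hour rate. Both are assumptions: the skill only specifies "a configurable hourly cost". The function names and the rates in the usage note are hypothetical.

```python
def render_patch(containers):
    """Render a partial patch as YAML text.

    containers maps container name -> {"requests": {...}, "limits": {...}};
    a missing "limits" key simply omits that section.
    """
    lines = ["spec:", "  template:", "    spec:", "      containers:"]
    for name, resources in containers.items():
        lines.append(f"        - name: {name}")
        lines.append("          resources:")
        for section in ("requests", "limits"):
            if section not in resources:
                continue
            lines.append(f"            {section}:")
            for key, value in resources[section].items():
                # Quote values so quantities always stay strings.
                lines.append(f'              {key}: "{value}"')
    return "\n".join(lines) + "\n"


def estimate_hourly_savings(rows, cost_per_core_hour, cost_per_gib_hour):
    """Sum (current - proposed) requests and price the difference.

    rows holds tuples of
    (current_cpu_m, proposed_cpu_m, current_mem_mi, proposed_mem_mi).
    A negative result means the proposals cost more than the current requests.
    """
    cpu_cores = sum(cur - new for cur, new, _, _ in rows) / 1000
    mem_gib = sum(cur - new for _, _, cur, new in rows) / 1024
    return cpu_cores * cost_per_core_hour + mem_gib * cost_per_gib_hour
```

Calling render_patch({"app": {"requests": {"cpu": "150m", "memory": "256Mi"}}}) produces a patch with only a requests section, which matches the case where the CPU limit is intentionally omitted.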

Verification

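Apply the patch to one non-critical workload first and watch it for 24h. If the patch files are partial patches rather than full manifests, use kubectl patch deployment <workload> -n <namespace> --patch-file <workload>.patch.yaml, since kubectl apply -f expects a complete manifest. kubectl describe pod shows OOMKilled as the container's last termination reason; CPU throttling does not appear there and has to be read from container metrics (for example the cAdvisor CFS throttling counters). If observed throttling exceeds 5%, increase the safety margin and rerun. Compare actual usage post-rollout against the proposals: if p95 usage now hits the new request, the proposal was too tight.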
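
The two checks in this section can be expressed as a small sketch. It assumes throttling is measured as the share of throttled CFS periods, taken from two readings of the cAdvisor counters container_cpu_cfs_throttled_periods_total and container_cpu_cfs_periods_total. The skill does not say how the 5% figure is measured, so treat that as an assumption; the function names are hypothetical.

```python
def throttle_percent(throttled_start, throttled_end, periods_start, periods_end):
    """Percent of CFS periods that were throttled between two counter readings."""
    periods = periods_end - periods_start
    if periods <= 0:
        return 0.0  # no elapsed periods (or a counter reset): nothing to report
    return 100 * (throttled_end - throttled_start) / periods


def review_rollout(throttle_pct, p95_usage, request, threshold_pct=5.0):
    """Return the follow-up actions suggested by the verification rules."""
    findings = []
    if throttle_pct > threshold_pct:
        findings.append("increase safety_margin and rerun")
    if p95_usage >= request:
        findings.append("proposal too tight")
    return findings
```

For instance, 80 throttled periods out of 1000 gives 8.0, which is above the 5% threshold, so review_rollout(8.0, 140, 150) returns only the "increase safety_margin and rerun" finding.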

Edge cases

Workloads with bursty initialization (e.g., warmup): base the memory request on the maximum observed in the first 5 minutes after container start; CPU can stay at p95.
StatefulSets with strict QoS requirements: keep limits == requests for both CPU and memory in every container, which is what the Guaranteed QoS class requires.
Pods evicted during the sampling window: skip them, mark "insufficient samples".
Workloads without metrics-server data (CRD-based custom workloads): rely solely on the VPA recommendation if one exists; otherwise abstain.
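
The edge-case rules above could be wired in roughly as follows. This is a sketch under assumptions: samples carry the container's age in seconds, the warm-up basis is taken as the larger of the warm-up maximum and the steady-state p95 (so it never lowers the request), and all function names and return labels other than "insufficient samples" are hypothetical.

```python
def memory_basis(samples, p95_mem, bursty_init=False, warmup_seconds=300):
    """Pick the memory value (Mi) that the safety margin is applied to.

    samples is a list of (seconds_since_container_start, memory_mi).
    """
    if not bursty_init:
        return p95_mem
    warmup = [mem for age, mem in samples if age <= warmup_seconds]
    return max([p95_mem] + warmup)


def choose_source(sample_count, evicted, vpa_target):
    """Decide which data a proposal for this container may be based on."""
    if evicted:
        return "insufficient samples"  # pod was evicted during the window
    if sample_count == 0:
        # No metrics-server data: fall back to VPA only, else abstain.
        return "vpa-only" if vpa_target is not None else "abstain"
    return "samples"


def enforce_guaranteed(resources):
    """Copy requests into limits so the container qualifies for Guaranteed QoS."""
    requests = dict(resources["requests"])
    return {"requests": requests, "limits": dict(requests)}
```

A container whose memory peaks at 900Mi during warm-up but settles at a p95 of 400Mi would get 900 as its basis when bursty_init is set, and 400 otherwise.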

amitte/k8s-resource-rightsizer