MIDS Capstone Thesis
KubeBench is a Kubernetes-native benchmark and serving platform for code-generating LLMs. The project combines domain fine-tuning, runtime validation against real clusters, and cloud deployment infrastructure to evaluate and serve practical DevOps assistants.
Problem
General-purpose code models often perform poorly on Kubernetes workflows, and string-based metrics fail to capture whether generated YAML actually works in real clusters.
Approach
Built a domain-specific benchmark that evaluates generated manifests through operational checks against live Kubernetes APIs, combining those runtime checks with quality scoring and cross-model comparisons.
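As a minimal sketch of what an operational check can look like, the snippet below validates a generated manifest with a server-side dry run via `kubectl`. The shell-out approach and function names are assumptions for illustration, not the benchmark's exact mechanism; the `runner` parameter is injectable so the check can be exercised without a cluster.

```python
# Hypothetical sketch: validate a generated manifest against a live
# Kubernetes API server using `kubectl apply --dry-run=server`.
import os
import subprocess
import tempfile


def _run_kubectl(cmd):
    """Default runner: execute kubectl and return its exit code."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode


def dry_run_valid(manifest_yaml: str, runner=_run_kubectl) -> bool:
    """True if the manifest passes a server-side dry run.

    A server-side dry run exercises admission and schema validation on the
    real cluster without persisting any objects, which catches errors that
    string similarity metrics miss.
    """
    fd, path = tempfile.mkstemp(suffix=".yaml")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(manifest_yaml)
        return runner(["kubectl", "apply", "--dry-run=server", "-f", path]) == 0
    finally:
        os.unlink(path)
```

In a benchmark loop, each model's generated YAML would be passed through a check like this, and the pass/fail signal folded into the overall quality score.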
Outcome
Produced an end-to-end pipeline covering data preparation, fine-tuning, runtime evaluation, and deployment, plus a web interface for interacting with the specialized Kubernetes models.
The serving stack uses a two-tier architecture: a FastAPI proxy layer on DigitalOcean App Platform handles request orchestration and validation, and a GPU-backed FastAPI model server on GCP performs inference.
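A stdlib-only sketch of the proxy tier's role, under assumed names: the proxy validates an incoming payload before forwarding it to the GPU backend, so malformed requests never consume inference capacity. The endpoint URL, field names, and `forward` helper are all hypothetical; the real service uses FastAPI route handlers.

```python
# Hypothetical sketch of the proxy tier: validate, then forward to the
# GPU-backed model server. URL and payload schema are assumptions.
import json
import urllib.request

MODEL_SERVER_URL = "http://gpu-backend.example:8000/generate"  # assumed endpoint


def validate_payload(payload: dict) -> dict:
    """Reject malformed requests before they reach the GPU tier."""
    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("payload must include a non-empty 'prompt'")
    return {"prompt": prompt, "max_tokens": int(payload.get("max_tokens", 512))}


def forward(payload: dict, opener=urllib.request.urlopen) -> dict:
    """Forward a validated payload to the model server.

    `opener` is injectable so the proxy logic can be tested without a
    running backend.
    """
    body = json.dumps(validate_payload(payload)).encode()
    req = urllib.request.Request(
        MODEL_SERVER_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    with opener(req) as resp:
        return json.loads(resp.read())
```

Keeping validation in the cheap proxy tier and inference in the GPU tier lets the two scale independently, which is the usual motivation for this split.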
Infrastructure and deployment are automated through Terraform modules and GitHub Actions, enabling repeatable provisioning and model service rollout.