
MLOps Coursework Project

End-to-End PyTorch Sentiment API

Built for MIDS W255, this project packages a DistilBERT sentiment model behind a production-style FastAPI service with Redis caching, containerized deployment, Kubernetes orchestration, and performance monitoring under sustained load.

[Figure: MLOps sentiment API architecture overview]

Problem

Move beyond notebook-level ML by deploying a reliable API that can handle real traffic, maintain low latency, and scale correctly in Kubernetes.

Approach

Implemented FastAPI endpoints with typed request/response schemas, baked model artifacts into the container image, added Redis caching, and deployed to AKS with service routing.
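The caching step follows a standard cache-aside pattern: hash the input text into a key, return the cached prediction on a hit, and otherwise run the model and store the result with a TTL. The sketch below illustrates the idea; the function and key names are illustrative, not the project's actual code, and the `cache` argument only needs `get`/`setex` methods matching the redis-py client interface.

```python
import hashlib
import json


def cache_key(text: str) -> str:
    """Deterministic cache key for a single input text."""
    return "sentiment:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def cached_predict(cache, model_fn, text: str, ttl: int = 3600):
    """Cache-aside lookup: return the cached prediction if present,
    otherwise call the model and store the JSON-encoded result with a TTL."""
    key = cache_key(text)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = model_fn(text)  # e.g. {"label": "POSITIVE", "score": 0.98}
    cache.setex(key, ttl, json.dumps(result))
    return result
```

With redis-py, `cache` would be a `redis.Redis(...)` client; injecting it as a parameter keeps the function unit-testable with a fake.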

Outcome

Delivered a full MLOps lifecycle project with automated tests, containerized inference, and load-test observability using k6 and Grafana.

System Components

Inference Service

  • FastAPI app for batch sentiment prediction requests.
  • PyTorch + Hugging Face DistilBERT inference pipeline.
  • Pydantic models for strict input/output contracts.
  • Unit tests with pytest for endpoint and schema behavior.

Platform Layer

  • Docker image with model artifact included at build time.
  • Redis caching to improve throughput and reduce latency on repeated requests.
  • Kubernetes deployment on Azure AKS with service routing.
  • k6 load tests and Grafana dashboards for performance analysis.
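An AKS deployment of this shape typically pairs a Deployment with a Service for routing. The fragment below is a hedged sketch only: the replica count, image name, and ports are assumptions, not the project's actual values.

```yaml
# Hypothetical manifest; image name and ports are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sentiment-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sentiment-api
  template:
    metadata:
      labels:
        app: sentiment-api
    spec:
      containers:
        - name: api
          image: myregistry.azurecr.io/sentiment-api:latest
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: sentiment-api
spec:
  selector:
    app: sentiment-api
  ports:
    - port: 80
      targetPort: 8000
```

Baking the model artifact into the image (rather than downloading it at pod start) keeps cold-start time predictable, which matters under k6 load tests.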

Performance Artifacts

[Figures: k6 and Grafana load-test results]
