
MLOps Coursework Project

End-to-End PyTorch Sentiment API

Built for MIDS W255, this project packages a DistilBERT sentiment model behind a production-style FastAPI service with Redis caching, containerized deployment, Kubernetes orchestration, and performance monitoring under sustained load.

[Figure: MLOps sentiment API architecture overview]

Problem

Move beyond notebook-level ML by deploying a reliable API that can handle real traffic, maintain low latency, and scale correctly in Kubernetes.

Approach

Implemented FastAPI endpoints with typed request/response schemas, baked model artifacts into the container image, added Redis caching, and deployed to AKS with service routing.
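The caching step follows a standard cache-aside pattern: hash the input text into a key, return the cached prediction on a hit, and otherwise run the model and store the result with a TTL. The sketch below illustrates the idea; the function and key names are illustrative, not the project's actual code, and the `cache` argument only needs `get`/`setex` methods matching the redis-py client interface.

```python
import hashlib
import json


def cache_key(text: str) -> str:
    """Deterministic cache key for a single input text."""
    return "sentiment:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def cached_predict(cache, model_fn, text: str, ttl: int = 3600):
    """Cache-aside lookup: return the cached prediction if present,
    otherwise call the model and store the JSON-encoded result with a TTL."""
    key = cache_key(text)
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    result = model_fn(text)  # e.g. {"label": "POSITIVE", "score": 0.98}
    cache.setex(key, ttl, json.dumps(result))
    return result
```

With redis-py, `cache` would be a `redis.Redis(...)` client; injecting it as a parameter keeps the function unit-testable with a fake.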

Outcome

Delivered a full MLOps lifecycle project with automated tests, containerized inference, and load-test observability using k6 and Grafana.

System Components

Inference Service

  • FastAPI app for batch sentiment prediction requests.
  • PyTorch + Hugging Face DistilBERT inference pipeline.
  • Pydantic models for strict input/output contracts.
  • Unit tests with pytest for endpoint and schema behavior.

Platform Layer

  • Docker image with model artifact included at build time.
  • Redis caching to improve throughput and reduce latency on repeated requests.
  • Kubernetes deployment on Azure AKS with service routing.
  • k6 load tests and Grafana dashboards for performance analysis.
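An AKS deployment of this shape typically pairs a Deployment with a Service for routing. The fragment below is a hedged sketch only: the replica count, image name, and ports are assumptions, not the project's actual values.

```yaml
# Hypothetical manifest; image name and ports are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sentiment-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sentiment-api
  template:
    metadata:
      labels:
        app: sentiment-api
    spec:
      containers:
        - name: api
          image: myregistry.azurecr.io/sentiment-api:latest
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: sentiment-api
spec:
  selector:
    app: sentiment-api
  ports:
    - port: 80
      targetPort: 8000
```

Baking the model artifact into the image (rather than downloading it at pod start) keeps cold-start time predictable, which matters under k6 load tests.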

Performance Artifacts

[Figures: k6 and Grafana load-test results]
