GenAI Final Project

Persona-Aware RAG Assistant

Final project for UC Berkeley DATASCI 267. I designed and evaluated a retrieval-augmented generation (RAG) system that answers the same query differently for engineering and marketing users while staying grounded in the retrieved source context.

[Architecture diagram for the persona-aware RAG pipeline]

Problem

Internal teams needed a Q&A assistant that could support both technical research and non-technical marketing workflows from the same knowledge base.

Approach

Built a LangChain RAG pipeline with Qdrant retrieval, Cohere reranking, and persona-specific prompts to control depth, tone, and answer length.
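The persona-specific prompting can be sketched as a template lookup keyed by user role. This is a minimal illustration, not the project's actual templates; the persona names match the two roles described above, but the wording and the 150-word limit are assumptions.

```python
# Illustrative persona templates: same retrieved context, different depth,
# tone, and length instructions per role (wording is hypothetical).
PERSONA_PROMPTS = {
    "engineering": (
        "You are a technical assistant. Answer precisely, using only the "
        "context below, and include implementation details where relevant.\n\n"
        "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    ),
    "marketing": (
        "You are a communications assistant. Answer in plain language, using "
        "only the context below. Avoid jargon and keep it under 150 words.\n\n"
        "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    ),
}

def build_prompt(persona: str, context: str, question: str) -> str:
    """Select the persona template and fill in the retrieved context."""
    return PERSONA_PROMPTS[persona].format(context=context, question=question)
```

Keeping the context block identical across personas is what lets the same retrieval layer serve both roles; only the framing around it changes.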

Outcome

Persona-aware prompting improved quality over single-prompt baseline runs, with strong groundedness and relevance scores and clearer role-specific responses across the validation examples.

Stack and Configuration

Retrieval Layer

  • Embedding model: multi-qa-mpnet-base-dot-v1
  • Chunking: paragraph splits (no overlap)
  • Qdrant MMR: k=20, fetch_k=30, lambda=0.5
  • Reranking: Cohere reranker (top_n=12)
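To make the MMR settings above concrete, here is a self-contained sketch of the maximal marginal relevance selection that Qdrant's MMR search performs under the hood: fetch the `fetch_k` nearest candidates, then greedily pick `k` of them, trading query similarity against redundancy via `lambda`. This is an illustration of the algorithm, not the library's implementation.

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=20, fetch_k=30, lam=0.5):
    """Greedy MMR: score = lam * sim(query, doc) - (1 - lam) * max
    similarity to any already-selected doc, over the fetch_k nearest."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    sims = np.array([cos(query_vec, d) for d in doc_vecs])
    candidates = list(np.argsort(-sims)[:fetch_k])  # top fetch_k by similarity
    selected = []
    while candidates and len(selected) < k:
        best, best_score = None, -np.inf
        for i in candidates:
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            score = lam * sims[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lambda=0.5` the relevance and diversity terms are weighted equally; pushing `lambda` toward 1 recovers plain nearest-neighbor retrieval.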

Generation Layer

  • LLM: mistralai/Mistral-7B-Instruct-v0.2
  • Temperature: 0.4
  • Top-p: 0.95
  • Max new tokens: 500
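The sampling parameters above interact as follows: temperature rescales the logits before softmax, and top-p (nucleus) sampling then keeps only the smallest set of tokens whose cumulative probability reaches 0.95. The sketch below illustrates that filtering on a toy distribution; it is not Mistral's decoder, just the arithmetic behind the settings.

```python
import numpy as np

def nucleus_filter(logits, temperature=0.4, top_p=0.95):
    """Rescale logits by temperature, softmax, then keep the smallest
    top-probability set whose mass reaches top_p, renormalized."""
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(-probs)
    cum = np.cumsum(probs[order])
    cutoff = int(np.searchsorted(cum, top_p)) + 1  # size of the nucleus
    kept = order[:cutoff]
    renorm = probs[kept] / probs[kept].sum()
    return dict(zip(kept.tolist(), renorm.tolist()))
```

A temperature of 0.4 sharpens the distribution considerably, so in practice the nucleus is often small and answers stay focused; `max_new_tokens=500` simply caps generation length.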

Evaluation and Key Learnings

I evaluated outputs with persona-aware criteria: groundedness and relevance for both personas, plus accuracy for engineering responses and coherence for marketing responses. I also compared generated answers against gold responses with semantic similarity scoring.
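The gold-response comparison reduces to cosine similarity between embedding vectors; in the project the embeddings would come from multi-qa-mpnet-base-dot-v1, but the scoring step itself is just this, shown here as a minimal sketch:

```python
import numpy as np

def semantic_similarity(gen_vec, gold_vec):
    """Cosine similarity between the embedding of a generated answer and
    the embedding of its gold reference (vectors assumed precomputed)."""
    a = np.asarray(gen_vec, dtype=float)
    b = np.asarray(gold_vec, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A score near 1.0 means the two answers occupy nearly the same embedding direction, which is exactly why the metric can miss tone and depth mismatches noted below.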

The strongest setup combined MMR retrieval, reranking, and separate prompts. One practical takeaway was that similarity metrics can overrate answers when tone or technical depth diverges from user expectations, even when semantic content overlaps.

Artifacts