Multilingual education chatbot — grounded retrieval, PT/EN

A bilingual (PT/EN) education chatbot. I built the backend and data layer as an external consultant: grounded retrieval, frozen-eval regression checks, precision measured on held-out questions before every rollout. Azure OpenAI + LangChain + pgvector.

Python, Azure OpenAI, LangChain, pgvector, Frozen-eval harness

My role: backend + data layer (consultant)
Scale: 699K+ enrollment records
Release gate: frozen-eval — no silent regression
Retrieval: grounded, bilingual PT/EN

The problem

A multilingual education assistant can't be allowed to regress silently. A prompt or model change that helps one language can quietly degrade another, and nobody notices until the users do.

The solution

Grounded retrieval anchors answers to the knowledge base, not the model's memory. A frozen-eval suite measures precision on a held-out question set before any rollout, so a change that improves one cohort can't silently degrade another. Azure OpenAI for inference, pgvector for retrieval.
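A minimal sketch of the grounding step described above, not the project's actual code. It assumes passages are already embedded and uses an in-memory list where the real system would query pgvector (for example, ordering by the `<=>` cosine-distance operator). The function names, the `min_score` threshold, and the prompt wording are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunks, k=3, min_score=0.75):
    """Return up to k chunks whose similarity to the query is at least min_score.

    `chunks` is a list of dicts with "id", "text", and "embedding" keys
    (a stand-in for rows in a pgvector table; the schema is an assumption).
    """
    scored = [(cosine(query_vec, c["embedding"]), c) for c in chunks]
    scored = [(score, c) for score, c in scored if score >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def build_prompt(question, passages):
    """Build a prompt that restricts the model to the retrieved passages.

    Returns None when nothing relevant was retrieved, so the caller can
    decline to answer instead of falling back on the model's memory.
    """
    if not passages:
        return None
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Calling `retrieve` with a query embedding and passing the result to `build_prompt` yields either a context-restricted prompt or `None`. Returning `None` rather than an empty-context prompt is a design choice in this sketch: an unanswerable question becomes an explicit refusal path instead of an ungrounded generation.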

fig. 01 — decision record

Constraint: Across languages, a change that helps one cohort can silently degrade another. You find out from the users.
Decision: Ground every answer in the knowledge base and gate rollouts behind a frozen evaluation set: precision measured on held-out questions before shipping. Rejected: ungated prompt iteration (silent regressions), ungrounded generation (hallucination at scale).
Outcome: The frozen-eval gate blocks silent regressions before they reach users. Precision is measured on every rollout, never assumed.
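The gate in this decision record can be sketched as a per-language comparison between the current baseline and a candidate change on the same held-out questions. This is an illustration under stated assumptions, not the project's harness: `EvalResult`, `precision_by_language`, `regressed_languages`, and the `tolerance` parameter are hypothetical names, and "precision" here is taken to mean the fraction of held-out answers judged correct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    language: str  # cohort label, e.g. "pt" or "en"
    correct: bool  # whether the answer to one held-out question was judged correct


def precision_by_language(results):
    """Fraction of correct answers per language."""
    totals, hits = {}, {}
    for r in results:
        totals[r.language] = totals.get(r.language, 0) + 1
        hits[r.language] = hits.get(r.language, 0) + int(r.correct)
    return {lang: hits[lang] / totals[lang] for lang in totals}


def regressed_languages(baseline, candidate, tolerance=0.0):
    """Languages whose precision dropped by more than `tolerance`.

    An empty list means the candidate passes the gate. A language missing
    from the candidate run is treated as precision 0.0, so it fails.
    """
    base = precision_by_language(baseline)
    cand = precision_by_language(candidate)
    return sorted(
        lang
        for lang, base_p in base.items()
        if cand.get(lang, 0.0) < base_p - tolerance
    )
```

Checking each language separately is the point. A candidate that moves English from 1 of 2 to 2 of 2 while Portuguese drops from 2 of 2 to 1 of 2 has the same overall precision as the baseline (3 of 4), yet `regressed_languages` returns `["pt"]` and the rollout would be blocked.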

Overview

A retrieval-augmented education chatbot over 699K+ enrollment records, bilingual PT/EN. I built the backend and data layer as an external consultant. Grounded retrieval keeps answers tied to the knowledge base, and a frozen evaluation set gates every rollout: precision gets measured on held-out questions before a change ships, so answer quality improves instead of regressing silently. Azure OpenAI + LangChain + pgvector under the hood.