
Legal-domain RAG — −60% manual documentation, citations that hold

Per-jurisdiction retrieval with a deterministic citation validator. Every answer is grounded in a real source. AWS Bedrock + Lambda + pgvector.

Python · AWS Bedrock · AWS Lambda · pgvector (per-jurisdiction) · Citation validator
Corpus: 15M documents · 5 jurisdictions
Manual documentation: −60% effort
Citations: deterministically validated, never from memory
Cost: serverless, usage-tracked

The problem

In legal work a fabricated or mis-attributed citation is malpractice. An answer grounded in the wrong jurisdiction is worse than no answer. Producing this documentation by hand doesn't scale.

The solution

Per-jurisdiction pgvector indexes keep retrieval inside the correct legal regime. A deterministic citation validator verifies that every cited span exists in the source and supports the statement before the answer is returned. The model is never trusted to cite from memory. AWS Bedrock handles inference; Lambda keeps it serverless and cost-bounded.
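A minimal sketch of what per-jurisdiction isolation could look like at the query layer, assuming one pgvector table per jurisdiction. The jurisdiction codes (`JUR_A`…`JUR_E`), table names, columns (`doc_id`, `text`, `embedding`) and function names are all hypothetical; the project's actual schema isn't described here. The only real pgvector syntax used is the `<=>` cosine-distance operator and the `::vector` cast from its text form.

```python
# Hypothetical jurisdiction codes and table names: stand-ins for
# "one pgvector table per jurisdiction", not the project's real schema.
JURISDICTION_TABLES = {
    "JUR_A": "chunks_jur_a",
    "JUR_B": "chunks_jur_b",
    "JUR_C": "chunks_jur_c",
    "JUR_D": "chunks_jur_d",
    "JUR_E": "chunks_jur_e",
}


def to_vector_literal(embedding: list[float]) -> str:
    """Format an embedding as pgvector's text input, e.g. '[0.1,0.25]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_retrieval_query(
    jurisdiction: str, embedding: list[float], k: int = 8
) -> tuple[str, tuple[str, int]]:
    """Return (sql, params) for a top-k cosine search in ONE jurisdiction's table."""
    table = JURISDICTION_TABLES.get(jurisdiction)
    if table is None:
        # Fail closed: there is no fallback to a shared or default index.
        raise ValueError(f"unknown jurisdiction: {jurisdiction!r}")
    if k < 1:
        raise ValueError("k must be >= 1")
    # The table name only ever comes from the allowlist above, never from
    # user input; the embedding and k are bound as query parameters.
    # pgvector's <=> operator is cosine distance (smaller = more similar).
    sql = (
        f"SELECT doc_id, text FROM {table} "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    return sql, (to_vector_literal(embedding), k)
```

With psycopg the returned pair can be passed to `cursor.execute(sql, params)`. Routing through a fixed allowlist of tables, rather than filtering a shared table by a jurisdiction column, means a forgotten or wrong filter can't silently widen the search to another regime.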

fig. 01 · decision record
Constraint
A fabricated citation is malpractice. A cross-jurisdiction answer is worse than none. Manual documentation doesn't scale.
Decision
Isolate retrieval per jurisdiction so a query can't pull from the wrong regime. Gate generation behind a deterministic citation validator that confirms every cited span exists and supports the claim. Rejected alternatives: a single shared index (cross-regime bleed) and trusting model-generated citations (hallucination risk).
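The gate described above might be sketched as follows. This is an illustration under stated assumptions, not the project's validator: `Citation`, `Claim`, `validate`, `gate` and `CitationError` are hypothetical names, `sources` is assumed to hold only the documents retrieved from the query's own jurisdiction, and the "supports the claim" step is approximated by a crude word-overlap threshold (`min_overlap`, also an assumption). The "exists" step is an exact, whitespace-normalized substring match, so both checks are deterministic.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    doc_id: str  # must be one of the documents retrieved for this jurisdiction
    quote: str   # the exact span the answer attributes to that document


@dataclass(frozen=True)
class Claim:
    text: str
    citations: tuple[Citation, ...]


class CitationError(Exception):
    """Raised when any citation fails; the answer is withheld."""


_WORD = re.compile(r"[a-z0-9]+")


def _tokens(s: str) -> set[str]:
    return set(_WORD.findall(s.lower()))


def _squash(s: str) -> str:
    # Collapse whitespace so line breaks in the source don't defeat matching.
    return " ".join(s.split())


def validate(claims: list[Claim], sources: dict[str, str], min_overlap: float = 0.5) -> list[str]:
    """Return failure reasons; an empty list means every citation passed."""
    failures = []
    for claim in claims:
        if not claim.citations:
            failures.append(f"uncited claim: {claim.text!r}")
            continue
        claim_tokens = _tokens(claim.text)
        for cit in claim.citations:
            source = sources.get(cit.doc_id)
            if source is None:
                failures.append(f"{cit.doc_id}: not among the retrieved sources")
            elif _squash(cit.quote) not in _squash(source):
                failures.append(f"{cit.doc_id}: quoted span not found verbatim")
            else:
                # Crude, deterministic stand-in for "supports the claim":
                # the share of the claim's words that appear in the quoted span.
                overlap = len(claim_tokens & _tokens(cit.quote)) / max(len(claim_tokens), 1)
                if overlap < min_overlap:
                    failures.append(f"{cit.doc_id}: quote does not support {claim.text!r}")
    return failures


def gate(answer: str, claims: list[Claim], sources: dict[str, str]) -> str:
    """Release the answer only if every citation validates."""
    failures = validate(claims, sources)
    if failures:
        raise CitationError(failures)
    return answer
```

A citation to a document that was never retrieved for this jurisdiction fails the first check, so the validator also backstops the per-jurisdiction isolation. A production "supports" check would likely need something stronger than word overlap; the point of the sketch is the shape of the gate: every check is deterministic and any failure blocks the answer.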
Outcome
−60% manual documentation effort across a 15M-document, 5-jurisdiction corpus. Every citation is deterministically validated against the source, never generated from memory.

Overview

A legal-tech RAG pipeline over a 15M-document corpus spanning 5 jurisdictions. Each jurisdiction gets its own pgvector index, so retrieval never bleeds across legal regimes. A deterministic citation validator checks that every cited passage exists and supports the claim before the answer returns. The pipeline runs serverless on AWS Bedrock + Lambda, so cost tracks usage rather than idle capacity.
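To show how the pieces could fit behind a serverless entry point, here is a hedged sketch of a Lambda handler using Python's standard `handler(event, context)` signature and the API Gateway proxy response shape. The request body fields, status codes, jurisdiction codes and the injected `retrieve`/`generate`/`validate` callables are all hypothetical; in a real deployment `generate` would wrap a Bedrock model call, which is left out here.

```python
import json
from typing import Any, Callable

# Hypothetical codes: the page says there are five jurisdictions but doesn't name them.
SUPPORTED_JURISDICTIONS = frozenset({"JUR_A", "JUR_B", "JUR_C", "JUR_D", "JUR_E"})


def _response(status: int, payload: dict) -> dict:
    # API Gateway proxy-integration response shape.
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def make_handler(
    retrieve: Callable[[str, str], dict],
    generate: Callable[[str, dict], dict],
    validate: Callable[[dict, dict], list],
) -> Callable[[dict, Any], dict]:
    """Wire hypothetical retrieve/generate/validate steps into a Lambda handler."""

    def handler(event: dict, context: Any) -> dict:
        try:
            body = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            body = None
        if not isinstance(body, dict):
            return _response(400, {"error": "body must be a JSON object"})
        jurisdiction = body.get("jurisdiction")
        question = body.get("question")
        if not isinstance(jurisdiction, str) or jurisdiction not in SUPPORTED_JURISDICTIONS:
            return _response(400, {"error": "unsupported jurisdiction"})
        if not isinstance(question, str) or not question.strip():
            return _response(400, {"error": "question is required"})
        sources = retrieve(jurisdiction, question)  # one jurisdiction's index only
        draft = generate(question, sources)          # model call, e.g. via Bedrock
        failures = validate(draft, sources)
        if failures:
            # Refuse rather than return an answer with unverified citations.
            return _response(422, {"error": "citation validation failed", "details": failures})
        return _response(200, {"answer": draft["text"], "citations": draft["citations"]})

    return handler
```

Injecting the three steps keeps the handler testable without AWS, and a rejected draft surfaces as an explicit refusal rather than a degraded answer, matching the rule that a wrong-jurisdiction or unverified answer is worse than none.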