
exec-job-board — multi-source data aggregation pipeline

Four divergent public APIs normalized behind one Pydantic schema, deduped by content hash, and served as a zero-backend static site — refreshed daily by cron.

Python, httpx, Pydantic v2, Next.js 16, Fuse.js, GitHub Actions, Dokku
Sources: 4 public APIs, one schema
Pipeline cadence: daily 06:00 UTC
Search latency: < 200 ms client-side
Runtime cost: $0 (SSG)

The problem

Aggregating data across 4 third-party APIs with divergent response schemas, rate limits, and auth models — without a backend that has to be kept warm — is the canonical 'glue code nobody wants to maintain' problem.

The solution

One adapter per source isolates response-shape drift. Pydantic v2 enforces the unified schema at the seam. SHA-256 over (title, employer, location, posted_date) handles dedupe. A GitHub Actions cron runs the collector daily, commits the JSON, and redeploys the static site. The site falls back to a curated 30-row seed when the API output is missing — the demo never breaks.
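The dedupe step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the `content_hash` and `dedupe` helpers, the dict layout, the field separator, and the whitespace and case normalization are all hypothetical. The source only states that SHA-256 is computed over (title, employer, location, posted_date).

```python
import hashlib

# The four fields the text names as the dedupe key; the dict layout is assumed.
KEY_FIELDS = ("title", "employer", "location", "posted_date")


def content_hash(job: dict) -> str:
    """Return a SHA-256 hex digest over the four key fields."""
    # Assumption: collapse whitespace and lowercase each field so trivial
    # formatting differences between sources do not defeat the dedupe.
    parts = [" ".join(str(job.get(f, "")).split()).lower() for f in KEY_FIELDS]
    # A unit-separator delimiter keeps ("ab", "c") distinct from ("a", "bc").
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def dedupe(jobs: list[dict]) -> list[dict]:
    """Keep the first listing seen for each content hash, preserving order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for job in jobs:
        digest = content_hash(job)
        if digest not in seen:
            seen.add(digest)
            unique.append(job)
    return unique
```

Hashing only these four fields means the same listing arriving from two different sources collapses to one row, even when source-specific fields such as IDs or URLs differ. The trade-off is that a listing whose title or date is formatted differently upstream will still count as distinct unless the normalization step accounts for it.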

Overview

An automated daily pipeline collects executive-tier listings from 4 public data APIs (JSearch, Adzuna, The Muse, USAJobs), normalizes them into a unified Pydantic schema, deduplicates via SHA-256 content hashing, and emits a single `jobs.json` consumed by a Next.js static site at build time. The site offers Fuse.js client-side fuzzy search and 4-dimension filtering, with sub-200 ms response times and zero runtime backend cost.
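The adapter-per-source flow described here might look something like the sketch below. It is illustrative only: the `Job` dataclass is a standard-library stand-in for the Pydantic model, the names `adapt_source_a`, `adapt_source_b`, `normalize`, and `dump_jobs` are hypothetical, and the raw payload shapes are invented for the example rather than taken from the real response formats of any of the four APIs.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class Job:
    """Unified record; a stdlib stand-in for the Pydantic schema."""
    title: str
    employer: str
    location: str
    posted_date: str  # ISO date string, e.g. "2025-01-15"
    source: str


def adapt_source_a(raw: dict) -> Job:
    # Hypothetical flat payload with a full timestamp in "posted".
    return Job(raw["job_title"], raw["employer_name"], raw["city"],
               raw["posted"][:10], "source_a")


def adapt_source_b(raw: dict) -> Job:
    # Hypothetical nested payload with a list of locations.
    return Job(raw["name"], raw["company"]["name"],
               raw["locations"][0]["name"],
               raw["publication_date"][:10], "source_b")


# One adapter per source: response-shape drift is confined to one function.
ADAPTERS: dict[str, Callable[[dict], Job]] = {
    "source_a": adapt_source_a,
    "source_b": adapt_source_b,
}


def normalize(raw_by_source: dict[str, list[dict]]) -> list[Job]:
    """Run each raw row through its source's adapter, skipping bad rows."""
    jobs: list[Job] = []
    for source, rows in raw_by_source.items():
        adapter = ADAPTERS[source]
        for raw in rows:
            try:
                jobs.append(adapter(raw))
            except (KeyError, IndexError, TypeError):
                # Assumption: a malformed row is dropped, not fatal to the run.
                continue
    return jobs


def dump_jobs(jobs: list[Job]) -> str:
    """Serialize the unified records as the JSON text for jobs.json."""
    return json.dumps([asdict(j) for j in jobs], indent=2)
```

In a run, the returned string could be written with `pathlib.Path("jobs.json").write_text(dump_jobs(jobs), encoding="utf-8")` before the commit step. With Pydantic v2 in place of the dataclass, the adapters would construct the model instead and validation errors would be caught alongside the lookup errors.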