Data Model
Modeling Strategy
The platform uses a medallion model with domain-scoped tables:
- Bronze: source-aligned ingestion tables
- Silver: normalized/cleaned business entities
- Gold: analytics-ready aggregates and serving entities
Primary domain currently implemented: odp_staffing_demand.
Lakehouse Entities (odp_staffing_demand)
Bronze
- cbs_vacancy_rate_raw
- cbs_vacancy_rate_dim_sic2008
- cbs_vacancy_rate_dim_periods
- adzuna_job_ads_raw
- uwv_open_match_raw
Silver
- cbs_vacancy_rate
- adzuna_job_ads
- uwv_open_match
Gold
- it_market_snapshot
- it_market_top_skills
Gold tables are exported to the warehouse for BI consumption.
Warehouse Serving Schema (odp_staffing_demand)
The Postgres pipeline creates and refreshes these serving tables:
it_market_snapshot
| Column | Type | Notes |
|---|---|---|
| period_key | TEXT | Required |
| period_label | TEXT | |
| sector_name | TEXT | |
| vacancies | DOUBLE PRECISION | |
| vacancy_rate | DOUBLE PRECISION | |
| job_ads_count | INTEGER | Required |
| loaded_at | TIMESTAMPTZ | Default now() |
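As an illustration of how the column table above could translate into Postgres DDL, here is a minimal sketch. It assumes that "Required" maps to NOT NULL and that "Default now()" maps to a DEFAULT clause; the helper `build_create_table` is hypothetical and is not the pipeline's actual code.

```python
# Column specs copied from the it_market_snapshot table above.
# Assumption: "Required" means NOT NULL, "Default now()" means DEFAULT now().
SNAPSHOT_COLUMNS = [
    ("period_key", "TEXT", "Required"),
    ("period_label", "TEXT", ""),
    ("sector_name", "TEXT", ""),
    ("vacancies", "DOUBLE PRECISION", ""),
    ("vacancy_rate", "DOUBLE PRECISION", ""),
    ("job_ads_count", "INTEGER", "Required"),
    ("loaded_at", "TIMESTAMPTZ", "Default now()"),
]


def build_create_table(schema, table, columns):
    """Render a CREATE TABLE statement from (name, type, note) tuples."""
    lines = []
    for name, sql_type, note in columns:
        parts = [name, sql_type]
        if note == "Required":
            parts.append("NOT NULL")
        elif note.lower().startswith("default "):
            # Keep the expression after the word "Default" as written.
            parts.append("DEFAULT " + note[len("Default "):])
        lines.append("    " + " ".join(parts))
    body = ",\n".join(lines)
    return f"CREATE TABLE IF NOT EXISTS {schema}.{table} (\n{body}\n);"


ddl = build_create_table("odp_staffing_demand", "it_market_snapshot", SNAPSHOT_COLUMNS)
print(ddl)
```

The same helper can be applied to the other serving tables by passing their column lists.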
it_market_top_skills
| Column | Type | Notes |
|---|---|---|
| skill | TEXT | Required |
| count | INTEGER | Required |
| loaded_at | TIMESTAMPTZ | Default now() |
it_market_region_distribution
| Column | Type | Notes |
|---|---|---|
| region | TEXT | Required |
| job_ads_count | INTEGER | Required |
| share_pct | DOUBLE PRECISION | Required |
| latitude | DOUBLE PRECISION | Required |
| longitude | DOUBLE PRECISION | Required |
| loaded_at | TIMESTAMPTZ | Default now() |
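The relationship between job_ads_count and share_pct is not spelled out here. A plausible reading, shown below as a sketch under that assumption, is that share_pct is each region's percentage of the total job ad count. The function `region_shares` and the sample counts are hypothetical.

```python
def region_shares(counts):
    """Return {region: share_pct}, assuming share_pct = 100 * count / total."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero when no job ads were loaded.
        return {region: 0.0 for region in counts}
    return {region: 100.0 * n / total for region, n in counts.items()}


# Hypothetical counts, for illustration only.
shares = region_shares({"region_a": 30, "region_b": 10})
print(shares)
```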
it_market_job_ads_geo
| Column | Type | Notes |
|---|---|---|
| job_id | TEXT | Required |
| region | TEXT | Required |
| latitude | DOUBLE PRECISION | Required |
| longitude | DOUBLE PRECISION | Required |
| location_label | TEXT | |
| loaded_at | TIMESTAMPTZ | Default now() |
dbt Parallel Model Layer
dbt/ provides SQL-native models over serving sources:
- Model: job_market_snapshot <- source odp_staffing_demand.it_market_snapshot
- Model: job_market_top_skills <- source odp_staffing_demand.it_market_top_skills
This enables dbt tests and snapshots on the serving tables and supports parity checks against the Python/Spark flows.
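A parity check of this kind can be as simple as comparing keyed results from the two paths. The sketch below assumes both sides have already been loaded into {skill: count} mappings; `parity_diff` and the sample values are hypothetical and only illustrate the idea.

```python
def parity_diff(expected, actual):
    """Return {key: (expected_value, actual_value)} for every mismatch."""
    diff = {}
    for key in sorted(set(expected) | set(actual)):
        if expected.get(key) != actual.get(key):
            diff[key] = (expected.get(key), actual.get(key))
    return diff


# Hypothetical results from the Python/Spark flow and from the dbt model.
python_flow = {"python": 10, "sql": 7}
dbt_model = {"python": 10, "sql": 6, "spark": 1}

mismatches = parity_diff(python_flow, dbt_model)
print(mismatches)
```

An empty result means the two paths agree; a missing key on either side shows up as None in the corresponding position.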
Governance Metadata
Schema and governance artifacts live in schema/:
- warehouse.dbml: physical schema baseline
- glossary.yaml: business terms
- metrics.yaml: canonical KPI definitions
- data_quality_rules.yaml: centralized rule definitions
- standards.md: modeling quality conventions
Data Quality and Contracts
Contract and policy checks are config-driven:
- Dataset contracts: tests/configs/datasets/*.yml
- Governance policies: tests/configs/policies/governance_policies.yml
- Environment behavior: tests/configs/environments.yml
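To make the config-driven idea concrete, the sketch below validates a row against a contract expressed as a plain dictionary. The contract shape, the dataset name, and `check_row` are assumptions for illustration; the real files under tests/configs/datasets/ may be structured differently.

```python
# Hypothetical contract shape; not the actual YAML schema used by the project.
contract = {
    "dataset": "odp_staffing_demand.job_market_top_skills",
    "columns": {
        "skill": {"type": str, "required": True},
        "count": {"type": int, "required": True},
    },
}


def check_row(row, contract):
    """Return a list of human-readable violations for one row."""
    violations = []
    for name, rule in contract["columns"].items():
        value = row.get(name)
        if value is None:
            if rule["required"]:
                violations.append(f"{name}: missing required value")
            continue
        if not isinstance(value, rule["type"]):
            expected = rule["type"].__name__
            actual = type(value).__name__
            violations.append(f"{name}: expected {expected}, got {actual}")
    return violations


print(check_row({"skill": "python", "count": 3}, contract))
print(check_row({"skill": None, "count": "3"}, contract))
```

Keeping the rules in data rather than code means a new dataset only needs a new contract entry, not a new check function.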
Execution paths:
make dq-check DATASET=odp_staffing_demand.job_market_snapshot
make qa-test
make test-e2e
Logical Lineage
flowchart LR
CBS["CBS OData"] --> B1["bronze.cbs_vacancy_rate_raw"]
Adzuna["Adzuna API"] --> B2["bronze.adzuna_job_ads_raw"]
UWV["UWV feed"] --> B3["bronze.uwv_open_match_raw"]
B1 --> S1["silver.cbs_vacancy_rate"]
B2 --> S2["silver.adzuna_job_ads"]
B3 --> S3["silver.uwv_open_match"]
S1 --> G1["gold.it_market_snapshot"]
S2 --> G1
S2 --> G2["gold.it_market_top_skills"]
G1 --> W1["warehouse.odp_staffing_demand.it_market_snapshot"]
G2 --> W2["warehouse.odp_staffing_demand.it_market_top_skills"]
W1 --> D1["dbt model: job_market_snapshot"]
W2 --> D2["dbt model: job_market_top_skills"]
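The lineage above can also be treated as a small directed graph, which makes questions such as "what feeds this dbt model?" easy to answer. The sketch below copies the edges from the flowchart; the `upstream` helper is hypothetical and not part of the platform.

```python
# Edges copied from the flowchart above, as {node: [downstream nodes]}.
EDGES = {
    "CBS OData": ["bronze.cbs_vacancy_rate_raw"],
    "Adzuna API": ["bronze.adzuna_job_ads_raw"],
    "UWV feed": ["bronze.uwv_open_match_raw"],
    "bronze.cbs_vacancy_rate_raw": ["silver.cbs_vacancy_rate"],
    "bronze.adzuna_job_ads_raw": ["silver.adzuna_job_ads"],
    "bronze.uwv_open_match_raw": ["silver.uwv_open_match"],
    "silver.cbs_vacancy_rate": ["gold.it_market_snapshot"],
    "silver.adzuna_job_ads": ["gold.it_market_snapshot", "gold.it_market_top_skills"],
    "gold.it_market_snapshot": ["warehouse.odp_staffing_demand.it_market_snapshot"],
    "gold.it_market_top_skills": ["warehouse.odp_staffing_demand.it_market_top_skills"],
    "warehouse.odp_staffing_demand.it_market_snapshot": ["dbt model: job_market_snapshot"],
    "warehouse.odp_staffing_demand.it_market_top_skills": ["dbt model: job_market_top_skills"],
}


def upstream(target, edges):
    """Return every node that the target depends on, directly or transitively."""
    # Invert the edge map so each node points at its parents.
    parents = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)
    seen = set()
    stack = [target]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


top_skills_sources = upstream("dbt model: job_market_top_skills", EDGES)
print(sorted(top_skills_sources))
```

Note that in the diagram as drawn, silver.uwv_open_match has no downstream edge, so the UWV feed does not appear upstream of either dbt model.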