Data Model

Modeling Strategy

The platform uses a medallion model with domain-scoped tables:

  • Bronze: source-aligned ingestion tables
  • Silver: normalized/cleaned business entities
  • Gold: analytics-ready aggregates and serving entities

Primary domain currently implemented: odp_staffing_demand.

Lakehouse Entities (odp_staffing_demand)

Bronze

  • cbs_vacancy_rate_raw
  • cbs_vacancy_rate_dim_sic2008
  • cbs_vacancy_rate_dim_periods
  • adzuna_job_ads_raw
  • uwv_open_match_raw

Silver

  • cbs_vacancy_rate
  • adzuna_job_ads
  • uwv_open_match

Gold

  • it_market_snapshot
  • it_market_top_skills

Gold tables are exported to the warehouse for BI consumption.

Warehouse Serving Schema (odp_staffing_demand)

The Postgres pipeline creates and refreshes these serving tables:

it_market_snapshot

Column          Type               Notes
period_key      TEXT               Required
period_label    TEXT
sector_name     TEXT
vacancies       DOUBLE PRECISION
vacancy_rate    DOUBLE PRECISION
job_ads_count   INTEGER            Required
loaded_at       TIMESTAMPTZ        Default now()

it_market_top_skills

Column      Type          Notes
skill       TEXT          Required
count       INTEGER       Required
loaded_at   TIMESTAMPTZ   Default now()

it_market_region_distribution

Column          Type               Notes
region          TEXT               Required
job_ads_count   INTEGER            Required
share_pct       DOUBLE PRECISION   Required
latitude        DOUBLE PRECISION   Required
longitude       DOUBLE PRECISION   Required
loaded_at       TIMESTAMPTZ        Default now()

it_market_job_ads_geo

Column           Type               Notes
job_id           TEXT               Required
region           TEXT               Required
latitude         DOUBLE PRECISION   Required
longitude        DOUBLE PRECISION   Required
location_label   TEXT
loaded_at        TIMESTAMPTZ        Default now()
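
As an illustration of how a column specification like the ones above could be turned into Postgres DDL, here is a minimal Python sketch. It is not the pipeline's actual code: the Column type and the create_table_sql helper are hypothetical, and it assumes that "Required" in the Notes column corresponds to NOT NULL.

```python
from typing import List, NamedTuple, Optional


class Column(NamedTuple):
    """One row of a serving-table column spec (hypothetical helper type)."""
    name: str
    sql_type: str
    required: bool = False
    default: Optional[str] = None


# Column spec copied from the it_market_snapshot table.
IT_MARKET_SNAPSHOT = [
    Column("period_key", "TEXT", required=True),
    Column("period_label", "TEXT"),
    Column("sector_name", "TEXT"),
    Column("vacancies", "DOUBLE PRECISION"),
    Column("vacancy_rate", "DOUBLE PRECISION"),
    Column("job_ads_count", "INTEGER", required=True),
    Column("loaded_at", "TIMESTAMPTZ", default="now()"),
]


def create_table_sql(schema: str, table: str, columns: List[Column]) -> str:
    """Render a CREATE TABLE statement.

    Assumption: "Required" maps to NOT NULL.
    """
    lines = []
    for col in columns:
        parts = [col.name, col.sql_type]
        if col.required:
            parts.append("NOT NULL")
        if col.default is not None:
            parts.append(f"DEFAULT {col.default}")
        lines.append("    " + " ".join(parts))
    body = ",\n".join(lines)
    return f"CREATE TABLE IF NOT EXISTS {schema}.{table} (\n{body}\n);"


ddl = create_table_sql("odp_staffing_demand", "it_market_snapshot", IT_MARKET_SNAPSHOT)
print(ddl)
```

The same helper could be reused for the other three serving tables by supplying their column lists.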

dbt Parallel Model Layer

dbt/ provides SQL-native models over serving sources:

  • Model: job_market_snapshot <- source odp_staffing_demand.it_market_snapshot
  • Model: job_market_top_skills <- source odp_staffing_demand.it_market_top_skills

This layer enables dbt tests and snapshots on the serving tables and supports parity checks between the dbt models and the Python/Spark flows.
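
A parity check of this kind could look roughly like the sketch below. It assumes both sides have already been fetched as lists of dictionaries. The parity_diff function and the sample rows are hypothetical and purely illustrative; they do not reflect real data or the project's actual check.

```python
from collections import Counter


def parity_diff(left_rows, right_rows, key_columns):
    """Compare two row sets on the given columns.

    Returns the key tuples that appear on only one side (duplicates counted).
    """
    def keys(rows):
        return Counter(tuple(row[c] for c in key_columns) for row in rows)

    left, right = keys(left_rows), keys(right_rows)
    return {
        "only_in_left": sorted((left - right).elements()),
        "only_in_right": sorted((right - left).elements()),
    }


# Made-up sample rows standing in for the Python/Spark output and the dbt model.
python_rows = [
    {"skill": "python", "count": 120},
    {"skill": "sql", "count": 95},
]
dbt_rows = [
    {"skill": "python", "count": 120},
    {"skill": "sql", "count": 90},
]

result = parity_diff(python_rows, dbt_rows, key_columns=("skill", "count"))
print(result)
```

An empty list on both sides would indicate that the two flows agree on the compared columns.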

Governance Metadata

Schema and governance artifacts live in schema/:

  • warehouse.dbml: physical schema baseline
  • glossary.yaml: business terms
  • metrics.yaml: canonical KPI definitions
  • data_quality_rules.yaml: centralized rule definitions
  • standards.md: modeling quality conventions

Data Quality and Contracts

Contract and policy checks are config-driven:

  • Dataset contracts: tests/configs/datasets/*.yml
  • Governance policies: tests/configs/policies/governance_policies.yml
  • Environment behavior: tests/configs/environments.yml

Execution paths:

make dq-check DATASET=odp_staffing_demand.job_market_snapshot
make qa-test
make test-e2e
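
To make the idea of a config-driven contract check concrete, here is a minimal Python sketch. The contract shape shown here (a "required_columns" list) is an assumption for illustration only; the real files under tests/configs/datasets/ may be structured differently, and check_required is a hypothetical helper, not part of the project.

```python
# Hypothetical contract shape; in practice this would be loaded from a YAML file.
CONTRACT = {
    "dataset": "odp_staffing_demand.job_market_snapshot",
    "required_columns": ["period_key", "job_ads_count"],
}


def check_required(rows, contract):
    """Return (row_index, column) pairs where a required column is missing or None."""
    violations = []
    for index, row in enumerate(rows):
        for column in contract["required_columns"]:
            if row.get(column) is None:
                violations.append((index, column))
    return violations


# Made-up rows: the second has a null key, the third lacks job_ads_count.
rows = [
    {"period_key": "P1", "job_ads_count": 10},
    {"period_key": None, "job_ads_count": 7},
    {"period_key": "P2"},
]

violations = check_required(rows, CONTRACT)
print(violations)
```

Keeping the rule in data rather than in code means a new dataset only needs a new config file, which is the main appeal of the config-driven approach.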

Logical Lineage

flowchart LR
  CBS["CBS OData"] --> B1["bronze.cbs_vacancy_rate_raw"]
  Adzuna["Adzuna API"] --> B2["bronze.adzuna_job_ads_raw"]
  UWV["UWV feed"] --> B3["bronze.uwv_open_match_raw"]

  B1 --> S1["silver.cbs_vacancy_rate"]
  B2 --> S2["silver.adzuna_job_ads"]
  B3 --> S3["silver.uwv_open_match"]

  S1 --> G1["gold.it_market_snapshot"]
  S2 --> G1
  S2 --> G2["gold.it_market_top_skills"]

  G1 --> W1["warehouse.odp_staffing_demand.it_market_snapshot"]
  G2 --> W2["warehouse.odp_staffing_demand.it_market_top_skills"]

  W1 --> D1["dbt model: job_market_snapshot"]
  W2 --> D2["dbt model: job_market_top_skills"]
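
The lineage above can also be treated as data. The sketch below encodes the same edges as a Python list and walks them backwards to find which external sources feed a given node. The upstream and root_sources helpers are hypothetical and exist only for illustration.

```python
# Edges copied from the flowchart, as (source, target) pairs.
EDGES = [
    ("CBS OData", "bronze.cbs_vacancy_rate_raw"),
    ("Adzuna API", "bronze.adzuna_job_ads_raw"),
    ("UWV feed", "bronze.uwv_open_match_raw"),
    ("bronze.cbs_vacancy_rate_raw", "silver.cbs_vacancy_rate"),
    ("bronze.adzuna_job_ads_raw", "silver.adzuna_job_ads"),
    ("bronze.uwv_open_match_raw", "silver.uwv_open_match"),
    ("silver.cbs_vacancy_rate", "gold.it_market_snapshot"),
    ("silver.adzuna_job_ads", "gold.it_market_snapshot"),
    ("silver.adzuna_job_ads", "gold.it_market_top_skills"),
    ("gold.it_market_snapshot", "warehouse.odp_staffing_demand.it_market_snapshot"),
    ("gold.it_market_top_skills", "warehouse.odp_staffing_demand.it_market_top_skills"),
    ("warehouse.odp_staffing_demand.it_market_snapshot", "dbt model: job_market_snapshot"),
    ("warehouse.odp_staffing_demand.it_market_top_skills", "dbt model: job_market_top_skills"),
]


def upstream(node, edges):
    """Return every node that `node` depends on, directly or transitively."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def root_sources(node, edges):
    """Return the upstream nodes that have no parents themselves."""
    targets = {dst for _, dst in edges}
    return {n for n in upstream(node, edges) if n not in targets}


snapshot_sources = root_sources("dbt model: job_market_snapshot", EDGES)
skills_sources = root_sources("dbt model: job_market_top_skills", EDGES)
print(sorted(snapshot_sources))
print(sorted(skills_sources))
```

Per the edges as drawn, the snapshot model traces back to both CBS OData and the Adzuna API, while the top-skills model traces back to the Adzuna API only.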