CDGC → lakehouse → consumption
How trust signals move across the modern data stack
End-to-end flow
1. DQ Execution in CDGC: profile data, validate rules, and compute dataset- and rule-level scores.
2. Score Extraction: extract results through APIs, CDI pipelines, or event-driven triggers for downstream processing.
3. Bronze Landing: land raw DQ payloads in cloud object storage for traceability and replayability.
4. Databricks Standardization: normalize CDGC payloads into a standard enterprise score model.
5. Unity Catalog Persistence: persist curated score history into governed Delta tables inside Unity Catalog.
6. Governance Enrichment: add tags, lineage, and metadata context to strengthen discoverability and trust.
7. Business Consumption: expose scorecards to business, governance, and engineering teams.
Source → Standardize → Govern → Consume
- Source system — CDGC runs rules and calculates scores
- Integration layer — APIs or CDI move results to the lakehouse
- Databricks layer — Transforms raw payloads into enterprise score models
- Governance layer — Unity Catalog stores governed Delta tables and tags
- Consumption layer — Dashboards and scorecards expose trust signals
Detailed process
The pattern below is designed for enterprise-scale implementations where Informatica remains the system of execution for quality rules, while Databricks and Unity Catalog provide standardization, governed persistence, and consumption.
DQ Execution in CDGC
CDGC profiles data, validates rules, and computes dataset- and rule-level scores.
- Profiling and validation
- Dimension-level scores
- Results stored in Informatica repositories
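For illustration, here is what one parsed result record and its dimension-level scores can look like. The field names (assetName, dimensionScores, runDate) mirror the standardization example later in this article; they are assumptions about the payload shape, not the exact CDGC schema.

```python
# Illustrative DQ result record; field names are assumptions,
# not the exact CDGC response schema.
sample_result = {
    "assetName": "sales.customer",
    "score": 97.0,
    "dimensionScores": {"completeness": 99.1, "accuracy": 95.4},
    "runDate": "2024-06-01",
}

def dimension_summary(result: dict) -> dict:
    """Flatten one result record into dimension-level score fields."""
    return {
        f"{dim}_score": value
        for dim, value in result.get("dimensionScores", {}).items()
    }

rows = dimension_summary(sample_result)
```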
Score Extraction
Extract results through APIs, CDI pipelines, or event-driven triggers for downstream processing.
- REST API extraction
- CDI push to storage
- Optional event/webhook pattern
Bronze Landing
Land raw DQ payloads in cloud object storage for traceability and replayability.
- ADLS / S3 / GCS
- JSON or Parquet
- Partition by run date and source
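A minimal sketch of the bronze landing step in plain Python. The Hive-style partition layout (`run_date=.../source=...`) and the single-file-per-run convention are illustrative assumptions; in practice this write targets ADLS, S3, or GCS rather than a local path.

```python
import json
from pathlib import Path

def land_bronze(payload: dict, base: Path, source: str, run_date: str) -> Path:
    """Write one raw DQ payload under a run_date/source partition layout.

    The bronze/dq path convention is an illustrative assumption.
    """
    target = base / "bronze" / "dq" / f"run_date={run_date}" / f"source={source}"
    target.mkdir(parents=True, exist_ok=True)  # idempotent for replays
    out = target / "payload.json"
    out.write_text(json.dumps(payload))
    return out
```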
Databricks Standardization
Normalize CDGC payloads into a standard enterprise score model.
- Map score dimensions
- Align domain semantics
- Prepare silver and gold models
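The `status` column in the score table and the `dq_status` tag shown later both need a consistent tiering rule, which is naturally derived during standardization. A minimal sketch; the thresholds (95/80) are illustrative assumptions, not a CDGC standard.

```python
def dq_status(overall_score: float) -> str:
    """Map an overall score to a coarse trust tier.

    Thresholds are illustrative defaults; tune them per domain SLA.
    """
    if overall_score >= 95:
        return "gold"
    if overall_score >= 80:
        return "silver"
    return "bronze"
```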
Unity Catalog Persistence
Persist curated score history into governed Delta tables inside Unity Catalog.
- Delta tables in UC
- Historical tracking
- Reusable enterprise score layer
Governance Enrichment
Add tags, lineage, and metadata context to strengthen discoverability and trust.
- Quality status tags
- Lineage visibility
- Stewardship-ready metadata
Business Consumption
Expose scorecards to business, governance, and engineering teams.
- Databricks SQL dashboards
- BI scorecards
- Trend and SLA monitoring
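SLA monitoring typically runs as a Databricks SQL dashboard query over the score table. As a language-neutral sketch of the latest-score SLA check, with row fields mirroring the table columns defined below and a 90-point threshold as an assumption:

```python
def sla_breaches(history: list[dict], threshold: float = 90.0) -> list[str]:
    """Return table names whose most recent score is below the SLA threshold.

    Rows mirror the governance.dq.table_quality_scores columns;
    score_date is an ISO date string, so lexicographic comparison works.
    """
    latest: dict[str, dict] = {}
    for row in history:
        key = row["table_name"]
        if key not in latest or row["score_date"] > latest[key]["score_date"]:
            latest[key] = row
    return sorted(t for t, r in latest.items() if r["overall_score"] < threshold)
```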
DrivenByData at a glance
CDGC remains the execution engine for quality rules. Databricks becomes the integration and standardization layer. Unity Catalog becomes the governed trust layer for enterprise-wide visibility.
Code examples
Extract from CDGC API
import requests

# Pull DQ results from the CDGC REST API; endpoint path and payload
# shape may vary by Informatica release.
url = "https://<informatica-instance>/api/v2/dq/results"
headers = {"Authorization": "Bearer <token>"}

response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()  # fail fast on auth or endpoint errors
payload = response.json()
Standardize in Databricks
# Rename raw CDGC payload fields to the enterprise score model.
df_clean = df_raw.selectExpr(
    "assetName as table_name",
    "score as overall_score",
    "dimensionScores.completeness as completeness_score",
    "dimensionScores.accuracy as accuracy_score",
    "runDate as score_date"
)
Persist in Unity Catalog
CREATE TABLE governance.dq.table_quality_scores (
    catalog_name STRING,
    schema_name STRING,
    table_name STRING,
    score_date DATE,
    overall_score DECIMAL(5,2),
    status STRING,
    source_system STRING,
    run_id STRING
) USING DELTA;
Apply tags
ALTER TABLE main.sales.customer
SET TAGS (
    'dq_status' = 'gold',
    'dq_score' = '97'
);
What CDGC owns
Rule execution, profiling logic, score calculation, and source-level quality assessment.
What Databricks owns
Ingestion, transformation, standardization, historical persistence, and downstream analytics.
What Unity Catalog owns
Governance, access control, discoverability, metadata context, and trust signaling across the lakehouse.
About the author
Subramanian Gopalkrishnan is a Data Governance and Data Engineering leader with 18+ years of experience in regulated industries, helping enterprises build trusted, modern data ecosystems across cloud, analytics, governance, and AI transformation.