Why the old definition breaks

Traditional data quality assumed that a dataset could be inspected, certified, and trusted as a finished object. That model worked when data moved in slower batch cycles and governance lived primarily around warehouses, reports, and curated master records.

Modern platforms changed the operating model. In cloud-native pipelines, medallion architectures, dynamic transformations, AI-assisted workflows, and distributed data products, quality is no longer a static label. It is a governed guarantee that depends on:

  • Context — who is using the data and for what decision
  • Time — whether it is reliable inside the required decision window
  • Transformation logic — what happened between source and outcome
  • Provenance — whether the trust signals can be explained and traced

My view: the next phase of governance is not about cataloging more assets. It is about governing the quality of transformation itself.

The shift from static quality to contextual trust

Traditional model: data quality as a static label
  • Accurate • Complete • Consistent
  • Governance assumption: inspect data after it lands, certify the stored object, trust the final table or report

Modern model: data quality as contextual trust (Context × Time Window × Trust Signals)
  1. Decision context
  2. Freshness and allowable lag
  3. Lineage, contracts, and observability

The point is not that standards like accuracy and completeness stop mattering. The point is that they no longer operate alone. They only become trustworthy when attached to decision context, timing, and explainable transformation history.
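To make that shift concrete, here is a minimal sketch in plain Python, using hypothetical names such as TrustSignals, DecisionContext, and evaluate_trust, of how a quality verdict could combine classic dimension scores with a decision context, a freshness window, and provenance signals rather than standing alone as a static label. It is an illustration of the idea, not a prescribed implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class TrustSignals:
    """Hypothetical trust signals that travel with a dataset."""
    accuracy: float            # classic dimension scores, 0.0 to 1.0
    completeness: float
    last_refreshed: datetime   # when the data was last updated
    lineage_complete: bool     # is the transformation path traceable?

@dataclass
class DecisionContext:
    """The decision the data is meant to support."""
    name: str
    min_accuracy: float        # thresholds vary by use case
    min_completeness: float
    max_staleness: timedelta   # required decision window

def evaluate_trust(signals: TrustSignals, context: DecisionContext, now: datetime) -> bool:
    """The same dataset can pass for one context and fail for another."""
    fresh_enough = (now - signals.last_refreshed) <= context.max_staleness
    return (
        signals.accuracy >= context.min_accuracy
        and signals.completeness >= context.min_completeness
        and fresh_enough
        and signals.lineage_complete
    )

# Example: one dataset evaluated against two decision contexts.
signals = TrustSignals(accuracy=0.97, completeness=0.92,
                       last_refreshed=datetime(2024, 1, 15, 6, 0),
                       lineage_complete=True)
now = datetime(2024, 1, 15, 9, 0)
regulatory = DecisionContext("regulatory_reporting", 0.99, 0.99, timedelta(hours=24))
exec_view = DecisionContext("executive_analytics", 0.95, 0.90, timedelta(hours=12))
print(evaluate_trust(signals, regulatory, now))  # False: below regulatory thresholds
print(evaluate_trust(signals, exec_view, now))   # True: good enough for this context
```

The same signals clear one context's thresholds and miss another's, which is exactly the point.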

We validate the endpoints

Most enterprise scorecards still focus on source profiling or published dashboards. That creates a false sense of coverage. The riskiest defects often emerge between those checkpoints.

We ignore transformation integrity

Business rules can be correctly designed and still behave incorrectly under real workload conditions, schema drift, late-arriving records, or orchestration gaps.

We trust AI outputs too quickly

AI-generated or AI-shaped data introduces probabilistic behavior. That means governance must look beyond validity and into provenance, traceability, and semantic fit.

“The same dataset can be high quality for one use case and unsafe for another. That does not weaken governance. It makes governance more honest.”

The missing dimension is transformation integrity

Enterprises have spent years institutionalizing the classic dimensions: accuracy, completeness, consistency, validity, uniqueness, timeliness. Those are still foundational. But in modern architectures, they are not enough on their own because the business does not consume raw source truth. It consumes transformed truth.

That means the actual governance question becomes:

Can I prove that the transformation logic, orchestration path, and decision window preserved the level of trust required for this specific use case?

This is exactly where data governance and data engineering need to stop operating as parallel functions. In my work, the strongest implementations are the ones where stewardship expectations, metadata semantics, pipeline logic, and access governance all converge into one operational trust layer.

Transformation integrity checks

  • Rule execution coverage across critical transformation steps
  • Volume reconciliation before and after business rule application (see the sketch after this list)
  • Late-arriving data tolerance and business impact thresholds
  • Semantic drift between source meaning and output meaning
  • Lineage completeness for critical decision datasets
  • Human stewardship checkpoints for exception pathways

These checks map to: Informatica quality rules • Databricks medallion flow • Unity Catalog trust signals • Metadata-driven accountability
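As an example of the volume reconciliation check, here is a minimal PySpark sketch. The silver-layer table names and the zero-tolerance threshold are assumptions chosen purely for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical silver-layer tables: input to and output of a business-rule step.
pre_rule_df = spark.table("silver.orders_pre_rules")
post_rule_df = spark.table("silver.orders_post_rules")

pre_count = pre_rule_df.count()
post_count = post_rule_df.count()

# Records the rule step intentionally filtered out (rejected or quarantined).
rejected_count = spark.table("silver.orders_rejected").count()

# Reconciliation: every input record must be accounted for as either kept or rejected.
unaccounted = pre_count - (post_count + rejected_count)

# Illustrative tolerance: zero unaccounted rows for a critical decision dataset.
if unaccounted != 0:
    raise ValueError(
        f"Volume reconciliation failed: {unaccounted} rows unaccounted for "
        f"between rule input ({pre_count}) and output ({post_count} kept, "
        f"{rejected_count} rejected)."
    )
```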

Where modern data quality actually fails

Data in motion is the invisible risk layer: most organizations govern the start and the end, while the unstable trust zone sits in the middle.

  • Source: profiled and known
  • Transformation: mapping, enrichment, business rules
  • Orchestration: timing, retries, dependencies
  • Consumption: dashboards, APIs, AI

The under-governed zone is the middle: intermediate states, late data behavior, logic drift, and AI output variance.

This is the exact gap I see in enterprise programs: quality frameworks are mature at the source and visible at the endpoint, but transformation behavior, time-sensitive reconciliation, and in-flight drift are governed inconsistently.

| Area | Traditional assumption | Modern reality | What leaders should do |
| --- | --- | --- | --- |
| Completeness | All required fields are present | Completeness depends on event time, late-arriving data, and decision cutoff | Define completeness with an explicit time window and allowable lag |
| Accuracy | Source-to-target match proves correctness | Transformation logic can reshape data without obvious defects | Measure transformation integrity, not just source conformance |
| Consistency | Systems agree after batch load | Distributed pipelines can temporarily disagree while still being operationally valid | Differentiate transient inconsistency from control failure |
| Trust | A certified dataset is trustworthy | Trust now depends on lineage, contracts, runtime behavior, and context | Publish trust signals alongside the data, not after the fact |
| AI-readiness | Schema-valid data is sufficient | AI workflows require provenance, prompt awareness, and semantic controls | Treat generated outputs as governed artifacts |
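To make the completeness row concrete, here is a hedged PySpark sketch that measures completeness against an explicit event-time window and allowable lag. The table name, column names, and the two-hour lag are assumptions, not prescriptions.

```python
from datetime import datetime, timedelta
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Decision window: one business day, with a 2-hour allowable lag
# for late-arriving events (illustrative values).
window_end = datetime(2024, 1, 15, 0, 0)
window_start = window_end - timedelta(days=1)
allowable_lag = timedelta(hours=2)
decision_cutoff = window_end + allowable_lag

events = spark.table("silver.order_events")

# Events that belong to the window, regardless of when they physically arrived.
in_window = events.filter(
    (F.col("event_time") >= F.lit(window_start)) & (F.col("event_time") < F.lit(window_end))
)

# Of those, how many had actually landed by the decision cutoff?
arrived_by_cutoff = in_window.filter(F.col("ingested_at") <= F.lit(decision_cutoff))

total = in_window.count()
on_time = arrived_by_cutoff.count()
completeness = on_time / total if total else 1.0

# Completeness becomes a statement about a window and a cutoff, not a static label.
print(f"Completeness for {window_start:%Y-%m-%d}: {completeness:.2%} by cutoff {decision_cutoff}")
```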

The contextual trust framework I would apply

This is the model I believe modern enterprises need—especially those already investing in governed lakehouse architectures, metadata platforms, and enterprise-quality operating models.

1. Define the decision context

Start with the business decision, not the table. The trust threshold for regulatory reporting, supply chain alerts, and executive analytics should not be assumed to be the same.

2. Govern the data in motion

Introduce in-flight checkpoints inside pipelines, not only after publishing. Intermediate datasets should have explicit validation and exception behavior.
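One way to sketch such a checkpoint, assuming a PySpark pipeline and hypothetical names (validate_checkpoint, a quarantine table): valid rows continue downstream, while invalid rows take an explicit exception path instead of silently flowing on.

```python
from pyspark.sql import SparkSession, DataFrame, functions as F

spark = SparkSession.builder.getOrCreate()

def validate_checkpoint(df: DataFrame, name: str, quarantine_table: str) -> DataFrame:
    """In-flight checkpoint: validate an intermediate dataset mid-pipeline.

    Valid rows continue downstream; invalid rows go to an explicit
    exception path (a quarantine table) for stewardship review.
    """
    # Illustrative rules: keys present and amounts non-negative.
    is_valid = F.col("order_id").isNotNull() & (F.col("amount") >= 0)

    invalid = df.filter(~is_valid)
    if invalid.count() > 0:
        # Exception behavior is explicit: persist the rejects, tagged with the checkpoint name.
        invalid.withColumn("checkpoint", F.lit(name)) \
               .write.mode("append").saveAsTable(quarantine_table)

    return df.filter(is_valid)

# Usage between two transformation steps (hypothetical tables).
enriched = spark.table("silver.orders_enriched")
checked = validate_checkpoint(enriched, "post_enrichment", "governance.order_quarantine")
checked.write.mode("overwrite").saveAsTable("silver.orders_validated")
```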

3. Publish trust as metadata

Trust signals should travel with the asset: quality score, steward, lineage state, transformation version, freshness window, and policy context.
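A minimal sketch of one possible carrier on a Databricks/Delta table: storing trust signals as table properties so they are queryable next to the asset itself. The property names and values are assumptions chosen for illustration; catalogs also offer tags and comments for the same purpose.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative trust signals gathered by upstream quality and lineage checks.
trust_signals = {
    "trust.quality_score": "0.97",
    "trust.steward": "finance-data-stewards",
    "trust.lineage_state": "complete",
    "trust.transformation_version": "orders_rules_v12",
    "trust.freshness_window": "24h",
    "trust.policy_context": "regulatory_reporting",
}

# Attach the signals to the asset so they travel with it,
# rather than living in a report published after the fact.
properties = ", ".join(f"'{k}' = '{v}'" for k, v in trust_signals.items())
spark.sql(f"ALTER TABLE gold.orders_daily SET TBLPROPERTIES ({properties})")

# Consumers (or a catalog UI) can read the signals back at the point of use.
spark.sql("SHOW TBLPROPERTIES gold.orders_daily").show(truncate=False)
```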

Where I run this pattern

This model aligns naturally with the kinds of platforms I work across: Informatica for quality execution and metadata governance, Databricks for scalable transformation and data product implementation, and Unity Catalog for governed access, discoverability, and reusable trust signaling.

Informatica

Remains powerful for rule execution, profiling, stewardship workflows, glossary alignment, and metadata-driven accountability. But its real value expands when those trust outputs are operationalized downstream.

Databricks

Provides the transformation and standardization layer where in-flight quality must be observed, measured, reconciled, and persisted in a way the enterprise can act on.

Unity Catalog

Becomes more valuable when governance is not just about access and lineage, but about surfacing trustworthy context that consumers can actually interpret at the point of use.

The real evolution in data quality is not from rules to dashboards. It is from passive measurement to operational trust engineering. That is the space where governance, quality, metadata, and platform design finally start working as one discipline.

About the author

I help organizations turn governance from a policy layer into an operating model—connecting data quality, metadata, stewardship, platform architecture, and trusted consumption across modern cloud ecosystems.

My work has consistently focused on the point where business trust breaks down: not only in bad source data, but in weak transformation controls, disconnected metadata, and ungoverned decision pipelines.