Why Data Quality Isn't the Bottleneck It Used to Be

If you have been within fifty meters of an ERP rollout in the last three decades, you have heard the sentence: 'Garbage in, garbage out.' It was the iron law of industrial software. Before you could trust the system, the master data had to be impeccable. Routings had to be exact. Setup times had to be measured. BOMs had to be reconciled. The planning logic was deterministic. It processed exactly what you fed it, and if you fed it wrong, it gave you wrong, faster.

So the ritual went like this: a six-, twelve-, or twenty-four-month master-data project before the actual system could go live. Consultants. Workshops. Excel sheets walked from one office to the next. Operators interviewed about how long a changeover really takes. And then, after all that, the system would go live with data that was correct for about three months, until the world moved on.

This was the unsolved problem at the heart of industrial software. The system was rigid; reality was not. The data was a snapshot; the floor was a stream. Nothing in the standard MRP/APS architecture had any mechanism to keep the two in sync after launch.

Figure: Four data sources reporting different setup times are reconciled by a data agent into a single resolved value, with the outlier flagged. Truth becomes a consensus across several noisy signals, not a single field.

What Has Actually Changed

Several things changed quietly, and at roughly the same time.

First, language models read messy inputs. An ill-named field, a German-English-abbreviation mix, two part numbers for the same thing with no explanation: none of these are blockers anymore. Given the context of the surrounding system, an LLM disambiguates, normalizes, and reconciles in ways no deterministic ETL pipeline could. The cost of interpreting dirty data has collapsed.

Second, agentic systems triangulate. A classical APS asks what the setup matrix says. An agentic system asks what the setup matrix says, what the MES history shows, what the operator logged in the comment field, and what similar parts on similar machines actually take. When three of those agree and one is wrong, the system flags the outlier and proposes a correction. Truth becomes a consensus the system assembles from several noisy signals.
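The triangulation step can be illustrated with a minimal sketch. It assumes each source reports a setup time in minutes and that "agreement" means lying within a relative tolerance of the median. The source names, the sample values, the `reconcile` function, and the 20% tolerance are all hypothetical choices for illustration, not a description of any particular product.

```python
from statistics import median

def reconcile(readings, tolerance=0.2):
    """Resolve one value from several noisy sources and flag outliers.

    readings:  dict mapping a source name to a setup time in minutes.
    tolerance: relative distance from the median beyond which a source
               is flagged (an assumed threshold, chosen for illustration).
    """
    center = median(readings.values())
    # A source is an outlier if it sits too far from the overall median.
    outliers = {
        src: value
        for src, value in readings.items()
        if abs(value - center) > tolerance * center
    }
    agreeing = [v for src, v in readings.items() if src not in outliers]
    # If nothing agrees, fall back to the overall median rather than guess.
    resolved = median(agreeing) if agreeing else center
    return resolved, outliers

# Hypothetical readings for one part on one machine.
readings = {
    "setup_matrix": 45.0,      # nominal value in master data
    "mes_history": 32.0,       # typical value from executed jobs
    "operator_comment": 30.0,  # value parsed from a free-text note
    "similar_parts": 33.0,     # typical value for comparable parts
}
resolved, outliers = reconcile(readings)
print(resolved, outliers)
```

A median is used here because it is robust to a single wrong source; a production system would likely also weight sources by recency and reliability.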

Third, data agents repair master data as a by-product of operation. In an agentic twin, every plan that runs, every schedule that executes, and every disruption that gets handled generates evidence. The system surfaces the discrepancies as patches, often automatically. Typical findings:

Setups booked at a nominal 45 minutes that consistently land at 32.
Routings that exist in the ERP but have not run in two years.
Qualifications listed in the matrix that operators say they never use.
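The first of these cases, a nominal setup time that execution data contradicts, can be sketched as a small patch proposal. This is an illustration under stated assumptions: the `Patch` record, the `propose_setup_patch` function, the minimum sample count, and the 10% gap threshold are hypothetical, and the observed values are invented for the example.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional

@dataclass
class Patch:
    field: str       # master-data field the patch targets
    current: float   # value currently stored
    proposed: float  # value suggested by execution evidence
    samples: int     # number of observations behind the proposal

def propose_setup_patch(
    field: str,
    nominal: float,
    observed: List[float],
    min_samples: int = 5,
    min_rel_gap: float = 0.1,
) -> Optional[Patch]:
    """Propose a correction when observed setups consistently differ from nominal."""
    if len(observed) < min_samples:
        return None  # not enough evidence yet
    typical = median(observed)
    if abs(typical - nominal) <= min_rel_gap * nominal:
        return None  # master data is close enough; no patch needed
    return Patch(field, nominal, typical, len(observed))

# Hypothetical history: booked at 45 minutes, landing near 32.
observed = [31.0, 33.0, 32.0, 34.0, 30.0, 32.0]
patch = propose_setup_patch("setup_minutes", 45.0, observed)
print(patch)
```

Returning a proposal rather than writing the value directly keeps a human or a policy in the loop for deciding which patches are applied automatically.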

Master-data quality stops being a project and becomes a continuous output of running production.

What This Means for the Old Advice

The advice was never wrong, given the tools of the time. With a rigid system, you needed rigid inputs. The constraint was real.

What changed is the system. An agentic twin does not require pristine inputs because it does not assume them. It treats data as evidence that is multi-source, sometimes contradictory, and always incomplete, then applies judgment the way a senior planner does.

This has a concrete consequence. You no longer need to finish a master-data project before you deploy intelligent planning. You start with what you have. The system gets to work, surfaces gaps, and proposes fixes, and the data improves as it is used. The order of operations inverts: the data gets cleaned through use rather than ahead of it.

For operations leaders this is genuinely new. The risk-adjusted timeline of a 'real' deployment used to be eighteen months minimum. Most of that was data. If your supplier's pitch still leads with 'first we cleanse your master data,' they are still selling you 2005's architecture.

What Didn't Change

A few things did not change, and saying so honestly matters.

Critical safety, traceability, and regulatory data still has to be exact. An LLM should not be left to disambiguate a torque spec or a calibration record. The new tolerance for messiness covers operational data such as setup times, throughputs, qualifications, and routing variants. It does not extend to anything that touches certification.

Boundary conditions still need to be modeled. If a machine physically cannot run a part, the agent should not propose that it does. The structure of the world is real. The tolerance for noise lives in the parameters, and the constraints still have to be right.
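The split between exact constraints and noisy parameters can be sketched as follows. The sketch assumes machine capability is stored as an explicit set of (machine, part) pairs that acts as a hard filter, while setup-time estimates are soft and only rank the feasible options. The function names, machine and part identifiers, and values are hypothetical.

```python
def feasible_machines(part, machines, can_run):
    """Hard constraint: keep only machines that can physically run the part."""
    return [m for m in machines if (m, part) in can_run]

def pick_machine(part, machines, can_run, est_setup_minutes):
    """Rank feasible machines by a noisy setup estimate; never leave the feasible set."""
    candidates = feasible_machines(part, machines, can_run)
    if not candidates:
        return None  # no physically valid option; do not invent one
    # Soft parameter: a missing estimate ranks last instead of blocking the choice.
    return min(
        candidates,
        key=lambda m: est_setup_minutes.get((m, part), float("inf")),
    )

machines = ["M1", "M2", "M3"]
can_run = {("M1", "P7"), ("M2", "P7")}  # M3 cannot run P7
est_setup_minutes = {
    ("M1", "P7"): 32.0,
    ("M2", "P7"): 40.0,
    ("M3", "P7"): 5.0,  # a bad estimate that must not win
}
choice = pick_machine("P7", machines, can_run, est_setup_minutes)
print(choice)
```

However attractive the erroneous 5-minute estimate for M3 looks, it is never considered, because the capability filter runs before any noisy parameter is consulted.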

Within those limits, the rule that defined the first thirty years of industrial software is genuinely retiring. The next decade's leaders will be the ones who stop running master-data projects as preconditions and start running them as outputs.

This shift is core to how Zentio is built. The Data agent treats master data as a continuous concern, cleaning, reconciling, and surfacing gaps as a side effect of every shift it plans. We built it this way because no manufacturer we have worked with had clean enough data to deploy on the old terms, so the system had to meet them where they are.
