Will We Never Learn?

Every decade, mainframe application modernization elevates another false prophet. And every decade, the false prophet fails in the same way.

Rules extraction failed in the 1990s. Lift-and-shift failed to move the needle in the 2000s. Model-driven generation failed to deliver at scale in the 2010s. Many of today’s refactoring and language-translation tools still trace their lineage to these foundations.

Today, the savior is behavioral observation. Instrument the running system. Record inputs and outputs over some period of time. Treat the captured data flows as a specification. Generate new code that replicates what was observed.

The technology changes. The abstraction model changes. The failures persist.

Every abstraction silently destroys critical information in the gap between the old system and the new one.

The Silent Filter

It is easy to see why behavioral observation looks like the most compelling version of this pattern yet. It uses actual production data rather than retired subject matter experts or outdated documentation. It sidesteps the perceived talent crisis. It is incremental.

The fundamental flaw? It cannot tell you what it did not capture.

A system that has been in production for 30 or 40 years encodes decades of institutional logic. Record a month of production data flows, and you have captured a month of behavior, not the whole system and not the theory behind it.

The logic that handles a loan type not currently being originated? Not captured. The disaster recovery path added after a specific regulatory finding in 2007? Not captured. The wartime emergency payment handler that has not been triggered since 1982 but remains critical for national contingency planning? Not captured.
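The sampling gap can be made concrete with a toy sketch. Everything here is hypothetical (the routine, the payment types, the traffic): the point is only that a specification built from observed input/output pairs contains no evidence of branches the traffic never exercised.

```python
def route_payment(payment_type: str, amount: float) -> str:
    """Toy stand-in for a decades-old legacy routing routine."""
    if payment_type == "STANDARD":
        return f"ACH:{amount:.2f}"
    if payment_type == "PRIORITY":
        return f"WIRE:{amount:.2f}"
    # Dormant path: present in the source, absent from current traffic.
    if payment_type == "WARTIME_EMERGENCY":
        return f"TREASURY_DIRECT:{amount:.2f}"
    return "REJECT"

# The "observation window": a month of production traffic contains only
# the payment types actually in use today.
observed_traffic = [("STANDARD", 100.00), ("PRIORITY", 250.00)] * 1000

# The captured "specification" is just observed input/output pairs.
captured_spec = {(t, a): route_payment(t, a) for t, a in observed_traffic}

# Two of the four code paths never appear; a system generated from
# captured_spec has no evidence the dormant branch ever existed.
print(sorted({t for t, _ in captured_spec}))          # ['PRIORITY', 'STANDARD']
print(("WARTIME_EMERGENCY", 1.0) in captured_spec)    # False
```

The dormant branch is perfectly well defined in the source, yet invisible to any replica built only from `captured_spec`.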

During Y2K remediation, the UK Department of Social Security discovered dormant code for wartime emergency payments that had not been activated since the Falklands War. Programmers found it only because they examined every line of code for date dependencies. No observation window would have surfaced it.

This is not a temporary limitation waiting for larger context windows. Even practitioners who build modernization workflows around AI tools acknowledge the structural constraint.

Thoughtworks, in a recent assessment of Claude Code’s modernization capabilities, noted that current models cannot ingest tens of millions of lines of code simultaneously and reliably enough to draw a modernization roadmap; external scaffolding, static analysis, and human-in-the-loop engineering are required to compensate.

The gap is architectural, not incremental.

The first time any of these dormant paths is activated in the new system, the system fails. Not because the translation was wrong, although that happens too, but because the specification was incomplete. It was always going to be incomplete.

From One Black Box To Another

Organizations modernize because they are stuck. They have systems that work, but nobody fully understands them, nobody can confidently change them, and they constrain the business. The goal is not just to run the same workloads on cheaper infrastructure. It is to regain the ability to build, extend, and innovate.

Behavioral observation does not deliver that. It delivers probabilistic AI-generated code that produces the same outputs for the inputs that were observed. Your developers did not write it. They do not understand the reasoning behind it. They cannot look at an interest calculation routine and know which parts are regulatory requirements, which are bug-compatibility workarounds from a 1990s platform migration, and which are fossilized business logic from an acquisition that closed before they were born.

Peter Naur (of Backus-Naur form) understood this in 1985 when he wrote “Programming as Theory Building.” A program is not its behavior. A program is the theory that produced the behavior.

Observing the behavior cannot reconstruct the theory. Only the code itself can.

And behavioral observation destroys that code. Once the mainframe is decommissioned, the source that contained the actual business logic is gone. Permanently. The behavioral test corpus tells you what the system did for observed inputs, but it cannot tell anyone why. You have erased institutional knowledge, and you have destroyed the only artifact that contained it.

The Coexistence Trap

Beyond comprehension, there is an immediate practical danger during the migration itself.

Incremental cutover sounds prudent. Migrate one function at a time, run both systems in parallel, confirm the new one works, and move on. It is essentially the strangler fig pattern, and it works well for stateless services where you can simply route a request to either system.

However, mainframe application workloads are not stateless services. Batch jobs modify shared databases, VSAM files, IMS DB segments, and message queues. Multiple jobs read and write the same data stores in carefully sequenced runs. Every function cut over to the new platform that still interacts with data on the mainframe requires a synchronization bridge. Every function still on the mainframe that depends on data now modified by the new platform requires another.

You are not reducing risk incrementally. You are accumulating it in the bridging layer: change data capture, synchronization layers, dual-write mechanisms. This layer is itself brittle, difficult to test, and a reliable source of subtle data inconsistency.
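A minimal sketch of why the bridging layer is a reliable source of subtle inconsistency, assuming a naive dual-write mechanism (all names here are illustrative, not any vendor's tooling): a failure between the two writes leaves the stores silently diverged.

```python
class DualWriteBridge:
    """Hypothetical dual-write bridge used during incremental cutover."""

    def __init__(self):
        self.legacy_store = {}   # stands in for a VSAM file or IMS segment
        self.new_store = {}      # stands in for the new platform's database
        self.fail_second_write = False  # fault injected for the demo

    def write(self, key, value):
        self.legacy_store[key] = value       # first write commits...
        if self.fail_second_write:
            raise IOError("partition before second write")
        self.new_store[key] = value          # ...second may never happen

bridge = DualWriteBridge()
bridge.write("ACCT-1", 100)      # both stores agree: 100

bridge.fail_second_write = True
try:
    bridge.write("ACCT-1", 250)  # legacy updated, new store is not
except IOError:
    pass  # in production this failure may not even be noticed

# The stores now disagree, and nothing in either store flags it.
print(bridge.legacy_store["ACCT-1"], bridge.new_store["ACCT-1"])  # 250 100
```

Real bridges add retries, change data capture, and reconciliation jobs on top of this, which is exactly how the bridging layer itself becomes a system that nobody fully understands.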

The more you cut over, the more fragile the bridge becomes. At some point, the bridging infrastructure is as opaque as the system you set out to replace.

The Performance Gap

Mainframe performance is not accidental. It is the product of decades of tuning: sort algorithms optimized for specific data volumes, I/O patterns designed around channel architecture, batch sequencing that minimizes contention on shared data stores, buffer pool configurations refined over years of production operation.

What makes the system fast is not the results it produces. It is the computational structure that produces them.

Behavioral observation captures the results. It does not, and cannot, capture the computational structure. When AI regenerates an implementation from observed behavior, it has no understanding of why the original was fast. It produces functionally correct outputs using whatever algorithmic approach the model happens to generate.

That distinction might not matter if the target platform worked the same way. It does not. The original code was shaped, often implicitly, through decades of tuning, to exploit the mainframe’s specific capabilities. A behavioral replica running on horizontally scaled, network-connected infrastructure must achieve comparable throughput on a completely different architecture, without any knowledge of what made the original performant.
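The outputs-versus-structure distinction can be shown with a toy pair of implementations, hypothetical but in the spirit of classic batch processing: both produce identical results, so a behavioral specification cannot tell them apart, yet their cost profiles diverge sharply at scale.

```python
def match_merge(master, transactions):
    """Structure-preserving shape: one sequential pass over two sorted
    inputs, the classic batch match/merge, O(n + m)."""
    out, i = [], 0
    for key in transactions:
        while i < len(master) and master[i] < key:
            i += 1
        out.append(i < len(master) and master[i] == key)
    return out

def match_regenerated(master, transactions):
    """Behaviorally identical regeneration: a membership scan per
    transaction, O(n * m). Same outputs, very different cost at scale."""
    return [key in master for key in transactions]

master = sorted(range(0, 100_000, 2))   # sorted master file
txns = sorted(range(0, 3_000, 3))       # sorted transaction batch

# Identical outputs: indistinguishable to any behavioral capture.
assert match_merge(master, txns) == match_regenerated(master, txns)
```

Scale the master file to hundreds of millions of records and billions of daily transactions, and the second version is the one the observed behavior licenses just as readily as the first.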

For high-throughput workloads, which are typically computationally intensive and process billions of database interactions each day, the AI-generated code will almost certainly underperform.

And when it does, you are performance-tuning in the dark. You have no structural blueprint to guide optimization. You have outputs and nothing more.

Performance equivalence is not a nice-to-have. It is a hard requirement. Behavioral observation gives you no foundation to achieve it.

The Solution Is Not More Observation

The answer is not a wider observation window. It is not three months instead of one. It is not a set of more sophisticated capture agents. The answer is to stop interposing abstractions between source and target entirely.

If every abstraction between source and target destroys information, the answer is a transformation that introduces no abstraction at all.

A deterministic compiler does not observe behavior. It does not extract rules. It does not sample data flows. It transforms source code into target code with provable semantic equivalence. Every code path is preserved, including the ones that have not executed in years. Every edge case handler, every regulatory layer, every piece of dormant logic that represents institutional knowledge accumulated over decades. Nothing passes through a filter. Nothing is lost.
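The contrast can be reduced to a toy sketch (a hypothetical rule format, standing in for real source code): a source-level transformation maps every rule across one-to-one, independent of execution frequency, while a behaviorally derived specification keeps only the rules that fired while someone was watching.

```python
# Hypothetical legacy program, represented as condition/action rules.
legacy_rules = [
    ("STANDARD",          "route_ach"),
    ("PRIORITY",          "route_wire"),
    ("WARTIME_EMERGENCY", "route_treasury"),  # dormant for decades
]

def transform(rules):
    """Deterministic source-to-source mapping: one target rule per
    source rule, regardless of how often each has ever executed."""
    return [(cond, f"new_platform.{action}") for cond, action in rules]

observed_conditions = {"STANDARD", "PRIORITY"}  # a capture window's view

transformed = transform(legacy_rules)
behavioral = [(c, a) for c, a in legacy_rules if c in observed_conditions]

assert len(transformed) == len(legacy_rules)  # 3: every path survives
assert len(behavioral) == 2                   # dormant path silently dropped
```

The transformation operates on the source, so the dormant path needs no special handling: it is carried across because it is there.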

It preserves computational structure: sort logic, I/O sequencing, batch flow, and algorithmic shape. You are not hoping an AI reimagines an efficient implementation. You are transforming the existing implementation into a structurally equivalent one. That is a fundamentally different starting point for performance engineering on the target platform.

There Are No Shortcuts

You cannot observe a system into understanding. You cannot sample your way to completeness. You cannot abstract without loss.

The organizations that will successfully modernize their mainframe application estates are the ones that stop trying to shortcut around the code and start working through it. Deterministically, at compiler speed, with guaranteed accuracy, and zero information loss.

Everything else is an abstraction tax you pay in risk. And somewhere in your system, there is a wartime payments handler waiting to prove it.