Why translating code is really about reconstructing lost theories
By Graham Cunningham, CTO
You’d think translating COBOL to Java would be easy. They’re both programming languages, right? Just map the syntax and you’re done.
But if you’ve ever tried it, you know the truth: the hard part isn’t the language, it’s the meaning. Those COBOL programs aren’t just instructions for computers. They’re theories about how to run a business, encoded in ancient runes that few understand.
When you’re translating legacy code, you’re not just converting syntax. You’re trying to reconstruct the theories in the original programmers’ heads. And those theories are stated nowhere in the code itself; the code only carries traces of them.
I was thinking about this in the context of AI-assisted translation systems. These systems usually have three components that mirror the three systems in the human brain:
The Chimp: a large language model that does rapid pattern matching, like our instinctive brain.
The Human: a server that parses code slowly and carefully, like our deliberative reasoning.
The Computer: a database of previous translations, like our stored knowledge.
What we need is a way to orchestrate these systems to work together, just like our three brains do.
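One way to picture that orchestration is the minimal Python sketch below. It is an illustration under stated assumptions, not a description of any real system: the names `TranslationPipeline`, `toy_propose`, and `toy_verify` are hypothetical, the "Chimp" is a lookup table standing in for a language model call, and the "Human" is a trivial check standing in for a real parser.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Translation:
    source: str
    target: str
    origin: str  # which component supplied the answer


@dataclass
class TranslationPipeline:
    # "The Chimp": fast proposal step (hypothetical stand-in for an LLM call).
    propose: Callable[[str], str]
    # "The Human": slow, careful check (hypothetical stand-in for a parser).
    verify: Callable[[str, str], bool]
    # "The Computer": store of previously accepted translations.
    memory: Dict[str, str] = field(default_factory=dict)

    def translate(self, source: str) -> Optional[Translation]:
        # 1. Stored knowledge first: reuse an accepted translation if one exists.
        if source in self.memory:
            return Translation(source, self.memory[source], "memory")
        # 2. Rapid pattern matching: get a candidate quickly.
        candidate = self.propose(source)
        # 3. Deliberate analysis: only keep candidates that pass the check.
        if self.verify(source, candidate):
            self.memory[source] = candidate
            return Translation(source, candidate, "proposed")
        # Nothing trustworthy was produced; a person has to look at this one.
        return None


# Toy stand-ins, invented purely for illustration.
def toy_propose(source: str) -> str:
    known = {"ADD 1 TO COUNTER.": "counter += 1;"}
    return known.get(source, "")


def toy_verify(source: str, candidate: str) -> bool:
    return candidate.endswith(";")


pipeline = TranslationPipeline(propose=toy_propose, verify=toy_verify)
first = pipeline.translate("ADD 1 TO COUNTER.")
second = pipeline.translate("ADD 1 TO COUNTER.")
unknown = pipeline.translate("PERFORM UNKNOWN-PARA.")
```

In this sketch the first call is answered by the proposal step and the second by memory, while the unknown statement returns `None` so it can be escalated. The ordering (memory, then proposal, then verification) is one design choice among several; a real system might run the careful check on remembered translations too.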
I had a breakthrough when I started thinking about this in terms of semiotics. Programming languages are sign systems. But signs aren’t meanings. The same COBOL statement can mean different things in different contexts.
What matters is the interpretant, the implicit theory that links signs to meanings. In human communication, interpretants come from shared context. In legacy code, they come from the business logic in the programmer’s head.
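A small Python sketch can make the sign-versus-interpretant distinction concrete. Everything in it is invented for illustration: the COBOL statement, the two program contexts, and the `Interpretant` type are assumptions, not taken from any real codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretant:
    """The implicit theory linking a sign to its meaning (hypothetical model)."""
    role: str        # what the variable stands for in the business
    obligation: str  # what a faithful translation must preserve


# One sign: the same statement appears in two hypothetical programs.
STATEMENT = "ADD 1 TO WS-COUNT."

# Two interpretants, keyed by (statement, context). The contexts are made up.
INTERPRETANTS = {
    (STATEMENT, "report-loop"): Interpretant(
        role="index over report lines",
        obligation="nothing beyond the loop itself",
    ),
    (STATEMENT, "invoice-numbering"): Interpretant(
        role="next invoice number",
        obligation="must be persisted and never reused",
    ),
}


def meaning(statement: str, context: str) -> Interpretant:
    # The statement alone is not enough to look up a meaning;
    # the context is part of the key.
    return INTERPRETANTS[(statement, context)]


loop_meaning = meaning(STATEMENT, "report-loop")
invoice_meaning = meaning(STATEMENT, "invoice-numbering")
```

A syntax-only translator would emit the same increment for both cases. Only the second carries an obligation that the target code has to honor, and that obligation lives in the context, not in the statement.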
So the real challenge in legacy translation isn’t converting signs. It’s reconstructing theories. You have to infer the original programmer’s intentions from artifacts, the code they left behind.
This is a fundamentally different kind of problem from what most AI tackles. It’s not about finding patterns in data. It’s about finding theories in artifacts.
Theory reconstruction is hard because most of the context is lost. The original programmers are gone. The documentation is outdated. The business has changed. You’re left with code that works, but no one knows why.
Most translation projects fail because they focus on syntax and ignore semantics. They generate Java that compiles but doesn’t capture the business logic. Then, six months later, someone needs to modify it and realizes the theory is gone.
The key insight is that we need to teach machines to do what humans do when we read code. We use rapid pattern matching to get oriented; slow, careful analysis to understand the logic; and stored knowledge to infer intentions.
If we can architect translation systems around this insight, if we can build interpretant-reconstructing machines, then we have a shot at solving the legacy code problem for real.
In the next article, we’ll look at how these ideas translate into practice by examining the mental models that guide human code comprehension, and how we can teach machines to think the same way.
It’s not just about translating languages. It’s about translating theories. That’s a much more interesting challenge. And the potential impact is enormous.
But it’s going to take a fundamentally different approach than what the industry is doing today.