Building AI Where the Playbook Doesn't Exist
The way people in AI talk about hard problems has always bothered me. It's always the frontier stuff: reasoning, agents, scaling laws. And those are hard. But they're hard in a way that's almost comfortable. You have tools. You have benchmarks. You have a thousand other people working on the same thing. You're renovating a house that's already standing.
I spent ten years building AI where there was no house at all. And the most important thing I learned is not any particular technique, but something about where the field's attention goes, and where it doesn't.
In 2016 I started building a natural language understanding engine for Burmese. Not a research project — a product. Businesses in Myanmar needed to understand what customers were telling them on Messenger, and nothing existed to do it. No NLP libraries. No tokenizer. No word vectors. No datasets. Not even papers. Fifty-four million speakers, centuries of written tradition, and as far as computational linguistics was concerned, the language might as well not have existed.
People who've only worked in English don't quite get what "no foundation" means. They hear it and think "limited tools." But it's worse than that. The prerequisites for the tools didn't exist.
Take tokenization. English gives you spaces between words for free. Burmese doesn't have spaces. A sentence is one long stream of characters, and figuring out where words begin and end is itself an open research problem. So before I could think about intent classification, I had to solve word segmentation. Before word segmentation, syllable segmentation. And before syllable segmentation, I had to deal with the fact that people type combining characters in whatever order they feel like.
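To make the syllable step concrete: the workhorse there is a heuristic, not a model. Here's a minimal sketch in the spirit of the well-known regex approaches (Ye Kyaw Thu's sylbreak is the classic one); the rule set below is simplified for illustration and skips independent vowels, digits, and punctuation.

```python
import re

# Minimal Burmese syllable breaker (illustrative, not a production rule set).
# Heuristic: a new syllable starts at a consonant (U+1000-U+1021) unless
#   - the consonant is followed by asat U+103A, which "kills" it so it
#     closes the previous syllable (e.g. the "n" in "Myanmar"), or
#   - it is stacked under the previous consonant via virama U+1039.
SYLLABLE_START = re.compile(r"(?<!\u1039)([\u1000-\u1021])(?!\u103A)")

def syllables(text: str) -> list[str]:
    # Uses "|" as a break marker, so input is assumed not to contain it.
    marked = SYLLABLE_START.sub(r"|\1", text)
    return [s for s in marked.split("|") if s]

print(syllables("မြန်မာ"))  # ['မြန်', 'မာ']  -- "myan" + "ma"
```

The whole trick is in the two lookarounds: an asat means the consonant closes the previous syllable, a virama means it's stacked under one.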
That sounds minor, but it breaks everything. A Burmese syllable is assembled from combining characters — consonant, medials, vowels, tone marks — stacked in a specific sequence. People ignore the sequence. They type whatever comes to mind first. You end up with two strings that look identical on screen but are different bytes underneath. Your string comparison fails. Your search index misses. Your eval metrics are wrong and you don't know it until you're reading hex.
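Here's a sketch of that failure mode, since it's easy to state and miserable to discover. The two strings below are the same syllable, "ko," typed in two orders. The priority table in the fix covers just these two marks and is purely illustrative; a real normalizer implements the full Myanmar storage order (Unicode's UTN #11 is the usual reference).

```python
import unicodedata

# "ko" typed two ways. Most fonts render these identically.
a = "\u1000\u102D\u102F"  # ka + vowel I + vowel U  (recommended storage order)
b = "\u1000\u102F\u102D"  # ka + vowel U + vowel I  (what many keyboards emit)

print(a == b)  # False: same glyphs on screen, different bytes underneath

# Standard normalization does not help: Myanmar vowel signs carry
# canonical combining class 0, so NFC leaves them in typed order.
print(unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b))  # False

# Toy fix: sort each run of marks by an explicit priority table.
PRIORITY = {"\u102D": 0, "\u102F": 1}

def reorder(text: str) -> str:
    out, marks = [], []
    for ch in text:
        if ch in PRIORITY:
            marks.append(ch)
        else:
            out.extend(sorted(marks, key=PRIORITY.get))
            out.append(ch)
            marks = []
    return "".join(out + sorted(marks, key=PRIORITY.get))

print(reorder(a) == reorder(b))  # True
```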
Then there's encoding. For years, over 90% of phones in Myanmar ran a nonstandard font encoding called Zawgyi instead of Unicode. Same code points, different meanings. You couldn't tell which encoding you were looking at without building a classifier just for that. So I built that too.
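The real detector was statistical, but the flavor of the signals is easy to show. The patterns below are a rough subset of commonly cited Zawgyi markers, not a complete or reliable list; anyone facing this today should reach for Google's open-source myanmar-tools, which ships a trained detector.

```python
import re

# Crude Zawgyi-vs-Unicode heuristic, for illustration only.
ZAWGYI_HINTS = re.compile(
    r"(?:^|\s)\u1031"              # vowel sign E stored before its consonant
    r"|\u1039(?![\u1000-\u1021])"  # U+1039 used as a visible asat
    r"|[\u105A\u1060-\u1097]"      # codepoints Zawgyi reuses for glyph variants
)
UNICODE_HINTS = re.compile(r"\u103A")  # asat codepoint only Unicode text uses

def looks_like_zawgyi(text: str) -> bool:
    z = len(ZAWGYI_HINTS.findall(text))
    u = len(UNICODE_HINTS.findall(text))
    return z > u  # ties default to Unicode

print(looks_like_zawgyi("\u1031\u1019"))  # True: E before the consonant
print(looks_like_zawgyi("မြန်မာ"))         # False: contains Unicode-only U+103A
```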
Everything English NLP gets for free, I had to make. And all of it had to work well enough that a shop owner messaging about a delivery never noticed any of it.
That engine eventually processed over a hundred million conversations for Samsung, Unilever, Coca-Cola, and about 1,500 other businesses. None of it required an ML breakthrough. It was all infrastructure: encoding detection, script normalization, syllable boundaries, error correction for how people actually type. Boring problems. Nobody wanted them. 1
I spent a long time puzzled by why nobody wanted them. Eventually I figured it out, and I think the answer is important.
When you work in English, or any well-resourced language, the foundation is so solid you forget it's there. You can build a useful product from off-the-shelf parts: a pre-trained model, an open tokenizer, a standard eval framework, a clean API. All your effort goes into the application layer. And after enough time working at that level, you start believing the application layer is where the hard problems are, because that's where all your effort goes.
The foundation becomes like gravity. You don't notice gravity.
This creates a systematic blind spot. The field undervalues the work of building foundations and overlooks the places where they're missing. Which is most of the world.
Here's what "missing" looks like concretely. When a big AI lab says it "supports" Burmese, that usually means the tokenizer doesn't crash. The actual tokenization is a disaster. Researchers at Oxford measured this: a Burmese sentence burns through nearly 12x more tokens than the same meaning in English on GPT-4's tokenizer, and up to 17x on older ones. Khmer runs about 9x, Lao nearly 10x. 2
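This is easy to check yourself with OpenAI's tiktoken library. The Burmese line below is my own rough gloss of the English one; exact counts vary sentence to sentence, but the ratio is the point.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # GPT-4's tokenizer

english = "How much is the delivery fee?"
burmese = "ပို့ဆောင်ခ ဘယ်လောက်ကျလဲ"  # roughly the same question

print(len(enc.encode(english)))  # a handful of tokens
print(len(enc.encode(burmese)))  # many times more: with few Burmese merges
                                 # learned, the tokenizer falls back to
                                 # byte-level fragments
```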
Think about what that means economically. A developer in Yangon building a chatbot on GPT-4 pays over ten times more per conversation than a developer in Austin building the same thing in English. At that price, whole product categories don't work. You can't build customer service automation, or search, or translation tools. The economics kill them before they start.
And it compounds. When AI costs ten times more, you don't just spend more money. You try fewer things. You run fewer experiments. You learn slower. The gap widens with every new model, because every new model was trained and optimized for the languages that already had infrastructure.
There's a pattern here that I keep seeing in other places: domains where the expertise is all tacit and has never been encoded, markets where the economics are different enough that the standard playbook falls apart, whole categories of work where the prerequisite infrastructure just doesn't exist. The data isn't there. The annotations aren't there. The people who have the knowledge are too busy using it to stop and write it down.
I think the biggest opportunity in AI right now isn't better models. It's building the missing infrastructure. We have systems of record for everything: every transaction, every status update, every log entry. All the "what." We have almost nothing for the "why." Why this decision got made. Why this case is an exception. Why you handle this situation differently. That reasoning layer was never treated as data. It was just how people did their jobs.
Building AI for Burmese trained me to see this. When you spend years in a place with no infrastructure, you develop a kind of x-ray vision for missing foundations. You walk into a new domain and where other people see application opportunities, you see the absent layer underneath — the thing that doesn't exist yet that everything else needs to sit on top of.
That's a valuable thing to be able to see. The people working on these missing layers don't get invited to give keynotes. Their work isn't on the leaderboards. But the hardest problems in AI have always been in these places. They're just not where everyone is looking.
1 For one insurance company we built a claims system that cut processing time from 45 minutes to about 15. Not because we did anything clever with models. We just took a workflow that ran on phone calls and paper forms and built the boring connective tissue that let it run on chat. That kind of work has no prestige in AI, but it's where the actual value was.
2 See Petrov et al., "Language Model Tokenizers Introduce Unfairness Between Languages" (ICML 2023). Newer tokenizers like GPT-4o's o200k_base have narrowed the gap for some scripts — Indic languages saw roughly 3-4x improvements — but the penalty for Burmese and similar languages remains substantial. The problem is in the tokenizer, and the tokenizer was designed for languages with spaces.