Less Engineering, More AI: Lessons from Building GenAI Applications

tl;dr:

  • Today’s AI systems lack crucial capabilities (e.g. memory, learning from experience, …)
  • Developers find themselves repeatedly engineering workaround solutions for missing capabilities
  • Workaround solutions are short-lived, as they are quickly replaced by superior end-to-end versions.
  • The engineering effort required to build AI products drops as more capabilities are built directly into the models.

After 1.5 years of developing AI solutions, one conclusion has become increasingly clear: any engineering effort we put in today can quickly become outdated.

Lukas Petersson recently expressed a similar opinion in his blog post (https://lukaspetersson.com/blog/2025/bitter-vertical/), which served as inspiration for my own reflections. Let me share some experiences from projects my team and I delivered for clients at the AI consultancy I co-founded:

Insurance Claims Analysis

One of our clients, an insurance provider, had a team that manually reviewed long documents (20–500 pages) to assess claims. Our mission was to create a system that automatically extracts key information relevant to assessing the validity of claims.

We first tried a RAG approach but ended up designing a multi-stage, iterative AI pipeline to ensure the accuracy and completeness of the extracted information. Building this solution took about three months. Just as we finished, OpenAI released Deep Research. With this new feature, our heavily engineered AI pipeline would have been unnecessary: we could have built the same tool in just a few weeks, with even better accuracy.
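The overall shape of such a pipeline can be sketched as follows. This is a minimal illustration, not our production code: `llm` stands for whatever model call you use, and the prompts are placeholders.

```python
from typing import Callable, List

def chunk_document(text: str, chunk_size: int = 2000) -> List[str]:
    """Split a long document into fixed-size character chunks."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def extract_claim_info(text: str, llm: Callable[[str], str],
                       chunk_size: int = 2000) -> str:
    """Multi-stage extraction: per-chunk extraction, consolidation,
    then a verification pass."""
    # Stage 1: extract candidate facts from each chunk independently.
    partials = [llm("Extract facts relevant to assessing the claim:\n" + c)
                for c in chunk_document(text, chunk_size)]
    # Stage 2: consolidate the per-chunk findings into one report.
    merged = llm("Merge and deduplicate these findings:\n" + "\n".join(partials))
    # Stage 3: verification pass over the consolidated report.
    return llm("Check this report for completeness and revise it:\n" + merged)
```

In practice, each stage needs its own prompt engineering and error handling, and the verification stage may loop several times. This is exactly the kind of glue work that an end-to-end feature like Deep Research makes redundant.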

Key takeaway: the moment a more advanced AI feature or model arrives, any hand-engineered workaround is rapidly overshadowed.

Head Hunting Tool

A headhunting agency tasked us with building a tool that identifies potential candidates for a specific position and conducts extensive web research to collect key data points such as age, languages, experience, expertise, and seniority. The tool then evaluates each candidate against the criteria set by the headhunter.

Again, Deep Research arrived too late for us to benefit from it. We had to build our own version, which was not as good as the end-to-end solution from OpenAI.

Accounting Automation

Several accounting firms asked us to build AI tools that automate the booking of invoices, bank receipts, credit card transactions, etc. We started developing AI solutions that plug into their existing accounting software and workflows. These solutions helped boost productivity, but the impact was limited.

Why? It turns out accountants spend most of their time on a small subset of particularly challenging cases. While 90% of invoices are straightforward and the AI's booking suggestions can be accepted as-is, the remaining 10% are much more complex. For example:

  • Invoices from foreign countries with complex tax implications
  • Missing information about a purchase → Accountant must track down and contact the person who was responsible for the purchase
  • Missing invoices → Accountant goes on a paper trail hunt

Truly automating accounting means solving the hard 10%. I believe developing such a system with today’s AI is quite challenging and would require a considerable amount of engineering.

The Crucial Capabilities for AI Utility

Here is an overview of capabilities that I found myself engineering workarounds for again and again, simply because these capabilities were, or still are, not integrated end-to-end into AI models. Clearly, many other developers face the same limitations, which is why platforms like LangChain have emerged.


Reasoning

  • When it's needed: Any non-trivial task (e.g., coding, math, medicine)
  • Example in accounting: A complex decision, e.g. choosing the correct tax account
  • Workaround solutions: Manual chain-of-thought prompting
  • End-to-end AI solutions: Reasoning models (o1, Grok-3, R1, …)

(Re)search

  • When it's needed: Finding specific information in vast amounts of data
  • Example in accounting: Finding the correct invoice related to a given bank transaction
  • Workaround solutions: (Agentic) RAG
  • End-to-end AI solutions: Deep Research (OpenAI)

Memory

  • When it's needed: Retaining context beyond the model's context length. An AI agent asks a human for context about a purchase; if the same purchase occurs again next month, the AI should remember the conversation from last month instead of asking again.
  • Workaround solutions: Summarizing/compressing past messages; detecting key pieces of information and writing them to an external memory (e.g., a text file) that is later reinjected into the context (ChatGPT's memory feature works like this)
  • End-to-end AI solutions: Infinite context; memory-augmented models; the Mamba architecture

Learning from Experience

  • When it's needed: As I mentioned in a previous post, there is a kind of knowledge (tacit knowledge) that cannot simply be captured in words; usually it comes through experience or trial and error.
  • Example in accounting: The AI makes a mistake in a booking, which gets noticed and corrected by a human.
  • Workaround solutions: Reflection prompting plus adding the result to memory; fine-tuning
  • End-to-end AI solutions: An end-to-end version of reflection prompting and memorization; inference-time learning

Computer Use

  • When it's needed: Automating tasks where the involved software tools lack complete APIs
  • Example in accounting: Many (older) accounting systems offer no API.
  • Workaround solutions: Feeding screenshots to vision models and prompting them to come up with the next best action
  • End-to-end AI solutions: Models specifically trained with RL for computer use, e.g. OpenAI Operator, Anthropic Computer Use

Learning from Demonstration

  • When it's needed: Learning to complete a task from a step-by-step explanation by a human (e.g. via screen share and audio)
  • Example in accounting: An accountant demonstrates to the AI where to archive a receipt and which naming conventions to use for the files.
  • Workaround solutions: One model analyzes the human demo and outputs a text description, which is fed into the context of another model that performs the task regularly.
  • End-to-end AI solutions: Probably solved automatically by having end-to-end memory and learning from experience
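To make the memory workaround concrete, here is a minimal sketch of the "detect key facts, store them externally, reinject them" pattern. Everything here is illustrative: `llm` is a placeholder for any model call, and the prompts are not from a real system.

```python
from typing import Callable, List

class ExternalMemory:
    """Workaround for missing end-to-end memory: persist key facts
    outside the model and reinject them into every new context."""

    def __init__(self) -> None:
        self.facts: List[str] = []

    def remember(self, fact: str) -> None:
        # Ignore empty extractions and duplicates.
        if fact and fact not in self.facts:
            self.facts.append(fact)

    def inject(self, prompt: str) -> str:
        # Prepend stored facts so the model "remembers" past sessions.
        if not self.facts:
            return prompt
        header = "Known facts from earlier sessions:\n" + \
                 "\n".join(f"- {f}" for f in self.facts)
        return f"{header}\n\n{prompt}"

def chat_with_memory(prompt: str, llm: Callable[[str], str],
                     memory: ExternalMemory) -> str:
    answer = llm(memory.inject(prompt))
    # Ask the model to distill anything worth keeping; empty means nothing new.
    fact = llm("State the single most important new fact, or nothing:\n"
               + prompt + "\n" + answer).strip()
    memory.remember(fact)
    return answer
```

This is roughly how ChatGPT's memory feature behaves from the outside; the point of the table is that this glue code disappears once models ship with memory built in.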

My guess is that model providers are racing to create AIs that incorporate all of the above capabilities natively. This means the continuation of the following two trends that I have observed so far:

  1. Workaround solutions are short-lived, as they are quickly replaced by superior end-to-end versions.
  2. The engineering effort required to build AI products drops as more capabilities are built directly into the models.

What does this mean for AI application layer startups?

I think most AI startups today are essentially building these three things:

  • A nice UI
  • (Boilerplate) backend code (databases, APIs, workflow logic, …)
  • Engineering workarounds for missing AI capabilities

What if all of the capabilities I listed above become available out of the box, which will likely happen within 1–2 years? Does that mean AI application layer startups are doomed? What we know is that in the short term these startups provide significant value. Take Cursor, for example. I use it daily, and it has made me ~5× faster at coding. However, once capabilities like computer use become both reliable and cheap enough, tools like Cursor will be obsolete. Instead, I will simply task my computer-use AI agent to code directly in my favorite development environment.

So, how can AI application layer startups remain relevant? 

One way I see is to leverage proprietary data gathered during this short transitional period, before end-to-end AI agents become available. With that unique data, they could either enhance future third-party AI agents or build their own AI agents from scratch. The feasibility of the latter option hinges on whether capable AI eventually becomes a commodity, which currently does seem possible.

But there is one caveat: emerging research increasingly suggests that pure RL approaches to training agents may eventually outperform hybrid methods that incorporate human demonstration data. It remains to be seen just how valuable proprietary data will ultimately be.

Conclusion

The AI landscape is changing fast. My experience over the past 1.5 years suggests that while engineering workarounds can bridge today’s gaps, the future belongs to AI systems that integrate crucial capabilities end-to-end. Less manual engineering isn’t just a design preference, it’s essential for unlocking AI’s full potential. In the long run, only true end-to-end AI solutions will thrive, while today’s patchwork approaches will quickly become relics of the past.

For me, the key question remains: how can we build lasting value when foundation models become more and more powerful?

OpenAI's Operator


I just tested OpenAI’s long-awaited AI agent, called Operator, and I’m impressed. I tasked the agent with creating a new user in our Microsoft 365 environment as part of onboarding a new employee — it worked like a charm.

Why does this matter?
So far, GenAI has been limited to chatbots: like having a highly knowledgeable person locked in a basement with access only to chat. Useful, but limited. AI agents change the game: they can interact with any digital system (email, MS Office, ERP, CRM, …).

To put it simply:
Chatbots = Remote advisors
Agents = Remote employees

OpenAI’s “Operator” is still an early research preview with many limitations (e.g. it runs only in a virtual browser, and it is not allowed to use username/password information to log in autonomously, …). But at the current pace, I expect many of these limitations to be gone within 2–3 months.

Implications for Businesses: I am convinced that any company that fails to implement this technology ASAP will have zero chance to compete and will go out of business within the next 1–2 years.

Why Tacit Knowledge Could Be the Next Frontier in AI

Imagine a brilliant lawyer who’s memorized every law, statute, and precedent, but has never stepped into a courtroom. They’ve read every contract template, yet have no idea which clauses actually hold up in a messy, real-world dispute. This is the gap between theory and practice, or between explicit and tacit knowledge.

Tacit knowledge is this elusive expertise that comes from experience and is difficult to capture in writing. Today’s models are unparalleled at processing explicit knowledge (facts, data, etc.). However, they might be capped in scenarios that require more tacit knowledge.

The trillion-dollar question: how do we teach AI the unwritten knowledge? My guess is that it will require AI systems that learn not just from datasets and simulated environments, but through continuous interaction with real-world environments, feedback loops with experts, or even “apprenticeships” alongside humans. Imagine an AI that hasn’t just read every legal textbook and documented legal case but has participated in millions of courtroom sessions and negotiations.

OpenAI and others are likely racing to crack this.