Unfulfilled Potential of LLM-Based Products

When GPT-3 launched, it seemed like magic. You could give it any problem, and it would solve it effortlessly. With each new version, these models outperformed humans in technical benchmarks and even aced exams, pushing the boundaries of AI capabilities. The world was captivated. ChatGPT became the fastest-growing consumer product ever, creating an air of excitement and limitless possibilities.

The tech world was buzzing with potential. The logic was simple: break GPT down into its core components, build custom applications for specific use cases, and monetize these innovations “one API token at a time.”

Fast-forward two years, and that excitement has cooled. Today's top consumer AI products are essentially variations of the same GPT-based chatbot. Apple’s most recent AI-related announcements no longer inspire the awe they once did.

Every new AI feature seems to launch in “beta,” signaling incomplete products and a lack of confidence. This represents a missed opportunity. But why did we end up here?

Failed Promise of 10x Improvement

The magic of ChatGPT lies in its versatility—it feels like it can do anything. Startups are traditionally built around a product that solves a specific job. The intuition is that the product needs to solve a significant problem (so people are willing to pay) for a large number of users, and ideally, it needs to be 10x better than existing solutions.

On paper, using LLMs seems to be a universal 10x value-adding cheat code for any use case. Their promise of universal applicability makes them seem unstoppable.

But here’s the catch. If you break it down, that universal applicability is a mirage. What LLMs are really doing is handling a long tail of small, specific tasks and user contexts. This doesn’t usually translate into solving big, critical problems at a 10x improvement rate. It’s a subtle bait-and-switch: your product didn’t get 10x better at solving the core problem; it just became passable at many adjacent use cases.

It’s Hard to Build High-Quality LLM-Based Products

Creating successful products or features based on LLMs is deceptively hard. The ease with which LLMs can solve a broad range of tasks masks the real difficulty of creating user-friendly, high-quality solutions. Let’s break down a few of the major challenges:

1. Evaluations Are Tough

It is challenging to evaluate LLMs' performance on specific tasks accurately. While benchmarks and tests show them to be technically impressive, their real-world application often reveals gaps in performance, particularly in edge cases and nuanced scenarios. And just by introducing an LLM into the product, you have probably added an order of magnitude more specific cases you need to test; your team just doesn't know it yet.
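To make the edge-case problem concrete, here is a minimal evaluation harness sketch. The `fake_model` function, the test cases, and the exact-match scoring rule are all illustrative assumptions, not a real benchmark or API:

```python
# Toy evaluation harness for an LLM-backed feature (illustrative sketch).
# A real setup would call an actual model and use a more robust scoring
# rule than exact string match.

def fake_model(prompt: str) -> str:
    # Stand-in for an LLM call: handles the happy path,
    # but fumbles a trivial casing variation.
    answers = {
        "Capital of France?": "Paris",
        "Capital of france?": "I don't know",  # edge case
    }
    return answers.get(prompt, "I don't know")

def evaluate(model, cases):
    """Return the fraction of cases where the output matches the expectation."""
    passed = sum(1 for prompt, expected in cases if model(prompt) == expected)
    return passed / len(cases)

cases = [
    ("Capital of France?", "Paris"),
    ("Capital of france?", "Paris"),  # same intent, different casing
]

score = evaluate(fake_model, cases)  # the casing variant fails
```

Even this two-case suite exposes a failure a benchmark score would hide, and every new user context multiplies the number of such cases.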

2. Guardrails Can Kill Use Cases

LLMs' flexibility is both their strength and weakness. To make them safe and reliable for a broad audience, teams introduce guardrails—safety mechanisms that filter out inappropriate, harmful, or incorrect outputs. But these restrictions, while targeting one class of behaviors, often eliminate others, preventing users from solving their adjacent use cases with your product.
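A crude sketch shows how this happens. The blocklist and the example prompts below are assumptions for illustration; production guardrails are far more sophisticated, but the false-positive dynamic is the same:

```python
# Minimal keyword-based guardrail (illustrative sketch, not a real filter).

BLOCKLIST = {"exploit", "attack"}

def guardrail(prompt: str) -> bool:
    """Return True if the prompt is allowed through to the model."""
    words = set(prompt.lower().split())
    return not (words & BLOCKLIST)

# A harmful request is blocked, as intended...
assert not guardrail("how to exploit this server")
# ...but so is a legitimate security-engineering question (false positive):
assert not guardrail("write a unit test for my exploit detector")
```

The second rejection is the guardrail quietly killing a valid use case for a real user.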

3. RAG (Retrieval-Augmented Generation) Is Also Hard

Integrating retrieval systems (pulling in real-time, relevant information) with LLMs to improve their responses is complex and often doesn’t work as smoothly as expected. There is a notion in the community that you can throw in a vector database, compute distances to a bunch of chunks, and the problem is solved. That's just a dangerous lie. Building a high-quality RAG system is as complicated as building a high-quality search engine.
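The naive approach the paragraph warns about can be sketched in a few lines. Bag-of-words vectors stand in for learned embeddings here, and the chunks and queries are made up, but the failure mode is representative:

```python
# Naive "vector distance to chunks" retrieval (illustrative sketch).
import math
import re
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks):
    """Return the chunk closest to the query."""
    q = embed(query)
    return max(chunks, key=lambda c: cosine(q, embed(c)))

chunks = [
    "Shipping takes 5 to 7 business days.",
    "Refunds are issued within 30 days of purchase.",
]

# Works when the query shares vocabulary with the right chunk:
retrieve("refunds within 30 days", chunks)       # picks the refunds chunk
# Fails on a paraphrase with no word overlap:
retrieve("can I get my money back", chunks)      # picks the shipping chunk
```

The paraphrased query defeats the distance metric entirely, which is exactly the kind of gap that separates a weekend demo from a real search engine.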

The Interface Isn’t There Yet

Another major stumbling block is the user interface. Interacting with LLMs often requires prompt engineering. This adds complexity, especially for casual users. If your product requires users to type out three paragraphs of text and then go through multiple rounds of clarification, you’ve added a serious friction point to the onboarding process. The learning curve can be steep, and that’s where many users drop off.
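One common way to reduce that friction is to hide the prompt engineering behind a structured form, so users pick a few options instead of writing paragraphs. The field names and template below are hypothetical, not any product's actual API:

```python
# Sketch of hiding prompt engineering behind a structured form.
# Template and field names are illustrative assumptions.

TEMPLATE = (
    "You are a helpful writing assistant.\n"
    "Rewrite the text below in a {tone} tone for a {audience} audience.\n"
    "Keep it under {max_words} words.\n"
    "\n"
    "Text:\n{text}"
)

def build_prompt(text, tone="friendly", audience="general", max_words=100):
    """Turn a few dropdown-style choices into a full prompt the user never sees."""
    return TEMPLATE.format(text=text, tone=tone, audience=audience, max_words=max_words)

prompt = build_prompt("Our Q3 numbers were strong.", tone="formal", audience="executive")
```

The user makes two dropdown selections; the product does the prompt writing, and the onboarding cliff disappears.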

Where Do We Go From Here?

The untapped potential of LLMs isn’t a sign of failure—it’s a reflection of the industry learning to manage these powerful tools. The AI landscape is still maturing, and while we haven’t yet seen the new killer applications that deliver the promised 10x improvements across critical user jobs, there’s a massive opportunity for innovators who can figure out how to harness LLMs effectively.

I am excited about how well LLMs fit into coding-productivity and writing tools. Other fields will surely follow.

The road ahead is challenging, but as the dust settles, those who can build thoughtful, user-centric LLM applications may still capture the magic once promised.