Embrace the Chatter

Figuring out where LLMs belong.


2023-07-07

If the cascade of Large Language Model demonstrations on social media in the last six months shows one thing, it’s how many different tasks one or a small set of models can do reasonably well. ChatGPT, for instance, is not the best chess player or the best equation solver. Nor is it the best available tool in language-related areas where an LLM seems likely to have a better shot, like translating or proofreading text. What’s remarkable is its ability to do all of it. And LLMs often turn out to be useful even where they fall short. They might be unreliable at translation or proofreading, but will still catch things that other products wouldn’t. While they’re currently poor at playing chess or solving equations, they can at least try to explain in natural language what they’re doing and why.

That seems useful, but how do you make the most of such a model in a product? The breadth itself is a nice property because it gives users an all-in-one interface. But in some areas a general-purpose product isn’t enough - users need something purpose-built and the ability to combine LLMs with other tools. In the next few years, we’ll see which use cases belong in which category.

It’s a recurring theme for software products. On one end there’s the team building for a specific problem with the advantage of focus, making almost no trade-offs when it comes to solving that problem. On the other end, there’s the inferior product with the advantage of being integrated into a larger user context. Sometimes that integration brings a real product advantage through feature synergies. Often though, it’s just a matter of convenience, since users avoid the very real mental cost of adopting an additional product and making a habit of using it. It’s an ongoing experiment to figure out how tools ought to be bundled to maximize user experience and synergies while minimizing that mental cost.

For some use cases, a general-purpose LLM will likely go direct to the user because people won’t bother with anything tailored to a particular problem. There will also be existing products that incorporate LLMs to improve themselves, which is convenient for their existing users. And finally, there will be new companies that overcome user inertia by using LLMs to build great products for particular use cases.

Out of the three, one might believe that general-purpose LLMs will be used for most tasks. Just as Google is the destination for most searches regardless of topic, the inertia against adopting purpose-built products could turn out to be too large. But I believe it will be different this time. LLMs do more than retrieve information - they do tasks and solve problems. It’s not merely about providing the information you are looking for; an LLM can help with the task you are ultimately trying to do. What that help looks like might be very different depending on the problem.

Midjourney, for instance, uses LLMs as part of generating images from natural language. It all happens through chat via a Discord bot, which means that users lack the control that comes with other tools. The user experience is cumbersome for those who are used to working with images in other software, since it’s separate from their existing workflow. Several companies are working on this problem, and it seems clear that eventually the two will be integrated. Alternatively, a new product might combine the best of both worlds in a new way. By starting from scratch, some teams might build a much better experience for creating images through a natural language interface while preserving full control over the image-creation process.

A related problem is figuring out how general the model itself should be. If a model is purpose-built for a particular setting, it doesn’t need to be able to do everything. Startups are working on this too, narrowing the breadth of LLMs slightly by defining the model’s context or adding data to it. This tunes the model to a particular domain of problems and makes it better at a particular task. Some generality is sacrificed, but the model keeps general abilities such as getting to the essence of a prompt and seeing it in a larger context.

Now that LLMs are available and clearly appear to be useful, the experiment carried out by startups and companies is figuring out how broad they should be and where they belong.