Building UX for LLM-based tools

The Jinn, the machine and the human

Jun 20, 2025

There’s a belief in certain circles that it’s possible for humans to enslave a jinn (a supernatural being or a genie) to make it perform miraculous tasks and gain access to infinite knowledge. I wonder if today’s LLMs could count as those jinns.

Today’s large language models sound human enough that it’s only been natural to subconsciously anthropomorphize them. And that’s led to all sorts of UX challenges.

After all, how does one design the UX of a product that relies on LLMs as its building blocks? How does one build a product that appears so human-like in its capabilities and yet breaks users’ conceptual models of humans so regularly?

One solution would be to anchor the conceptual model to specific human types: my friend Alyssa Bilinski describes it as a worker that never gets tired or bored but has an Amelia Bedelia complex. I’ve also heard Anthony Deighton describe it as an MBA intern: confident in its assertions, expert in all, bullshitter extraordinaire.

Clearly, we’re trying to make sense of something that sounds human (it can form complete sentences that sound plausible), and yet something is always amiss.

Here’s my addition to that chorus:

Almost 11 years ago, I worked in Albania on a project that eventually became my thesis. The first morning in Tirana I wandered out to get breakfast before I went to the office. My hunger led me to the first thing that looked like a restaurant and I sat down on one of the chairs assembled outside. A waiter duly popped up and looked at me, head askew. I stared back. Eventually I said “menu”. He stared back. No feedback. I opened and closed my hands to mime my interpretation of a menu. He shook his head and said something I didn’t understand. But I picked up the word “cafe”. I nodded to signal that yes, I was aware I was in a cafe. He promptly vanished for a couple of minutes and returned with an espresso, which he plonked down in front of me before swiftly retreating. I stared at the espresso, gave up on communicating and drank it (coffee is kafe in Albanian). I’ve been hooked on caffeine since then.

I had approached that interaction knowing broadly, but not specifically, what I wanted. I wasn’t sure how to communicate properly, but I tried something. I thought it worked. He thought he understood. I didn’t get exactly what I wanted. Maybe if I had gone there another day and got another server, I would’ve successfully gotten a menu. But the coffee wasn’t the worst thing ever.

LLMs in a nutshell: their precision is a function of both the instructions and the task at hand.

However, since this is a post, I need an odd-numbered list. So here are some more specific UX challenges that are germane to LLMs:

The gulf of (and in) evaluation: Good UX lets users know what’s happened, and if their goal has been achieved. Evaluating an LLM’s response is hard work: nobody enjoys doing code reviews, or reviewing the bland word-salads that LLMs can spew. And the burden is on the users to figure out if the LLM has actually helped them achieve their goal. It’s neither fun nor easy.
Maximum affordances, no signifiers: How do users work with something that could potentially do anything but there’s no way to figure out how to do the specific thing they need? Simply trial and error? Since it’s impossible to write infinite documentation, do users fumble around the tool by themselves, building highly localized conceptual models of what tasks may be possible and how? Here’s an experiment: if someone cloned ChatGPT and put a sticker on it that marketed it as “Specialist recipe-generator for Mughlai cuisine”, would users have a better experience with it even if it produced similar quality output? Can less be more?
Chatbots suck: I understand that chat is the most obvious interface for an LLM, but chatbots are so boring. Surely there’s a better way to build user experiences, because who talks like a chatbot in real life? And if we expect users to resort to the medium (i.e., speech) they use subconsciously every day, they will always notice when something sounds artificial, even if they can’t explain why. It will always feel off.

Next post: some ideas on how to deal with these UX challenges!

