What you should know about chatbots and conversational UI
I keep reading hot takes about “theories of conversation” by tech people who appear to have no linguistics training, and honestly, it grates on my nerves. Why? Because there is an enormous body of literature in linguistics and psychology on how conversation works and how people make meaning out of words. There is no need to invent a theory based on nothing but the author’s intuition rather than science or evidence. Remember, the plural of anecdote is not data.
That said, I would like to present some of the theory around how we make meaning in conversation, and then I’ll talk a little about why it isn’t appropriate to slap this onto human-computer interaction.
Primary source for this post:
H.P. Grice developed a theory about how people use language that describes how we engage in conversation and know what messages are being conveyed even when we don’t mean literally what we say. His co-operative principle (as cited by Levinson, p. 101) states that people follow general guidelines in conversation that advance it according to the purpose, direction, and context of the conversation.
Within this principle, he developed four maxims, which I will describe according to how Levinson presents them (pp. 101–102).
- Quality: Do not say things you believe to be false or for which you lack adequate evidence.
- Quantity: Be as informative as is required for the purpose/context of the conversation and no more.
- Relevance: Be relevant to the purpose/context of the conversation.
- Manner: Avoid obscurity and ambiguity; be brief and orderly.
You can see that there’s some overlap here, and we don’t always follow all of these maxims at all times. The basic premise, however, is that when we engage in conversation, we adhere to them as much as possible in order to keep the conversation orderly and moving toward its purpose.
Consider the following interaction:
A: “What time is it?”
B: “The school bus just dropped the kids off.”
Conversational implicatures allow us to fill in the gaps between these two seemingly unrelated utterances to figure out what’s going on. Because A presumably knows what time school lets out, they will know approximately what time it is when B responds in this way. To an outsider, this pair of utterances may seem unrelated, but because of the principle of cooperation, A is able to figure out what B means by their answer. A knows that B wouldn’t say something unrelated to their question. Communication is successful.
This is a very simplified explanation of conversational implicatures; much more can be said and has been said about them. The next theory I’d like to discuss is Speech Act theory.
Levinson explains Speech Acts, described and detailed in J.L. Austin’s How to Do Things with Words (1962). The basic premise of Speech Act theory is that we don’t merely say things; we also do things with words. According to Austin (as cited by Levinson), there are three sub-acts contained within an utterance: locutionary, illocutionary, and perlocutionary.
Locutionary act: the actual sentence/phrase/utterance. It has a specific meaning and referent.
Illocutionary act: the meaning the speaker puts behind the utterance; what they hope to accomplish by making it.
Perlocutionary act: the effect the utterance has on its recipient or hearer.
Consider the utterance “It’s cold in here.”
Locutionary act: the declarative sentence “It’s cold in here.”
Illocutionary act: the speaker is cold and wishes their partner would close the window/turn down the air conditioning/bring them a blanket.
Perlocutionary act: the hearer understands that the speaker would like to feel warmer and feels compelled to close the window/turn down the air conditioning/bring them a blanket, and does so.
As you can see, that one little utterance is loaded with a lot of stuff that we don’t even necessarily think about at the time of the interaction. And again, this is extremely simplified for the sake of space. (Levinson devotes over 60 pages of his book to this topic alone!)
How does this come together?
Conversational implicatures and Speech Acts come together to help us figure out how to communicate with one another and make meaning out of seemingly inappropriate utterances. I’ve talked before about how communication and meaning-making are culturally situated, and here is where that comes into play.
In the previous example under conversational implicatures, let’s imagine that:
- A is from another country where kids get out of school at 5pm.
- A and B are in a country where school lets out at 3pm.
- A doesn’t have children so doesn’t know what time school lets out in this country.
When B responds that the kids were just dropped off, A might infer that it’s 5pm when it’s actually 3pm. We suddenly have unsuccessful communication: A doesn’t know what time it is because they lack the cultural knowledge B assumes they have.
If participants in an interaction do not share the same cultural knowledge, the inferences these theories describe can break down, and frustration can be the result.
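To make the breakdown concrete, here is a minimal Python sketch of the school bus example. Everything in it is a made-up illustration rather than a model of how people actually reason: the `DISMISSAL_HOUR` table, the country labels, and the `infer_hour` function are all hypothetical names I’m assuming for the sake of the example. The point is only that the same utterance yields a different inference when the hearer’s background knowledge differs from the speaker’s.

```python
from typing import Optional

# Hypothetical background knowledge: the hour (24-hour clock) at which
# school lets out, keyed by invented labels for the two countries.
DISMISSAL_HOUR = {
    "where_a_grew_up": 17,      # A's home country: school ends at 5pm
    "where_a_and_b_are": 15,    # the country A and B are in: school ends at 3pm
}


def infer_hour(utterance: str, assumed_country: str) -> Optional[int]:
    """Return the hour a hearer would infer from the utterance,
    given the cultural knowledge they bring to it."""
    if "school bus" in utterance.lower():
        return DISMISSAL_HOUR.get(assumed_country)
    return None  # no usable implicature found


utterance = "The school bus just dropped the kids off."

# B speaks with local knowledge; A interprets with knowledge from home.
speaker_meant = infer_hour(utterance, "where_a_and_b_are")
hearer_inferred = infer_hour(utterance, "where_a_grew_up")
communication_succeeded = speaker_meant == hearer_inferred
```

Here `speaker_meant` and `hearer_inferred` differ by two hours, so `communication_succeeded` is false even though both parties were being fully cooperative.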
What does this have to do with conversational UI?
Tons. Who is designing it? Where, and from whom, is the machine learning its conversation patterns? How important is the user’s goal? Will the user share cultural knowledge with the machine? Will the machine be able to understand what a human user is “doing” with language? We have a lot of ways to mitigate communicative breakdowns when the two participants are people. How will a machine know how to do this?
The theories I presented above are based on human-human interactions. Humans and AI/machines are not the same and do not act the same. AI depends wholly on the data you feed into it and the people who make it. If you have a small number of homogeneous people making/choosing/feeding, you have a huge potential for miscommunication because the data will necessarily be informed by the culture of the people handling it. This is part of the problem regarding the lack of diversity/inclusivity in tech writ large.
This means that, if someone designs a chatbot or conversational UI that is intended for diverse audiences and doesn’t understand audience variability in both culture and language use, there is a huge potential for miscommunication and breaking of trust between the brand and the user.
I am hearing that people want to replace customer service agents with chatbots/AI. One startup I recently read about wanted to use a chatbot/AI in place of the customer service agents who help people select health insurance. The stakes are far too high to risk communicative breakdown, and this is a phenomenally bad idea. When people are in high-stress/high-emotion situations, you need a person on the other end to help the user feel heard, feel understood, and obtain the results they want.*
*By the way, this is also why it’s bad to outsource your high-stakes customer service.
By inappropriately using AI/chatbots, you are potentially breaking down any trust the user had with your brand. When they don’t feel heard, when they don’t feel understood, when they don’t feel cared for, you are not saving money. You are losing it.
Can we theorize about human-machine interaction?
Sure, but it’s not the same. Machines are only as good as their input. They can’t replicate a person, so theories of conversation and interaction that were developed from human-human interactions will not work for human-machine ones. Machines could possibly be programmed to account for every possible response to a given utterance, but that would be an enormous amount of work and take a very long time, and we would likely still miss something. Dialectal variation and cultural variation are vast.
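As a rough illustration of the “account for all the responses” problem, here is a toy Python sketch of a bot that matches utterances literally. The `INTENTS` table and the `literal_intent` function are hypothetical names I’m assuming for the example, not any real chatbot framework. The bot handles the direct request it was given, but has no entry for the indirect “It’s cold in here,” even though a human hearer would treat both as requests to warm the room.

```python
from typing import Optional

# Hypothetical, hand-written table of literal utterances and the
# intent the designer attached to each one.
INTENTS = {
    "please close the window": "close_window",
    "turn up the heat": "raise_temperature",
}


def literal_intent(utterance: str) -> Optional[str]:
    """Look up an utterance by its literal wording only.
    Returns None when the designer never anticipated the phrasing."""
    key = utterance.lower().strip(" .!?")
    return INTENTS.get(key)


direct = literal_intent("Please close the window.")   # anticipated phrasing
indirect = literal_intent("It's cold in here.")       # same goal, unanticipated
```

`direct` resolves to an intent, while `indirect` comes back empty. Adding that one sentence to the table fixes that one case, and every dialect, idiom, and culture brings more cases the designer didn’t think of.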
We need to think about human-machine interactions as something that is much more one-sided. We need to take the limitations of AI into account when we talk about chatbots and conversational UI. Machines are not and never will be people, and we shouldn’t be pretending they are or that they can exactly replicate human-human interaction.
I return to one of my favorite sayings: Just because you can doesn’t mean you should.
Just like anything else we do in design and tech, we need to think about all the possible consequences of the choices we make, and we need to draw upon the expertise of the areas we want to tap into to make sure we aren’t causing more harm than good.