Reflections on Conversational / Voice User Interfaces

The rapidly evolving Conversational User Interface

(top) "ELIZA", ELIZA, an early natural language processing computer program created from 1964 to 1966[1] at the MIT Artificial Intelligence Laboratory by Joseph Weizenbaum

In 1968, Stanley Kubrick's film "2001: A Space Odyssey"[1] envisioned an interface capable of carrying out a detailed conversation with humans; it also envisioned that the same machine could take human lives. The machine was the HAL 9000, and conversing with it was effortless, eliminating the friction associated with the traditional "dialogue" between human and machine.

Photo: (top) the "eye" of HAL 9000 from Stanley Kubrick's "2001: A Space Odyssey"
(bottom) an aerial view of Apple's HomePod


50 years later, we are inching closer to the reality of a HAL 9000-style system with the development of Conversational / Voice User Interfaces (CUI / VUI).[2] These programs lie at the intersection of computer programming and linguistics (computational linguistics), and rely on deep learning algorithms, a category associated with artificial intelligence, to carry out a "fluid" conversation with an individual.
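To ground this in something concrete: the pattern-matching approach that ELIZA pioneered can be sketched in a few lines of Python. The rules below are invented for illustration and are far simpler than Weizenbaum's original script; the point is how little genuine "understanding" such a program needs in order to sustain the appearance of a conversation.

```python
import re

# Toy, ELIZA-style rules: regular-expression patterns paired with canned
# reflections. These patterns are made up for this sketch, not taken from
# Weizenbaum's original DOCTOR script.
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE),
     "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE),
     "How long have you been {0}?"),
    (re.compile(r"\bbecause\b", re.IGNORECASE),
     "Is that the real reason?"),
]

def respond(utterance: str) -> str:
    """Return the first matching canned response, or a generic fallback."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return "Please tell me more."

print(respond("I need a vacation"))              # Why do you need a vacation?
print(respond("It happened because of the rain"))  # Is that the real reason?
```

Nothing here models meaning; the program only rearranges the user's own words, which is precisely the gap that modern, learned dialogue systems are trying to close.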

The most popular incarnation of CUI / VUI today is the virtual assistant in your pocket and around your home. Apple's Siri, Amazon's Alexa, and Google's Assistant are each viable case studies actively battle-testing the notion of a dialogue between human and machine. It's likely that you've interacted with one of these programs, and if so, you may have noticed that the assistants frequently refer you to the web or simply admit that they "don't understand what you mean by that." While this is obviously frustrating, it is part of the process of deep learning, and specialists are actively working to resolve these issues. Christopher Manning, machine learning specialist and professor of computer science at Stanford University,[3] describes these assistants another way:


"We call these dialogue agents – programs that try to understand the meaning of your spoken language and to perform tasks based on what you’ve said. It’s easy to create a chatbot that just repeats random stuff that’s half-connected to what you asked about, but how do you build dialogue agents that can understand and do tasks that people are asking for?"

It's clear that virtual assistants, chatbots, or "dialogue agents" are in their early stages, but what is the future of this technology? If machines become able to truly understand language, and language becomes the primary interface between human and machine, a new wave of ethical concerns will surface.

Gender, for example, is an interesting topic, considering that most of the virtual assistants on the market today are female. This opens up a discourse on the gender politics of virtual assistants, one that will only intensify once artificial intelligence enters the mainstream. Another consideration is how machines interpret language and act accordingly. What if an Apple HomePod detects domestic violence in the household? Should the device notify local authorities? Of course, but how could the device be sure it understood the dialogue? In addition, is it necessary to have devices that are theoretically always "listening"? Surely this contributes to a repository of data and enhances the machine's ability to learn the subtle nuances of a complex language, but it raises questions of data storage and surveillance, of trigger words and language monitoring, and of commerce (for example, I mention that I want to buy something and Alexa orders it to my doorstep).
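To make the "always listening" and commerce concerns concrete, here is a deliberately crude Python sketch. The wake word, the purchase phrases, and the idea of working on text transcripts are all assumptions made for illustration; real devices detect wake words on raw audio, typically on-device, before anything leaves the room.

```python
# Crude sketch of an always-listening device: everything is "heard", but
# only utterances containing the wake word are acted upon. The wake word
# and phrase list below are hypothetical.
WAKE_WORD = "alexa"
PURCHASE_PHRASES = ("buy", "order")

def handle(transcript: str) -> str | None:
    text = transcript.lower()
    if WAKE_WORD not in text:
        return None  # heard, but (in principle) discarded
    if any(phrase in text for phrase in PURCHASE_PHRASES):
        # The commerce worry: a casual remark becomes a purchase.
        return "Placing the order..."
    return "How can I help?"

for line in ["I want to buy new shoes someday", "Alexa, buy new shoes"]:
    print(line, "->", handle(line))
```

Even in this toy form, the tension is visible: the device must process every utterance in order to decide which ones to ignore.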

As these technologies become more sophisticated, their duties will extend beyond setting reminders and calling mom, and it's important to weigh these questions before we create machines that make decisions based on a deep understanding of language. After all, we don't want a machine making a life-or-death decision.

Notes:

1. Kubrick, Stanley, director. 2001: A Space Odyssey. Metro-Goldwyn-Mayer, 1968.

2. McTear, Michael F., and T. V. Raman. Spoken Dialogue Technology: Toward the Conversational User Interface. Springer, 2004.

3. Myers, Andrew. "Chris Manning: How Computers Are Learning to Understand Language." Stanford Engineering, 18 July 2017, engineering.stanford.edu/magazine/article/chris-manning-how-computers-are-learning-understand-language.
