A11yAdvent Day 23: Oral interfaces

December 23rd, 2020 · ~2 minutes

Before we wrap up tomorrow, I wanted to tackle a slightly off-topic subject: oral interfaces. Because the web is primarily a written medium, there is very little room for supporting oral commands. Nevertheless, the platform is likely to move in that direction sooner or later, as many native applications already offer oral interfaces as one possible way to interact with the software.

Vocal interfaces can be tremendously useful. They enable people who cannot physically interact with a device to use it anyway. Over the last few years, there have been dozens of inspiring stories of people who got out of difficult situations thanks to being able to quickly interact with Siri, Alexa or Cortana.

Nevertheless, it is important to remember that not everyone can benefit from oral interfaces the same way, starting with mute people, for whom this is not an option. So the first thing to consider when designing software controlled through voice commands is that voice should not be the only way to use it. Just as soundtracks need captions, oral interfaces need physical alternatives.

Besides people without the ability to speak, people who stutter can also struggle considerably to issue voice commands at the expected cadence. In her piece Stuttering in the Age of Alexa, Rachel G. Goss says:

Because I don’t speak in the standard cadence virtual assistants like Alexa have been taught to recognize, I know this will happen every time, giving me pangs of anxiety even before I open my mouth.
Me: Alexa, set timer for f-f-f-f…
Alexa: Timer for how long?
— …f-f-f-f-fifteen minutes
— Timer for how long?
— F-f-f-f-f-f-f-f-f…
— [cancellation noise]

Of course, Alexa, or any other voice assistant, is not doing it on purpose. It’s nothing but a program. It simply has not been trained on stuttered speech. Just as facial recognition software produces racist outcomes because it is predominantly trained on white faces, the “algorithm” is not the problem. Lack of diversity and ethics in the tech industry is.

A good way to accommodate people who stutter is to make the voice trigger (such as “Ok Google”) customisable. Some sounds are more difficult to produce than others, and if the main command starts with such a sound, using the technology can become very stressful. In the great piece Why voice assistants don’t understand people who stutter, Pedro Pena III says about Google Assistant, Alexa and Siri:

“[I] don’t think I can do it with all the g’s, the a’s, the s’s. They need to start with letters I can actually say.”

Besides people who stutter, people born deaf often speak differently from those who have been hearing voices since childhood. These speech differences, and even non-native accents, are usually not accounted for in voice interface design, which can exclude these users and further alienate them.