You put on the S-Town podcast to go for a run (John B is an addictive character). You ask Alexa to play Jay-Z’s new 4:44 album while you’re doing the dishes (hopefully you’ve got Tidal). Driving, walking, working. Music, podcasts, audiobooks. Any activity in our lives can be supplemented with a voice in our ear aiding, entertaining, informing.
That’s the input side of things. And then there is the output. We ask Siri about the weather and get told that it might rain. We ask our TV to turn on Game of Thrones, and we get dragons. Samsung is working on ways for us to talk to our refrigerators, wheelchairs, and smartwatches, from which we would expect immediate results as well.
But we’re still in the early stages. As Stas Tushinskiy, CEO of Instreamatic, said, “it’s the era of the PC industry before Windows.” Voice is only now leaving the phase of being an interesting but not clearly useful technology. Once technology can reliably understand and interpret voice, the opportunities are endless. And we’re likely on the cusp of that moment.
According to Tushinskiy, the first thing to change will be customer service. He believes Voice AI will change our world most quickly by replacing IVR (Interactive Voice Response), that annoying system that asks us to press 1 for English and 2 for Spanish before launching into an unhelpful rabbit hole that leaves us begging for an operator. “Today the tech is ready to provide close to human interaction over the phone,” he says.
But this is a huge topic with a diverse set of ramifications. Based in part on Adriel Lubarsky’s interview with Stas Tushinskiy on Adriel’s Curious City podcast (check it out on iTunes, Soundcloud, or any podcasting app), here is 30 Seconds to Fly’s look at the future of Voice AI.
How Voice AI is Used Today
We see the power of voice all around us. Since Siri came out in 2011, almost everyone (98% of iPhone users, to be exact) has at least tried the voice assistant. The artificial intelligence built into the technology helps voice get smarter, learn about us as individuals, and respond to more situations and dialects every week. Voice exists in a few different mediums.
Mobile Voice Assistants
Perhaps the most relevant application of voice in our modern lives is the mobile assistant. Asking Siri for weather updates and sports news has gone from a gimmick to a standard. Google Now can set meetings, write text messages, and wake you up for work. Samsung’s Bixby can get you restaurant recommendations based on your location and tastes. These capabilities are exploding and evolving outrageously fast.
How young these products are, and how quickly they have grown, suggests that many more improvements are coming our way.
In-Home Voice Assistants
A more recent arrival in our lives, products like Amazon Echo and Google Home can be seen in many homes, for prices as low as $99. These remarkable pieces of hardware are capable of much more complex interactions than our phones, like Alexa’s ability to order products from Amazon and Google Home’s ability to recognize multiple voices and respond based on the speaker’s personal calendar and preferences. These dynamic features are going to continue to develop.
As Voice AI assistants become smarter and better trained, they will become steadily more interactive and capable of two-sided, interesting communication with their owners. Sort of like the movie Her, which, if you haven’t seen it, is an Oscar-winning look at the technological potential and ethical ramifications of advanced voice AI.
To appreciate why investing in Voice AI matters, it helps to recognize that voice is everywhere. Televisions play commercials, grocery stores announce sales, and airports blast flight details.
Any aspect of the use of voice can, theoretically, become smarter and offer users a more interactive and valuable message.
Not all voices are created equal
As the Voice AI driving this revolution attracts more players and investment, each offering will have its own pros and cons (just like any product). The 2017 Voice Report draws some conclusions and makes predictions about how specialization will factor into product development and company branding.
- Google is going to excel at mining the web and providing intelligent responses to general knowledge questions.
- Amazon is going to excel at commerce.
- Google and Microsoft should excel at email, contacts and calendar.
- Microsoft has a huge opportunity to excel at gaming.
- Google and Amazon are going to battle for hands-free TV and home automation.
- Apple is betting on AirPods for on-the-go use cases, and should have an Apple TV voice strategy.
- All players will battle to become the go-to controller of the kitchen, living room and bedroom.
What are the challenges in voice?
Understanding what we’re saying
My sister’s name is Bashel–it’s a made-up name. My friend’s name is Spyros–he’s Greek. Siri has trouble understanding both of these, unless I emphasize every possible syllable.
My mom has an accent–she’s Russian. My uncle can’t speak English–he’s Ukrainian. Both of them have trouble getting Alexa to understand anything they’re saying.
The difficulty of getting Voice AI to hear enough samples of diverse languages, accents, and dialects is depicted well by Sonia Paul in her article for Wired. Or if you don’t feel like reading, this 2010 comedy sketch (Scottish men trying to speak to a voice-enabled, American-made elevator) makes the point just as well, and with some laughs.
The technology that led Google to world domination is, above all else, search. When Google created the ability to index the internet, it blew open the very idea of access to information, and revolutionized the web.
Voice needs a similar development in capability. As David Loughlin writes for the Huffington Post, while touch screens are second nature and interactive, “Voice-based interaction requires users to retain much more information without reference to visual cues.” So voice-based AI will need to learn how to provide more context to information, delivering information in ‘manageable chunks.’
Other challenges–from building personalities to becoming useful enough to be adopted–contribute to making Voice AI a reach goal, more than a sure thing. Even as voice improves, it will likely need help from a more traditional information provider–display.
Will voice replace display?
No. Just as robots won’t replace human jobs but rather complement them, voice won’t replace display. It will, however, make things much more convenient.
Voice will complement display. So if you want a quick weather update, voice is the right medium. If you want to view a moving storm front, you want to see it on a screen. So neither medium should act in a silo.
And the big players are beginning to appreciate that, as shown by Amazon’s recent launch of the Echo Show, which features both Alexa-enabled Voice AI and a 7-inch display screen. As Andrew Ng, chief scientist at Baidu, explained to MIT Technology Review, while “speech input is three times quicker than typing on mobile devices, ‘the fastest way for a machine to get information to you is via a screen.’” So marrying the two will lead to more effective tools.
So what is the vision of voice advertising?
One enormous impact of Voice AI that isn’t talked about as often is its potential to revolutionize the advertising industry.
So, let’s talk about it. The first and most important problem AI can fix is that there is currently no such thing as ‘clicks’ in audio. And no clicks means that, unlike online banner advertisements (however annoying they may be) or even TV ads (at least those can be shared if they’re funny enough), there is no way to track an audio ad’s effectiveness.
Uncounted millions of dollars are spent on podcast advertising targeting the 46 million listeners, according to Business Insider, but it’s a black box. There is no information about whether the advertisement worked, how you liked it, or if it was even beneficial to you. The ad isn’t even customizable to the listener like many online streaming video ads are. Apart from promo codes, audio ads are a dead end. As a result, advertisers today are not as excited as they could be about audio ads, and they spend more nervously.
Stas Tushinskiy, CEO of the voice advertising company Instreamatic, walks us through this example. Imagine you’re listening to Pandora. Your song ends. A promotion plays. “Hey, there is a new iPhone on the market. It’s got 128 gigs and a 4.8 inch display. Are you interested in learning more?” Then–and this part is revolutionary–you can respond. “No, I’m happy with the iPhone I just got,” which would tell the platform that it shouldn’t be serving you iPhone ads, at least for the time being.
Or, let’s say your podcast goes on commercial break. Instead of a 30-second Squarespace commercial, you’re asked a question. A question that video ads–like on Hulu–have been asking us for years. “Which advertising experience would you prefer? Food or travel?” “Travel,” you respond. “Great! Disney Resorts is the coolest place to be this summer!” Now, voice advertising driven by AI can serve you ads that are actually interesting to you.
Or, even more exciting. You’re listening to Jay-Z again. It’s time for your advertisement. You happen to be walking by a Starbucks. Your Voice AI is connected to your phone’s location software and knows this. It says: “Hey Jack. Thanks for listening to HOV. You’re probably in a pretty good mood. You’re about to pass a Starbucks–how about a cappuccino?” “Sure,” you respond. “That sounds excellent. Are there any discounts?” “Of course!” responds the Pandora ad. “Take a screenshot of the code on your screen, and scan it at the register.” Then Jay-Z turns back on, and you go about your day.
See–voice and display working together!
According to Stas, this technology is a huge opportunity for advertisers and a chance for us–consumers–to build better relationships with companies. “What we call annoying advertisement is irrelevant advertising. Or it’s not the right time. If we can make ads relevant and providing value, there won’t be as much demand for ad blocking.”
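For the technically curious, here is a rough sketch of the feedback loop in the first Pandora example: the listener answers an ad out loud, and the platform stops serving that category. It is only an illustration under loud assumptions. The keyword lists, the ListenerProfile class, and the function names are all made up for this post and are not Instreamatic’s or Pandora’s actual system; a real product would use speech recognition and a trained intent model rather than keyword matching.

```python
from dataclasses import dataclass, field

# Hypothetical keyword lists standing in for a real intent model.
NEGATIVE_WORDS = {"no", "nope", "not"}
POSITIVE_WORDS = {"yes", "yeah", "sure", "ok", "okay"}


def classify_reply(transcript: str) -> str:
    """Return 'negative', 'positive', or 'unknown' for a transcribed reply.

    The first matching keyword wins, so "No, I'm happy..." is negative.
    """
    words = [w.strip(".,!?").lower() for w in transcript.split()]
    for word in words:
        if word in NEGATIVE_WORDS:
            return "negative"
        if word in POSITIVE_WORDS:
            return "positive"
    return "unknown"


@dataclass
class ListenerProfile:
    """Hypothetical per-listener record of ad categories."""
    suppressed: set = field(default_factory=set)  # categories to stop serving
    interested: set = field(default_factory=set)  # categories the listener accepted


def record_reply(profile: ListenerProfile, category: str, transcript: str) -> str:
    """Update the profile from a spoken reply and return the detected intent."""
    intent = classify_reply(transcript)
    if intent == "negative":
        profile.suppressed.add(category)
        profile.interested.discard(category)
    elif intent == "positive":
        profile.interested.add(category)
        profile.suppressed.discard(category)
    return intent


def next_ad(profile: ListenerProfile, candidates: list):
    """Return the first candidate category the listener has not declined."""
    for category in candidates:
        if category not in profile.suppressed:
            return category
    return None


# Usage: the listener declines the iPhone ad, so the next break skips it.
profile = ListenerProfile()
record_reply(profile, "iphone", "No, I'm happy with the iPhone I just got")
print(next_ad(profile, ["iphone", "travel"]))
```

The point of the sketch is the thing audio ads lack today: a spoken reply becomes a signal the platform can store and act on, the audio equivalent of a click.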
Is there a chance that we are wrong?
It’s an important question to ask when contemplating the future of anything. If today is our starting point, we’ve got plenty of options for tomorrow. Some people, like Stas Tushinskiy, are investing heavily in voice. Others, like Elon Musk, are trying to skip that entire epoch of technology (Elon is working on directly connecting our brains to computer software through his company Neuralink).
However, Stas and many others (including Amazon, Google, and Apple) believe that voice is the next generation of technology, and won’t be replaced by telepathy anytime soon.
Whoever is right, we can be pretty certain that voice will continue to play an ever-growing role in our lives in the coming years.
I’m a Brooklyn-born vagabond with a proud Russian heritage. I’m into startups of all kinds, and anytime we can talk about the future of humanity’s interaction with robots, or just gab about good comedy and literature, I’m there. I also host the Adriel’s Curious City podcast, interviewing thought leaders about the future of their industries.