Voice Search is not the next big thing - Society isn't ready
Voice-controlled technology is currently being hyped as the perfect solution to all of our problems, from recipes to teaching children to shopping. Advocates tout the natural interaction of our voices as the key to unlocking computing, removing frustration and expanding the universe of devices that are interlinked and constantly connected.
There is nothing natural about talking to a computer. A layer of mediation exists to facilitate this communication: an interface that has been designed to help us communicate, and the software that controls this interaction and the resulting conversation.
Although these interfaces are designed to be almost invisible, with voice technology they are evident in the most basic way: the wake-word. It was created for technical purposes, to allow the device to connect to the server while at the same time appearing to be always listening, and also to enhance our feeling of domination over this strange box that has appeared in our house and is becoming the focus of our rooms, away from the more traditional visual focus of the TV.
Communication using a wake-word can never be natural. You can test this yourself: in your next conversation with a friend, use their name every time you want to talk. It doesn't flow; it feels stilted. Now imagine that at the end of every single question or comment they stop listening to you and stop reacting to your conversation.
This is the reality of talking to your devices today. Every time we have to signal our intent to the device with 'Alexa', 'Okay Google' or 'Hey Siri', it reminds us of the lie of natural interaction. The interface is laid bare before us, betrayed by its inability to disappear.
However, if your device was always listening then the wake-word would not necessarily need to exist. A smart enough AI might be able to differentiate between you talking to each other and you talking to it. This would open the door to more natural interaction, but would also raise privacy concerns around Amazon having access to your other conversations.
Do you want your device to be on all the time, recording your every word?
What if I told you your device was, in a sense, already doing this? Audio input (you speaking) is sent to the piece of software that detects whether you have spoken the wake-word. To reduce the occurrence of "false positives", that is, cases where similar-sounding words awaken the device, all spoken content is checked in the cloud. You can check this for yourself if you have an Amazon Alexa. You'll need access to the app as well as your device. Speak your assigned wake-word and issue a command. The light ring will illuminate for an extended period and the device will (hopefully) complete your request. This will then appear in the app as a completed action. Then speak the word Alexis and note that the light ring illuminates for a shorter period. An entry for this will also appear in the app, and you can play back the recording of yourself speaking using the play button.
Now how could the recording have appeared unless the device was already listening? Of course you can delete these recordings, but that is problematic as it relies on the user both knowing about the recordings and going into the app to delete them. It’s fairly safe to assume other devices work in a similar way.
Since you have already, albeit unknowingly, given up your privacy by getting one of these devices, the question now becomes how much more you are willing to trade for a seamless experience with voice that doesn't involve issuing commands prefaced by a wake-word. This is the key question for the development of voice in future, and because this debate has yet to be had in the public domain, be very careful of those promising voice to be the next big thing. It is far from certain that the technology will develop in a way users are happy with, or even that the technological challenges of “are you talking to me?” can be overcome in a way that creates natural interaction.
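The Alexis experiment above can be read as a two-stage check, and the short Python sketch below illustrates that reading. It is not Amazon's implementation: the text similarity from `difflib` stands in for real acoustic matching, and the thresholds, the `handle_utterance` function and the `uploads` list are all hypothetical names chosen for illustration.

```python
from difflib import SequenceMatcher

WAKE_WORD = "alexa"
LOCAL_THRESHOLD = 0.6  # hypothetical: a loose on-device check that lets near misses through
CLOUD_THRESHOLD = 0.9  # hypothetical: a stricter server-side check


def similarity(heard: str) -> float:
    """Crude text similarity standing in for acoustic matching (an assumption)."""
    return SequenceMatcher(None, heard.lower(), WAKE_WORD).ratio()


def handle_utterance(heard: str, uploads: list) -> str:
    """Return what happens to an utterance; anything passing the local check is uploaded."""
    if similarity(heard) < LOCAL_THRESHOLD:
        return "ignored"            # never leaves the device
    uploads.append(heard)           # sent to the server, so a recording now exists
    if similarity(heard) < CLOUD_THRESHOLD:
        return "rejected_in_cloud"  # the short light-ring flash
    return "accepted"               # the device goes on to handle the command
```

In this sketch "Alexis" passes the loose local check, is uploaded, and is only then rejected, which is one way a recording of a near miss could end up in the app.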
What is Voice Search? - How does it work?
The technology behind Voice Search devices like Alexa and Siri is commonly referred to as Artificial Intelligence, but it isn’t really, or at least not yet. The servers are referred to as a cloud, but they aren’t really that either. In reality the server is a big stack of data drives with an architecture of code and rules that breaks down the spoken words into recognisable patterns, for which a reply has been pre-programmed to create a call and response. The logic behind programming an integrated app is, “here is what they will ask ----> here is what I want to happen”.
While Alexa and Siri are not true AI, they do have a machine learning feedback loop that corrects itself to improve over time. The architecture of the technology is best described as a tree of information. The trunk of the tree is reinforced by repetition of simple core call and response phrases, and is therefore sturdy. However, the rarer the call, the higher up the tree its response sits and the thinner the branch, because it is reinforced less often by the feedback loop. This is why Alexa and Siri are very good at talking about news and the weather, because they do it all the time, but not so good at answering questions about anything else.
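The “here is what they will ask ----> here is what I want to happen” logic can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not how Alexa or Siri are actually built: the `INTENTS` table, the `respond` function and the example phrases are all hypothetical.

```python
import re

# Hypothetical call-and-response table: (pattern to listen for, what should happen).
INTENTS = [
    (re.compile(r"\bweather\b"),
     lambda m: "Here is today's weather forecast."),
    (re.compile(r"\badd (?P<item>.+) to my shopping list\b"),
     lambda m: f"Added {m.group('item')} to your shopping list."),
]

FALLBACK = "Sorry, I don't know that one."


def respond(utterance: str) -> str:
    """Return the pre-programmed reply for the first pattern that matches."""
    text = utterance.lower().strip()
    for pattern, action in INTENTS:
        match = pattern.search(text)
        if match:
            return action(match)
    return FALLBACK  # nothing was pre-programmed for this call
```

Anything outside the table falls through to the fallback, which is the behaviour you would expect from a system built on pre-programmed calls and responses rather than understanding.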
Why is everyone so excited about it?
Brands and technophiles are very excited about voice search technology because it is a new way of doing things. Everyone wants to be an early adopter, to be the first to market. Tesco have had online grocery shopping since 1996. Although only half a dozen people would have used it in the age of dial-up internet, being the first to market created an advantage that is seen today in their dominant share of the market. By being first off the starting blocks, Tesco were able to gain a major head start on other retailers. Crucially, when experimenting with the complex logistical infrastructure necessary for home deliveries of online-ordered groceries, Tesco could learn from their mistakes along the way and course-correct while only early adopters would be disrupted, making sure the system ran smoothly by the time the public caught up.
However, there is a drawback to the excitement drawing the focus onto the potential new way of doing things. Concentrating on a possible next big thing is a way of avoiding all of the contemporary problems that are not new and exciting and are far less sexy. No one gets excited on their morning commute about ironing out the creases in their delivery service to improve average delivery times, or wants to talk at conferences about finally fixing the basics of product listings, but these are crucial areas that many companies selling online have still not got right. Without first establishing a robust foundation it will not be possible to implement innovation and grow at scale.
Why should people really be talking about it?
Voice search technology is all about the interaction between humans and tools. Adoption of a tool is a human trait. We have always done it and will continue to do so to improve our lives with increasingly advanced tools, like the apes that use bones as weapons at the start of Stanley Kubrick’s 2001: A Space Odyssey, all the way through the technological timeline to humans chatting to HAL, the rogue AI.
Brands think that voice search might be the next big tool to be adopted, like smart phones. People today can barely imagine their lives without their phones. A study by PayPal and Google estimates that two thirds of all eCommerce sales will be carried out through smart phones by 2020. Brands want to establish themselves in the voice search space in case it becomes as ubiquitous in our everyday lives as smart phones.
Why will voice search not be the next big thing?
Talking to machines is not natural. It is not a conversation but a singular call and response.
It is not possible to advertise, because an advert would just be noise if played at the same time as voice content, or would be intrusive if it interrupted the actual content (as in the free version of Spotify), unlike visual media, where people can tune adverts out easily.
Even when Alexa or Siri are answering questions and fulfilling their purpose, people have very little patience for waiting around for the audio content. The information provided has to be brief; Alexa cannot list things because people will want her to stop the monotone voice. Human brains do not work in the same way as a computer, so a machine reading the first paragraph of a Wikipedia article or listing all the possible options is not particularly helpful. Since we are used to having conversations with other people and not machines, we expect a verbal answer to a question to be one that specifically answers the query. However, with machines that is often not possible because the software has to categorise the whole topic. The reply to a search query cannot be as specific as we have become used to.
Due to the inability to advertise without intrusion and people's lack of patience for listening to lists of options, it is impossible to browse when shopping online through the medium of voice search, so it becomes possible to buy only the top few items in a category. Alexa will add the “Amazon's Choice” product unless specifically told to list alternatives. When reading the options, only the first few are presented and, in the interest of being succinct, the only information provided is the name and the price. This results in only functional shopping being possible. All the effort brands put into selling the benefits of their product is ignored.
Due to this, all online shopping done through current voice search devices is brand blind. If the only differentiator between products is the price, then the only possible promotional tactic to drive sales is a race to the bottom for the lowest price point, which benefits neither the brand nor the shopper.
Is there a need met by voice search that is not met elsewhere?
One of the main purposes of voice search devices is to make online shopping simpler. However, the shopper cannot do a full shop solely through the medium of their voice search device; they first need to confirm the order in the app on their phone or on the website on their computer. Due to this roadblock it is simpler for shoppers to just use the phone in their pocket to start and complete their online shopping.
Online shopping is an increasingly prevalent behaviour, due primarily to the convenience for shoppers of being able to buy any product from the comfort of their home. However, due to many factors such as poor digital literacy, adoption by mainstream shoppers remains slow, especially amongst older demographics. eCommerce currently accounts for 17% of retail sales. Voice search is only a minuscule fraction of that, as shown by Amazon’s own data that only 2% of Alexa-enabled device owners used their device to assist in their online shopping.
Even in a case where there is a specific use, for example asking Alexa for a cooking recipe and to order something that is missing to be delivered ASAP, the benefits are very limited. It sounds great in theory but in practice is not that useful. You cannot read the recipe off a screen, so you have to keep asking the same question.
How can voice search go wrong?
A voice search device in your home is always listening, but this may not be as scary as it sounds. The idea of being constantly spied on stirs up paranoia about notes being taken on every word we say, like in the 2007 Oscar winner ‘The Lives Of Others’. Fortunately there is not a person on the other end, just machine learning algorithms analysing data. Amazon does not care about individuals, but cares about larger data sets on shopper trends. The ultimate purpose of this is to manipulate shoppers to buy more, which is almost as creepy as the spy in the attic.
Voice search technology is still in its infancy. While both Amazon and its partners test, learn and experiment with new uses for the technology, there are bound to be some hiccups.
One humorous example occurred when Ocado first enabled customers to add to their shopping list through their Alexa. The programmers had put a lot of thought into the machine’s half of the interaction but had neglected to take human conversational behaviour into account. People are used to only talking to other people, and so when asking Alexa to add items to their shopping list, they treated her like a person and said “thank you”. Alexa interpreted this as the next item on the shopping list and so ordered thank you cards. Fortunately Ocado noticed a huge spike in orders of thank you cards and investigated the cause then refunded the confused customers.
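The thank-you-card mix-up can be illustrated with a small Python sketch. It is a guess at the shape of the problem, not Ocado's actual code: the function names and the `COURTESY_PHRASES` set are hypothetical.

```python
# Hypothetical set of things people say out of politeness rather than as an order.
COURTESY_PHRASES = {"thank you", "thanks", "please", "cheers"}


def naive_add(shopping_list: list, utterance: str) -> None:
    """Treats everything said as an item, so 'thank you' ends up on the list."""
    shopping_list.append(utterance.lower().strip())


def polite_add(shopping_list: list, utterance: str) -> bool:
    """Skips courtesy phrases; returns True only if an item was actually added."""
    text = utterance.lower().strip(" .!?")
    if text in COURTESY_PHRASES:
        return False
    shopping_list.append(text)
    return True
```

The second version only works because someone thought about how people actually talk, which is exactly the half of the interaction that was overlooked.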
In a unique case in Arkansas, USA, police investigating a homicide attempted to obtain from Amazon any recordings Alexa may have made of the crime.
The warrant read:
"The Amazon Echo device is constantly listening for the wake command of 'Alexa' or 'Amazon,' and records any command, inquiry, or verbal gesture given at all times without the 'wake word' being issued, which is uploaded to Amazon.com's servers at a remote location. It is believed that these records are retained by Amazon.com and that they are evidence related to the case under investigation."
However, Amazon’s lawyers contested the warrant. Amazon responded to say that they would:
“not release customer information without a valid and binding legal demand properly served on us. Amazon objects to overbroad or otherwise inappropriate demands as a matter of course."
The case drew attention from around the world due to the privacy implications. With an ever-increasing number of people buying voice search devices such as Alexa and Google Home, and using them as the voice-activated hub to control their smart homes, people feared a nightmare scenario where your own home spies on you and reports to the authorities. Amazon was able to successfully argue against handing over potentially incriminating data in this case, but may not always be able to in future.
Full adoption of voice search devices into everyday lives is 20 years away. A significant culture shift is required in terms of surrendering the idea of privacy and personal data protection. Voice search will not be the next big thing because we as a society are not ready for the invasion of privacy that it would require in order to achieve its potential.
There is a tremendous amount of potential that voice search can achieve if given enough time to develop properly, but huge technological advances are required to create a functional system instead of a gimmick. Perhaps the comedic element of the wrong answers, the inability to have a real back-and-forth conversation, and the gaps in knowledge will help people to grow comfortable with devices in their homes listening to them. Over time the capabilities will improve and the devices will become more useful. By then, if they are accepted as a useful tool, people may be willing to surrender their privacy in exchange for the promised convenience.
However, it is not just consumers in their households that are not ready for voice search to be the next big thing; neither are the brands who are hoping to benefit from being the early adopters of this shiny new tool. If their infrastructure and logistics operations are not seamless, then they cannot scale with speed. They first need to fix all of their contemporary issues in portraying their products, services and brands online by addressing the fundamentals of online selling success, and automate as much of their eCommerce infrastructure as possible. Only then will they be able to benefit when cultural acceptance catches up with the ambitions of innovators.
Becky Curtis-Hall is a Customer Success Manager for E Fundamentals. Her career spans marketing and sales, but always online, with a specialist interest in the interaction between humans and machines. Completing her master's degree in digital media while working in eCommerce has given her a unique perspective on both the commercial side and the academic work behind the scenes that fuels digital innovation. When not at work you can often find her out running or drinking wine.