Hackers have long used phishing to trick victims into giving out personal information. The user clicks on a link believing that he is opening a particular web site but is really interacting with another one. The intent of the fake web site is to obtain personal information such as user names, passwords, credit card details and other valuable assets. More recently, such attacks have also been used to take over devices for crypto mining, or to turn the victim’s computer into a zombie under the attacker’s command and control.

Google, Amazon, Microsoft, Apple, Samsung and others are all vying for a piece of the voice activation market. Rather than interacting with devices using a keyboard, mouse or finger, people control them via spoken instructions. The players in this market are selling (or working on) devices that allow people to make calls, set up calendar appointments, send messages, consume online content, open applications and web sites, and control a multitude of IoT devices, all using vocal cues.
How long will it be before fraudsters figure out an effective method of Vishing (voice phishing)? As with regular phishing, hackers will trick individuals into performing a sinister action when they believe they are doing something else.

Here are some of the ways fraudsters could achieve this:

  • The Pretty Please approach. There is a drive to educate children to be polite when they speak to these voice-activated assistants. The reasoning is that if children address these systems disparagingly they will carry this trait into their interactions with humans. So if a person wants to open a website called “SoAndSo” he would word his verbal instruction as “Open SoAndSo please”, “Open SoAndSo thanks” or “Open SoAndSo thank you”. The assistant could then open a different site whose name includes the courtesy word rather than the intended “SoAndSo”.
  • Words that are spelt differently in different countries. In these situations the assistant has to decide which version of the spoken word is the correct one. When a British speaker asks to open an app or web site by name, will the assistant open that one or will it invoke one whose name uses a different spelling of the same word?
  • Audibly confusable words. These are words that devices (and humans) could misinterpret. Words such as Mints and Mince as well as Spitting Image and Spirit and Image fall within this category. Add accents, pronunciation differences, cultural influences and different human circumstances such as flu symptoms and the error rate goes up considerably. Anyone who has spent time with these devices has experienced the frustration (accompanied by insults only allowed on cable networks) when these devices can’t comprehend a simple instruction and keep asking, over and over, for a repeat.
  • Background noise. Anyone who has spent some time interacting with their device will surely have experienced this phenomenon: the device triggers without any form of human communication taking place. While these devices are not constantly recording conversations (we are assured), they are constantly listening for the wake phrase that activates their comprehension abilities. Environmental sounds, noises and frequencies can be misinterpreted by these devices. Factors such as distance from the listening device, sound echoing off walls, furniture and carpeting, as well as room temperature, can cause these devices to misfire.
  • Impersonation. Keyboard-based authentication requires a username and a password. Over the years security has improved with the introduction of minimum password strength and Two Factor Authentication (2FA). Voice-activated devices have very few checks other than the voice pattern of the speaker. These devices need to strike a balance between being operable in different environments and being able to identify the speaker through his voice print. These requirements work against one another, in that more flexibility comes at the cost of reduced security. A clever fraudster can record his victim and digitally construct instructions. Incidentally, solutions that can do this from archives of recorded audio already exist. How long will it be before someone is able to translate a series of inaudible or undecipherable sounds into an activation command? Many years ago John Draper, aka Captain Crunch, used a similar hack on the telephone system.
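
The first two approaches in the list above both rely on app or site names that collide once a spoken request is normalised. A minimal Python sketch of how a platform might flag such names at registration time is shown here. It is an illustration under stated assumptions, not how any real assistant works: the courtesy word list, the function names and the 0.8 similarity cutoff are all hypothetical. It also compares spelling only (using the standard library's difflib), so words that merely sound alike but are spelt quite differently would need a phonetic comparison instead.

```python
import difflib

# Hypothetical courtesy words an assistant might hear at the end of a request.
COURTESY_SUFFIXES = ("thank you", "thanks", "please")


def strip_courtesy(name: str) -> str:
    """Lower-case a name and repeatedly drop trailing courtesy words."""
    cleaned = " ".join(name.lower().split())
    changed = True
    while changed:
        changed = False
        for suffix in COURTESY_SUFFIXES:
            if cleaned.endswith(" " + suffix):
                cleaned = cleaned[: -len(suffix) - 1].rstrip()
                changed = True
    return cleaned


def find_lookalikes(candidate: str, registered: list[str], cutoff: float = 0.8) -> list[str]:
    """Return registered names that the candidate collides with or closely resembles."""
    target = strip_courtesy(candidate)
    # Map each normalised name back to the name as it was registered.
    normalised = {strip_courtesy(name): name for name in registered}
    close = difflib.get_close_matches(target, list(normalised), n=5, cutoff=cutoff)
    return [normalised[key] for key in close]


if __name__ == "__main__":
    existing = ["SoAndSo", "Colour Picker", "Weather"]
    # A newly submitted name that only differs by a trailing courtesy word.
    print(find_lookalikes("SoAndSo Please", existing))
    # A newly submitted name that only differs by regional spelling.
    print(find_lookalikes("Color Picker", existing))
```

A registration check along these lines would reject or review a new name whenever the returned list is not empty, on the assumption that a name differing only by a courtesy word or a regional spelling is more likely a trap than a coincidence.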

The companies behind this technology are working to improve the accuracy rate, but there will be many instances when these voice assistants get it wrong. A few days ago KIRO7 reported a case in which Amazon’s Alexa sent a recording of a conversation between a woman and her husband to a colleague of her husband without their knowledge. Amazon, which analysed the voice instructions, confirmed that the couple had, in the course of their conversation, instructed the device to do so. Alexa’s confirmation prompts were not noticed by the couple, and yeses forming part of their conversation were interpreted by the device as confirmation to send the recording. This may be a one in a million case, but in a world where billions of voice instructions could be taking place, such errors would run into the hundreds, if not thousands, every day.
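
The arithmetic behind the one in a million remark can be sketched in a few lines of Python. The daily volumes used here are illustrative assumptions, not measured figures, and the function name is made up for this example.

```python
def expected_errors_per_day(commands_per_day: int, one_error_in: int = 1_000_000) -> float:
    """Expected number of misfires per day if one command in `one_error_in` goes wrong."""
    return commands_per_day / one_error_in


if __name__ == "__main__":
    # Assumed daily volumes, chosen only to show the scale of the problem.
    for volume in (100_000_000, 1_000_000_000):
        print(volume, "commands a day ->", expected_errors_per_day(volume), "errors a day")
```

Under these assumptions, a hundred million commands a day gives a hundred such errors, and a billion commands a day gives a thousand.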
