There was no reason to be suspicious. The voice at the other end of the call was one he had heard many times before: it was the CEO of his German parent company, and since he himself was the CEO of the local British operations, the two had conversed quite often.
So when the German CEO, his superior, asked him to transfer €220,000 to a Hungarian supplier of the company, ASAP, he did as he was told, without asking why the German company wouldn't simply transfer the funds directly. After all, it was his boss.
Minutes after he made the transfer, he received a call from someone else at the German parent company, assuring him that the UK company would be reimbursed for the full amount. If there was any doubt in his mind, that call laid it completely to rest.
But then, a few hours later, the German CEO called again and asked the UK company to transfer more funds to another supplier. At that point, the promised reimbursement for the previous transaction had not yet reached the UK company's account, and the British CEO began to feel that something was not right. He told the German CEO that he would make the second payment, but after he hung up the phone he picked it up again, called the German HQ and asked to speak with the German CEO. When the two connected and the British CEO asked him to confirm the request for the second payment, it became clear that the German CEO had not asked for any payment to be made; in fact, he had had no conversations with the British CEO that day. And the other person who called about the reimbursement? He claimed he had not spoken with the British CEO either.
Real people, fake news
So who was that person at the other end of the phone, who sounded exactly like the German CEO?
It wasn’t a person.
The reason that voice and manner of speaking sounded as if they belonged to the German CEO is that it WAS the German CEO's voice and his own words, but they were generated by hackers using an AI-based technique known as a "deepfake".
"Deepfake" is a blend of "deep learning" (a branch of machine learning in which algorithms attempt to replicate the way our brains process new data and derive insights and decisions) and "fake", i.e. something that does not exist.
You've seen many deepfakes in the last few years: they're those videos where someone's face is stitched onto another person's body in a very convincing way, including mouth movements that precisely match the words being spoken.
Creating deepfakes is becoming easier and easier. You don't need high-performance computing infrastructure, expensive editing software and a few days of work; just download applications like FaceApp, FaceSwap, DeepFace Lab or the Chinese app Zao to your smartphone, and you have a deepfake studio in the palm of your hand.
The initial deepfake videos, and pretty much all of the user-friendly deepfake creation apps, were built to demonstrate the technology's potential or for entertainment. But some were created to alert the public to deepfakes' potential to cause real damage by doing exactly what they are designed to do: produce visual and audio documentation of a reality that never existed, in order to mislead and spread false information.
One such attempt occurred in 2018, when a video of President Donald Trump urging Belgium to withdraw from the Paris climate agreement began making waves on the Internet. When reporters started investigating the source of the clip, they discovered that President Trump never actually said those words, let alone was filmed speaking them. It turned out that the video was the creation of a Belgian political party, which used audio-editing tools to combine parts of Trump's speeches into what sounded like a single speech, and visual deepfake tools to sync the audio with his facial movements.
Don’t believe everything you hear
Video is considered the most engaging medium, so the threat of deepfake videos created to spread false information or calls to action is something that tech companies and law enforcement agencies are struggling to mitigate.
But audio may actually represent a more immediate threat in terms of financial fraud, as demonstrated at the beginning of this post. In this case, the fraud was not carried out by splicing existing audio into specific sentences: the conversation took place in real time, so the hackers needed a way to reply quickly, in the German CEO's voice, to questions they could not know in advance the British CEO would ask.
To do so, the hackers gathered existing audio recordings of the German CEO from multiple sources (press appearances, investor calls, etc.) and used a combination of speech-to-text transcription and deep-learning speech-synthesis tools to build the vocal equivalent of a "font": a model capturing the core speech elements of the German CEO's voice.
When they called the British CEO, they played pre-generated sentences built from that "core speech" (such as "Please transfer the money to account number…"), but could also reply to questions by typing text that was instantly rendered as audio in the German CEO's voice. It was so convincing that the British CEO only began to suspect he wasn't talking to his boss after a separate red flag appeared afterwards: the promised reimbursement never arriving.
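To get a sense of how accessible this kind of voice cloning has become, here is a minimal, purely illustrative sketch using the open-source Coqui TTS library and its YourTTS model, which can render typed text in the voice of a short reference recording. The file names and text below are assumptions for illustration; this is not the toolchain the attackers are known to have used.

```python
# Illustrative voice-cloning sketch using the open-source Coqui TTS library.
# File paths and text are hypothetical examples, NOT the attackers' actual tools.
from TTS.api import TTS

# Load a pre-trained multilingual, multi-speaker model that supports
# zero-shot voice cloning from a short reference recording.
tts = TTS("tts_models/multilingual/multi-dataset/your_tts")

# A few seconds of the target speaker's voice, e.g. clipped from a
# public press appearance or investor call (hypothetical file name).
reference_clip = "ceo_reference.wav"

# Type any text and render it as audio in the reference speaker's voice.
tts.tts_to_file(
    text="Please transfer the funds to the supplier's account today.",
    speaker_wav=reference_clip,
    language="en",
    file_path="cloned_output.wav",
)
```

The point of the sketch is not the specific library but how little is required: a short public recording of the target and a consumer laptop are enough to produce audio of arbitrary typed sentences in that person's voice.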
It is unclear how many deepfake-based audio fraud attempts have occurred, or how many of them have succeeded. We can assume that, as in the early days of ransomware, companies that fall victim to such attacks usually prefer to keep the incident unreported, to avoid potential reputational and legal trouble with their customers and the general public.
In February 2020, Pindrop, a company that created a technology to mitigate these kinds of audio deepfakes, said it had received only a handful of reports from companies that apparently were the targets of such attempts, but the company's CEO estimated that the total damage from these incidents could have reached $17m.
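Pindrop's detection technology is proprietary, but the general class of approaches it belongs to is well documented in the research literature: extract acoustic features from a voice clip and train a classifier to separate genuine from synthesized speech. A rough sketch of that generic idea, using librosa and scikit-learn with a hypothetical labelled dataset (and not Pindrop's actual method), might look like this:

```python
# Generic audio-deepfake classifier sketch: summarize each clip with
# spectral (MFCC) features and fit a simple model on labelled examples.
# Dataset paths and labels are hypothetical; real detectors are far
# more sophisticated than this.
import numpy as np
import librosa
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

def mfcc_features(path, n_mfcc=20):
    """Represent a clip as the mean and std of its MFCC coefficients."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_detector(clips):
    """clips: list of (wav_path, label) pairs, label 1 = synthetic, 0 = genuine."""
    X = np.array([mfcc_features(path) for path, _ in clips])
    y = np.array([label for _, label in clips])
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = GradientBoostingClassifier().fit(X_train, y_train)
    print("held-out accuracy:", model.score(X_test, y_test))
    return model
```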
As in any fight against cybersecurity threats, decreasing the chances of being defrauded by a deepfake audio scam requires not just new technologies, but clearly defined and consistently executed processes within the organization itself. Just as employees have trained themselves to check the sender's address on an incoming e-mail, or to refrain from opening attachments from unknown sources, in order to combat phishing and malware attacks, organizations need to create or update their playbooks for training employees to conduct financial transactions in a way that prevents fake conversations from stealing real money.
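What such a playbook might encode can be illustrated with a toy example: a rule that forces an out-of-band callback on a verified number before certain payment requests are executed. The thresholds, fields and policy below are hypothetical assumptions, not a recommendation of specific values:

```python
# Toy illustration of an out-of-band verification rule for payment requests
# received by phone or e-mail. Threshold, fields and policy are hypothetical.
from dataclasses import dataclass

CALLBACK_THRESHOLD_EUR = 10_000  # assumed policy threshold

@dataclass
class PaymentRequest:
    requester: str           # who asked for the payment (e.g. "parent-company CEO")
    beneficiary_iban: str    # destination account
    amount_eur: float
    channel: str             # "phone", "email", "erp", ...
    known_beneficiary: bool  # already on the approved supplier list?

def requires_out_of_band_confirmation(req: PaymentRequest) -> bool:
    """Require an independent callback on a verified number before paying."""
    if req.channel in ("phone", "email") and not req.known_beneficiary:
        return True
    if req.amount_eur >= CALLBACK_THRESHOLD_EUR:
        return True
    return False

# Example: a €220,000 phone request to a new beneficiary, like the one in the
# story above, would be held until a callback to a verified number confirms it.
suspicious = PaymentRequest("parent-company CEO", "HU00 0000 0000", 220_000, "phone", False)
assert requires_out_of_band_confirmation(suspicious)
```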
Efi Debi has been with nsKnox since 2019, serving as the Director of Product, leading the product and UX teams. He brings deep domain expertise in technology-driven prevention of digital payment fraud, holding a Master of Business Administration (MBA) degree and a BA in economics and marketing. In his free time, Efi enjoys driving, listening to music, and cooking with his family.
You’re invited to connect with Efi on LinkedIn.