On fighting fraud, Google Search, and machine learning

By Oren Kaniel

The extent and sophistication of mobile ad fraud are growing at an alarming rate, making it more dangerous than ever.

Because we at AppsFlyer have been working with many of the largest advertisers globally, we identified the first forms of mobile ad fraud as early as 2012. This led us to quickly roll out mechanisms to block these early forms of fraud, such as automatically blocking suspicious IP addresses and introducing proprietary client-side SDK authentication.

We also integrated with anti-fraud companies to allow our clients to work with a tool of their choice. However, fairly quickly we realized that thanks to our scale, we were actually much faster to identify anomalies and block them than anyone else in the industry.

For me, this was a moment of realization: data is not only the new oil, it is absolutely everything! Data is the oil, the network effect, a significant barrier to entry, and most importantly the value a company can provide its clients that no one else can. Great technology can only take you so far, as practically any technology can be imitated fairly easily nowadays.

Once you deliver great tech and a product that clients love, you can move to the next phase: improving the technology and product by leveraging collected data and machine learning. And this is where a clear line can be drawn between superior and mediocre.

Let’s look at Google Search for a moment. Indexing technology has been out there for companies (and theoretically for anyone) to use. So why is it that Google has not been challenged on this front for over 15 years? The answer is simple: the data! The data collected by the search engine every day is the fuel that feeds its search algorithm.

Technically, it is we, the users, who keep feeding Google with data all day, every day. It is because of this never-ending flow of data that Google is able to offer great search results. In theory, if we all stopped using Google Search and moved to another provider, say Bing, it would not be long before that provider had far better search results, while Google’s results would eventually become irrelevant.

This is exactly what we have done with AppsFlyer’s fraud protection solution. AppsFlyer measures billions of dollars in mobile ad spend and over 500 billion mobile interactions every month, coming from roughly 98% of smartphone devices globally. On average, we interact with and see every device several times a day, every day. This allows us to identify device anomalies at both the user and ad-network levels.

For example, (i) it is impossible for a user to click on multiple ads in a single second, and (ii) a user cannot physically be in several locations across the globe at the same time. Likewise, (iii) because we recognize 98% of devices globally, it is simply impossible for a legitimate campaign to have a 70% rate of new devices. When that happens, it is 100% fraud.
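To make these three checks concrete, here is a minimal Python sketch. It is an illustration only, not AppsFlyer's actual implementation: the `Click` record, its fields, the function names, and the choice to treat "the same time" as "the same second" and "location" as a country code are all assumptions made for the example. The 70% threshold is taken from the figure above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    """Hypothetical click record; field names are assumptions for this sketch."""
    device_id: str
    timestamp: int      # Unix time in whole seconds
    country: str        # coarse location, e.g. an ISO country code
    known_device: bool  # True if the device has been seen before


def devices_with_burst_clicks(clicks):
    """Rule (i): devices that clicked more than once within the same second."""
    per_second = defaultdict(int)
    for c in clicks:
        per_second[(c.device_id, c.timestamp)] += 1
    return {device for (device, _), n in per_second.items() if n > 1}


def devices_in_two_places(clicks):
    """Rule (ii): devices seen in more than one country in the same second."""
    places = defaultdict(set)
    for c in clicks:
        places[(c.device_id, c.timestamp)].add(c.country)
    return {device for (device, _), seen in places.items() if len(seen) > 1}


def new_device_rate(clicks):
    """Rule (iii): share of distinct devices in a campaign never seen before."""
    devices = {c.device_id: c.known_device for c in clicks}
    if not devices:
        return 0.0
    new = sum(1 for known in devices.values() if not known)
    return new / len(devices)


def campaign_is_suspicious(clicks, max_new_device_rate=0.70):
    """Flag a campaign if any of the three rules fires."""
    return (
        bool(devices_with_burst_clicks(clicks))
        or bool(devices_in_two_places(clicks))
        or new_device_rate(clicks) >= max_new_device_rate
    )
```

Calling `campaign_is_suspicious(clicks)` on the list of clicks attributed to one campaign returns `True` as soon as any single rule fires. Fixed rules like these are only the simplest layer; the thresholds themselves are the kind of thing that data at scale would be used to tune.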

Personally, I am not a fan of hype. Machine learning and AI are definitely big right now, but one must go beyond the fluff! Machine learning is worthless without a continuous flow of data at scale to train on. So it is only when you have robust technology, a great product and data at scale that your product can go from solid to superior. And that’s when you reach a level where you provide value that no one else can offer.

A fraud-busting product is a b#$ch because it consists mostly of back-end algorithms and machine learning, making it far less visible. While our competitors are trying to catch up with what we learned and implemented years ago, we are several steps ahead in leveraging our scale and data.

As a result, our competitors are heavily investing in marketing, causing a frenzy in the market and among advertisers, while their fraud tools are centered on static, rule-based engines without the data. Like Bing.

Fighting fraud is a game of cat and mouse. Our scale enables us to quickly catch these fraudsters (mice), automatically immunizing our entire client base against fraud.