A New Approach to Advanced Fraud Bot Protection
One of my favorite movie posters is from the movie Being John Malkovich. Hundreds of different people are holding John Malkovich masks, making it very hard to determine which is the real John. The world of mobile fraud is very similar. Fraudsters work very hard to blend in, sending realistic-looking sets of impressions, clicks, installs and in-app activity. Over the last few months, common tactics have included bots and bot-based SDK Spoofing. The sheer volume of fraud often makes it extremely hard to tell legitimate traffic from fraudulent traffic, particularly when the fraud is mimicking human behavior.
Identifying Advanced Bots
A few weeks ago, one of our Protect360 data scientists grabbed me before my first cup of coffee. She was very excited about what she had found.
She was deep into a fraud analysis, working on a particularly interesting anomaly. The team had detected a resurgence in bots attempting to send false clicks and installs. Back in 2014, we called this type of fraud SDK Simulation (some now call this SDK Spoofing) and deployed security mechanisms in our SDK to actively block this activity. After isolating an app heavily targeted by these bots, the data scientist dove deeper, identifying a number of new, advanced bots. She then identified a series of unique behavioral markers that could be used to accurately identify these new fraud bots.
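The actual behavioral markers behind Protect360's detection are not disclosed, but the idea of combining several weak markers into one decision can be sketched roughly like this. Every field name and threshold below is a hypothetical stand-in, not the real signature:

```python
# Hypothetical sketch only: the real markers and thresholds are not public.
# Flags an install when several weak behavioral markers fire together.

def looks_like_bot(install: dict) -> bool:
    """Return True when a majority of (made-up) bot markers fire."""
    markers = [
        # Click-to-install time far below a human-plausible floor.
        install["click_to_install_secs"] < 5,
        # No in-app activity after the install event itself.
        install["sessions_after_install"] == 0,
        # Device fields too uniform to resemble a real handset.
        install["device_entropy"] < 0.1,
    ]
    # Require a majority of markers rather than any single one,
    # since each marker alone also matches some legitimate users.
    return sum(markers) >= 2

suspicious = looks_like_bot(
    {"click_to_install_secs": 2, "sessions_after_install": 0, "device_entropy": 0.05}
)
```

Requiring markers to co-occur is what keeps a rule like this from flagging ordinary users who happen to trip any single check.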
Fraud Distribution Analysis
Our next step was to understand the scale and impact of this fraud: how widespread it was, and how much these new advanced fraud bots were costing marketers. Based on this initial fraud analysis, our anti-fraud data scientists developed a series of new fraud signatures. We then set up a short regression analysis based on these new detection mechanisms. By comparing pre-mechanism activity to post-mechanism activity, the team projected that the new mechanism would block about 0.05% of this class of traffic, roughly matching our team's initial predictions.
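The regression step above amounts to replaying historical traffic through the new signature and measuring what fraction it would have blocked. A minimal sketch of that replay, with made-up field names and a toy one-week data set:

```python
# Illustrative only: replays a window of historical installs through a
# candidate fraud signature and reports the fraction it would flag,
# mimicking the short pre/post regression described above.

def projected_block_rate(installs, signature) -> float:
    """Fraction of historical installs the candidate signature flags."""
    if not installs:
        return 0.0
    flagged = sum(1 for install in installs if signature(install))
    return flagged / len(installs)

# Toy one-week replay: 2 flagged out of 4,000 installs = 0.05%,
# matching the initial projection in the text.
week = [{"bot_score": 0.99}] * 2 + [{"bot_score": 0.1}] * 3998
rate = projected_block_rate(week, lambda install: install["bot_score"] > 0.9)
```

Running the same replay over a longer window, as in the next section, is just a matter of passing in more weeks of installs.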
Fraud Signature Validation
The next morning, these new mechanisms were sent to one of our data science team leads for further review. The senior data scientist ran a longer regression analysis, applying the new mechanism to a far longer timespan: instead of going back just one week, his regression looked back ten weeks, exploring activity across all apps and clients. This secondary analysis revealed some surprising results. Over the previous couple of weeks, the new behavioral identification mechanisms had flagged almost 5% of this class of traffic as fraud! Either the initial fraud signature was flawed, our projected impact was based on an aberrant data set, or there were other factors we had not yet identified.
While unmasking a new fraud signature with such a large footprint was exciting, the figure was 100x larger than our initial projection. Clearly, something was off in either our new fraud model or our initial projections.
There was clearly another variable skewing either the initial test set or our broader validation set. The team dove deeper into this 10-week period while also zooming out, examining different time periods and other fraud trends to find correlations that could explain the gap. They looked at performance trends, variations in install volume, media sources, in-app behavioral patterns, conversion rates and more, but none of these metrics stood out. The team then zoomed even further out, analyzing regional and vertical benchmarks, as well as comparing individual app benchmarks.
Isolating The Key Variables
This broader macro-analysis revealed the key variable driving these abnormalities: a specific group of apps saw a dramatic spike in installs, all in the same week. Install volume for that week had increased ten-fold.
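A macro-level check like the one that surfaced this spike can be sketched as a simple screen over weekly install counts: flag any week whose volume dwarfs the trailing baseline. The 10x factor below mirrors the ten-fold jump described above; the use of a trailing median as the baseline is an assumption for illustration:

```python
# Sketch of a macro-level install-spike screen: flag any week whose
# install count exceeds a multiple of the trailing median. The 10x
# factor mirrors the ten-fold spike described in the text.
from statistics import median

def spike_weeks(weekly_installs: list, factor: float = 10.0) -> list:
    """Return indices of weeks whose installs exceed factor x trailing median."""
    spikes = []
    for i in range(1, len(weekly_installs)):
        baseline = median(weekly_installs[:i])
        if baseline and weekly_installs[i] >= factor * baseline:
            spikes.append(i)
    return spikes

# A cohort doing roughly 1k installs/week that suddenly jumps past 10k:
# only the final week (index 4) is flagged.
flagged = spike_weeks([1000, 1100, 950, 1050, 10500])
```

A median baseline is deliberately robust here: a single earlier spike inflates a mean but barely moves the median, so later anomalies still stand out.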
Now that we had isolated the problematic cohort of apps, we needed to identify what drove this install spike. When we ran our latest behavioral analysis fraud signatures on this same cohort of apps, we found that about 90% of their false growth was due to a series of advanced fraud bots. While these more advanced bots had previously evaded detection, we now knew exactly where and how the fraud was occurring.
But what about that apparent gap in our initial projections? In retrospect, this spike in fraud had skewed our initial regression analysis: our other fraud protection mechanisms were already absorbing much of this traffic, making the new mechanism's incremental impact look far smaller than it really was. When we stripped out those other mechanisms, we found that our new anti-fraud mechanism would, in fact, affect about 5% of this class of traffic.
In a world challenged with fraud activity, finding the right behavioral patterns in the data haystack is not easy. Oftentimes, a mix of different types of fraud serves as an effective mask for newer, more advanced fraud bots. The only way to accurately detect and block these advanced bots is through the strategic application of machine learning across massive data sets, followed by extensive hands-on macro and micro level analysis to validate the learning. Like a good anti-virus solution, this must be an ongoing process, identifying and blocking the latest threats as they appear.
We look forward to sharing more about our findings, as well as this new hybrid of behavior and distribution modeling over the coming weeks.