In 2021, it is a given that marketers must be data-driven.
It is also obvious that gaining a competitive edge in a hyper-competitive space requires a deep dive into data.
But what does this mean? What separates the savviest marketers out there from the crowd?
There are multiple answers to this question, but one key element is whether predictive modeling is applied to inform decision-making. It is one thing to measure events, monitor performance over an extended period of time, and optimize; it is another thing to analyze a massive amount of data, develop predictive models, and then apply those models to make decisions early on in a campaign.
In this guide – a collaboration between AppsFlyer, digital marketing agency AppAgent, and mobile marketing consultancy Incipia – we will explore how data-driven marketers can take their data skills to the next level by using predictive modeling to gain that coveted competitive edge.
Predictive modeling: Basic concepts and measurement set-up
Why build models?
There are numerous benefits to predictive modeling in mobile marketing, but we narrowed them down to two key marketing activities: user acquisition and re-engagement.
Knowing your typical user behavior and the early milestones that separate users with high potential and users with low potential can be useful on both the acquisition and re-engagement fronts.
For example, if you know that a user needs to generate X in revenue by day 3 to turn a profit by day 180, and your users' actual day-3 revenue comes in under that benchmark, you know you need to adjust bids, creatives, targeting, or other levers to improve the cost or quality of the users you acquire, or else improve your monetization trends. If it comes in over the benchmark, you can feel confident raising budgets and bids to get more users from the same sources.
If users with some potential are nonetheless under-performing early on, you can focus your re-engagement efforts (whether through paid or owned channels) on those users to help them catch up with your curve.
If you identify at-risk customers, you can try to re-engage with them long before you have to win them back. If your data shows there is a very low probability of long-term success, you can exclude them from your paid campaigns altogether to prevent further investment in the wrong users.
What should I measure?
To understand what you need to measure to get your predictions right, let's explore which data points are useful for which purposes, and which are not.
Think of the square-and-rectangle relationship: just as every square is a rectangle but not every rectangle is a square, every KPI is a metric, but not every metric is a key performance indicator (KPI).
But just because a metric isn't a KPI doesn't mean it isn't useful. Metrics are easier to calculate and mature much sooner than KPIs, which tend to take longer to mature or involve complex formulas.
Legacy metrics (low confidence in predicting profit, fastest availability):
Required data points: impressions, clicks, attributed installs.
Early indicator metrics (medium confidence in predicting profit, fast availability):
In the age of down-funnel focus, an install is no longer a sufficient KPI. That said, the following metrics, while not sufficient for predicting profit on their own, are still useful early indicators of whether a campaign stands a chance of turning a profit.
- Cost per install
- Retention rate
Required data points: cost, attributed installs, app opens (retention report)
With the exception of retention rate, metrics tend to be tied to a marketing model rather than your business model, and as such, are not useful for determining whether the users you acquired will turn a profit for your business.
If you pay $100 per click or per install, chances are you won't turn a profit. If your CTR is 0.05%, chances are the auction mechanics will force you to pay a high rate per install, again leaving you less margin to turn a profit.
Where metrics fail to support predictions is when you need finer accuracy, for example to pinpoint where the line of profitability falls within a $2 to $6 CPI range.
It is important to sub-divide the common KPIs into two types:
Tier 2 KPI confident predictors (medium-high confidence in predicting profit, slow availability):
These KPIs serve as early benchmarks of profit. They offer more confidence than leading indicators (metrics), but they take longer to mature and offer less confidence than Tier 1 KPIs.
- Customer acquisition cost (paying user)
- Cost or conversion of key actions (e.g. ratio of games in first day played, or ratio of content views during first session)
- Time-based cost or conversion of key actions (e.g. cost per number of games played in first day or cost per content view during first session)
- Cost per retained user on day X: total spend divided by the number of users retained on that day.
- Vertical specific in-app events: e.g. tutorial completion, completing level 5 on day 1 (gaming), number of product pages viewed in 1st session, number of sessions in 24 hours (shopping), etc.
Required data points: cost, attributed installs, app opens (retention report), in-app events configured and measured, session data (time stamps, features used etc.)
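The cost-based Tier 2 KPIs above boil down to simple ratios. As a minimal sketch, assuming you can pull a cohort's total spend, the number of users still active on day X, and the number of paying users (the function names and cohort figures are hypothetical):

```python
def cost_per_retained_user(total_spend: float, retained_users_on_day_x: int) -> float:
    """Cost per retained user on day X: cohort spend divided by users still active on day X."""
    if retained_users_on_day_x == 0:
        raise ValueError("no retained users on this day")
    return total_spend / retained_users_on_day_x


def cost_per_paying_user(total_spend: float, paying_users: int) -> float:
    """Customer acquisition cost (paying user): cohort spend divided by paying users."""
    if paying_users == 0:
        raise ValueError("no paying users in this cohort")
    return total_spend / paying_users


# Hypothetical cohort: $5,000 spend, 500 users still active on day 7, 80 paying users.
spend = 5_000.0
print(cost_per_retained_user(spend, 500))  # 10.0
print(cost_per_paying_user(spend, 80))     # 62.5
```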
For most business models, these KPIs cannot serve as fully confident predictors: while they account for cost and for events commonly correlated with profit, they miss the monetization side of the profit equation, since app opens don't always translate into in-app spend, and paying users may buy more than once.
Tier 1 KPI confident predictors (high confidence in predicting profit, slowest availability): early revenue, and the resulting ROAS, as an indication of long-term success. These KPIs either take a long time to fully mature or involve complex processes to determine. However, they tie directly into your business model and, as such, are suitable for predicting your marketing campaigns' profitability.
Required data points: cost, attributed installs, app opens, in-depth revenue measurement (IAP, IAA, subscription, etc.)
While ROAS is easier to calculate, it takes weeks or months for users' revenue curves to mature. LTV, while it immediately provides an estimate of profit when compared with CPI or CAC, is a complex model that can easily lose accuracy depending on how it is applied and the assumptions behind it.
To wrap up, here’s where each approach is situated in the following chart:
Insights on the pros and cons of LTV-based predictive models
Building an LTV model to predict return on ad spend (ROAS) can be overwhelming, given the sheer complexity and the number of competing prediction concepts.
There are obvious differences in the way different types of apps retain and monetize users; just think of how distinct in-app purchase games, subscription-based apps, and e-commerce businesses are. It’s clear that there cannot be a one-size-fits-all LTV model.
Alexandra Lomakina, Senior Analyst at Joom, notes the need for a model like this:
“At Joom, it takes 2 months from the initial order to receive goods, so technically speaking, we don’t know sooner if more orders from a specific customer will follow. Obviously, we need a lifetime value prediction soon after acquiring the traffic and for that, we use the initial set of behavioral data from the app.”
To better understand the complexities, we’ve spoken to a number of experts from both gaming and non-gaming companies including Rovio, Hutch Games, Wargaming, Pixel Federation, Joom, Wolt, Blinkist, and Kiwi.com.
The topics discussed were as follows:
- Which LTV models did you use?
- How did your LTV model evolve over time?
- Who in the company is responsible for handling the model?
- What's your North Star metric in user acquisition?
- What is your stance on UA automation and future trends?
Based on our interviews, we can say that there are three main approaches, or “schools of thought,” for LTV predictions:
1. Retention-driven / ARPDAU retention model
|Concept||Model a retention curve based on a couple of initial retention datapoints, then calculate the average number of active days per user (for Day 90, D180, etc.) and multiply that by an Average Revenue Per Daily Active User (ARPDAU) to get the predicted LTV.|
|Example||D1 / D3 / D7 retention is 50% / 35% / 25%. After fitting these data points to a power curve and calculating its integral until D90, we find that the average number of active days is 5. Knowing that the ARPDAU is 40 cents, the predicted D90 LTV would equal 2 USD.|
|Good fit||High-retention apps (games such as MMX Racing). Easy to set up, can be useful especially if there is not enough data for other models.|
|Bad fit||Low-retention apps (e-commerce such as Kiwi.com or Joom) that have an insufficient number of retention data points to sustain this model.|
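The retention-driven calculation can be sketched in Python. This is a minimal sketch, assuming retention follows a power curve r(t) = a·t^b fitted by least squares on log-transformed values (one common choice; the model description above does not prescribe a fitting method), integrated from D1 to D90 with install day left out. Function names are hypothetical. Note that fitting the three example points this way gives roughly 13.5 active days between D1 and D90 rather than 5, so the table's example is best read as an illustration of the final multiplication (active days × ARPDAU) rather than of the fit itself.

```python
import math


def fit_power_curve(days, retention):
    """Fit retention(t) = a * t**b by least squares on ln(t) vs ln(retention)."""
    xs = [math.log(d) for d in days]
    ys = [math.log(r) for r in retention]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def average_active_days(a, b, horizon):
    """Integral of a * t**b from day 1 to `horizon` (closed form, requires b != -1)."""
    if b == -1:
        raise ValueError("b == -1 needs the logarithmic antiderivative")
    return a / (b + 1) * (horizon ** (b + 1) - 1)


def retention_ltv(days, retention, arpdau, horizon=90):
    """Predicted LTV = average active days per user x ARPDAU."""
    a, b = fit_power_curve(days, retention)
    return average_active_days(a, b, horizon) * arpdau


# Retention points from the example above, with an ARPDAU of $0.40.
a, b = fit_power_curve([1, 3, 7], [0.50, 0.35, 0.25])
print(f"b = {b:.3f}, active days D1-D90 = {average_active_days(a, b, 90):.1f}")
print(f"predicted D90 LTV = ${retention_ltv([1, 3, 7], [0.50, 0.35, 0.25], 0.40):.2f}")
```

Adding install day (100% active by definition) would add one more active day; whether to include it is a modeling choice.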
2. Ratio-based model
|Concept||Calculate a coefficient (D90 LTV / D3 LTV) from historical data, then, for each cohort, multiply the cohort's actual D3 LTV by this coefficient to get a D90 LTV prediction.|
|Example||After the first 3 days, ARPU for our cohort is 20 cents. From historical data, we know that D90/D3 = 3. The predicted D90 LTV would thus be 60 cents (20 cents ARPU × 3). If there isn't enough historical data to calculate a reliable ratio (e.g. we only have 50 days of data and want a D180 LTV prediction, or we have too few samples of D180 LTV), an initial estimate can be made from the existing data points and refined continuously as more data comes in. In these cases, though, such estimates should be taken with a big grain of salt.|
|Good fit||“Standard” types of apps including many game genres (Angry Birds 2) or e-commerce apps (Joom).|
|Bad fit||Subscription-based apps with a free trial of one week or longer. Many days pass before any purchase can happen, and because this method is purchase-based, the long delay between install and the first moment a prediction is possible makes it a poor fit.|
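A minimal sketch of the ratio model, assuming you store cumulative ARPU per day for each historical cohort. The data layout, function names, and cohort figures are hypothetical; the last line reproduces the 20 cents × 3 example from the table.

```python
from statistics import mean


def ratio_coefficient(historical_cohorts, early_day=3, late_day=90):
    """Average late/early cumulative-ARPU ratio over mature historical cohorts.

    Each cohort is a dict mapping day -> cumulative ARPU on that day.
    """
    ratios = [
        cohort[late_day] / cohort[early_day]
        for cohort in historical_cohorts
        if late_day in cohort and cohort.get(early_day, 0) > 0  # skip immature cohorts
    ]
    if not ratios:
        raise ValueError("no mature cohorts with revenue by the early day")
    return mean(ratios)


def ratio_ltv(early_arpu, coefficient):
    """Predicted late-day LTV = observed early-day ARPU x historical coefficient."""
    return early_arpu * coefficient


# Hypothetical mature cohorts: cumulative ARPU (USD) on D3 and D90.
history = [{3: 0.20, 90: 0.56}, {3: 0.25, 90: 0.80}]
coefficient = ratio_coefficient(history)  # mean of 2.8 and 3.2
print(f"D90/D3 = {coefficient:.2f}, predicted D90 LTV = ${ratio_ltv(0.20, coefficient):.2f}")
```

Averaging per-cohort ratios treats every cohort equally; dividing pooled D90 revenue by pooled D3 revenue instead would weight larger cohorts more heavily. Which works better depends on how much cohort sizes vary.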
3. Behavior-driven / user-level predictions
|Concept||Collect a significant volume of data from the app's users (session and engagement data, purchases, in-app messaging, geo / device type, etc.) and process it using regressions and machine learning to determine which actions or combinations of actions are the best “predictors” of a new user's value. The algorithm then assigns a value to each new user based on a combination of their characteristics (platform, country, UA channel) and the actions they performed (often during the first few sessions or days).|
|Example||User A had 7 long sessions on day 0 and 28 sessions in total by day 3; he also visited the pricing page and stayed there for over 60 seconds. According to the regression analysis and machine learning-based algorithm, the probability of him making a future purchase is 65%. With an ARPPU of 100 USD, his predicted LTV is therefore 65 USD.|
|Good fit||Any app with access to an experienced data science team, engineering resources, and lots of data. Could be one of the very few viable options in some cases (e.g. subscription apps with a long free trial).|
|Bad fit||Could be overkill for many small- and medium-sized apps. Most often, far simpler approaches can yield similar results and are much easier to maintain (and be understood by the rest of the team).|
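To make the behavior-driven idea concrete, here is a deliberately tiny logistic regression trained by gradient descent in plain Python, which is just one of many possible regression techniques. The two features (sessions by day 3, divided by 10, and whether the pricing page was visited), the training rows, and the function names are all hypothetical; a real setup would use far more data and features and would typically rely on an established ML library. As in User A's example, predicted LTV is the purchase probability times ARPPU.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=5000):
    """Fit a logistic regression by batch gradient descent on the log-loss."""
    n, k = len(features), len(features[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def behavioral_ltv(x, weights, bias, arppu):
    """Predicted LTV = P(future purchase | early behavior) x ARPPU."""
    p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return p, p * arppu


# Hypothetical rows: [sessions by day 3 / 10, visited pricing page (0/1)] -> purchased later?
X = [[2.8, 1], [2.0, 1], [1.5, 0], [2.5, 0], [0.6, 1],
     [0.3, 0], [0.5, 1], [0.2, 0], [0.4, 0], [2.2, 0]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
weights, bias = train_logistic(X, y)
p, ltv = behavioral_ltv([2.8, 1], weights, bias, arppu=100.0)
print(f"purchase probability {p:.0%}, predicted LTV ${ltv:.2f}")
```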
Choosing the right model for different app types
Each app and each team has its own mix of parameters and considerations that should feed into the selection process. On the product side, it's a unique combination of app type and category, monetization model, user purchase behavior, available data (and its variance), and so on.
On the team’s side, it’s the capacity, engineering proficiency, knowledge, and the time available before the working model is required by the UA team.
In this section, we’ll outline several simplified examples of the selection process.
These are based on real-life cases of three types of apps: a free-to-play (F2P) game, a subscription-based app, and an e-commerce app.
Subscription-based apps
The hard paywall
With a hard paywall, paid subscriptions very often start on day 0 (e.g. 8fit). This is great: after the first day, we already have a precise indication of the total number of subscribers (say, 80% of all subscribers subscribe on D0 and the remaining 20% at some point in the future).
Provided that we already know our churn rates, and consequently our ARPPU, we can predict a cohort's LTV simply by multiplying (number of D0 payers) × (ARPPU for a given user segment) × 1.25, where 1.25 (= 1 / 0.8) is the coefficient that accounts for the remaining 20% of subscribers expected to pay in the future.
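The hard-paywall calculation is a single multiplication. A minimal sketch, where the 80% D0 share comes from the example above and the cohort size, ARPPU, install count, and function name are hypothetical:

```python
def hard_paywall_ltv(day0_payers, arppu, share_paying_on_day0=0.8):
    """Cohort revenue estimate: D0 payers x ARPPU x (1 / share of all payers who pay on D0)."""
    if not 0 < share_paying_on_day0 <= 1:
        raise ValueError("share_paying_on_day0 must be in (0, 1]")
    late_payer_coefficient = 1 / share_paying_on_day0  # 1 / 0.8 = 1.25
    return day0_payers * arppu * late_payer_coefficient


# Hypothetical cohort: 400 D0 subscribers, $30 ARPPU for this segment, 2,000 installs.
cohort_ltv = hard_paywall_ltv(400, 30.0)
print(cohort_ltv, cohort_ltv / 2_000)  # 15000.0 7.5
```

Dividing by installs turns the cohort total into a per-install LTV that can be compared directly with CPI.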
Limited-time free trial
In this case, a percentage of users will convert to become paying subscribers after the trial is over (e.g. Blinkist or Headspace).
The problem is that UA managers have to wait until the trial is over to understand conversion rates. This lag can be especially problematic when testing new channels and geos.
Behavioral predictions could be handy here.
Even with a moderate volume of data and simple regressions, it’s often possible to identify decent predictors.
For example, we could learn that users who enter the free trial and have at least 3 sessions per day during the first 3 days after installation convert to a paid subscription in 75% of cases.
Though far from perfect, a predictor like this could be precise enough for UA decision-making and give the UA team something actionable before more data is collected and a proper model is trained.
Paywall types and designs can be greatly influenced by the need to evaluate traffic quickly. It is very helpful to learn as early as possible whether a user will convert, so that you understand campaign profitability and can react quickly. In several companies, we've seen this become one of the deciding factors when choosing a paywall type.
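Before a proper model exists, a rule like this can already be turned into an estimate for a cohort that is still in its trial. This sketch assumes “at least 3 sessions per day” means at least 3 sessions on each of the first 3 days; the history rows and function names are hypothetical. It measures conversion for past trial users who did and did not meet the rule, then sums those rates over a new cohort.

```python
def meets_rule(daily_sessions, min_sessions=3, days=3):
    """True if the user had at least `min_sessions` sessions on each of the first `days` days."""
    first_days = daily_sessions[:days]
    return len(first_days) == days and all(s >= min_sessions for s in first_days)


def conversion_rates(history):
    """Observed trial-to-paid conversion rate for users who did and did not meet the rule."""
    counts = {True: [0, 0], False: [0, 0]}  # met rule -> [converted, total]
    for daily_sessions, converted in history:
        group = counts[meets_rule(daily_sessions)]
        group[0] += int(converted)
        group[1] += 1
    return {met: (c / t if t else 0.0) for met, (c, t) in counts.items()}


def expected_subscribers(trial_users, rates):
    """Expected paid conversions for a cohort still in its free trial."""
    return sum(rates[meets_rule(sessions)] for sessions in trial_users)


# Hypothetical past trial users: (sessions on days 1-3, converted?)
history = [([3, 4, 3], True), ([5, 3, 3], True), ([3, 3, 6], True), ([4, 4, 4], False),
           ([1, 0, 2], False), ([3, 1, 0], True), ([2, 2, 2], False), ([0, 0, 0], False)]
rates = conversion_rates(history)
print(rates)  # {True: 0.75, False: 0.25}
print(expected_subscribers([[3, 3, 3], [1, 1, 1]], rates))  # 1.0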
Free-to-play (F2P) games
These types of games tend to have high retention rates and a significant number of purchases.
Also, the first purchase often takes place during the first day or a couple of days after the install. This makes them viable candidates for the “retention” and “ratio” models.
- Casual game (e.g. Angry Birds 2 or Diggy’s Adventure):
A good fit for games monetizing through in-app purchases is the “ratio model”: it should be possible to predict D(x) LTV quite confidently after 3 days, since most paying users should already have been identified by then. For some games that monetize via ads, the retention-based approach could also be considered.
- Hardcore game (e.g. World of Tanks or MMX Racing):
In hardcore games, the ARPPU distribution can be significantly skewed, because whales (the highest-spending users) can spend many times more than everyone else.
The ‘ratio model’ could still work in these cases, but should be enhanced to take into account different spend levels for different spender types. Here, a “user type” variable would assign different LTV values to users based on their spending behavior (i.e. how much they spent, how many purchases, what starter pack they bought, etc.).
Depending on the data, an initial prediction could be made after day 3, with another pass a bit later (day 5 or day 7), once users' spending levels have become clear.
E-commerce apps
These types of apps commonly have unique retention patterns, because users typically open them only when they already have a purchase intent, which does not arise very often.
We can thus conclude that the “retention-based model” is generally not a good fit for such apps.
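A minimal sketch of a ratio model enhanced with a spender-type variable: each payer is classified by D3 spend, and their D3 revenue is multiplied by a coefficient specific to that type. The thresholds, coefficients, and names are hypothetical placeholders that would normally be derived from historical cohorts. Users who haven't paid by D3 contribute nothing here, so late first-time payers would need a separate cohort-level adjustment.

```python
# Hypothetical (type, minimum D3 spend in USD, D90/D3 coefficient), highest tier first.
SPENDER_TYPES = [
    ("whale", 100.0, 4.0),
    ("dolphin", 20.0, 2.5),
    ("minnow", 0.01, 1.8),
]


def classify(d3_spend):
    """Return (spender type, D90/D3 coefficient) for a user's D3 spend."""
    for name, min_spend, coefficient in SPENDER_TYPES:
        if d3_spend >= min_spend:
            return name, coefficient
    return "non_payer", 0.0


def predict_d90_revenue(d3_spends):
    """Sum of each payer's D3 spend times the coefficient of their spender type."""
    total = 0.0
    for spend in d3_spends:
        _, coefficient = classify(spend)
        total += spend * coefficient
    return total


# Hypothetical cohort of five users' D3 spend (USD).
print(predict_d90_revenue([0.0, 0.0, 5.0, 30.0, 150.0]))
```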
Instead, let’s explore two examples:
- Airline ticket reseller (e.g. Kiwi.com)
The time from install to purchase in travel is significant, sometimes months long. Given that purchases and revenue are distributed over an extended time frame, the “ratio” or “retention” models won’t work in most cases. Therefore, we should seek to find behavioral cues and uncover potential predictors in the first post-install session (as this is often the only information we’ll have at our disposal).
Using these cues, and provided there is sufficient data, we'd estimate the probability that a user will ever buy a ticket and multiply it by the ARPPU for the relevant combination of the user's characteristics (platform, country of origin, etc.).
- Online marketplace (e.g. Joom)
Users tend to make their first purchase soon after installing. However, that first item often takes considerable time to ship, so customers tend to wait for the first delivery to evaluate the service before committing to another purchase. Waiting for second-purchase data would delay predictions so much that they would become unusable, which limits any calculations to the initial data.
Depending on when users place their orders (let's say a large majority of users who will ever order do so within the first 5 days), we can use the ratio method (D90/D5) and multiply the result by another coefficient that accounts for additional future (secondary) purchases.
From MVP to complex models
Every data analyst we talked to at big publishers agreed that it's important to start your prediction journey with a simple “Minimum Viable Product” (MVP).
The idea is to verify initial assumptions, learn more about the data, and gradually build a model.
That usually means adding more variables to have more granular and precise models (e.g. k-factor, seasonality, and ad revenue, in addition to initial segmentation by platform, country, and UA channel).
Anna Yukhtenko, Data Analyst at Hutch Games, agrees:
“Complex is not a synonym for ‘good’; people can get frustrated fast when UA managers are blocked with not having the data because someone is doing complicated stuff.”
In reality, we have found that companies tend to stick to conceptually simple models. This was a bit surprising, as we expected that once the product takes off, data teams eagerly begin spitting clouds of fire, machine learning algorithms, and AI to get on par with what we believed was an industry standard. We were mistaken.
Although many see the value in sophisticated models and have tested them in the past, they eventually sided with simpler ones.
There are three main reasons:
- Cost/benefit of advanced models. The cost/benefit ratio of creating and maintaining a complex model just doesn’t add up. If a sufficient level of confidence for day-to-day operations can be reached with the simpler models, why bother?
- Engineering time to create/maintain. Creating an advanced model can swallow many engineering hours, and maintaining it even more, which is a huge issue for smaller teams. Quite often, the BI department simply has no capacity to devote to the marketing team, leaving marketers alone in an uneven battle against statistics and data engineering.
- Continuous changes. Every product version is different and monetizes differently (adding or removing features could have a huge effect, for example); there’s different seasonality in different countries; market-wide effects that need to be considered; and so on. Changes need to be made on the fly, and making changes to a complex model can be painful and slow, which can prove disastrous in a fast-moving mobile environment with continuous media buying. It’s so much easier to tweak a simple model, and marketers can sometimes do it themselves the way they want if they understand how it works.
That being said, for a certain subset of apps, a behavior-based model might be the only good fit; however, in this case, an experienced engineering and data science team should be at hand.
Another approach gaining traction is ad-based LTV modeling built on user-level ad revenue estimates.
For more on this subject, see chapter 4.
Teams and responsibilities
In general, designing, setting-up, and adapting a predictive LTV model should be a job for an analytics / data science team (if the company has one).
Ideally, there are two roles:
- An experienced analyst with close ties to marketing who can advise at the strategic and tactical levels and decide which model should be used and how
- A dedicated analyst which then “owns” LTV calculations and predictions on a day-to-day basis
The “day-to-day analyst” must continuously monitor the model and keep an eye out for any significant changes and fluctuations.
For example, if weekly projected revenues do not match reality and are not within pre-set boundaries, a tweak in the model might be necessary now, not after weeks or months.
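A check against pre-set boundaries is easy to automate. In this sketch, the 15% tolerance band, the weekly figures, and the function name are hypothetical; the point is simply to flag weeks where actual revenue drifts too far from the model's projection so the model can be reviewed right away.

```python
def flag_off_track_weeks(projected, actual, tolerance=0.15):
    """Return (week, relative deviation) for weeks where actual revenue misses the
    projection by more than `tolerance` in either direction."""
    flagged = []
    for week, (proj, act) in enumerate(zip(projected, actual), start=1):
        if proj == 0:
            continue  # a relative deviation is undefined without a projection
        deviation = (act - proj) / proj
        if abs(deviation) > tolerance:
            flagged.append((week, round(deviation, 3)))
    return flagged


# Hypothetical weekly projections vs. actuals (USD) with a 15% tolerance band.
print(flag_off_track_weeks([1000, 1000, 1000, 1000], [980, 1100, 820, 1200]))
# [(3, -0.18), (4, 0.2)]
```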
Tim Mannveille, Director of Growth & Insight at Hutch Games, notes:
“It’s a team effort. We created something like an early warning system where, once a month, we get together and walk through all the assumptions that go into the model and check if they still hold true. So far, we have around 12 major assumptions (e.g. value of incremental organics, seasonality, etc.), which we control to make sure we’re on the right track.”
Once prediction results are calculated, they’re automatically passed over to and used by the UA team.
UA managers most often simply rely on these results and report inconsistencies, but they should try to take it up a notch so they can better challenge and assess the models in use (on a general level; understanding the intricacies behind a complex model and its calculations is not required).
To outsource, or not to outsource?
Generally speaking, we can look at short and long term goals. Initially, outside help can be invaluable.
Collecting the right data, setting up a well-engineered pipeline, and selecting and applying a basic model are tasks that can be easily outsourced. Outsourcing in this case can certainly jumpstart the process, especially if a company has limited knowledge of the topic.
However, in the longer term, once the product has proven viable and more advertising dollars are being spent across more networks, an internal team should take over.
The model is a living organism that has to be constantly monitored and adjusted to new circumstances such as an updated monetization model or a significant change in the conversion funnel.
Elena Tsarkova, User Acquisition Manager at Wooga, highlights this:
“Initially, we were thinking of using 3rd party marketing analytics and just plug-in an LTV model built internally at our BI department. Later, we’ve decided to go the extra mile and invest in a full in-house tool to achieve the desired level of customization and flexibility we were seeking with our level of spend.”
Marketing pros interviewed for this chapter:
- Teemu Rautiainen, Elif Buyukcan, Kasim Zorlu and Leonard Seffer from Rovio Entertainment
- Alexandra Lomakina from Joom
- Fredrik Lucander from Wolt
- Matej Lancaric from Boombit (formerly at Pixel Federation)
- Gessica Bicego from Blinkist
- Anna Yukhtenko and Tim Mannveille from Hutch Games
Methods for assessing mobile marketing profitability with Excel
If you think you've entered the realm of advanced Excel by using pivot tables, calculated fields, conditional formatting, and VLOOKUPs, you may be surprised to learn that you're missing out on an even more powerful trick in the Excel playbook.
Even better, this trick can be used to predict your mobile marketing campaigns’ profitability!
By using a scatter plot and a bit of algebra, you can turn an Excel trendline equation into a powerful tool for identifying, early on, the point at which your marketing campaigns prove they are likely to turn a profit.
This method can help you graduate from hunches to data-driven decision-making and raise your confidence in weekly reporting.
Predicting which Week 0 ROAS leads to 100% ROAS at 6 months
While LTV done right is a great predictor, ROAS (particularly in the first week of a user’s lifetime) is a widely used metric for measuring profit due to its broad accessibility, so let us continue using ROAS as our confident predictor.
In particular, we will use Week 0 ROAS (revenue in the first week of acquiring users/cost to acquire those users) as our confident predictor, which is a cohorted, apples-to-apples method of benchmarking ad performance each week. We’ll use Week 0 ROAS to predict whether we break even on our ad spend with 100% ROAS after 6 months.
1. The first step to using Excel for predicting profit is to ensure you have enough Week 0 and 6-month data points.
While you technically can draw a slope and make a prediction for any point on that slope with two data points, your prediction won’t be very good with so few observations powering it.
The ideal number of observations depends on a multitude of factors, such as your desired confidence level, the correlations in the dataset, and time constraints, but as a rule of thumb for Week 0 ROAS-based predictions, you should shoot for at least 60 pairs of Week 0 and 6-month ROAS observations.
Moreover, it’s also vital to include enough observations that have reached the goal level you set.
If you have 60 data points to plot, but only 2 where 6-month ROAS crossed 100%, your model won't have learned enough about which inputs are required to reach this breakeven point. As far as your model knows, getting to 100% ROAS after 6 months could require another 2 full ROAS percentage points or another 5, which is a very wide range and not conducive to predicting.
2. Once you have gathered enough observations of the goal level, the second step is to split your data set into two groups, one for training and one for prediction.
Place the lion’s share of data (~80%) in the training group; later on, you will use the prediction group to actually test your model’s accuracy of predicting the 6-month ROAS, given the Week 0 ROAS.
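Step two can be sketched in a few lines of Python, assuming your observations are (Week 0 ROAS, 6-month ROAS) pairs; the function name and the fixed seed are our own choices. The split here is random; a chronological split (oldest cohorts for training, newest for prediction) is an alternative worth considering, since it better mimics how the model will actually be used.

```python
import random


def split_observations(pairs, train_share=0.8, seed=42):
    """Shuffle (Week 0 ROAS, 6-month ROAS) pairs and split them into training and prediction sets."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical usage with 60 observation pairs:
# training, prediction = split_observations(observations)
```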
3. The third step is to use a scatter plot to graph the data, with the Week 0 ROAS on the x-axis and 6-month ROAS on the y-axis.
Add a trendline, and enable the options to display its equation and R-squared value on the chart.
Graph the training data using a scatter plot.
Right-click on a data point and add a trendline.
Add the trendline equation and R-squared.
4. Step four involves using the y = mx + b linear equation to solve for the equation’s x value (Week 0 ROAS) when the y value (6-month ROAS) is 100%.
Rearranging the equation using algebra is done as follows:
- y = 9.2695x − 0.0936
- 1 = 9.2695x − 0.0936
- 1 + 0.0936 = 9.2695x
- 1.0936 = 9.2695x
- x = 1.0936 / 9.2695
- x ≈ 0.118, or 11.8%
Thus, to break even (100% ROAS) at 6 months, your Week 0 ROAS must be greater than about 11.8%.
If your Week 0 ROAS comes in under this number, you know that you will need to adjust bids, creatives, targeting, or something else to improve the cost or quality of users acquired, or else improve your monetization trends.
If your Week 0 ROAS is over this number, then you can feel confident in raising budgets and bids!
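Steps three and four can also be reproduced outside Excel. The sketch below is a minimal Python version, assuming Excel's linear trendline is an ordinary least-squares fit with an intercept (which is what a linear trendline computes); `fit_linear` and `breakeven_week0_roas` are hypothetical helper names, and the final line reuses the trendline coefficients from the example above.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = m*x + b, plus the R-squared of the fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return m, b, 1 - ss_res / ss_tot


def breakeven_week0_roas(m, b, target=1.0):
    """Solve target = m*x + b for x: the Week 0 ROAS needed to hit the 6-month target."""
    return (target - b) / m


# With your training pairs: m, b, r2 = fit_linear(week0_roas, month6_roas)
# Using the trendline from the example (m = 9.2695, b = -0.0936):
print(f"{breakeven_week0_roas(9.2695, -0.0936):.1%}")  # 11.8%
```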
5. Step five is where you use your prediction segment of the full data set to assess how well your model was able to predict actual outcomes.
This can be assessed using the mean absolute percentage error (MAPE): for each observation in the prediction group, divide the absolute value of the error (the actual value minus the predicted value) by the actual value, then average these percentages across all observations.
The lower the MAPE, the better the predictive power of your model.
There is no rule of thumb for a good MAPE number, but generally, the more data your model has and the more correlated the data is, the lower the MAPE and the better your model’s prediction power will be.
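Step five, sketched in Python. The holdout pairs are hypothetical and are scored with the example trendline from step four; `mape` is an illustrative helper, not a library function.

```python
def mape(actual, predicted):
    """Mean absolute percentage error: mean of |actual - predicted| / |actual|."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical prediction group: (Week 0 ROAS, actual 6-month ROAS) pairs,
# scored with the example trendline y = 9.2695x - 0.0936.
holdout = [(0.10, 0.85), (0.12, 1.05), (0.15, 1.30)]
predicted = [9.2695 * x - 0.0936 for x, _ in holdout]
actual = [y for _, y in holdout]
print(f"MAPE = {mape(actual, predicted):.1%}")
```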
If your MAPE is large and the error rates are unacceptable, it may be necessary to use a more complex model. While harder to manage, more complex models (especially those built in R or Python) can increase the predictive power of your analysis.
And there you have it: a framework for predicting marketing campaign profitability.
But don’t stop reading yet!
This guide has more goodness to come.
Improve your predictions
For the curious readers out there, the question on your minds may be whether the default linear trendline is the best one to use for predicting profit.
You may even try out a few more trendlines and discover that the R-squared (a measure of the fit of the equation to your data) improves with other equations, raising the profile of this question even more.
While the marketing adage of “it depends” applies again in selecting the best trendline, another marketing adage is useful as a response: KISS (keep it simple, stupid). If you are not a statistician or a math enthusiast, your best bet is to use the simpler trendlines (linear).
Why is this an issue?
As a simple example, consider what happens when unexpected data enters the model. In the following two scenarios, see how a lower Week 0 ROAS maturing unexpectedly well, or a higher Week 0 ROAS maturing unexpectedly poorly, affects each trendline model's accuracy (assessed using the MAPE).
Using the MAPE to compare the different trendline-based models here shows that, while the linear and exponential models are not the most accurate in any case, they are the most consistent.
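The same comparison can be run in code. This sketch fits a linear trendline and an exponential one (y = c·e^(kx), fitted as a straight line on ln(y), which to our understanding is how spreadsheet exponential trendlines are typically computed, and which requires positive y values) to training pairs, then scores both on a holdout with MAPE. The toy data and helper names are hypothetical.

```python
import math


def ols(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def linear_trendline(xs, ys):
    m, b = ols(xs, ys)
    return lambda x: m * x + b


def exponential_trendline(xs, ys):
    # ln(y) = ln(c) + k*x, so a straight-line fit on ln(y) gives k and ln(c).
    k, ln_c = ols(xs, [math.log(y) for y in ys])
    return lambda x: math.exp(ln_c + k * x)


def mape(actual, predicted):
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def compare_trendlines(train, holdout):
    """Holdout MAPE for each trendline type, fitted on the training pairs."""
    xs, ys = zip(*train)
    scores = {}
    for name, fit in (("linear", linear_trendline), ("exponential", exponential_trendline)):
        model = fit(xs, ys)
        scores[name] = mape([y for _, y in holdout], [model(x) for x, _ in holdout])
    return scores


# Hypothetical (Week 0 ROAS, 6-month ROAS) pairs.
train = [(0.08, 0.62), (0.10, 0.85), (0.12, 1.01), (0.15, 1.33), (0.18, 1.55)]
holdout = [(0.11, 0.93), (0.14, 1.22)]
print(compare_trendlines(train, holdout))
```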
As a final note, here is a list of other questions that can help ensure your prediction analysis rests on solid ground:
- Are you continuing to feed your model to keep it trained on the most relevant data? Check whether your model's predictions come true, or come close, as new observations arrive.
- Do you have too much variation or, conversely, overfitting? A very low R-squared (too much noise) or a very high R-squared (possible overfitting) can both indicate a problem with your model's ability to predict new data accurately.
- Did you use the right KPI? Go ahead and test different KPIs (e.g. more or fewer days of ROAS or LTV) and use the MAPE to compare the profit prediction power of each! You may be surprised at how poorly correlated the standard measures prove to be.
- Did your leading indicators or early benchmarks experience significant change? This can be a sign that something significant has shifted in the real world, and that trouble is brewing for your model’s ability to predict profit accurately moving forward.
- Did you apply segmentation to the data? Segmenting users into more homogeneous groups is a great way to reduce noise and improve the predictive power of your model. For example, don't apply the same model to all users across all channels and geographies if those users have significantly different retention and cost trends.
- Are you considering the influences of time?
Most marketers know that seasonality can cause predictions to break down, but the lifecycle of your app, campaign, audience, or creative can also affect your model's ability to make accurate predictions.
Adding another piece to the pie: Predicting in-app ad LTV
In-app advertising (IAA) has become increasingly popular, accounting for at least 30% of app revenue in 2018.
Hyper-casual and casual games, in addition to many utility apps, naturally leverage this revenue stream as their primary source of monetization.
Even developers who had been completely reliant on in-app purchases (IAP) have started monetizing with ads. As a result, we can see that many apps are now successfully combining both revenue streams to maximize their users’ LTV.
For example, look no further than King’s Candy Crush, or Rovio’s Angry Birds 2.
For hybrid monetization, LTV is composed of two parts:
- In-app purchases/subscription LTV
Revenue actively generated by a user who spends money to purchase in-game or in-app currency, special items or extra services, or a paid subscription.
- In-app advertising LTV
Revenue passively generated by a user that views and/or interacts with ads (banners, videos, interstitials, etc.)
The data challenge
Ideally, marketers would know the nominal value of every single impression, which would effectively make each impression a “purchase.”
Most importantly, it would inform user (device) level LTV segmented by our standard dimensions – platform, channel, country, etc.
After gathering sufficient data, we’d be able to create prediction models similar to what we’ve already described in chapter 2 for in-app purchases. But in the real world, it’s not that simple – even calculating in-app ad LTV on its own is difficult because of the volume and structure of revenue data marketers are able to get their hands on.
To name a few of the issues:
- Ads rarely come from a single source; in reality, there are many sources, with algorithms/tools (ad mediation platforms) constantly switching between them (and therefore between their eCPMs)
- If one user views 10 ads, it’s quite possible that they came from 5 different sources, each with a completely different eCPM
- Some ad networks pay for actions (install, click) rather than impressions, confusing things even further
- When working with commonly used mediation platforms that offer user-level ad revenue, the number remains an estimate: the underlying ad networks often don’t share this data with them, so the generated revenue is usually divided among the users who viewed the impressions
- eCPMs can fluctuate drastically over time, and these changes are impossible to predict
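To see why user-level ad LTV is only ever an estimate, it helps to look at how it is typically assembled: each impression is valued at its source network’s eCPM divided by 1,000, and those values are summed per user. The sketch below assumes a hypothetical impression log with illustrative eCPMs; it deliberately ignores networks that pay per install or click, and the revenue-splitting that mediation platforms apply after the fact.

```python
from collections import defaultdict

# Hypothetical impression record: (user_id, ad_network, eCPM in USD).
# In practice the eCPM per impression is itself an estimate from the mediation platform.
Impression = tuple[str, str, float]


def estimate_user_ad_revenue(impressions: list[Impression]) -> dict[str, float]:
    """Sum the estimated value of each impression per user.

    eCPM is revenue per 1,000 impressions, so one impression is worth eCPM / 1000.
    """
    revenue: dict[str, float] = defaultdict(float)
    for user_id, _network, ecpm in impressions:
        revenue[user_id] += ecpm / 1000.0
    return dict(revenue)


if __name__ == "__main__":
    # One user, three impressions from three networks with very different eCPMs.
    log = [
        ("user_1", "network_a", 12.0),
        ("user_1", "network_b", 4.5),
        ("user_1", "network_c", 8.0),
        ("user_2", "network_a", 12.0),
    ]
    print(estimate_user_ad_revenue(log))
```

Because every eCPM in the log can change from one day to the next, the same viewing behavior can produce noticeably different revenue estimates over time.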
In-app ad LTV prediction models
Many companies we interviewed were not actively involved with ad LTV predictions.
Among gaming app marketers who were interested in the topic, none had it figured out to a stage where they’d be happy using it; instead, it was more of a work in progress.
| Method | Description | Example |
|---|---|---|
| Retention-based/ARPDAU retention model | Using the ARPDAU retention model; ARPDAU in this case also contains the additional contribution of in-app ad revenue. | D1/D3/D7 retention is 50%/35%/25%. After fitting these data points to a power curve and calculating its integral until D90, we learn that the average number of active days is 5. Knowing that the ARPDAU is 50 cents, the predicted D90 LTV is $2.50. |
| Ratio-based method | Integrating user-level ad revenue estimates into the stack in order to use the ratio method in the same way (i.e. based on coefficients from D1, D3, D7, etc.). | ARPU calculated from both in-app purchases and in-app ad revenue is 40 cents after the first 3 days. We know that D90/D3 = 3. Predicted D90 LTV would then be $1.20. |
| Simple multiplication method | Calculating the ratio between in-app purchases and ad revenue to use a multiplier for the total LTV calculation. With more data in place, multiple coefficients can be calculated for platform/country dimensions, as these usually have the biggest impact on the ad vs in-app revenue ratio. | |
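The three methods in the table can be sketched in a few lines of Python. This is an illustrative sketch, not a production model, and several choices in it are assumptions: the power curve is fitted by least squares on log-transformed values, the install day is counted as one fully active day, and the multiplication method is interpreted as total LTV = IAP LTV × (1 + ad revenue ÷ IAP revenue). All function names are hypothetical.

```python
import math


def fit_power_curve(days: list[float], retention: list[float]) -> tuple[float, float]:
    """Least-squares fit of retention(t) = a * t**b on log-log values."""
    xs = [math.log(d) for d in days]
    ys = [math.log(r) for r in retention]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def expected_active_days(a: float, b: float, horizon: int) -> float:
    """Install day (assumed 100% active) plus the integral of a*t**b from day 1 to horizon."""
    if math.isclose(b, -1.0):
        integral = a * math.log(horizon)
    else:
        integral = a * (horizon ** (b + 1) - 1) / (b + 1)
    return 1.0 + integral


def arpdau_retention_ltv(days: list[float], retention: list[float],
                         arpdau: float, horizon: int = 90) -> float:
    """Retention-based model: expected active days times ARPDAU (IAP + ad revenue)."""
    a, b = fit_power_curve(days, retention)
    return expected_active_days(a, b, horizon) * arpdau


def ratio_ltv(observed_arpu: float, horizon_to_observed_ratio: float) -> float:
    """Ratio method: ARPU on day N times a historical D90/DN coefficient (same N)."""
    return observed_arpu * horizon_to_observed_ratio


def multiplication_ltv(iap_ltv: float, ad_to_iap_ratio: float) -> float:
    """Multiplication method (assumed form): scale IAP LTV by 1 + ad/IAP revenue ratio."""
    return iap_ltv * (1 + ad_to_iap_ratio)


if __name__ == "__main__":
    print(round(ratio_ltv(0.40, 3.0), 2))  # 1.2
    print(arpdau_retention_ltv([1, 3, 7], [0.50, 0.35, 0.25], arpdau=0.50))
```

The number of active days the retention-based model produces is sensitive to how the curve is fitted and whether the install day is counted, so it is worth checking those choices against your own historical cohorts before relying on the output.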
Link to behavior-based LTV predictions
It’s important to mention another key factor that can heavily influence the potential profitability of app users: cannibalization.
Users who spend money on in-app purchases often have a significantly higher LTV than users who only consume ads.
It is of the utmost importance not to disrupt their purchase intent with messages offering handy items for free. On the other hand, it’s important to incentivize users to watch ads, so they’re often rewarded with in-app currency or bonuses.
If an app contains both rewarded ads and in-app purchases, it is possible that, at a certain point, a player who would otherwise become an IAP spender doesn’t, because watching ads already earns them a significant amount of in-app currency.
This is exactly where behavioral predictions come into play – by observing players’ behavior, a machine learning algorithm can determine the likelihood that certain users would become “spenders” and tweak their game/app experience.
The process works as follows:
- All users should start with a no-ad experience while engagement data starts to be collected
- The algorithm continuously calculates a probability of a user becoming a spender
- If this probability is over a set percentage, ads will no longer be shown as more data is gathered (“waiting for the purchase”)
- If the probability falls below a set percentage, it is most likely that this user will never make a purchase; in this case, the app starts showing ads
- Based on players’ longer-term behavior, the algorithm can keep re-evaluating them, adjusting the number of ads shown and mixing different ad formats.
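The gating step of this process can be sketched as a simple mapping from a predicted spender probability to an ad experience. This is a minimal sketch that assumes the probability comes from some separate behavioral model (not shown); the threshold values and names are hypothetical, and the longer-term tuning of ad volume and formats is left out.

```python
from enum import Enum


class AdMode(Enum):
    NO_ADS = "no_ads"          # default while engagement data is still being collected
    WAIT_FOR_PURCHASE = "wait"  # likely spender: keep ads off
    SHOW_ADS = "show_ads"      # unlikely spender: monetize with ads instead


# Hypothetical thresholds; in practice they would be tuned against historical data.
UPPER_THRESHOLD = 0.6
LOWER_THRESHOLD = 0.2


def ad_mode(spender_probability: float,
            upper: float = UPPER_THRESHOLD,
            lower: float = LOWER_THRESHOLD) -> AdMode:
    """Map a model's predicted probability of becoming a spender to an ad experience."""
    if not 0.0 <= spender_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if spender_probability > upper:
        return AdMode.WAIT_FOR_PURCHASE
    if spender_probability < lower:
        return AdMode.SHOW_ADS
    # Between the thresholds: stay ad-free and keep gathering data.
    return AdMode.NO_ADS


if __name__ == "__main__":
    # Probabilities re-scored as the user plays (illustrative values).
    for p in (0.45, 0.72, 0.12):
        print(p, ad_mode(p).value)
```

Keeping the two thresholds apart leaves a buffer zone where the app neither commits to ads nor to “waiting for the purchase,” which mirrors the idea of collecting more data before deciding.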
The bottom line is that calculating user-level in-app ad LTV at the level of precision we’re used to having with in-app purchase LTV will likely continue to be a challenge, at least in the near future.
Most companies will be content with simple models and approaches that offer the best trade-off between implementation effort and the added value of more precise insights.
Still, it is possible that we’ll experience rapid advances in this area considering the frantic pace of the martech ecosystem and the growing importance of in-app advertising as a key revenue stream for apps.
The contribution method
While well-tuned behavior prediction methods may yield the most accurate results in attributing ad revenue, a simpler, viable way to assign ad revenue to acquisition sources is to allocate each channel’s share of ad revenue according to some user behavior data point.
Contribution margins work by converting a channel’s contribution to overall user behavior into that channel’s earning margin from the overall ad revenue generated by all users.
The theory is that the more actions a channel’s acquired users generate in an app, the more credit that channel deserves for the advertising revenue those users produce.
The first step involves selecting a data point to use for determining each acquisition source’s ad revenue contribution margin. As a starting point, you can use Excel trendline regression to identify which user behavior KPI correlates most strongly with changes in advertising revenue.
Note: because the contribution method attributes revenue based on a proportion of total activity, you will want to use a count-like data point, such as the number of active users in a day, rather than a ratio such as retention rate.
A few options include:
- Total active users
- Total user sessions
- Total session duration
- Ad-measurable data (e.g. ad impressions)
- Total key events (e.g. games played)
Once you have a few data points to observe, scatter plot each one against total ad revenue per day to see where the correlation between changes in user behavior and total ad revenue is strongest. Display the R-squared value on each chart’s trendline to identify which data point has the strongest correlation.
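The same comparison can be done outside Excel. For a straight-line fit with an intercept, the R-squared that Excel shows on a linear trendline equals the squared Pearson correlation, so the sketch below computes that for each candidate KPI and picks the highest. The KPI names and daily values are hypothetical.

```python
def r_squared(x: list[float], y: list[float]) -> float:
    """R-squared of a linear fit y = m*x + c (equal to the squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        # Without variation in both series there is no correlation to measure.
        raise ValueError("both series need some variation")
    return sxy ** 2 / (sxx * syy)


def best_kpi(kpis: dict[str, list[float]],
             ad_revenue: list[float]) -> tuple[str, dict[str, float]]:
    """Return the KPI whose daily totals best explain daily ad revenue, plus all scores."""
    scores = {name: r_squared(values, ad_revenue) for name, values in kpis.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    daily_ad_revenue = [10.0, 12.0, 15.0, 11.0]
    daily_kpis = {
        "active_users": [100, 120, 150, 110],
        "sessions": [300, 290, 310, 305],
    }
    best, scores = best_kpi(daily_kpis, daily_ad_revenue)
    print(best)  # → active_users
```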
There is one downside to this Excel trendline regression method: the less variation there is in user behavior and ad revenue, the less reliably the regression can measure the strength of the correlation between data points.
As a result, you will have less confidence in being able to choose one data point over another.
In this simulated dataset, we see the counts of each data point, per day, as well as the total ad revenue generated per day.
Based on this simulated data, the data point with the strongest correlation appears to be the number of active users, according to our R-squared fit metric.
This means that the data point from our set which best explains changes in ad revenue is the number of active users, and therefore we should use the number of active users to attribute ad revenue by channel.
Once you have selected a user behavior KPI, it’s time to calculate the contribution margin.
In Excel, copy down the total ad revenue and the number of active users for each ad channel, per day, and calculate each channel’s percentage of that day’s total user activity; this is the channel’s daily contribution margin. Then, multiply each channel’s daily contribution margin by the total ad revenue generated on that day.
This process requires that the user behavior data be measured per channel and accessible every day, so that the contribution margin of all channels can be calculated with each new day’s revenue data.
Note: while we only include four ad channels here for illustrative purposes, you will want to also include your organic and other channel data here, too, in order to fully attribute the daily revenue to daily user behavior.
Above, we can see the calculated ad revenue generated by day, by channel, which allows you to estimate the profitability of each channel.
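The same contribution calculation can be expressed in a few lines of Python, which is handy once the daily per-channel data comes from an export rather than a spreadsheet. This sketch assumes active users as the selected KPI; the dates, channel names, and values are hypothetical, and organic users are included as their own channel so that each day’s revenue is fully attributed.

```python
def attribute_ad_revenue(
    daily_revenue: dict[str, float],
    daily_activity: dict[str, dict[str, float]],
) -> dict[str, dict[str, float]]:
    """Split each day's total ad revenue across channels by their share of activity.

    daily_revenue:  {date: total ad revenue that day}
    daily_activity: {date: {channel: KPI value, e.g. active users}}
    Returns {date: {channel: attributed ad revenue}}.
    """
    attributed: dict[str, dict[str, float]] = {}
    for day, revenue in daily_revenue.items():
        activity = daily_activity[day]
        total = sum(activity.values())
        if total == 0:
            raise ValueError(f"no activity recorded on {day}")
        # Contribution margin = channel activity / total activity on that day.
        attributed[day] = {ch: revenue * value / total for ch, value in activity.items()}
    return attributed


if __name__ == "__main__":
    revenue = {"2021-01-01": 500.0, "2021-01-02": 650.0}
    active_users = {
        "2021-01-01": {"channel_a": 4000, "channel_b": 2500, "organic": 3500},
        "2021-01-02": {"channel_a": 4200, "channel_b": 3000, "organic": 3800},
    }
    for day, split in attribute_ad_revenue(revenue, active_users).items():
        print(day, {ch: round(v, 2) for ch, v in split.items()})
```

Summing each channel’s attributed revenue over the period gives the per-channel figure to set against that channel’s spend.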
Note: you will need to revisit your assessment of useful KPIs for ad revenue attribution as user behavior trends and ad revenue monetization data shift, or as new data points become available.
For example, in the above dataset, we can see a second grouping of data points towards the end of the data period (starting roughly January 10th) where there is significantly more ad revenue per day than earlier in the month. This is reflected in the grouping of data towards the top-right of each scatterplot, away from the bottom-left group.
The more complex the dataset, the less accurate this simple Excel regression assessment will be and the greater the need to apply segmentation and a more rigorous analysis.
Best practices for building mobile marketing prediction models
Building predictive models is a complex process with numerous factors affecting its robustness, and finding the right model takes a lot of experimentation.
The following are some important tips to remember:
1. Feed the beast
When building data models or systems that are used to guide significant decisions, it’s not only important to build the best system possible, but also to perform ongoing testing to ensure its effectiveness. For both purposes, make sure that you continuously feed your profit prediction model to keep it trained on the most relevant data.
In addition, always check whether new observations confirm your model’s predictions, or at least come close.
Skipping these steps could mean that a model with useful predictive power at the outset goes off the rails because of seasonality, macro auction dynamics, your app’s monetization trends, or many other reasons.
By observing your leading indicators or early benchmarks and looking for significant changes in data points, you can gauge when your own predictions are likely to break down, too.
For example, if your model was trained on data where the average day 1 retention rate ranged from 40%-50%, but for the stretch of a week, the day 1 retention rate dropped to 30%-40%, this could indicate a need to re-train your model. That might be especially true given that quality signals from the users you most recently acquired have shifted, likely leading to changes in monetization and profit, all else equal.
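A basic version of this check can be automated. The sketch below flags a possible need to retrain when the average day 1 retention over a recent window falls outside the range seen in the training data; the one-week window, the plain average, and the function name are assumptions for illustration, and a production setup might prefer a proper statistical test.

```python
from statistics import mean


def needs_retraining(recent_d1_retention: list[float],
                     training_range: tuple[float, float]) -> bool:
    """Flag possible drift when recent average D1 retention leaves the training range."""
    if not recent_d1_retention:
        raise ValueError("need at least one recent observation")
    low, high = training_range
    recent = mean(recent_d1_retention)
    return not (low <= recent <= high)


if __name__ == "__main__":
    # Model trained on cohorts with 40%-50% D1 retention; last week's cohorts (illustrative).
    last_week = [0.35, 0.32, 0.38, 0.36, 0.33, 0.37, 0.34]
    print(needs_retraining(last_week, (0.40, 0.50)))  # → True
```

The same pattern applies to any other leading indicator, such as early ARPU or day 3 ROAS, that your model depends on.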
2. Choose the right KPI for predicting profitability
There are several options to choose from, each with a set of trade-offs in viability, accuracy, and speed to produce recommendations.
Go ahead and test different KPIs (e.g. more or fewer days of ROAS or LTV) and use one or all of the following to compare the profit prediction power of several KPIs:
- The success-to-failure ratio of satisfactory predictions
- Mean absolute percentage error (MAPE)
You may be surprised at how poorly correlated the standard measures prove to be.
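Both comparison metrics are straightforward to compute once you have predicted and actual values for past cohorts. In this sketch, a prediction counts as satisfactory when it lands within a relative tolerance of the actual value; the 15% default is a hypothetical choice that you would set to match your own decision-making needs.

```python
def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty series of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def success_to_failure_ratio(actual: list[float], predicted: list[float],
                             tolerance: float = 0.15) -> float:
    """Successes are predictions within `tolerance` (relative) of the actual value."""
    if len(actual) != len(predicted):
        raise ValueError("series must have equal length")
    successes = sum(abs(p - a) <= tolerance * abs(a) for a, p in zip(actual, predicted))
    failures = len(actual) - successes
    return float("inf") if failures == 0 else successes / failures


if __name__ == "__main__":
    # Actual D180 profit per cohort vs. predictions from one candidate KPI (illustrative).
    actual = [100.0, 200.0, 50.0]
    predicted = [110.0, 180.0, 80.0]
    print(mape(actual, predicted), success_to_failure_ratio(actual, predicted))
```

Running the same two functions on predictions from each candidate KPI (for example, day 3 versus day 7 ROAS) gives a like-for-like comparison of their predictive power.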
3. Segment your data
Segmenting users into more homogeneous groups is not only a great way to improve conversion rate, but also a proven method to reduce noise and improve the predictive power of your model.
For example, applying the same model to both interest-based campaigns and value-based lookalike campaigns could lead to less effective results. The reason for this is that monetization and length of lifetime trends of users from each unique audience target are likely to be significantly different.
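In practice, segmenting often means estimating model coefficients per audience instead of once for all users. The sketch below computes a D90/D7 revenue ratio for each segment; the field names and the two segment labels are hypothetical, chosen to mirror the example above.

```python
from collections import defaultdict


def segment_ratios(users: list[dict]) -> dict[str, float]:
    """Compute a D90/D7 revenue ratio per segment instead of one pooled ratio.

    Each user dict is assumed to have 'segment', 'rev_d7' and 'rev_d90' keys.
    """
    d7: dict[str, float] = defaultdict(float)
    d90: dict[str, float] = defaultdict(float)
    for u in users:
        d7[u["segment"]] += u["rev_d7"]
        d90[u["segment"]] += u["rev_d90"]
    # Skip segments with no D7 revenue, where the ratio is undefined.
    return {seg: d90[seg] / d7[seg] for seg in d7 if d7[seg] > 0}


if __name__ == "__main__":
    users = [
        {"segment": "interest", "rev_d7": 1.0, "rev_d90": 2.0},
        {"segment": "value_lookalike", "rev_d7": 1.0, "rev_d90": 4.0},
    ]
    print(segment_ratios(users))
```

A single pooled ratio over these two users would be 3.0, which would overestimate one audience and underestimate the other.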
4. Remember to factor in time
Most marketers are aware that seasonality can cause predictions to break down, but the lifecycle of your app/campaign/audience/creative can also influence the ability of your model to make accurate predictions.
The acquisition cost trends in the first week of a new app launch will be much different from those in the fifth month, the second year, and so on, just as the first thousand dollars spent in a previously untapped lookalike will perform differently from the ten-thousandth and fifty-thousandth dollar invested in the same lookalike (especially without changing the creative used).