Insights on the Pros and Cons of LTV-Based Predictive Models
Building a lifetime value (LTV) model to predict Return on Ad Spend (ROAS) can be overwhelming because of its complexity and the number of competing prediction concepts. There are obvious differences in the way different types of apps retain and monetize users; just think of how distinct in-app purchase games, subscription-based apps, and e-commerce businesses are. It’s clear that there cannot be a one-size-fits-all LTV model.
To better understand the complexities, we’ve spoken to a number of experts from both gaming and non-gaming companies including Rovio, Hutch Games, Wargaming, Pixel Federation, Joom, Wolt, Blinkist, and Kiwi.com. The topics discussed were as follows:
- Which LTV models did you use?
- How did your LTV model evolve over time?
- Who in the company is responsible for handling the model?
- What’s your North Star metric in user acquisition (UA)?
- What is your stance on UA automation and future trends?
“At Joom, it takes 2 months from the initial order to receive goods, so technically speaking, we don’t know sooner if more orders from a specific customer will follow. Obviously, we need a lifetime value prediction soon after acquiring the traffic and for that, we use the initial set of behavioral data from the app.”
Alexandra Lomakina, Senior Analyst @Joom
Based on our interviews, we can say that there are three main approaches, or “schools of thought,” for LTV predictions:
1) Retention-driven / ARPDAU Retention Model
|Concept||Model a retention curve based on a couple of initial retention datapoints, then calculate the average number of active days per user (for Day 90, D180, etc.) and multiply that by an Average Revenue Per Daily Active User (ARPDAU) to get the predicted LTV.|
|Example||D1 / D3 / D7 retention is 50% / 35% / 25%. After fitting these data points to a power curve and calculating its integral until D90, we find that the average number of active days is 5. Knowing that the ARPDAU is 40 cents, the predicted D90 LTV would equal 2 USD.|
|Good fit||High-retention apps (games such as MMX Racing). Easy to set up, can be useful especially if there is not enough data for other models.|
|Bad fit||Low-retention apps (e-commerce such as Kiwi.com or Joom) that have an insufficient number of retention data points to sustain this model.|
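The retention-driven calculation can be sketched in a few lines of Python. This is a minimal illustration under assumptions of our own: the power curve is fitted by least squares in log-log space, the integral is approximated by summing the fitted daily retention, and day 0 counts as one active day. All function names are illustrative. The result depends on these choices and will not necessarily reproduce the rounded figure of 5 active days used in the example above.

```python
import math

def fit_power_curve(days, retention):
    """Least-squares fit of retention(d) = a * d**b in log-log space."""
    xs = [math.log(d) for d in days]
    ys = [math.log(r) for r in retention]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def expected_active_days(a, b, horizon):
    """Day 0 (assumed 100% active) plus fitted retention summed over days 1..horizon."""
    return 1.0 + sum(min(1.0, a * d ** b) for d in range(1, horizon + 1))

def predicted_ltv(days, retention, arpdau, horizon):
    """Expected active days per user multiplied by ARPDAU."""
    a, b = fit_power_curve(days, retention)
    return expected_active_days(a, b, horizon) * arpdau

# Retention data points and ARPDAU taken from the example above.
ltv_d90 = predicted_ltv([1, 3, 7], [0.50, 0.35, 0.25], arpdau=0.40, horizon=90)
print(f"Predicted D90 LTV: {ltv_d90:.2f} USD")
```

With only three retention data points the fit is sensitive to noise, so in practice it is worth refitting as later retention days become available.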
2) Ratio-driven
|Concept||Calculating a coefficient (D90 LTV / D3 LTV) from historical data, and then for each cohort, applying this coefficient to multiply the real D3 LTV to get a D90 LTV prediction.|
|Example||After the first 3 days, ARPU for our cohort is 20 cents. From historical data, we know that D90/D3 = 3. The predicted D90 LTV would thus be 60 cents (20 cents ARPU × 3). If there is not enough historical data to calculate a reliable ratio (e.g. we only have 50 days of data and want a D180 LTV prediction, or we have too few samples of the D180 LTV), an initial estimate can be made using the existing data points and then refined continuously as more data comes in. Such estimates should be taken with a big grain of salt.|
|Good fit||“Standard” types of apps including many game genres (Angry Birds 2) or e-commerce apps (Joom).|
|Bad fit||Subscription-based apps with a free trial of a week or longer. Many days pass before purchases can happen, and as this method is purchase-based, it’s not a good fit due to the long delay between the install and the first moment where such a prediction would be possible.|
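The ratio approach reduces to two small steps, sketched here in Python. The historical cohort values are made up for illustration, and pooling cohorts by summing their per-user LTVs (rather than weighting by cohort size) is a simplifying assumption.

```python
def ratio_coefficient(historical_cohorts, early_day, target_day):
    """Ratio of pooled target-day LTV to pooled early-day LTV across mature cohorts."""
    early = sum(c[early_day] for c in historical_cohorts)
    target = sum(c[target_day] for c in historical_cohorts)
    if early == 0:
        raise ValueError("no early revenue in historical cohorts")
    return target / early

def predict_ltv(early_ltv, coefficient):
    """Scale a cohort's observed early LTV by the historical coefficient."""
    return early_ltv * coefficient

# Hypothetical mature cohorts: LTV per user (USD) keyed by days since install.
history = [
    {3: 0.18, 90: 0.54},
    {3: 0.22, 90: 0.66},
    {3: 0.20, 90: 0.60},
]
coef = ratio_coefficient(history, early_day=3, target_day=90)
d90_prediction = predict_ltv(0.20, coef)  # new cohort with a D3 ARPU of 20 cents
print(f"D90/D3 = {coef:.2f}, predicted D90 LTV = {d90_prediction:.2f} USD")
```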
3) Behavior-driven / user-level predictions
|Concept||Collecting a significant volume of data from the app’s users (session and engagement data, purchases, in-app messaging, geo / device type, etc.) and processing it using regressions and machine learning to identify which actions or combinations of actions are the best “predictors” of a new user’s value. The algorithm then assigns a value to each new user based on a combination of their characteristics (platform, country, UA channel) and actions performed (often during the first few sessions or days).|
|Example||User A had 7 long sessions on day 0 and in total, 28 sessions by day 3; he also visited the pricing page and stayed there over 60 seconds.|
|Good fit||Any app with access to an experienced data science team, engineering resources, and lots of data. Could be one of the very few viable options in some cases (e.g. subscription apps with a long free trial).|
|Bad fit||Could be overkill for many small- and medium-sized apps. Most often, far simpler approaches can yield similar results and are much easier to maintain (and be understood by the rest of the team).|
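To make the idea concrete, here is a deliberately tiny stand-in for a behavior-driven model: a single-feature linear regression of realized D90 revenue on the number of sessions by day 3. A production model would use many more features and a proper machine learning library; the data, feature choice, and function names below are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical history: (sessions by day 3, realized D90 revenue in USD).
history = [(2, 0.0), (5, 0.5), (10, 1.5), (20, 3.0), (28, 4.5)]
slope, intercept = fit_line([s for s, _ in history], [r for _, r in history])

def predict_user_ltv(sessions_by_d3):
    """Predicted D90 value for a new user, floored at zero."""
    return max(0.0, intercept + slope * sessions_by_d3)

print(f"User with 28 sessions by D3: {predict_user_ltv(28):.2f} USD")
```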
Choosing the Right Model for Different App Types
Each app and each team have their own mix of parameters and considerations that should enter the selection process. On the product side, it’s a unique combination of app type and category, monetization model, user purchase behavior, available data (and its variance), and so on.
On the team’s side, it’s the capacity, engineering proficiency, knowledge, and the time available before the working model is required by the UA team.
In this section, we’ll outline several simplified examples of the selection process. These are based on real-life cases of three types of apps: a free-to-play (F2P) game, a subscription-based app, and an e-commerce app.
Let’s explore two cases of subscription-based apps, each with a different type of paywall: a hard paywall and a limited-time free trial.
- The hard paywall: The paid subscription very often starts on day 0 (e.g. 8fit).
This is great: we will have a very precise indication of the total number of subscribers after the first day (say, 80% of all eventual subscribers sign up on D0 and the remaining 20% sometime later).
Provided that we already know our churn rates, and consequently our ARPPU, we could predict a cohort’s LTV with a simple multiplication: (number of D0 payers) × (ARPPU for a given user segment) × 1.25. The coefficient accounts for users expected to pay later: if D0 payers are 80% of all eventual payers, the eventual total is 1 / 0.8 = 1.25 times the D0 count.
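The hard-paywall multiplication can be written as a short Python sketch. The payer count and ARPPU below are hypothetical; the 80% share is the illustrative figure from the text.

```python
def hard_paywall_cohort_revenue(d0_payers, arppu, d0_share_of_payers):
    """Predicted cohort revenue when most subscribers convert on day 0.

    d0_share_of_payers is the historical share of all eventual payers
    who subscribe on D0 (0.8 in the text, giving a coefficient of 1.25).
    """
    if not 0 < d0_share_of_payers <= 1:
        raise ValueError("d0_share_of_payers must be in (0, 1]")
    uplift = 1 / d0_share_of_payers
    return d0_payers * arppu * uplift

# Hypothetical cohort: 100 D0 payers, ARPPU of 30 USD.
revenue = hard_paywall_cohort_revenue(100, 30.0, 0.8)
print(f"Predicted cohort revenue: {revenue:.2f} USD")
```

Dividing the result by the number of installs in the cohort gives the per-user LTV.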
- Limited-time free trial: In this case, a percentage of users will convert to become paying subscribers after the trial is over (e.g. Blinkist, Headspace). The problem is that UA managers have to wait until the trial is over to understand conversion rates. This lag can be especially problematic when testing new channels and GEOs.
Behavioral predictions could be handy here. Even with a moderate volume of data and simple regressions, it’s often possible to identify decent predictors. For example, we could learn that users that enter the free trial and have at least 3 sessions per day during the first 3 days after installation will convert to subscription in 75% of cases.
Though far from perfect, the predictor above could be sufficiently precise for UA decision making and provide nice actionability for the UA team before more data is collected and a proper model trained.
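A simple rule of this kind can be checked against historical trial users before it is trusted. The Python sketch below measures how often users who satisfy the rule actually converted; the sample data and function names are hypothetical, and the thresholds mirror the example in the text.

```python
def meets_rule(daily_sessions, min_per_day=3, days=3):
    """True if the user had at least min_per_day sessions on each of the first days."""
    first = daily_sessions[:days]
    return len(first) == days and all(s >= min_per_day for s in first)

def rule_conversion_rate(users):
    """Share of rule-matching trial users who converted, or None if nobody matched."""
    flagged = [u for u in users if meets_rule(u["sessions"])]
    if not flagged:
        return None
    return sum(u["converted"] for u in flagged) / len(flagged)

# Hypothetical trial users: sessions per day after install, and the final outcome.
history = [
    {"sessions": [4, 3, 5], "converted": True},
    {"sessions": [3, 3, 3], "converted": True},
    {"sessions": [5, 4, 3], "converted": True},
    {"sessions": [3, 6, 4], "converted": False},
    {"sessions": [1, 0, 0], "converted": False},
    {"sessions": [2, 3, 1], "converted": False},
]
rate = rule_conversion_rate(history)
print(f"Conversion rate among rule-matching users: {rate:.0%}")
```

For a new cohort, multiplying the number of rule-matching users by this rate gives an early estimate of subscribers before the trial ends.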
Paywall types and designs can be greatly influenced by the need to quickly evaluate traffic. It’s super helpful to find out whether the user will convert (or not) as quickly as possible to understand campaign profitability and be able to react quickly. We’ve seen this become one of the deciding factors when determining a type of paywall in several companies.
Free-to-play (F2P) games tend to have a high retention rate and a significant number of purchases. Also, the first purchase often takes place on the first day or within a couple of days of the install. This makes them viable candidates for the “retention” and “ratio” models.
- Casual game (Angry Birds 2 or Diggy’s Adventure): A good fit for in-app purchase based games is the ‘Ratio model’, where it should be possible to quite confidently predict D(x)LTV after 3 days – we should already have identified most of our paying users by then. For some games that monetize via ads, the retention-based approach could also be considered.
- Hardcore game (World of Tanks or MMX Racing): Hardcore game users’ ARPPU distribution can be significantly skewed, as whales (the highest-spending users) can spend many times more than others. The ‘ratio model’ could still work in these cases, but should be enhanced to take into account different spend levels for different spender types. Here, a “user type” variable would assign different LTV values to users based on their spending behavior (e.g. how much they spent, how many purchases they made, what starter pack they bought, etc.).
Depending on the data, an initial prediction could be made after day 3, with another pass a bit later (day 5 or day 7), once users’ spending levels have become clear.
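One way to add a “user type” variable to the ratio model is to keep a separate coefficient per spender tier. In the Python sketch below, the tier names, spend thresholds, ratios, and the flat value for non-payers are all hypothetical placeholders that would be estimated from historical data.

```python
# (minimum D3 spend in USD, tier label, D90/D3 ratio), highest tier first.
TIERS = [
    (50.0, "whale", 6.0),
    (5.0, "dolphin", 3.5),
    (0.01, "minnow", 2.0),
]
NON_PAYER_D90 = 0.05  # assumed flat D90 value for users with no D3 spend

def predict_user_d90(d3_spend):
    """Return (tier label, predicted D90 value) for one user."""
    for min_spend, label, ratio in TIERS:
        if d3_spend >= min_spend:
            return label, d3_spend * ratio
    return "non_payer", NON_PAYER_D90

def predict_cohort_d90_ltv(d3_spends):
    """Average predicted D90 value per user across the cohort."""
    return sum(predict_user_d90(s)[1] for s in d3_spends) / len(d3_spends)

cohort = [100.0, 10.0, 0.0, 0.0]  # hypothetical D3 spend per user
print(f"Predicted cohort D90 LTV: {predict_cohort_d90_ltv(cohort):.2f} USD")
```

Rerunning the same function on day 5 or day 7 spend (with ratios recalculated for that day) gives the second pass mentioned above.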
E-commerce apps commonly have unique retention patterns, as launching them is often tied to an existing purchase intent, which does not arise very frequently. We can thus conclude that the ‘retention-based model’ is generally not a good fit for such apps. Instead, let’s explore two examples:
Airline ticket reseller (e.g. Kiwi.com): The time from install to purchase in travel is significant, sometimes months long. Given that purchases and revenue are distributed over an extended time frame, the “ratio” or “retention” models won’t work in most cases. Therefore, we should seek to find behavioral cues and uncover potential predictors in the first post-install session (as this is often the only information we’ll have at our disposal).
Using these cues, and given that there is sufficient data, we’d estimate the probability that a user will ever buy a ticket, and multiply it by the ARPPU for the relevant combination of their characteristics (platform, country of origin, etc.).
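Once a purchase probability is available, the expected value per user is a single multiplication. In this Python sketch the probability is taken as an input (how it is estimated is outside the scope of the example), and the ARPPU table and segment keys are hypothetical.

```python
# Hypothetical ARPPU (USD) per (platform, country) segment.
ARPPU_BY_SEGMENT = {
    ("ios", "DE"): 120.0,
    ("android", "DE"): 90.0,
}

def expected_user_ltv(purchase_probability, platform, country):
    """Expected LTV = P(user ever purchases) x ARPPU of the user's segment."""
    if not 0.0 <= purchase_probability <= 1.0:
        raise ValueError("purchase_probability must be between 0 and 1")
    return purchase_probability * ARPPU_BY_SEGMENT[(platform, country)]

# A user whose first-session behavior suggests a 5% chance of ever buying.
print(f"{expected_user_ltv(0.05, 'ios', 'DE'):.2f} USD")
```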
Online marketplace (e.g. Joom): Users tend to make their first purchase soon after an install. What’s more, that first purchased item often takes considerable time to be shipped. As a result, customers tend to wait for the first shipment to evaluate the service before committing to another purchase. Waiting for the “second purchase” batch of data would delay predictions so long that they would be unusable, which limits any calculations to the initial data.
Depending on when users place their orders (say the vast majority of users who ever order do so in the first 5 days), we can use the ratio method (D90/D5) and multiply the result by another coefficient that accounts for additional future (secondary) purchases.
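The marketplace variant is the ratio method with one extra factor, as this short Python sketch shows. Both coefficients and the input LTV are hypothetical and would come from historical cohorts.

```python
def marketplace_d90_ltv(d5_ltv, d90_over_d5, repeat_purchase_coef):
    """Ratio prediction from D5 LTV, scaled up for expected secondary purchases."""
    if d90_over_d5 < 1 or repeat_purchase_coef < 1:
        raise ValueError("coefficients are expected to be at least 1")
    first_order_based = d5_ltv * d90_over_d5
    return first_order_based * repeat_purchase_coef

# Hypothetical values: D5 LTV of 0.50 USD, D90/D5 = 1.2, 30% uplift from repeat orders.
print(f"{marketplace_d90_ltv(0.50, 1.2, 1.3):.2f} USD")
```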
From MVP to complex models
Every data analyst we talked to at big publishers agreed that it’s important to start your predictions path with a simple “Minimum Viable Product” (MVP). The idea is to verify initial assumptions, learn more about the data, and gradually build a model. That usually means adding more variables to have more granular and precise models (e.g. k-factor, seasonality, and ad revenue, in addition to initial segmentation by platform, country, and UA channel).
“Complex is not a synonym for “good,” people can get frustrated fast when UA managers are blocked with not having the data because someone is doing complicated stuff.”
Anna Yukhtenko, Data Analyst @Hutch Games
In reality, we have found that companies tend to stick to conceptually simple models. This was a bit surprising, as we expected that once the product takes off, data teams eagerly begin spitting clouds of fire, machine learning algorithms, and AI to get on par with what we believed was an industry standard. We were mistaken.
Although many see the value in sophisticated models and have tested them in the past, they eventually sided with simpler ones. There are three main reasons:
- Cost/Benefit of advanced models. The cost/benefit ratio of creating and maintaining a complex model just doesn’t add up. If a sufficient level of confidence for day-to-day operations can be reached with the simpler models, why bother?
- Engineering time to create/maintain. Creating an advanced model can swallow many engineering hours, and even more to manage it – this is a huge issue in smaller teams. Quite often, the BI department just has no capacity to devote to the marketing team, which leaves the marketers themselves alone in an uneven battle against statistics and data engineering.
- Continuous changes. Every product version is different and monetizes differently (adding or removing features could have a huge effect, for example); there’s different seasonality in different countries; market-wide effects that need to be considered; and so on. Changes need to be made on the fly, and making changes to a complex model can be painful and slow, which can prove disastrous in a fast-moving mobile environment with continuous media buying. It’s so much easier to tweak a simple model, and marketers can sometimes do it themselves the way they want if they understand how it works.
That being said, for a certain subset of apps, a behavior-based model might be the only good fit; however, in this case, an experienced engineering and data science team should be at hand.
Another approach that is gaining traction is ad-revenue LTV modeling based on user-level ad revenue estimates. For more on this subject, see chapter 4.
Teams & Responsibilities
In general, designing, setting up, and adapting a predictive LTV model should be a job for an analytics / data science team (if the company has one). Ideally, there are two roles: an experienced analyst with a reach into marketing who can advise on the strategic and tactical levels and decide which model should be used and how; and a dedicated analyst who then “owns” LTV calculations and predictions on a day-to-day basis.
The “day-to-day analyst” must continuously monitor the model and keep an eye out for any significant changes and fluctuations. For example, if weekly projected revenues do not match reality and are not within pre-set boundaries, a tweak in the model might be necessary now, not after weeks or months.
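A check against pre-set boundaries can be automated with very little code. The Python sketch below flags weeks where actual revenue deviates from the projection by more than a tolerance; the 15% threshold and the revenue figures are hypothetical.

```python
def out_of_bounds_weeks(projected, actual, tolerance=0.15):
    """Return (week number, relative deviation) for weeks outside the tolerance."""
    flagged = []
    for week, (p, a) in enumerate(zip(projected, actual), start=1):
        deviation = (a - p) / p
        if abs(deviation) > tolerance:
            flagged.append((week, round(deviation, 3)))
    return flagged

projected = [100.0, 100.0, 100.0]  # weekly projected revenue (hypothetical)
actual = [95.0, 120.0, 80.0]       # weekly actual revenue (hypothetical)
for week, deviation in out_of_bounds_weeks(projected, actual):
    print(f"Week {week}: actual differs from projection by {deviation:+.0%}")
```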
“It’s a team effort. We created something like an early warning system where, once a month, we get together and walk through all the assumptions that go into the model and check if they still hold true. So far, we have around 12 major assumptions (e.g. value of incremental organics, seasonality, etc.), which we control to make sure we’re on the right track.”
Tim Mannveille, Director of Growth & Insight @Hutch Games
Once prediction results are calculated, they’re automatically passed over to and used by the UA team. UA managers most often simply rely on these results and report inconsistencies, but they should try to take it up a notch so they can better challenge and assess the models in use (on a general level; understanding the intricacies behind a complex model and its calculations is not required).
To outsource or not to outsource?
Generally speaking, we can look at short and long term goals. Initially, outside help can be invaluable. Collecting the right data, setting up a well-engineered pipeline, and selecting and applying a basic model are tasks that can be easily outsourced. Outsourcing in this case can certainly jumpstart the process, especially if a company has limited knowledge of the topic.
However, in the longer term, given the product is verified to be viable and more advertising dollars are being spent on more networks, an internal team should take over. The model is a living organism that has to be constantly monitored and adjusted to new circumstances such as an updated monetization model or a significant change in the conversion funnel.
“Initially, we were thinking of using 3rd party marketing analytics and just plug-in an LTV model built internally at our BI department. Later, we’ve decided to go the extra mile and invest in a full in-house tool to achieve the desired level of customization and flexibility we were seeking with our level of spend.”
Elena Tsarkova, User Acquisition Manager @Wooga
Marketing pros interviewed for this chapter:
- Teemu Rautiainen, Elif Buyukcan, Kasim Zorlu and Leonard Seffer from Rovio Entertainment
- Alexandra Lomakina from Joom
- Fredrik Lucander from Wolt
- Matej Lancaric from Boombit (formerly at Pixel Federation)
- Gessica Bicego from Blinkist
- Anna Yukhtenko and Tim Mannveille from Hutch Games