I can imagine a world where every game producer and product owner never reads anything related to their expertise except one article, and they're still better professionals, on average, than those in our reality. It's possible if that one article is this:
The A/B Testing Playbook For Mobile Game Growth, Part 1: Structuring the Experiment
But there is one important thing that article lacks: what to test?!
Of course, the correct answer depends on the specific product — but not that much. I believe there's a what-to-test list that applies to most games regardless of genre, and most of its entries have the potential for high impact. Here it is:
Tests you might run at the just-released-MVP, soft-launch, or early-scale stages.
- Forced Tutorial Duration. Let's skip the basics: we all know that longer tutorials perform better and that the forced part is necessary. The main question is the optimal duration of the forced part of the tutorial, and that's what you should test in most games. Default numbers that fit a lot of genres: from 15 to 60 minutes.
- Progress Speed. One of the common answers to “How could we increase engagement?” sounds like “let's show them a lot of new content and drag them through progression quickly so they won't get bored”. I used this logic a lot too (professional deformation, probably), but sometimes got counter-intuitive results: players actually preferred slower progress and fewer new mechanics over time. And of course it matters most in the first raw hours of gameplay.
The thing is, the forced part is usually mixed with contextual tutorials, and in most genres, while the game still forces you through its tutorial funnel, there will also be moments of raw gameplay where you have some freedom of action. So we can measure not only the duration but also the force level of the tutorial, where the maximum means “tap here and nowhere else”.
But to keep it simple, I consider a tutorial forced even if there is relative freedom of action within the current activity, as long as the player can't freely shift between available activities (modes/regimes/gameplays) and the current step still inevitably leads to the next one, regardless of the player's actions or results (which are either pre-scripted or irrelevant).
So, the perfect FTUE tests should target Forced Tutorial Duration and Progress Speed over the first hours of gameplay. Obviously, control-version parameters should be based on the most relevant competitor (probably adjusted to the amount of available content), and variants should be something like x1.5 and x0.75 of the control value. Specific test configurations will vary based on how much traffic you can afford, but in terms of the above-mentioned article, it's best to run them as Blended Stacked Testing.
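The variant setup above can be sketched as a tiny helper. This is a minimal illustration, not a real SDK: the control values and parameter names are my assumptions, and in practice they'd come from your remote config.

```python
# Hypothetical sketch: deriving x0.75 / x1.5 test variants for the two FTUE
# parameters. Control values here are placeholders; base yours on the
# closest competitor, as described above.

CONTROL = {
    "forced_tutorial_minutes": 30,   # within the 15-60 min default range
    "progress_speed_multiplier": 1.0,
}

def make_variants(control, factors=(0.75, 1.5)):
    """Scale every numeric control parameter by each factor."""
    variants = {}
    for f in factors:
        variants[f"x{f}"] = {k: round(v * f, 2) for k, v in control.items()}
    return variants

variants = make_variants(CONTROL)
# variants["x0.75"]["forced_tutorial_minutes"] -> 22.5
# variants["x1.5"]["forced_tutorial_minutes"]  -> 45.0
```

Whether you scale both parameters together or test them separately depends on how much traffic you can afford, per the stacking approach mentioned above.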
In my experience, this might increase D1 by at most ~15% (percent, not points), but only about half of that shows up in expected cohort revenue (probably because the core audience doesn't give a shit about FTUE quality). Not much, but not nothing, and relatively cheap.
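The percent-vs-points distinction matters for setting expectations, so here is the arithmetic spelled out (the 40% control D1 is an assumed example, not a benchmark):

```python
# "+15% to D1" means relative lift, not percentage points.
control_d1 = 0.40                                # assumed control D1 retention
relative_lift = 0.15
variant_d1 = control_d1 * (1 + relative_lift)    # ~0.46, i.e. 46%
points_gained = variant_d1 - control_d1          # ~0.06 -> +6 points, not +15
```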
Probably every game has not only a converting offer (a “starter pack” or similar) that is supposed to be a potential payer's first purchase, but also a well-scripted roadmap that leads players to the point where this offer seems like the most desired purchase. Approaches vary: in some games it's the point where the player loses several times in a row, progress slows down, or energy runs out; sometimes it's “Use It, Then Lose It” + “Then Buy It”; sometimes it's just showing an interstitial and offering “no ads” right after. Anyway, that's what I call the converting point, and there's a lot to test here.
- Time to pressure. Don't confuse it with time to show the offer: that usually comes before the game has actually created a need or clearly shown the offer's value to the player. In other words, we're talking about the time before the first pressure. The optimum varies but usually lies between 20 and 120 net minutes from the start.
- Pressure level and type. Modern games don't use classic hard paywalls in the early game, but all sorts of “soft paywalls” are at your disposal (some were mentioned above), so you can apply different levels of pressure, making your converting point more or less soft for the player.
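The two test dimensions above can be captured in a single variant config. A minimal sketch, assuming a simple session-time trigger; all names, pressure types, and values are hypothetical:

```python
# Parametrizing the converting point so "time to pressure" and
# "pressure level/type" can be A/B-tested together.
from dataclasses import dataclass

@dataclass
class ConvertingPoint:
    time_to_pressure_min: int   # net minutes of play before first pressure
    pressure_level: int         # 0 = none ... 3 = close to a hard paywall
    pressure_type: str          # e.g. "energy_gate", "loss_streak", "no_ads_offer"

VARIANTS = {
    "control": ConvertingPoint(60, 1, "energy_gate"),
    "A":       ConvertingPoint(20, 2, "loss_streak"),
    "B":       ConvertingPoint(120, 1, "energy_gate"),
}

def should_trigger_offer(cfg, net_minutes_played, pressure_shown):
    # The starter pack appears only after the pressure moment, never before:
    # showing it earlier is exactly the mistake warned about above.
    return pressure_shown and net_minutes_played >= cfg.time_to_pressure_min
```

The point of the guard is that the offer follows the need, not the other way around.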
And despite the almost infinite combinations of parameters you could vary in the test, they can easily be reduced to a few with the highest chances of winning. I'll give you examples in the next article.
What you're actually selling (the offer's content) is truly important, but that's not where tests will tell you anything beyond well-known practices and common sense. Yes, the offer's value should be clear to players, it should solve their current needs, and it's always better to offer some persistent/constant value (not only resources/energy), even if it's just a decoration.
The same goes for standard offer mechanics: time limitations, high discounts, visibility from the main screen, triggers for pop-ups, a “last chance” plus cancel confirmation, etc. Yes, you need all of those from the start.
And the last thing: there is no harm in running a lot of concurrent offers even in the early game. Don't be afraid to look intrusive; at worst, you'll get 0.1% more bad reviews from those who would never pay for anything anyway.
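The standard mechanics listed above map naturally onto a single offer config. A sketch with made-up field names and values, just to show them as testable parameters:

```python
# Hypothetical offer config: each standard mechanic becomes a parameter
# you can tweak per variant. Names and defaults are illustrative only.
from dataclasses import dataclass, field

@dataclass
class OfferConfig:
    offer_id: str
    duration_hours: int                   # time limitation
    discount_percent: int                 # advertised discount
    main_screen_badge: bool               # visibility from the main screen
    popup_triggers: list = field(default_factory=list)  # e.g. ["session_start"]
    last_chance_minutes: int = 30         # "last chance" window before expiry
    confirm_on_cancel: bool = True        # cancel-confirmation dialog

starter = OfferConfig(
    "starter_pack",
    duration_hours=48,
    discount_percent=80,
    main_screen_badge=True,
    popup_triggers=["session_start"],
)
```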
The best result I've gotten from these experiments was about +15% in net revenue / LTV; but to be honest, such a high impact was reached primarily because the test ran against an obviously weak variant (an old-fashioned paywall that cut the audience off from further payments).
In many kinds of games you have something like time-replenished energy, or passive income, or reward timers (e.g., time to open chests), or daily missions, etc. All of these and similar mechanics create a type of gameplay that can be divided into four phases:
1.1. useful (active): you get the fastest progress over time by spending those resources (rewards, energy) before they run out
1.2. wasted (active): gameplay is available, but progress is significantly slowed due to the lack of whatever was spent in the useful phase
2.1. useful (inactive): you're away from the screen, but energy is replenishing, timers are ticking, income is generating, etc.
2.2. wasted (inactive): the important timers have already completed, so staying away earns you little additional value
Sure, it's not a universal description, and not every game actually fits this model, but a lot of them do. And if yours does, there is something to test.
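The four phases can be made concrete with a toy classifier. This assumes a simple time-replenished energy system; the function and names are mine, purely for illustration:

```python
# Toy classifier for the four phases described above, assuming a
# time-replenished energy mechanic (values and names are illustrative).

def classify_phase(active: bool, energy: int, energy_cap: int) -> str:
    if active:
        # Spending energy while you still have it = fastest progress.
        return "useful-active" if energy > 0 else "wasted-active"
    # Away from the screen: replenishing is useful only until the cap is hit.
    return "useful-inactive" if energy < energy_cap else "wasted-inactive"

assert classify_phase(True, 5, 30) == "useful-active"     # phase 1.1
assert classify_phase(True, 0, 30) == "wasted-active"     # phase 1.2
assert classify_phase(False, 10, 30) == "useful-inactive" # phase 2.1
assert classify_phase(False, 30, 30) == "wasted-inactive" # phase 2.2
```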
First, forget about ‘wasted active’: there is nothing to test, because it's always better to have an option of unlimited time-killing activities that won't affect your economy too much yet allow you to retain players as much as possible. Only the 'useful' phases are test-worthy, and varying each between ‘short’ and ‘long’ gives us four combinations:
- short useful-active / short useful-inactive
- short useful-active / long useful-inactive
- long useful-active / short useful-inactive
- long useful-active / long useful-inactive
Of course, ‘short’ and ‘long’ are relative measures based on default values from your closest competitors. Intensity modifications affect everything up to the late game, so the best time to run them is before you build the mid-game.
'Intensity’ tests delivered me less than 10% (in net revenue), yet there was always at least one variant better than the control, and that's good enough considering these tests are very cheap. I mean, you already have a remote balance config where you can create those variants by just tweaking a few parameters (you have one, right?)
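Since these variants are just a few overrides on top of the base balance config, the whole setup is small. A sketch of what "tweaking a few parameters" might look like; all keys and values are invented for illustration:

```python
# 'Intensity' variants as overrides on a remote balance config.
# Parameter names and numbers are placeholders, not recommendations.

BASE_CONFIG = {
    "energy_cap": 30,
    "energy_regen_minutes": 6,     # one energy point per 6 minutes
    "chest_open_hours": 4,
}

def apply_overrides(base, overrides):
    """Return a copy of the base config with variant overrides applied."""
    cfg = dict(base)
    cfg.update(overrides)
    return cfg

VARIANTS = {
    "control":        {},
    "high_intensity": {"energy_regen_minutes": 4, "chest_open_hours": 3},
    "low_intensity":  {"energy_regen_minutes": 8, "chest_open_hours": 6},
}

configs = {name: apply_overrides(BASE_CONFIG, ov) for name, ov in VARIANTS.items()}
```

Keeping variants as sparse overrides (rather than full config copies) makes it obvious exactly what each arm changes.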
OMG, long read again 😣
In the next episodes:
- LiveOps Stage
- Events: as tools and targets
- Balance curve, win-costs, etc
- Never-ending optimization
- Test It Before You Make It
- Tricks for testing on small samples