A/B Testing in Apps — Not Just for Advertising

If you’re scrolling through your Facebook feed, which image is more likely to catch your eye?

[Two candidate ad images: the Spelldom castle and the Spelldom elf]

Perhaps the image of the castle will intrigue strategy game enthusiasts. The letter tiles in the image with the elf will likely interest the word game audience. Maybe one of the images will perform better with a particular age group or gender. One image may generate more downloads, while the other attracts users who are more likely to stick with the app and spend money. Even after weighing all of that, we still don't know for sure which picture will convert better as a Facebook ad.

Enter A/B testing.

Since we don’t know which image is best, we’ll run both! Facebook designed their ad platform so that you can test different pictures and calls to action in the same ad campaign. Some users will see one ad, while a mix of similar users in the same demographic will see the other. After you collect enough data points, you can make a reasonable decision about which ad is superior.

While modern online advertising platforms make A/B testing easier than ever, the practice of testing different ads has been around forever. I’m not telling you anything that every marketer doesn’t already know. Does every developer know it, though? More specifically, have they considered incorporating A/B testing, not into the app’s marketing, but into the app itself?

They usually haven’t. Here’s the process that most developers go through instead:

Let's say you have a social networking app. You've built in some basic analytics and even integrated event logging to check for bottlenecks in the user on-boarding process. You release the app, get an initial burst of users, wait a few days to make sure you have enough data points, and then examine the event logs. Perhaps you discover that you lose the most users at the step where they have to upload a picture. You suspect users are having a hard time choosing a picture on the spot, so you decide to give them the option to skip that step; they can always upload a profile picture later. You program the changes, test them, feel satisfied that everything works, then submit your update to the App Store. Several days later, Apple approves it. You release the update, wait a few days, then analyze the new results.
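To make that funnel measurable in the first place, each on-boarding step needs to fire an event. Here's a minimal Swift sketch, assuming a hypothetical `Analytics.log` wrapper around whichever analytics SDK you actually use; the step names are purely illustrative.

```swift
// Minimal on-boarding funnel logging. Analytics.log is a hypothetical
// wrapper around whatever analytics SDK you use; step names are examples.
enum OnboardingStep: String {
    case welcomeShown     = "onboarding_welcome_shown"
    case accountCreated   = "onboarding_account_created"
    case photoPromptShown = "onboarding_photo_prompt_shown"
    case photoUploaded    = "onboarding_photo_uploaded"
    case photoSkipped     = "onboarding_photo_skipped"
    case completed        = "onboarding_completed"
}

struct Analytics {
    static func log(_ step: OnboardingStep) {
        // Forward the event to your analytics backend of choice, e.g.
        // SomeSDK.track(step.rawValue)
        print("logged: \(step.rawValue)")
    }
}

// Fire an event at each stage so you can see exactly where users drop off.
Analytics.log(.photoPromptShown)
Analytics.log(.photoSkipped)
```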

There's nothing particularly wrong with this process. It works, and it's something you will inevitably have to do for unforeseen issues. With a little foresight, though, you can do better with A/B testing. First, there's the obvious time savings. Waiting to collect data, programming changes, waiting on Apple's approval process (which, kudos to Apple, is much shorter than it used to be), and then waiting to collect more data is quite a bit of waiting. If you set up A/B testing for your on-boarding, where portions of your users experience different on-boarding mechanics, you can collect data once and analyze a variety of user experiences. Once you decide on the optimal on-boarding scenario, you can treat the majority of your users to that experience and continue to test other ideas with a smaller audience. If your app draws information from a server, you can even adjust what portion of your users sees each alternative by editing your backend, which lets you shift everyone to the optimal experience without waiting on Apple's approval process.
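To make the server-driven split concrete, here's a rough Swift sketch. It assumes a hypothetical config value fetched from your backend at launch; the names and the 20% split are illustrative, not any particular SDK's API.

```swift
// A server-adjustable A/B split (sketch). ExperimentConfig stands in for a
// value your backend returns at launch; the names are illustrative.
struct ExperimentConfig {
    // Fraction of users (0.0 to 1.0) who get the "skip photo" variant.
    // Editing this on the server changes the split without an app update.
    let skipPhotoFraction: Double
}

enum OnboardingVariant {
    case mustUploadPhoto
    case canSkipPhoto
}

// Swift's hashValue is randomized per launch, so use a deterministic hash
// (FNV-1a) to map a user ID to a stable bucket in [0, 1).
func stableBucket(for userID: String) -> Double {
    var hash: UInt64 = 0xcbf29ce484222325
    for byte in userID.utf8 {
        hash ^= UInt64(byte)
        hash = hash &* 0x100000001b3
    }
    return Double(hash % 1000) / 1000.0
}

func variant(forUserID userID: String, config: ExperimentConfig) -> OnboardingVariant {
    return stableBucket(for: userID) < config.skipPhotoFraction
        ? .canSkipPhoto
        : .mustUploadPhoto
}

// Usage: in a real app the config would be decoded from your backend's response.
let config = ExperimentConfig(skipPhotoFraction: 0.2)
switch variant(forUserID: "user-12345", config: config) {
case .canSkipPhoto:    print("show on-boarding with a Skip button")
case .mustUploadPhoto: print("show the original on-boarding")
}
```

Hashing a stable user ID means each user keeps seeing the same variant across sessions, while the split itself can be changed on the server at any time.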

In addition to the time savings, A/B testing lets you run your trials against comparable audiences. Your app brings in different sorts of organic users throughout its lifecycle (we'll set aside users recruited from ads for now). The users who download your app in the first few days after release are your friends and early adopters. These people are often more forgiving, especially if they feel a tie to you, perhaps through a forum where you discussed your app with their community. The exception is if Apple features your app; in that case, the vast majority of your incoming users have no vested stake in you and likely less interest in your app in general. Don't worry—getting featured is a good problem to have! Regardless of whether your initial users have a higher or lower propensity to stay, they're going to be different from the users who show up over the coming weeks via referrals or search, and those users will be different still from the ones who download your app in the following months. As your App Store rank shifts, so does the split of organic users arriving from referrals versus search.

Logically, users who download your app because their friends told them to are more likely to stick with it. So if you test all of your changes sequentially, you're not necessarily conducting each test with comparable users. That can lead you to conclude that a certain change improves stickiness or monetization, when in reality you're just testing with different types of users.

It's possible to mitigate this by only looking at users you bring in from a particular ad network, since they will be relatively uniform on average. It's still not a perfect approach, though, and it doesn't solve the problem for organic users, who are ideally the bulk of your audience. Your best bet is to plan the A/B test from the outset: decide which variables you want to test before launch, and you'll save time collecting data while you craft the ideal experience for your users.
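One way to plan ahead is to tag every logged event with both the variant the user saw and the user's acquisition source, so you can later compare like with like. A short Swift sketch, with purely illustrative property names:

```swift
// Tag every event with the experiment variant and acquisition source
// (sketch; property names are illustrative).
enum AcquisitionSource: String {
    case organicSearch = "organic_search"
    case referral      = "referral"
    case adNetwork     = "ad_network"
    case featured      = "featured"
}

func logEvent(_ name: String, variant: String, source: AcquisitionSource) {
    let properties = [
        "variant": variant,
        "acquisition_source": source.rawValue
    ]
    // Forward name + properties to your analytics SDK of choice.
    print("event: \(name), properties: \(properties)")
}

// Later, filtering by acquisition_source keeps sequential cohorts from
// skewing the comparison between variants.
logEvent("onboarding_completed", variant: "can_skip_photo", source: .referral)
```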