A/B testing websites and web applications is relatively straightforward. Tools like Google Optimizer (now part of Google Analytics) make tests easy to set up, and changes can be deployed without much hassle.
Things are much harder if you want to run a test on a desktop application:
- There are no off-the-shelf tools to help.
- Changes are installed by the user, not deployed by the development team. This means you’re reliant on either existing users upgrading their software or new users installing it for the first time for your changes to become public.
- Users don’t expect their software to change without first installing a new version. Users also have to be given a reason to upgrade without jeopardising the test. Once the test has finished you can’t just switch everyone over to the winning design without another update.
- Users are more sensitive about data being sent back from desktop applications than websites. You therefore have to ask their permission first, and this reduces your sample sizes.
- The types of conversion being optimized for are often harder to affect. There’s a big difference between optimizing a sign-up form and changing the functionality of an application.
- Data volumes tend to be smaller. Once a user has downloaded and installed your application they’re already quite far down your conversion funnel, so fewer events are left to measure and tests take longer to run.
However, A/B testing desktop applications is still possible! We recently ran a month-long test on one of our .NET profiling tools. The test itself was a failure (our new design was not an improvement), but we were able to establish a procedure for running experiments that worked:
- To avoid changing the application under our users’ feet we decided to only show the test to new users and did not upgrade existing ones. This made particular sense in our experiment as we were trying to optimize the experience for new users.
- To show our users different versions of the application we chose to keep a single build rather than a separate one for each test. This reduced our build overhead, kept things more maintainable, and meant that our support team could help users switch versions if necessary. Our download server tagged each installer with a random number, and during installation this added an item to the registry telling the application to show either version A or B.
- To track events we were able to use our existing feature usage reporting. All we needed to do was decide exactly what we wanted to track. This gave us the raw numbers we needed for our significance calculations.
- To save time, as our changes were only minor, we chose not to update any documentation or screenshots.
- To overcome any data privacy concerns we explicitly asked for the user’s permission to send back usage data.
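The bucket assignment described above can be sketched in a few lines. This is only an illustration, not our actual installer code: the tag range, the even/odd mapping, the `ExperimentVariant` key name and the plain dictionary standing in for the registry are all assumptions made for the example.

```python
import random

# Hypothetical names throughout: the real tag format and registry layout
# are not described here, so this only models the general idea.
VARIANTS = ("A", "B")


def tag_installer(rng: random.Random) -> int:
    """Download-server side: pick a random tag for one installer download."""
    return rng.randrange(2**31)


def variant_for_tag(tag: int) -> str:
    """Installer side: map the tag to a variant (assumed rule: even=A, odd=B)."""
    return VARIANTS[tag % len(VARIANTS)]


def install(tag: int, settings: dict) -> None:
    """Stand-in for writing the registry value during installation.

    Assumption: an existing value is left alone, so reinstalling never
    moves a user into the other bucket.
    """
    settings.setdefault("ExperimentVariant", variant_for_tag(tag))


if __name__ == "__main__":
    rng = random.Random()
    registry = {}  # a dict standing in for the Windows registry
    install(tag_installer(rng), registry)
    print("This install shows version", registry["ExperimentVariant"])
```

Because the variant is just a stored setting read by a single build, switching a user to the other version only means changing that one value, which is what makes it practical for a support team to do.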
The only major problem we were unable to resolve was the low volume of data we were able to collect. Only around 12% of users opted to send back feature usage data, resulting in only 5 new users per bucket per day. This meant that in order to see significant improvements in a reasonable amount of time our new design would have to be substantially better.
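To make the arithmetic concrete, here is a rough sketch of the kind of calculation involved. It uses a standard pooled two-proportion z-test and the usual normal-approximation sample size formula (5% two-sided significance, 80% power); these are textbook choices rather than a record of the exact calculations we ran, and the conversion rates in the example are invented purely for illustration. Only the figure of 5 new users per bucket per day comes from our test.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    return 2.0 * (1.0 - normal_cdf(abs(z)))


def users_per_bucket(p_a: float, p_b: float,
                     z_alpha: float = 1.96, z_beta: float = 0.8416) -> int:
    """Approximate users needed in each bucket to detect p_a versus p_b.

    Defaults correspond to 5% two-sided significance and 80% power.
    """
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_a - p_b) ** 2)


def days_needed(p_a: float, p_b: float, users_per_bucket_per_day: float = 5) -> int:
    """Days of data collection at the given opt-in rate per bucket."""
    return math.ceil(users_per_bucket(p_a, p_b) / users_per_bucket_per_day)


if __name__ == "__main__":
    # Hypothetical conversion rates, not measurements from our test.
    print(days_needed(0.20, 0.25))  # modest improvement
    print(days_needed(0.20, 0.40))  # conversion rate doubles
```

With these made-up rates, detecting a lift from 20% to 25% needs roughly 1,100 users per bucket, which is about seven months at 5 users per day, whereas a doubling from 20% to 40% needs only around 80 users per bucket, or a little over two weeks.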
Of course, this problem isn’t unique to desktop applications, and the remedy is fairly simple: run your test for longer, get more users to opt in, or make bigger changes. If your application is anything like ours, don’t expect to test every little detail; save A/B testing for dramatic design changes.