Split testing sample size lookup table

How do you decide on the test cell sample size for email split tests? I’ve often seen it said that this is important to get right, but seldom seen information on how to do it. If you’ve been waiting for an answer, this post is for you: by the end of it you will know how to decide on your test cell size.

Sample sizes are all about statistics; fear not, you’ll get the information you need here without a formula in sight.

Let me start with an example of an A/B test using a sample size of 100 per test cell. If test cell A gives a 5% click rate and test cell B gives 6%, then logically B is the winner. The relative uplift of B on A is 20%: an increase of 1 percentage point on 5% is 1/5 = 20%.

Sounds great? With the sample size of 100, test cell A 5% click rate means 5 people clicked and in cell B 6 people clicked. Just one person difference. If just one person had gone the other way there would have been no difference between cell A and cell B, no winner, no 20% increase.

The result is termed not statistically significant. This means that the difference between the test cells could easily be due to random variation rather than a true difference in the effectiveness of getting the click.
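For readers who do want to see the check behind that statement, here is a minimal sketch of one common way to test the example above, a two-proportion z-test. The function name `two_proportion_p_value` is my own for illustration, and with only 5 or 6 clicks per cell the normal approximation it relies on is rough, so treat the output as indicative only.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(clicks_a, n_a, clicks_b, n_b):
    """Two-sided p-value for the difference between two click rates.

    Uses a pooled two-proportion z-test (an assumption of this sketch,
    not necessarily the method behind the post's table).
    """
    rate_a = clicks_a / n_a
    rate_b = clicks_b / n_b
    # Pooled click rate under the assumption that A and B perform the same
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Cell A: 5 clicks out of 100, cell B: 6 clicks out of 100
p_value = two_proportion_p_value(5, 100, 6, 100)
print(f"p-value: {p_value:.2f}")  # roughly 0.76
```

A p-value this far above 0.05 says a one-click gap between two cells of 100 is entirely consistent with random variation.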

Clearly, jumping to a conclusion based on a one-person difference is totally unreliable. A larger test cell is needed, but how much larger should it be? 200, 300, 4000?

Help is on hand to allow you to decide your test cell sample size by using this simple table.

 

To use the table you need to know just two things:

  • What is your normal response rate?
  • What is the smallest difference you want to measure?

Your normal response rate
This is simply the normal click through rate you expect. If this is the first time for the type of campaign being tested then make an educated guess based on your other campaigns. Guessing on the low side will play safe with sample size.

Smallest difference to measure
The size of the difference in response between test cells affects how big the test cells need to be. You must trade off test cell size against the smallest difference in response that you want to know is a true difference, a difference that is statistically significant.

To help answer this question, think about the level of bottom-line improvement worth testing for and the cost of doing the testing. Typically it’s not worth trying to measure less than a 10% relative difference; for example, an increase smaller than 5% to 5.5% is not interesting.

As data is a precious resource, using the smallest possible test cell means more test cells and more tests. Running more tests searching for a 10% or 20% increase is better than running one large test that allows a 1% increase to be measured with statistical significance.

Using the table
Let’s say our normal click rate is 10%. In the test we want to measure whether the click rate changes by 20% or more, that is, whether it increases from 10% to 12% or more. First look down the left-hand table column and find 10%, then look across to the column for 20%. This gives the answer that the sample size needed for each test cell is 2000. With this sample size you can be confident that any click rate change of 20% or more is a statistically significant result and not just randomness.
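If you would rather compute a figure than look it up, the sketch below uses the standard normal-approximation formula for comparing two proportions. This is an assumption on my part: the post does not state which formula the table uses, and the function name `cell_size` and its `power` parameter are illustrative. With `power=0.5` the power term drops out and only the confidence level matters.

```python
from math import ceil
from statistics import NormalDist


def cell_size(base_rate, relative_uplift, confidence=0.95, power=0.5):
    """Approximate sample size per test cell for a two-sided test.

    Standard two-proportion formula; NOT confirmed to be the formula
    behind the lookup table in this post.
    """
    p1 = base_rate
    p2 = base_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)  # 0.0 when power is 0.5
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# 10% normal click rate, smallest difference of interest 20% (10% to 12%)
print(cell_size(0.10, 0.20))             # a little under 1900
print(cell_size(0.10, 0.20, power=0.8))  # roughly double that
```

Under these assumptions the first call lands close to the 2000 read from the table, while asking for an 80% chance of detecting a real 20% uplift roughly doubles the cell size.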

I’ve used click rate as the optimisation response metric throughout this post. The table and the same concept apply whether you use open rate, click to open rate or conversion rate. You can plug those alternatives into the same table. Just remember that if you use a click to open rate your sample size will be smaller: the sample in this case is the number of people who opened, not the number of people to whom you delivered.
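To make the click to open point concrete, here is a small sketch that converts a required number of opens into the number of emails to deliver per cell. The function name `deliveries_needed` and the 25% open rate are illustrative assumptions, not figures from the table.

```python
from math import ceil


def deliveries_needed(required_opens, expected_open_rate):
    """Emails to deliver per cell so that enough people open.

    When the metric is click to open rate, the sample is the openers,
    so the delivered volume must be scaled up by the open rate.
    """
    return ceil(required_opens / expected_open_rate)


# Needing 2000 openers per cell with an assumed 25% open rate
print(deliveries_needed(2000, 0.25))  # 8000
```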

Which metric you should use depends on the metric that best represents your marketing objective. Hint, that’s unlikely to be your open rate.

Should there be any students of statistics reading, you may wish to know the table above is based on a confidence level of 95%.

I hope that you are now better equipped and know better next time you hear someone saying ‘just use 10% of your list to test’.

This entry was posted in Best Practice, Testing by Tim Watson.

About Tim Watson

Tim Watson has over 8 years’ experience in B2B and B2C digital marketing, helping blue chip brands with successful email marketing.

He is an elected member of the UK DMA Email Council, supporting the email marketing industry. Tim chairs the Legal and Best Practice hub of the Email Council, authoring and reviewing DMA whitepapers and best practice documentation. He is also a frequent speaker and blogger on emerging email marketing trends.

Tim works as an independent email marketing consultant providing strategic support to email marketing teams.

  • http://twitter.com/MarketingXD MarketingXD

    Very good post! A couple of notes:

    (1) If you want to make a decision after (say) 2 hours, use the response rate that you expect after 2 hours, not the final rate. This article has an example open rate curve (the curve for e.g. clicks is probably similar), but ideally you should chart your own:
    http://blog.mailermailer.com/2011/07/highest-volume-of-email-opens-occur-within-the-first-hour-after-delivery/

    (2) As you say, these figures assume 95% confidence. This is fine if you are e.g. experimenting with an email template that you will use for a while. But I would accept a lower figure for a one-off mailing. This has advantages: you need smaller sample sizes, can get your result quicker and send the winning copy sooner, or can try more variations.

    • http://twitter.com/tawatson Tim Watson

      Thanks Pete, both valid and useful additional information.

  • http://www.jatheon.com/email_archiving/index.php email archiving

    Great table that helps more than you know.

    • http://twitter.com/tawatson Tim Watson

      My hope is to ensure that email marketers are not making bad decisions from their split testing due to small samples. I’m happy if it does this. Thank you for the feedback.

  • Pingback: The Icing News: Rules and Regulations Around the World, A/B Test Cell Size, etc… | CakeMail

  • Valentin

    Tim,
    Using an alpha of 5%, i.e. the 95% confidence you mention, is great, but what is the power of the test? Without stating that explicitly you are misguiding your readers, as they may like to have a more adequate probability of detecting a difference when one truly exists – think opportunity cost. I suggest a quick update to the table using a power of 90%. That will make it really useful.

  • Paul Corey

    Do you always have to calculate response rate as a percentage of emails sent? For the click through rate, could you instead calculate response rate as a percentage of opens?

    My thoughts are that you could technically use that calculation and then use the number of opens as your sample size.

    Logically this makes sense to me if we’re running a template test where the subject line is constant and only the email content is changing.

    For the tests that I’ve run this makes a huge difference in the percentage increase that we see, but I’m not sure if this is a statistically invalid method of calculation.

    • http://twitter.com/tawatson Tim Watson

       Good question and you are correct. You can use click to open rate, but in that case your sample size is the number of emails opened rather than sent.

      So whilst the ‘response rate’ increases the sample size goes down.

  • IG

    Thanks for the article Tim. Would you please also share the math formula behind the table “test cell sample size selector”?