How to do stratified random sampling in Microsoft Excel

File:Simple random sampling.PNG

I just Googled the above question, and found nothing but a sea of confusion, proprietry software and (to me) incomprehensible macros. A bit more thinking and research came up with a fairly simple answer, so I wanted to document it here for any other lost excel-loving M&E professional. I assume a knowledge of all the functions I mention below – if you don’t understand them google or email me and I’ll explain. You can also download an annotated excel example of stratified random sampling here.

Start by working out how big you want your sample to be in each category. This should be a relatively easy task. Run a pivottable on your data (or use countifs, if you’re old school) – this should reveal the total number of each category in your sampling frame. For example, if you’re sampling people based on location, you might find 100 in location A, 200 in location B, and 200 in location C. Work out the percentage each group contributes to the whole (in this example A is 20%, B is 40%, and C another 40%) and multiply this by your total sample size to give you your desired sample size per category.

Next, allocate each entry in your sampling frame a random number, using the RAND() function. You’ll probably want to copy this column and paste as value, to stop the values changing each time you click anywhere. This will be the basis for your sampling. If you were doing normal random sampling, you’d simply rank these random numbers and pick the top few. However, since you’re doing stratified sample, you’ll need to use a RANKIF function. This doesn’t come as standard in Excel, but is easy to replicate using this excellent guide. Use the RANKIF to rank the data, according to how big the random number is within the sampling category. In the above example, you will end up with a ranked list for location A, B and C independently of each other. )

Finally, you just need to use an IF formula to check, for each row, whether the rank is above or below the number of samples for that category. For example, if you want to select 10 points from Location A, then anything ranked 1-10 should be selected, anything above should be rejected. You can do this easily with an IF formula combined with a VLOOKUP that checks back to the original table you made way back in the second paragraph.

So that’s it – these Excel descriptions are harder than I ever realised! I have made a downloadable, annotated file to try and explain this better; you can download it here, and please give any feedback regarding whether this was helpful.

Are you looking for similar services?

Get in touch to discuss