Code Like A Girl

Dumb data, smart science: a 5-step guide to smarter analytics

Every now and then I have someone say to me they need help with reports and analytics. I then ask one simple question: what do you want to measure?

Whenever I hear the answer “I want to measure everything”, I just feel like facepalming my skin from my skull, because the idea of measuring everything is, in itself, a disservice to your sanity. It brings a constant risk of measuring just for the sake of having a number. This is a trap you should avoid at all costs, because measuring is a pain in the ass, and having tons of numbers only means you can get lost in data that much more easily.

1- Choose your weapon

To achieve smarter analytics, we must first be able to dodge producing dumb data. When deciding what to measure, you must think of every metric as a weapon, designed to be efficient, lean and flexible.

Whenever you choose a weapon, you ask yourself “how am I going to use that?” The same applies to metrics. Before you start measuring, you must understand what you want to do once you have the results. Otherwise, you’re just creating work that will lead to no value.

For instance, let’s say you run a bakery. You want to measure stuff, but why? To know your business better. Ok, but what do you want to know about your business? Here are some interesting ideas:

a. How many buns are baked versus sold per day?

b. What are the rush hours for bun-selling?

c. How long does it take from the moment the baker starts baking to the moment the bun is ready for selling?

d. What is the ratio of bun types that are sold?

And some not-so-interesting ideas:

a. How many different people buy in your bakery?

b. What’s the average age of these customers?

c. What’s the gender ratio of the customers?

Would it be nice to measure all of the above? Maybe. But the first four items connect directly to an action. Comparing buns baked against buns sold helps you reduce waste. Knowing the rush hours tells you in advance when to increase production during the day. The lead time for baking lets you start baking more at the right moment. The ratio of bun types lets you adjust production to match what actually sells. The last three items, however, are not immediately actionable, so unless we have a good purpose in mind, we should simply not measure them.
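If you keep even a simple sales log, three of those four actionable metrics take only a few lines to compute. Here is a minimal Python sketch, assuming a made-up log of (hour, bun type, units sold) entries and a made-up count of buns baked. The data, the names and the log format are all hypothetical, and the baking lead time is left out because it needs timestamps from the kitchen.

```python
from collections import Counter

# Hypothetical sales log: (hour of day, bun type, units sold). All values are made up.
sales = [
    (7, "plain", 30),
    (8, "plain", 45),
    (8, "sesame", 25),
    (12, "plain", 20),
    (17, "sesame", 40),
    (17, "plain", 20),
]
baked_today = 200  # hypothetical number of buns baked today


def sold_total(log):
    """Total units sold across the whole log."""
    return sum(units for _, _, units in log)


def waste_ratio(baked, sold):
    """Share of baked buns that were not sold."""
    return (baked - sold) / baked


def units_by_hour(log):
    """Units sold per hour of day, useful for spotting rush hours."""
    totals = Counter()
    for hour, _, units in log:
        totals[hour] += units
    return totals


def type_shares(log):
    """Fraction of total sales that each bun type represents."""
    totals = Counter()
    for _, bun_type, units in log:
        totals[bun_type] += units
    grand_total = sum(totals.values())
    return {bun_type: units / grand_total for bun_type, units in totals.items()}


sold = sold_total(sales)
print("Waste ratio:", waste_ratio(baked_today, sold))
print("Busiest hour:", units_by_hour(sales).most_common(1)[0])
print("Sales share by type:", type_shares(sales))
```

Each function maps to one decision: the waste ratio tells you whether to bake less, the busiest hour tells you when to bake more, and the shares tell you which types to bake.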

2- Know you don’t know


Let’s say you’d like to find out if there is interest in new types of bun. You don’t know if your customers would accept different flavors, and including them in daily production could result in major waste. What can you do?

Knowing the rush hours for bun-selling, you can bake one batch of each flavor and offer small free samples, all at the same time, for a short time window. You then measure how long it takes for the first batch to run out, and count the remaining units in the other batches. The empty batch is clearly a favorite, while the others might or might not be contenders, depending on how many units are left. Finally, you decide which new flavors to include in daily production.
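To make the counting step concrete, here is a small Python sketch of how the leftovers could be turned into a ranking. The flavors, the batch size and the 70% cutoff for calling a flavor a “contender” are all made up for illustration, so pick a cutoff that makes sense for your own bakery.

```python
# Hypothetical tasting results: units left per flavor when the first tray emptied.
batch_size = 40  # assumed: every flavor started with the same number of samples
units_left = {"cinnamon": 0, "cheese": 6, "chocolate": 22, "herbs": 35}


def uptake(left, offered):
    """Fraction of the offered samples that customers actually took."""
    return (offered - left) / offered


def rank_flavors(leftovers, offered):
    """Flavors ordered from most taken to least taken."""
    return sorted(
        leftovers,
        key=lambda flavor: uptake(leftovers[flavor], offered),
        reverse=True,
    )


def contenders(leftovers, offered, min_uptake=0.7):
    """Flavors whose uptake reaches the (arbitrary) cutoff."""
    return [
        flavor
        for flavor in rank_flavors(leftovers, offered)
        if uptake(leftovers[flavor], offered) >= min_uptake
    ]


print(rank_flavors(units_left, batch_size))
print(contenders(units_left, batch_size))
```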

3- What’s your moment?

When you have an idea, different moments call for different kinds of data. When there is a lot you don’t know, qualitative data is probably better. You listen to your customers with no filter, and look for pains and needs that might be unmet.

When you already have a hypothesis or a need to address, you should turn to quantitative data, which gives you the means to test your hypothesis and predict results.

4- Correlation vs. Causality

Apparently more Nick Cage films = more people drowning. Yeah, that’s right.

One of the most common causes of dumb data is mixing up correlation and causality. Correlation happens when two or more indexes change at the same time. For instance, you notice that both bread sales and the number of visiting customers increase on weekends. Those two indexes are correlated. But does that mean more customers cause more sales?

You get a report from the cash register and find that during the weekend the number of unique invoices doesn’t rise proportionally. The number of buns per invoice, however, increases dramatically. You interview the customers and find out that there are more customers because buyers bring their families along, and that sales increase because people have more time for snacks at home during the weekend.

Therefore, having more customers is not the cause of selling more buns. The two are simply correlated (people take their families to the bakery during the weekend, and they also buy more buns for snacking throughout the day). The weekend and the spare time it provides, however, do have a causal connection to the sales increase.
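The cash register check can be sketched in a few lines of Python. The week of numbers below is invented purely for illustration, and the Pearson coefficient is just one common way to put a number on “these two indexes move together”. A coefficient close to 1 only tells you the indexes rise and fall together; it says nothing about which one, if any, causes the other.

```python
from math import sqrt

# Hypothetical week, Monday to Sunday. All numbers are made up.
visitors = [100, 105, 98, 102, 110, 180, 190]
buns_sold = [150, 155, 148, 152, 160, 300, 320]
invoices = [60, 62, 59, 61, 63, 66, 68]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Visitors and buns sold move together...
print("Correlation, visitors vs. buns sold:", round(pearson(visitors, buns_sold), 3))

# ...but the number of purchases barely changes, while each purchase gets bigger.
buns_per_invoice = [buns / count for buns, count in zip(buns_sold, invoices)]
print("Invoices per day:", invoices)
print("Buns per invoice:", [round(value, 2) for value in buns_per_invoice])
```

In this made-up week the correlation is strong, yet the invoices show that the extra visitors are not extra buyers. That is the cue to go and talk to customers before acting on the number.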

When we mistake correlation for causality, we may waste time and energy on actions that won’t bring us the results we want. For that reason, always double-check your interpretation before you devise a plan to improve the numbers. If you EVER feel like that’s not necessary, because there’s no way two indexes aren’t connected by causality, read this and try again.

5- There’s no value in the absolute


When I tell you I sold 200 buns, will you think that’s a positive or a negative indicator? If this question got you scratching your head, don’t fret. The right answer is “I have no idea”, and the reason it is impossible to know is that you have no baseline to compare the 200 buns to, and no time frame to attach to the number. Good metrics must be comparative and tied to specific periods of time; otherwise they are just raw, incomprehensible absolutes.


I sold 200 buns.

I sold 200 buns today >> sales x time frame.

I sold 200 buns today, as opposed to my 150 daily average >> sales x time frame x average in time frame.

I sold 200 buns, today, as opposed to my 150 daily average, which is a 33% increase >> WOW.
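The arithmetic behind that last line is simple enough to write down. Here is a tiny Python sketch using the numbers from the example (the function name is just my own choice):

```python
def percent_change(value, baseline):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100


today = 200          # buns sold today
daily_average = 150  # average buns sold per day

change = percent_change(today, daily_average)
print(f"Sold {today} buns today vs. a {daily_average} daily average: {change:+.0f}%")
# prints "Sold 200 buns today vs. a 150 daily average: +33%"
```

The same absolute number tells a different story against a different baseline: 200 buns against a 250 daily average would be a 20% drop.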

That is the power of analytics, not as a bunch of numbers that look pretty on a spreadsheet, but as knowledge for your business, used to decide your next steps.

So choose your metrics well, know there is a lot you don’t know, understand whether your moment requires qualitative or quantitative data, investigate causes and correlations, and format the data so that it becomes something you can analyze and build strategy on. That is a lean, focused approach to metrics that will help guide your actions and avoid waste.

More on analytics:

Lean Analytics Book – Use data to build a better startup faster
