Skepticism

A while ago, one of our customers ran a test that completely broke our logic.

This brand discounted their multi-packs by roughly 30%, bringing the prices to a level where it’d almost be foolish to buy a single pack of the same product. Naturally, we expected to see way more transactions for multi-packs vs. single packs, and bigger basket sizes.

The test group won—there was more revenue and more profit—but not for the reason we expected. The brand was selling way more single-pack SKUs. Multi-packs actually decreased in volume. It made no sense.

I share this test—and the actual data—as a take-home assignment with people applying for jobs at Intelligems¹, because I want to see how they respond to the data: Do they accept the answer and move on (“Seems like we should roll out these multi-pack discounts!”)? Or do they express skepticism about how thoroughly the results broke logic?

The latter is what I look for. When people are skeptical of the data in situations like this one, it usually indicates a willingness to dig deeper into the question of “why?”

I also think this quality is what makes for great testing programs. 

People test because they want to grow their business; that growth comes from increasing profit or revenue. And because that’s the end goal, it’s not uncommon for brands to ignore how that revenue or profit grows, so long as it does. This can be a mistake—especially in testing—because the intermediate steps (the “how”) matter a lot.

If you stay focused on revenue growth, say, you may accept the results of the test above and conclude that the discounts worked. Call it a day. Next test up.

But if you looked at the intermediate steps, you might begin questioning the results. You might dig a bit deeper. And it’s in that questioning and digging where great programs are made.  

In the above example, you’d expect revenue to grow (the end goal) because customers bought more multi-packs (the intermediate step). 

When that doesn’t happen, though, you can learn something about your test or something about your customer: Did valuable information, needed to help customers feel more confident in buying multiples, get pushed down the page? Are there outliers in the control or test variants? Are there other ways, outside of the quantity buttons, to make it easier for your customers to buy more than one? Did you code the test wrong? Should you rerun the test and see if you can reproduce the results?

The difference here is pretty stark. In the former, we’re testing to see if a widget works. In the latter, we’re testing to see if we can positively influence customer behavior—and using the test results to not just pick a winner, but also build a more nuanced understanding of what’s actually happening with customers and the experience. 

Regardless of which group you pick as the “winner,” only the latter approach leads to more questions, which you can turn into more hypotheses, which you can use to find more possible avenues for growth. 

Digging into the “why” actually begins to create a roadmap you can act on, because you’ve continued to ask how the customer is responding to the change.

While the long-winded point I’m trying to make here is that skepticism around data is good, the lesson on how to get there is a bit more tactical and to-the-point: Form a strong hypothesis. Run your test results against it. Can you tell a reasonable story about why you’re seeing those results?

In our example above, we could have written a hypothesis that would have drawn our attention first to the product-mix results, rather than the revenue results, and would have led us to (helpfully) question them. Something like: “Discounting multi-packs will lift AOV (average order value) because the number of customers purchasing those packs will increase without negatively impacting conversion rate, thereby increasing revenue.”

Such a hypothesis (admittedly easier to write retrospectively) would make multi-pack transactions a primary metric for us to analyze. In fact, revenue impact might be the last thing we looked at, given our hypothesis that such a change in purchase behavior would lift AOV. (This is, by the way, one of the reasons we go so deep in our analytics in the Intelligems platform: if you’re going to take the time to ask the questions, you shouldn’t have to spend the time finding the answers.)
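To make the “run your results against the hypothesis” step concrete, here’s a rough sketch in Python. The field names, the Order shape, and the summarize function are all invented for illustration—this isn’t Intelligems code—but it shows the idea: compute the intermediate metrics the hypothesis names (multi-pack share, AOV, conversion rate) per variant, and look at revenue last.

```python
from dataclasses import dataclass

@dataclass
class Order:
    variant: str     # "control" or "test"
    value: float     # order value in dollars
    multipack: bool  # True if the order included a multi-pack SKU

def summarize(orders: list[Order], visitors: dict[str, int]) -> None:
    """Print the hypothesis's intermediate metrics per variant, revenue last."""
    for variant in ("control", "test"):
        rows = [o for o in orders if o.variant == variant]
        if not rows:
            continue
        revenue = sum(o.value for o in rows)
        aov = revenue / len(rows)
        conversion = len(rows) / visitors[variant]
        multipack_share = sum(o.multipack for o in rows) / len(rows)
        print(f"{variant}: multi-pack share {multipack_share:.1%}, "
              f"AOV ${aov:.2f}, conversion {conversion:.2%}, revenue ${revenue:,.0f}")

# The hypothesis predicts: multi-pack share up, AOV up, conversion flat in the test group.
# If revenue is up but multi-pack share is down, that mismatch is your cue to get skeptical.
```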

Playing this example through, how much more likely do you think people would be to get skeptical of the result? And how much faster do you think that skepticism would turn into action?

That, to me, is the real value. If you treat testing as a “pass/fail” endeavor around revenue, it exhausts itself at the end of each test. But the whole point of testing is to grow the business, which means you need to know how the business grows.

And that’s what skepticism gives you.

¹ These sorts of take-homes - where someone puts in some work, and then we meet as a group to discuss - have been an incredible hiring tool for us. Best way to get a feel for what it’d actually be like to work together (on both sides). Topic for another newsletter…²

² Are footnotes going to become my thing? Did I read too much David Foster Wallace as a teenager?