Publishers Should Be Testing More Frequently
In a previous life, my dream was to be a microbiologist. I thought working in a lab would be so fascinating. However, going into my senior year of college, I realized that I didn’t want to spend another four years in grad school and even more post-grad, so I switched to history.
During that time, I learned how penicillin was discovered and how to construct experiments to disprove hypotheses. Both are relevant to my belief that publishers need to be testing more frequently. And I’ll come back to them in a little bit.
Before I do, though, I want to set the stage for why I’m writing this piece. I should preface this by saying it’s going to be nerdy. But I think it’s so important. I wrote about product management and media companies a couple of weeks ago. Re-reading it, I found one part that I wanted to explore in more depth:
But product doesn’t just help create a new thing and then move on. Let’s go back to that new ad example. Once the ad is out in the market, the product team can help improve it iteratively. The reason it’s called product management is that there is a sense of ownership. As product teams scale, individuals can begin to specialize. And so, if the initial goal was to drive 100 leads per 10,000 visitors with V1, the company should incentivize that product manager to develop a strategy to push it to 110 leads per 10,000 visitors for V2.
By creating a testing culture, we can often identify new improvements that move the needle or fundamentally new ideas that we hadn’t even considered. But I don’t believe most media companies have cultures of testing. Instead, most companies are resource constrained, so they focus on launching things and rarely circle back to improve them.
I believe that if media companies invested some resources in testing, both with concrete hypotheses and by paying attention to random accidents (mistakes), they might find opportunities they didn’t know existed. So, while I don’t think media companies often find transformational innovation, I do believe they can find legitimate growth opportunities.
And so, this brings me to how penicillin was first discovered. According to the American Chemical Society:
Returning from holiday on September 3, 1928, Fleming began to sort through petri dishes containing colonies of Staphylococcus, bacteria that cause boils, sore throats and abscesses. He noticed something unusual on one dish. It was dotted with colonies, save for one area where a blob of mold was growing. The zone immediately around the mold—later identified as a rare strain of Penicillium notatum—was clear, as if the mold had secreted something that inhibited bacterial growth.
It was an accident. One of Fleming’s Petri dishes got contaminated, and suddenly, he had discovered penicillin. That’s not to say he immediately had a usable antibiotic. It took another 10+ years and many other scientists before it could legitimately save lives. But a mistake led to a dramatic innovation.
Something similar happened to me the other day. We had an ad running in one of our newsletters. Typically, it goes in the middle of the newsletter; that’s always where it goes. This time, however, it accidentally wound up at the top. As a result, clickthrough rates improved by 26%. That’s a lot. And it was a mistake.
Why do I mention this?
Because when you’re heads down just trying to get work done, you might see a mistake, develop a process so it never happens again, and move on. But what if you stopped and asked yourself whether there were any benefits from that mistake? We knew it was a mistake in this case, but we stopped to determine whether it was a bad mistake. In the case of penicillin and this ad, the error turned out to be a good one.
But this brings me to the second part of what I learned in college. All the scientific method does is give you a framework to disprove a hypothesis. For example, my hypothesis could be, “moving this ad up in the newsletter won’t increase CTR.” With a simple test, I can try to disprove that hypothesis. But why do I keep writing disprove rather than what we’re taught in grade school, which is to prove?
According to Charles Rock, Ph.D. at St. Jude Children’s Research Hospital:
A hypothesis or model is called falsifiable if it is possible to conceive of an experimental observation that disproves the idea in question. That is, one of the possible outcomes of the designed experiment must be an answer, that if obtained, would disprove the hypothesis.
And this is what we need to develop when it comes to a testing culture. There needs to be a willingness to do work that might seem pointless. However, the work has the potential to help the business evolve. What if you don’t think moving the ad up will increase CTR, but it does? You’ve disproven the hypothesis, and now you’re getting more clicks. That means you can charge more! True impact on the business.
But even if you want to create a testing culture, it’s easy to do it wrong. That’s because people get impatient and want to test too many things at once. And so, there are a few terms that are important to understand:
- Independent variable: What you are changing in your test/experiment
- Dependent variable: What should be impacted by the change in the independent variable
- Controlled variable: Everything else
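One habit that makes these easier to keep straight is writing the test down before you run it. Here’s a minimal sketch of what that record could look like in Python; the class and field names are hypothetical, and a shared doc or spreadsheet does the same job:

```python
# A minimal, hypothetical way to write a test down before running it.
from dataclasses import dataclass, field


@dataclass
class TestPlan:
    hypothesis: str                # the statement you are trying to disprove
    independent_variable: str      # the one thing you change
    dependent_variable: str        # the one metric you expect to move
    controlled_variables: list = field(default_factory=list)  # everything you hold constant


newsletter_ad_test = TestPlan(
    hypothesis="Moving the ad to the top of the newsletter won't increase CTR",
    independent_variable="ad position (middle vs. top)",
    dependent_variable="ad clickthrough rate",
    controlled_variables=["ad creative", "send time", "subject line", "audience"],
)
```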
In a good test, there should be a lot of controlled variables, one independent variable, and ideally, one dependent variable. But what does that look like in real life? Here are some possible examples and places you might want to spend some time testing.
Let’s start with an ad performance problem. In this hypothetical world, you have partners saying that ads on the site are not performing like they used to. As a result, you’re forced to give more discounts, or they’re just canceling outright. It’s frustrating and unproductive.
And so, you decide to develop a test to identify the problem. You realize that the ad is halfway down the page, so you decide to move it to the top of the page. If we look at the three variables, it’ll look like this:
- Independent: Ad moves to the top of the page
- Dependent: Ad CTR should improve
- Controlled: Nothing else on the website should change (stories stay the same, colors remain the same, page speed remains the same, etc.)
You then run the test. The outcome should be that the ad CTR (dependent variable) either improves, gets worse, or remains the same. There are no other outcomes, and you can be pretty confident about the data. Except you can’t. What if the ads are different day to day? What if one creative is better than the other? The more of these we can turn into controlled variables, the better.
This, then, introduces what’s known as an A/B test. Rather than just moving the ad up and comparing it to historical numbers, deliver one ad experience to 50% of your audience and a different experience to the other 50%. Now you know which experience performs better, over the same period and under the same conditions.
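To make that concrete, here’s a rough sketch of what an A/B comparison could look like, assuming you can log sends and clicks for each variant. The subscriber IDs, click counts, and the two-proportion z-test are all illustrative; a chi-square test or any off-the-shelf testing tool gets you to the same place:

```python
# A sketch of a 50/50 split and a significance check. All numbers are made up.
import hashlib
import math


def assign_variant(subscriber_id: str) -> str:
    """Deterministically split the list 50/50 by hashing the subscriber ID."""
    digest = hashlib.md5(subscriber_id.encode()).hexdigest()
    return "ad_at_top" if int(digest, 16) % 2 == 0 else "ad_in_middle"


def two_proportion_p_value(clicks_a: int, sends_a: int, clicks_b: int, sends_b: int) -> float:
    """Two-sided p-value for the difference in clickthrough rates (z-test)."""
    rate_a, rate_b = clicks_a / sends_a, clicks_b / sends_b
    pooled = (clicks_a + clicks_b) / (sends_a + sends_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sends_a + 1 / sends_b))
    z = (rate_a - rate_b) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical results after the send: top placement vs. middle placement.
p = two_proportion_p_value(clicks_a=630, sends_a=50_000, clicks_b=500, sends_b=50_000)
print(f"Top CTR: {630 / 50_000:.2%}, middle CTR: {500 / 50_000:.2%}, p-value: {p:.4f}")
# A p-value well below 0.05 suggests the difference is unlikely to be random noise.
```

The exact tooling matters far less than the discipline: one change, a clean split, and a check that the difference is bigger than chance.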
But this is a negative scenario where partners are complaining. What if they’re not complaining, but you think you could be making more money instead? In that product piece I wrote a few weeks ago, I talked about improving lead capture from 100 leads per 10,000 visitors to 110 leads. With a large enough audience, that could be impactful.
So, what could a test be? Maybe the hypothesis is that you’re asking for too many fields that are not integral to your lead gen. I’ve seen some lead forms that are 25 fields. Do they need to be that long? And so, you hypothesize that if you cut the number of fields from 10 to 5, you’ll see an increase in conversion. This is what the test would look like:
- Independent variable: Cutting the form from 10 fields to 5
- Dependent variable: Improved conversion on the form
- Controlled: Everything else stays constant, including button color, website, copy, etc.
What happens? Do you see an improvement on that page? If not, maybe you should start by removing individual fields and seeing what that does. If that doesn’t work, perhaps you should test the copy. If that doesn’t work either, maybe you should try the button size or color (one at a time, not both). My point is that you can get granular and find improvements.
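As a rough illustration of how you’d check a result like this, here’s a sketch using the 100-versus-110-leads-per-10,000-visitors numbers from above. The counts are hypothetical, and scipy’s chi-square test is just one convenient option:

```python
# A sketch of checking whether the form change actually moved conversion.
from scipy.stats import chi2_contingency

# Rows: [converted, did not convert]; one row per form version. Numbers are hypothetical.
observed = [
    [100, 10_000 - 100],   # control: 10-field form, 1.0% conversion
    [110, 10_000 - 110],   # variant: 5-field form, 1.1% conversion
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"p-value: {p_value:.2f}")  # well above 0.05 here, so this lift could easily be noise
```

Notably, at 10,000 visitors per variant, a lift from 1.0% to 1.1% is still hard to distinguish from luck, which is exactly why the point below about statistical significance matters.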
I’ve focused on ads here, but you can quickly see how this plays out with subscriptions, commerce, and other parts of the business. But the important thing is to develop a culture where these sorts of things are acceptable. Testing and experimentation are fundamentally about curiosity. Do you allow people at your company to be curious and try things? Or do you put things in place and expect them never to change?
That doesn’t mean mistakes don’t happen. The only way a test can truly help disprove a hypothesis is if it’s constructed correctly. Just as importantly, you need enough data for the results to be statistically significant.
To be constructed correctly, it needs to have the above three variables clearly defined. There should be only one independent variable. Everything else should be controlled. The stricter you are here, the more confident you can be about the results.
And for the results to be statistically significant, you need to let the test run long enough. I’ve seen people run one-day tests with a single ad and then declare that something did or didn’t work. That’s not enough time. Each publication will have its own optimal duration for a good test. I will say that the more traffic you have, the shorter a test needs to run. However, I like to default to one- or two-week tests.
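If you want a rough sense of how long “long enough” is for your own numbers, here’s a back-of-the-envelope sketch. The baseline rate, the lift you hope to detect, and the daily traffic figure are all hypothetical; it uses the standard two-proportion sample-size formula at roughly 95% confidence and 80% power:

```python
# A rough estimate of how many visitors, and therefore days, a test needs.
import math


def sample_size_per_variant(baseline_rate: float, expected_rate: float,
                            z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Visitors needed in EACH variant at roughly 95% confidence and 80% power."""
    variance = baseline_rate * (1 - baseline_rate) + expected_rate * (1 - expected_rate)
    effect = (expected_rate - baseline_rate) ** 2
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect)


# Hypothetical example: a 1.0% conversion rate you hope to push to 1.1%.
needed = sample_size_per_variant(0.010, 0.011)
daily_visitors_per_variant = 5_000  # hypothetical: 10,000 daily visitors split 50/50
print(f"~{needed:,} visitors per variant, or roughly {needed / daily_visitors_per_variant:.0f} days")
```

Small lifts on small base rates take a surprising amount of traffic to confirm, which is why a high-traffic site can get away with shorter tests than a smaller one.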
Mistakes happen. When they do, sometimes you break something, and sometimes you get penicillin. The same can happen at a media company. Encourage your team to be curious when a mistake happens to see if there are any unintended but positive outcomes from it.
And even when mistakes aren’t happening, push the team to question things and develop hypotheses on ways to improve the business. If you structure your tests correctly, you’ll see a true impact on the business. Of course, it takes time, but I believe this separates good companies from great ones.
Thanks for reading. If you have thoughts, hit reply or join the AMO Slack. Have a great weekend!