August 29, 2023

Publishers Shouldn’t Treat All Readers the Same Way When It Comes to Subscriptions

Living in New York City during August is so interesting because it’s empty. And that’s what it feels like in media as well. It’s quiet. People are squeezing in their last-minute vacations. Everyone’s just trying to soak up a little more sun. I’ll be taking off on Friday for a last bit of relaxation, so there won’t be an AMO issue.

But next week brings Labor Day, white pants go back into the drawer, and it’s time to sprint to the end of the year.

As part of that sprint, add October 26th to your calendars. That’s when I am hosting the first-ever AMO Summit here in NYC. The agenda is basically locked. The sponsors are locked. If you want to be there, get your ticket today. I’m increasing prices next month.


Managing inventory across multiple ad products and publications is a lot. You’d be surprised how many operators just use Google Sheets for scheduling and then have to send dozens of emails back and forth with the client to get approval.

But there is a better way.

Sponsy allows media companies with multiple products—newsletters, podcasts, magazines, social, and others—to easily track what inventory exists and work with clients to get ads approved and into production.

With an easy-to-use customer portal, various automations, and built-in publication metrics and reporting, Sponsy makes it easy to manage your ad sales without needing to build custom software. This allowed the media network TLDR to scale 8x while saving 40 hours a week on ad management.

Sign up for Sponsy today and retire that Google Sheet.


Pubs need to be smarter with their readers

Subscription businesses come with an assortment of challenges. An old boss of mine used to say that he didn’t like subscriptions because they were “leaky buckets.” And that’s true. Whether it’s churn—active and passive—or people looking to bypass the paywall entirely, running a subscription business is not as simple as throwing up a paywall.

Digiday has a story about readers bypassing paywalls. In many respects, it’s a problem worth exploring. But I think there are smarter ways to handle it. First, the story.

Publishers can see if anonymous users (those not subscribed or registered) are reading premium content or reading more pages than a metered paywall allows, said Arvid Tchivzhel, managing director at Mather Economics’ digital consulting practice. He estimates about 4-5% of readers are doing this.

The general thesis is that dealing with that 4-5% is like a game of whack-a-mole and it’s not worth the resources. But is it? According to a Toolkits story from 2022:

Fifty-three percent of U.S. consumers say they attempt to bypass paywalls on publishers’ websites when they encounter them, and 69 percent say they avoid clicking links to websites they already know use paywalls or registration walls, according to a study of 2,509 U.S. consumers conducted by Toolkits and National Research Group.

It’s obvious that a majority of people are hitting paywalls and getting frustrated, or are refusing to even visit a site because a paywall exists. The problem with these two quotes is that the numbers are dramatically different.

Is the problem limited to just 4-5% of the audience or is it more than half of people? I might agree that it’s not worth spending a ton of resources for a twentieth of the audience, but for half? That might be worth exploring and digging into.

But what I really think is going on here is that many publishers are not taking anything about the user into consideration. They’re treating every reader the same, and that introduces complications, especially when trying to grow the business. There are a few things worth exploring.

First, frequency and freshness matter. If a user has come to the site once or twice in a 30-day period, I have a hard time believing they are going to fork over money. Why would they? Compare that to someone who visits the site numerous times in a 30-day period. They are, of course, going to be far more likely to pay for something.

Instinctively, this makes sense. If people are building a habit with your product, they are going to be more resistant to breaking that habit. This is why early onboarding of paying subscribers is so important; getting them into the habit of consuming content regularly is one of the best indicators of whether they will keep paying over the long term.

Publishers with multiple revenue streams should be thinking about more than just the paid conversion on the first pageview. We need to be smarter here. Why would someone commit after only one or two stories?

Second, location matters. In the Digiday story, the author writes:

In 2019, The Boston Globe stopped people from using their browser’s incognito mode to bypass their paywall. The number of people who subscribed after hitting the paywall tripled.

And that’s great for The Boston Globe. Here’s the issue: because The Boston Globe is on the internet, random people land on the site. I’m one of them. From time to time, I visit. But because the majority of the content is about Boston, I’m never going to pay. And so, I show up, I’m hit with an immediate hard paywall, and I leave.

For local publishers, it’s important to anticipate a user’s likelihood of subscribing based on their location. Why would a New Yorker pay for The Boston Globe? The same can be said in reverse. Why would someone in Boston pay for The New York Daily News? If the person is outside the core geography, you should attempt to monetize them another way.

And third, the source of the pageview matters. How the user arrives at the site should dictate the aggressiveness of the subscription CTA. This ties into frequency quite a bit, but if someone is coming from Twitter, what is the likelihood they are going to subscribe? I suspect most of it is fly-by traffic: people who saw someone tweet about the story, clicked, and then bounced when they were asked to pay.

If a user is typing in your URL, however, and coming directly to the site, it’s likely you’ve got someone’s loyalty. That is someone you want to push much more aggressively to sign up.

Subscription businesses are hard, and they are not nearly the panacea that everyone has assumed. It’s not just throwing up a paywall and hoping everything works out. Different cohorts of readers are going to convert and monetize differently, and treating everyone the same limits your revenue potential. Some people should be monetized with ads. Some should be monetized with subscriptions. And some should be pushed to a free sign-up so they can be nurtured. If we’re smarter with this, we’ll make more money.
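To make this concrete, here’s a minimal sketch of how a publisher might combine those three signals (frequency, geography, and traffic source) to decide which experience a visitor gets. The signal names, thresholds, and buckets are my own illustration, not anything prescribed in the Digiday or Toolkits reporting.

# Illustrative sketch only: signals, thresholds, and bucket names are hypothetical.
from dataclasses import dataclass

@dataclass
class Visitor:
    visits_last_30_days: int   # frequency: how habitual is this reader?
    in_core_geography: bool    # location: e.g., a Boston-area reader for The Boston Globe
    referrer: str              # source: "direct", "social", "search", etc.

def pick_experience(v: Visitor) -> str:
    """Decide how aggressively to push a paid subscription for this visitor."""
    # Habitual, direct, in-market readers get the hard sell.
    if v.visits_last_30_days >= 5 and v.referrer == "direct" and v.in_core_geography:
        return "hard_paywall"
    # Habitual readers outside the core geography: nurture with a free sign-up.
    if v.visits_last_30_days >= 5:
        return "free_signup_prompt"
    # Fly-by social traffic and one-off visitors: monetize with ads for now.
    if v.referrer == "social" or v.visits_last_30_days <= 2:
        return "ads_only"
    # Everyone else gets the standard metered paywall.
    return "metered_paywall"

print(pick_experience(Visitor(8, True, "direct")))    # hard_paywall
print(pick_experience(Visitor(1, False, "social")))   # ads_only

In practice, this kind of logic usually lives in a paywall vendor’s rules engine rather than hand-rolled code, but the point is the same: the decision should key off reader behavior, not a one-size-fits-all meter.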

Does blocking AI in robots.txt work?

Tensions are rising between website owners (and not just media companies) and OpenAI. Earlier this month, OpenAI announced its own web crawler, GPTBot, which it uses to scrape sites for training its GPT models.

According to Insider:

As of this week, 70 of the world’s top 1,000 websites have moved to block GPTBot, the web crawler OpenAI revealed two weeks ago was being used to collect massive amounts of information from the internet to train ChatGPT. Originality.ai, a company that checks content to see if it’s AI-generated or plagiarized, conducted an analysis that found more than 15% of the 100-most-popular websites have decided to block GPTBot in the past two weeks.

The six largest websites now blocking the bot are amazon.com (along with several of its international counterparts), nytimes.com, cnn.com, wikihow.com, shutterstock.com, and quora.com.

The top 100 sites blocking GPTBot include bloomberg.com, scribd.com, and reuters.com, as well as insider.com and businessinsider.com. Among the top 1,000 sites blocking the bot are ikea.com, airbnb.com, nextdoor.com, nymag.com, theatlantic.com, axios.com, usmagazine.com, lonelyplanet.com, and coursera.org.

These sites are blocking GPTBot by putting a disallow directive into their robots.txt, the file that tells web crawlers what they can and cannot access on a site. And so, if you look at nytimes.com/robots.txt, for example, you’ll see it says:

User-agent: GPTBot
Disallow: /

There are a number of other things blocked in that file, but calling out GPTBot explicitly makes a lot of sense. Why would these publishers want to let OpenAI train its GPT models on their content for free? This is the whole thesis behind the negotiations over publishers wanting compensation from OpenAI.

And this is potentially very risky for OpenAI. Benedict Evans has an amazing piece about generative AI and intellectual property. I want to highlight this one paragraph:

On the other hand, it doesn’t need your book or website in particular and doesn’t care what you in particular wrote about, but it does need ‘all’ the books and ‘all’ the websites. It would work if one company removed its content, but not if everyone did.

If every website owner decides to block OpenAI, the model’s ability to learn weakens considerably. That is dangerous for OpenAI because it reduces the quality of what ChatGPT can produce. Now, that isn’t something publishers ought to care about, but it explains why OpenAI is trying to play ball.

The question then is whether we should all be blocking GPTBot from our sites. The short answer is yes, but this is only a stopgap. There are two possible outcomes here. First, OpenAI honors the robots.txt and stops learning from all of these sites, which would be bad for OpenAI. Second, it ignores robots.txt, which is technically possible because robots.txt is a voluntary convention rather than an enforcement mechanism, and continues to learn from the content.

The bigger issue is that this only stops GPTBot. Google has introduced Search Generative Experience (SGE), where the AI experience sits directly in the SERPs. As far as I can tell, there is not a separate Google SGE bot. Therefore, when Googlebot crawls our sites, Google is likely using that content to train its AI. And so, while we might be blocking OpenAI’s bot, we’re not stopping generative AI crawlers at large.

Here’s what I will say about all of this: we should not be doing anything to help these tools learn. There’s zero value for our businesses and it only helps a potential competitor get stronger. For the time being, putting a disallow into the robots.txt is a necessary first step all sites should be taking. From there, we’ll need to find better, long-term solutions.
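If you want to verify that a disallow is actually being served (on your own site or anyone else’s), a short script is enough. Here’s a sketch using Python’s standard-library robots.txt parser; the domains below are just examples, and the same check works for any crawler’s user-agent token.

# Check whether a site's robots.txt disallows GPTBot, using only the standard library.
from urllib.robotparser import RobotFileParser

def blocks_gptbot(domain: str) -> bool:
    """Return True if the site's robots.txt tells GPTBot not to crawl the homepage."""
    parser = RobotFileParser()
    parser.set_url(f"https://{domain}/robots.txt")
    parser.read()  # fetches and parses the file
    return not parser.can_fetch("GPTBot", f"https://{domain}/")

for domain in ("www.nytimes.com", "example.com"):
    print(domain, "blocks GPTBot:", blocks_gptbot(domain))

Keep in mind this only tells you what the file asks crawlers to do; as noted above, a crawler that chooses to ignore robots.txt can still fetch the pages.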


Thanks for reading today’s AMO. If you have thoughts, hit reply or become an AMO Pro member for an invitation to the exclusive members-only Slack. Additionally, you’ll receive:

  • A second AMO every Friday
  • Transcripts of all AMO podcast episodes
  • Early bird access to upcoming events
  • And more…

If you buy a ticket to the AMO Summit on October 26th here in NYC, you get one year of AMO Pro included. Register today and I’ll see you then!