The Rapid Rise of Synthetic Content

By Jacob Cohen Donnelly • November 28, 2023

AI-generated content is here to stay

Graduating college with a history degree in the slow growth years immediately after the Great Recession didn’t leave me with many prospects. And so, despite spending $100,000+ for a college education, I started my career doing the thing I had done to afford beer and Chinese food while in college: SEO work.

I graduated college in 2010. Those who have worked in and around SEO for that long know that this was the last year when a lot of the super spammy tactics worked. Keyword stuffing, where you just used the same keyword over and over again, was a common practice. In 2010 and before, you could rank a site for a pretty competitive keyword with some ease. Google bombing, where you’d rank a site for an unrelated keyword, was a common tactic. Exhibit A: The top result for “Miserable Failure” in 2006 was George W. Bush’s biography on the official White House site.

But then everything changed. I got my first SEO job one month after Google introduced its massive Panda update in early 2011. This was a major algorithmic change that targeted sites with bad content. A year later, Google went after keyword stuffing and advanced link schemes with its Penguin update. And with each incremental update, being in SEO got harder. I’d argue that the search engine result pages (SERPs) got better for a while, but the easy wins were no longer there.

So, why do I give this history lesson? I would argue that Google is at risk of losing control of its SERPs, something I don’t think has been the case since 2010. The ease with which synthetic (AI-generated) content is being published is going to have an increasingly negative impact, which will push users toward other tools to find their results. And this, in turn, will have an impact on publishers.

We can already see this playing out with the rise of “Reddit” on the end of search terms. Over the last 9-12 months, I’ve found myself doing this much more often: I type something into Google and then add “Reddit” at the end. Now every result that pops up is something that I know, without a doubt, is coming from a human. I don’t always trust the information, but I know that it’s a human creating it.

And Google is very aware of the problem. According to a CNBC article from June:

At an all-hands meeting earlier this month, Prabhakar Raghavan, Google’s senior vice president in charge of search, told employees that the company was working on ways for search to display helpful resources in results without requiring users to add “Reddit” to their searches. Raghavan acknowledged that users had grown frustrated with the experience.

“Many of you may wonder how we have a search team that’s iterating and building all this new stuff and yet somehow, users are still not quite happy,” Raghavan said. “We need to make users happy.”

And it’s only going to get worse. With the rise of generative AI tools, we’re going to see a lot of crap synthetic content get published on the internet with a single goal: to rank for long-tail keywords. This tweet on X, part of a longer thread, is worth looking at:

We pulled off an SEO heist using AI.

1. Exported a competitor’s sitemap

2. Turned their list of URLs into article titles

3. Created 1,800 articles from those titles at scale using AI

18 months later, we have stolen:

– 3.6M total traffic

– 490K monthly traffic

This is what publishers are increasingly going to be competing with. Someone can export our sitemap and then use AI to create thousands of pieces of content with the goal of ranking. And it’s working.

But publishers don’t get to be too upset since there are some of us that are doing this ourselves. According to Futurism, Sports Illustrated, owned by the Arena Group, has not only created AI-generated content, but even has AI bylines.

The AI authors’ writing often sounds like it was written by an alien; one Ortiz article, for instance, warns that volleyball “can be a little tricky to get into, especially without an actual ball to practice with.”

According to a second person involved in the creation of the Sports Illustrated content who also asked to be kept anonymous, that’s because it’s not just the authors’ headshots that are AI-generated. At least some of the articles themselves, they said, were churned out using AI as well.

“The content is absolutely AI-generated,” the second source said, “no matter how much they say that it’s not.”

After we reached out with questions to the magazine’s publisher, The Arena Group, all the AI-generated authors disappeared from Sports Illustrated‘s site without explanation. Our questions received no response.

Indeed, it is hard to get into volleyball if you don’t have a Wilson to work with. Let’s be clear here: Arena Group is not the only one doing this. And look, there is certainly some pearl clutching by reporters because a machine can create words far faster than they can.

But this is the future we are dealing with. According to a Europol report, “Experts estimate that as much as 90% of online content may be synthetically generated by 2026.” That’s a lot of content being made by AI.

So, why does all of this matter?

Let’s go through the list of things. First, Google is increasingly aware of the fact that users are unhappy with the SERPs. There is a growing demand for human-created content, which is why people are adding “Reddit” to the end of their searches. According to research done by Her Campus Media, “Over half of Gen Z chooses TikTok over Google as their #1 search engine.” That’s a serious problem for Google, especially considering it also owns YouTube, a direct competitor to TikTok.

Second, tools are being created to make it unbelievably easy to create hundreds, if not thousands, of pieces of content to rank for all sorts of keywords. And third, less-than-inspiring publishers are also doing this. And so, it’s going to create a wonderfully perfect storm where there’s just a ton of crap popping up in Google. And publishers are going to have to make a choice in how they compete. They can try to do the same thing, which is what an increasing number of publications are doing, or they can focus on continuing to deliver value to users.

Because let’s be abundantly clear here… very little of this synthetic content is valuable to users. And the reason is very simple: there is no understanding of the inputs, nor is there any real check on the outputs. How can there be when you’re trying to publish thousands of articles in hours?

At some point, Google will have to act. It cannot hope to retain its dominance in search if an increasing number of people do not trust the results. It is, of course, trying to do that with E-E-A-T, which comes from its quality rater guidelines. I am pulling the exact sentences from that document:

Expertise: Consider the extent to which the content creator has the necessary knowledge or skill for the topic.
Experience: Consider the extent to which the content creator has the necessary first-hand or life experience for the topic.
Authoritativeness: Consider the extent to which the content creator or the website is known as a go-to source for the topic.
Trustworthiness: Consider the extent to which the page is accurate, honest, safe, and reliable.

Trustworthiness is last in large part because the first three help inform trustworthiness. But the first three considerations are important. Can AI have expertise, experience, or authoritativeness? I don’t see how it can in its present state. And so, every dollar that is spent creating synthetic content is a dollar that is wasted because at some point, Google will punish this content. And publishers will be back to square one.

This doesn’t mean that using AI for content creation is completely off limits, of course. At the AMO Summit, I spoke with Skift’s CEO, Rafat Ali, about this. He said:

If you train an LLM on your archives over the last eleven years for us as a company, chances are if you’re going to ask any question about the business of travel, we have the answer among the hundreds of thousands of stories we’ve done or research reports we’ve done or conferences we’ve done. And it’ll pull from 20 of these and give you an answer. It is our IP that it is trained on. You would be surprised how accurate it is if it’s trained on a confined set of data or content in this case.

We started writing articles for ourselves on Skift. This is our own IP, so we are creating original articles [based on our content]. These are not news stories. These are explainers, background, timeline, etc. for big news. For example, Airbnb vs. New York City and Airbnb being effectively banned. We created a timeline because we’ve been covering it from 2012 of Airbnb vs. New York City’s challenges over the years. It created a timeline.

We do have one person dedicated on the editorial team who generates these articles and fact checks them. In general, they are very accurate. We publish them as Ask Skift stories. That frees up our journalists to do higher level work versus doing these background stories.

While the AI itself doesn’t have E-E-A-T, the content that the LLM is trained on does because it is proprietary. And that, theoretically, would make the quality of the output significantly higher. But you’ll notice that this does not replace the work that his editorial team does. It simply enhances it and allows the team to focus on acquiring additional data (aka reporting) to continue making the LLM smarter.

And so, when the time comes that Google acts, and I suspect that will be sooner rather than later, publishers like Skift that have used the technology to create quality content will find themselves doing okay. But the many publishers that see the above tweet about creating 1,800 pages of crap AI content and get excited will experience a similar outcome to Demand Media. It went public on January 26, 2011. Twenty-eight days later, the Panda update was here. And by April 2011, Demand Media’s stock had peaked and withered away.

The internet is going to be full of synthetic content. Google will have no choice but to act. The publishers that survive will be those that focus on creating value for their audience, using generative AI to give them better information, not simply more useless information.

Thanks for reading today’s piece. If you have thoughts, hit reply. Be sure to take advantage of the 25% off an annual subscription of AMO Pro for your first year. That offer ends today. Have a great day and see you next week!
