Anonymous copy – the next thing to ruin your rankings

Dec 8th, 2022

Following on from an excellent piece by Lauren Fellows, one of our expert SEOs, on authorship, I thought I’d add my tuppence worth – a tidied up, expanded and edited transcript of my talk from the recent Benchmark Search & Digital Marketing Conference – on why establishing authorship will be even more important going forward.

You should keep your copy simple, right? Like, really simple. Like, dead properly incredibly super simple. The tools tell you; the Flesch-Kincaid grade scale tells you; the endless line of social media influencers that crowd the pavements which run alongside the information superhighway tell you. So, you should, right? You know, because of Hemingway and Orwell and all those baby’s shoes that were never worn.

You don’t have to dig too deeply to find such advice about simplicity – page one of Google is full of it.

I hate to speak on behalf of a profession, but as I make up a third of copywriters by mass, I feel I have special dispensation to say that as an industry, we’ve trained writers to master the art of baby-speak – to rein in the fundamental abilities and passion that brought many of us to the industry (usually shortly after we realise that teaching is an objectively awful job done by a mix of heroes and the insane).

I’m hoping today to convince at least some of you of the benefits of style and recognisable authorship for search – but first I want to tackle the myth of simplicity. I will state at this point, however, that I’m talking about blog and resource content – essentially any language on a page that isn’t “spoken” by your logo. There’s a need for a brand voice in certain places.

One of the reasons, I feel, that we’ve fallen for the myth of ease is that it, obviously, offers shortcuts. There are tools that can give us an immediate yes or no answer as to whether a piece of content is sufficiently ‘easy’.

In an industry that deals with deadlines as much as ours, the concept of a time-saver is Promethean: it opens up a world of possibility. I’ve spoken at this conference before, for example, on report automation – something which has saved me weeks over the years. The problem with automatically judging text, however, is that it’s really difficult and often not very well executed.

Take the foundational ‘Flesch-Kincaid’ scale – you’ll have encountered it on MS Word at school, maybe, but it, or something very much like it, powers most of the reading score tools available. The following, from Wikipedia, sums this up pretty well:

As readability formulas were developed for school books, they demonstrate weaknesses compared to directly testing usability with typical readers. They neglect between-reader differences and effects of content, layout and retrieval aids. For example, the pangram “Cwm fjord-bank glyphs vext quiz.” has a reading ease score of 100 and grade level score of 0.52 despite its obscure words.
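
That weakness is easy to make concrete. Here’s a minimal sketch of the two Flesch-Kincaid formulas – the constants are the published ones, and the six-syllable count for the pangram is my own rough tally:

```python
# A minimal sketch of the two Flesch-Kincaid formulas, using the published
# constants. Both reduce a text to just two ratios - words per sentence and
# syllables per word - so short words in short sentences always score "easy".

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Wikipedia's pangram: five words, one sentence, six syllables by my count.
print(flesch_reading_ease(5, 1, 6))   # ~100.2 - "very easy to read"
print(flesch_kincaid_grade(5, 1, 6))  # ~0.52 - pre-school reading level, apparently
```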

Similarly, Coleridge’s The Rime of the Ancient Mariner, a 4,000-word epic poem full of grief and madness and albatrosses, has a reading level of grade nine and includes many stanzas such as the following:

He holds him with his skinny hand,
‘There was a ship,’ quoth he.
‘Hold off! unhand me, grey-beard loon!’
Eftsoons his hand dropt he.

Now, as a former teenage goth, I’m a pretty big Coleridge fan, but even now I have to look up Eftsoons every time I read the poem. Nevertheless, this poem falls into the ‘Good’ category in most readability tools.

What may surprise people further, however, is what happens when we test the two paragons of prose – Hemingway and Orwell – against such tools. I’m only going to use two examples of each, so my research here has not been exhaustive. I’m happy to listen to offers to fund further research, but you get what you pay for with free content.

Orwell first, from the piece ‘Why I Write’ – a foundational text on the art of writing.

While it ranks relatively well in terms of SEO (‘writing guide’ was the keyword here, I think), one automated tool scores it at the ‘poor’ level for readability, while the online app ‘Hemingway’ gives it a grade 8 level, marks a third of all sentences as hard or very hard to read, and is generally pretty passive-aggressive about the quality of the writing.

Next is ‘Politics and the English Language’ – this piece contains one of my favourite opening paragraphs of any non-fiction, and is another of the great pieces of English writing on the language.

Most people who bother with the matter at all would admit that the English language is in a bad way, but it is generally assumed that we cannot by conscious action do anything about it. Our civilization is decadent and our language – so the argument runs – must inevitably share in the general collapse. It follows that any struggle against the abuse of language is a sentimental archaism, like preferring candles to electric light or hansom cabs to aeroplanes. Underneath this lies the half-conscious belief that language is a natural growth and not an instrument which we shape for our own purposes.

This essay fares even worse than his last attempt – at this point, we’re really considering whether to hire this freelancer for any further projects.

Here we can see that Orwell couldn’t even manage to compete for the keyword ‘politics and language’, while the writing is needlessly difficult, with more than half of all sentences rated either hard or very hard to read. So, we politely disengage from Orwell and return to Fiverr to find a real writer. To wit, we hire Hemingway.

With Hemingway, as with Orwell, I’ve taken non-fiction examples. This one, titled Picked Sharpshooters Patrol Genoa Streets, was, like the next example, written during his time as a reporter for the Toronto Star in the 1920s – the same decade in which he wrote The Sun Also Rises and A Farewell to Arms. Sadly, Hemingway the app is unimpressed by Hemingway the writer: poor SEO, average readability, and no keyword performance for ‘increased police presence in Genoa’.

Next up is the piece Two Russian Girls the Best Looking at Genoa Parley, which honestly says a lot about Hemingway, whose missive on the Genoa Conference – one of a number of early indications that the 20th century was not yet done with world wars – ended with:

The Russians are seated. Someone hisses for silence, and Signor Facta starts the dreary round of speeches that sends the conference under way.

It’s better – readability is okay, SEO is okay (it performs well for ‘Genoa conference’), and the tone of voice is consistent. Nevertheless, almost half of the sentences are hard to read, and Hemingway advises us to think about simplifying Hemingway.

I know that, to an extent, I’m being unfair here – these tools are incredibly useful. I’ve used them myself numerous times. My enemy here isn’t the tools themselves, it’s their position in a larger industry shift that sees complexity as antithetical to clarity.

These articles, both the Orwell and Hemingway examples, are excellent pieces of writing – even if you get the impression that Hemingway was writing them with 20 minutes left to hit a deadline after a late night (which is, after all, how half of all marketing copy is written to this day).

Both writers are rightly held in high regard – but their writing is clear rather than simple; it doesn’t eschew style or polysyllabic words, and it is hugely enjoyable as a consequence, even when Orwell takes purposeful and vicious aim at me personally, describing the ambition of some to write as:

Sheer egoism. Desire to seem clever, to be talked about, to be remembered after death, to get your own back on grown-ups who snubbed you in childhood, etc., etc.

The myth of simplicity essentially advises us to treat all audiences as homogeneous; it treats language as fundamentally measurable, and it separates texts into categories of good and bad. All of this has led to a proliferation of bland and anonymous writing – and has left us open to a huge increase in bland and anonymous AI-generated content. As a colleague of mine put it (thank you, Mr. Gossage), weight loss is simple, but losing weight is hard – there is a necessary disconnect between concept and action, and while it’s a writer’s job to narrow that gap as much as possible, there is no rule to say that we need to condescend to readers. Instead of simplicity, like Orwell and Hemingway, we should strive for clarity.

The late Bill Slawski wrote relatively regularly on the subject of authorship detection when covering the many and varied patents that Google is granted yearly. In one of the last pieces on the subject – a 2020 article for Search Engine Journal titled Author Vectors: Google Knows Who Wrote Which Articles – Slawski quotes a Google spokesperson who, a couple of years after the deprecation of G+ authorship markup, reassured publishers that removing the markup was fine because:

We don’t use authorship markup anymore. We are too smart.

When you consider that this comment was made in 2016, that the patent under discussion in the article was granted in 2020, and the leaps and bounds that neural networks and natural language processing have taken over the last couple of years, it’s no wonder that August’s ‘Helpful Content Update’ was announced with an introductory paragraph that included the following:

To this end, we’re launching what we’re calling the “helpful content update” that’s part of a broader effort to ensure people see more original, helpful content written by people [emphasis mine], for people, in search results.

Although this update was announced after I’d already written a couple of thousand words of this, my main surprise wasn’t the announcement itself (although it did slightly rob me of the chance to seem prescient), it was the timeframe. In all honesty, I’d expected an update of this nature to still be a couple of years out – though, from its low-level impact, it may still take that long to reach full force.

Anyone that regularly reads on the topics we cover at this conference will no doubt be aware of the excellent work done by Lily Ray on E-A-T and by Dawn Anderson on aspects of machine learning – what may have been missed, however, is how closely related this work has become in the last couple of years. Increasingly, one is feeding into the other and it’s this iterative process that has seen Google’s ability to judge E-A-T attributes improve.

While the Helpful Content Update will judge a website as a whole, there will inevitably come a point at which the various neural networks trained on search data will need to take that to the next level: judging the authority of a single individual, which will likely involve the analysis and detection of authorship.

This will almost certainly begin with YMYL sites (a definition that is broadening, as the additional information on E-A-T that Google provided back in July shows) and with ‘harm reduction’ in mind. Whether that’s preventing radicalisation (have a look at YouTube before search, maybe, Google) or the spread of medical misinformation, establishing authorship and credentials will have to become an increasingly important aim for Google’s various ranking algorithms.

This example, from the Sleep Foundation, where a staff writer is named alongside a medical professional, is a strong start and an indication of what will be required when Google eventually takes that step (or it would be, if there were any markup to qualify the author and fact checker).

I was so impressed with this when I was looking for sleep advice for my son (at the ironically late hour of 1am) that I took a screenshot to remind myself to have a look at the back end in the morning.

Although no markup had been implemented, you can see that there is an ‘updated’ date, there’s a named expert reviewer and there’s a fact-check policy. There are the beginnings, here, of a perfect demonstration of authority and expertise – it just needs markup to render the data machine readable.
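
As a hedged sketch of what that markup could look like – every name, URL and policy link below is invented for illustration, not taken from the Sleep Foundation’s pages – schema.org’s WebPage type already offers reviewedBy and lastReviewed properties for exactly this job:

```python
# Illustrative JSON-LD for a reviewed health article. All names and URLs are
# hypothetical; MedicalWebPage is schema.org's health-specific WebPage subtype.
import json

page = {
    "@context": "https://schema.org",
    "@type": "MedicalWebPage",
    "headline": "Sleep advice for children",        # hypothetical headline
    "dateModified": "2022-11-30",                   # the 'updated' date
    "lastReviewed": "2022-11-30",
    "author": {
        "@type": "Person",
        "name": "Jane Staffwriter",                 # hypothetical staff writer
        "url": "https://example.com/authors/jane-staffwriter",
    },
    "reviewedBy": {
        "@type": "Person",
        "name": "Dr A. Somnologist",                # hypothetical medical reviewer
        "jobTitle": "Sleep medicine physician",
    },
    "publisher": {
        "@type": "Organization",
        "name": "Example Publisher",
        "publishingPrinciples": "https://example.com/fact-check-policy",
    },
}

# This would be rendered into the page head as a
# <script type="application/ld+json"> block.
print(json.dumps(page, indent=2))
```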

However, this is likely only a first step in a direction that the next several updates are leading – a direction that may change how written content needs to be handled in future.

Electronic text stylometry is a rapidly growing area of research – of the 75,000 Google Scholar results dealing with authorship detection since 2018, for example, 31,000 have been published since 2021 and 20,500 since the beginning of 2022. This boom in paper submissions will be partly due to the clear and important uses of authorship in tackling online misinformation and radicalisation, but it’s also due to the growing sophistication of neural networks.

The Helpful Content Update was touted as an algorithmic enforcer of decades-old best practice for our content – you write with the user in mind, you write to help rather than to rank – and on those terms it would have been, while impressive in its own right, just a supercharged successor to Panda. I imagine it (or new iterations of it) will eventually have a similar impact on the industry, but it’s unlikely to be able to bring about the end of unhelpful content alone.

Between the growing sophistication and use of AI-generated copy, content farms, Fiverr and other content-at-scale endeavours, there is a huge volume of problem content that will be difficult to tackle at the scale at which it is produced without some kind of authorship identification.

One method is the use of n-grams and neural networks to evaluate authorship – an approach that has regularly achieved accuracy levels exceeding 80% over the last couple of years. If Google is not already working on its own versions of the stylometry work being done at universities all over the world, then it will be soon – and it is already working with neural networks that are either capable or nearly capable of doing the job.
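
As an illustration only – a real attribution system needs far more text per author and far richer features than this sketch – the classic baseline pairs character n-grams with a linear classifier, here trained on the two passages quoted earlier:

```python
# Illustrative only: the classic character n-gram baseline for authorship
# attribution. The two passages quoted earlier in this piece stand in as
# training data; real systems use far more text per author than this.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [
    "Most people who bother with the matter at all would admit that the "
    "English language is in a bad way, but it is generally assumed that we "
    "cannot by conscious action do anything about it.",
    "The Russians are seated. Someone hisses for silence, and Signor Facta "
    "starts the dreary round of speeches that sends the conference under way.",
]
authors = ["orwell", "hemingway"]

# Character 2-4-grams capture sub-word habits - function words, punctuation,
# suffixes - that tend to survive changes of subject matter.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LinearSVC(),
)
model.fit(texts, authors)

print(model.predict(["Someone hisses for silence as the speeches begin."]))
```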

Pretty soon, therefore, it will be possible to assess the authorship of online content – and, with it, the credentials and expertise of that author to write on a given topic. This will change how brands need to handle the process of content writing.

When authorship can easily be identified, brands will need to:

  • Build expert entities in-house, investing in and training content writers
  • Make external copy a collaborative process with comprehensive style guides
  • Begin to seek fact-checkers for YMYL or E-A-T related subjects
  • Properly mark up articles to make it easier to attribute content to authors and fact checkers

These are changes that can be made immediately. Schema and thorough, useful author pages can be used to establish unique author entities (though work will need to go into making these recognisable – for which I can recommend the Kalicube Tuesdays podcast with Jason Barnard) and extended to include citations and links to fact-checker profiles and qualifications; a sketch of what such an author entity might look like follows below. Externally written copy, meanwhile, should be edited to fit closely with writer rather than brand tones of voice (again, except when it’s your logo speaking).
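
By way of a hedged example – the name, URLs and specialisms here are all invented – an author page could carry a Person node that articles then reference by its @id, with sameAs links doing the ‘recognisability’ work:

```python
# Illustrative only: a Person entity for a hypothetical author page. The @id
# gives articles a stable node to point their `author` property at, while
# `sameAs` ties the entity to corroborating profiles elsewhere on the web.
import json

author = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://example.com/authors/jane-staffwriter#person",
    "name": "Jane Staffwriter",                  # hypothetical writer
    "jobTitle": "Senior Content Writer",
    "knowsAbout": ["sleep hygiene", "child development"],
    "sameAs": [                                  # invented profile URLs
        "https://www.linkedin.com/in/jane-staffwriter",
        "https://twitter.com/janestaffwriter",
    ],
}
print(json.dumps(author, indent=2))
```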

Not only will this help prepare your content for what’s to come, it can also directly (if not enormously) impact rankings immediately.

AI text generation is good and getting better. It can do a lot of heavy lifting – if we remember it’s a tool and not a writer. Google has specifically targeted AI-generated content because it is far more likely to be inaccurate and far less likely to be customer-centric. This is something that my own experimentation with, and reading around, various AI content tools has reinforced – if you do not already have the domain knowledge necessary to write the article in the first place, you absolutely will end up publishing inaccurate and potentially harmful content.

So, if it’s necessary to use it (and most of the time it isn’t), let it do the grunt work – let it generate titles and meta descriptions at scale, let it ideate and even, if you’re completely stuck, take a crack at a first draft. Publishing huge amounts of AI-generated content may work right now (and there will absolutely be early adopters who make a lot of money in the short term), but no matter how quickly the technology develops, its days are essentially numbered, and it certainly shouldn’t be the foundation of a business.
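
For that grunt work, the shape of the workflow is trivial – here’s a minimal sketch, assuming the openai Python package as it stood at the time of writing; the model name, prompt and page title are illustrative, and every output still needs a human pass before publication:

```python
# A minimal sketch of "grunt work" generation - drafting meta descriptions at
# scale. Assumes the openai package (pip install openai) and an API key; the
# model name and prompt are illustrative, and outputs must be human-reviewed.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder - use your own key

def draft_meta_description(page_title: str) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Write a meta description of 155 characters or fewer "
               f"for a page titled: {page_title}",
        max_tokens=60,
        temperature=0.7,
    )
    return response.choices[0].text.strip()

# Hypothetical usage - a draft for an editor, not copy for publication.
print(draft_meta_description("Sleep advice for children"))
```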

I assume at some point, given enough writing from an author, you’ll be able to train a model to write like any author – but that, like deep fakes of any nature, is a problem for another day.

Anybody that uses LinkedIn will have seen at least a thousand posts (generally repostings of the same three or four originals) on ChatGPT and how it presages an inevitable end of everything from the college essay to copywriting. The evidence, however, is less than convincing. That’s not to say it’s not impressive – it’s astonishing in places, and is absolutely indicative of a technology which could prove to be transformative. There is one thing you’ll see crop up occasionally – and which I’ve made reference to previously for other developments (voice search, for example) – and that is Gartner’s Hype Cycle:

[Image: Gartner’s Hype Cycle]

I think it would be difficult to argue that we are anywhere in that cycle but the ‘peak of inflated expectations’. There will eventually come a time when we enter the plateau of productivity, but we’re some way from that as yet. There will be some phenomenal outliers – people who use generative AIs to great success – but for the foreseeable future, content created by machine learning models is either incapable of creativity or incapable of maintaining a factual foundation in its creativity.

Another LinkedIn post caught my eye today – by Google’s Chief Decision Scientist, Cassie Kozyrkov. The post began by describing ChatGPT as a GAN – which I found surprising, but I kept reading with one eyebrow raised – then came the following paragraph:

For example, you could ask ChatGPT what it would do if it could fly, and it might respond with something like “I would soar through the skies like a majestic eagle, feeling the wind beneath my wings and the freedom of flight.” This type of response would be impossible for a human to come up with, but it’s perfectly within the realm of possibility for ChatGPT.

Rumbled.

I quickly scrolled to the bottom to find a link to an article which reveals that the post was, of course, written by ChatGPT (the GPT stands for Generative Pretrained Transformer – it’s not a GAN). While it seems many of the LinkedIn commentators were happy to believe the poster would be quite so wrong in public, and to respond with some casual mansplaining, the post linked to (the one actually written by Kozyrkov) is very good – and features an excellent reason that Google needs to look hard at authorship:

While I’m delighted by ChatGPT, I’m less delighted by human gullibility and the bumpy ride that all generative AI — not just ChatGPT — will be taking society on. We’ll need to learn different trust habits for our forays online and I shudder when I think of the teething pains as we all get up to speed.

Unless Google quickly develops a way to tackle AI-generated content, we face five or more years of terrible, inaccurate content undermining trust not only in the industries that choose to use it (or which are serviced by agencies that use it) with little or no oversight, but also in the tech and search engines that will surface such information.

Want to stay ahead of the competition and steer clear of reliance on overhyped technology?

Why not sign up to follow our content – or contact us to see what we (humans) can do for your brand.
