What’s Google’s policy on duplicate content?

Oct 15th, 2015

According to Google’s John Mueller, there’s no duplicate content penalty, but it can still cause problems for your site in other ways


What is duplicate content?

The not-so-short answer is: any page, piece of content or section of content (there was no elaboration on the length of such a section) which is exactly replicated across more than one URL, whether that is a www or non-www prefix, http or https, or right down to index.html and similar page suffixes, including mobile-friendly sites, tag pages, press releases, syndicated content and product descriptions.

The even less short answer is that, though duplicate content can take any of the above forms, many of the perceived side-effects of duplication come down to more complicated factors and signals which Google takes into account when it filters duplicate content. It is therefore easier to answer the next question first and come back to this in a moment.

What is not duplicate content?

To begin with, and something which will come as a relief to some international companies, translated content is not considered duplicate: because it is recognised as serving a separate purpose from the original content, it is not treated as a duplication.

The same is true of content which is duplicated but location specific. If, for example, a company offers the same services to two areas sufficiently far apart (different English-speaking countries, or American states, for example), and the duplicated information is equally relevant in both areas, then the pages will not be treated as duplicates and both will be indexed, so each can serve as the primary result for searches in its own area. Also getting a free pass are 'in-app' content and pages which, though sharing a title and description, contain different content.

The filtering process and why it happens

According to John Mueller, the main concern for Google with duplication is simply that it wastes resources, crawl budget and time – delaying the pick-up of new content and making metrics more difficult to track.

Its major concern is not, for example, first appearance: in response to a question, Mueller stated that if material is syndicated, Google will endeavour to return the best performing version according to an interpretation of intent and locality, not the content's original location. Google wants to provide the best user experience possible, and this means returning a wide variety of search results, something which a significant density of duplicate content can interfere with.

As mentioned above, the filtering of duplicate content occurs at three stages:

  • Scheduling: as Google cannot crawl the whole web all of the time, some duplicate content is detected during the scheduling process, as the URLs to be crawled are decided, which increases the efficiency of each crawl.
  • Indexing: duplicate indexed items waste storage space, so, generally speaking, Google will only index one version of the content, unless it meets certain criteria such as those mentioned earlier (localisation, for example), in which case both versions are indexed.
  • Search: duplicated search results, Mueller stated, are potentially confusing and reduce the number of unique results on a page. This is why, even when the duplications are legitimate, you may see the phrase 'we have omitted some entries'.

Penalties

This, of course, leads to the question – if duplicate content is simply an annoyance, why is it penalised? Well, the simplest answer is from Mueller himself: ‘there are no penalties’.

That does not, of course, mean that nothing with duplicate content is ever penalised, just that it is not penalised specifically for the duplicate content. The sorts of sites containing duplicate content that are penalised are scraper sites (sites which automatically skim content from other sites), doorway pages and sites which exist only to redirect, and other similar varieties of site which are, as Mueller put it, just 'spam', and which may be punished either manually or algorithmically.

Why you should still get rid of duplicate content

Although there is no direct penalty related to duplicate content, it can still cause problems for your site in other ways: from causing Google to display the non-preferred version of the content in search, to splitting page authority and traffic between two pages, which will obviously impact the content's ranking.

How to tell if you have a duplicate content problem

According to Mueller, clear indications that you have issues with duplicate content are: non-preferred URLs appearing in searches, Google Search Console showing multiple instances of title and description duplication, or site crawlers returning a greater number of page crawls than you have pages. In addition, there are a number of free web resources which can trawl your site for duplicate content.
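If you would like to run a quick check yourself before reaching for a third party tool, a short script can surface repeated titles and meta descriptions across a set of URLs. The Python sketch below is purely illustrative: the URLs are hypothetical placeholders and the HTML parsing is deliberately simplified, so treat it as a starting point rather than a finished audit.

```python
import re
import urllib.request
from collections import defaultdict

# Hypothetical URL variants to check; swap in your own pages.
URLS = [
    "https://www.example.com/",
    "https://example.com/",                # non-www variant
    "https://www.example.com/index.html",  # index.html suffix
]

def extract(pattern, html):
    """Return the first regex capture group from the HTML, or an empty string."""
    match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""

titles = defaultdict(list)
descriptions = defaultdict(list)

for url in URLS:
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    titles[extract(r"<title[^>]*>(.*?)</title>", html)].append(url)
    descriptions[extract(
        r'<meta\s+name=["\']description["\']\s+content=["\'](.*?)["\']', html
    )].append(url)

# Any title or description shared by more than one URL is worth a closer look.
for label, groups in (("title", titles), ("description", descriptions)):
    for text, urls in groups.items():
        if text and len(urls) > 1:
            print(f"Repeated {label} on {len(urls)} URLs: {text!r} -> {urls}")
```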

How to fix it

For a start, do not use robots.txt to resolve duplication issues: this simply hides the duplication from crawlers, but does not resolve the problem, as both versions will still be indexed and the problems stemming from the duplication will persist. Google understands, Mueller stated, that some duplication is inevitable in some situations (such as with press releases, or third party product descriptions); the key is to be consistent with your signals to Google. Ensure you are using the preferred URL everywhere (site map, canonical, href etc.), with Mueller agreeing that rel="canonical" is a good method of overcoming problems caused by multiple product pages (for size, colour etc.). Avoid URL variations in your CMS, make appropriate use of 301 redirects wherever possible, and make sure your site is easy to read with a well-structured hierarchy (information on all of this and more technical SEO can be found in our recent cheat sheet).
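If you want to spot-check that those signals really are consistent, a short script can confirm that known duplicate variants either redirect to, or declare a canonical link to, your preferred URL. The Python sketch below is illustrative only: the preferred URL and variants are hypothetical placeholders, the canonical-tag parsing is simplified, and it does not distinguish a permanent 301 from a temporary redirect.

```python
import re
import urllib.request

# Hypothetical URLs: the preferred version and two common duplicate variants.
PREFERRED = "https://www.example.com/widgets"
VARIANTS = [
    "http://example.com/widgets",                  # non-www, non-https
    "https://www.example.com/widgets?colour=red",  # parameter variant
]

def check(url):
    # urlopen follows redirects by default; geturl() reports where we landed.
    # Note: this does not tell you whether the redirect was a 301 or a 302.
    response = urllib.request.urlopen(url)
    final_url = response.geturl()
    html = response.read().decode("utf-8", errors="ignore")
    match = re.search(
        r'<link\s+rel=["\']canonical["\']\s+href=["\'](.*?)["\']',
        html,
        re.IGNORECASE,
    )
    canonical = match.group(1) if match else None
    consistent = final_url == PREFERRED or canonical == PREFERRED
    print(f"{url}\n  resolves to: {final_url}\n  rel=canonical: {canonical}"
          f"\n  points at the preferred URL: {consistent}\n")

for variant in VARIANTS:
    check(variant)
```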

How to avoid it

We've said it before, but it bears repeating: be SURE with your content. Make it Substantial, Unique, Relevant and Engaging. If you repurpose content, give it a new spin or narrow the focus, don't artificially rewrite it (missing out spaces, changing spellings and so on), and adhere wherever possible to webmaster best practice.

To find out more, and to see what we can do for you, contact us today.
