We are continuing our series of guest blog posts to provide you with the most interesting and hands-on insights from the world of SEO and content marketing. This week, the blog guest contributor is Ann Yaroshenko, Content Marketing Strategist at JetOctopus: a full-service SEO website crawler and auditor.
What are duplicates in SEO?
Duplicates generally refer to substantive blocks of information within or across domains that either completely match other content or are appreciably similar (Google’s definition).
Sometimes, having the same content on a few URLs is desired; it’s certainly not something Google’s algorithm will penalise your website for. Nevertheless, if you have similar content to everyone else, you won’t be able to take your brand beyond commodity status and become the industry authority.
“The Lowest rating is appropriate if all or almost all of the main content on the page is copied with little or no time, effort, expertise, manual curation, or added value for users. Such pages should be rated Lowest, even if the page assigns credit for the content to another source.” (Google Search Quality Evaluator Guidelines, July 2018)
A well-considered SEO strategy should focus on reducing Googlebot crawl expectations and consolidating ranking equity and potential in relevant URLs. You can achieve this goal by minimising duplicate content.
When a search bot finds duplicates, it tries to pick the most relevant variant to show on the SERP. Your main task is to help bots make the right choice (we will talk about the reliable ways to do this below).
Here are the most common cases in which duplicates may appear on your website:
- Dynamically generated URL parameters;
- WWW/ non-WWW and HTTP/HTTPS pages;
- Printer versions of pages;
- Syndicated posts, etc.
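To see how the cases above produce several addresses for one page, here is a minimal Python sketch that collapses such variants into a single comparable form and groups the URLs that collide. The function names and the `IGNORED_PARAMS` set are hypothetical, chosen for illustration; which parameters are safe to ignore depends entirely on your own site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these parameters never change the page content on this site.
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "print"}


def normalize_url(url):
    """Collapse common duplicate variants of a URL into one comparable form."""
    parts = urlsplit(url)
    # Treat WWW and non-WWW hosts as the same site.
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    # Drop the ignored parameters and sort the rest so order does not matter.
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    )
    # Treat "/shoes/" and "/shoes" as the same path.
    path = parts.path.rstrip("/") or "/"
    # Treat HTTP and HTTPS as the same page by always emitting https.
    return urlunsplit(("https", host, path, urlencode(query), ""))


def group_duplicates(urls):
    """Return only the groups in which several URLs share a normalized form."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Running `group_duplicates` over a list of crawled URLs gives a rough first list of candidates to review; it only compares addresses and says nothing about whether the page content really matches.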
Today, Googlebot is able to recognise duplicates even before the duplicate has been crawled. (English Google Webmasters confirms this at the 27:38 mark.)
Google is constantly being improved to better detect duplicates, so it’s time to address this problem.
How Does Duplicate Content Affect Your SEO?
One of the crucial criteria for a page’s Quality rating is E-A-T: Expertise, Authoritativeness, and Trustworthiness (the criteria by which Google’s evaluators rate pages).
Unless the page is Expert, Authoritative, and Trustworthy, it fails to provide value to users and, therefore, to Google. If your website contains multiple pages with largely identical content, it can result in poor User Experience as people are presented with a lot of highly similar content repeated within a set of search results.
[Figure: Raven’s on-page SEO study]
If your website contains duplicate content, it can negatively affect your rankings and lead to traffic losses for these reasons:
- Googlebot doesn’t know whether to direct the link metrics (E-A-T, anchor text, link equity, etc.) to one URL or spread them over a few versions.
- Link equity can be further diluted because other websites will link to the duplicates as well. Instead of all inbound links pointing to one URL, they point to a few pages, spreading the ‘link juice’ among the pages with similar content. Since backlinks are one of the main ranking factors, this will affect search visibility.
- Duplicate content leads to poor UX as it is confusing and hard to navigate for users.
Google uses different metrics to determine the health and satisfaction of the content ecosystem, but John Mueller said at the BrightonSEO 2019 event that there is no magic metric that Google can point to in order to determine this.
So, the main judge of our content is still the user. Does anybody like scrolling through duplicate content? No, it’s boring and time-consuming. The user and, therefore, Google reward original content: it’s a reliable way to build your website authority.
How to deal with duplicate content?
The next question is how to address the similar content issue effectively. We’ve explored Google’s recommendations, and now we are ready to share some meaningful insights:
- 301 redirect. If you are creating a new version of your website, some new URLs may duplicate the previous ones. Set up a 301 (permanent) redirect in your .htaccess file to smoothly send users and search bots to the new location with minimal loss of PageRank. Note: .htaccess is a hidden file; the fastest and easiest way to edit it is usually the File Manager in cPanel, but you can also change it via FTP or SSH. Here is a full guide on how to implement a 301 redirect painlessly.
- Syndicate safely. Content syndication is when you take published content on your website and give permission for the webpage to be posted on another website/URL. When you syndicate your content on other sources, Google will pick the most appropriate version for a user’s query, which may not always be the version you’d choose. In any case, make sure the syndicated content contains a link back to your original URL.
You know already that links are an incredibly valuable ranking signal. The number of links to a page remains highly correlated with ranking.
For all intents and purposes, the study already shows a strong correlation between total links to a page and its ranking. So, the more authoritative resources that link to your original URL, the more chance there is that the content on the site will be ranked higher on the SERP.
However, you can also ask other webmasters to add a noindex directive, either as a meta tag in the HTML page or in an HTTP response header, to prevent Googlebot from indexing syndicated versions of the content. That is a reliable way to ensure only your original content will appear on the SERP.
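If you want to verify that a syndication partner has actually applied noindex, a rough check could look like the sketch below. It assumes you have already fetched the page yourself and pass in the HTML and the response headers; `is_noindexed` is a hypothetical helper name, and the check only looks for the literal `noindex` directive in a robots meta tag or an `X-Robots-Tag` header.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when written without a value.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.append(attr.get("content", "").lower())


def is_noindexed(html, headers):
    """Return True if the page asks search engines not to index it."""
    # First look at the HTTP response headers (names are case-insensitive).
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    # Then look at the robots meta tags inside the HTML.
    parser = _RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in directive for directive in parser.directives)
```

This is a simplification: it does not handle header values scoped to a specific crawler or other directives that may also keep a page out of the index.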
- Review the similar content on your website. If you have plenty of similar URLs on your website, think about expanding each URL or consolidating the pages into one. For example, if you have a tutor-platform website with different pages for English courses, but the same information on all URLs, consider merging these URLs into one page about all available English courses, or you could expand each description of the language courses to contain unique content about each option.
- Help Google to choose the right domain. If your site is reachable under a few domain variants (for instance, over different protocols), use Google Search Console (GSC) to tell Google your preferred domain (for example, http://yourdomain.com or https://yourdomain.com). This feature is currently supported only in the old Search Console:
  - On the GSC Home page, click the site you want.
  - Click the cog-wheel icon, and then click Site Settings.
  - Select the option you want.
- Explain URL parameters to Google. If your website contains URL parameters for page variations (for instance, size=S vs size=M), Googlebot can treat these URLs as duplicates. GSC offers a useful URL Parameters tool that helps Google crawl your site more efficiently by indicating how it should handle parameters in your URLs. Note that using the URL Parameters tool incorrectly can cause Google to ignore important pages on your site, with no warning or reporting about ignored pages. Here is the in-depth instruction on how to implement parameter specifications correctly.
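For the 301 redirect item in the list above, the lines that go into .htaccess can be generated rather than typed by hand. The sketch below assumes an Apache server with mod_alias enabled and uses its `Redirect 301` directive; `build_redirect_rules` and the example paths are hypothetical, and paths containing spaces would need quoting that this sketch does not do.

```python
def build_redirect_rules(mapping):
    """Return Apache `Redirect 301` lines for an old-path -> new-URL mapping."""
    rules = []
    for old_path, new_url in sorted(mapping.items()):
        # mod_alias expects the old location as a URL path starting with "/".
        if not old_path.startswith("/"):
            raise ValueError(f"old path must start with '/': {old_path!r}")
        rules.append(f"Redirect 301 {old_path} {new_url}")
    return rules


if __name__ == "__main__":
    # Hypothetical example: two old URLs that duplicate one new page.
    example = {
        "/courses/english-old": "https://yourdomain.com/courses/english",
        "/english-courses.html": "https://yourdomain.com/courses/english",
    }
    print("\n".join(build_redirect_rules(example)))
```

The printed lines can then be reviewed and pasted into the .htaccess file; test them on a staging copy first, since a wrong rule affects every visitor.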
Google does not recommend blocking Googlebot access to duplicate URLs, whether with a robots.txt file or by using other methods.
A better solution is to let Googlebot crawl duplicate pages but mark them as duplicates by using the rel="canonical" link element, the GSC URL parameter handling tool, or redirects.
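To check which URL a page declares as canonical, you can read the rel="canonical" link element out of its HTML. Here is a minimal sketch using only the Python standard library; `find_canonical` is a hypothetical helper name, and the page HTML is assumed to have been downloaded already.

```python
from html.parser import HTMLParser


class _CanonicalParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # The rel attribute may hold several space-separated values.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    parser = _CanonicalParser()
    parser.feed(html)
    return parser.canonical
```

Duplicate variants of a page (for example, the size=S and size=M versions) should all return the same canonical URL; a variant that returns None or a different URL is worth a closer look.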
To keep this article short, we have covered only the key points: why duplicates can harm the authority of your website and the basic recommendations on how to deal with this problem. While writing this article, we explored a lot of relevant sources. We’ve gathered the most useful ones into one list so that you can find more detailed information about duplicates:
- SEW: What is Duplicate Content?
- Google: Steps You Can Take to Proactively Address Duplicate Content Issues
- Hobo: Duplicate Content. A Common SEO Misunderstanding
- Moz: How to Fix Duplicate Content issues
- SEJ: The Complete Guide to Mastering Duplicate Content Issues
If you have any questions about technical SEO in general, and duplicate content in particular, feel free to ask!
Ann Yaroshenko is a Content Marketing Strategist @ JetOctopus. She is certified in Technical SEO, Content Strategy, and Google Analytics. Ann has been a part of the JetOctopus team since 2018.
Text: Ann Yaroshenko