Ways To Identify Duplicate Content Issues And Resolve Them
Duplicate content poses a great challenge to search engine algorithms. However, the major search engines, such as Google, Yahoo and Ask, have refined their duplicate content filters substantially in recent years to make them smarter. In this article we will look at how duplicate content is identified and matched with the original copy. We will also see how search engines work out which version is the original and which is a duplicate. Many people have asked me whether search engines discard all copies, including the original, whenever their duplicate filters are triggered. My answer has always been no.
Search engine algorithms do not employ magic or rocket science. They apply the same kind of common sense we humans would use to compare two documents.
Types of Content Duplication on The Internet:
- In extreme cases unscrupulous webmasters deliberately attempt to manipulate search engine ranking algorithms in order to gain more traffic to their products and/or services. They do this by posting a sizable amount of the same content on multiple web pages, domains, and/or sub-domains. Copying a large portion of the text from one web page and pasting it into another is treated as duplication by search spiders. These multiple web pages add no real value for users and exist purely to manipulate search engines. When uncovered, such pages can be removed from the search index, sometimes for a long period of time.
- Canonicalization issues are a common duplicate content headache for crawlers. I see a lot of homepages that can be accessed through any of the following URLs: http://yourdomain.com, http://yourdomain.com/index.html, http://www.yourdomain.com and http://www.yourdomain.com/index.html. Spiders treat these URLs as different pages, and since their content matches, they are flagged as duplicates. Although this form of duplication doesn’t attract a penalty, rankings can still drop because the link juice that any one URL earns is split among all of them. The remedy is to choose a preferred URL and set up 301 redirects that point the others to it.
- Syndicated content with no backlink. If you syndicate news content, make sure the sites where it appears include links pointing to the original copy on your site. If no link to your site exists and the syndicating site gets crawled first, search bots may well credit that site as the original and mark your own publication as the duplicate. Another way to avoid this problem is to ask the sites that carry your publications to add a noindex robots meta tag to their versions, so search engines will not index those copies.
- Some blogs and news sites also have unintentional duplicate issues. The problem occurs when software like WordPress displays the same post on the homepage, the category pages, the archive pages and the post's own details page, without a noindex directive on the category and archive pages. Inspect the software you’re using and make sure it does not have this issue.
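To make the canonicalization remedy concrete, here is a minimal Python sketch of the mapping a 301 redirect rule would implement: every homepage variant is reduced to one preferred URL. The choice of the www host as the preferred form, the ALIAS_HOSTS and INDEX_FILES sets, and the function name canonical_url are all assumptions made for illustration. In practice the redirect itself is usually configured in the web server or the CMS rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: the www host is the preferred one, and these
# file names are the default documents the server would serve for "/".
PREFERRED_HOST = "www.yourdomain.com"
ALIAS_HOSTS = {"yourdomain.com"}
INDEX_FILES = {"index.html", "index.htm", "index.php"}


def canonical_url(url: str) -> str:
    """Map any known variant of a URL to the single preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host in ALIAS_HOSTS:
        host = PREFERRED_HOST

    # An empty path and "/" refer to the same resource.
    path = parts.path or "/"

    # Strip a trailing default document such as /index.html.
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_FILES:
        path = head + "/"

    # Keep the query string, drop the fragment (it is never sent to the server).
    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


variants = [
    "http://yourdomain.com",
    "http://yourdomain.com/index.html",
    "http://www.yourdomain.com",
    "http://www.yourdomain.com/index.html",
]
# All four homepage variants collapse to one canonical target.
targets = {canonical_url(u) for u in variants}
```

A server would answer a request for any non-canonical variant with a 301 status and a Location header set to the canonical target, so that links pointing at any of the variants are credited to one URL.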
There is a popular myth that, when the same content is found in several places, search engines credit the original only to authoritative websites. This assumption is totally wrong. You have seen that some duplicate content issues are intentional, while others are not. Whatever the case, make sure your site doesn’t have them.
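One rough way to check your own site for duplicated text is to compare pages by their overlapping word sequences. The Python sketch below uses word shingles and Jaccard similarity, a common textbook technique for near-duplicate detection. It is not a description of how any particular search engine works, and the function names, the shingle size of 3 and the 0.8 threshold are assumptions chosen for illustration.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles two pages have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicates(pages, threshold=0.8, k=3):
    """Return (url_a, url_b, similarity) for every pair at or above threshold.

    `pages` maps a URL to the visible text of that page. The threshold is an
    arbitrary illustrative value, not a figure used by any search engine.
    """
    sets = {url: shingles(text, k) for url, text in pages.items()}
    results = []
    for u, v in combinations(sorted(sets), 2):
        score = jaccard(sets[u], sets[v])
        if score >= threshold:
            results.append((u, v, round(score, 2)))
    return results


# Hypothetical page texts, for demonstration only.
pages = {
    "/a": "the quick brown fox jumps over the lazy dog",
    "/b": "The quick brown fox jumps over the lazy dog.",
    "/c": "completely different text about search engines and spiders",
}
duplicates = find_duplicates(pages)
```

Here the pages "/a" and "/b" differ only in capitalization and punctuation, so they are reported as a duplicate pair, while "/c" shares no three-word sequence with either and is left out. Comparing every pair of pages is fine for a small site, but a large site would need a more scalable approach.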













