Optimizing WordPress (or any CMS) to avoid the duplicate content penalty and supplemental results
January 17th, 2007
This blog entry follows up on my previous entry about supplemental nightmares, ‘Gone Supplemental’, and you’d be well advised to read that one first.
Some of my clients recently pointed out to me that their WordPress blogs were showing huge numbers of supplemental results in Google. A supplemental result occurs when Google, for one reason or another, decides that the information on a page is not important enough to include in its main index. This can be the kiss of death for sites like this one, which rely on unique information to generate traffic and AdSense revenue.
So, how do you stop supplementals with WordPress? Well, after a bit of hunting around, I discovered that the most likely cause of the problem was the structure of WordPress itself: content is replicated all over the place. For instance, this blog entry will appear under the ‘wordpress’ and ‘seo’ categories to the right, in the archive for January, and on its own page. In most setups, it will also appear in full on the main page of your site, but I get around that problem by using the ‘Optional Excerpt’ field in the WordPress editing box to show a short summary of each entry on the front page rather than the whole article.
If you don’t watch yourself, Google will see this replicated content as ‘duplicate content’, which, as far as Google is concerned, can look like you are spamming its index. This is a quick and easy way to get relegated to supplemental status, and once you are there, it is hard to come back. This is a real problem with content management systems in general; I’ve run into it with Joomla as well.
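Google doesn’t publish exactly how it decides two pages are duplicates, so treat this purely as an illustration. It’s a minimal Python sketch of w-shingling with Jaccard similarity, a standard textbook measure of near-duplicate text (not Google’s actual algorithm). It compares a post against a full copy of itself, as you’d get with the whole article on the front page, and against a separately written excerpt. The sample texts and the four-word shingle size are assumptions made up for the example.

```python
"""Rough near-duplicate check via word shingles and Jaccard similarity.

Illustrative only: a textbook technique, not Google's algorithm.
"""


def shingles(text: str, w: int = 4) -> set:
    """Return the set of overlapping w-word sequences in text."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: str, b: str, w: int = 4) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty (or very short) texts count as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample texts for illustration.
POST = (
    "Some of my clients recently pointed out that their blogs were showing "
    "huge numbers of supplemental results in Google because the same content "
    "was replicated across category pages, date archives and the front page."
)
EXCERPT = (
    "How to stop WordPress duplicate content from pushing your posts "
    "into the supplemental index."
)

if __name__ == "__main__":
    # Full article repeated on the front page: a perfect duplicate.
    print(f"full copy vs post: {jaccard(POST, POST):.2f}")
    # Hand-written excerpt: shares no four-word runs with the post.
    print(f"excerpt vs post:   {jaccard(POST, EXCERPT):.2f}")
```

The full copy scores 1.00 and the hand-written excerpt scores 0.00, which is the intuition behind using the ‘Optional Excerpt’ field: the front page stops being a word-for-word copy of the article.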
So, what’s the solution?
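Well, there are a few, but for ease of use, I reckon you can’t beat a good robots.txt file. It tells search engine crawlers which parts of your site to stay out of, so the duplicate copies of your posts are far less likely to be crawled and indexed.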
In my particular setup, I’ve used robots.txt rules to stop Google from crawling the archives, categories and RSS feeds. Note that blocking crawls of your RSS feed can be a bit of a hairy banana: many search engines, like Yahoo, deliberately look for RSS feeds, and feeds can raise your profile amongst other blogs. I’m still not quite sure where I stand there; perhaps I’ll wait for some of your views on that issue.
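The exact rules I use are in the robots.txt file linked just below. As a rough sketch of the idea, here is a hypothetical robots.txt that blocks category and date archive pages, checked with Python’s standard `urllib.robotparser` so you can see which URLs a crawler would skip. The assumptions: the domain `example.com`, a `/%postname%/` permalink structure (so posts never live under `/2007/` and blocking date archives doesn’t block the posts themselves), and feeds left crawlable. `urllib.robotparser` only does plain prefix matching and doesn’t understand Google’s `*` and `$` wildcards, so the sketch sticks to plain prefixes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, assuming /%postname%/ permalinks so that
# date archives (/2007/01/) never share a prefix with post URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /category/
Disallow: /2007/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Hypothetical URLs: one post, one category page, one archive, the feed.
    for path in ["/optimizing-wordpress/", "/category/seo/", "/2007/01/", "/feed/"]:
        allowed = rp.can_fetch("Googlebot", "http://example.com" + path)
        print(f"{path:25} {'crawl' if allowed else 'blocked'}")
```

Only the post and the feed come back as crawlable, so the post keeps a single indexed copy. If your permalinks include the date (e.g. `/2007/01/17/post-name/`), a `/2007/` rule like this one would block your posts too, so check your own URL structure before copying any rules.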
Check out my modified robots.txt for avoiding supplemental results in WordPress here. Also, a decent WordPress plugin that I use on my site to reduce supplementals is DupPrevent, which you can download here.
If you’re not familiar with robots.txt files and the wildcard patterns they can contain, Google provides a good guide to robots.txt files here.
Update: I’ve had some clarification on whether RSS is considered duplicate content (see RSS and Duplicate Content if you like); otherwise, just be advised that having an RSS feed shouldn’t be a problem. I’ve updated my robots.txt file to reflect that, so it now allows crawling of RSS content.