Optimizing WordPress (or any CMS) to avoid duplicate content penalties and supplemental results

January 17th, 2007

This entry follows up on my previous post about supplemental nightmares, ‘Gone Supplemental’, and you’d be well advised to read that one first.

Some of my clients recently pointed out to me that their WordPress blogs were showing huge numbers of supplemental results in Google. A supplemental result occurs when Google, for one reason or another, decides that the information on a page is not important enough to include in its main index. This can be the kiss of death for sites like this one, which rely on unique information to generate traffic and AdSense revenue.

So, how do you stop supplementals with WordPress? After a bit of hunting around, I discovered that the most likely cause was the structure of WordPress itself: content is replicated all over the place. For instance, this entry will appear under the ‘wordpress’ and ‘seo’ categories to the right, in the archive for January, and on its own page. In most setups it will also appear on the main page of your site, but I get around that by using the ‘Optional Excerpt’ field in the WordPress editing box to show a short summary of each entry on the front page rather than the whole article.

If you aren’t careful, Google will see this replicated content as ‘duplicate content’, which, as far as Google is concerned, can look like you are spamming them. This is a quick and easy way to get relegated to supplemental status, and once you are there, it is hard to come back. This is a real problem with content management systems in general, and I’ve experienced it with Joomla too.
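To see how much of this replication your own blog produces, one rough self-audit is to take a post’s body text and check which other pages on the site contain it in full. The sketch below does that over a dictionary of URL-to-page-text pairs. The helper names, URLs and sample text are all hypothetical, fetching pages and stripping their HTML is left out, and it only catches exact copies; it is not how Google detects duplicates, which isn’t public.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lower-case, so layout differences don't hide a copy."""
    return re.sub(r"\s+", " ", text).strip().lower()


def urls_repeating(post_text: str, pages: dict[str, str]) -> list[str]:
    """Hypothetical helper: URLs whose page text contains the whole post body."""
    needle = normalize(post_text)
    return sorted(url for url, page_text in pages.items() if needle in normalize(page_text))


# Hypothetical post and pages: permalink, category page, monthly archive, front page.
POST = "Content is replicated all over the place."
PAGES = {
    "/2007/01/17/optimizing-wordpress/": "Blog header. " + POST + " Comments follow.",
    "/category/seo/": "Another post.\n\n" + POST,
    "/2007/01/": POST + " An older post.",
    "/": "A short excerpt of the post.",
}

if __name__ == "__main__":
    for url in urls_repeating(POST, PAGES):
        print(url)
```

With these sample pages the post turns up on its permalink, the category page and the monthly archive, but not on the front page, which only carries an excerpt.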

So, what’s the solution?

Well, there are a few, but for ease of use, I reckon you can’t beat a good robots.txt file, which stops search engines from crawling the duplicate copies in the first place.

In my particular setup, I’ve used robots.txt rules to stop Google from crawling the archives, categories and RSS feeds. Note that blocking crawls of your RSS directory can be a bit of a hairy banana: many search engines, such as Yahoo, deliberately look for RSS feeds, and feeds can improve your popularity among other blogs. I’m still not quite sure where I stand there; perhaps I’ll wait for some of your views on that issue.

Check out my modified robots.txt for avoiding supplemental results in WordPress here. Also, a decent WordPress plugin that I use on this site to reduce supplementals is DupPrevent, which you can download here.

If you aren’t familiar with robots.txt files and the patterns they allow (Google supports the * and $ wildcards, not full regular expressions), Google provides a good guide to robots.txt files here.

Doc


Update: I’ve had some clarification on whether RSS is considered duplicate content (see ‘RSS and Duplicate Content’ if you like). In short, having an RSS feed shouldn’t be a problem. I’ve updated my robots.txt file to reflect that, and it now allows RSS content.

Doc


Entry filed under: SEO Discussions, Wordpress Tutorials


16 Comments

  • 1. DuckMan  |  March 13th, 2007 at 12:05 am

    Well, since I wrote that, I have some more tips:

    Write clearly and succinctly.
    Link liberally to relevant articles on other sites.
    Link liberally from your popular articles to other articles on similar topics on your own site.
    GAIN LINKS FROM OTHER SITES.
    Make liberal use of the <!--more--> tag or optional excerpts to avoid having the full content of any given article on the index page.
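What the more tag and the optional excerpt do to the index page can be sketched as follows. This is a hypothetical helper, not WordPress code: in WordPress itself, whether the front page shows the excerpt or the text before <!--more--> depends on whether the theme calls the_excerpt() or the_content(), and this function simply prefers a hand-written excerpt when one exists.

```python
MORE_TAG = "<!--more-->"


def front_page_teaser(content: str, excerpt: str = "") -> str:
    """Hypothetical helper: what an index page would show for one post.

    Prefers a hand-written excerpt; otherwise keeps only the text before the
    first <!--more--> marker; with neither, the whole post is returned.
    """
    if excerpt.strip():
        return excerpt.strip()
    before, marker, _after = content.partition(MORE_TAG)
    return before.strip() if marker else content.strip()


if __name__ == "__main__":
    post = "Why WordPress duplicates content.<!--more-->The rest of the article..."
    print(front_page_teaser(post))                        # only the text before the marker
    print(front_page_teaser(post, excerpt="A summary."))  # the excerpt takes priority
```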

    But by far the most important thing for avoiding supplementals is inbound links to your site from other sites.

    I’ve definitely learnt heaps about that along the way, and the following article about supplementals is probably the most comprehensive I’ve written so far.

    M

  • 2. DuckMan  |  March 13th, 2007 at 4:09 pm

    More additions, from a separate conversation I had:

    _____________________________

    I beg to differ,

    My robots.txt does get the job done for my site, and I see a couple of errors in yours which are going to cause you some issues down the track.

    I’d suggest testing it with the robots.txt checker in Google Webmaster Tools, using a few dummy URLs.

    Some other tips (these aren’t meant as criticism of you or your blog; providing good information is a noble cause, your blog is new, and supplementals can be a frustrating part of any new blog):
    1. The majority of your inbound links are either blog comment spam (for example, your comments on seobuzzbox, which are all nofollowed, so they won’t pass you any PR) or links from link-farm-style ‘directory pages’, which, again, on the balance of probability won’t give you much (if any) PageRank.

    2. Subdomains (i.e. beta.utheguru.com instead of http://www.utheguru.com) are notoriously difficult to get deeply indexed. You’ll see that my test site, beta.utheguru.com, isn’t even indexed, even though it has links from here and has been around for a couple of months.

    It seems that Google sets the bar a bit higher for subdomains. The two ways I know of to get a subdomain fully indexed are (a) time: the so-called ‘sandbox’ effect seems to be the default behaviour for subdomains, so expect to wait 3 months or so; or (b) relevant links, which can cut that wait short: high-PR links (PR5 or better) from a site in a related field will get a new subdomain indexed almost overnight, and provigil.utheguru.com is a good example.

    3. Duplicate content. You’ll see that the few pages of my site that are in the supps (apart from the comments pages, which are there because a lot of my articles don’t have any comments, so those pages aren’t worth indexing) are pages where I have extensively quoted other sites. This is also the case with your blog. Your aim is to document FAQ items from the Google Webmaster Tools forum, which is great. But remember that if you are simply quoting material that is already on those forums, Google is likely to see you as a plagiarist of original content (rightly or wrongly) and put your pages in the supps.

    4. Enabling comments. As far as I can see, you need to be a registered user on your site in order to post comments. An interactive site is one that people like linking to. You should consider enabling comments and using something like Akismet or Bad Behavior to keep spammers out.

    Cheers,

    Doc / theDuck

    M

  • 3. Vin  |  June 8th, 2007 at 12:17 pm

    Hi,
    What about reducing supplementals on Blogger? Are there any particular steps we can take? Do you think too many labels can also cause this?

    http://betabloggerfordummies.blogspot.com/2007/05/googles-supplemental-index-reduce.html

  • 4. Alam  |  June 27th, 2007 at 7:05 pm

    Hi

    How can I add a custom title to each page in a WordPress theme site?

    Thanks

  • 5. theDuck  |  June 27th, 2007 at 7:27 pm

    Hi Alam, I’m not quite sure what you mean. When you are using WordPress, the title of each post should be the title of the page.

    Some themes come by default with the title (in the page header) in the style “site name – post title”. If you want to change the order, as I have, edit your theme’s header.php file.

    Cheers,

    Matt

  • 6. Supplementals and the Sup…  |  July 24th, 2007 at 12:42 pm

    […] WordPress is one example of a CMS, and it will generally put duplicates of your content all over the joint. For instance, you’ll find this article on the front page of my blog, under the SEO discussions category, and in the archive for March on this site, and they’ll all have different URLs. Find out about avoiding duplicate content in CMS like WordPress here. […]

  • 7. David Eaton  |  December 27th, 2007 at 6:01 pm

    Hi, I just started my blog using WordPress and was getting hits like crazy for about 5 days; then I made some changes to my blog and it really crashed everything. My ranking in Google went from #7 to #55 in one day!

    Here is what I suspect happened:

    1. I started using tags. I didn’t know what these were, but the very next day after playing around with the tags...

    2. I installed a new bookmarking plugin, “Social Dropdown”. It keeps a hidden div near the top of the page, and when you click the dropdown, JavaScript repositions the bookmarks to make them look like a dropdown menu. Again, I’m not sure whether that affected my ranking, but I’ve deactivated it for now.

    Here is what I’ve started doing:

    I went ahead and added a robots.txt file; I just copied yours:

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /images/
    Disallow: */trackback*
    Disallow: /wp-*
    Disallow: */feed*
    User-Agent: MediaPartners-Google
    Allow: /
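Following the advice to test a robots.txt against a few dummy URLs, the sketch below checks paths against the file above using Google-style matching: * matches any run of characters, a trailing $ anchors the end of the URL, the longest matching rule wins, and Allow wins a tie. Python’s standard urllib.robotparser only does plain prefix matching and doesn’t understand these wildcards, hence the small hand-written matcher. It is a simplification under stated assumptions: user-agent selection here is an exact, case-insensitive name match with a fallback to *, whereas real crawlers pick the most specific matching group, so treat the results as a sanity check, not a replacement for Google’s own tester.

```python
import re

# The robots.txt quoted above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: */trackback*
Disallow: /wp-*
Disallow: */feed*
User-Agent: MediaPartners-Google
Allow: /
"""


def parse_groups(robots_txt: str) -> dict[str, list[tuple[bool, str]]]:
    """Map each lower-cased user agent to its (is_allow, pattern) rules."""
    groups: dict[str, list[tuple[bool, str]]] = {}
    agents: list[str] = []
    last_was_agent = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not last_was_agent:  # a User-agent line after rules starts a new group
                agents = []
            agents.append(value.lower())
            groups.setdefault(value.lower(), [])
            last_was_agent = True
        elif field in ("allow", "disallow"):
            last_was_agent = False
            if value:  # an empty Disallow blocks nothing
                for agent in agents:
                    groups[agent].append((field == "allow", value))
    return groups


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate * (any characters) and a trailing $ (end of URL) into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    groups = parse_groups(robots_txt)
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best = None  # (pattern length, is_allow) of the winning rule so far
    for is_allow, pattern in rules:
        if pattern_to_regex(pattern).match(path):  # match() anchors at the start of the path
            candidate = (len(pattern), is_allow)
            # Longer pattern wins; on equal length True > False, so Allow wins.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    for path in ["/2007/01/17/optimizing-wordpress/",
                 "/2007/01/17/optimizing-wordpress/trackback/",
                 "/feed/", "/wp-admin/", "/wp-content/uploads/photo.jpg"]:
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", path))
```

With these rules an ordinary post permalink stays crawlable, while trackback, feed and /wp- paths are blocked for Googlebot; the MediaPartners-Google group (the AdSense crawler) is allowed everything. The same check shows that /wp-* also matches /wp-content/, where WordPress keeps uploads and theme files by default.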

    Next, I have all_in_one_seo_pack installed, so I went to Options -> All in One SEO and checked:

    Use Noindex for Categories
    Use Noindex for Archives
    Use Noindex for tag archives
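As the option names suggest, these settings should work by adding a robots meta tag containing noindex to category, archive and tag pages. One way to confirm that it took effect is to view the source of such a page and look for that tag; the sketch below automates the check on an HTML string with Python’s standard html.parser. The function name and sample markup are hypothetical, fetching the page is left out, and it assumes only a standard <meta name="robots" content="..."> element, not any particular markup from the plugin.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collect the directives from any <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attribute names arrive lower-cased; values may be None
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(d.strip().lower() for d in content.split(",") if d.strip())


def is_noindexed(html: str) -> bool:
    """Hypothetical helper: True if the page asks search engines not to index it."""
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    return "noindex" in reader.directives


if __name__ == "__main__":
    category_page = '<head><meta name="robots" content="noindex,follow" /></head>'
    post_page = '<head><meta name="description" content="A post" /></head>'
    print(is_noindexed(category_page), is_noindexed(post_page))
```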

    Anyway, I hope this helps.

    If you come across any more helpful information, please post it so I can find it.

    Thanks,
    David

  • 8. theDuck  |  December 27th, 2007 at 8:18 pm

    David –

    I’d be checking over the Google Webmaster Guidelines. There are a couple of things on your site that look suspiciously like paid links.

    Also, some of your content appears to be copied on or from other sites – http://co.mments.com/conversations/11 ring a bell?

    If this isn’t a ‘penalty’ per se, something else to be aware of is that new blogs or blog posts often have a short period where they rank extremely well – from there they then drop back into obscurity where they belong. I suspect this is algorithmic – Google gives new posts their time in the sun to see whether or not they generate attention.

    Concentrate on writing good original content, building good non-nofollowed incoming links and you will start to do better.

    Cheers,

    M

  • 9. David Eaton  |  December 28th, 2007 at 2:40 am

    Hi, M

    Thanks for taking the time to look at my site; I am most grateful!

    I do have some questions. Where are the Google guidelines located? I know I’m new; I’ve never seen them or really needed to.

    I’m sure WordPress was duplicating content and I’ve only just caught it. I’m new to WordPress and didn’t understand that the All In One SEO plugin I’d installed had noindex options that keep search engines out of certain kinds of pages. I didn’t know they were important, so I had them all unchecked.

    Also, I was playing around with the site last night and realized that I haven’t been using the optional excerpt, or even knew what it was for. I know now that it helps keep duplicate content down.

    Another mistake I think I made is that I have affiliate links all over the place that lead to http://linkgate.mydomain/X.html. I tried to add nofollow to the links, but I think I missed a couple, so I placed a robots.txt on the subdomain to keep search engines from crawling it.
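Missing a couple of nofollow attributes is easy when affiliate links are scattered through many posts. A small audit like the sketch below, using Python’s standard html.parser, lists links to a given host that lack rel="nofollow". The host name linkgate.example.com is a hypothetical stand-in for the affiliate subdomain, and the idea is to feed it the HTML of each post or page.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class FollowedLinkFinder(HTMLParser):
    """Collect hrefs that point at `host` but lack rel="nofollow"."""

    def __init__(self, host: str) -> None:
        super().__init__()
        self.host = host.lower()
        self.followed: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href") or ""
        rel_values = (attributes.get("rel") or "").lower().split()  # rel may hold several tokens
        # urlparse().hostname is lower-cased, and None for relative links.
        if urlparse(href).hostname == self.host and "nofollow" not in rel_values:
            self.followed.append(href)


def followed_links(html: str, host: str) -> list[str]:
    """Hypothetical helper: links to `host` that are still followed."""
    finder = FollowedLinkFinder(host)
    finder.feed(html)
    finder.close()
    return finder.followed


if __name__ == "__main__":
    sample = (
        '<p><a href="http://linkgate.example.com/1.html" rel="nofollow">one</a> '
        '<a href="http://linkgate.example.com/2.html">two</a> '
        '<a href="/about/">about</a></p>'
    )
    print(followed_links(sample, "linkgate.example.com"))
```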

    Here is my new main robots.txt:

    # Disallow directories

    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /images/
    Disallow: /linkgate/
    Disallow: /blogs/
    Disallow: /newsletters/
    Disallow: */trackback*
    Disallow: /wp-*
    Disallow: */feed*

    And here is the robots.txt of my subdomain:

    # Disallow spidering of subdomain
    User-agent: *
    Disallow: /
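Each host reads its own robots.txt, so the file on the linkgate subdomain governs only that subdomain and the main file governs only the main site. Apart from the */trackback*, /wp-* and */feed* lines, these rules are plain path prefixes, so Python’s standard urllib.robotparser is enough to sanity-check them; it treats * literally, so the wildcard lines are left out below. The host names are hypothetical stand-ins.

```python
from urllib.robotparser import RobotFileParser

# The two files quoted above (wildcard lines left out, since RobotFileParser
# only matches plain path prefixes).
MAIN_ROBOTS = """\
# Disallow directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /linkgate/
Disallow: /blogs/
Disallow: /newsletters/
"""

SUBDOMAIN_ROBOTS = """\
# Disallow spidering of subdomain
User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse a robots.txt held in memory and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    print(can_crawl(SUBDOMAIN_ROBOTS, "http://linkgate.example.com/X.html"))
    print(can_crawl(MAIN_ROBOTS, "http://www.example.com/linkgate/X.html"))
    print(can_crawl(MAIN_ROBOTS, "http://www.example.com/2007/12/some-post/"))
```

Keep in mind that a robots.txt block stops crawling; it doesn’t guarantee a URL never shows up in the index.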

    I also have Google AdSense ads, which are paid links. Does this hurt me?

    Also, WordPress creates full-text RSS feeds and I haven’t been able to stop this, so other sites can just pull all of my original content onto their pages.

    Maybe another mistake I made was signing up to 28 of the most popular bookmarking sites as username “IsThatAScam” and bookmarking my own pages from the isthatascam.com domain. I’ve had 3 emails saying this is not allowed, so I won’t be doing it again.

    One smart thing I did was to create articles without reusing the content I had posted on my website; I would change them up just a bit to give me a unique article. So far I have written one article as a test. I didn’t see much traffic, and according to my stats more reviewers than ordinary visitors saw my content.

    Well hopefully I can clean this up.

    Thanks,
    David

  • 10. theDuck  |  December 28th, 2007 at 7:05 am

    Google AdSense: not a problem.

    Links to other affiliate sites without any link condom (i.e. nofollow): definitely a problem.

    Duplicate content: not a problem unless you’ve nicked it from someone else (check the link I referenced above).

    The Google Webmaster guidelines are here – http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=35769

    Cheers,

    M

  • 11. theDuck  |  December 28th, 2007 at 7:11 am

    Re: the full-text feeds, you’ll find an option in the WordPress admin panel, under Options -> Reading (scroll down to syndication feeds), that lets you publish a summary only.

    A lot of what you are trying (creating articles by mashing up existing ones, posting links on bookmarking sites) is starting to move into an area I feel uncomfortable with. You need to be careful about doing things to ‘manipulate’ the system, as Google will more than likely penalise you for it eventually.

    There are loads of ‘thin affiliate’ sites on the web. Why not try something different and use your writing abilities to put together some really unique, compelling content that might attract visitors in its own right? I know it’s difficult when you have a new site and want traffic right away, but it works 🙂

    Cheers,

    M

  • 12. Lynette  |  March 4th, 2008 at 6:43 am

    I am looking to move my company’s press releases into WordPress and take down the pages that are currently in HTML format.

    Am I correct in thinking that not only will our PR department be able to post new press releases themselves, but this would also bring in a good amount of hits? Also, for this change to work, do I need to allow comments on the press releases for ranking purposes? I would rather not allow comments on them.

  • 13. CMS Web design  |  July 8th, 2008 at 11:47 pm

    I am looking to add my companies press releases to wordpress and pull these pages that are currently in html format down off the site.

  • 14. Social Bookmarks  |  October 30th, 2008 at 9:19 am

    I love your thoughts! I normally don’t even bother to leave comments, but I wanted to let you know that you hit the nail on the head!

  • 15. Webmaster  |  January 24th, 2009 at 3:56 am

    I’ve got a PR3 bidding directory; I hope your information will help me promote it.

  • 16. Purchase Domain  |  February 14th, 2009 at 4:49 pm

    Thanks, there is more reason to comment than ever before!
