SEO_WordPress: Plugin to maximise search engine positioning in WordPress
June 16th, 2007
WordPress, like so many other CMSes (content management systems), has a huge problem with duplicate content – to understand this I need you to think like a search engine spider. So, I want you to close your eyes and imagine yourself as a spider – breathe deeply, close all 8 eyes and count backwards from 10… (if you don’t have time to read the full article you can download the plugin now by clicking here).
Robot (Spider) Fundamentals
Ok folks – are you all zenned into the spider frame of mind? No? Well, to help you out, I’ll give you a couple of hints about life as a search engine robot (spider):-
- I only have a limited amount of time to visit your site.
- I usually (but not always) arrive via your index page.
- My job is to look over the page I arrive at, save any content I see and then send it ‘back to base’, then follow any links on that page and do the whole thing again.
WordPress is not Spider Friendly
WordPress has a fundamental flaw – it’s designed for humans (WOW! That’s a concept!), so WordPress tends to make life difficult for spidey – it puts the same content in lots of different places. Take this post for example – this exact same post can be found in numerous spots on my site:-
- It will be found in the SEO Tools ‘category’ (and any other categories I have it in)
- I’ll be able to find it in the monthly archive for June, 2007.
- I’ll be able to find it on the index page (http://www.utheguru.com) for at least a little while (after which it will gradually sink deeper and deeper into the bowels of the site).
- I’ll be able to find it in the RSS feed for my site.
- It will be available in the form of a trackback.
- Last, but not least, it will be available as an individual post.
So, we have THE EXACT SAME CONTENT replicated all over the place – this problem is known, surprisingly, as duplicate content.
Telling the spider ‘where to go’
Why is this such a problem? Well, put on your spidey thinking cap again – first off, rule number one – the spider has limited time. If you check your server logs, you’ll see that the spider only crawls deeply (spends more than a few seconds traversing your site) about every 7th visit – for me, that means about once a week; for other, smaller sites, you might only get ‘deep crawled’ every month or so. Google usually only crawls the front page, a few of your newer posts and pages that other sites have linked to – so we need to make the most of our opportunities.
Matt Cutts (the head of the Google Webspam Team) talks about ‘Herding the Bots‘ on his blog, which should give you an idea just how important this is. In short, Matt describes various ways of telling the bots what you consider important pages using tools such as Robots.txt, rel=nofollow and something called the “meta noindex” tag.
Staying out of the ‘supplemental index’
Why is herding the bots so important? Well, another prominent (ex) Googler, Vanessa Fox gives us a hint, I quote:-
“The question I got most often after the session was about the supplemental index. Does having duplicate content cause sites to be placed there? Nope, that’s mostly an indirect effect. If you have pages that are duplicates or very similar, then your backlinks are likely distributed among those pages, so your PageRank may be more diluted than if you had one consolidated page that all the backlinks pointed to. And lower PageRank may cause pages to be supplemental.”
A supplemental page is a page that isn’t as likely to appear when someone does a search for something you’ve written about – you can read heaps more about the supplemental Index and Bot Behaviour on my post about how to get out of the supplemental index. Duplicate content is something you should try to avoid if you want your pages to stay out of the supplemental index.
My Strategy – a combination of robots.txt and noindex
So – how do we avoid this problem of duplicate content and make our WordPress install inherently more search engine friendly in one fell swoop? Well, first of all, we start with robots.txt. A robots.txt file tells search engines which parts of your site they should and should not crawl (and hence index). In the case of WordPress, I really don’t want versions of my articles in trackbacks, RSS feeds, or archives to be indexed – so, I block them using the following robots.txt:-
User-agent: *
Disallow: */trackback*
Disallow: /wp-*
Disallow: */feed*
Disallow: /20*
User-Agent: MediaPartners-Google
Allow: /
Ok – cool – so, now, when googlebot (or any other robot) crawls my site it doesn’t go near any of those locations (except for MediaPartners-Google – that’s the AdSense bot – we want it to be able to see all pages so that it can serve well-targeted ads) – so we’re immediately herding Googlebot to the remaining three sources of duplicate content:-
- The copy on the index page (ie, on http://www.utheguru.com).
- Our main copy (http://www.utheguru.com/seo_wordpress-wordpress-seo-plugin).
- The copy (or copies) in the category pages (ie the copy at http://www.utheguru.com/category/seo/seo-tools).
Of these three, we really only want the first two – so, we could potentially robots.txt out the category pages – but that would be a bad idea. Why?
WordPress posts tend to ‘age’ quickly
Posts fairly quickly disappear off the main page as time goes by, but they remain on the category pages much longer – if we were to robots.txt out all category pages, we’d run a fairly high risk of having those older posts disappear from the index altogether – googlebot would no longer be able to easily find them and would assume they’d been lost forever. The solution? The meta noindex tag – if we add the following tag to the <head> section of our category pages we’ll tell googlebot that we want it to follow all the links on the category pages – but not actually put the category pages in the index – in essence, herding the bot to our content pages.
meta name="ROBOTS" content="noindex,follow"
The SEO_Wordpress plugin
Ok – so if I’ve done my job right, you should be totally confused by now. DO NOT DESPAIR – the good news is that I’ve written a plugin (based upon one called DupPrevent) that does all this for you. Using and installing the plugin is simple – just download it by clicking here, drop it in your wp-content/plugins/ folder and then activate it using the ‘Plugins’ tab in your WordPress admin panel.
NOTE: I realise that some people like to make their own changes to robots.txt. If that’s the case for you, it’s fine. If you have a custom robots.txt, the plugin detects that and will skip the robots.txt changes so you can make them yourself. If you’re unsure about robots.txt syntax, or anything else I’ve discussed on this page, the Google Webmaster Help Team has put together a great FAQ.
Extra Note: I had a couple other questions from readers:-
But hasn’t the supplemental index been abolished?
Well, yes, but in name only. Results in the supplemental index used to be designated ‘supplemental’ if you did a site: search. Google has now removed that label, but the supps still remain (read more about the ‘cloaking’ of supplemental results here) – nothing has changed. Supplementals still exist, you just won’t know about them unless you use a tool like oyoy.eu.
What if I already have a robots.txt?
In the situation you describe, the plugin will detect that there is an existing robots.txt and will let it be. Any existing robots.txt takes precedence over the plugin by default.
What does the /20* mean in the robots.txt you describe?
The /20* means: block any pages whose address starts with your domain name (and a forward slash) immediately followed by a string beginning with the digits 20 – have a look at one of your archive pages – the stock standard archive URL goes something like this – http://www.utheguru.com/2007/06/ – so this blocks all archives between the years 2000 and 2099 – I think that should be sufficient.
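For illustration, assuming your post permalinks don’t begin with the date (see the comments below if they do), Disallow: /20* works out like this:

http://www.utheguru.com/2007/06/ – monthly archive, blocked
http://www.utheguru.com/2007/ – yearly archive, blocked
http://www.utheguru.com/seo_wordpress-wordpress-seo-plugin – individual post, not blocked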
What is the ‘head’ section you describe?
The “head” section is an invisible part of your page that gives browsers (and search engines) information about the page – if you are using Firefox, you can hit Ctrl+U (it may work in other browsers too) to show the source code of your page. You’ll see the meta robots code is inserted in there.
On that topic – if you like playing with WordPress, I’d suggest you get a Firefox add-on called ‘Web Developer’ – it’s a great way to easily play with your CSS files and has heaps of other tools – but you will need the latest version of Firefox for it to work.
Once you’ve done that, you can download the developer plugin here.
All the best,
theDuck
Entry Filed under: SEO Tools
130 Comments
1. proton | June 17th, 2007 at 3:48 am
Thanks for the tips. I’m still a bit confused. I don’t think I should install the plug in since you state that “make sure that you delete any existing robots.txt files for this to work properly”. It seems I have to tweak my robots.txt file instead. My wordpress blog is just a part of my site and I need to keep the robots.txt file active to block other parts of the site from being indexed.
A few questions;
1- What does /20* mean in the robots.txt?
Disallow: /20*
2- concerning this…..
Where is the “head section” of the categories pages? I went to “edit category”, but could not find it anywhere.
Thanks for any feedback.
Proton
2. DuckMan | June 17th, 2007 at 12:30 pm
Hi Proton! Thanks for your very relevant question – but I have some good news!
You can still install the plugin without any worries – any existing robots.txt takes precedence over changes made by this plugin – I’ve added answers to them in the main text of this post, above.
3. Links, Articles and More … | June 17th, 2007 at 2:37 pm
[…] SEO_Wordpress: Plugin to maximise search engine positioning in WordPress – UtheGuru.com […]
4. Supplemental Savior: SEO-… | June 18th, 2007 at 12:38 am
[…] who told me about this great plugin called SEO_Wordpress which you can download from http://www.utheguru.com/seo_wordpress-wordpress-seo-plugin which should help a great deal with the problems I was discussing in the SOS series […]
5. Google Webmasters Help FA… | June 18th, 2007 at 8:01 am
[…] SEO_Wordpress plugin for WordPress […]
6. Weblog Tools Collection … | June 18th, 2007 at 3:40 pm
[…] SEO_WordPress reduces duplicate content on your blog. The plugin essentially ‘herds’ googlebot and other spiders to the content you want indexed. The result? A much more search engine friendly blog and better indexing. […]
7. CodeScheme: WordPress PHP… | June 18th, 2007 at 6:14 pm
[…] SEO_WordPress Plugin is interesting in the context of Google’s increased (over the last 2-3 years anyway) use of their supplemental index. The plugin is another method for reducing crawled duplicate content – however, with the use of a robots.txt as well standard noindex… time will tell. Share: […]
8. SEO_Wordpress: Plugin to … | June 18th, 2007 at 10:36 pm
[…] more info and download…. […]
9. Jenny | June 19th, 2007 at 2:26 am
Interesting. I shall try it out.
10. Funky Dung | June 19th, 2007 at 6:51 am
You mention category pages. Any chance a future version will account for Ultimate Tag Warrior tag pages?
11. WordPress, Duplicate Cont… | June 19th, 2007 at 7:38 am
[…] seen recently a lot of plugins for WordPress aimed at taking care of the duplicate content issue in search engines. Don’t get me […]
12. James | June 19th, 2007 at 3:21 pm
I think what Proton was saying is that it doesn’t seem like the category templates in WordPress actually have their own head tags.
They use a php function to call the header which has the head tags.
At least that’s what it seems like in my theme editor. Can we add meta name="ROBOTS" content="noindex,follow" to the head section of the header template or does it need to be used specifically in the category section?
Yes, I know I can just use the plugin but I already have a robots.txt and have the know how to do this manually so what the heck.
13. DuckMan | June 19th, 2007 at 3:31 pm
Sure you can – but header.php acts as the header for EVERY page, not just category pages.
So, to do it manually, you’ll need to insert the meta tag in header.php somewhere within the head tag, and you’ll need to use a conditional PHP statement so that the tag only echoes on category pages.
14. James | June 19th, 2007 at 3:49 pm
I guess that’s what I’m asking. Does the plugin just add head tags to the category archives? Is it that simple?
meta name=“ROBOTS” content=“noindex,follow”
Can I just add that to my category template or will that throw off the fact that it’s calling the header?
I guess I could just install the dang plugin and see what happens 🙂
btw, thanks for your detailed explanation of the supplemental archive and how google sees wordpress sites. It was very informative.
I actually saw another plugin very similar to this but wasn’t convinced that I should block my category pages from being archived. Your insight helped me a lot.
Thanks again for this plugin.
15. DuckMan | June 19th, 2007 at 3:56 pm
No worries – I enjoy this stuff.
Adding the tags in the category.php will unfortunately not work, as category.php contains the information within the body tags – all the head stuff is in header.php – so if you were to add the tags in category.php they’d look like gobbledegook to google.
Cheers, M
16. Site Bug Fixes : Bob Plan… | June 19th, 2007 at 3:56 pm
[…] I added the SEO_WordPress plugin, which helps get search engines to ignore what they perceive as duplicate content on my […]
17. James | June 19th, 2007 at 4:01 pm
I enjoy it as well. One question though. I installed the plugin and I don’t see any changes in my header.php. Does the plugin handle the robot command without actually adding that line or have I done something wrong activating the plugin?
18. DuckMan | June 19th, 2007 at 4:29 pm
Hi James – the plugin ‘hooks’ into the header.php file rather than actually changing it.. that’s the beauty of plugins – you don’t have to remember the changes you made each time you upgrade.
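If you’re curious about the mechanism, it looks roughly like this – a simplified sketch only (the function name is made up for illustration, it isn’t the plugin’s actual source):

<?php
// Output a noindex,follow robots meta tag on category pages only.
function example_noindex_category_pages() {
    if ( is_category() ) {
        echo '<meta name="ROBOTS" content="noindex,follow" />' . "\n";
    }
}
// The wp_head hook fires inside the <head> printed by header.php,
// so the theme file itself never needs editing.
add_action( 'wp_head', 'example_noindex_category_pages' );
?>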
If you want to check whether it works, check the actual source of your category pages using the technique I describe at the end of the article. (firefox etc)
19. links for 2007-06-19 : Bo… | June 19th, 2007 at 5:17 pm
[…] SEO_Wordpress: Plugin to maximise search engine positioning in WordPress – UtheGuru.com […]
20. Investorblogger | June 21st, 2007 at 4:11 am
Well, my supplemental pages are now down, as are my indexed pages… Is that good?
I don’t know yet.
Kenneth
21. Learn C++ - A tutorial an… | June 21st, 2007 at 10:08 am
[…] imagine my happiness when I ran across another plug-in, SEO_Wordpress, which tweaks robots.txt, so that the villainous Google spiders stay away from all of that […]
22. GadgetLite » SEO_Wo… | June 21st, 2007 at 3:42 pm
[…] must have plugin for users of WordPress—it simply reduces duplicate content on your WP blog (click here to find out more about what that means) through the clever use of a combination of robots.txt and noindex. This effectively […]
23. theDuck | June 22nd, 2007 at 12:44 am
Hi Kenneth!
Thanks for mentioning that – what you mention is normal behaviour – the way this plugin works, your category pages will disappear from the index over time – this may take days, weeks or months depending upon how often and deeply your site is crawled.
Over the same period, you should see more and more of your content pages moving from the supps to the main index – exactly what we want.
Cheers,
Matt
24. theDuck | June 22nd, 2007 at 12:46 am
The take home point is that with this plugin, patience is a virtue – You’re not going to wake up one morning and notice ‘hey – All my pages have come out of the supps!’ – what’s more likely is that it will be a very gradual, very steady process.
Cheers,
M
25. Jompeich d’er Bisen… | June 22nd, 2007 at 6:29 am
[…] SEO plugin for WordPress, lots of information about robots and META tags […]
26. Yoo | June 24th, 2007 at 11:03 am
Hi there, I have a little thought: my blog right now has a PR of 2 and this blog has no PR. Did you not install this plugin on this blog, or am I a better optimizer than you?
27. theDuck | June 24th, 2007 at 11:10 am
Yoo,
Toolbar PR is updated only about 3 or 4 times a year – the last update was about three months ago.
Given that this is a relatively new post (less than two weeks), it has no toolbar PR yet – the toolbar PR you are seeing is what this page was 3 months ago – since the page didn’t exist then, it’s nothing.
If you actually go back to my homepage, you’ll see the site itself is PR5.
And yes, I use and have used my plugin for quite some time now.
Cheers,
M
28. A complete list of search… | June 27th, 2007 at 4:57 pm
[…] 11. SEO WordPress Plugin – This plugin is exactly what the name suggests, a smart plugin that creates a robots.txt file for you with the disallow statement to block certain directories from being indexed by the Google bot. I had not seen this plugin on most sites, but I currently also use a robots.txt file manually to block trackbacks, feeds, archives, and other directories from being indexed. I think this will be a very useful and effective SEO plugin. Download SEO WordPress Plugin […]
29. Wordpress plugins - SEO W… | June 28th, 2007 at 10:42 pm
[…] always thought wordpress is great for SEO by itself, but you never know, right? Anyway, here is the SEO WORDPRESS PLUGIN for you to download. Have fun with it. Share and Enjoy: These icons link to social bookmarking […]
30. Terry | June 29th, 2007 at 12:33 pm
I activated the plugin but I did not find the robots.txt in my root folder afterwards so I manually added one with a few extra lines relevant to my site.
I have my wordpress installation in a subfolder called /wp and I wonder how the robots.txt file should be written to allow for this.
31. theDuck | June 29th, 2007 at 1:04 pm
Googlebot and Yahoobot both understand the wildcard (*) which means ‘any string’ so I’ve used that in my robots.txt to make this applicable to people like yourself – Test that it works as desired in your google webmaster tools account using the robots.txt validator.
Good luck!
M
User-agent: *
Disallow: */trackback*
Disallow: */wp-*
Disallow: */feed*
Disallow: */20*
User-Agent: MediaPartners-Google
Allow: /
32. Terry | June 30th, 2007 at 12:04 am
Thanks for your quick reply which answered my question perfectly. Very useful site by the way.
33. proton | June 30th, 2007 at 10:55 am
Hi Matt,
I just wanted to update you on how things were going. I installed the plugin about two weeks ago and I can see that it’s now starting to work. The duplicate pages are starting to be removed and pages that were in the supplemental index are now being moved to the main google index.
Now if you can just write a plugin to get google to do it faster!
Thanks for giving us a great addition to WP.
Proton
34. theDuck | June 30th, 2007 at 1:34 pm
Good on you Proton – and thanks for the update.
Folks – Proton, whilst being tongue in cheek, makes a good point – can you speed the process up? Yup – good old fashioned link building – throw a few links at supp pages and that’s often enough to trigger a crawl.
Cheers,
M
35. Things I do to optimize a… | July 6th, 2007 at 7:37 pm
[…] One SEO pack – optimizes titles, makes sure Google doesn’t spider duplicate content – or the SEO_Wordpress plugin – both prevent duplicate content spidering and generally improve the site for search […]
36. Jonathan | July 7th, 2007 at 1:03 pm
I have a question: is it SEO genius, or SEO suicide to noindex,follow the home page? The reason I ask, the only things on that page are the posts, and by the time it gets indexed, those posts have already vanished. Doesn’t seem very helpful.
Also, if I go ahead and do this, do I still keep my PR to give to those new posts? Or is that just written off? I’ve already index,nofollow’d pages of the index, as well as archives.
37. theDuck | July 7th, 2007 at 1:39 pm
Hey Jonathan!
Wow – I love it how people keep asking great, succinct questions that make me sit back and think! Cool – thankyou!
Ok – so lets deal with it bit by bit..
If you did decide you wanted to try doing that, I’d ensure that you used noindex,follow – definitely don’t use nofollow on your index page as it’s often the entry page for crawlers (probably because of its high PR – a hunch) – if you used nofollow you’d prob cause crawling difficulties.
So – what’s my opinion about doing that? I think that would be one to consider carefully. Why? Because the architecture of most sites dictates that PR flows to the index page – so you will find (check google analytics if you have it) that a fairly large number of folks come via that page.
If you are concerned about the page causing ‘duplicate issues’ my advice would be to do the following –
1. Use the <!--more--> tag on your posts, so that only the first few paragraphs of each post show on the front page (I do this on a regular basis) OR use the ‘optional excerpt’ feature in WordPress to write a short summary of each post – that will then appear on the front page – if you do this correctly, it can actually help enhance your site’s relevancy (ie add more ‘unique content’)
2. Avoid having just one post on your front page – set wordpress to show as many posts (or excerpts) as possible on the front page – this makes the front page look ‘more unique’ and should help ensure it isn’t chosen by google for a given search in favour of your individual posts.
Ok.. on to your second question..
Little known fact – pages that are noindex’d (or excluded using robots.txt) can still accumulate pagerank – after all, they still have pages linking to them, which is what PR is based upon.
But there is a difference (warning – personal opinion follows) – a page that is robots.txt’ed out, whilst it can still accumulate PR, can’t pass it – Why? Because crawlers are excluded from it, so they can’t see where to pass the pagerank it has accumulated. The same would apply if you used noindex,nofollow.
The difference with the meta tags approach is that you can use noindex,follow – meaning that your page will not only accumulate PR, but can also follow the links on the page (pass PR) – it just won’t appear in the index.
Cheers and I hope I was clear enough with my answers
All the best,
Matt 🙂
38. Jonathan | July 7th, 2007 at 1:53 pm
It was very clear, Matt. I’ll give it more thought. I know not many people have tried it (at least, not on purpose), but my site is a personal blog and it’s probably worth aggregating the data versus any potential losses.
39. Jonathan | July 7th, 2007 at 1:55 pm
Oh, I must add, the main reason for wanting to noindex the root is not entirely for the robots. I had thought about the duplicate content issue, but the main thing is if someone googles “widgets” and your site turns up as the first hit—which happens to be your home page—what do they do if they get there and can’t quickly find what they’re looking for? They’ll probably leave. Like you said, I simply want to direct them to the real content.
40. theDuck | July 7th, 2007 at 2:07 pm
Ah yes – I see your point there.
There would probably be no harm done in trying it, but it’s something I’d personally be reticent to do – as if done correctly (ie using excerpts, more tags) it’s pretty rare that Google misses picking ‘the right’ page due to a higher PR alone – Google is pretty darn good at picking the most relevant page, in my experience.
The only exception would probably be if you had a site that was all about widgets – in that case Google would prob pick the index page if you searched for widgets, as it would think – aha – here we have a high PR page, with lots of stuff about widgets – that’s prob our very best bet in terms of relevancy.
The other exception would be if Google had not crawled the post yet, only the index page – in which case the widgets page of course wouldn’t show up – but if that were the case, the index page would only show predominance for that search term until the next deep crawl (usually a number of days for most sites). In that case what would using noindex do?
Well, you’d have NO results from your site showing up for widgets in the intervening period – personally I’d take the traffic, even if it wasn’t laser targetted 😉
Cheers 🙂
Matt
41. Jonathan | July 7th, 2007 at 2:12 pm
🙂 You know, you’re probably right. I knew there had to be a reason no one was doing it! I’ll just go back to adding the more tag. Thanks for your help!
42. theDuck | July 7th, 2007 at 2:14 pm
No, THANKYOU 🙂 It’s cool to get some more thought provoking questions.
Cheers,
Matt
43. The WordPress Podcast … | July 8th, 2007 at 6:38 pm
[…] SEO WordPress reduces duplicate content on your blog. The plugin essentially ‘herds’ googlebot and […]
44. SEO WordPress Plugin … | July 10th, 2007 at 7:48 am
[…] Plugins Page | Download […]
45. Brian | July 11th, 2007 at 1:23 am
Hi everyone,
I installed this plugin last week. I added the lines below to my existing robots.txt and activated the plugin. The problem is that now Google has stopped crawling 21 urls because it thinks that these urls are restricted by the robots.txt. Any idea why that is? It is a result that I certainly wasn’t expecting. Thanks in advance for any help.
Brian
Disallow: */trackback*
Disallow: /wp-*
Disallow: */feed*
Disallow: /20*
User-Agent: MediaPartners-Google
Allow: /
46. Brian | July 11th, 2007 at 3:05 am
Just realized what was causing the problem. Because all of my posts start with the year, month, and day, followed by the post name, the /20* was telling the search engine bots to deny access to the pages. If I had waited long enough every page on my site would have denied the search engine bots access. I had to remove the /20* from the robots.txt to prevent further denial.
The down side is that if I change the date settings for the urls to a name url it will affect the indexing with the search engines. Am I correct with that assumption? If so, the SE bots will crawl the archives. Is there any way around this?
Thanks,
Brian
47. theDuck | July 11th, 2007 at 8:33 am
Hi Brian!
I’m sorry to hear you had problems 🙁 that’s upsetting.
I tried to make the plugin in such a way that it could be ‘all things to all people’ – hence the reason a manually written robots.txt will override my defaults.
So folks, if you have manually set your permalinks to be of the form url/year/month/day/title, you must indeed omit the */20* from your robots.txt
Ok – but if you still want to exclude the archives – no problems.
See my tutorial about changing your permalinks – here – and change your permalinks to a less complex style.
But BE SURE to install Dean’s permalink redirection plugin (download it here) first.. this will tell the search engines you’ve changed your url’s – and help make sure that there is no adverse impact.
48. theDuck | July 11th, 2007 at 11:48 am
Brian,
Thanks to your advice I’ve updated the plugin. It no longer includes the robots.txt exclusion for the date archives, and instead implements the same thing using noindex,nofollow.
This should fix your problem 😉
Cheers,
Matt
49. Brian | July 12th, 2007 at 6:04 am
Hi Matt,
Thanks for updating the plugin. I downloaded it and installed it last night. Nice to see that you update your plugins when there is a problem. Most don’t bother.
In the time that it took me to write that comment here yesterday, 16 more urls were restricted. I now have 59 restricted urls that are no longer anywhere on Google (not even in the supplemental index). Now that the problem has been corrected, hopefully they will come back. I had 25 pages that were originally indexed on the first page of Google’s main index, only to find them within 2 to 3 days in the supplemental index.
I don’t know how much you know about the ultimate tag warrior plugin but was wondering if there is any way around the page duplication due to the page tags that UTW creates. If I deactivate the plugin I don’t know what effect it will have with the search engines.
Thanks again for your help,
Brian
50. theDuck | July 12th, 2007 at 10:31 am
Hi Brian 🙂
No Probs – your disallowed pages will return at the next crawl – that might be anywhere between a few days and a few weeks, but judging from the rapidity with which your site was crawled initially, I would say it will not take long.
Cheers,
M
51. Milan Dinić | July 12th, 2007 at 10:48 pm
Hi Matt!
I found your blog through a list of WordPress SEO plugins, found your earlier posts very useful and interesting, and subscribed to your feed.
I see that this plugin is very good, but I must note that its functions have already been implemented in other plugins [EDITED – Yes I know that other plugins provide similar functionality]. What people get here is an excellent explanation of all that stuff and of creating a robots.txt file.
I saw that the Duplicate Content Cure plugin adds noindex to one more type of WordPress page — the archive pages you reach by clicking “next” or “previous posts” at the bottom of the index page, for example this page. What do you think about that? Is this a good or a bad thing to do? I saw that some even add those pages to robots.txt and their sites rank very highly in Google.
Also, I must ask you something more: if we don’t want bots to index our category pages, is it then useful to add rel=”nofollow” to links to those pages, in the sidebar for example? Maybe I am wrong, but every “followed” link takes away from the page’s PageRank, so why lose it on pages that will not get into the index anyway.
Milan
52. theDuck | July 13th, 2007 at 11:04 pm
Folks, I just thought I’d add – don’t expect overnight results from this plugin. Googlebot rarely visits and crawls your site in ‘one hit’, and until it has crawled every page and recalculated your pagerank you won’t see major results. Generally changes with this plugin will take about a month to kick in – patience, grasshopper 🙂
Ciao,
Matt
53. Rafael Slonik | July 14th, 2007 at 5:17 am
Considering the index page almost always has the best PR of any URL on the site, is it a good thing to noindex this page?
I was thinking about putting a brief excerpt on the home page, but the newer versions of WP do not generate these snippets automatically. Do you know of a plugin to do that, or could you add this feature to your plugin?
54. theDuck | July 14th, 2007 at 11:12 am
@ MILAN – Thanks for your compliments.
Milan, some might choose to block the ‘older posts’, ‘newer posts’ buttons, but I personally don’t – why? Because I see it as a legitimate form of navigation for the bots.
The robots.txt you reference belongs to John Chow – believe me, there are other reasons (including a monstrous pagerank and a tendency for all his individual posts to be linked from numerous sources) that John Chow is well indexed.
The category pages are one of the key ‘crawl paths’ for robots if you check your logs. If you rel=”nofollowed” them, you’d essentially remove that path – I’d be careful of doing that.
@RAFAEL:
Please see the comments from Jonathan above – I totally agree with you Rafael – this plugin DOES NOT prevent indexing of the index page.
I’m not sure that WordPress has ever generated excerpts automatically? The ability to write your own is definitely still in the new version, and you can still use the <!--more--> tag to split up your posts.
M
55. jjk | July 17th, 2007 at 10:42 pm
Hi Matt!
Firstly may I say – excellent blog and blog video!
I have installed your plugin and Dean’s permalink redirect – both work a treat.
My question as a bit of a novice is understanding the:
‘Disallow: /wp-*’ part of the robots.txt you suggest. As there are files and folders beginning with ‘wp-‘ are they all disallowed or only the folders? Also – is it the ‘wp-comments-post.php’ file in the root that handles the posts, which I assume is robots allowed?
Sorry for the divvy question but would love to understand the structure a little more.
Keep up the good work!
jjk
56. theDuck | July 17th, 2007 at 11:44 pm
Hi JJK!
Thanks for your kind words 🙂
Ok – the Disallow: /wp-* simply stops search engines from indexing things like wp-login, wp-admin etc – the * is a wildcard character – it matches anything beginning with wp-.
In wordpress these are things like wp-login, wp-admin etc – all url’s to do with the administration of the site and not things we want google indexing.
Comment url’s are usually of the form http://www.mydomain.com/postname#comments, so they’ll be indexed just fine.
Cheers,
M
57. Milan Dinić | July 17th, 2007 at 11:46 pm
Unfortunately, I don’t have access to my log files, but when I search for the indexed pages of my blogs I see that after the home page, the category pages rank highest.
What are your suggestions for “nofollowing” in a WordPress blog? I added it to the link to the author page, to trackbacks (I see that you did that too) and to RSS feeds.
By the way, above in the post you suggest adding the line
Disallow: /20*
to robots.txt, but that is not in your own robots.txt – and if the archives already have “noindex”, why not allow them (as you already suggest for the category pages)?
58. theDuck | July 17th, 2007 at 11:55 pm
Hi Milan,
I just totally block RSS feeds using robots.txt – I don’t think it adds value to have them indexed (isn’t it annoying when you do a search and come up with an ugly RSS) + RSS readers / aggregators don’t obey robots.txt, so you’re not going to cause problems by blocking them.
As for the other parts of the blog I choose to noindex, they’re detailed in the post above so I’ll try to avoid repeating myself and refer you to the text, above 🙂
I’ve left the disallow: /20* in my example above for people who prefer to manage their own robots.txt rather than using my plugin – this is in part an educational post, and I’d rather leave it there so that people can understand the rationale behind the plugin.
Cheers,
Matt
59. Milan Dinić | July 18th, 2007 at 12:14 am
Thanks again for your answers!
About RSS feeds: why is it important to have links to them if we can already ping the services and they are informed about new posts? Also, if we link to another page of our site, and that page is blocked from crawlers by robots.txt, does the link still take something from the linking page’s PageRank, as any other link does?
Because of my understanding of PageRank (which may be wrong), I thought about adding rel=”nofollow” to links to pages I don’t want to pass any value to, even if they are already blocked in robots.txt, because otherwise they take from the linking page’s PageRank. Sorry if I’m boring you with these questions.
Milan
60. jjk | July 18th, 2007 at 12:29 am
Thanks Matt,
My concern is that every single WordPress page or file in my root has a wp- prefix – giving rise to the concern that I’ll disallow every last one of them. I have nothing like /postname#comments. As I don’t have a great understanding my worry was that if my wordpress structure is in any way different I could be disallowing the pages/folders I want allowed when using this in my robots.txt file.
I’m on WP 2.1
Thanks again.
jjk
61. theDuck | July 18th, 2007 at 11:19 am
Milan –
Nofollow is intended to designate links that you don’t trust – these are usually links external to your site, hence the name ‘external nofollow’. I personally wouldn’t take that approach but others have and do – that’s up to personal choice. This link talks a bit more about nofollow.
JJK – You mean things like wp-login, wp-rss, wp-comments-post etc. They are mostly all internal files. If you use a program like xenu link sleuth (free) to crawl your site you’ll see that your content pages don’t use that prefix.
In any case, as I’ve stated before, you can very simply override the robots.txt in this plugin by adding your own in your webroot folder – could be something like this:-
User-agent: *
Allow: /
Matt
62. jjk | July 18th, 2007 at 5:30 pm
Thanks Matt – that’s what I needed to know. I’m more confident now!
jjk
63. Stephen Cronin | July 21st, 2007 at 12:27 am
Hi Matt
Thanks for a) this excellent plugin and b) the excellent comment you left on my site.
Thanks!
64. Vermeidung von duplicate … | July 26th, 2007 at 10:01 pm
[…] I’ll look into this more closely for my own blog too, but I’ve already installed the SEO_Wordpress plugin and the All in One SEO Pack, which are already doing a good job. duplicate content […]
65. Idiotprogrammer » A… | July 27th, 2007 at 9:58 am
[…] the SEO_Wordpress plugin […]
66. 100 WordPress Plugins… | July 28th, 2007 at 11:13 am
[…] SEO_Wordpress Plugin to maximise search engine positioning in WordPress. […]
67. ovidiu | July 31st, 2007 at 7:23 am
regarding i.e. Disallow: */trackback* inside robots.txt
I have been using this syntax too, but recently found a site that could test robots.txt and the result told me my robots.txt was invalid because a rule did not start with a “/” – so now I am quite unsure if it’s ok to use something like */trackback* inside robots.txt?
can you confirm this is ok?
68. theDuck | August 1st, 2007 at 12:20 pm
Hi Ovidiu –
Aah.. the wildcards. Why have I got them? Well, some people install WordPress in a subdirectory – so having the robots.txt wildcarded (a la */trackback*) ensures the robots.txt is still effective in such cases – important for a plugin designed to work across all installations.
Robots.txt checkers like the one you describe check against the (old) robots.txt working group specs, which have now been superseded.
Whilst wildcarding is not specifically described under the original robots.txt working group specifications, it IS supported by the major search engines (Yahoo, MSN, Google).
If you’re ever in doubt, the easiest thing to do is go to Google webmaster tools and use their robots.txt validator – it’s a great tool and free 🙂
Cheers,
Matt
69. ovidiu | August 2nd, 2007 at 7:24 am
Thanks. I totally agree with the use of wildcards, I was just not sure if they are “kosher” – btw I am using Google webmaster tools, I have just been waiting 3 days in vain for the googlebot to pick up my modified robots.txt – so meanwhile I asked here.
70. theDuck | August 2nd, 2007 at 9:10 am
Ovidiu – You don’t have to wait for the new robots.txt to be found! You can just cut and paste your new one into the webmaster tools tester in the interim to see if it works.. that’s the beauty of that particular tool – you can try out loads of different variations and test them in real time.
All the best,
Matt
71. Peter | August 2nd, 2007 at 7:28 pm
Hi, I’ve installed the SEO pack and filled the gaps but after refreshing my page I only got a white page. What could be the problem? There is the line in the header.php.
Thanks
72. theDuck | August 2nd, 2007 at 8:01 pm
Hi Peter – I’ll need a little more information – things like your url etc and a bit more thorough description of the problem would be appreciated.
M
73. jjk | August 13th, 2007 at 1:35 am
Hi Matt,
It appears that the plugin and robots.txt are gradually starting to work. One question – On my Google webmaster tools page in the sitemap section I have a warning that says: “we found that the site’s robots.txt file was blocking access to some of the URLs” – my sitemap is automatically updated and obviously includes some of the blocked URLs. I have no idea whether there could be a dodgy conflict, therefore should I dump the sitemap or just ignore the warning?
many thanks
jjk
74. theDuck | August 13th, 2007 at 9:49 am
Hi jjk,
Yes, as explained above, this plugin does use robots.txt to stop google indexing duplicate content like RSS feeds etc. Click on the warning in webmaster tools and it will give you a list of the blocked content.
Matt
75. jjk | August 13th, 2007 at 6:34 pm
Sorry Matt – I’m obviously useless at making myself clear. I absolutely understand what the robots.txt file is doing and which files it has blocked and why. My question is should I get rid of my Google XML sitemap as Google is warning me that its contents don’t match what its spider can search (I fully understand that the sitemap lists all pages including the blocked ones like archives etc). This could well be ok – I just wondered if it’s better to dump the XML sitemap to avoid showing the spider a map of the pages and then blocking its access to some?
For other potential users. After around 3 weeks now my correct (non duplicate) blogs are starting to appear again in the Google top 10 having disappeared for a while. This is exactly what I read would happen so I’m delighted.
jjk
76. jjk | August 14th, 2007 at 6:25 pm
I have managed to re-configure my WordPress Sitemap plugin to not include the archives pages. Hopefully this will eliminate the Google sitemap/robots.txt warning.
jjk
77. AskApache | August 14th, 2007 at 9:46 pm
That is really great! I just stumbled onto your site and I can’t believe you beat me to it! SEO Optimized robots.txt template for WordPress
🙂
78. wp seo | August 19th, 2007 at 3:36 am
We just released a new kind of plugin.
Still in French, v0.1, released today.
It gives your page the 301 + url, 302 + url or 404 error code from the “edit post” page.
Translation needed! (an email address can be found on our site if you can help)
http://www.wordpress-seo.com/seo-http-error-manager.php
How it works:
Download, copy to the plugins directory, activate.
When editing you can choose Leave, 301, 302 or 404 (and a url for 301 and 302).
Licence: “can be stolen” 😉 – and we still don’t care about people who want to put a licence on “nothing” like this!
Comments to make it better are welcome.
Sorry for my English.
++
79. Martin | August 19th, 2007 at 11:42 am
I was looking to create a robots.txt file for my site and found this site; I just have a few questions:
1 – to use this plugin should I first change my URLs to be more SEO friendly? They are currently year/month/date/title.
2 – I have the All in One SEO plugin installed and it has noindex for categories and archives; do I still need a robots.txt?
3 – should I not include archives and categories in the Google sitemap generator?
80. Chuck L | August 19th, 2007 at 3:09 pm
This plugin doesn’t help me optimize each page, category and post does it?
81. theDuck | August 19th, 2007 at 6:28 pm
Martin:- The plugin you’re talking about does not do anything to the robots.txt – if I were you, since you’ve already got it, I’d keep that plugin installed and just manually add the robots.txt
Re: Sitemap generator – you should leave them as is. This plugin doesn’t actually stop google crawling those pages, it just asks google not to index them – but still allows google to parse the page and follow links.
Re: making your url’s search engine friendly – that’s always a good idea to do, but there’s no specific requirement to do it prior to installing this plugin.
If you’d like to know how to make your url’s more search engine friendly, try My Permalinks Tutorial for WordPress here.
Chuck L – I’m not sure what you mean when you say ‘optimize each page, category and post’ – the word ‘optimize’ can mean a great number of things when it comes to SEO. If you can elaborate a little more?
wp-seo – I’ll have a look at your plugin tonight and see what I think.
Cheers,
Matt
82. Martin | August 20th, 2007 at 12:28 am
thanks Matt,
I ended up changing the URLs because I thought what you said made a lot of sense in terms of SEO, and I used the plugin to redirect all my old links.
One thing I did notice was that even though I set the URL to /name/post-id/, when 2 URLs were the same WordPress would put a number on the end of one of them to make it unique, so I just stuck with /name/.
Matt, I do not have
83. Martin | August 20th, 2007 at 12:41 am
Ok, I didn’t think I had a robots.txt but I just checked and it looks like one has been created for me, but when I check it in the URL all I have is
User-agent: *
Disallow:
What should I add to it? I don’t know much about creating a robots.txt, which is why I was looking for a plugin to do it automatically.
84. Wordpress Search Engine O… | August 21st, 2007 at 8:40 pm
[…] SEO WordPress is another SEO plugin that helps you save your blog from triggering a duplicate content alert just like Permalink Redirect plugin except that it uses a custom robots.txt file to handle the search engine spiders. This plugin does not use redirects to set only one permanent URL to your blog posts rather it disallows search engine bots and spiders to access those regions where accidental duplicate content may be found. […]
85. Geekie.org » WordPr… | August 22nd, 2007 at 1:38 am
[…] be resolved. There are many plugins that can help a blogger optimize his WordPress site, including SEO_WordPress, which will assist in deterring search bots such as Googlebot from duplicate pages (like /trackback […]
86. Search Engine Optimizatio… | August 27th, 2007 at 6:16 am
[…] SEO WordPress is an additional SEO plugin that helps keep your blog from triggering a duplicate content alert, just like the Permalink Redirect plugin, except that it uses a custom robots.txt file to handle the search engine spiders. This plugin does not use redirects to set a single permanent URL for your blog posts; rather it disallows search engine bots and spiders from accessing those regions where accidental duplicate content may be found. […]
87. Josh Galvan | August 29th, 2007 at 12:18 am
Hi Matt,
First off, I love your program. Thank you so much for it. As well as all the information you provide on making sure our blogs are fit for the search engines. Your site is making me an insomniac!! 🙂
But I guess this is more a general question in regards to using your program. I want to specifically create my category pages to have them rank separately for each of their own keywords on top of my main domain. For example, I want to get a good pagerank for http://www.mydomain.com and then on top of that http://www.mydomain.com/category/fill-in-example.
So would it be a bad idea to have the meta tag of ‘noindex,follow’ on the category pages? Or am I just totally off base in thinking I can successfully achieve a good pagerank for both the main domain and the category page 🙂
Thank you in advance!
88. fa1z | September 5th, 2007 at 12:53 pm
Hi duck
Sorry to ask, but does this plugin noindex tag pages too? I have many problems with tags and duplicate content and I had to remove UTW, but I love that plugin. When I saw your plugin I thought it could solve my problem, but I still don’t know – you said this plugin noindexes my home page and my category pages, but you didn’t speak about tag pages. Can you help me with this please? Sorry for my English.
89. Is your blog duplicate co… | October 1st, 2007 at 8:56 pm
[…] to Utheguru.com […]
90. Normanski Bytes… | October 3rd, 2007 at 8:32 am
[…] conflicting with WP-Cache version 2.1.2. Thus I deactivated the plugin and used another plugin SEO_wordpress, I really love the functionality of All-In-One SEO but I guess I’ll have to wait until the […]
91. SeanPAune.com » Blo… | October 10th, 2007 at 11:42 am
[…] SEO_Wordpress – This one is pretty geeky, but is just about cleaning up the SEO of your WordPress install. […]
92. BlogcuBlogu.com » W… | October 28th, 2007 at 10:51 am
[…] SEO_WordPress: a plugin that steers the search engine spiders visiting and indexing your site around it, helping to increase the number of indexed pages and bringing your older posts back to the fore. […]
93. Freddy | October 29th, 2007 at 5:22 pm
I’ve never used your plugin before, but I certainly intend to.
That was a very entertaining introduction.
You are an engaging and most readable writer.
94. theDuck | October 29th, 2007 at 5:29 pm
Thanks Freddy for your kind words 🙂 Glad to have helped.
Matt
95. SEO_Wordpress Plugin | iC… | October 31st, 2007 at 2:58 pm
[…] SEO_Wordpress: Plugin to maximise search engine positioning in WordPress Published in October 30th, 2007 Posted by cesarnoel in Tips, SEO SEO_Wordpress builds upon an earlier plugin, dupPrevent. This plugin is designed to reduce the incidence of ’supplemental results’ inherent to WordPress installs. For more information click here. […]
96. Normanski » MySQL O… | November 8th, 2007 at 6:59 am
[…] is conflicting with WP-Cache version 2.1.2. Thus I deactivated the plugin and used another plugin SEO_wordpress, I really love the functionality of All-In-One SEO but I guess I’ll have to wait until the next […]
97. WordPress SEO | sitedefte… | November 17th, 2007 at 9:22 am
[…] upload it to the plugins folder and activate it from the plugins section. For detailed information, visit http://www.utheguru.com/seo_wordpress-wordpress-seo-plugin […]
98. Nick | November 22nd, 2007 at 1:20 am
Hi,
What is the difference between your plugin and all-in-one-seo-pack?
99. Whatever-ishere | November 22nd, 2007 at 3:56 am
thanks for the GREAT post! Very useful…
100. InternetGeeza.com »… | November 28th, 2007 at 2:36 am
[…] http://www.utheguru.com/seo_wordpress-wordpress-seo-plugin […]
101. Seo Specialist | December 9th, 2007 at 1:47 am
This looks like it has the potential to be a good plugin, so I’m off to put it through its paces.
102. manele | December 11th, 2007 at 11:36 pm
thx for the plugin! It’s better than editing the theme all the time 🙂
103. houserocker | December 15th, 2007 at 2:01 pm
thanks so much for this great and easy to use tool!!!
104. Michael audi | January 11th, 2008 at 11:36 am
Thanks for this very nice post 🙂
105. bobby | January 13th, 2008 at 1:01 pm
nice
106. bobby | January 20th, 2008 at 2:12 am
very good information, thanks
107. Aurelius Tjin | February 8th, 2008 at 2:07 pm
This is obviously one great post. The information is very insightful and helpful. Thanks for sharing all of this.
108. Mister Olympia | February 16th, 2008 at 7:29 pm
Strange, but my browser shows an error after that
109. Hosting | March 7th, 2008 at 5:50 am
I totally agree with the use of wildcards, I was just not sure if they are “kosher” – btw I am using Google webmaster tools. Thanks for your article! 😀
110. manele noi | March 7th, 2008 at 10:53 pm
thanks so much for this great and easy to use tool
111. hikaye | March 20th, 2008 at 1:03 am
SEO_WordPress: a plugin that steers the search engine spiders that visit and index your website around your site.
112. mirc | March 20th, 2008 at 1:04 am
SEO_WordPress: a nice plugin, thank you
113. john | March 26th, 2008 at 3:35 pm
I tested it already
it supports WP 2.5, wow!
114. SEO Consultancy | August 27th, 2008 at 8:56 pm
wow this is a cool plugin got it to work first time – nice one!
115. Wordpress SEO Plugin | September 11th, 2008 at 1:02 pm
WordPress is a great blogging platform and getting it search engine optimized really isn’t a hard task. Using proper permalinks and robots.txt file to prevent duplicate content and a few other methods can really make a difference. The All in One SEO plugin is really the best bet.
116. sgk | September 19th, 2008 at 3:07 am
Thank you so much.
117. Bournemouth | September 19th, 2008 at 11:22 am
Thanks for the great tips, I have already used the no follow attributes in my document links and will now be creating a robots.txt to further influence the spiders…
118. Surrey SEO | October 21st, 2008 at 2:09 am
Thanks, that’s really useful, especially the robots file. Thanks for the plugin, I am going to try it out soon.
119. squeaky | November 6th, 2008 at 7:50 am
I have to say that overall, WordPress is a very good blogging platform. But, without the SEO plugins, it isn’t very search engine friendly.
Since I have been using the all-in-one seo plugin and digging deeper into my robots.txt file, I am getting much better results in the search engines.
120. Seo Specialist | November 10th, 2008 at 9:17 am
Seems this could be competition for the SEO all-in-one addon, I will be giving it a try.
121. subzero | November 25th, 2008 at 9:00 pm
Just want to quickly share this good information. SEO challenge offering US 5000 for the winner. Competition will end in a few months. Find detail information here. Thanks.
122. mrilham | January 12th, 2009 at 5:23 pm
thank you. i need that
123. Chinese | February 17th, 2009 at 1:25 pm
Thanks. We will be looking at the robots.txt files and wordpress for our site.
124. Maths Tutor | April 3rd, 2009 at 1:01 am
I will look at WordPress and see whether it is useful to our site. It looks like it is hard to use for a non technical person.
125. Rockstar Sid | May 20th, 2009 at 1:41 am
Thanks a lot.. implementing the plugin on other blogs.. I assume they will work wonders as I came from the search engine searching for best seo plugins for wordpress 😀
126. SEO_Wordpress Guy | June 28th, 2009 at 7:02 am
WordPress is the best platform for implementing CMS websites for clients, and now there’s a decent SEO plugin – will check it out and recommend it to all clients…
127. SEO Consultant | August 12th, 2009 at 8:19 pm
Hello, I liked your article, especially
“So, we have THE EXACT SAME CONTENT replicated all over the place – this problem is known, surprisingly, as duplicate content”
As an SEO consultant I have been thinking about duplicated content – what could be done to stop competitors setting up duplicate sites on a .im or .net version of your site and trying to get you penalised for duplicated content?
128. bocelli vivo per lei | October 27th, 2009 at 9:00 pm
I have not understood one thing:
does the plugin you have created write the meta statement inside the category page?
I am using a template that does not have a category page… how can I handle this situation?
Any suggestions?
129. Luwig | December 8th, 2009 at 4:36 am
Hi Proton,
I am a newbie and much of the noindex/follow stuff is new to me. I have today installed the SEO plugin and robots.txt file prior to reading all your information and questions. Posted in 2007, so are you still active?
I took over the WordPress website, am adding new content, and added the meta tags and robots.txt file today.
I hope that I selected the correct options on the plug-ins?
I have another problem. Yahoo only sees 2 out of 4 back links to this site. This is better than Google. Google sees NO back links??!! Will this improve with the meta tags and robots.txt file or is there some other problem? I want to expand relevant back-links but want to first make sure that they will work in Google.
I will also download the Firefox tools. Will this be the latest version in 2009 or will Firefox automatically update these tools?
Many thanks for helping newbies and experts
Kind regards
Leostar
130. hdtvshop | January 14th, 2010 at 9:12 pm
Thanks for the great tips, I have already used the no follow attributes in my document links and will now be creating a robots.txt to further influence the spiders…