Supplementals and the Supplemental Index – a Primer

March 5th, 2007

One of the most annoying (and mysterious) of all SEO problems for many bloggers and website owners is the dreaded supplemental index.

In this tutorial / primer, I’ll explain what supplementals are, why they occur, how to identify them and how to solve the problems associated with them.

What is a Supplemental?

A supplemental result is defined by Google as follows:-

A supplemental result is just like a regular web result, except that it’s pulled from our supplemental index. We’re able to place fewer restraints on sites that we crawl for this supplemental index than we do on sites that are crawled for our main index. For example, the number of parameters in a URL might exclude a site from being crawled for inclusion in our main index; however, it could still be crawled and added to our supplemental index.

So, translated into plain English, supplementals are pages that Google considers not important enough to include in its main index, but not so bad or useless that they aren’t worth indexing at all.

How do I know if I have supplementals?

Firstly, go to www.google.com and enter the search site:www.utheguru.com (replace www.utheguru.com with your own URL). The site: in front of your URL is known as a search modifier. There are lots of different search modifiers, but in this case we’re using the site: modifier to tell Google to return all the pages it has indexed from www.utheguru.com.
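If you check several sites regularly, it can be handy to generate the search link rather than typing it each time. The snippet below is only a rough sketch: the function name site_query_url is made up for illustration, and it assumes Google’s usual search address of the form https://www.google.com/search?q=... It builds the link for you to open in a browser; it does not fetch or parse any results.

```python
from urllib.parse import urlencode


def site_query_url(domain: str) -> str:
    """Build a Google search link for the query site:<domain>.

    The function name is hypothetical; the URL pattern is assumed to be
    the standard https://www.google.com/search?q=... form.
    """
    query = f"site:{domain}"
    # urlencode percent-encodes the colon in the site: modifier
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # prints https://www.google.com/search?q=site%3Awww.utheguru.com
    print(site_query_url("www.utheguru.com"))
```

Paste the printed link into your browser and you get the same listing as typing the site: search by hand.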

There are a few misconceptions about what constitutes a supplemental result. Some people think that supplementals are what is returned when you click on the “repeat the search with the omitted results included” link at the end of a Google search. This is not the case.

That link actually shows ‘similar’ content that Google thinks might not be relevant to your search, and that content can be either supplemental or non-supplemental.

A supplemental result is one where the words “Supplemental Result” appear just under the ‘snippet’ (the short description of a site) in a Google search. The supplemental results usually appear in the later pages of a site: search, following the pages in the main index. If you click on the thumbnail below, you can see examples of both.

Google Site Search Instructions

Why Do I have Supplemental Results?

Supplementals usually occur for one of the following reasons (in order of increasing likelihood):-

Duplicate content from other sites – have you quoted content from other people’s websites? Does this content make up a large proportion of your page?

Google has sophisticated duplicate content filters that will detect this. Remember, it’s OK to quote other sites, but make sure you also have enough good original content on your site to ensure Google doesn’t think you are just plagiarising.

As a rule of thumb, no more than 50% of any given page should be quotes. If you are concerned that you may have too much duplicate content, head over to a site called Copyscape (www.copyscape.com) and run your page through their tool.
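For a quick sanity check against that 50% rule of thumb, you can count words yourself. This is a minimal sketch, not how Google or Copyscape measure duplication: the function name quoted_share is made up, and it assumes you paste in the page text and the passages you quoted by hand, and that the passages don’t overlap.

```python
from typing import Sequence


def quoted_share(page_text: str, quoted_passages: Sequence[str]) -> float:
    """Return the fraction of the page's words that come from quotes.

    Hypothetical helper: assumes every passage in quoted_passages appears
    in page_text and that the passages do not overlap.
    """
    total_words = len(page_text.split())
    if total_words == 0:
        return 0.0
    quoted_words = sum(len(passage.split()) for passage in quoted_passages)
    return quoted_words / total_words


if __name__ == "__main__":
    page = "one two three four five six seven eight nine ten"
    quotes = ["one two three", "nine ten"]
    share = quoted_share(page, quotes)
    # 5 quoted words out of 10, so this page sits right on the 50% line
    print(f"{share:.0%} of the page is quoted")
```

Anything well above half is a hint that the page needs more original writing around the quotes.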

Duplicate content from your own site – it is a sad fact that many content management systems (CMS) are great at helping beginners spend their time writing original content rather than learning web design and HTML, but really lag behind when it comes to being search engine friendly.

WordPress is one example of a CMS, and it will generally put duplicates of your content all over the joint. For instance, you’ll find this article on the front page of my blog, under the SEO Discussions category, and in the archive for March on this site, and each copy has a different URL. Find out about avoiding duplicate content in CMS like WordPress here.

Another cause of duplicate content can be canonicalization issues, where the www and non-www versions of your site are indexed as separate websites when in fact they are the same. Read more about them in our primer on canonicalization issues here.
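To make the www versus non-www problem concrete, here is a small sketch of what ‘picking one version’ means. It is illustration only: the function name canonical_url and the choice of www.utheguru.com as the preferred host are assumptions, and on a live site this is normally handled with a server-side redirect rather than in a script.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str, preferred_host: str = "www.utheguru.com") -> str:
    """Rewrite the www and non-www forms of a host to one preferred form.

    Hypothetical helper: preferred_host is whichever version you want
    search engines to treat as the real site.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Work out the bare (non-www) form of the preferred host
    bare = preferred_host[4:] if preferred_host.startswith("www.") else preferred_host
    # Only touch URLs that belong to this site; leave other hosts alone
    if host in (bare, "www." + bare):
        host = preferred_host
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Both forms end up at the same address
    print(canonical_url("http://utheguru.com/some-post/"))
    print(canonical_url("http://www.utheguru.com/some-post/"))
```

If both lines print the same address, the two versions have been collapsed into one, which is what you want a search engine to see.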

Not enough pagerank – is your site more than a few months old? Do you have many other sites linking to you?

If the answer to either of these questions is no, it’s likely that you are in the ‘sandbox’, a kind of purgatory between being indexed and being deindexed.

Some people claim the ‘sandbox’ is an actual step one needs to go through (i.e. 3 months of not being indexed) while Google gains trust in your site, but that’s just not the case. It’s more about how many people link to you than any deliberate ‘temporary ban’ on indexing for new sites.

Don’t believe me? I have one site (www.jaisaben.com) which is almost entirely supplemental. It is very much a ‘niche’ site and I haven’t bothered working on it much, so it has been in the supplementals for months and months. One day, when enough people link to it, it will suddenly pop into the main index.

This site (www.utheguru.com) is almost entirely in the main index, and was within weeks of me starting it. Why? Because it has content that other sites like linking to. As a result, Google considers it an important site, and makes pages I write available in its main index within days.

Is Having Supplementals a Bad Thing?

It can be. Are you presenting ‘niche’ content? If that’s the case, your pages will still be returned as answers to a Google search whether they are supplemental or not.

If you are presenting mainstream content, supplementals can be a very bad thing. They make it very unlikely that your pages will be returned by a Google search at all (other than one using the site: modifier).

Some people say that once your pages are in the supplemental index, they’ll be there for at least three months (until ‘supplemental bot’ comes for a visit) or perhaps forever. This may have been true in the past, but not anymore. Whether the supplemental index is the end of the road for your site is completely up to you.

My advice? Everyone should aim to have at least 80% of their ‘content’ pages in the main index. It is not that difficult to do.
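Checking yourself against that 80% target is simple arithmetic once you have counted your results from a site: search. The sketch below assumes you count the pages by hand; the function name main_index_share and the example numbers are invented for illustration.

```python
def main_index_share(total_indexed: int, supplemental: int) -> float:
    """Return the fraction of indexed pages that are in the main index.

    Hypothetical helper: both counts are assumed to come from a manual
    tally of a site: search.
    """
    if total_indexed <= 0:
        raise ValueError("total_indexed must be a positive number")
    if not 0 <= supplemental <= total_indexed:
        raise ValueError("supplemental must be between 0 and total_indexed")
    return (total_indexed - supplemental) / total_indexed


if __name__ == "__main__":
    # Made-up example: 50 pages listed, 10 marked "Supplemental Result"
    share = main_index_share(total_indexed=50, supplemental=10)
    verdict = "on target" if share >= 0.8 else "below the 80% target"
    print(f"{share:.0%} in the main index, {verdict}")
```

Remember to count only your ‘content’ pages; category and archive listings will skew the figure.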

Supplementals 101 – Bot Behaviour

First, a bit of ‘bot behavioural psychology’ :). I’ve been observing bot behaviour on this site, and others, for many years. During that time I’ve noticed they tend to behave in a set pattern:-

Bot behaviour and the ‘Infant Site’

  • When a site is first submitted, the bots will come and have a fairly deep look at the site, and usually within a few weeks you’ll find your index page listed.
  • From that point on, bots will continue to visit regularly to check for interesting new content, but they seem unusually reluctant to add new content to the Google index.
  • At this early stage, it’s very difficult to get anything other than your main page indexed.
  • Googlebot will keep on visiting your site pretty regularly, and at some stage or another you’ll notice some of your other pages appearing in the index, but they will be mainly supplemental.
  • This frustrating cycle will continue forever unless you get the bot really interested by achieving a ‘threshold’ of new inlinks.
  • Once a site has a ‘threshold number of inlinks’ the bot will start to treat your site as ‘an adolescent site’.

Bot behaviour and the ‘Adolescent Site’

  • A site reaches adolescence when it has achieved a threshold number of other sites linking to it. This number doesn’t necessarily have to be large – even 1 link from an ‘authority site’ (PageRank 3 or higher) seems to be enough to get a site to this stage.
  • During this stage, ‘deep crawls’ of the site become more frequent.
  • New pages appear in the Google index rapidly after they have been crawled, and usually get a ‘honeymoon’ period – Google figures it will give your new pages the benefit of the doubt, and lists your new page in the main index until it has done a thorough crawl of other sites, and seen whether other pages link to it or not.
  • If Google can’t find other sites linking to your new page, it will often drop back to supplemental status within a few days to a week.
  • During adolescence, the majority of your pages will be in supplementals, and you’ll find that those pages that are indexed are pages that have been directly linked to by other sites.

Bot behaviour and the ‘Mature Site’

  • At some stage Googlebot starts to get really interested in your site, crawls become much more regular, and virtually all new original content is indexed. I’ve heard people say that this is due to the ‘trust factor’, which I suspect is a combination of the number and quality of other sites linking to yours, and the number of clicks to your site from Google searches, indicating relevance. That is the stage this site (utheguru) has now reached, and I generally find any new article I write is included in the main index within a day, and stays there, regardless of whether other sites link directly to it or not.
  • I call this stage ‘the mature site’, and this is where you should aim to be. Don’t listen to people who say it’s hard – this site is only 2 months old.

In part 2 of this article, I provide strategies that will help you get your pages out of the supplemental index quickly. You can read the next stage of this article here.



Entry Filed under: SEO Discussions


7 Comments

  • 5. Manu  |  June 18th, 2007 at 8:08 pm

    I am using the exact same robots.txt as yours and I am also using the “WordPress Duplicate Content Cure” and “Bad Behavior” plugins, but still many pages of the site show up as supplemental in Google.

    How long will Google take to remove these pages from supplemental index?

  • 6. DuckMan  |  June 18th, 2007 at 8:21 pm

    Manu – thanks for your question.

    Everyone has supplementals – even Google does – try a search.

    Having said that, have you read the second part of this post? It gives some very definitive advice about just how to pull pages out of the supps.

    But, to break it down, you need to understand one thing.

    The number of pages you ‘naturally’ have in the supps SEEMS to be inversely proportional to your overall site pagerank – so building links is a great way to enhance your overall site coverage.

    Pulling an individual page out of supps, however, is relatively easy (regardless of your site pagerank) – you just need to do two things:-

    a) Ensure it’s got lots of unique content (i.e. a copyscape search doesn’t show up anything else the same)
    b) Have sites directly linking to the page in question – that essentially gives the page additional pagerank and makes it likely it will come back into the main index.

    The good news? Unlike the bad old days, supps are not a death knell for a page – you can quite easily pull a page you consider newsworthy out of the supps (regardless of your ‘site’ pagerank) within 1 to 2 weeks, or even sooner, if you build links directly to it.

    Cheers,

    theDuck

  • 7. SEO Expert Dubai  |  August 27th, 2007 at 11:55 pm

    Thank you for sharing, it’s great information you have got in here.
