Search Results for ‘supplemental’
In a case of short term pain for long term gain (or is that long term pain for short term gain?), everyone’s favorite search engine has abolished the supplemental index.
But before you go running around your office whooping with delight like I did this morning – STOP. Google hasn’t abolished the supps, they’ve just stopped telling us which pages are in supps.
What’s that mean to the average punter?
Well, it means fewer questions on the webmaster forums starting with ‘why are my pages all in the supplemental index’, and less time spent by ‘mom and pop’ sites worrying about it.
Possibly a good move.
Me, well, I’m skeptical about the move. The overriding stated aim of Google is to return quality results. I’ve seen plenty of quality pages in the supplemental index – Google has stated repeatedly that the biggest reason for a page being in the supps is NOT a perceived lack of quality, but rather a lack of pagerank.
It’s nice to know they are there so that we can make an effort to bring them into the main index where they belong. Google should be adding MORE tools to help genuine webmasters assess how they can improve their index penetration, not fewer.
It’s a case of ‘need to know’ – Google no longer reckons we ‘need to know’ which pages their algorithms consider unworthy of a place in the main index. My initial feeling is that the move seems a little paternalistic.
Google has eviscerated the ONLY tool that goes any way toward explaining why a page might be performing poorly.
My take? If they are going to stop tagging pages as supplemental they should just abolish the supplemental index altogether – if a page is being crawled but isn’t in the main index, well, we know it sucks – so why lump it in with other results? Put differently, why show us pages in a site: search if they’re not going to rank anyway?
At the moment I’m leaning towards thinking this might have been a (short term) backwards step, although it wouldn’t surprise me if we see some new tools in the Google webmaster tools arsenal to help deal with this prob.
ADDENDUM:- Richard Hearne (www.redcardinal.ie) put it best recently on the Google Webmaster Help forums –
“Of course Google would rather we didn’t discuss or even consider this supplemental index. Then again if Google was serious about fixing issues like these they would scrap the supplemental index… or give us back the supplemental tag so that we can try to fix these issues ourselves. “
August 1st, 2007
Escape the Supplemental Index
So you have found yourself in the Google supplemental index and you want to escape.
Fair enough – unless you are a webmaster / blogger it’s hard to understand just how frustrating it is to find your hard work ‘binned’ to the supplemental index – but worry no more, it’s easier to get out of the supplemental index than you may think.
In this, part two of my ongoing series on the supplemental index (see part one here – The Google Supplemental Index – A Primer), I’ll be giving you three key steps you can take to get your web page out of the supplemental index and stay out.
STEP 1 – Duplicate Content causes Supplementals
Pick a few key pages on your site, and run them through ‘copyscape’ (www.copyscape.com). If copyscape says you have duplicate content on your pages, this could be the reason for the supplemental status of your pages.
Edit the pages, make them more unique, put any quotes in a <blockquote> tag, and try again. Then move on to Step 2.
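If you’d like a quick sanity check of your own before (or after) running copyscape, the rough sketch below compares how much wording two pages share. The URLs and the 30% threshold are just placeholders I’ve picked for illustration – they aren’t anything Google has published.

```python
# Rough duplicate-content check - a sketch only, not a copyscape replacement.
# The URLs and the 0.3 threshold are illustrative placeholders.
import re
import urllib.request
from difflib import SequenceMatcher

def page_text(url):
    """Fetch a page and crudely strip HTML tags, leaving plain text."""
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
    return re.sub(r"<[^>]+>", " ", html)

my_page = page_text("http://www.example.com/my-article/")        # your page
other_page = page_text("http://www.example.com/other-article/")  # a suspected duplicate

similarity = SequenceMatcher(None, my_page, other_page).ratio()
print("Shared wording: %.0f%%" % (similarity * 100))
if similarity > 0.3:
    print("Worth a rewrite - a large chunk of this page appears elsewhere.")
```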
STEP 2 – Backlinks, Backlinks and Backlinks
So you have a page in the supplementals, it is brimming with unique content, and you just can’t wait to get it out – it’s not hard. I have used this technique many, many times, and if done correctly you’ll find it helps bring your whole site from the ‘infant’ status I spoke about in my previous article to ‘adolescence’.
- Find a page on your site that is in the supplementals, that has heaps of unique content, and note down the url of that page.
- Find a site that has PR3 or better, and allows you to post your url.
- If you don’t know what Pagerank is, I define it in my article about nofollow
- Don’t know how to discover pagerank? You can do so by getting Firefox with Google Toolbar (download it from my toolbar to the right)
- Post your URL on that page, using descriptive anchor text (eg, if your page is about widgets, the link should say ‘widgets’ if possible). Try to make your link a deep link – like www.utheguru.com/301-redirects instead of just www.utheguru.com (see the example just after this list).
- Can’t find somewhere you can post a link? Some tips:-
- Your host’s forum / bulletin board (make sure that they aren’t no-following links).
- A friend with an established website (a link from the first page is always best)
- Another of your own websites (I’ve done this before and it works)
- Paid editorial.
- DO NOT subscribe to link exchange schemes, ‘free’ directory listings or other such ‘offers’. At best, they don’t work, at worst, they can get you penalized.
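To make the ‘descriptive anchor text’ and ‘deep link’ advice concrete, here is roughly what the link itself might look like in the HTML of the page you post on (the 301-redirects URL is just the example used above):

```html
<!-- Descriptive, deep link - the anchor text tells Google what the target page is about -->
<a href="http://www.utheguru.com/301-redirects">301 redirects</a>

<!-- Much weaker - a bare home page link with meaningless anchor text -->
<a href="http://www.utheguru.com">click here</a>
```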
This strategy has worked without fail for me.
Use it, and expect your target page to be out of the supplementals within a week or less.
Some people call it giving a page ‘link juice’, or ‘link love’ – whatever you call it, it works.
STEP 3 – Submit a Sitemap to Google
Google Webmaster Central, and Matt Cutts’ video about Webmaster Tools, will bring you up to speed on this process.
To generate the sitemap for submission, I highly recommend the following free tool.
Why submit a sitemap? Well, you’ve gone to the effort of getting Google ‘interested’ in your site, so you want to give it the best chance possible of indexing your site properly.
A sitemap will help it do this.
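If you’d rather roll your own than use an online generator, a sitemap is just a small XML file listing your URLs. Here’s a minimal sketch – the URLs are placeholders, so list your own pages – which writes a sitemap.xml you can upload to your web root and then submit through webmaster tools:

```python
# Minimal XML sitemap generator - a sketch with placeholder URLs.
from xml.sax.saxutils import escape

pages = [
    "http://www.example.com/",
    "http://www.example.com/about/",
    "http://www.example.com/my-first-article/",
]

with open("sitemap.xml", "w") as f:
    f.write('<?xml version="1.0" encoding="UTF-8"?>\n')
    f.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')
    for url in pages:
        f.write("  <url><loc>%s</loc></url>\n" % escape(url))
    f.write("</urlset>\n")
```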
Tomorrow, in part three of this series, I’ll be talking about strategies that will help to KEEP your site indexed.
This advice should help you to progress to a ‘mature site’ that is crawled and indexed regularly, without the need for further intervention to keep new pages from going supplemental.
Cheers,
TheDuck
July 20th, 2007
This post follows on from my tutorial about pulling pages out of the supplemental index.
A reader at Google Webmaster Help Forums has asked me if it would be possible to post a link to his site about classical music to try and pull one of his pages out of the supplementals.
SMc writes:-
I have had difficulty with one page from my site that insists on staying in the supplemental index. I had a mistyped URL that I subsequently made a 301 redirect back to the correct link. Now both the bad URL and the good are in and have remained in the supplemental for ages and I can’t seem to shift it. Would it be possible for you to throw a link at that page for me to try to force it back out?
Your site has a couple of other probs that may be causing the supps SMc (in particular, check for duplicate content), but let’s try and see if it works.
Here is a link to SMc’s Classical Music Site – By the way SMc – some tips:-
- I don’t remember where it was that I read this, but Google prefers short URLs – the physical length of the URL, and the ‘depth’ of the URL (depth of directories), should be kept to a minimum. If someone has a link, please post it – I think it was Vanessa Fox that talked about that.
- You should keep the number of links on a page to fewer than 100, if possible (see Google’s Webmaster Guidelines) –
- Every page should be 2 or 3 links from the home page.
- Links from ‘related’ websites with high PR probably carry more weight than links from ‘unrelated’ sites (ie mine versus a music site).
- Use copyscape to check for duplicate content on your pages (see my primer on the causes of supplementals here).
Cheers,
theDuck
_________________
Follow-up – it worked 🙂
M
March 10th, 2007
One of the most annoying (and mysterious) of all SEO problems for many bloggers and website owners is the dreaded supplemental index.
In this tutorial / primer, I’m going to aim to give you an idea of what supplementals are, why they occur, how to identify them and how to solve the problems associated with them.
What is a Supplemental?
A supplemental result is defined by Google as follows:-
A supplemental result is just like a regular web result, except that it’s pulled from our supplemental index. We’re able to place fewer restraints on sites that we crawl for this supplemental index than we do on sites that are crawled for our main index. For example, the number of parameters in a URL might exclude a site from being crawled for inclusion in our main index; however, it could still be crawled and added to our supplemental index.
So, translated into plain English, supplementals are those pages that Google considers not important enough to include in their main index, but not bad / useless enough to not bother indexing at all.
How do I know if I have supplementals?
Firstly, go to www.google.com and enter the search site:www.utheguru.com (replace utheguru.com with your own url). The site: in front of your url is known as a search modifier – there are lots of different search modifiers, but in this case we’re using the site: modifier to tell google to return all pages it has indexed from www.utheguru.com.
There are a few misconceptions about what constitutes a supplemental result. Some people think that supplementals are what is returned when you click on the “repeat the search with the omitted results included” link at the end of a google search. This is not the case.
That link actually shows ‘similar’ content that google thinks might not be relevant to your search, and that content can be supplemental, or non-supplemental in nature.
Actually, a supplemental result is one where the words “Supplemental Result” appear just under the ‘snippet’ (the short description of a site) in a google search. The supplemental results usually appear in the later pages of a site: search, following the main indexed pages. If you click on the thumbnail below, you can see examples of both.

Why Do I have Supplemental Results?
Supplementals usually occur for one of the following reasons (in order of increasing likelihood):-
Duplicate content from other sites – have you quoted content from other peoples websites? Does this content make up a large proportion of your page?
Google has sophisticated duplicate content filters that will detect this – remember, it’s ok to quote other sites, but make sure you also have enough good original content on your site to ensure google doesn’t think you are just plagiarising.
A general rule is no more than 50% of any given page should be quotes. If you are concerned about whether you may have too much duplicate content, head over to a site called copyscape (www.copyscape.com) and run your page through their tool.
Duplicate content from your own site – it is a sad fact that many content management systems (CMS) are great at helping beginners spend their time writing great original content rather than trying to learn web design and HTML, but really lag behind when it comes to being search engine friendly.
WordPress is one example of a CMS, and it will generally put duplicates of your content all over the joint – for instance, you’ll find this article on the front page of my blog, under the SEO discussions category, and in the archive for March on this site, and they’ll all have different URLs. Find out about avoiding duplicate content in CMS like WordPress here.
Another cause of duplicate content can be canonicalization issues – that is, where the www and non-www versions of your site are indexed as separate websites, when in fact they are the same. Read more about them in our primer on canonicalization issues here.
Not enough pagerank – is your site more than a few months old? Do you have many other sites linking to you?
If the answer to any of these questions is no, it’s likely that you are in the ‘sandbox’, a kind of purgatory between being indexed and being deindexed.
Some people claim the ‘sandbox’ is an actual step one needs to go through (ie 3 months of not being indexed) while Google gains trust in your site, but that’s just not the case – it’s more about how many people link to you rather than any deliberate ‘temporary ban’ on indexing for new sites.
Don’t believe me? I have one site (www.jaisaben.com) which is almost entirely supplemental – that’s because it is very much a ‘niche’ site, and I haven’t bothered working on it too much – it’s been in the supplementals for months and months – eventually, one day, when it gets enough people linking to it, it will suddenly pop into the main index.
This site (www.utheguru.com) is almost entirely indexed, and was within weeks of me starting it. Why? because it has content that other sites like linking to – as a result, Google considers it an important site, and makes pages I write available in their main index within days.
Is Having Supplementals a Bad Thing?
It can be. Are you presenting ‘niche’ content? If that’s the case, your pages will still be returned as answers to a google search whether they are supplemental or not.
If you are presenting mainstream content, supplementals can be a very bad thing. They make it very unlikely that your pages will be returned by a google search (other than using the site: modifier) at all.
Some people say that once your pages are in the supplemental index, they’ll be there for at least three months (until ‘supplemental bot’ comes for a visit) or perhaps forever. This may have been true in the past, but not anymore. Whether the supplemental index is the end of the road for your site is completely up to you.
My advice? Everyone should aim to have at least 80% of their ‘content’ pages in the main index. It is not that difficult to do.
Supplementals 101 – Bot Behaviour
First, a bit of ‘bot behavioural psychology’ :). I’ve been observing bot behaviour on this site, and others, for many years. During that time I’ve noticed they tend to behave in a set pattern:-
Bot behaviour and the ‘Infant Site’
- When a site is first submitted, the bots will come and have a fairly deep look at the site, and usually within a few weeks you’ll find your index page listed.
- From that point on, bots will continue to visit regularly, to check for interesting new content, but they seem unusually reticent to add new content to the google index.
- At this early stage, it’s very difficult to get anything other than your main page indexed.
- Googlebot will keep on visiting your site pretty regularly, and at some stage or another you’ll notice some of your other pages appearing in the index, but they will be mainly supplemental.
- This frustrating cycle will continue forever unless you get the bot really interested by achieving a ‘threshold’ of new inlinks.
- Once a site has a ‘threshold number of inlinks’ the bot will start to treat your site as ‘an adolescent site’.
Bot behaviour and the ‘Adolescent Site’
- A site reaches adolescence when it has achieved a threshold number of other sites linking to it – this number doesn’t necessarily have to be large – even 1 link from an ‘authority site’ (page rank 3 or higher) seems to be enough to get a site to this stage.
- During this stage, ‘deep crawls’ of the site become more frequent.
- New pages appear in the Google index rapidly after they have been crawled, and usually get a ‘honeymoon’ period – Google figures it will give your new pages the benefit of the doubt, and lists your new page in the main index until it has done a thorough crawl of other sites, and seen whether other pages link to it or not.
- If Google can’t find other sites linking to your new page, it will often drop back to supplemental status within a few days to a week.
- During adolescence, the majority of your pages will be in supplementals, and you’ll find that those pages that are indexed are pages that have been directly linked to by other sites.
Bot behaviour and the ‘Mature Site’
- At some stage Googlebot starts to get really interested in your site, crawls become much more regular, and virtually all new original content is indexed. I’ve heard people say that this is due to the ‘trust factor’ – which I suspect is probably a combination of the number and quality of other sites linking to yours, and the number of clicks to your site from Google searches, indicating relevance. That is the stage this site (utheguru) has now reached, and I generally find any new article I write is included in the main index within a day, and stays there, regardless of whether other sites link directly to it or not.
- I call this stage ‘the mature site’, and this is where you should aim to be. Don’t listen to people who say it’s hard – this site is only 2 months old.
In part 2 of this article, I provide strategies that will help you get your pages out of the supplemental index quickly. You can read the next stage of this article here.
{Other Search Phrases – supplemental hell and the misspelt version supplimentals.}
March 5th, 2007
There seems to be an increasing number of people that are having their previously high ranking sites either completely de-indexed, or largely sent to the supplemental index. Here I write about the problem, and open my blog to your comments. I’d dearly like it if you all write about your experiences here so that we can hopefully get the attention of Google and get some answers – This seems to have been happening for the last few weeks. Click here to read more
Continue Reading January 25th, 2007
Many CMS and blogging systems like WordPress and Joomla serve content in many different forms.
In the case of WordPress, content is replicated all over the place because it keeps your posts in archives.
This can lead to Google and other search engines inadvertently concluding that you are trying to spam them with duplicate content. Google is passionate about ‘original content’, so this can result in the application of a duplicate content penalty to your site. Here I give you some advice about avoiding duplicate content in WordPress, but the advice stands for any number of other CMS systems as well.
Continue Reading January 17th, 2007
Been having problems with pages going supplemental in Google? Don’t know what the heck Supplemental is? Don’t worry – here we dash through a few examples of some of the major problems you can have if you’re not careful with your CMS system.. and some potential solutions!
Continue Reading January 10th, 2007
WordPress, like many other CMS (content management systems), creates duplicates of your posts all over your website.
Having duplicate content can lead to less than optimum search engine listing, and is one of the factors that cause ‘supplemental results’ in Google.
In the following article I describe a wordpress plugin I’ve developed to help address these issues.
You can download the plugin now by clicking here or read the full article.
Continue Reading June 16th, 2007
Hi everyone.. I write this from LA airport – presently waiting for a (delayed) return flight – 3.30am departure.
Don’t fly Qantas… on the way here, the plane was delayed 8 hours, then they lost my baggage, and took 6 days to find it 🙂
Now, due to a crash with a baggage trolley, the flight back is delayed too… also by 8 hours… oh well – what can you do 🙂
I’ll talk to you all soon.. lots of photos
The rest of this post is an experiment (trying to pull my page about how to get out of the supplemental index out of the supplemental index – it’s called irony – how to get out of the supplemental index) – there – done – back on the front page – now do your thing, googlebot. Oh – and another one – http://www.naturalflare.com to help out someone on the webmaster forums.
May 27th, 2007
The April 2007 Pagerank Update
It’s official – as of a few hours ago new PRs are starting to filter through the system – the April 2007 Toolbar PR update is underway! Here are a few insights about the update and what it means to you, and some tips and tricks you might not know..
Why is my Pagerank Jumping Around?
When a toolbar PR update happens, it doesn’t happen all at once – Google has many ‘datacenters’, and your new PR will ‘percolate’ between those datacenters over the next few days to a week.
The PR shown in your toolbar is usually taken from a relatively random datacenter – for that reason, you’ll tend to see your toolbar PR jumping around a lot – this isn’t an indication of any kind of penalty, or anything unusual – it’s just an indication that the PR update is underway.
You can see your PR over the various datacentres at http://www.oy-oy.eu.
What is Pagerank (PR)?
PR, or pagerank is one of the factors used by Google to calculate the importance of your site. Importance is different to relevance – you can have a very low PR site and still outrank much higher PR sites that don’t have content that is as relevant to the user’s search as yours.
People tend to get fixated on PR as it is one of the most visible forms of ‘feedback’ from Google about how your site building efforts are going – and since it only gets updated 3 or 4 times a year, people with active sites (including me) tend to look forward to it.
Should I worry too much about PR?
No. A few reasons:-
- RELEVANCE almost always beats PR if you want good search engine positioning – such things as the words that people use to link back to your site, words in your page, your page title and headings, and words in your url all give Google clues about the relevance of your site. Some people claim that there are 200+ factors such as these that Google uses to calculate relevance.
- Pagerank is generally out of date – it is really, in its most basic form, just a snapshot measure of how many other sites link back to you (and how many sites link back to them).
- You can have a PR 0 site and still beat much higher PR sites in a Google search if you concentrate on RELEVANCE.
As time has gone by, Google has got much better at gleaning ‘relevance’ from a page – and with that enhanced functionality, the relative importance of PR (which was probably once the major contributor to search engine positioning) as a factor in calculating your search engine positioning has been diluted by these other factors – but it is still a factor, and it is worth aiming to improve your PR.
Tips and Tricks to Improve your PR
Well, it’s too late now for this update, but if you’d like to work towards improving your PR (and site traffic) you need to get more sites linking to you, and preferably sites with high PR. Here are some tips off the top of my head:-
- LINK OUT – link to sites that interest you. This has two effects – it makes your site much more informative for your readers, and it also helps other sites (the targets of your links) learn about you. Whilst it is counter-intuitive that linking out will improve the number of sites linking to you, it does. Why? Because it tends to increase your readership. A site with lots of readers becomes a site that people want to link to. Also, people with active sites tend to spend a lot of time monitoring who is linking to them – write an interesting article which links to their high PR site, and it’s likely they’ll come and check out your site – if you are lucky, you might get a link from their high PR site back to you as a thank-you.
- WRITE UNIQUE, INFORMATIVE, INTERESTING ARTICLES – if you ever do a Google search for something and you can’t find what you’re looking for easily you have a great opportunity. Find the answer, and write about it. Chances are other people are asking the same question – and you’ll attract links if you write a good quality blog entry about it. Sites that just regurgitate / duplicate information easily found elsewhere won’t tend to get lots of links.
- WRITE SOMETHING CONTROVERSIAL – this is one strategy fitting under the general banner ‘link bait’. My best performing pages are those that have controversial content :).
- USE SOCIAL NETWORKING TOOLS – Things like mybloglog, feedburner, digg-it etc are a great way to improve your following and traffic. I can pretty much guarantee that links to my site increase proportionally with the amount of traffic I receive.
- MAKE A USEFUL TOOL – many of my links come from my wordpress theme, Blix-Krieg. If you put something on your site that is useful, you will attract links.
- USE YOUR HOST– Many web hosting companies have online forums for their users – often, these forums have obscenely high PR. Write something genuinely interesting, and link back to it from your host’s forum. This is also often a great way to help trigger an initial crawl on a new site (see my series on the supplemental index for more info on this).
- BE A GUEST POSTER. Many sites (including mine) allow users to submit their own articles for inclusion – take advantage of the opportunity – write an interesting, relevant article and ask the owner of a high PR site if they’d like to include it – with a link to your own site in the body.
Also check out this page on the top 13 things that won’t affect your pagerank by JLH. Actually JLH is an example of a successful blogger that applies a lot of these principles – he writes great articles that are often interesting, controversial and informative all at the same time. He links liberally. He uses a broad array of social networking tools.
Now – could I please ask you folks a favour? I’ve written a WTF at technorati – I’d appreciate your votes – it’s my first experiment with social networking 🙂 Click this link to vote.
Any other suggestions, feel free to post – hell, why not add your url to your comments – I remove no-follow from all comments after 14 days if they are relevant.
All the best,
Matt
April 28th, 2007

The ‘Scraper Site’ – Benevolent Friend or Deadly Foe?
One of our regulars, Susie J, left the following question for me this morning –
Can you have a bad link? I checked my inlinks through technorati. A few stood out with questions marks. Here’s a couple of them:
cold remedies
cancer research
These sites do not have any of their own content — just a list of other sites. There is a link to my site to a specific article — but it does not identify my site by name.
Hiya Susie – these are called ‘scraper’ sites.
I’ve got several of them linking back to me too.
There are a number of things you need to consider first before you get too worried about them.
Links from a Bad Neighbourhood – Good or Bad?
Is it bad to have them linking back to you? Well, there are a number of different perspectives on that.
I’d say this right off the bat – Google knows that you can’t help who links to you, so it is impossible to get an official Google ‘penalty’ from such a site linking to you.
If that were possible, I could set up a mean link farm violating every one of Google’s webmaster guidelines, and get my competitors struck off Google’s index just by linking to them from my Uber-evil site.
The only exception, of course, is if you link back to the scrapers, in which case it is possible (but unlikely) that Google may consider you’re participating in some link exchange scheme with them and you might get penalised – that’s called linking to a ‘bad neighbourhood’.
Whether or not links from these sites is good or bad from an SEO perspective is a different matter.
What’s their game?
I had the following discussion about this with a few of my SEO friends a few months back, and the general consensus is that those sites are trying to get good search engine positioning by fooling Google into thinking that they are authorities on a particular topic – such as the common cold, in this instance.
Since they link back to me, I don’t get overly perturbed about them, but I have been puzzled about what their game is – because:-
- They can’t be after Pagerank – who’s going to link back to a site with no real information? (except people like us, wondering why they are linking to us – but you’ll note I nofollowed the links to them)
- They aren’t stealing content – they are acknowledging the source of the content.
- They aren’t MFOA (made for adsense) as they (mostly) aren’t displaying ads YET.
So what’s their game? Well Susie, I got your message this morning right after I got back from the gym. I’ve just had a shower (my thinking place) and I believe I may have their strategy sussed.
I reckon they have the same opinion as me – make your outlinks count. Whilst linking out to other sites does, by definition, reduce your pagerank, the effect on your search engine positioning can actually be positive.
This is somewhere along the same lines as ‘it’s not what you know, it’s who you know’ – if you link to a lot of other sites about a topic you start to look like an authority in that topic.
A Devious Black-Hat Scheme..
Search engine positioning is really a combination of relevance and pagerank, so in this case they are trying to gain relevance in the topic of ‘the common cold’.
I think their strategy might go something like this.
- Use adwords to find some lucrative keywords (for instance, I would imagine competition for the keyphrase ‘the common cold’ would be fierce, so it would be lucrative).
- Crawl the net looking for articles about ‘the common cold’ – or better still, just do a Google or technorati search for the phrase.
- Take small snippets of those articles, and link back to the origin, thus reducing the likelihood of being reported as spammers (after all, everyone likes being linked to).
- Cobble together a large number of snippets in such a way that it’s unlikely that the density of information from any one source is suspiciously high on the page (thus avoiding the possibility of triggering a spam flag or duplicate content penalty from Google – and being deindexed or sent to supplemental).
- Wait to be crawled by Google.
So now, what do they have – they’ve got a keyword rich page, full of relevant links to topical pages about the common cold.. If I’m an automated robot I’m beginning to figure ‘hey, this looks like an interesting page about the common cold’.
So, they’ve got relevance – all they are now missing for good search engine positioning for the phrase ‘the common cold’ is pagerank (PR). Easily fixed – buy a link from a high pagerank site, or indeed (since these people likely have heaps of sites) throw a link at the page from several of your high pagerank sites, preferably in a related field.
Now Comes the Traffic.
VOILA! You’ve got pagerank and relevance – you suddenly appear to Google to be an authority on the topic of ‘the common cold’.
So hopefully, since you’re now the new authority on the common cold, you’ve got great search engine positioning too – and with positioning comes traffic – lots of traffic.
Sir Richard Branson started his empire by standing out the front of potential locations for his record stores, and physically counting the number of people walking past each site per day. He knew that the more people walking past the better – this is the online equivalent.
Think about it – the two links you sent me are scraper sites about cancer and the common cold. Hands up anyone that doesn’t know someone who’s had a cold this winter? Hands up anyone that doesn’t know of someone affected by cancer?
These keywords weren’t chosen by accident – they both have potentially very high traffic!
Money – lots of money, with Adsense.
Here’s where the brilliance lies – since the site doesn’t really give any answers, the first thing people are going to want to do when they get to the site is go elsewhere – so, what to do with all this traffic?
BRING ON THE ADSENSE. Scatter adsense all over the site and make clicking them the only real way of escaping. Remove the links back to the original sites (after all, you only had them there to make yourself look legit and stop people from reporting you as spammers) and you’ve successfully run the black-hat gauntlet and probably made a motza on your lucrative keyword.
These schemes are all about maximizing traffic and hence financial reward.
They don’t expect to be around long before they are taken down or detected. This is probably the reason they choose very high traffic keywords – so that they can make hay while the sun is shining.
So is having links from scraper sites bad for me?
From a net usability perspective, sure, these sites are bad for everyone.
I can remember when I first started surfing the net way back in the mid nineties, you could search for just about anything and it would return a multitude of links to porn – back then all you really had to do to game a search engine was to have heaps of ‘keywords’ on your page (a favourite tactic was to have a huge list of smut related words at the bottom of the page). Luckily Google’s algo has matured and that just doesn’t work anymore.
Plus – as of last year, the majority of web users are using the web for commerce and business, rather than porn, which had dominated legitimate searches for the entire public history of the net (says a lot about human nature hey?). So these days, the majority of these schemes are in it for adsense income.
Don’t know about you, but I don’t want to go back to the bad-old days where search results are dominated by useless crud, only this time it’s useless crud with adsense ads rather than asking for your credit card number or offering ‘free previews’. Luckily, so far, Google seems to be keeping pace with the spammers and (whilst there is doubtless still loads of money to be made) things have become a whole lot harder for them.
The verdict – good or bad? From the individual short term perspective of your site, being linked to by these sites probably has no effect (at worst) and perhaps even a small positive effect (at best) on your pagerank.
It’s those that steal your content and don’t link back to you that are bad, as (occasionally) Google deems their version of your content the ‘original’ and cans your site to the supplementals as a plagiarised copy.
What can I do about plagiarised content?
A good way to check for copies of your content online is to use the tool called COPYSCAPE.
If it really irritates you that these sites have copies of your material, there are a number of things you can do about it.
First and foremost, most of these sites use some form of spider to harvest your content.
You can try banning the rogue spiders using robots.txt as described in this article, but that approach only works for the ‘well behaved’ bots – those that obey robots.txt. Furthermore, many of these bots seem to harvest their information directly from technorati, so there is nothing you can do about that.
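If you do go the robots.txt route, it’s worth double-checking that the ban actually does what you expect. The sketch below uses Python’s built-in robots.txt parser – the bot name ‘BadBot’ is purely hypothetical, so substitute the user-agent you actually see in your access logs:

```python
# Check that robots.txt blocks a given crawler.
# 'BadBot' is a hypothetical user-agent - use the one from your own logs.
# Assumes your robots.txt contains something like:
#   User-agent: BadBot
#   Disallow: /
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # your own domain here
rp.read()

print(rp.can_fetch("BadBot", "http://www.example.com/some-article/"))     # False if blocked
print(rp.can_fetch("Googlebot", "http://www.example.com/some-article/"))  # True if still allowed
```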
The second approach is to report the sites as a spam site to Google (you can do that in Google webmaster tools – it’s under the ‘tools’ menu described in this article). This gives Google a ‘heads up’ that the site is a spam site.
As for me personally – now that I’ve realised their game, I’ll be reporting these sites.
This goes against my ‘all publicity is good publicity’ ethos, but what the heck – why should they be making money at the expense of legitimate sites?
All the best,
theDuck
March 12th, 2007
A while ago one of my online buddies, JLH, wrote this tongue in cheek post (or should that be boast? 🙂 ), in which he pointed out that he now outranked Google’s own famous Blogger, Matt Cutts for one of his posts.
JLH and I have been having a light-hearted game of one-upmanship for a while now (see the now infamous Banalities of Bananas post here, and my even more ridiculous second attempt to beat JLH on the lucrative ‘Banal Bananas’ keywords here). JLH ultimately prevailed in the Banal Bananas stakes, so I’ve been wracking my brains about ways to beat him since.. so JLH – here’s my chance to match you on this one…
We now outrank Matt Cutts for the search “How to get out of the supplemental index” – granted, it’s probably a temporary fluke, but I thought I’d get mileage from it while I can 🙂

Cheers and Have a great day,
theDuck
______________
Update – for the moment we seem to be holding in 2nd position for the above search, and getting some nice traffic too… must have done something right 🙂
March 8th, 2007
YIPPEE! Today marks the end of our second month in operation.
I write about topics to do with blogging, search engine optimization and improving readership of your site – and I use this site as a ‘testbed’ for many of the strategies I talk about.
I figure that there’s no use talking the talk if you don’t walk the walk. So, here’s a little update about UtheGuru, how we’ve been going over the last month, and where we are heading.
Our Second Month
I gave my regular readers a little update about progress of this blog about a month ago – just a rewind – At that stage, utheguru had the following stats:-
In this, our first month of operation, this blog has gone from nothing to around 350 unique visitors a day, has just hit the magic 1000 inlinks stage, and continues to grow extremely rapidly.
I’m glad to say that the trend has continued:-
- Recently, we reached 100 unique blogs linking to us in technorati.
- We now have 8000 sites / pages linking to us directly according to yahoo – an 8 fold increase in one month.
- New pages from this site regularly rate in the top ten searches on Google for my intended keywords / keyphrases.
- Google is crawling the site more regularly, and new pages tend to be listed within a day of posting, which is a great improvement.
- Over 90% of our content pages are in the ‘main index’ – previously many new pages dropped straight into supplementals – so this is a great improvement.
- Daily readership has doubled, and around 20% of our visitors are repeat offenders, which is heart warming.
Forum?
Some readers have recommended that I should add a user forum to this site. I’m liking the idea, as I reckon it would be a great way to hear more of your voices / questions on this site and get more reader participation.
Having said that, I’ve made the mistake of starting a forum too early on another of my sites, and I’d hate it to be there with no users. I’ll make a promise – when we reach 3000 unique visitors a day, I’ll launch a forum. We’re at about 1000 at present, so we’ll see how long that takes.
No-Follow
WordPress comes with user comments no-followed by default. As you can probably gather from my post about no-follow, I’m not exactly a fan of it. As such, I’ve now implemented a policy similar to JLH’s Do-Follow Policy – all URLs in comments posted on this site will now carry link weight (for pagerank purposes) after 14 days – meaning you can increase pagerank for your favourite sites (and your own) by posting comments on this site.
The 14 day ‘probation’ is so that I have time to remove spammy comments that don’t add to the discussion.
Income
Well, it is an aim of mine to make income from this blog. This month has seen a noticeable increase in Adsense income – we’ve gone from two pinches of salt to three 😛 .
Adsense CTR seems to be much lower on ‘technical’ sites like this than other sites, but I’ve found that by writing some articles as beginner tutorials, I’ve been able to improve this.
And anyway – income at this stage is a minor consideration with this blog – I’m more interested during this early phase in building readership, gaining pagerank (through people linking to my site) and getting a good search engine presence. So far, those aims have met with success.
Over the next month, regular readers will notice that I’ll be starting to include a few more ‘off-topic’ posts – these will be things like discussions about technical gear and other things that are likely to build the readership, and bring in a wider demographic – whilst also hopefully improving the CTR.
I’ve also gained income from helping users of my theme ‘BlixKrieg’ and others with general customization and SEO advice – thanks to those folks for their support, and I’ll be announcing a new service shortly that will expand upon that theme.
Readers and Participation are Everything!
I continue to be grateful to those members that regularly email me and write comments in support of this site – it’s really great to see people getting interactive and writing down their points of view.
Thanks Folks! I appreciate you heaps, and look forward to giving another update soon 🙂
March 8th, 2007
Adsense and Off-Topic Ads
So, you’ve got your blog / website, you’ve signed up for adsense, and you’re all ready to make money – but you keep getting weird, off-topic ads.
This is part two of my series about Tips and Tricks with Google Adsense (see part one here), and I’m going to use it to tell you about something called Adsense Section Targeting. First up, I’m going to give you a few insights I gathered from Michael Gutner (Partner Manager, Google) during my recent conversation with him.
How Google Adsense Works
When you place adsense ads on your site, an automated software robot (called ‘mediapartners’) usually comes to look at the content of your new page within a few minutes. This content is then run through a rather complex algorithm. The algorithm looks at things like:-
- The textual content of your page.
- Keyword Density (ie, what words and phrases appear regularly on your page)
- What sites your page links to.
- Your page’s header, and keywords in the url.
Once that’s been done, adsense tries to work out what your page is about, and then, according to Michael, it aims to display the ads that will maximise your income by a combination of these two factors:-
- Presenting ads that are contextually relevant to the content of your page, and therefore likely to be clicked (called a high click through ratio, or CTR).
- Presenting ads with the highest possible return per click (called effective Cost Per 1000 impressions, or eCPM).
When Adsense Gets it Wrong
Sometimes, however, adsense seems to get the whole show wrong. As an example, I recently wrote a story about getting pages out of Google’s supplemental index, in which I talked about ‘infant pages’.
Next time I looked, I had ads on that page about colic and baby products.
Does this mean that Google thinks my page is about infants? NO – the adsense robot is a completely separate entity to the Google indexing robot – and I don’t think it works quite as hard at times to work out the real context of a page.
So, probably what has happened is that the adsense robot has checked my whole page and figured out that serving ads for the keyword ‘infant’ would be great, because it is a lucrative keyword.
What’s a lucrative keyword? Well it’s like this – advertisers compete for keywords – in a kind of automated auction – so if I’m wanting to sell acme widgets, and I know I make $1000 per widget, I’m likely to pay more for ads to appear on pages with the keyword ‘widget’ than someone who sells less profitable ajax brand widgets.
It seems that ‘infant’ is probably a lucrative keyword, and in a perfect world, I’d get really high earnings from having ads about infants on my page.
That’s really clever, in a way, but really, it’s quite obvious to me as a human being that the technical types on my site are probably quite unlikely to be looking for baby products, so my CTR (number of clicks per 100 ‘views’) is going to be quite poor.
Adsense is a computer algorithm, not a human, so it’s occasionally going to make slip-ups – that’s a given.
So, to get more contextually relevant ads on that page, I can either remove the keyword that’s confusing adsense, or I can use a relatively new tool from Adsense – enter, stage right, a little thing called Adsense Section Targeting.
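As a quick preview (more detail to come), section targeting works by wrapping the part of the page you want the adsense robot to concentrate on – or ignore – in special HTML comments. Roughly like this, though do check the current adsense help pages for the exact markup:

```html
<!-- google_ad_section_start -->
<p>The part of the page that really describes its topic.</p>
<!-- google_ad_section_end -->

<!-- google_ad_section_start(weight=ignore) -->
<p>An aside (like my ‘infant pages’ analogy) you’d rather the ads ignored.</p>
<!-- google_ad_section_end -->
```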
(stay tuned – more on this shortly)
March 7th, 2007
Did you know that it is possible to ‘steal pagerank‘ by ‘comment spamming‘ – for those of you who aren’t familiar, a few definitions:-
Pagerank – PageRank aka PR is one of the methods Google uses to determine the relevance or importance of a Web page. PageRank is a vote, by all the other Web pages on the Internet, about how important a Web page is. A link to a Web page counts as a vote of support. If there are no incoming links to a Web page then there is no support.
Comment Spam – Link spam (also called blog spam or comment spam) is a form of spamming or spamdexing that recently became publicized most often when targeting weblogs (or blogs), but also affects wikis (where it is often called wikispam), guestbooks, and online discussion boards. Any web application that displays hyperlinks submitted by visitors or the referring URLs of web visitors may be a target.
In short, it is possible to use ‘comment spam’ to gain pagerank by writing comments linking back to your own website or blog on high PR blogs, forums and websites that allow user comments.
Great, right? Not necessarily. The ‘gotcha’ is that it can reduce the PR of the originating blog through a process known as ‘bleeding pagerank’. In effect, these user contributed comments look to Google like a ‘vote’ for the target web-page by the originating webpage.
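For the mathematically inclined, the original PageRank formula (from the Brin and Page paper) makes the ‘voting’ idea concrete – each page divides its vote evenly among everything it links to, so every extra outgoing link shrinks the share that each individual link passes on:

$$PR(A) = (1 - d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$$

Here T1 … Tn are the pages linking to A, C(T) is the number of outgoing links on page T, and d is a damping factor (usually quoted as around 0.85).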
Enter stage right, the NOFOLLOW attribute. Nofollow was introduced to allow website owners to ‘choose’ which links on their pages should be counted as ‘votes’ for pagerank calculation – as per this background to nofollow from Google.
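In practice, nofollow is just a rel attribute on the link – in the page source a nofollowed comment link looks something like this:

```html
<!-- This link is NOT counted as a pagerank ‘vote’ -->
<a href="http://www.example.com/" rel="nofollow">a commenter’s site</a>

<!-- A normal link that does pass a vote -->
<a href="http://www.example.com/">a commenter’s site</a>
```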
For that reason and others, we’ve recently seen a number of large websites implement no-follow on the majority of posts (Wikipedia is a prominent example), and WordPress is now set up to nofollow all user comments by default.
So why do I care? Well, I think it’s well accepted that the introduction of nofollow has caused huge fluctuations in the search engine positioning of various websites, as the effect of this change has filtered through the Google index.. a bit like the ‘butterfly effect’, such changes to a well established algorithm can amplify throughout the system and cause something called ‘hysteresis’ – or instability in the algorithm – while the whole system gets back to some form of equilibrium.
I wouldn’t mind betting a million bucks that a large proportion of sites that are reporting huge recent drops in their search engine rankings are probably victims of this effect – even if your own site didn’t rely to a great extent on Wikipedia links or comment spam links for its page rank, it could be quite possible that a website that links to you did – and so on ad infinitum.
As for any system in a state of flux, I’d predict the Google index will reach a new equilibrium relatively quickly (within a few months at most) and people will adopt new ways of gaining pagerank – but as someone with experience in this area (I did a lot of work that used similar types of ‘reward algorithms’ to Google’s in my previous incarnation as an Agricultural Scientist), I can see unintended side effects of this change down the track – here’s a little extract from a note I wrote to a Googler recently:-
Your previous missives about nofollow spoke of the fact that it is a great thing, and that backlinks can be built by other white-hat SEO techniques. I’d have to say that, in a lot of cases (one example would be a blog) the backlinks actually only start to build once the content is searchable, so those of us that have rapidly evolving sites designed to answer questions of the moment never ever get listed, even though they provide great answers and unique information – comments on blogs, for example.
My basic feeling is that the big, older, more well established domains are getting bigger and the smaller ones are getting smaller because they never get a chance to have their pages crawled because they are either nofollowed or put in supplemental hell because of ‘a small number of backlinks’, which will only get worse with nofollow.
Could be worthy of a future article – are we getting supp’d because G thinks we are spammers, or are we getting Supp’d because of some other reason? It’s a massive problem that’s diluting the value of Google, for research purposes imho.
Your second point, about not giving any value to links from Blogs is another thing about the (apparent) algo of Google that I find flawed (apart from the growing incidence of supp’s).
In my daily life as a computer and communications engineer with a fairly tangential degree (agricultural science) as well, I’ve learnt a fair bit about gathering information. Obviously, in science in general, the tradition has been that knowledge is built up through lit review and original research.
The original research is then recorded in peer reviewed papers. Any good scientist knows that a good paper is one that references to as many other papers about the topic as possible. This means that that paper can be a first stop for anyone wanting to know what work has been done before in the area of research.
If you take this model, and apply it to the google algo, the lit review is the google search, the peer review is the comments, the paper is the blog, and the references are the forward links.
Google is about information, and building knowledge. For hundreds and hundreds of years, humanity has built knowledge using the above peer review system. It works.
So, what am I getting at – 3 things –
- Blogs are pretty damn close to the peer review system, closer than static pages, IMHO.
- By no-following links, bloggers and Google risk penalizing new knowledge rather than encouraging it.
- Google needs to consider the effect this will have on its algorithm – new sites and people with great ideas need to be indexed to provide more balanced content and move information forward, rather than remaining static.
I also spoke in my letter about the fact that I believe that sites should be rewarded, not penalized, under the PR system for linking to other sites with great information. To an extent I think they already probably are –
I’ll write more about my thoughts tomorrow in part two of this article –
Ciao,
TheDuck
March 3rd, 2007
I know this one will be fairly elementary to a lot of our readers, but there was a question on the Google Webmaster Help forum recently about how to show which pages Google has indexed from your site.
The answer isn’t hard – you simply use the site: modifier in a Google search – so, for instance, to see my indexed pages, type site:www.utheguru.com into Google search.
There is one ‘gotcha’ however – often Google doesn’t show ‘all’ the pages it has indexed – to show them all, you need to go right through to the very last indexed page, and you’ll see a link to show all the pages – the picture below (which you can click on to enlarge) should make it self explanatory.
You’ll note I’ve circled some supplemental pages – these are pages that Google, for whatever reason, doesn’t think are really ‘up to scratch’ for inclusion in their main index. You can find out more about supplementals in my other articles here.
Cheers,
TheDuck

March 2nd, 2007
Today marks a few major milestones for UtheGuru – our first month of operations, and achieving 1000+ inlinks to our site.
This rapid growth hasn’t happened all by chance, however, it’s been part of a broader plan (which you can read about here).
Ultimately, we are attempting to become a place where users can find niche answers to niche IT related questions. We WANT YOUR INPUT about how best to achieve this – please see the rest of this post for more information. Your comments needed.
Continue Reading February 9th, 2007
I’ve been noticing TOO MANY SITES having what’s called canonicalization problems – where both a www and non-www version of the site is indexed. This is a real problem if you want to maximize traffic and pagerank – in this article I talk about the problem and how to overcome it.
I’d suggest, if you’re serious about site optimization, you read here about the 301 redirect.
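A quick way to check whether you’re affected is to request the non-www version of your domain and see whether it answers with a 301 pointing at the www version (or vice versa, depending on which you prefer). A rough sketch – example.com is a placeholder for your own domain:

```python
# Quick canonicalization check - does the non-www host 301 to the www host?
# 'example.com' is a placeholder; substitute your own domain.
import http.client

conn = http.client.HTTPConnection("example.com")
conn.request("GET", "/")
resp = conn.getresponse()

print("Status:", resp.status)                    # want 301 (moved permanently)
print("Location:", resp.getheader("Location"))   # want http://www.example.com/
if resp.status == 301 and (resp.getheader("Location") or "").startswith("http://www.example.com"):
    print("Good - the non-www version redirects permanently to the www version.")
else:
    print("Both versions may be getting indexed separately - consider a 301 redirect.")
```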
Continue Reading February 1st, 2007