Can I be penalised by being linked to from a bad neighbourhood?

March 12th, 2007

Scraper Sites - Benevolent or Otherwise?

The ‘Scraper Site’ – Benevolent Friend or Deadly Foe?

One of our regulars, Susie J, left the following question for me this morning –

Can you have a bad link? I checked my inlinks through Technorati. A few stood out with question marks. Here are a couple of them:
cold remedies
cancer research

These sites do not have any content of their own, just a list of other sites. There is a link to my site, to a specific article, but it does not identify my site by name.

Hiya Susie – these are called ‘scraper’ sites.

I’ve got several of them linking back to me too.

There are a number of things you need to consider before you get too worried about them.

Links from a Bad Neighbourhood – Good or Bad?

Is it bad to have them linking back to you? Well, there are a number of different perspectives on that.

I’d say this right off the bat – Google knows that you can’t help who links to you, so it is impossible to get an official Google ‘penalty’ from such a site linking to you.

If that were possible, I could set up a mean link farm violating every one of Google’s webmaster guidelines, and get my competitors struck off Google’s index just by linking to them from my Uber-evil site.

The only exception, of course, is if you link back to the scrapers. In that case it is possible (but unlikely) that Google will decide you’re participating in a link exchange scheme with them and penalise you – that’s called linking to a ‘bad neighbourhood’.

Whether links from these sites are good or bad from an SEO perspective is a different matter.

What’s their game?

I discussed this with a few of my SEO friends a few months back, and the general consensus was that these sites are trying to get good search engine positioning by fooling Google into thinking that they are authorities on a particular topic – such as the common cold, in this instance.

Since they link back to me, I don’t get overly perturbed about them, but I have been puzzled about what their game is, because:

  1. They can’t be after Pagerank – who’s going to link back to a site with no real information? (except people like us, wondering why they are linking to us – but you’ll note I nofollowed the links to them)
  2. They aren’t stealing content – they are acknowledging the source of the content.
  3. They aren’t MFA (made for AdSense) sites, as they (mostly) aren’t displaying ads YET.
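For anyone wondering what ‘nofollowing’ a link involves: it’s just an ordinary anchor with `rel="nofollow"` added, e.g. `<a href="http://example.com/" rel="nofollow">`. Here’s a rough sketch that uses Python’s standard `html.parser` to audit a page and show which of its links carry that attribute. The page fragment and URLs are made up. The sketch only inspects the markup and says nothing about how any particular search engine treats the links.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Collects (href, is_nofollow) for every <a href=...> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # html.parser lower-cases tag names for us
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel can hold several space-separated values, e.g. "external nofollow"
        rel_values = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_values))


# Hypothetical page fragment: one nofollowed scraper link, one normal link.
page = """
<p>Spotted via <a href="http://scraper.example/cold-remedies" rel="nofollow">this site</a>
and <a href="http://friend.example/">a friend's blog</a>.</p>
"""

audit = LinkAudit()
audit.feed(page)
for href, nofollow in audit.links:
    print(("nofollow  " if nofollow else "followed  ") + href)
```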

So what’s their game? Well, Susie, I got your message this morning right after I got back from the gym. I’ve just had a shower (my thinking place) and I believe I may have their strategy sussed.

I reckon they have the same opinion as me – make your outlinks count. Whilst linking out to other sites does leak some pagerank (the pagerank a page passes on is split among all of its outbound links, so every external link leaves a little less for your own pages), the effect on your search engine positioning can actually be positive.

This is along the same lines as ‘it’s not what you know, it’s who you know’ – if you link to a lot of other sites about a topic, you start to look like an authority on that topic.
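To put some numbers on the ‘leak’ idea, here’s a toy version of the published PageRank formula, computed by power iteration on a made-up graph. It’s a sketch under big assumptions. Google’s live ranking uses far more than this, and the damping factor of 0.85 is simply the value commonly quoted for the published formula. What the sketch does show is that a page’s rank is split evenly across its outbound links, so adding an external link leaves less for your own pages.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 100) -> dict[str, float]:
    """Toy PageRank by power iteration.

    links maps each page to the pages it links to. A page with no outbound
    links ("dangling") spreads its rank evenly over every page, which is one
    common convention, not necessarily what any real search engine does.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outbound link carries an equal share of this page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# A hypothetical two-page site, first with only internal links...
closed_site = pagerank({"home": ["about"], "about": ["home"]})
# ...then with the home page also linking out to an external site.
linking_out = pagerank({"home": ["about", "external"], "about": ["home"]})

print(round(closed_site["about"], 3))  # 0.5
print(round(linking_out["about"], 3))  # 0.303
```

In this toy graph the site’s two pages end up holding about 0.70 of the total rank between them instead of all of it. The remainder sits with the external page.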

A Devious Black-Hat Scheme

Search engine positioning is really a combination of relevance and pagerank, so in this case they are trying to gain relevance for the topic of ‘the common cold’.

I think their strategy might go something like this.

  1. Use AdWords to find some lucrative keywords (for instance, I would imagine competition for the keyphrase ‘the common cold’ would be fierce, so it would be lucrative).
  2. Crawl the net looking for articles about ‘the common cold’ – or better still, just do a Google or Technorati search for the phrase.
  3. Take small snippets of those articles and link back to the original, thus reducing the likelihood of being reported as spammers (after all, everyone likes being linked to).
  4. Cobble together a large number of snippets so that the density of information from any one source is unlikely to be suspiciously high on the page (thus avoiding a spam flag or duplicate content penalty from Google, and being deindexed or sent to the supplemental index).
  5. Wait to be crawled by Google.
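Step 4 is the clever bit. Nobody outside Google knows exactly how its duplicate-content filters work, so here’s a guess at the sort of measurement the scrapers are dodging: the largest share of a page’s text that comes from any single source. The function name, and the idea that a filter looks at this particular ratio, are assumptions made purely for illustration.

```python
from collections import Counter


def max_source_share(snippets: list[tuple[str, str]]) -> float:
    """Return the largest fraction of snippet text that comes from one source.

    snippets is a list of (source_domain, snippet_text) pairs. 1.0 means the
    whole page was lifted from a single site; small values mean the text is
    spread thinly over many sources.
    """
    chars_per_source = Counter()
    for source, text in snippets:
        chars_per_source[source] += len(text)
    total = sum(chars_per_source.values())
    if total == 0:
        return 0.0
    return max(chars_per_source.values()) / total


# Hypothetical pages: one copied wholesale, one stitched together from four sites.
copied = [("victim.example", "x" * 400)]
stitched = [(f"site{i}.example", "y" * 100) for i in range(4)]
print(max_source_share(copied))    # 1.0
print(max_source_share(stitched))  # 0.25
```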

So now, what do they have? A keyword-rich page, full of relevant links to topical pages about the common cold. If I’m an automated robot, I’m beginning to figure ‘hey, this looks like an interesting page about the common cold’.

So, they’ve got relevance. All they are now missing for good search engine positioning for the phrase ‘the common cold’ is pagerank (PR). Easily fixed – buy a link from a high pagerank site, or indeed (since these people likely have heaps of sites) throw a link at the page from several of your own high pagerank sites, preferably in a related field.
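Here’s a toy model of that ‘relevance plus pagerank’ idea. The scoring rule and all the numbers are invented for illustration. The rule multiplies the share of query words found on the page by a pagerank figure on an arbitrary 0 to 1 scale, not toolbar PR. Google’s real formula isn’t public. The model just shows why a highly relevant scraper page still needs pagerank before it can outrank a strong but off-topic page.

```python
def relevance(page_text: str, query: str) -> float:
    """Fraction of the query's words that appear somewhere on the page."""
    words = set(page_text.lower().split())
    terms = query.lower().split()
    if not terms:
        return 0.0
    return sum(term in words for term in terms) / len(terms)


def toy_score(page_text: str, query: str, pagerank: float) -> float:
    # Multiplying means a page needs BOTH relevance and pagerank to score well.
    return relevance(page_text, query) * pagerank


query = "the common cold"
scraper_page = "snippets about the common cold and cold remedies"
off_topic_page = "the best cheap flights to london"

print(toy_score(off_topic_page, query, pagerank=0.9))  # about 0.3
print(toy_score(scraper_page, query, pagerank=0.01))   # relevant, but almost no PR
print(toy_score(scraper_page, query, pagerank=0.6))    # after buying some links
```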

Now Comes the Traffic.

VOILA! You’ve got pagerank and relevance – you suddenly appear to Google to be an authority on the topic of ‘the common cold’.

So hopefully, since you’re now the new authority on the common cold, you’ve got great search engine positioning too – and with positioning comes traffic – lots of traffic.

Sir Richard Branson started his empire by standing out the front of potential locations for his record stores, and physically counting the number of people walking past each site per day. He knew that the more people walking past the better – this is the online equivalent.

Think about it – the two links you sent me are scraper sites about cancer and the common cold. Hands up anyone that doesn’t know someone who’s had a cold this winter? Hands up anyone that doesn’t know of someone affected by cancer?

These keywords weren’t chosen by accident – they both have potentially very high traffic!

Money – lots of money, with AdSense.

Here’s where the brilliance lies – since the site doesn’t really give any answers, the first thing people are going to want to do when they get to the site is go elsewhere – so, what to do with all this traffic?

BRING ON THE ADSENSE. Scatter AdSense ads all over the site and make clicking them the only real way of escaping. Remove the links back to the original sites (after all, you only had them there to make yourself look legit and stop people from reporting you as spammers) and you’ve successfully run the black-hat gauntlet and probably made a motza on your lucrative keyword.

These schemes are all about maximizing traffic and hence financial reward.

They don’t expect to be around long before they are taken down or detected. This is probably the reason they choose very high traffic keywords – so that they can make hay while the sun is shining.

So is having links from scraper sites bad for me?

So from a net usability perspective, sure, these sites are bad for everyone.

I can remember when I first started surfing the net way back in the mid-nineties: you could search for just about anything and it would return a multitude of links to porn. Back then all you really had to do to game a search engine was to have heaps of ‘keywords’ on your page (a favourite tactic was to have a huge list of smut-related words at the bottom of the page). Luckily Google’s algo has matured and that just doesn’t work anymore.
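To see why stuffing worked back then, here’s a sketch of a deliberately naive ranker that simply counts how often the query words appear on each page. It’s an assumed stand-in for early engines in general, not a model of any specific one. The pages are made up, with the smut swapped for cold remedies.

```python
def keyword_hits(page_text: str, query: str) -> int:
    """Count every occurrence of every query word on the page."""
    words = page_text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def naive_rank(pages: dict[str, str], query: str) -> list[str]:
    """Order page names by raw keyword count, highest first."""
    return sorted(pages, key=lambda name: keyword_hits(pages[name], query),
                  reverse=True)


pages = {
    "useful_article": "rest fluids and time are the best remedies for a cold",
    "stuffed_page": "buy now " + "cold remedies " * 50,
}
print(naive_rank(pages, "cold remedies"))  # ['stuffed_page', 'useful_article']
```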

Plus – as of last year, the majority of web users are using the web for commerce and business rather than porn, which had dominated legitimate searches for the entire public history of the net (says a lot about human nature, hey?). So these days, the majority of these schemes are in it for AdSense income.

Don’t know about you, but I don’t want to go back to the bad old days where search results are dominated by useless crud, only this time it’s useless crud with AdSense ads rather than asking for your credit card number or offering ‘free previews’. Luckily, so far, Google seems to be keeping pace with the spammers and (whilst there is doubtless still loads of money to be made) things have become a whole lot harder for them.

The verdict – good or bad? From the individual, short-term perspective of your site, being linked to by these sites probably has no effect (at worst) and perhaps even a small positive effect (at best) on your pagerank.

It’s those that steal your content and don’t link back to you that are bad, as Google occasionally deems their version of your content the ‘original’ and relegates your page to the supplemental index as a plagiarised copy.

What can I do about plagiarised content?

A good way to check for copies of your content online is to use a tool called Copyscape.
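Copyscape doesn’t publish exactly how it matches pages, so treat this as an assumption rather than a description of their method. A common textbook way to spot near-copies is ‘shingling’. You break each text into overlapping runs of words, then measure how many runs two texts share (the Jaccard similarity). Here’s a minimal Python sketch. The 4-word shingle size and the sample texts are arbitrary choices for illustration.

```python
import re


def shingles(text: str, size: int = 4) -> set[tuple[str, ...]]:
    """All runs of `size` consecutive words, lower-cased, punctuation dropped."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


original = ("Google knows that you can't help who links to you, so a link "
            "from a scraper site is unlikely to hurt you")
scraped = ("Google knows that you can't help who links to you, so a link "
           "from a scraper site is unlikely to hurt your site")
unrelated = "Sir Richard Branson counted the people walking past each record store"

print(round(jaccard(shingles(original), shingles(scraped)), 2))    # 0.86
print(round(jaccard(shingles(original), shingles(unrelated)), 2))  # 0.0
```

A lightly edited copy still shares most of its shingles with the original, while unrelated text shares none.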

If it really irritates you that these sites have copies of your material, there are a number of things you can do about it.

First and foremost, most of these sites use some form of spider to harvest your content.

You can try banning the rogue spiders using robots.txt as described in this article, but that approach only works for ‘well-behaved’ bots, the ones that obey robots.txt. Furthermore, many of these bots seem to harvest their information directly from Technorati, so there is nothing you can do about that.
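For reference, the `ROBOTS_TXT` string below shows a robots.txt that shuts out one named bot and leaves everyone else alone. ‘BadScraperBot’ is a made-up user-agent; you’d substitute whatever name shows up in your server logs. The sketch uses Python’s standard `urllib.robotparser` to check how a polite crawler would read the file. A rogue bot can simply ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named scraper, allow everyone else.
# This only stops bots that choose to honour robots.txt.
ROBOTS_TXT = """\
User-agent: BadScraperBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

post_url = "http://example.com/2007/03/scraper-sites/"
print(parser.can_fetch("BadScraperBot", post_url))  # False
print(parser.can_fetch("Googlebot", post_url))      # True
```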

The second approach is to report the sites to Google as spam (you can do that in Google Webmaster Tools – it’s under the ‘Tools’ menu described in this article). This gives Google a heads-up that the site is spam.

As for me personally – now that I’ve realised their game, I’ll be reporting these sites.

This goes against my ‘all publicity is good publicity’ ethos, but what the heck – why should they be making money at the expense of legitimate sites?

All the best,

theDuck


Entry Filed under: SEO Discussions


8 Comments

  • 1. Susie J  |  March 13th, 2007 at 2:47 am

    Thanks, Duck. Now I have created a link from your site to the bad guys through my comment, so could you please delete them from your site for me? Thanks.

  • 2. DuckMan  |  March 13th, 2007 at 4:01 pm

    Haha, that’s OK Susie – I’ve no-followed that link, and in any case, I doubt it will be a problem as the link is non-reciprocal.

  • 3. Sebastian  |  April 15th, 2007 at 12:34 am

    Very nice article! Sadly, some of these scraping suckers are so good that they even get bookmarked on del.icio.us and elsewhere. Seeing a single page sometimes doesn’t reveal its character. Fire up Site Explorer and look at the pages and inlinks as well. Usually that’s enough to spot sneaky “appealing webspam”.

  • 4. DuckMan  |  April 15th, 2007 at 2:10 am

    Cheers Sebastian!

    Yep, it’s too true that some of these folks are hard to spot at first. I’d say that well over 10% of my inlinks come from them, and often they get quite clever about the way they do things.

    That’s an interesting point you make about del.icio.us etc. Perhaps they aren’t even really worried about getting caught by Google? Perhaps they are relying on the social networks. Who knows.

    But I don’t think it’s quite time to stop writing original content and start stealing bits and pieces from others. Where is the fun in that? I have faith Google will get ’em eventually.

    Cheers,

    M

  • 5. ivb  |  June 4th, 2007 at 1:43 pm

    Doc, you seem to know about SEO, but you do not use inline nofollow in your blog anchor links, so I signed this post with a Project Honey Pot hornet’s nest link!

    I hope you do not mind, but it is for a good cause: to catch duplicate domain name spawners and nasty GWG violators!

    Check out the members area on the link forum that I signed with and you will see 8000 members of low scum!

    Igor

  • 6. DuckMan  |  June 4th, 2007 at 2:00 pm

    You’re right – I don’t use no-follow – because I believe that if people post really relevant links that add value to this site, those sites deserve PR – it’s a democracy.

    Having said that, I do use various automated spam blocking programs (Akismet and Bad Behavior) which weed out about 99.9% of the spam that gets posted on this site.

    I manually remove all other links that I don’t feel add value.

    doc

  • 7. ivb  |  June 4th, 2007 at 3:37 pm

    Doc, that is very good!

    I am glad you are aware of this and on top of it, while still letting people post real links on your site so you can share your PR, and not being greedy like some!

    And I do not blame you for removing the link on my last post!
    I had to remove it myself from my main website because Google stopped serving my main website in its search results!

    Two days after I removed the link, Google came back!

    Although I had the Honeypot link on my top page rather than a sub page, having bad links on your website is very bad for your PR!

    Doc, maybe you can find some place to stick the Honeypot link. It does not have to be your site; maybe you know some site that can suffer a little PR loss, like a PR 7 website!
    It would be good for the project, make it stronger, and help Google index and find the violators faster!

    Anyway I signed this post with my business clean Website!

    I hope you will not remove it!

    Or I can sign with my Google support group, whichever you prefer!

    Igor

  • 8. DuckMan  |  June 4th, 2007 at 3:53 pm

    Yes of course I will remove the link to your site, Igor.

    Why?

    It adds nothing useful to the conversation. I’m not here to promote your travel agency.

    I’m also not particularly fond of personal attacks, as per the one you launched here on Craig (Cass-Hacks) whom I consider a friend.

    Bye.
