A Plague of De-indexation and Supplementals Google Now Has..

January 25th, 2007

Hmm… recently there seems to have been a rash of people on the Google webmaster forums asking for assistance on the same topic – WHY HAS MY SITE DISAPPEARED FROM THE GOOGLE INDEX, OR WHY HAS MY SITE BEEN RELEGATED TO SUPPLEMENTAL HELL –

I, along with others on the forum, have been trying to lend assistance where I can, but this really does seem like an abnormal event – sites with rich, original content are being binned for no apparent reason.

I wrote a few comments on Matt Cutts’s blog recently about the problem – I’ll quote myself here to save you looking for them –

Hi Matt,

The never-ending fight against spam – a noble fight. I wonder whether this is the reason for the apparent massive increase in sites (especially newer ones) being binned to supplemental hell? There has been a lot of talk on the Google webmaster forums about good, previously listed content ending up in supplementals within the last month.

The basic trend seems to be that these sites might be ending up in supplemental because their CMS creates duplicates (à la WordPress archives, for example), which in turn makes Google think it’s being spammed, which in turn leads to supplementals.

Your previous missives about supplementals spoke of the fact that they aren’t necessarily a bad thing, and that you can climb out of them with white-hat SEO techniques like building backlinks. I’d have to say that, in a lot of cases (one example would be a blog), the backlinks only really start to build once the content is searchable, so those of us who have rapidly evolving sites designed to answer ‘questions of the moment’ never get listed, even though they provide great answers and unique information.

I for one would love some clarification from Google about supps. You can see I’ve written about it on my site, http://www.utheguru.com/optimizing-wordpress-to-stop-duplicate-content-penalty/. My basic feeling is that the big, older, better-established domains are getting bigger and the smaller ones are getting smaller, because the smaller ones never get a chance to earn links while they are supp’d.

Could be worthy of a future article – are we getting supp’d because G thinks we are spammers, or are we getting supp’d for some other reason? It’s a massive problem that’s diluting the value of Google for research purposes, IMHO.

All the best,

D
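
(An aside for readers of this post, outside the quoted comment: the “CMS creates duplicates” point is easier to see with a concrete check. Below is a minimal, hypothetical Python sketch – my own illustration, not anything Google has published – of the kind of shingle-overlap test a duplicate filter plausibly runs. The page text is made up, but a WordPress post and the archive page that reprints it in full would score close to 1.0 on a test like this, which is exactly the trap I’m describing.)

```python
# Minimal sketch of a near-duplicate check, similar in spirit to what a
# search engine's duplicate filter might do: break each page into
# overlapping word "shingles" and measure how much the two sets overlap.
import re

def shingles(text, k=5):
    """Return the set of overlapping k-word shingles in a page's text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}

def jaccard(a, b):
    """1.0 means the two shingle sets are identical; 0.0 means no overlap."""
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Stand-ins for a single WordPress post and the monthly archive page that
# republishes the same post in full (hypothetical text, not a real site).
post_page = ("Optimizing WordPress to stop the duplicate content penalty. "
             "Archive, category and tag pages all reprint the same article "
             "word for word, so a crawler sees identical text at several URLs.")
archive_page = ("January 2007 archives. "
                "Optimizing WordPress to stop the duplicate content penalty. "
                "Archive, category and tag pages all reprint the same article "
                "word for word, so a crawler sees identical text at several URLs.")

score = jaccard(shingles(post_page), shingles(archive_page))
print("similarity: %.2f" % score)  # a score near 1.0 looks like duplicate content
```

Anything scoring near 1.0 looks, to a filter like that, like the same page published twice. Keeping the archive and category views out of the index so only the post itself gets crawled is the usual workaround, and is roughly what the post I linked in the comment above argues for.

My second comment, replying to another commenter in the same thread, picks up the link-weighting side of the argument: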

Spamhound, I agree and I disagree – I agree that there is a lot of crap written on blogs, but I disagree that all of it is crap – blogs are just a different, generally more ‘folksy’ way of presenting information. I find a heck of a lot of useful information on blogs.

Your second point, about not giving any value to links from blogs, is another thing about Google’s (apparent) algo that I find flawed (apart from the growing incidence of supps). That is, that backlinks are considered more important than forward links.

In my daily life as a (currently bloody well studying again) computer and communications engineer, with a fairly tangential degree (agricultural science) as well, I’ve learnt a fair bit about gathering information. Obviously, in science in general, the tradition has been that knowledge is built up through literature review and original research. The original research is then recorded in peer-reviewed papers. Any good scientist knows that a good paper is one that references as many other papers on the topic as possible. This means the paper can be a first stop for anyone wanting to know what work has been done before in that area of research.

If you take this model and apply it to the Google algo, the literature review is the Google search, the peer review is the comments, the paper is the blog, and the references are the forward links.

Google is about information, and building knowledge. For hundreds and hundreds of years, humanity has built knowledge using the above peer review system. It works.

So, what am I getting at? Three things:

1. Blogs are pretty damn close to the peer-review system – closer than static pages, IMHO.
2. By concentrating on backlinks rather than forward links, Google risks penalizing new knowledge rather than encouraging it.
3. Google needs to think about incorporating forward links as well as backlinks in its algo, to surface more balanced content and direct people to places where all the relevant knowledge is at their fingertips.

Sorry – to tie this all together, so that what I’ve said doesn’t seem tangential to the topic: the fight against spam is one of those ‘tipping points’ Matt has spoken about before. Many of the strategies Google has adopted to fight spam, if you look at them objectively, seem to me to come dangerously close to also fighting this kind of peer review – and that includes ‘nofollow’.

I think we need to seriously weigh up the positives and negatives of the spam-fighting techniques being used, lest we damage the web in the process.

Cheers,

Doc
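
To make the forward-link idea a little more concrete before moving on (this is my own illustration, not something Google has confirmed it uses): the classic HITS algorithm already scores pages in both directions – an “authority” score driven by backlinks, and a “hub” score driven by forward links to good pages. Run a few rounds of that mutual reinforcement on a toy link graph and a brand-new page with zero backlinks but plenty of good references still earns a respectable hub score:

```python
# Toy hub/authority (HITS-style) scores on a tiny, hypothetical link graph.
# "blog-post" has no backlinks at all, but because it links out to the
# well-referenced papers it still earns a high hub score -- the forward-link
# signal argued for above.
links = {
    "blog-post":  ["paper-a", "paper-b", "paper-c"],  # new page, lots of references
    "paper-a":    ["paper-b"],
    "paper-b":    [],
    "paper-c":    ["paper-a"],
    "old-portal": ["paper-a", "paper-b"],             # established, well-linked page
}

pages = list(links)
hub = {p: 1.0 for p in pages}
auth = {p: 1.0 for p in pages}

for _ in range(20):  # a few rounds of mutual reinforcement
    # Authority: how strongly the good hubs point at you (backlink-driven).
    auth = {p: sum(hub[q] for q in pages if p in links[q]) for p in pages}
    # Hub: how strongly you point at the good authorities (forward-link-driven).
    hub = {p: sum(auth[q] for q in links[p]) for p in pages}
    # Normalise so the scores stay comparable between rounds.
    a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
    h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
    auth = {p: v / a_norm for p, v in auth.items()}
    hub = {p: v / h_norm for p, v in hub.items()}

for p in pages:
    print("%-10s  hub=%.2f  authority=%.2f" % (p, hub[p], auth[p]))
```

Run it and “blog-post” finishes with the highest hub score and an authority score of zero – which is exactly the situation I’m describing: a page that references everything relevant is useful to a researcher on day one, even though a backlink-only view says it is worth nothing.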

I daresay that Matt has read what I had to say, and will probably blog about it soon, but in the meantime I thought I’d open my blog to comments from other webmasters about the problem. Over the next few days, I hope to get an idea of who in particular this problem is affecting, and whether, as I suspect, it is due to an algorithm change in the most recent PR update / data push.

To give an idea of the extent of the problem, I’m now going to head over to the Google webmaster forum and grab a random selection of quotes from the first page…

Gardening wrote:-

Subject: Why must I suffer from a penalty?
I am desperate because I lost most of my good rankings for
http://www.hausgarten.net. Only a few very old subpages of this domain
still have good rankings today. On Saturday the site came back, but just
for three days.

I am really worried about this because I do not know what mistake I made.
Can you find any mistakes or reasons for the penalty? The penalty started
on 22nd December.
If a Googler reads this, please answer.

Many thanks!
Marcel

Alexander wrote:-

Can anyone help me, please? I have had my affiliate website,
www.profit-k.com, indexed by Google for about 3 months. After 12
December it stopped indexing all of my pages.

When I log into Webmaster Tools and look at the indexing stats, Google
says that there are no errors to report, no broken links, no bad URLs,
no webmaster violations, the sitemap is verified and OK, no spammy content
– just absolutely nothing wrong with the site.

The message is that ‘No pages are indexed and indexing takes
time’.

The site is picking up; it’s well linked with other relevant sites, but
what is the problem?

Why has it fallen out of the index?

Alexander

Weicool writes:- 

Since the beginning of 2007, my hits have dropped by about half. I have
just discovered today that the most popular keywords people used to
type into Google to get to my site no longer list my site.
The weird thing is, pretty much ANY keyword I type does not work.
However, my site still has a PR of 5 and all 2,000+ pages are
indexed. I am very frustrated at the moment and am asking for some
suggestions on what has happened.

According to Webmaster Tools, my site is still constantly being
indexed, and I have tried all my top keywords (all within the top 10
ranks) without finding my link. However, when I do a site-specific search
(using site:pokedream.com), I can find the pages.

What on earth is happening?

Djihed wrote:- 

My blog, http://djihed.com, is being crawled regularly, nearly in its
entirety, by Googlebot, but it is not being indexed.

I’m starting to believe the site is on a blacklist or something. I
have been using Webmaster Tools and Analytics for months now, and I
provided a clean sitemap. Each time I visit Webmaster Tools it tells me
it is happily crawling the site, but no pages are to be seen in the index…

Recently I have launched a couple more sites and they have been indexed
within a week or two of launch, much to my surprise.

Is my blog being blacklisted for some reason?

Boehm writes:- 

For years, whenever I typed in a search for “property management san diego”,
my website www.sdppm.com was always in the top 3 positions. Until last
week. Can anyone advise me on whether I am doing something wrong? Is my site
offending Google? Did they change something? My site’s content has not
changed. The only changes I made were validating it with the w3.org
XHTML validator and making the suggested changes.

I have been reading about this “backlink” stuff and was never aware
that Google rates a website based on who and how many other pages link
to it. If that is the case, then why does www.utopiamanagement.com
still have the top position for the same search, “property management san
diego”? I checked the links to their website and they don’t have very
many?

I have gone down the sitemap avenue and the robots.txt avenue; I use Google
Tools and it shows that Google’s last crawl was Jan 12th. Don’t they
crawl more often than that? Why isn’t Google crawling my site?

Can’t anybody help me with this? Why does Google put so much weight on
backlinks? I would think that content should be a more truthful ranking
factor. I appreciate anyone who can give me answers and can take
a look at my website and advise me.

Bbboy98 writes:- 

We were cruising right along, and all of a sudden we dropped off pages
3-4 and cannot find our main site anywhere. I can find other pages on
the site; I’m not really sure what is up. The site in question is
www.supportdysautonomia.org. Can anyone tell me why I dropped so
much… or off the map, or even where I am now?

Westatl has a similar problem:- 

Around the start of 2007, I put together a single-page web site for
a friend’s real estate company. It is a simple one-page marketing site:
http://www.thewoodallgroup.com.

Today, after a couple of weeks of seeing the Google ranking slowly
increase, I found the site completely removed from the Google index.
Here are the relevant facts:

1. The site is hosted on a GoDaddy server
2. Using Google Webmaster Tools, I had:
a. Successfully set the preferred domain
b. Successfully validated a site map
c. Successfully validated a robots.txt file
3. 1-2 weeks ago I installed StatCounter on the site
4. Yesterday I installed Google Analytics on the site
5. Today the site is nowhere to be found in the Google index, though
Google Webmaster Tools tells me it *is* in the index.

I’m stumped. After carefully following the Google guidelines for
webmasters, I was shocked to see the site removed. Based on the
information I’ve provided, could someone provide direction as to how
this happened? In addition, what steps should I take to get the site
back in the index?

Thanks for the help,

All these examples are from the first page of the Google Webmaster Help forum… there is truly a plague of people asking the same or similar questions right at the moment. Please post your questions and examples below and we’ll try to get some answers from Google. Also, if you are having the same problem, Digg this post or trackback to it from your own site so we start to get some traffic happening.

More examples as they come, below.

Cheers,

D


Mike Maynard writes:-

My site suffered the same fate; the funny thing is the only change that
happened was I finally got listed in DMOZ. The site is
http://www.portalthemes.com and I was ALWAYS in the top 5 for my
keywords.
Now I am barely showing in the SERPs, and it seems Googlebot is just
giving me glancing looks.
I actually had to reactivate my AdWords campaign just to keep some
semblance of traffic coming from Google. :o(

I just had to include this post by Bud, quoted from the Google Webmaster Forums, in the thread he started entitled “Is ‘The Law of Inverse Quality’ at work here?” – he very eloquently summarizes exactly what I’m trying to get at:-

From: bud
Date: Wed, 24 Jan 2007 07:20:35 -0800
Subject: Re: Is “The Law of Inverse Quality” at work here?
Bug report? I’ve just encountered the second occasion of having a
submitted post fail to show up after several minutes, although reported
as having been accepted. If this re-submittal proves to be a dup, my
apologies.
============================
Hi, John …

> That site has very marginal value in terms of inbound links… That

Oh, yes. It’s not exactly the destination of choice for millions. LOL
But the very marginal nature of that site is exactly what makes it
uniquely valuable for testing the behavior of The Algorithm where it
meets the real world. Which is to say that it’s only at the margin
that one can judge what’s actually happening.

I’ve no doubt that with higher PR assigned to this site, these
conditions would change, and that presently unlisted pages would be
more likely to be listed. Perhaps all of them. Nevertheless, “at the
margin”, the algo plainly does NOT choose pages based on advertised
criteria. Indeed, those (at best) look like random picks. That’s the
news, and adding more “love” would quickly bury the lead.

If unique content is actually being assigned greater weight, then
unique content should be observably favored for inclusion over that
which is not. That’s a pretty cut-and-dried certainty and we should be
able to see that principle in action “at the margins” in the real
world. But this test case reflects the direct opposite result, which
strongly suggests the algo is factually broken in a way which would
generate powerfully negative consequences for smaller sites.

That’s because when truly unique content from an obscure site is
ignored, as our evidence suggests, that site will continue to be
ignored in the future (or even totally dropped into supplemental hell)
because its precious small inventory of unique content may have been
totally ignored. Thus their small clutch of “gold” is transmuted into
mere lead on a routine basis. Indeed, that content, though unique,
could only be turned back into “gold” when stolen and republished on
other more well-linked sites. In that narrow sense, Google would be in
receipt of stolen property and complicit in aiding and abetting the
theft. That may not be a legal argument, but it most certainly is a
moral one.

This condition would also account for why smaller operators must strap
themselves to the oars and churn out miles of marginal “unique content”
in order to no longer be ignored. Google has turned this activity into
a cottage industry of vast proportions, and if the evidence at hand is
confirmed, the churning of the content-mill must ever grind faster
because Google have covertly placed their hand on the scale to benefit
larger sites, whether intentionally or by carelessness, and any
discussion of a “level playing field” is plainly a joke.

> certainly has something to do with it, perhaps the value is fluctuating
> slightly as well, sometimes getting a few pages more or less indexed
> (which would match the supplemental pages). I wonder how it would react
> if you had a stronger link or two pointing at it. Interested in a test?

Yes, but I’d like it to be a test that appears most likely to reveal
reality. It’s my assumption that 90% of pages actually linked to from
outside are those which come to the attention of someone else because
they are either deliberately promoted or were discovered through search
engine inquiries. It seems appropriate, therefore, that links
established to this test site be laid in only on pages which actually
exist at any given time in the Google SERPs. It would certainly be
interesting to see just how many outside links must be accumulated on
non-unique pages before now-obscured “unique content” pages actually
emerge into the SERPs. It’s also obvious that in the real world it
would be much more difficult to obtain links to non-unique pages, and
if pages with unique content aren’t even included in the SERPs, then
only a manhour-intensive campaign to promote these pages directly to
others would be effective. Put another way, the sites with the fewest
available resources would be required to pay the highest cost. I’m
unsure just how this fits with the “level playing field” concept.

It’s important to realize, however, that adding links won’t answer —
and will actually detract from — the most important questions.

1. Why is obviously unique content totally absent in the SERPs for this
site when it has long been available for inclusion?

2. Why are equally weighted pages (all unloved) not treated as a class?
That is, all other things being equal, all these pages should be listed —
or none. Otherwise, these choices could only be made by random,
arbitrary (and thus capricious) choice. Ahhhh … but we are
frequently informed that there actually IS another vital criterion:
“uniqueness” of content. But these data reveal no evidence of that.
Quite the contrary.

3. Segue to the next important question: why is non-unique content
included under any (presumed) condition of forced choice which results
in unique content being excluded? That result isn’t merely
counter-intuitive. It’s downright counter-productive.

This may account for why small sites must generate prodigious
quantities of “unique content” (even if meritless) to attract attention
from Googlebot. So ultimately the most important question is why
Google might place such a heavy hand on the scales to create an
artificial barrier to entry for smaller sites — especially in light of
its widely promoted “level playing field” and “don’t be evil”
doctrines, and the further claim (explicitly reiterated by Adam above)
that “unique” content is more valued for inclusion than that which is
not. Really? We are left to puzzle over how these official
representations actually square with immediately observed facts.

So it may be that the far more productive inquiry would lie in discovering
whether other small site operators are observing similar results.
Whether or not they do, of course, is somewhat beside the point; when
it comes to software, anything less than 100% reliability of
fundamental output raises questions. Even a single exception
discovered in the real world — particularly one so contradictory to
advertised expectations — should therefore set off alarm bells which
only the deaf might be excused from hearing. Even 99.999% reliability
is a bitch to casually explain away.

But it would be interesting to see whether this site is just “unlucky”
(how does one unambiguously define that in software?), or if it
actually reflects reasonable evidence of profoundly crippling
restrictions being levied on “marginal” websites.

None of this should be taken as any implied accusation of evil
intention. The far more likely explanation would be that Google folk
have not resisted the appeal of their own propaganda and therefore
remain blissfully unaware that real world results appear — at least in
this case — to openly contradict common sense.


Entry Filed under: SEO Discussions


5 Comments

  • 1. Aaron Pratt  |  January 26th, 2007 at 10:12 am

    Looks pretty basic to me – you have few to no links from other sites pointing to this blog. 🙂

  • 2. DuckMan  |  January 26th, 2007 at 10:29 am

    Yep Aaron – that’s true – this is only one of my sites, and it is a growing one, but it’s not so much this site I’m talking about – mine lost PR and got de-indexed AFTER I started talking about these issues. I’ve also had other sites go straight to Supplemental hell even though they have good content.

    What I’m trying to say is that if your site never gets indexed because everything written goes straight to Supplemental hell, it’s nearly impossible to get links in the first place.

    This low signal-to-noise ratio with Google is, as I have said before, a bit like a social divide, where the rich and well-established get richer, and the poor get poorer.

  • 3. JohnMu  |  January 26th, 2007 at 11:03 am

    Why should it be impossible to get links because of pages in the supplemental index? Most people start with a domain that has no pages indexed at all :-).

    In my opinion you’re wrong with the “social divide” thinking on the web. Yes, sites that are well known (generally high PR) will often know other sites that are well known (again, generally high PR). But on the web anyone can publish his own content and it’s fairly easy to find at least someone who likes it, writes about it and passes it on to some well known site.

    Also – the “social divide” often only comes into play when rich and poor sites compete for the same keywords. If you try to top Dell for “computer” with a PR1 page, it just won’t work. But if you target your own niche, you can easily find something that a PR1 page could dominate. It’s the same as offline, only on the web you will almost always find people searching for your niche. Say you can get 500 people world-wide interested in your “banality of bananas” niche: if you’re a small company (a “poor” site), then those 500 might be enough to grow on. Who cares about ranking for “computers”?

  • 4. Biggus_D  |  February 10th, 2007 at 4:08 pm

    I posted this on Webmasterworld:

    Call it whatever you want but it doesn’t make any sense.

    Last month our site dropped to the end of results, then we did… NOTHING.

    About 4 days later everything was fine again, until today when we dropped again.

    So I have 2 theories:

    – Google thinks that our site sucks, but just sometimes, not all the time

    – Google is just rotten

    What will we do? Nothing but curse G.

  • 5. DuckMan  |  March 12th, 2007 at 8:14 pm

    Hi Biggus – sorry for the long delay in reply 🙂

    Just remember that even though a Google search looks pretty seamless to us, on a day-to-day basis your results can come from any number (I think it’s 21 at last count) of ‘data centres’.

    These are VERY RARELY in complete synchronisation, so probably what you are seeing is a penalty seeping through the data centres – you’ve just happened to hit different data centres on different days, one with the penalty applied, another without.

    Overall, though, the news doesn’t look good for you 🙁 I hope it all turned out OK.

    theDuck


