There has been a lot of debate over the past few weeks about the high volume of spam in Google’s search results, from content farms and otherwise, so much so that Google even launched a Chrome extension that lets users block certain sites from appearing in their personalized search results. But the main issue of content scrapers, aggregators and proxy websites ranking higher than the original sites still exists.

The issue of Google indexing proxy sites, which simply route the URL of the original page through their own servers, is nothing new. It dates back to 2007, and with the introduction of the canonical tag in 2009, Google announced that the issue had come to an end. But that doesn’t seem to be the case. Proxy hijacking is back, and for some mysterious reason it appears to be prevalent among proxy sites hosted on Google’s very own App Engine, with the “rel=canonical” tag being ignored.

Let me explain with an example. Over the past couple of days, I have been observing a considerable drop in organic traffic for some important keywords. Today, I pushed myself to dig deep and check what was wrong. I was shocked to see a proxy site, suzetteklierocks.appspot.com, ranking in the spot where my blog used to rank. To my astonishment, I couldn’t find my own blog anywhere in the first couple of pages of Google search results.


[Screenshot: proxy-hijack]

In the screenshot above, the first result is the proxy site that has hijacked my page. This is the URL listed on Google: http://suzetteklierocks.appspot.com/techpp.com/2010/07/05/dropbox-alternatives-sync-files-online/. As you might expect, this is just one of many instances where the proxy site ranks better than mine. The reason I did not notice a huge change in traffic or revenue is that the proxy site simply redirects to the original page. However, App Engine sites have a daily limit on bandwidth usage, so the proxy stops working towards the end of the day once it breaches that limit.

A quick search for site:suzetteklierocks.appspot.com reveals more than 200,000 indexed pages for the proxy site. I do have canonical tags on my pages, which Google is somehow ignoring while indexing the duplicate proxy pages. I am no SEO expert, and I am not sure if I am missing something here. Google’s web-spam guru, Matt Cutts, has promised to have a look at the issue, and I am sure he will resolve it at the earliest.
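For anyone who wants to double-check the canonical tag on their own pages, here is a minimal Python sketch that pulls the rel=canonical URL out of a page's HTML. The names `CanonicalFinder` and `find_canonical` and the sample markup are made up for illustration, and the href is only assumed to be the original URL of the post mentioned above. It says nothing about how Google itself reads the tag.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one.
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values; compare case-insensitively.
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Illustrative markup only; the href is an assumed original URL.
sample = (
    '<html><head>'
    '<link rel="canonical" '
    'href="http://techpp.com/2010/07/05/dropbox-alternatives-sync-files-online/">'
    '</head><body></body></html>'
)
print(find_canonical(sample))
```

Running the same function against the HTML served through the proxy URL would show whether the tag survives the trip through the proxy and still points at the original.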

Another request to Matt and Google: there doesn’t seem to be a proper way to report sites hosted on Google App Engine. Please make it easier to report them. Also, if possible, make App Engine-based proxy URLs “noindex” by default.

Update: I cannot explicitly block the proxy site until I get my SERP positions back, since I’d lose out on the redirected traffic as well.
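For reference, once blocking becomes an option, a server-side check could look roughly like the Python sketch below. App Engine's outbound fetches carry a User-Agent that includes the app ID (a reader quotes one such string in the comments). The function names and the blocklist are hypothetical, the "s~" prefix handling is an assumption about how some app IDs are reported, and wiring this into a real web server is left out.

```python
import re

# Matches the App Engine marker in a User-Agent and captures the app ID.
APPID_PATTERN = re.compile(r"AppEngine-Google;.*?appid:\s*([A-Za-z0-9~._-]+)")

# Hypothetical blocklist; the only ID here is the proxy discussed in this post.
BLOCKED_APP_IDS = {"suzetteklierocks"}


def extract_app_id(user_agent):
    """Return the App Engine app ID found in the User-Agent, or None."""
    match = APPID_PATTERN.search(user_agent or "")
    if not match:
        return None
    # Assumption: an ID may carry a prefix such as "s~"; keep the part after it.
    return match.group(1).split("~")[-1]


def should_block(user_agent):
    """True when the request comes from an App Engine app on the blocklist."""
    return extract_app_id(user_agent) in BLOCKED_APP_IDS


ua = "AppEngine-Google; (+http://code.google.com/appengine; appid: suzetteklierocks)"
print(extract_app_id(ua), should_block(ua))
```

A request that matches could then be answered with a 403 instead of the page. This only filters on what the client claims in its User-Agent header, so it is a convenience rather than a guarantee.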

Update 2: I realize that I have made a mistake in the title. The right one is – “Proxy Sites REPLACING the Originals” *SIGH*

Update 3: Barry Schwartz of SEL wrote about this yesterday.

Update 4: Matt Cutts was quick to act upon this as he promised. Thanks Matt!

Update 5: I’m already seeing some changes being rolled out, but the process is not complete yet.

Raju is the founder-editor of Technically Personal. A proud geek and an Internet freak, who is also a social networking enthusiast. You can follow him on Facebook and on Twitter. Mail Raju PP.

My traffic went down a lot. But I still believe that having real-time, quality content will rebuild it.

This is very bad. Google's own App Engine is not free from such attacks. Google's search algo is getting weaker.

ATUL

Big sites are also affected by this content farm action; the same happened with me. But it looks like warez and autoblogs are running smoothly. I don't know why this is happening.

We've recently seen the same user-agent visit us: "AppEngine-Google; (+http://code.google.com/appengine; appid: suzetteklierocks)". My question is, how do we block just the 'suzetteklierocks' value in robots.txt?

I guess the problem has been solved now. I saw a tweet from Matt. Hope they also succeed in curbing autoblogs and content scrapers.

The same is happening here... The website called Richfools(dot)com is just copying the articles from the RSS feed. I have tried contacting him, but there has been no reply...

Some of the articles that I write rank below the ones that he republishes (copied from RSS).

Though it's related, the one I have reported is much different. The offender is not another blog; it is just a proxy site, hosted on Google's servers, getting indexed by Google and REPLACING the originals in the SERPs. You don't even see the original anywhere in the top 10 pages of Google.

This is so bad ... I am really feeling bad for all of us ... hope it is corrected soon ...

Copyright 2012 Technically Personal!