There has been a lot of debate over the past few weeks about the high number of spam results in Google search, from content farms and elsewhere. The debate has been loud enough that Google even launched a Chrome extension that lets users block certain sites from appearing in their personalized search results. But the main problem still exists: content scrapers, aggregators and proxy websites are ranking higher than the original sites.
Google indexing proxy sites, which simply route the original page's URL through their own servers, is nothing new. The problem dates back to 2007, and when the canonical tag was introduced in 2009, Google announced that the issue had been resolved. That doesn't seem to be the case. Proxy hijacking is back, and for some mysterious reason it appears to be most prevalent among proxy sites hosted on Google’s very own App Engine, with the “rel=canonical” tag being ignored.
Let me explain with an example. Over the past couple of days, I had noticed a considerable drop in organic traffic for some important keywords. Today, I finally dug deeper to find out what was wrong. I was shocked to see a proxy site, suzetteklierocks.appspot.com, ranking in the spot my blog used to hold. To my astonishment, I couldn’t find my own blog anywhere in the first couple of pages of Google search results.
In the screenshot above, the first result is the proxy site that has hijacked my page. This is the URL listed on Google: http://suzetteklierocks.appspot.com/techpp.com/2010/07/05/dropbox-alternatives-sync-files-online/. As you might expect, this is just one of many instances where the proxy site is outranking mine. I didn’t notice a huge change in traffic and revenue because the proxy site simply redirects visitors to the original page. However, App Engine sites have a daily bandwidth quota. Once the proxy exceeds that limit, it stops working until the end of the day, and the traffic it would have passed on is lost.
A quick search for site:suzetteklierocks.appspot.com reveals more than 200,000 indexed pages for the proxy site. I do have canonical tags on my pages, but Google somehow seems to ignore them when indexing the duplicate proxy pages. I am no SEO expert, so I may be missing something here. Google’s web-spam guru, Matt Cutts, has promised to look into the issue, and I am sure he will resolve it soon.
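The idea behind the canonical tag is that a copy served from another host still carries a pointer back to the original. As a rough illustration only (this is not how Google actually processes the tag), the Python sketch below uses the standard library to extract the `rel="canonical"` URL from a page's HTML. It flags the page as a duplicate when that URL sits on a different host from the one the page was served from. The sample HTML and the exact canonical href for my article are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens; attribute values may be None.
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonical = attr_map["href"]


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


def is_duplicate_of_canonical(served_url, html):
    """True if the page's canonical URL is on a different host than served_url."""
    canonical = find_canonical(html)
    if canonical is None:
        return False
    return urlsplit(canonical).netloc.lower() != urlsplit(served_url).netloc.lower()


# Assumed canonical href for the article; the proxy URL is the one Google listed.
ORIGINAL = "http://techpp.com/2010/07/05/dropbox-alternatives-sync-files-online/"
PROXY = "http://suzetteklierocks.appspot.com/techpp.com/2010/07/05/dropbox-alternatives-sync-files-online/"
page = f'<html><head><link rel="canonical" href="{ORIGINAL}" /></head><body></body></html>'

print(is_duplicate_of_canonical(PROXY, page))     # True: canonical points to another host
print(is_duplicate_of_canonical(ORIGINAL, page))  # False: served from the canonical host
```

Because a proxy relays the page byte for byte, the copy on appspot.com carries the same canonical tag as the original. That is why one would expect it to be treated as a duplicate.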
Another request to Matt and Google: there doesn’t seem to be a proper way to report sites hosted on Google App Engine. Please make it easier to report them. Also, if possible, make App Engine-based proxy URLs “noindex” by default.
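A “noindex by default” signal could be as simple as an `X-Robots-Tag: noindex` HTTP header, which Google honours in the same way as a robots meta tag. The sketch below assumes a Python WSGI proxy app and shows the idea. The `noindex_middleware` wrapper and the toy `proxy_app` are hypothetical illustrations, not anything App Engine actually provides.

```python
def noindex_middleware(app):
    """Wrap a WSGI app so every response carries an X-Robots-Tag: noindex header."""

    def wrapped(environ, start_response):
        def start_response_noindex(status, headers, exc_info=None):
            # Copy the header list rather than mutating the one the app passed in.
            headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, start_response_noindex)

    return wrapped


# Hypothetical stand-in for a proxy app that relays an upstream page.
def proxy_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<html>relayed page</html>"]


# The wrapped app is what the platform would serve as its WSGI entry point.
application = noindex_middleware(proxy_app)
```

Doing this at the platform level, rather than relying on each proxy author, is what “by default” would mean in practice.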
Update: I cannot explicitly block the proxy site until I get my SERP positions back, since I would lose the redirected traffic as well.
Update 2: I realize I made a mistake in the title. It should read: “Proxy Sites REPLACING the Originals” *SIGH*
Update 3: Barry Schwartz of Search Engine Land (SEL) wrote about this yesterday.
Update 4: Matt Cutts acted on this quickly, as promised. Thanks, Matt!
Update 5: I’m already seeing some changes being rolled out, but the process is not complete yet.