Tuesday, November 07, 2006

Site Rippers

There are many reasons why someone may want to "rip" a site, but in my opinion it should be illegal. Content can be copyrighted and still be available online. If you need it offline, you should have to request permission from the site owner.

I would guess most people rip sites in order to reverse engineer them, either to compete on SEO rankings or to find a way to hack the site. For instance, they can rip the site, run tests against the copy without showing up in your web logs, and then run the program they have developed against your web site undetected, so it looks like normal traffic in your web logs.

Some site rippers are obvious: look in the request headers and the user agent names the tool. Others are more sly, doing things to cover their tracks and appear as if they were a "normal" user.
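
As a rough illustration of spotting the obvious ones, here is a minimal Python sketch that checks the User-Agent header against a short list of tool names. The function name and the token list are my own assumptions for the example, not a complete catalog, and a sly ripper can simply send a browser's user agent instead.

```python
# Illustrative tokens only (an assumption for this sketch, not a full list).
# HTTrack and Wget both identify themselves in their default User-Agent.
KNOWN_RIPPER_TOKENS = ("httrack", "wget")


def is_blatant_ripper(user_agent):
    """Return True when the User-Agent header contains a known ripper token."""
    if not user_agent:
        # A missing header is not treated as a match in this sketch.
        return False
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_RIPPER_TOKENS)
```

You would call this with whatever your web server or framework gives you as the User-Agent value for each request, and refuse the request when it returns True.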

What to do about site ripping? Good question. First, block the blatant ones. Second, look for traffic anomalies: requests that don't look like "normal" users clicking through a site at normal speed. Finally, frequent site changes make it harder for someone to keep a program working that walks through your pages and does something malicious. You can "break" their code by finding ways to change your pages frequently.
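
For the traffic anomaly idea, one possible approach is to count how many requests each client makes inside a short sliding window and flag the ones that go faster than a person could click. The sketch below is only that, a sketch: the function name, the (IP, timestamp) input format, and the thresholds of 30 requests in 10 seconds are all assumptions you would need to tune against your own logs.

```python
from collections import defaultdict, deque


def find_fast_clients(requests, max_requests=30, window_seconds=10.0):
    """Flag clients that exceed max_requests within any window_seconds span.

    requests: iterable of (client_ip, unix_timestamp) pairs, sorted by
    timestamp in ascending order (an assumed, simplified log format).
    """
    recent = defaultdict(deque)  # client_ip -> timestamps still in the window
    flagged = set()
    for ip, ts in requests:
        window = recent[ip]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_requests:
            flagged.add(ip)
    return flagged
```

Keep in mind this only catches rippers that are in a hurry. One that throttles itself to human speed will slip through, which is where changing your pages frequently comes in.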