Thursday, June 11, 2009

Today's robots.txt file

If you're trying to keep most automated traffic, apart from the major search engines, off a particular web site, here's a robots.txt file. Note that not all of these agents are actually bots. Also, some of them, such as the Python, Perl and Java clients running around the Internet and used by hackers, don't obey or even check robots.txt, so you'll have to use other ways to monitor and handle that traffic on your web site.


User-Agent: FollowSiteBot
Disallow: /

User-Agent: nambu
Disallow: /

User-Agent: uberbot
Disallow: /

User-Agent: KaloogaBot
Disallow: /

User-Agent: Yeti
Disallow: /

User-Agent: Servage
Disallow: /

User-Agent: ServageRobot
Disallow: /

User-Agent: Trident
Disallow: /

User-Agent: uw_cse_xwc
Disallow: /

User-Agent: ZupeeCrawler
Disallow: /

User-Agent: Webspider
Disallow: /

User-Agent: LinkAider
Disallow: /

User-Agent: Axonize-bot
Disallow: /

User-Agent: ips-agent
Disallow: /

User-Agent: RiceComputerArchitecture
Disallow: /

User-Agent: AISearchBot
Disallow: /

User-Agent: flatlandbot
Disallow: /

User-Agent: FairShare
Disallow: /

User-Agent: SapphireWebCrawler
Disallow: /

User-Agent: LocalBot
Disallow: /

User-Agent: LaBot
Disallow: /

User-Agent: Butterfly
Disallow: /

User-Agent: robotgenius
Disallow: /

User-Agent: WillyBot
Disallow: /

User-Agent: GingerCrawler
Disallow: /

User-Agent:larbin
Disallow: /

User-Agent: ru_com_viewer
Disallow: /

User-Agent:Yandex
Disallow: /

User-Agent:yandex
Disallow: /

User-Agent:msnbot-media
Disallow: /

Sitemap: http://www.rainierrhododendrons.com/sitemap.xml

User-Agent:del.icio.us
Disallow: /

User-Agent:Sika
Disallow: /

User-Agent:whois.de
Disallow: /

User-Agent:Isidorus
Disallow: /

User-Agent:Yanga
Disallow: /

User-Agent:MSR-ISRCCrawler
Disallow: /

User-Agent:Snappybot
Disallow: /

User-Agent:Gaisbot
Disallow: /

User-Agent:BobCrawl
Disallow: /

User-Agent:OpenX
Disallow: /

User-Agent:kalooga
Disallow: /

User-Agent:OnTownsBot
Disallow: /

User-Agent:Cazoodle-Bot
Disallow: /

User-Agent: REAP-Crawler
Disallow: /

User-Agent: DotBot
Disallow: /

User-Agent: Gigabot
Disallow: /

User-Agent: NetcraftSurveyAgent
Disallow: /

User-Agent: SurveyBot
Disallow: /

User-Agent: DBLBot
Disallow: /

User-Agent: Charlotte
Disallow: /

User-agent: IntegraTelecom
Disallow: /

User-agent: PSIBots
Disallow: /

User-agent:Websense
Disallow: /

User-agent:HornySexSearch
Disallow: /

User-agent: SnapPreviewBot
Disallow: /

User-agent: Snoopy
Disallow: /

User-agent: libwww-perl
Disallow: /

User-agent: nexen
Disallow: /

User-agent: phpversion
Disallow: /

User-agent: attributor
Disallow: /

User-agent: Java
Disallow: /

User-agent: bsalsa
Disallow: /

User-agent: whoisde.de
Disallow: /

User-agent: envolk
Disallow: /

User-agent: QEAVis
Disallow: /

User-agent: NextGenSearchBot
Disallow: /

User-agent: boitho.com
Disallow: /

User-agent: boitho
Disallow: /

User-agent: Wget
Disallow: /

User-agent: Rankivabot
Disallow: /

User-agent: T-Online Browser
Disallow: /

User-agent: webalta
Disallow: /

User-agent: page_prefetcher
Disallow: /

User-agent: cyberpatrol
Disallow: /

User-agent: sitecat
Disallow: /

User-agent: cyberpatrolcrawler
Disallow: /

User-agent: internetseer
Disallow: /

User-agent: searchme
Disallow: /

User-agent: dcbot
Disallow: /

User-agent: scoutjet
Disallow: /

User-agent: sphsearch
Disallow: /

User-agent: exabot
Disallow: /

User-agent: NaverBot
Disallow: /

User-agent: naverbot
Disallow: /

User-agent: twiceler
Disallow: /

User-agent: zermelo
Disallow: /

User-agent: Moozilla
Disallow: /

User-agent: kyluka
Disallow: /

User-agent: baiduspider
Disallow: /

User-agent: MLBot
Disallow: /

User-agent: worio
Disallow: /

User-agent: turnitinbot
Disallow: /

User-agent: exooba
Disallow: /

User-agent: ViolaBot
Disallow: /

User-agent: speedyspider
Disallow: /

User-agent: becomebot
Disallow: /

# disallow Googlebot-Image
User-agent: Googlebot-Image
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: VWBot
Disallow: /

User-agent: ShopWiki
Disallow: /

User-agent: panscient.com
Disallow: /

User-agent: panscient
Disallow: /

User-agent: sproose
Disallow: /

User-agent: voyager
Disallow: /

User-agent: grub
Disallow: /

User-agent: OmniExplorer_Bot
Disallow: /

User-agent: Twiceler
Disallow: /

User-agent: WebDataCentreBot
Disallow: /

User-agent: OOZBOT
Disallow: /

User-agent: setooz
Disallow: /

User-agent: perl
Disallow: /

User-agent: botmobi
Disallow: /

User-agent: ASPSimply
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: voilabot
Disallow: /

User-agent: WGet
Disallow: /

User-agent: obot
Disallow: /

User-agent: libcurl-agent
Disallow: /

User-agent: therarestparser
Disallow: /

User-agent: Jakarta Commons-HttpClient
Disallow: /
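
As mentioned at the top, robots.txt only works for clients that choose to read it. Here is a minimal sketch of what such a polite client does, using Python's standard urllib.robotparser module and a few records copied from the file above (not the whole list). It assumes Python 3.8+ for site_maps(). The helper names build_parser and is_allowed are hypothetical and only for illustration. The client passes its own product token (for example "Googlebot"), not a full browser-style User-Agent string.

```python
from urllib.robotparser import RobotFileParser

# A handful of records copied verbatim from the robots.txt above.
ROBOTS_TXT = """\
User-Agent:Yandex
Disallow: /

User-agent: Wget
Disallow: /

User-agent: Python-urllib
Disallow: /

# disallow Googlebot-Image
User-agent: Googlebot-Image
Disallow: /

Sitemap: http://www.rainierrhododendrons.com/sitemap.xml
"""

SITE = "http://www.rainierrhododendrons.com/"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, product_token: str, url: str = SITE) -> bool:
    """Return what a *polite* client would check before requesting url.

    Nothing forces a client to call this. A script that skips the check
    is not stopped by robots.txt at all.
    """
    return parser.can_fetch(product_token, url)


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for token in ("Googlebot", "Googlebot-Image", "YandexBot", "Wget", "Python-urllib"):
        print(f"{token:16} allowed={is_allowed(rp, token)}")
    print("sitemaps:", rp.site_maps())
```

Running it prints allowed=True only for Googlebot, because no record (and no "User-agent: *" record) applies to it, while the other four are refused. In this parser, matching is case-insensitive and uses substring matching on the token, so "YandexBot" is caught by the "Yandex" record. Pairs such as Yandex/yandex or NaverBot/naverbot are therefore redundant for Python's parser, although other crawlers may match differently.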