Sunday, May 31, 2009

More Google Imposters

It appears that there are some Googlebot impersonators out there (unless Google runs Googlebots in the cloud and on various little networks all over the place, which I doubt after looking up these particular IP address networks). Not only are these requests not coming from Google networks, they seem to be interested in only a few sites, not all the sites on our server. They are particularly interested in travel and real estate web sites. Check your logs. If the requests claiming to be Googlebot are not coming from Google networks, decide if you want them hitting your site.

Here are the Google impostor IP addresses:

2 2009 5 12.20.32.67 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
1 2009 5 151.84.166.1 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
8 2009 5 209.7.26.158 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
25 2009 5 216.177.164.100 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
1 2009 5 216.240.151.50 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
152 2009 4 24.44.206.249 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
1 2009 5 65.213.90.26 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
87 2009 4 68.238.131.215 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
1 2009 4 69.116.160.44 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
1 2009 5 69.116.160.44 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
1 2009 5 69.70.64.94 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
1 2009 5 70.101.224.174 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
1 2009 5 71.116.210.34 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
79 2009 4 71.43.155.145 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
2 2009 4 74.169.43.199 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
1 2009 4 74.243.24.159 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
57 2009 4 74.243.25.64 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
10 2009 5 75.146.149.53 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
1 2009 5 76.249.223.78 Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
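
One way to check entries like these is the reverse-then-forward DNS lookup that Google documents for verifying Googlebot: the IP should reverse-resolve to a host name under googlebot.com or google.com, and that host name should resolve back to the same IP. The following is only a sketch in Python. The function names are made up for illustration, and the real lookups need network access and a working resolver.

```python
import socket

# Domains Google documents for genuine Googlebot host names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_google_hostname(hostname):
    """True if the host name sits under one of the Google crawler domains."""
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm.

    The reverse/forward parameters exist only so the lookups can be swapped
    out when no network is available.
    """
    try:
        hostname = reverse(ip)[0]
        if not is_google_hostname(hostname):
            return False
        # The forward lookup must list the original IP among its addresses.
        return ip in forward(hostname)[2]
    except OSError:
        # No PTR record or failed lookup: treat as not verified.
        return False


# Example use against addresses from the log above (needs DNS access):
#   for ip in ("216.177.164.100", "24.44.206.249"):
#       print(ip, is_real_googlebot(ip))
```

A failed lookup is treated as "not verified" here, which is a judgment call; a DNS outage would make a genuine Googlebot look like an impostor for a while.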

Google Impostor - CoreExpress

There's a Google impostor in our logs - unless Google is on the Core Express network.

On 3/4/2009 4:47:01 AM someone at this IP address: 64.69.46.217 was putting GoogleBot in their user agent.

OrgName: CoreExpress
OrgID: COEX
Address: 600 W. 7th Street
Address: Suite 360
City: Los Angeles
StateProv: CA
PostalCode: 90017
Country: US

NetRange: 64.69.32.0 - 64.69.47.255
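
A whois NetRange can be checked mechanically instead of by eye. This is a small Python sketch using the standard ipaddress module. The helper name in_range is made up for illustration, and the Google range is the 66.249.64.0 - 66.249.95.255 block that whois reports for Google Inc.

```python
import ipaddress


def in_range(ip, first, last):
    """True if ip lies between first and last, inclusive."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)


CORE_EXPRESS = ("64.69.32.0", "64.69.47.255")   # NetRange from the whois record above
GOOGLE = ("66.249.64.0", "66.249.95.255")       # NetRange whois reports for Google Inc.

suspect = "64.69.46.217"
print(suspect, "in CoreExpress range:", in_range(suspect, *CORE_EXPRESS))
print(suspect, "in Google range:", in_range(suspect, *GOOGLE))
```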

Saturday, May 30, 2009

owssvr.dll - attempted access

This IP: 71.112.91.22 on the Verizon network was trying to access an IE component on our site that does not exist:

/_vti_bin/owssvr.dll?UL=1&ACT=4&BUILD=6211&STRMVER=4&CAPREQ=0

Friday, May 29, 2009

Robots.txt - More bots than people

Is your web site getting more hits from bots than people? You might want to try this in your robots.txt file. It blocks out a lot of bots we've seen but not major search engines. Alter as desired:

User-Agent: OnTownsBot
Disallow: /

User-Agent: ServageRobot
Disallow: /

User-Agent: uw_cse_xwc
Disallow: /

User-Agent: ZupeeCrawler
Disallow: /

User-Agent: uberbot
Disallow: /

User-Agent: Axonize-bot
Disallow: /

User-Agent: ips-agent
Disallow: /

User-Agent: RiceComputerArchitecture
Disallow: /

User-Agent: AISearchBot
Disallow: /

User-Agent: flatlandbot
Disallow: /

User-Agent: FairShare
Disallow: /

User-Agent: SapphireWebCrawler
Disallow: /

User-Agent: LocalBot
Disallow: /

User-Agent: LaBot
Disallow: /

User-Agent: Butterfly
Disallow: /

User-Agent: robotgenius
Disallow: /

User-Agent: WillyBot
Disallow: /

User-Agent: GingerCrawler
Disallow: /

User-Agent:larbin
Disallow: /

User-Agent: ru_com_viewer
Disallow: /

User-Agent:Yandex
Disallow: /

User-Agent:yandex
Disallow: /

User-Agent:msnbot-media
Disallow: /

Sitemap: http://www.rainierrhododendrons.com/sitemap.xml

User-Agent:del.icio.us
Disallow: /

User-Agent:Sika
Disallow: /

User-Agent:whois.de
Disallow: /

User-Agent:Isidorus
Disallow: /

User-Agent:Yanga
Disallow: /

User-Agent:MSR-ISRCCrawler
Disallow: /

User-Agent:Snappybot
Disallow: /

User-Agent:Gaisbot
Disallow: /

User-Agent:SapphireWebCrawler
Disallow: /

User-Agent:BobCrawl
Disallow: /

User-Agent:OpenX
Disallow: /

User-Agent:Axonize-bot
Disallow: /

User-Agent:KaloogaBot
Disallow: /

User-Agent:kalooga
Disallow: /

User-Agent:OnTownsBot
Disallow: /

User-Agent:Cazoodle-Bot
Disallow: /

User-Agent: REAP-Crawler
Disallow: /

User-Agent: DotBot
Disallow: /

User-Agent: Gigabot
Disallow: /

User-Agent: NetcraftSurveyAgent
Disallow: /

User-Agent: SurveyBot
Disallow: /

User-Agent: DBLBot
Disallow: /

User-Agent: AISearchBot
Disallow: /

User-Agent: Charlotte
Disallow: /

User-agent: IntegraTelecom
Disallow: /

User-agent: PSIBots
Disallow: /

User-agent:Websense
Disallow: /

User-agent:HornySexSearch
Disallow: /

User-agent: SnapPreviewBot
Disallow: /

User-agent: Snoopy
Disallow: /

User-agent: libwww-perl
Disallow: /

User-agent: nexen
Disallow: /

User-agent: phpversion
Disallow: /

User-agent: attributor
Disallow: /

User-agent: Java
Disallow: /

User-agent: bsalsa
Disallow: /

User-agent: whoisde.de
Disallow: /

User-agent: envolk
Disallow: /

User-agent: QEAVis
Disallow: /

User-agent: NextGenSearchBot
Disallow: /

User-agent: boitho.com
Disallow: /

User-agent: boitho
Disallow: /

User-agent: Wget
Disallow: /

User-agent: Rankivabot
Disallow: /

User-agent: T-Online Browser
Disallow: /

User-agent: webalta
Disallow: /

User-agent: page_prefetcher
Disallow: /

User-agent: cyberpatrol
Disallow: /

User-agent: sitecat
Disallow: /

User-agent: cyberpatrolcrawler
Disallow: /

User-agent: internetseer
Disallow: /

User-agent: searchme
Disallow: /

User-agent: dcbot
Disallow: /

User-agent: scoutjet
Disallow: /

User-agent: sphsearch
Disallow: /

User-agent: exabot
Disallow: /

User-agent: NaverBot
Disallow: /

User-agent: naverbot
Disallow: /

User-agent: twiceler
Disallow: /

User-agent: zermelo
Disallow: /

User-agent: Moozilla
Disallow: /

User-agent: kyluka
Disallow: /

User-agent: scoutjet
Disallow: /

User-agent: baiduspider
Disallow: /

User-agent: MLBot
Disallow: /

User-agent: worio
Disallow: /

User-agent: turnitinbot
Disallow: /

User-agent: exooba
Disallow: /

User-agent: ViolaBot
Disallow: /

User-agent: speedyspider
Disallow: /

User-agent: becomebot
Disallow: /

# disallow Googlebot-Image
User-agent: Googlebot-Image
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: QEAVis
Disallow: /

User-agent: VWBot
Disallow: /

User-agent: ShopWiki
Disallow: /

User-agent: SnapPreviewBot
Disallow: /

User-agent: panscient.com
Disallow: /

User-agent: panscient
Disallow: /

User-agent: sproose
Disallow: /

User-agent: voyager
Disallow: /

User-agent: grub
Disallow: /

User-agent: libwww-perl
Disallow: /

User-agent: OmniExplorer_Bot
Disallow: /

User-agent: Twiceler
Disallow: /

User-agent: WebDataCentreBot
Disallow: /

User-agent: OOZBOT
Disallow: /

User-agent: setooz
Disallow: /

User-agent: bsalsa
Disallow: /

User-agent: perl
Disallow: /

User-agent: botmobi
Disallow: /

User-agent: NextGenSearchBot
Disallow: /

User-agent: ASPSimply
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: Moozilla
Disallow: /

User-agent: voilabot
Disallow: /

User-agent: WGet
Disallow: /

User-agent: obot
Disallow: /

User-agent: Java
Disallow: /

User-agent: libcurl-agent
Disallow: /

User-agent: phpversion
Disallow: /

User-agent: therarestparser
Disallow: /

User-agent: Jakarta Commons-HttpClient
Disallow: /
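
Before deploying a long robots.txt like this, it can help to confirm that it blocks what you expect and still lets the major search engines through. Below is a minimal sketch using Python's standard urllib.robotparser on a short excerpt of the rules above. The helper name is_allowed and the example.com URL are placeholders, and other parsers (including the bots' own) may match user agents differently.

```python
from urllib.robotparser import RobotFileParser

# A short excerpt of the rules listed above.
SAMPLE_RULES = """\
User-Agent: GingerCrawler
Disallow: /

User-Agent:larbin
Disallow: /

User-agent: twiceler
Disallow: /

User-agent: Googlebot-Image
Disallow: /
"""


def is_allowed(rules_text, user_agent,
               url="http://www.example.com/index.html"):
    """Parse robots.txt text and ask whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("Twiceler", "larbin", "Googlebot-Image", "Googlebot"):
    print(agent, "allowed:", is_allowed(SAMPLE_RULES, agent))
```

Python's parser compares user agent names case-insensitively, so for this parser at least, pairs such as Yandex/yandex or Wget/WGet are equivalent. And of course this only shows what a well-behaved bot would do; it does nothing about bots that ignore robots.txt.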

facebookexternalhit/1.0 (+http://www.facebook.com/externalhit_uatext.php)

Facebook external hits apparently hide the IP address of the end user who clicked on the link. This makes it a bit difficult to keep your web site secure by blocking bad user agents and to track who is visiting your web site. I wish they would stop doing this and instead send the information of the end user who clicked the link, if that is what this user agent is all about.

InfoUsa - Spam, Junkmail, Telemarketing

Getting hit by a bot from this network, which is selling leads. It appears they are scraping email addresses off web sites.

OrgName: InfoUSA
OrgID: INFOUS
Address: 5711 S. 86th Cir
City: Omaha
StateProv: NE
PostalCode: 68127
Country: US

NetRange: 199.125.8.0 - 199.125.14.255
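
Firewalls and server deny lists usually want CIDR blocks rather than a start and end address, and this NetRange does not fit in a single block. Here is a short Python sketch of the conversion using the standard ipaddress module. The function name range_to_cidrs is made up for illustration.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert an inclusive whois NetRange into the CIDR blocks that cover it."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last))
    return [str(net) for net in networks]


print(range_to_cidrs("199.125.8.0", "199.125.14.255"))
# ['199.125.8.0/22', '199.125.12.0/23', '199.125.14.0/24']
```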

ClosedChannelException

Getting a bunch of closed channel exceptions (http://java.sun.com/j2se/1.5.0/docs/api/java/nio/channels/ClosedChannelException.html) from this IP in Florida:

32.156.248.113

OrgName: AT&T Global Network Services, LLC
OrgID: ATGS
Address: 3200 Lake Emma Road
City: Lake Mary
StateProv: FL
PostalCode: 32746
Country: US

NetRange: 32.0.0.0 - 32.255.255.255


Not sure why we get a series of these from various IPs, and only infrequently. The IP mentioned in an earlier post sent about 500 of these, which was by far more than any other IP to date. They must have been doing something that normal web visitors don't do. Other IP addresses are sending a few of these randomly - maybe a couple of IPs per day with these errors showing up in the logs.

GingerCrawler

GingerCrawler was hitting our sites today. Apparently it has something to do with collecting information about the English language to help people with dyslexia.

DoCoMo - calling itself a Googlebot

This is interesting - getting hits from user agent DoCoMo which is listing itself as a Googlebot:

DoCoMo/2.0 N905i(c100;TB;W24H16) (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)

I looked it up and it is in fact on the Google network:

OrgName: Google Inc.
OrgID: GOGL
Address: 1600 Amphitheatre Parkway
City: Mountain View
StateProv: CA
PostalCode: 94043
Country: US

NetRange: 66.249.64.0 - 66.249.95.255

I looked into this further and apparently Google and Japanese mobile carrier NTT DoCoMo have formed some sort of partnership:

Google DoCoMo Partnership

twiceler

The Twiceler robot is not obeying the robots.txt file.

This bot repeatedly hits our web sites even though the robots.txt file on each one tells it to stay off.

Additionally when I went to the link below - it says page not found. Bad bot all around!

http://www.cuil.com/twiceler/robot.htm

It's coming from that same old repeat offender network that starts with the number 38 -- which you might want to block if you are experiencing the same problems:

38.99.13.119

OrgName: PSINet, Inc.
OrgID: PSI
Address: 1015 31st St NW
City: Washington
StateProv: DC
PostalCode: 20007
Country: US

ReferralServer: rwhois://rwhois.cogentco.com:4321/

NetRange: 38.0.0.0 - 38.255.255.255

ru_com_viewer larbin2.6.3@unspecified.mail

Seeing a new bot in the logs: ru_com_viewer

I can only guess that this is something that fetches pages for viewing by Russian users based on "ru" but that's only a guess.

This bot is using the larbin web crawler.

We'll see if they obey robots.txt or not.

Coming from Vrtservers network:

OrgName: Vrtservers, Inc
OrgID: VRTSE
Address: 801 S. Grand Ave #1204
City: Los Angeles
StateProv: CA
PostalCode: 90017
Country: US

ReferralServer: rwhois://rwhois.vrtservers.net:4321

NetRange: 64.56.64.0 - 64.56.79.255

Thursday, May 28, 2009

Integra Telecom - More strange traffic

458 hits on the same GIF image today by this IP address: 68.178.4.202 which is in the Integra Telecom network in Oregon:

OrgName: Integra Telecom, Inc.
OrgID: ITCM
Address: 1201 NE Lloyd
Address: Suite 500
City: Portland
StateProv: OR
PostalCode: 97232
Country: US

ReferralServer: rwhois://whois.integraonline.com:43

NetRange: 68.178.0.0 - 68.178.127.255

Not sure why we get so much strange traffic from the Integra Telecom network.
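
Spotting this kind of repeat traffic is easy to automate once the log has been reduced to (IP, path) pairs. The sketch below is in Python. It assumes the parsing step has already been done, since log formats vary, and the image path and threshold are made up for illustration.

```python
from collections import Counter


def heavy_hitters(records, threshold):
    """Return (ip, path, hits) for every (ip, path) pair seen at least threshold times.

    records is any iterable of (ip, path) tuples already extracted from the log.
    """
    counts = Counter(records)
    return [(ip, path, hits)
            for (ip, path), hits in counts.most_common()
            if hits >= threshold]


# Hypothetical sample data: one IP hammering a single image, plus normal traffic.
sample = [("68.178.4.202", "/images/example.gif")] * 458
sample += [("192.0.2.10", "/index.html")] * 3

for ip, path, hits in heavy_hitters(sample, threshold=100):
    print(ip, path, hits)
```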

Monday, May 25, 2009

Yahoo doesn't support end to end TLS

Yahoo servers do not support end to end TLS. I tried to enforce this and various users complained that they could not email me due to this problem. I could email them so apparently Yahoo supports inbound but not outbound TLS.

Google Adsense - not working?

Google Adsense is doing something very strange. On a home page advertising local businesses in a specific industry, it is displaying ads for IE8, Verisign and Splunk. What? This is a non-tech site and has no words related to such topics. Why is Google Adsense showing these unrelated links?

[Update: Since learned that Google is very tricky -- and brilliant -- tracking information about users to show them relevant ads regardless of the web site content]

US Marshals, FBI infected by virus

Virus affects US Marshals, FBI

Posted from Google news.

Friday, May 22, 2009

Google Ad Competitor Filtering Not Working

I block certain domains in Google's competitor ad filtering and it doesn't work. I go back to the site and the ads are still there from competitors and unrelated sites I have blocked.

Additionally Google should allow blocking by keywords in ads and landing pages rather than URLs, otherwise some companies just keep changing their domain names and putting up new URLs.

Google should provide a way to run ads so that the competition cannot steal traffic from a site by sending a bunch of fake hits, which increases the cost of ads and uses up the allotted hits per day with traffic that will never lead to a conversion.

Tuesday, May 19, 2009

Postini Blocking People that Aren't Blocked

Having Postini problems again. Postini is blocking people from IP addresses that I have not blocked in my system. Gathering the information to prove this is a royal pain - not to mention that I have no idea how many people I don't know personally are trying to email me and getting blocked.

Does anyone out there besides me worry that if potential customers you don't know are emailing you, and their messages get directed somewhere else while they think they reached you, you may be losing business and never even know it?

Everyone has been pooh pooh-ing my email and security concerns for quite a while. I'd say a majority of them have turned out to be true. I'm trying to figure out a way to better validate email - where it is coming from and where it is going - and that emails are getting to the people I think they are - and only the people I want to read them.

I am still not convinced there is any such secure email solution in existence at this time.

Google AdSense - Specifying Advertisers Didn't Work

I just used Google AdSense and tried entering specifications to only allow ads from specific advertisers. It didn't work. I kept getting ads from everyone.

Thursday, May 14, 2009

Downloaded Software - Permissions Wish List

It would be very cool if Microsoft and other operating system vendors would allow you to configure permissions for each executable and what it can access in the system (if there is not a way to do this already).

For instance, I just downloaded some code from some guy I never met that had something I needed posted in a newsgroup. The guy's been on the newsgroup for a while but how do I know he's legit? He sent me exe's not source code so who knows what's in that - but I really need this little functionality because it will save me a ton of time.

So anyway, I'm sitting here debating whether I should use this thing or not, and that's when I thought it would be really cool if I could just right click on this little exe and set up permissions for it: whether or not it can access the Internet inbound or outbound, and which IPs and ports it can access for some internal testing I need to do with it (it is related to TCP/IP and sockets). Additionally I would like to be able to specify which user accounts it can run under and what directories and files it can access - and whether it can read/write/modify/delete those files or in those directories.

It would also be nice if I could set my default permissions for new files and executables and have the system alert me if some exe or program of some sort is trying to access something for which it doesn't have permission, and let me decide if I want to give it permission or not.

Saturday, May 09, 2009

Email Providers - Half a TLS Solution

Recently a person I had problems emailing due to issues with Postini told me that they were responding to my messages - but I am not getting them. I had looked up this person's mail server information and it looks as though that mail server supports TLS. However apparently that is only TLS inbound, and not outbound.

What is the point of mail services that only provide one way TLS encryption? That's only half a solution.
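
The inbound half can at least be probed directly: connect to the provider's mail server and see whether it advertises STARTTLS in its EHLO response. Here is a rough Python sketch using the standard smtplib module. The function name supports_starttls, the smtp_factory parameter and the host name in the usage comment are placeholders, and many ISPs block outbound connections on port 25, so the probe may need to run from a server. The outbound half cannot be probed this way; it typically has to be checked from the receiving side, for instance in the Received headers or the receiving server's logs.

```python
import smtplib


def supports_starttls(host, port=25, timeout=10, smtp_factory=smtplib.SMTP):
    """Connect to a mail server and report whether it advertises STARTTLS.

    smtp_factory exists only so a stand-in can be supplied when no
    network connection is available.
    """
    with smtp_factory(host, port, timeout=timeout) as server:
        server.ehlo()
        return server.has_extn("starttls")


# Example use (needs network access and an open outbound port 25):
#   print(supports_starttls("mx.example.com"))
```

This only shows that the server offers TLS for inbound mail, not that it enforces it or that its configuration is sound.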

I believe the mail provider in this case is BlueHost - an ISP which I believe is out of Denver - however there are so many other webmail, Exchange and other mail solutions that do not provide two way TLS encryption that it is almost impossible to find a complete end to end solution.

In fact, if you try to find a mail provider that does provide two way TLS enforcement, works with Exchange and allows you to have your own Postini account...good luck.

On top of that, even if you find TLS enforcement both ways, I've been following the email list from the IETF on TLS, and apparently how each aspect of TLS is set up and implemented may affect whether the particular implementation of TLS is actually very secure. It's like a chain - and a chain is only as strong as its weakest link.

I'm not a TLS expert but I can figure out enough from reading what's going on that there may be one small piece of the TLS implementation that basically undermines the whole set up.

Friday, May 01, 2009

Firefox 3.0.10 - listening for INCOMING requests?

Just installed 3.0.10

Norton reports this version of Firefox is listening for INCOMING requests? Why?

When I block this, Firefox doesn't work.

HTTP is meant to go out, get info and pull it down, not listen for and allow other computers to connect to my machine. What is going on here?