2005-08-20

Woohoo! I caught a Googlebot!!!

August 20, 2005

66.249.66.51 - - [20/Aug/2005:14:59:05 +0100] "GET /skills.php?rnk=1&lvl=1&pri=15&sec=12&learn=2&lvl_a=i HTTP/1.1" 200 2578 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.51 - - [20/Aug/2005:14:59:05 +0100] "GET /skills.php?rnk=7&lvl=4&pri=13&sec=11&learn=0&learn_a=i HTTP/1.1" 200 2597 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.51 - - [20/Aug/2005:14:59:06 +0100] "GET /skills.php?rnk=1&lvl=5&pri=6&sec=17&learn=2&rnk_a=d HTTP/1.1" 200 2586 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.51 - - [20/Aug/2005:14:59:06 +0100] "GET /skills.php?rnk=1&lvl=4&pri=18&sec=10&learn=2&lvl_a=i HTTP/1.1" 200 2596 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

When Eve first came out it had a built-in web browser, the "IGB" (in-game browser). The first version of the IGB didn't support forms, but by dynamically creating pages and using links as buttons you could still build interactive websites.
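Here is a rough Python sketch of how links can stand in for buttons. The parameter names come from the logged URLs above. Their meaning is an assumption: each link is taken to carry the whole current state plus one `<field>_a` action, with `i` meaning increment and `d` meaning decrement. `apply_button` and `button_links` are hypothetical names, not the real skills.php code.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Field names are taken from the logged URLs. The "_a" action semantics
# ("i" = increment, "d" = decrement) are an assumption, not the real site's code.
FIELDS = ("rnk", "lvl", "pri", "sec", "learn")


def apply_button(url):
    """Decode a 'button' link: read the state it carries, then apply its action."""
    query = parse_qs(urlsplit(url).query)
    state = {f: int(query[f][0]) for f in FIELDS}
    for f in FIELDS:
        action = query.get(f + "_a", [""])[0]
        if action == "i":
            state[f] += 1
        elif action == "d":
            state[f] -= 1
    return state


def button_links(state):
    """Render a '+' and a '-' link per field; every link repeats the whole state."""
    links = []
    for f in FIELDS:
        for action in ("i", "d"):
            params = dict(state)
            params[f + "_a"] = action
            links.append("/skills.php?" + urlencode(params))
    return links


if __name__ == "__main__":
    page_state = {"rnk": 1, "lvl": 1, "pri": 15, "sec": 12, "learn": 2}
    for link in button_links(page_state):
        print(link)
```

Every page rendered this way links to ten more URLs. Nothing in this sketch clamps the values, so a crawler that follows the links never runs out of new pages.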

I worked out how to calculate how long it takes to train a skill with different attributes, and built a web-based skill calculator that worked with the IGB. The links-as-buttons hack worked, but it had the side effect of creating an infinite URL space. The page was useful while I played Eve, but I'd pretty much abandoned the site since I stopped playing.
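The post doesn't spell out the training-time formula. This minimal sketch assumes the commonly published Eve numbers. A skill of rank R needs 250 * R * sqrt(32)^(L-1) skill points in total to reach level L. Training earns primary + secondary/2 skill points per minute. The rounding up and the function names are assumptions. The `learn` parameter from the URLs is not modelled.

```python
import math


def skill_points(rank, level):
    """Cumulative skill points to reach `level` (1-5); assumed formula, rounded up."""
    if level <= 0:
        return 0
    return math.ceil(250 * rank * 32 ** ((level - 1) / 2))


def training_minutes(rank, level, primary, secondary):
    """Minutes to train from level-1 to level, assuming primary + secondary/2 SP per minute."""
    sp_needed = skill_points(rank, level) - skill_points(rank, level - 1)
    sp_per_minute = primary + secondary / 2
    return sp_needed / sp_per_minute


if __name__ == "__main__":
    # Rank 1 skill, level 1, primary 15, secondary 12 (values from the first logged URL).
    print(round(training_minutes(1, 1, 15, 12), 1), "minutes")
```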

Today I was doing some packet dumps as part of an effort to start dealing with the forwarding problem for my brother's email, and found the logs were full of Googlebot hits on /skills.php.

It started at about one hit every 3 seconds, and just before it stopped it was up to 2 hits a second.
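The hit rate can be pulled straight out of a combined-format access log like the one above. This is a sketch that assumes that log format. `googlebot_hits_per_second` is a hypothetical helper.

```python
import re
from collections import Counter

# Combined log format: host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits_per_second(lines, path_prefix="/skills.php"):
    """Count Googlebot requests for path_prefix, grouped by their one-second timestamp."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and "Googlebot" in m["agent"] and m["path"].startswith(path_prefix):
            counts[m["time"]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python hits.py < access.log
    for second, hits in sorted(googlebot_hits_per_second(sys.stdin).items()):
        print(second, hits)
```

Fed the four log lines at the top of this post, it reports two hits in each of the two seconds.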

I dug out the robots exclusion protocol and added a robots meta tag with nofollow (<meta name="robots" content="nofollow">) to the page in the hope that the bot would give up, but it carried on. I expect it interpreted the tag as "don't follow the links in the page you are parsing" rather than "don't follow any links in any version of this page that you have ever parsed". So although it didn't stop the bot there and then, it would have stopped it eventually.
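Under the per-page reading of nofollow, a crawler drops only the links in the copy it has just parsed. URLs it queued from older copies still get fetched. The sketch below shows that behaviour using Python's html.parser. It is a guess at the semantics, not Googlebot's actual logic, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect <a href> links and note a <meta name="robots"> nofollow directive."""

    def __init__(self):
        super().__init__()
        self.nofollow = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            directives = [d.strip().lower() for d in (attrs.get("content") or "").split(",")]
            if "nofollow" in directives:
                self.nofollow = True
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def links_to_follow(html):
    """Links a per-page nofollow crawler would queue from this one copy of the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return [] if parser.nofollow else parser.links
```

Links queued from a copy fetched before the tag was added are unaffected. The bot only winds down as it re-fetches pages and finds the tag.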

I also set up a /robots.txt with:

User-agent: Googlebot
Disallow: /skills.php

But Google doesn't grab /robots.txt more than once a day.
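Python's standard urllib.robotparser can check what the rules above allow. It matches the short product token (Googlebot), not the full user-agent string from the logs. The `/index.html` path and `SomeOtherBot` agent are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the post.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /skills.php
"""


def make_parser(text):
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


rp = make_parser(ROBOTS_TXT)

# Disallow is a prefix match, so every /skills.php?... URL is covered.
# Pass the product token "Googlebot"; robotparser compares against the text before
# the first "/" of the agent string, so the full "Mozilla/5.0 ..." string won't match.
print(rp.can_fetch("Googlebot", "/skills.php?rnk=1&lvl=1&pri=15&sec=12&learn=2&lvl_a=i"))  # False
print(rp.can_fetch("Googlebot", "/index.html"))  # True
print(rp.can_fetch("SomeOtherBot", "/skills.php"))  # True: the rule only names Googlebot
```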

Looking back through my log file, the earliest Googlebot access to a /skills.php URL with arguments was:

66.249.66.16 - - [17/Jun/2005:01:11:23 +0100] "GET /skills.php?rnk=1&lvl=3&pri=8&sec=10&learn=0&rnk_a=d HTTP/1.1" 200 2535 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
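That search can be scripted too. This is a sketch that assumes the same log format as the lines above. `earliest_query_hit` is a hypothetical helper. It counts only requests that carry a query string, so a bare /skills.php request is skipped.

```python
import re
from datetime import datetime

# Pull the timestamp, request path and user-agent out of a combined-format log line.
LOG_LINE = re.compile(r'\[(?P<time>[^\]]+)\] "\S+ (?P<path>\S+) [^"]*" .* "(?P<agent>[^"]*)"$')


def earliest_query_hit(lines, script="/skills.php"):
    """Return (timestamp, line) for the earliest Googlebot request to script?..., or None."""
    earliest = None
    for line in lines:
        m = LOG_LINE.search(line)
        if not m or "Googlebot" not in m["agent"]:
            continue
        if not m["path"].startswith(script + "?"):
            continue  # only URLs with arguments
        # e.g. "17/Jun/2005:01:11:23 +0100"; %b assumes English month abbreviations.
        when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        if earliest is None or when < earliest[0]:
            earliest = (when, line)
    return earliest


if __name__ == "__main__":
    import sys

    # Usage: python earliest.py < access.log
    print(earliest_query_hit(sys.stdin))
```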

Update:

It found the new robots.txt:

66.249.66.51 - - [20/Aug/2005:15:00:32 +0100] "GET /skills.php?rnk=4&lvl=4&pri=10&sec=17&learn=0&rnk_a=i HTTP/1.1" 200 2597 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.51 - - [20/Aug/2005:15:00:33 +0100] "GET /skills.php?rnk=1&lvl=5&pri=6&sec=19&learn=0&pri_a=d HTTP/1.1" 200 2585 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.66.51 - - [20/Aug/2005:15:00:33 +0100] "GET /robots.txt HTTP/1.1" 200 126 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

and that was the end of that...