Contents of this page
Search engine robots and others
Link Checkers, Link monitors and bookmark managers
FTP clients and download managers
Offline browsers and other agents
Other miscellaneous agents
Sites that regularly visit
Awards for this page
Search engines and other sites send robots to read and index your pages. This page
reverses that process and indexes the robots. This information has been
gleaned by looking at the server logs for www.jafsoft.com. Whenever a page is
read from a web site, the log file records a number of details including the time, the IP address and usually the referrer page and the user agent. You can see this in our analysis of a server log sample.
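As a rough illustration of what one of those log entries holds, here is a minimal Python sketch that splits a single line into its fields. It assumes the server writes the common Apache/NCSA "combined" log format, which may not match the format used on www.jafsoft.com; the sample line, its IP address and the agent name "ExampleBot/1.0" are invented for the example.

```python
import re

# Assumed layout: Apache/NCSA "combined" log format. Other servers differ.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # A size of "-" means no body was sent (typical for HEAD requests).
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

# Hypothetical log line, made up for illustration.
sample = ('192.0.2.10 - - [12/May/2001:10:15:32 +0000] '
          '"GET /robots.txt HTTP/1.0" 200 120 "-" "ExampleBot/1.0"')
print(parse_line(sample))
```

The time, IP address, referrer and user agent mentioned above come out as the "time", "ip", "referrer" and "agent" keys.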
Unlike many pages that list web robots, this page actually tries to visit the robots themselves. Where possible, links are provided to the robots' home pages, and descriptions are given of what they're up to. This page is updated regularly as more information is found (the last update was on 12-May-2001).
Well-behaved robots will identify themselves, often supplying web or email addresses you can contact. In any case, the pattern of pages being read and the IP addresses being used soon sorts the men from the robots.
Good robots will read robots.txt to see what your site policy is, but there are other ways of spotting robots. In addition to the search engine robots, other "user agents" will visit your site, e.g. to validate links to your site from other people's pages. Often these will just access the HEAD of the file, rather than doing a GET on the whole file.
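The two tell-tale signs just mentioned (reading robots.txt, and sending only HEAD requests) can be checked mechanically. The following is only a sketch under my own assumptions: the log has already been reduced to (ip, method, path) tuples, and the function name likely_robots and the sample addresses are made up. Both rules are heuristics rather than proof.

```python
from collections import defaultdict

def likely_robots(records):
    """records: iterable of (ip, method, path) tuples taken from a server log.

    Flags an IP address if it fetched /robots.txt, or if every request
    it made was a HEAD. Neither rule is conclusive on its own.
    """
    methods = defaultdict(set)
    read_policy = set()
    for ip, method, path in records:
        methods[ip].add(method)
        if path == "/robots.txt":
            read_policy.add(ip)
    head_only = {ip for ip, seen in methods.items() if seen == {"HEAD"}}
    return read_policy | head_only

# Invented sample data.
hits = [
    ("192.0.2.1", "GET", "/robots.txt"),
    ("192.0.2.1", "GET", "/index.html"),
    ("192.0.2.2", "HEAD", "/index.html"),
    ("192.0.2.3", "GET", "/index.html"),
]
print(sorted(likely_robots(hits)))
```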
You can also visit our page describing the engines in some detail.
This page is regularly converted from this text file by the author's own text-to-HTML converter, AscToHTM, which is available as shareware (cost $40).
The following table lists the search engines that spider the web, the IP addresses that they use, and the robot names they send out to visit your site. Version numbers are usually included in the robot names, but are omitted here except where they imply a visit from a different IP address or (as with Inktomi) a different search engine.
Often multiple IP addresses are used, in which case we just give a flavour of the names or numbers. Inktomi is a company that offers search engine technology and is used by a number of sites (e.g. www.snap.com and www.hotbot.com).
Wherever <nn> appears, it indicates that a number of different digits may be used.
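If you want to match names written with the <nn> placeholder against your own logs, one way is to turn each name into a regular expression. This is a small sketch, assuming that <nn> stands for one or more decimal digits; the helper name_pattern and the host name crawler<nn>.example.com are hypothetical and not taken from the table.

```python
import re

def name_pattern(template):
    """Turn a name such as 'crawler<nn>.example.com' into a compiled regex.

    Assumption: each <nn> stands for one or more decimal digits.
    Everything else in the name is matched literally.
    """
    parts = template.split("<nn>")
    return re.compile(r"\d+".join(re.escape(p) for p in parts) + r"\Z")

# Hypothetical host name, for illustration only.
pat = name_pattern("crawler<nn>.example.com")
print(bool(pat.match("crawler17.example.com")))
print(bool(pat.match("crawlerx.example.com")))
```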
|Home page/search engine
||Musical instruments are used
in the name such as viola.excite.com
(and the rest of the band)
more recently first names are being
used like philip.excite.com
|(see also www.powerinter.net
(Genealogical Search Engine)
but it won't let us in :-(
||various (fakes agent on each access)
||speedfind ramBot xtreme
||Surfnomore Spider v1.1
||UK Searcher Spider
Link checkers and bookmark managers are run by people wanting to keep their pages and bookmarks up to date. Being visited by a link checker is good news as it means that someone has linked to you, and cares that you're still alive. Link monitors regularly check your pages for changes, usually because someone has selected your page as "one to watch".
(pause for warm glow :-)
If you have access to the server log, check the referrer page to try and get the URL from which you are linked. Sometimes these URLs are inside password protected parts of sites, so you won't be able to view the page.
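Pulling the linking URLs out of a log can be automated. The sketch below counts referrers that point at some other site; it assumes the log has already been reduced to (referrer, path) pairs, that a missing referrer is logged as "-", and that the function name external_referrers and the links.example.org address are inventions for the example.

```python
from collections import Counter
from urllib.parse import urlsplit

def external_referrers(records, own_host):
    """records: iterable of (referrer, path) pairs; own_host: your site's host name.

    Counts each referrer URL whose host is not your own. Blank or "-"
    referrers have no host part and are skipped.
    """
    counts = Counter()
    for referrer, _path in records:
        host = urlsplit(referrer).netloc.lower()
        if host and host != own_host:
            counts[referrer] += 1
    return counts

# Invented sample data.
hits = [
    ("http://links.example.org/cool.html", "/index.html"),
    ("http://links.example.org/cool.html", "/robots.html"),
    ("http://www.jafsoft.com/index.html", "/robots.html"),
    ("-", "/index.html"),
]
for url, n in external_referrers(hits, "www.jafsoft.com").most_common():
    print(n, url)
```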
If you build up a list of sites that link to you, these are the guys you should tell when you move (moral - never move)
It's also quite common for the link checker to give no indication of which URL it's coming from. Some link checkers always come from the same IP address; more usually they come from the client's site. It depends on whether the site owner has purchased a copy of the link checking software or signed up to some centralized link checking service. If they blank the referrer URL field but you have the client's IP address, you can always try visiting that address and surfing their site.
Some of these tools appear to imply they're extracting email addresses (e.g. emailSiphon). As such they're probably unwelcome visitors since these addresses are probably being collected for spammers. You can read more about this at www.csc.ncsu.edu/~brabec/antispam.html
A page listing various link checkers (and other tools) can be found at www.softwareqatest.com/qatweb1.html#LINK
||Link Checker home page
||http://checkget.udm.net/ (also shown as referrer page)
(only if you have software listed at that site)
||<email collector> We don't list information
like this on this site.
(checks links in the dmoz directory)
|WebTrends Link Analyzer
|Xenu's Link Sleuth
Validators check your web pages for HTML correctness and standards compliance. Since other people are unlikely to send a validator to your site, you don't usually see much of this.
However, if you choose to validate your own site, then the validation attempts will appear in your logs. The following list is thus limited to the on-line validator I use (and recommend) and a URL submission service that I use.
||Validator home page
||www.selfpromotion.com. This is
used as part of a link submission
If you offer files for download, then you'll start to be visited by various FTP clients. Clients like Go!Zilla and GetRight are smart in that they can resume downloads that have been interrupted. This relies on your web server supporting the necessary protocol, but that's fairly standard these days.
If your download files are over 1MB in size (or if your server is slow), you'll often see the same IP address make multiple partial downloads of your file (look at the file size). In the case of clients like Go!Zilla and GetRight, if these add up to the right number of bytes, then chances are the download succeeded.
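The "do the pieces add up?" check can be done by summing the bytes sent per visitor and file. This is a sketch only, assuming you have already extracted (ip, path, bytes_sent) tuples from the log and know the full size of each download; the function name, file path and sizes are made up. It treats a total at or above the full size as success, since a restarted transfer can overshoot.

```python
from collections import defaultdict

def completed_downloads(records, file_sizes):
    """records: iterable of (ip, path, bytes_sent); file_sizes: {path: full size in bytes}.

    Returns the (ip, path) pairs whose partial transfers add up to at
    least the full file size. This suggests, but does not prove, that
    the download succeeded.
    """
    totals = defaultdict(int)
    for ip, path, sent in records:
        if path in file_sizes:
            totals[(ip, path)] += sent
    return {key for key, total in totals.items() if total >= file_sizes[key[1]]}

# Invented sample data: one visitor fetches the file in two pieces.
sizes = {"/download/setup.exe": 1_500_000}
hits = [
    ("192.0.2.7", "/download/setup.exe", 600_000),
    ("192.0.2.7", "/download/setup.exe", 900_000),
    ("192.0.2.8", "/download/setup.exe", 400_000),
]
print(completed_downloads(hits, sizes))
```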
||FTP Client home page
(Chinese download utility)
|JetCar (or FlashGet)
Most browsers identify themselves with a string that begins "Mozilla...". I've chosen not to document those (as yet). Here are a few of the rarer browser identifiers that I've seen.
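To find the rarer identifiers in your own logs, you can simply discard every agent string that starts with "Mozilla" and count what is left. A minimal sketch, assuming the user agent strings have already been pulled out of the log; the agent names in the sample are invented.

```python
from collections import Counter

def rare_agents(agents):
    """agents: iterable of user agent strings.

    Returns (agent, count) pairs for every agent that does not start
    with "Mozilla", most frequent first.
    """
    counts = Counter(a for a in agents if not a.startswith("Mozilla"))
    return counts.most_common()

# Invented sample data.
agents = [
    "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)",
    "ExampleBrowser/1.0",
    "ExampleBrowser/1.0",
    "OtherAgent/0.9",
]
for agent, n in rare_agents(agents):
    print(n, agent)
```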
(DOS-compatible browser. Linux version under development)
||http://www.hisoft.co.uk/ (search for IBrowse)
(Linux KDE browser)
(Cross-platform text based browser)
(Cross-platform, small, efficient and standards-led browser)
||http://sunsite.auc.dk/qweb/ (Linux browser)
(see also http://browswerwatch.internet.com/news/story/qweb8.html)
(OpenVMS only version of Mosaic, a pre-Netscape browser)
(Macintosh text-only browser)
||Agent home page
||www.answerchase.com/advan.html a personal
||Possibly Adobe Acrobat or Reader or Adobe Acrobat Reader
used with MSIE (I have been unable to confirm this)
(Japanese software from the "Eir Project")
|Excalibur Internet Spider
A German pay-to-surf client
A utility written by PC Magazine to fetch icon files
(favicon.ico) for your IE favorites
Another "favorites" tidy-up utility
(Trivia note: Giskard is probably named after the Isaac Asimov robot)
||www.isilo.com/screensh.htm (for palm pilot)
(Link management cgi script)
(Internet search agent)
(notifies webmasters when your pages have moved)
|NEC Research Agent
Research "Inquirus" (meta?) search engine
(Data mining bot on IP 126.96.36.199)
||Offline browser www.tympani.com/store/NAProDownload.html
||www.phoaks.com/index.html. An index of web resources
listed in UseNet. See also
crawls from weasel.poly.edu and grampus.poly.edu
A web filter that is "ShonenWare", i.e. you should
purchase a Shonen Knife CD if you use it. Shonen Knife
are a great Japanese band, much loved by the late Kurt
Cobain. Sometimes this sets the referrer page to the
band's home page at http://www.mmjp.or.jp/knife/ (or maybe
the users just happen to go there themselves).
(IE add-on that organizes your browsing)
(on holiday last time I looked)
Another coming soon search tool. Crawls from IP address
188.8.131.52. Hawk holdings is the holding company. The
venture is between qwest.net and Baxter Investments
A browser plug-in (initially IE only) that searches for
related pages and categories. In my experience this
seems to entail accessing a favicon.ico file on a daily
basis (presumably to refresh the "favorites" list)
Search engine technology, as used at sites such as
www.maplesearch.com. Now called mnoGoSearch.
A commercial spidering product.
||Forms a collage from randomly selected web images
www.jwz.org/webcollage/ pet project of one of
the authors of Netscape. Seems to come from
differing IP nodes.
||??? (quarterdeck search engine software)
Chinese search project
Convert websites into help files.
|Zeus 1500 Webster Pro
Zeus 2500 Webster Pro
Zeus 4300 Webster Pro
These agents are ones that we've seen, but been unable to get information for, or which are slightly unusual in origin. If you have any additional information on any of these, feel free to send it to email@example.com
||Seems to be from a yet-to-be launched site
www.girafa.com. Spiders using IP 184.108.40.206
which also seems to be Aranha.girafa.com
||Seems to be the AltaVista personal search agent. The
crawling site is sometimes referred to in the agent name
||Seems to come from www.oxxfordinfo.com who offer B2B
||Digimarc searches images on the web looking for digital watermarks
More details at www.digimarc.com/about/index.shtml
||Spiders from 220.127.116.11, which would seem to be part
of www.voila.com, a French-based search engine.
|The www.expressus.com site describes an Interactive Natural
Language encyclopedia that will become a search engine
at www.final-e.com. Good name, but at present it just
maps back onto the ExpressUs site (not such a good name).
Crawls from IP address 18.104.22.168
||Some sort of spider that usually visits using
an IP address from within www.research.att.com or
|Gulper Web Bot
(Open research project to produce opinion-based search engine)
This was a child-safe browser, but it seems no associated
||Presumably www.internetarchive.com, but that's in "stealth mode"
||www.ifour.co.jp (Japanese Macintosh browser?)
||A web monitoring service.
More details at www.internetseer.com/support/faq.jsp
||Something to do with the European Regional Internet Registry (RIPE)
Browses using IP address 22.214.171.124
||I'm unable to prove this, but this looks like Fireball
technology being used by www.bora.net, which is a Korean portal
|And from the people that brought you xyro (see below),
comes another, newer bot. This one seems to crawl from
the IP address cremant.inria.fr. Update: more recently
it's also been seen coming from barracutta.lcs.mit.edu
And then there was "cosmos", crawling from pomelos.inria.fr
Seems these people are a webbot factory. Cosmos doesn't
offer an email address.
||Unable to find
(Too many "Star Wars" references get in the way)
||The PERL programming language comes with a number of
routines for constructing web-aware scripts. This and
related strings are the default user agent identifiers,
although it's perfectly easy to change this to be whatever
||Research project to index the last weeks' news items
It's not clear to me which of these products this might be,
but I'm assuming it's one of them.
Identifier used in a sample perl program in the online
book "Web Client Programming with Perl". The program is
used to check links. Obviously people have tried it, and it works :-)
||Unable to find But the spider came from www.cnet.fr
||Unable to find
A bot indexing pictures. Crawls from ps.direct2internet.com
|RepoMonkey Bait & Tackle
||A bit of detective work here. Recent entries in the
log file link this to the site www.hungryhippo.com,
although the robot always appears to come from an IP
address at backflip.com (a bookmarking service).
Visiting www.hungryhippo.com reveals a "coming soon"
site. Looking at the HTML source leads to another page
at http://www.mezzaluna.net/hungryhippo.com/ (appears
The META tags for this page all appear to be references
to day trading, futures, training and the like, although
we did spot the word "fibonacci" (our favourite :-).
So... possibly a future search engine related to stock
trading?, or maybe the Monkey and Hippo are just feeding
me a red herring?
There's more. The picture on the Kenjin site at
www.kenjin.com/kenjin/info.html is currently the same as
that at HungryHippo. Kenjin is an Autonomy company.
|Unable to find details on this, but I'm guessing it's
a research spider from www.rutgers.edu. Crawls using
the IP teal.rutgers.edu
||Unable to find
||Unable to find
|www.unlost.com is "under construction". The robot came
from IP address 126.96.36.199 which is in France.
|Coming soon at www.utopy.com (requires flash). This
venture-capital funded site is "running in stealth mode"
before launching the "new new thing" (is that a typo?).
One of the Flash pages defines Utopia (geddit?), and some
of the browsing is done by IP addresses at ...myutopy.com.
||Unable to find
||Web browser object, that may be incorporated into software
||A set of Delphi components offered to build Internet
applications from www.transerve.com
||Unable to find
Or rather, I found several different "web hounds", so can't tell
which this was,
this appears to be a tool used by this web consultancy.
||Originates in Korea, and is possibly related to their
National Computerization Agency. Uses IP address
Software that tracks Trademark usage
|Seems to be a spider associated with a French
research institute. Usually crawls using the IP
Some IP addresses, or sites may regularly visit you, although the user agent may be obscure, or even change.
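One way to spot such visitors is to ignore the user agent and group the log by IP address instead. The sketch below assumes the log has been reduced to (ip, day, agent) tuples; the function name regular_visitors, the min_days threshold of 3 and the sample data are all my own choices for illustration.

```python
from collections import defaultdict

def regular_visitors(records, min_days=3):
    """records: iterable of (ip, day, agent) tuples.

    Returns {ip: sorted list of agents seen} for every IP address that
    appeared on at least min_days different days.
    """
    days = defaultdict(set)
    agents = defaultdict(set)
    for ip, day, agent in records:
        days[ip].add(day)
        agents[ip].add(agent)
    return {ip: sorted(agents[ip]) for ip in days if len(days[ip]) >= min_days}

# Invented sample data: one address returns on three days with two agents.
hits = [
    ("192.0.2.20", "2001-05-10", "AgentA/1.0"),
    ("192.0.2.20", "2001-05-11", "AgentB/2.0"),
    ("192.0.2.20", "2001-05-12", "AgentA/1.0"),
    ("192.0.2.21", "2001-05-12", "AgentC/1.0"),
]
print(regular_visitors(hits))
```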
Here are a few that I've been able to work out
||This is a site that offers a speed-up
to your surfing, in return for being able to
monitor people's surfing habits. The speed-ups
are achieved through a variety of techniques,
and the monitoring info is sold on, although your
privacy is protected. Visit www.netsetter.org
for more details.
||This site daily reads any XML files submitted to
a shareware site in PAD format. PAD is a means for
describing shareware devised by the Association of
Shareware Professionals (www.asp-shareware.org). This site
is performing daily checks, looking to automatically
update its lists with any changes.
All awards gratefully received :-)
This page is © 2000-2001 John A Fotheringham. It may not be
reproduced without permission,
although you are welcome to save a copy for personal use to your hard disk.