Google is Crawling IRC

HM2K
Posts: 209
Joined: Thu Jul 24, 2003 5:34 pm
Location: UK

Postby HM2K » Mon Nov 10, 2003 9:46 pm

Tony Collen reports (http://manero.org/weblog/archives/000133.html#000133) that Google is sending robots to IRC channels (internet chat rooms) as part of some experiment; this has apparently been confirmed by Google and others.
Since things said in chat rooms are traditionally considered very private, I can't imagine what they plan to do.

source: http://google.blogspace.com/

more information here: http://www.theregister.co.uk/content/6/33835.html

It is possible that Google is going to offer a search similar to http://www.searchirc.com/ or http://www.packetnews.com/, as this would generate a lot of traffic towards their service. It is thought that the bot is only there to gather links in topics and some user information, such as real names, to further their link database. It is doubtful that they will be logging channels, due to privacy. At this time it is unknown what Google wants with information such as this, or whether this is even the information they are gathering.

They may be creating some kind of IRC search or file search, or even simply expanding their links database. Who knows; I'm sure time will tell.

Recently, though, it seems Google has been expanding, especially since they were in talks with Microsoft about a possible merger...

source: http://news.bbc.co.uk/1/hi/business/3232209.stm

RE: I was wondering if anyone had experienced such bots on EFnet yet, or if anyone has any further information on them...
Osc
Posts: 75
Joined: Mon Aug 11, 2003 8:08 pm
Location: Atlanta, GA

Postby Osc » Mon Nov 10, 2003 10:51 pm

I plan on treating these just like the Media Force bots. <shrug>
irc.he.net Notice -- Osc (osc@irc.packetmonkeys.com) is now an operator
<CHANFIX> You're now logged in with the following flags: ADMIN.
<OCF> Authentication successful. Welcome, Osc.
HM2K
Posts: 209
Joined: Thu Jul 24, 2003 5:34 pm
Location: UK

Postby HM2K » Mon Nov 10, 2003 11:09 pm

To be honest, I like the idea of this, though I didn't like the idea of media sentry... as you may well know... As far as I know they don't join channels like the media sentry bots did, so there's no need to k-line/ban them, as there are already website bots and such that do this on a regular basis, and those aren't k-lined/banned.
Jon
Posts: 42
Joined: Wed Jul 16, 2003 5:33 am
Location: NB.CA

Postby Jon » Tue Nov 11, 2003 5:52 am

Doesn't bother me any.

If Google wants to gather URLs on animal por, I mean, someone's age/sex/location, go ahead :P
munky
Site Admin
Posts: 826
Joined: Wed Jul 02, 2003 4:54 pm
Location: Phoenix AZ

Postby munky » Tue Nov 11, 2003 1:13 pm

Perhaps Google Labs is doing something similar to searchirc.com's offshoot, ircimages.com (for their image search).
In God we trust,
Everyone else must have an X.509 certificate.
HM2K
Posts: 209
Joined: Thu Jul 24, 2003 5:34 pm
Location: UK

Postby HM2K » Wed Nov 12, 2003 3:31 am

That's one thing I was thinking. That's also something that'd be quite cool, but I doubt it'd work in their current setup for images.google.com.
bobjuh
Posts: 39
Joined: Wed Jul 16, 2003 6:43 am
Location: Assen, Netherlands

Postby bobjuh » Sat Nov 15, 2003 12:47 am

And Google isn't the only one.
After the recent events where IRC users noticed Google bots monitoring their chat, another similar project has come to the attention of IRCJunkie. This time, though, there might be a bit more to the reason behind the bots than feeding a new search engine alone.

On the 30th of October CoSCo, part of the Helsinki Institute for Information Technology (HIIT), loaded a small army of bots onto the Freenode.net IRC network and started logging the chat from practically all channels on the network, without asking permission from the users inside the channels.

"We launched the bots practically to every channel on Freenode almost at the same time", Ville H. Tuulos of CoSCo explained to IRCJunkie. "After about 10 minutes of action, we got k-lined. Naturally our large scale effort didn't remain unnoticed and it might have seemed being hostile to many who know pests of IRC, clones, floodbots etc."

In a follow-up email we asked Mr. Tuulos if he found it ethical that chats were being logged and used outside the channel without the users being explicitly told beforehand. We have not received an answer to this question.

A discussion between Freenode and CoSCo followed, after which CoSCo opened a FAQ section on their website explaining why their bots log chat on IRC (a /WHOIS on the bot points to it), and the K-line was eventually lifted. The FAQ and website explain that the logs are being used for scientific and academic research. As the FAQ says: "Our main effort is to develop a full-fledged open source Web search engine, including the crawlers."

If you browse the CoSCo website further, you can eventually find a page about a "Search-Ina-Box" project. Quote from the page: "Search-Ina-Box is an open source, plug-and-play server that will provide power search and content-based targeted advertising facilities to intranets and their mobile community."

"Content-based targeted advertising facilities": could this mean IRC chat is being analyzed and used to produce "content based targeted adverts"? We asked Mr. Tuulos if there is a connection between the bots logging IRC chats and the Search-Ina-Box project. "Both Search-Ina-Box and Irchiver are based on the same statistical models and language processing. This is the science part", he replied. Science and commerce seem to go hand in hand sometimes.

As a follow-up question we asked Mr. Tuulos whether users can make an informed decision about letting the bots log their chat, based on the information on the Irchiver page and FAQ alone. This question was left unanswered.

"We are doing hard science as can be seen from our publication list, but we don't want to be just academic nichés. We've seen that all the science is best to evaluate in the real world. That's why we have SIB and Irchiver."

CoSCo says the project will be released as open source, probably under the GPL license, when finished. This of course raises concern about what others will do with this source. "We launched the bots practically to every channel on Freenode almost at the same time", Ville H. Tuulos said to IRCJunkie. Potentially, script kiddies could "borrow" this code to create botnets that mass-join channels and negatively affect an IRC network this way.

We asked if CoSCo was interested in connecting the bots to other networks as well. "We did test our bots shortly on IRCNet, but we didn't collect any data. We asked QuakeNet administration about their willingness to help us in our effort, but the problem seems to be their strict connection limits" Mr. Tuulos replied. "We would be happy to cooperate with any interested IRC networks."

The host being used by the Irchive bots is irchive.it.hiit.fi.
Jason
Posts: 2
Joined: Sat Nov 15, 2003 12:59 am

Postby Jason » Sat Nov 15, 2003 1:01 am

JFYI - The text that bobjuh pasted was taken from my friend Asmo's website, http://irc-junkie.org

I like for credit to be given when due.
Developer - http://searchirc.com
Operator - irc.servercentral.net
HM2K
Posts: 209
Joined: Thu Jul 24, 2003 5:34 pm
Location: UK

Postby HM2K » Sun Nov 16, 2003 3:02 pm

Hmmm, it seems this has very little or nothing to do with my original post... thanks all the same :p

Oh, and Jason, I think it was blatantly obvious where the information was taken from just by reading the article, so there was no need for the unnecessary plug. Though bobjuh, a simple link to the ircjunkie website would have been more appropriate, as your submission didn't contain any links.
bobjuh
Posts: 39
Joined: Wed Jul 16, 2003 6:43 am
Location: Assen, Netherlands

Postby bobjuh » Tue Nov 18, 2003 7:21 am

If you read it, it would be obvious what the source was.
IRC Junkie was named several times in the article.

But OK, I should have named the source.
slushey
Posts: 43
Joined: Sat Aug 09, 2003 4:11 pm
Location: Newfoundland, Canada

Postby slushey » Tue Nov 25, 2003 8:19 pm

This would be very interesting, but if they do that, they may have some legal problems, e.g. linking to a site with pirated software.
Humor is the best sense we ALL have in common.

slushey ....just me
nothing more.....nothing less

"In Canada we play Duck, Duck, Moose."
munky
Site Admin
Posts: 826
Joined: Wed Jul 02, 2003 4:54 pm
Location: Phoenix AZ

Postby munky » Tue Nov 25, 2003 8:50 pm

Google is not responsible for content that it links to; it is merely an indexing system.
See http://www.google.com/terms_of_service.html for their 'quote'.
There's nothing stopping a user from searching for "warez download" in Google now, so adding links found through IRC traffic wouldn't change the automation of their indexing system, or the legality of their linking to it.
Last edited by munky on Wed Nov 26, 2003 1:42 pm, edited 1 time in total.
HM2K
Posts: 209
Joined: Thu Jul 24, 2003 5:34 pm
Location: UK

Postby HM2K » Tue Nov 25, 2003 10:37 pm

would or wouldn't?
munky
Site Admin
Posts: 826
Joined: Wed Jul 02, 2003 4:54 pm
Location: Phoenix AZ

Postby munky » Wed Nov 26, 2003 1:53 pm

wouldn't, sorry
junkiexl
Posts: 2
Joined: Sun Jun 06, 2004 4:25 am
Location: boston, usa & birmingham, uk

google crawlers

Postby junkiexl » Sun Jun 06, 2004 11:19 am

This recent activity of crawlers & spybots, & the possibility of soaking up some of Microsoft's cash drawer, leads me to believe that Google is turning into another bad internet circus (see http://www.aol.com)...
" We all have what it takes to fail." -Edgar Degas