Monday, August 08, 2005

Search Engine Divergence


Nearly everyone I know has a favorite Internet search engine, and most non-librarians I know are confident that they get the information they need from their chosen search engine.

However, in a recent SearchEngineWatch article, Chris Sherman elaborates on the tendency for search results to vary widely among search engines.
Looking at the organic or natural listings for more than 485,000 first page search results, the study found that:

* 84.9% of total results are unique to one engine
* 11.4% of total results were shared by any two engines
* 2.6% of total results were shared by any three engines
* 1.1% of total results were shared by any four engines
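To make the arithmetic behind figures like these concrete, here is a minimal sketch of how such an overlap breakdown could be computed for a single query. It is not the study's actual method: the function name `overlap_distribution`, the engine labels, and the sample URLs are all hypothetical, and I am assuming that "total results" means the distinct URLs returned across all engines.

```python
from collections import Counter

def overlap_distribution(results_by_engine):
    """Return {k: percent of distinct URLs returned by exactly k engines}.

    Assumption: percentages are taken over distinct URLs, not raw listings.
    """
    engines_per_url = Counter()
    for urls in results_by_engine.values():
        for url in set(urls):  # count each engine at most once per URL
            engines_per_url[url] += 1
    total = len(engines_per_url)
    if total == 0:
        return {}
    share_counts = Counter(engines_per_url.values())
    return {k: 100.0 * n / total for k, n in sorted(share_counts.items())}

# Hypothetical first-page results for one query (made-up data)
sample = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u3", "u4"],
    "engine_c": ["u5", "u3"],
    "engine_d": ["u6", "u1"],
}
distribution = overlap_distribution(sample)
for k, pct in distribution.items():
    print(f"{pct:.1f}% of results appear in exactly {k} engine(s)")
```

A real comparison would also have to decide when two URLs count as the same result and would aggregate over many queries, which this sketch leaves out.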

I'm fairly certain that the folks I know who are devotees of a particular search engine don't realize that they could find completely different information by searching an engine other than their chosen one. I wonder if an awareness of this fact would influence their searching behavior.

I also need to think about how this may or may not relate to Clay Shirky's article on folksonomies. In the article, Shirky talks about how traditional library cataloging tends to be focused on pulling together all works on a particular subject, regardless of the language used within the work or the point of view of the author. He argues that this collapsing of subject terminology leads to 'signal loss'. Using the example of 'movies' and 'cinema', Shirky argues that the two terms actually represent very different concepts, and that collapsing the two into one category means that information retrieval is less precise. He goes on to say:
When we get to really contested terms like queer/gay/homosexual, by this point, all the signal loss is in the collapse, not in the expansion. "Oh, the people talking about 'queer politics' and the people talking about 'the homosexual agenda', they're really talking about the same thing." Oh no they're not. If you think the movies and cinema people were going to have a fight, wait til you get the queer politics and homosexual agenda people in the same room.

Looking at these two articles together, I have to think that the success of a given search engine for a given user depends on two things: the degree to which the search engine does (or does not) collapse categories in its results, and the degree to which the user wants (or doesn't want) to have categories collapsed.