A few times in the last month, I’ve had conversations with people that went something like, “Oh, I wonder how Google’s editorial staff keeps up with constructing relevant search results for all those terms.” Apologies to the speakers, but that’s a little like wondering how those elves make the cookies taste so good.
My former coworker Craig Pfeifer points to the original journal papers that underlie the theory of how Google ranks content on the web, covering PageRank, some data mining algorithms, and the architecture of Google itself. If these papers tell the reader anything about Google, it’s that relevance isn’t built editorially. The rules of the underlying algorithms might get tweaked from time to time, but heavy editorializing of results simply isn’t necessary.
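If you’ve never looked under the hood, the core of PageRank fits in a few lines of iteration. Here’s a minimal sketch in Python; the `pagerank` function, the toy graph, and the variable names are all just illustrative, though the 0.85 damping factor is the value Brin and Page themselves suggest:

```python
# A minimal PageRank sketch: repeatedly redistribute each page's rank
# across its outgoing links until the scores settle.

def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if not outgoing:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
            else:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
print(pagerank(graph))
```

No editors required: the scores fall out of the link structure alone, which is exactly the point.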
An interesting point: a lot of the exploits pulled on Google in the last few years, such as propping up a page with lots of links from the home pages of low-ranked sites, or Google-bombing through the use of link text, are either implicitly or explicitly called out in these papers. In the case of the former, Brin and Page admit that it might be possible to outsmart the relevance algorithms with a lot of low-ranked sites, saying, “At worst, you can have manipulation in the form of buying advertisements (links) on important sites. But, this seems well under control since it costs money.” Apparently they didn’t predict link farms or blog memes too well, but that isn’t to say their work is a miserable failure…
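The same sketch shows why the link-farm exploit works: even pages with negligible rank of their own contribute a little to everything they link to, and enough of them add up. Here’s a hypothetical demonstration reusing the `pagerank()` sketch above, with made-up page names:

```python
# Hypothetical link farm: 20 junk pages, each linking only to "target".
farm = {f"spam{i}": ["target"] for i in range(20)}
graph = {"target": [], "rival": [], **farm}

ranks = pagerank(graph)
# "target" collects a share of every spam page's small rank, so it
# ends up well ahead of the otherwise-identical "rival".
print(f"target: {ranks['target']:.3f}, rival: {ranks['rival']:.3f}")
```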