Best seats in the house this weekend: IRC

SJ’s Wiki Hut of Horror: Credibility in Journalism and on the Web. SJ, who previously liveblogged the Votes, Bits and Bytes conference at Berkman, will be at the closed Blogging, Journalism, and Credibility conference, which has stirred up a bit of a ruckus in the wake of the Armstrong Williams scandal and the attempted conservative follow-up smear of Kos (who blogged with a big disclaimer on his page while being paid by the Dean campaign, unlike Williams, who didn’t disclose his pay by the DofEd until well after he was called on it). (Anyone want another subordinate clause there?)

The IRC backchannel is always the most fun place to be at a wired conference. This one should definitely be no exception.

Molasses

Our Internet service has been slower than molasses in January for the past few days. First I tested the individual components (cable modem, wireless base station), but they didn’t seem to be having problems, and the Internet as a whole, aside from some isolated problems, seemed to be OK too.

Then I turned off my wireless access and plugged directly into the router. Bing. Full connectivity. This is the second time the wireless card in this laptop has gone funky on me. I really don’t want to replace it right now, but I might not have a choice.

Folksonomies

The folksonomy meme is well underway, with a well-timed announcement from Technorati feeding the frenzy. One thing that needs to be addressed, though, is the sense of triumphalism (folksonomy over all organized taxonomies) that I hear in some of the posts.

Clay Shirky starts to address this issue in his excellent post about the economic costs of controlled vocabularies. He points out correctly that for systems at the scale of the Internet there is just no way to control and manage tagging information in a centralized way that is even remotely economically feasible.

This is fine, but it’s not the whole story. There are plenty of systems at smaller scales than the Internet where some combination of controlled and uncontrolled tagging is necessary. Put another way, you don’t really want your users generating all your metadata in an uncontrolled fashion. Examples: accounting codes for general ledger systems; country codes; languages; lists of employees; and so on. On the flip side, as Scott Rosenberg points out, user-generated metadata is often a lot more tractable and ultimately more useful than trying to cook up “official” lists in a clean room.

There is also discussion about issues of synonym control, browseability, and so forth. Yep. Actually, I’m not convinced about synonym control. If the system offers a way to browse by frequency, it’s likely that users will find the tag that the majority of users are using and want to be a part of it—this happened on Orkut with a number of groups, including the Mac and Macintosh groups. Of course, one of the issues there is that changing groups on Orkut was fairly frictionless, whereas changing tags (categories) on one’s blog is quite a bit stickier as a problem. Where in the infrastructure might one want to see synonyms established?
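
To make the browse-by-frequency idea concrete, here is a minimal sketch. The tagging data and the function name are made up for illustration; this is not how Orkut or Technorati actually work. It just shows that a frequency-ranked list puts the majority tag in front of users, which is what would nudge them toward it without any formal synonym table.

```python
from collections import Counter

# Hypothetical tagging data: (user, tag) pairs, invented for this example.
taggings = [
    ("ann", "mac"), ("bob", "macintosh"), ("cy", "mac"),
    ("dee", "Mac"), ("eve", "apple"), ("fay", "macintosh"),
]

def tags_by_frequency(pairs):
    """Return (tag, count) tuples, most-used tag first (case-folded)."""
    counts = Counter(tag.lower() for _user, tag in pairs)
    return counts.most_common()

ranked = tags_by_frequency(taggings)
for tag, count in ranked:
    print(f"{tag}: {count}")
```

Note that the only "synonym control" here is lowercasing; "mac" and "macintosh" still show up as separate tags, and it is left to users to converge on one.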

Incidentally, this is one online topic where the discussion at Slashdot, even at the Score:4 level, was completely unhelpful.

Update: Forgot to blog this even as I was commenting on it; Jeff Jarvis talks about folksonomies for people (thanks to Doc for the reminder).

Worshipping librarians

I found myself in Via Valverde on Friday with the head librarian of Goucher College, explaining the concept of folksonomies over a glass of wine. Lisa became friends with Nancy while they served together on a curriculum committee during Lisa’s undergrad years, and Nancy pinged her when she was coming up here for the ALA conference. I was amazed as we talked to her to realize how many of the things I’ve become really interested in online (such as online communication, information taxonomy and classification, RSS, and so on) are also issues of deep interest in library science, and not just to blogging librarians like Jenny and Jessamyn.

On the difficulty of measuring online traffic

Boing Boing: BoingBoing traffic stats are back. John Battelle talks about the difficulties in interpreting web statistics. A few comments based on my own experience at Microsoft.com:

…of the columns you see, only the first one – “Unique Visitors,” and the last two “Hits” and “Bandwidth” can be taken at face value. “Unique Visitors” counts unique IP addresses that are hitting the site, so it’s a fairly accurate count of actual humans reading Boing Boing. (If anything, its count is a bit low, as it does not account for sites like AOL which may have one IP address for thousands of unique users.)

There are more problems with the Unique Visitors stat than Battelle lets on, of course. AOL will always be the big problem in any attempt to measure Internet users for the reason that John mentions, namely AOL’s proxy looking like one big, extraordinarily active user. However, AOL is certainly not the only place that you see a proxy server that only presents one IP address to the outside world—this is pretty common at large corporations, as well as wifi hotspots. Also, IP addresses can change from session to session if you are doing dial-up, if you reboot a lot, or even if your broadband modem goes down a lot. End result: IP addresses are a good approximation of unique visitors, but I wouldn’t take them at face value.

Another way to count UVs (or Unique Users) is to issue a cookie and count the number of unique cookies hitting the site. There are problems here too—users clean their cookies or refuse to accept them in the first place—but this gets around the proxy server problem.

Neither of these solutions deals with the possibility that you have users who visit from multiple machines, which will have both different IPs (unless they are behind the same proxy server) and different cookies (unless you explicitly require authentication each time you set the cookie).
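
A toy example may help show how the two counting methods diverge. Everything here is hypothetical (the hit records, the addresses, the cookie IDs); it is a sketch of the general idea, not how AWStats or any other stats package does it.

```python
# Hypothetical hit records: (ip_address, cookie_id). A cookie_id of None
# means the browser refused or cleared the cookie.
hits = [
    ("10.0.0.1", "c1"),   # Alice at home
    ("10.0.0.9", "c2"),   # Alice at work: new IP, new cookie
    ("192.0.2.5", "c3"),  # Bob, behind a shared proxy
    ("192.0.2.5", "c4"),  # Carol, behind the same proxy
    ("192.0.2.5", None),  # Dave, same proxy, cookies refused
]

def unique_by_ip(records):
    """Count distinct IP addresses."""
    return len({ip for ip, _cookie in records})

def unique_by_cookie(records):
    """Count distinct cookie IDs, skipping hits that carry no cookie."""
    return len({cookie for _ip, cookie in records if cookie is not None})

by_ip = unique_by_ip(hits)
by_cookie = unique_by_cookie(hits)
print(by_ip, by_cookie)
```

Four people made these hits. Counting by IP gives three, because the proxy collapses Bob, Carol, and Dave into one. Counting by cookie gives four, but only by accident: Alice is counted twice and Dave not at all.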

Nonetheless, one or the other of these methods is in use in most major web stats programs.

…the other two columns – “Pages” and “Number of Visits” – are more difficult to understand. They are AWStats’ best guess as to how many total visits a site gets, as well as how many pages are actually viewed by those visitors. These columns have always disregarded image and video files, but because a lot of our traffic comes from RSS readers, they are certainly inflated by some amount.

Ah yes. Tracking visits means you divide all the hits from a given user into periods of time when the user was on your site without interruption. As you can imagine, there are a lot of assumptions there, starting with how you identify users (your count of visits will be thrown off by the proxy server problems discussed above), the time frames you pick (if you assume a visit ends after five minutes without a hit, a user who takes six minutes to read a page before requesting the next one counts as two visits), and so on. And pages… What is a page? Does it include server-side included pages? Images? What if the images are part of the reason people come to your site? And what about those RSS feeds? As I wrote a long time ago, tracking RSS upsets a lot of the assumptions you make when tracking plain old web traffic.
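
The visit-splitting rule is easy to sketch. This is a minimal illustration under my own assumptions (a single already-identified user, an inactivity timeout, an arbitrary date, and a function name I made up), not the algorithm any particular stats program uses.

```python
from datetime import datetime, timedelta

def count_visits(timestamps, timeout=timedelta(minutes=5)):
    """Count visits for one user: a gap longer than `timeout` starts a new visit."""
    visits = 0
    previous = None
    for ts in sorted(timestamps):
        if previous is None or ts - previous > timeout:
            visits += 1
        previous = ts
    return visits

# Hypothetical reader: three page requests, with a six-minute pause before the last.
start = datetime(2005, 1, 21, 9, 0)  # arbitrary date for the example
requests = [start, start + timedelta(minutes=2), start + timedelta(minutes=8)]

five_min = count_visits(requests)
thirty_min = count_visits(requests, timeout=timedelta(minutes=30))
print(five_min, thirty_min)
```

The same three requests count as two visits with a five-minute timeout and one visit with a thirty-minute timeout, which is why the choice of time frame matters so much when comparing numbers.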

I did a lot of work in this area when I worked at Microsoft; hopefully the part of my experience that I can actually share will be relevant to the ongoing discussion.

Authentication blues

Why do mailing lists authenticate posters based on email address? In this day of “permanent” forwarding addresses (of which I have about four), I would think that the return address would be an imprecise attribute to use to validate the sender’s identity.

(Background: I’m unable to post to a neighborhood mailing list because it only accepts mail from subscribers. However, I subscribed with my permanent forwarding address, not my “real” email address—and it’s the latter that appears as my return address in outgoing mail and is used to authenticate me.)
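
My guess at what the list software is doing looks something like the sketch below. The addresses, the subscriber set, and the function name are all hypothetical, and real list managers are surely more involved; the point is only that the check compares the From address against the subscribed address and knows nothing about forwarding.

```python
from email.utils import parseaddr

# Hypothetical subscriber list: I signed up with my forwarding address.
subscribers = {"me@forwarding.example"}

def may_post(from_header, allowed):
    """Accept a message only if its From address is a subscribed address."""
    _name, address = parseaddr(from_header)
    return address.lower() in allowed

# Outgoing mail carries my "real" address, so the post is refused.
from_real = may_post("Me <me@isp.example>", subscribers)
# Only mail that appears to come from the forwarding address would pass.
from_forwarding = may_post("Me <me@forwarding.example>", subscribers)
print(from_real, from_forwarding)
```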

Sigh. Passport appears to be edging closer and closer to the dustbin of history, and the Liberty Alliance is no closer than it was over three years ago (when I first wrote about single sign-in) to delivering true identity services. When is someone going to solve this problem?

Catch up #2: Global Voices Covenant

Following up on my notes from the Day 2 sessions at the VBB conference, here’s the newly drafted Global Voices Covenant, already translated into multiple languages:

We believe in free speech: in protecting the right to speak — and the right to listen. We believe in universal access to the tools of speech.

To that end, we want to enable everyone who wants to speak to have the means to speak — and everyone who wants to hear that speech, the means to listen to it.

Thanks to new tools, speech need no longer be controlled by those who own the means of publishing and distribution, or by governments that would restrict thought and communication. Now, anyone can wield the power of the press. Everyone can tell their stories to the world.

We want to build bridges across the gulfs of culture and language that divide people, so as to understand each other more fully. We want to work together more effectively, and act more powerfully.

We believe in the power of direct connection. The bond between individuals from different worlds is personal, political and powerful. We believe conversation across boundaries is essential to a future that is free, fair, prosperous and sustainable – for all citizens of this planet.

While we continue to work and speak as individuals, we also want to identify and promote our shared interests and goals. We pledge to respect, assist, teach, learn from, and listen to one another.

We are Global Voices.

And, a la Jeff Jarvis, at whose blog I found the link, here are the first words of the manifesto, as translated by grassroots effort into 19 languages:

إننا نؤمن بحريّة الكلمة: بحماية الحق في إسماع الآخرين والاستماع لهم. لكل فرد في العالم الحق في الوصول إلى أدوات التخاطب.

Wir glauben an Meinungsfreiheit: Schutz des Rechtes, seine Meinung zu äußern. Und des Rechtes, zuzuhören. Wir glauben an unbeschränkten Zugang zu den Instrumenten von Meinungsäußerung.

Creemos en la libertad de expresión, en el derecho a hablar y en el derecho a ser escuchado. Creemos en el acceso universal a todas las herramientas que contribuyen a la expresión.

Uskomme ilmaisunvapauteen: siihen, että oikeutta puhua – ja oikeutta kuunnella – tulee suojella. Uskomme siihen, että kaikilla tulee olla yhtäläinen pääsyoikeus puheen työkaluihin.

Nous croyons à la liberté d’expression, à la protection du droit de parole et du droit d’écouter. Nous croyons en l’accès universel aux outils d’expression.

Crediamo nella libertà di parola: nella protezione del diritto di parlare — e del diritto d’ascoltare. Crediamo nell’accesso universale agli strumenti di comunicazione.

私達はフリースピーチを信じる。自由に発言する権利、そして自由に聞く権利を信じる。万人がスピーチを行うためのツールにアクセスする権利を持つことを信じる。

Nós acreditamos na liberdade de expressão: protegendo o direito de falar — e o direito de ouvir. Nós acreditamos no acesso universal as ferramentas de expressão.

Now I guess we’ll see how well MarsEdit copes with posting in multiple scripts… (Thanks to Jeff Jarvis for the link.)