In recent weeks there have been a number of blog postings critical of the Technorati Top 100 list of popular weblogs. The criticisms have primarily been of two flavors: some posts have been critical of the idea of blogging as a popularity contest, which such lists encourage, while others have criticized the actual mechanism Technorati uses to calculate popularity. I agree with both criticisms, especially the former. There have been a number of excellent posts arguing both points which I think are worth sharing.
Mary Hodder, in her post Link Love Lost or How Social Gestures within Topic Groups are More Interesting Than Link Counts, argues that more metrics besides link count should be used for calculating popularity and influence. Some of the additional metrics she suggests include comment counts and the number of subscribers to a site's RSS feed. She also suggests creating topic-specific lists instead of one über list for the entire blogosphere. A primary motivation for this approach seems to be increasing the pool of bloggers targeted by PR agencies and the like. Specifically, Mary writes:
However, I'm beginning to see many reports prepared by PR people, communications consultants etc. that make assessments of 'influential bloggers' for particular clients. These reports 'score' bloggers by some random number based on something: maybe inbound links or the number of bloglines subscribers or some such single figure called out next to each blog's name.
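To make Mary's suggestion concrete, below is a minimal sketch of what a multi-signal, topic-specific ranking might look like. The specific metrics, weights, and field names are my own illustrative assumptions; neither Mary nor Technorati has specified an actual formula.

```python
# A sketch of the multi-metric, per-topic scoring Mary is advocating.
# The weights and signals below are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Blog:
    name: str
    topic: str
    inbound_links: int        # the single figure Technorati counts today
    comments_per_post: float  # a "social gesture" within the blog itself
    feed_subscribers: int     # e.g. a Bloglines subscriber count

def influence_score(blog: Blog) -> float:
    """Blend several social gestures instead of relying on links alone."""
    return (0.4 * blog.inbound_links
            + 0.3 * blog.feed_subscribers
            + 0.3 * (blog.comments_per_post * 100))  # scale comments up

def topic_top_list(blogs: list[Blog], topic: str, n: int = 100) -> list[Blog]:
    """Rank within a topic group rather than one list for the whole blogosphere."""
    candidates = [b for b in blogs if b.topic == topic]
    return sorted(candidates, key=influence_score, reverse=True)[:n]
```

Of course, the moment someone picks weights like these they are making an editorial judgment about what influence means, which is exactly the kind of problem Shelley raises next.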
Shelley Powers has a different perspective in her post Technology is neither good nor evil. Arguing against the popularity contests inherent in creating competing A-lists, or even just B-lists to complement the A-lists, she writes:
Even if we tried to analyze a person's links to another, we can't derive from this anything other than person A has linked to person B several times. If we use these to define a community to which we belong, and then seek to rank ourselves within these communities, all we've done is create a bunch of little Technorati 100s and communities that are going to form barriers to entry. We see this communal behavior all too often: a small group of people who know each other link to each other frequently and to outsiders infrequently; basically shutting down the discussion outside of the community.
...
I think Mary should stop with I hate rankism. I understand the motivations behind this work, but ultimately, whatever algorithm is derived will eventually end up replicating the existing patterns of authority rather than replacing them. This pattern repeated itself within the links to Jay Rosen's post; it repeated itself within the speaker list that Mary started for women ("where are the women speakers"), but had its first man within a few hours, and whose purpose was redefined within a day to include both men and women.
Rankings are based on competition. Those who seek to compete will always dominate within a ranking, no matter how carefully we try to 'route' around their own particular form of 'damage'. What we need to challenge is the pattern, not the tools, or the tool results.
I agree with Shelley that attempts to right the so-called "imbalance" created by lists such as the Technorati Top 100 will encourage competition and stratification within certain blogging circles. I also agree that no matter what algorithms are used, a lot of the same names will still end up on the lists, for a variety of reasons. A major one is that a number of the so-called A-list blogs actually work very hard to be "popular", and changing the metrics by which their popularity is judged won't change this fact.
So Shelley has given us some of the social arguments for why popularity lists such as the Technorati Top 100 aren't a good idea. But are the technical flaws in Technorati's approach to calculating weblog popularity really that bad? Yes, they are.
Danah Boyd has a post entitled The biases of links in which she did some research showing exactly why simply counting links on web pages isn't an accurate way to calculate popularity or influence. There are a lot of excellent points in Danah's post and the entire post is worth reading multiple times. Below are some key excerpts:
I decided to do the same for non-group blogs in the Technorati Top 100. I hadn't looked at the Top 100 in a while and was floored to realize that most of those blogs are group blogs and/or professional blogs (with "editors" and clear financial backing). Most are covered in advertisements and other things meant to make them money. It's very clear that their creators have worked hard to reach many eyes (for fame, power or money?).
...
Blogrolls:
- All MSNSpaces users have a list of "Updated Spaces" that looks like a blogroll. It's not. It's a random list of 10 blogs on MSNSpaces that have been recently updated. As a result, without special code (like in Technorati), search engines get to see MSNSpace bloggers as connecting to lots of other blogs. This would create the impression of high network density between MSNSpaces which is inaccurate.
- Few LiveJournals have a blogroll but almost all have a list of friends one click away. This is not considered by search tools that look only at the front page.
...
- Blogrolls seem to be very common on politically-oriented blogs and always connect to blogs with similar political views (or to mainstream media).
- Blogrolls by group blogging companies (like Weblogs, Inc.) always link to other blogs in the domain, using collective link power to help all.
...
- Male bloggers who write about technology (particularly social software) seem to be the most likely to keep blogrolls. Their blogrolls tend to be dominantly male, even when few of the blogs they link to are about technology. I haven't found one with >25% female bloggers (and most seem to be closer to 10%).
- On LJ (even though it doesn't count) and Xanga, there's a gender division in blogrolls whereby female bloggers have mostly female "friends" and vice versa.
- I was also fascinated that most of the mommy bloggers that i met at Blogher link to Dooce (in Top 100) but Dooce links to no one. This seems to be true of a lot of topical sites - there's a consensus on who is in the "top" and everyone links to them but they link to no one.
...
Linking patterns:
- The Top 100 tend to link to mainstream media, companies or websites (like Wikipedia, IMDB) more than to other blogs (Boing Boing is an exception).
- Blogs on blogging services rarely link to blogs in the posts (even when they are talking about other friends who are in their blogroll or friends' list). It looks like there's a gender split in tool use; Mena said that LJ is like 75% female, while TypePad and Movable Type have far fewer women.
- Bloggers often talk about other people without linking to their blog (as though the audience would know the blog based on the person). For example, a blogger might talk about Halley Suitt's presence or comments at Blogher but never link to her. This is much rarer in the Top 100 who tend to link to people when they reference them.
- Content type is correlated with link structure (personal blogs contain few links, politics blogs contain lots of links). There's a gender split in content type.
- When bloggers link to another blog, it is more likely to be same gender.
I began this investigation curious about gender differences. There are a few things that we know in social networks. First, our social networks are frequently split by gender (from childhood on). Second, men tend to have large numbers of weak ties and women tend to have fewer, but stronger ties. This means that in traditional social networks, men tend to know far more people but not nearly as intimately as those women know. (This is a huge advantage for men in professional spheres but tends to wreak havoc when social support becomes more necessary and is often attributed to depression later in life.)
While blog linking tends to be gender-dependent, the number of links seems to be primarily correlated with content type and service. Of course, since content type and service are correlated by gender, gender is likely a secondary effect.
...
These services are definitely measuring something but what they're measuring is what their algorithms are designed to do, not necessarily influence or prestige or anything else. They're very effectively measuring the available link structure. The difficulty is that there is nothing consistent whatsoever with that link structure. There are disparate norms, varied uses of links and linking artifacts controlled by external sources (like the hosting company). There is power in defining the norms, but one should question whether or not companies or collectives should define them. By squishing everyone into the same rule set so that something can be measured, the people behind an algorithm are exerting authority and power, not of the collective, but of their biased view of what should be. This is inherently why there's nothing neutral about an algorithm.
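Danah's last point is worth dwelling on. To see how much editorial judgment hides inside a "neutral" link counter, consider a rough sketch of what such a crawler has to do once it stops trusting raw front pages. The HTML class names and URL patterns below are hypothetical stand-ins, but the shape of the problem is real: every hosting service needs its own hand-written rules, and whoever writes those rules decides whose links count.

```python
# A sketch of naive vs. service-aware inbound link counting.
# The class names and URL patterns are hypothetical; real services
# would each need their own hand-tuned rules, which is the point.
import re

def naive_inbound_count(pages: dict[str, str], target: str) -> int:
    """Count every front page whose raw HTML links to the target URL."""
    return sum(1 for html in pages.values() if target in html)

def filtered_inbound_count(pages: dict[str, str], target: str) -> int:
    """Apply per-service rules before counting a link as an endorsement."""
    count = 0
    for url, html in pages.items():
        if "spaces.msn.com" in url:
            # Strip the auto-generated "Updated Spaces" module: it is a
            # random list of recently updated blogs, not a blogroll, so
            # its links say nothing about who this blogger endorses.
            html = re.sub(r'<div class="updated-spaces">.*?</div>',
                          "", html, flags=re.DOTALL)
        if "livejournal.com" in url:
            # LiveJournal friend links live on a Friends page one click
            # away, so a front-page-only crawler misses them entirely;
            # a fair counter would have to fetch that page as well.
            pass
        if target in html:
            count += 1
    return count
```

Every branch in that function is somebody's biased view of what should count, which is Danah's point exactly.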
There is a lot of good stuff in the excerpts above and it would take an entire post, or maybe a full article, to go over all the gems in Danah's entry. One random but interesting point is that LiveJournal bloggers are penalized by systems such as the Technorati Top 100. For example, Jamie Zawinski has over 1900 people who link to him from their Friends pages on LiveJournal, yet he somehow doesn't make the cut for the Technorati Top 100. Maybe the fact that most of his popularity is within the LiveJournal community makes his "authority" less valid than that of others with fewer incoming links who are in the Technorati Top 100 list.
Yeah, right.