David Sifry has posted another one of his State of the Blogosphere blog entries. He writes

In summary:

  • Technorati now tracks over 35.3 Million blogs
  • The blogosphere is doubling in size every 6 months
  • It is now over 60 times bigger than it was 3 years ago
  • On average, a new weblog is created every second of every day
  • 19.4 million bloggers (55%) are still posting 3 months after their blogs are created
  • Technorati tracks about 1.2 Million new blog posts each day, about 50,000 per hour

As usual for this series of posts, Dave Sifry plays fast and loose with language by using "the blogosphere" and "the number of blogs Technorati tracks" interchangeably. There is a big difference between the two, but unfortunately many people seem to fail at critical thinking and repeat Technorati's numbers as gospel. It's now general knowledge that services like MySpace and MSN Spaces each have more blogs/users than Technorati tracks overall.

I find this irritating because I've seen lots of press reports underreport the estimated size of the blogosphere by quoting the Technorati numbers. I suspect that the number of blogs out there is closer to 100 million (you can get that just by adding up the number of blogs on the three or four most popular blogging services), not hovering around 35 million. One interesting question for me is whether private blogs/journals/spaces count as part of the blogosphere or not. Then again, for most people the blogosphere is limited to their own narrow set of interests (technology blogs, mommy blogs, politics blogs, etc.), so that is probably a moot question.

PS: For a good rant about another example of Technorati playing fast and loose with language, see Shelley Powers's Technology is neither good nor evil which riffs on how Technorati equates the number of links to a weblog with authority.


 

April 17, 2006
@ 04:03 PM

Robert Scoble has a blog post entitled Halfway through my blog vacation (change in comment policy)

But, mostly, this past week was about change.

Some things I've changed? 1) No more coffee. 2) No more soda. 3) Xercising. 4) No more unhappy people in my life. 5) Get balance back in my own life.
...
One of my most memeorable conversations, though, was with Buzz Bruggeman, CEO of ActiveWords and a good friend. He told me to hang around people who are happy. And I realized I had been listening to too many people who were deeply unhappy and not bringing any value into my life. He told me to listen to this recording on NPR about "finding happiness in a Harvard Classroom." He also told me about the four agreements, which are Don Miguel Ruiz's code for life. Good stuff.

Over the past year I've been on a mission to simplify my life piece by piece. Along the way I've made some promises to myself, some of which I've kept and others which have been more difficult to stick with.

Health: I enrolled in the 20/20 Lifestyles program at the health club near Microsoft about six months ago. Since then I've lost just over 60 pounds (27.5 kilos for my metric peeps). This week is my last week with my personal trainer and dietician before I'm on my own. I had hoped to lose more weight but last month was somewhat disruptive to my schedule with my mom being in town for two weeks and travelling for ETech 2006, SPARK and New York to see my dad. I am somewhat proud that I gained less than 2 pounds even though my schedule was a complete mess. I've kept two promises to myself about my health: I'll work out 5 days a week and will keep my daily caloric intake within 2000 calories a day 5 days a week [but never over 3000 calories in one day]. The exercise promise has been easy to keep but the diet promise has been harder than I'd like. Eating out is the hard part. Giving up soda for water was easier than I thought.

Work/Life Balance: I also decided to be better at compartmentalizing my work and home life. I promised myself not to spend more than 10.5 hours a day at work [in by 9 AM, out by 7-7.30 PM at the latest] and to stop using the VPN to connect to work when I'm at home. I've also tried to stop checking work email from home on weekday evenings and will check it only once a day on weekends. If I'm in crunch mode for a particular deadline then this may change temporarily. Last week I averaged about 14 hours a day at work because I had a deadline I wanted to hit for Friday. However I didn't want this to mean I got home late, since I value spending dinner time with my girlfriend, so I left for work much earlier in the day last week. This week I'm back to my regular schedule.

Professional Work Load: Last year, I worked on lots of things I was interested in simultaneously. I worked on the social networking platform for Windows Live, replacing MSN Member Directory with MSN Spaces Profiles, photo storage for Windows Live services from MSN Spaces to Windows Live Expo, and a bunch of other stuff which hasn't shipped so I can't mention it here. This was just the stuff my boss had me working on. There was also stuff I was interested in that I just worked on without being explicitly told to, such as organizing efforts around the MSN Windows Live developer platform (see http://msdn.microsoft.com/live) and keeping the spark alive on us getting an RSS platform built for Windows Live. This was a lot of stuff to try to fit into a workday besides all the other crap that fills your day (meetings, meetings, meetings). At my last review, I got some feedback that some folks on my team felt they weren't getting my full attention because I spent so much time on 'extracurricular' activities. Although I was initially taken aback by this feedback I realized there was some truth to it. Since then I've been working on handing off some of the stuff I was working on that wasn't part of my job requirements. Thanks in part to the positive response to my ThinkWeek paper there is now an entire team of people working on the stuff I was driving around the Windows Live developer platform last year. You should keep an eye on the blogs of folks like Ken Levy and Danny Thorpe to learn what we have planned in this arena. The RSS platform for Windows Live spark has now been fanned into a flame and I worked hard to get Niall Kennedy to join us to drive those efforts. Realizing I can't work on everything I am interested in has been liberating.

Geeking at Home: I've cut down on how much time I spend reading blogs and don't subscribe to any mailing lists. Even on the blogs I read, I try to cut down on reading comment sections that have more negative energy than I can stomach, which means skipping the comments section of the Mini-Microsoft blog most days of the week. Even at work, I subscribe to only two or three distribution lists that aren't for my team or specific projects I am working on. I don't plan to have concurrent side projects going on at home anymore. I'll keep working on RSS Bandit for the foreseeable future. Whenever there is a lull in development, such as after a major release, I may work on an article or two. However I won't have two or three article drafts going at the same time while also being in bug fixing mode, which used to be the norm for me a year or two ago.


I wish Robert luck in his plan to simplify his life and improve his health.


 

Categories: Personal

April 17, 2006
@ 03:05 PM

I'm still continuing my exploration of the philosophy behind building distributed applications following the principles of the REpresentational State Transfer (REST) architectural style and Web-style software. Recent comments in my blog have introduced a perspective that I hadn't considered much before.

Robert Sayre wrote

Reading over your last few posts, I think it's important to keep in mind there are really two kinds of HTTP. One is HTTP-For-Browsers, and one is HTTP-For-APIs.

API end-points encounter a much wider variety of clients that actually have a user expecting something coherent--as opposed to bots. Many of those clients will have less-than robust HTTP stacks. So, it turns out your API end-points have to be much more compliant than whatever is serving your web pages.

Sam Ruby wrote

While the accept header is how you segued into this discussion, Ian's and Joe's posts were explicitly about the Content-Type header.

Relevant to both discussions, my weblog varies the Content-Type header it returns based on the Accept header it receives, as there is at least one popular browser that does not support application/xhtml+xml.

So... Content-Type AND charset are very relevant to IE7. But are completely ignored by RSSBandit. If you want to talk about “how the Web r-e-a-l-l-y works”, you need to first recognize that you are talking about two very different webs with different set of rules. When you talk about how you would invest Don's $100, which web are you talking about?

This is an interesting distinction and one that makes me re-evaluate my reasons for being interested in RESTful web services. I see two main arguments for using RESTful approaches to building distributed applications on the Web.  The first is that it is simpler than other approaches to building distributed applications that the software industry has cooked up. The second is that it has been proven to scale on the Web.

The second reason is where it gets interesting. Once you start reading articles on building RESTful web services such as Joe Gregorio's How to Create a REST Protocol and Dispatching in a REST Protocol Application, you realize that the way REST advocates say one should build RESTful applications is actually quite different from how the Web works. Few web applications support HTTP methods other than GET and POST, few web applications send the correct MIME types when sending data to clients, many Web applications use cookies for storing application state instead of allowing hypermedia to be the engine of application state (i.e. keeping the state in the URL), and in a surprisingly large number of cases the markup in documents being transmitted is invalid or malformed in some way. However the Web still works.

REST is an attempt to formalize the workings of the Web ex post facto. However it describes an ideal of how the Web works and in many cases the reality of the Web deviates significantly from what advocates of RESTful approaches preach. The question is whether this disconnect invalidates the teachings of REST. I think the answer is no. 

In almost every case I've described above, the behavior of client applications and the user experience would be improved if HTTP [and XML] were used correctly. This isn't supposition; as the developer of an RSS reader, my life and that of my users would be better if servers emitted the correct MIME types for their feeds, if feeds were always at least well-formed, and if feeds always pointed to related metadata/content such as comment feeds (i.e. hypermedia as the engine of application state).
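
To make the hypermedia point concrete, here is a rough sketch of the kind of link-following an aggregator can do when feeds actually point to their related metadata. This isn't the RSS Bandit implementation; the wfw:commentRss element and its namespace are how per-item comment feeds are commonly advertised, and the dual casing check is just a defensive assumption.

// Hedged sketch: extract per-item comment feed links (wfw:commentRss)
// from an RSS document so an aggregator can follow them as hypermedia.
using System;
using System.Xml;

class CommentFeedDiscovery
{
    // Assumption: the Well-Formed Web CommentAPI namespace.
    const string WfwNamespace = "http://wellformedweb.org/CommentAPI/";

    public static void PrintCommentFeeds(string feedUrl)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(feedUrl); // accepts a URL or a file path

        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("wfw", WfwNamespace);

        foreach (XmlNode item in doc.SelectNodes("/rss/channel/item"))
        {
            // Feeds in the wild use both casings of the element name, so check both.
            XmlNode comments = item.SelectSingleNode("wfw:commentRss | wfw:commentRSS", nsmgr);
            if (comments != null)
                Console.WriteLine("Comment feed: " + comments.InnerText.Trim());
        }
    }
}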

Let's get back to the notion of the Two Webs. Right now, there is the primarily HTML-powered Web, whose primary clients are Web browsers and search engine bots. For better or worse, over time Web browsers have had to deal with the fact that Web servers and Web masters ignore several rules of the Web, from using incorrect MIME types for files to serving malformed or invalid documents. This has cemented hacks and bad practices as the status quo on the HTML Web. It is unlikely this is going to change anytime soon, if ever.

Where things get interesting is that we are now using the Web for more than serving Web documents to Web browsers. The primary clients for these documents aren't just Web browsers written by Microsoft and Netscape/AOL/Mozilla plus bots from a handful of search engines. For example, with RSS/Atom we have hundreds of clients, with more to come as the technology becomes more mainstream. Also, with Web APIs becoming more popular, more and more Web sites are exposing services to the world using RESTful approaches. In all of these cases there is justification for being more rigorous in the way one uses HTTP than one would be when serving HTML documents for one's web site.

In conclusion, I completely agree with Robert Sayre's statement that there are really two kinds of HTTP. One is HTTP-For-Browsers, and one is HTTP-For-APIs.

When talking about REST and HTTP-For-APIs, we should be careful not to learn the wrong lessons from how HTTP-For-Browsers is used today.
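
To illustrate what being more rigorous might look like on the HTTP-For-APIs side, here is a minimal sketch of an ASP.NET handler that is strict about the protocol instead of guessing. The endpoint, media type and canned feed are made up for the example.

// Hedged sketch of an HTTP-For-APIs endpoint that is strict about HTTP:
// unsupported methods get a 405 plus an Allow header, and responses always
// carry an exact media type rather than relying on client sniffing.
using System.Web;

public class EntriesHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        if (context.Request.HttpMethod != "GET")
        {
            context.Response.StatusCode = 405; // Method Not Allowed
            context.Response.AppendHeader("Allow", "GET");
            return;
        }

        context.Response.ContentType = "application/atom+xml";
        context.Response.Write(GetAtomFeed());
    }

    static string GetAtomFeed()
    {
        // Placeholder for whatever actually produces the Atom document.
        return "<feed xmlns='http://www.w3.org/2005/Atom'></feed>";
    }
}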
 

Charlene Li of Forrester Research has a blog post entitled Google Calendar creates a platform for "time" applications where she writes

Having trialed a half dozen of them (including Airset, CalendarHub, 30Boxes, Planzo, and SpongeCell), Google Calendar is truly a best of breed in terms of ease of use and functionality. Here’s a quick overview of what’s different about the new product:

  • Manage multiple calendars. ....

  • Easy to use. ....

  • Sharing. ....

  • Open platform. I think this is the most interesting aspect of Google's calendar. The iCal standard along with RSS means that I will be able to synch my work calendar with my Google calendar. Although tie-ins with programs like Outlook aren’t yet available, Carl Sjogreen, Google Calendar’s product manager, said that such functionality will be coming "soon". Google is also partnering with Trumba to enable "one-click" addition of events to your calendar (Trumba already works with calendar products from Yahoo!, Outlook, MSN Hotmail, and Apple). Also promised are synching capabilities to mobile phones. Carl also said that an API was in the works, which would enable developers to create new products on top of Google Calendar.

I've always thought that Web-based calendaring and event-based products haven't hit the sweet spot with end users because they are too much work to use for too little benefit. The reason I use calendaring software at work is mainly to manage meetings. If I didn't have to attend meetings I'd never use the calendaring functionality of Outlook. In my personal life, the only time calendaring software would have been useful is for integrating invitation services like eVite into my calendars at work and/or at home (I use both Yahoo! Mail and Windows Live Mail). However, either eVite doesn't provide this functionality or it's so unintuitive that I've never discovered it. So web-based calendaring software has been pretty much a bust for me. AJAXifying it doesn't change this in any way.

On the other hand, I could probably build the integration I want between my work calendar and my eVite calendar if they had an API [and I was invited to enough parties to make this a worthy exercise]. It seems there is now an awareness of this in the industry at the big three (Google, Yahoo! and Microsoft), which is going to turn online calendaring into an interesting space over the next few months. Google Calendar is a step in the right direction by providing RSS feeds and announcing a forthcoming API. Yahoo! is already thinking about the same thing and also announced an upcoming Calendar API last month. As for Windows Live, our CTO has been talking to folks at work about using RSS+SSE as a way to share events and I'm sure they are paying attention [or at least will now that both Yahoo! and Google have thrown down].

With the increased use of RSS by Web-based calendaring applications perhaps it is time for RSS readers to also become more calendar aware?
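
For example, a first step toward calendar awareness could be as simple as reading the iCalendar (.ics) feeds these services already expose. The sketch below is deliberately simplified and only pulls out event summaries and start dates; a real implementation would have to deal with line folding, time zones and the rest of the iCalendar format.

// Hedged sketch: pull event summaries and start dates out of an iCalendar
// (.ics) stream. Deliberately ignores line folding, time zones, etc.
using System;
using System.IO;
using System.Net;

public class SimpleIcsReader
{
    public static void PrintEvents(string icsUrl)
    {
        WebRequest request = WebRequest.Create(icsUrl);
        using (StreamReader reader = new StreamReader(request.GetResponse().GetResponseStream()))
        {
            string summary = null, start = null, line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.StartsWith("SUMMARY:"))
                    summary = line.Substring("SUMMARY:".Length);
                else if (line.StartsWith("DTSTART"))   // e.g. DTSTART:20060417T150000Z
                    start = line.Substring(line.IndexOf(':') + 1);
                else if (line == "END:VEVENT")
                {
                    Console.WriteLine(summary + " starts " + start);
                    summary = start = null;
                }
            }
        }
    }
}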


 

To follow up my post asking Is HTTP Content Negotiation Broken as Designed?, I found a post by Ian Hickson on a related topic. In his post entitled Content-Type is dead he writes

Browsers and other user agents largely ignore the HTTP Content-Type header, relying on undefined sniffing heuristics to determine what the content of a page really is.

  • RSS feeds are always sniffed, regardless of their MIME type, because, to quote a Safari engineer, "none of them have the right mime type".
  • The target of img elements is almost always assumed to be an image, regardless of the declared type.
  • IE in particular is well known for ignoring the Content-Type header, despite this having been the source of security bugs in the past.
  • Browsers have been forced to implement heuristics to handle text/plain files as binary because video files are widely served with the wrong MIME types.

Unfortunately, we're now at a stage where browsers are continuously having to reverse-engineer each other to determine why they are handling content differently. A browser can't afford to render any less content than a browser with more market share, because otherwise users won't switch, and the new browser will not be adopted.

I think it may be time to retire the Content-Type header, putting to sleep the myth that it is in any way authoritative, and instead have well-defined content-sniffing rules for Web content.

Ian is someone who's definitely been around the block when it comes to HTTP, given that he's been involved in Web standards groups for several years and used to work on the Opera Web browser. On the other side of the argument is Joe Gregorio, who posts Content-Type is dead, for a short period of time, for new media-types, film at 11, which is an excellent example of the kind of dogmatic, theory-based arguing that I criticized in my previous post. In this case, Joe's chapter and verse is the W3C Technical Architecture Group's (TAG) finding on Authoritative Metadata.

MIME types and HTTP content negotiation are good ideas in theory that have failed to take hold in practice on the Web. Arguing that this fact contravenes stuff written in specs from the last decade, or findings from some ivory tower group of folks at the W3C, seems like religious dogmatism and not fodder for decent technical debate.

That said, I don't think MIME types should be retired. However I do think some Web/REST advocates need to look around and realize what's happening on the Web instead of arguing from an "ideal" or "theoretical" perspective.
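
For what it's worth, informal content-sniffing rules of the kind Ian describes are already what most feed readers end up implementing. Here is a hedged sketch of such a heuristic, classifying a payload by its document element and ignoring the declared Content-Type entirely; the exact rules are my own simplification, not anybody's spec.

// Hedged sketch of feed sniffing: classify a payload as RSS, Atom or
// (probably) HTML by looking at its document element rather than trusting
// the declared Content-Type.
using System.IO;
using System.Xml;

public class FeedSniffer
{
    public static string Sniff(Stream content)
    {
        try
        {
            XmlTextReader reader = new XmlTextReader(content);
            reader.MoveToContent(); // skips the prolog, lands on the document element
            switch (reader.LocalName)
            {
                case "rss":  return "rss";      // RSS 0.9x / 2.0
                case "RDF":  return "rss1.0";   // rdf:RDF root
                case "feed": return "atom";
                default:     return "xml";
            }
        }
        catch (XmlException)
        {
            // Not well-formed XML; on the real Web this usually means tag soup HTML.
            return "html";
        }
    }
}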


 

Categories: Web Development

While you were sleeping, Windows Live Academic Search was launched at http://academic.live.com. From the Web site we learn

Welcome to Windows Live Academic

Windows Live Academic is now in beta. We currently index content related to computer science, physics, electrical engineering, and related subject areas.

Academic search enables you to search for peer reviewed journal articles contained in journal publisher portals and on the web in locations like citeseer.

Academic search works with libraries and institutions to search and provide access to subscription content for their members. Access restricted resources include subscription services or premium peer-reviewed journals. You may be able to access restricted content through your library or institution.

We have built several features designed to help you rapidly find the content you are searching for including abstract previews via our preview pane, sort and group by capability, and citation export. We invite you to try us out - and share your feedback with us.

I tried a comparison of a search for my name on Windows Live Academic Search and Google Scholar.

  1. Search for "Dare Obasanjo" on Windows Live Academic Search

  2. Search for "Dare Obasanjo" on Google Scholar

Google Scholar finds almost 20 citations while Windows Live Academic Search only finds one. Google Scholar seems to use sources other than academic papers such as articles written on technology sites like XML.com. I like the user interface for Windows Live Academic Search but we need to expand the data sources we query for me to use it regularly.


 

Categories: Windows Live

Working on RSS Bandit is my hobby and sometimes I retreat to it when I need to unwind from the details of work or just need a distraction. This morning was one of those moments. I decided to look into the issue raised in the thread from our forums entitled MSN Spaces RSS Feeds Issues - More Info, where some of our users complained about a cookie parsing error when subscribed to feeds from MSN Spaces.

Before I explain what the problem is, I'd like to show an example of what an HTTP cookie header looks like from the Wikipedia entry for HTTP cookie

Set-Cookie: RMID=732423sdfs73242; expires=Fri, 31-Dec-2010 23:59:59 GMT; path=/; domain=.usatoday.com

Note the use of a semicolon as a delimiter for separating cookies. So it turned out that the error was in the following highlighted line of code


if (cookieHeaders.Length > 0) {
    container.SetCookies(url, cookieHeaders.Replace(";", ","));
}

You'll note that we replace the semicolon delimiters with commas. Why would we do such a strange thing when the example above shows that cookies can contain commas (look at the expires date)? It's because the CookieContainer.SetCookies method in the .NET Framework requires the delimiters to be commas. WTF?

This seems so fundamentally broken I feel that I must be mistaken. I've tried searching for possible solutions to the problem online but I couldn't find anyone else who has had this problem. Am I using the API incorrectly? Am I supposed to parse the cookie by hand before feeding it to the method? If so, why would anyone design the API in such a brain damaged manner?

*sigh*

I was having more fun drafting my specs for work.

Update: Mike Dimmick has pointed out in a comment below that my understanding of cookie syntax is incorrect. The cookie shown in the Wikipedia example is one cookie, not four as I thought. It looks like simply grabbing sample code from blogs may not have been a good idea. :) This means that I may have been getting malformed cookies when fetching the MSN Spaces RSS feeds after all. Now if only I can repro the problem...
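
In the meantime, one way to sidestep hand-parsing the Set-Cookie header entirely is to attach the CookieContainer to the request before sending it and let System.Net parse the response cookies itself. This is just a rough sketch of the idea, not the actual RSS Bandit code, and it wouldn't help with cookies that are genuinely malformed, but it does remove the hand-rolled delimiter munging from the picture.

// Hedged sketch: instead of calling CookieContainer.SetCookies() with a
// hand-munged header string, attach the container to the request up front
// and let System.Net parse the Set-Cookie headers itself.
using System.IO;
using System.Net;

public class FeedFetcher
{
    static readonly CookieContainer cookies = new CookieContainer();

    public static string Fetch(string feedUrl)
    {
        HttpWebRequest request = (HttpWebRequest) WebRequest.Create(feedUrl);
        request.CookieContainer = cookies; // cookies get parsed and stored for us

        using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            // Anything the server set is now in 'cookies' and will be sent
            // automatically on the next request to the same domain.
            return reader.ReadToEnd();
        }
    }
}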


 

Categories: RSS Bandit | Web Development

In a recent mail on the ietf-types mailing list Larry Masinter (one of the authors of the HTTP 1.1 specification) had the following to say about content negotiation in HTTP

> > GET /models/SomeModel.xml HTTP/1.1
>
> Host: www.example.org
>
> Accept: application/cellml-1.0+xml; q=0.5, application/cellml-1.1+xml; q=1

HTTP content negotiation was one of those "nice in theory" protocol additions that, in practice, didn't work out. The original theory of content negotiation was worked out when the idea of the web was that browsers would support a handful of media types (text, html, a couple of image types), and so it might be reasonable to send an 'accept:' header listing all of the types supported. But in practice as the web evolved, browsers would support hundreds of types of all varieties, and even automatically locate readers for content-types, so it wasn't practical to send an 'accept:' header for all of the types.

So content negotiation in practice doesn't use accept: headers except in limited circumstances; for the most part, the sites send some kind of 'active content' or content that autoselects for itself what else to download; e.g., a HTML page which contains Javascript code to detect the client's capabilities and figure out which other URLs to load. The most common kind of content negotiation uses the 'user agent' identification header, or some other 'x-...' extension headers to detect browser versions, among other things, to identify buggy implementations or proprietary extensions.

I think we should deprecate HTTP content negotiation, if only to make it clear to people reading the spec that it doesn't really work that way in practice.

HTTP content negotiation has always seemed to me like something that is a good idea in theory but didn't really work out in practice. It's good to see one of the founding fathers of HTTP actually admit that it is an example of theory not matching reality. It's always good to remember that just because something is written in a specification from some standards body doesn't make it holy writ. I've seen people debate online who throw out quotes from Roy Fielding's dissertation and IETF RFCs as if they are evangelical preachers quoting chapter and verse from the Holy Bible.

Some of the things you find in specifications from the W3C and IETF are good ideas. However, they are just that: ideas. Sometimes technological advances make these ideas outdated and sometimes the spec authors simply failed to consider other perspectives on the problem at hand. Expecting a modern browser to send an itemized list of every file type that can be read by the applications on your operating system on every single GET request, plus the priority in which these file types are preferred, is simply not feasible or really useful in practice. It may have been feasible a long time ago, but not now.
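
Contrast that with the narrow, two-way negotiation that does show up in practice, such as choosing between application/xhtml+xml and text/html based on whether the client explicitly asks for the former. A hedged sketch of that pattern:

// Hedged sketch of the narrow form of content negotiation that does survive
// in practice: pick between exactly two representations based on whether the
// client explicitly advertised support for the richer one.
public class XhtmlNegotiation
{
    public static string ChooseContentType(string acceptHeader)
    {
        if (acceptHeader != null && acceptHeader.IndexOf("application/xhtml+xml") >= 0)
            return "application/xhtml+xml";

        // Lowest common denominator for everything else, including clients
        // that send "*/*" or no Accept header at all.
        return "text/html";
    }
}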

Similar outdated and infeasible ideas litter practically every W3C and IETF specification out there. Remember that the next time you quote chapter and verse from some Ph.D. dissertation or IETF/W3C specification to justify a technology decision. Supporting standards is important, but more important is applying critical thinking to the problem at hand.

Thanks to Mark Baker for the link to Larry Masinter's post.


 

Categories: Web Development

I just noticed that last week the W3C published a working draft specification for The XMLHttpRequest Object. I found the end of the working draft somewhat interesting. Read through the list of references and authors of the specification below

References

This section is normative

DOM3
Document Object Model (DOM) Level 3 Core Specification, Arnaud Le Hors (IBM), Philippe Le Hégaret (W3C), Lauren Wood (SoftQuad, Inc.), Gavin Nicol (Inso EPS), Jonathan Robie (Texcel Research and Software AG), Mike Champion (Arbortext and Software AG), and Steve Byrne (JavaSoft).
RFC2119
Key words for use in RFCs to Indicate Requirement Levels, S. Bradner.
RFC2616
Hypertext Transfer Protocol -- HTTP/1.1, R. Fielding (UC Irvine), J. Gettys (Compaq/W3C), J. Mogul (Compaq), H. Frystyk (W3C/MIT), L. Masinter (Xerox), P. Leach (Microsoft), and T. Berners-Lee (W3C/MIT).

B. Authors

This section is informative

The authors of this document are the members of the W3C Web APIs Working Group.

  • Robin Berjon, Expway (Working Group Chair)
  • Ian Davis, Talis Information Limited
  • Gorm Haug Eriksen, Opera Software
  • Marc Hadley, Sun Microsystems
  • Scott Hayman, Research In Motion
  • Ian Hickson, Google
  • Björn Höhrmann, Invited Expert
  • Dean Jackson, W3C
  • Christophe Jolif, ILOG
  • Luca Mascaro, HTML Writers Guild
  • Charles McCathieNevile, Opera Software
  • T.V. Raman, Google
  • Arun Ranganathan, AOL
  • John Robinson, AOL
  • Doug Schepers, Vectoreal
  • Michael Shenfield, Research In Motion
  • Jonas Sicking, Mozilla Foundation
  • Stéphane Sire, IntuiLab
  • Maciej Stachowiak, Apple Computer
  • Anne van Kesteren, Opera Software

Thanks to all those who have helped to improve this specification by sending suggestions and corrections. (Please, keep bugging us with your issues!)

Interesting. Here we have a W3C specification that documents a proprietary Microsoft API, yet it not only doesn't include a Microsoft employee among the spec authors but doesn't even reference any of the IXMLHttpRequest documentation on MSDN.

I'm sure there's a lesson in there somewhere. ;)


 

Categories: Web Development | XML