Yahoo! announced some cool stuff last week. In his blog post Yahoo! UI JavaScript treats, Simon Willison writes

The Yahoo! Developer Network was updated yesterday with a veritable gold-mine of Exciting New Stuff, coinciding with the launch of the brand new Yahoo! User Interface Blog.

Here are some of the highlights:

The code is all under a BSD Open Source license, which means you can use it freely in your own projects, including for commercial development.

This is a fantastic contribution to the Web developer community by Yahoo!. It's taken me a week to blog about it because I wanted to try it out first. Unfortunately, I still haven't gotten around to trying out the code, but I decided to give it a shout out anyway. This is basically what I expected Microsoft to provide developers with Atlas, yet not only has Yahoo! done it first, it has done so in a way that is completely free (as in speech and as in beer). Wow. 

The folks at Yahoo! definitely understand what it means to build a developer platform and a developer community on the Web. Kudos to everyone involved in getting this out. 


 

Categories: Web Development

February 23, 2006
@ 10:37 PM

In his post More SOAP vs. REST arguments Stefan Tilkov asks

I just noticed an interesting thing in the most recent iteration of the SOAP-vs.-REST debate: this time, nobody seems to have mentioned the benefit — if you believe that’s what it is — of protocol independence. Why is that?

For the record, I personally believe it’s one of the weakest arguments.

When I was on the XML team, we used to talk about XML infosets and data format independence to imply that it made sense if people could use transfer formats that were optimized for their use cases but still get the benefit of the XML machinery such as XML APIs (DOM, SAX, etc.), XPath querying, XSLT transformations, and so on. This philosophy is what has underpinned the arguments for protocol independence at the SOAP level.

Now that I'm actually a customer of web services toolkits as opposed to a builder of the framework that they depend on, my perspective has changed. Protocol independence isn't really that important at the SOAP level. Protocol independence is important at the programming model/toolkit level. I should be able to write some business logic once and then simply choose to expose it as SOAP, XML-RPC, RSS, or a proprietary binary protocol without rewriting a bunch of code. That's what is important to my business needs.

It took a while but I eventually convinced some of the key Indigo folks that this was the right direction to go in with the Windows Communication Foundation. I damn near clapped when Doug demoed a WS-Transfer RSS service exposing an HTTP/POX endpoint, an HTTP/SOAP endpoint, and a TCP/Binary SOAP endpoint at an internal summit a couple of months ago.
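To make that idea concrete, here is a minimal sketch against the WCF hosting API as it eventually shipped. The contract, class, and addresses are made up for illustration and have nothing to do with the demo Doug gave; the point is simply that the same service class gets exposed over HTTP/SOAP and TCP/binary just by adding endpoints.

```csharp
using System;
using System.ServiceModel;

// Hypothetical contract; the business logic is written once.
[ServiceContract]
public interface IOrderStatus
{
    [OperationContract]
    string GetStatus(int orderId);
}

public class OrderStatusService : IOrderStatus
{
    public string GetStatus(int orderId)
    {
        return "Order " + orderId + " has shipped";
    }
}

class Host
{
    static void Main()
    {
        ServiceHost host = new ServiceHost(typeof(OrderStatusService),
            new Uri("http://localhost:8000/orders"),
            new Uri("net.tcp://localhost:8001/orders"));

        // Same contract and implementation, two different transports and
        // encodings, chosen purely by adding endpoints at host time.
        host.AddServiceEndpoint(typeof(IOrderStatus), new BasicHttpBinding(), "soap"); // HTTP + SOAP
        host.AddServiceEndpoint(typeof(IOrderStatus), new NetTcpBinding(), "");        // TCP + binary SOAP

        host.Open();
        Console.WriteLine("Endpoints are up. Press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}
```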

Bottom Line: Protocol independence is important to providers of Web services. However it isn't required at the SOAP level.


 

Categories: XML Web Services

From Matt Cutts's blog post about the Google Page Creator we learn

Oh, and by the way, it looks like Google has released a tool to make mini-websites. The Google Page Creator at http://pages.google.com/ lets you throw up a quick set of pages without a ton of hassle. Looks like a bunch of different look ‘n’ feel choices:

I feel like I'm in a time warp. Did Google just ship their own version of GeoCities? Isn't this space dead? End users have graduated from personal home pages to blogs and social networking tools which is why sites like MySpace, MSN Spaces and Xanga have tens of millions of users. Business users are likely to want an entire package like Office Live instead of just a web page creation tool.

Who exactly is the target audience for this offering?

Update: I just noticed that username@gmail.com is equal to username.googlepages.com. How do you ship a product with such an obvious privacy bug? I guess if you are creating a 20% project you don't need to have privacy reviews. Doh!


 

Categories: Web Development

When you build web applications that have to scale up to millions of users, you sometimes end up questioning almost every aspect of your design as you hit scalability problems. One thing I hadn't expected was to notice that a number of people in our shoes had begun to point out the limitations of SQL databases when it comes to building modern Web applications. Below is a sampling of such comments, gathered together primarily so I have easy access to them the next time I want an example of what I mean by the limitations of SQL databases when building large-scale Web applications.

  1. From Google's Adam Bosworth we have the post Where Have all the Good databases Gone

    The products that the database vendors were building had less and less to do with what the customers wanted...Google itself (and I'd bet a lot Yahoo too) have similar needs to the ones Federal Express or Morgan Stanley or Ford or others described, quite eloquently to me. So, what is this growing disconnect?

    It is this. Users of databases tend to ask for three very simple things:

    1) Dynamic schema so that as the business model/description of goods or services changes and evolves, this evolution can be handled seamlessly in a system running 24 by 7, 365 days a year. This means that Amazon can track new things about new goods without changing the running system. It means that Federal Express can add Federal Express Ground seamlessly to their running tracking system and so on. In short, the database should handle unlimited change.

    2) Dynamic partitioning of data across large dynamic numbers of machines. A lot of people track a lot of data these days. It is common to talk to customers tracking 100,000,000 items a day and having to maintain the information online for at least 180 days with 4K or more a pop, and that adds (or multiplies) up to 100 TB or so. Customers tell me that this is best served up to the 1MM users who may want it at any time by partitioning the data because, in general, most of this data is highly partitionable by customer or product or something. The only issue is that it needs to be dynamic so that as items are added or get "busy" the system dynamically load balances their data across the machines. In short, the database should handle unlimited scale with very low latency. It can do this because the vast majority of queries will be local to a product or a customer or something over which you can partition...

    3) Modern indexing. Google has spoiled the world. Everyone has learned that just typing in a few words should show the relevant results in a couple of hundred milliseconds. Everyone (whether an Amazon user or a customer looking up a check they wrote a month ago or a customer service rep looking up the history for someone calling in to complain) expects this. This indexing, of course, often has to include indexing through the "blobs" stored in the items such as PDFs and spreadsheets and PowerPoints. This is actually hard to do across all data, but much of the need is within a partitioned data set (e.g. I want to and should only see my checks, not yours, or my airbill status, not yours) and then it should be trivial.
    ...
    Users of databases don't believe that they are getting any of these three. Salesforce, for example, has a lot of clever technology just to hack around the dynamic schema problem so that 13,000 customers can have 13,000 different views of what a prospect is.

    If the database vendors ARE solving these problems, then they aren't doing a good job of telling the rest of us.

  2. Joshua Schachter of del.icio.us is quoted as saying the following in a recent talk

    Scaling: avoid early optimization. SQL doesn't map well to these problems - think about how to split up data over multiple machines. Understand indexing strategies, profile every SQL statement. Nagios or similar for monitoring.

    Tags don't map well to SQL. Sometimes you can prune based on usage - only index the first few pages for example. This keeps indexes small and fast.

  3. Mark Fletcher of Bloglines wrote the following in his post Behind the Scenes of the Bloglines Datacenter Move (Part 2)

    The Bloglines back-end consists of a number of logical databases. There's a database for user information, including what each user is subscribed to, what their password is, etc. There's also a database for feed information, containing things like the name of each feed, the description for each feed, etc. There are also several databases which track link and guid information. And finally, there's the system that stores all the blog articles and related data. We have almost a trillion blog articles in the system, dating back to when we first went on-line in June, 2003. Even compressed, the blog articles consist of the largest chunk of data in the Bloglines system, by a large margin. By our calculations, if we could transfer the blog article data ahead of time, the other databases could be copied over in a reasonable amount of time, limiting our downtime to just a few hours.

    We don't use a traditional database to store blog articles. Instead we use a custom replication system based on flat files and smaller databases. It works well and scales using cheap hardware.

Interesting things happen when you question everything. 
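Since the recurring theme in the comments above is partitioning data across machines (Bosworth's second point, Schachter's "think about how to split up data over multiple machines"), here is a toy sketch of the core idea: route each record to a shard based on a hash of its partition key. The shard names and hash function are invented for illustration, and it deliberately ignores the hard part Bosworth calls out, namely dynamically rebalancing as machines are added or get "busy".

```csharp
using System;
using System.Collections.Generic;

// Toy illustration of hash-based partitioning: route each customer's data
// to one of N database shards so most queries stay local to one partition.
class ShardRouter
{
    private readonly List<string> shards;

    public ShardRouter(IEnumerable<string> shardNames)
    {
        shards = new List<string>(shardNames);
    }

    public string ShardFor(string partitionKey)
    {
        return shards[StableHash(partitionKey) % shards.Count];
    }

    private static int StableHash(string key)
    {
        // Deterministic hash so the key-to-shard mapping survives restarts
        // (string.GetHashCode is not guaranteed to be stable across runs).
        unchecked
        {
            int h = 23;
            foreach (char c in key) h = (h * 31) + c;
            return h & int.MaxValue;
        }
    }
}

class Demo
{
    static void Main()
    {
        ShardRouter router = new ShardRouter(
            new string[] { "db01", "db02", "db03", "db04" });
        // The same key always maps to the same shard, so a customer's
        // queries only ever touch one machine.
        Console.WriteLine(router.ShardFor("customer-42"));
        Console.WriteLine(router.ShardFor("customer-1000"));
    }
}
```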


 

Categories: Web Development

February 21, 2006
@ 06:59 PM

Yesterday, Mark Baker asked Why all the WS Interop problems?.

If you'd have asked me six or seven years ago - when this whole Web services thing was kicking off - how things were likely to go with them, I would have said - and indeed, have said many times since - that they would fail to see widespread use on the Internet, as their architecture is only suitable for use under a single administrator, i.e. behind a firewall. But if you'd asked me if I would have thought that there'd be this much trouble with basic interoperability of foundational specifications, I would have said, no, I wouldn't expect that. I mean, despite the architectural shortcomings, the job of developing interoperable specifications, while obviously difficult, wouldn't be any more difficult because of these shortcomings... would it?

In my opinion, the answer to his question is obvious. A few months ago I wrote in my post The Perils of Premature Standardization: Attention Data and OPML that

I used to be the program manager responsible for a number of XML technologies in the .NET Framework while I was on the XML team at Microsoft. The technology I spent the most time working with was the XML Schema Definition Language (XSD). After working with XSD for about three years, I came to the conclusion that XSD has held back the proliferation and advancement of XML technologies by about two or three years. The lack of adoption of web services technologies like SOAP and WSDL on the world wide web is primarily due to the complexity of XSD.

If you read the three posts that Mark Baker links to about SOAP interop problems, you'll notice that two of them are about the issues with mapping xsi:nil and minOccurs="0" to concepts in traditional object oriented programming languages [specifically C#]. If a value is null, does that map to xsi:nil or minOccurs="0"? How do I differentiate the two in a programming language that doesn't have these concepts? How do I represent xsi:nil in a programming language where primitive types such as integers can't be null? 
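As a sketch of why this mapping hurts, consider a hypothetical schema element declared with both nillable="true" and minOccurs="0". The class below uses the *Specified convention that .NET's XmlSerializer applies to optional value-type elements; it isn't any particular vendor's generated code, just an illustration of where the nil-versus-absent distinction gets lost when an int can't be null.

```csharp
// Two distinct wire forms for the same hypothetical element
// <xs:element name="Quantity" type="xs:int" nillable="true" minOccurs="0"/>:
//
//   <Quantity xsi:nil="true" />   element present, value explicitly nil
//   (element omitted entirely)    element absent, allowed by minOccurs="0"
//
// A C# 1.x proxy class has no null for int, so generated code typically
// falls back on the "FooSpecified" pattern used by XmlSerializer:
public class LineItem
{
    public int Quantity;            // the value when one was sent
    public bool QuantitySpecified;  // false means the element was absent

    // There is still no natural slot for "present but xsi:nil='true'"
    // short of switching to a wrapper or nullable type, which is exactly
    // where implementations on different stacks start to drift apart.
}
```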

The main problem with WS-* interop is that vendors decided to treat it as a distributed object programming technology but based it on a data typing language (i.e. XSD) which does not map at all well to traditional object oriented programming languages. On the other hand, if you look at other XML-Web-Services-as-distributed-objects technologies like XML-RPC, you don't see as many issues. This is because XML-RPC was meant to map cleanly to traditional object oriented programming languages.

Unfortunately I don't see the situation improving anytime soon unless something drastic is done.


 

Categories: XML Web Services

February 21, 2006
@ 06:05 PM

Mike Gunderloy hits a number of my pet peeves in his Daily Grind 823 post where he writes

The Problem with Single Sign-In Systems - Dare Obesanjo attempts to explain why Passport is so annoying. But hey, you know what - your customers don't care about the technical issues. They just want it to work. Explaining how hard the problem is just makes you look like a whiner. This principle applies far beyond Passport.

Pet peeve #1, my name is spelled incorrectly. I find this really irritating, especially since anyone who is writing about something I blogged about can just cut and paste my name from their RSS reader or my webpage. I can't understand why there are so many people who misspell my name as Dare Obesanjo or Dare Obsanjo. Is it really that much of a hassle to use cut & paste? 

Pet peeve #2, projecting silly motives as to why I wrote a blog post. My blog is a personal weblog where I talk about [mostly technical] stuff that affects me in my professional and personal life. It isn't a Microsoft PR outlet aimed at end users. 

Pet peeve #3, not being able to tell the difference between what I wrote and what someone else did. Trevin is the person who wrote a blog post trying to explain the user experience issues around Passport sign-in, I just linked to it.


 

Categories: Ramblings

February 20, 2006
@ 09:14 PM

Patrick Logan has a post on the recently re-ignited discussion on REST vs. SOAP entitled REST and SOAP where he writes

Update: Mike Champion makes an analogy between messaging technologies (SOAP/WSDL and HTTP) and road vehicle types (trucks and cars). Unfortunately this is an arbitrary analogy. That is, saying that SOAP/WSDL is best "to haul a lot of heavy stuff securely and reliably, use a truck" does not make it so. The question is how to make an objective determination.

Mike is fond of implying that you need to use WS-* if you want security and reliability while REST/POX is only good for simple scenarios. I agree with Patrick Logan that this seems to be an arbitrary determination not backed by empirical evidence. As an end user, the fact that my bank allows me to make financial transactions using REST (i.e. making withdrawals and transfers from their website) is one counter example to the argument that REST isn't good enough for secure and reliable transactions. If it is good enough for banks why isn't it good enough for us?

Of course, the bank's website is only the externally focused aspect of the service, and they probably do use systems internally to ensure reliability and security that go beyond the capabilities of the Web's family of protocols and formats. However, as someone who builds services that enable tens of millions of end users to communicate with each other on a daily basis, I find it hard to imagine how WS-* technologies would significantly improve the situation for folks in my situation.

For example, take the post by Clemens Vasters entitled The case of the missing "durable messaging" feature where he writes

I just got a comment from Oran about the lack of durable messaging in WCF and the need for a respective extensibility point. Well... the thing is: Durable messaging is there; use the MSMQ bindings. One of the obvious "problems" with durable messaging that's only based on WS-ReliableMessaging is that that spec (intentionally) does not make any assertions about the behavior of the respective endpoints.

There is no rule saying: "the received message MUST be written to disk". WS-ReliableMessaging is as reliable (and unreliable in case of very long-lasting network failures or an endpoint outright crashing) and plays the same role as TCP. The mapping is actually pretty straightforward like this: WS-Addressing = IP, WS-ReliableMessaging = TCP.

So if you do durable messaging on one end and the other end doesn't do it, the sum of the gained reliability doesn't add up to anything more than it was before.

The funny thing about Clemens's post is that scenarios like the hard drive of a server crashing are the exact kind of reliability issues that concern us in the services we build at MSN/Windows Live. It's cool that specs like WS-ReliableMessaging allow me to specify semantics like AtMostOnce (messages must be delivered at most once or result in an error) and InOrder (messages must be delivered in the order they were sent), but this only scratches the surface of what it takes to build a reliable world class service. At best WS-* means you don't have to reinvent the building blocks when building a service that has some claims around reliability and security. However, the specifications and tooling aren't mature yet. In the meantime, many of us have services to build.

I tend to agree with Don's original point in his Pragmatics post. REST vs. SOAP is mainly about reach of services and not much else. If you know the target platform of the consumers of your service is going to be .NET or some other platform with rich WS-* support then you should use SOAP/WSDL/WS-*. On the other hand, if you can't guarantee the target platform of your customers then you should build a Plain Old XML over HTTP (POX/HTTP) or REST web service.
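To illustrate the reach argument, here is a minimal sketch of consuming a plain-old-XML-over-HTTP endpoint from .NET 2.0; the URL and element name are hypothetical. Anything with an HTTP stack and an XML parser can be a client, which is the whole appeal when you can't dictate your consumers' platform.

```csharp
using System;
using System.Net;
using System.Xml;

// Fetch an XML representation of a resource over plain HTTP and pull
// a single value out of it. No SOAP envelope, no WSDL, no toolkit.
class PoxClient
{
    static void Main()
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
            "http://example.com/orders/42"); // hypothetical resource URL
        request.Method = "GET";

        using (WebResponse response = request.GetResponse())
        {
            XmlDocument doc = new XmlDocument();
            doc.Load(response.GetResponseStream());

            // Hypothetical element name in the returned document.
            XmlNode status = doc.SelectSingleNode("//status");
            Console.WriteLine(status != null ? status.InnerText : "(no status)");
        }
    }
}
```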


 

Categories: XML Web Services

For the past few releases, we've had work items in the RSS Bandit roadmap around helping users deal with information overload. We've added features like newspaper views and search folders to make it easier for users to manage the information in the feeds they consume. Every release I've tried to make sure we add a feature that I know will make it easier for me to get the information I want from the feeds I am subscribed to without being overwhelmed.

For the Jubilee release I had planned that the new feature we'd add in the "dealing with information overload" bucket would be the ability to rate posts and enable filtering based on this rating. After thinking about this for a few weeks, I'm not sure this is the right route any more. There are tough technical problems to surmount to make the feature work well but I think the bigger problems are the expected changes to user behavior. Based on my experiences with rating systems and communities, I suspect that a large percentage of our user base will not be motivated to start rating the feeds they are subscribed to or the new items that show up in their aggregator.

On a related note, I've recently been using meme trackers like Memeorandum and TailRank, which try to show the interesting topics among certain technology blogs. I think this is a very powerful concept and is the next natural evolution of information aggregators such as RSS Bandit. The big problem with these sites is that they only show the current topics of interest among a small sliver of blogs, which in many cases do not overlap with the blogs one might be interested in. For example, today's headline topic on Tech.Memeorandum is that a bunch of bloggers attended a house party, which I personally am not particularly interested in. On the other hand, I'd find it useful if there were another way to view my subscriptions in RSS Bandit, pivoted around the current hot topics amongst the blogs I read. This isn't meant to replace the existing interface but instead would be another tool for users to customize their feed reading experience, the same way that newspaper views and search folders do today. 
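As a rough sketch of what such a "hot topics" view could do under the hood, the snippet below tallies how many times each URL is linked from unread posts and ranks the targets. The types and the ExtractLinks helper are hypothetical placeholders, not existing RSS Bandit APIs.

```csharp
using System;
using System.Collections.Generic;

// Count outbound links across a set of unread posts and return the
// link targets ordered from most-linked to least-linked.
class MemeRanker
{
    public static List<KeyValuePair<string, int>> RankLinks(
        IEnumerable<string> unreadPostContents)
    {
        Dictionary<string, int> counts = new Dictionary<string, int>();
        foreach (string postHtml in unreadPostContents)
        {
            foreach (string url in ExtractLinks(postHtml))
            {
                int n;
                counts.TryGetValue(url, out n);
                counts[url] = n + 1; // tally this link occurrence
            }
        }

        List<KeyValuePair<string, int>> ranked =
            new List<KeyValuePair<string, int>>(counts);
        ranked.Sort(delegate(KeyValuePair<string, int> a, KeyValuePair<string, int> b)
        {
            return b.Value.CompareTo(a.Value); // most-linked first
        });
        return ranked;
    }

    // Hypothetical helper: a real implementation would parse anchor tags
    // out of the post HTML and normalize the URLs before counting.
    static IEnumerable<string> ExtractLinks(string postHtml)
    {
        yield return "http://example.com/placeholder";
    }
}
```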

If you are an RSS Bandit user and this sounds like a useful feature I'd like to hear your thoughts on what functionality you'd like to see here. A couple of opening questions that I'd like to get opinions on include

  • Would you like to see the most popular links in new posts? For an example of what this looks like, see the screenshot in Nick Lothian's post on  Personalized meme tracking
  • How would you like "new" posts to be defined: unread posts, or items posted within the last day? The reason I ask is that you may already have read a few posts that link to a very popular topic. In that case, should that topic be ranked higher than another topic that hasn't been linked to as much but whose related posts you haven't read yet?
  • Would you like a 'mark discussion as read' feature? Would it be nice to be able to mark all posts that link to a particular item as read?

I have a bunch of other questions but these should do for now.


 

Categories: RSS Bandit

February 19, 2006
@ 06:33 PM

This is basically a "me too" post. Dave Winer has a blog post entitled Blogging is part of life where he writes

I agree with the author of the Slate piece that’s getting so much play in the blogosphere, up to a point. The things that called themselves blogs that came from Denton and Calacanis are professional publications written by paid journalists that use blogging software for content management. That’s fine and I suppose you can call them blogs, but don’t get confused and think that their supposed death (which itself is arguable) has anything to do with the amateur medium that is blogging. They’re separate things, on separate paths with different futures.

To say blogging is dead is as ridiculous as saying email or IM or the telephone are dead. The blog never belonged on the cover of magazines, any more than email was a cover story (it never was) but that doesn’t mean the tool isn’t useful inside organizations as a way to communicate, and as a way for businesses to learn how the public views them and their competitors.

Whenever Dave Winer writes about blogging I tend to agree with him completely. This time is no exception. Blogs are social software, they facilitate communication and self expression between individuals. Just like with email and IM, there are millions of people interacting using blogs today. There are more people reading and writing blogs on places like MySpace and MSN Spaces than the populations of a majority of the countries on this planet. Blogs are here to stay. 

Debating whether companies that build businesses around blogs will survive is orthogonal to discussing the survival of blogging as a medium. After all, debating whether companies that send out email newsletters or make mailing list software will survive isn't the same as discussing the survival of email as a communication medium. Duh. 


 

Categories: Social Software

Recently there was a question asked on the RSS Bandit forums from a user who was Unable to Import RSSBandit-Exported OPML into IE7. The question goes

I exported my feeds from RSSBandit 1.3.0.42 to an OPML file in hopes of trying the feed support in IE7. IE7 seems to try to import, but ultimately tells me no feeds were imported. The exported file must have over a 100 feeds, so it's not that. Has anyone else been able to import feeds from RSSBandit into IE7?

I got an answer for why this is the case from the Internet Explorer RSS team. The reason is provided in the RSS Bandit bug report, Support type="rss" for export of feeds, where I found out that somewhere along the line someone came up with the convention of adding a type="rss" attribute to indicate which entries in an OPML file are RSS feeds. The Internet Explorer RSS team has decided to enforce this convention for indicating RSS feeds in an OPML file and will ignore entries that don't have this annotation.

Since RSS Bandit supports both RSS/Atom feeds and USENET newsgroups, I can see the need to be able to differentiate which are the feeds in an OPML file without having applications probe each URL. However I do think that type="rss" is a misnomer since it should also apply to Atom feeds. Perhaps type="feed" instead?
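For the curious, here is a small sketch of what exporting an entry with that annotation could look like; the feed title and URL are made up, and whether the attribute value stays type="rss" or becomes something like type="feed" is exactly the open question above.

```csharp
using System;
using System.Xml;

// Write a minimal OPML document whose feed entry carries the type
// attribute that IE7's import code requires on each <outline>.
class OpmlExportSketch
{
    static void Main()
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;

        using (XmlWriter writer = XmlWriter.Create(Console.Out, settings))
        {
            writer.WriteStartElement("opml");
            writer.WriteAttributeString("version", "1.1");
            writer.WriteStartElement("body");

            writer.WriteStartElement("outline");
            writer.WriteAttributeString("text", "Example Feed");              // made-up title
            writer.WriteAttributeString("type", "rss");                       // the convention IE7 enforces
            writer.WriteAttributeString("xmlUrl", "http://example.com/feed.xml"); // made-up feed URL
            writer.WriteEndElement(); // outline

            writer.WriteEndElement(); // body
            writer.WriteEndElement(); // opml
        }
    }
}
```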