Regular readers of this blog may have noticed that accessing it has been flaky all weekend. I've gotten numerous reports via Twitter that my blog was displaying the following error message when visited:

Server Error in '/site1/weblog' Application.
--------------------------------------------------------------------------------
Exception of type 'System.OutOfMemoryException' was thrown.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
Source Error:
An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.
Stack Trace:
[OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.]
Go1999(RegexRunner ) +0
System.Text.RegularExpressions.CompiledRegexRunner.Go() +14
System.Text.RegularExpressions.RegexRunner.Scan(Regex regex, String text, Int32 textbeg, Int32 textend, Int32 textstart, Int32 prevlen, Boolean quick) +144
System.Text.RegularExpressions.Regex.Run(Boolean quick, Int32 prevlen, String input, Int32 beginning, Int32 length, Int32 startat) +134
System.Text.RegularExpressions.Regex.Match(String input) +44
newtelligence.DasBlog.Web.Core.UrlMapperModule.HandleBeginRequest(Object sender, EventArgs evargs) +458
System.Web.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +68
System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +75

This seems like a straightforward issue. The machine my blog is hosted on is running out of memory, probably because the server is either overloaded from having too many sites hosted on it or because one of the sites is badly written and is using up more than its fair share of memory. This has happened once or twice in the past, and when it did I used the Live Chat feature of WebHost4Life, my hosting company, to contact a support rep who quickly moved my site to a different server. Thus when I got the first reports of this issue I thought this would be a routine support issue. I was sorely mistaken.

Earlier this year, WebHost4Life started migrating their customers to a new infrastructure with a different set of support staff. It seems this was a cost cutting measure because the new support staff seem to be a lot less technical than their previous counterparts and seem to have less access to their infrastructure. Over the weekend I chatted with three or four different support folks over IM and opened a ticket that was closed multiple times. Here is a sampling of the messages WebHost4Life wrote when closing the support ticket:

Hello,

Thank you for contacting Support.

We apologize for any inconvenience this has caused you. I have checked your website at the URL: http://www.25hoursaday.com/weblog/ and it is working fine. Could you please check it once again after clearing browser cache and cookies? If the issue persists, please get back to us with the exact error message, so that we will investigate on your issue.

Thank you!

Sincerely,

Sharon <redacted>
Customer Support

~~~ TICKET CLOSED

Hello,

Thank you for your reply.

I have checked your website URL: http://www.25hoursaday.com/weblog/ and I was able to duplicate the issue. I have received the error message. Hence, I have asked a member of our team who specializes in website management to review your issue. You should be hearing from this specialist within 24-48 hours. If you have any questions in the meantime, please let us know and be sure to refer to the link http://www.webhost4life.com/member/sconsole for the quickest service.

Thank you for choosing webhost4life, we appreciate your support.

Sincerely,

Sharon <redacted>
Customer Support

~~~ TICKET ESCALATED TO TIER 2 SUPPORT

Hello,

I have checked the issue and and was not able to duplicate it. It seems to be issue with your ISP. Please check once again and if the issue still persists, please get back to us, tracert results of the website when you experience the issue and also the exact time and location so that we can investigate on it further.

If you have any further questions, please update the Support Console.

Sincerely,

Aleta <redacted>
Technical Specialist

~~~ TICKET CLOSED

Hello,

Thank you for getting back to us.

Currently, your website http://www.25hoursaday.com/weblog/ is loading without any slowness. I suggest you to check the website functionality again and get back to us, if the issue persists.

If you have any further questions, please update the Support Console.

Sincerely,

Kara
Technical Specialist

~~~ TICKET CLOSED

I’m still getting reports that my blog is intermittently throwing out of memory exceptions, and as you can see from the above, nothing is being done to fix this problem. At first I wondered if the poor support I was getting was because this happened on the weekend and perhaps the technical folks only work weekdays. Unfortunately my hopes were dashed when someone on Twitter pointed me to a web hosting review site with dozens of complaints about WebHost4Life which echoed my experience. It seems the company is under new management and quality has suffered under some of their cost cutting moves.

Although I paid for a year’s worth of service, I’ve decided that I probably need to switch hosting companies. Thus I’m seeking recommendations from my readers for a web hosting company that supports ASP.NET and .NET 2.0 or higher. Also, if anyone is familiar with the process of cancelling service with a web host, including getting refunded for services not provided, I’d love any experience or tips you have to share.


 

Categories: Rants

In a move that was telegraphed by Fred Wilson’s (Twitter investor) post titled The Twitter Platform's Inflection Point where he criticized Twitter platform developers for “filling holes” in Twitter’s user experience, the Twitter team have indicated they will start providing official Twitter clients for various mobile platforms. There have been announcements of an official Blackberry client and the purchase of Tweetie so it can become the official iPhone client. The latter was announced in the blog post Twitter for iPhone excerpted below

Careful analysis of the Twitter user experience in the iTunes AppStore revealed massive room for improvement. People are looking for an app from Twitter, and they're not finding one. So, they get confused and give up. It's important that we optimize for user benefit and create an awesome experience.

We're thrilled to announce that we've entered into an agreement with Atebits (aka Loren Brichter) to acquire Tweetie, a leading iPhone Twitter client.

This has led to some anger on the part of Twitter client developers with some of the more colorful reactions being the creation of the Twitter Destroyed My Market Segment T-shirt and a somewhat off-color image that is making the rounds as representative of what Twitter means by “filling holes”.

As an end user and someone who works on web platforms, I don’t find any of this surprising. Geeks consider having to wade through half a dozen Twitter clients before finding one that works for them a feature, even though the paradox of choice means that most people are actually happier with fewer choices, not more. This is made worse by the fact that in the mobile world, this may mean paying for multiple apps until you find one that you’re happy with.

Any web company that cares about their customers will want to ensure that their experience is as simple and as pleasant as possible. Trusting your primary mobile experience to the generosity and talents of 3rd party developers means you are not in control of the primary way many people will access your service. This loss of control isn’t great, especially when the design direction you want to take your service in may not line up with what developers are doing in their apps. Then there’s the fact that forcing your users to make purchasing decisions before they can use your site conveniently on their phone isn’t a great first time experience either.

I expect mobile clients are just the beginning. There are lots of flaws in the Twitter user experience that are due to Twitter’s reliance on “hole fillers” that I expect they’ll start to fill. The fact that I ever have to go to http://bit.ly as part of my Twitter workflow is a bug. URL shorteners really have no reason to exist in the majority of use cases except when Twitter is sending an SMS message. Sites that exist simply as image hosting services for Twitter, like Twitpic and YFrog, also seem extremely superfluous, especially when you consider that only power users know about them, so not every Twitter user figures out how to use the service for image sharing. I expect this will eventually become a native feature of Twitter as well. Once Twitter controls the primary mobile clients for accessing their service, it’ll actually be easier for them to make these changes since they don’t have to worry about whether 3rd party apps will support Twitter image hosting vs. Twitpic versus rolling their own ghetto solution.

The situation is made particularly tough for 3rd party developers due to Twitter’s lack of a business model as Chris Dixon points out in his post Twitter and third-party Twitter developers

Normally, when third parties try to predict whether their products will be subsumed by a platform, the question boils down to whether their products will be strategic to the platform. When the platform has an established business model, this analysis is fairly straightforward (for example, here is my strategic analysis of Google’s platform).  If you make games for the iPhone, you are pretty certain Apple will take their 30% cut and leave you alone. Similarly, if you are a content website relying on SEO and Google Adsense you can be pretty confident Google will leave you alone. Until Twitter has a successful business model, they can’t have a consistent strategy and third parties should expect erratic behavior and even complete and sudden shifts in strategy.

So what might Twitter’s business model eventually be?  I expect that Twitter search will monetize poorly because most searches on Twitter don’t have purchasing intent.  Twitter’s move into mobile clients and hints about a more engaging website suggest they may be trying to mimic Facebook’s display ad model.

The hard question then is what opportunities will be left for developers on Twitter’s platform once the low hanging fruit has been picked by the company. Here I agree with frequent comments by Dave Winer and Robert Scoble, that there needs to be more metadata attached to tweets so that different data aggregation and search scenarios can be built which satisfy thousands of niches. I especially like what Dave Winer wrote in his post How Twitter can kill the Twitter-killers where he stated

Suppose Twitter wants to make their offering much more competitive and at the same time much more attractive to developers. Sure, as Fred Wilson telegraphed, some developers are going to get rolled over, esp those who camped out on the natural evolutionary path of the platform vendor. But there are lots of things Twitter Corp can do to create more opportunities for developers, ones that expand the scope of the platform and make it possible for a thousand flowers to bloom, a thousand valuable non-trivial flowers.

The largest single thing Twitter could do is open tweet-level metadata. If I want to write an app for dogs who tweet, let me add a "field" to a tweet called isDog, a boolean, that tells me that the author of the tweet is a dog. That way the dog food company who has a Twitter presence can learn that the tweet is from a dog, from the guy who's developing a special Twitter client just for dogs, even though Twitter itself has no knowledge of the special needs of dogs. We can also add a field for breed and age (in dog years of course). Coat type. Toy preference. A link to his or her owner. Are there children in the household?

I probably wouldn’t have used the tweeting dog example but the idea is sound. Location is an example of metadata that is added to tweets which can be used for interesting applications on top of the core news feed experience as shown by Twittervision and Bing's Twitter Maps. I think there’s an opportunity to build interesting things in this space especially if developers can invent new types of metadata without relying on Twitter to first bless new fields like they’ve had to do with location (although their current implementation is still inadequate in my opinion).
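To make the idea of open, tweet-level metadata a little more concrete, here is a minimal sketch in Python. It is not Twitter's data model or API: the `Tweet` class, the `filter_by_metadata` helper, the namespaced key names (`dogclient:isDog`, `geo:lat`) and the sample users are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Tweet:
    """Hypothetical tweet record; not Twitter's actual data model."""
    author: str
    text: str
    # Open-ended key/value pairs that any client may attach.
    # Keys are prefixed with a namespace so independent developers don't collide.
    metadata: Dict[str, Any] = field(default_factory=dict)


def filter_by_metadata(tweets: List[Tweet], key: str, value: Any) -> List[Tweet]:
    """Return the tweets whose metadata has `key` set to `value`."""
    return [t for t in tweets if t.metadata.get(key) == value]


# Illustrative timeline; the users and values are made up.
timeline = [
    Tweet("rex", "Woof!", {"dogclient:isDog": True, "dogclient:breed": "beagle"}),
    Tweet("bob", "At a conference", {"geo:lat": 30.26, "geo:long": -97.74}),
    Tweet("alice", "Good morning"),
]

dog_tweets = filter_by_metadata(timeline, "dogclient:isDog", True)
print([t.author for t in dog_tweets])
```

The point of the sketch is that the platform only has to store and return the `metadata` dictionary; it needs no knowledge of what `dogclient:isDog` means for a niche application to filter on it.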

Over the next few months, Twitter will likely continue to encroach on territory which was once assumed to belong to 3rd party developers. The question is whether Twitter will replace these opportunities they’ve taken away with new opportunities or instead if they’ve simply used developers as a means to an end and now they are no longer useful?

Now Playing: Notorious B.I.G. - One More Chance


 

The debate on the pros and cons of non-relational databases, which are typically described as “NoSQL databases”, has recently been heating up. The anti-NoSQL backlash is in full swing, from the rebuttal to one of my recent posts that I saw in Dennis Forbes’s write-up The Impact of SSDs on Database Performance and the Performance Paradox of Data Explodification (aka Fighting the NoSQL mindset) to similar thoughts expressed in typical rant-y style by Ted Dziuba in his post I Can't Wait for NoSQL to Die.

This will probably be my last post on the topic for a while given that the discussion has now veered into religious debate territory similar to vi vs. emacs OR functional vs. object oriented programming. With that said…

It would be easy to write rebuttals of what Dziuba and Forbes have written, but from what I can tell people are now talking past each other and defending entrenched positions. So instead I’ll leave this topic with an analogy. SQL databases are like automatic transmission and NoSQL databases are like manual transmission. Once you switch to NoSQL, you become responsible for a lot of work that the system takes care of automatically in a relational database system, similar to what happens when you pick manual over automatic transmission. Secondly, NoSQL allows you to eke more performance out of the system by eliminating from the database tier a lot of the integrity checks done by relational databases. Again, this is similar to how you can get more performance out of your car by driving a manual transmission versus an automatic transmission vehicle.

However, the most notable similarity is this: just as most of us can’t really take advantage of the benefits of a manual transmission vehicle because the majority of our driving is sitting in traffic on the way to and from work, most sites aren’t at Google or Facebook’s scale and thus have no need for a Bigtable or Cassandra. As I mentioned in my previous post, I believe a lot of problems people have with relational databases at web scale can be addressed by taking a hard look at adding in-memory caching solutions like memcached to their infrastructure before deciding to throw out their relational database systems.

Now Playing: Lady Gaga - Bad Romance


 

Categories: Web Development

I was recently on a panel at the South by Southwest Interactive conference (SXSW) where we discussed multiple applications of the real-time Web and the things that might prevent us from seeing its true potential. I’ve found it interesting that the key takeaway from the panel is that privacy issues will be one of the biggest problems we will face as we move forward. You can see this perspective in CNN’s coverage of the panel in the story Privacy concerns hinder 'real-time Web' creation, developers say and GigaOm’s write-up SXSW: Is Privacy on the Social Web a Technical Problem?

This overlap of privacy and real-time web features is brought into sharp relief when you look at services such as Foursquare and Gowalla which provide a mechanism for people to broadcast their physical location to a group of friends in real-time. I started using Foursquare last week and I’ve noticed I’m even more careful about who I accept friend requests from than on Facebook or Windows Live Messenger. The fact is that I may share status updates and photos with people but it doesn’t mean I want them to be aware of where I am on an up to the minute basis especially if I’m out spending time with my family and friends. This difference in how we view location data from other sorts of real-time data we share is captured by the co-founder of Foursquare in the article Facebook Isn't For Real Life Friends Anymore, Says Foursquare's Dennis Crowley where it states

Facebook plans to clone Foursquare's central service -- the ability for site members to use their phones to "check-in" from restaurants and bars -- and make it a mere Facebook feature.

But Foursquare cofounder Dennis Crowley says there's something Facebook can't clone: the real-life friendships between Foursquare users.

"Facebook used to be who your friends are, now it's everyone," Dennis told us in an interview.

"[Foursquare] is more tightly curated to who you want to have as your check-in friends. Facebook is good place for status updates and sharing photos, not to keep tabs on where people are going."

I think Crowley is on to something when he says Facebook can’t clone the Foursquare relationship model. I suspect that like Twitter, Foursquare has created a social network whose value proposition is differentiated enough from Facebook’s that it can grow into a relatively popular albeit smaller service that will not be “killed” by Facebook*. Secondly, there is a lot of synergy between Foursquare and Facebook as evidenced by the fact that Facebook is the largest referrer of traffic to Foursquare thanks to their implementation of Facebook Connect. So I think the claims that one will kill the other are just the usual tech press creating conflict to generate page views.

One thing I have noticed is that location can’t just be a field you bolt on to a status update. It has to be a key part of the information you are sharing with others otherwise it adds little value to the user experience and in fact may detract from it by adding clutter. For example, compare what a location-based update from Foursquare looks like on Facebook versus what the exact same update looks like on Twitter

[Screenshot: the update as shown on Facebook] VS [Screenshot: the same update as shown on Twitter]

 

The difference between the two updates is almost night and day even though the actual status text I shared is the same. The way Twitter has approached location is to treat it as a bunch of “poorly translated” GPS coordinates that are bolted on to the end of my status update. The Facebook update not only gives you that but also a human readable location for where I am, down to the room number, and includes some social context such as the fact that I was attending the talk with two coworkers from Windows Live.

As real-time location data starts to permeate social experiences, there’s a lot to learn from the above screenshots. In the example above, people who were interested in the topic based on my status could tell from the Facebook update which room danah’s talk was in, whereas the Twitter update only told them “downtown austin”. As designers of social software applications, we should make sure that location data enhances the experience and the information being shared. Adding location simply for buzzword compliance or to add metadata to the status update without enhancing the experience actually ends up crufting it up.

* Twitter’s value proposition is that it is the place to interact with celebrities and microcelebrities that you care about. It is useful to note that the much maligned Suggested Users List was key in establishing this value proposition in the minds of users. This is different from Facebook’s position as the social network for your real world friends, family, coworkers and acquaintances.

Now Playing: B.O.B. - Nothin' On You (featuring Bruno Mars)


 

A few weeks ago Todd Hoff over on the High Scalability blog penned a blog post titled MySQL and Memcached: End of an Era? where he wrote

If you look at the early days of this blog, when web scalability was still in its heady bloom of youth, many of the articles had to do with leveraging MySQL and memcached. Exciting times. Shard MySQL to handle high write loads, cache objects in memcached to handle high read loads, and then write a lot of glue code to make it all work together. That was state of the art, that was how it was done. The architecture of many major sites still follow this pattern today, largely because with enough elbow grease, it works.

With a little perspective, it's clear the MySQL+memcached era is passing.

LinkedIn has moved on with their Project Voldemort. Amazon went there a while ago.

Digg declared their entrance into a new era in a post on their blog titled Looking to the future with Cassandra,

Twitter has also declared their move in the article Cassandra @ Twitter: An Interview with Ryan King.

Todd’s blog has been a useful source of information on the topic of scaling large scale websites since he catalogues as many presentations as he can find from industry leaders on how they’ve designed their systems to deal with millions to hundreds of millions of users pounding their services each day. What he’s written above is really an observation about industry trends and isn’t really meant to attack any technology. I did find it interesting that many took it as an attack on memcached and/or relational databases and came out swinging.

One post which I thought tried to take a balanced approach to rebuttal was Dennis Forbes’ Getting Real about NoSQL and the SQL-Isn't-Scalable Lie where he writes

I work in the financial industry. RDBMS’ and the Structured Query Language (SQL) can be found at the nucleus of most of our solutions. The same was true when I worked in the insurance, telecommunication, and power generation industries. So it piqued my interest when a peer recently forwarded an article titled “The end of SQL and relational databases”, adding the subject line “We’re living in the past”. [Though as Michael Stonebraker points out, SQL the query language actually has remarkably little to actually to do with the debate. It would be more clearly called NoACID]

From a vertical scaling perspective — it’s the easiest and often the most computationally effective way to scale (albeit being very inefficient from a cost perspective) — you have the capacity to deploy your solution on powerful systems with armies of powerful cores, hundreds of GBs of memory, operating against SAN arrays with ranks and ranks of SSDs.

The computational and I/O capacity possible on a single “machine” are positively enormous. The storage system, which is the biggest limiting factor on most database platforms, is ridiculously scalable, especially in the bold new world of SSDs (or flash cards like the FusionIO).

From a horizontal scaling perspective you can partition the data across many machines, ideally configuring each machine in a failover cluster so you have complete redundancy and availability. With Oracle RAC and Sybase ASE you can even add the classic clustering approach. Such a solution — even on a stodgy old RDBMS — is scalable far beyond any real world need because you’ve built a system for a large corporation, deployed in your own datacenter, with few constraints beyond the limits of technology and the platform.

Your solution will cost hundreds of thousands of dollars (if not millions) to deploy, but that isn’t a critical blocking point for most enterprises.This sort of scaling that is at the heart of virtually every bank, trading system, energy platform, retailing system, and so on.

To claim that SQL systems don’t scale, in defiance of such obvious and overwhelming evidence, defies all reason.

There’s lots of good food for thought in both blog posts. Todd is right that, based on their experiences, a few large scale websites are moving beyond the horizontal scaling approach that Dennis brought up in his rebuttal. What tends to happen once you’ve built a partitioned/sharded SQL database architecture is that you notice you’ve given up most of the features of an ACID relational database. You give up the advantages of the relationships by eschewing foreign keys, triggers and joins since these are prohibitively expensive to run across multiple databases. Denormalizing the data means that you give up on Atomicity, Consistency and Isolation when updating or retrieving results. In the end, all you have left is that your data is Durable (i.e. it is persistently stored), which isn’t much better than what you get from a dumb file system. Well, actually you also get to use SQL as your programming model, which is nicer than performing direct file I/O operations.

It is unsurprising that after being at this point for years, some people in our industry have wondered whether it doesn’t make more sense to use data stores that are optimized for the usage patterns of large scale websites instead of gloriously misusing relational databases. A good example of the tradeoffs is the blog post from the Digg team on why they switched to Cassandra. The database was already sharded, which made performing joins to calculate results of queries such as “which of my friends Dugg this item?” infeasible. So instead they had to perform two reads from SQL (all Diggs on an item and all of the user’s friends) and then perform the intersection operation in the PHP front end code. If the item was not already cached, this led to disk I/O which could take seconds. To make the situation worse, you actually want to perform this operation multiple times on a single page view since it is reasonable to expect multiple Digg buttons on a page if it has multiple stories on it.
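The "two reads plus an intersection" approach can be sketched as follows. Digg did this in PHP against sharded SQL; the version below is a Python illustration only, and the dictionaries standing in for the two tables, the function names and the sample users are all hypothetical.

```python
# Hypothetical in-memory stand-ins for the two SQL reads.
DIGGS_BY_ITEM = {"story-42": ["ann", "bob", "carol", "dave"]}
FRIENDS_BY_USER = {"erin": ["bob", "dave", "frank"]}


def read_diggers(item_id):
    """First read: every user who Dugg the item."""
    return DIGGS_BY_ITEM.get(item_id, [])


def read_friends(user_id):
    """Second read: every friend of the viewing user."""
    return FRIENDS_BY_USER.get(user_id, [])


def friends_who_dugg(user_id, item_id):
    """Join the two result sets in application code instead of in the database."""
    diggers = set(read_diggers(item_id))
    friends = set(read_friends(user_id))
    return diggers & friends


print(sorted(friends_who_dugg("erin", "story-42")))
```

Both reads return full lists regardless of how small the overlap is, and the work is repeated for every Digg button on the page, which is why an uncached item is so costly.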

An alternate approach is to denormalize the data and for each user store a list of stories that have been Dugg by at least one of their friends. So whenever I Digg an item, an entry is placed in each of my friends’ lists to indicate that the story is now one that has been Dugg by a friend. That way when a friend of mine shows up, it is a simple lookup to say “is this story ID on the list of stories Dugg by one of their friends?” The challenge here is that Digging an item can result in literally thousands of logical write operations. It has traditionally been prohibitively expensive to incur such massive amounts of write I/O in relational databases with all of their transactionality and enforcing of ACID constraints. NoSQL databases like Cassandra, which assume your data is denormalized, are actually optimized for write I/O heavy operations given the necessity of having to perform enormous amounts of writes to keep data consistent.
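The denormalized alternative trades one cheap lookup at read time for many writes at Digg time. Here is a minimal Python sketch of that trade, assuming an in-memory dictionary in place of the real data store; the names `FRIENDS_OF`, `DUGG_BY_A_FRIEND`, `digg` and `dugg_by_friend` are hypothetical and are not taken from Digg's code.

```python
from collections import defaultdict

# Hypothetical friend graph: user -> the friends whose lists must be updated.
FRIENDS_OF = {
    "me": {"ann", "bob", "carol"},
    "ann": {"me"},
}

# Denormalized store: user -> IDs of stories Dugg by at least one friend.
DUGG_BY_A_FRIEND = defaultdict(set)


def digg(user_id, story_id):
    """Fan out on write: one logical write per friend. Returns the write count."""
    writes = 0
    for friend in FRIENDS_OF.get(user_id, ()):
        DUGG_BY_A_FRIEND[friend].add(story_id)
        writes += 1
    return writes


def dugg_by_friend(viewer_id, story_id):
    """Read path: a single membership check, with no join or intersection."""
    return story_id in DUGG_BY_A_FRIEND.get(viewer_id, set())


writes = digg("me", "story-42")
```

With three friends the Digg costs three writes; with thousands of friends it costs thousands, which is the write amplification the paragraph above describes.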

Digg’s usage of Cassandra actually serves as a rebuttal to Dennis Forbes’ article since they couldn’t feasibly get what they want with either horizontal or vertical scaling of their relational database-based solution. I would argue that introducing memcached into the mix would have addressed disk I/O concerns because all records of who has Dugg an item could be stored in-memory so comparisons of which of my friends have Dugg an item never have to go to disk to answer any parts of the query. The only caveat with that approach is that RAM is more expensive than disk so you’ll need a lot more servers to store 3 terabytes of data in memory than you would on disk.

However, the programming model is not the only factor one must consider when deciding whether to stay with a sharded/partitioned relational database versus going with a NoSQL solution. The other factor to consider is the actual management of the database servers. The sorts of questions one has to ask when choosing a database solution appear in the interview with Ryan King of Twitter, where he lists the following checklist that they evaluated before deciding to go with Cassandra over MySQL

We first evaluated them on their architectures by asking many questions along the lines of:

  • How will we add new machines?
  • Are their any single points of failure?
  • Do the writes scale as well?
  • How much administration will the system require?
  • If its open source, is there a healthy community?
  • How much time and effort would we have to expend to deploy and integrate it?
  • Does it use technology which we know we can work with?

The problem with database sharding is that it isn’t really a supported out of the box configuration for your traditional relational database product especially the open source ones. How your system deals with new machines being added to the cluster or handles machine failure often requires special case code being written by application developers along with special hand holding by operations teams. Dealing with issues related to database replication (whether it is multi-master or single master) also often takes up unexpected amounts of manpower once sharding is involved.
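One concrete reason that adding machines is painful with hand-rolled sharding: if the application picks a shard with a simple hash-modulo scheme, changing the number of shards reassigns most keys. The Python sketch below illustrates this under stated assumptions. The `shard_for` function and the `user:N` key format are hypothetical, and hash-modulo is only one of several ways applications route keys to shards.

```python
import hashlib


def shard_for(key, num_shards):
    """Naive routing: hash the key and take it modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Hypothetical keys; count how many land on a different shard
# when the cluster grows from 4 machines to 5.
keys = [f"user:{i}" for i in range(10000)]
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))
fraction_moved = moved / len(keys)

print(f"{fraction_moved:.0%} of keys change shards")
```

A key keeps its shard only when its hash gives the same remainder modulo 4 and modulo 5, which is true for about one hash in five, so roughly four out of five rows would have to be migrated. Avoiding that requires extra machinery in the application, such as a lookup directory or consistent hashing, which is the kind of special case code the paragraph above refers to.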

For these reasons I expect we’ll see more large scale websites decide that instead of treating a SQL database as a denormalized key-value pair store, they would rather use a NoSQL database. However, I also suspect that a lot of services which already have a sharded relational database + in-memory cache solution can get a lot of mileage from more judicious usage of in-memory caches before switching. This is especially true given that you still need caches in front of your NoSQL databases anyway. There’s also the question of whether traditional relational database vendors will add features to address the shortcomings highlighted by the NoSQL movement. Given that the sort of companies adopting NoSQL are doing so because they want to save costs on software, hardware and operations, I somehow doubt that there is a lucrative market here for database vendors versus adding more features that the banks, insurance companies and telcos of the world find interesting.

Now Playing: Birdman - Money To Blow (featuring Drake & Lil Wayne)


 

Categories: Web Development