Dare Obasanjo's weblog

December 10, 2009

@ 01:33 AM

When is a privacy feature not a privacy feature?

From the Facebook blog post Updates on Your New Privacy Tools

Can I limit access to my Friend List?

Many of you have mentioned that you want a way to hide your list of friends. In response to your feedback, we've removed the "View Friends" link from search results, making your Friend List less visible on the site.

In addition, you can further limit the visibility of your Friend List to other people on Facebook if you want. After you've completed the transition to the new privacy settings, you'll be able to click on the pencil icon in the top-right corner of the "Friends" box on your profile. Unchecking "Show my friends on my profile" will prevent your Friend List from appearing in your profile when it is viewed by people who are logged in to Facebook. Keep in mind, however, that because Friend List is publicly available, it will be visible to people who are viewing your profile while not logged in. Again, you will only have this option once you've completed the transition to the new privacy settings.

Remember, you can also limit who can find you in searches on Facebook and control whether your information can be indexed by public search engines under "Search" on the Privacy Settings page.

That's awesome. I didn't realize when I joined Facebook that the service would retroactively decide that my list of friends was public knowledge and then would add a privacy setting to "hide" it from Facebook users that could be worked around by logging out. Join me as I say goodbye to my old privacy settings and old public version of my Facebook profile which kept my private information private.

R.I.P. Old Facebook Privacy settings

R.I.P. Old Facebook Public Profile

Categories: Social Software

November 23, 2009

@ 04:24 PM

Comments [6]

The Many Flaws of Twitter's Retweet Feature

I've been a Twitter user for almost two years now and I have always been impressed by the emergent behavior that has developed from simply giving people a text box with 140 character limit. The folks at Twitter have also done a good job of noticing some these emergent behaviors and making them formal features of the site. Both hashtags and @replies are examples of emergent community conventions in authoring tweets that are now formal features of the site.

Twitter recently added retweets to this list with Project Retweet. After using this feature for a few days I've found that unlike hashtags and @replies, the way this feature has been integrated into the Twitter experience is deeply flawed. Before I talking about the problems with Project Retweet, I should talk about how the community uses retweeting today.

Retweeting 101: What is it and why do people do it?

Retweeting is akin to the practice of forwarding along interesting blog posts and links to your friends via email. A retweet repeats the content of a person's tweet (sometimes edited for brevity) along with a reference to the user who is being retweeted. Often times people also add some commentary to the retweets. Examples of both styles of retweets are shown below.

Figure 1: Retweet without commentary

Figure 2: Retweet with added comment

Unlike hashtags and @replies, the community conventions aren't as consistent with retweets. Below are two examples of retweets from my home page which use different prefixes and separators from the one above to indicate the item is a retweet and the user's comment respectively.

Figure 3: Different conventions in retweeting

However there are many issues with retweeting not being a formal feature of Twitter. For one, it is often hard for new users to figure out what's going on when they see people posting updates prefixed with strange symbols and abbreviations. Another problem is that users who want to post a retweet now have to deal with the fact that the original tweet may have taken up all or most of the 140 character limit so there may be little room to credit the author let alone add commentary.

Thus I was looking forward to retweeting becoming a formal feature of Twitter so that these problems would be addressed. Unfortunately, while one of these problems was fixed more problems were introduced.

Flaw #1: Need to visit multiple places to see all retweets of your content

Before the introduction of the retweet feature, users could go to http://www.twitter.com/replies to see all posts that reference their name which would include @replies and retweets. The new Twitter features fragments this in an inconsistent manner.

Figure 4: Current Twitter sidebar

Now users have to visit http://www.twitter.com/replies to see people who has retweeted their posts using community conventions (i.e. copy and pasting then prefixing "RT" to a tweet) and then visit http://twitter.com/retweeted_of_mine to see who has retweeted their posts by clicking the Retweet link in the Twitter web user interface. There will be different people in both lists.

Figure 5: Retweets in the Replies/Mentions page

Figure 6: Retweets on the "Your tweets, retweeted" page

It is surprising to me that Twitter didn't at least include posts that start with RT followed by your username in http://twitter.com/retweeted_of_mine as well.

Flaw #2: No way to add commentary on what you are retweeting

As I mentioned earlier, it is fairly common for people to retweet a status update and then add their own commentary. The retweet feature built into Twitter ignores this common usage pattern and provides no option to add your own commentary.

Figure 7: The Retweet prompt

This omission is particularly problematic if you disagree with what you are sharing and want to clarify to your followers that although you find the tweet interesting you aren't endorsing the opinion.

Flaw #3: Retweets don't show up in Twitter apps

One of the other surprising changes is that Twitter retweets have been introduced in a backwards-incompatible manner into the API. This means that retweets created using the Twitter retweet button do not show up in 3rd party applications that use the Twitter API. See below for an example of what I see in Echofon versus the Twitter web experience and notice the missing tweet.

Figure 8: Twitter website showing a retweet

Figure 9: The retweet is missing in Echofon

Again, I find this surprising since it would have been straightforward to keep retweets in the API and exposing them as if they were regular old school retweets prefixed with "RT".

Flaw #4: Pictures of people I don't know in my stream

The last major problem with the Twitter retweet feature is that it breaks user expectation of the stream. Until this feature shipped, users could rest assured that the only content they saw in their stream was content they had explicitly asked for by subscribing to a user. Thus when you see someone in your stream the person's user name and avatar are familiar to you.

With the new retweet feature, the Twitter team has decided to highlight the person being retweeted and treat the person who I've subscribed to that did the retweeting as an afterthought. Not only does this confuse users at first (who is this person showing up in my feed and why?) but it also assumes that the content being retweeted is more important than who did the retweeting. This is an unfortunate assumption since in many cases the person who did the retweeting adds all the context.

Note Now Playing: Jason Derulo - Whatcha Say Note

Categories: Rants | Social Software

November 23, 2009

@ 02:46 PM

Comments [2]

Building Scalable Databases: Perspectives on the War on Soft Deletes

In the past few months I've noticed an increased number of posts questioning practices around deleting and "virtually" deleting data from databases. Since some of the concerns around this practice have to do with the impact of soft deletes on scalability of a database-based application, I thought it would be a good topic for my ongoing series on building scalable databases.

Soft Deletes 101: What is a soft delete and how does it differ from a hard delete?

Soft deleting an item from a database means that the row or entity is marked as deleted but not physically removed from the database. Instead it is hidden from normal users of the system but may be accessible by database or system administrators.

For example, let's consider this sample database of XBox 360 games I own

Name	Category	ESRB	GamespotScore	Company
Call of Duty: Modern Warfare 2	First Person Shooter	Mature	9.0	Infinity Ward
Batman: Arkham Asylum	Fantasy Action Adventure	Teen	9.0	Rocksteady Studios
Gears of War 2	Sci-Fi Shooter	Mature	9.0	Epic Games
Call of Duty 4: Modern Warfare	First Person Shooter	Mature	9.0	Infinity Ward
Soul Calibur IV	3D Fighting	Teen	8.5	Namco

Now consider what happens if I decide that I'm done with Call of Duty 4: Modern Warfare now that I own Call of Duty: Modern Warfare 2. The expected thing to do would then be to remove the entry from my database using a query such as

DELETE FROM games WHERE name='Call of Duty 4: Modern Warfare';

This is what is considered a "hard" delete.

But then what happens if my friends decide to use my list of games to decide which games to get me for Christmas? A friend might not realize I'd previously owned the game and might get it for me again. Thus it might be preferable if instead of deleting items from the database they were removed from consideration as games I currently own but still could be retrieved in special situations. To address this scenario I'd add an IsDeleted column as shown below

Name	Category	ESRB	GamespotScore	Company	IsDeleted
Call of Duty: Modern Warfare 2	First Person Shooter	Mature	9.0	Infinity Ward	False
Batman: Arkham Asylum	Fantasy Action Adventure	Teen	9.0	Rocksteady Studios	False
Gears of War 2	Sci-Fi Shooter	Mature	9.0	Epic Games	False
Call of Duty 4: Modern Warfare	First Person Shooter	Mature	9.0	Infinity Ward	True
Soul Calibur IV	3D Fighting	Teen	8.5	Namco	False

Then for typical uses an application would interact with the following view of the underlying table

CREATE VIEW current_games AS SELECT Name, Category, ESRB, GameSpotScore, Company FROM games WHERE IsDeleted=False;

but when my friends ask me for a list of all of the games I have, I can provide the full list of all the games I've ever owned from the original games table if needed. Now that we understand how one would use soft deletes we can discuss the arguments against this practice.

Rationale for War: The argument against soft deletes

Ayende Rahien makes a cogent argument against soft deletes in his post Avoid Soft Deletes where he writes

One of the annoyances that we have to deal when building enterprise applications is the requirement that no data shall be lost. The usual response to that is to introduce a WasDeleted or an IsActive column in the database and implement deletes as an update that would set that flag.

Simple, easy to understand, quick to implement and explain.

It is also, quite often, wrong.

The problem is that deletion of a row or an entity is rarely a simple event. It effect not only the data in the model, but also the shape of the model. That is why we have foreign keys, to ensure that we don’t end up with Order Lines that don’t have a parent Order. And that is just the simplest of issues.
...
Let us say that we want to delete an order. What should we do? That is a business decision, actually. But it is one that is enforced by the DB itself, keeping the data integrity.

When we are dealing with soft deletes, it is easy to get into situations where we have, for all intents and purposes, corrupt data, because Customer’s LastOrder (which is just a tiny optimization that no one thought about) now points to a soft deleted order.

Ayende is right that adding an IsDeleted flag mean that you can no longer take advantage of database triggers for use when cleaning up database state when a deletion occurs. This sort of cleanup now has to moved up into the application layer.

There is another set of arguments against soft deletes in Richard Dingwall's post entitled The Trouble with Soft Delete where he points out the following problems

Complexity

To prevent mixing active and inactive data in results, all queries must be made aware of the soft delete columns so they can explicitly exclude them. It’s like a tax; a mandatory WHERE clause to ensure you don’t return any deleted rows.

This extra WHERE clause is similar to checking return codes in programming languages that don’t throw exceptions (like C). It’s very simple to do, but if you forget to do it in even one place, bugs can creep in very fast. And it is background noise that detracts away from the real intention of the query.

Performance

At first glance you might think evaluating soft delete columns in every query would have a noticeable impact on performance. However, I’ve found that most RDBMSs are actually pretty good at recognizing soft delete columns (probably because they are so commonly used) and does a good job at optimizing queries that use them. In practice, filtering inactive rows doesn’t cost too much in itself.

Instead, the performance hit comes simply from the volume of data that builds up when you don’t bother clearing old rows. For example, we have a table in a system at work that records an organisations day-to-day tasks: pending, planned, and completed. It has around five million rows in total, but of that, only a very small percentage (2%) are still active and interesting to the application. The rest are all historical; rarely used and kept only to maintain foreign key integrity and for reporting purposes.

Interestingly, the biggest problem we have with this table is not slow read performance but writes. Due to its high use, we index the table heavily to improve query performance. But with the number of rows in the table, it takes so long to update these indexes that the application frequently times out waiting for DML commands to finish.

These arguments seem less valid than Ayende's especially when the alternatives proposed are evaluated. Let's look at the aforementioned problems and the proposed alternatives in turn.

Trading the devil you know for the devil you don't: Thoughts on the alternatives to soft deletes

Richard Dingwall argues that soft deletes add unnecessary complexity to the system since all queries have to be aware of the IsDeleted column(s) in the database. As I mentioned in my initial description of soft deletes this definitely does not have to be the case. The database administrator can create views which the core application logic interacts with (i.e. the current_games table in my example) so that only a small subset of system procedures need to actually know that the soft deleted columns even still exist in the database.

A database becoming so large that data manipulation becomes slow due to having to update indexes is a valid problem. However Richard Dingwall's suggested alternative excerpted below seems to trade one problem for a worse one

The memento pattern

Soft delete only supports undoing deletes, but the memento pattern provides a standard means of handling all undo scenarios your application might require.

It works by taking a snapshot of an item just before a change is made, and putting it aside in a separate store, in case a user wants to restore or rollback later. For example, in a job board application, you might have two tables: one transactional for live jobs, and an undo log that stores snapshots of jobs at previous points in time:

The problem I have with this solution is that if your database is already grinding to a halt simply because you track which items are active/inactive in your database, how much worse would the situation be if you now store every state transition in the database as well? Sounds like you're trading one performance problem for a much worse one.

The real problem seems to be that the database has gotten too big to be operated on in an efficient manner on a single machine. The best way to address this is to partition or shard the database. In fact, you could even choose to store all inactive records on one database server and all active records on another. Those interested in database sharding can take a look at a more detailed discussion on database sharding I wrote earlier this year.

Another alternative proposed by both Ayende Rahien and Richard Dingwall is to delete the data but use database triggers to write to an audit log in the cases where auditing is the primary use case for keeping soft deleted entries in the database. This works in the cases where the only reason for soft deleting entries is for auditing purposes. However there are many real world situations where this is not the case.

One use case for soft deleting is to provide an "undo" feature in an end user application. For example, consider a user synchronizes the contact list on their phone with one in the cloud (e.g. an iPhone or Windows Mobile/Windows Phone connecting to Exchange or an Android phone connecting to Gmail). Imagine that the user now deletes a contact from their phone because they do not have a phone number for the person only to find out that person has also been deleted from their address book in the cloud. At that point, an undo feature is desirable.

Other use cases could be the need to reactivate items that have been removed from the database but with their state intact. For example, when people return to Microsoft who used to work there in the past their seniority for certain perks takes into account their previous stints at the company. Similarly, you can imagine a company restocking an item that they had pulled from their shelves because they have become popular due to some new fad (e.g. Beatles memorabilia is back in style thanks to The Beatles™: Rock Band™).

The bottom line is that an audit log may be a useful replacement for soft deletes in some scenarios but it isn't the answer to every situation where soft deletes are typically used.

Not so fast: The argument against hard deletes

So far we haven't discussed how hard deletes should fit in a world of soft deletes. In some cases, soft deletes eventually lead to hard deletes. In the example of video games I've owned I might decide that if a soft deleted item is several years old or is a game from an outdated console then it might be OK to delete. So I'd create a janitor process that would scan the database periodically to seek out soft deleted entries to permanently delete. In other cases, some content may always be hard deleted since there are no situations where one might consider keeping them around for posterity. An example of the latter is comment or trackback spam on a blog post.

Udi Dahan wrote a rebuttal to Ayende Rahien's post where he question my assertion above that there are situations where one wants to hard delete data from the database in his post Don’t Delete – Just Don’t where he writes

Model the task, not the data

Looking back at the story our friend from marketing told us, his intent is to discontinue the product – not to delete it in any technical sense of the word. As such, we probably should provide a more explicit representation of this task in the user interface than just selecting a row in some grid and clicking the ‘delete’ button (and “Are you sure?” isn’t it).

As we broaden our perspective to more parts of the system, we see this same pattern repeating:

Orders aren’t deleted – they’re cancelled. There may also be fees incurred if the order is canceled too late.

Employees aren’t deleted – they’re fired (or possibly retired). A compensation package often needs to be handled.

Jobs aren’t deleted – they’re filled (or their requisition is revoked).

In all cases, the thing we should focus on is the task the user wishes to perform, rather than on the technical action to be performed on one entity or another. In almost all cases, more than one entity needs to be considered.

Statuses

In all the examples above, what we see is a replacement of the technical action ‘delete’ with a relevant business action. At the entity level, instead of having a (hidden) technical WasDeleted status, we see an explicit business status that users need to be aware of.

I tend to agree with Udi Dahan's recommendation. Instead of a technical flag like IsDeleted, we should model the business process. So my database table of games I owned should really be called games_I_have_owned with the IsDeleted column replaced with something more appropriate such as CurrentlyOwn. This is a much better model of the real-life situation than my initial table and the soft deleted entries are now clearly part of the business process as opposed to being part of some internal system book keeping system.

Advocating that items be never deleted is a tad extreme but I'd actually lean closer to that extreme than most. Unless the data is clearly worthless (e.g. comment spam) or the cost is truly prohibitive (e.g. you're storing large amounts of binary data) then I'd recommend keeping the information around instead of assuming the existence of a DELETE clause in your database is a requirement that you use it.

Note Now Playing: 50 Cent - Baby By Me (feat. Ne-Yo) Note

Categories: Web Development

November 14, 2009

@ 03:03 PM

Comments [2]

Joe Hewitt on Irony

Joe Hewitt, the developer of the Facebook iPhone application, has an insightful blog post on the current trend of developers favoring native applications over Web applications on mobile platforms with centrally controlled app stores in his post On Middle Men. He writes

The Internet has been incredibly empowering to creators, and just as destructive to middle men. In the 20th century, every musician needed a record label to get his or her music heard. Every author needed a publishing house to be read. Every journalist needed a newspaper. Anyone who wanted to send a message needed the post office. In the Internet age, the tail no longer wags the dog, and those middle men have become a luxury, not a necessity.

Meanwhile, the software industry is moving in the opposite direction. With the web and desktop operating systems, the only thing in between software developers and users is a mesh of cables and protocols. In the new world of mobile apps, a layer of bureacrats stand in the middle, forcing each developer to queue up for a series of patdowns and metal detectors and strip searches before they can reach their customers.
...
We're at a critical juncture in the evolution of software. The web is still here and it is still strong. Anyone can still put any information or applications on a web server without asking for permission, and anyone in the world can still access it just by typing a URL. I don't think I appreciated how important that is until recently. Nobody designs new systems like that anymore, or at least few of them succeed. What an incredible stroke of luck the web was, and what a shame it would be to let that freedom slip away.

Am I the only one who thinks the above excerpt would be similarly apt if you replaced the phrase "mobile apps" with "Facebook apps" or "OpenSocial apps"?

Note Now Playing: Lady GaGa - Bad Romance Note

Categories: Web Development

November 7, 2009

@ 04:41 PM

Comments [1]

Does OAuth Have a Dark Side?

There was an article in the The Register earlier this week titled Twitter fanatic glimpses dark side of OAuth which contains the following excerpt

A mobile enthusiast and professional internet strategist got a glimpse of OAuth's dark side recently when he received an urgent advisory from Twitter. The dispatch, generated when Terence Eden tried to log in, said his Twitter account may have been compromised and advised he change his password. After making sure the alert was legitimate, he complied.

That should have been the end of it, but it wasn't. It turns out Eden used OAuth to seamlessly pass content between third-party websites and Twitter, and even after he had changed his Twitter password, OAuth continued to allow those websites access to his account.
…
Eden alternately describes this as a "gaping security hole" and a "usability issue which has strong security implications." Whatever the case, the responsibility seems to lie with Twitter.

If the service is concerned enough to advise a user to change his password, you'd think it would take the added trouble of suggesting he also reset his OAuth credentials, as Google, which on Wednesday said it was opening its own services to work with OAuth, notes here.

I don't think the situation is as cut and dried as the article makes it seem. Someone trying to hack your account by guessing your password and thus triggering a warning that your account is being hacked is completely different from an application you've given permission to access your data doing the wrong thing with it.

Think about it. Below is a list of the applications I've allowed to access my Twitter stream. Is it really the desired user experience that when I change my password on Twitter that all of them break and require that I re-authorize each application?

list of applications that can access my Twitter stream

I suspect Terence Eden is being influenced by the fact that Twitter hasn't always had a delegated authorization model and the way to give applications access to your Twitter stream was to handout your user name & password. That's why just a few months ago it was commonplace to see blog posts like Why you should change your Twitter password NOW! which advocate changing your Twitter password as a way to prevent your password from being stolen from Twitter apps with weak security. There have also been a spate of phishing style Twitter applications such as TwitViewer and TwitterCut which masquerade as legitimate applications but are really password stealers in disguise. Again, the recommendation has been to change your password if you fell victim to such scams.

In a world where you use delegated authorization techniques like OAuth, changing your password isn't necessary to keep out apps like TwitViewer & TwitterCut since you can simply revoke their permission to access your account. Similarly if someone steals my password and I choose a new one, it doesn't mean that I now need to lose Twitter access from Brizzly or the new MSN home page until I reauthorize these apps. Those issues are ~~orthogonal~~ unrelated given the OAuth authorized apps didn't have my password in the first place.

Thus I question the recommendation at the end of the article that it is a good idea to even ask users about de-authorizing applications as part of password reset since it is misleading (i.e. it gives the impression the hack attempt was from one of your authorized apps instead of someone trying to guess your password) and just causes a hassle for the user who now has to reauthorize all the applications at a later date anyway.

Note Now Playing: Rascal Flatts - What Hurts The Most Note

Categories: Web Development

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for Dare Obasanjo's weblog

R.I.P. Old Facebook Privacy settings

R.I.P. Old Facebook Public Profile

Retweeting 101: What is it and why do people do it?

Flaw #1: Need to visit multiple places to see all retweets of your content

Flaw #2: No way to add commentary on what you are retweeting

Flaw #3: Retweets don't show up in Twitter apps

Flaw #4: Pictures of people I don't know in my stream

Soft Deletes 101: What is a soft delete and how does it differ from a hard delete?

Rationale for War: The argument against soft deletes

Complexity

Performance

Trading the devil you know for the devil you don't: Thoughts on the alternatives to soft deletes

The memento pattern

Not so fast: The argument against hard deletes

Model the task, not the data

Statuses