I should probably start out by pointing out that the title of this post is a lie. By definition, RESTful protocols can not be truly SQL-like because they depend on Uniform Resource Identifiers (URIs aka URLs) for identifying resources. URIs on the Web are really just URLs and URLs are really just hierarchical paths to a particular resource similar to the paths on your local file system (e.g. /users/mark/bobapples, A:\Temp\car.jpeg). Fundamentally URIs identify a single resource or aset of resources. On the other hand, SQL is primarily about dealing with relational data which meansyou write queries that span multiple tables (i.e. resources). A syntax for addressing single resources (i.e. URLs/URIs) is fundamentally incompatible with a query language that operates over multiple resources. This was one ofthe primary reasons the W3C created XQuery even though we already had XPath.
/users/mark/bobapples
A:\Temp\car.jpeg
That said, being able to perform sorting, filtering, and aggregate operations over a single set of resources via a URI is extremely useful and is a fundamental aspect of the Web today. As Sam Ruby points out in his blog post Etymology, a search results page is fundamentally RESTful even though its URI identifies a query as opposed to a specific resource or set of resources [although you could get meta and say it identifies the set of resources that meet your search criteria].
Both Google's Google Base data API and Microsoft's Project Astoria are RESTful protocols for performing sorting, filtering and aggregate operations similar to what you find in SQL over a hierarchical set of resources. What follows is an overview of the approaches taken by both protocols.
Although Astoria provides an abstraction over relational data, it does so in a way that supports the hierarchical nature of HTTP URIs. The primary resource also known as an entity set is placed at the root of the hierarchy (e.g. the set of allmy customers) and each relationship to another set of resources is treated as anotherlevel in the hierarchy (e.g. each customer's orders). Each step in the hierarchy can be filtered using a predicate. Below are some query URLs and the results they return
Query:http://astoria.sandbox.live.com/encarta/encarta.rse/AreasResults:All content areas in the Encarta online encyclopediaQuery:http://astoria.sandbox.live.com/encarta/encarta.rse/Areas[name eq 'Geography']Results:The content area whose name is 'Geography'Query:http://astoria.sandbox.live.com/encarta/encarta.rse/Areas[name eq 'Geography']/ArticlesResults:All articles for the content area whose name is 'Geography'Query:http://astoria.sandbox.live.com/encarta/encarta.rse/Areas[name eq 'Geography']/Articles[Title eq 'Zimbabwe']Results:The article with the title 'Zimbabwe' from the 'Geography' content areaQuery:http://astoria.sandbox.live.com/encarta/encarta.rse/Areas[name eq 'Geography']/Articles[761569370]Results:The article from the 'Geography' content area with the ID 761569370. Query:http://astoria.sandbox.live.com/northwind/northwind.rse/Customers[City eq 'London']/Orders[Freight gteq 100]Results:All orders shipped to customers from the city of London who paid $100 or more in freight charges
Google Base does not treat the data within it as a hierarchy. Instead filters/predicates can be applied to one of two Atom feeds; http://www.google.com/base/feeds/snippets and http://www.google.com/base/feeds/items which represent all the items within Google Base and all the items a specific user has stored within Google Base respectively. The latter URL requires the HTTP request to be authenticated.
The first way one can filter results from a Google Base feed is by placing one or more categories as part of the path component. For example
Query:http://www.google.com/base/feeds/snippets/-/hotelsResults:All items from the 'hotels' category within Google Base Query:http://www.google.com/base/feeds/snippets/-/jobs|personalsResults:All items from the 'jobs' or the 'personals' category within Google Base Query:http://www.google.com/base/feeds/snippets/-/-recipesResults:All items in Google Base except those from the 'recipes' category
The second way is to filter results in a Google Base feed is by performing a full text query using the q query parameter. For example,
q
Query:http://www.google.com/base/feeds/snippets?q=atlantaResults:All items within Google Base that contain the string 'atlanta' in one of their fields Query:http://www.google.com/base/feeds/snippets/-/hotels?q=atlantaResults:All items from the 'hotels' category within Google Base that contain the string 'atlanta' in any of their fields Query:http://www.google.com/base/feeds/snippets/-/hotels|housing?q=seattle|atlantaResults:All items from the 'hotels' or 'housing' categories within Google Base that contain the string 'seattle' or 'atlanta' in any of their fields Query:http://www.google.com/base/feeds/snippets/-/hotels?q=washington+-dcResults:All items from the 'hotels' category within Google Base that contain the string 'washington' but not the string 'dc'
The final way to filter results from Google Base feed is by applying a predicate on a field of the item using the bq query parameter. For example
bq
Query:http://www.google.com/base/feeds/snippets/-/hotels?bq=[location:seattle]Results:All items from the 'hotels' category that have 'seattle' in their location field Query:http://www.google.com/base/feeds/snippets/-/hotels?bq=[location:seattle]&q=ramadaResults:All items from the 'hotels' category that have 'seattle' in their location field and 'ramada' in any of their other fields Query:http://www.google.com/base/feeds/snippets/-/-hotels?bq=[location:@"1 Microsoft Way, Redmond, WA, USA" + 5mi]Results:All items from the 'hotels' category whose location is within 5 miles of "1 Microsoft Way, Redmond, WA, USA" Query:http://www.google.com/base/feeds/snippets/-/products?q=zune&bq=[price(float USD)>=250.0 USD]Results:All items from the 'products' category whose price is greater than $250.00 and have 'zune' in one of their fields
As can be seen from the previous examples, both the Google Base data API and Astoria support operations on fields searching for string matches. Astoria supports the major SQL datatypes, a list of which can be obtained from the table describing the System.Data.SqlTypes namespace in the .NET Frameworks. The operations that can be performed on the various fields of an entity are the following comparisons
The list of datatypes supported by Google Base is provided in the Google Base data API documentation topic on Attribute Types. In addition to the comparison operators supported by Astoria, the Google Base data API also supports
In addition to these operators, it turns out that the Google Base data API also support a full blown expression language for use within predicates. This includes a library of over 20 functions from math functions like sin and cos to aggregation functions like sum and count as well as more esoteric functions like dist, exists and join. Below are some queries which use these operators and functions in action
sin
cos
sum
count
dist
exists
join
Query:http://www.google.com/base/feeds/snippets?q=sale&bq=[item type:vehicles][location(location)] &orderby=[x=location(location):neg(min(dist(x,@'Seattle,WA')))]Results:All vehicles whose listing contain the text 'sale' and orders results by those that are geographically closest to the city of Seattle, WA Query:http://www.google.com/base/feeds/snippets/-/events?bq=[event date range:2007-05-20..2007-05-25]Results:All events that fall between May 20th 2007 and May 25th 2007
Sorting query results is often necessary when working with large amounts of data. Both Google Base data API and Astoria provide a way to indicate that the results should be sorted based on one or more fields. In Astoria, sorting is done using the $orderby query parameter. For example,
$orderby
Query:http://astoria.sandbox.live.com/encarta/encarta.rse/Areas?$orderby=NameResults:All areas in the Encarta encyclopedia sorted alphabetically by their Name Query:http://astoria.sandbox.live.com/northwind/northwind.rse/Orders?$orderby=OrderDate descResults:All customer orders sorted in descending order by order date Query:http://astoria.sandbox.live.com/northwind/northwind.rse/Orders?$orderby=RequiredDate,FreightResults:All customer orders sorted by the required date and the cost of freight
The Google Base data API uses the orderby and sortorder query parameters to control sorting and sort order respectively. Examples are shown below
orderby
sortorder
Query:http://www.google.com/base/feeds/snippets/-/jobs?q=program+manager&orderby=salary(int)Results:All job listings containing the string 'program manager' sorted by the salary field Query:http://www.google.com/base/feeds/snippets?q=sale&bq=[item type:vehicles][location(location)] &orderby=[x=location(location):neg(min(dist(x,@'Cupertino,CA')))]Results:All vehicles whose listing contain the text 'sale' and orders results by those that are geographically closest to the city of Seattle, WA Query:http://www.google.com/base/feeds/snippets/-/housing?bq=[location:@"1 Microsoft Way, Redmond, WA, USA" + 5mi]&orderby=[x=bedrooms(int): if exists(x) then max(x) else 0]Results:All items within the 'housing' category that are within 5 miles of Microsoft's headquarters sorted by number of bedrooms. For items that don't have a bedrooms element use the value 0 when sorting
bedrooms
When dealing with large numbers of items, it often isn't feasible to return all of them in a single XML document for a variety of reasons. Both the Google Base data API and Astoria provide mechanisms to retrieve results as multiple "pages".
In Astoria, this is done using a combination of the top and skip query parameters which indicate the number of items to return and what item to start the list from respectively. Examples below
top
Query:http://astoria.sandbox.live.com/encarta/encarta.rse/Areas?$orderby=Name&$top=3Results:All areas in the Encarta encyclopedia sorted alphabetically by their Name, restricted to only showing 3 items per page Query:http://astoria.sandbox.live.com/encarta/encarta.rse/Areas?$orderby=Name&$top=3&$skip=1Results:All areas in the Encarta encyclopedia sorted alphabetically by their Name starting from the second item, restricted to only showing 3 items per page
The Google Base data API uses the max-results and start-index query parameters to indicate the number of items to return and what item to start the list from respectively. The default value of max-results is 25 while its maximum value is 250. The total number of results is emitted in the returned feed as the element openSearch:totalResults. Examples below
max-results
openSearch:totalResults
Query:http://www.google.com/base/feeds/snippets/-/hotels?bq=[location:seattle]&max-results=10Results:All hotels within the seattle area, restricted to 10 results per page Query:http://www.google.com/base/feeds/snippets/-/hotels?bq=[location:seattle]&max-results=50&start-index=100Results:All hotels within the seattle area, restricted to 50 results per page starting from the hundredth result
Using links within items the describe relationships is a core aspect of a RESTful protocol and is utilized by Astoria to show the foreign key relationships between rows/entities in the data base. However it can be cumbersome to have to make multiple requests and follow every link to get all the content related to an item. For this reason, Astoria includes the $expand query parameter which automatically follows the links and retrieves the XML inline. Compare the following queries
$expand
Query:http://astoria.sandbox.live.com/encarta/encarta.rse/Areas[Name eq 'History']/Articles[Title eq 'Civil War, American']Results:The encyclopedia article on the American Civil War which has links to its Area, ArticleBody, Notes and RelatedArticles Query:http://astoria.sandbox.live.com/encarta/encarta.rse/Areas[Name eq 'History']/Articles[Title eq 'Civil War, American']?$expand=ArticleBody,Notes,RelatedArticlesResults:The encyclopedia article on the American Civil War with its Area, ArticleBody and Notes shown inline as XML elements
Google Base has a notion of adjusted query results. When this feature is enabled, Google Base will automatically use spelling correction, stemming and other tricks to try and match results. For example, if you perform a search for the value "female" in the gender field of an item, the query adjustment engine will know to also match the value "f" in the gender field of any corresponding items. The query adjustment engine applies its heuristics on queries for field names, field values and item categories. The documentation is contradictory as to whether this feature is enabled by default or has to be specifically enabled by the user of the API.
gender
Another interesting feature, is that the Google Base data API allows one to filter out repetitive results using a feature called "crowding". With this feature, limits can be placed on how many results that match a certain criteria should be returned. See the following examples for details
Query:http://www.google.com/base/feeds/snippets/-/restaurants?crowdby=cuisine(text):2Results:Return all restuarants stored within Google Base but show no more than 2 per cusine type
In comparing both approaches there is a lot to like and dislike. I like the "expand" feature in Astoria as well as the fact that I can retrieve XML results from multiple paths of the hierarchy. However there does seem to be a paucity of operators and functions for better filtering of results.
From the Google Base data API, I love the "crowd" feature and having a full library of functions for performing tests within predicates. Also some of the operators such as the ones for finding results near a certain location are quite impressive although unnecessary for the majority of RESTful protocols out there. That said, I do think they went overboard on some of the features such as having if...else blocks within the URIs. I suspect that some of that complexity wouldn't have been needed if they just had hierarchies instead of a flat namespace that requires complex filtering to get anything out of it.
if...else