I should probably start out by pointing out that the title of this post is a lie. By definition, RESTful protocols can not be truly SQL-like because they depend on Uniform Resource Identifiers (URIs aka URLs) for identifying resources. URIs on the Web are really just URLs and URLs are really just hierarchical paths to a particular resource similar to the paths on your local file system (e.g. /users/mark/bobapples
, A:\Temp\car.jpeg
). Fundamentally URIs identify a single resource or aset of resources. On the other hand, SQL is primarily about dealing with relational data which meansyou write queries that span multiple tables (i.e. resources). A syntax for addressing single resources (i.e. URLs/URIs) is fundamentally incompatible with a query language that operates over multiple resources. This was one ofthe primary reasons the W3C created XQuery even though we already had XPath.
That said, being able to perform sorting, filtering, and aggregate operations over a single set of resources via a URI is extremely useful and is a fundamental aspect of the Web today. As Sam Ruby points out in his blog post Etymology, a search results page is fundamentally RESTful even though its URI identifies a query as opposed to a specific resource or set of resources [although you could get meta and say it identifies the set of resources that meet your search criteria].
Both Google's Google Base data API and Microsoft's Project Astoria are RESTful protocols for performing sorting, filtering and aggregate operations similar to what you find in SQL over a hierarchical set of resources. What follows is an overview of the approaches taken by both protocols.
Filtering Results using Predicates (include supported operators and Google's full text option)
Although Astoria provides an abstraction over relational data, it does so in a way that supports the hierarchical nature of HTTP URIs. The primary resource also known as an entity set is placed at the root of the hierarchy (e.g. the set of allmy customers) and each relationship to another set of resources is treated as anotherlevel in the hierarchy (e.g. each customer's orders). Each step in the hierarchy can be filtered using a predicate. Below are some query URLs and the results they return
Google Base does not treat the data within it as a hierarchy. Instead filters/predicates can be applied to one of
two Atom feeds;
http://www.google.com/base/feeds/snippets and
http://www.google.com/base/feeds/items which represent all the items within
Google Base and all the items
a specific user has stored within
Google Base respectively. The latter
URL requires the HTTP request to be authenticated.
The first way one can filter results from a Google Base feed is by placing one or more categories as part of the
path component. For example
The second way is to filter results in a Google Base feed is by performing a full text query using the
q
query parameter. For example,
The final way to filter results from
Google Base feed is by applying
a predicate on a field of the item using the bq
query
parameter. For example
Supported Datatypes, Operators and Functions
As can be seen from the previous examples, both the Google Base data API and Astoria support operations on fields searching for string matches.
Astoria supports the major SQL
datatypes, a list of which can be obtained from the table describing the
System.Data.SqlTypes namespace in the .NET Frameworks. The operations that
can be performed on the various fields of an entity are the following
comparisons
Operator | Description |
eq | Equal |
ne | Not equal |
gt | Greater than |
gteq | Greater than or equal |
lt | Less than |
lteq | Less than or equal |
The list of datatypes supported by
Google Base is provided in the
Google Base data API documentation topic on
Attribute Types. In addition to the comparison operators supported by
Astoria, the
Google Base data API also
supports
Operator | Description |
@"..." + Xmi | Convert string in quotes to a geocoded location and match anything that is within a radius of X miles/kilometers/meters around it depending on the unit of distance specified |
name(type):X..Y | Test whether the value of the named attribute
[and optional type] falls between X and Y (e.g. [event date range:2007-05-20..2007-05-25] matches all events which fall between both dates) |
date-range << date-range | Test if the date on the right hand side is a subset of the date range on the left hand
side |
if boolean-expression then expression else expression | Works like an if...else statement in every programming language you've ever used. |
In addition to these operators, it turns out that the
Google Base data API
also support a
full blown
expression language for use within predicates. This includes a library
of over 20 functions from math functions like sin
and
cos
to aggregation functions like sum
and
count
as well as more esoteric functions like dist
,
exists
and join
. Below are some queries
which use these operators and functions in action
Sorting
Sorting query results is often necessary when working with large amounts of
data. Both Google Base data API
and Astoria provide a way to
indicate that the results should be sorted based on one or more fields. In
Astoria, sorting is done using
the $orderby
query parameter. For example,
The Google Base data API uses
the orderby
and sortorder
query parameters to
control sorting and sort order respectively. Examples are shown below
Paging
When dealing with large numbers of items, it often isn't feasible to return all
of them in a single XML document for a variety of reasons.
Both the Google Base data API
and Astoria provide mechanisms
to retrieve results as multiple "pages".
In Astoria, this is done
using a combination of the top
and skip query
parameters which indicate the number of items to return and what item to start
the list from respectively. Examples below
The Google Base data API uses
the max-results
and start-index query
parameters to indicate the number of items to return and what item to start
the list from respectively. The default value of max-results
is 25 while its maximum value is 250. The total number of results is
emitted in the returned feed as the element
openSearch:totalResults
. Examples below
Astoria Specific Features
Using links within items the describe relationships is a core aspect of a
RESTful protocol and is utilized by
Astoria to show the foreign key
relationships between rows/entities in the data base. However it can be
cumbersome to have to make multiple requests and follow every link to get
all the content related to an item. For this reason,
Astoria includes the
$expand
query parameter which automatically follows the links
and retrieves the XML inline. Compare the following queries
GData Specific Features
Google Base has a notion of adjusted query results. When this feature is enabled,
Google Base will automatically use spelling correction, stemming and other
tricks to try and match results. For example, if you perform a search for the
value "female" in the gender
field of an item, the query
adjustment engine will know to also match the value "f" in the
gender
field of any corresponding items. The query adjustment
engine applies its heuristics on queries for field names, field values and
item categories. The documentation is contradictory as to whether this feature
is enabled by default or has to be specifically enabled by the user of the
API.
Another interesting feature, is that the
Google Base data API
allows one to filter out repetitive results using a feature called
"crowding". With this feature, limits can be placed on how many results
that match a certain criteria should be returned. See the following
examples for details
NOTE: I actually couldn't get this feature to work using either the
example queries from the documentation or queries I constructed. It is quite
possible that this feature doesn't work but is so esoteric that no one has
noticed.
Conclusion
In comparing both approaches there is a lot to like and dislike. I like the
"expand" feature in Astoria as
well as the fact that I can retrieve XML results from multiple paths of the
hierarchy. However there does seem to be a paucity of operators and functions
for better filtering of results.
From the Google Base data API,
I love the "crowd" feature and having a full library of functions for
performing tests within predicates. Also some of the operators such as the
ones for finding results near a certain location are quite impressive
although unnecessary for the majority of RESTful protocols out there.
That said, I do think they went overboard on some of the features such as
having if...else
blocks within the URIs. I suspect that some of
that complexity wouldn't have been needed if they just had hierarchies instead
of a flat namespace that requires complex filtering to get anything out of it.