Last week TechCrunch UK wrote about a search startup that utilizes AI/Semantic Web techniques named True Knowledge. The post entitled VCs price True Knowledge at £20m pre-money. Is this the UK’s Powerset? stated
The chatter I’m hearing is that True Knowledge is being talked about in hushed tones, as if it might be the Powerset of the UK. To put that in context, Google has tried to buy the Silicon Valley search startup several times, and they have only launched a showcase product, not even a real one. However, although True Knowledge and Powerset are similar, they are different in significant ways, more of which later....Currently in private beta, True Knowledge says their product is capable of intelligently answering - in plain English - questions posed on any topic. Ask it if Ben Affleck is married and it will come back with "Yes" rather than lots of web pages which may or may not have the answer (don’t ask me!)....Here’s why the difference matters. True Knowledge can infer answers that the system hasn’t seen. Inferences are created by combining different bits of data together. So for instance, without knowing the answer it can work out how tall the Eiffel Tower is by inferring that it is shorter that the Empire State Building but higher than St Pauls Cathedral....AI software developer and entrepreneur William Tunstall-Pedoe is the founder of True Knowledge. He previously developed a technology that can solve a commercially published crossword clues but also explain how the clues work in plain English. See the connection?
The scenarios described in the TechCrunch write up should sound familiar to anyone who has spent any time around fans of the Semantic Web. Creating intelligent agents that can interrogate structured data on the Web and infer new knowledge has turned out to be easier said than done because for the most part content on the Web isn't organized according to the structure of the data. This is primarily due to the fact that HTML is a presentational language. Of course, even if information on the Web was structured data (i.e. idiomatic XML formats) we still need to build machinary to translate between all of these XML formats.
Finally, in the few areas on the Web where structured data in XML formats is commonplace such as Atom/RSS feeds for blog content, not a lot has been done with this data to fulfill the promise of the Semantic Web.
So if the Semantic Web is such an infeasible utopia, why are more and more search startups using that as the angle from which they will attack Google's dominance of Web search? The answer can be found in Bill Slawski's post from a year ago entitled Finding Customers Through Anti-Commercial Queries where he wrote
Most Queries are Noncommercial The first step might be to recognize that most queries conducted by people at search engines aren't aimed at buying something. A paper from the WWW 2007 held this spring in Banff, Alberta, Canada, Determining the User Intent of Web Search Engine Queries, provided a breakdown of the types of queries that they were able to classify. Their research uncovered the following numbers: "80% of Web queries are informational in nature, with about 10% each being navigational and transactional." The research points to the vast majority of searches being conducted for information gathering purposes. One of the indications of "information" queries that they looked for were searches which include terms such as: “ways to,” “how to,” “what is.”
Most Queries are Noncommercial
The first step might be to recognize that most queries conducted by people at search engines aren't aimed at buying something. A paper from the WWW 2007 held this spring in Banff, Alberta, Canada, Determining the User Intent of Web Search Engine Queries, provided a breakdown of the types of queries that they were able to classify.
Their research uncovered the following numbers: "80% of Web queries are informational in nature, with about 10% each being navigational and transactional." The research points to the vast majority of searches being conducted for information gathering purposes. One of the indications of "information" queries that they looked for were searches which include terms such as: “ways to,” “how to,” “what is.”
Although the bulk of the revenue search engines make is from people performing commercial queries such as searching for "incredible hulk merchandise", "car insurance quotes" or "ipod prices", this is actually a tiny proportion of the kinds of queries people want answered by search engines. The majority of searches are about the five Ws (and one H) namely "who", "what", "where", "when", "why" and "how". Such queries don't really need a list of Web pages as results, they simply require an answer. The search engine that can figure out how to always answer user queries directly on the page without making the user click on half a dozen pages to figure out the answer will definitely have moved the needle when it comes to the Web search user experience.
This explains why scenarios that one usually associates with AI and Semantic Web evangelists are now being touted by the new generation of "Google-killers". The question is whether knowledge inference techniques will prove to be more effective than traditional search engine techniques when it comes to providing the best search results especially since a lot of the traditional search engines are learning new tricks.
Now Playing: Bob Marley - Waiting In Vain