These are my notes from the talk Scaling Google for Every User by Marissa Mayer.
Google search has lots of different users who vary in age, sex, location,
education, expertise and a lot of other factors. After lots of research, it
seems the only factor that really influences how different users view search
relevance is their location.
One thing that does distinguish users is the difference between a novice search
user and an expert user of search. Novice users typically type queries in
natural language while expert users use keyword searches.
Example Novice and Expert Search User Queries
NOVICE QUERY: Why doesn't anyone carry an umbrella in Seattle?
EXPERT QUERY: weather seattle washington
NOVICE QUERY: can I hike in the seattle area?
EXPERT QUERY: hike seattle area
On average, it takes a new Google user 1 month to go from typing novice
queries to being a search expert. This means that there is little payoff in
optimizing the site to help novices since they become search experts in such a
short time frame.
Design Philosophy
In general, when it comes to the PC user experience, the more features
available the better the user experience. However, when it comes to handheld
devices the graph is a bell curve and there reaches a point where adding extra
features makes the user experience worse. At Google, they believe their
experience is more like the latter and tend to hide features on the main page
and only show them when necessary (e.g. after the user has performed a
search). This is in contrast to the portal strategy from the 1990s when
sites would list their entire product line on the front page.
When tasked with taking over the user interface for Google search, Marissa
Mayer fell back on her AI background and focused on applying mathematical
reasoning to the problem. Like Amazon, they decided to use split A/B testing
to try out different changes they planned to make to the user interface and
see which got the best reaction from their users. One example of the kind of
experiments they've run is when the founders asked whether they should switch
from displaying 10 search results by default because Yahoo! was displaying 20
results. They'd only picked 10 results arbitrarily because that's what
AltaVista did. They had some focus
groups and the majority of users said they'd like to see more than 10 results
per page. So they ran an experiment with 20, 25 and 30 results and were
surprised at the outcome. After 6 weeks, 25% of the people who were getting
30 results used Google search less while 20% of the people getting 20 results
used the site less. The initial suspicion was that people weren't having to
click the "next" button as much because they were getting more results but
further investigation showed that people rarely click that link anyway. Then
the Google researchers realized that while it took 0.4 seconds on average to
render 10 results it took 0.9 seconds on average to render 25 results. This
seemingly imperceptible lag was still enough to sour the experience of users
to the point that they'd reduce their usage of the service.
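The comparison behind an experiment like this can be sketched as a two-proportion z-test on how many users in each group kept up their usage. This is only a minimal sketch: the talk did not say which statistical test Google uses, the function name two_proportion_z is hypothetical, and the counts are made up for illustration rather than taken from Google's data.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled proportion under the null hypothesis that both groups behave alike.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: users who searched at least as often after 6 weeks.
control = (8000, 10000)    # group shown 10 results per page
treatment = (7500, 10000)  # group shown 30 results per page
z, p = two_proportion_z(*control, *treatment)
print(f"z = {z:.2f}, p = {p:.3g}")
```

A negative z here means the treatment group retained fewer active users than the control group, and a small p-value suggests the drop is unlikely to be noise.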
Improving Google Search
There are a number of factors that determine whether a user will find a
set of search results to be relevant which include the query, the actual user's
individual tastes, the task at hand and the user's locale. Locale is especially
important because a query such as "GM" is likely to be a search for General
Motors but a query such as "GM foods" is most likely seeking information about
genetically modified foods. Given a large enough corpus of data, statistical
inference can seem almost like artificial intelligence. Another example is that
a search like "b&b ab" looks for bed and breakfasts in Alberta while
"ramstein ab" locates Ramstein Air Base. Because "b&b" typically means bed and
breakfast, the term after it in a search like "b&b ab" is assumed to be a
place name, based on statistical inference over millions of such queries.
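As a toy illustration of this kind of inference, the sketch below counts how often each neighbouring term co-occurs with each meaning of "ab" and lets the terms in a new query vote. The query log, its labels, and the function names train and guess_meaning are all hypothetical; a real system would learn from millions of unlabeled queries and click data rather than a hand-labeled list.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-labeled query log: (query, meaning of "ab").
LOG = [
    ("b&b ab", "alberta"),
    ("b&b calgary ab", "alberta"),
    ("hotels edmonton ab", "alberta"),
    ("b&b banff ab", "alberta"),
    ("ramstein ab", "air base"),
    ("ramstein ab germany", "air base"),
    ("spangdahlem ab", "air base"),
]

def train(log):
    """Count, for each co-occurring term, how often "ab" took each meaning."""
    counts = defaultdict(Counter)
    for query, meaning in log:
        for term in query.split():
            if term != "ab":
                counts[term][meaning] += 1
    return counts

def guess_meaning(query, counts):
    """Pick the meaning with the most votes from the query's other terms."""
    votes = Counter()
    for term in query.split():
        votes.update(counts.get(term, {}))
    return votes.most_common(1)[0][0] if votes else None

counts = train(LOG)
print(guess_meaning("b&b ab", counts))
print(guess_meaning("ramstein ab", counts))
```

A query whose other terms never appeared in the log gets no votes, so guess_meaning returns None instead of guessing.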
At Google they want to get even better at knowing what you mean instead of
just looking at what you say. Here are some examples of user queries which
Google will transform to other queries based on statistical inference [in
future versions of the search engine]
User Query | Google Will Also Try This Query |
unchanged lyrics van halen | lyrics to unchained by van halen |
how much does it cost for an exhaust system | cost exhaust system |
overhead view of bellagio pool | bellagio pool pictures |
distance from zurich switzerland to lake como italy | train milan italy zurich switzerland |
Performing query inference in this manner is a very large scale, ill-defined
problem. Another effort Google is pursuing is cross-language information
retrieval. Specifically, if I perform a query in one language it will be
translated to a foreign language and the results would then be translated to
my language. This may not be particularly interesting for English speakers
since most of the Web is in English but it will be valuable for other
languages (e.g. an Arabic speaker interested in restaurant reviews from New
York City restaurants).
Google Universal Search was a revamp of the core engine to show results other
than text-based URLs and website summaries in the search results (e.g. search
for nosferatu). There were a number of challenges in building this
functionality, such as:
- Google's search verticals such as books, blog, news, video, and image
search got a lot less traffic than the main search engine and originally
couldn't handle receiving the same level of traffic as the main page.
- How do you rank results across different media to figure out the most
relevant? How do you decide a video result is more relevant than an image
or a webpage? This problem was tackled by Udi Manber's team.
- How do you integrate results from other media into the existing search
result page? Should results be segregated by type or should it be a list
ordered by relevance independent of media type? The current design was
finally decided upon by Marissa Mayer's team, but they will continue to
incrementally improve it and measure the user reactions.
At Google, the belief is that the next big revolution is a search engine that
understands what you want because it knows you. This means personalization is
the next big frontier. A couple of years ago, the tech media was full of
reports that a bunch of Stanford students had figured out how to make Google five times
faster. This was actually incorrect. The students had figured out how to
make PageRank calculations faster which doesn't really affect the speed of
obtaining search results since PageRank is calculated offline. However this
was still interesting to Google and the students' company was purchased. It
turns out that making PageRank faster means that they can now calculate
multiple PageRanks in the time it used to take to calculate a single PageRank
(e.g. country specific PageRank, personal PageRank for a given user, etc). The
aforementioned Stanford students now work on Google's personalized search
efforts.
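To make the idea of "multiple PageRanks" concrete, here is a minimal sketch of textbook power-iteration PageRank with a teleport vector. It is not the students' speedup technique and not Google's production code; the link graph, the function name pagerank, and the teleport parameter are assumptions for illustration. Biasing the teleport vector toward pages a user cares about is one standard way to get a personalized ranking from the same algorithm.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = sorted(links)
    n = len(pages)
    if teleport is None:
        # Uniform teleport vector gives the usual "global" PageRank.
        teleport = {p: 1.0 / n for p in pages}
    rank = {p: teleport.get(p, 0.0) for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread via teleport.
        dangling = sum(rank[p] for p in pages if not links[p])
        base = 1 - damping + damping * dangling
        new_rank = {p: base * teleport.get(p, 0.0) for p in pages}
        for p in pages:
            for q in links[p]:
                new_rank[q] += damping * rank[p] / len(links[p])
        rank = new_rank
    return rank

# Hypothetical four-page web.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
global_rank = pagerank(links)
# Personalized variant: this user always restarts from page "d".
personal_rank = pagerank(links, teleport={"d": 1.0})
```

Each personalized ranking costs a full run of the iteration, which is why a faster PageRank computation makes per-country or per-user variants practical.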
Speaking of personalization, iGoogle has become their fastest growing product
of all time. Allowing users to create a personalized page and then opening up
the platform to developers such as Caleb to build gadgets lets them learn more
about their users. Caleb's collection of gadgets garners about 30 million
daily page views on various personalized homepages.
Q&A
Q: Does the focus on expert searchers mean that they de-emphasize natural language processing?
A: Yes, in the main search engine. However they do focus on it for their
voice search product and they do believe that it is unfortunate that users have
to adapt to Google's keyword based search style.
Q: How do the observations that are data mined about users' search habits get
back into the core engine?
A: Most of it happens offline, not automatically. Personalized search is
an exception and this data is uploaded periodically into the main engine to
improve the results specific to that user.
Q: How well is the new Universal Search interface doing?
A: As well as Google Search is since it is now the Google search interface.
Q: What is the primary metric they look at during A/B testing?
A: It depends on what aspect of the service is being tested.
Q: Has there been user resistance to new features?
A: Not really. Google employees are actually more resistant to changes in
the search interface than their average user.
Q: Why did they switch to showing Google Finance before Yahoo! Finance when
showing search results for a stock ticker?
A: Links used to be ordered by ComScore metrics but since Google Finance
shipped they decided to show their service first. This is now a standard policy
for Google search results that contain links to other services.
Q: How do they tell if they have bad results?
A: They have a bunch of watchdog services that track uptime for various
servers to make sure a bad one isn't causing problems. In addition, they have
10,000 human evaluators who are always manually checking the relevance of
various results.
Q: How do they deal with spam?
A: There are lots of definitions of spam: bad queries, bad results and email spam.
For keeping out bad results they do automated link analysis (e.g. examine
excessive number of links to a URL from a single domain or set of domains)
and they use multiple user agents to detect cloaking.
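The link-analysis check mentioned in this answer might look roughly like the sketch below, which flags a URL when most of its inbound links come from a single domain. The thresholds, the example URLs, and the function name suspicious_targets are assumptions for illustration; the talk did not describe Google's actual rules.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

def suspicious_targets(links, min_links=5, max_share=0.8):
    """Flag target URLs whose inbound links come mostly from one domain.

    links: iterable of (source_url, target_url) pairs.
    Returns {target_url: dominant_source_domain}.
    """
    by_target = defaultdict(Counter)
    for source, target in links:
        by_target[target][urlparse(source).netloc] += 1
    flagged = {}
    for target, domains in by_target.items():
        total = sum(domains.values())
        domain, count = domains.most_common(1)[0]
        # Require enough links overall so tiny sites are not flagged by chance.
        if total >= min_links and count / total >= max_share:
            flagged[target] = domain
    return flagged

# Hypothetical crawl data: nine links from one link farm, one from a blog.
links = [(f"http://farm.example/page{i}", "http://spam.example/") for i in range(9)]
links.append(("http://blog.example/post", "http://spam.example/"))
# A page linked from six unrelated domains should not be flagged.
links += [(f"http://site{i}.example/", "http://good.example/") for i in range(6)]
flagged = suspicious_targets(links)
```

Extending the check to a set of cooperating domains would need some grouping step (for example by shared owner or IP range), which is left out here.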
Q: What percent of the Web is crawled?
A: They try to crawl most of it except that which is behind sign-ins and
product databases. For product databases they now have Google Base and
encourage people to upload their data there so it is accessible to Google.
Q: When will I be able to search using input other than text (e.g. find this
tune or find the face in this photograph)?
A: We are still a long way from this. In academia, we now have experiments
that show 50%-60% accuracy but that's a far cry from being a viable end user
product. Customers don't want a search engine that gives relevant results half
the time.