The job of a search engine is, first and foremost, to provide answers to users’ queries. In response to each query, a search engine returns links to web pages it finds in its index – a database of the web pages known to this particular search engine. The answer to a user’s query thus comes in the form of search results – a list of hyperlinks to web pages whose content matches the query.
This is how it works:
These days, a search query that would return fewer than a dozen results is hard to find. Most searches will retrieve links to millions of web pages. The number of answers potentially matching any given search query keeps growing along with the rapid growth of the internet. It doesn’t make much sense to provide the user with all potentially matching pages that exist – a person would have to browse through dozens of resources before anything useful came up. Instead, a search engine ranks the search results, placing the most relevant of them on top.
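At its core, ranking is just scoring and sorting. A minimal sketch of this idea, with hypothetical page names and made-up relevance scores purely for illustration:

```python
# Ranking, reduced to its essence: score each matching page and sort the
# list so the highest-scoring pages come first. The pages and scores below
# are invented for illustration; real relevance scoring is far more complex.
def rank(results, score):
    """Sort search results by a relevance score, best first."""
    return sorted(results, key=score, reverse=True)

pages = ["page_a", "page_b", "page_c"]
# Hypothetical relevance scores for a given query.
relevance = {"page_a": 0.2, "page_b": 0.9, "page_c": 0.5}

print(rank(pages, relevance.get))  # ['page_b', 'page_c', 'page_a']
```

Everything that follows in this article is about how the `score` part – the ranking formula – is learned rather than written by hand.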
Looking at these search results, the user may feel quite satisfied, not really satisfied, or not satisfied at all. This subjective feeling of getting (or not getting) what one was searching for is what describes the quality of search from the user’s point of view – is this information useful to me? The trick is to describe and measure all of these subjective attitudes and to take everyone into account. The quality of search depends on how well search results are ranked. Ranking means sorting search results in a way that meets users’ expectations.
It’s impossible to build a perfect algorithm that would come up with the best possible result for every possible query. Yandex’s search engine processes almost 200,000,000 queries every day, and almost half of these queries are unique. To deal with this load of questions successfully, a search engine has to be able to make decisions based on previous experience – that is, it has to learn.
Machine learning is essential not only in search technology. Speech or text recognition, for instance, is also impossible without a machine being able to learn. The term ‘machine learning’, coined in the 1950s, essentially means the effort to make a computer perform tasks that are natural to human behavior but difficult to break down into algorithmic patterns ‘understandable’ by machines. A machine that can learn is a machine that can make its own decisions based on input algorithms, empirical data and experience.
Decision making, however, is a human quality that a machine cannot really master. What it can do, though, is learn to create and apply a rule that helps decide whether a particular web page is a good answer to a user’s question or not.
This rule is based on the properties of web pages and users’ queries. Some of these properties, like the number of links leading to a particular page, are static – they describe a web page on its own. Others, like whether a page contains words matching a search query, how many, and where on the page, are dynamic – they describe both a web page and a search query. There are also properties specific only to search queries, such as geolocation. For a search engine, this means that to give a good answer to a user’s question, it has to factor in where the question has come from.
These quantifiable properties of web pages and search queries are called ranking factors. These factors are key to performing precise searches and deciding which results are the most relevant. For a search engine to return relevant results for a user’s query, it needs to consider a multitude of such factors.
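The three kinds of factors described above can be sketched in code. The specific factors, field names, and numbers below are invented for illustration; real search engines compute thousands of far subtler signals:

```python
# A toy 'factor vector' showing the three kinds of ranking factors:
# static (page only), dynamic (page + query), and query-specific.
# All names and values here are hypothetical.
def factor_vector(page, query):
    return [
        # Static factor: depends on the page alone.
        page["inbound_links"],
        # Dynamic factor: depends on both the page and the query.
        sum(page["text"].lower().count(word) for word in query["words"]),
        # Query-specific factor: depends on the query alone (e.g. geolocation).
        1.0 if query["region"] == page.get("region") else 0.0,
    ]

page = {"inbound_links": 42, "text": "Apple pie recipes with apples", "region": "us"}
query = {"words": ["apple", "pie"], "region": "us"}

print(factor_vector(page, query))  # [42, 3, 1.0]
```

A ranking formula is a function from such a factor vector to a single relevance score.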
Three types of ranking factors:
To approximate users’ expectations, a search engine requires sample user queries and matching results that have already been judged satisfactory. Assessors – people who decide whether a particular web page offers a ‘good’ response to a certain search query – provide these evaluations. A number of search responses, together with their corresponding queries, make up a learning sample for the search engine ‘to learn to find’ dependencies between these web pages and their properties. To represent real users’ search patterns truthfully, a learning sample has to include all kinds of search queries in the same proportion as they occur in real life.
After a search engine has found dependencies between the web pages in the learning sample and their properties, it can choose the best ranking formula for the search results it delivers for a specific user’s query and place the most relevant of them at the top.
Think of teaching a machine how to pick the most delicious apples. First, assessors take a bite of each apple in a ‘tasting crate’ and put all the tasty apples to the right and all the sour apples to the left. This crate contains all sorts of apples in the same proportion as they are likely to grow in the garden. A machine cannot taste apples, but it can analyze their properties, like size, color, sugar content, firmness, or the presence or absence of a leaf. The tasting crate is a learning sample, which allows the machine to learn to select the apples with the winning combination of properties: size, color, sweetness and firmness. Errors are unavoidable, though. For instance, if the machine does not have any information about insect larvae, the best apples it selects might hide a worm. To minimize the probability of error, a machine needs to consider as many of the apples’ properties as possible.
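The apple example can be made concrete with a toy learner. Everything here is illustrative – the crate data and the deliberately simple one-feature threshold rule are stand-ins for a real learning sample and a real learning method:

```python
# A toy 'tasting crate': apples labeled tasty/sour by assessors, each with
# measurable properties. Data and the threshold learner are illustrative only.
crate = [
    # (size_cm, sugar_pct, firmness, tasty?)
    (8.0, 14.0, 0.9, True),
    (7.5, 13.0, 0.8, True),
    (6.0, 9.0, 0.4, False),
    (5.5, 8.0, 0.3, False),
]

def learn_threshold(samples, feature):
    """Learn a cutoff on one feature: the midpoint between the least-sweet
    tasty apple and the sweetest sour apple in the labeled sample."""
    tasty = [s[feature] for s in samples if s[3]]
    sour = [s[feature] for s in samples if not s[3]]
    return (min(tasty) + max(sour)) / 2

sugar_cutoff = learn_threshold(crate, feature=1)
print(sugar_cutoff)        # 11.0
print(9.5 > sugar_cutoff)  # False: a 9.5%-sugar apple is predicted sour
```

A single-feature rule like this is exactly the kind of learner that mistakes worm-ridden apples for good ones – which is why real systems combine many properties, as the text goes on to explain.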
Machine learning has been implemented in search technologies since the early noughties. Different search systems use different models. One of the problems in machine learning is overfitting. An algorithm that overfits its data is like a sophomore medical student who diagnoses himself with every possible symptom he has read about in his manual. Not having been exposed to the real practice yet, he makes up causes for the natural things he observes.
When a computer uses a large number of factors (properties of web pages and search queries, in our case) on a relatively small learning sample (‘good’ results as estimated by assessors), it begins to find dependencies that do not exist. For example, a learning sample might accidentally include two different pages sharing a particular combination of factors – say, both are 2 KB in size, have a purple background, and feature text starting with “A”. And, by sheer chance, both of these pages happen to be relevant to the search query [apple]. A computer may deem this accidental combination of factors essential for a search result to be relevant to the query [apple]. Meanwhile, web pages offering genuinely relevant and useful information about apples, but lacking this particular combination of factors, will be considered less important.
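This effect is easy to demonstrate: with enough random ‘factors’ and only a handful of labeled pages, some factor will separate the sample perfectly by pure chance. The simulation below uses made-up random binary factors, not real ranking signals:

```python
import random

# Overfitting in miniature: 6 labeled pages, 2000 random binary 'factors'.
# Each factor has a 1-in-64 chance of matching the labels exactly, so with
# 2000 factors a perfect (but meaningless) fit is almost guaranteed.
random.seed(0)
n_pages, n_factors = 6, 2000
labels = [i < 3 for i in range(n_pages)]  # 3 relevant pages, 3 irrelevant
factors = [[random.random() < 0.5 for _ in range(n_pages)]
           for _ in range(n_factors)]

# A naive 'learner': keep any factor whose values match the labels exactly.
spurious = [f for f in factors if f == labels]
print(len(spurious) > 0)  # almost certainly True - chance alone produced a 'perfect' factor
```

None of these spurious factors would predict relevance on new pages; they fit only the accidents of the small sample.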
In 2009 Yandex launched MatrixNet, a new method of machine learning. A key feature of this method is its resistance to overfitting, which allows Yandex’s search engine to take a very large number of factors into account when deciding on the relevancy of search results. At the same time, the system does not need more samples of search results to learn how to tell the ‘good’ from the ‘not so good’. This safeguards the system against mistakes caused by finding dependencies that do not exist.
MatrixNet makes it possible to generate a very long and complex ranking formula that considers a multitude of factors and their combinations. Alternative machine learning methods either produce simpler formulas using a smaller number of factors or require a larger learning sample. MatrixNet builds a formula based on tens of thousands of factors, which significantly increases the relevance of search results.
Another important feature of MatrixNet is that it allows a ranking formula to be customized for a specific class of search queries. Notably, tweaking the ranking algorithm for, say, music searches will not undermine the quality of ranking for other types of queries. A ranking algorithm is like complex machinery with dozens of buttons, switches, levers and gauges. Usually, a single turn of a single switch results in a global change across the whole machine. MatrixNet, however, makes it possible to adjust specific parameters for specific classes of queries without causing a major overhaul of the whole system.
Change of a single parameter in different ranking formulas:
In addition, MatrixNet can automatically choose sensitivity for specific ranges of ranking factors. It’s like trying to hear someone whisper on the airfield. Figuratively speaking, MatrixNet can hear both the whisper and the sound of planes landing or taking off.
For each user’s query, a search engine has to evaluate the properties of millions of pages, assess their relevancy, and rank them accordingly, with the most relevant on top. Scanning each page in succession would require either a huge number of servers (to deal with all those pages very quickly) or a lot of time – but a searcher cannot wait. MatrixNet solves this problem: it makes it possible to check web pages against a very large number of ranking factors without increasing processing power.
In response to each query, more than a thousand servers simultaneously perform a search. Each server searches within its own part of the index to produce a list of the best results. This list is guaranteed to include the web pages most relevant to the query.
The next step is to produce one final list of top results based on the lists of the most relevant pages produced by each server. These results are then ranked using the long and complex MatrixNet formula, which takes a multitude of ranking factors and their combinations into account. The most relevant websites thus find their way to the top of the search results, and the user receives an answer to their question almost instantly.
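The two-stage scheme above – each server contributing candidates from its own part of the index, followed by one final ranking over the merged list – can be sketched as follows. The server names, pages, and scores are invented for illustration, and a simple stored score stands in for the MatrixNet formula:

```python
import heapq

# Hypothetical shards: each 'server' has already produced the best results
# from its own part of the index as (page, score) pairs.
shards = {
    "server_1": [("page_a", 0.7), ("page_b", 0.4)],
    "server_2": [("page_c", 0.9), ("page_d", 0.2)],
    "server_3": [("page_e", 0.6)],
}

def search(shards, k=3):
    # Stage 1: merge the candidates contributed by every shard.
    candidates = [hit for hits in shards.values() for hit in hits]
    # Stage 2: one final ranking over the merged candidate list
    # (a stand-in for applying the full ranking formula).
    return heapq.nlargest(k, candidates, key=lambda hit: hit[1])

print([page for page, _ in search(shards)])  # ['page_c', 'page_a', 'page_e']
```

Because each shard only ever returns its own best candidates, the final ranking stage works on a short merged list rather than on millions of pages.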
This is approximately how ranking works: