Millions of web users see banner ads on Yandex’s sites every day. Advertisers on Yandex can show their ads only to the part of the audience that is likely to be interested in them, such as people of a certain age or gender. To let advertisers target a specific audience, Yandex uses Crypta, its own proprietary behavior analytics technology, which classifies web users based on their online behavior.
Crypta is based on MatrixNet, Yandex’s own machine learning method. For Crypta to be able to tell one age group from another, or men from women, it had to learn by example, or rather from thousands of examples. These examples were depersonalized records of web users’ age and gender sourced from MoiKrug (My Circle), a social networking website for professionals. This data was considered reliable, since people are likely to be truthful about their personal details in a work-related context. To verify the MoiKrug data, each user’s age and gender were cross-checked against the information the same user provided in their personal profile on Yandex. The cross-check yielded about one million profiles whose information appeared sufficiently reliable.
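The cross-check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Profile` record, its field names, and the rule "keep a user only if both sources agree exactly" are hypothetical, since the source does not say how Yandex actually compared the two profiles.

```python
from typing import NamedTuple, Optional


class Profile(NamedTuple):
    """Depersonalized demographic record (hypothetical field names)."""
    age: Optional[int]
    gender: Optional[str]


def cross_check(moikrug: dict[str, Profile], yandex: dict[str, Profile]) -> dict[str, Profile]:
    """Keep only users whose age and gender agree in both sources."""
    reliable = {}
    for user_id, profile in moikrug.items():
        other = yandex.get(user_id)
        if other is None:
            continue  # no second profile to verify against
        if None in profile:
            continue  # incomplete record, cannot be used as a label
        if profile == other:
            reliable[user_id] = profile
    return reliable


moikrug = {
    "u1": Profile(31, "f"),
    "u2": Profile(45, "m"),
    "u3": Profile(27, "m"),
}
yandex = {
    "u1": Profile(31, "f"),
    "u2": Profile(19, "m"),  # age disagrees, so u2 is dropped
}
reliable = cross_check(moikrug, yandex)
print(sorted(reliable))
```

Only `u1` survives here: `u2` has conflicting ages and `u3` has no Yandex profile to compare against.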
After that, Crypta was fed information about the online behavior of the users whose profile details had been judged reliable. This information included, among other things, the length of search queries, the presence of certain search terms, and periods of activity during the past 24 hours. Both the users’ demographics and their online behavior were used to train Crypta’s classification algorithm.
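To make the behavioral signals above more concrete, here is a minimal sketch of turning one user’s search log into features. The event format, the `MARKER_TERMS` list, and the feature names are assumptions made for illustration; the source names the kinds of signals but not how Crypta encodes them.

```python
from datetime import datetime

# Hypothetical marker terms; the real list of search terms is not public.
MARKER_TERMS = ("recipe", "football")


def extract_features(events):
    """Build a feature dict from (timestamp, query) pairs covering 24 hours."""
    queries = [query for _, query in events]
    total_words = sum(len(query.split()) for query in queries)
    features = {
        # Length of search queries, measured here as mean words per query.
        "mean_query_words": total_words / len(queries) if queries else 0.0,
    }
    # Presence of certain search terms.
    for term in MARKER_TERMS:
        features[f"has_{term}"] = any(term in query.lower().split() for query in queries)
    # Activity periods: one flag per hour of the day.
    active_hours = {timestamp.hour for timestamp, _ in events}
    for hour in range(24):
        features[f"active_h{hour:02d}"] = hour in active_hours
    return features


events = [
    (datetime(2024, 1, 1, 8, 15), "football scores today"),
    (datetime(2024, 1, 1, 22, 40), "cheap flights"),
]
features = extract_features(events)
print(features["mean_query_words"], features["has_football"], features["active_h08"])
```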
Using this data on web users’ age, gender, and online behavior, Crypta’s developers created two samples: a learning sample covering 700,000 web users and a test sample covering 300,000. Based on the learning sample, Crypta identified the 300 factors most essential for determining age or gender and assigned a weight to each of them.
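The 700,000 / 300,000 division corresponds to a 70/30 split of the verified profiles. A minimal sketch, assuming a simple random shuffle (the source does not say how users were assigned to each sample, and `split_sample` is a hypothetical helper):

```python
import random


def split_sample(user_ids, learn_share=0.7, seed=0):
    """Randomly divide users into a learning sample and a test sample."""
    shuffled = list(user_ids)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(len(shuffled) * learn_share)
    return shuffled[:cut], shuffled[cut:]


learning, test = split_sample(range(1_000_000))
print(len(learning), len(test))
```

Keeping the two samples disjoint matters: the test sample is only a fair check if the algorithm has never seen those users during learning.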
The test sample was designed to show how well Crypta had learned. To check how well the algorithm could determine a user’s age and gender, it was given the test sample with the age and gender data removed. For each user, Crypta estimated the probability of being a man or a woman and of belonging to each of the five age groups. Crypta’s developers then compared the results with the withheld data and improved the learning algorithm. After some more testing and tweaking, the technology was deployed on Yandex.
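The comparison step can be illustrated as follows: take the most probable class for each test user and count how often it matches the withheld label. The probabilities and labels below are made-up toy values, and picking the single most probable class is an assumption about how the results were scored.

```python
def most_likely(probabilities):
    """Return the class with the highest estimated probability."""
    return max(probabilities, key=probabilities.get)


def accuracy(predictions, true_labels):
    """Share of users whose most likely class matches the withheld label."""
    hits = sum(most_likely(predictions[user]) == true_labels[user] for user in true_labels)
    return hits / len(true_labels)


# Toy per-user gender probabilities, as a classifier might output them.
predictions = {
    "u1": {"m": 0.80, "f": 0.20},
    "u2": {"m": 0.40, "f": 0.60},
    "u3": {"m": 0.70, "f": 0.30},
    "u4": {"m": 0.45, "f": 0.55},
}
# Labels that were removed from the test sample before prediction.
held_out = {"u1": "m", "u2": "f", "u3": "f", "u4": "f"}

print(accuracy(predictions, held_out))  # 0.75
```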
When a certain group is selected from the entire Yandex user audience, the company’s behavior analytics technology determines a user’s gender with a 74% probability, which is roughly 50% more accurate than a random guess. Users belonging to the largest age group, 25 to 34 years old, are identified with a 45% probability, which is 100% more accurate than random guessing. Demographic accuracy is even higher when behavioral targeting is aimed at a smaller audience. When targeting half of Yandex’s user audience, for instance, gender is identified with a probability as high as 85%, and membership in the largest age group with a 52% probability.
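Two pieces of arithmetic sit behind these figures. The first is the gain over a baseline: assuming a random gender guess is right 50% of the time, 74% is a relative gain of 0.48, close to the quoted 50%. The second is why a smaller audience gives higher accuracy: if only the users the classifier is most confident about are targeted, the share of correct calls rises. The sketch below uses made-up confidence scores, and both helper functions are hypothetical.

```python
def relative_gain(accuracy, baseline):
    """How much better an accuracy is than a baseline, as a fraction of the baseline."""
    return (accuracy - baseline) / baseline


def accuracy_at_coverage(scored, coverage):
    """Accuracy over the most confident share of users.

    scored: list of (confidence, is_correct) pairs, one per user.
    coverage: share of the audience to keep, between 0 and 1.
    """
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    kept = ranked[: max(1, round(len(ranked) * coverage))]
    return sum(correct for _, correct in kept) / len(kept)


# Toy data: confident predictions are right more often than uncertain ones.
scored = [
    (0.95, True), (0.90, True), (0.80, True), (0.75, False),
    (0.60, True), (0.58, False), (0.55, True), (0.52, False),
]

print(round(relative_gain(0.74, 0.50), 2))  # 0.48
print(accuracy_at_coverage(scored, 1.0))    # 0.625
print(accuracy_at_coverage(scored, 0.5))    # 0.75
```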
Crypta keeps its knowledge up to date: it processes and updates information about virtually every Yandex user on a daily basis.
The technology can also learn to differentiate users by something other than gender or age. Crypta cannot see features or interests that web users share offline, but if users display common patterns in their online behavior, Crypta can group them together or distinguish them from other groups.