A brief intro note on recommender systems
Why recommender systems matter
Oct 10, 21 • 3 min read
Introduction
As the amount of available digital information continues to increase exponentially, enabling unprecedented access to large-scale services and products, so does the need for systems capable of reducing the impact of overchoice by effectively filtering and retrieving relevant content for users. Recommender systems, as they are known, constitute key item discovery and exploration platforms, driving business growth and generating substantial added value for service providers.
YouTube, for instance, which has recently reported a striking upload rate of 500 hours of content per minute[1], credits its recommendations for driving more than 70% of web traffic on the platform, a figure closely mirrored by Netflix’s estimated 80%, with search accounting for the remainder. Another example is Amazon, where McKinsey has estimated that around 35% of purchases derive from recommendations. Consequently, key phrases such as “recommended for you”, “others have liked” or “you may also know” keep reshaping the digital landscape, transcending online shopping, media and advertisement domains, and expanding into an ever-growing diversity of fields.
Methods
Recommendation methods have been refined over time to further enhance the personalized user experience, typically relying on detailed knowledge and deep understanding of the item space, users, their preferences and behavior, and additional contextualization.
Item similarity and neighborhood-based methods have gradually given way to latent factor models that leverage different users’ interaction information, and to hybrid strategies that better tackle cold-start[2] scenarios and enable multi-scale data modeling. The generalization of these concepts with neural network-based architectures, for example, has allowed for increased control over the incorporation of crucial multi-modal contextual signals, enabling circumstantial factors to adapt the recommendation space, among an array of other advantages. This is especially useful in the sequential recommendation setting, where temporal patterns and trends from past implicit and explicit feedback, usually derived from activity logs, can be complemented with features that model the continually evolving needs and interests of users to infer their intent and predict future interactions.
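To make the latent factor idea concrete, here is a minimal sketch of matrix factorization trained with stochastic gradient descent. It is only an illustration under stated assumptions: the toy ratings, the function names (`factorize`, `predict`) and every hyperparameter are hypothetical and not taken from any particular system or library.

```python
import random

def factorize(ratings, n_users, n_items, n_factors=2,
              lr=0.05, reg=0.02, epochs=1000, seed=0):
    """Learn one latent vector per user and per item from (user, item, rating) triples."""
    rng = random.Random(seed)
    # Small random initialization so the factors can move apart during training.
    P = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted affinity is the dot product of the user and item vectors."""
    return sum(pu * qi for pu, qi in zip(P[u], Q[i]))

# Hypothetical toy data: users 0 and 1 favor items 0 and 1,
# while users 2 and 3 favor items 2 and 3.
ratings = [
    (0, 0, 5), (0, 1, 4),
    (1, 0, 5), (1, 1, 4), (1, 2, 1),
    (2, 2, 5), (2, 3, 4),
    (3, 0, 1), (3, 2, 5), (3, 3, 4),
]
P, Q = factorize(ratings, n_users=4, n_items=4)

# User 0 never interacted with item 2; the score is inferred from other users.
print(round(predict(P, Q, 0, 0), 2), round(predict(P, Q, 0, 2), 2))
```

Unlike a neighborhood method, nothing here compares items directly: the score for an unseen user-item pair comes entirely from vectors fitted on other users' interactions, which is the sense in which such models leverage collective behavior.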
Nonetheless, some domains, such as travel and e-commerce applications, experience sporadic and mostly anonymous activity, capturing primarily short-term dynamics, thus requiring alternative solutions independent of persistent user profiles.
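One simple way to recommend without persistent profiles is to rely only on what happens inside the current session. The sketch below counts which item tends to follow which across anonymous sessions and ranks candidates from the last item seen. The session data and the function names (`build_transitions`, `recommend_next`) are hypothetical, and this is just one of many possible session-based baselines.

```python
from collections import Counter, defaultdict

def build_transitions(sessions):
    """Count how often item b directly follows item a within a session."""
    follows = defaultdict(Counter)
    for session in sessions:
        for a, b in zip(session, session[1:]):
            follows[a][b] += 1
    return follows

def recommend_next(follows, current_session, k=2):
    """Rank items by how often they followed the last item seen, skipping seen items."""
    if not current_session:
        return []
    seen = set(current_session)
    candidates = follows.get(current_session[-1], Counter())
    ranked = [item for item, _ in candidates.most_common() if item not in seen]
    return ranked[:k]

# Hypothetical anonymous sessions from a travel site; no user identifiers involved.
sessions = [
    ["flight", "hotel", "car"],
    ["flight", "hotel"],
    ["flight", "insurance"],
    ["hotel", "car"],
]
follows = build_transitions(sessions)

print(recommend_next(follows, ["flight"]))           # ['hotel', 'insurance']
print(recommend_next(follows, ["flight", "hotel"]))  # ['car']
```

Because the model only needs the items in the ongoing session, it works for first-time and non-logged-in visitors alike, at the cost of ignoring any longer-term preferences.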
Impact
The study of these algorithms at the core of news, social media, and educational or entertainment content feeds is relevant not only for their benefits, but also for their potential drawbacks. The aforementioned YouTube statistics can be used to provide some perspective on how insignificantly small the realistically accessible part of the corpus is when compared to the total amount of available information on the platform, a property reflected across various domains.
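A quick back-of-the-envelope calculation gives a sense of scale. It uses only the 500 hours per minute upload rate quoted in the introduction; the nonstop viewer is a hypothetical upper bound, not a measured statistic.

```python
UPLOAD_HOURS_PER_MINUTE = 500  # upload rate quoted in the introduction

# Hours of new content uploaded in a single day.
hours_per_day = UPLOAD_HOURS_PER_MINUTE * 60 * 24

# The same amount expressed in years of continuous playback.
years_per_day = hours_per_day / 24 / 365

# Hypothetical viewer watching 24 hours a day: share of one day's uploads they could see.
watchable_fraction = 24 / hours_per_day

print(f"{hours_per_day:,} hours uploaded per day")        # 720,000 hours uploaded per day
print(f"{years_per_day:.1f} years of content per day")    # 82.2 years of content per day
print(f"{watchable_fraction:.4%} of one day's uploads")   # 0.0033% of one day's uploads
```

Even under this unrealistic assumption, a single viewer could cover only about one part in 30,000 of what is uploaded each day, so whatever the system chooses to surface largely determines what is ever seen.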
The amount of control over when, how, and which information is disseminated provides the ideal setup for a highly manipulative setting, strongly tied to the design considerations and objectives of these underlying systems.[3] The growing number of popular services[4] that exploit user engagement signals, and the psychological cues derived from them (e.g., trust, cognitive dissonance), to build highly addictive algorithms and feedback loops capable of shaping beliefs and behaviors is concerning. This has fueled ongoing discussions around ethics, fairness, diversity, explainability, and interpretability in recommendations - topics that are often disregarded in a literature predominantly concerned with misaligned offline optimization objectives.
Footnotes
1. This results in roughly 82 years of new content every day.
2. Cold-start refers to the lack of interaction information usually associated with new item or user profiles, non-logged-in searches, or infrequent activity producing mostly independent session behavior.
3. Covington et al., for instance, noted that the simple change in ranking objective from click-through rate to watch time allowed for better retention of user engagement while significantly reducing the promotion of deceptive and clickbait videos.
4. Especially free social platforms with business models revolving around user data.