202511200938 Status: idea Tags: Datascience

Recommender systems

Recommender systems are algorithms that provide personalized suggestions of the items most relevant to each user. With the massive growth of available online content, users are inundated with choices. It is therefore crucial for web platforms to recommend items to each user in order to increase user satisfaction and engagement.

You can find them everywhere:

  • Google Search shows you recommendations
  • Netflix and YouTube show recommendations
  • Spotify mixes your playlists for you
  • Amazon knows what you should buy next.

All these platforms use powerful Machine Learning models in order to generate relevant recommendations for each user.

A user u can rate an item i (e.g., a movie, song, or product). Since not all ratings are known, a machine learning model is trained to predict them using available data (user history, similarities, item features, etc.). During inference, the model predicts ratings for multiple items, and the system recommends those with the highest predicted scores.
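The inference step described above can be sketched in a few lines. This is a minimal illustration, not a production ranking pipeline: the function name `recommend_top_n`, the item ids, and the predicted ratings are all hypothetical, and the predictions are assumed to come from some already-trained model.

```python
def recommend_top_n(predicted, already_rated, n=3):
    """Return the n item ids with the highest predicted rating.

    predicted: dict mapping item id -> predicted rating for one user
    already_rated: set of item ids the user has already rated
    """
    # Only items the user has not rated yet are candidates.
    candidates = {i: s for i, s in predicted.items() if i not in already_rated}
    # Sort by score, highest first; ties are broken by item id for determinism.
    ranked = sorted(candidates, key=lambda i: (-candidates[i], i))
    return ranked[:n]


# Hypothetical predicted ratings for a single user u.
predicted_for_u = {"movie_a": 4.6, "movie_b": 2.1, "movie_c": 3.9, "movie_d": 4.8}
top = recommend_top_n(predicted_for_u, already_rated={"movie_d"}, n=2)
print(top)  # ['movie_a', 'movie_c']
```

Here movie_d has the highest score but is skipped because the user already rated it.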

We therefore need to collect user feedback, so that we can have a ground truth for training and evaluating our models. An important distinction has to be made here between explicit feedback and implicit feedback.

Explicit feedback

Explicit feedback is a rating given deliberately by the user to express their satisfaction with an item. Examples are: the number of stars on a scale from 1 to 5 given after buying a product, a thumbs up/down given after watching a video, etc. This feedback provides detailed information on how much a user liked an item, but it is hard to collect, as most users don’t write reviews or give explicit ratings for each item they purchase.



Implicit feedback

Implicit feedback assumes that user-item interactions are an indication of preferences. Examples are: the purchase or browsing history of a user, the list of songs played by a user, etc. This feedback is extremely abundant, but it is also less detailed and noisier (e.g. someone may buy a product as a present for someone else). However, this noise tends to become negligible given the sheer volume of data of this kind, and most modern Recommender Systems rely on implicit feedback.
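As a rough illustration of how implicit feedback can be derived from raw activity, the sketch below turns a play log into binary preferences. The log contents, the function name `implicit_feedback`, and the `min_count` threshold are assumptions made for this example; real systems often keep the counts as confidence weights instead of thresholding them.

```python
from collections import Counter

# Hypothetical interaction log: one (user, item) pair per song play.
play_log = [
    ("alice", "song_1"), ("alice", "song_1"), ("alice", "song_2"),
    ("bob", "song_2"), ("bob", "song_3"), ("bob", "song_3"), ("bob", "song_3"),
]


def implicit_feedback(log, min_count=1):
    """Map each (user, item) pair seen at least min_count times to a preference of 1."""
    counts = Counter(log)  # (user, item) -> number of interactions
    # Pairs below the threshold are treated as unknown, not as dislikes.
    return {pair: 1 for pair, c in counts.items() if c >= min_count}


prefs = implicit_feedback(play_log, min_count=2)
print(prefs)  # {('alice', 'song_1'): 1, ('bob', 'song_3'): 1}
```

Raising `min_count` is one crude way to filter out noise such as a single accidental play.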



Filtering

Recommender systems can be classified according to the kind of information used to predict user preferences as Content-Based or Collaborative Filtering.

Content-Based Filtering


  • Content-based methods describe users and items by their known metadata. Each item i is represented by a set of relevant tags (e.g. movies on the IMDb platform can be tagged as “action”, “comedy”, etc.). Each user u is represented by a user profile, which can be created from known user information (e.g. sex and age) or from the user’s past activity.
  • A simple model for this approach is k-nearest neighbors (k-NN). For instance, if we know that user u bought an item i, we can recommend to u the available items whose features are most similar to those of i.

The advantage of this approach is that item metadata is known in advance, so we can also apply it to Cold-Start scenarios, where a new item or user is added to the platform and we don’t yet have user-item interactions to train our model. The disadvantages are that we don’t use the full set of known user-item interactions (each user is treated independently), and that we need metadata for each item and user.
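The nearest-neighbor idea can be sketched as follows, assuming items are described by plain tag sets and compared with cosine similarity. The catalogue, the tags, and the helper names `cosine` and `similar_items` are hypothetical; a real system would likely use richer features and a library k-NN implementation.

```python
import math

# Hypothetical item metadata: each item is described by a set of tags.
item_tags = {
    "movie_a": {"action", "sci-fi"},
    "movie_b": {"action", "thriller"},
    "movie_c": {"comedy", "romance"},
    "movie_d": {"sci-fi", "action", "thriller"},
}


def cosine(tags_a, tags_b):
    """Cosine similarity between two binary tag vectors, given as sets."""
    if not tags_a or not tags_b:
        return 0.0
    return len(tags_a & tags_b) / math.sqrt(len(tags_a) * len(tags_b))


def similar_items(item, k=2):
    """Return the k items whose tags are most similar to those of `item`."""
    scores = {
        other: cosine(item_tags[item], tags)
        for other, tags in item_tags.items()
        if other != item
    }
    # Highest similarity first; ties are broken by item id for determinism.
    return sorted(scores, key=lambda o: (-scores[o], o))[:k]


# If user u bought movie_a, recommend its nearest neighbors in tag space.
print(similar_items("movie_a"))  # ['movie_d', 'movie_b']
```

No user-item interactions are needed here, which is why the same lookup also works for an item that was just added to the catalogue.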



Collaborative Filtering


  • Collaborative filtering methods do not use item or user metadata. Instead, they leverage the feedback or activity history of all users to predict the rating of a user on a given item, inferring interdependencies between users and items from the observed activity.
  • To train a Machine Learning model with this approach we typically cluster or factorize the rating matrix r_ui in order to make predictions on the unobserved pairs (u, i), i.e. those where r_ui is unknown (“?”).

The advantage of this approach is that the whole set of user-item interactions (i.e. the matrix r_ui) is used, which typically yields higher accuracy than Content-Based models. The disadvantage is that a user or item needs at least a few interactions before the model can be fitted, so the approach struggles in Cold-Start scenarios.
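One common way to factorize the rating matrix is to learn a small latent vector per user and per item with stochastic gradient descent, so that their dot product approximates each observed rating. The sketch below is a toy version under stated assumptions: the ratings, the latent dimension `k`, and the learning rate, regularization, and epoch values are illustrative choices, not tuned settings.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ r_ui.

    ratings: list of observed (user index, item index, rating) triples
    """
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of user and item latent vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


# Hypothetical observed ratings; pairs such as (0, 2) are the unknown "?" entries.
observed = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 2, 1), (2, 1, 5), (2, 2, 2)]
P, Q = factorize(observed, n_users=3, n_items=3)
print(round(predict(P, Q, 0, 2), 2))  # estimate for an unobserved pair
```

Only observed pairs contribute to the updates; the unobserved entries are filled in afterwards by the learned factors.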



Hybrid approaches

Finally, there are also hybrid methods that use both the known metadata and the set of observed user-item interactions. This approach combines the advantages of Content-Based and Collaborative Filtering methods, and often gives the best results.
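There are many ways to build a hybrid; one of the simplest is to blend the scores of the two models. The sketch below is only one possible strategy, with a hypothetical function name and an assumed `full_trust_at` threshold: it leans on the content-based score when an item has few interactions and shifts toward the collaborative score as data accumulates.

```python
def hybrid_score(content_score, collab_score, n_interactions, full_trust_at=20):
    """Blend two scores, weighting the collaborative one by how much data backs it."""
    # Weight grows linearly from 0 (no interactions) to 1 (full_trust_at or more).
    w = min(n_interactions / full_trust_at, 1.0)
    return w * collab_score + (1.0 - w) * content_score


# Brand-new item: only the content-based score counts.
print(hybrid_score(content_score=0.8, collab_score=0.2, n_interactions=0))   # 0.8
# Well-known item: only the collaborative score counts.
print(hybrid_score(content_score=0.8, collab_score=0.2, n_interactions=50))  # 0.2
```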


References