Movie Recommender System 🎥

In this project, I use the MovieLens dataset to create a tailored movie recommendation system that leverages user ratings to suggest personalised film discoveries.

Introduction

Ever wondered how Netflix knows exactly which new show will keep you glued to your screen? Or how Spotify seems to know which songs belong in your ‘Daily Mix’? These everyday conveniences are powered by a type of information filtering system known as a recommender system.

A recommender system is an algorithm that predicts and suggests items a user is likely to enjoy, guided by their past behaviour and preferences. Such systems use machine learning and data analytics to find patterns in user data and generate recommendations. A widely used technique is collaborative filtering, which looks for similarities in how users rate items; it is often combined with content-based filtering, which matches the attributes of an item to a user's preferences. In this project, I build a movie recommender system in Python that applies these ideas to curate personalised watchlists based on user preferences.

Sourcing the Data: The MovieLens Dataset

MovieLens, provided by the GroupLens Research Project at the University of Minnesota, offers datasets of various sizes suitable for experimenting with recommender systems. For this project, I opted for the “latest-small” dataset, which contains about 100,000 ratings applied to over 9,000 movies by about 600 users. This dataset strikes a balance between comprehensiveness and manageability, making it an ideal starting point for our exploration. (NOTE: You can opt for the larger datasets, which are always more fun, but they need considerably more computing power!)
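As a rough illustration of what loading the data might look like, here is a minimal sketch using pandas. It assumes the standard `ratings.csv` and `movies.csv` column layout from the MovieLens download; the handful of rating values below are made up so that the snippet runs without downloading anything, and the variable names are my own.

```python
import io

import pandas as pd

# A few rows in the same column layout as ratings.csv and movies.csv.
# The rating values and timestamps are invented for illustration only.
RATINGS_CSV = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,3,4.0,964981247
2,1,3.5,1445714835
2,2,5.0,1445714885
"""
MOVIES_CSV = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji (1995),Adventure|Children|Fantasy
3,Grumpier Old Men (1995),Comedy|Romance
"""

# With the real dataset, pass the file paths instead of in-memory strings.
ratings = pd.read_csv(io.StringIO(RATINGS_CSV))
movies = pd.read_csv(io.StringIO(MOVIES_CSV))

# Attach movie titles to each rating so later steps can use readable names.
df = ratings.merge(movies[["movieId", "title"]], on="movieId", how="inner")

print(df.shape)               # rows and columns after the merge
print(df.isna().sum().sum())  # total number of missing values
```

Checking the shape and the missing-value count straight after the merge is a cheap way to confirm that no ratings were dropped and that nothing needs cleaning before moving on.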

Process Overview

To develop a movie recommender system, we follow a systematic approach, breaking down complex processes into manageable steps. Here’s an overview:

  1. Data Preparation: The first step involves preparing the MovieLens dataset for analysis. This includes loading the dataset into a Python environment, exploring its structure, and cleaning the data to remove any inconsistencies or missing values.
  2. User-Item Matrix Creation: We create a user-item matrix where rows represent users and columns represent movies. Each cell in the matrix contains the rating a user has given to a movie, with unrated movies marked as NaN (Not a Number).
  3. Matrix Normalisation: Since different users may have different rating scales (some might rate movies generously, others more critically), we normalise the matrix. Normalisation adjusts the ratings so that we can fairly compare them across different users.
  4. Similarity Computation: To recommend movies, we need to find users who are similar to the target user. We use similarity metrics like Pearson correlation and cosine similarity for this purpose. These metrics help us identify users with similar tastes.
  5. Recommendation Generation: Based on the similarity scores, we predict ratings for movies the target user hasn’t seen yet. We then recommend the movies with the highest predicted ratings.
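
The steps above can be sketched end to end on a tiny, made-up ratings table. This is a minimal sketch under stated assumptions rather than the project's actual code: the toy data, the `recommend` helper, and its `n_neighbours` and `top_n` parameters are all hypothetical, and step 1 is skipped because the data is built inline.

```python
import pandas as pd

# Hypothetical toy ratings (not taken from MovieLens).
ratings = pd.DataFrame({
    "userId": [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4],
    "title":  ["A", "B", "C", "A", "B", "C", "D", "E",
               "A", "B", "C", "D", "A", "B", "C", "E"],
    "rating": [5, 3, 1, 4, 3, 2, 5, 1, 1, 3, 5, 2, 5, 4, 3, 4],
})

# Step 2: user-item matrix (rows = users, columns = movies, NaN = unrated).
matrix = ratings.pivot_table(index="userId", columns="title", values="rating")

# Step 3: normalise by subtracting each user's mean rating.
user_means = matrix.mean(axis=1)
matrix_norm = matrix.sub(user_means, axis=0)

# Step 4: Pearson correlation between users, computed over co-rated movies.
user_sim = matrix_norm.T.corr(method="pearson", min_periods=2)


def recommend(target, n_neighbours=10, top_n=5):
    """Step 5: predict ratings for movies the target user has not rated."""
    sims = user_sim[target].drop(target).dropna()
    # Keep only positively correlated users as neighbours.
    neighbours = sims[sims > 0].nlargest(n_neighbours)

    target_row = matrix.loc[target]
    unseen = target_row[target_row.isna()].index

    scores = {}
    for movie in unseen:
        rated = matrix_norm.loc[neighbours.index, movie].dropna()
        if rated.empty:
            continue  # no neighbour has rated this movie
        weights = neighbours.loc[rated.index]
        # Similarity-weighted average of the neighbours' normalised ratings,
        # shifted back onto the target user's own rating scale.
        scores[movie] = user_means.loc[target] + (rated * weights).sum() / weights.sum()

    return pd.Series(scores, dtype=float).sort_values(ascending=False).head(top_n)


recs = recommend(target=1)
print(recs)
```

In this toy data, users 2 and 4 share user 1's taste while user 3 has the opposite taste, so only users 2 and 4 contribute to the predictions, and movie D ranks above movie E for user 1.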

Pearson Correlation vs. Cosine Similarity

Understanding the difference between Pearson correlation and cosine similarity is crucial for implementing a recommender system. In this project I opt to use Pearson correlation. Further information can be found here
The two metrics compare as follows:

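| Feature | Pearson Correlation | Cosine Similarity |
| --- | --- | --- |
| Nature | Measures the linear correlation between two variables | Measures the cosine of the angle between two vectors |
| Sensitivity to Magnitude | Insensitive to both the scale and the offset of the values, because each vector is mean-centred first | Insensitive to scale, but sensitive to a constant offset (for example, a user who rates every movie one star higher) |
| Usage | Used for measuring the strength and direction of a linear relationship between two variables | Used for measuring the similarity between two vectors, often in high-dimensional spaces |
| Interpretation | Values range from -1 to 1, with 0 indicating no linear correlation | Values range from -1 to 1 in general, and from 0 to 1 when all components are non-negative (as with raw ratings); 0 indicates orthogonality (no similarity) |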
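
To make the difference concrete, here is a small sketch in plain Python. The helper functions and the three rating vectors are hypothetical examples of my own; the point is that Pearson correlation is the cosine similarity of the mean-centred vectors, which is why the two can disagree on raw ratings.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation, written as cosine similarity after mean-centring."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical ratings of the same four movies by three users.
generous = [5, 4, 5, 4]  # rates everything highly
critical = [2, 1, 2, 1]  # same preferences, shifted down by three stars
opposite = [4, 5, 4, 5]  # prefers exactly the movies the first user likes less

print(pearson_correlation(generous, critical))
print(cosine_similarity(generous, critical))
print(pearson_correlation(generous, opposite))
print(cosine_similarity(generous, opposite))
```

On raw ratings, cosine similarity scores the generous user as highly similar to both of the others, because all the ratings are positive and point in roughly the same direction. Pearson correlation separates them: the critical user is a perfect match, while the opposite user is a perfect mismatch.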

The Code

View code on Colab
View code on Github

Future Improvements

There’s a lot that can be added to this project. Here are a few ideas I’ve thought of:

Enhancement of the user interface and explainability

  1. Develop a user-friendly interface that allows users to easily input their preferences, view recommended movies, and provide feedback on the recommendations.
  2. Provide explanations or reasons behind each recommendation, such as “Recommended because you enjoyed similar movies like X and Y” or “Recommended based on the preferences of users with similar tastes.” 

Incorporation of user-based collaborative filtering

  1. In addition to the item-based collaborative filtering approach implemented, I can also explore user-based collaborative filtering.
  2. User-based collaborative filtering identifies similar users based on their rating patterns and recommends movies that similar users have enjoyed but the target user hasn’t watched yet.
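
One way to see how the two approaches relate: both can be computed from the same user-item matrix, and the only difference is which axis is compared. The sketch below uses a hypothetical toy matrix of my own and pandas' `DataFrame.corr`, which correlates columns pairwise.

```python
import pandas as pd

NAN = float("nan")  # marks a movie the user has not rated

# Hypothetical user-item matrix: rows are users, columns are movies.
matrix = pd.DataFrame(
    {
        "A": [5, 4, 1],
        "B": [3, 4, 3],
        "C": [1, 2, 5],
        "D": [NAN, 5, 2],
    },
    index=pd.Index([1, 2, 3], name="userId"),
)

# Item-based: correlate movie columns, giving a movies x movies table.
item_sim = matrix.corr(method="pearson", min_periods=2)

# User-based: transpose first so users become columns, giving users x users.
user_sim = matrix.T.corr(method="pearson", min_periods=2)

print(item_sim.shape, user_sim.shape)
print(user_sim.loc[1, 3])      # users 1 and 3 have opposite tastes
print(item_sim.loc["A", "C"])  # movies A and C are rated in opposite ways
```

Item-based similarity tends to be more stable when there are many more users than movies, whereas user-based similarity follows individual tastes more directly, so trying both on the same data is a reasonable experiment.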