In this project, I use the MovieLens dataset to create a tailored movie recommendation system that leverages user ratings to suggest personalised film discoveries.
Ever wondered how Netflix knows exactly which new show will keep you glued to your screen? Or how Spotify seems to know which songs belong in your ‘Daily Mix’? These everyday conveniences are powered by information filtering systems known as recommender systems.
A recommender system is an algorithm that predicts and suggests items a user is likely to enjoy, guided by their past behaviour and preferences. Such systems use machine learning and data analytics to find patterns in user data and generate recommendations. The most common technique is collaborative filtering, which discovers similarities among users; it is often combined with content-based matching of item attributes to user preferences. In this project, I build a movie recommender system in Python that applies these ideas to curate personalised watchlists based on user preferences.
MovieLens, provided by the GroupLens Research Project at the University of Minnesota, offers datasets of various sizes suitable for experimenting with recommender systems. For this project, I opted for the “latest-small” dataset, which contains about 100,000 ratings applied to over 9,000 movies by roughly 600 users. This dataset strikes a balance between comprehensiveness and manageability, making it an ideal starting point for our exploration. (NOTE: You can opt to use the larger datasets, which is always more fun, but they need considerably more memory and computing power!)
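To make the data layout concrete, here is a minimal sketch of loading ratings and movies and pivoting them into a user-by-movie matrix. It uses a tiny in-memory sample with the same column names as the MovieLens `ratings.csv` and `movies.csv` files; the sample rows, the variable names (`ratings`, `movies`, `user_item`) and the choice of `pivot_table` are my assumptions for illustration, not necessarily how the project itself loads the data.

```python
import io
import pandas as pd

# Tiny stand-in with the same columns as MovieLens ratings.csv and movies.csv.
# With the real dataset, pass the paths of the unzipped files to pd.read_csv instead.
ratings_csv = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,1,4.0,964982703\n"
    "1,2,3.0,964981247\n"
    "2,1,5.0,964982224\n"
    "2,3,2.0,964983815\n"
    "3,2,4.5,964982931\n"
)
movies_csv = io.StringIO(
    "movieId,title,genres\n"
    "1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy\n"
    "2,Jumanji (1995),Adventure|Children|Fantasy\n"
    "3,Grumpier Old Men (1995),Comedy|Romance\n"
)

ratings = pd.read_csv(ratings_csv)
movies = pd.read_csv(movies_csv)

# Attach titles to each rating, then build one row per user and one column per movie.
rated = ratings.merge(movies[["movieId", "title"]], on="movieId", how="left")
user_item = rated.pivot_table(index="userId", columns="title", values="rating")

print(user_item)
```

Movies a user has not rated show up as `NaN` in `user_item`, which is what makes the matrix sparse on the full dataset.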
To develop a movie recommender system, we follow a systematic approach, breaking down complex processes into manageable steps. Here’s an overview:
Understanding the difference between Pearson correlation and cosine similarity is crucial for implementing a recommender system. In this project I opt to use Pearson correlation. Here is the comparison:
Further information can be found here
Feature | Pearson Correlation | Cosine Similarity |
---|---|---|
Nature | Measures the linear correlation between two variables | Measures the cosine of the angle between two vectors |
Sensitivity to Magnitude | Unaffected by rescaling or by a constant offset, because each variable is mean-centred first | Unaffected by rescaling, but sensitive to a constant offset because the vectors are not mean-centred |
Usage | Used for measuring the strength and direction of a linear relationship between two variables | Used for measuring the similarity between two vectors, often in high-dimensional spaces |
Interpretation | Values range from -1 to 1, with 0 indicating no linear correlation | Values range from -1 to 1 in general, and from 0 to 1 when all components are non-negative (as with ratings); 0 indicates orthogonality (no similarity) |
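A small numeric example may help make the comparison concrete. The sketch below implements both measures with NumPy; the helper names `pearson` and `cosine` and the three made-up rating vectors are assumptions for illustration only. It shows that Pearson correlation is cosine similarity applied to mean-centred vectors, so a user who rates everything half a point lower still correlates perfectly, while cosine similarity drops slightly.

```python
import numpy as np

def pearson(x, y):
    # Pearson correlation: cosine of the angle between the mean-centred vectors.
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def cosine(x, y):
    # Cosine similarity: no centring, only the angle between the raw vectors.
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Hypothetical ratings of the same five movies by three users.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = a - 0.5          # same taste as a, but a consistently harsher rater
c = a[::-1].copy()   # opposite taste to a

pearson_ab, cosine_ab = pearson(a, b), cosine(a, b)
pearson_ac, cosine_ac = pearson(a, c), cosine(a, c)

print(f"a vs b: pearson={pearson_ab:.3f}, cosine={cosine_ab:.3f}")
print(f"a vs c: pearson={pearson_ac:.3f}, cosine={cosine_ac:.3f}")
```

For the opposite-taste pair, Pearson is strongly negative while cosine stays positive, because all ratings are positive numbers. This is one reason mean-centring matters when users rate on different personal scales.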
There’s a lot that can be added to this project. Here are a few ideas I’ve thought of:
Enhancement of the user interface and explainability
Incorporation of user-based collaborative filtering: