This repository contains code and analysis for the 4th homework assignment for the Algorithmic Methods of Data Mining course.
The repository contains the following key files:
- main.ipynb: Main Jupyter notebook containing implementation and analysis for the recommendation system and clustering tasks
- CommandLine.sh: Bash script to execute command line tasks
- SS.png: Screenshot of command line output
- vodclickstream_uk_movies_03.csv: CSV file containing the dataset used
The main tasks are implemented in main.ipynb. This covers:
- Recommendation system using MinHash and locality-sensitive hashing (LSH)
- User clustering with feature engineering, dimensionality reduction, K-means and DBSCAN
- The command line question and the algorithmic question
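As a rough illustration of the MinHash and LSH idea behind the recommendation task, here is a minimal pure-Python sketch. It is not the notebook's implementation: the toy `user_genres` data, the use of genres as set features, the helper names, and the `NUM_HASHES` and `BANDS` values are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 20  # signature length (assumed value)
BANDS = 10       # number of LSH bands; rows per band = NUM_HASHES // BANDS


def _hash(seed, item):
    """Deterministic 64-bit hash of an item under a given seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(items, num_hashes=NUM_HASHES):
    """For each seeded hash function, keep the minimum hash over the set."""
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def lsh_candidate_pairs(signatures, bands=BANDS):
    """Split signatures into bands; users sharing a band bucket become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    candidates = set()
    for b in range(bands):
        buckets = defaultdict(list)
        for user, sig in signatures.items():
            buckets[tuple(sig[b * rows:(b + 1) * rows])].append(user)
        for users in buckets.values():
            candidates.update(combinations(sorted(users), 2))
    return candidates


def jaccard(a, b):
    """Exact Jaccard similarity, used to verify candidate pairs."""
    return len(a & b) / len(a | b)


# Hypothetical toy data: user -> set of genres they watched.
user_genres = {
    "u1": {"Drama", "Comedy", "Thriller"},
    "u2": {"Drama", "Comedy", "Thriller"},
    "u3": {"Documentary", "Animation"},
}

signatures = {u: minhash_signature(g) for u, g in user_genres.items()}
pairs = lsh_candidate_pairs(signatures)

for a, b in sorted(pairs):
    print(a, b, jaccard(user_genres[a], user_genres[b]))
```

Users with identical sets always share every band, so they are guaranteed to be candidates. Using more bands with fewer rows each makes the filter more permissive, at the cost of more false candidates to verify with the exact Jaccard score.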
The command line question is executed via CommandLine.sh, and its output is shown in SS.png.
The code requires Python 3 and standard data science libraries such as pandas, NumPy, and scikit-learn.
The bash script assumes a Linux/Unix environment with common command line utilities such as grep and wc.
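Before opening the notebook, the Python libraries named above can be checked with a short script like the one below. This is only a convenience sketch: the `REQUIRED` mapping and the `missing_packages` helper are hypothetical names, and the list covers only the three libraries mentioned here, not everything the notebook may import.

```python
import importlib.util

# Display name -> import name. Only the libraries named in this README;
# the notebook may need others.
REQUIRED = {"pandas": "pandas", "NumPy": "numpy", "scikit-learn": "sklearn"}


def missing_packages(required=REQUIRED):
    """Return the display names of packages that cannot be found."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```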
- Ambar Chatterjee
- Elias Antoun
- Sofia Noemi Crobeddu
- Damian Zeller
Course project completed as part of the Algorithmic Methods of Data Mining course.
- Course instructors
- Dataset provided by: [source]