This repository analyses IMDb data to find the 10 highest-rated movies and to identify the people (actors, directors, or others) credited most often across those movies.
To run this, you will need:

- Python 3.10.15
- The dependencies listed in `imdb_analysis/requirements.txt`
- The following files from IMDb datasets:
  - title.basics.tsv
  - title.ratings.tsv
  - name.basics.tsv
  - title.principals.tsv
- Data definitions can be found here (not needed to run the analysis)
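
IMDb documents these files as UTF-8, tab-separated, with a header row and a literal `\N` marking a missing value, and serves them gzip-compressed (`.tsv.gz`). A minimal sketch of a standard-library reader under those conventions follows; `parse_imdb_rows` and `read_imdb_tsv` are hypothetical helpers for illustration, not functions from this repository.

```python
import csv
import gzip
from typing import Iterable, Iterator, Optional

IMDB_NULL = "\\N"  # IMDb marks missing fields with a literal backslash-N


def parse_imdb_rows(lines: Iterable[str]) -> Iterator[dict[str, Optional[str]]]:
    """Yield one dict per data row, keyed by the header names; \\N becomes None."""
    # QUOTE_NONE treats '"' as an ordinary character, since some IMDb titles
    # contain unbalanced quotes that would otherwise confuse the csv parser.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == IMDB_NULL else value) for key, value in row.items()}


def read_imdb_tsv(path: str) -> Iterator[dict[str, Optional[str]]]:
    """Stream rows from a plain .tsv file or a gzip-compressed .tsv.gz download."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", newline="") as handle:
        yield from parse_imdb_rows(handle)
```

Streaming rows with a generator keeps memory flat, which matters because files such as `title.principals.tsv` are large.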

Then set up the environment:

- Install Python 3.10 and verify:
  `python --version`
- Create a virtual environment:
  `python -m venv my_venv`
- Activate the virtual environment:
  - On macOS/Linux:
    `source my_venv/bin/activate`
  - On Windows:
    `my_venv\Scripts\activate`
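- Verify that the activated environment uses the expected Python version:
  `python --version`
- Install the required packages from the `imdb_analysis` directory, where `requirements.txt` lives:
  `pip install -r requirements.txt`
- Download the IMDb data files and place them in the `data` folder:
  - title.basics.tsv
  - title.ratings.tsv
  - name.basics.tsv
  - title.principals.tsv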
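
IMDb serves these downloads gzip-compressed as `.tsv.gz`, while the repository's `data` folder holds plain `.tsv` files. The script below is a hypothetical pre-flight check, not part of this repository: it assumes it is run from the `imdb_analysis` directory, warns if the interpreter is not a 3.10.x release, expands any `<name>.tsv.gz` it finds in `data`, and lists files that are still missing.

```python
import gzip
import shutil
import sys
from pathlib import Path

# Hypothetical helper: file names taken from the setup steps above.
REQUIRED_FILES = (
    "title.basics.tsv",
    "title.ratings.tsv",
    "name.basics.tsv",
    "title.principals.tsv",
)


def python_version_ok(version_info=sys.version_info) -> bool:
    """True if the interpreter is a 3.10.x release, as the setup steps require."""
    return tuple(version_info[:2]) == (3, 10)


def decompress_downloads(data_dir: Path) -> None:
    """Expand <name>.tsv.gz into <name>.tsv when the plain .tsv is not there yet."""
    for name in REQUIRED_FILES:
        target = data_dir / name
        archive = data_dir / f"{name}.gz"
        if not target.exists() and archive.exists():
            with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)  # stream, so large files are not loaded whole


def missing_data_files(data_dir: Path) -> list[str]:
    """Return the required .tsv files that are not present in data_dir."""
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    data_dir = Path("data")  # assumes the current directory is imdb_analysis
    if not python_version_ok():
        print(f"Warning: expected Python 3.10, found {sys.version.split()[0]}")
    decompress_downloads(data_dir)
    missing = missing_data_files(data_dir)
    print("All data files present." if not missing else f"Missing: {', '.join(missing)}")
```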

From the `imdb_analysis` directory, run the analysis to find the top 10 movies and their most credited people:

`python movie_rankings.py`
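
The two analysis steps can be sketched in plain Python. This is not the repository's implementation, and several choices are assumptions: rows are already parsed into dicts keyed by IMDb column names, only titles whose `titleType` is `movie` count, ranking is by `averageRating` with `numVotes` as a tie-break, and every `title.principals` row counts as one credit. `top_rated_movies`, `most_credited`, and the `min_votes` threshold are hypothetical names.

```python
from collections import Counter
from typing import Optional

# Rows are dicts keyed by IMDb column names; None stands in for IMDb's "\N".
Row = dict[str, Optional[str]]


def top_rated_movies(
    basics: list[Row], ratings: list[Row], n: int = 10, min_votes: int = 0
) -> list[str]:
    """Return the tconst of the n highest-rated titles whose titleType is 'movie'.

    min_votes is a hypothetical threshold; the repository may rank differently.
    """
    movie_ids = {row["tconst"] for row in basics if row["titleType"] == "movie"}
    candidates = [
        row
        for row in ratings
        if row["tconst"] in movie_ids and int(row["numVotes"]) >= min_votes
    ]
    # Highest averageRating first; numVotes breaks ties (an assumed tie-break).
    candidates.sort(
        key=lambda row: (float(row["averageRating"]), int(row["numVotes"])),
        reverse=True,
    )
    return [row["tconst"] for row in candidates[:n]]


def most_credited(
    principals: list[Row], names: list[Row], movie_ids: list[str]
) -> list[tuple[str, int]]:
    """Count title.principals rows per person across movie_ids, most credited first."""
    wanted = set(movie_ids)
    counts = Counter(row["nconst"] for row in principals if row["tconst"] in wanted)
    name_of = {row["nconst"]: row["primaryName"] for row in names}
    # Fall back to the nconst if the person is missing from name.basics.
    return [(name_of.get(nconst, nconst), count) for nconst, count in counts.most_common()]


if __name__ == "__main__":
    # Made-up sample rows, not real IMDb data.
    basics = [
        {"tconst": "tt1", "titleType": "movie"},
        {"tconst": "tt2", "titleType": "movie"},
        {"tconst": "tt3", "titleType": "tvSeries"},
    ]
    ratings = [
        {"tconst": "tt1", "averageRating": "8.1", "numVotes": "500"},
        {"tconst": "tt2", "averageRating": "9.0", "numVotes": "100"},
        {"tconst": "tt3", "averageRating": "9.5", "numVotes": "900"},
    ]
    principals = [
        {"tconst": "tt1", "nconst": "nm1"},
        {"tconst": "tt2", "nconst": "nm1"},
        {"tconst": "tt2", "nconst": "nm2"},
    ]
    names = [
        {"nconst": "nm1", "primaryName": "Person A"},
        {"nconst": "nm2", "primaryName": "Person B"},
    ]
    top = top_rated_movies(basics, ratings, n=2)
    print(top)  # ['tt2', 'tt1']
    print(most_credited(principals, names, top))  # [('Person A', 2), ('Person B', 1)]
```

In the sample, `tt3` has the highest rating but is excluded because it is a `tvSeries`. Counting rows rather than distinct titles means someone listed twice on one movie (for example as both director and writer) gets two credits; deduplicating on `(tconst, nconst)` would change that.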

Project structure:

    └── 📁imdb_analysis
        └── 📁data
            └── name.basics.tsv
            └── title.basics.tsv
            └── title.principals.tsv
            └── title.ratings.tsv
        └── 📁helpers
            └── __init__.py
            └── constants.py
            └── utils.py
        └── 📁schema
            └── __init__.py
            └── titles.py
        └── 📁tests
            └── __init__.py
            └── conftest.py
            └── test_helpers.py
            └── test_movie_rankings.py
        └── .gitignore
        └── movie_rankings.py
        └── readme.md
        └── requirements.in
        └── requirements.txt
        └── setup.py