Using information scraped from the web, we build linear regression models to learn about movies adapted from novels and to predict whether an adaptation will succeed.
- acquisition: web scraping
- storage: flat files
- sources:
Opening weekend box office (U.S.)
Movie
- MPAA rating
- release year
- release season
- director
  - number of films directed up to that point
  - average rating of those films
  - average gross of those films
- cast
  - number of films appeared in up to that point
  - average rating of those films
  - average gross of those films
- writer: with original author or not
- genre
- distributor
- language
- country
Book
- first published year
- years from movie release
- page count
- author
- number of publications then
- visibility
- book visibility
- basics of the web (requests, HTML, CSS, JavaScript)
- web scraping (BeautifulSoup, selenium)
- numpy and pandas
- statsmodels, scikit-learn
- matplotlib, seaborn
- Model: Linear Regression
- Feature engineering: StandardScaler, PolynomialFeatures, OneHotEncoder
- Cross-Validation
- Regularization: Ridge, Lasso, RidgeCV, LassoCV
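
The modeling pieces listed above could be combined roughly as follows. This is a minimal sketch, not the project's actual code: the data is synthetic, and the column names (`release_year`, `book_pages`, `mpaa_rating`, `genre`), the made-up target, and the alpha grid are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

# Synthetic stand-in data: column names and values are hypothetical,
# not the scraped dataset.
rng = np.random.default_rng(0)
n = 200
movies = pd.DataFrame({
    "release_year": rng.integers(1980, 2020, size=n),
    "book_pages": rng.integers(100, 900, size=n),
    "mpaa_rating": rng.choice(["G", "PG", "PG-13", "R"], size=n),
    "genre": rng.choice(["drama", "fantasy", "thriller"], size=n),
})
# Made-up target standing in for (log) opening weekend gross.
y = (
    0.02 * (movies["release_year"] - 1980)
    + 0.001 * movies["book_pages"]
    + 0.5 * (movies["mpaa_rating"] == "PG-13")
    + rng.normal(0, 0.3, size=n)
)

numeric_cols = ["release_year", "book_pages"]
categorical_cols = ["mpaa_rating", "genre"]

# Numeric features: add degree-2 terms, then standardize.
numeric_steps = Pipeline([
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),
    ("scale", StandardScaler()),
])
# Categorical features: one-hot encode; sparse_threshold=0 forces dense output.
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,
)

# RidgeCV selects the regularization strength from the given grid.
model = Pipeline([
    ("preprocess", preprocess),
    ("ridge", RidgeCV(alphas=np.logspace(-3, 3, 13))),
])

# 5-fold cross-validation of the whole pipeline, scored by R^2.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, movies, y, cv=cv, scoring="r2")
print("mean R^2:", scores.mean())

model.fit(movies, y)
print("chosen alpha:", model.named_steps["ridge"].alpha_)
```

Keeping the preprocessing inside the pipeline means the scaler and encoder are refit on each training fold, so the cross-validation scores are not contaminated by the held-out data. Swapping `RidgeCV` for `LassoCV` would give the L1-regularized variant.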