This is a compilation of ML projects, mostly based on Kaggle competitions and datasets.
Classification: you have input variables (X) and a categorical output variable (Y), and you use an algorithm to learn the mapping function from the input to the output, Y = f(X).

- Breast cancer: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data#data.csv
  - 31 Features
  - Target - diagnosis: the diagnosis of breast tissue (M = malignant, B = benign)
- Credit Card: https://www.kaggle.com/mlg-ulb/creditcardfraud
  - 30 Features
  - Target - Class: 1 for fraudulent transactions, 0 otherwise
- Diabetes: https://www.kaggle.com/uciml/pima-indians-diabetes-database
  - 8 Features
  - Target - Outcome: class variable (0 or 1); 268 of the 768 records are 1, the others are 0
- Heart: https://www.kaggle.com/ronitf/heart-disease-uci
  - 13 Features
  - Target - target: 1 or 0
- Kidney Disease: https://www.kaggle.com/mansoordaku/ckdisease
  - 25 Features
  - Target - 'ckd' or 'notckd' (ckd = chronic kidney disease)
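The idea of learning Y = f(X) from labeled examples can be illustrated with a deliberately simple classifier. This is a minimal sketch. It assumes a nearest-centroid model, which is not necessarily what the projects above use. The training data is a hypothetical toy set with two numeric features and 'M'/'B' labels that mimic the breast cancer target. Only the Python standard library is used.

```python
from statistics import mean

def fit_nearest_centroid(X, y):
    """Learn f: compute one centroid (mean feature vector) per class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        # zip(*rows) walks the feature columns of this class
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Apply f: return the label whose centroid is closest (squared Euclidean distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))

# Hypothetical toy data (not taken from any dataset above): two features per sample
X_train = [[1.0, 1.2], [0.8, 1.0], [4.0, 4.2], [4.4, 3.9]]
y_train = ["B", "B", "M", "M"]
model = fit_nearest_centroid(X_train, y_train)
print(predict(model, [0.9, 1.1]))  # B
print(predict(model, [4.1, 4.0]))  # M
```

The output is always one of the known class labels, which is what separates classification from the regression problems below.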
Regression: a regression problem is one where the output variable is a real or continuous value, such as “salary” or “weight”.

- AirBnB NYC: https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data
  - 16 Features
  - Target - price in dollars
- Avocado prices: https://www.kaggle.com/neuromusic/avocado-prices
  - 13 Features
  - Target - total number of avocados sold
- SF Salaries: https://www.kaggle.com/kaggle/sf-salaries
  - 12 Features
  - Target - total pay (salary)
- House price: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data
  - 80 Features
  - Target - SalePrice: the property's sale price in dollars
- Women Shoes Prices: https://www.kaggle.com/datafiniti/womens-shoes-prices
  - 50 Features
  - Target - Prices amount min and Prices amount max
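To make the "continuous output" idea concrete, here is a minimal sketch of ordinary least squares with a single feature, fitted in closed form. The data is hypothetical (years of experience vs. salary in thousands of dollars, echoing the “salary” example) and is not taken from the SF Salaries dataset. Real projects would typically use many features and a library implementation.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y ≈ a + b*x by ordinary least squares (closed form, one feature)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # The fitted line passes through the point of means (x_bar, y_bar)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical toy data: years of experience -> salary (thousands of dollars)
xs = [1, 2, 3, 4]
ys = [32, 34, 36, 38]
a, b = fit_simple_linear_regression(xs, ys)
print(a, b)       # 30.0 2.0
print(a + b * 5)  # 40.0, a real-valued prediction for 5 years
```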
Recommendation: a recommender system is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item. Recommender systems are used primarily in commercial applications.

- Retailrocket: https://www.kaggle.com/retailrocket/ecommerce-dataset
  - 10 Features (4 files)
  - Target - value: property value of the item
- Netflix movies: https://www.kaggle.com/shivamb/netflix-shows
  - 12 Features
  - Target - rating: TV rating of the movie or show
- Movielens: https://www.kaggle.com/grouplens/movielens-20m-dataset#rating.csv
  - 6 Features
  - Target - movie ratings
- Amazon Product Reviews: https://www.kaggle.com/saurav9786/amazon-product-reviews#ratings_Electronics%20(1).csv
  - 3 Features
  - Target - Rating: rating of the product by the user
- TED Talks: https://www.kaggle.com/rounakbanik/ted-talks
  - 16 Features
  - Target - Rating: a stringified dictionary of the various ratings given to the talk (inspiring, fascinating, jaw dropping, etc.)
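One common way to "predict the rating a user would give to an item" is user-based collaborative filtering. A user's missing rating is estimated as a similarity-weighted average of other users' ratings for that item. This is a minimal sketch, assuming cosine similarity computed over co-rated items. The users, items and ratings are all hypothetical, and this is not necessarily the approach used in the projects above.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob":   {"m1": 5, "m2": 3, "m3": 4, "m4": 5},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}

def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_rating(user, item):
    """Similarity-weighted average of other users' ratings; None if nobody rated the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None

print(round(predict_rating("alice", "m4"), 2))  # 3.79
```

Alice's tastes match Bob's exactly on the co-rated items, so his rating of 5 pulls the prediction up more than Carol's rating of 2 pulls it down.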
Unsupervised: unsupervised learning trains a model on information that is neither classified nor labeled, letting the algorithm act on that information without guidance.

- College: https://www.kaggle.com/flyingwombat/us-news-and-world-reports-college-data#College.csv
  - 19 Features
  - No target; goal: cluster private information
- Department of justice (json): https://www.kaggle.com/jbencina/department-of-justice-20092018-press-releases/
  - 6 Features
  - No target; goal: find which words tend to occur frequently together
- Credit card fraud detection: https://www.kaggle.com/arjunbhasin2013/ccdata#CC%20GENERAL.csv
  - 18 Features
  - No target; goal: detect fraud or not
- Wines: https://www.kaggle.com/akram24/wine-pca
  - 14 Features
  - No target; goal: cluster types of wines
- World happiness: https://www.kaggle.com/unsdsn/world-happiness
  - 12 Features
  - No target; goal: determine happiness or not
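Several of the goals above, such as clustering types of wines, are clustering tasks. In these tasks the algorithm groups samples without ever seeing a label. This is a minimal sketch of k-means (Lloyd's algorithm) using only the standard library. For simplicity it assumes the first k points are the initial centroids, and the 2-feature points are hypothetical. On real data such as the wine features you would usually standardize the columns first and use a smarter initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))

def kmeans(points, k, max_iters=100):
    """Lloyd's k-means; the first k points are used as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(max_iters):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster (unchanged if empty)
        new_centroids = [
            [sum(col) / len(cl) for col in zip(*cl)] if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: centroids stopped moving
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical 2-feature samples forming two obvious groups; no labels are given
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

The cluster indices are arbitrary. Interpreting what a cluster means (for example, a kind of wine) is left to the analyst.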
Others with text (NLP): natural language processing (NLP) problems, where the input data is human-language text.

- Tweets: https://www.kaggle.com/c/nlp-getting-started/data
  - 4 Features
  - Target - target (in train.csv only): whether a tweet is about a real disaster (1) or not (0)
- Youtube: https://www.kaggle.com/datasnaek/youtube-new
  - 16 Features
  - 21 files for sentiment analysis
- Seattle hotel: https://github.com/0x6f736f646f/Seattle-Hotel-recommendation/blob/master/Data/Seattle_Hotels.csv
  - 3 Features
  - Target - desc
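Text such as the disaster tweets must be turned into numbers before a classifier can learn the 0/1 target. A common first step is a bag-of-words representation: one column per vocabulary word, holding that word's count in the document. This is a minimal standard-library sketch. It assumes a simple lowercase regex tokenizer, and the example tweets are hypothetical. Libraries such as scikit-learn offer the same idea as `CountVectorizer`.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(docs):
    """Map every distinct token in the corpus to a column index (sorted for stability)."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def bag_of_words(doc, vocab):
    """Count vector for one document; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(doc))
    # Dict iteration follows insertion order, which matches the column indices
    return [counts.get(tok, 0) for tok in vocab]

# Hypothetical tweets (not from the competition data)
docs = ["Forest fire near the town", "I love this song", "Fire fire everywhere"]
vocab = build_vocabulary(docs)
print(list(vocab))                     # ['everywhere', 'fire', 'forest', 'i', 'love', 'near', 'song', 'the', 'this', 'town']
print(bag_of_words(docs[2], vocab))    # [1, 2, 0, 0, 0, 0, 0, 0, 0, 0]
```

Each resulting vector can then serve as the input X for any of the classification approaches listed earlier.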
- Mariana Alanis - Initial work -