This repository contains the project files for Project 0 - Titanic Survival Exploration, part of Udacity's Machine Learning Nanodegree.
We are glad to partner with IIT Kharagpur as a part of the Kharagpur Winter of Code. We are proud to host this Open Source event during the winter months and we hope you have a great winter this year.
See Project Ideas here
The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.
One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.
In this problem, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.
In this optional project, you will create decision functions that attempt to predict survival outcomes from the 1912 Titanic disaster based on each passenger’s features, such as sex and age. Start with a simple algorithm and increase its complexity until you are able to accurately predict the outcomes for at least 80% of the passengers in the provided data. This project will introduce you to some of the concepts of machine learning as you start the Nanodegree program.
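As a rough illustration of what a "decision function" and its accuracy score could look like, here is a minimal sketch in plain Python. The toy records, the field names (`sex`, `age`, `survived`), and the helpers `predict_survival` and `accuracy` are assumptions made for this example; they are not taken from the project notebook or from titanic_data.csv.

```python
# Toy records for illustration only; these are NOT rows from titanic_data.csv.
passengers = [
    {"sex": "female", "age": 29.0, "survived": 1},
    {"sex": "male", "age": 40.0, "survived": 0},
    {"sex": "male", "age": 8.0, "survived": 1},
    {"sex": "female", "age": 35.0, "survived": 0},
    {"sex": "male", "age": 22.0, "survived": 0},
]


def predict_survival(passenger):
    """Hypothetical rule: predict 1 (survived) for women and for boys under 10."""
    if passenger["sex"] == "female":
        return 1
    if passenger["age"] is not None and passenger["age"] < 10:
        return 1
    return 0


def accuracy(truth, predictions):
    """Fraction of predictions that match the true outcomes."""
    if len(truth) != len(predictions):
        raise ValueError("truth and predictions must have the same length")
    correct = sum(1 for t, p in zip(truth, predictions) if t == p)
    return correct / float(len(truth))


predictions = [predict_survival(p) for p in passengers]
truth = [p["survived"] for p in passengers]
score = accuracy(truth, predictions)
print("Accuracy: {:.0%}".format(score))  # prints "Accuracy: 80%"
```

Starting from a single condition and adding one feature at a time makes it easy to see which rule actually improves the score and which only adds complexity.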
This project requires Python 2.7 and the following Python libraries installed:
You will also need to have software installed to run and execute an iPython Notebook.
Udacity recommends our students install Anaconda, a pre-packaged Python distribution that contains all of the necessary libraries and software for this project.
Template code is provided in the titanic_survival_exploration.ipynb notebook file. Additional supporting code can be found in titanic_visualizations.py. While some code has already been implemented to get you started, you will need to implement additional functionality when requested to successfully complete the project.
- Importing Data with Pandas
- Cleaning Data
- Exploring Data through Visualizations with Matplotlib
- Supervised Machine Learning Techniques:
  + Logit Regression Model
  + Plotting results
  + Support Vector Machine (SVM) using 3 kernels
  + Basic Random Forest
  + Plotting results
- K-fold cross-validation to evaluate results locally
- Output the results from the IPython Notebook to Kaggle
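The k-fold cross-validation step listed above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `k_fold_indices`, `cross_validate`, and `fit_majority_by_sex` are hypothetical helper names, the toy data is made up, and the "model" is a simple majority vote per sex rather than any of the classifiers named in the list. Rows are split in order here; real data would normally be shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(records, labels, fit, k=5):
    """Return the held-out accuracy for each of the k folds."""
    scores = []
    for test_idx in k_fold_indices(len(records), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(records)) if i not in held_out]
        # Train only on the rows outside the current fold.
        model = fit([records[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        hits = sum(1 for i in test_idx if model(records[i]) == labels[i])
        scores.append(hits / float(len(test_idx)))
    return scores


def fit_majority_by_sex(train_records, train_labels):
    """Hypothetical stand-in model: predict the majority outcome for each sex."""
    counts = {}
    for record, label in zip(train_records, train_labels):
        survived, total = counts.get(record["sex"], (0, 0))
        counts[record["sex"]] = (survived + label, total + 1)

    def model(record):
        survived, total = counts.get(record["sex"], (0, 1))
        return 1 if 2 * survived > total else 0

    return model


# Made-up toy data: alternating female/male records with matching outcomes.
records = [{"sex": "female"}, {"sex": "male"}] * 5
labels = [1, 0] * 5
scores = cross_validate(records, labels, fit_majority_by_sex, k=5)
mean_score = sum(scores) / len(scores)
```

Averaging the per-fold scores gives a local estimate of how a model might perform on unseen passengers before any results are submitted elsewhere.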
In a terminal or command window, navigate to the top-level project directory titanic_survival_exploration/ (that contains this README) and run one of the following commands:
ipython notebook titanic_survival_exploration.ipynb
jupyter notebook titanic_survival_exploration.ipynb
This will open the iPython Notebook software and project file in your browser.
The dataset used in this project is included as titanic_data.csv. This dataset is provided by Udacity and contains the following attributes:
- survival: Survival (0 = No; 1 = Yes)
- pclass: Passenger Class (1 = 1st; 2 = 2nd; 3 = 3rd)
- name: Name
- sex: Sex
- age: Age
- sibsp: Number of Siblings/Spouses Aboard
- parch: Number of Parents/Children Aboard
- ticket: Ticket Number
- fare: Passenger Fare
- cabin: Cabin
- embarked: Port of Embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
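To make the attribute codes concrete, the sketch below parses two made-up rows with Python's standard `csv` module and decodes them into typed values. It assumes the header uses exactly the attribute names listed above; the actual header in titanic_data.csv may be spelled differently, so check the file first. The sample rows, `parse_row`, and the derived `relatives_aboard` field are illustrative only. In the notebook itself, `pandas.read_csv` would be the more natural way to load the file.

```python
import csv

# Two made-up rows for illustration; these are NOT taken from titanic_data.csv.
# The header assumes the attribute names listed in this README.
SAMPLE = """survival,pclass,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked
1,1,"Doe, Mrs. Jane",female,38,1,0,T-0001,71.28,C85,C
0,3,"Roe, Mr. Richard",male,,0,0,T-0002,8.05,,S
"""

PORTS = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed Python values."""
    return {
        "survived": row["survival"] == "1",
        "pclass": int(row["pclass"]),
        "name": row["name"],
        "sex": row["sex"],
        # Empty fields become None so missing values stay visible.
        "age": float(row["age"]) if row["age"] else None,
        "relatives_aboard": int(row["sibsp"]) + int(row["parch"]),
        "fare": float(row["fare"]),
        "cabin": row["cabin"] or None,
        "port": PORTS.get(row["embarked"]),
    }


passengers = [parse_row(r) for r in csv.DictReader(SAMPLE.splitlines())]
```

Keeping missing ages and cabins as `None` rather than substituting a default makes it obvious which rows still need attention during the data-cleaning step.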
Check here Udacity Reviews
See CONTRIBUTING.md
- Coursera Lectures by Andrew Ng are not very mathematically heavy and provide a good introduction to ML algorithms.
- Stanford Lectures
- Unsupervised Learning
- Udacity Lectures (Intro to ML)
- Udacity Lectures (ML)
- Sentdex Lectures on Introduction to ML
- Udemy Lectures on ML using Python as well as R
- Udemy Course on various Data science and Machine Learning Techniques
- Machine Learning A-Z™: Hands-On Python & R In Data Science
- EdX: Learning From Data (Introductory Machine Learning)