Titanic Kaggle Competition - README

This repository contains code for participating in the Titanic competition on Kaggle. The objective of the competition is to predict whether passengers aboard the Titanic survived or not, based on various features such as age, sex, ticket class, and more.

Prerequisites

Before running the code, ensure you have the following Python libraries installed:

pandas
numpy
scikit-learn
matplotlib

You can install them using pip:

pip install pandas numpy scikit-learn matplotlib

Usage

Clone the repository to your local machine:

git clone https://github.com/your-username/titanic-kaggle.git
cd titanic-kaggle

Download the Titanic dataset (train.csv) from Kaggle, then update the path in the line of titanic.py that reads it:

import pandas as pd
df = pd.read_csv("/Users/pb/Downloads/titanic/train.csv")  # change this to the location of your train.csv

Data Preprocessing
- The "Sex" column is encoded with LabelEncoder, which converts the categories to integers in sorted order ("female" becomes 0, "male" becomes 1).
- Missing values in the "Age" column are replaced with the mean age of the dataset.
- The "Embarked", "PassengerId", "Name", "Ticket", and "Cabin" columns are dropped from the features.
- The dataset is split into training and test sets using train_test_split.
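The preprocessing and split steps above can be sketched as follows. This is not the code from titanic.py: the helper names `preprocess` and `preprocess_and_split`, `test_size=0.2`, and `random_state=42` are illustrative assumptions, and the sketch assumes the standard Kaggle train.csv columns with "Survived" as the target.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Columns the README lists as dropped from the features.
DROP_COLUMNS = ["Embarked", "PassengerId", "Name", "Ticket", "Cabin"]


def preprocess(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper (not from titanic.py): return features X and target y."""
    df = df.copy()
    # LabelEncoder sorts the labels: "female" -> 0, "male" -> 1.
    df["Sex"] = LabelEncoder().fit_transform(df["Sex"])
    # Fill missing ages with the mean of the known ages.
    df["Age"] = df["Age"].fillna(df["Age"].mean())
    # Assumption: "Survived" is the target column, as in Kaggle's train.csv.
    y = df.pop("Survived")
    X = df.drop(columns=DROP_COLUMNS)
    return X, y


def preprocess_and_split(df: pd.DataFrame, test_size: float = 0.2, random_state: int = 42):
    """Hypothetical helper: preprocess, then split. test_size and random_state are assumptions."""
    X, y = preprocess(df)
    return train_test_split(X, y, test_size=test_size, random_state=random_state)


# Usage:
# df = pd.read_csv("train.csv")
# X_train, X_test, y_train, y_test = preprocess_and_split(df)
```

On the Kaggle train.csv this leaves Pclass, Sex, Age, SibSp, Parch, and Fare as the feature columns.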
Model Selection and Training
- The model chosen for this task is a Random Forest Classifier with 100 estimators.
- The model is trained on the training data using the fit method.
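A minimal sketch of the training step, using a hypothetical helper `train_random_forest`. Note that scikit-learn only computes the out-of-bag score when the forest is created with `oob_score=True` (bootstrap sampling, the default, is also required); the fixed `random_state=42` is an assumption added for reproducibility.

```python
from sklearn.ensemble import RandomForestClassifier


def train_random_forest(X_train, y_train, random_state: int = 42) -> RandomForestClassifier:
    """Hypothetical helper (not from titanic.py): fit a 100-tree random forest."""
    model = RandomForestClassifier(
        n_estimators=100,
        oob_score=True,  # needed for model.oob_score_ to be computed
        random_state=random_state,  # assumption: fixed seed for reproducible runs
    )
    model.fit(X_train, y_train)
    return model


# model = train_random_forest(X_train, y_train)
# print("oob score:", round(model.oob_score_ * 100, 2), "%")
```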
Model Evaluation
- The accuracy of the model is computed on the test set using accuracy_score.
- The "out-of-bag" (oob) score of the Random Forest Classifier is also displayed.
- ROC-AUC score, precision, recall, and F1-score are computed with the corresponding functions from sklearn.metrics.
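The evaluation step could look roughly like this (hypothetical helper `evaluate`, not taken from titanic.py). One detail worth noting: `roc_auc_score` expects continuous scores rather than hard 0/1 predictions, so the sketch passes the predicted probability of the positive class from `predict_proba`. It assumes the model was trained with `oob_score=True`; otherwise the OOB entry is `None`.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(model, X_test, y_test) -> dict:
    """Hypothetical helper (not from titanic.py): score a fitted classifier on held-out data."""
    y_pred = model.predict(X_test)
    # Column 1 of predict_proba is the probability of model.classes_[1] (survived = 1).
    y_score = model.predict_proba(X_test)[:, 1]
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        # Only available if the forest was built with oob_score=True.
        "oob_score": getattr(model, "oob_score_", None),
        "roc_auc": roc_auc_score(y_test, y_score),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }
```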
Visualization
- A precision-recall curve is plotted to visualize the trade-off between precision and recall.
- A precision vs. recall plot is displayed to explore the relationship between the two.
- The ROC curve is plotted to show the true positive rate (sensitivity) against the false positive rate (1 - specificity).
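The three plots can be produced with `precision_recall_curve` and `roc_curve` from sklearn.metrics. This sketch rests on two assumptions: "Precision and Recall Plot" is read as precision and recall against the decision threshold, and all three plots are drawn as panels of one figure by a hypothetical `plot_curves` helper rather than as separate figures. `y_score` would be the positive-class probabilities, e.g. `model.predict_proba(X_test)[:, 1]`.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, roc_curve


def plot_curves(y_true, y_score):
    """Hypothetical helper: draw the three README plots as panels of one figure."""
    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    fpr, tpr, _ = roc_curve(y_true, y_score)
    fig, (ax_thr, ax_pr, ax_roc) = plt.subplots(1, 3, figsize=(15, 4))

    # precision/recall have one more entry than thresholds; drop the last point.
    ax_thr.plot(thresholds, precision[:-1], label="precision")
    ax_thr.plot(thresholds, recall[:-1], label="recall")
    ax_thr.set_xlabel("threshold")
    ax_thr.set_title("Precision and Recall")
    ax_thr.legend()

    ax_pr.plot(recall, precision)
    ax_pr.set_xlabel("recall")
    ax_pr.set_ylabel("precision")
    ax_pr.set_title("Precision vs. Recall")

    ax_roc.plot(fpr, tpr, label="model")
    ax_roc.plot([0, 1], [0, 1], "k--", label="chance")
    ax_roc.set_xlabel("false positive rate (1 - specificity)")
    ax_roc.set_ylabel("true positive rate (sensitivity)")
    ax_roc.set_title("ROC Curve")
    ax_roc.legend()

    fig.tight_layout()
    return fig


# fig = plot_curves(y_test, model.predict_proba(X_test)[:, 1])
# plt.show()
```

To keep a copy next to the existing images, `fig.savefig(...)` can be pointed at a file inside the plots folder; the file name is up to you.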
Running the Code

To run the code, make sure the required libraries are installed and that the path passed to pd.read_csv() points to your copy of train.csv. Then run the script:

python titanic.py

Metrics:

Accuracy: 81.01%
OOB score: 81.18%
acc_random_forest: 98.03
ROC-AUC score: 0.9970
Precision: 0.7810
Recall: 0.7052
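An F1-score is mentioned under Model Evaluation but is not listed with the metrics. Assuming the reported precision and recall come from the same set of predictions, F1 follows from them as their harmonic mean. This is derived arithmetic, not a value printed by the script:

```python
# Precision and recall as reported in the Metrics list.
precision = 0.78099173553719
recall = 0.7052238805970149

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1: {f1:.4f}")  # prints "F1: 0.7412"
```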

Plots

Precision and Recall Plot

Precision vs. Recall Plot

ROC Curve

Disclaimer

Keep in mind that this is a basic implementation, and there are many ways to improve the model's performance, such as hyperparameter tuning, feature engineering, or using different machine learning algorithms. This code serves as a starting point for your exploration in the Titanic Kaggle competition.
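For the hyperparameter tuning mentioned above, one common approach in scikit-learn is a cross-validated grid search with `GridSearchCV`. The sketch below is not part of this repository; the helper name `tune_random_forest` and the parameter grid are illustrative assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative grid (an assumption, not tuned values from this repository).
DEFAULT_GRID = {
    "n_estimators": [100, 300],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 3],
}


def tune_random_forest(X_train, y_train, param_grid=None, cv: int = 5):
    """Hypothetical helper: grid-search random forest settings with k-fold CV."""
    search = GridSearchCV(
        RandomForestClassifier(random_state=42),
        param_grid if param_grid is not None else DEFAULT_GRID,
        cv=cv,
        scoring="accuracy",
    )
    search.fit(X_train, y_train)
    # best_estimator_ is refit on all of X_train with the best parameters (refit=True default).
    return search.best_estimator_, search.best_params_
```

Fitting the search on the training split only and then scoring the returned model on the held-out test set keeps the comparison with the baseline accuracy fair.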

Feel free to explore, modify, and experiment with the code to enhance your results.

Happy coding and good luck with the competition!

Repository: Piyush-Bhor/titanic-kaggle