The repository contains a guide to Python libraries used for EDA (Exploratory data analysis):

1) NumPy:

See folder with notebooks here

Find article here

2) Pandas:

See folder with notebooks here

Find article here

3) Matplotlib:

See folder with notebooks here

Find article here

4) Seaborn:

See folder with notebooks here

Find article here

5) Plotly and Cufflinks:

See folder with notebooks here

Find article here

6) Geographical plots:

See folder with notebooks here

Find article here

The repository contains EDA (Exploratory data analysis) models for the following datasets:

1) Haberman's Survival Dataset:

A simple dataset containing cases from a study conducted between 1958 and 1970 at the University of Chicago's Billings Hospital on the survival of patients who had undergone surgery for breast cancer.

Number of Instances: 306

Number of Attributes: 4

Attribute Information:

1) Age of patient at time of operation (Age)

2) Patient's year of operation (Op_Year)

3) Number of positive axillary nodes detected (axil_nodes)

4) Survival status (class attribute): 1 = the patient survived 5 years or longer; 2 = the patient died within 5 years (Surv_status)
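A minimal pandas sketch for labelling the raw table with the attribute names above and comparing the two survival classes. It assumes the raw file has no header row and stores the columns in the order listed above (as in the UCI copy); the file name haberman.csv and the helper names label_haberman and summarize_by_status are hypothetical, not taken from the notebook.

```python
import pandas as pd

# Column names follow the attribute list above; the column order in the raw
# file is assumed to match that list.
HABERMAN_COLUMNS = ["Age", "Op_Year", "axil_nodes", "Surv_status"]
STATUS_LABELS = {1: "survived_5y_or_more", 2: "died_within_5y"}


def label_haberman(raw: pd.DataFrame) -> pd.DataFrame:
    """Attach column names and readable class labels to the raw 4-column table."""
    if raw.shape[1] != len(HABERMAN_COLUMNS):
        raise ValueError(f"expected 4 columns, got {raw.shape[1]}")
    df = raw.copy()
    df.columns = HABERMAN_COLUMNS
    # Replace the 1/2 class codes so plots and group-bys are self-explanatory.
    df["Surv_status"] = df["Surv_status"].map(STATUS_LABELS)
    return df


def summarize_by_status(df: pd.DataFrame) -> pd.DataFrame:
    """Per-class patient count plus median age and median positive node count."""
    return df.groupby("Surv_status").agg(
        count=("Age", "size"),
        median_age=("Age", "median"),
        median_nodes=("axil_nodes", "median"),
    )


# Usage (hypothetical file name; header=None because no header row is assumed):
# df = label_haberman(pd.read_csv("haberman.csv", header=None))
# df.shape should then be (306, 4), matching the counts above.
# print(summarize_by_status(df))
```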

See notebook here

2) Iris Flower Dataset:

It is a simple dataset of three species of Iris flower, namely setosa, virginica and versicolor.

It contains 50 samples of each species, along with measurements of each flower's sepals and petals. One species (setosa) is linearly separable from the other two, but the other two are not linearly separable from each other.

The columns in this dataset are:

1) Id

2) SepalLengthCm

3) SepalWidthCm

4) PetalLengthCm

5) PetalWidthCm

6) Species
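Whether one species is linearly separable can be checked informally with pandas: if a species' range on a single feature does not overlap any other species' range, one threshold on that feature separates it. This is a sufficient but not necessary condition, so a False result does not rule out separability using several features together. The helper names below are hypothetical, and the file name Iris.csv in the usage note is an assumption.

```python
import pandas as pd


def feature_ranges(df: pd.DataFrame, feature: str, label: str = "Species") -> pd.DataFrame:
    """Min and max of one feature for each species."""
    return df.groupby(label)[feature].agg(["min", "max"])


def separable_on_feature(df: pd.DataFrame, feature: str, target: str,
                         label: str = "Species") -> bool:
    """True if no other species' range on `feature` overlaps the target's range.

    Non-overlapping ranges on one feature are sufficient (not necessary) for
    linear separability: a single threshold on that feature splits them.
    """
    ranges = feature_ranges(df, feature, label)
    lo, hi = ranges.loc[target, "min"], ranges.loc[target, "max"]
    others = ranges.drop(index=target)
    # Two closed intervals overlap when each starts before the other ends.
    overlaps = (others["min"] <= hi) & (others["max"] >= lo)
    return not overlaps.any()


# Usage (hypothetical file name; species strings assumed to match the file,
# e.g. "Iris-setosa"):
# iris = pd.read_csv("Iris.csv")
# separable_on_feature(iris, "PetalLengthCm", "Iris-setosa")
```

On the standard Iris measurements the setosa check on PetalLengthCm is expected to return True, while the same check for versicolor returns False because its petal lengths overlap those of virginica.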

See notebook here

3) Pyramid Scheme

The foundation of a pyramid scheme is a tree structure with a member joined at every node. Members earn money through a commission-based scheme that extracts money from the members joined at their child nodes.

Consider the following variables:

1) cost_price: cost price of the product

2) profit_markup: the profit markup the company aims to achieve, for example three times the cost_price.

3) selling_price: the amount a potential customer pays to become a member of the scheme.

selling_price = profit_markup*cost_price

4) sales_commision: what a member earns when they add another member to the chain. The payment is cumulative: every member above the new leaf node earns the commission.

5) result: the net profit for the company on each sale

result = selling_price - cost_price - (depth_of_tree-1)*sales_commision
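The model can be sketched directly in Python. The variable names follow the list above (including the original spelling sales_commision), and depth_of_tree is assumed to count from 1 at the top of the chain, so a sale at depth d pays out d - 1 commissions. The figures in the usage comment are made up for illustration only.

```python
def selling_price(cost_price: float, profit_markup: float) -> float:
    """selling_price = profit_markup * cost_price"""
    return profit_markup * cost_price


def net_profit(cost_price: float, profit_markup: float,
               sales_commision: float, depth_of_tree: int) -> float:
    """result = selling_price - cost_price - (depth_of_tree - 1) * sales_commision

    Each of the depth_of_tree - 1 members above the new leaf is paid one
    sales_commision, so deeper sales leave the company less margin.
    """
    if depth_of_tree < 1:
        raise ValueError("depth_of_tree is assumed to start at 1 (top of the chain)")
    return (selling_price(cost_price, profit_markup)
            - cost_price
            - (depth_of_tree - 1) * sales_commision)


# Illustrative numbers only: cost 10, markup 3, commission 5.
# [net_profit(10, 3, 5, d) for d in range(1, 8)] -> [20, 15, 10, 5, 0, -5, -10]
```

Each extra level of depth removes one more commission from the company's margin, which is why the profit falls linearly with depth.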

Under this model, the total sales commission paid out on a sale grows as the depth of the chain's tree increases. Once the net profit becomes negative, it no longer makes economic sense for the company to invest in that chain, so the pilot will instead try to spin off another chain below them to keep the chain profitable.
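Setting result >= 0 in the net profit formula and solving for the depth gives the deepest level at which a chain still pays for itself: depth_of_tree <= 1 + (selling_price - cost_price) / sales_commision. A minimal sketch of that calculation, reusing the variable names from the list above, assuming a positive commission and depths counted from 1; the function name max_profitable_depth is hypothetical.

```python
import math


def max_profitable_depth(cost_price: float, profit_markup: float,
                         sales_commision: float) -> int:
    """Deepest level at which a sale still leaves non-negative profit.

    From result = selling_price - cost_price - (d - 1) * sales_commision >= 0:
        d <= 1 + (selling_price - cost_price) / sales_commision
    Returns 0 when even a sale at the top of the chain loses money.
    """
    if sales_commision <= 0:
        raise ValueError("sales_commision must be positive")
    margin = profit_markup * cost_price - cost_price  # selling_price - cost_price
    if margin < 0:
        return 0
    return 1 + math.floor(margin / sales_commision)


# Illustrative numbers only: cost 10, markup 3, commission 5 gives depth 5,
# where the profit is exactly 0; one level deeper it turns negative.
```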

See notebook here

4) Titanic Dataset

The data has been split into two groups:

1) training set (train.csv)

2) test set (test.csv)
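A quick first look at the two splits with pandas. It assumes the Kaggle column names (for example Survived rather than survival); in that release the survival label is present only in train.csv, which the helper reports. The function name describe_split is illustrative.

```python
import pandas as pd


def describe_split(train: pd.DataFrame, test: pd.DataFrame,
                   target: str = "Survived") -> dict:
    """Row counts, whether the label is in the test split, and missing values in train."""
    missing = train.isna().sum()
    return {
        "train_rows": len(train),
        "test_rows": len(test),
        # The label column is assumed to be absent from test.csv.
        "target_in_test": target in test.columns,
        # Only columns with at least one missing value are reported.
        "train_missing": missing[missing > 0].to_dict(),
    }


# Usage:
# train = pd.read_csv("train.csv")
# test = pd.read_csv("test.csv")
# print(describe_split(train, test))
```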

The attributes are:

1) Pclass: Passenger class (1 = 1st; 2 = 2nd; 3 = 3rd)

2) survival: Survival (0 = No; 1 = Yes)

3) name: Name

4) sex: Sex

5) age: Age

6) sibsp: Number of siblings/spouses aboard

7) parch: Number of parents/children aboard

8) ticket: Ticket number

9) fare: Passenger fare (British pounds)

10) cabin: Cabin

11) embarked: Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton)
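Two of the attributes above lend themselves to simple derived features: the port codes can be mapped to names, and sibsp and parch can be combined into a family size. The sketch below assumes the Kaggle column spellings (Pclass, Sex, SibSp, Parch, Embarked, Survived); the EmbarkedPort and FamilySize columns and the helper names are hypothetical additions, not part of the dataset.

```python
import pandas as pd

# Port codes as given in the attribute list above.
EMBARKED_NAMES = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}


def add_readable_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with the port name and a family-size count added."""
    out = df.copy()
    # Unknown or missing codes become NaN.
    out["EmbarkedPort"] = out["Embarked"].map(EMBARKED_NAMES)
    # The passenger plus siblings/spouses plus parents/children aboard.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    return out


def survival_rate(df: pd.DataFrame, by, target: str = "Survived") -> pd.Series:
    """Fraction of passengers with target == 1 in each group (0 = No, 1 = Yes)."""
    return df.groupby(by)[target].mean()


# Usage:
# train = add_readable_features(pd.read_csv("train.csv"))
# print(survival_rate(train, ["Pclass", "Sex"]))
```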

See notebook here

jayashree8/Machine_learning_EDA

Folders and files

Latest commit

History

Repository files navigation

The repository contains a guide to Python libraries used for EDA (Exploratory data analysis):

1) NumPy:

2) Pandas:

3) Matplotlib:

4) Seaborn:

5) Plotly and Cufflinks:

6) Geographical plots:

The repository contains EDA (Exploratory data analysis) models for the following datasets:

1) Haberman's Survival Dataset:

A simple dataset contaning cases from a study conducted between 1958 and 1970. The study was conducted at the University of Chicago's Bilings Hospital. The study is on the survival of patients who had undergone surgery for breast cancer.

Number of Instances: 306

Number of Attributes: 4

Attribute Information:

2) Iris Flower Dataset:

Its a simple dataset of 3 flowers of Iris species namely setosa, virginica and versicolor.

It includes three iris species with 50 samples each as well as some properties about each flower. One flower species is linearly separable from the other two, but the other two are not linearly separable from each other.

The columns in this dataset are:

3) Pyramid Scheme

Foundation of a pyramid scheme is to create a tree structure and joining members at every node. Members earn money through a commission based project which extracts money from the members who are joined at the child nodes.

Consider the following variables

Based on the above model, the net sales commission increases with the depth of tree for a chain increases. If the profit amount is negative, it will not make economic sense for the company to invest in that chain. And pilot will try to spin off another chain below him to keep the chain profitable.

4) Titanic Dataset

The data has been split into two groups:

The attritubes are:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages