Decision Tree


A decision tree is an algorithm that classifies data by forming a tree structure. The tree is made of nodes, each with a rule associated with it. The idea is to find the attribute of the dataset that best classifies it. When such an attribute is found, the tree is split into several sub-trees, one for each possible value of the attribute (the approach taken in this implementation). Each sub-tree is then treated as a tree of its own: a new attribute that best classifies the remaining data is chosen, and so on. Splitting stops when there are no attributes left to split on, or when the tree has reached its maximum depth (a customizable parameter) and further splitting would lead to overfitting.
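A rough sketch of this recursive splitting procedure in Python is shown below. The row-of-dicts data format, the `label` key, and the `choose_attribute` parameter are assumptions made for this sketch, not this repository's actual code; a concrete scoring function based on information gain is sketched after the list below.

```python
from collections import Counter

def build_tree(rows, attributes, choose_attribute, depth=0, max_depth=5):
    """Recursively split `rows` (dicts with a 'label' key) into a tree.

    `choose_attribute(rows, attributes)` picks the attribute that best
    classifies the data, e.g. by information gain (sketched below)."""
    labels = [row["label"] for row in rows]
    # Stop splitting when the node is pure, no attributes are left,
    # or the tree has grown too deep (further splits risk overfitting).
    if len(set(labels)) == 1 or not attributes or depth >= max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    best = choose_attribute(rows, attributes)
    node = {"attribute": best, "branches": {}}
    # One sub-tree per possible value of the chosen attribute.
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        node["branches"][value] = build_tree(subset, remaining,
                                             choose_attribute,
                                             depth + 1, max_depth)
    return node
```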

To split a tree, we need a way to decide which attribute best classifies the data. For this we need the following functions (both are sketched in code after the list):

  • Shannon Entropy. Shannon entropy is used to measure the homogeneity of an attribute, i.e. whether the data is divided evenly or not. The Shannon entropy of an attribute is given by $ H(A) = - \sum_i{p_i \log_2 p_i} $. The entropy of attribute $A$ after splitting on attribute $B$ is given by $ H(A, B) = \sum_{i \in B}{p_i \, H(A_i)} $, where $p_i$ is the proportion of samples taking value $i$ of $B$ and $H(A_i)$ is the entropy of the corresponding subset.
  • Information Gain (Discriminative Power). Information gain determines which attribute of the dataset classifies it best: the higher the gain, the better the split. It is given by $ G(A, B) = H(A) - H(A, B) $, i.e. the reduction in entropy achieved by splitting on $B$.
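A minimal sketch of both functions, assuming the same row-of-dicts format with a `label` key as above. The dataset in the usage example is made up for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """H(A) = -sum_i p_i * log2(p_i) over the label proportions."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(rows, attribute):
    """H(A, B) = sum over values i of B of p_i * H(A_i): the entropy of
    the labels after splitting on `attribute`, weighted by subset size."""
    total = len(rows)
    result = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row["label"] for row in rows if row[attribute] == value]
        result += (len(subset) / total) * entropy(subset)
    return result

def information_gain(rows, attribute):
    """G(A, B) = H(A) - H(A, B): the reduction in label entropy
    achieved by splitting on `attribute`."""
    labels = [row["label"] for row in rows]
    return entropy(labels) - conditional_entropy(rows, attribute)

# Toy usage: pick the attribute with the highest information gain.
rows = [
    {"outlook": "sunny",    "windy": "true",  "label": "no"},
    {"outlook": "sunny",    "windy": "false", "label": "no"},
    {"outlook": "rainy",    "windy": "true",  "label": "no"},
    {"outlook": "overcast", "windy": "false", "label": "yes"},
    {"outlook": "rainy",    "windy": "false", "label": "yes"},
]
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, a))
# "outlook" wins here (gain ~0.57 vs ~0.42 for "windy").
```

Passing `lambda rows, attrs: max(attrs, key=lambda a: information_gain(rows, a))` as the `choose_attribute` argument of the `build_tree` sketch above ties the two pieces together.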

Implementations: