Algorithms and Data Structures for Data Science and Machine Learning

volkansonmez/Algorithms_and_Data_Structures-1

Volkan Sonmez's Machine Learning Projects

© 2018 - current, Volkan Sonmez, www.pythonicfool.com

This is a repository of teaching materials, code, and data for my data analysis and machine learning projects.

Each repository will (usually) correspond to one of the posts on my website.

You are free to:

  • Share: copy and redistribute the material in any medium or format
  • Adapt: remix, transform, and build upon the material

Under the following terms:

  • Attribution: You must give appropriate credit (mentioning that your work is derived from work that is © Volkan Sonmez and, where practical, linking to http://www.pythonicfool.com/), and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Algorithms and Data Structures for Data Science and Machine Learning

  • Supervised Learning:

Decision Tree:

A decision tree that splits the data on its best attributes. All attributes should be numerical; any that are not should be converted to numerical values and standardized before the tree is trained. At each node, the Gini index of the parent is compared with the weighted Gini index of the candidate child nodes, and the split that gives the largest reduction in impurity (the largest gain) is chosen.
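
The split criterion described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the names `gini` and `best_split` are hypothetical, the data is made up, and splits are assumed to be of the form "feature value <= threshold".

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature_index, threshold) for the split with the largest Gini gain.

    Assumption for this sketch: rows are lists of numerical features and a
    split sends a row left when row[feature] <= threshold.
    """
    parent = gini(labels)
    best = (0.0, None, None)
    n = len(rows)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - weighted
            if gain > best[0]:
                best = (gain, feature, threshold)
    return best


# Toy data (made up): feature 0 separates the two classes perfectly.
rows = [[2.0, 1.0], [3.0, 1.5], [7.0, 1.2], [8.0, 0.9]]
labels = ["a", "a", "b", "b"]
gain, feature, threshold = best_split(rows, labels)
print(gain, feature, threshold)  # 0.5 0 3.0
```

A full tree would apply `best_split` recursively to the left and right subsets until a stopping rule (for example a maximum depth or a pure node) is reached.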

Random Forest:

A Random Forest is built from n decision trees, which are the building blocks of the algorithm. The number of trees, the portion of the data used to train each tree, and the Gini setting are tuned to find the combination that gives the best accuracy.
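
The idea of training several trees on portions of the data and combining their votes can be sketched as below. To keep the example short, each "tree" is a depth-1 decision stump, which is a stand-in chosen for this sketch and not what the repository does. All names (`fit_stump`, `fit_forest`, `predict_forest`) and parameters (`n_trees`, `sample_ratio`, `seed`) are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Depth-1 tree (stand-in for a full decision tree): lowest weighted Gini split."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no valid split: always predict the majority class
        m = majority(labels)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label


def fit_forest(rows, labels, n_trees=25, sample_ratio=0.8, seed=0):
    """Train n_trees stumps, each on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    k = max(1, int(sample_ratio * len(rows)))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(k)]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees in the forest."""
    return majority([predict_stump(stump, row) for stump in forest])


# Toy data (made up): one feature, two well-separated classes.
rows = [[1.0], [1.2], [0.8], [1.1], [5.0], [5.2], [4.9], [5.1]]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
forest = fit_forest(rows, labels)
print(predict_forest(forest, [0.5]), predict_forest(forest, [6.0]))
```

Tuning would then mean repeating the fit for different values of `n_trees` and `sample_ratio` and comparing accuracy on held-out data.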

K-Nearest-Neighbor (KNN):

The K-Nearest Neighbors (KNN) algorithm for both classification and regression problems.
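
A minimal sketch of both uses follows, assuming Euclidean distance, a majority vote for classification, and an unweighted mean for regression. The function names and data are illustrative only and are not taken from the repository.

```python
import math
from collections import Counter


def k_nearest(train_X, train_y, query, k):
    """Targets of the k training points closest to `query` (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return [train_y[i] for i in order[:k]]


def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest neighbors."""
    return Counter(k_nearest(train_X, train_y, query, k)).most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Regression: mean target value of the k nearest neighbors."""
    neighbors = k_nearest(train_X, train_y, query, k)
    return sum(neighbors) / len(neighbors)


# Toy data (made up): two groups of three points each.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [6, 5], [5, 6]]
classes = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_classify(X, classes, [0.5, 0.5]))  # a
print(knn_regress(X, values, [5.5, 5.5]))    # 11.0
```

Because KNN relies on distances, features on very different scales are usually standardized first.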

  • Unsupervised Learning:

K-means & K-means++:

First Part: K-means++ algorithm to initialize the centroids. Each new centroid is sampled from the data points with a probability weighted by its distance from the centroids already chosen, so the initial centroids tend to be spread apart.

Second Part: K-means algorithm to cluster the data into groups.
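
The two parts can be sketched together as below. This is an illustrative version under stated assumptions: the weighting uses the squared distance to the nearest chosen centroid (the standard K-means++ rule, which may differ from the repository's weighting), the data contains at least k distinct points, and the names `kmeans_pp_init` and `kmeans` are hypothetical.

```python
import math
import random


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: spread the initial centroids apart."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight of each point: squared distance to its nearest chosen centroid
        # (assumption: standard K-means++ weighting).
        weights = [min(math.dist(p, c) for c in centroids) ** 2 for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, n_iter=20, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_centroid(p, centroids)].append(p)
        # New centroid = mean of its cluster; an empty cluster keeps its old centroid.
        centroids = [
            [sum(col) / len(cluster) for col in zip(*cluster)] if cluster else centroids[j]
            for j, cluster in enumerate(clusters)
        ]
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels


# Toy data (made up): two well-separated groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The cluster indices themselves are arbitrary (they depend on which centroid was seeded first); only the grouping is meaningful.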

  • Data Structures:

Linked List:

Singly linked list with 20+ methods, plus its FIFO (queue) and LIFO (stack) applications.
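
As a rough illustration of the FIFO and LIFO applications, the sketch below implements only three operations. The class and method names (`SinglyLinkedList`, `push_front`, `push_back`, `pop_front`) are assumptions for this example and are not the repository's 20+ methods.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class SinglyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.size = 0

    def push_front(self, value):
        """Insert at the head in O(1)."""
        self.head = Node(value, self.head)
        if self.tail is None:
            self.tail = self.head
        self.size += 1

    def push_back(self, value):
        """Insert at the tail in O(1), using the tail pointer."""
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node
        self.size += 1

    def pop_front(self):
        """Remove and return the head value in O(1)."""
        if self.head is None:
            raise IndexError("pop from empty list")
        node = self.head
        self.head = node.next
        if self.head is None:
            self.tail = None
        self.size -= 1
        return node.value

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


# FIFO (queue): add at the back, remove from the front.
queue = SinglyLinkedList()
for item in (1, 2, 3):
    queue.push_back(item)
fifo_order = [queue.pop_front() for _ in range(3)]
print(fifo_order)  # [1, 2, 3]

# LIFO (stack): add at the front, remove from the front.
stack = SinglyLinkedList()
for item in (1, 2, 3):
    stack.push_front(item)
lifo_order = [stack.pop_front() for _ in range(3)]
print(lifo_order)  # [3, 2, 1]
```

Both applications use only O(1) operations, which is why a singly linked list with a tail pointer is enough for a queue and a stack.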

Doubly Linked List:

Doubly linked list with several methods.
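
A short sketch of what the extra `prev` pointer buys: a node can be removed in O(1) given a reference to it, and the list can be walked in both directions. The class and method names here are hypothetical and are not taken from the repository.

```python
class DNode:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        """Add at the tail and return the new node."""
        node = DNode(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            node.prev = self.tail
            self.tail.next = node
            self.tail = node
        return node

    def prepend(self, value):
        """Add at the head and return the new node."""
        node = DNode(value)
        if self.head is None:
            self.head = self.tail = node
        else:
            node.next = self.head
            self.head.prev = node
            self.head = node
        return node

    def remove(self, node):
        """Unlink `node` in O(1); it must belong to this list."""
        if node.prev is None:
            self.head = node.next
        else:
            node.prev.next = node.next
        if node.next is None:
            self.tail = node.prev
        else:
            node.next.prev = node.prev
        node.prev = node.next = None

    def forward(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next

    def backward(self):
        node = self.tail
        while node is not None:
            yield node.value
            node = node.prev


dll = DoublyLinkedList()
dll.append(1)
middle = dll.append(2)
dll.append(3)
dll.prepend(0)
dll.remove(middle)
print(list(dll.forward()))   # [0, 1, 3]
print(list(dll.backward()))  # [3, 1, 0]
```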

Binary Search Tree:

Binary search tree with add node, BFS, DFS, tree validation, delete node, find node, and search node methods.
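
The operations listed above could look roughly like the following. This is a compact sketch written as free functions over a `Node` class; the repository's own method names and structure may differ, and duplicate keys are assumed to be ignored on insert.

```python
from collections import deque


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Add a key and return the (possibly new) root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def find(root, key):
    """Return the node holding `key`, or None if it is absent."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root


def delete(root, key):
    """Remove `key` (if present) and return the new root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        successor = root.right  # smallest key in the right subtree
        while successor.left is not None:
            successor = successor.left
        root.key = successor.key
        root.right = delete(root.right, successor.key)
    return root


def bfs(root):
    """Keys in level order (breadth-first)."""
    order, queue = [], deque([root] if root else [])
    while queue:
        node = queue.popleft()
        order.append(node.key)
        queue.extend(child for child in (node.left, node.right) if child)
    return order


def dfs_inorder(root):
    """Keys in sorted order (depth-first, in-order)."""
    if root is None:
        return []
    return dfs_inorder(root.left) + [root.key] + dfs_inorder(root.right)


def is_valid(root, low=None, high=None):
    """Check that every key lies strictly between the bounds set by its ancestors."""
    if root is None:
        return True
    if low is not None and root.key <= low:
        return False
    if high is not None and root.key >= high:
        return False
    return is_valid(root.left, low, root.key) and is_valid(root.right, root.key, high)


root = None
for key in (8, 3, 10, 1, 6, 14):
    root = insert(root, key)
print(bfs(root))          # [8, 3, 10, 1, 6, 14]
print(dfs_inorder(root))  # [1, 3, 6, 8, 10, 14]
root = delete(root, 3)
print(bfs(root))          # [8, 6, 10, 1, 14]
print(is_valid(root))     # True
```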

Graph + Priority Heap with Graph Traversal Algorithms:

Dijkstra's algorithm, a priority heap (heapq), Dijkstra's algorithm improved with heapq, and the Bellman-Ford algorithm.
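
A minimal sketch of the heap-based Dijkstra variant and of Bellman-Ford is shown below. It assumes the graph is a dict mapping every node to a list of `(neighbor, weight)` pairs; that representation, the function names, and the example graphs are choices made for this illustration, not the repository's.

```python
import heapq


def dijkstra(graph, source):
    """Shortest distances from `source`; requires non-negative edge weights."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry: a shorter path to u was already found
        for v, w in graph[u]:
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def bellman_ford(graph, source):
    """Shortest distances from `source`; negative edge weights are allowed."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    for _ in range(len(graph) - 1):
        updated = False
        for u in graph:
            for v, w in graph[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    updated = True
        if not updated:
            break  # distances have converged early
    # One more pass: any further improvement means a reachable negative cycle.
    for u in graph:
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                raise ValueError("graph contains a negative-weight cycle")
    return dist


# Example graphs (made up).
graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}
print(dijkstra(graph, "A"))      # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
print(bellman_ford(graph, "A"))  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}

negative = {"A": [("B", 4), ("C", 5)], "B": [("C", -2)], "C": []}
print(bellman_ford(negative, "A"))  # {'A': 0, 'B': 4, 'C': 2}
```

With a binary heap, Dijkstra runs in O((V + E) log V) instead of the O(V^2) of a linear scan for the closest unvisited node; Bellman-Ford is slower at O(V * E) but handles negative weights.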