Modified scripts from the ML course 02450 at the Technical University of Denmark. The scripts have been adapted for custom use (e.g. automating various processes and using pandas DataFrames rather than NumPy arrays). See the topics under About, and browse the figures folder for a quick overview of the data analysis and visualization.
The course is built in three phases:
- Data: Feature extraction and visualization
- Supervised learning: Classification and regression
- Unsupervised learning: Clustering and density estimation
All figures computed by the scripts in the scripts folder are saved in the figures folder. Each figure is named after the script that produced it, and the figures are, for now, in chronological order with respect to the ML course 02450 at the Technical University of Denmark.
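The naming convention above can be sketched as a small helper. This is only an illustration, not the code used in the scripts: the function name `figure_path`, its parameters, and the example script name `ex2_1_1.py` are all hypothetical.

```python
from pathlib import Path


def figure_path(script_file, figures_dir="figures", suffix="", ext="png"):
    """Build an output path in figures_dir named after the given script.

    All names here are hypothetical; the real scripts may do this differently.
    """
    stem = Path(script_file).stem  # script name without directory or ".py"
    name = f"{stem}_{suffix}" if suffix else stem  # optional tag for multiple figures
    return Path(figures_dir) / f"{name}.{ext}"


# Example with a hypothetical script name
print(figure_path("scripts/ex2_1_1.py").as_posix())
```

Inside a script, a call such as `plt.savefig(figure_path(__file__))` would then write the figure under the script's own name, provided the figures folder exists.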
- Rename scripts to mirror content
The apriori method included in the Tools is taken from http://www.borgelt.net/apriori.html; for details of the algorithm, see also http://www.borgelt.net/doc/apriori/apriori.
- matplotlib (imshow, ...)
- os, sys
- pandas, numpy
- scipy
- sklearn (preprocessing, metrics, decomposition, ...)
Description of the datasets in the Data folder:
- body.mat A subset of the dataset on body dimensions. Described in G. Heinz, L. J. Peterson, R. W. Johnson, and C. J. Kerk, “Exploring relationships in body dimensions,” Journal of Statistics Education, vol. 11, no. 2, 2003.
- faithful.mat and faithful.txt Dataset on eruptions of the Old Faithful geyser, described in A. Azzalini and A. Bowman, “A look at some data on the old faithful geyser,” Applied Statistics, pp. 357–365, 1990, and in W. Härdle, Smoothing techniques: with implementation in S. Springer, 1991.
- female.txt and male.txt
- iris.xls Fisher's Iris data (see description)
- nanonose.xls This data has been taken from the nanonose project, which is described in T. S. Alstrøm, J. Larsen, C. H. Nielsen, and N. B. Larsen, “Data-driven modeling of nano-nose gas sensor arrays,” in SPIE Defense, Security, and Sensing. International Society for Optics and Photonics, 2010, pp. 76 970U–76 970U.
- StopWords A text file listing common words, provided in the TMG toolbox.
- textDocs.txt This example of documents for a term-document matrix is taken from L. Eldén, Matrix Methods in Data Mining and Pattern Recognition. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2007.
- Wine.mat and Wine2.mat P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547–553, 2009. Wine2 is the same as Wine but with some outliers removed.
- zipdata.mat and digits.mat USPS handwritten digits; see also J. J. Hull, “A database for handwritten text recognition research,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 16, no. 5, pp. 550–554, 1994.
- wildfaces.mat and wildfaces_grayscale.mat Described in Tamara L. Berg, Alexander C. Berg, Jaety Edwards, and David A. Forsyth, Neural Information Processing Systems (NIPS), 2004. wildfaces.mat is an extract of 1000 examples from the original dataset, and wildfaces_grayscale.mat is a grayscale version of the same 1000 examples.
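Since the scripts work with pandas rather than NumPy arrays, one of the .mat files above could be loaded roughly as follows. This is a minimal sketch, not the repository's own loader: the helper name `mat_to_frame` is hypothetical, and the variable names `X` and `attributeNames` are assumptions about what is stored inside a given file. `scipy.io.whosmat(path)` lists the variables a file actually contains.

```python
import numpy as np
import pandas as pd
from scipy.io import loadmat


def mat_to_frame(path, data_key="X", names_key="attributeNames"):
    """Load one matrix from a MATLAB .mat file into a pandas DataFrame.

    data_key and names_key are assumed variable names; check the file
    with scipy.io.whosmat(path) before relying on them.
    """
    mat = loadmat(path)
    data = np.asarray(mat[data_key])  # loadmat returns matrices as 2-D arrays
    columns = None
    if names_key in mat:
        # Cell arrays of strings load as nested arrays; unwrap each entry.
        names = [str(np.asarray(n).item()) for n in np.ravel(mat[names_key])]
        if len(names) == data.shape[1]:
            columns = names  # only use names when they match the column count
    return pd.DataFrame(data, columns=columns)
```

For example, `mat_to_frame("Data/body.mat")` would return a DataFrame with one row per observation, assuming the file stores its data matrix under `X`; otherwise pass the actual variable name as `data_key`.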