micheleandreucci/Data-Mining-2

DM 2 Project

Project for Data Mining 2, academic year (A.A.) 2020/2021

Dataset

link: https://github.com/mdeff/fma

The FMA dataset (A Dataset For Music Analysis) contains 106,574 tracks (objects), each described by 53 attributes. These include useful information such as the license type, the interest in the track, details of the album, and the album's creation date.

Files:

Below is the list of files along with their purposes.

  • Advanced techniques of clustering: On a dataset already prepared for one of the previous tasks, run at least one clustering algorithm (e.g., X-Means, Bisecting K-Means, OPTICS). Discuss the results by analyzing the clusters and reporting internal validation measures (e.g., SSE, silhouette).
  • Transactional clustering: Using categorical features, or turning a dataset with continuous variables into one with categorical variables (e.g., by binning), run at least one clustering algorithm (e.g., K-Modes, ROCK). Discuss the results by analyzing the clusters and reporting internal validation measures (e.g., SSE, silhouette).
  • Sequential Pattern Mining: Convert the time series into a discrete format (e.g., by using SAX) and extract the most frequent sequential patterns (of length at least 3 or 4) using different values of support, then discuss the most interesting sequences.
  • Time Series Analysis
  • Advanced Classification Methods: Naive Bayes Classifier, Logistic Regression, Rule-based Classifiers, Support Vector Machines, Neural Networks, Ensemble Methods. Evaluate each classifier with the following metrics: accuracy, precision, recall, F1-score, and ROC curve.
  • Imbalanced Learning and Anomaly Detection
  • Explainability.ipynb: Use one or more explanation methods (e.g., LIME, LORE, SHAP) to illustrate the reasons for the classification in one of the previous tasks.
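
As an illustration of the discretization step mentioned under Sequential Pattern Mining, the following is a minimal sketch of SAX using only the Python standard library. It is not taken from the project notebooks: the function name `sax`, its parameters, and the default four-letter alphabet are assumptions made for this example, and the series length is assumed to be divisible by the number of segments.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word (illustrative sketch).

    `sax`, `n_segments`, and `alphabet` are hypothetical names chosen
    for this example, not identifiers from the project.
    """
    n = len(series)
    if n_segments <= 0 or n % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")

    # 1. Z-normalize the series (a constant series maps to all zeros).
    mu, sigma = mean(series), pstdev(series)
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]

    # 2. Piecewise Aggregate Approximation: mean of each equal-size segment.
    size = n // n_segments
    paa = [mean(z[i * size:(i + 1) * size]) for i in range(n_segments)]

    # 3. Breakpoints that split the standard normal into equiprobable bins.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # 4. Map each segment mean to the symbol of the bin it falls into.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


print(sax([1, 2, 3, 4, 5, 6, 7, 8], 4))              # abcd
print(sax([0, 0, 0, 0, 10, 10, 10, 10], 2, "ab"))    # ab
```

The resulting words can then be fed to a sequential pattern mining algorithm. A larger alphabet or more segments preserves more detail of the original series at the cost of sparser patterns.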