The project is built on a dataset prepared by Jorge L. Reyes-Ortiz, which contains samples from smartphone gyroscopes and accelerometers, together with labels corresponding to one of six user states: WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING. Link to the dataset: UCI HAR Dataset
My goal was to use this data and find the model that makes the best decisions based on it. That is why I tested an SVM model (to see how good the dataset creators' idea of using it was) and Logistic Regression, checking which one fits and performs better at determining the physical state of the device user from the input stream of data.
Author's article: article
Information about the dataset can be found in the UCI HAR Dataset folder.
- SVC
- LogisticRegression
- PCA
- StandardScaler
- GridSearchCV, RandomizedSearchCV
- Pipeline
- OneVsRestClassifier
- roc_curve, label_binarize
all from the scikit-learn library. I also used the NumPy and Matplotlib libraries.
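A minimal sketch of how these pieces fit together, assuming the standard layout of the UCI HAR Dataset folder; the kernel and the parameter grid shown here are illustrative, not the exact values searched in the project:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Preprocessed training features and activity labels shipped with the dataset.
X = np.loadtxt("UCI HAR Dataset/train/X_train.txt")
y = np.loadtxt("UCI HAR Dataset/train/y_train.txt", dtype=int)

pipe = Pipeline([
    ("scaler", StandardScaler()),    # zero mean, unit variance per feature
    ("pca", PCA(n_components=24)),   # reduce to 24 dimensions
    ("svc", SVC(kernel="rbf")),
])

# Illustrative grid; a RandomizedSearchCV can be swapped in the same way.
param_grid = {"svc__C": [1, 10, 100], "svc__gamma": ["scale", 0.01]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```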
- First, I loaded the sets; the one with the model input data I decomposed using PCA.
- Then I plotted the data decomposed to 24 dimensions together with their labels.
- Next, I trained the SVC model on that data in a few configurations, using cross-validation.
- On the same data I trained a Logistic Regression model.
- Finally, I produced a classification report for the two best models (a rough sketch of these steps follows the list).
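A rough sketch of the loading, plotting, and reporting steps; file paths follow the standard UCI HAR Dataset layout, and the logistic-regression settings are illustrative rather than the exact ones used:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

X_train = np.loadtxt("UCI HAR Dataset/train/X_train.txt")
y_train = np.loadtxt("UCI HAR Dataset/train/y_train.txt", dtype=int)
X_test = np.loadtxt("UCI HAR Dataset/test/X_test.txt")
y_test = np.loadtxt("UCI HAR Dataset/test/y_test.txt", dtype=int)

# Plot the first two of the 24 PCA components, coloured by activity label.
X_pca = PCA(n_components=24).fit_transform(StandardScaler().fit_transform(X_train))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y_train, s=5, cmap="tab10")
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.title("UCI HAR training samples after PCA")
plt.show()

# Logistic regression on the same scaled and decomposed features.
logreg = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=24)),
    ("clf", LogisticRegression(max_iter=1000)),
])
logreg.fit(X_train, y_train)

# Classification report on the test split; the same report is produced
# for the grid-searched SVC from the earlier sketch.
print(classification_report(y_test, logreg.predict(X_test)))
```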
The SVC model achieved better mean results under the grid search, but it was much slower to compute. SVC also scored better in the additional comparison made for the final report.
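For the ROC part of that comparison, the listed helpers (OneVsRestClassifier, label_binarize, roc_curve) can be combined roughly as below; the class codes 1–6 follow the dataset's activity labels, and the SVC settings are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import label_binarize
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc

# Same data files as in the previous sketch.
X_train = np.loadtxt("UCI HAR Dataset/train/X_train.txt")
y_train = np.loadtxt("UCI HAR Dataset/train/y_train.txt", dtype=int)
X_test = np.loadtxt("UCI HAR Dataset/test/X_test.txt")
y_test = np.loadtxt("UCI HAR Dataset/test/y_test.txt", dtype=int)

classes = [1, 2, 3, 4, 5, 6]  # WALKING ... LAYING
y_test_bin = label_binarize(y_test, classes=classes)

# One binary SVC per class; decision_function gives per-class scores.
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X_train, y_train)
scores = ovr.decision_function(X_test)

# One ROC curve per activity class.
for i, label in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_test_bin[:, i], scores[:, i])
    plt.plot(fpr, tpr, label=f"class {label} (AUC = {auc(fpr, tpr):.2f})")

plt.plot([0, 1], [0, 1], linestyle="--", color="grey")  # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```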
At first I tried to use the decomposed set of raw sensor data as the learning set, but without well-designed signal preprocessing it gave poor results (best: 0.67 accuracy for the SVM), so I used the feature set already prepared by the researchers. I also tried a Hidden Markov Model using the hmmlearn library, but after a long struggle I accepted that the tool is not made for my case, in which I wanted to use the label set (hmmlearn's models are trained unsupervised, without labels).