Human Activity Recognition (HAR) refers to the capability of machines to identify various activities performed by the users. The knowledge acquired from these recognition systems is integrated into many applications where the associated device uses it to identify actions or gestures and performs predefined tasks in response.
For this assignment we will use a publicly available dataset called UCI-HAR. The dataset is available to download here. It contains data for 30 participants. Each participant performed six activities while wearing a Samsung Galaxy S II smartphone on their waist (a video of the participants recording data is also available here). The smartphone's accelerometer and gyroscope captured 3-axial linear acceleration and 3-axial angular velocity. Read all the `readme` and `info` files for more information.
We will use the raw accelerometer data within the inertial_signals folder. The provided script, `CombineScript.py`, organizes and sorts the accelerometer data, establishing separate classes for each category and compiling participant data into these classes. The `MakeDataset.py` script reads through all the participant data and creates a single dataset, which is then split into train, test, and validation sets. We focus on the first 10 seconds of activity, which corresponds to the initial 500 data samples at a sampling rate of 50 Hz.
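The 10-second window arithmetic can be sketched as follows. This is a minimal illustration, not the code in `MakeDataset.py`: the function name `first_window` and its parameters are hypothetical, and it only assumes that a signal is a sequence of samples recorded at 50 Hz.

```python
SAMPLING_RATE_HZ = 50   # sampling rate stated for the UCI-HAR recordings
WINDOW_SECONDS = 10     # length of activity kept per recording


def first_window(samples, rate_hz=SAMPLING_RATE_HZ, seconds=WINDOW_SECONDS):
    """Return the first rate_hz * seconds samples of a signal.

    `samples` is any sliceable sequence (list, tuple, NumPy array).
    Hypothetical helper for illustration only.
    """
    n_keep = rate_hz * seconds  # 50 * 10 = 500 samples
    if len(samples) < n_keep:
        raise ValueError(
            f"need at least {n_keep} samples, got {len(samples)}"
        )
    return samples[:n_keep]


# Usage with a stand-in signal of 800 samples (16 seconds at 50 Hz).
signal = list(range(800))
window = first_window(signal)
print(len(window))
```

Raising an error on short recordings, rather than padding them, keeps every window the same length without inventing data.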
- Step-1> Place `CombineScript.py` and `MakeDataset.py` in the same folder that contains the UCI dataset. Ensure you have moved into the folder before running the scripts. If you are running the scripts from a different folder, you will have to adjust the paths in the scripts to make them work.
- Step-2> Run `CombineScript.py` and provide the paths to the test and train folders in the UCI dataset. This will create a folder called `Combined` which will contain all the data from all the participants. Many datasets are organized this way, so you may encounter similar structures in the future.
- Step-3> Run `MakeDataset.py` and provide the path to the `Combined` folder. This will create a dataset containing the train, test, and validation sets. You can use this dataset to train your models.
- Plot the waveform for data from each activity class. Are you able to see any differences/similarities between the activities? You can use a subplot with 6 columns to show them. Do you think the model will be able to classify the activities based on the data?
- Do you think we need a machine learning model to differentiate between static activities (laying, sitting, standing) and dynamic activities (walking, walking_downstairs, walking_upstairs)? Look at the linear acceleration $(acc_x^2+acc_y^2+acc_z^2)$ for each activity and justify your answer.
- Train a Decision Tree using the train set and report the accuracy and confusion matrix using the test set.
- Train Decision Trees with varying depths (2-8) using the train set and report the accuracy and confusion matrix using the test set. Does the accuracy change when the depth is increased? Plot the accuracies and explain why such a result is obtained.
- Use PCA (Principal Component Analysis) on Total Acceleration $(acc_x^2+acc_y^2+acc_z^2)$ to compress the acceleration time series into two features and plot a scatter plot to visualize the different activity classes. Next, use TSFEL (a featurizer library) to create features (choose the ones you feel are useful) and then perform PCA to obtain two features. Plot a scatter plot to visualize the different activity classes. Are you able to see any difference?
- Use the features obtained from TSFEL and train a Decision Tree. Report the accuracy and confusion matrix using the test set. Does featurizing work better than using the raw data? Train Decision Trees with varying depths (2-8) and compare the accuracies obtained in Q4 with the accuracies obtained using the featurized train set. Plot the accuracies obtained in Q4 against the accuracies obtained in this question.
- Are there any participants/activities where the model performance is bad? If yes, why?
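The mechanics behind several of the questions above (total acceleration, a decision-tree depth sweep, and PCA to two features) can be sketched with scikit-learn. This is only a scaffold under stated assumptions: the data here is synthetic noise, not UCI-HAR; windows are assumed to have shape `(n_windows, 500, 3)` with integer labels 0 to 5; raw windows are flattened to 1500 features for the tree; and the helper names `make_fake_split` and `total_acceleration` are hypothetical. The numbers it prints say nothing about the real dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
N_CLASSES, N_SAMPLES = 6, 500  # six activities, 10 s at 50 Hz


def make_fake_split(n_per_class):
    """Synthetic stand-in for the real train/test arrays (assumption)."""
    X, y = [], []
    for label in range(N_CLASSES):
        scale = 0.05 + 0.1 * label  # each fake class gets its own noise level
        X.append(rng.normal(0.0, scale, size=(n_per_class, N_SAMPLES, 3)))
        y.append(np.full(n_per_class, label))
    return np.concatenate(X), np.concatenate(y)


def total_acceleration(X):
    """acc_x^2 + acc_y^2 + acc_z^2 per time step, shape (n_windows, 500)."""
    return np.sum(X ** 2, axis=2)


X_train, y_train = make_fake_split(20)
X_test, y_test = make_fake_split(10)

# Raw-data features for the tree: flatten each window to 500 * 3 values.
raw_train = X_train.reshape(len(X_train), -1)
raw_test = X_test.reshape(len(X_test), -1)

# Depth sweep over 2..8, keeping accuracy and confusion matrix per depth.
depths = list(range(2, 9))
accuracies = []
confusions = {}
for depth in depths:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(raw_train, y_train)
    pred = tree.predict(raw_test)
    accuracies.append(accuracy_score(y_test, pred))
    confusions[depth] = confusion_matrix(
        y_test, pred, labels=list(range(N_CLASSES))
    )

# PCA on the total-acceleration time series, compressed to two features.
pca = PCA(n_components=2)
train_2d = pca.fit_transform(total_acceleration(X_train))

for depth, acc in zip(depths, accuracies):
    print(f"depth={depth} accuracy={acc:.3f}")
print("PCA output shape:", train_2d.shape)
```

To use it on the real data, replace the two `make_fake_split` calls with the arrays produced by `MakeDataset.py`. Both `accuracies` and `train_2d` can be passed directly to a plotting library for the accuracy-versus-depth curve and the scatter plot.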
Use an app such as Physics Toolbox Suite on your smartphone to collect your own data in .csv/.txt format. Ensure at least 15 seconds of data is collected, then trim the edges to obtain 10 seconds of relevant data. Collect 3-5 samples per activity class and report accuracy using both featurized and raw data. You must train on the UCI dataset (you can use the entire dataset if you want), test on the data you collected, and report the accuracy and confusion matrix. Explain why the model succeeded or failed on the collected data.
- Ensure the phone is placed in the same position for all the activities.
- Ensure the phone is in the same alignment during the activity as changing the alignment will change the data collected and will affect the model's performance.
- Ensure each file has at least 10 s of data for training. As the data is collected at 50 Hz, this gives you 500 data samples.
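Trimming a longer recording down to 10 seconds can be sketched as below. This is one possible approach, not a prescribed one: it assumes the recording is already sampled at 50 Hz (a phone app may log at a different rate, in which case resampling would be needed first) and it trims equally from both ends. The name `trim_center` is hypothetical.

```python
def trim_center(samples, rate_hz=50, keep_seconds=10):
    """Keep the central keep_seconds of a recording, dropping both edges.

    Assumes `samples` is a sliceable sequence already sampled at rate_hz.
    Hypothetical helper for illustration only.
    """
    n_keep = rate_hz * keep_seconds  # 50 * 10 = 500 samples
    if len(samples) < n_keep:
        raise ValueError(
            f"recording too short: need {n_keep} samples, got {len(samples)}"
        )
    start = (len(samples) - n_keep) // 2  # equal trim on each side
    return samples[start:start + n_keep]


# Usage: a stand-in 15-second recording (750 samples at 50 Hz).
recording = list(range(750))
trimmed = trim_center(recording)
print(len(trimmed), trimmed[0], trimmed[-1])
```

Dropping the edges removes the moments spent starting and stopping the app, which usually do not belong to the activity being recorded.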
This project was done as part of the Machine Learning course at IIT Gandhinagar, under Prof. Nipun Batra.