# PhD_Amputee_HAR_MATLAB_Project
Contains MATLAB files and functions relating to my PhD project, which involves the analysis of physical activity data from healthy individuals and lower limb amputees in free-living conditions. It is recommended that you read my thesis ("Construction of a Clinical Activity Monitoring Framework Based on Free-Living Investigations of Individuals with Lower Limb Amputation") for a full understanding of what each file and function does.
## File Description
This section contains a curated list of the functions and files used in the thesis, with a description of what each file or function does. Each section corresponds to a different .zip file, except the Subfolders section, where each folder has its own separate .zip file.
Due to the sensitivity of the data, the original video and GPS data for this project have not been included in this dataset. Please contact my supervisor Arjan Buis (arjan.buis@strath.ac.uk) for further information.
A bibliography of the third-party functions used is given at the bottom of this README.
## Functions
| **Name of File** | **Purpose** |
|---|---|
| activPAL_acceleration_conversion | Converts ActivPAL accelerometer units to m/s² |
| AP_Camera_Time_Sync_V3 | Synchronizes ActivPAL timestamps to camera times |
| AP_Label_DataAssociation | Associates each sample of ActivPAL data with an activity label; used as part of syncAndLabelFunction |
| bandpass | Applies a bandpass filter to a signal |
| cepstral_feature_testing | Calculates cepstral coefficients as part of the main feature matrix (see FeatureCalculation3) |
| Cohen_Kappa | Calculates Cohen’s kappa. Written by [Cardillo, 2018] |
| confusionmatStats | Calculates statistics relating to a confusion matrix (precision, recall, F1 score). Written by [Cheong, 2015] |
| EVDA | Calculates Eigenvalues of Dominant Directions as part of the feature matrix |
| FeatureCalculation3 | Calculates all 243 features of the feature matrix |
| fourplot | Displays statistical plots (boxplot, lag plot, histogram, normal probability plot). Written by [Jos, 2015] |
| getRGB | Sets the colour of plots in PCA_plot |
| Hill_Calculation_V2 | Calculates the angle of hills in recordings |
| ICC | Calculates intraclass correlation coefficients. Written by [Salarian, 2016] |
| InterX | Calculates the intersections of curves; used in elevation plots to determine when hills are present via the intersection of thresholds. Written by [NS, 2010] |
| LabelConsolidation | Takes the original labels as extracted from VoTT and condenses them into umbrella labels as necessary |
| nmi | Calculates normalized mutual information. Written by [Mo Chen, 2016] |
| PCA_plot | Originally intended just for PCA; plots data points in a 2D or 3D space and automatically assigns each class a different colour and marker |
| Qualitative_Feature_Analysis | Determines the "relevant features" of the PCA and mRMR feature selection processes, as discussed in chapter 6 |
| smote | SMOTE function. Written by [Larsen, 2020] |
| smote_preparation | Because the dataset is severely imbalanced, some cursory undersampling of the majority class and oversampling of the minority class is required for the SMOTE function to work. This function generates indexes to windows of certain classes, so that when the main feature matrix is built, certain rows are repeated and others omitted to create the undersampling/oversampling effect. Ultimately replaced with smote_preparation_alt |
| smote_preparation_alt | Same purpose as smote_preparation, but instead of creating window indexes it performs the undersampling/oversampling directly on the input data |
| StrideSegmentation | First attempt at segmenting by stride using the original algorithm. Did not work well and was scrapped |
| StrideSegmentation2 | Segments accelerometer data using the VANE event data timestamps for each step (stride) |
| SupervisedClassifierAccuracyCalculation | Calculates the accuracy of supervised classifiers; used in experiment 1 of chapter 6 |
| SupervisedPCAClassifierAccuracyCalculation | Calculates the accuracy of supervised classifiers after PCA is applied to the original dataset; used in experiment 1 of chapter 6 |
| Swtest | Shapiro-Wilk test. Written by [BenSaida, 2014] |
| syncAndLabelFunction | Acquires the synchronized accelerometer data for the recording sessions, as well as the label corresponding to each sample of acceleration data |
| SyncAndLabelFunctionPlusMagnetometer | Acquires the synchronized accelerometer and magnetometer data for the recording sessions, as well as the label corresponding to each sample of acceleration/magnetometer data |
| tvd_mm | Total variation (TV) denoising function. Written by [Selesnick, 2017] |
| VoTT_info_extraction | Takes the raw file exported from the VoTT video annotator and extracts key meta-information used in the sampling processes |
| WindowSegmentation2 | Default segmentation function |
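`Cohen_Kappa` is a third-party function, so its exact MATLAB behaviour is defined by Cardillo's source. As an illustration of the statistic itself, the Python sketch below computes unweighted Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two label sequences and p_e is the agreement expected by chance from their marginal label distributions. It is an independent sketch, not the repository's code; the function name `cohen_kappa` and the example activity labels are hypothetical.

```python
import numpy as np


def cohen_kappa(y1, y2):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    y1 = np.asarray(y1)
    y2 = np.asarray(y2)
    if y1.shape != y2.shape:
        raise ValueError("label sequences must have the same length")

    # Map every label seen in either sequence to a row/column index.
    labels = np.union1d(y1, y2)
    index = {lab: i for i, lab in enumerate(labels)}
    k = len(labels)

    # Confusion matrix: rows = first rater, columns = second rater.
    cm = np.zeros((k, k))
    for a, b in zip(y1, y2):
        cm[index[a], index[b]] += 1

    n = cm.sum()
    p_observed = np.trace(cm) / n
    # Chance agreement: dot product of the row and column marginals.
    p_expected = (cm.sum(axis=1) @ cm.sum(axis=0)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)
```

For example, `cohen_kappa(["walk", "walk", "stand", "sit"], ["walk", "stand", "stand", "sit"])` returns 7/11 (about 0.636): three of the four labels agree (p_o = 0.75) and the marginals give p_e = 5/16. The sketch does not guard against p_e = 1, which only happens when both sequences use one identical label throughout.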
## Class Files
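| **Name of File** | **Purpose** |
|---|---|
| Kernel | Calculates kernel PCA dimensionality reduction of a matrix. Written by [Qiu, 2021] |
| KernelPCA | See Kernel above (same source) |
| KernelPCAFunction | See Kernel above (same source) |
| Visualization | See Kernel above (same source) |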
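The kernel PCA class files are third-party code, so their kernel options and defaults are not reproduced here. As a hedged illustration of the underlying technique only, the Python sketch below performs kernel PCA with a Gaussian (RBF) kernel: it builds the kernel matrix, centres it in feature space, eigendecomposes it, and scales the leading eigenvectors by the square roots of their eigenvalues to obtain the projected training points. The function name `rbf_kernel_pca`, the choice of RBF kernel and the `gamma` parameter are assumptions for illustration, not the API of [Qiu, 2021].

```python
import numpy as np


def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Project the rows of X onto the leading RBF kernel principal components."""
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances, clipped at 0 against rounding error.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)

    # Centre the kernel matrix in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Training-point projections: eigenvectors scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
```

Because the centred kernel matrix has rows summing to zero, each returned component has (numerically) zero mean, and the components are ordered by decreasing variance.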
## Subfolders
| **Name of Folder** | **Purpose** |
|---|---|
| Parametric_tSNE | Files for calculating parametric t-SNE, which did not work in this investigation. Written by [van der Maaten, 2008] |
| UMAP_Files | Contains the files necessary for UMAP dimensionality reduction. Written by [Meehan et al., 2021] |
| PAL datx files | Contains the PAL .datx files for the project. The raw accelerometer data files are too large to upload to PURE, so the .csv files for each individual must instead be exported from PAL Analysis; both the raw accelerometer data and the event data need to be exported |
| MAT files | Contains the MAT files created for the project. Most were created to cut down processing time when running long scripts. These files have not been curated, so it is recommended to retrieve them only if necessary |
| CSV Files | Contains the VoTT ground truth timestamp data for each individual |
## Scripts
| **Name of File** | **Purpose** |
|---|---|
| Data_Collection_Feature_Calculation_Time | Calculates the time required to obtain the feature matrix used in chapters 6, 7 and 8 |
| Data_Collection_hill_angle_plots | Produces the hill angle plots used for annotating uphill and downhill data |
| Data_Collection_raw_data_analysis_1 | Creates an overlay plot of all time-series signals for a single class. Used in the exploratory data analysis for chapter 5 but scrapped due to high variability in the signals relative to the window sample index |
| Data_Collection_raw_data_analysis_2 | Selects random samples from each class and plots the triaxial time series. Saves the plots to a subfolder, from which they were extracted for analysis in Notebook software |
| Posthoc_Ex1_step_count_icc | Validates ILLA step counts relative to ground truth |
| Posthoc_Ex2_standing_analysis | Validates ILLA standing times relative to ground truth |
| Posthoc_Ex3_magnetometer_posthoc | Analyses the effect of including magnetometer features in supervised and unsupervised applications |
| Supervised_Learning_Ex1_FilterFS_selection_experiment | Determines the most suitable feature selection method for Experiment 1 in chapter 6 |
| Supervised_Learning_Ex1_FSDR_Selection | Calculates the average test accuracy of each supervised classifier after applying feature selection and/or PCA (or neither) |
| Supervised_Learning_Ex1_LSTM | Calculates the average test accuracy of the LSTM classifier for the same combinations of data selection applied to the supervised classifiers in Supervised_Learning_Ex1_FSDR_Selection |
| Supervised_Learning_Ex2_Classifier_Optimize | Determines the optimal hyperparameters for each classifier (Supervised + SVM) and outputs the test accuracy at level 1 label resolution |
| Supervised_Learning_Ex3_Terrain_Resolution | Calculates SVM and LSTM accuracy for varying levels of label/terrain resolution |
| Supervised_Learning_Ex4_Amputee_Validation | Performs leave-one-subject-out (LOSO) validation on lower limb amputees using healthy and ILLA data |
| Supervised_Learning_misclassification | Pinpoints samples of data with strong positive and strong negative predictions from the SVM and LSTM classifiers |
| Supervised_Learning_PCA_Important_Features | Determines the relevant features selected by the PCA and mRMR feature selection processes |
| Supervised_Learning_SMOTE_RawSignalPlot | Plots SMOTE’d raw signals as time series to check that the synthesized signals look reasonable |
| Unsupervised_Learning_Ex1_2_3_4 | Carries out the first four experiments of chapter 7, determining the best population model, the best dimensionality reduction method, the appropriate label resolution, and the tuning parameters for the dimensionality reduction method |
| Unsupervised_Learning_Ex5_ClusterAlgorithms | Performs grid search hyperparameter optimization on k-means, hierarchical clustering, GMM and DBSCAN |
| Unsupervised_Learning_PtII_stairCluster_density | Tests true stair clusters and extraneous walking clusters for statistically significant differences in cluster density |
| Unsupervised_Learning_PtII_stairCluster_loglike | Tests true stair clusters and extraneous walking clusters for statistically significant differences in negative log-likelihood |
| Unsupervised_Learning_PtIII_stairClusters_param_tuning | Tunes the algorithm parameters for the detection of stair clusters |
| Unsupervised_Learning_PtIII_stairClusters_plots | Plots the stair clusters detected by the chapter 7 algorithm |
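Leave-one-subject-out validation, as named for Supervised_Learning_Ex4_Amputee_Validation, holds out every window from one participant per fold so that no individual contributes to both the training and test data. The Python sketch below generates such folds. It assumes one subject ID per window and, as an assumption about the experiment design rather than a description of the script, that only the amputee participants are held out while all remaining windows (healthy and other ILLA) are used for training. The function name `loso_splits` and the subject IDs are hypothetical.

```python
import numpy as np


def loso_splits(subject_ids, test_subjects=None):
    """Yield (subject, train_idx, test_idx), holding out one subject per fold.

    If test_subjects is given (for example, only the amputee IDs), only those
    subjects are held out; every other window stays in the training set.
    """
    subject_ids = np.asarray(subject_ids)
    if test_subjects is None:
        test_subjects = np.unique(subject_ids)
    for subject in test_subjects:
        test_mask = subject_ids == subject
        # Indices of windows outside / inside the held-out subject.
        yield subject, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)
```

With window IDs `["H01", "H01", "A01", "A01", "A02"]` and `test_subjects=["A01", "A02"]`, the first fold tests on windows 2 and 3 and trains on windows 0, 1 and 4, so the healthy windows are always available for training.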

***
## Bibliography
Ahmed BenSaïda (2014). Shapiro-Wilk and Shapiro-Francia normality tests (https://www.mathworks.com/matlabcentral/fileexchange/13964-shapiro-wilk-and-shapiro-francia-normality-tests), MATLAB Central File Exchange. Retrieved August 2021.
Giuseppe Cardillo (2018). Cohen's Kappa (https://github.com/dnafinder/Cohen), GitHub. Retrieved August 2021.
Audrey Cheong (2015). confusionmatStats(group,grouphat) (https://www.mathworks.com/matlabcentral/fileexchange/46035-confusionmatstats-group-grouphat), MATLAB Central File Exchange. Retrieved August 2021.
Jos (2015). fourplot(X) (https://www.mathworks.com/matlabcentral/fileexchange/42480-fourplot-x), MATLAB Central File Exchange. Retrieved August 2021.
Larsen (2020). matlab_smote (https://github.com/dkbsl/matlab_smote), GitHub. Retrieved August 2021.
Connor Meehan, Jonathan Ebrahimian, Wayne Moore, and Stephen Meehan (2021). Uniform Manifold Approximation and Projection (UMAP) (https://www.mathworks.com/matlabcentral/fileexchange/71902), MATLAB Central File Exchange. Retrieved August 2021.
Mo Chen (2016). Normalized Mutual Information (https://www.mathworks.com/matlabcentral/fileexchange/29047-normalized-mutual-information), MATLAB Central File Exchange. Retrieved August 2021.
NS (2010). Curve intersections (https://www.mathworks.com/matlabcentral/fileexchange/22441-curve-intersections), MATLAB Central File Exchange. Retrieved August 2021.
Qiu (2021). Kernel Principal Component Analysis (KPCA) (https://uk.mathworks.com/matlabcentral/fileexchange/69378-kernel-principal-component-analysis-kpca), MATLAB Central File Exchange. Retrieved August 2021.
Arash Salarian (2016). Intraclass Correlation Coefficient (ICC) (https://www.mathworks.com/matlabcentral/fileexchange/22099-intraclass-correlation-coefficient-icc), MATLAB Central File Exchange. Retrieved August 2021.
Ivan Selesnick (2017). Total Variation Denoising (an MM algorithm) (https://eeweb.engineering.nyu.edu/iselesni/lecture_notes/TVDmm/TVDmm.pdf). Retrieved August 2021.
Laurens van der Maaten (2008). Learning a Parametric Embedding by Preserving Local Structure (https://lvdmaaten.github.io/tsne/#implementations). Retrieved August 2021.