Sport Activity Classification Using Classical Machine Learning and Time Series Methods
Abstract
Human activity recognition has been a well-investigated field for a decade, especially using inertial sensor data recorded by a smartphone, a sport watch, or any device that can be used daily, such as an activity tracker. However, this type of data is not always available, so developing alternative methods is reasonable. In this thesis, retrospective, personalized, supervised sport activity classification is implemented by applying a comprehensive set of classical machine learning and time series classification algorithms to sport data recorded by an individual athlete. The classification is retrospective in the sense that it takes place after the activity has been completed and the recording has finished. The data covers five categories (biking, running, walking, skiing, and roller skiing) and consists of features extracted from several sensors, such as heart rate, geolocation, barometer, and accelerometer, together with features obtained by contextualizing sensor data with personal data, for example a calorie consumption feature. Because there are three distinct types of classifiers, the same original data is structured in three different ways in order to find the best method for labeling sport activities. The classification tasks are divided into standard classical machine learning, univariate time series classification, and multivariate time series classification. In time series classification, a particular set of features is used, namely heart rate, speed, and altitude, and the classification task is reduced to three categories: biking, running, and other. For univariate classification, this three-dimensional multivariate data is transformed into a single vector using an interlacing method, whereas the applied multivariate classifier has a built-in ability to construct suitable data from the features.
The results show that, using 21 relevant features, the five categories were clearly separable: several standard machine learning models, such as the support vector classifier, logistic regression classifiers, and the k-nearest neighbours classifier, achieved over 95 % accuracy. The adopted multivariate model MUSE, which extracts words from the features, achieved an accuracy of 96.6 % but required an order of magnitude more time. The interlacing method applied for the univariate data transformation was also successful: several models reached over 90 % accuracy, and the best of them, the supervised time series forest classifier, reached up to 93.1 %. Based on the analysis, all applied methods have their strengths, and the choice of method ultimately depends on the available data.
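The interlacing step mentioned above can be illustrated with a minimal sketch. It assumes that interlacing means taking one value from each channel per time step, in a fixed channel order; the abstract does not specify the exact ordering or any scaling. The function name `interlace` and the sample values are hypothetical and serve only as illustration.

```python
def interlace(channels):
    """Merge equally long channels into one vector, time step by time step.

    Assumed ordering: for each time step t, emit channel 0 at t,
    then channel 1 at t, and so on.
    """
    lengths = {len(channel) for channel in channels}
    if len(lengths) != 1:
        raise ValueError("all channels must have the same number of time steps")
    # zip(*channels) groups the values that share a time step
    return [value for step in zip(*channels) for value in step]


# Hypothetical sample values for three time steps
heart_rate = [120, 125, 130]   # beats per minute
speed = [3.0, 3.2, 3.1]        # metres per second
altitude = [50, 52, 51]        # metres

vector = interlace([heart_rate, speed, altitude])
print(vector)
```

A series with three channels and T time steps becomes a single vector of length 3T, which a univariate time series classifier can then consume. Because the channels have different units, some form of per-channel scaling before interlacing may be advisable, although this is an assumption and not something stated in the abstract.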