A User-Friendly Interface for Outlier Detection in Physical Activity Step Counts.


Physical activity programs have become immensely popular among different age groups, largely due to the steep growth in non-communicable diseases. According to the World Health Organization, the majority of the world's adolescents are insufficiently physically active, which puts them at higher risk of developing non-communicable diseases. Physical activity programs introduced in response to this problem are gaining popularity in the wider community. Walking and running are the most popular activities in the majority of these programs, with step counts used to measure the level of activity. To collect participants' step counts, programs use devices such as pedometers, accelerometers or other wearable fitness trackers, while also allowing participants to enter and edit their step entries manually. This can produce abnormal entries that act as outliers and may adversely affect the statistical analysis of program outcomes. Administrators of these programs currently remove such data points manually, which is time consuming. To alleviate this problem, we developed an automated tool for step count outlier detection in the R Shiny environment. It detects multiple outliers in step entries and allows researchers and program developers to investigate the detected outliers more effectively and efficiently. The interface comprises outlier detection methods based on median absolute deviation, Grubbs' test, local outlier factor and nearest neighbor, together with aberration detection algorithms such as the Early Aberration Reporting System, the Bayesian Outbreak Detection Algorithm and a robust quasi-Poisson regression algorithm, enabling administrators to compare and identify outliers via different approaches. First, we assess the efficacy of these methods using simulated step count entries. Then we apply them to real step count data collected in the Virgin Pulse 100 Day Global Challenge program. We observe that these methods provide effective approaches for detecting multiple outliers in step entries with high levels of precision and, importantly, provide a user-friendly automated interface.
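As an illustration of the simplest method listed above, the following is a minimal sketch of median absolute deviation (MAD) based outlier flagging. It is not the authors' R Shiny implementation: the function name `mad_outliers`, the sample data, and the modified z-score cutoff of 3.5 (a commonly used convention) are all assumptions made for this example.

```python
import statistics


def mad_outliers(steps, threshold=3.5):
    """Return indices of entries whose modified z-score exceeds `threshold`.

    Hypothetical helper for illustration; the cutoff of 3.5 is an assumed
    convention, not a value taken from the tool described in the abstract.
    """
    med = statistics.median(steps)
    # Median of absolute deviations from the median.
    mad = statistics.median(abs(x - med) for x in steps)
    if mad == 0:
        # At least half the entries equal the median; the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data are approximately normal.
    return [
        i for i, x in enumerate(steps)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


# Made-up daily step entries with two implausible values (95000 and 120).
daily_steps = [8200, 7900, 10400, 9100, 8600, 95000, 8800, 120, 9300]
flagged = mad_outliers(daily_steps)
print(flagged)  # indices of the flagged entries
```

Because both the center and the spread are medians, a few extreme entries do not inflate the cutoff the way they would with a mean and standard deviation, which is why a rule of this kind can flag several outliers at once.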

Aug 30, 30300 3:00 PM — 4:00 PM
Melbourne Convention and Exhibition Centre
Melbourne, Victoria
S. Sandun M. Silva

My research interests include biostatistics, data science and genome-wide association studies (GWAS).