Detecting possible persons of interest in a physical activity program using step entries: Including a web-based application for outlier detection and decision making

Abstract

According to recent statistics from the World Health Organization, 23% of people aged 18 years and over are not sufficiently physically active. Paradoxically, this comes at a time when, thanks to improvements in sensor technology, programs that track physical activity have become popular. However, some participants who enroll in these programs cheat by manipulating the data they enter. This can discourage other participants and also undermines the accuracy of the program's overall outcomes. Detecting these participants and discarding their manipulated entries is therefore important for maintaining the quality of the program. Currently, most of these physical activity programs rely on manual processes to detect and reject fraudulent step entries, reviewing each participant's demographic profile together with their longitudinal step count data. In this study, a process comprising two parallel models is developed: one for detecting person-of-interest characteristics and one for detecting abnormal step count entries. The first model uses penalized logistic regression with Synthetic Minority Over-sampling Technique (SMOTE) subsampling to address the imbalance between the numbers of genuine participants and persons of interest; this highly imbalanced distribution of genuine and person-of-interest profiles makes the task more challenging. The second model uses a variety of outlier detection methods to detect and reject abnormal step entries based on each participant's previously entered data. This process will be more efficient and productive than the current manual system and will support better decision-making in the future. With suitable adjustments, the proposed system can also be applied to other fraud detection problems.
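The abstract does not say which outlier detection methods the second model uses. As a minimal sketch, assuming two common rules (a median/MAD-based modified z-score and Tukey's IQR fence) applied to one participant's previous daily step counts, screening a new entry could look like the following. The function names, the 3.5 and 1.5 thresholds, and the seven-entry minimum history are hypothetical choices for illustration, not values taken from the study.

```python
import math
from statistics import median, quantiles


def modified_z_score(history, new_value):
    """Modified z-score of new_value relative to a participant's previous entries.

    Uses the median and the median absolute deviation (MAD) instead of the mean
    and standard deviation, so a few earlier bad entries distort it less.
    0.6745 rescales the MAD to match a standard deviation under normality.
    """
    med = median(history)
    mad = median(abs(x - med) for x in history)
    if mad == 0:
        # At least half the previous entries equal the median: any other value is "infinitely" far.
        return 0.0 if new_value == med else math.copysign(math.inf, new_value - med)
    return 0.6745 * (new_value - med) / mad


def outside_iqr_fence(history, new_value, k=1.5):
    """Tukey's rule: True if new_value lies more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(history, n=4)
    iqr = q3 - q1
    return new_value < q1 - k * iqr or new_value > q3 + k * iqr


def flag_step_entry(history, new_value, z_threshold=3.5, iqr_k=1.5, min_history=7):
    """Return the set of rules that flag new_value as abnormal.

    An empty set means no rule objects; None means there is too little
    history to judge (hypothetical policy: leave such entries for manual review).
    """
    if len(history) < min_history:
        return None
    flags = set()
    if abs(modified_z_score(history, new_value)) > z_threshold:
        flags.add("modified_z")
    if outside_iqr_fence(history, new_value, k=iqr_k):
        flags.add("iqr")
    return flags


# Example: one participant's previous daily step counts (illustrative values only).
history = [8000, 9500, 7200, 10100, 8800, 9100, 7600, 8300]
print(flag_step_entry(history, 60000))  # both rules fire
print(flag_step_entry(history, 9000))   # set()
```

Using robust statistics (median, MAD, quartiles) keeps a participant's own earlier manipulated entries from masking a new one. Flagging only when both rules agree would be a more conservative variant; how rules are combined, and whether flagged entries are rejected automatically or routed to a reviewer, are design decisions the abstract leaves open.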

Publication
Special Issue ISCB ASC 2018, 62(2)
S. Sandun M. Silva
Biostatistician

My research interests include biostatistics, data science and genome-wide association studies (GWAS)
