SOCIETY
Consequences of inappropriate detection and removal of outliers in statistics
- 1 Institute of Population and Human Studies, Bulgarian Academy of Sciences, Sofia, Bulgaria
Abstract
In statistics, outliers in a data set can distort the estimate of the mean. Extreme values also increase variability and consequently reduce the power of statistical methods. However, the literature disagrees both about the nature of outliers and about how to deal with them in subsequent statistical analyses. Many conventional procedures for detecting and handling outliers are discussed. The increase in the probability of a Type I error (error of the first kind) that inappropriate removal can cause is demonstrated with two simple simulations. The general conclusion is that information from outside the data set is necessary for a correct decision. This information can come only from the expertise of researchers in the specific domain of interest. The importance of the topic of outliers is discussed, and the need for deeper analyses, accompanied by many simulation studies, is argued.
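The kind of simulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's simulation: the sample size, the 2-SD trimming rule, the large-sample z approximation to the two-sample test, and the "report a significant result if either analysis yields one" strategy are all assumptions chosen for illustration. Both groups are drawn from the same normal distribution, so every significant difference is a Type I error.

```python
import random
from statistics import NormalDist

# Two-sided 5% critical value; a large-sample normal approximation (assumption).
Z_CRIT = NormalDist().inv_cdf(0.975)


def mean_sd(x):
    """Return the sample mean and the sample standard deviation (n - 1)."""
    m = sum(x) / len(x)
    var = sum((v - m) ** 2 for v in x) / (len(x) - 1)
    return m, var ** 0.5


def significant(a, b):
    """Approximate two-sample test for a difference in means at the 5% level."""
    ma, sa = mean_sd(a)
    mb, sb = mean_sd(b)
    se = (sa ** 2 / len(a) + sb ** 2 / len(b)) ** 0.5
    return abs(ma - mb) / se > Z_CRIT


def trim(x, k=2.0):
    """Hypothetical outlier rule: drop values more than k SDs from the mean."""
    m, s = mean_sd(x)
    return [v for v in x if abs(v - m) <= k * s]


def simulate(n_sims=2000, n=100, seed=1):
    """Estimate Type I error rates when the null hypothesis is true.

    Returns (plain, flexible): the rejection rate of the test on the raw
    data, and the rate when a rejection is reported if either the raw or
    the trimmed analysis is significant.
    """
    rng = random.Random(seed)
    plain = flexible = 0
    for _ in range(n_sims):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        raw = significant(a, b)
        trimmed = significant(trim(a), trim(b))
        plain += raw
        flexible += raw or trimmed
    return plain / n_sims, flexible / n_sims


if __name__ == "__main__":
    plain, flexible = simulate()
    print(f"raw data only: {plain:.3f}  raw or trimmed: {flexible:.3f}")
```

By construction, the second rate can never be lower than the first, because the flexible strategy rejects whenever the raw analysis does. How much higher it is depends on the assumed trimming rule and sample size.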