Abstract
Regression analysis is conceptually the simplest method used for investigating the functional relationship between a dependent variable and independent variables. In this paper, the problem of over- and under-detection of outliers in data sets is put to the test by applying various detection methods to data sets simulated without outlier injection at various sample sizes.
This study reviews methods of outlier detection in multiple linear regression using DFFITS, Cook's distance, DFBETAS, R-student, and Mahalanobis distance. The results show that the outlier detection methods performed differently when applied to data sets of various sample sizes. The data were simulated without injecting outliers into the independent or dependent variables.
The simulation, carried out in R, shows the performance of the five outlier detection methods in multiple linear regression. Of the five techniques compared, DFBETAS performed better than all the other methods at every sample size except 10. The next best method is Cook's distance, specifically at the larger sample sizes of 30, 50, and 100. Mahalanobis distance and DFFITS are more liberal than the other outlier detection procedures.
Key words: outliers, outlier detection, multiple linear regression, simulation.
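For readers who want to see how the five diagnostics named in the abstract are computed, the following is a minimal sketch only, not the simulation code used in the study. The study used R; Python with NumPy is substituted here. The function name outlier_diagnostics, the normal data-generating model, the coefficient values, and the cutoffs (2*sqrt(p/n) for DFFITS, 4/n for Cook's distance, 2/sqrt(n) for DFBETAS, 2 for R-student) are illustrative assumptions based on common textbook rules of thumb, not values taken from the paper.

```python
import numpy as np

def outlier_diagnostics(X, y):
    """Case-deletion diagnostics for an OLS fit of y on X (intercept added here)."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])        # design matrix with intercept
    p = Xd.shape[1]                              # number of fitted coefficients
    C = np.linalg.inv(Xd.T @ Xd)                 # (X'X)^{-1}
    beta = C @ Xd.T @ y
    e = y - Xd @ beta                            # ordinary residuals
    h = np.einsum("ij,jk,ik->i", Xd, C, Xd)      # leverages (hat-matrix diagonal)
    s2 = e @ e / (n - p)                         # residual variance, all cases
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)  # variance with case i deleted
    r_student = e / np.sqrt(s2_del * (1 - h))    # externally studentized residuals
    dffits = r_student * np.sqrt(h / (1 - h))
    cooks = e**2 * h / (p * s2 * (1 - h) ** 2)
    # Row i holds beta - beta_(i): the change in each coefficient when case i is deleted.
    delta = (Xd @ C) * (e / (1 - h))[:, None]
    dfbetas = delta / np.sqrt(np.outer(s2_del, np.diag(C)))
    # Squared Mahalanobis distance of each predictor row from the predictor mean.
    Z = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.atleast_2d(np.cov(X, rowvar=False)))
    mahal2 = np.einsum("ij,jk,ik->i", Z, S_inv, Z)
    return {"p": p, "leverage": h, "r_student": r_student, "dffits": dffits,
            "cooks": cooks, "dfbetas": dfbetas, "mahal2": mahal2}

# Assumed setup: clean data (no injected outliers), three normal predictors.
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

d = outlier_diagnostics(X, y)
p = d["p"]
# Rule-of-thumb cutoffs (assumptions, not the paper's stated thresholds).
flags = {
    "DFFITS": np.abs(d["dffits"]) > 2 * np.sqrt(p / n),
    "Cook's distance": d["cooks"] > 4 / n,
    "DFBETAS": (np.abs(d["dfbetas"]) > 2 / np.sqrt(n)).any(axis=1),
    "R-student": np.abs(d["r_student"]) > 2,
}
for name, flagged in flags.items():
    print(f"{name}: {int(flagged.sum())} of {n} clean cases flagged")
```

Because the simulated data contain no injected outliers, every flagged case in this sketch is a false alarm, which is the sense in which a method can be called liberal. The squared Mahalanobis distance is returned without a flag because its usual cutoff is a chi-square quantile, which would need a library beyond NumPy.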