I recently had this question in consulting:
I’ve got 12 out of 645 cases with Mahalanobis distances above the critical value, so I removed them and reran the analysis, only to find that another 10 cases were now above the critical value. I removed these, and another 10 appeared, and so on, until I had removed over 100 cases from my analysis! Surely this can’t be right!?! Do you know any way around this? It is really slowing down my analysis and I have no idea how to sort this out!!
And this was my response:
I wrote an article about dropping outliers. As you’ll see, you can’t just drop outliers without a REALLY good reason. Being influential is not in itself a good enough reason to drop data.
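Incidentally, the never-ending list of flagged cases in the question above is exactly what you’d expect when the cutoff is re-applied after every round of deletion, because the mean and covariance get re-estimated on the trimmed sample. Here is a minimal simulation of that effect; the sample size, number of variables, heavy-tailed distribution, and p < .001 chi-square cutoff are all assumptions for illustration, not the questioner’s actual data.

```python
# A minimal sketch of the "never-ending outliers" effect. Simulated data,
# not the questioner's: 645 cases, 4 variables, heavy-tailed but otherwise
# unremarkable, screened at a chi-square p < .001 cutoff.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)
n, p = 645, 4

# Multivariate t (df = 10): heavier tails than normal, so some large distances.
z = rng.standard_normal((n, p))
X = z / np.sqrt(rng.chisquare(10, size=n) / 10)[:, None]

crit = chi2.ppf(0.999, df=p)   # critical value for squared Mahalanobis distance

for rnd in range(1, 8):
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mean
    d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)   # squared Mahalanobis distances
    flagged = d2 > crit
    print(f"Round {rnd}: {flagged.sum()} of {len(X)} cases above the critical value")
    if not flagged.any():
        break
    # Drop the flagged cases and re-screen, as in the question. Because the
    # mean and covariance are re-estimated on the trimmed sample, the boundary
    # tightens and new cases get flagged.
    X = X[~flagged]
```

Each round the flagged cases are deleted, the estimated covariance shrinks, and cases that were previously just inside the boundary now fall outside it, so the list never quite empties out.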
Craig slinkman says
There is another cause of outliers. The data point may be an indication that the model is insufficient and that something is seriously wrong. Examples of this are a missing predictor variable or an incorrect functional form.
Therefore be careful about dropping outliers from your data set. I am retired and no longer have my books, but I suggest you look at Sanford Weisberg’s Applied Linear Regression.
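To make that concrete, here is a small hypothetical simulation (my own example, not taken from the comment or from Weisberg’s book): ten cases belong to a group the model omits, so under the misspecified model they show up as apparent outliers, and once the missing predictor is included they stop looking unusual.

```python
# A small illustration of an omitted predictor creating apparent outliers.
# Ten cases come from a group the first model does not know about, so they
# show up as extreme residuals; the correctly specified model does not flag them.
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
group = np.zeros(n)
group[:10] = 1                                   # 10 cases from an unmodeled group
y = 2 + 0.5 * x + 8 * group + rng.normal(size=n)

def standardized_residuals(design, y):
    """Ordinary least squares, then residuals scaled by their estimated SD."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return resid / resid.std(ddof=design.shape[1])

X_wrong = np.column_stack([np.ones(n), x])          # omits the group indicator
X_right = np.column_stack([np.ones(n), x, group])   # includes it

for name, design in [("without group", X_wrong), ("with group", X_right)]:
    r = standardized_residuals(design, y)
    print(f"{name}: {(np.abs(r) > 3).sum()} cases with |standardized residual| > 3")
```

In other words, the fix in that situation is a better model, not deleting the cases.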
soma says
Hey! Thanks for sharing this!
I have a question and unfortunately I couldn’t find my answer. I hope someone can help here.
I know how to remove outliers, but what I didn’t understand is: should I only remove them based on my dependent variable, or should I do it for the other variables too? For example, if I want to estimate salary according to age and education, …
should I remove records that are outliers on the age variable?
Meenu says
Hi Karen, the newsletter and the dropping outliers link do not seem to be available. Is there a way I can get the answer to the question posted above?
Thanks
Meenu
Karen says
Hi Meenu, I fixed it.