Removing outliers « Reenigne blog

Removing outliers

If you've got a sequence of data that you want to do some statistical analysis on, but you know that some of it is bad, how do you remove the bad data? You could just remove the top 5% and bottom 5% of values, for example, but maybe you don't have any bad data (or maybe you have lots). In the first case you throw away good samples and underestimate the standard deviation; in the second you leave bad samples in and overestimate it. Either way a fixed cut would adversely affect your measurement.

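Suppose that you know that your dataset is supposed to follow a normal distribution (lots do). Then you could remove outliers by measuring the skew and kurtosis (the standardized 3rd and 4th moments), and just repeatedly remove the sample furthest from the mean until these measurements look correct, which for a normal distribution means a skew near 0 and an excess kurtosis near 0 (a kurtosis near 3). This algorithm is guaranteed to terminate, since each step removes one sample from a finite dataset. Note that the 2-sample limit is not quite normal-looking: the skew is then exactly 0, but the kurtosis is 1 (an excess kurtosis of -2), so in practice you would also stop at some minimum sample count. You've still got a parameter or two to tune though (how much skew and kurtosis you'll tolerate).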
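The loop described above could be sketched in Python roughly as follows. This is only an illustration under some assumptions that are not in the post: the function names, the tolerance values (`max_skew`, `max_excess_kurtosis`), the `min_samples` floor, and the use of population (biased) moment estimates are all made up for the example. It also tests the kurtosis from one side only, on the assumption that removing extreme samples can only help when the tails are too heavy.

```python
def skew_and_excess_kurtosis(xs):
    """Return (skew, excess kurtosis) using population moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    if m2 == 0:
        # All samples identical: treat as degenerate, nothing to remove.
        return 0.0, 0.0
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    # A normal distribution has skew 0 and kurtosis 3 (excess kurtosis 0).
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def remove_outliers(samples, max_skew=0.5, max_excess_kurtosis=1.0,
                    min_samples=3):
    """Drop the sample furthest from the mean until the shape looks normal.

    The tolerances and min_samples are hypothetical tuning parameters.
    """
    data = list(samples)
    while len(data) > min_samples:
        skew, kurt = skew_and_excess_kurtosis(data)
        # One-sided kurtosis test (an assumption): only heavy tails
        # are treated as evidence of outliers.
        if abs(skew) <= max_skew and kurt <= max_excess_kurtosis:
            break
        mean = sum(data) / len(data)
        furthest = max(data, key=lambda x: abs(x - mean))
        data.remove(furthest)
    return data


readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 50.0]
cleaned = remove_outliers(readings)
print(cleaned)
```

Because the mean is recomputed on every pass, a single large outlier that drags the mean towards itself is still the furthest sample and gets removed first. How tight to make the two tolerances is exactly the tuning question mentioned above.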

This entry was posted on Monday, October 17th, 2011 at 4:00 pm and is filed under maths.

One Response to “Removing outliers”

Felipe Lopez says:

March 23, 2012 at 2:59 am

Hello Andrew! i sent you an email, plz check it.

regards!
