Why standard deviation?

Why standard deviation?
I'm taking a Psych Stats course this semester and we're getting into measures of variability. I'm understanding all of the math and terminology, but one thing is bothering me: why do we use the standard deviation?
I get that the standard deviation is the square root of the mean of squared deviations, and that for a normal distribution 68.27% of scores will fall within one standard deviation of the mean, 95.45% within two standard deviations, etc., but it still seems like a rather arbitrary measure. I've read the merged thread from ages ago (apologies if this should have gone there, the rules weren't clear on thread necromancy), but nothing there gave a satisfactory explanation. Most of the answers were either "It makes sense when doing higher math" or "Because it works". When I asked my professor about it, he muttered something about needing to see the proof of the normal curve and sort of apologetically admitted to not being able to give me the answer. If someone could really break it down for me like I'm five, I'd appreciate it a lot.
Re: Why standard deviation?
Are you wondering why we don't just use the variance instead of taking its square root? Or are you wondering why we care about such things at all? Or is this a question of why we like this measure of the "spread" or "dispersion" of the data as opposed to some other way of measuring it?

Re: Why standard deviation?
Basically, if you think of the mean as being a value that defines what the "typical" value of a given sample will be, the standard deviation is a value that defines the "typical" amount that a given sample will vary away from that mean. If you want to define a value to give you this information, the first instinct might be to simply measure each sample's distance from the mean and average all the distances. The problem is that if you do this just by subtracting the mean from each value, then values below the mean yield negative distance, and when you sum up all the distances, the negatives will cancel out the positives and you get zero.
So you need a formula that captures the magnitude of the distance from the mean without regard for the direction of that distance. The most direct function to do this would be absolute value. You would take the absolute value of the distance between each sample and the mean, and average the resulting values together. On its own there's nothing wrong with this convention, but although absolute value by definition does what we want in this application, it's not a simple algebraic function. So we should find a way to express a similar process in simple algebraic terms for ease of use.
Fortunately, another way to express |x| is (x^2)^(1/2), where you only take the positive square root. Squaring and then immediately taking the positive square root is completely identical to taking the absolute value, but if we split up the squaring and the taking of the square root with some intermediate step, then we have a function that does not simplify down to absolute value, but serves the same basic purpose in our application. In our case, the only other step we're worried about is the averaging of the values, so that's what we put between the squaring and the taking of the square root.
So the final result is that we define the standard deviation by first squaring all the distances from each sample to the mean, then taking the mean of all the resulting values, then taking the square root of the result.
I don't know if that reasoning has any bearing on official reasons why this is considered the best convention, but it's an explanation that makes the existing convention at least make intuitive sense to me.
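A minimal sketch of that construction (illustrative names, not from any textbook): square the deviations, average them, take the root, with the absolute-value alternative alongside for comparison.

```python
import math

def std_dev(xs):
    mu = sum(xs) / len(xs)                   # the mean
    sq = [(x - mu) ** 2 for x in xs]         # squared deviations
    return math.sqrt(sum(sq) / len(sq))      # root of their mean

def mean_abs_dev(xs):
    # the "average the absolute distances" alternative described above
    mu = sum(xs) / len(xs)
    return sum(abs(x - mu) for x in xs) / len(xs)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data))       # 2.0
print(mean_abs_dev(data))  # 1.5
```

For this small data set the two measures differ (2.0 vs 1.5), which is typical: they agree in spirit but not in value.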
Re: Why standard deviation?
We want some way of measuring the spread of a distribution. There are many ways of doing this, but the standard deviation is the favourite because it shows up in the Gaussian distribution. The Gaussian is everyone's favourite distribution because it shows up all over the place, especially coming out of the central limit theorem.
Re: Why standard deviation?
TennysonXII wrote:I'm taking a Psych Stats course this semester and we're getting into measures of variability. I'm understanding all of the math and terminology, but one thing is bothering me: why do we use the standard deviation?
I get that the standard deviation is the square root of the mean of squared deviations, and that for a normal distribution 68.27% of scores will fall within one standard deviation of the mean, 95.45% within two standard deviations, etc., but it still seems like a rather arbitrary measure. I've read the merged thread from ages ago (apologies if this should have gone there, the rules weren't clear on thread necromancy), but nothing there gave a satisfactory explanation. Most of the answers were either "It makes sense when doing higher math" or "Because it works". When I asked my professor about it, he muttered something about needing to see the proof of the normal curve and sort of apologetically admitted to not being able to give me the answer. If someone could really break it down for me like I'm five, I'd appreciate it a lot.
Well, we need a measure of the variation observed, and the standard deviation gives us this. It has lots of nice properties, as others have mentioned, being coupled with the normal distribution, which we can often use as our error distribution (we might have to fiddle with the data a bit first, but we can do that in understood ways). There are distributions where the standard deviation is a bit less meaningful: for any heavily skewed distribution (one with more weight on one side of the mean than the other), it gives us less intuitive information.
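To illustrate the skew point with a hedged sketch (the exponential distribution is my choice of example): the rate-1 exponential has mean 1 and standard deviation 1, yet "mean ± one sd" covers far more than the normal distribution's familiar 68%.

```python
import math

mean, sd = 1.0, 1.0                      # both equal 1 for the rate-1 exponential
lo, hi = max(0.0, mean - sd), mean + sd  # the distribution lives on [0, inf)

# CDF of the rate-1 exponential is 1 - exp(-x)
coverage = (1 - math.exp(-hi)) - (1 - math.exp(-lo))
print(round(coverage, 3))  # 0.865, not the 0.683 a normal would give
```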
Re: Why standard deviation?
I'd say don't get too distracted by the behaviour of the normal distribution. The standard deviation applies to any distribution.
Re: Why standard deviation?
Like arbiteroftruth mentioned, the standard deviation isn't the only way to measure spread, but it is the most convenient. The mean absolute deviation (MAD), which is the average of the absolute values of the differences from the mean, is actually used for some things. But it isn't an easy quantity to do math with. The standard deviation (or rather the variance...) is a lot easier to work with because it does pretty much what we want and has a couple of useful properties (the variance is additive for independent random variables, which is really nice). It also has a really natural interpretation for the normal distribution.
The variance is also one of the central moments of a distribution which are interesting and useful to describing various features of a distribution. Some distributions have really interesting mean to variance relationships (in the Poisson the mean and the variance are always the same). I don't know if we get some of those interesting things with the MAD.
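The Poisson mean-variance identity mentioned above can be checked by simulation; this sketch draws Poisson variates with Knuth's multiplication method (an illustrative choice, fine for small rates).

```python
import math
import random
import statistics

random.seed(0)

def poisson(lam):
    # Knuth's method: count events until the running product of
    # uniforms drops below exp(-lam)
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

samples = [poisson(4.0) for _ in range(100_000)]
m = statistics.mean(samples)
v = statistics.pvariance(samples)
print(m, v)  # both should be close to 4
```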
Re: Why standard deviation?
Macbi wrote:The Gaussian is everyone's favourite distribution because it shows up all over the place, especially coming out of the central limit theorem.
Macbi is right, but I'd like to point out that other distributions also show up all over the place. The Gaussian is not everywhere. While you can measure the standard deviation of any set of data, that calculated number has limited meaning outside of Gaussian distributions. If your data is not Gaussian distributed, be very careful how you interpret your standard deviation.

Re: Why standard deviation?
Dunno if I'm asking the right question here. Happens all the time. Let me try a different way.
The interquartile range makes sense to me. Four chunks of 25 is an easy way to deal with data, so a long time ago someone arbitrarily said "Let's break up the data that way!" Hell, even the MAD makes sense to me. What I don't understand is why someone would go, "Let's take the square root of the mean of squared deviations, that will really work!" What's so special about the standard deviation? How is it that x% of scores will always show up within y standard deviations of the mean on a normal curve?
Re: Why standard deviation?
TennysonXII wrote:Dunno if I'm asking the right question here. Happens all the time. Let me try a different way.
The interquartile range makes sense to me. Four chunks of 25 is an easy way to deal with data, so a long time ago someone arbitrarily said "Let's break up the data that way!" Hell, even the MAD makes sense to me. What I don't understand is why someone would go, "Let's take the square root of the mean of squared deviations, that will really work!" What's so special about the standard deviation? How is it that x% of scores will always show up within y standard deviations of the mean on a normal curve?
Well, look at the way the normal distribution is defined! The shape of the curve is dependent on the standard deviation! Given that this is the case, of course there's going to be lots of nice properties there.
If the mean absolute difference makes sense to you, why not the square? In general it's not a ridiculous thing to square distances. Think about the distance between two points in a two-dimensional plane. We can measure how far east one point is from the other, and how far north. So what's the distance? Well, Pythagoras' theorem says that a^2 = b^2 + c^2 for a right-angled triangle, so the distance between the two points is the square root of the sum of the squares of the east-west difference and the north-south difference.
We can extend this to any number of dimensions: in three dimensions we can take the left-right, up-down, and depth differences, square and sum them, then take the root to get the distance between the points. This is the Euclidean distance.
If we have a set of n data points and their mean, we can (kind of) think of the deviations as the components of a vector in n-dimensional space, so a measure of the data's distance from the mean is given by this Euclidean distance.
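The geometric picture above can be made concrete: treat the deviations as a vector in n-dimensional space, and the standard deviation is just its Euclidean length divided by sqrt(n). A quick sketch:

```python
import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)
mu = sum(data) / n

# Euclidean distance from the data vector to (mu, mu, ..., mu)
euclid = math.sqrt(sum((x - mu) ** 2 for x in data))
sd = euclid / math.sqrt(n)  # rescale by sqrt(n) to get the standard deviation
print(sd)  # 2.0 for this data set
```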
Re: Why standard deviation?
I also had a bit of trouble with the notions of variance and standard deviation when I was first introduced to them. It wasn't really intuitive to me why we were squaring.
What made more intuitive sense to me at that time was: take the average of the absolute values of the deviations from the mean.
My instructor explained it by saying the absolute value isn't a nice function to do calculus with, so we use squares instead.
At the time, that almost seemed a little bit like a cop-out to me: are we using squares just because it's convenient, and not because it captures some intuitive notion of typical deviation from the mean?
Perhaps one possible answer to your question is the following:
In general, when measuring how far a set of values is from some other set of values, the square root of the sum of the squares of the differences is in fact a natural measure to use.
Euclidean distance in two, three, or more dimensions is a square root of a sum of squares of differences.
EDIT: I was scooped by mister k. Maybe my remarks are still of some use.
Re: Why standard deviation?
TennysonXII wrote:Dunno if I'm asking the right question here. Happens all the time. Let me try a different way.
The interquartile range makes sense to me. Four chunks of 25 is an easy way to deal with data, so a long time ago someone arbitrarily said "Let's break up the data that way!" Hell, even the MAD makes sense to me. What I don't understand is why someone would go, "Let's take the square root of the mean of squared deviations, that will really work!" What's so special about the standard deviation? How is it that x% of scores will always show up within y standard deviations of the mean on a normal curve?
You can scale whatever error measure you use by a positive real factor and nothing would change, only the interpretation: it would not be the standard deviation any more.
>> How is it that x% of scores will always show up within y standard deviations of the mean on a normal curve?
The Gaussian distribution always has the same shape. The only changes are a shift of the whole distribution and a change of its width. Just rescale the axes and you get the same graph again. This is true for many (but not all) distributions.
The square has mathematical advantages. For example, in general error propagation you can square the individual errors, add them, and take the square root to get the standard deviation of the combined value. That is not possible with other definitions of deviation (again, except with multiples of the standard deviation).
It is linked to distance in n-dimensional space, where you also have to square the individual differences.
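A simulation sketch of the error-propagation property (the parameter values are my own illustration): independent errors with sd 3 and sd 4 should combine into an error with sd sqrt(9 + 16) = 5.

```python
import random
import statistics

random.seed(1)

# Two independent sources of measurement error, sd 3 and sd 4
a = [random.gauss(0, 3) for _ in range(200_000)]
b = [random.gauss(0, 4) for _ in range(200_000)]

# The propagated error of the sum: sqrt(3^2 + 4^2) = 5
total_sd = statistics.pstdev([x + y for x, y in zip(a, b)])
print(round(total_sd, 1))  # close to 5.0
```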
kbltd wrote:I'd say don't get too distracted by the behaviour of the normal distribution. The standard deviation applies to any distribution.
Show me the standard deviation of a Cauchy distribution, please.
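For anyone who wants to see that problem concretely, here's a hedged sketch: a standard Cauchy variate can be drawn as tan(pi*(U - 1/2)) for uniform U, and because its variance does not exist, the sample standard deviation never settles down as the sample grows.

```python
import math
import random
import statistics

random.seed(2)

def cauchy():
    # inverse-CDF sampling of the standard Cauchy distribution
    return math.tan(math.pi * (random.random() - 0.5))

# The sample sd jumps around erratically instead of converging
sds = [statistics.pstdev(cauchy() for _ in range(n))
       for n in (1_000, 10_000, 100_000)]
print(sds)  # no stable value, typically dominated by a few huge draws
```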
 gmalivuk
Re: Why standard deviation?
TennysonXII wrote:What's so special about the standard deviation?
If you have two independent sets of data, and you want to look at how their sums are distributed, variance is nice because the variance of the sum is the sum of the variances. I don't know that MAD or IQR or the other variability measurements have any such nice properties, do they?
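A quick numerical check of that additivity (uniform variates are my illustrative choice): each uniform on [0, 1] has variance 1/12, so the sum of two independent ones should have variance 1/6.

```python
import random
import statistics

random.seed(3)

x = [random.uniform(0, 1) for _ in range(100_000)]
y = [random.uniform(0, 1) for _ in range(100_000)]

# Variance of the sum of independent variables = sum of the variances
var_sum = statistics.pvariance([a + b for a, b in zip(x, y)])
print(round(var_sum, 3))  # close to 1/12 + 1/12 = 0.167
```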

Re: Why standard deviation?
So let me see if I'm understanding a little better now:
Why does the standard deviation always fall along the same place in a normal distribution? Because data always falls along the same place in a normal distribution. That's kind of the definition of a normal distribution. If all the data line up the same way, then all measures of variability will line up the same way too.
Why do we use the standard deviation as opposed to some other measure of variability? Because it plays nicely with the normal curve and with other more complicated stuff that I'll find out about later.
I guess the only question I have left is why the standard deviation always falls at 34.1% above the mean as opposed to always falling at some other point. I don't see the relationship there.
 Xanthir
Re: Why standard deviation?
TennysonXII wrote:I guess the only question I have left is why the standard deviation always falls at 34.1% above the mean as opposed to always falling at some other point. I don't see the relationship there.
That's the "definition of the normal distribution" thing. The normal distribution has a particular shape, such that if you measure the area 1 stdev away from the mean, it's 68% of the total area. It just falls out of the math.
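How it "falls out of the math", sketched with the standard library: the normal CDF is expressible via the error function, and the mass within k standard deviations of the mean is erf(k/sqrt(2)).

```python
import math

# Fraction of a normal distribution within k standard deviations of the mean
within_1sd = math.erf(1 / math.sqrt(2))
within_2sd = math.erf(2 / math.sqrt(2))

print(round(100 * within_1sd, 2))  # 68.27
print(round(100 * within_2sd, 2))  # 95.45
```

These are exactly the percentages quoted in the opening post; nothing about them is arbitrary once the curve is fixed.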
 Yakk
Re: Why standard deviation?
How about I hit you with a theory hammer?
https://secure.wikimedia.org/wikipedia/ ... ral_moment
So we have this family of "moments about the mean".
The low-order ones (not counting zero) behave linearly when you add independent variables. Linear things are good. These give the "mean", the "variance", and (via the third central moment) the "skewness" of the distribution of data; kurtosis comes from the fourth.
The standard deviation is the square root of the variance, because that brings its "scale" back in line with the scale of the values from the data (and/or the mean).
If you take a look at the equation for the normal curve:
https://secure.wikimedia.org/wikipedia/ ... stribution
you'll see those sigma squared terms ([imath]\sigma^2[/imath]). Sigma squared is the variance. With the possible exception of a factor of 2, it should be pretty clear that the shape of a normal curve is very naturally a function of the mean ([imath]\mu[/imath]) and the variance.
The CDF describes how the integral of the normal curve behaves. You'll notice the square root of [imath]\sigma^2[/imath] there? That means that the "spread" of the area under the curve varies with the square root of the variance, i.e. [imath]\sigma[/imath]. So if you want to know what percentage of elements are within some window, the width of the window is scaled by sigma.
As to why the normal curve is important, well, if you average up a bunch of uncorrelated random variables, the result ends up moving towards a normal curve.
Re: Why standard deviation?
TennysonXII wrote:I guess the only question I have left is why the standard deviation always falls at 34.1% above the mean as opposed to always falling at some other point. I don't see the relationship there.
It turns out that, if you start with pretty much any distribution with standard deviation sigma, sample that distribution N times (where N is pretty large), and take the average, then about 34.1% of the time that average will land between the mean of the distribution and one step of sigma/sqrt(N) above it (and about 68.3% of the time it will be within sigma/sqrt(N) of the mean on either side).
Our nice definition of "standard deviation" comes first, from which we can work out the esoteric-seeming number 34.1%. (And the rest of the esoteric-seeming numbers that come from the esoteric-seeming formula exp(-x^2/(2 sigma^2))/sqrt(2 pi sigma^2).) If you want to see how it's worked out, see a proof of the central limit theorem.
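A hedged simulation of the 34.1% claim (the uniform distribution and the sample sizes are my own illustrative choices): average N uniform draws and count how often the average lands in the one-step band just above the mean.

```python
import math
import random

random.seed(4)

# Uniform on [0, 1]: mean 0.5, sd sqrt(1/12) -- decidedly non-normal
N, trials = 50, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)
step = sigma / math.sqrt(N)  # one sd of the sampling distribution of the mean

# Count averages landing between the mean and one step above it
hits = sum(
    1 for _ in range(trials)
    if 0 <= (sum(random.random() for _ in range(N)) / N) - mu < step
)
print(hits / trials)  # close to 0.341, per the central limit theorem
```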
Re: Why standard deviation?
I'd never understood why standard deviation was so widely used instead of the simpler and more intuitive mean variation (the average of the absolute values of the deviations from the mean). As far as I can tell the only difference between the two is that standard deviation tends to magnify the importance of outliers due to averaging the square of the values rather than the values themselves. If the goal is to magnify the outliers, squaring seems arbitrary since you might as well cube all the values, average them, then take the cube root. This would result in a different number, but not a better number, for describing the dispersion of the data.
There's an interesting page that argues that mean variation is actually better than standard deviation in real life data since it is less likely to magnify error values. However, the main advantage of mean variation is that it has a clear, intuitive meaning which makes it more useful to the people interpreting the data.
Link wasn't allowed for some reason... here it is (copy/paste/fix):
http ://www.leeds.ac.uk/educol/documents/00003759.htm
Link removed from user's first post.
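The "magnifying outliers" point is easy to demonstrate; this sketch (the data values are invented for illustration) contaminates a tight sample with one wild reading and compares how much each measure moves.

```python
import statistics

def mean_abs_dev(xs):
    # average absolute deviation from the mean
    mu = statistics.mean(xs)
    return sum(abs(x - mu) for x in xs) / len(xs)

clean = [9, 10, 10, 10, 11] * 20   # 100 tightly clustered readings
dirty = clean + [100]              # plus one contaminated reading

print(statistics.pstdev(clean), mean_abs_dev(clean))
print(statistics.pstdev(dirty), mean_abs_dev(dirty))
```

Here the standard deviation jumps by roughly a factor of 14 while the mean absolute deviation grows only about 4.5-fold, which is the sensitivity the linked page is concerned with.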
 gmalivuk
Re: Why standard deviation?
jtheoph wrote:As far as I can tell the only difference between the two is that standard deviation tends to magnify the importance of outliers due to averaging the square of the values rather than the values themselves. If the goal is to magnify the outliers, squaring seems arbitrary since you might as well cube all the values, average them, then take the cube root. This would result in a different number, but not a better number, for describing the dispersion of the data.
You know what would be neat? You reading the responses people have already posted in this thread prior to repeating the claim that standard deviation doesn't seem to be a good measure.
jtheoph wrote:it has a clear, intuitive meaning which makes it more useful to the people interpreting the data.
This is only true if the people interpreting the data place a lot of value on clear, intuitive meanings. Rather than, say, being able to do other kinds of analysis on the data, which are difficult or impossible with as badly-behaved a function as absolute value.
Do you also have a suggestion to replace skewness and (excess) kurtosis? Because those are currently defined in terms of the standard deviation.

Re: Why standard deviation?
antonfire wrote:Our nice definition of "standard deviation" comes first, from which we can work out the esoteric-seeming number 34.1%. (And the rest of the esoteric-seeming numbers that come from the esoteric-seeming formula exp(-x^2/(2 sigma^2))/sqrt(2 pi sigma^2).) If you want to see how it's worked out, see a proof of the central limit theorem.
So what you're saying is that once I understand exp(-x^2/(2 sigma^2))/sqrt(2 pi sigma^2), I'll understand the whole shebang. And in order to do that, I need to go find "Proof of the Central Limit Theorem for Dummies." That works for me, and thanks to everyone for helping me work this out. This is my first statistics course, and so far it's all really interesting and intuitively useful. That's something I couldn't have said about any other math I've worked through. My career decision to be a researcher is looking like an excellent fit.
Re: Why standard deviation?
Do you also have a suggestion to replace skewness and (excess) kurtosis? Because those are currently defined in terms of the standard deviation.
You can use the Bowley coefficient of skewness as a robust alternative to the traditional definition of skewness. Kurtosis can also be calculated without relying on the standard deviation, as explained here: http://weber.ucsd.edu/~hwhite/pub_files/hwcv092.pdf.
Re: Why standard deviation?
If you have several values from a normal distribution, the probability density of observing that particular sample depends only on the sum of squares of their deviations from the mean. Try to get this nice feature with other definitions of the width of a distribution.
Re: Why standard deviation?
jtheoph wrote:There's an interesting page that argues that mean variation is actually better than standard deviation in real life data since it is less likely to magnify error values. However, the main advantage of mean variation is that it has a clear, intuitive meaning which makes it more useful to the people interpreting the data.
http ://www.leeds.ac.uk/educol/documents/00003759.htm
There are a few interesting points here, but there is a bit halfway through where Gorard claims that any [infinite] superpopulation (that is, an infinite population used to approximate a large finite population) must necessarily have an infinite variance. At that point it became clear that he doesn't really understand how parameter estimation works, which invalidates most of his main points about the efficiency of the mean deviation (and I pretty much gave up at that point).
Re: Why standard deviation?
Standard deviation is also easier to calculate when studying stochastic processes: via the Cameron-Martin theorem, solving for the polynomial chaos expansion coefficients c_i yields mu = c_0 and sigma^2 = sum_{i=1..P} c_i^2. So you can obtain statistical moments without actually having to generate statistics.
Re: Why standard deviation?
It depends on the data you're looking at. If the excess kurtosis of your data is zero (i.e., roughly normal), then fitting with least squares and thinking in terms of standard deviations is optimal under a bunch of criteria (efficiency of statistical inference, etc.). Generally, most data is going to be roughly normal due to the CLT, with maybe one or two easily detectable outliers from contamination.
But if your data is Laplace distributed, then it's theoretically optimal to use mean absolute error. More generally, I remember reading an article claiming that the correct measure of dispersion is an L_p norm whose p depends on the kurtosis of the distribution in question according to some formula.
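A related sketch (toy data of my own): squared error is minimised at the mean and absolute error at the median, which is one way to see why the natural dispersion measure depends on the error model you assume.

```python
data = [1.0, 2.0, 2.0, 3.0, 10.0]  # mean 3.6, median 2.0

def sse(c):
    # sum of squared errors around a candidate centre c
    return sum((x - c) ** 2 for x in data)

def sae(c):
    # sum of absolute errors around a candidate centre c
    return sum(abs(x - c) for x in data)

# Search a grid of candidate centres on [0, 10]
candidates = [i / 100 for i in range(0, 1001)]
best_sq = min(candidates, key=sse)
best_abs = min(candidates, key=sae)
print(best_sq)   # 3.6, the mean
print(best_abs)  # 2.0, the median
```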