Variance and standard deviation - intuitive level
Variance and standard deviation - formal level
Drama
Mathematical sweat
Susan Bolles is a new professor of Psychology at Goatshead College. Her chairman, sorry, chairperson, Dr. Alexa Terrorvski, assigned her the introductory psychology class of 1000 students. The first midterm exam has just taken place. There were 100 questions, 1 point each. The exam papers were computer-graded, all 1000 of them. Dr. Terrorvski wants to know how the class did, so she asks Susan. Susan says that the mean (the average) was 60.
Dr. Terrorvski wants to know more: how many people scored close to 100, and how many scored close to zero. Susan walks up to the pile of exam sheets and starts reading the scores: 48, 30, 70, 99, 53… The scores are spread out on both sides of the middle point, the average.
Realizing that this would take a good part of the day, Dr. Terrorvski shouts: There must be a better way!
I'll tell you what. Take this pile out to the stadium and put it down in the middle. Mark this point 60 (your average score). Take a step and mark this point 61. Take another step and mark it 62, and so on all the way to 100. Return to mark 60 and take a step in the opposite direction. Mark this point 59. Take another step and mark it 58, and repeat all the way to mark 0. Now return to your pile of exam sheets and pick up an exam sheet. Read the score and walk to the point it corresponds to on the markings you made. I will return in an hour to see how the exam went.
When Dr. Terrorvski returns, she finds Susan drenched in sweat and panting vigorously. There is a long line of white sheets of paper on both sides of the point that marks 60, the average. Susan picks up another exam sheet, which has a score of 3, and begins walking: 59, 58, 57…
Enough! Terrorvski shouts. This is a mess. Look at how many papers are spread out away from the mean. Students have scored low, all the way down to zero. So many scores, so many students, are very far from 60, the mean. Many scores lie a long distance from the mean. You need to teach your students more effectively, even the weak ones, Dr. Terrorvski said, and marched out of the stadium.
Susan went to her office and tried to get a better picture of the situation. Rather than walking away from the point of the mean, she calculated the distance of each score from the mean. Score 40: distance from the mean -20. Score 65: distance from the mean +5. And so on. At the end she added up all these distances, ignoring the plus and minus signs, and found the total distance. That was the distance she had to walk in the stadium!
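Susan's office calculation can be sketched in Python. This is a minimal illustration under an assumption: the five scores she read aloud earlier (48, 30, 70, 99, 53) stand in for the whole class, and they happen to average exactly 60. The function name `total_distance` is hypothetical. Since a walk has no direction, the sketch drops the minus signs with `abs()`.

```python
def total_distance(scores):
    """Total distance walked from the mean to every score (signs ignored)."""
    mean = sum(scores) / len(scores)
    # abs() turns a deviation such as -20 into a walking distance of 20
    return sum(abs(x - mean) for x in scores)


# Illustrative mini-class: the five scores Susan read aloud
scores = [48, 30, 70, 99, 53]
print(total_distance(scores))  # 98.0
```

Without `abs()`, the same sum would come out as 0, because the distances below the mean cancel the distances above it.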
In the second midterm, the mean was again 60. This time she did not go out to the stadium. She simply found the distance of each score from the mean. She added up all these distances and was pleased to see that the total distance was very small. Surely, Dr. Terrorvski would not yell at her this time. Students scored close to the mean; there were very few low scores. This time the scores were not spread out all over the place, away from the mean.
The symbol for variance is s². The formula for variance is
\[ s^2=\frac{\sum (X-\bar{X})^2}{n-1} \]
We read this as follows: variance equals the sum of squared deviations of each score from the mean, divided by the number of scores that went into the calculation minus one, that is, by n-1.
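As a worked example, assume for illustration that the five scores Susan read aloud (48, 30, 70, 99, 53) were the whole set of scores. Their mean is 300 / 5 = 60, so the deviations are -12, -30, +10, +39 and -7, and:
\[ \sum (X-\bar{X})^2 = (-12)^2 + (-30)^2 + 10^2 + 39^2 + (-7)^2 = 144 + 900 + 100 + 1521 + 49 = 2714 \]
\[ s^2 = \frac{2714}{5-1} = \frac{2714}{4} = 678.5 \]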
Why squared?
Why square the deviations, you ask? The sum of deviations from the mean always, in all cases, equals 0, because the positive deviations of scores above the mean exactly cancel the negative deviations of scores below it. That is why we square each deviation: a squared number is never negative, so nothing cancels out. You should know that in all sciences, for the purpose of meaningful analysis, we may transform our data by squaring them, expressing them as logarithms, and so on. This does not change the relation of the scores amongst themselves.
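A quick Python check of this claim, using the five scores Susan read aloud (48, 30, 70, 99, 53) purely as illustrative data: the plain deviations cancel to zero, while the squared deviations add up to a positive number. With other data, floating-point rounding can leave a tiny leftover such as 1e-15 instead of an exact 0.

```python
# Illustrative data: the five scores Susan read aloud
scores = [48, 30, 70, 99, 53]
mean = sum(scores) / len(scores)          # 60.0

deviations = [x - mean for x in scores]   # [-12.0, -30.0, 10.0, 39.0, -7.0]
squared = [d ** 2 for d in deviations]    # [144.0, 900.0, 100.0, 1521.0, 49.0]

print(sum(deviations))  # 0.0: positives and negatives cancel
print(sum(squared))     # 2714.0: squares are never negative, so nothing cancels
```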
Why divide by n?
You understand that if in one case we have many scores, and in another only a few, the sum of the squared deviations from the mean will be large in the first instance and small in the second, simply because more numbers are being added up. If we want to compare the spread of the scores in the two instances, we have to average each of these sums of deviations. I hope you understand this.
For example, in order to compare the income of New Yorkers to that of Chicagoans, we must average each total: divide the total income of New Yorkers by the number of New Yorkers, and the total income of Chicagoans by the number of Chicagoans.
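To see why the sum has to be averaged, here is a small Python sketch: take the five scores Susan read aloud (48, 30, 70, 99, 53, used only as illustrative data) and list each one twice. The spread is exactly the same, yet the sum of squared deviations doubles because there are twice as many scores. Dividing by the number of scores brings the two back into agreement. The sketch divides by n, as in the argument above; the helper name `sum_of_squares` is hypothetical.

```python
def sum_of_squares(scores):
    """Sum of squared deviations from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


small = [48, 30, 70, 99, 53]   # five illustrative scores
large = small * 2              # the same scores, each appearing twice

print(sum_of_squares(small), sum_of_squares(large))    # 2714.0 5428.0
print(sum_of_squares(small) / len(small),
      sum_of_squares(large) / len(large))              # 542.8 542.8
```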
The numerator of this formula
\[ \sum (X-\bar{X})^2 \]
is the sum of the squared (raised to the second power) differences between each score and the mean. More formally, we say the numerator of the variance formula is the sum of squared deviations of each score from the mean. In statistical jargon we say:
Sum of squares, or SS.
The denominator is essentially n, the number of scores we have (strictly, n-1 for a sample, as explained below). Dividing a sum by how many scores we have gives the mean, or average.
So, the variance formula is the average of the squared deviations, or, in statistical jargon, the mean squares, or MS for short.
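The two pieces of jargon, SS and MS, can be sketched as two small Python functions. The names `sum_of_squares` and `mean_square` are hypothetical, and the data are the five scores Susan read aloud (48, 30, 70, 99, 53), used only as an illustration. The sketch divides by n-1 so that it matches the variance formula given earlier.

```python
def sum_of_squares(scores):
    """SS: the sum of squared deviations of each score from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def mean_square(scores):
    """MS: SS divided by n-1, matching the sample variance formula."""
    return sum_of_squares(scores) / (len(scores) - 1)


# Illustrative data: the five scores Susan read aloud
scores = [48, 30, 70, 99, 53]
print(sum_of_squares(scores))  # 2714.0
print(mean_square(scores))     # 678.5
```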
What is the standard deviation? you ask. To calculate the standard deviation, we take the square root of the variance. Simple.
You do not need to know how to calculate the square root of a number by hand. Not in the age of computers. Anyone can learn to do simple arithmetic. The challenge is to understand the concepts behind statistical and mathematical operations.
The formula for standard deviation is:
\[ s=\sqrt{s^2} \]
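In practice the computer does the square root for you. Here is a minimal sketch using Python's standard `statistics` and `math` modules, with the five scores Susan read aloud (48, 30, 70, 99, 53) as illustrative data: `statistics.variance` returns the sample variance (dividing by n-1), `math.sqrt` turns it into the standard deviation, and `statistics.stdev` gives the same answer directly.

```python
import math
import statistics

# Illustrative data: the five scores Susan read aloud
scores = [48, 30, 70, 99, 53]

s2 = statistics.variance(scores)  # sample variance: sum of squares / (n-1)
s = math.sqrt(s2)                 # standard deviation: square root of the variance

print(s2)                                         # 678.5
print(round(s, 2))                                # 26.05
print(math.isclose(s, statistics.stdev(scores)))  # True
```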
Do not worry about the -1 in the denominator. The denominator changes depending on whether we deal with a sample or an entire population. Remember, our goal here is to understand the concepts of statistics, and we want to avoid getting stuck in computational swamps.
We use n-1 when we work with samples.
We use N, without the -1, when we refer to a population.
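Python's standard `statistics` module offers both versions, which makes the difference easy to see. The data are again the five scores Susan read aloud (48, 30, 70, 99, 53), used only for illustration: `variance` and `stdev` divide the sum of squares by n-1, while `pvariance` and `pstdev` divide it by N.

```python
import statistics

# Illustrative data: the five scores Susan read aloud (sum of squares = 2714)
scores = [48, 30, 70, 99, 53]

# Sample formulas divide the sum of squares by n-1 = 4
print(statistics.variance(scores))   # 678.5
# Population formulas divide the same sum of squares by N = 5
print(statistics.pvariance(scores))  # 542.8

print(round(statistics.stdev(scores), 2))   # 26.05
print(round(statistics.pstdev(scores), 2))  # 23.3
```

Dividing by the smaller number makes the sample variance slightly larger; the gap shrinks as the number of scores grows.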
Now that we have removed the mystery of the standard deviation, we return to the normal distribution.
Remember, this is not just a curve; it is the Goddess Normal Curve. Glory to NC in the highest!

