Statistics 2nd ed

Story 7 — Variance and standard deviation

Variance and standard deviation - intuitive level

Variance and standard deviation - formal level

Drama

Mathematical sweat

 

Susan Bolles is a new professor of

Psychology at Goatshead College.

Her chairman, sorry, chairperson, Dr.

Alexa Terrorvski, assigned her the

introductory psychology class of

1000 students. The first midterm

exam has just taken place. There

were 100 questions, 1 point each.

The exam papers were

computer-graded, all 1000 of them.

Dr. Terrorvski wants to know how

the class did, so she asks Susan.

Susan says that the mean (the

average) was 60.

 

Dr. Terrorvski wants to know more:

how many people scored close to

100, and how many people scored

close to zero. Susan walks up to the

pile of exam sheets and starts

reading the scores: 48, 30, 70, 99,

53 …… Papers are spread out from

the middle point, the average.

Realizing that this would take a good part

of the day, Dr. Terrorvski shouts out:

 

There must be a better way!

I will tell you what. Take this pile out to the

stadium, put it down in the middle of the

stadium. Mark this point 60 (your average

score). Take a step and mark this point 61.

Another step, mark this 62, all the way to

100. Return to mark 60 and take a step in

the opposite direction. Mark this point

59. Take another step, and mark this 58,

repeat all the way to mark 0. Now return to

your pile of exam sheets and pick up an

exam sheet. Read the score and walk to the

point it corresponds to on the markings you

made. I will return in an hour to see how

the exam went.

 

When Dr. Terrorvski returns she finds

Susan drenched in sweat and panting

vigorously. There is a long line of white

sheets of paper on both sides of the

point that marks 60, the average.

Susan picks up another exam sheet

which had the score of 3 and begins

walking. 59. 58. 57….

 

Enough! Terrorvski shouts. This is a

mess. Look at how many papers are

spread out away from the mean,

students have scored low scores, all

the way down to zero. So many

scores, so many students are very far

from 60, the mean. There is a big

distance of many scores from the

mean. You need to be more effective

in teaching your students, even the

weak ones, Dr. Terrorvski said, and

marched out of the stadium.

 

Susan went to her office and tried

to get a better picture of the

situation. Rather than walking

away from the point of the mean,

she calculated the distance of

each score from the mean. Score

40. Distance from the mean -20.

Score 65. Distance from the mean

+5. And so on. At the end she

added up the sizes of all these distances,

ignoring the plus and minus signs, and found the total distance.

That was the distance she had to

walk in the stadium!

 

In the second midterm, the mean

was again 60. This time she did

not go out in the stadium. She

simply found out the distance of

each score from the mean. She

added up all these distances

and was pleased to see that the

total distance was very small.

Surely, Dr. Terrorvski would not

yell at her this time. Students

scored close to the mean, there

were very few low scores. The

scores this time were not spread

out all over the place away from

the mean.

 

The symbol for variance is s².

The formula for variance is

 

 

\[ s^2=\frac{\sum (X-\bar{X})^2}{n-1} \]

 

 

We read this as follows:

Variance equals the sum of

squared deviations of each score

from the mean, divided by one less

than the number of scores that went

into the calculation.

 

Why squared? Why square the

deviation, you say.

 

The sum of deviations from the

mean always, in all cases, equals

0. That is why we square each

deviation to prevent this. You

should know that in all sciences,

for the purpose of meaningful

analysis, we may transform our

data by squaring them, or

expressing them as logarithms,

and so on. This does not change

the relation of scores amongst

themselves. 
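The point about deviations cancelling out can be checked with a few lines of Python. This is only a sketch: it assumes the first five scores Susan read aloud (48, 30, 70, 99, 53) as a tiny illustrative sample, not the full class of 1000. The variable names are made up for this example.

```python
# Illustrative sample: the first five scores Susan read out (an assumption,
# not the whole class). Their mean happens to be exactly 60.
scores = [48, 30, 70, 99, 53]

mean = sum(scores) / len(scores)

# Signed deviation of each score from the mean: negative below, positive above.
deviations = [x - mean for x in scores]

# The signed deviations cancel each other out.
total_signed = sum(deviations)

# Ignoring the signs gives the total distance Susan would walk in the stadium.
total_walked = sum(abs(d) for d in deviations)

print(mean)          # 60.0
print(deviations)    # [-12.0, -30.0, 10.0, 39.0, -7.0]
print(total_signed)  # 0.0
print(total_walked)  # 98.0
```

The signed total is zero no matter which scores are used, which is why the formula needs some way of removing the signs before adding.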

 

Why divide by n?

 

You understand that if in one case

we have many scores, and in

another few scores, the sum of

the squared deviations from the mean will

be large in the first instance, and

small in the second instance. If we

want to compare the spread of the

scores in the two instances, we

have to average each of these

sums of squared deviations. I hope you

understand this.

 

For example, in order to compare

the income of New Yorkers to

that of Chicagoans, we must

divide the total income of each

city by the number of its residents.

The numerator of this formula

 

 

\[ \sum (X-\bar{X})^2 \]

 

 

is the sum of the squared

difference between each score and

the mean. More formally,

we say the numerator of the

variance formula is the sum of

squared deviations of each score from the

mean. In statistical jargon

 we say: 

 

Sum of squares, or SS.

 

The numerator of the variance

formula is the

Sum of Squares,

or SS
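As a rough illustration, the Sum of Squares can be computed as below. The function name `sum_of_squares` is hypothetical, chosen for this sketch, and the five scores are again only the first ones Susan read out, used as a small assumed sample.

```python
def sum_of_squares(scores):
    """Return SS: the sum of squared deviations of each score from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


# Assumed sample: the first five scores from the story (mean 60).
scores = [48, 30, 70, 99, 53]
ss = sum_of_squares(scores)

# Squared deviations: 144 + 900 + 100 + 1521 + 49
print(ss)  # 2714.0
```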

 

The denominator is based on n, i.e., the

number of scores we have in this

case. Dividing a total by how many

scores we have gives the mean or

average.

 

So, the variance is the

average of the squared

deviations, or, in statistical jargon,

the mean squares or MS, for

short. 

 

 

Variance is also called

mean squares

or

MS

 

 

 What is standard deviation? you say.

 

To calculate the standard deviation

we take the square root of

variance. Simple.

 

You do not need to know how to

calculate the square root of a

number. Not in the age of

computers. Anyone can learn to

do simple arithmetic. The

challenge is to understand

concepts of statistical and

mathematical operations.

 

The formula for standard deviation

is:

 

 

\[ s=\sqrt{s^2} \]
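The two formulas can be followed step by step in Python. This is a minimal sketch under the same assumption as before: the five scores are just the first ones read out in the story, treated as a small sample, so the denominator is n - 1.

```python
import math

# Assumed sample: the first five scores from the story.
scores = [48, 30, 70, 99, 53]
n = len(scores)

mean = sum(scores) / n                      # 60.0
ss = sum((x - mean) ** 2 for x in scores)   # sum of squares, 2714.0

variance = ss / (n - 1)     # s^2: SS divided by n - 1
sd = math.sqrt(variance)    # s: the square root of the variance

print(variance)       # 678.5
print(round(sd, 2))   # 26.05
```

The standard deviation comes out in the same units as the scores (exam points), which is one reason it is often easier to interpret than the variance.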

 

 

Do not worry about the -1 in the

denominator. It changes

depending on whether we deal

with samples or an entire

population. Remember our goal 

here is to understand the concepts

of statistics, and we want to avoid

getting stuck in compulsive

swamps.

 

We use n-1 when we work with samples.

We use N without -1 when we refer to a population.
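Python's standard `statistics` module offers both versions, so the difference can be seen side by side. The sketch below reuses the five scores from the story as an assumed example: `variance` and `stdev` divide by n - 1 (sample), while `pvariance` and `pstdev` divide by N (population).

```python
import statistics

# Assumed example data: the first five scores from the story.
scores = [48, 30, 70, 99, 53]

# Treating the scores as a sample: denominator is n - 1.
sample_var = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

# Treating the scores as the entire population: denominator is N.
pop_var = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)

print(round(sample_var, 2), round(sample_sd, 2))  # 678.5 26.05
print(round(pop_var, 2), round(pop_sd, 2))        # 542.8 23.3
```

With only five scores the two answers differ noticeably. With 1000 scores, as in Susan's class, dividing by 999 or by 1000 makes very little difference.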

 

Now that we have removed the

mystery of standard deviation of

the normal distribution, we return

to it.

 

Remember this is not just a curve,

it is Goddess Normal Curve. Glory

to NC in the highest!

 
