Statistics 2nd ed


Lesson 9 — Correlation


Correlation measures the strength and direction of the relationship between two variables.
It tells us whether high values of one variable go with high (or low) values of another.


Pearson’s r

The most common measure is Pearson’s correlation coefficient, $$r$$.
It ranges from –1 to +1.

  • $$r = +1$$ → perfect positive correlation (as X increases, Y increases).
  • $$r = -1$$ → perfect negative correlation (as X increases, Y decreases).
  • $$r = 0$$ → no linear relationship.

Symbolic formula:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$

Formula in words:
$$r = \frac{\text{sum of the cross-products of deviations from the mean}}{\text{square root of (sum of squared deviations in X × sum of squared deviations in Y)}}$$
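The formula can be sketched in a few lines of plain Python. The function name `pearson_r` and the three small data sets below are illustrative assumptions, not part of the lesson; they are chosen so that r comes out at the landmark values +1, –1 and 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r for two equal-length lists of numbers (illustrative helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of the cross-products of deviations from the mean.
    sum_cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of (sum of squares in X times sum of squares in Y).
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sum_cross / math.sqrt(ss_x * ss_y)


# Made-up data, assumed only for this illustration.
xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]    # points on an upward straight line
falling = [-3 * x + 5 for x in xs]  # points on a downward straight line
curved = [x ** 2 for x in xs]       # Y depends on X, but not linearly

print(f"rising:  r = {pearson_r(xs, rising):+.2f}")
print(f"falling: r = {pearson_r(xs, falling):+.2f}")
print(f"curved:  r = {pearson_r(xs, curved):+.2f}")
```

The curved case is worth noting: Y is completely determined by X, yet r is 0, because r only picks up the linear part of a relationship. The helper divides by zero if either variable is constant, since r is undefined in that case.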


Example

Suppose study hours (X) and test scores (Y) are:

  • X = [2, 4, 6]
  • Y = [50, 60, 80]

Means:

  • $$\bar{X} = 4$$
  • $$\bar{Y} \approx 63.3$$

Cross-products of deviations (using rounded deviations):

  • (2–4)(50–63.3) = (–2)(–13.3) = 26.6
  • (4–4)(60–63.3) = (0)(–3.3) = 0
  • (6–4)(80–63.3) = (2)(16.7) = 33.4

Sum cross-products = 60

Sum squares X = (–2)² + 0² + 2² = 8
Sum squares Y = (–13.3)² + (–3.3)² + 16.7² ≈ 466.7

So:
$$r = \frac{60}{\sqrt{8 \times 466.7}} \approx \frac{60}{\sqrt{3733}} \approx \frac{60}{61.1} \approx 0.98$$

This is a very strong positive correlation.
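The same arithmetic can be checked step by step in Python. This is a minimal sketch that follows the hand calculation; the variable names are assumptions made for illustration. Because it keeps the deviations unrounded, the individual cross-products come out as about 26.67 and 33.33 rather than 26.6 and 33.4, but their sum is still 60.

```python
import math

hours = [2, 4, 6]      # X: study hours
scores = [50, 60, 80]  # Y: test scores

# Step 1: means.
mean_x = sum(hours) / len(hours)
mean_y = sum(scores) / len(scores)

# Step 2: deviations of each value from its mean.
dev_x = [x - mean_x for x in hours]
dev_y = [y - mean_y for y in scores]

# Step 3: sum of cross-products and the two sums of squares.
sum_cross = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
ss_x = sum(dx ** 2 for dx in dev_x)
ss_y = sum(dy ** 2 for dy in dev_y)

# Step 4: divide by the square root of the product of the sums of squares.
r = sum_cross / math.sqrt(ss_x * ss_y)

print(f"mean X = {mean_x:.1f}, mean Y = {mean_y:.1f}")
print(f"sum of cross-products = {sum_cross:.1f}")
print(f"sum of squares X = {ss_x:.1f}, sum of squares Y = {ss_y:.1f}")
print(f"r = {r:.2f}")
```

Run as written, the printed values should agree with the hand calculation: a cross-product sum of 60, sums of squares of 8 and about 466.7, and r ≈ 0.98.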


Coefficient of Determination

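The square of the correlation coefficient, $$r^2$$, is called the coefficient of determination.
It represents the proportion of the variance in Y that is accounted for by its linear relationship with X.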

Example above:
$$r^2 = (0.98)^2 \approx 0.96$$

So about 96% of the variation in scores is explained by study hours.
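One way to see what "explained" means is to fit a straight line to the data and measure how much of the spread in Y is left over around that line. The sketch below assumes an ordinary least-squares line, which this lesson has not introduced, so treat it as an illustration only; the variable names are likewise assumptions.

```python
hours = [2, 4, 6]      # X: study hours
scores = [50, 60, 80]  # Y: test scores

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

sum_cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores))
ss_x = sum((x - mean_x) ** 2 for x in hours)
ss_y = sum((y - mean_y) ** 2 for y in scores)  # total variation in Y

# r squared, computed directly from the sums used for r.
r_squared = sum_cross ** 2 / (ss_x * ss_y)

# Assumed for illustration: the least-squares line through the data.
slope = sum_cross / ss_x
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in hours]

# Variation in Y that the line does NOT account for.
ss_residual = sum((y - p) ** 2 for y, p in zip(scores, predicted))
explained_share = 1 - ss_residual / ss_y

print(f"r^2 = {r_squared:.3f}")
print(f"1 - (residual variation / total variation) = {explained_share:.3f}")
```

Both printed values should agree at about 0.964, which rounds to the 0.96 above: roughly 96% of the variation in scores is captured by the line and about 4% is left over.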


Definitions

  • Correlation: the degree of linear relationship between two variables.
  • Pearson’s r: the correlation coefficient; it ranges from –1 to +1.
  • Coefficient of determination (r²): the proportion of variance in Y explained by X.

Visual Placeholders

Figure 9.1 — Scatterplot with positive correlation (points rising, line upward).

Figure 9.2 — Scatterplots showing r ≈ +1, r ≈ 0, r ≈ –1.


Why This Matters

Correlation is the first step in studying relationships.
It helps identify whether variables move together, setting the stage for regression analysis.
