Statistics 2nd ed


Lesson 10 — Regression


Correlation tells us the strength and direction of the linear relationship between two variables.
Regression goes one step further: it gives us an equation to predict one variable from another.

 


The Regression Equation

The regression line predicts Y from X.

Symbolic formula:
$$\hat{Y} = a + bX$$

Formula in words:
$$\text{Predicted Y} = \text{intercept} + (\text{slope} \times X)$$

Where:

  • $$\hat{Y}$$ = predicted value of Y
  • $$a$$ = intercept (predicted value of Y when X = 0)
  • $$b$$ = slope (change in Y for each 1-unit change in X)

Slope and Intercept

The slope is calculated as:

$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$

The intercept is:

$$a = \bar{Y} - b\bar{X}$$
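The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a statistics library: the function name `fit_line` and the sample data are made up for the sketch, and the data were chosen to lie exactly on the line Y = 1 + 2X so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Numerator of b: sum of cross-products of deviations from the means.
    sum_cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Denominator of b: sum of squared deviations of X.
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    if ss_x == 0:
        raise ValueError("all X values are identical; the slope is undefined")

    b = sum_cross / ss_x      # slope
    a = y_bar - b * x_bar     # intercept
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

The slope is computed first because the intercept formula depends on it.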


Example

Study hours (X) and test scores (Y):

  • X = [2, 4, 6]
  • Y = [50, 60, 80]
  • $$\bar{X} = 4, \quad \bar{Y} = 63.3$$

Step 1: Slope

  • Numerator = Σ(X – X̄)(Y – Ȳ) = 60
  • Denominator = Σ(X – X̄)² = 8
  • $$b = \tfrac{60}{8} = 7.5$$

Step 2: Intercept

  • $$a = 63.3 - (7.5)(4) = 33.3$$

Regression equation:
$$\hat{Y} = 33.3 + 7.5X$$

Interpretation: each extra study hour adds about 7.5 points to the predicted test score.
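As a rough check, the fitted equation can be used to predict a score for each observed X and to compare it with the actual Y. The sketch below assumes the slope of 7.5 from Step 1 and recomputes the intercept without rounding (33.33… rather than 33.3). The helper name `predict` and the 5-hour query are illustrative only.

```python
xs = [2, 4, 6]      # study hours
ys = [50, 60, 80]   # test scores

b = 7.5                                           # slope from Step 1
a = sum(ys) / len(ys) - b * (sum(xs) / len(xs))   # unrounded intercept, about 33.33


def predict(x):
    """Predicted score for x study hours, using Y-hat = a + b*X."""
    return a + b * x


predicted = [predict(x) for x in xs]
# Residual = observed Y minus predicted Y.
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print([round(p, 1) for p in predicted])   # [48.3, 63.3, 78.3]
print([round(e, 1) for e in residuals])   # [1.7, -3.3, 1.7]
print(round(predict(5), 1))               # 70.8
```

The residuals sum to zero (up to floating-point error), which is a property of a least-squares line with an intercept. The 5-hour prediction lies inside the observed range of X; predictions far outside that range are less trustworthy.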


Coefficient of Determination

The square of the correlation, $$r^2$$, shows the proportion of variance in Y explained by the regression on X.

Here: $$r = 0.98$$, so $$r^2 \approx 0.96$$, and about 96% of score variation is explained by study hours.
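One way to check the proportion of variance explained is to compare the variation left over after fitting the line with the total variation in Y. The sketch below relies on the standard identity for a straight-line fit with an intercept, r² = 1 − SS_residual / SS_total. The lesson does not state this identity, and the variable names are illustrative.

```python
xs = [2, 4, 6]      # study hours
ys = [50, 60, 80]   # test scores
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept (same formulas as in the lesson).
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Total variation of Y around its mean.
ss_total = sum((y - y_bar) ** 2 for y in ys)
# Variation left unexplained by the regression line.
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))  # 0.964
```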

 


Definition

  • Regression: predicts one variable from another using a line
  • Slope (b): how much Y changes per unit change in X
  • Intercept (a): expected value of Y when X = 0
  • r²: proportion of variance explained by regression

Visuals

Figure 10.1 — Scatterplot with regression line (Y predicted from X).

Figure 10.2 — Illustration of slope (rise/run) and intercept.


Why This Matters

Regression is a predictive tool.
It connects statistical description to practical forecasting: the slope states how much the predicted outcome (Y) changes for each unit change in the predictor (X).
It is the basis for more advanced models used in science, business, and data analysis.


Lesson 9 — Correlation


Correlation measures the strength and direction of the relationship between two variables.
It tells us whether high values of one variable go with high (or low) values of another.


Pearson’s r

The most common measure is Pearson’s correlation coefficient, $$r$$.
It ranges from –1 to +1.

  • $$r = +1$$ → perfect positive correlation (as X increases, Y increases).
  • $$r = -1$$ → perfect negative correlation (as X increases, Y decreases).
  • $$r = 0$$ → no linear relationship.

Symbolic formula:
$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \, \sum (Y - \bar{Y})^2}}$$

Formula in words:
$$r = \frac{\text{sum of the cross-products of deviations from the mean}}{\text{square root of (sum of squared deviations in X × sum of squared deviations in Y)}}$$
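The formula translates fairly directly into code. The following is a minimal sketch with a made-up function name, `pearson_r`, and three small hypothetical data sets chosen to hit the reference values r = +1, r = −1, and r = 0.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sum of cross-products of deviations from the means.
    sum_cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of squared deviations in X and in Y.
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable has no variation")

    return sum_cross / math.sqrt(ss_x * ss_y)


# Hypothetical data sets for the three reference cases.
print(pearson_r([1, 2, 3], [2, 4, 6]))    # 1.0  (perfect positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))    # -1.0 (perfect negative)
print(pearson_r([-1, 0, 1], [3, 0, 3]))   # 0.0  (no linear relationship)
```

In the third case Y depends on X through a V-shaped pattern, yet r is 0: the coefficient only measures linear association.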


Example

Suppose study hours (X) and test scores (Y) are:

  • X = [2, 4, 6]
  • Y = [50, 60, 80]

Means:

  • $$\bar{X} = 4$$
  • $$\bar{Y} = 63.3$$

Deviations:

  • (2–4)(50–63.3) = (–2)(–13.3) = 26.6
  • (4–4)(60–63.3) = (0)(–3.3) = 0
  • (6–4)(80–63.3) = (2)(16.7) = 33.4

Sum cross-products = 60

Sum squares X = (–2)² + 0² + 2² = 8
Sum squares Y = (–13.3)² + (–3.3)² + 16.7² ≈ 466.7

So:
$$r = \frac{60}{\sqrt{8 \times 466.7}} \approx \frac{60}{\sqrt{3733}} \approx \frac{60}{61.1} \approx 0.98$$

This is a very strong positive correlation.
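The intermediate sums in this example can be reproduced step by step. The sketch below uses unrounded means, so the values differ slightly from the hand calculation, which rounds the mean of Y to 63.3. Variable names are illustrative.

```python
import math

xs = [2, 4, 6]      # study hours
ys = [50, 60, 80]   # test scores
n = len(xs)

x_bar = sum(xs) / n   # 4.0
y_bar = sum(ys) / n   # about 63.33

# One cross-product of deviations per (X, Y) pair.
cross_products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
sum_cross = sum(cross_products)

ss_x = sum((x - x_bar) ** 2 for x in xs)
ss_y = sum((y - y_bar) ** 2 for y in ys)

r = sum_cross / math.sqrt(ss_x * ss_y)

print(round(sum_cross, 1))   # 60.0
print(round(ss_x, 1))        # 8.0
print(round(ss_y, 1))        # 466.7
print(round(r, 3))           # 0.982
```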


Coefficient of Determination

The square of correlation is $$r^2$$.
It represents the proportion of variance in Y explained by X.

Example above:
$$r^2 = (0.98)^2 \approx 0.96$$

So about 96% of the variation in scores is explained by study hours.


Definition

  • Correlation: degree of linear relationship between two variables.
  • Pearson’s r: ranges from –1 to +1.
  • Coefficient of determination (r²): proportion of explained variance.

Visual Placeholders

Figure 9.1 — Scatterplot with positive correlation (points rising, line upward).

Figure 9.2 — Scatterplots showing r ≈ +1, r ≈ 0, r ≈ –1.


Why This Matters

Correlation is the first step in studying relationships.
It helps identify whether variables move together, setting the stage for regression analysis.
