Statistics 2nd ed


Lesson 17 — Regression Beyond the Line


Simple regression predicts Y from one X.
In practice, though, an outcome often depends on several variables, or the relationship is not a straight line.

This lesson introduces multiple regression and logistic regression.


Multiple Regression

Formula:

$$\hat{Y} = a + b_1X_1 + b_2X_2 + \dots + b_kX_k$$

In words:
$$\text{Predicted Y} = \text{intercept} + (b_1 \times X_1) + (b_2 \times X_2) + \dots$$

Where:

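  • $$\hat{Y}$$ = predicted value of Y
  • $$a$$ = intercept (predicted Y when every predictor equals 0)
  • $$X_1, X_2, \dots, X_k$$ = predictors
  • $$b_1, b_2, \dots, b_k$$ = slopes (weights for each predictor)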

Example: Predicting college GPA from:

  • High school GPA ($$X_1$$)
  • Study hours ($$X_2$$)

Equation:
$$\hat{Y} = 1.0 + 0.5X_1 + 0.1X_2$$

Interpretation:

  • For each 1-point increase in HS GPA, predicted college GPA rises by 0.5, holding study hours constant.
  • For each extra study hour, predicted college GPA rises by 0.1, holding HS GPA constant.
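The example equation can be tried out with a minimal Python sketch. The function name and the two input students are hypothetical, chosen only for illustration; the coefficients are the ones from the equation above.

```python
def predict_college_gpa(hs_gpa: float, study_hours: float) -> float:
    """Apply the example equation: Y-hat = 1.0 + 0.5*X1 + 0.1*X2."""
    intercept = 1.0       # a
    b_hs_gpa = 0.5        # b1, slope for high school GPA
    b_study_hours = 0.1   # b2, slope for study hours
    return intercept + b_hs_gpa * hs_gpa + b_study_hours * study_hours


# Hypothetical inputs (not data from the lesson): same HS GPA, one extra study hour.
base = predict_college_gpa(hs_gpa=3.0, study_hours=10)
more_hours = predict_college_gpa(hs_gpa=3.0, study_hours=11)

print(round(base, 2))               # 3.5
print(round(more_hours - base, 2))  # 0.1, the slope for study hours
```

Changing one input while keeping the other fixed changes the prediction by exactly that predictor's slope, which is what the interpretation of each coefficient describes.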

Coefficient of Determination

In multiple regression, $$R^2$$ tells us the proportion of variance explained by all predictors together.

Example: $$R^2 = 0.65$$ → predictors explain 65% of the outcome’s variability.
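
For reference, one standard way to write this quantity is shown below. The lesson itself does not give this formula, so treat it as a supplementary sketch; $$\bar{Y}$$ is the mean of the observed outcomes and the sums run over all observations.

$$R^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2}$$

The numerator is the variation left unexplained by the predictions, and the denominator is the total variation in the outcome.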


Logistic Regression

What if the outcome is yes/no (categorical)?
Example: Will a student pass or fail?

We use logistic regression.

Formula:

$$P(Y=1) = \frac{1}{1 + e^{-(a + bX)}}$$

In words:
$$\text{Probability of success} = \frac{1}{1 + e^{-(\text{intercept} + \text{slope} \times X)}}$$

Output: probability between 0 and 1.

Example: Predicting pass/fail from study hours.

  • Equation: $$P = \frac{1}{1 + e^{-( -2 + 0.5X )}}$$
  • If X = 6 hours: $$-2 + 0.5(6) = 1$$, so $$P = \frac{1}{1 + e^{-1}} \approx 0.73$$
  • About 73% chance of passing.
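
The calculation above can be reproduced with a short Python sketch. The function name `pass_probability` is hypothetical; the intercept (-2) and slope (0.5) come from the example equation.

```python
import math


def pass_probability(study_hours: float, a: float = -2.0, b: float = 0.5) -> float:
    """Logistic model from the example: P = 1 / (1 + e^-(a + b*X))."""
    return 1.0 / (1.0 + math.exp(-(a + b * study_hours)))


# Evaluate the curve at a few study-hour values to see its S shape.
for hours in (0, 2, 4, 6, 8):
    print(hours, round(pass_probability(hours), 2))
```

At 4 hours the exponent is zero, so the probability is exactly 0.5; at 6 hours it is about 0.73, matching the worked example.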

Visuals

Figure 17.1 — Multiple regression plane: Y predicted from two predictors.

Figure 17.2 — Logistic regression curve: probability vs. study hours.


Why This Matters

  • Multiple regression = prediction with many factors
  • Logistic regression = prediction when the outcome is binary (yes/no)
  • $$R^2$$ = proportion of outcome variance explained by the predictors

These methods expand the power of regression beyond a straight line, preparing for modern predictive modeling.


Lesson 10 — Regression


Correlation tells us the strength and direction of the linear relationship between two variables.
Regression goes one step further: it gives us an equation to predict one variable from another.

 


The Regression Equation

The regression line predicts Y from X.

Symbolic formula:
$$\hat{Y} = a + bX$$

Formula in words:
$$\text{Predicted Y} = \text{intercept} + (\text{slope} \times X)$$

Where:

  • $$\hat{Y}$$ = predicted value of Y
  • $$a$$ = intercept (value of Y when X = 0)
  • $$b$$ = slope (change in Y for each 1-unit change in X)

Slope and Intercept

The slope is calculated as:

$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$

The intercept is:

$$a = \bar{Y} - b\bar{X}$$


Example

Study hours (X) and test scores (Y):

  • X = [2, 4, 6]
  • Y = [50, 60, 80]
  • $$\bar{X} = 4, \quad \bar{Y} = \tfrac{190}{3} \approx 63.3$$

Step 1: Slope

  • Numerator = $$\sum (X - \bar{X})(Y - \bar{Y}) = (-2)(-13.3) + (0)(-3.3) + (2)(16.7) = 60$$
  • Denominator = $$\sum (X - \bar{X})^2 = (-2)^2 + 0^2 + 2^2 = 8$$
  • $$b = \tfrac{60}{8} = 7.5$$

Step 2: Intercept

  • $$a = 63.3 - (7.5)(4) = 33.3$$

Regression equation:
$$\hat{Y} = 33.3 + 7.5X$$

Interpretation: each extra study hour adds about 7.5 points to the predicted test score.
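
The two steps above can be sketched in Python using the slope and intercept formulas from this lesson. The function name `fit_line` is hypothetical; the data are the three points from the example.

```python
def fit_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: sum of cross-deviations divided by sum of squared X deviations.
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    b = numerator / denominator
    # Intercept: the line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


a, b = fit_line([2, 4, 6], [50, 60, 80])
print(round(a, 1), round(b, 1))  # 33.3 7.5
```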


Coefficient of Determination

The square of the correlation, $$r^2$$, shows the proportion of variance explained by regression.

Here: $$r \approx 0.98$$, so $$r^2 \approx 0.96$$, meaning about 96% of score variation is explained by study hours.
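
As a check, $$r^2$$ for the example data can be computed directly from the deviations. This is a minimal sketch; the function name `r_squared` is hypothetical, and it uses the identity that $$r^2$$ equals the squared sum of cross-deviations divided by the product of the two sums of squared deviations.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


r2 = r_squared([2, 4, 6], [50, 60, 80])
print(round(r2, 2))         # 0.96
print(round(r2 ** 0.5, 2))  # 0.98, the correlation r itself
```

For this data the correlation is about 0.98 and its square is about 0.96, so the two values are easy to confuse.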

 


Definition

  • Regression: predicts one variable from another using a line
  • Slope (b): how much Y changes per unit change in X
  • Intercept (a): expected value of Y when X = 0
  • r²: proportion of variance explained by regression

Visuals

Figure 10.1 — Scatterplot with regression line (Y predicted from X).

Figure 10.2 — Illustration of slope (rise/run) and intercept.


Why This Matters

Regression is a predictive tool.
It connects statistical description to practical forecasting: how much the outcome (Y) is expected to change for a given change in the predictor (X).
It is the basis for more advanced models used in science, business, and data analysis.
