Repeated-Measures ANOVA

Goal. Test whether performance changes across four conditions measured on the same participants.

Design & Experiment

Within-subjects factor: Condition with 4 levels (C1, C2, C3, C4).
$s = 8$ participants measured in $k = 4$ conditions ⇒ total observations $N = s \times k = 32$.
Example context: the same students take four weekly quizzes after different study activities.

Figure 1: Profile plot (each subject as a line across the four conditions).

Data

Scores (rows = participants S1–S8; columns = conditions C1–C4):

Subject	C1	C2	C3	C4	Row sum	Row mean
S1	70	74	75	81	300	75.00
S2	73	75	78	82	308	77.00
S3	68	73	73	78	292	73.00
S4	74	79	81	85	319	79.75
S5	71	74	78	82	305	76.25
S6	70	72	76	78	296	74.00
S7	73	77	80	84	314	78.50
S8	74	77	80	84	315	78.75
Column sums	573	601	621	654	Grand sum = 2449	Grand mean $ \bar X = 2449/32 = 76.53125 $

Figure 2: Means ± SEM for C1–C4 (bar/line).

Step 1 — Condition Means (and sample variances)

\[ \begin{aligned} \bar X_{\mathrm{C1}} &= 573/8 = 71.625, \quad & s^2_{\mathrm{C1}} &= 4.8393 \\ \bar X_{\mathrm{C2}} &= 601/8 = 75.125, \quad & s^2_{\mathrm{C2}} &= 5.5536 \\ \bar X_{\mathrm{C3}} &= 621/8 = 77.625, \quad & s^2_{\mathrm{C3}} &= 7.6964 \\ \bar X_{\mathrm{C4}} &= 654/8 = 81.750, \quad & s^2_{\mathrm{C4}} &= 7.0714 \end{aligned} \]

Step 2 — Sums of Squares

Notation: $s=8$ subjects, $k=4$ conditions, grand mean $ \bar X = 76.53125$.

2A. Total

\[ SS_{\text{total}}=\sum_{i=1}^{s}\sum_{j=1}^{k}\bigl(X_{ij}-\bar X\bigr)^2 =\mathbf{611.96875}. \]

2B. Conditions (Treatment)

\[ SS_{\text{cond}}= s \sum_{j=1}^{k}\bigl(\bar X_{\cdot j}-\bar X\bigr)^2 = 8 \left[(71.625-76.53125)^2 + (75.125-76.53125)^2 + (77.625-76.53125)^2 + (81.75-76.53125)^2\right] =\mathbf{435.84375}. \]

2C. Subjects

\[ SS_{\text{subj}}= k \sum_{i=1}^{s}\bigl(\bar X_{i\cdot}-\bar X\bigr)^2 = 4 \sum_{i=1}^{8}\bigl(\bar X_{i\cdot}-76.53125\bigr)^2 =\mathbf{162.71875}. \]

2D. Error (Residual)

\[ SS_{\text{error}}= SS_{\text{total}} - SS_{\text{cond}} - SS_{\text{subj}} = 611.96875 - 435.84375 - 162.71875 =\mathbf{13.40625}. \]

Figure 3: Partitioning variance diagram (Total → Conditions + Subjects + Error).

Step 3 — Degrees of Freedom & Mean Squares

\[ \begin{aligned} df_{\text{cond}} &= k-1 = 3, \\ df_{\text{subj}} &= s-1 = 7, \\ df_{\text{error}} &= (s-1)(k-1) = 7\times3 = 21, \\ df_{\text{total}} &= sk-1 = 31. \end{aligned} \]

\[ MS_{\text{cond}} = \frac{SS_{\text{cond}}}{df_{\text{cond}}} =\frac{435.84375}{3}=\mathbf{145.28125},\qquad MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}} =\frac{13.40625}{21}=\mathbf{0.6383928571}. \]

Step 4 — Test Statistic & p-value

\[ F = \frac{MS_{\text{cond}}}{MS_{\text{error}}} = \frac{145.28125}{0.6383928571} =\mathbf{227.5734}. \] With $df_1=3$ and $df_2=21$, this F lies far out in the right tail of the F distribution: the p-value is below $10^{-12}$ (i.e., $p \ll .001$).
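As a cross-check on Steps 2 to 4, here is a minimal plain-Python sketch (no libraries assumed) that starts from the raw score table and recomputes the four sums of squares, the degrees of freedom, the mean squares and F. The function name `rm_anova` is hypothetical. It stops at F: the p-value needs the F distribution's right tail, which statistical packages provide (for example SciPy's `scipy.stats.f.sf`).

```python
# Scores from the data table: rows = subjects S1-S8, columns = conditions C1-C4.
scores = [
    [70, 74, 75, 81],
    [73, 75, 78, 82],
    [68, 73, 73, 78],
    [74, 79, 81, 85],
    [71, 74, 78, 82],
    [70, 72, 76, 78],
    [73, 77, 80, 84],
    [74, 77, 80, 84],
]


def rm_anova(data):
    """One-factor repeated-measures ANOVA on a subjects x conditions table."""
    s = len(data)      # number of subjects
    k = len(data[0])   # number of conditions
    grand = sum(map(sum, data)) / (s * k)

    cond_means = [sum(row[j] for row in data) / s for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = s * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # residual (subject x condition)

    df_cond, df_error = k - 1, (s - 1) * (k - 1)
    ms_cond = ss_cond / df_cond
    ms_error = ss_error / df_error
    return {
        "SS_total": ss_total, "SS_cond": ss_cond, "SS_subj": ss_subj,
        "SS_error": ss_error, "df": (df_cond, df_error),
        "MS_cond": ms_cond, "MS_error": ms_error, "F": ms_cond / ms_error,
    }


result = rm_anova(scores)
for name, value in result.items():
    print(name, value)
```

Removing the subject sum of squares before forming the error term is what separates this design from a one-way ANOVA: stable differences between students no longer inflate the denominator of F.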

Figure 4: F distribution with observed F marked and right-tail region shaded.

Repeated-Measures ANOVA Summary Table

Source	SS	df	MS	F	p
Conditions (within)	435.84375	3	145.28125	227.5734	< 1e-12
Subjects	162.71875	7	23.24554	—	—
Error (residual)	13.40625	21	0.63839	—	—
Total	611.96875	31	—	—	—

Interpretation

Mean performance increases steadily from C1 → C4, and the repeated-measures ANOVA shows a highly significant effect of Condition, $F(3,21)=227.57,\, p\ll .001$. Follow-ups (e.g., paired t-tests with Bonferroni/Holm) can localize which pairs of conditions differ.

Assumptions (checklist)

Sphericity (equal variances of the differences between condition pairs). If violated, apply Greenhouse–Geisser or Huynh–Feldt correction to $df$.
Approximately normal scores within each condition.
No carryover/fatigue effects that confound order (counterbalancing helps).

Figure 5: Sphericity concept sketch (pairwise difference variances).
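Because sphericity concerns the variances of the pairwise difference scores, one informal way to look at it is to compute those six variances directly. A sketch using Python's standard `statistics` module; the max/min ratio printed at the end is only a descriptive indicator, not a formal test (statistical packages typically report Mauchly's test or a Greenhouse–Geisser epsilon for that).

```python
from itertools import combinations
from statistics import variance

# Same subjects x conditions table as in the Data section (S1-S8 by C1-C4).
scores = [
    [70, 74, 75, 81],
    [73, 75, 78, 82],
    [68, 73, 73, 78],
    [74, 79, 81, 85],
    [71, 74, 78, 82],
    [70, 72, 76, 78],
    [73, 77, 80, 84],
    [74, 77, 80, 84],
]
labels = ["C1", "C2", "C3", "C4"]

# For each pair of conditions, form every subject's difference score and take
# its sample variance (n - 1 in the denominator).
diff_vars = {}
for a, b in combinations(range(len(labels)), 2):
    diffs = [row[b] - row[a] for row in scores]
    diff_vars[f"{labels[b]}-{labels[a]}"] = variance(diffs)

for pair, v in diff_vars.items():
    print(f"{pair}: {v:.3f}")

# Rough descriptive check: how different are the largest and smallest?
print("max/min ratio:", max(diff_vars.values()) / min(diff_vars.values()))
```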

Practice self-test quiz

In the space below, please find practice problems and self-test quizzes. For full access, please sign up (free).

Tags

repeated-measures-anova

within-subjects-design

sphericity-assumption

self-test

Factorial ANOVA

Goal. Test the effects of Method (Lecture vs. Online) and Time (Early vs. Late) on exam scores, and whether there is an interaction between Method and Time.

Design & Experiment

Factor A (Method): Lecture vs. Online
Factor B (Time): Early vs. Late
Balanced design: $n=5$ per cell ⇒ total $N=20$.

Students are randomly assigned to one of four cells (Method × Time). After a short module, all students take the same 100-point exam.

Figure 1: 2 × 2 layout (Method × Time).

Data

Scores by cell (five students per cell):

Method	Time	Scores					Cell Mean
Lecture	Early	68	68	70	72	72	70.0
Lecture	Late	76	76	78	80	80	78.0
Online	Early	70	70	72	74	74	72.0
Online	Late	71	71	73	75	75	73.0

Within each cell the sample variance is 4 (SD = 2), so the within-cell sum of squares is $(n-1)s^2 = 4\times4 = 16$ per cell.

Figure 2: Means with SEM by Time, separate lines for Method.

Figure 3: Interaction plot (Lecture rises sharply; Online nearly flat).

Step 1 — Marginal Means and Grand Mean

Cell means: \[ \bar X_{\text{Lecture,Early}}=70,\; \bar X_{\text{Lecture,Late}}=78,\; \bar X_{\text{Online,Early}}=72,\; \bar X_{\text{Online,Late}}=73. \] Marginal means: \[ \bar X_{\text{Lecture}}=\frac{70+78}{2}=74,\quad \bar X_{\text{Online}}=\frac{72+73}{2}=72.5; \qquad \bar X_{\text{Early}}=\frac{70+72}{2}=71,\quad \bar X_{\text{Late}}=\frac{78+73}{2}=75.5. \] Grand mean: \[ \bar X=\frac{70+78+72+73}{4}=73.25. \]

Step 2 — Sums of Squares (Between)

Balanced design formulas (with $n$ per cell, $a=b=2$):

$SS_A = nb \sum_a(\bar X_{a\cdot}-\bar X)^2$, here $nb=10$.
$SS_B = na \sum_b(\bar X_{\cdot b}-\bar X)^2$, here $na=10$.
$SS_{AB} = n \sum_{a,b}\big(\bar X_{ab}-\bar X_{a\cdot}-\bar X_{\cdot b}+\bar X\big)^2$, here $n=5$.

Compute each term:

Factor A (Method): \[ \begin{aligned} SS_A &= 10\Big[(74-73.25)^2 + (72.5-73.25)^2\Big]\\ &= 10\big[0.75^2 + (-0.75)^2\big] = 10(0.5625+0.5625)=\mathbf{11.25}. \end{aligned} \]

Factor B (Time): \[ \begin{aligned} SS_B &= 10\Big[(71-73.25)^2 + (75.5-73.25)^2\Big]\\ &= 10\big[(-2.25)^2 + (2.25)^2\big] = 10(5.0625+5.0625)=\mathbf{101.25}. \end{aligned} \]

Interaction $A\times B$: For each cell compute $d_{ab}=\bar X_{ab}-\bar X_{a\cdot}-\bar X_{\cdot b}+\bar X$. Here each $d_{ab}=\pm1.75$ so $d_{ab}^2=3.0625$ and there are four cells: \[ SS_{AB}=5\times(4\times3.0625)=\mathbf{61.25}. \]

Step 3 — Within-Group (Error) and Total SS

Within each cell, $(n-1)s^2=16$. With 4 cells: \[ SS_{\text{within}}=\mathbf{64.00}. \]

Total: \[ SS_{\text{total}}=SS_A+SS_B+SS_{AB}+SS_{\text{within}} =11.25+101.25+61.25+64.00=\mathbf{237.75}. \]
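A plain-Python sketch that recomputes every sum of squares from the raw cell scores. It also computes the total SS independently, as $\sum (X-\bar X)^2$ over all 20 scores, instead of by adding the components; comparing the two is a cheap way to catch arithmetic slips in hand calculations. The variable names are illustrative only.

```python
# Scores by cell: (Method, Time) -> five exam scores.
cells = {
    ("Lecture", "Early"): [68, 68, 70, 72, 72],
    ("Lecture", "Late"): [76, 76, 78, 80, 80],
    ("Online", "Early"): [70, 70, 72, 74, 74],
    ("Online", "Late"): [71, 71, 73, 75, 75],
}


def mean(xs):
    return sum(xs) / len(xs)


methods = ["Lecture", "Online"]
times = ["Early", "Late"]
n = 5  # scores per cell (balanced design)

all_scores = [x for xs in cells.values() for x in xs]
grand = mean(all_scores)
cell_mean = {key: mean(xs) for key, xs in cells.items()}
a_mean = {m: mean([cell_mean[(m, t)] for t in times]) for m in methods}
b_mean = {t: mean([cell_mean[(m, t)] for m in methods]) for t in times}

ss_a = n * len(times) * sum((a_mean[m] - grand) ** 2 for m in methods)
ss_b = n * len(methods) * sum((b_mean[t] - grand) ** 2 for t in times)
ss_ab = n * sum(
    (cell_mean[(m, t)] - a_mean[m] - b_mean[t] + grand) ** 2
    for m in methods
    for t in times
)
ss_within = sum((x - cell_mean[key]) ** 2 for key, xs in cells.items() for x in xs)
# Total SS straight from every score, as an independent check on the addition.
ss_total = sum((x - grand) ** 2 for x in all_scores)

print(ss_a, ss_b, ss_ab, ss_within, ss_total)
print("components add up:", abs(ss_a + ss_b + ss_ab + ss_within - ss_total) < 1e-9)
```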

Step 4 — Degrees of Freedom & Mean Squares

\[ \begin{aligned} &df_A=a-1=1,\quad df_B=b-1=1,\quad df_{AB}=(a-1)(b-1)=1,\\ &df_{\text{within}}=N-ab=20-4=\mathbf{16},\quad df_{\text{total}}=N-1=19. \end{aligned} \] \[ MS_A=\frac{11.25}{1}=11.25,\quad MS_B=\frac{101.25}{1}=101.25,\quad MS_{AB}=\frac{61.25}{1}=61.25,\quad MS_{\text{within}}=\frac{64.00}{16}=\mathbf{4.00}. \]

Step 5 — F Tests & p-values

\[ F_A=\frac{MS_A}{MS_{\text{within}}}=\frac{11.25}{4}= \mathbf{2.8125},\qquad F_B=\frac{MS_B}{MS_{\text{within}}}=\frac{101.25}{4}= \mathbf{25.3125},\qquad F_{AB}=\frac{MS_{AB}}{MS_{\text{within}}}=\frac{61.25}{4}= \mathbf{15.3125}. \] With $df_1=1$, $df_2=16$: \[ p_A \approx 0.11\;(\text{n.s.}),\quad p_B < 0.001,\quad p_{AB} \approx 0.001. \]
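Each test here has $df_1=1$, and an $F(1, df_2)$ statistic is the square of a $t(df_2)$ statistic, so the p-value equals a two-tailed t p-value. When $df_2$ is even (16 here), $P(|T|<t)$ has a finite closed-form series, which lets the p-values above be checked with the standard library alone. The sketch below assumes that special case; the helper name `f1_pvalue_even_df` is hypothetical, and general software (e.g. SciPy's `scipy.stats.f.sf`) handles any df.

```python
import math


def f1_pvalue_even_df(f_stat, df2):
    """Right-tail p-value of F(1, df2) for even df2.

    F(1, df2) = T^2 with T ~ t(df2). For even df2, with cos^2 = df2 / (df2 + t^2):
    P(|T| < t) = sin(theta) * [1 + (1/2)cos^2 + (1*3)/(2*4)cos^4 + ...
                               + (1*3*...*(df2-3))/(2*4*...*(df2-2)) cos^(df2-2)].
    """
    if df2 < 2 or df2 % 2 != 0:
        raise ValueError("this shortcut only covers even df2")
    t_sq = f_stat                          # t squared equals F when df1 = 1
    sin_theta = math.sqrt(t_sq / (df2 + t_sq))
    cos_sq = df2 / (df2 + t_sq)
    term, series = 1.0, 1.0
    for j in range(1, df2 // 2):
        term *= (2 * j - 1) / (2 * j) * cos_sq
        series += term
    return 1.0 - sin_theta * series        # two-tailed t p-value


ms_within = 64.00 / 16
for name, ms in [("Method (A)", 11.25), ("Time (B)", 101.25), ("A x B", 61.25)]:
    f_stat = ms / ms_within
    print(f"{name}: F = {f_stat:.4f}, p = {f1_pvalue_even_df(f_stat, 16):.4f}")
```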

ANOVA Summary Table

Source	SS	df	MS	F	p
Method (A)	11.25	1	11.25	2.8125	≈ 0.11
Time (B)	101.25	1	101.25	25.3125	< 0.001
A × B	61.25	1	61.25	15.3125	≈ 0.001
Within (Error)	64.00	16	4.00	—	—
Total	237.75	19	—	—	—

Interpretation

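Main effect of Time (B) is significant: Late > Early on average, although the interaction shows the gain comes mostly from the Lecture group (+8 points) rather than the Online group (+1 point). Main effect of Method (A) is not significant at conventional levels. The interaction (A × B) is significant: Lecture improves markedly from Early→Late, while Online changes little, which appears as non-parallel lines in the interaction plot.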

Figure 4: Interaction plot highlighting non-parallel lines.

Assumptions (checklist)

Independence of observations within and across cells.
Approximately normal scores within each cell.
Homogeneity of variances across cells (here, each cell variance ≈ 4).

Tags

statistical-interaction

experimental-design

hypothesis-testing

variance-analysis

statistical-hypothesis-testing

One-Way ANOVA

Goal. Test whether three teaching methods lead to different average exam scores.

Design & Experiment

Twenty-four students are randomly assigned to one of three methods (n = 8 per group):

Group A: Active discussion
Group B: Structured lecture
Group C: Self-study

After a 2-week module, everyone takes the same 100-point exam.

Data

Group A	Group B	Group C
72	78	65
68	82	70
75	80	66
70	77	68
69	79	67
73	81	69
71	83	64
74	76	71

Figure 1: Boxplots of scores by group.

Group sizes: $n_A=n_B=n_C=8$, with $k=3$ groups. Total $N=24$.

Step 1 — Sums & Means

$\displaystyle \begin{aligned} \text{Sums:}&\quad \sum A=572,\;\; \sum B=636,\;\; \sum C=540.\\[4pt] \text{Means:}&\quad \bar A=\tfrac{572}{8}=71.5,\;\; \bar B=\tfrac{636}{8}=79.5,\;\; \bar C=\tfrac{540}{8}=67.5.\\[4pt] \text{Grand mean:}&\quad \bar X=\tfrac{572+636+540}{24}=72.8333\ldots \end{aligned} $

Step 2 — Within-Group Variability (sample variances)

For each group, compute $ s_g^2=\dfrac{\sum(x-\bar x_g)^2}{n_g-1} $.

$s_A^2 = 6.0$
$s_B^2 = 6.0$
$s_C^2 = 6.0$

Corresponding sums of squares within each group: $\displaystyle SS_A=\sum(x-\bar A)^2=42,\; SS_B=42,\; SS_C=42\Rightarrow SS_{\text{within}}=42+42+42=126.0. $

Figure 2: Group means with SEM error bars.

Step 3 — Between-Groups Variability

$\displaystyle SS_{\text{between}}=\sum_{g} n_g(\bar x_g-\bar X)^2 =8(71.5-72.8333)^2+8(79.5-72.8333)^2+8(67.5-72.8333)^2 =597.3333\ldots $

Total sum of squares: $\displaystyle SS_{\text{total}}=\sum (x-\bar X)^2 = SS_{\text{between}}+SS_{\text{within}} =597.3333\ldots+126.0=723.3333\ldots $
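A short plain-Python sketch that computes all three sums of squares directly from the raw scores, so the partition $SS_{\text{total}}=SS_{\text{between}}+SS_{\text{within}}$ can be checked numerically rather than assumed.

```python
groups = {
    "A": [72, 68, 75, 70, 69, 73, 71, 74],
    "B": [78, 82, 80, 77, 79, 81, 83, 76],
    "C": [65, 70, 66, 68, 67, 69, 64, 71],
}

all_scores = [x for xs in groups.values() for x in xs]
grand = sum(all_scores) / len(all_scores)
means = {g: sum(xs) / len(xs) for g, xs in groups.items()}

# Within: squared deviations of each score from its own group mean.
ss_within = sum((x - means[g]) ** 2 for g, xs in groups.items() for x in xs)
# Between: group size times squared deviation of each group mean from the grand mean.
ss_between = sum(len(xs) * (means[g] - grand) ** 2 for g, xs in groups.items())
# Total: squared deviations of every score from the grand mean.
ss_total = sum((x - grand) ** 2 for x in all_scores)

print(f"SS_within  = {ss_within:.4f}")
print(f"SS_between = {ss_between:.4f}")
print(f"SS_total   = {ss_total:.4f}")
```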

Figure 3: Partitioning variance ($SS_{\text{total}}=SS_{\text{between}}+SS_{\text{within}}$).

Step 4 — Degrees of Freedom & Mean Squares

$\displaystyle df_{\text{between}}=k-1=3-1=2,\qquad df_{\text{within}}=N-k=24-3=21,\qquad df_{\text{total}}=N-1=23. $

$\displaystyle MS_{\text{between}}=\frac{SS_{\text{between}}}{df_{\text{between}}} =\frac{597.3333}{2}=298.6667,\qquad MS_{\text{within}}=\frac{SS_{\text{within}}}{df_{\text{within}}} =\frac{126.0}{21}=6.0. $

Step 5 — Test Statistic & p-value

$\displaystyle F=\frac{MS_{\text{between}}}{MS_{\text{within}}} =\frac{298.6667}{6.0}=49.7778. $

With $df_1=2$, $df_2=21$, the (right-tail) p-value is $p\approx 1.07\times10^{-8}$ (i.e., $p<0.00000002$).
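When $df_1 = 2$ the right tail of the F distribution has a closed form, $p = (1 + 2F/df_2)^{-df_2/2}$ (a chi-square with 2 df is exponential), so the p-value above can be checked without tables or statistical software. A sketch under that assumption, which does not carry over to other $df_1$; the helper name `f_pvalue_df1_2` is hypothetical.

```python
def f_pvalue_df1_2(f_stat, df2):
    """Right-tail p-value of F(2, df2) via the closed form (1 + 2F/df2)^(-df2/2)."""
    return (1 + 2 * f_stat / df2) ** (-df2 / 2)


k, N = 3, 24
ss_between, ss_within = 1792 / 3, 126.0  # 597.333... and 126.0 from Steps 2 and 3
ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
f_stat = ms_between / ms_within
print(f"F({k - 1}, {N - k}) = {f_stat:.4f}, p = {f_pvalue_df1_2(f_stat, N - k):.3e}")
```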

Figure 4: F distribution curve with right-tail decision region.

ANOVA Summary Table

Source	SS	df	MS	F	p
Between groups	597.3333	2	298.6667	49.7778	< 0.00000002
Within (error)	126.0000	21	6.0000	—	—
Total	723.3333	23	—	—	—

Conclusion

There is a statistically significant difference among the three methods’ mean scores ($F(2,21)=49.78,\; p\ll .001$). A post-hoc comparison (e.g., Tukey HSD) would identify which pairs differ.

Assumptions (checklist)

Independent observations (via random assignment).
Approximately normal scores within each group.
Homogeneity of variance (here, each group variance $\approx 6$).

Tags

one-way-anova

analysis-of-variance

group-comparisons

variance-partitioning

homogeneity-of-variance

exam-score-analysis

Appendix 8 — Glossary of Key Terms

Mean (average)
Sum of all scores divided by number of scores.
Example: (6 + 8 + 10) / 3 = 8.

Median
Middle score when data are ordered.
Example: For [5, 7, 8], median = 7.

Mode
Most frequent score.
Example: For [2, 3, 3, 5], mode = 3.

Variance (s²)
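Roughly, the average squared deviation from the mean. The sample variance divides the sum of squared deviations by $n-1$.
Formula: $$s^2 = \frac{\sum (X-\bar X)^2}{n-1}$$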

Standard Deviation (s)
Square root of variance. Spread of scores around the mean.

Standard Error of the Mean (SEM)
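How much sample means would vary from sample to sample (the estimated standard deviation of the sampling distribution of the mean).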
Formula: $$SEM = \frac{s}{\sqrt{n}}$$

t-test
Compares two means.

ANOVA (F-test)
Compares three or more means.

Post Hoc Test
Used after ANOVA to find which groups differ.

Correlation (r)
Strength and direction of a linear relationship. Range: –1 to +1.

Regression
Equation that predicts Y from X.
Example: $$\hat{Y} = a + bX$$

Chi-square (χ²)
Test for categorical data (counts).

Degrees of Freedom (df)
Independent pieces of information in a test.

p-value
Probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true.
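The descriptive entries above can be checked with Python's standard-library `statistics` module. A short sketch using the glossary's own examples; the last lines apply the variance, SD and SEM definitions to the three scores from the mean example.

```python
import math
import statistics

print(statistics.mean([6, 8, 10]))     # mean example
print(statistics.median([5, 7, 8]))    # median example
print(statistics.mode([2, 3, 3, 5]))   # mode example

scores = [6, 8, 10]
s2 = statistics.variance(scores)   # sample variance (divides by n - 1)
s = statistics.stdev(scores)       # standard deviation = square root of variance
sem = s / math.sqrt(len(scores))   # standard error of the mean = s / sqrt(n)
print(s2, s, sem)
```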

📱 QR: Interactive glossary (search symbols, formulas, definitions)

Appendix 7 — Study Tips for Statistics

Learning statistics is not about memorizing formulas — it’s about thinking with data.
Here are some strategies to make it easier.

1. Read Formulas in Two Ways

Symbolic: $$\bar{X} = \frac{\Sigma X}{n}$$
Words: “Mean = sum of scores / number of scores”

2. Practice by Hand First

Work out a mean or variance with a small dataset.
Then check with calculator/Excel.
This builds intuition and confidence.
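For example, a sample variance can be worked step by step exactly as on paper and then checked against a library function. A minimal Python sketch; the five-score dataset is made up for practice.

```python
import math
import statistics

data = [6, 7, 8, 9, 10]  # small made-up practice dataset

# By hand: mean, deviations, squared deviations, sum of squares, divide by n - 1.
n = len(data)
xbar = sum(data) / n
deviations = [x - xbar for x in data]
squares = [d ** 2 for d in deviations]
ss = sum(squares)
variance_by_hand = ss / (n - 1)  # sample variance: remember the "minus 1"

print("deviations:", deviations)
print("squared:   ", squares)
print("SS =", ss, " variance =", variance_by_hand)

# Then check against the library.
print("matches statistics.variance:",
      math.isclose(variance_by_hand, statistics.variance(data)))
```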

3. Draw Pictures

Normal curve with shaded area
Bar charts for group means
Scatterplots for correlation
Visuals make ideas stick.

4. Watch Out for Common Mistakes

Mixing up SD and SEM
Forgetting to subtract 1 for df
Using a one-tailed test when two-tailed is needed

5. Use Short Sessions

10–15 minutes of practice each day beats one long cram.
Try one formula or test per session.

6. Check Your Understanding

Can you explain in words what the test does?
Example: “t-test compares two means. ANOVA compares three or more.”

📱 QR: Online flashcards + short quiz (practice key terms & formulas)

Tags

study-tips

learning-strategies

how-to-learn-statistics

applied-learning

Appendix 6 — Data Sets for Practice

Working with real numbers is the best way to learn statistics. This appendix provides small “mini datasets” you can analyze by hand (or with a calculator), plus larger files for practice with spreadsheets.

Dataset Provenance (Read This First)

Pedagogical = small, simplified numbers chosen to make learning and checking easier.
Simulated = computer-generated numbers designed to resemble real data (not collected from real people).
Empirical = collected from real observations (only used if explicitly stated).

Note: Unless a dataset is explicitly labeled Empirical, treat it as Pedagogical or Simulated practice data.

Mini Datasets (In-Page)

1) Quiz Scores

Provenance: Pedagogical
n: 10
Scale: Ratio (points)
Data: 6, 7, 8, 9, 10, 7, 8, 6, 9, 10

Suggested Lessons:
- Lesson 2 — The Averages: mean, median, mode
- Lesson 3 — Variance & Standard Deviation: variance, SD, z-scores
- Lesson 4 — The Standard Normal Curve: interpret z-scores (as a bridge)
Check values (optional): Mean = 8.0; sample SD (dividing by n − 1) ≈ 1.49; dividing by n instead gives ≈ 1.41.

2) Reaction Times (ms)

Provenance: Pedagogical (human-like values)
n: 8
Scale: Ratio (milliseconds)
Units: ms
Data: 220, 250, 270, 230, 260, 280, 240, 300

Suggested Lessons:
- Lesson 3 — Variance & Standard Deviation: spread, outliers, SD
- Lesson 6 — The t-test: use as a template dataset (e.g., compare two conditions by splitting into two groups)
- Lesson 7 — ANOVA: extend to 3+ groups by creating conditions
Instructor tip: real reaction-time data often show mild right skew (a few slow responses). If you want skewed data, see the larger practice files below.

3) Stress Reduction Scores (Three Groups)

Provenance: Pedagogical (grouped scores)
Scale: Interval/Ratio (score units; treat as interval for ANOVA practice)
Groups:

Meditation (n = 3): 65, 70, 72
Exercise (n = 3): 68, 71, 75
Music (n = 3): 75, 78, 82

Suggested Lessons:
- Lesson 7 — ANOVA: one-way ANOVA (three independent groups)
- Lesson 8 — Post Hoc Tests: follow-up comparisons after ANOVA (conceptual)
- Lesson 13 — Degrees of Freedom Cookbook: df for one-way ANOVA
Important note: The sample sizes are intentionally small for learning the mechanics. In real studies, groups are usually larger.

Larger Practice Datasets (Download Files)

These datasets are designed for spreadsheet work, graphing, and full problem sets.

Exam Scores (n = 100)
Provenance: Simulated
Suggested Lessons: Lesson 4 (normal curve), Lesson 5 (SEM), Lesson 6 (t-test foundations)

Survey Data (preferences by gender/age)
Provenance: Simulated (categorical practice)
Suggested Lessons: Lesson 12 (chi-square), Lesson 1 (why statistics matters in decisions)

Simulated Medical Trial (treatment vs. control, repeated measures)
Provenance: Simulated (instructional “trial-style” dataset; not clinical research)
Suggested Lessons: Lesson 6 (t-test concepts), Lesson 7 (variance partitioning concepts), and for advanced learners: repeated-measures ideas (optional)

Downloads: CSV and Excel files are provided via the QR code(s) on this page (and/or direct links, if enabled on your device).

Reproducibility note (simulated files): If you revise these datasets in future editions, consider generating them with a fixed random seed so instructors and students can reproduce results across versions.

Trusted External Sources (Optional)

If you want additional datasets beyond the practice files above, the following repositories are widely used for learning and benchmarking:

NIST Statistical Reference Datasets (StRD)
High-quality benchmark datasets for practice and verification (excellent for checking calculations and software).
UCI Machine Learning Repository
Larger, more complex datasets. Recommended only for advanced students or enrichment projects.

Visual Reference

Figure F.1 — Example spreadsheet view of a dataset (columns such as ID, Score, Group). Use this as a template for organizing your own data before running calculations.
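A standard-library Python sketch for checking two of the mini datasets: the quiz-score SD under both divisors (`statistics.stdev` divides by n − 1, `statistics.pstdev` by n), and a one-way ANOVA on the three stress-reduction groups. The p-value line relies on the closed form of the F right tail that holds only when $df_1 = 2$, as it does here with three groups.

```python
import statistics

# Mini dataset 1: quiz scores.
quiz = [6, 7, 8, 9, 10, 7, 8, 6, 9, 10]
print("quiz mean:", statistics.mean(quiz))
print("quiz SD (n - 1):", round(statistics.stdev(quiz), 2))
print("quiz SD (n):    ", round(statistics.pstdev(quiz), 2))

# Mini dataset 3: one-way ANOVA on the stress-reduction groups.
groups = {
    "Meditation": [65, 70, 72],
    "Exercise": [68, 71, 75],
    "Music": [75, 78, 82],
}
all_scores = [x for xs in groups.values() for x in xs]
grand = statistics.mean(all_scores)
k, N = len(groups), len(all_scores)

ss_between = sum(len(xs) * (statistics.mean(xs) - grand) ** 2 for xs in groups.values())
ss_within = sum((x - statistics.mean(xs)) ** 2 for xs in groups.values() for x in xs)
df1, df2 = k - 1, N - k
f_stat = (ss_between / df1) / (ss_within / df2)

# With df1 = 2 the right-tail p-value is (1 + 2F/df2)^(-df2/2).
p = (1 + 2 * f_stat / df2) ** (-df2 / 2)
print(f"F({df1}, {df2}) = {f_stat:.3f}, p = {p:.3f}")
```

With these scores the sketch reports F(2, 6) ≈ 5.64 and p ≈ 0.042.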

Appendix 5 — Technology Tips (On Your Phone & Laptop)

Statistics can be done with calculators, spreadsheets, or software. Here’s a quick guide.

Excel / Google Sheets

Task	Formula	Example
Mean	`=AVERAGE(A1:A10)`	Mean of scores in A1–A10
Standard Deviation	`=STDEV.S(A1:A10)`	Spread of scores
t-test	`=T.TEST(A1:A10,B1:B10,2,2)`	p-value comparing two groups (2 = two-tailed; type 2 = equal variances)

R (RStudio or RStudio Cloud)

Task	Command	Example
Mean	`mean(x)`	`mean(c(6,8,10)) = 8`
SD	`sd(x)`	`sd(c(6,8,10)) = 2`
t-test	`t.test(x,y)`	Compare two groups (Welch test by default; add `var.equal = TRUE` for equal variances)

Python (NumPy / SciPy / Pandas)
Assumes `import numpy as np` and `from scipy import stats`.

Task	Command	Example
Mean	`np.mean(x)`	`np.mean([6,8,10]) = 8`
SD	`np.std(x, ddof=1)`	`np.std([6,8,10],ddof=1) = 2`
t-test	`stats.ttest_ind(x,y)`	Compare two groups (equal variances by default; `equal_var=False` gives Welch)
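If NumPy and SciPy are not installed, Python's built-in `statistics` module covers the mean and SD rows. A minimal sketch, assuming the equal-variance (pooled) two-sample t statistic, which is what `stats.ttest_ind` uses by default; the helper name `pooled_t` is hypothetical, and it returns only the statistic and its df, not a p-value.

```python
import math
import statistics


def pooled_t(x, y):
    """Two-sample t statistic with pooled (equal) variances, plus its df."""
    nx, ny = len(x), len(y)
    # Pooled variance: df-weighted average of the two sample variances.
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2


print(statistics.mean([6, 8, 10]))   # same task as np.mean
print(statistics.stdev([6, 8, 10]))  # same task as np.std(..., ddof=1)
t, df = pooled_t([6, 8, 10], [9, 11, 13])
print(f"t({df}) = {t:.3f}")
```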

iPhone Calculator

Rotate sideways → scientific mode
Use √ for square root
Parentheses matter: wrap the numerator in parentheses before dividing, e.g. (6 + 8 + 10) ÷ 3
Fine for small problems, but not for full datasets

Summary

For quick homework: iPhone calculator
For assignments: Excel / Google Sheets
For coding: Python (Colab) or R (RStudio Cloud)

📱 QR: Open sample data in Google Sheets (ready to practice mean, SD, t-test)

Visuals

Figure E.1 — Screenshots of the same mean calculation in Sheets, R, and Python side by side.

Tags

scientific-calculator

spreadsheet-formulas

coding-for-statistics