Statistics 2nd ed

Lesson 12 — Chi-square Tests

The chi-square test ($$\chi^2$$) is used with categorical (nominal) data.
It compares observed frequencies with expected frequencies.


Chi-square Goodness-of-Fit

When to Use:

  • One categorical variable
  • Test if observed frequencies match expected frequencies

Formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

In words:
For each category, square the difference between the observed and expected frequency and divide by the expected frequency; χ² is the sum of these terms over all categories.

Example:
Survey of favorite subjects (Math, Science, English).
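Observed counts are 25, 30, and 45 (N = 100). With equal expected proportions (⅓ each), E = 100/3 ≈ 33.33 per subject.
Compute (O − E)²/E for each subject and sum: 2.08 + 0.33 + 4.08 = 6.50, so χ² = 6.50 with df = 3 − 1 = 2.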
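
The goodness-of-fit arithmetic for this survey can be sketched in a few lines of Python. This is a minimal illustration using only the counts given in the example; the variable names are chosen here for readability and are not part of the lesson.

```python
# Observed counts from the survey example: Math, Science, English.
observed = [25, 30, 45]

# Equal expected proportions (1/3 each) applied to the total sample size.
total = sum(observed)
expected = [total / len(observed)] * len(observed)

# Each category contributes (O - E)^2 / E to the statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

# Degrees of freedom for goodness-of-fit: number of categories minus 1.
df = len(observed) - 1

print([round(c, 3) for c in contributions])
print(round(chi_square, 2), df)
```

English contributes the largest term because its observed count lies farthest from the expected 33.33.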


Chi-square Test of Independence

When to Use:

  • Two categorical variables
  • Test whether they are associated (independent or not)

Formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where expected frequencies:
$$E = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$$

Example:
Gender (Male/Female) × Sport (Soccer/Basketball/Tennis).
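χ² measures how far the observed counts in this 2 × 3 table fall from the counts expected if gender and sport preference were independent. The larger χ² is relative to its degrees of freedom, (rows − 1)(columns − 1) = 2 here, the stronger the evidence of an association.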
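
The expected-frequency formula can be illustrated with a small sketch. The lesson gives no counts for this example, so the table below is hypothetical data invented purely for illustration; only the formulas come from the text.

```python
# HYPOTHETICAL observed counts (not from the lesson).
# Rows: Male, Female. Columns: Soccer, Basketball, Tennis.
observed = [
    [20, 15, 15],
    [10, 20, 20],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total)(column total) / grand total, computed for every cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square: sum of (O - E)^2 / E over all cells of the table.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(expected)
print(round(chi_square, 3), df)
```

Because both row totals are 50 in this made-up table, the two rows of expected counts are identical; unequal row totals would give different expected rows.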


Chi-square Correlation Measures

Chi-square can also give a measure of association strength between categorical variables.

  • Phi coefficient (φ): for 2 × 2 tables

$$\phi = \sqrt{\frac{\chi^2}{N}}$$

  • Cramer’s V: for larger tables

$$V = \sqrt{\frac{\chi^2}{N(k-1)}}$$

Where $$k = \min(\text{rows}, \text{columns})$$.

  • Contingency coefficient (C):

$$C = \sqrt{\frac{\chi^2}{\chi^2 + N}}$$


Example (Phi, Cramer’s V, Contingency C)

Suppose χ² = 10.0, N = 100.

  • For 2 × 2: $$\phi = \sqrt{10/100} = \sqrt{0.1} \approx 0.32$$
  • For 3 × 2 table, where $$k = \min(3, 2) = 2$$: $$V = \sqrt{10/(100(2-1))} = \sqrt{0.1} \approx 0.32$$
  • Contingency coefficient: $$C = \sqrt{10/(10+100)} = \sqrt{0.091} \approx 0.30$$
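
The three measures can be checked with a short sketch. The function names below are illustrative choices, not a library API; the inputs are the χ² = 10.0 and N = 100 from the example.

```python
import math


def phi(chi_square, n):
    """Phi coefficient for a 2 x 2 table."""
    return math.sqrt(chi_square / n)


def cramers_v(chi_square, n, rows, cols):
    """Cramer's V, with k = min(rows, cols)."""
    k = min(rows, cols)
    return math.sqrt(chi_square / (n * (k - 1)))


def contingency_c(chi_square, n):
    """Contingency coefficient C."""
    return math.sqrt(chi_square / (chi_square + n))


chi_square, n = 10.0, 100

print(round(phi(chi_square, n), 2))
print(round(cramers_v(chi_square, n, rows=3, cols=2), 2))
print(round(contingency_c(chi_square, n), 2))
```

For a 2 × 2 table k − 1 = 1, so Cramer's V reduces to φ, which is why the first two values agree here.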

Definition

  • Goodness-of-fit: one categorical variable vs. expected distribution
  • Independence: relationship between two categorical variables
  • Correlation measures: strength of association in categorical tables (φ, V, C)

Visuals

Figure 12.1 — Goodness-of-fit example: observed vs. expected bar chart.

Figure 12.2 — Independence test: 2 × 2 contingency table with expected values.

Figure 12.3 — Phi, Cramer’s V, and C illustrated with 2 × 2 and 3 × 2 tables.


Why This Matters

Chi-square lets us analyze data that are counts rather than scores.
It extends statistical testing from measured scores to category frequencies, which is essential in psychology, sociology, education, and medicine.

Lecture 1 — Scales of Measurement

Before we can analyze data, we must know how they were measured.
The type of measurement determines which statistical test is appropriate.

Scientists and psychologists classify data into four scales of measurement: nominal, ordinal, interval, and ratio.


The Four Scales

  1. Nominal Scale
    • Numbers are just labels or categories.
    • Example: 1 = Male, 2 = Female.
    • Arithmetic on the codes is not meaningful (the "average" of 1 and 2 describes nothing).
  2. Ordinal Scale
    • Numbers show order or rank, but not equal intervals.
    • Example: 1st place, 2nd place, 3rd place.
    • We know who is higher, but not by how much.
  3. Interval Scale
    • Numbers have equal intervals, but no true zero.
    • Example: Temperature in °C.
    • 20°C is warmer than 10°C, but not “twice as hot.”
  4. Ratio Scale
    • Numbers have equal intervals and a true zero.
    • Example: Height, weight, reaction time.
    • Ratios are meaningful: 20 kg is twice 10 kg.
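
The interval-versus-ratio distinction can be checked numerically. The sketch below converts the Celsius example to Kelvin (a scale with a true zero, introduced here only for illustration) and converts kilograms to grams, to show which comparisons survive a change of units.

```python
# Interval scale: Celsius has no true zero.
celsius_a, celsius_b = 20.0, 10.0
ratio_celsius = celsius_a / celsius_b          # looks like "twice as hot"

# Same temperatures on the Kelvin scale (K = C + 273.15).
kelvin_a = celsius_a + 273.15
kelvin_b = celsius_b + 273.15
ratio_kelvin = kelvin_a / kelvin_b             # far from 2

# Differences (intervals) are the same on both scales.
diff_celsius = celsius_a - celsius_b
diff_kelvin = kelvin_a - kelvin_b

# Ratio scale: weight has a true zero, so the ratio survives a unit change.
ratio_kg = 20.0 / 10.0
ratio_g = 20_000.0 / 10_000.0

print(ratio_celsius, round(ratio_kelvin, 3))
print(diff_celsius, round(diff_kelvin, 2))
print(ratio_kg, ratio_g)
```

The Celsius ratio of 2 disappears after conversion while the 10-degree difference is preserved, which is what "equal intervals, no true zero" means in practice.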

Definition

  • Nominal: categories only
  • Ordinal: rank order
  • Interval: equal intervals, no true zero
  • Ratio: equal intervals, true zero

Drama Box — “My Kids, My Fingers”

A professor once explained measurement scales by holding up his hand.

  • “I have five fingers. That’s a ratio scale — it’s a real count, and zero means none.”
  • “If I say this finger is first, that’s an ordinal scale.”
  • “If I call them One, Two, Three, that’s just labels — a nominal scale.”
  • “If I measure temperature in Celsius on my skin, that’s interval — the numbers are spaced evenly, but zero doesn’t mean no heat.”

The story helps students remember: labels, ranks, intervals, ratios — the four levels of measurement.


Visuals

Figure L1 — The Ladder of Measurement Scales. Four rungs labeled: Nominal → Ordinal → Interval → Ratio, each with examples.


Why This Matters

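  • Nominal/Ordinal data → typically non-parametric tests (e.g., chi-square for nominal counts)
  • Interval/Ratio data → typically parametric tests (e.g., t-test, ANOVA), provided their assumptions hold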

This decision is the first step in statistics.
Before calculating a mean, a t-test, or an ANOVA, we must ask: How were the data measured?
