Statistics 2nd ed


Appendix 5 — Technology Tips (On Your Phone & Laptop)


Statistics can be done with a calculator, a spreadsheet, or a programming language. Here’s a quick guide to each.


Excel / Google Sheets

Task | Formula | Example
Mean | =AVERAGE(A1:A10) | Mean of scores in A1–A10
Standard Deviation | =STDEV.S(A1:A10) | Spread of scores (sample SD)
t-test | =T.TEST(A1:A10,B1:B10,2,2) | Compare two groups (two-tailed, equal variances); returns the p-value

R (RStudio or RStudio Cloud)

Task | Command | Example
Mean | mean(x) | mean(c(6,8,10)) = 8
SD | sd(x) | sd(c(6,8,10)) = 2
t-test | t.test(x,y) | Compare two groups (Welch test by default; add var.equal=TRUE for equal variances)

Python (NumPy / SciPy / Pandas)

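These commands assume import numpy as np and from scipy import stats.

Task | Command | Example
Mean | np.mean(x) | np.mean([6,8,10]) = 8
SD | np.std(x, ddof=1) | np.std([6,8,10],ddof=1) = 2
t-test | stats.ttest_ind(x,y) | Compare two groups (equal variances by default; add equal_var=False for Welch)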
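
As a minimal sketch of what the three rows compute, the following uses only Python's standard library, so it runs without NumPy or SciPy installed. The two score lists and the helper name pooled_t are made up for illustration. The t statistic follows the pooled (equal-variance) formula, which is what stats.ttest_ind uses by default.

```python
import math
import statistics

# Hypothetical scores for two small groups (illustration only).
group_a = [6, 8, 10]
group_b = [9, 11, 13]

mean_a = statistics.mean(group_a)   # same value as np.mean(group_a)
sd_a = statistics.stdev(group_a)    # sample SD, same as np.std(group_a, ddof=1)


def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances (pooled)."""
    n_x, n_y = len(x), len(y)
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_x - 1) * var_x + (n_y - 1) * var_y) / (n_x + n_y - 2)
    std_err = math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return (statistics.mean(x) - statistics.mean(y)) / std_err


t_stat = pooled_t(group_a, group_b)
print(mean_a, sd_a, round(t_stat, 3))
```

The sketch stops at the t statistic. Turning it into a p-value requires the t distribution, which is the part SciPy, R, and Excel handle for you.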

iPhone Calculator

  • Rotate sideways → scientific mode
  • Use √ for square root
  • Parentheses matter: wrap the whole numerator in parentheses before dividing, e.g. (6 + 8 + 10) ÷ 3
  • Fine for small problems, but not for full datasets
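
The parentheses tip can be checked with a quick calculation. The sketch below uses the scores 6, 8, and 10 and shows how the result changes when the numerator is not grouped. Python applies the standard order of operations, in which division is carried out before addition.

```python
# Mean of 6, 8, 10 with the numerator grouped in parentheses.
with_parens = (6 + 8 + 10) / 3

# Without parentheses, only the 10 is divided by 3.
without_parens = 6 + 8 + 10 / 3

print(with_parens)      # 8.0
print(without_parens)   # about 17.33
```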

Summary

  • For quick homework: iPhone calculator
  • For assignments: Excel / Google Sheets
  • For coding: Python (Colab) or R (RStudio Cloud)



Visuals

Figure E.1 — Screenshots of the same mean calculation in Sheets, R, and Python side by side.


Lesson 14 — Big Data


In the past, statistics dealt with small datasets: 20 students in a class, 50 patients in a trial.
Today, we live in the age of big data: millions of tweets, billions of web pages, streams of data from phones, sensors, and satellites.

Big data changes the scale of statistics.


What is Big Data?

Big data is often described by the 3 Vs:

  1. Volume — enormous amounts of data (terabytes, petabytes)
  2. Velocity — data generated quickly (social media streams, stock markets)
  3. Variety — many forms (numbers, text, images, audio, video)

Sometimes a fourth V is added: Veracity (how reliable are the data?).


Why Big Data Matters

  • Traditional statistical methods were developed for small, carefully collected datasets.
  • With big data, we need algorithms and computers to process information.
  • Sampling becomes less important when entire populations are measured (e.g., all tweets in a week), although a complete dataset can still be unrepresentative of the wider group you care about.
  • Visualization and summaries are critical to make sense of huge datasets.
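
Even when a full dataset is available, a random sample often gives a close estimate at a fraction of the computing cost. Here is a minimal sketch using synthetic data. The population of 200,000 simulated values, the seed, and the sample size are all assumptions chosen for illustration, not real measurements.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Synthetic "population": 200,000 simulated values (not real data).
population = [random.gauss(50, 10) for _ in range(200_000)]

# Simple random sample of 1,000 values, drawn without replacement.
sample = random.sample(population, 1_000)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(round(population_mean, 2), round(sample_mean, 2))
```

The two printed means should be close to each other, even though the sample uses only 0.5% of the values.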

Example

  • A teacher records grades for 30 students → small dataset.
  • YouTube collects billions of video views per day → big data.

Statistical tools remain the same (mean, median, regression), but the scale requires computational methods.
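
One example of such a computational method is a running (one-pass) calculation, which updates the mean and standard deviation as each value arrives instead of storing the whole dataset. The sketch below uses Welford's algorithm, a standard technique that this lesson does not name. The class name RunningStats and the input values are illustrative.

```python
import math


class RunningStats:
    """Running mean and sample SD via Welford's one-pass algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    def sample_sd(self):
        # Sample SD divides by n - 1, so it needs at least two values.
        if self.n < 2:
            return float("nan")
        return math.sqrt(self.m2 / (self.n - 1))


stats = RunningStats()
for value in [6, 8, 10]:   # in practice, values would arrive from a stream
    stats.update(value)

print(stats.mean, stats.sample_sd())
```

Because each update touches only three stored numbers, memory use stays constant no matter how many values pass through.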


Tools for Big Data

  • Databases (SQL, NoSQL) to store data
  • Distributed computing (Hadoop, Spark) to process data
  • Statistical programming (R, Python) for analysis
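
As a small-scale sketch of how a database and a programming language work together, the example below uses Python's built-in sqlite3 module with an in-memory database. The table name views, its columns, and the rows are hypothetical. A real big-data system would use a server or distributed database, but the idea of letting the database compute the summary is the same.

```python
import sqlite3

# In-memory SQLite database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE views (video_id TEXT, view_count INTEGER)")

# Hypothetical rows for illustration only.
rows = [("a", 6), ("b", 8), ("c", 10)]
conn.executemany("INSERT INTO views VALUES (?, ?)", rows)

# The database computes the count and mean; Python receives only the summary.
n_rows, mean_views = conn.execute(
    "SELECT COUNT(*), AVG(view_count) FROM views"
).fetchone()
conn.close()

print(n_rows, mean_views)
```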

Visuals

Figure 14.1 — Big Data and the 3 Vs. Diagram showing Volume, Velocity, Variety (and Veracity) in overlapping circles.


Why This Matters

Big data connects statistics to the modern world:

  • Online behavior, medical records, GPS signals, shopping patterns
  • Algorithms detect patterns in datasets too large for humans to inspect
  • Big data powers modern AI and machine learning
