Statistics 2nd ed


Lesson 14 — Big Data


In the past, statistics usually dealt with small datasets: 20 students in a class, 50 patients in a trial.
Today, we live in the age of big data: millions of tweets, billions of web pages, streams of data from phones, sensors, and satellites.

Big data changes the scale of statistics.


What is Big Data?

Big data is often described by the 3 Vs:

  1. Volume — enormous amounts of data (terabytes, petabytes)
  2. Velocity — data generated quickly (social media streams, stock markets)
  3. Variety — many forms (numbers, text, images, audio, video)

Sometimes a fourth V is added: Veracity (how reliable are the data?).
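Velocity means data can arrive faster than they can be stored and re-read. One common response, shown here as an illustration and not as a method this lesson prescribes, is to update summary statistics one value at a time. The minimal sketch below uses Welford's one-pass algorithm to keep a running mean and variance in constant memory; the class name `RunningStats` and the sensor readings are hypothetical.

```python
class RunningStats:
    """Running mean and sample variance using Welford's one-pass algorithm."""

    def __init__(self):
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        # Fold one new observation into the summary without storing it.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance (divide by n - 1); undefined for fewer than 2 values.
        if self.n < 2:
            return float("nan")
        return self.m2 / (self.n - 1)


# Hypothetical sensor readings arriving one at a time.
stats = RunningStats()
for reading in [12.0, 15.0, 11.0, 14.0, 13.0]:
    stats.update(reading)

print(stats.n, stats.mean, stats.variance())
```

For these five readings the mean is 13 and the sample variance is 2.5 (up to floating-point rounding), the same values the usual formulas give, yet no reading had to be kept after it was processed.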


Why Big Data Matters

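  • Many traditional statistical methods were developed for small, carefully collected datasets.
  • With big data, we need algorithms and computers to store and process the information.
  • Sampling may matter less when an entire population is recorded (e.g., all tweets in a week), although the data still describe only the people and events that were actually recorded.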
  • Visualization and summaries are critical to make sense of huge datasets.
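One way to summarize a huge dataset for a chart is to count how many values fall into equal-width bins while reading the data once, so only the counts are kept, not the raw values. The sketch below assumes the bin range is chosen in advance; the function name `binned_counts`, the bin width, and the watch-time values are hypothetical.

```python
def binned_counts(values, low, width, n_bins):
    """Count values into n_bins equal-width bins starting at `low`.

    Values outside the bins are tallied separately so nothing is silently lost.
    """
    counts = [0] * n_bins
    out_of_range = 0
    for v in values:  # any iterable works, including a generator over a huge file
        index = int((v - low) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
        else:
            out_of_range += 1
    return counts, out_of_range


# Hypothetical video watch times in minutes; in practice this could be a
# generator reading billions of records from disk.
watch_minutes = [1, 3, 4, 7, 8, 9, 12, 25]
counts, overflow = binned_counts(watch_minutes, low=0, width=5, n_bins=4)
print(counts, overflow)  # [3, 3, 1, 0] 1
```

The memory needed depends only on the number of bins, not on the number of records, which is why binned counts are a common starting point for histograms of very large datasets.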

Example

  • A teacher records grades for 30 students → small dataset.
  • YouTube collects billions of video views per day → big data.

The statistical tools stay the same (mean, median, regression), but at this scale they must be computed with efficient algorithms, often on many computers at once.
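The formulas do not change; what changes is how they are computed. A common pattern in distributed processing is to compute small partial summaries on separate chunks of data and then combine them. The sketch below imitates that idea in plain Python (it does not use Hadoop or Spark) to get the mean of x and the least-squares slope of y on x; the function names and the tiny dataset are hypothetical.

```python
from functools import reduce


def chunk_summary(chunk):
    """Partial sums for one chunk of (x, y) pairs: n, sum x, sum y, sum x^2, sum xy."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)


def combine(a, b):
    """Partial sums from two chunks merge by simple addition."""
    return tuple(p + q for p, q in zip(a, b))


def mean_and_slope(total):
    """Mean of x and least-squares slope of y on x from the combined sums."""
    n, sx, sy, sxx, sxy = total
    mean_x = sx / n
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return mean_x, slope


# Hypothetical data split into chunks, as if stored on different machines.
chunks = [
    [(1, 2), (2, 4)],
    [(3, 5), (4, 8)],
    [(5, 10)],
]
total = reduce(combine, map(chunk_summary, chunks))
print(mean_and_slope(total))  # (3.0, 2.0)
```

Each chunk can be summarized independently, and only five numbers per chunk need to be sent and added together. Raw sums like these can lose precision for very large or badly scaled data, so real tools use more careful versions of the same idea. A median is harder: it cannot be combined by simple addition, which is one reason large systems often report approximate medians.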


Tools for Big Data

  • Databases (SQL, NoSQL) to store data
  • Distributed computing (Hadoop, Spark) to process data
  • Statistical programming (R, Python) for analysis
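As a small illustration of the first tool in this list, the sketch below uses Python's built-in `sqlite3` module with an in-memory SQL database and lets the database compute a summary with a query. The `views` table, its columns, and the rows are hypothetical; real big-data systems hold far more rows, often spread across many machines.

```python
import sqlite3

# In-memory database; a real system would store data on disk or across a cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE views (video_id TEXT, minutes REAL)")

# Hypothetical view records: one row per video view.
rows = [("a", 3.0), ("a", 5.0), ("b", 10.0), ("b", 2.0), ("b", 6.0)]
conn.executemany("INSERT INTO views VALUES (?, ?)", rows)

# Ask the database for per-video view counts and average watch time.
query = """
    SELECT video_id, COUNT(*) AS n_views, AVG(minutes) AS avg_minutes
    FROM views
    GROUP BY video_id
    ORDER BY video_id
"""
summary = conn.execute(query).fetchall()
conn.close()

for video_id, n_views, avg_minutes in summary:
    print(video_id, n_views, avg_minutes)
# a 2 4.0
# b 3 6.0
```

Only the small summary table leaves the database, which is the same reason large systems push counting and averaging to where the data are stored.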

Visuals

Figure 14.1 — Big Data and the 3 Vs. Diagram showing Volume, Velocity, Variety (and Veracity) in overlapping circles.


Why This Matters

Big data connects statistics to the modern world:

  • Online behavior, medical records, GPS signals, shopping patterns
  • Algorithms detect patterns too large for humans to see
  • Big data powers modern AI and machine learning
