OCMA 2025: Reimagining introductory statistics

Rethinking the teaching of introductory statistics in Ontario colleges.

Rationale:

A math-centric approach to teaching introductory statistics is rooted in probability theory, probability models, expectation and the study of various distributions (Normal, Binomial, Poisson, Hypergeometric, Chi-square, F, Student’s t, etc.) that serve as useful mathematical models for a variety of ‘real world’ scenarios. The discourse is algebraic and powered by calculus.

Judging by a non-random sample of course outlines and textbooks, Ontario college intro statistics courses are organized by topics oriented both to the mathematical theories (probability distributions and the central limit theorem) and to activities like measures of centre and dispersion in population distributions, sampling, hypothesis testing and confidence intervals.

Here are some course outlines and textbook structures that may help illustrate the point.

Sheridan, Humber, Mohawk, OER text,

The American Statistical Association’s draft college course outline.

After careful consideration, and some reflection on my own teaching, I came to the realization that the organizational structure of a typical introductory statistics course and textbook does not and cannot make sense to students who are not mathematicians, nor are the topics grounded in a way that helps teachers make sense of them for students. The topics covered are disjointed, and teachers and students have a hard time seeing the connections and relations between them. For example, contingency tables (aka cross-tabulations), a crucial element of health sciences research, tend to get glossed over in college intro statistics, and when they do appear they do so under the topic ‘chi-square test of independence’.

Consequences:

Taking a data-centred approach allows us, collectively as teachers, to organize introductory statistics courses more clearly, and it inspired me in particular to do so. Etymologically, the concept of data is grounded in the idea of information that is given: in Latin, datum (the thing that was given), from the verb dare (to give). It is worth noting that data does not just appear; it is part of a process of giving by the research subject and receiving by the researcher or surveyor.

Starting from the grounded understanding of data as ‘information given’ or ‘information collected’ provides the impetus for a reasonably valid set of building blocks that are accessible to the student without trivializing the core mathematics. Here are a few building blocks and decisions I made (perspectives I took) in order to help do that.

  1. There are effectively two types of data: Categorical and Measurement (no more interval/ratio, qualitative/quantitative, etc.).
    • Categorical data is grounded in classifying the object being studied into one of two or more pre-determined (hopefully mutually exclusive) categories.
    • Measurement data is grounded in describing a characteristic of the object at hand by measuring a feature of that object (e.g. head circumference of a human).
  2. There are five types of ‘real-life’ scenarios accessible in an introductory statistics course for college students: single measurement variable, single categorical variable, two categorical variables, two measurement variables and one of each.
  3. Statistical significance is important, but without an understanding of what I call practical significance (aka clinical significance, or measures of effect), the results of statistical significance testing (especially hypothesis tests and their p-values) will have no meaning to students.
  4. Non-inferential statistical methods help us assess practical significance; inferential statistical methods help us assess statistical significance.
  5. The goal of an introductory statistics course is to learn the workings of each type of data scenario (see #2 above).
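
One way to make building blocks 1 and 2 concrete is a small sketch that names the scenario from the data types of the variables involved. The names `DataType` and `scenario` are hypothetical, chosen for illustration only; they are not taken from any of the apps described later.

```python
from enum import Enum


class DataType(Enum):
    """The two data types used in this approach."""
    CATEGORICAL = "categorical"
    MEASUREMENT = "measurement"


def scenario(*variables: DataType) -> str:
    """Name the scenario for one or two variables, based only on data type."""
    if len(variables) == 1:
        return f"single {variables[0].value} variable"
    if len(variables) == 2:
        first, second = variables
        if first is second:
            return f"two {first.value} variables"
        # Order does not matter: one of each is a single scenario.
        return "one categorical and one measurement variable"
    raise ValueError("this sketch only covers one or two variables")


# Example: smoking status (categorical) and birth weight (measurement).
print(scenario(DataType.CATEGORICAL, DataType.MEASUREMENT))
```

The two data types combine into exactly the five scenarios listed in #2: two single-variable cases, two same-type pairs, and one mixed pair.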

The result:

Simplified topic outline of an intro statistics course from a data type perspective (full version here covers the first four units):

Unit 1 – Foundations: about data, probability and the research process

Unit 2 – Scenarios where one variable is needed: separate methods for categorical and measurement variables. Includes an introduction to the Normal distribution as a model for measurement data.

Unit 3 – Scenarios where two variables are needed: Compare Means (and medians and histograms and boxplots), Compare Rates, Correlation. Includes visualizations and calculations of practical significance: difference between means (raw difference, % difference and Cohen’s d), relative risk and risk difference, and Pearson’s r.
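
The practical significance measures named in Unit 3 can be sketched with the Python standard library alone. This is a minimal illustration, not course material: the function names are hypothetical, Cohen’s d is computed with the pooled standard deviation, and the % difference is taken relative to the second group’s mean, which is one convention among several.

```python
from math import sqrt
from statistics import mean, stdev


def raw_difference(group_a, group_b):
    """Difference between the two group means."""
    return mean(group_a) - mean(group_b)


def percent_difference(group_a, group_b):
    """Raw difference as a percentage of group_b's mean (assumed convention)."""
    return 100 * raw_difference(group_a, group_b) / mean(group_b)


def cohens_d(group_a, group_b):
    """Raw difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2
                  + (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return raw_difference(group_a, group_b) / sqrt(pooled_var)


def risk_measures(events_a, n_a, events_b, n_b):
    """Risk difference and relative risk from the counts of a 2x2 table."""
    risk_a, risk_b = events_a / n_a, events_b / n_b
    return {"risk_difference": risk_a - risk_b,
            "relative_risk": risk_a / risk_b}


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired measurements."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)
```

None of these calculations involves a sampling distribution or a p-value, which is why they can be taught before inference.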

Unit 4 – Inference with one-variable scenarios: focus on confidence intervals for a single mean and a single proportion (rate) over and above hypothesis testing. Making sense of confidence intervals takes time.
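
As a rough sketch of the two Unit 4 intervals, the code below uses a Normal critical value for both. That is an assumption made to keep the example within the standard library: for small samples a Student’s t critical value would normally replace it for the mean, and the proportion interval shown is the simple (Wald) version. Function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def critical_z(confidence=0.95):
    """Normal critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def mean_interval(values, confidence=0.95):
    """Approximate confidence interval for a single mean (large-sample z)."""
    centre = mean(values)
    margin = critical_z(confidence) * stdev(values) / sqrt(len(values))
    return centre - margin, centre + margin


def proportion_interval(successes, n, confidence=0.95):
    """Approximate (Wald) confidence interval for a single proportion."""
    p = successes / n
    margin = critical_z(confidence) * sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Made-up illustration: 50 of 100 respondents answer "yes".
low, high = proportion_interval(50, 100)
print(f"95% CI for the rate: {low:.3f} to {high:.3f}")
```

Both intervals have the same shape, estimate plus or minus a margin of error, which gives students one pattern to recognize across the two data types.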

Unit 5 – Inference in scenarios where two variables are needed: focus on confidence intervals.
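
For two-variable scenarios, a confidence interval can be placed around a measure of effect such as the risk difference or the difference between two means. The sketch below is a large-sample approximation using a Normal critical value with unpooled standard errors; the function names are hypothetical, and a t-based interval would usually be preferred for small groups.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def z_star(confidence=0.95):
    """Normal critical value for the requested confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def risk_difference_interval(events_a, n_a, events_b, n_b, confidence=0.95):
    """Approximate interval for the difference between two rates."""
    p_a, p_b = events_a / n_a, events_b / n_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_a - p_b
    margin = z_star(confidence) * se
    return diff - margin, diff + margin


def mean_difference_interval(group_a, group_b, confidence=0.95):
    """Approximate interval for the difference between two means."""
    se = sqrt(variance(group_a) / len(group_a)
              + variance(group_b) / len(group_b))
    diff = mean(group_a) - mean(group_b)
    margin = z_star(confidence) * se
    return diff - margin, diff + margin


# Made-up illustration: 30 of 100 events in one group, 10 of 100 in the other.
low, high = risk_difference_interval(30, 100, 10, 100)
print(f"95% CI for the risk difference: {low:.3f} to {high:.3f}")
```

Reading such an interval draws on both ideas at once: its centre speaks to practical significance, and whether it excludes zero speaks to statistical significance.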

Bonus (if time permits) Unit 6 – Interpreting the work of others (depending on context). Reading news, technical and research reports where statistics are presented.

To help students become familiar with the material, and to fill in any gaps they arrive with, I’ve developed a few apps that let students practice the sorts of exercises that stimulate the decision making, number sense, calculation skills and comfort with new terminology needed for the various units.

Another perspective:

Using a Bayesian approach as the grounding for an introductory statistics course:

Links to and descriptors of apps:

HNP webapp – helps develop many of the basic number sense skills needed for introductory statistics, as well as some of the specific language needed for the units. The current version is set up to facilitate teacher use for part of a foundations mathematics or introductory statistics course. Students need to be registered and added to a course created by the teacher using the material available.

statcat.ca – a webapp (registration needed) that allows students to practice reading health sciences scenarios and deciding on the best approach to analysis from a data type perspective. Includes an opportunity to practice choosing the correct hypothesis test too.

Other resources:

CAUSE – a US-based organization dedicated to improving statistics education for non-mathematicians.