Random Sampling

STAT 218 - Week 1, Lecture 2

January 10th, 2024

From the Last Lecture - I

Where can I get a copy of this book?
- There are a few options that I found:
  - Course Reserves
  - Pearson - you can subscribe for 4 months
  - Cal Poly Textbook Facebook page: Rumor has it that there is a Facebook page where students sell textbooks at low prices.

In the last lecture, we talked about the steps for designing a study.
- Among these steps, we decided that we need a data-gathering method to address our research question with data.
- We also agreed that we need a representative sample to be able to make inferences about our population of interest.
Let’s dive into these terms.

population: consists of all subjects/participants/observational units of interest (e.g., all squirrels at Cal Poly)
sample: a subset of a population with size n.
Generally, we select a sample from the population of interest to estimate, or make inferences about, something we want to know about that population.
e.g., eighteen (n = 18) squirrels living at Cal Poly.

If a sample is determined through simple random sampling, it means that:

Every item, subject, specimen, or observational unit in the population has an equal chance of being selected in that sample.
The members of the sample are chosen independently of each other.

To benefit from randomness, we generally rely on tools (such as a random digits table or a computer) rather than on our own judgment, which helps eliminate bias.

Here are the steps for choosing a random sample of n observational units from a population of interest.

Determine the sampling frame: Assign a unique ID number to each member (e.g., from 01 to 50).
Start reading numbers from a Random Digits Table or generate them with a computer.
Ignore any number that is not present in your sampling frame (e.g., 72).
Ignore any repeated occurrence of the same number.
Continue until the intended sample size is reached.
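
The steps above can be sketched in Python. This is only an illustration, not part of the lecture materials: the function name `simple_random_sample`, the frame size of 50, the sample size of 5, and the seed are all assumptions, and Python's `random` module stands in for the Random Digits Table.

```python
import random


def simple_random_sample(frame_size, n, seed=None):
    """Select n distinct IDs from 1..frame_size by reading random two-digit numbers."""
    if not 1 <= n <= frame_size <= 99:
        raise ValueError("need 1 <= n <= frame_size <= 99 for two-digit IDs")
    rng = random.Random(seed)  # stands in for the random digits table
    selected = []
    while len(selected) < n:  # continue until the sample size is reached
        number = rng.randint(0, 99)  # read the next two-digit number (00 to 99)
        if not 1 <= number <= frame_size:
            continue  # ignore numbers that are not in the sampling frame
        if number in selected:
            continue  # ignore repeated occurrences of the same number
        selected.append(number)
    return selected


# Hypothetical example: 50 labeled members, sample of size 5.
sample_ids = simple_random_sample(frame_size=50, n=5, seed=218)
print(sample_ids)
```

The loop mirrors the table-reading procedure on purpose. In practice, `random.sample(range(1, 51), 5)` draws the same kind of sample in a single call.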

In many cases, choosing and implementing a simple random sample is either difficult or impossible.
- What should we do, then?
- How can we label ALL the squirrels at Cal Poly?

Of course, there are other random sampling options that are not simple. Two of them are:
- Random cluster sampling: Remember that in simple random sampling we assigned a unique ID number to each member (e.g., from 01 to 50).
  - In random cluster sampling, unique ID numbers are instead assigned to clusters or groups of observational units (e.g., from 01 to 50), some clusters are selected at random, and then all the observational units in the selected clusters are recruited for the study.
- Stratified random sampling: Our population of interest may consist of various strata (e.g., age groups, biological sex, geographical region, grade levels of students).
  - After determining the strata, a simple random sample is drawn from each stratum, and these samples are combined to form a representative sample of the population of interest.
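
To make the difference between the two designs concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the function names `cluster_sample` and `stratified_sample`, the cluster IDs, the stratum names, and the unit labels are all hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters, then keep every unit in the selected clusters."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), n_clusters)  # randomness is at the cluster level
    units = [unit for cid in chosen_ids for unit in clusters[cid]]
    return chosen_ids, units


def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the same size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


# Hypothetical clusters: ID -> observational units in that cluster.
clusters = {
    "01": ["a1", "a2", "a3"],
    "02": ["b1", "b2"],
    "03": ["c1", "c2", "c3", "c4"],
    "04": ["d1", "d2"],
}
# Hypothetical strata: grade level -> students in that grade level.
strata = {
    "first-year": [f"fy{i}" for i in range(1, 21)],
    "second-year": [f"sy{i}" for i in range(1, 21)],
}

chosen_ids, cluster_units = cluster_sample(clusters, n_clusters=2, seed=1)
per_stratum = stratified_sample(strata, n_per_stratum=3, seed=1)
print(chosen_ids, cluster_units)
print(per_stratum)
```

Note where the randomness enters: cluster sampling randomizes which groups are taken and keeps everyone in them, while stratified sampling takes every group and randomizes who is chosen within each one.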

Stratified Random Sampling

Dan Kernler, CC BY-SA 4.0, via Wikimedia Commons

Random Cluster Sampling

Dan Kernler, CC BY-SA 4.0, via Wikimedia Commons

Sampling Error: The discrepancy between the sample and the population of interest.
- It is crucial to quantify this error; being able to do so is what makes statistics one of the backbones of scientific thinking.
- There will always be a sampling error BUT
  - If we use nonrandom sampling techniques, the sampling error becomes unpredictable (sampling bias).
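
A small simulation can illustrate the contrast between random and nonrandom sampling. This is a sketch with invented numbers, not data from the course: the population of 1000 weights, the sample size n = 18, and the "heaviest units only" rule are all assumptions chosen for illustration. Here the sampling error is measured as the sample mean minus the population mean.

```python
import random
import statistics

rng = random.Random(2024)

# Hypothetical population: 1000 weights (in grams) centered near 500.
population = [rng.gauss(500, 50) for _ in range(1000)]
population_mean = statistics.mean(population)
n = 18

# Sampling error of many simple random samples of size n.
random_errors = []
for _ in range(2000):
    sample = rng.sample(population, n)
    random_errors.append(statistics.mean(sample) - population_mean)
average_random_error = statistics.mean(random_errors)

# A nonrandom sample: only the n heaviest units (e.g., the easiest to catch).
nonrandom_sample = sorted(population)[-n:]
nonrandom_error = statistics.mean(nonrandom_sample) - population_mean

print(f"average error over random samples: {average_random_error:.2f}")
print(f"spread of random-sample errors:    {statistics.stdev(random_errors):.2f}")
print(f"error of the nonrandom sample:     {nonrandom_error:.2f}")
```

Each random sample still has some error, but the errors scatter around zero and their spread can be described. The nonrandom sample misses in one direction, and nothing in the sample itself reveals by how much.
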
Nonsampling Error: An error caused by factors other than sampling.
- wording of the questions in a questionnaire
- nonresponse bias: e.g., bias that occurs because some participants do not respond to some of the questions or do not return a written survey.
- missing data: e.g., experimental living organisms may die during the experiment, or human subjects may fail to participate in some or all of the sessions of a treatment group.
- …