STAT 218 - Week 5, Lecture 4
February 8th, 2024
Suppose that we have two samples, one from each of two populations.
If the two samples look quite similar to each other, we might infer that the two populations are identical;
If the samples look quite different, we would infer that the populations differ.
The question, then, is: how different must the samples be before we can infer that the populations differ?
One approach: compare the two sample means, and judge how much they differ relative to the amount of difference we would expect to see by chance alone.
Let’s consider an example. We are interested in how much people know about world health and development. If we pose a multiple-choice world health question, then we might like to understand whether
These competing ideas are called hypotheses. We call \(H_0\) the null hypothesis and \(H_A\) the alternative hypothesis.
Important
NULL AND ALTERNATIVE HYPOTHESES
The general idea is to formulate as a hypothesis the statement that the population means \(\mu_1\) and \(\mu_2\) differ, and then to see whether the data provide sufficient evidence in support of that hypothesis.
Important
A hypothesis test is carried out in four steps.
The hypothesis that \(\mu_1\) and \(\mu_2\) are not equal is called the alternative hypothesis (or research hypothesis):
\[ H_A: \mu_1 \neq \mu_2 \]
Its antithesis is the null hypothesis, \[ H_0: \mu_1 = \mu_2 \]
which asserts that \(\mu_1\) and \(\mu_2\) are equal. A researcher would usually state these hypotheses more informally; we will see such informal statements in the examples, problems, and exercises in this course.
Equivalently, we can express these hypotheses in terms of the difference \(\mu_1 - \mu_2\):
\[ \begin{aligned} H_0 &: \mu_1 - \mu_2 = 0 \\ H_A &: \mu_1 - \mu_2 \neq 0 \end{aligned} \]
The \(t\) test is a standard method of choosing between these two hypotheses. To carry out the \(t\) test, the first step is to compute the test statistic.
The test statistic measures how far the difference between the sample means (the \(\bar{y}\)’s) is from the difference we would expect to see if \(H_0\) were true (zero difference), expressed in units of the standard error (SE) of the difference, that is, the amount of variation we expect to see in differences of means from random samples.
The subscript “s” on \(t_s\) serves as a reminder that this value is calculated from the data (“s” for “sample”).
The quantity \(t_s\) is the test statistic for the \(t\) test; that is, \(t_s\) provides the data summary that is the basis for the test procedure.
\[ t_s = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{SE(\bar{y}_1 - \bar{y}_2)} \]
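To make the formula concrete, here is a minimal standard-library Python sketch (the course itself works in R; this is only an illustration). The function name `welch_t_statistic` is hypothetical, and the sketch assumes the unpooled standard error \(SE(\bar{y}_1 - \bar{y}_2) = \sqrt{s_1^2/n_1 + s_2^2/n_2}\), which is consistent with the non-integer degrees of freedom in the R output for the plant study. The argument `null_diff` plays the role of \(\mu_1 - \mu_2\) under \(H_0\), so it is 0 for \(H_0: \mu_1 = \mu_2\).

```python
import math
import statistics


def welch_t_statistic(sample1, sample2, null_diff=0.0):
    """Return (t_s, SE) for the difference in means, sample1 minus sample2.

    SE is the unpooled standard error sqrt(s1^2/n1 + s2^2/n2), and
    null_diff is the value of mu1 - mu2 claimed by H0 (0 for H0: mu1 = mu2).
    """
    n1, n2 = len(sample1), len(sample2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each sample needs at least two observations")
    # statistics.variance divides by n - 1, giving the sample variance s^2.
    se = math.sqrt(statistics.variance(sample1) / n1 + statistics.variance(sample2) / n2)
    diff = statistics.fmean(sample1) - statistics.fmean(sample2)  # y-bar_1 - y-bar_2
    return (diff - null_diff) / se, se


# Usage with small made-up samples (not the plant data):
t_s, se = welch_t_statistic([10, 12, 14], [7, 9, 11])
print(f"t_s = {t_s:.3f}, SE = {se:.3f}")  # t_s = 1.837, SE = 1.633
```

Swapping the two samples only flips the sign of \(t_s\), which is why a two-sided alternative looks at \(|t_s|\).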
Important
We can think of \(\alpha\) as a preset threshold for statistical significance or, equivalently, as the risk of a false-positive finding (rejecting \(H_0\) when it is actually true).
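The "risk of a false-positive finding" reading of \(\alpha\) can be checked by simulation. The sketch below is a standard-library Python illustration under stated assumptions: both samples are drawn from the same hypothetical \(N(0, 1)\) population (so \(H_0\) is true), the sample sizes 7 and 8 are chosen only to mirror the plant study, the test is the Welch (unpooled) \(t\) test, and the \(t\)-distribution tail area is computed by Simpson's-rule integration of the density rather than by a statistics library. All function names are hypothetical.

```python
import math
import random
import statistics


def t_two_sided_p(t, df, steps=200):
    """Two-sided p-value P(|T| >= |t|) for Student's t with `df` degrees of freedom.

    Integrates the t density over [0, |t|] with Simpson's rule (`steps` must be even).
    """
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = density(0.0) + density(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    central = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * central)


def welch_p_value(sample1, sample2):
    """Two-sided Welch t test p-value for H0: mu1 = mu2."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # squared SE of mean 1
    v2 = statistics.variance(sample2) / n2  # squared SE of mean 2
    t_s = (statistics.fmean(sample1) - statistics.fmean(sample2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_two_sided_p(t_s, df)


def false_positive_rate(alpha=0.05, n1=7, n2=8, reps=4000, seed=218):
    """Fraction of simulated tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Both samples come from the same hypothetical N(0, 1) population,
        # so every rejection here is a false positive.
        s1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        s2 = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        if welch_p_value(s1, s2) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print(false_positive_rate())
```

The printed rate should land close to 0.05 (not exactly, because of simulation noise): when \(H_0\) is true, a test run at \(\alpha = 0.05\) rejects about 5% of the time.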

The Wisconsin Fast Plant, Brassica campestris, has a very rapid growth cycle that makes it particularly well suited for the study of factors that affect plant growth.
In one such study, 7 plants were treated with the substance Ancymidol (ancy) and were compared to 8 control plants that were given ordinary water. Heights of all of the plants were measured, in cm, after 14 days of growth.
(For these data the software reports \(df \approx 12.8\); when working from a \(t\) table we round this down to \(df = 12\).)
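A non-integer \(df\) such as 12.8 is what the Welch–Satterthwaite approximation produces; we assume (the lecture does not name the method) that this is how the software obtained it. For reference, writing \(SE_1 = s_1/\sqrt{n_1}\) and \(SE_2 = s_2/\sqrt{n_2}\), where \(s_1\) and \(s_2\) are the sample standard deviations:

\[ SE(\bar{y}_1 - \bar{y}_2) = \sqrt{SE_1^2 + SE_2^2}, \qquad df = \frac{\left(SE_1^2 + SE_2^2\right)^2}{\dfrac{SE_1^4}{n_1 - 1} + \dfrac{SE_2^4}{n_2 - 1}} \]

This \(df\) always lies between \(\min(n_1, n_2) - 1\) and \(n_1 + n_2 - 2\). With sample sizes 7 and 8 that range is 6 to 13, and 12.8 falls inside it.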
Let’s carry out the test of \(H_0: \mu_1 = \mu_2\) against \(H_A: \mu_1 \neq \mu_2\) for these data, using \(\alpha = 0.05\).

# A tibble: 1 × 7
statistic t_df p_value alternative estimate lower_ci upper_ci
<dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 1.99 12.8 0.0679 two.sided 4.90 -0.418 10.2
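As a cross-check, the p-value and confidence interval in the tibble can be approximately recovered from its own `statistic`, `estimate`, and `t_df` columns. This is a standard-library Python sketch, not the R code used in the course: the helper names are hypothetical, the \(t\) CDF is computed by Simpson's-rule integration, the SE is backed out as `estimate / statistic` (valid because \(t_s = (\bar{y}_1 - \bar{y}_2)/SE\) when \(H_0\) says the difference is 0), and the interval is assumed to be a 95% interval (the usual default). Because the printed inputs are rounded, the results should be close to, but not exactly equal to, the printed `p_value`, `lower_ci`, and `upper_ci`.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF of Student's t, using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_density(0.0, df) + t_density(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    half = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + half if x >= 0 else 0.5 - half


def t_quantile(prob, df, lo=0.0, hi=100.0):
    """Upper-half quantile of Student's t by bisection (assumes 0.5 <= prob < 1)."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Rounded values copied from the tibble printed above.
statistic, estimate, t_df = 1.99, 4.90, 12.8
alpha = 0.05

se = estimate / statistic  # statistic = estimate / SE when H0 says the difference is 0
p_value = 2 * (1 - t_cdf(abs(statistic), t_df))
t_crit = t_quantile(1 - alpha / 2, t_df)
lower_ci, upper_ci = estimate - t_crit * se, estimate + t_crit * se
decision = "reject H0" if p_value < alpha else "do not reject H0"

print(f"p = {p_value:.4f}, 95% CI = ({lower_ci:.3f}, {upper_ci:.3f}): {decision}")
```

Reading the tibble itself: the p-value 0.0679 is larger than \(\alpha = 0.05\), so at that level we do not reject \(H_0\). Consistently, the interval from -0.418 to 10.2 contains 0.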
Remember!
The \(t\) test is appropriate when
(1) the data can be regarded as coming from two independently chosen random samples,
(2) the observations are independent within each sample, and
(3) each of the populations is normally distributed.
If \(n_1\) and \(n_2\) are large, condition (3) is less important.