STAT 218 - Week 6, Lecture 1
February 12th, 2024
Important
NULL AND ALTERNATIVE HYPOTHESES
In 2012, physicists suggested that they had discovered the existence of a subatomic particle known as the Higgs boson, based on some data and a \(P\)-value of 0.0000003. What they meant was that if the Higgs boson did not exist, the chance of observing data as extreme as theirs was only 0.0000003.
A two-tailed test is used to test the null hypothesis against a nondirectional alternative hypothesis: \[ H_0: \mu_1 = \mu_2 \qquad H_A: \mu_1 \neq \mu_2 \]
In some studies it is apparent from the beginning, before the data are collected, that there is only one reasonable direction of deviation from \(H_0\). In such situations it is appropriate to formulate a directional alternative hypothesis: \[ H_0: \mu_1 = \mu_2 \qquad H_A: \mu_1 < \mu_2 \] OR
\[ H_0: \mu_1 = \mu_2 \qquad H_A: \mu_1 > \mu_2 \]
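A quick numeric sketch (not from the lecture; the test statistic and degrees of freedom are hypothetical) of how the choice of alternative changes the \(P\)-value for the same test statistic:

```python
# How the alternative hypothesis determines which tail(s) of the t
# distribution count toward the P-value. scipy assumed available.
from scipy import stats

t_s = 2.37   # hypothetical test statistic
df = 99      # hypothetical degrees of freedom

# Nondirectional (H_A: mu1 != mu2): both tails count.
p_two_sided = 2 * stats.t.sf(abs(t_s), df)

# Directional (H_A: mu1 > mu2): only the upper tail counts.
p_upper = stats.t.sf(t_s, df)

# Directional (H_A: mu1 < mu2): only the lower tail counts.
p_lower = stats.t.cdf(t_s, df)

print(p_two_sided, p_upper, p_lower)
```

Note that for a positive test statistic the two-sided \(P\)-value is exactly twice the upper-tail \(P\)-value.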
The general idea is to formulate the research question as a pair of hypotheses and then to see whether the data provide sufficient evidence against \(H_0\).
Important
We carry out a hypothesis test in four steps: state the hypotheses, compute the test statistic, find the \(P\)-value, and draw a conclusion.
The average time for all runners who finished the Cherry Blossom Race in 2006 was 93.29 minutes (93 minutes and about 17 seconds). We want to determine using data from 100 participants in the 2017 Cherry Blossom Race whether runners in this race are getting faster or slower, versus the other possibility that there has been no change.
The sample mean and sample standard deviation of the sample of 100 runners from the 2017 Cherry Blossom Race are 97.32 and 16.98 minutes, respectively (\(\alpha = 0.05\)).
\(H_0\): The average 10-mile run time was the same for 2006 and 2017.
\(\mu\) = 93.29 minutes.
\(H_A\): The average 10-mile run time for 2017 was different than that of 2006. \(\mu \neq 93.29\) minutes.
\[ SE = \frac{s}{\sqrt{n}} = \frac{16.98}{\sqrt{100}} = 1.70 \]
\[ t_s = \frac {{97.32 - 93.29}}{1.70} = 2.37 \]
For \(df = 100 - 1 = 99\), software gives the \(P\)-value as approximately 0.02.
Conclusion: Because the \(P\)-value is smaller than 0.05, we reject the null hypothesis. That is, the data provide strong evidence that the average run time for the Cherry Blossom Run in 2017 is different than the 2006 average.
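The four steps above can be sketched in code using only the summary statistics given in the lecture (scipy assumed available):

```python
# One-sample t test for the race example, from summary statistics.
import math
from scipy import stats

mu_0 = 93.29                 # H_0: the 2017 mean equals the 2006 mean
xbar, s, n = 97.32, 16.98, 100

# Step 2: standard error and test statistic
se = s / math.sqrt(n)        # 16.98 / 10 = 1.698
t_s = (xbar - mu_0) / se     # about 2.37

# Step 3: two-sided P-value with df = n - 1
p_value = 2 * stats.t.sf(abs(t_s), df=n - 1)   # about 0.02

# Step 4: compare to alpha = 0.05
reject = p_value < 0.05
print(round(t_s, 2), round(p_value, 3), reject)
```

With raw data rather than summary statistics, `scipy.stats.ttest_1samp` performs the same test directly.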
When do we need to use the paired-sample \(t\) test?
1 group measured on 2 different occasions or under 2 different conditions (e.g., pre-test/post-test)
Matched subjects
Notation
In the paired-sample \(t\) test we analyze the differences
\[ D = Y_1 - Y_2 \] Then \(\bar{D}\) can be considered as follows:
\[ \bar{D} = \bar{Y_1} - \bar{Y_2} \]
which is the sample analogue of
\[ \mu_D = {\mu_1} - {\mu_2} \] We may say that the mean of the difference is equal to the difference of the means.
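This identity holds exactly at the sample level as well, by linearity of the mean. A quick numeric check with made-up data:

```python
# The mean of the paired differences equals the difference of the
# sample means. The values below are hypothetical, for illustration only.
import numpy as np

y1 = np.array([8.0, 6.5, 7.2, 9.1])   # hypothetical first measurements
y2 = np.array([5.5, 6.0, 6.8, 7.9])   # hypothetical second measurements

d = y1 - y2
print(np.isclose(d.mean(), y1.mean() - y2.mean()))  # True
```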
Hunger Rating During a weight loss study, each of nine subjects was given (1) the active drug m-chlorophenylpiperazine (mCPP) for 2 weeks and then a placebo for another 2 weeks, or (2) the placebo for the first 2 weeks and then mCPP for the second 2 weeks. As part of the study, the subjects were asked to rate how hungry they were at the end of each 2-week period.
Let us test \(H_0\) against \(H_A\) at significance level \(\alpha\) = 0.05.
\(H_0: \mu_{D} = 0\) \(H_A: \mu_{D} \neq 0\)
\[ SE_{\bar{D}} = \frac{s_D}{\sqrt{n_D}} \]
\[ t_s= \frac {\bar{d} - \mu_{D}}{SE_{\bar{D}}} \]
\[ SE_{\bar{D}} = \frac{32.8}{\sqrt{9}} = 10.9 \]
\[ t_s= \frac {-29.6 - 0}{10.9} = -2.71 \] Using a computer gives the \(P\)-value as \(P\)= 0.027.
Reject \(H_0\) and find that there is sufficient evidence to conclude that hunger when taking mCPP is different from hunger when taking a placebo.
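The paired-sample calculation can be sketched from the summary statistics in the lecture (\(\bar{d} = -29.6\), \(s_D = 32.8\), \(n_D = 9\); scipy assumed available):

```python
# Paired-sample t test for the hunger-rating example, from summary statistics.
import math
from scipy import stats

dbar, s_d, n = -29.6, 32.8, 9

se_dbar = s_d / math.sqrt(n)                   # about 10.9
t_s = (dbar - 0) / se_dbar                     # about -2.71
p_value = 2 * stats.t.sf(abs(t_s), df=n - 1)   # about 0.027

print(round(t_s, 2), round(p_value, 3))
```

Given the raw paired observations instead, `scipy.stats.ttest_rel` computes the same test.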
Definition
\(\alpha\) = Pr{finding significant evidence for \(H_A\)} if \(H_0\) is true
OR
rejecting the null hypothesis when \(H_0\) is actually true.
Definition
\(\beta\) = Pr{lack of significant evidence for \(H_A\)} if \(H_A\) is true
OR
failing to reject the null hypothesis when the alternative is actually true.
From Essential Guide to Effect Sizes by Paul D. Ellis (2010)
The chance of not making a Type II error when \(H_A\) is true —that is, the chance of having significant evidence for \(H_A\) when \(H_A\) is true— is called the power of a statistical test:
Definition
Power = 1 - \(\beta\) = Pr{significant evidence for \(H_A\)} if \(H_A\) is true
Thus, the power of a \(t\) test is a measure of the sensitivity of the test, or the ability of the test procedure to detect a difference between \(\mu_1\) and \(\mu_2\) when such a difference really does exist.
In this way the power is analogous to the resolving power of a microscope.
The power of a statistical test depends on many factors in an investigation, including the sample size, the significance level \(\alpha\), and the size of the difference to be detected.
All other things being equal, using larger samples gives more information and thereby increases power.
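A sketch of this relationship (the effect size and sample sizes below are assumed, not from the lecture), computing the power of a two-sided two-sample \(t\) test from the noncentral \(t\) distribution:

```python
# Power of a two-sided two-sample t test, via the noncentral t distribution.
import math
from scipy import stats

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample t test."""
    df = 2 * n_per_group - 2
    nc = effect_size * math.sqrt(n_per_group / 2)   # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # probability the test statistic lands in either rejection region
    return stats.nct.sf(t_crit, df, nc) + stats.nct.cdf(-t_crit, df, nc)

# All else equal, larger samples give higher power (effect size 0.5 assumed).
print(power_two_sample(0.5, 20), power_two_sample(0.5, 50))
```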
Our discussion of the \(t\) test and confidence interval (in Sections 7.3–7.8) was based on the following conditions:
the data can be regarded as coming from two independently chosen random samples,
the observations are independent within each sample, and
each of the populations is normally distributed (if \(n_1\) and \(n_2\) are large, this condition is less important).
If these conditions are not met, the \(t\) test may be invalid in the sense that the actual risk of Type I error is larger than the nominal significance level \(\alpha\). (To put this another way, the \(P\)-value yielded by the \(t\) test procedure may be inappropriately small.)
The \(t\) test may be valid, but less powerful than a more appropriate test.