
STAT 218 - Week 5, Lecture 2
February 6th, 2024
Weekly Assignments
Having more practice problems together
RStudio/Coding
| ggplot() | babies data set | geom_boxplot() | geom_bar() |
| confidence interval | aesthetics | geometric layer | parity |
| histogram | binwidth | stacked bar-plot | standardized bar plot |
| dodged bar plot | boxplot | gestation | Mandatory Paid Vacation |
| labs() | theme_bw() | element_text() | geom_point() |
We will learn…
The \(t\)-distribution is another bell-shaped, symmetric distribution. It is useful when the population standard deviation \(\sigma\) is unknown and must be estimated by the sample standard deviation \(s\).
The \(t\)-distribution is always centered at zero and has a single parameter: degrees of freedom.
When estimating a population mean from a sample of size \(n\), we use the \(t\)-distribution with \(df = n - 1\).

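Both the \(t\)-distribution and the standard normal distribution are symmetric and bell-shaped, but the \(t\)-distribution has a larger standard deviation.
The \(t\)-distribution has a single parameter: degrees of freedom.
The normal distribution has two parameters, \(\mu\) and \(\sigma\); the standard normal distribution fixes \(\mu = 0\) and \(\sigma = 1\).
The tails of the \(t\)-distribution are thicker than those of the normal curve, so extreme values are more likely. As \(df\) increases, the \(t\)-distribution gets closer to the standard normal distribution.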
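To see these differences numerically, here is a small Python sketch. Python is used only for illustration; the course itself works in R. The sketch evaluates the standard normal density and the \(t\) density with \(df = 13\) from their closed-form formulas. The helpers `normal_pdf`, `t_pdf`, and `t_sd` are hypothetical names written for this example.

```python
import math

def normal_pdf(x):
    """Standard normal density (mu = 0, sigma = 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Student's t density with df degrees of freedom (always centered at zero)."""
    # log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)); lgamma avoids overflow for large df
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_sd(df):
    """Standard deviation of the t-distribution (defined only for df > 2)."""
    return math.sqrt(df / (df - 2))

df = 13
for x in (0, 1, 2, 3):
    print(f"x = {x}: normal {normal_pdf(x):.5f}   t(df = {df}) {t_pdf(x, df):.5f}")
print(f"SD of t(df = {df}): {t_sd(df):.3f}; SD of standard normal: 1")
```

At \(x = 0\), the \(t\) density is slightly lower than the normal density. At \(x = 3\), it is roughly twice as high, which is what "thicker tails" means.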
We learned that our estimates are subject to sampling error.
Sampling error is the discrepancy between \(\bar{y}\) and \(\mu\). Its likely size is described (in a probability sense) by the sampling distribution of \(\bar{Y}\).
The standard error of the mean is defined as follows:
\[ SE_{\bar{y}} = \frac{s}{\sqrt{n}} \]
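As a quick arithmetic check of the formula, the sketch below computes \(SE_{\bar{y}}\) for the sample SD and sample size used in the butterfly example later in this lecture (\(s = 2.4757\), \(n = 14\)). It also shows that quadrupling \(n\) halves the standard error. The helper name `standard_error` is hypothetical.

```python
import math

def standard_error(s, n):
    """Standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)

s = 2.4757  # sample standard deviation (cm^2) from the butterfly example
for n in (14, 56):
    print(f"n = {n}: SE = {standard_error(s, n):.4f}")
```

Because \(n\) sits under a square root, cutting the SE in half requires four times as many observations.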

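When \(\sigma\) is known, a 95% confidence interval for \(\mu\) is
\[ 95 \% \ CI = (\bar{y} \pm 1.96 \times \frac{\sigma}{\sqrt{n}}) \]
In practice \(\sigma\) is unknown, so we replace it with \(s\) and replace 1.96 with a critical value from the \(t\)-distribution:
\[ 95 \% \ CI = (\bar{y} \pm t_{0.025} \times SE_{\bar{y}}), \qquad SE_{\bar{y}} = \frac{s}{\sqrt{n}} \]
where the critical value \(t_{0.025}\) is determined from Student’s \(t\)-distribution with
\[ df = n - 1 \]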
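Let’s look at the wing areas (in \(\text{cm}^2\)) of 14 male Monarch butterflies at Oceano Dunes State Park in California. For this sample, \(\bar{y} = 32.8143 \ \text{cm}^2\) and \(s = 2.4757 \ \text{cm}^2\).
Suppose we regard these 14 observations as a random sample from a population.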
\[ df = n - 1 = 14 - 1 = 13 \]
From Table 4 (with \(df = 13\)), we find
\[ t_{0.025} = 2.160 \]
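Table 4 lists \(t\) critical values. As an optional check, the Python sketch below recovers the same numbers without the table. It integrates the \(t\) density numerically with Simpson's rule to get the CDF, then uses bisection to find the value \(t_{\alpha}\) whose upper-tail area is \(\alpha\). This is a hand-rolled approximation written for illustration, and the names `t_cdf` and `t_critical` are hypothetical. In R, the same value comes from `qt(0.975, df = 13)`.

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 (by symmetry) plus Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(alpha, df, iterations=40):
    """Upper-tail critical value t_alpha (0 < alpha < 0.5), found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid  # upper-tail area still too large: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2

n = 14
df = n - 1
print(f"t_0.025, df = {df}: {t_critical(0.025, df):.3f}")
print(f"t_0.05,  df = {df}: {t_critical(0.05, df):.3f}")
```

Both printed values should match Table 4 (2.160 and 1.771) to three decimals.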
The 95% confidence interval (CI) for \(\mu\) is calculated as follows:
\[
\begin{aligned}
95 \% \ CI &= (\bar{y} \pm t_{0.025} \times SE_{\bar{y}}) \\
&= (32.8143 \pm 2.160 \times 2.4757 / \sqrt{14}) \\
&= 32.81 \pm 1.43
\end{aligned}
\]
\[ 31.38 \ \text{cm}^2 < \mu < 34.24 \ \text{cm}^2 \quad \text{or} \quad 95 \% \ CI = (31.38, 34.24) \]
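The same calculation in Python, as a minimal sketch. The summary statistics (\(\bar{y} = 32.8143\), \(s = 2.4757\), \(n = 14\)) and the critical value \(t_{0.025} = 2.160\) are taken from the example above rather than computed from raw data. The helper `t_interval` is a hypothetical name. Because the code keeps unrounded intermediate values, its bounds can differ from the hand calculation in the second decimal place.

```python
import math

def t_interval(ybar, s, n, t_crit):
    """Return (lower, upper) for ybar +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / math.sqrt(n)
    return ybar - margin, ybar + margin

lower, upper = t_interval(ybar=32.8143, s=2.4757, n=14, t_crit=2.160)
print(f"95% CI: ({lower:.2f}, {upper:.2f}) cm^2")
```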
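We are 95% confident that the true population mean wing area \(\mu\) lies in this interval. In other words, if we repeated the sampling many times, about 95% of the intervals constructed this way would contain \(\mu\).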
The 90% confidence interval (CI) for \(\mu\) is calculated the same way, using \(t_{0.05} = 1.771\) from Table 4:
\[
\begin{aligned}
90 \% \ CI &= (\bar{y} \pm t_{0.05} \times SE_{\bar{y}}) \\
&= (32.8143 \pm 1.771 \times 2.4757 / \sqrt{14}) \\
&= 32.81 \pm 1.17
\end{aligned}
\]
\[ 31.64 \ \text{cm}^2 < \mu < 33.98 \ \text{cm}^2 \]
We are 90% confident that the true population mean is in this confidence interval.
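Both intervals share the same center \(\bar{y}\) and the same \(SE_{\bar{y}}\). Only the critical value changes. The Python sketch below uses the Table 4 values quoted above (variable names are illustrative) and prints both intervals with their widths.

```python
import math

ybar, s, n = 32.8143, 2.4757, 14
se = s / math.sqrt(n)

# Critical values for df = 13, as read from Table 4 in the lecture
critical_values = {"90%": 1.771, "95%": 2.160}

for level, t_crit in critical_values.items():
    margin = t_crit * se
    print(f"{level} CI: ({ybar - margin:.2f}, {ybar + margin:.2f}), width = {2 * margin:.2f}")
```

The width ratio equals the ratio of critical values, \(2.160 / 1.771 \approx 1.22\).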
Think-pair-share: What is the difference between 90% CI and 95% CI?
Recall that
\[ SE_{\bar{y}} = \frac{s}{\sqrt{n}} \]
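We can use this formula to plan the sample size of a new study. We plug in a guessed value of the SD (for example, from an earlier sample) and the SE we would like to achieve, then solve for \(n\):
\[ \text{Desired SE} = \frac{\text{Guessed SD}}{\sqrt{n}} \quad \Longrightarrow \quad n = \left(\frac{\text{Guessed SD}}{\text{Desired SE}}\right)^2 \]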
Suppose the researcher is now planning a new study of Monarch butterflies at Oceano Dunes State Park in California and has decided that the SE should be no more than \(0.4 \ \text{cm}^2\).
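Using the SD from the butterfly sample, rounded to \(2.48 \ \text{cm}^2\), as the guessed SD:
\[
\begin{aligned}
\frac{2.48}{\sqrt{n}} &\le 0.4 \\
\sqrt{n} &\ge \frac{2.48}{0.4} = 6.2 \\
n &\ge 6.2^2 = 38.44
\end{aligned}
\]
Since \(n\) must be a whole number, the study needs at least 39 butterflies.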
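The sample-size calculation as a short Python sketch. The helper `required_sample_size` is a hypothetical name. It rounds \((\text{Guessed SD} / \text{Desired SE})^2\) up to the next whole number and then checks the SE at the chosen \(n\) and at one fewer observation.

```python
import math

def required_sample_size(guessed_sd, desired_se):
    """Smallest whole n with guessed_sd / sqrt(n) <= desired_se."""
    return math.ceil((guessed_sd / desired_se) ** 2)

n = required_sample_size(guessed_sd=2.48, desired_se=0.4)
print(f"Required sample size: {n}")
print(f"SE with n = {n}: {2.48 / math.sqrt(n):.4f}; with n = {n - 1}: {2.48 / math.sqrt(n - 1):.4f}")
```

Rounding up, not to the nearest integer, is what guarantees the SE target: 38 butterflies would give an SE slightly above 0.4.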
Note: the confidence interval for \(\mu\) is valid under the following conditions.
1. Conditions on the design of the study
(a) It must be reasonable to regard the data as a random sample from a large population.
(b) The observations in the sample must be independent of each other.
2. Conditions on the form of the population distribution
(a) If \(n\) is small, the population distribution must be approximately normal.
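(b) If \(n\) is large, the population distribution need not be approximately normal, because the Central Limit Theorem makes the sampling distribution of \(\bar{Y}\) approximately normal.
Of all these conditions, the requirement that the data are a random sample is the most important.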
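One way to see why condition 2(a) matters is a small simulation. The Python sketch below uses two assumed populations, chosen only for illustration: a normal population and a strongly right-skewed exponential population, both with mean 32.8, and a fixed random seed. It repeatedly draws random samples of size \(n = 14\), builds 95% \(t\)-intervals with \(t_{0.025} = 2.160\), and records how often the interval contains the true mean. For the normal population the coverage should be close to 95%. For the skewed population at this small \(n\), it tends to fall short. The function `coverage` is a hypothetical helper.

```python
import math
import random

def coverage(draw, true_mean, n=14, t_crit=2.160, reps=10_000, seed=218):
    """Share of simulated t-intervals (ybar +/- t_crit * s / sqrt(n)) that contain true_mean.

    t_crit = 2.160 matches df = n - 1 = 13 only for the default n = 14.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        ybar = sum(sample) / n
        s = math.sqrt(sum((y - ybar) ** 2 for y in sample) / (n - 1))  # sample SD, n - 1 divisor
        margin = t_crit * s / math.sqrt(n)
        if ybar - margin < true_mean < ybar + margin:
            hits += 1
    return hits / reps

mu = 32.8  # assumed population mean, used for both populations
normal_cov = coverage(lambda rng: rng.gauss(mu, 2.5), mu)       # assumed normal population, SD 2.5
skewed_cov = coverage(lambda rng: rng.expovariate(1 / mu), mu)  # assumed exponential (right-skewed) population
print(f"Normal population:      {normal_cov:.3f}")
print(f"Exponential population: {skewed_cov:.3f}")
```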