
January 31st, 2024
Data visualizations:
- are graphical representations of data
- use different colors, shapes, and the coordinate system to summarize data
- can tell a story or can be useful for exploring data
(A quick note: I used some of Dr. Dogucu’s materials for this class because I love them!)

- We could tell R something like…
smoke on the x-axis.
count on the y-axis.
These ideas are all correct, but some are not necessary in R. R will do some of these steps by default.

Babies Data
We need to learn the variables before proceeding.
case: id number
bwt: birth weight, in ounces
gestation: length of gestation, in days
parity: binary indicator for a first pregnancy (0 = first pregnancy)
age: mother’s age in years
height: mother’s height in inches
weight: mother’s weight in pounds
smoke: binary indicator for whether the mother smokes
ggplot()
Pick data
Map data onto aesthetics
Add the geometric layer
Let’s use the smoke variable in the babies dataset, a categorical variable indicating whether the mother smokes.
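The three steps (pick data, map data onto aesthetics, add the geometric layer) can be sketched as below. This is only an illustration: babies_toy is a made-up stand-in with a few hypothetical values, not the real babies data, and wrapping smoke in factor() is an assumption that the indicator is stored as 0/1.

```r
library(ggplot2)

# Hypothetical stand-in for the babies data (made-up values, same column name).
babies_toy <- data.frame(smoke = c(0, 0, 1, 1, 0, 1, 0, 0))

# Step 1: pick data. On its own this draws only an empty canvas.
p1 <- ggplot(data = babies_toy)

# Step 2: map data onto aesthetics. factor() treats the 0/1 codes as categories.
p2 <- ggplot(data = babies_toy, aes(x = factor(smoke)))

# Step 3: add the geometric layer. geom_bar() counts the rows in each
# category by default, so count on the y-axis is never specified.
p3 <- p2 + geom_bar()
p3
```

Notice that only smoke on the x-axis had to be stated; the counting is one of the steps R does by default.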
Let’s use the bwt variable, a numeric variable indicating birth weight in ounces.
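For a numeric variable, a histogram is a natural choice. A minimal sketch, again with the made-up babies_toy stand-in; the binwidth of 10 ounces is an arbitrary assumption for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the babies data (made-up birth weights in ounces).
babies_toy <- data.frame(bwt = c(120, 113, 128, 108, 136, 138, 132, 120))

# Map bwt to the x-axis; geom_histogram() bins the values and counts them.
p_hist <- ggplot(babies_toy, aes(x = bwt)) +
  geom_histogram(binwidth = 10)  # assumed bin width, try other values
p_hist
```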

Choose your own color
Map bwt to the x-axis.
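One way to choose your own color is to set fill and color as fixed values inside the geom, outside aes(). The color names here are arbitrary picks, and babies_toy is a made-up stand-in for the real data.

```r
library(ggplot2)

# Hypothetical stand-in for the babies data (made-up birth weights in ounces).
babies_toy <- data.frame(bwt = c(120, 113, 128, 108, 136, 138, 132, 120))

# fill sets the inside of the bars, color sets their outline.
# They go outside aes() because they are fixed values, not variables.
p_col <- ggplot(babies_toy, aes(x = bwt)) +
  geom_histogram(binwidth = 10, fill = "steelblue", color = "white")
p_col
```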
We are using two variables: parity, a binary indicator for a first pregnancy, and smoke, a binary indicator for whether the mother smokes.
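A sketch of a bar plot with two categorical variables. Putting parity on the x-axis and smoke in fill is an assumption (the roles could be swapped), and babies_toy holds made-up values.

```r
library(ggplot2)

# Hypothetical stand-in for the babies data (made-up 0/1 indicators).
babies_toy <- data.frame(
  parity = c(0, 0, 0, 1, 0, 1, 0, 1),
  smoke  = c(0, 0, 1, 1, 0, 1, 1, 0)
)

# One bar per parity value; each bar is split (stacked) by smoke.
p_stack <- ggplot(babies_toy,
                  aes(x = factor(parity), fill = factor(smoke))) +
  geom_bar()
p_stack
```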
Now we stretch every bar to the same height so that the y-axis reads like a percentage. This is called a standardized bar plot. Note that the y-axis no longer shows counts; we will learn how to change its label later.
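A standardized bar plot can be sketched by adding position = "fill" to geom_bar(). As before, babies_toy is a made-up stand-in and the choice of parity on the x-axis is an assumption.

```r
library(ggplot2)

# Hypothetical stand-in for the babies data (made-up 0/1 indicators).
babies_toy <- data.frame(
  parity = c(0, 0, 0, 1, 0, 1, 0, 1),
  smoke  = c(0, 0, 1, 1, 0, 1, 1, 0)
)

# position = "fill" rescales each bar to height 1,
# so the segments show proportions instead of counts.
p_fill <- ggplot(babies_toy,
                 aes(x = factor(parity), fill = factor(smoke))) +
  geom_bar(position = "fill")
p_fill
```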
We are visualizing a single numerical variable and a single categorical variable by using geom_boxplot().
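A minimal boxplot sketch. Using bwt as the numerical variable and smoke as the categorical one is an assumption, and babies_toy holds made-up values.

```r
library(ggplot2)

# Hypothetical stand-in for the babies data (made-up values).
babies_toy <- data.frame(
  bwt   = c(120, 113, 128, 108, 136, 138, 132, 120),
  smoke = c(0, 0, 1, 1, 0, 1, 1, 0)
)

# Categorical variable on the x-axis, numerical variable on the y-axis:
# one box per smoke category.
p_box <- ggplot(babies_toy, aes(x = factor(smoke), y = bwt)) +
  geom_boxplot()
p_box
```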
We colored the points for the continuous variables by smoke.
We used different point shapes for the continuous variables by smoke.
Now, we apply both different shapes and different colors.
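Color and shape can both be mapped to smoke inside aes(). In this sketch the two continuous variables are assumed to be gestation and bwt, and babies_toy holds made-up values. smoke is wrapped in factor() because shape cannot be mapped to a continuous variable.

```r
library(ggplot2)

# Hypothetical stand-in for the babies data (made-up values).
babies_toy <- data.frame(
  gestation = c(284, 282, 279, 270, 286, 244, 245, 289),
  bwt       = c(120, 113, 128, 108, 136, 138, 132, 120),
  smoke     = c(0, 0, 1, 1, 0, 1, 1, 0)
)

# Drop color or shape from aes() to get only one of the two encodings.
p_scatter <- ggplot(babies_toy,
                    aes(x = gestation, y = bwt,
                        color = factor(smoke), shape = factor(smoke))) +
  geom_point()
p_scatter
```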

Let’s use the labs() function to make the plot more readable.
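A sketch of labs() on a scatter plot. The label text, the choice of gestation and bwt, and the made-up babies_toy values are all assumptions for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for the babies data (made-up values).
babies_toy <- data.frame(
  gestation = c(284, 282, 279, 270, 286, 244, 245, 289),
  bwt       = c(120, 113, 128, 108, 136, 138, 132, 120),
  smoke     = c(0, 0, 1, 1, 0, 1, 1, 0)
)

# labs() renames the axes, the legend, and adds a title.
p_lab <- ggplot(babies_toy,
                aes(x = gestation, y = bwt, color = factor(smoke))) +
  geom_point() +
  labs(x = "Gestation (days)",
       y = "Birth weight (ounces)",
       color = "Mother smokes",
       title = "Birth weight by gestation length")
p_lab
```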

We added another layer called theme_bw(). This function controls the background, the text size, and similar styling.
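Adding a theme is one more + at the end of the plot. A minimal sketch with the made-up babies_toy stand-in; base_size = 14 is an arbitrary choice to show how the text size can be changed.

```r
library(ggplot2)

# Hypothetical stand-in for the babies data (made-up values).
babies_toy <- data.frame(
  gestation = c(284, 282, 279, 270, 286, 244, 245, 289),
  bwt       = c(120, 113, 128, 108, 136, 138, 132, 120)
)

# theme_bw() switches to a white background with a thin border;
# base_size sets the base text size in points.
p_theme <- ggplot(babies_toy, aes(x = gestation, y = bwt)) +
  geom_point() +
  theme_bw(base_size = 14)
p_theme
```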

Now we elaborate on this plot a little more and omit the NA values.
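One way to omit the NA values is to drop the incomplete rows before plotting. This sketch uses base R's subset() on the made-up babies_toy stand-in; it is an assumed approach, and other tools would work as well.

```r
library(ggplot2)

# Hypothetical stand-in for the babies data (made-up values, one NA in smoke).
babies_toy <- data.frame(
  gestation = c(284, 282, 279, 270, 286, 244, 245, 289),
  bwt       = c(120, 113, 128, 108, 136, 138, 132, 120),
  smoke     = c(0, 0, 1, 1, 0, NA, 1, 0)
)

# Keep only the rows where smoke is known, so no NA category
# appears in the legend.
babies_complete <- subset(babies_toy, !is.na(smoke))

p_clean <- ggplot(babies_complete,
                  aes(x = gestation, y = bwt, color = factor(smoke))) +
  geom_point() +
  theme_bw()
p_clean
```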
One Dataset Visualized 25 Ways
Why are K-pop groups so big? (try Firefox)
We will only touch the surface of data visualization in this class. It is a rich field.