# Chapter 6 Data Visualization with Base Functions

In this chapter, we go through the base plotting functions in R.

First, import the Minneapolis ACS dataset.

minneapolis <- read.csv('minneapolis.csv')

## 6.1 Scatter plot

A scatter plot is a good way to show the relationship between two numeric variables, with each observation drawn as a point.

plot(minneapolis$AGE, minneapolis$INCTOT) # with first as x, and second as y

Or, you could use the variable names directly and indicate the dataset.

plot(INCTOT ~ AGE, data = minneapolis) # you have to specify the name of the data frame here

## 6.2 Line plot

You could turn the scatter plot into a line plot by adding the type argument to indicate that you are plotting a line. A line plot is good for presenting the trend of a variable over time.

library(dplyr)
mean_income <- minneapolis %>%
  group_by(YEAR) %>%
  summarise(AvgInc = mean(INCTOT, na.rm = TRUE)) # average income per year, ignoring missing values

plot(AvgInc ~ YEAR,
     data = mean_income,
     type = 'l') # type = 'l' draws a line

You could also choose another type by changing the value of type.

plot(AvgInc ~ YEAR,
     data = mean_income,
     type = 'b') # 'b' for both lines and points

You could use help(plot) to check more styles of the plots.
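
To see how the type argument changes the look of the same data, here is a minimal sketch on made-up numbers. The `demo` data frame and its values are hypothetical and only stand in for `mean_income`; the type codes are the ones listed in help(plot).

```r
# Hypothetical yearly averages, used only to illustrate the type argument
demo <- data.frame(
  YEAR   = 2010:2014,
  AvgInc = c(31000, 32500, 33000, 35500, 36000)
)

# Type codes documented in help(plot), with a short description of each
plot_types <- c(p = "points", l = "line", b = "both",
                o = "overplotted", h = "vertical lines", s = "steps")

# Draw one panel per type on a 2 x 3 grid
old_par <- par(mfrow = c(2, 3))
for (code in names(plot_types)) {
  plot(AvgInc ~ YEAR, data = demo, type = code,
       main = paste0("type = '", code, "' (", plot_types[[code]], ")"))
}
par(old_par) # restore the previous layout
```

Putting the panels side by side makes it easier to pick the style that suits the data before applying it to the real summary table.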

## 6.3 Bar plot

A bar plot is a good way to compare the values in each year or for each item. You could use barplot() to make a bar plot.

barplot(mean_income$AvgInc, names.arg = mean_income$YEAR) # names.arg is the vector of names to be plotted under each bar

## 6.4 Add more elements in the plots

For reader-friendly plots, you have to add more information such as a title, axis labels, and a legend. For the plot above, we could use the code below to make it more informative.

barplot(mean_income$AvgInc, names.arg = mean_income$YEAR,
        main = 'Bar plot of the average personal income in Minneapolis (2010-2019)', # add a title for the plot
        xlab = 'Year', # add a label for the x-axis
        ylab = 'Personal income (dollars)', # add a label for the y-axis
        ylim = c(0, 65000), # set the range of the y-axis; xlim sets the range of the x-axis
        legend.text = 'Personal income') # add the legend text

## 6.5 Pie chart

A pie chart is a good way to show the share of each part. You could use the pie() function to draw a pie chart in R.
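
Because a pie chart is about shares, it often helps to print the percentage on each label. Here is a minimal sketch, assuming a small made-up vector of counts (the `counts` values and category names are hypothetical and not taken from the dataset):

```r
# Hypothetical counts per category, for illustration only
counts <- c(A = 50, B = 30, C = 20)

# Share of each category in percent, rounded to one decimal place
shares <- round(100 * prop.table(counts), 1)

# Build labels such as "A (50%)" and draw the pie chart
pie_labels <- paste0(names(counts), " (", shares, "%)")
pie(counts, labels = pie_labels)
```

The same labeling approach can be applied to any table of counts that has one row per category.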

minneapolis_race <- minneapolis %>%
  mutate(
    RACE = case_when( # change RACE from numeric values to racial categories
      RACE == 1 ~ 'White',
      RACE == 2 ~ 'African American',
      RACE == 3 ~ 'Other'
    )
  ) %>%
  group_by(RACE) %>%
  summarise(count = n())

pie(minneapolis_race$count, # value for each piece
    labels = minneapolis_race$RACE) # label for each piece

## 6.6 Boxplot

A box plot is also called a box-and-whisker plot. It presents the distribution of a dataset based on its quartiles. In R, you could use boxplot() to draw a box plot.

t <- c(1, 5, 10, 7, 8, 10, 11, 19)
boxplot(t, range = 0) # range = 0 makes the whiskers reach the smallest and largest values in the dataset

t <- c(1, 5, 10, 7, 8, 10, 11, 19)
boxplot(t, range = 1) # range = 1 makes the whiskers extend to the most extreme data point
                      # that is no more than range times the interquartile range from the box
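
To check where the whiskers end without reading them off the plot, boxplot.stats() returns the numbers that boxplot() draws, and its coef argument plays the role of range. Here is a minimal sketch using the same values, stored as `values` (a name chosen for this example so that it doesn't share a name with base R's t() function):

```r
values <- c(1, 5, 10, 7, 8, 10, 11, 19)

# coef = 0: the whiskers reach the minimum and maximum, no points are flagged
full_range <- boxplot.stats(values, coef = 0)
full_range$stats # lower whisker, lower hinge, median, upper hinge, upper whisker
full_range$out   # points that would be drawn beyond the whiskers

# coef = 1: the whiskers stop at the most extreme points within 1 x IQR of the box
one_iqr <- boxplot.stats(values, coef = 1)
one_iqr$stats
one_iqr$out
```

With these values the hinges are 6 and 10.5, so with coef = 1 the whiskers can extend at most 4.5 beyond the box, that is, to 1.5 and 15. The points 1 and 19 fall outside these limits and are drawn individually, while the whiskers end at 5 and 11.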

## 6.7 Color in R

You could change the color of a plot by adding the col argument to the plotting function. For example:

plot(AvgInc ~ YEAR,
     data = mean_income,
     type = 'b',
     col = 'YellowGreen') # specify the name of the color

Here is a link where you could find the names of the colors.

You could also use the hexadecimal color code to indicate the color. For example.

barplot(mean_income$AvgInc, names.arg = mean_income$YEAR,
        main = 'Bar plot of the average personal income in Minneapolis (2010-2019)',
        xlab = 'Year',
        ylab = 'Personal income (dollars)',
        ylim = c(0, 65000),
        legend.text = 'Personal income',
        col = '#009999') # a hexadecimal color code must start with the hash sign (#)

By the same link, you could also find the hexadecimal code for each color.
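
Color names and hexadecimal codes can also be looked up from within R. The following is a minimal sketch using the base functions colors(), col2rgb(), and rgb(); the variable names are chosen for this example only.

```r
# All color names that R recognizes
all_names <- colors()
head(all_names)

# Search the names for a pattern, e.g. every name containing "green"
green_names <- grep("green", all_names, value = TRUE)

# Convert a color name to its red, green, and blue components (0-255)
rgb_values <- col2rgb("yellowgreen")

# Convert those components to a hexadecimal code
hex_code <- rgb(rgb_values["red", 1],
                rgb_values["green", 1],
                rgb_values["blue", 1],
                maxColorValue = 255)
hex_code
```

The resulting string can be passed to col in the same way as the hexadecimal code used in the bar plot.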