Chapter 6 Data Visualization with Base Functions
This chapter goes through the base plotting functions in R.
## import the Minneapolis ACS dataset
minneapolis <- read.csv('minneapolis.csv')
6.1 Scatter plot
A scatter plot is a good way to show how data points are distributed across two variables.
plot(minneapolis$AGE, minneapolis$INCTOT) # with first as x, and second as y
Alternatively, you could use the variable names directly in a formula and indicate the dataset, as in the code below. You will get the same scatter plot.
plot(INCTOT ~ AGE, data = minneapolis) # you have to specify the name of the data frame here
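To try both call styles without the Minneapolis file, here is a minimal sketch on a small made-up data frame. The name `toy` and all of its values are invented for illustration and are not taken from the ACS dataset.

```r
# Hypothetical toy data standing in for minneapolis (values are made up)
toy <- data.frame(
  AGE    = c(22, 35, 47, 58, 63),
  INCTOT = c(18000, 42000, 55000, 61000, 39000)
)

# Style 1: pass the x and y vectors directly
plot(toy$AGE, toy$INCTOT)

# Style 2: formula interface, read as "INCTOT as a function of AGE"
plot(INCTOT ~ AGE, data = toy)
```

The points are identical in both plots. Only the default axis labels differ: the first call labels the axes `toy$AGE` and `toy$INCTOT`, while the formula call uses the bare column names.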
6.2 Line plot
You could turn the scatter plot above into a line plot by adding the type argument to indicate that you are plotting a line. A line plot is good for presenting how a variable changes over time.
library(dplyr)
mean_income <- minneapolis %>%
  group_by(YEAR) %>%
  summarise(AvgInc = mean(INCTOT, na.rm = TRUE))
plot(AvgInc ~ YEAR, data = mean_income, type = 'l') ## type = 'l' draws a line
You could also choose another plot type by changing the value of type, as in the example below.
plot(AvgInc ~ YEAR, data = mean_income, type = 'b') # 'b' for both lines and points
You could use help(plot) to check the other plot types.
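As a quick way to compare several values of type side by side, the sketch below draws one panel per type. The `demo` data frame and its numbers are made up for illustration, and the 2 x 3 grid layout is just one possible choice.

```r
# Hypothetical yearly averages (made-up numbers, for illustration only)
demo <- data.frame(
  YEAR   = 2010:2015,
  AvgInc = c(30, 32, 31, 35, 38, 40)
)

# points, lines, both, both overplotted, vertical lines, stair steps
types <- c('p', 'l', 'b', 'o', 'h', 's')

old_par <- par(mfrow = c(2, 3))  # split the device into a 2 x 3 grid
for (ty in types) {
  plot(AvgInc ~ YEAR, data = demo, type = ty,
       main = paste0("type = '", ty, "'"))
}
par(old_par)  # restore the previous layout
```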
6.3 Bar plot
A bar plot is a good way to compare values across years or across items. You could use barplot() to draw one.
barplot(mean_income$AvgInc, names.arg = mean_income$YEAR) # names.arg gives the vector of names to be plotted under each bar
6.4 Add more elements in the plots
For reader-friendly plots, you need to add more information, such as a title, axis labels, and a legend. For the plot above, we could use the code below to make it more informative.
barplot(mean_income$AvgInc,
        names.arg = mean_income$YEAR,
        main = 'Bar plot of the average personal income in Minneapolis (2010-2019)', # add a title for the plot
        xlab = 'Year', # add a label for the x-axis
        ylab = 'Personal income (dollars)', # add a label for the y-axis
        ylim = c(0, 65000), # set the range of the y-axis; you could set the range of the x-axis with xlim
        legend.text = 'Personal income') # add the legend text (legend.text is the full name of the argument)
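Another element that is sometimes useful is a value label above each bar. The sketch below relies on the fact that barplot() returns the x-coordinates of the bar midpoints. The `years` and `avginc` vectors are made-up stand-ins for `mean_income`, not real results.

```r
# Hypothetical yearly averages (made-up numbers, for illustration only)
years  <- 2010:2014
avginc <- c(30000, 32000, 31000, 35000, 38000)

# barplot() returns the x-coordinates of the bar midpoints
mids <- barplot(avginc,
                names.arg = years,
                main = 'Average income by year (toy data)',
                xlab = 'Year',
                ylab = 'Income (dollars)',
                ylim = c(0, 45000))  # leave headroom for the labels

# write each value just above its bar (pos = 3 means "above")
text(x = mids, y = avginc, labels = avginc, pos = 3)
```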
6.5 Pie chart
A pie chart is a good way to show the share of each part of a whole. You could use the pie() function to draw a pie chart in R.
minneapolis_race <- minneapolis %>%
  mutate(
    RACE = case_when( ## change RACE from numeric values to racial categories
      RACE == 1 ~ 'White',
      RACE == 2 ~ 'African American',
      RACE == 3 ~ 'Other'
    )
  ) %>%
  group_by(RACE) %>%
  summarise(count = n())
pie(minneapolis_race$count, # value for each piece
    labels = minneapolis_race$RACE) # label for each piece
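Since a pie chart is about shares, it can help to print the percentage in each label. This is a minimal sketch with made-up counts. The `counts` vector is hypothetical and does not reflect the actual ACS data.

```r
# Hypothetical counts per category (made-up numbers, for illustration only)
counts <- c(White = 60, `African American` = 25, Other = 15)

# convert the counts to percentage shares
shares <- round(100 * prop.table(counts), 1)

# build labels such as "White (60%)"
pie_labels <- paste0(names(counts), ' (', shares, '%)')

pie(counts, labels = pie_labels)
```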
6.6 Box plot
A box plot is also called a box-and-whisker plot. It presents the distribution of a dataset based on its quartiles. In R, you could use boxplot() to draw a box plot.
t <- c(1, 5, 10, 7, 8, 10, 11, 19)
boxplot(t, range = 0) # range = 0 makes the whiskers reach the smallest and largest values in the dataset
t <- c(1, 5, 10, 7, 8, 10, 11, 19)
boxplot(t, range = 1) # range = 1 makes the whiskers extend to the most extreme data point that is no more than range times the interquartile range from the box
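To see the numbers behind the two plots, you can ask boxplot.stats() for the values that boxplot() draws. Its coef argument plays the role of range. Note that the box edges are Tukey's hinges, which can differ slightly from the quartiles returned by quantile().

```r
t <- c(1, 5, 10, 7, 8, 10, 11, 19)

# coef = 0 matches range = 0: the whiskers span the minimum to the maximum
s0 <- boxplot.stats(t, coef = 0)
s0$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
# 1.0  6.0  9.0 10.5 19.0

# coef = 1 matches range = 1: the whiskers stop at the most extreme points
# within 1 x (upper hinge - lower hinge) = 4.5 of the box, i.e. inside [1.5, 15]
s1 <- boxplot.stats(t, coef = 1)
s1$stats  # 5.0  6.0  9.0 10.5 11.0
s1$out    # 1 and 19 lie beyond the whiskers and are drawn as separate points
```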
6.7 Color in R
You could change the color of a plot by adding the col argument to the plotting function. For example:
plot(AvgInc ~ YEAR, data = mean_income, type = 'b', col = 'YellowGreen') # specify the name of the color
Here is a link where you could find the name of the color.
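R can also list its built-in color names itself through the colors() function. The search pattern and the name checked below are only examples.

```r
# colors() returns the names of all built-in colors as a character vector
all_names <- colors()
length(all_names)  # how many names are available
head(all_names)    # peek at the first few

# search for names containing "green"
grep('green', all_names, value = TRUE)

# check whether a particular name is available (the list is in lower case)
'yellowgreen' %in% all_names
```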
You could also use a hexadecimal color code to indicate the color. For example:
barplot(mean_income$AvgInc,
        names.arg = mean_income$YEAR,
        main = 'Bar plot of the average personal income in Minneapolis (2010-2019)',
        xlab = 'Year',
        ylab = 'Personal income (dollars)',
        ylim = c(0, 65000),
        legend.text = 'Personal income',
        col = '#009999') # use the hexadecimal color code; it must start with the hash sign (#)
At the same link, you could also find the hexadecimal code for each color.
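If you want to move between hexadecimal codes and red/green/blue values inside R, the base functions col2rgb() and rgb() cover both directions. This short sketch reuses the '#009999' code from the bar plot above.

```r
# col2rgb() converts a color name or hex code to red/green/blue values (0-255)
col2rgb('#009999')

# rgb() goes the other way and builds the hex string from the three values
hex <- rgb(0, 153, 153, maxColorValue = 255)
hex  # "#009999"
```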