Chapter 3 Introduction to R

A variable stores data, such as a single value, a vector, or a data frame, that R can then manipulate (tutorialspoint 2019b). This chapter introduces variable types, operations between variables, data structures, conditional statements, loops, and functions.

Before we start, let’s see how to name a variable. A valid variable name can contain letters, numbers, the dot character (.), and the underscore character (_). In addition, it must start with a letter or with a dot that is not followed by a number.

Example       Valid?   Discussion
var.name      Yes
var_name      Yes
_var_name     No       Cannot start with an underscore
.var_name     Yes
var%name      No       Cannot contain %
.2var_name    No       Cannot start with a dot followed by a number
2var_name     No       Cannot start with a number
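You can check these rules in R itself with base R’s make.names(), which converts any string into a syntactically valid name. A name that comes back unchanged was already valid. The helper is_valid_name() below is our own illustrative wrapper, not a built-in function. Note that make.names() also rejects reserved words such as if, which the table does not cover.

```r
# The example names from the table, stored as strings
candidates <- c("var.name", "var_name", "_var_name", ".var_name",
                "var%name", ".2var_name", "2var_name")

# Hypothetical helper: make.names() repairs invalid names,
# so a name is valid exactly when make.names() leaves it unchanged
is_valid_name <- function(x) make.names(x) == x

data.frame(name = candidates, valid = is_valid_name(candidates))
```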

3.1 Variable types

R recognizes several basic variable types, including character, numeric, integer, logical, and complex (Blischak et al. 2019). The type of a variable is determined by the type of value it stores. We can use the class() function to check the type of a variable.

Character (text, also known as strings)

v <- "Hello, world!"
class(v)
## [1] "character"

Numeric (real numbers, with or without a decimal part)

v <- 59.28
class(v)
## [1] "numeric"

Integer (the suffix L tells R to store the number as an integer)

v <- 2L
class(v)
## [1] "integer"
v <- 2 # without L, the same value is stored as numeric
class(v)
## [1] "numeric"

Logical (TRUE or FALSE)

v <- TRUE
class(v)
## [1] "logical"
v <- FALSE
class(v)
## [1] "logical"

Complex (numbers with an imaginary part, written with the suffix i)

v <- 1 + 4i
class(v)
## [1] "complex"
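Besides class(), R has is.*() functions that test for a type, and typeof() reports the internal storage type. The short sketch below, using made-up values, shows why 2L and 2 look the same but are not stored the same way.

```r
x <- 2L
y <- 2

is.integer(x)    # TRUE
is.integer(y)    # FALSE: without L, 2 is stored as a double-precision number
is.numeric(x)    # TRUE: integers also count as numeric
typeof(y)        # "double", the storage type behind the class "numeric"
x == y           # TRUE: the two values compare as equal
identical(x, y)  # FALSE: identical() also checks the storage type
```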

It is important to know the type of each variable you use, because different types support different operations. Another caveat is that a variable’s appearance may not reveal its actual type. For example, a variable that looks like a number may actually be a character string.

v <- "59.28"
class(v)
## [1] "character"

Here, the number is enclosed in quotation marks, so R stores it as a character string rather than as a number. Please be careful about this!
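A minimal sketch of what goes wrong and how to fix it: arithmetic on the character version fails, while as.numeric() converts the string into a real number. tryCatch() is used here only so that the error message can be shown without stopping the script.

```r
v <- "59.28"

# Arithmetic on a character string fails; tryCatch() captures the error message
msg <- tryCatch(v + 1, error = function(e) conditionMessage(e))
msg                  # "non-numeric argument to binary operator"

n <- as.numeric(v)   # convert the string to a number
class(n)             # "numeric"
n + 1                # 60.28

as.numeric("abc")    # NA, with a warning, because "abc" is not a number
```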

3.2 Operations

An operator tells R to perform a specific mathematical or logical manipulation (tutorialspoint 2019a).

3.2.1 Assignment operations

Assignment operators assign values to variables.

Left assignment

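a <- 1                  # the most common form
b <<- "Hello, world!"   # at the top level, <<- assigns to the global environment, like <-
c = c(1, 3, 4)          # = also works for assignment at the top level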

Right assignment

1 -> a    # same as a <- 1
2 ->> b   # same as b <<- 2
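The difference between <- and <<- only shows up inside a function. The sketch below uses two hypothetical functions, add_local() and add_global(): <- creates a new local variable, while <<- changes the variable found in the enclosing environment (here, the global one).

```r
counter <- 0

add_local <- function() {
  counter <- counter + 1   # creates a local copy; the global counter is untouched
  counter
}

add_global <- function() {
  counter <<- counter + 1  # updates the counter in the enclosing (global) environment
  counter
}

add_local()    # 1
counter        # still 0
add_global()   # 1
counter        # now 1
```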

3.2.2 Arithmetic operations

Add

1 + 1
## [1] 2

Subtract

5 - 3
## [1] 2

Multiply

3 * 5
## [1] 15

Divide

5 / 3
## [1] 1.666667

Power

5 ^ 2
## [1] 25
5 ** 2 # ** also works, but ^ is the standard form
## [1] 25

Modulo (the remainder of a division)

5 %% 2
## [1] 1
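A closely related operator is %/%, integer division, which keeps only the whole-number part of the quotient. The sketch below also shows that arithmetic operators work element by element on vectors.

```r
17 %/% 5                    # 3: integer division drops the remainder
17 %% 5                     # 2: the remainder
(17 %/% 5) * 5 + 17 %% 5    # 17: quotient * divisor + remainder gives the original number

# Arithmetic on vectors is applied to each element
c(1, 2, 3) * 2              # 2 4 6
c(1, 2, 3) + c(10, 20, 30)  # 11 22 33
```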

3.2.3 Relational operations

Relational operators compare two values and return a logical value (TRUE or FALSE).

Greater than

3 > 4
## [1] FALSE
5 > 3
## [1] TRUE

Less than

3 < 5
## [1] TRUE
4 < 2
## [1] FALSE

Equal

4 == 4
## [1] TRUE
5 == 4
## [1] FALSE

Greater than or equal to

3 >= 4
## [1] FALSE
3 >= 2
## [1] TRUE

Less than or equal to

5 <= 2
## [1] FALSE
5 <= 5
## [1] TRUE

Not equal

3 != 4
## [1] TRUE
3 != 3
## [1] FALSE
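Relational operators also compare vectors element by element, which is handy for checking many values at once. One caution, shown below with illustrative scores: decimal numbers are stored approximately, so == can give surprising results, and all.equal() compares with a small tolerance instead.

```r
scores <- c(69.5, 77.5, 81.5, 90)
scores >= 80                        # FALSE FALSE TRUE TRUE: one result per element

# Decimal numbers are stored approximately
0.1 + 0.2 == 0.3                    # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: equal within a small tolerance
```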

3.2.4 Logical operations

Logical operators work only on logical, numeric, or complex values. Most of the time, we apply them to logical values or variables. For numbers, 0 is treated as FALSE and any non-zero number as TRUE (DataMentor 2019). You can use T and F as abbreviations for TRUE and FALSE, although T and F are ordinary variables that can be overwritten, so spelling out TRUE and FALSE is safer.

Logical And

TRUE & TRUE
## [1] TRUE
FALSE & TRUE
## [1] FALSE
FALSE & FALSE
## [1] FALSE

Logical Or

TRUE | TRUE
## [1] TRUE
FALSE | TRUE
## [1] TRUE
FALSE | FALSE
## [1] FALSE

Logical Not

! TRUE
## [1] FALSE
! FALSE
## [1] TRUE
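The sketch below checks the rule that 0 counts as FALSE and non-zero numbers as TRUE, and shows that & and | work element by element on vectors. It also uses &&, the single-value form that stops as soon as the answer is known and is the usual choice inside if () conditions.

```r
0 & TRUE        # FALSE: 0 is treated as FALSE
3 | FALSE       # TRUE: any non-zero number is treated as TRUE
!0              # TRUE

# & and | compare vectors element by element
c(TRUE, FALSE, TRUE) & c(TRUE, TRUE, FALSE)   # TRUE FALSE FALSE

# && and || compare single values and stop early once the result is known
x <- 5
x > 3 && x < 10   # TRUE
```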

3.3 Data structures

Values can be organized into different data structures, including vectors, matrices, data frames, lists, and factors (Kabacoff 2019).

Vector

You can create a vector with the c() function.

a <- c(5, 9, 2, 8) # create a numeric vector
a # show the values of this vector
## [1] 5 9 2 8
b <- c('hello', 'world', '!') # character vector
b
## [1] "hello" "world" "!"
c <- c(5, 'good') # mixed types are coerced to one common type, here character
c
## [1] "5"    "good"

You can select elements of a vector with var_name[index]. Note that R starts counting at 1: the first element has index 1, not 0 as in many other programming languages.

a[3] # select the 3rd element
## [1] 2
b[1:3] # select from the 1st to the 3rd element
## [1] "hello" "world" "!"
c[1] # select the 1st element
## [1] "5"

1:3 is the sequence from 1 to 3, that is, the three numbers 1, 2, and 3.
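A few more selection patterns are worth knowing. This sketch reuses the numeric vector a from above: a negative index drops elements, a vector of positions selects several at once, and a logical condition keeps only the matching elements.

```r
a <- c(5, 9, 2, 8)
a[-1]          # 9 2 8: a negative index drops that element
a[c(1, 4)]     # 5 8: select several positions at once
a[a > 4]       # 5 9 8: keep the elements that satisfy a condition
length(a)      # 4: number of elements
```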

Matrix

You can create a matrix using the matrix() function.

a <- matrix(1:6,      # the data to be put in the matrix, here we use numbers from 1 to 6
            nrow = 2, # number of rows in the matrix
            ncol = 3, # number of columns in the matrix
            byrow = FALSE) # how to fill the matrix: FALSE fills by columns, TRUE by rows
a
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

To select elements, the most intuitive way is to use [row, column] coordinates.

a[2,3] # select the element in the 2nd row and 3rd column
## [1] 6

You could also select the entire row or column.

a[ ,2] # the 2nd column
## [1] 3 4
a[1, ] # the 1st row
## [1] 1 3 5
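For comparison, here is the same data filled with byrow = TRUE, together with a few functions that are useful for matrices: dim() reports the number of rows and columns, and t() transposes the matrix.

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # fill the values row by row
m
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
dim(m)      # 2 3: number of rows and columns
m[2, 3]     # 6
t(m)        # transpose: 3 rows and 2 columns
```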

Data frame

The data frame is one of the most frequently used data structures in R. Unlike a matrix, its columns can hold different types of values. Let’s create a data frame with mixed variable types using the data.frame() function.

ID <- c(1:4) # create variable ID
Name <- c('A', 'B', 'C', 'D') # create variable Name
Score <- c(69.5, 77.5, 81.5, 90) # create variable Score
df <- data.frame(ID, Name, Score) # combine the variables into one data frame called df
df
##   ID Name Score
## 1  1    A  69.5
## 2  2    B  77.5
## 3  3    C  81.5
## 4  4    D  90.0

We created a data frame that stores the students’ IDs, names, and test scores. There are a couple of ways to select elements from it.

df[2,3] # 2nd row and 3rd column
## [1] 77.5
df['ID'] # column of variable ID
##   ID
## 1  1
## 2  2
## 3  3
## 4  4
df[c('ID', 'Score')] # column of ID and Score
##   ID Score
## 1  1  69.5
## 2  2  77.5
## 3  3  81.5
## 4  4  90.0

Another, more commonly used, way to select a column by name is the $ operator. When you type $ after the name of a data frame, RStudio lists all the variable names in that data frame, which makes it easy to choose the one you want.

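df$Name # column of variable Name
## [1] "A" "B" "C" "D"

In R versions before 4.0.0, data.frame() converted character columns to factors by default, so this output also included the line "Levels: A B C D".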
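Once a data frame exists, a few base R functions cover most everyday tasks. The sketch below rebuilds the student table and shows how to count rows and columns, inspect the structure, keep rows that meet a condition, summarize a column, and add a new column. The Passed column and its cutoff of 70 are purely illustrative.

```r
df <- data.frame(ID = 1:4,
                 Name = c("A", "B", "C", "D"),
                 Score = c(69.5, 77.5, 81.5, 90))

nrow(df)                      # 4
ncol(df)                      # 3
str(df)                       # compact summary of each column's type and values
df[df$Score > 80, ]           # rows for students C and D
mean(df$Score)                # 79.625
df$Passed <- df$Score >= 70   # add a new logical column
```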

List

Unlike a vector, a list can store values of mixed types, and each element can itself be a vector of any length.

a <- list(ID = c(1, 2), Name = c('A', 'B'), Score = c(69.5, 89))

You can select elements from a list in a similar way as from a vector. However, a list has no rows or columns, so you cannot use 2-D coordinates to select elements as you would in a data frame.

a[1]
## $ID
## [1] 1 2
a[2:3]
## $Name
## [1] "A" "B"
## 
## $Score
## [1] 69.5 89.0
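Single brackets, as used above, always return a smaller list. To get the element itself, use double brackets or the $ operator. A short sketch with the same list:

```r
a <- list(ID = c(1, 2), Name = c('A', 'B'), Score = c(69.5, 89))

class(a[1])    # "list": single brackets return a smaller list
a[[1]]         # 1 2: double brackets return the element itself
class(a[[1]])  # "numeric"
a$Name         # "A" "B": same as a[["Name"]]
a$Score[2]     # 89: take an element, then index inside it
```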

Lists can be confusing because they look similar to data frames. Here is a good discussion about it; due to time limitations, we will not cover it in class. The main idea is that a list is more flexible, while a data frame has more restrictions: in particular, all of its columns must have the same length. However, because a data frame resembles the 2-D tables we use in daily work, we use data frames more often than lists.
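The relationship can be checked directly: a data frame is a list whose elements are its columns, but it adds the equal-length restriction. In the sketch below (with made-up values), a list happily holds elements of lengths 2 and 3, while data.frame() refuses the same columns; tryCatch() only captures the error so the script keeps running.

```r
df <- data.frame(ID = c(1, 2), Score = c(69.5, 89))
is.list(df)     # TRUE: a data frame is a list of columns

# A list accepts elements of different lengths...
lst <- list(ID = c(1, 2), Score = c(69.5, 89, 75))
lengths(lst)    # ID 2, Score 3

# ...but data.frame() refuses columns whose lengths do not match
res <- tryCatch(data.frame(ID = c(1, 2), Score = c(69.5, 89, 75)),
                error = function(e) conditionMessage(e))
res             # an error message about a differing number of rows
```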

Factor

A factor represents a categorical (nominal) variable in R. Factors are very useful when we want to analyze data from different groups, such as gender or school.

a <- c(1, 2, 1, 2, 3, 3, 1, 1)
class(a)
## [1] "numeric"
afactor <- factor(a)
class(afactor)
## [1] "factor"
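A factor records the distinct groups as levels, so it is easy to list the groups and count their members. The second part of the sketch attaches readable labels to numeric codes; the coding 1 = Female, 2 = Male is a hypothetical example, not part of the data above.

```r
a <- c(1, 2, 1, 2, 3, 3, 1, 1)
afactor <- factor(a)
levels(afactor)    # "1" "2" "3": the distinct groups
table(afactor)     # counts per group: 4 ones, 2 twos, 2 threes

# Hypothetical coding: give numeric codes readable labels
gender <- factor(c(1, 2, 1), levels = c(1, 2), labels = c("Female", "Male"))
gender             # Female Male Female, with levels Female and Male
```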

3.4 Conditional statements

if (test_expression){
  statement_1
} else {
  statement_2
}

If test_expression evaluates to TRUE, R runs statement_1; if it evaluates to FALSE, R runs statement_2. You can also omit the else part.

if (test_expression){
  statement_1
}

In that case, if test_expression evaluates to FALSE, R skips the block and continues with the code that follows it.

x <- 5
if (x > 3){
  print('x is larger than 3')
} else {
  print('x is not larger than 3')
}
## [1] "x is larger than 3"
x <- 1
if (x > 3){
  print('x is larger than 3')
} # nothing is printed, because x > 3 is FALSE

R also provides related tools for conditional logic, such as else if chains and the switch(), ifelse(), and which() functions.
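A brief sketch of these tools, using illustrative values: the grade cutoffs and the unit conversion are made up for the example. else if checks several conditions in order, switch() picks a branch by matching a string, ifelse() applies a condition to every element of a vector, and which() returns the positions where a condition is TRUE.

```r
score <- 77.5

# else if checks several conditions from top to bottom
if (score >= 90) {
  grade <- "A"
} else if (score >= 80) {
  grade <- "B"
} else {
  grade <- "C"
}
grade       # "C"

# switch() picks one branch by matching a string
unit <- "cm"
to_metres <- switch(unit, m = 1, cm = 0.01, mm = 0.001)
to_metres   # 0.01

# ifelse() applies a condition to every element of a vector
scores <- c(69.5, 77.5, 81.5, 90)
ifelse(scores >= 80, "high", "low")   # "low" "low" "high" "high"

# which() returns the positions where a condition is TRUE
which(scores >= 80)                   # 3 4
```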

3.5 Loops

Loops let us run the same code repeatedly. The for loop is the most intuitive and commonly used one.

for (variable in sequence){
  statement
}

In each iteration, variable takes the next value in sequence, and statement runs once for that value.

for (i in 1:3){
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
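Loops are often used to accumulate a result or to walk through a vector by position. The sketch below, with illustrative scores, sums a vector in a for loop (the built-in sum() does the same job), uses seq_along() to loop over positions, and adds a while loop, which repeats as long as its condition stays TRUE.

```r
scores <- c(69.5, 77.5, 81.5, 90)

# Accumulate a total, one element at a time
total <- 0
for (s in scores) {
  total <- total + s
}
total   # 318.5, the same as sum(scores)

# seq_along() gives the positions 1, 2, ..., length(scores)
for (i in seq_along(scores)) {
  cat("Student", i, "scored", scores[i], "\n")
}

# A while loop repeats as long as its condition is TRUE
n <- 1
while (n < 100) {
  n <- n * 2
}
n       # 128, the first power of 2 that is not below 100
```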

3.6 Functions

A function is a piece of code defined for a specific task. You supply the necessary inputs (arguments), and the function performs the task and returns the result. For example, the sum() function adds all the numbers in a vector (or in an all-numeric data frame) and returns the total.

sum(c(1, 4, 10, 5))
## [1] 20

Another example is the mean() function, which averages the numbers in a vector and returns the mean value. Unlike sum(), mean() does not accept a whole data frame; apply it to a single column instead.

mean(c(1, 4, 10, 5))
## [1] 5
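Many functions take extra arguments that change their behavior. The sketch below, with made-up values, shows the na.rm argument of sum() and mean(), which drops missing values (NA) before computing, and colMeans(), which averages every column of a numeric data frame.

```r
x <- c(1, 4, NA, 5)         # NA marks a missing value
sum(x)                      # NA: a missing value makes the result missing
sum(x, na.rm = TRUE)        # 10: na.rm = TRUE drops missing values first
mean(x, na.rm = TRUE)       # 3.333333

scores <- data.frame(Math = c(80, 90), Art = c(70, 60))
mean(scores$Math)           # 85: mean() expects a vector, such as one column
colMeans(scores)            # mean of every column: Math 85, Art 65
```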

It is important to choose the right function for each task, which requires becoming familiar with the functions you use. This comes with practice.
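Besides the built-in functions, you can write your own with function(). The sketch below defines a hypothetical helper, rescale01(), that maps numbers onto the range 0 to 1; it is only an illustration of the syntax, not a standard R function. A function returns the value of its last expression.

```r
# Hypothetical helper: rescale numbers to the range 0 to 1
rescale01 <- function(x) {
  rng <- range(x)                      # smallest and largest value
  (x - rng[1]) / (rng[2] - rng[1])     # the last expression is the return value
}

rescale01(c(1, 4, 10, 5))   # 0.0000000 0.3333333 1.0000000 0.4444444
```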