# Chapter 3 Introduction to R

A variable stores data, such as a single value, a vector, or a data frame, that `R` can then manipulate (tutorialspoint 2019b). This chapter introduces variable types, operations between variables, data structures, conditional statements, loops, and functions.

Before we start, let’s first see how to name a variable. A valid variable name consists of letters, numbers, the dot character (`.`), and the underscore character (`_`). In addition, a valid name must start with a letter, or with a dot that is not followed by a number.

Examples | Validity | Discussion |
---|---|---|
`var.name` | ✓ | |
`var_name` | ✓ | |
`_var_name` | ☓ | Cannot start with an underscore |
`.var_name` | ✓ | |
`var%name` | ☓ | Cannot contain `%` |
`.2var_name` | ☓ | Cannot start with a dot followed by a number |
`2var_name` | ☓ | Cannot start with a number |
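The rules in the table can also be checked programmatically. The sketch below relies on base R's `make.names()`, which turns any string into a syntactically valid name; a candidate is treated as valid when `make.names()` leaves it unchanged. The helper `is_valid_name()` and the vector `candidates` are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: a name is valid if make.names() does not need to alter it
is_valid_name <- function(x) {
  make.names(x) == x
}

candidates <- c("var.name", "var_name", "_var_name", ".var_name",
                "var%name", ".2var_name", "2var_name")

# Named logical vector: TRUE for valid names, FALSE otherwise
validity <- setNames(is_valid_name(candidates), candidates)
print(validity)
```

The result should agree with the Validity column of the table.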

## 3.1 Variable types

`R` recognizes several types of variables, including character, numeric, integer, logical, and complex (Blischak et al. 2019). The type of a variable is determined by the type of value it stores. We can use the `class()` function to check the type of a variable.

**Character** (also known as strings)

```
v <- "Hello, world!"
class(v)
```

`## [1] "character"`

**Numeric** (real or decimal number/integer)

```
v <- 59.28
class(v)
```

`## [1] "numeric"`

**Integer** (`L` tells R that this number is an integer)

```
v <- 2L
class(v)
```

`## [1] "integer"`

```
v <- 2
class(v)
```

`## [1] "numeric"`

**Logical** (usually `TRUE` or `FALSE`)

```
v <- TRUE
class(v)
```

`## [1] "logical"`

```
v <- FALSE
class(v)
```

`## [1] "logical"`

**Complex** (a complex number has a real part and an imaginary part, unlike a real number)

```
v <- 1 + 4i
class(v)
```

`## [1] "complex"`

It is important to know the type of the variable you are using, since different types of variables are handled by different methods. Another caveat is that the appearance of a variable may not reveal its real type. For example, a variable that appears to contain a number may actually be a character.

```
v <- "59.28"
class(v)
```

`## [1] "character"`

Here, the number is enclosed in quotation marks, which means it is stored as a character. **Therefore, please be careful about this!**
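When a number has ended up stored as text, base R's `as.numeric()` converts it back, and `is.numeric()` checks the type before you compute with it. A minimal sketch, where `n` is a hypothetical variable name:

```r
v <- "59.28"          # looks like a number, but is stored as text
class(v)              # "character"

n <- as.numeric(v)    # convert the text to a real number
class(n)              # "numeric"
n + 1                 # arithmetic now works: 60.28

is.numeric(v)         # FALSE
is.numeric(n)         # TRUE
```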

## 3.2 Operations

An operator tells `R` to perform a specific mathematical or logical manipulation (tutorialspoint 2019a).

### 3.2.1 Assignment operations

Assignment operators assign values to variables.

**Left assignment**

```
a <- 1
b <<- "Hello, world!"
c = c(1, 3, 4)
```

**Right assignment**

```
1 -> a
2 ->> b
```
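At the top level of a script, `<-` and `<<-` have the same effect, so the examples above do not show how they differ. Inside a function, `<-` creates or changes a local variable, while `<<-` changes a variable in an enclosing environment. A minimal sketch of that difference, using the hypothetical names `counter`, `bump_local()`, and `bump_outer()`:

```r
counter <- 0

bump_local <- function() {
  counter <- counter + 1   # creates a local copy; the outer counter is untouched
  counter
}

bump_outer <- function() {
  counter <<- counter + 1  # modifies the counter defined outside the function
  counter
}

bump_local()   # returns 1
counter        # still 0
bump_outer()   # returns 1
counter        # now 1
```

Because `<<-` changes state outside the function, plain `<-` is usually the safer default.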

### 3.2.2 Arithmetic operations

**Add**

`1 + 1`

`## [1] 2`

**Subtract**

`5 - 3`

`## [1] 2`

**Multiply**

`3 * 5`

`## [1] 15`

**Divide**

`5 / 3`

`## [1] 1.666667`

**Power**

`5 ^ 2`

`## [1] 25`

`5 ** 2 # you can also write it this way`

`## [1] 25`

**Modulo** (find the remainder)

`5 %% 2`

`## [1] 1`
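Alongside `%%`, R also provides the integer-division operator `%/%`. The two fit together: the quotient times the divisor plus the remainder gives back the original number. A short sketch with the hypothetical variables `x`, `y`, `quotient`, and `remainder`:

```r
x <- 17
y <- 5

quotient  <- x %/% y   # integer division: 3
remainder <- x %% y    # remainder: 2

quotient * y + remainder == x   # TRUE
```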

### 3.2.3 Relational operations

Relational operators compare two elements and return a logical value (`TRUE` or `FALSE`).

**Larger**

`3 > 4`

`## [1] FALSE`

`5 > 3`

`## [1] TRUE`

**Smaller**

`3 < 5`

`## [1] TRUE`

`4 < 2`

`## [1] FALSE`

**Equal**

`4 == 4`

`## [1] TRUE`

`5 == 4`

`## [1] FALSE`

**No less than** (larger or equal to)

`3 >= 4`

`## [1] FALSE`

`3 >= 2`

`## [1] TRUE`

**No larger than** (smaller or equal to)

`5 <= 2`

`## [1] FALSE`

`5 <= 5`

`## [1] TRUE`

**Not equal**

`3 != 4`

`## [1] TRUE`

`3 != 3`

`## [1] FALSE`
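One caveat when comparing decimal numbers with `==`: numeric values are stored with finite precision, so two results that are equal on paper can differ slightly in `R`. The sketch below shows the effect and one possible workaround with base R's `all.equal()`, which compares within a small tolerance:

```r
x <- 0.1 + 0.2

x == 0.3                      # FALSE, because of floating-point rounding
isTRUE(all.equal(x, 0.3))     # TRUE, equal within a small tolerance
```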

### 3.2.4 Logical operations

Logical operators apply only to logical, numeric, or complex types. Most of the time, we apply them to logical values or variables. For numeric variables, 0 is considered `FALSE` and non-zero numbers are taken as `TRUE` (DataMentor 2019). You can use `T` for `TRUE` and `F` for `FALSE` as abbreviations.

**Logical And**

`TRUE & TRUE`

`## [1] TRUE`

`FALSE & TRUE`

`## [1] FALSE`

`FALSE & FALSE`

`## [1] FALSE`

**Logical Or**

`TRUE | TRUE`

`## [1] TRUE`

`FALSE | TRUE`

`## [1] TRUE`

`FALSE | FALSE`

`## [1] FALSE`

**Logical Not**

`! TRUE`

`## [1] FALSE`

`! FALSE`

`## [1] TRUE`
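The examples above use only logical values. The following sketch illustrates the earlier statement that 0 counts as `FALSE` and any non-zero number counts as `TRUE`, and shows the `T` and `F` abbreviations in use:

```r
0 & TRUE    # FALSE: 0 is treated as FALSE
5 & TRUE    # TRUE: any non-zero number is treated as TRUE
!0          # TRUE
!5          # FALSE
T | F       # TRUE: T and F stand for TRUE and FALSE
```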

## 3.3 Data structures

Variables and values can be organized into different data structures, including vector, matrix, data frame, list, and factor (Kabacoff 2019).

**Vector**

You can create a vector with the `c()` function.

```
a <- c(5, 9, 2, 8) # create a numeric vector
a # show the value of this vector a
```

`## [1] 5 9 2 8`

```
b <- c('hello', 'world', '!') # character vector
b
```

`## [1] "hello" "world" "!"`

```
# if a vector mixes types, such as numeric and character, R coerces all elements to one type, here character
c <- c(5, 'good')
c
```

`## [1] "5" "good"`

You can select elements of a vector with `var_name[#]`, where `#` is the position of the element. Note that `R` starts counting positions at 1, not 0.

`a[3] # select the 3rd element`

`## [1] 2`

`b[1:3] # select from the 1st to the 3rd element`

`## [1] "hello" "world" "!"`

`c[1] # select the 1st element`

`## [1] "5"`

`1:3` means from 1 to 3, so it stands for the three numbers 1, 2, and 3.
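Besides single positions and ranges, a vector can be indexed in a few other ways. The sketch below recreates the numeric vector `a` so that it runs on its own and shows negative indices, a vector of positions, and selection by condition:

```r
a <- c(5, 9, 2, 8)

a[1]          # the first element: 5
a[-1]         # a negative index drops that element: 9 2 8
a[c(1, 4)]    # select several positions at once: 5 8
a[a > 4]      # select by condition: 5 9 8
length(a)     # number of elements: 4
```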

**Matrix**

You can create a matrix using the `matrix()` function.

```
a <- matrix(1:6,           # the data to be put in the matrix, here we use numbers from 1 to 6
            nrow = 2,      # number of rows in the matrix
            ncol = 3,      # number of columns in the matrix
            byrow = FALSE) # how to fill the matrix: FALSE means by columns, TRUE means by rows
a
```

```
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
```

To select elements, the intuitive way is to use coordinates.

`a[2,3] # select the element in the 2nd row and 3rd column`

`## [1] 6`

You can also select an entire row or column.

`a[ , 2] # the 2nd column`

`## [1] 3 4`

`a[1, ] # the 1st row`

`## [1] 1 3 5`
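For comparison, here is a sketch of the same data filled row by row with `byrow = TRUE`. The name `m` is a hypothetical variable chosen so that it does not overwrite `a`:

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # fill row by row
m

dim(m)      # 2 3 (rows, columns)
m[2, 3]     # 6
m[1, ]      # 1 2 3
m[, 2]      # 2 5
```

The first row now holds 1, 2, 3 and the second row 4, 5, 6, whereas filling by columns placed 1 and 2 in the first column.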

**Data frame**

The data frame is a **frequently used** data type in R. Its columns can store different types of values. Let’s create a data frame with mixed variable types using the `data.frame()` function.

```
ID <- c(1:4) # create variable ID
Name <- c('A', 'B', 'C', 'D') # create variable Name
Score <- c(69.5, 77.5, 81.5, 90) # create variable Score
df <- data.frame(ID, Name, Score) # combine the variables into one data frame called df
df
```

```
##   ID Name Score
## 1  1    A  69.5
## 2  2    B  77.5
## 3  3    C  81.5
## 4  4    D  90.0
```

We created a data frame that stores the students’ IDs, names, and test scores. There are a couple of ways to select elements from this data frame.

`df[2,3] # 2nd row and 3rd column`

`## [1] 77.5`

`df['ID'] # column of variable ID`

```
##   ID
## 1  1
## 2  2
## 3  3
## 4  4
```

`df[c('ID', 'Score')] # columns ID and Score`

```
##   ID Score
## 1  1  69.5
## 2  2  77.5
## 3  3  81.5
## 4  4  90.0
```
```

There is another way to select a column by its name. When you type `$` after the name of the data frame, RStudio lists all the variable names in that data frame, which makes it easier to choose the variable you want. This approach is used more often.

`df$Name # column of variable Name`

```
## [1] A B C D
## Levels: A B C D
```
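Rows can also be filtered by a condition on a column. The sketch below rebuilds the same data frame so that it runs on its own. It sets `stringsAsFactors = FALSE` explicitly because the default of `data.frame()` changed in R 4.0.0: older versions turn strings into factors, as in the output above, while newer versions keep them as characters. The name `high` is a hypothetical variable used for illustration.

```r
df <- data.frame(ID = 1:4,
                 Name = c('A', 'B', 'C', 'D'),
                 Score = c(69.5, 77.5, 81.5, 90),
                 stringsAsFactors = FALSE)  # keep Name as character in every R version

class(df['ID'])    # "data.frame": single brackets keep the table structure
class(df$ID)       # "integer": $ returns the column itself as a vector

high <- df[df$Score > 80, ]   # rows where Score is above 80
high$Name                     # "C" "D"
nrow(high)                    # 2
```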

**List**

A list can store values of mixed types, which distinguishes it from a vector.

`a <- list(ID = c(1, 2), Name = c('A', 'B'), Score = c(69.5, 89))`

When you want to select elements from a list, you can do so in a similar way as with a vector. However, a list does not define rows or columns, so you cannot use 2-D coordinates to select elements as you would with a data frame.

`a[1]`

```
## $ID
## [1] 1 2
```

`a[2:3]`

```
## $Name
## [1] "A" "B"
##
## $Score
## [1] 69.5 89.0
```
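Note that single brackets return a smaller list, not the stored values themselves. To reach the values, use double brackets or `$`. A minimal sketch that recreates the list `a` so that it runs on its own:

```r
a <- list(ID = c(1, 2), Name = c('A', 'B'), Score = c(69.5, 89))

class(a[1])       # "list": single brackets return a smaller list
class(a[[1]])     # "numeric": double brackets return the element itself
a$Score           # 69.5 89.0, same as a[['Score']]
a[['Score']][2]   # 89, the 2nd value inside the Score element
names(a)          # "ID" "Name" "Score"
```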

Some readers may find this confusing, since a list looks similar to a data frame. Here is a good discussion about it. Due to time limitations, we will not cover this discussion in class. The main idea is that a list is more flexible than a data frame, while a data frame has more restrictions. However, because a data frame resembles the 2-D table structure that we use most often in our daily work, we use data frames more than lists.

**Factor**

A factor is R's type for nominal (categorical) variables. It is very useful when we want to analyze data from different groups, such as gender, school, *etc*.

```
a <- c(1, 2, 1, 2, 3, 3, 1, 1)
class(a)
```

`## [1] "numeric"`

```
afactor <- factor(a)
class(afactor)
```

`## [1] "factor"`
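Once a variable is a factor, `levels()` lists its groups and `table()` counts the observations in each group. In the sketch below, the labels `'low'`, `'mid'`, and `'high'` are hypothetical group names chosen only for illustration:

```r
a <- c(1, 2, 1, 2, 3, 3, 1, 1)

# labels are hypothetical group names chosen for illustration
afactor <- factor(a, levels = c(1, 2, 3), labels = c('low', 'mid', 'high'))

levels(afactor)   # "low" "mid" "high"
table(afactor)    # counts per group: low 4, mid 2, high 2
```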

## 3.4 Conditional statements

```
if (test_expression){
  statement_1
} else {
  statement_2
}
```

If `test_expression` returns `TRUE`, `statement_1` is run; if it returns `FALSE`, `statement_2` is run. You can also omit the `else` part.

```
if (test_expression){
  statement_1
}
```

If `test_expression` returns `FALSE`, `R` skips the block and continues with the next line after it.

```
x <- 5
if (x > 3){
  print('x is larger than 3')
} else {
  print('x is not larger than 3')
}
```

`## [1] "x is larger than 3"`

```
x <- 1
if (x > 3){
  print('x is larger than 3')
}
```

Other tools for working with conditions include `switch`, `which`, *etc*.
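As a sketch of how these fit together, the example below chains several conditions with `else if`, selects a branch by name with `switch()`, and uses `which()` to find the positions where a condition holds. The functions `grade()` and `describe()` and the grade boundaries are hypothetical and chosen only for illustration.

```r
# Hypothetical helper that maps a score to a letter grade
grade <- function(score) {
  if (score >= 90) {
    'A'
  } else if (score >= 80) {
    'B'
  } else {
    'C'
  }
}

grade(95)     # "A"
grade(81.5)   # "B"
grade(69.5)   # "C"

# switch() picks a branch by name; the unnamed last argument is the default
describe <- function(g) {
  switch(g,
         A = 'excellent',
         B = 'good',
         'needs work')
}

describe('B')   # "good"
describe('C')   # "needs work"

# which() returns the positions where a condition is TRUE
which(c(69.5, 77.5, 81.5, 90) > 80)   # 3 4
```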

## 3.5 Loops

Loops let us run the same code more than once. The `for` loop is the most intuitive and commonly used one.

```
for (range){
  statement
}
```

`range` names the loop variable and the values it takes, for example `i in 1:3`.

```
for (i in 1:3){
print(i)
}
```

```
## [1] 1
## [1] 2
## [1] 3
```
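A loop can also step through the elements of a vector directly, or through their positions. The sketch below uses a hypothetical vector `scores` to accumulate a running total and then to report which positions exceed 80:

```r
scores <- c(69.5, 77.5, 81.5, 90)

total <- 0
for (s in scores) {
  total <- total + s       # add each score to the running total
}
total                      # 318.5

# seq_along() gives the positions 1, 2, ..., length(scores)
for (i in seq_along(scores)) {
  if (scores[i] > 80) {
    print(paste('score', i, 'is above 80'))
  }
}
```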

## 3.6 Functions

Functions are pieces of code that have been defined for a specific purpose. You only need to supply the necessary inputs, and the function performs the task and returns the result. For example, the `sum()` function adds all the numbers in a vector or data frame and returns the sum.

`sum(c(1, 4, 10, 5))`

`## [1] 20`

Another example is the `mean()` function, which averages the numbers in a vector and returns the mean value.

`mean(c(1, 4, 10, 5))`

`## [1] 5`
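You can also define your own functions with `function()`. As a minimal sketch, the hypothetical function `my_mean()` below reproduces the result of `mean()` for a numeric vector by combining `sum()` and `length()`:

```r
# Hypothetical function: average of a numeric vector, built from sum() and length()
my_mean <- function(x) {
  sum(x) / length(x)
}

my_mean(c(1, 4, 10, 5))                           # 5
my_mean(c(1, 4, 10, 5)) == mean(c(1, 4, 10, 5))   # TRUE
```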

It is important to use the right function for the right task. To do this, you have to be familiar with the functions you are using, which takes practice.