Chapter 12 Advanced `R`

This chapter continues with some advanced usage examples from chapter 1

12.1 More Vectorization

x = c(1, 3, 5, 7, 8, 9)
y = 1:100

x + 2

#OUT> [1]  3  5  7  9 10 11

x + rep(2, 6)

#OUT> [1]  3  5  7  9 10 11

x > 3

#OUT> [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE

x > rep(3, 6)

#OUT> [1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE

x + y

#OUT> Warning in x + y: longer object length is not a multiple of shorter object
#OUT> length

#OUT>   [1]   2   5   8  11  13  15   8  11  14  17  19  21  14  17  20  23  25
#OUT>  [18]  27  20  23  26  29  31  33  26  29  32  35  37  39  32  35  38  41
#OUT>  [35]  43  45  38  41  44  47  49  51  44  47  50  53  55  57  50  53  56
#OUT>  [52]  59  61  63  56  59  62  65  67  69  62  65  68  71  73  75  68  71
#OUT>  [69]  74  77  79  81  74  77  80  83  85  87  80  83  86  89  91  93  86
#OUT>  [86]  89  92  95  97  99  92  95  98 101 103 105  98 101 104 107

length(x)

#OUT> [1] 6

length(y)

#OUT> [1] 100

length(y) / length(x)

#OUT> [1] 16.66667

(x + y) - y

#OUT> Warning in x + y: longer object length is not a multiple of shorter object
#OUT> length

#OUT>   [1] 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8
#OUT>  [36] 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7
#OUT>  [71] 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7

y = 1:60
x + y

#OUT>  [1]  2  5  8 11 13 15  8 11 14 17 19 21 14 17 20 23 25 27 20 23 26 29 31
#OUT> [24] 33 26 29 32 35 37 39 32 35 38 41 43 45 38 41 44 47 49 51 44 47 50 53
#OUT> [47] 55 57 50 53 56 59 61 63 56 59 62 65 67 69

length(y) / length(x)

#OUT> [1] 10

rep(x, 10) + y

#OUT>  [1]  2  5  8 11 13 15  8 11 14 17 19 21 14 17 20 23 25 27 20 23 26 29 31
#OUT> [24] 33 26 29 32 35 37 39 32 35 38 41 43 45 38 41 44 47 49 51 44 47 50 53
#OUT> [47] 55 57 50 53 56 59 61 63 56 59 62 65 67 69

all(x + y == rep(x, 10) + y)

#OUT> [1] TRUE

identical(x + y, rep(x, 10) + y)

#OUT> [1] TRUE

# ?any
# ?all.equal

12.2 Calculations with Vectors and Matrices

Certain operations in R, for example %*% have different behavior on vectors and matrices. To illustrate this, we will first create two vectors.

a_vec = c(1, 2, 3)
b_vec = c(2, 2, 2)

Note that these are indeed vectors. They are not matrices.

c(is.vector(a_vec), is.vector(b_vec))

#OUT> [1] TRUE TRUE

c(is.matrix(a_vec), is.matrix(b_vec))

#OUT> [1] FALSE FALSE

When this is the case, the %*% operator is used to calculate the dot product, also know as the inner product of the two vectors.

The dot product of vectors \(\boldsymbol{a} = \lbrack a_1, a_2, \cdots a_n \rbrack\) and \(\boldsymbol{b} = \lbrack b_1, b_2, \cdots b_n \rbrack\) is defined to be

\[ \boldsymbol{a} \cdot \boldsymbol{b} = \sum_{i = 1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots a_n b_n. \]

a_vec %*% b_vec # inner product

#OUT>      [,1]
#OUT> [1,]   12

a_vec %o% b_vec # outer product

#OUT>      [,1] [,2] [,3]
#OUT> [1,]    2    2    2
#OUT> [2,]    4    4    4
#OUT> [3,]    6    6    6

The %o% operator is used to calculate the outer product of the two vectors.

When vectors are coerced to become matrices, they are column vectors. So a vector of length \(n\) becomes an \(n \times 1\) matrix after coercion.

as.matrix(a_vec)

#OUT>      [,1]
#OUT> [1,]    1
#OUT> [2,]    2
#OUT> [3,]    3

If we use the %*% operator on matrices, %*% again performs the expected matrix multiplication. So you might expect the following to produce an error, because the dimensions are incorrect.

as.matrix(a_vec) %*% b_vec

#OUT>      [,1] [,2] [,3]
#OUT> [1,]    2    2    2
#OUT> [2,]    4    4    4
#OUT> [3,]    6    6    6

At face value this is a \(3 \times 1\) matrix, multiplied by a \(3 \times 1\) matrix. However, when b_vec is automatically coerced to be a matrix, R decided to make it a “row vector”, a \(1 \times 3\) matrix, so that the multiplication has conformable dimensions.

If we had coerced both, then R would produce an error.

as.matrix(a_vec) %*% as.matrix(b_vec)

Another way to calculate a dot product is with the crossprod() function. Given two vectors, the crossprod() function calculates their dot product. The function has a rather misleading name.

crossprod(a_vec, b_vec)  # inner product

#OUT>      [,1]
#OUT> [1,]   12

tcrossprod(a_vec, b_vec)  # outer product

#OUT>      [,1] [,2] [,3]
#OUT> [1,]    2    2    2
#OUT> [2,]    4    4    4
#OUT> [3,]    6    6    6

These functions could be very useful later. When used with matrices \(X\) and \(Y\) as arguments, it calculates

\[ X^\top Y. \]

When dealing with linear models, the calculation

\[ X^\top X \]

is used repeatedly.

C_mat = matrix(c(1, 2, 3, 4, 5, 6), 2, 3)
D_mat = matrix(c(2, 2, 2, 2, 2, 2), 2, 3)

This is useful both as a shortcut for a frequent calculation and as a more efficient implementation than using t() and %*%.

crossprod(C_mat, D_mat)

#OUT>      [,1] [,2] [,3]
#OUT> [1,]    6    6    6
#OUT> [2,]   14   14   14
#OUT> [3,]   22   22   22

t(C_mat) %*% D_mat

#OUT>      [,1] [,2] [,3]
#OUT> [1,]    6    6    6
#OUT> [2,]   14   14   14
#OUT> [3,]   22   22   22

all.equal(crossprod(C_mat, D_mat), t(C_mat) %*% D_mat)

#OUT> [1] TRUE

crossprod(C_mat, C_mat)

#OUT>      [,1] [,2] [,3]
#OUT> [1,]    5   11   17
#OUT> [2,]   11   25   39
#OUT> [3,]   17   39   61

t(C_mat) %*% C_mat

#OUT>      [,1] [,2] [,3]
#OUT> [1,]    5   11   17
#OUT> [2,]   11   25   39
#OUT> [3,]   17   39   61

all.equal(crossprod(C_mat, C_mat), t(C_mat) %*% C_mat)

#OUT> [1] TRUE

12.3 Matrices

Z = matrix(c(9, 2, -3, 2, 4, -2, -3, -2, 16), 3, byrow = TRUE)
Z

#OUT>      [,1] [,2] [,3]
#OUT> [1,]    9    2   -3
#OUT> [2,]    2    4   -2
#OUT> [3,]   -3   -2   16

solve(Z)

#OUT>             [,1]        [,2]       [,3]
#OUT> [1,]  0.12931034 -0.05603448 0.01724138
#OUT> [2,] -0.05603448  0.29094828 0.02586207
#OUT> [3,]  0.01724138  0.02586207 0.06896552

To verify that solve(Z) returns the inverse, we multiply it by Z. We would expect this to return the identity matrix, however we see that this is not the case due to some computational issues. However, R also has the all.equal() function which checks for equality, with some small tolerance which accounts for some computational issues. The identical() function is used to check for exact equality.

solve(Z) %*% Z

#OUT>              [,1]          [,2]         [,3]
#OUT> [1,] 1.000000e+00 -6.245005e-17 0.000000e+00
#OUT> [2,] 8.326673e-17  1.000000e+00 5.551115e-17
#OUT> [3,] 2.775558e-17  0.000000e+00 1.000000e+00

diag(3)

#OUT>      [,1] [,2] [,3]
#OUT> [1,]    1    0    0
#OUT> [2,]    0    1    0
#OUT> [3,]    0    0    1

all.equal(solve(Z) %*% Z, diag(3))

#OUT> [1] TRUE

R has a number of matrix specific functions for obtaining dimension and summary information.

X = matrix(1:6, 2, 3)
X

#OUT>      [,1] [,2] [,3]
#OUT> [1,]    1    3    5
#OUT> [2,]    2    4    6

dim(X)

#OUT> [1] 2 3

rowSums(X)

#OUT> [1]  9 12

colSums(X)

#OUT> [1]  3  7 11

rowMeans(X)

#OUT> [1] 3 4

colMeans(X)

#OUT> [1] 1.5 3.5 5.5

The diag() function can be used in a number of ways. We can extract the diagonal of a matrix.

diag(Z)

#OUT> [1]  9  4 16

Or create a matrix with specified elements on the diagonal. (And 0 on the off-diagonals.)

diag(1:5)

#OUT>      [,1] [,2] [,3] [,4] [,5]
#OUT> [1,]    1    0    0    0    0
#OUT> [2,]    0    2    0    0    0
#OUT> [3,]    0    0    3    0    0
#OUT> [4,]    0    0    0    4    0
#OUT> [5,]    0    0    0    0    5

Or, lastly, create a square matrix of a certain dimension with 1 for every element of the diagonal and 0 for the off-diagonals.

diag(5)

#OUT>      [,1] [,2] [,3] [,4] [,5]
#OUT> [1,]    1    0    0    0    0
#OUT> [2,]    0    1    0    0    0
#OUT> [3,]    0    0    1    0    0
#OUT> [4,]    0    0    0    1    0
#OUT> [5,]    0    0    0    0    1

Chapter 12 Advanced R

12.1 More Vectorization

12.2 Calculations with Vectors and Matrices

12.3 Matrices

Chapter 12 Advanced `R`