Chapter 12 Advanced R
This chapter continues with some advanced usage examples from chapter 1
12.1 More Vectorization
#OUT> [1] 3 5 7 9 10 11
#OUT> [1] 3 5 7 9 10 11
#OUT> [1] FALSE FALSE TRUE TRUE TRUE TRUE
#OUT> [1] FALSE FALSE TRUE TRUE TRUE TRUE
#OUT> Warning in x + y: longer object length is not a multiple of shorter object
#OUT> length
#OUT> [1] 2 5 8 11 13 15 8 11 14 17 19 21 14 17 20 23 25
#OUT> [18] 27 20 23 26 29 31 33 26 29 32 35 37 39 32 35 38 41
#OUT> [35] 43 45 38 41 44 47 49 51 44 47 50 53 55 57 50 53 56
#OUT> [52] 59 61 63 56 59 62 65 67 69 62 65 68 71 73 75 68 71
#OUT> [69] 74 77 79 81 74 77 80 83 85 87 80 83 86 89 91 93 86
#OUT> [86] 89 92 95 97 99 92 95 98 101 103 105 98 101 104 107
#OUT> [1] 6
#OUT> [1] 100
#OUT> [1] 16.66667
#OUT> Warning in x + y: longer object length is not a multiple of shorter object
#OUT> length
#OUT> [1] 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8
#OUT> [36] 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7
#OUT> [71] 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7 8 9 1 3 5 7
#OUT> [1] 2 5 8 11 13 15 8 11 14 17 19 21 14 17 20 23 25 27 20 23 26 29 31
#OUT> [24] 33 26 29 32 35 37 39 32 35 38 41 43 45 38 41 44 47 49 51 44 47 50 53
#OUT> [47] 55 57 50 53 56 59 61 63 56 59 62 65 67 69
#OUT> [1] 10
#OUT> [1] 2 5 8 11 13 15 8 11 14 17 19 21 14 17 20 23 25 27 20 23 26 29 31
#OUT> [24] 33 26 29 32 35 37 39 32 35 38 41 43 45 38 41 44 47 49 51 44 47 50 53
#OUT> [47] 55 57 50 53 56 59 61 63 56 59 62 65 67 69
#OUT> [1] TRUE
#OUT> [1] TRUE
12.2 Calculations with Vectors and Matrices
Certain operations in R
, for example %*%
have different behavior on vectors and matrices. To illustrate this, we will first create two vectors.
Note that these are indeed vectors. They are not matrices.
#OUT> [1] TRUE TRUE
#OUT> [1] FALSE FALSE
When this is the case, the %*%
operator is used to calculate the dot product, also know as the inner product of the two vectors.
The dot product of vectors \(\boldsymbol{a} = \lbrack a_1, a_2, \cdots a_n \rbrack\) and \(\boldsymbol{b} = \lbrack b_1, b_2, \cdots b_n \rbrack\) is defined to be
\[ \boldsymbol{a} \cdot \boldsymbol{b} = \sum_{i = 1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots a_n b_n. \]
#OUT> [,1]
#OUT> [1,] 12
#OUT> [,1] [,2] [,3]
#OUT> [1,] 2 2 2
#OUT> [2,] 4 4 4
#OUT> [3,] 6 6 6
The %o%
operator is used to calculate the outer product of the two vectors.
When vectors are coerced to become matrices, they are column vectors. So a vector of length \(n\) becomes an \(n \times 1\) matrix after coercion.
#OUT> [,1]
#OUT> [1,] 1
#OUT> [2,] 2
#OUT> [3,] 3
If we use the %*%
operator on matrices, %*%
again performs the expected matrix multiplication. So you might expect the following to produce an error, because the dimensions are incorrect.
#OUT> [,1] [,2] [,3]
#OUT> [1,] 2 2 2
#OUT> [2,] 4 4 4
#OUT> [3,] 6 6 6
At face value this is a \(3 \times 1\) matrix, multiplied by a \(3 \times 1\) matrix. However, when b_vec
is automatically coerced to be a matrix, R
decided to make it a “row vector”, a \(1 \times 3\) matrix, so that the multiplication has conformable dimensions.
If we had coerced both, then R
would produce an error.
Another way to calculate a dot product is with the crossprod()
function. Given two vectors, the crossprod()
function calculates their dot product. The function has a rather misleading name.
#OUT> [,1]
#OUT> [1,] 12
#OUT> [,1] [,2] [,3]
#OUT> [1,] 2 2 2
#OUT> [2,] 4 4 4
#OUT> [3,] 6 6 6
These functions could be very useful later. When used with matrices \(X\) and \(Y\) as arguments, it calculates
\[ X^\top Y. \]
When dealing with linear models, the calculation
\[ X^\top X \]
is used repeatedly.
This is useful both as a shortcut for a frequent calculation and as a more efficient implementation than using t()
and %*%
.
#OUT> [,1] [,2] [,3]
#OUT> [1,] 6 6 6
#OUT> [2,] 14 14 14
#OUT> [3,] 22 22 22
#OUT> [,1] [,2] [,3]
#OUT> [1,] 6 6 6
#OUT> [2,] 14 14 14
#OUT> [3,] 22 22 22
#OUT> [1] TRUE
#OUT> [,1] [,2] [,3]
#OUT> [1,] 5 11 17
#OUT> [2,] 11 25 39
#OUT> [3,] 17 39 61
#OUT> [,1] [,2] [,3]
#OUT> [1,] 5 11 17
#OUT> [2,] 11 25 39
#OUT> [3,] 17 39 61
#OUT> [1] TRUE
12.3 Matrices
#OUT> [,1] [,2] [,3]
#OUT> [1,] 9 2 -3
#OUT> [2,] 2 4 -2
#OUT> [3,] -3 -2 16
#OUT> [,1] [,2] [,3]
#OUT> [1,] 0.12931034 -0.05603448 0.01724138
#OUT> [2,] -0.05603448 0.29094828 0.02586207
#OUT> [3,] 0.01724138 0.02586207 0.06896552
To verify that solve(Z)
returns the inverse, we multiply it by Z
. We would expect this to return the identity matrix, however we see that this is not the case due to some computational issues. However, R
also has the all.equal()
function which checks for equality, with some small tolerance which accounts for some computational issues. The identical()
function is used to check for exact equality.
#OUT> [,1] [,2] [,3]
#OUT> [1,] 1.000000e+00 -6.245005e-17 0.000000e+00
#OUT> [2,] 8.326673e-17 1.000000e+00 5.551115e-17
#OUT> [3,] 2.775558e-17 0.000000e+00 1.000000e+00
#OUT> [,1] [,2] [,3]
#OUT> [1,] 1 0 0
#OUT> [2,] 0 1 0
#OUT> [3,] 0 0 1
#OUT> [1] TRUE
R
has a number of matrix specific functions for obtaining dimension and summary information.
#OUT> [,1] [,2] [,3]
#OUT> [1,] 1 3 5
#OUT> [2,] 2 4 6
#OUT> [1] 2 3
#OUT> [1] 9 12
#OUT> [1] 3 7 11
#OUT> [1] 3 4
#OUT> [1] 1.5 3.5 5.5
The diag()
function can be used in a number of ways. We can extract the diagonal of a matrix.
#OUT> [1] 9 4 16
Or create a matrix with specified elements on the diagonal. (And 0
on the off-diagonals.)
#OUT> [,1] [,2] [,3] [,4] [,5]
#OUT> [1,] 1 0 0 0 0
#OUT> [2,] 0 2 0 0 0
#OUT> [3,] 0 0 3 0 0
#OUT> [4,] 0 0 0 4 0
#OUT> [5,] 0 0 0 0 5
Or, lastly, create a square matrix of a certain dimension with 1
for every element of the diagonal and 0
for the off-diagonals.
#OUT> [,1] [,2] [,3] [,4] [,5]
#OUT> [1,] 1 0 0 0 0
#OUT> [2,] 0 1 0 0 0
#OUT> [3,] 0 0 1 0 0
#OUT> [4,] 0 0 0 1 0
#OUT> [5,] 0 0 0 0 1