Getting Started with R - Part 5: Matrices - Creating, Filling and Subsetting
Often we need to arrange data in a structure with rows and columns, or we need to represent a collection of vectors. R has a built-in matrix type that gives us this basic structure of rows and columns.
I am posting this tutorial as I learn R. I will respond to feedback for errata in the comments.
Constructing a matrix with the matrix function
Before we start, let's look at the basics in the online help by executing ?matrix. Doing that reveals the signature of the function:
matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
dimnames = NULL)
Reading further in the help you’ll see we can supply a vector for the data, or use a type that can be coerced to a vector. We can specify the number of rows and columns and how the vector should be used to fill the matrix, and we can supply dimension names (which I’ll cover a bit later). For now let us create a matrix.
If we simply call matrix() we get the default value of every parameter, which is the value listed after each = in the signature from the help file. In a signature the = gives a parameter its default; in a call it binds a value to a named parameter. It looks very similar to the <- assignment operator, but as an assignment operator = can only be used in specific contexts. So our simple matrix() call will create a 1 by 1 matrix with an NA value in it.
matrix()
yields
     [,1]
[1,]   NA
What on earth are those commas? Let us create a matrix with 1 row and 3 columns. You can see I can set selected parameters by name like this: parameter_name=value, without the need to list them all in order.
Note: Here we use = to bind a value to a parameter, and its scope is limited to the call. The call does not create a variable named ncol at the top level, so the value 3 is not available under that name after the matrix function returns.
vectorlike_matrix <- matrix(ncol=3)
vectorlike_matrix
yields
     [,1] [,2] [,3]
[1,]   NA   NA   NA
Now we have some more clarity. Matrix values are indexed by row, then column, in this fashion: [row,column]. To make this clearer let us add another row. To help us I need to introduce two new functions, rbind() and cbind(). Simply put, these add rows or columns to a matrix, or combine matrices. Let's combine the vectorlike_matrix above with itself to create a two by three matrix.
a_proper_matrix <- rbind(vectorlike_matrix, vectorlike_matrix)
a_proper_matrix
yields what we expect:
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
rbind and cbind are so similar that the online help shares a single entry for them. I’ll let you explore cbind by yourself.
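As a starting point for that exploration, here is a minimal sketch that puts cbind() next to rbind(). The names stacked, side_by_side and from_vectors are mine, chosen only for illustration.

```r
# Rebuild the 1 x 3 matrix of NA values from earlier.
vectorlike_matrix <- matrix(ncol = 3)

# rbind() stacks its arguments as rows: 1 x 3 on top of 1 x 3 gives 2 x 3.
stacked <- rbind(vectorlike_matrix, vectorlike_matrix)

# cbind() places its arguments side by side: 1 x 3 next to 1 x 3 gives 1 x 6.
side_by_side <- cbind(vectorlike_matrix, vectorlike_matrix)

dim(stacked)       # 2 3
dim(side_by_side)  # 1 6

# Both functions also accept plain vectors; cbind() treats each one as a column.
from_vectors <- cbind(1:3, 4:6)
dim(from_vectors)  # 3 2
```

The only thing that changes between the two calls is the direction in which the pieces are glued together.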
Filling our matrix
When we created our matrix we didn’t specify any data. We could have done so using a vector that would be unpacked row-wise or column-wise depending on the byrow parameter. The byrow parameter is FALSE if not specified, so let us see what that looks like. Also, if we provide data we only need to specify either the column or row count; R works out the other from the length of the data.
two_by_three <- matrix(1:6, nrow=2)
two_by_three
We can see that the data was filled by column, as expected:
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
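For comparison, here is a small sketch of the same data unpacked with byrow = TRUE. The names by_column and by_row are my own and do not appear elsewhere in this tutorial.

```r
# Same data, two fill orders.
by_column <- matrix(1:6, nrow = 2)                 # default, byrow = FALSE
by_row    <- matrix(1:6, nrow = 2, byrow = TRUE)

by_column[1, ]  # 1 3 5
by_row[1, ]     # 1 2 3
by_row[2, ]     # 4 5 6

# Filling a 2 x 3 matrix by row gives the transpose of
# filling a 3 x 2 matrix by column.
all(by_row == t(matrix(1:6, nrow = 3)))  # TRUE
```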
Basic indexing
The [] operator on a matrix looks suspiciously like the one we had for vectors, so let's try it out. Try indexing from 1 to 6 on our two_by_three and you will see that we get our values column by column, without specifying two coordinates. For instance two_by_three[4] yields 4, the value in row 2 of column 2. In case you are wondering, the order of single indexing has nothing to do with the byrow option during construction. Single indexing always walks down the columns, one column after the other; byrow only decides which value lands in which cell. This is great news, because there are many cases where the two-coordinate indexing of matrices just gets in the way. But most of the time it's still easier to index by row and column.
two_by_three[2,3]
returns the value 6 as expected. But what if we want to work with rows or columns and treat them as vectors? Our output above describing the matrix already revealed how this is done: simply omit the index for the dimension you want to include completely. For instance, [,2] gives me all rows, and only column two. This is similar to what we saw in our previous lesson, when we wanted to set all of a vector’s values without replacing the vector itself, so we used empty brackets [] to select all values to be updated. We can do the same here.
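The indexing forms described so far can be checked with a short sketch. The by_row matrix is my own addition, included only to show what happens to single indexing when the data is unpacked by row.

```r
two_by_three <- matrix(1:6, nrow = 2)

# A single index walks the matrix column by column.
two_by_three[4]      # 4, the same cell as two_by_three[2, 2]

# Unpacking the same data by row puts different values in the cells,
# but a single index still walks down the columns.
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row[4]            # 5, again the cell at row 2, column 2
as.vector(by_row)    # 1 4 2 5 3 6

# Leaving one side of the comma empty selects the whole row or column.
two_by_three[, 2]    # 3 4
two_by_three[1, ]    # 1 3 5

# The result of selecting one column is a plain vector, not a matrix.
is.matrix(two_by_three[, 2])  # FALSE
```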
Say we want to fill our matrix. We can use the rep() function to repeat a value…
zero_matrix <- matrix(rep(0.0, 12), nrow=3)
…or we could have set all the values later, like this:
fill_me_up <- matrix(nrow=3, ncol=4)
fill_me_up[] <- 0
fill_me_up
It would have been perfectly legal to use fill_me_up[,] <- 0. We can extend this idea to setting entire rows or columns as vectors:
fill_me_up[,1] <- 1
fill_me_up
will set the entire first column to 1.
We can also set a specific cell with basic indexing, as you’d expect:
fill_me_up[2,1] <- 2
fill_me_up
So far I’ve mainly shown the replace side of the extract / replace [] operator. You can extract columns, rows, and cells in the same way as in our replace operations. The only difference is that instead of setting values we yield them or assign them to other variables.
row_one <- fill_me_up[1,]
column_two <- fill_me_up[,2]
cell_3_2 <- fill_me_up[3, 2]
fill_me_up[2, 2]
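To see what these extractions actually return, here is a sketch that rebuilds fill_me_up from scratch with the same three replacements as above. The names column_one and cell_2_1 are mine, picked because the first column is where the interesting values are.

```r
# Start with a 3 x 4 matrix of NA values; NA on its own is a logical value.
fill_me_up <- matrix(nrow = 3, ncol = 4)
typeof(fill_me_up)             # "logical"

fill_me_up[] <- 0              # every cell becomes 0
fill_me_up[, 1] <- 1           # first column becomes 1
fill_me_up[2, 1] <- 2          # one cell becomes 2
typeof(fill_me_up)             # "double"

row_one    <- fill_me_up[1, ]  # 1 0 0 0
column_one <- fill_me_up[, 1]  # 1 2 1
cell_2_1   <- fill_me_up[2, 1] # 2
```

Note that assigning numbers into the NA matrix quietly changed its storage type from logical to double.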
Using the extract / replace operator beyond just indexing
In Part 4: Vector Extracting, Replacing and Excluding, we saw that we can use combined integer indexes; for example, myvector[c(1,3)] would select the first and third elements. We can do the same here. To avoid typing the same value repeatedly we can use the rep() function.
three_by_three_identity_matrix <- matrix(rep(0.0, 9), nrow=3)
three_by_three_identity_matrix[c(1,5,9)] <- 1.0
three_by_three_identity_matrix
Here you can see an excellent example of how the single indexing of a matrix helps us. Let's generalize that matrix a bit. I will use the seq() function to generate a sequence of numbers with a step.
matrix_dim <- 10
num_cells <- matrix_dim ^ 2
identity_matrix <- matrix(rep(0.0, num_cells), nrow=matrix_dim)
identity_matrix[seq(1, num_cells, matrix_dim + 1)] <- 1.0
identity_matrix
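The step of matrix_dim + 1 works because single indexing runs down the columns: adding matrix_dim moves one column to the right, and adding 1 moves one row down, so together they step along the diagonal. A quick sketch to check this follows. The variable diagonal_cells is my own name, and the comparison uses base R's diag() function, which this tutorial has not covered.

```r
matrix_dim <- 10
num_cells <- matrix_dim ^ 2
identity_matrix <- matrix(rep(0.0, num_cells), nrow = matrix_dim)

# Single-index positions of the diagonal cells.
diagonal_cells <- seq(1, num_cells, matrix_dim + 1)
diagonal_cells  # 1 12 23 34 45 56 67 78 89 100

identity_matrix[diagonal_cells] <- 1.0

# diag(n) builds an n x n identity matrix, so every cell should match.
all(identity_matrix == diag(matrix_dim))  # TRUE
```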
We are not restricted to using only the single-index syntax. We can apply an index vector to the row or column side of the , inside the matrix [] operator. For instance:
matrix_dim <- 10
x_matrix <- matrix(ncol=matrix_dim, nrow=matrix_dim)
x_matrix[] <- " "
x_matrix[c(3:5, 9), c(3, 7:9)] <- "X"
x_matrix[, c(1, 10)] <- "|"
x_matrix
gives us this output:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "|" " " " " " " " " " " " " " " " " "|"
[2,] "|" " " " " " " " " " " " " " " " " "|"
[3,] "|" " " "X" " " " " " " "X" "X" "X" "|"
[4,] "|" " " "X" " " " " " " "X" "X" "X" "|"
[5,] "|" " " "X" " " " " " " "X" "X" "X" "|"
[6,] "|" " " " " " " " " " " " " " " " " "|"
[7,] "|" " " " " " " " " " " " " " " " " "|"
[8,] "|" " " " " " " " " " " " " " " " " "|"
[9,] "|" " " "X" " " " " " " "X" "X" "X" "|"
[10,] "|" " " " " " " " " " " " " " " " " "|"
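When both sides of the comma hold an index vector, every combination of the selected rows and columns is affected, which is why four rows and four columns produced sixteen X cells. Here is a smaller numeric sketch of the same idea; the matrix m is a made-up example.

```r
# A 4 x 5 matrix filled column by column with 1 to 20.
m <- matrix(1:20, nrow = 4)

# Rows 1 and 3 crossed with columns 2 and 3 gives a 2 x 2 block.
m[c(1, 3), 2:3]
#      [,1] [,2]
# [1,]    5    9
# [2,]    7   11

# Replacing through the same index touches exactly those four cells.
m[c(1, 3), 2:3] <- 0L
sum(m == 0)  # 4
```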
Extracting a subset from a matrix is again similar to replacing values. We either yield the value or assign it, as we saw above.
double_x_pipe <- x_matrix[3:4, 9:10]
double_x_pipe
shows that double_x_pipe is a nice sub-matrix:
     [,1] [,2]
[1,] "X"  "|"
[2,] "X"  "|"
But look at the rows and columns. The indices reflect positions within the new sub-matrix, not the positions those cells had in the original matrix. We will address this next.
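Until then, a minimal sketch makes the renumbering concrete. It rebuilds x_matrix in one step by passing " " as the data, which is a shortcut of mine and not how the matrix was built above.

```r
# Rebuild the example matrix; a single value is recycled to fill all cells.
x_matrix <- matrix(" ", nrow = 10, ncol = 10)
x_matrix[c(3:5, 9), c(3, 7:9)] <- "X"
x_matrix[, c(1, 10)] <- "|"

double_x_pipe <- x_matrix[3:4, 9:10]

dim(double_x_pipe)   # 2 2
double_x_pipe[1, 1]  # "X", which came from x_matrix[3, 9]
double_x_pipe[2, 2]  # "|", which came from x_matrix[4, 10]
```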