Getting Started with R - Part 5: Matrices - Creating, Filling and Subsetting


We often need to arrange data in a structure with rows and columns, or to represent a collection of vectors. R has a built-in matrix type that allows us to create a basic structure of rows and columns.

I am posting this tutorial as I learn R. I will respond to feedback for errata in the comments.

Constructing a matrix with the matrix function

Before we start, let’s look at the basics in the online help by executing ?matrix. Doing that reveals the signature of the function:

matrix(data = NA, nrow = 1, ncol = 1, byrow = FALSE,
       dimnames = NULL)

Reading further in the help you’ll see we can supply a vector for the data, or use a type that can be coerced to a vector. We can specify the number of rows and columns and how the vector should be used to fill the matrix, and we can supply dimension names (which I’ll cover a bit later). For now let us create a matrix.

If we simply call matrix(), every parameter takes the default value shown after the = in the signature above. In a signature, = supplies a parameter’s default. It is the same symbol that R also accepts as an assignment operator in place of <-, though only in certain contexts. So our simple matrix() call will create a 1 by 1 matrix with an NA value in it.

matrix()

yields

     [,1]
[1,]   NA

What on earth are those commas? Let us create a matrix with 1 row and 3 columns. You can see that I can set selected parameters by name, like this: parameter_name=value, without the need to list them all in order.

Note: Here the = binds a value to a parameter, and its scope is limited to the call. It does not create a variable named ncol at the top level, so we cannot read the value 3 back from ncol after the matrix function returns.

vectorlike_matrix <- matrix(ncol=3)
vectorlike_matrix

yields

     [,1] [,2] [,3]
[1,]   NA   NA   NA

Now we have some more clarity. Matrix values are indexed by row, then column, in this fashion: [row,column]. To make this clearer let us add another row. To help us I need to introduce some new functions.

rbind() and cbind()

Simply put, these add rows or columns to a matrix, or combine matrices. Let’s combine the vectorlike_matrix above with itself to create a two by three matrix.

a_proper_matrix <- rbind(vectorlike_matrix, vectorlike_matrix)
a_proper_matrix

yields what we expect:

     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA

rbind and cbind are so similar that the online help shares a single entry for them. I’ll let you explore cbind by yourself.
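If you want a head start on that exploration, here is a minimal sketch contrasting the two functions. The names stacked and side_by_side are my own, made up for this illustration.

```r
# Rebuild the 1 x 3 matrix from the text so this snippet runs on its own.
vectorlike_matrix <- matrix(ncol = 3)

# rbind() stacks its arguments as rows: 1 x 3 on top of 1 x 3 gives 2 x 3.
stacked <- rbind(vectorlike_matrix, vectorlike_matrix)
dim(stacked)       # 2 3

# cbind() places them side by side as columns: 1 x 3 next to 1 x 3 gives 1 x 6.
side_by_side <- cbind(vectorlike_matrix, vectorlike_matrix)
dim(side_by_side)  # 1 6

# Plain vectors work too: here each vector becomes one column of a 3 x 2 matrix.
cbind(1:3, 4:6)
```

The dim() function returns the row count followed by the column count, which is a quick way to check that the pieces were combined the way you intended.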

Filling our matrix

When we created our matrix we didn’t specify data at creation time. We could have done so using a vector that would be unpacked row-wise or column-wise depending on the byrow parameter. The byrow parameter is FALSE if not specified, so let us see what that looks like. Also, if we provide data we only need to specify either the column or row count, because R works out the other from the length of the data. No need to do extra work.

two_by_three <- matrix(1:6, nrow=2)
two_by_three

We can see that the data was filled by column as expected:

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
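To see the other setting of byrow, here is a small sketch that fills the same data both ways. The names by_column and by_row are my own, chosen for illustration.

```r
# Default behaviour (byrow = FALSE): the vector runs down each column in turn.
by_column <- matrix(1:6, nrow = 2)

# With byrow = TRUE the same vector runs along each row instead.
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# In both cases R inferred the column count from the six data values.
ncol(by_column)  # 3
ncol(by_row)     # 3
```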

Basic indexing

The [] operator on a matrix looks suspiciously like the one we had for vectors, so let’s try it out. Try indexing from 1 to 6 on our two_by_three and you will see that we get our values one at a time, without specifying two coordinates. For instance two_by_three[4] yields 4. In case you are wondering, this order has nothing to do with the byrow option during construction. A single index always counts down the columns (the first column from top to bottom, then the second, and so on), regardless of how the data was unpacked. Had we filled the same matrix with byrow=TRUE, the fourth element would be 5. This is great news, as there are many cases where the two coordinate indexing of matrices just gets in the way. But most of the time it’s still easier to index by row and column.

 two_by_three[2,3]

returns the value 6 as expected. But what if we want to work with rows or columns and treat them as vectors? Our output above describing the matrix already revealed how this is done: simply omit the dimension you want to include completely. For instance [,2] gives me all rows, and only column two. This is similar to what we saw in our previous lesson when we set all of a vector’s values at once. We wanted to update every value without replacing the vector itself, so we used empty brackets [] to select all values to be updated. We can do the same here.
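Before we move on to filling, here is a short sketch of the indexing forms covered so far. It assumes the same two_by_three matrix as above. The name filled_by_row is mine and only exists to show that a single index counts down the columns no matter how the matrix was filled.

```r
two_by_three <- matrix(1:6, nrow = 2)

# A single index counts down the columns, so element 4 is the cell at [2, 2].
two_by_three[4]      # 4
two_by_three[2, 2]   # 4

# The same holds for a matrix filled with byrow = TRUE.
filled_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
filled_by_row[4]     # 5
filled_by_row[2, 2]  # 5

# Omitting a dimension selects all of it.
two_by_three[, 2]    # all rows of column two: 3 4
two_by_three[1, ]    # all columns of row one: 1 3 5
```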

Say we want to fill our matrix, we can use the rep() function to repeat a value…

 zero_matrix <- matrix(rep(0.0, 12), nrow=3) 

…or, we could have set all the values later, like this

 fill_me_up <- matrix(nrow=3, ncol=4)
 fill_me_up[] <- 0
 fill_me_up

It would have been perfectly legal to use fill_me_up[,] <- 0. We can extend this idea to setting entire rows or columns as vectors:

 fill_me_up[,1] <- 1
 fill_me_up

will set the entire first column as 1.

We can also set a specific cell with basic indexing as you’d expect

 fill_me_up[2,1] <- 2
 fill_me_up

So far I’ve mainly shown the replace side of the extract/replace [] operator. You can extract columns, rows, and cells in the same way as in our replace operations. The only difference is that instead of setting values we yield them or assign them to other variables.

 row_one <- fill_me_up[1,]
 column_two <- fill_me_up[,2]
 cell_3_2 <- fill_me_up[3, 2]
 fill_me_up[2, 2]
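Putting the replace and extract steps together, the sketch below replays them in one go so you can check the values at each stage. The commented output is what I would expect from the assignments above. One thing worth noticing is that a single extracted row or column comes back as a plain vector rather than a matrix.

```r
# Replay the replacements from the text on a fresh 3 x 4 matrix.
fill_me_up <- matrix(nrow = 3, ncol = 4)
fill_me_up[] <- 0        # every cell becomes 0
fill_me_up[, 1] <- 1     # the whole first column becomes 1
fill_me_up[2, 1] <- 2    # a single cell becomes 2
fill_me_up
#      [,1] [,2] [,3] [,4]
# [1,]    1    0    0    0
# [2,]    2    0    0    0
# [3,]    1    0    0    0

# Extraction uses exactly the same index forms.
row_one <- fill_me_up[1, ]      # 1 0 0 0
column_two <- fill_me_up[, 2]   # 0 0 0
cell_3_2 <- fill_me_up[3, 2]    # 0

# A single row or column is returned as a vector, not as a matrix.
is.matrix(row_one)              # FALSE
```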

Using the extract/replace operator beyond just indexing

In Part 4: Vector Extracting, Replacing and Excluding, we saw that we can use combined integer indexes, for example myvector[c(1,3)] would select the first and third element. We can do the same here. To avoid typing the same value repeatedly we can use the rep() function.

three_by_three_identity_matrix <- matrix(rep(0.0, 9), nrow=3)
three_by_three_identity_matrix[c(1,5,9)] <-  1.0
three_by_three_identity_matrix

Here you can see an excellent example of how the single indexing of a matrix helps us… let’s generalize that matrix a bit. I will use the seq() function to generate a sequence of numbers with a step.

matrix_dim <- 10  
num_cells <- matrix_dim ^ 2
identity_matrix <- matrix(rep(0.0, num_cells), nrow=matrix_dim)
identity_matrix[seq(1, num_cells, matrix_dim + 1)] <-  1.0
identity_matrix
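To see why a step of matrix_dim + 1 lands on the diagonal, here is a small sketch that prints the positions and compares the result with base R’s diag() function, which builds an identity matrix directly. The name diagonal_positions is my own. Because a single index counts down the columns, moving from one diagonal cell to the next means skipping one full column (matrix_dim cells) plus one extra row.

```r
matrix_dim <- 10
num_cells <- matrix_dim ^ 2

# Single-index positions of the diagonal cells [1,1], [2,2], ..., [10,10].
diagonal_positions <- seq(1, num_cells, matrix_dim + 1)
diagonal_positions  # 1 12 23 34 45 56 67 78 89 100

identity_matrix <- matrix(rep(0.0, num_cells), nrow = matrix_dim)
identity_matrix[diagonal_positions] <- 1.0

# diag(n) returns an n x n identity matrix, so every cell should match.
all(identity_matrix == diag(matrix_dim))  # TRUE
```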

We are not restricted to the single index syntax. We can apply an index vector to the row or column side of the , inside the matrix [] operator. For instance

matrix_dim <- 10  
x_matrix <- matrix(ncol=matrix_dim, nrow=matrix_dim)
x_matrix[] <- " "
x_matrix[c(3:5, 9), c(3, 7:9)] <- "X"
x_matrix[, c(1, 10)] <- "|"
x_matrix

Gives us this output

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,] "|"  " "  " "  " "  " "  " "  " "  " "  " "  "|"  
 [2,] "|"  " "  " "  " "  " "  " "  " "  " "  " "  "|"  
 [3,] "|"  " "  "X"  " "  " "  " "  "X"  "X"  "X"  "|"  
 [4,] "|"  " "  "X"  " "  " "  " "  "X"  "X"  "X"  "|"  
 [5,] "|"  " "  "X"  " "  " "  " "  "X"  "X"  "X"  "|"  
 [6,] "|"  " "  " "  " "  " "  " "  " "  " "  " "  "|"  
 [7,] "|"  " "  " "  " "  " "  " "  " "  " "  " "  "|"  
 [8,] "|"  " "  " "  " "  " "  " "  " "  " "  " "  "|"  
 [9,] "|"  " "  "X"  " "  " "  " "  "X"  "X"  "X"  "|"  
[10,] "|"  " "  " "  " "  " "  " "  " "  " "  " "  "|"  

Extracting a subset from a matrix is again similar to replacing values. We either yield the value or assign it, as we saw above.

 double_x_pipe <- x_matrix[3:4, 9:10]
 double_x_pipe

shows that double_x_pipe is a nice sub-matrix

     [,1] [,2]
[1,] "X"  "|" 
[2,] "X"  "|" 

But look at the rows and columns. The indices are based on their ordinal positions in the new sub-matrix and not on their positions in the original source. We will address this next.
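In the meantime, here is a minimal sketch you can run to see the renumbering for yourself. It rebuilds x_matrix as above, except that I pass the blank fill value straight to matrix() as a shortcut, and it then compares a cell of the sub-matrix with the cell it came from.

```r
# Rebuild the character matrix from the text.
matrix_dim <- 10
x_matrix <- matrix(" ", nrow = matrix_dim, ncol = matrix_dim)
x_matrix[c(3:5, 9), c(3, 7:9)] <- "X"
x_matrix[, c(1, 10)] <- "|"

# Extract rows 3 to 4 and columns 9 to 10 as a 2 x 2 sub-matrix.
double_x_pipe <- x_matrix[3:4, 9:10]
dim(double_x_pipe)   # 2 2

# The sub-matrix is numbered from 1 again: its [1, 1] is the original [3, 9].
double_x_pipe[1, 1]  # "X"
x_matrix[3, 9]       # "X"
```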
