Combining the length() and which() commands gives a handy method of counting elements that meet particular criteria.
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)
b
Let’s count the 3s in the vector b.
count3 <- length(which(b == 3))
count3
[1] 4
In fact, you can count the number of elements that satisfy almost any given condition.
length(which(b < 7))
[1] 9
Here is an alternative approach that also uses the length() command, this time with square brackets for sub-setting:
length(b[ b < 7 ])
[1] 9
The square brackets allow us to subset. For such operations using square brackets, I like to use the words “such that”. Here, we have the elements of b, such that the elements are less than 7.
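To see why both approaches give the same count, it can help to look at what each inner expression returns before length() is applied. The sketch below reuses the vector b from above; the names idx and vals are only illustrative.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# which() returns the positions of the elements that satisfy the condition
idx <- which(b < 7)
idx
# [1]  2  3  4  5  6  7  8  9 13

# Square brackets return the values themselves
vals <- b[b < 7]
vals
# [1]  2  4  3 -1 -2  3  3  6  3

# Either way there are nine elements, so length() gives the same count
length(idx) == length(vals)
# [1] TRUE
```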
R provides another alternative that not everyone knows about:
sum(b < 7)
[1] 9
This syntax gives a count rather than a sum. Be aware of the meaning of syntax like sum(b < 7): here sum() is working on a logical vector whose elements are either TRUE or FALSE. Try entering b < 7 at the keyboard (take care not to type b <- 7, which would assign the value 7 to b).
b < 7
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE
We see that sum(b < 7) counts the number of elements that are TRUE. There are nine such elements.
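The reason this works is that R treats TRUE as 1 and FALSE as 0 when a logical vector is used in arithmetic. A minimal sketch of that coercion, again with the vector b:

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# Convert the logical vector to integers to see the 1s and 0s that sum() adds up
as.integer(b < 7)
# [1] 0 1 1 1 1 1 1 1 1 0 0 0 1

# Adding those up gives the same answer as length(which())
sum(b < 7) == length(which(b < 7))
# [1] TRUE
```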
Now try:
mean(b < 7)
[1] 0.6923077
That syntax found the proportion of elements meeting the criterion rather than the mean. Again, if you use the sum() and mean() functions you must be very careful to ensure that your output is what you intended. Note that sum(), length() and length(which()) all provide mechanisms for counting elements.
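One case where these counting methods can disagree is a vector containing missing values. The sketch below uses a hypothetical vector b_na (not part of the original example), made by appending NA to b.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)
b_na <- c(b, NA)   # hypothetical vector with one missing value

# which() silently drops NA, so the count is unaffected
length(which(b_na < 7))
# [1] 9

# sum() returns NA unless missing values are removed explicitly
sum(b_na < 7)
# [1] NA
sum(b_na < 7, na.rm = TRUE)
# [1] 9

# Logical subsetting keeps an NA element, so this count is one too high
length(b_na[b_na < 7])
# [1] 10
```

If your data may contain NA, it is worth checking which of these behaviours you actually want.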
Now find the percentage of 7s in b.
P7 <- 100 * length(which(b == 7)) / length(b)
P7
[1] 15.38462
Extension example
You can find counts and percentages using functions that involve length(which()). Here we create two functions: one for finding counts, and the other for calculating percentages.
count <- function(x, n){ length(which(x == n)) }
perc <- function(x, n){ 100 * length(which(x == n)) / length(x) }
Note the syntax involved in setting up a function in R. Now let’s use the count function to count the threes in the vector b, and the perc function to find the percentage of 4s.
count(b, 3)
[1] 4
perc(b, 4)
[1] 7.692308
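The same two functions could also be written with sum() and mean(), as described earlier. This is only an alternative sketch; the names count2 and perc2 are made up here to avoid overwriting the functions above.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# Count the elements of x equal to n by summing a logical vector
count2 <- function(x, n) { sum(x == n) }

# Percentage of elements of x equal to n:
# the mean of a logical vector is a proportion
perc2 <- function(x, n) { 100 * mean(x == n) }

count2(b, 3)
# [1] 4
perc2(b, 4)
# [1] 7.692308
```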
That wasn’t so hard! In our next blog post we’ll discuss counting values within cases.
About the Author: David Lillis has taught R to many researchers and statisticians. His company, Sigma Statistics and Research Limited, provides both on-line instruction and face-to-face workshops on R, and coding services in R. David holds a doctorate in applied statistics.
See our full R Tutorial Series and other blog posts regarding R programming.
Rob Baer says
Missing Values
Just a note on using length() on a whole vector that includes NA. The missing values are counted in the whole vector length when using the length() function.
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)
b1 <- c(b, NA)
length(b)
length(b1)
sd(b)
sd(b1, na.rm = TRUE)
# If you want an "n" to go with the sd for b1, don't use length().
(n = sum(!is.na(b))) # 13
(n = sum(!is.na(b1))) # 13
Gastón says
Thank you!!! I spent a lot of time trying to get some instruction on this issue!
Sebas says
Hi. How can I set the dimensions of a matrix in 2 different variables instead of a vector?
Nathalie says
I am stuck on a string counting issue and could not find any helpful post so far. Maybe someone here can help me:
I have a string variable tours in my dataframe df that represents the different stops an individual made during a journey.
For example:
1. home_work_leisure_home
2. home_work_shopping_work_home
3. home_work_leisure_errand_home
In transport planning we group activities into primary (work and education) and secondary activities (everything else). I want to count the number of secondary activities before the first primary activity, in between two primary activities, and after the last primary activity for each tour.
This means I am looking for a function in R that:
a. identifies the first work in the string variable,
b. then counts the number of activities before this first work activity
c. then identifies the last work in the string if there is more than one
d. if there is then count the number of activities between the two work activities,
e. then count the number of activities after the last work activity
The result for the three example tours then would be:
1. number of activities before first primary: 1 (home)
number of activities between first and last primary: 0
number of activities after last primary: 2 (leisure & home)
number of primary activities: 1 (work)
2. number of activities before first primary: 1 (home)
number of activities between first and last primary: 1 (shopping)
number of activities after last primary: 1 (home)
number of primary activities: 2 (work)
3. number of activities before first primary: 1 (home)
number of activities between first and last primary: 0
number of activities after last primary: 3 (leisure, errand & home)
number of primary activities: 1 (work)
I would be super thankful if someone could give me a hand with this issue – even if it is a link to a similar question.
Thank you. Kind regards N
Karen Grace-Martin says
Nathalie,
I’m not the R expert, but I’ve done a lot of this kind of thing in other software. It sounds like this will be a multi-step process. The very first thing you need to do is split this into multiple variables.
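As a rough sketch of that splitting step in R, assuming the stops are separated by underscores and that "work" and "education" are the only primary activities, something like the following might work. The function name summarise_tour and the sample vector tours are hypothetical, and "between" is taken here to mean the secondary activities lying between the first and last primary activity.

```r
# Hypothetical example data: one string per tour
tours <- c("home_work_leisure_home",
           "home_work_shopping_work_home",
           "home_work_leisure_errand_home")

# Hypothetical helper: counts activities around the primary activities in one tour
summarise_tour <- function(tour, primary = c("work", "education")) {
  stops <- strsplit(tour, "_", fixed = TRUE)[[1]]   # split the string into stops
  p <- which(stops %in% primary)                    # positions of primary activities
  if (length(p) == 0) {
    # No primary activity: the before/between/after counts are undefined
    return(c(before = NA, between = NA, after = NA, primary = 0))
  }
  first <- min(p)
  last <- max(p)
  c(before  = first - 1,
    # secondary activities between the first and last primary activity
    between = sum(!(stops[first:last] %in% primary)),
    after   = length(stops) - last,
    primary = length(p))
}

# One row per tour, one column per count
result <- t(sapply(tours, summarise_tour))
result
```

For the three example tours this gives the before, between, after and primary counts listed in the question (1, 0, 2, 1; 1, 1, 1, 2; and 1, 0, 3, 1).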
Pranjit Sarmah says
obj <- function(x, y, x_cat, y_val) {
  xx <- which(x == x_cat)
  yy <- which(y == y_val)
  ## will return the index of the observation for which x_cat has observation value y_val
  return(xx[xx %in% yy])
}
bhuvanesh says
How can I provide more than one number in a which() filter?
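One possible answer, assuming the question is about matching several values at once: the %in% operator tests each element against a set of values, and the result can be passed to which(). The vector b is the one from the post.

```r
b <- c(7, 2, 4, 3, -1, -2, 3, 3, 6, 8, 12, 7, 3)

# Positions of elements equal to 3 or 7
which(b %in% c(3, 7))
# [1]  1  4  7  8 12 13

# Count of elements equal to 3 or 7
length(which(b %in% c(3, 7)))
# [1] 6

# Conditions can also be combined with | (or) and & (and)
length(which(b < 0 | b > 7))
# [1] 4
```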
Karol says
Hi,
I have a data something like this:
X Y
A 1
A 2
B 1
B 2
B 3
C 1
…
I mean, the X variable is a factor with k categories and Y is a continuous variable.
I’d like to compute a vector (let’s say Z) counting which observation of X (in each category) is Y… Something like an ID for each category of X. Can you please give me some tips?
Thank you in advance!
Karol
Carla says
Hi Karol, did you find a solution? I’m in the same situation :/
Cheers/Carla
Pranjit Sarmah says
obj <- function(x, y, x_cat, y_val) {
  xx <- which(x == x_cat)
  yy <- which(y == y_val)
  ## will return the index of the observation for which x_cat has observation value y_val
  return(xx[xx %in% yy])
}
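For illustration, here is how that function behaves on the six rows shown in Karol's question, followed by one common way to number observations within each category using base R's ave(). The data frame name d and the column name Z are assumptions based on the question, and this has not been tried on the real data.

```r
# The six example rows from the question
d <- data.frame(X = c("A", "A", "B", "B", "B", "C"),
                Y = c(1, 2, 1, 2, 3, 1))

# Function from the comment above, repeated so the example is self-contained
obj <- function(x, y, x_cat, y_val) {
  xx <- which(x == x_cat)
  yy <- which(y == y_val)
  return(xx[xx %in% yy])
}

# Row index where X is "B" and Y is 2
obj(d$X, d$Y, "B", 2)
# [1] 4

# Running counter within each category of X
d$Z <- ave(seq_along(d$X), d$X, FUN = seq_along)
d$Z
# [1] 1 2 1 2 3 1
```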