1.7 Data Filters: which() for Non-Missing Data

This post deals with how to filter data frames that contain missing values. First we will deal with selecting the non-missing data. Let's look at the data first using head(), and suppose we want to keep only the rows where revenue equals 8567910.

For this we would subset the rows of fin with the filter fin$Revenue == 8567910.
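As a minimal sketch of that filter: the real fin dataset is not reproduced here, so the data frame below is a hypothetical stand-in with made-up values (only the target revenue 8567910 comes from the post), and the column names Name and Employees are assumptions.

```r
# Hypothetical stand-in for the fin data frame; values are invented for illustration
fin <- data.frame(
  Name      = c("A", "B", "C", "D", "E"),
  Revenue   = c(9684527, NA, 8567910, NA, 11597230),
  Employees = c(25, 36, NA, 45, 45),
  stringsAsFactors = FALSE
)

# Look at the first rows of the data
head(fin)

# Naive filter: subset rows with a logical condition on Revenue
naive <- fin[fin$Revenue == 8567910, ]
naive  # one matching row, plus one all-NA row per missing Revenue
```

With this toy data the result has three rows, not one: the match plus two rows made up entirely of NA.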

Running the above filter gives the output. The highlighted line is the desired output, but two other lines are shown as well. Why?
The reason is that the filter we created, i.e. fin$Revenue == 8567910, is a logical vector that itself contains NAs wherever Revenue is missing. Have a look at the image posted below.
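To see the NAs inside the filter itself, here is a small sketch using a hypothetical Revenue vector (the values other than 8567910 are made up):

```r
# Hypothetical revenue values, two of them missing
Revenue <- c(9684527, NA, 8567910, NA, 11597230)

# Comparing against NA yields NA, not FALSE
cmp <- Revenue == 8567910
cmp  # FALSE NA TRUE NA FALSE

# A single comparison with NA is also NA
single <- NA == 8567910
single  # NA
```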

Thus, the value 8567910 is being compared with NA, and R cannot tell whether the filter PASSES or FAILS for those rows. R signals this by returning one row for every missing value in the column you were filtering on. You might wonder why R does not show the actual data for those two rows with NA in the Revenue column. Doing so would imply that these rows PASSED the filter, which is not the case. Hence it displays NA for the other variables as well. In this way R is actually protecting us from incorrect analysis.
So, how do we filter our dataset correctly? For this we are going to use which().

It gives us the row number where Revenue equals 8567910, because which() returns the indices of the TRUE elements of a logical object (it can also report array indices for matrices). It ignores both FALSE and NA and returns only the positions that are TRUE.
Thus, the following command returns the row in which Revenue is 8567910.
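A sketch of the which() approach, again on a hypothetical stand-in for fin (made-up values; the column names other than Revenue are assumptions):

```r
# Hypothetical stand-in for the fin data frame; values are invented for illustration
fin <- data.frame(
  Name      = c("A", "B", "C", "D", "E"),
  Revenue   = c(9684527, NA, 8567910, NA, 11597230),
  Employees = c(25, 36, NA, 45, 45),
  stringsAsFactors = FALSE
)

# which() keeps only the positions where the condition is TRUE
idx <- which(fin$Revenue == 8567910)
idx  # 3

# Use those positions to subset the rows; NA rows are no longer returned
clean <- fin[idx, ]
clean
```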

Let's take another example: filtering for rows where the number of employees equals 45.

Now perform the same operation using which().
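Both versions of the employees filter can be sketched side by side. The column name Employees and all values here are assumptions for illustration, since the original dataset is not shown:

```r
# Hypothetical stand-in for the fin data frame; values are invented for illustration
fin <- data.frame(
  Name      = c("A", "B", "C", "D", "E"),
  Revenue   = c(9684527, NA, 8567910, NA, 11597230),
  Employees = c(25, 36, NA, 45, 45),
  stringsAsFactors = FALSE
)

# Naive filter: the missing Employees value produces an extra all-NA row
naive_emp <- fin[fin$Employees == 45, ]
naive_emp

# which() drops the NA comparison and keeps only the true matches
emp_idx <- which(fin$Employees == 45)
clean_emp <- fin[emp_idx, ]
clean_emp
```

In this toy data the naive filter returns three rows (one of them all NA), while the which() version returns only the two rows that actually have 45 employees.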
