1.3 The Factor Variable Trap
The Factor Variable Trap, or FVT, comes into play when we are
trying to convert a variable from factor to non-factor. It is a known
phenomenon, but it isn’t very well publicized.
Let’s create a vector named a with the values "12", "13", "14", "12", "12" (five values, all in quotation marks).

Because of the double quotation marks, the values are stored as characters. We can verify this with the function typeof().
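A minimal sketch of these two steps. The name a comes from the text; the `<-` assignment style is a choice made here.

```r
# Character vector: the quotation marks make R store text, not numbers
a <- c("12", "13", "14", "12", "12")

# Print the vector, then check how R stores it
a
typeof(a)   # "character"
```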
Now let’s convert this vector to numeric with the function as.numeric().
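The conversion might look like the sketch below. The result name b is hypothetical and is only used here for illustration.

```r
# Same character vector as before
a <- c("12", "13", "14", "12", "12")

# as.numeric() parses each string as a number
b <- as.numeric(a)   # b is a hypothetical name for the result
b                    # 12 13 14 12 12
typeof(b)            # "double"
```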

So, the above code converted characters into numeric. But how do we convert factors into numeric?
For this, let’s create a factor Z which contains exactly the same values as vector a.
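One way to build such a factor, assuming it is created with factor() from the same five strings:

```r
# Same values as vector a, but stored as a factor (categorical variable)
a <- c("12", "13", "14", "12", "12")
Z <- factor(a)

# Printing a factor shows the values without quotes, plus its levels
Z
# [1] 12 13 14 12 12
# Levels: 12 13 14
```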
When we run the above command, the output is shown without quotation
marks and the levels are also displayed. Thus R recognizes the values as categories.
Now let’s convert it into numeric, as we did before, and save
it in vector Y to see the output.
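A sketch of this step, again assuming Z was built with factor(). The last line also checks how the result is stored.

```r
Z <- factor(c("12", "13", "14", "12", "12"))

# Naive conversion: applies as.numeric() directly to the factor
Y <- as.numeric(Z)
Y           # 1 2 3 1 1  (not 12 13 14 12 12)
typeof(Y)   # "double"
```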
Oops! What happened here? This is completely different
from what we wanted.
The type of Y is also "double".
What happened here is that we picked up the internal integer codes
of the factor rather than the values of its levels.
Let’s have a look at the type of Z: we will see that it is "integer".
Thus, Z is stored as integers, but this doesn’t mean that
the values it represents are integers. The integers inside Z are the
factor’s internal codes. R treats 12 as category 1, 13 as category 2 and 14 as category 3. It doesn’t matter what we put in Z; as a factor, it will always be treated as categorical. Therefore, when we try to convert it with as.numeric(), we get the category codes 1, 2, 3, etc.
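The split between labels and codes can be inspected directly. This is a small sketch using base R functions, with Z assumed to be built as before.

```r
Z <- factor(c("12", "13", "14", "12", "12"))

typeof(Z)       # "integer": the storage type of the internal codes
levels(Z)       # "12" "13" "14": the category labels, kept as text
as.integer(Z)   # 1 2 3 1 1: the code assigned to each element
```

Each code is simply the position of that element's label in levels(Z).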
That is the essence of the Factor Variable Trap: programmers who are not aware of this behaviour can easily fall into it.
So, how do we convert this kind of categorical variable into numeric? The correct way is to convert Z into character first and then into numeric, that is, as.numeric(as.character(Z)).
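A sketch of the two-step conversion. The result name Y2 is hypothetical and was picked only to avoid overwriting Y.

```r
Z <- factor(c("12", "13", "14", "12", "12"))

# Step 1: factor -> character recovers the labels "12", "13", ...
# Step 2: character -> numeric parses those labels as numbers
Y2 <- as.numeric(as.character(Z))   # Y2 is a hypothetical name
Y2           # 12 13 14 12 12
typeof(Y2)   # "double"
```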
You can see that the values 12, 13, 14, 12, 12 have been recovered. So from a factor they turned into characters, and from characters they turned into numeric. You can also check the type, and it will be "double".
Therefore, keep in mind that you cannot directly convert a number that is stored as a factor into numeric, because you will end up with the category codes instead of the actual numbers. Instead, you have to convert it to character first and then convert that into numeric.
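As an aside, R’s help page for factor mentions an equivalent idiom that converts the levels once and then indexes them by the factor. The sketch below compares it with the two-step approach; the names via_char and via_levels are hypothetical.

```r
Z <- factor(c("12", "13", "14", "12", "12"))

# Two-step approach described in this section
via_char <- as.numeric(as.character(Z))

# Alternative: convert the three levels once, then look up by code
via_levels <- as.numeric(levels(Z))[Z]

via_levels                        # 12 13 14 12 12
identical(via_char, via_levels)   # TRUE
```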
Let’s play around with this: create your own vectors and try experimenting with them.