1.4 gsub() and sub()
Hello and welcome back to the advanced course on R programming. In this tutorial, we are going to deal with the factor variables Revenue and Expenses and learn how to convert them into non-factor variables. As we can see, Expenses is really a numeric variable holding dollar amounts, but R recognizes it as a factor; the same is true of Revenue and Growth. This is mainly due to the presence of the word 'Dollars' in the variable Expenses, and the presence of the signs '$' and '%' in the variables Revenue and Growth respectively. We have to convert these variables to numeric, and for that we will use the functions sub() and gsub().
We can get the details of these functions by executing the following command.
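Assuming the command in question is a call to R's built-in help system (an assumption, since sub() and gsub() share a single help page in base R), a minimal sketch could look like this:

```r
# Open the shared help page for sub() and gsub() in an interactive session:
# ?gsub

# Print the argument list of gsub() without opening the help viewer.
args(gsub)
```

The first three arguments, pattern, replacement and x, are the ones used throughout this tutorial.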
So, what these functions do is look for a pattern and replace it with the desired replacement. The difference between sub() and gsub() is that sub() replaces just the first instance, whereas gsub() replaces all instances. Let's go ahead and try these functions out.
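To make the difference concrete, here is a small sketch on a single made-up string (the value is hypothetical and not taken from the course data):

```r
# A hypothetical value with two commas.
x <- "1,234,567 Dollars"

# sub() replaces only the first match of the pattern.
first_only <- sub(",", "", x)
print(first_only)   # "1234,567 Dollars"

# gsub() replaces every match of the pattern.
all_commas <- gsub(",", "", x)
print(all_commas)   # "1234567 Dollars"
```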
We will start with the Expenses column, in which we want to replace " Dollars" (note the space before the word Dollars) with nothing. Let's run the following line.
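A minimal sketch of that line, run against a small stand-in for the data frame fin (the two Expenses values are made up for illustration):

```r
# Hypothetical stand-in for the course data frame `fin`; values are invented.
fin <- data.frame(
  Expenses = c("1,130,700 Dollars", "804,035 Dollars"),
  stringsAsFactors = TRUE   # mimic the factor column described above
)

# Replace " Dollars" (with the leading space) by an empty string.
fin$Expenses <- gsub(" Dollars", "", fin$Expenses)
print(fin$Expenses)
```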
You will see that Dollars has been removed from the column Expenses.
Now we have to remove the commas in the same column. Repeat the process similarly.
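The comma step might look like the sketch below, again on made-up values that stand in for the real column after " Dollars" has been stripped:

```r
# Hypothetical Expenses values with " Dollars" already removed.
fin <- data.frame(
  Expenses = c("1,130,700", "804,035"),
  stringsAsFactors = FALSE
)

# gsub() is needed here: each value can contain more than one comma.
fin$Expenses <- gsub(",", "", fin$Expenses)
print(fin$Expenses)
```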
Now you can see that Expenses no longer has commas. Let's run str() on fin again.
We will see that Expenses is no longer a factor; it is now of type character.
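The change of type can be checked on a toy version of fin (hypothetical values). gsub() converts its input to character before matching, which is why the factor disappears:

```r
# Hypothetical stand-in for `fin` with Expenses stored as a factor.
fin <- data.frame(
  Expenses = c("1,130,700 Dollars", "804,035 Dollars"),
  stringsAsFactors = TRUE
)
was_factor <- is.factor(fin$Expenses)   # TRUE before any replacement

# Strip " Dollars" and then the commas.
fin$Expenses <- gsub(" Dollars", "", fin$Expenses)
fin$Expenses <- gsub(",", "", fin$Expenses)

str(fin)
new_class <- class(fin$Expenses)        # "character" after gsub()
```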
Now let's deal with the variable Revenue. We will use the same gsub() as before, with a slight change. Please note that '$' is itself a special character in a pattern (it marks the end of a string), so to make R treat it as a literal character in the values of Revenue we escape it with two backslashes, writing the pattern as "\\$".
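A short sketch of the escaped pattern, contrasted with the unescaped one, on made-up Revenue values:

```r
# Hypothetical stand-in for the Revenue column; values are invented.
fin <- data.frame(
  Revenue = c("$9,684,527", "$14,016,543"),
  stringsAsFactors = TRUE
)

# Unescaped, "$" only matches the end of each string, so nothing visible changes.
unescaped <- gsub("$", "", fin$Revenue)
print(unescaped)

# Escaped, "\\$" matches a literal dollar sign.
fin$Revenue <- gsub("\\$", "", fin$Revenue)
print(fin$Revenue)
```

Passing fixed = TRUE to gsub() is another way to have the pattern treated as a literal string rather than a regular expression.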
Let's remove the commas now. Like Expenses, Revenue is now of type character.
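The comma removal for Revenue follows the same shape as for Expenses. A sketch on hypothetical values with the dollar sign already stripped:

```r
# Hypothetical Revenue values after the "$" has been removed.
fin <- data.frame(
  Revenue = c("9,684,527", "14,016,543"),
  stringsAsFactors = FALSE
)

# Remove every comma from every value.
fin$Revenue <- gsub(",", "", fin$Revenue)
print(fin$Revenue)
print(class(fin$Revenue))
```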
We will have to repeat the same process for the Growth variable as well.
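For Growth, the character to strip is '%', which has no special meaning in a pattern and therefore needs no escaping. A sketch on made-up values:

```r
# Hypothetical stand-in for the Growth column; values are invented.
fin <- data.frame(
  Growth = c("30%", "20%"),
  stringsAsFactors = TRUE
)

# "%" is an ordinary character in a regular expression.
fin$Growth <- gsub("%", "", fin$Growth)
print(fin$Growth)
```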
Now we have all three variables as type character, so we can easily convert them to numeric with the function as.numeric().
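The conversion step might look like the following sketch, starting from hypothetical values that have already been cleaned:

```r
# Hypothetical cleaned columns, all of type character.
fin <- data.frame(
  Expenses = c("1130700", "804035"),
  Revenue  = c("9684527", "14016543"),
  Growth   = c("30", "20"),
  stringsAsFactors = FALSE
)

# Convert each cleaned character column to numeric.
fin$Expenses <- as.numeric(fin$Expenses)
fin$Revenue  <- as.numeric(fin$Revenue)
fin$Growth   <- as.numeric(fin$Growth)

str(fin)
```

This works because the columns are character at this point. Calling as.numeric() directly on a factor would return the internal level codes rather than the values shown.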
Now these three variables are actually recognized as numeric, which is exactly what we wanted.
Here is the complete code.
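A minimal end-to-end sketch of the steps in this tutorial is given below. It uses a small hypothetical data frame in place of the course dataset: the name fin and the three column names come from the tutorial, while the values are made up for illustration.

```r
# Hypothetical stand-in for the course data frame `fin`; values are invented.
fin <- data.frame(
  Expenses = c("1,130,700 Dollars", "804,035 Dollars"),
  Revenue  = c("$9,684,527", "$14,016,543"),
  Growth   = c("30%", "20%"),
  stringsAsFactors = TRUE   # all three columns start out as factors
)
str(fin)

# Expenses: drop " Dollars" (note the leading space), then the commas.
fin$Expenses <- gsub(" Dollars", "", fin$Expenses)
fin$Expenses <- gsub(",", "", fin$Expenses)

# Revenue: drop the literal "$" (escaped), then the commas.
fin$Revenue <- gsub("\\$", "", fin$Revenue)
fin$Revenue <- gsub(",", "", fin$Revenue)

# Growth: drop the percent sign.
fin$Growth <- gsub("%", "", fin$Growth)

# All three columns are character now; convert them to numeric.
fin$Expenses <- as.numeric(fin$Expenses)
fin$Revenue  <- as.numeric(fin$Revenue)
fin$Growth   <- as.numeric(fin$Growth)
str(fin)
```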