1.15 Visualizing Results

Until now we have been working on the data cleaning part, which comes from the project brief I shared in post 1.1: "Note that the dataset has numerous discrepancies that need to be addressed before analysis can be performed. Have a look at the data here by downloading the data."
We are done with that process, so now we can proceed with the analysis and plot the scatter-plots and box-plots.
For visualization we need the "ggplot2" package: install it once with install.packages("ggplot2") and then load it with library(ggplot2).
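A minimal sketch of the install-and-load step. The requireNamespace() guard is my own addition so the package is only downloaded when it is missing; a plain install.packages("ggplot2") works just as well.

```r
# Install ggplot2 only if it is not already available on this machine
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package so that ggplot(), aes(), geom_point() etc. can be called
library(ggplot2)
```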

If you have already installed the "ggplot2" package, you only need to load it with library(ggplot2).
Let's set up the plot by passing our dataset to the ggplot() function.
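A sketch of that set-up step. The data frame here is a made-up stand-in for the cleaned fin dataset (random, illustrative values only) so the snippet runs on its own, and the object name p is just my choice.

```r
library(ggplot2)

# Made-up stand-in for the cleaned `fin` dataset (illustrative values only)
set.seed(1)
fin <- data.frame(
  Industry = rep(c("Health", "IT Services", "Retail"), each = 20),
  Revenue  = runif(60, min = 5e6, max = 15e6)
)
fin$Expenses <- 0.5 * fin$Revenue + rnorm(60, sd = 5e5)
fin$Profit   <- fin$Revenue - fin$Expenses

# Set up an empty plot object tied to the data; no layers are drawn yet
p <- ggplot(data = fin)
```

With your own data, skip the stand-in and pass the real fin data frame instead.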

We also need to add a geometry layer using the function geom_point() and put the aesthetics into it. Revenue goes on the X axis and Expenses on the Y axis, with Profit as the size parameter. We can also separate the industries by assigning a different colour to each one. The complete command looks like this:
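A sketch of how such a command could be written. The column names Revenue, Expenses, Profit and Industry come from the description above; the data frame itself is made up so the snippet is runnable, and the object names p and scatter are my own.

```r
library(ggplot2)

# Made-up stand-in for the cleaned `fin` dataset (illustrative values only)
set.seed(1)
fin <- data.frame(
  Industry = rep(c("Health", "IT Services", "Retail"), each = 20),
  Revenue  = runif(60, min = 5e6, max = 15e6)
)
fin$Expenses <- 0.5 * fin$Revenue + rnorm(60, sd = 5e5)
fin$Profit   <- fin$Revenue - fin$Expenses

# Empty plot tied to the data, then a point layer carrying all aesthetics:
# position from Revenue/Expenses, colour per Industry, point size from Profit
p <- ggplot(data = fin)
scatter <- p + geom_point(aes(x = Revenue, y = Expenses,
                              colour = Industry, size = Profit))
print(scatter)
```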

The output of the above command answers the first task in the project brief, i.e. "A scatter-plot classified by industry showing revenue, expenses, profit".
The next task is to plot "A scatter-plot that includes industry trends for the expenses~revenue relationship". We can do this by creating a base plot with predefined aesthetics: Revenue on the X axis, Expenses on the Y axis, and colour by Industry. After that we plot the points and then apply a smoother, which shows us the trends.
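The base-plot-plus-smoother idea can be sketched like this. The data is again a made-up stand-in for fin, the names d and trend are mine, and se = FALSE (which hides the confidence band) is my own choice to keep the picture readable.

```r
library(ggplot2)

# Made-up stand-in for the cleaned `fin` dataset (illustrative values only)
set.seed(1)
fin <- data.frame(
  Industry = rep(c("Health", "IT Services", "Retail"), each = 20),
  Revenue  = runif(60, min = 5e6, max = 15e6)
)
fin$Expenses <- 0.5 * fin$Revenue + rnorm(60, sd = 5e5)

# Base plot: aesthetics are defined once and inherited by every layer
d <- ggplot(data = fin, aes(x = Revenue, y = Expenses, colour = Industry))

# Points first, then one smoothed trend line per industry
trend <- d + geom_point() + geom_smooth(se = FALSE)
print(trend)
```

Because colour is mapped to Industry in the base plot, geom_smooth() fits a separate line for each industry.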
The output looks like this
Looking at the smoothed lines, we can draw some conclusions about how expenses move with revenue in different industries.
The last task requires us to plot box-plots showing growth by industry. Here we need Industry on the X axis and Growth on the Y axis. Then we can plot the box-plots using the following command.
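A sketch of a box-plot command along those lines. The Growth values below are made up for illustration, and the object names f and box are my own.

```r
library(ggplot2)

# Made-up stand-in for the cleaned `fin` dataset (illustrative values only)
set.seed(1)
fin <- data.frame(
  Industry = rep(c("Health", "IT Services", "Retail"), each = 20),
  Growth   = c(rnorm(20, mean = 15, sd = 5),
               rnorm(20, mean = 25, sd = 8),
               rnorm(20, mean = 10, sd = 4))
)

# Industry on the X axis, Growth on the Y axis, one box per industry
f <- ggplot(data = fin, aes(x = Industry, y = Growth, colour = Industry))
box <- f + geom_boxplot()
print(box)
```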


The output looks like this



The complete code for visualization is as follows
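One way to collect the three plots into a single script is sketched below. This is my own arrangement: the helper build_plots() is a hypothetical name, and the data frame is a made-up stand-in for fin, so replace it with your cleaned dataset.

```r
library(ggplot2)

# Hypothetical helper: builds the three plots from the project brief
build_plots <- function(fin) {
  base <- ggplot(data = fin,
                 aes(x = Revenue, y = Expenses, colour = Industry))
  list(
    # 1. Scatter-plot by industry, with point size showing profit
    scatter = base + geom_point(aes(size = Profit)),
    # 2. Scatter-plot with a trend line per industry
    trend   = base + geom_point() + geom_smooth(se = FALSE),
    # 3. Box-plots of growth by industry
    box     = ggplot(data = fin,
                     aes(x = Industry, y = Growth, colour = Industry)) +
      geom_boxplot()
  )
}

# Made-up stand-in for the cleaned `fin` dataset (illustrative values only)
set.seed(1)
fin <- data.frame(
  Industry = rep(c("Health", "IT Services", "Retail"), each = 20),
  Revenue  = runif(60, min = 5e6, max = 15e6),
  Growth   = rnorm(60, mean = 15, sd = 5)
)
fin$Expenses <- 0.5 * fin$Revenue + rnorm(60, sd = 5e5)
fin$Profit   <- fin$Revenue - fin$Expenses

plots <- build_plots(fin)
invisible(lapply(plots, print))  # draw the three plots one after another
```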
 
This completes our visualization and the tasks from the project brief stated in post 1.1.
So the total recall from post 1.1 to 1.15 is as below:
  • Factor Variable Trap (when factors need to be converted to numerics, convert them to character first and then to numeric)
  • How to use sub() and gsub() for cleaning data
  • Methods for dealing with missing data
  • NA, the third logical constant
  • How to locate missing data using complete.cases()
  • Filtering techniques: which() and is.na()
  • Median Imputation Method
  • Factual Analysis and Deriving values method
  • Visualization
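
As a quick refresher, a few of the data cleaning points from the list can be sketched in base R. All values and names below (z, revenue, df) are made up for illustration and are not taken from the project dataset.

```r
# Factor Variable Trap: as.numeric() on a factor returns the level codes
z <- factor(c("12", "13", "14", "12", "12"))
codes  <- as.numeric(z)                 # 1 2 3 1 1 (not what we want)
values <- as.numeric(as.character(z))   # 12 13 14 12 12

# gsub(): strip the dollar sign and commas before converting to numeric
revenue <- c("$9,254,614", "$11,757,018")
revenue_num <- as.numeric(gsub(",", "", gsub("\\$", "", revenue)))

# Locating missing data with complete.cases(), which() and is.na()
df <- data.frame(Industry  = c("Health", NA, "Retail", "Retail"),
                 Employees = c(25, 40, NA, 35))
incomplete  <- df[!complete.cases(df), ]    # rows 2 and 3
missing_emp <- which(is.na(df$Employees))   # row 3

# Median imputation: fill the missing Retail value with the Retail median
med <- median(df$Employees[which(df$Industry == "Retail")], na.rm = TRUE)
df$Employees[is.na(df$Employees) & df$Industry == "Retail"] <- med
```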
