1.15 Visualizing Results

Until now we have been working on the data cleaning part of the project brief that I shared in post 1.1, which said: "Note that the dataset has numerous discrepancies that need to be addressed before analysis can be performed. Have a look at the data here by downloading the data."
That process is now done, so we can proceed with the analysis and plot the graphs, including the scatterplots.
For visualization we need the "ggplot2" package: install it with install.packages("ggplot2") and then load it with the library() function.
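As a minimal sketch of that install-and-load step: the requireNamespace() check is my own addition (not part of the original post) so the package is only downloaded when it is missing, and it assumes you have internet access and a CRAN mirror configured.

```r
# Install ggplot2 only if it is not already available on this machine.
# (The requireNamespace() guard is an addition for convenience.)
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}

# Load the package for the current session.
library(ggplot2)
```

You need to install only once per machine, but library(ggplot2) has to be called in every new R session.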

If you have already installed the "ggplot2" package, you just have to load it with library(ggplot2).
Let's set up the plot using the ggplot() function.
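Here is a rough sketch of setting up the plot object. The data frame name demo_df and its values are made up so the snippet runs on its own; in the project you would pass your cleaned data frame instead. Only the column names (Industry, Revenue, Expenses, Profit, Growth) come from the brief.

```r
library(ggplot2)

# Hypothetical stand-in for the cleaned dataset (random values, NOT the real data).
set.seed(1)
demo_df <- data.frame(
  Industry = rep(c("Health", "IT Services", "Retail"), each = 20),
  Revenue  = runif(60, 1e6, 2e7)
)
demo_df$Expenses <- demo_df$Revenue * runif(60, 0.4, 0.9)
demo_df$Profit   <- demo_df$Revenue - demo_df$Expenses
demo_df$Growth   <- round(runif(60, 0, 30))

# Create the plot object; it has data attached but no layers yet.
p <- ggplot(data = demo_df)
```

At this point p holds only the data. Nothing is drawn until a geometry layer is added.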

We also need to add a geometry layer using the function geom_point() and put the aesthetics into it. Revenue goes on the X axis and Expenses on the Y axis, and Profit is mapped to the size parameter. We can also separate the industries by assigning a different colour to each one. The complete command is the plot object plus this geom_point() layer.
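The mapping described above could be written roughly like this. Again, demo_df is a hypothetical toy data frame with random values, used only so the example is runnable; swap in your own cleaned data.

```r
library(ggplot2)

# Hypothetical toy data (random values, NOT the project dataset).
set.seed(1)
demo_df <- data.frame(
  Industry = rep(c("Health", "IT Services", "Retail"), each = 20),
  Revenue  = runif(60, 1e6, 2e7)
)
demo_df$Expenses <- demo_df$Revenue * runif(60, 0.4, 0.9)
demo_df$Profit   <- demo_df$Revenue - demo_df$Expenses

p <- ggplot(data = demo_df)

# Points: Revenue on X, Expenses on Y, colour by Industry, size by Profit.
scatter <- p + geom_point(aes(x = Revenue, y = Expenses,
                              colour = Industry, size = Profit))
print(scatter)
```

Because the aesthetics sit inside geom_point(), they apply to this layer only. That is fine here since the plot has a single layer.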

The output of this command completes the first task in the project brief, i.e. "A scatter-plot classified by industry showing revenue, expenses, profit".
The next task is to plot "A scatter-plot that includes industry trends for the expenses~revenue relationship". We can do this by creating a base plot with predefined aesthetics: Revenue on the X axis, Expenses on the Y axis, and colour by Industry. After that we can plot the points and then apply a smoother, which shows us the trends.
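A sketch of that base-plot-plus-smoother idea is below. The toy data frame demo_df is hypothetical, and se = FALSE (which hides the confidence band) is my own choice to keep the picture readable; the post does not say which smoother settings were used, so I leave geom_smooth() on its default method.

```r
library(ggplot2)

# Hypothetical toy data (random values, NOT the project dataset).
set.seed(1)
demo_df <- data.frame(
  Industry = rep(c("Health", "IT Services", "Retail"), each = 20),
  Revenue  = runif(60, 1e6, 2e7)
)
demo_df$Expenses <- demo_df$Revenue * runif(60, 0.4, 0.9)

# Base plot: aesthetics defined once and inherited by every layer.
base <- ggplot(data = demo_df,
               aes(x = Revenue, y = Expenses, colour = Industry))

# Add the points, then one trend line per industry.
trends <- base + geom_point() + geom_smooth(se = FALSE)
print(trends)
```

Since colour is mapped to Industry in the base plot, the smoother is fitted separately for each industry, which is what gives us one trend line per group.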

The output looks like this

From this plot we can draw some conclusions about individual industries and their trends in terms of revenue and expenses.
The last task requires us to plot box plots showing growth by industry. Here we need Industry on the X axis and Growth on the Y axis. Then we can draw the box plot with geom_boxplot().
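The box plot could look something like this. As before, demo_df is a made-up data frame with random Growth values so the snippet runs on its own, and colouring the boxes by Industry is an optional extra that I added.

```r
library(ggplot2)

# Hypothetical toy data (random values, NOT the project dataset).
set.seed(1)
demo_df <- data.frame(
  Industry = rep(c("Health", "IT Services", "Retail"), each = 20),
  Growth   = round(runif(60, 0, 30))
)

# Industry on X, Growth on Y; one box per industry.
boxes <- ggplot(data = demo_df,
                aes(x = Industry, y = Growth, colour = Industry)) +
  geom_boxplot()
print(boxes)
```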

The output looks like this

The complete code for visualization is as follows
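Putting the three tasks together, a complete script might look roughly like this. It is a sketch under the same assumptions as before: demo_df is a hypothetical data frame filled with random values, and you would replace it with your own cleaned data that has the Industry, Revenue, Expenses, Profit and Growth columns.

```r
library(ggplot2)

# Hypothetical toy data (random values, NOT the project dataset).
set.seed(1)
demo_df <- data.frame(
  Industry = rep(c("Health", "IT Services", "Retail"), each = 20),
  Revenue  = runif(60, 1e6, 2e7)
)
demo_df$Expenses <- demo_df$Revenue * runif(60, 0.4, 0.9)
demo_df$Profit   <- demo_df$Revenue - demo_df$Expenses
demo_df$Growth   <- round(runif(60, 0, 30))

# Task 1: scatter plot classified by industry, sized by profit.
scatter <- ggplot(data = demo_df) +
  geom_point(aes(x = Revenue, y = Expenses,
                 colour = Industry, size = Profit))

# Task 2: scatter plot with a trend line per industry.
trends <- ggplot(data = demo_df,
                 aes(x = Revenue, y = Expenses, colour = Industry)) +
  geom_point() +
  geom_smooth(se = FALSE)

# Task 3: box plots of growth by industry.
boxes <- ggplot(data = demo_df,
                aes(x = Industry, y = Growth, colour = Industry)) +
  geom_boxplot()

# Collect the plots and draw them one after another.
plots <- list(scatter = scatter, trends = trends, boxes = boxes)
for (plt in plots) print(plt)
```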

This completes our visualization and the tasks from the project brief stated in post 1.1.

To recap, here is what we have covered from post 1.1 to 1.15:

Factor Variable Trap (when a factor needs to be converted to numeric, convert it to character first and then to numeric)
How to use sub() and gsub() for cleaning data
Methods for dealing with missing data
NA: the third logical constant
How to locate missing data using complete.cases()
Filtering techniques: which() and is.na()
Median Imputation Method
Factual Analysis and Deriving values method
Visualization

Simplified Analytics
