1.1 Data Preparation

In this section we will learn about the process of data preparation. It deserves close attention: it is tedious and time-consuming, but it is the most important task of all. Nearly 70% of the time spent on any project actually goes into data preparation. We will learn how to load data, how to find missing values, how to replace missing data, and so on, through a simple project named “Financial Review”.

Project Brief:

You have been hired by the “Future 500” magazine. The stakeholders have supplied you with a list of 500 companies and would like you to create some draft visualizations for their upcoming online publication. They have requested the following charts:

A scatter-plot classified by industry showing revenue, expenses, profit
A scatter-plot that includes industry trends for the expenses~revenue relationship
Box-plots showing growth by industry

Note that the dataset has numerous discrepancies that need to be addressed before any analysis can be performed. Have a look at the data by downloading it from the following link:
https://drive.google.com/file/d/0B3RXs3bXRr3oR2c5cnhJZ2laejQ/view?usp=sharing

Let’s import the data into R now:

Save the data file in a folder on your computer, then set the working directory to that same folder so that we can work with the file right away.

I have saved the file in the directory C:\SAM\R\SA, so I set the working directory to that folder using setwd(); getwd() then displays the current working directory.
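
As a rough sketch of that step, the snippet below sets and then checks the working directory. A temporary folder stands in for C:\SAM\R\SA so that the code runs on any machine; the names data_dir, old_dir and current_dir are only illustrative.

```r
# Hypothetical folder: a temporary directory stands in for C:\SAM\R\SA.
# On Windows you would write the real path with forward slashes,
# e.g. setwd("C:/SAM/R/SA"), or with doubled backslashes.
data_dir <- tempdir()

old_dir <- setwd(data_dir)   # setwd() invisibly returns the previous directory
current_dir <- getwd()       # getwd() reports the directory R is working in now
print(current_dir)

setwd(old_dir)               # go back to where we started
```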
Read the data using the following command:
Reading the CSV file
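
Here is a minimal sketch of the read step using read.csv(). The file name Future-500.csv and the columns are assumptions made for illustration (the real file from the link may be named and laid out differently), and a tiny stand-in file is written first so that the snippet runs by itself. Note that from R 4.0.0 onwards read.csv() keeps text columns as character unless you ask for factors with stringsAsFactors = TRUE.

```r
# Stand-in for the downloaded file: the name and the columns are hypothetical
csv_path <- file.path(tempdir(), "Future-500.csv")
writeLines(c(
  "ID,Name,Industry,Revenue,Expenses,Profit,Growth",
  "1,Alpha,Software,9684527,1130700,8553827,0.19",
  "2,Beta,Health,14016543,804035,13212508,0.20",
  "3,Gamma,Software,,1044375,,0.11"
), csv_path)

# stringsAsFactors = TRUE turns text columns into factors.
# It was the default before R 4.0.0 and has to be requested explicitly since.
fin <- read.csv(csv_path, stringsAsFactors = TRUE)
```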

This will create a data object, or data frame, named fin. You can see it in the data pane on the right. Now let’s look at the top few rows of fin using the following code:
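
A small sketch of head() follows. The data frame built here is a made-up stand-in for fin (12 invented rows, hypothetical column names) so that the snippet runs without the real file.

```r
# Hypothetical stand-in for fin so the snippet runs by itself
fin <- data.frame(
  ID = 1:12,
  Industry = factor(rep(c("Software", "Health", "Retail"), times = 4)),
  Revenue = seq(1e6, by = 5e5, length.out = 12)
)

head(fin)          # first 6 rows by default
head(fin, n = 3)   # n controls how many rows are returned
```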

We can also have a look at the bottom rows. Let’s view the last 10 rows.
Last few rows using the tail() function
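
A sketch of the tail() call, again on a made-up stand-in for fin (the rows and column names are assumptions, not the real data):

```r
# Hypothetical stand-in for fin so the snippet runs by itself
fin <- data.frame(
  ID = 1:12,
  Industry = factor(rep(c("Software", "Health", "Retail"), times = 4)),
  Revenue = seq(1e6, by = 5e5, length.out = 12)
)

last_rows <- tail(fin, n = 10)   # last 10 rows; the default would be 6
print(last_rows)
```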

Now let’s look at the structure of our data frame using str(). Note that it shows the data type of each column.
Using str() to view the structure of the dataset
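
To give an idea of what str() reports, here is a sketch on a small invented stand-in for fin (column names are hypothetical). Each line of the output names a column, its type and its first few values.

```r
# Hypothetical stand-in for fin so the snippet runs by itself
fin <- data.frame(
  ID = 1:6,
  Industry = factor(rep(c("Software", "Health", "Retail"), times = 2)),
  Revenue = seq(1e6, by = 5e5, length.out = 6)
)

str(fin)   # one line per column: name, type (int, Factor, num) and sample values

# The same output captured as text, in case you want to inspect it in code
structure_lines <- capture.output(str(fin))
```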

The summary() function gives a summary of each column, with various details, as follows.
The summary() function gives the overall details of the dataset
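
A sketch of summary() on an invented stand-in for fin (hypothetical columns, with one missing value added on purpose). For numeric columns it shows the minimum, quartiles, mean and maximum plus a count of NA's, and for factor columns it shows how often each level occurs, which makes it a handy first check for missing data.

```r
# Hypothetical stand-in for fin, with one missing Revenue value
fin <- data.frame(
  ID = 1:6,
  Industry = factor(rep(c("Software", "Health", "Retail"), times = 2)),
  Revenue = c(1e6, NA, 2e6, 2.5e6, 3e6, 3.5e6)
)

fin_summary <- summary(fin)   # per-column summary, returned as a table
print(fin_summary)
```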

The complete code looks like this:
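
What follows is a reconstruction of the steps in this section rather than the original script. The file name Future-500.csv, the columns and the temporary folder are all assumptions: a small invented file is written first so that the script runs anywhere, and with the real download you would skip that part and point setwd() at your own folder.

```r
# --- Stand-in data (hypothetical): skip this part when using the real file ---
csv_path <- file.path(tempdir(), "Future-500.csv")
stand_in <- data.frame(
  ID = 1:12,
  Name = paste0("Company", 1:12),
  Industry = rep(c("Software", "Health", "Retail"), times = 4),
  Revenue = seq(1e6, by = 5e5, length.out = 12),
  Expenses = seq(4e5, by = 2e5, length.out = 12)
)
write.csv(stand_in, csv_path, row.names = FALSE)

# --- Set and check the working directory ---
old_dir <- setwd(dirname(csv_path))   # e.g. setwd("C:/SAM/R/SA") on Windows
getwd()

# --- Read the CSV file; text columns become factors ---
fin <- read.csv("Future-500.csv", stringsAsFactors = TRUE)

# --- First look at the data ---
head(fin)        # top rows
tail(fin, 10)    # last 10 rows
str(fin)         # structure and column types
summary(fin)     # per-column summary

setwd(old_dir)   # restore the previous working directory
```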

After carefully examining the dataset fin, we can see that numerous columns are factors, i.e. categorical variables. We will analyze this further in the next tutorial.
