# Data Analysis

Data analysis is the process of extracting useful information from data by passing the data through four steps: inspecting, cleaning, transforming and, finally, modeling it. These four steps are the key characteristics of data analysis, and the results are used for decision making. Algebra data analysis follows the same procedure; it is simply applied when the data are algebraic, such as equations and formulas. The aim of data analysis is the same in every scenario.

Data analysis has many facets, methods, procedures, approaches and techniques. One of the most common and useful techniques is data mining, and data analysis is sometimes also regarded as data modeling.

Before analyzing data, we should be familiar with the types of data:

Quantitative data: data that can be measured in terms of numbers.

Categorical data: data that sort observations into groups or categories.

Qualitative data: data that describe qualities or features rather than numerical measurements.

Let's see the step by step process of data analysis.

Data cleaning: This is the first step. We examine the data and, if errors are found, resolve them. Cleaning is best done at the time of data entry. During cleaning we keep both the original data and the updated data, and every alteration should be properly documented.

Initial data analysis: This analysis includes the following steps:

i) Quality of data: We check the quality of the data with operations such as analysis of missing observations, frequency counts, comparison of mean and median, and analysis of extreme observations.

ii) Quality of measurements: After checking the data quality we check the quality of the measurements, for example with confirmatory factor analysis and internal consistency analysis.

iii) Initial transformations: We transform one or more variables where needed. Common transformations are the square root, log and inverse transformations.

Main data analysis: This is the final step of data analysis, and it includes the following:

i) Exploratory and confirmatory approaches: In an exploratory analysis no clear hypothesis is stated in advance; the data are searched for models that describe them well. In a confirmatory analysis we test clear hypotheses about the data.

ii) Stability of results: Here we check whether our results are reliable. Two methods for this are cross validation and sensitivity analysis.
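
The data quality checks and initial transformations described above can be sketched in Python. This is only a minimal sketch: the list `raw` is made-up data in which `None` marks a missing observation, and the rule used to flag extreme observations (more than three times the median) is an assumption chosen for illustration, not a standard.

```python
import math
import statistics
from collections import Counter

# Hypothetical raw observations; None marks a missing observation.
raw = [12.0, 15.5, None, 14.0, 15.5, 250.0, 13.5, None, 16.0]

# Quality of data: missing observations, frequency counts, mean and median.
observed = [v for v in raw if v is not None]
missing_count = len(raw) - len(observed)
frequency = Counter(observed)
mean_value = statistics.mean(observed)
median_value = statistics.median(observed)

# Extreme observations: flagged with a simple assumed rule
# (more than 3 times the median); a real analysis would choose its own rule.
extremes = [v for v in observed if v > 3 * median_value]

# Initial transformations: square root, log and inverse.
# All three need positive values, which holds for this sample.
sqrt_values = [math.sqrt(v) for v in observed]
log_values = [math.log(v) for v in observed]
inverse_values = [1 / v for v in observed]

print("missing:", missing_count)
print("most common:", frequency.most_common(1))
print("mean:", round(mean_value, 2), "median:", median_value)
print("extreme observations:", extremes)
```

A large gap between the mean and the median, as in this sample, is a hint that extreme observations are present and that a transformation may be worth trying.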

## Measure of Central Tendency

A measure of central tendency is a single value that describes a set of data by identifying the central position within that set. Measures of central tendency are also called measures of central location or summary statistics. There are three main measures of central tendency: the mean, the median and the mode. The mean is the most familiar of the three and usually the simplest to calculate, but it is not always the most appropriate. Depending on the data, the mean, the median or the mode may describe the center best. Now we will see the suitable conditions for each of the three.

First we will discuss the mean. The mean is the most popular and most frequently used measure of central tendency. It can be used for both discrete and continuous data, although it is used mainly for continuous data. We obtain the mean by adding all the elements in the data and dividing by the number of elements. If A, B, C, D, E, F are the elements of the group, then the mean is X = (A + B + C + D + E + F) / 6.

The mean has a disadvantage: it is pulled toward extreme values, so if the data are skewed we prefer the median or the mode over it.

Now we will move to the median. The median is the middle score when all the terms are arranged in order. Extreme values have little effect on it, which is why we prefer the median over the mean in skewed cases.

Now we will talk about the mode. The mode is the value that occurs most often in the data. It is the best measure of central tendency for categorical data, where we want to know the most common category. It is less useful for continuous data: when the values are very close but rarely equal, like 30.1, 30.2, 30.3 and so on, no value may repeat at all, and the mean or the median is a better choice.
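
To see how the three measures behave, here is a small sketch using Python's standard `statistics` module. The `salaries` and `transport` lists are made-up data chosen for illustration: the first is skewed by two large values, the second is categorical.

```python
import statistics

# Hypothetical skewed sample: two very large salaries among small ones.
salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = statistics.mean(salaries)      # pulled up by 90 and 95
median_salary = statistics.median(salaries)  # middle of the ordered values
mode_salary = statistics.mode(salaries)      # most frequent value

# For categorical data the mode is the natural measure of the center.
transport = ["bus", "car", "bus", "train", "bus", "car"]
mode_transport = statistics.mode(transport)

print(mean_salary, median_salary, mode_salary, mode_transport)
```

The mean (30.7) sits far above most of the salaries, while the median (15.5) and the mode (15) stay close to the typical value, which is the reason for preferring them with skewed data.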

## Deviation and Z Score

The standard deviation measures how far the data deviate from their mean. It is equal to the square root of the variance. A z score uses the standard deviation as its unit: it tells us how many standard deviations a data item lies above or below the mean.

The formula for the standard deviation is:

$s = \sqrt{\frac{\sum (x - x')^{2}}{N - 1}}$

Where s = the standard deviation
x = each value in the sample
x’ = the mean of the values
N = the number of values (the sample size)
For finding the standard deviation we follow these steps:
Step 1: Find the mean.
Step 2: Subtract the mean from each value in the data set.
Step 3: Square each of the differences from step 2, add the squares together and divide the sum by N – 1.
Step 4: Now find the square root of the value we got from step 3. The value we get is the standard deviation.
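
The steps can be written as a short Python function. This is a sketch of the sample formula with N – 1 in the denominator; the function name `sample_standard_deviation` and the `heights` list are hypothetical names and data used only for illustration.

```python
import math

def sample_standard_deviation(values):
    """Mean, differences, averaged squared differences, then square root."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed to divide by N - 1")
    mean = sum(values) / n                                 # Step 1
    differences = [x - mean for x in values]               # Step 2
    variance = sum(d * d for d in differences) / (n - 1)   # Step 3
    return math.sqrt(variance)                             # Step 4

# Hypothetical data set chosen so the arithmetic is easy to follow by hand.
heights = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_standard_deviation(heights), 3))
```

For this list the mean is 5 and the squared differences add up to 32, so the function returns the square root of 32 / 7.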
Suppose 4 pirates have 6, 4, 8 and 10 silver coins respectively. We can calculate the standard deviation of the number of silver coins as shown below.
We know that the formula for standard deviation is:

$s = \sqrt{\frac{\sum (x - x')^{2}}{N - 1}}$

Here ‘s’ is the standard deviation, ‘x’ is each value in the sample, x’ is the mean of the values, and ‘N’ is the number of values. Now we can calculate the standard deviation step by step.
First we calculate the mean of the given data. The formula for finding the mean is:

$x' = \frac{\sum x}{N}$

or

$x' = \frac{x_1 + x_2 + x_3 + x_4 + \dots + x_N}{N} = \frac{6 + 4 + 8 + 10}{4} = \frac{28}{4} = 7$

So the mean value is 7.
Now we calculate x – x’ for each value in the given data:

$x_1 - x' = 6 - 7 = -1$,

$x_2 - x' = 4 - 7 = -3$,

$x_3 - x' = 8 - 7 = 1$,

$x_4 - x' = 10 - 7 = 3$.

Now we have to calculate $\sum (x - x')^{2}$:

$\sum (x - x')^{2} = (x_1 - x')^{2} + (x_2 - x')^{2} + \dots + (x_N - x')^{2}$

$= (-1)^{2} + (-3)^{2} + (1)^{2} + (3)^{2}$

$= 1 + 9 + 1 + 9 = 20$
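Now put all the values in the standard deviation formula, with N = 4:

$s = \sqrt{\frac{\sum (x - x')^{2}}{N - 1}} = \sqrt{\frac{20}{4 - 1}} = \sqrt{\frac{20}{3}} \approx 2.58$

So the standard deviation is about 2.58. The z score of any value is then (x – x’) / s; for example, the pirate with 10 coins has a z score of (10 – 7) / 2.58 ≈ 1.16.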
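
The pirate example can be checked with Python's standard `statistics` module, whose `stdev` function divides by N – 1 as the formula does. The sketch below also computes a z score for every pirate, taking the z score to be (x – x’) / s; the variable names are chosen only for illustration.

```python
import statistics

coins = [6, 4, 8, 10]  # silver coins held by the four pirates

mean_coins = statistics.mean(coins)
s = statistics.stdev(coins)  # sample standard deviation, divides by N - 1

# z score: how many standard deviations each value lies from the mean.
z_scores = [(x - mean_coins) / s for x in coins]

print(mean_coins, round(s, 2))
print([round(z, 2) for z in z_scores])
```

The z scores always add up to zero, because the deviations from the mean cancel out.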

## Line of Best Fit

A line that passes through the center of a group of data points plotted on a scatter plot is known as a line of best fit. Scatter plots depict the results of gathering data on two variables, and the line of best fit is used to see whether these two variables are correlated. There are many methods for determining the line of best fit; the most common is the mathematical tool known as the least squares method. It is used in regression analysis, where statistical quantities such as sums of squares are the key inputs.

Now we will look at the least squares method for finding the line of best fit, which shows the relationship between two variables.

To find the line of best fit by the least squares method we follow the steps shown below:

Step 1: First we find the mean of the ‘x’ values and the mean of the ‘y’ values.
Step 2: Then we find the sum of the squares of the x – values.
Step 3: Then we multiply each x – value by its corresponding y – value and find the sum of these products.
Step 4: After that we find the slope of the line.

The formula for finding the slope of the line is:

$m = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^{2} - \frac{(\sum x)^{2}}{n}}$

Where ‘n’ represents the total number of data points and ‘m’ is the slope of the line.

Step 5: Now find the y – intercept of the line with the help of the formula:

b = y’ – mx’;
where y’ and x’ are the means of the y – and x – coordinates of the data points respectively.

Step 6: At last we use the slope and the y – intercept to get the equation of the line, y = mx + b.
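
The six steps can be sketched as one Python function. The function name `least_squares_line` and the sample points are hypothetical; the points were chosen to lie exactly on y = 2x + 1 so the result is easy to check by eye.

```python
def least_squares_line(xs, ys):
    """Return (m, b) for y = m*x + b using the sums-based slope formula."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired data points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of the products xy
    sum_x2 = sum(x * x for x in xs)               # sum of the squares of x
    denominator = sum_x2 - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = (sum_xy - sum_x * sum_y / n) / denominator
    b = sum_y / n - m * (sum_x / n)               # b = y' - m * x'
    return m, b

# Hypothetical points that lie exactly on y = 2x + 1.
m, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)
```

The guard on the denominator covers the case where every x value is the same, since a vertical line has no slope.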

Now we will see how to find the equation of the line of best fit for a set of data by using the least squares method.

Suppose we have the following data:
X: 5 6 8 6
Y: 9 4 3 7

To find the equation we follow the steps above.

First we calculate the sums we need from the ‘x’ and ‘y’ values:

| X | Y | XY | X² |
|---|---|----|----|
| 5 | 9 | 45 | 25 |
| 6 | 4 | 24 | 36 |
| 8 | 3 | 24 | 64 |
| 6 | 7 | 42 | 36 |
| ∑x = 25 | ∑y = 23 | ∑xy = 135 | ∑x² = 161 |

Then we put these values in the slope formula, with n = 4:

$m = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^{2} - \frac{(\sum x)^{2}}{n}}$

$m = \frac{135 - \frac{25 \times 23}{4}}{161 - \frac{(25)^{2}}{4}}$

$m = \frac{135 - 143.75}{161 - 156.25} = \frac{-8.75}{4.75}$

m ≈ –1.8421, or about –1.84;

Then find the mean ‘x’ and ‘y’ values:
X’ = 25/4 = 6.25;
Y’ = 23/4 = 5.75;
Then put these in the y – intercept formula:
b = y’ – mx’;
b = 5.75 – (–1.8421 * 6.25);
b ≈ 17.26;
So the equation is y = –1.84x + 17.26. The slope is negative because y tends to fall as x rises in this data.
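
The sums for the data X = 5, 6, 8, 6 and Y = 9, 4, 3, 7 can be recomputed with a few lines of Python. This is only a checking sketch; the variable names are chosen for illustration and the arithmetic follows the slope and intercept formulas given above.

```python
xs = [5, 6, 8, 6]
ys = [9, 4, 3, 7]
n = len(xs)

# Build the XY and X^2 columns of the table row by row.
xy = [x * y for x, y in zip(xs, ys)]
x2 = [x * x for x in xs]
for row in zip(xs, ys, xy, x2):
    print(*row)

sum_x, sum_y, sum_xy, sum_x2 = sum(xs), sum(ys), sum(xy), sum(x2)

# Slope from the sums, then the intercept from the two means.
slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
intercept = sum_y / n - slope * (sum_x / n)

print(sum_x, sum_y, sum_xy, sum_x2)
print(round(slope, 2), round(intercept, 2))
```

Running it gives the column sums 25, 23, 135 and 161, a slope of about –1.84 and an intercept of about 17.26.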

## Box and Whisker Graphs

A box and whisker plot is a figure drawn along a number line to represent the distribution of a set of data. The box spans the middle half of the data, from the lower quartile to the upper quartile, with a line at the median. The ends of the whiskers can represent several possible alternative values, among them the minimum and maximum of the data, or one standard deviation above and below the mean of the data. Some box plots include an additional character to represent the mean of the data.
Now we will see how to draw a box and whisker graph, which is sometimes also known as a box and whisker plot or box chart. We follow the steps given below:
Step 1: Start by putting the data in numeric order, if they are not ordered already.
Step 2: Find the median of the data. The median divides the data into two halves.
Step 3: To divide the data into quarters, find the median of each of these two halves.
Step 4: If the number of values is even, the median is the average of the two middle values and every value belongs to one of the halves. If the number of values is odd, the median is an actual data point, and it is not included in either half when the sub – medians are computed.
Suppose we have the set of data 4, 9, 7, 8, 3 and we have to construct a box and whisker plot.
First we put the data in order, either from greatest to least or from least to greatest. If we write least to greatest we get:
3, 4, 7, 8, 9;
The smallest value in the set is 3, so the lower extreme is 3.
The greatest value in the set is 9, so the upper extreme is 9.
Now see 3, 4, 7, 8, 9;
7 lies in the middle of the set of data, so the number 7 is the median.
Now find the lower quartile. We take all the data before the median, 3 and 4; their median is (3 + 4) / 2 = 3.5, so the lower quartile is 3.5.
Now find the upper quartile. We take all the data after the median, 8 and 9; their median is (8 + 9) / 2 = 8.5, so the upper quartile is 8.5.
In this way we get the box and whisker plot: the box runs from 3.5 to 8.5 with a line at the median 7, and the whiskers extend to 3 and 9.
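
The five values needed to draw the plot can be computed with a short Python sketch. The function name `five_number_summary` is hypothetical, and the quartiles are found with the median-of-halves method, in which the median is left out of both halves when the number of values is odd. Other quartile conventions exist, so statistical software may report slightly different quartiles.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower_half = ordered[:half]
    # With an odd count the median is a data point and is left out of both halves.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (
        ordered[0],
        statistics.median(lower_half),
        statistics.median(ordered),
        statistics.median(upper_half),
        ordered[-1],
    )

print(five_number_summary([4, 9, 7, 8, 3]))
```

For the data 4, 9, 7, 8, 3 this gives a lower extreme of 3, a lower quartile of 3.5, a median of 7, an upper quartile of 8.5 and an upper extreme of 9.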