Data analysis is the process of extracting useful information from data. There are four steps in analyzing data:
- Inspecting the data.
- Cleaning the data.
- Transforming the data.
- Modeling the data.

Data modeling can itself be considered part of data analysis. There are three types of data that we analyze:
- Categorical data: data that falls into distinct categories or groups.
- Qualitative data: data that describes qualities or characteristics rather than numbers.
- Quantitative data: data that can be measured in terms of numbers.

Let us discuss data analysis step by step.

Data cleaning: Data cleaning is the process of examining the data and resolving errors. It is done at the time the data is entered. During this process both the updated data and the original data should be kept, and afterwards the changes should be documented.

Initial data analysis: The steps of initial data analysis are:
1) Quality of data: The quality of the data is checked with operations such as frequency counts, analysis of observations, mean and median, and more.
2) Quality of measurements: The quality of the measurements can be determined in several ways, such as confirmatory factor analysis and internal consistency checks.
3) Initial transformations: One or more variables are transformed, for example with log, square root, or inverse transformations.

Main data analysis: The last step includes exploratory and confirmatory approaches and checking the stability of the results.
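The quality checks and initial transformations described above can be sketched in a few lines of Python. This is only an illustration: the `values` list is hypothetical data, and the standard-library calls stand in for whatever tool is actually used.

```python
import math
from collections import Counter
from statistics import mean, median

# Hypothetical sample values, used only for illustration.
values = [1.0, 4.0, 4.0, 9.0, 16.0]

# Quality of data: frequency counts and simple summaries.
frequency = Counter(values)
summary = {"mean": mean(values), "median": median(values)}

# Initial transformations (these require strictly positive inputs).
log_values = [math.log(v) for v in values]
sqrt_values = [math.sqrt(v) for v in values]
inverse_values = [1.0 / v for v in values]
```

A value that appears unusually often in `frequency`, or a mean far from the median, is a hint that the data should be inspected before the main analysis.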

**Mean** is used for all types of numerical data, discrete or continuous, but mainly for continuous data. Let A, B, C, D, E, F be the elements of a group; then the mean is $X = \frac{A+B+C+D+E+F}{6}$.

**Mode** is the value that occurs most often in the data, and it gives another measure of central tendency. It is most useful for categorical or discrete data. For continuous values such as 30.1, 30.2, 30.3 and so on, each value may occur only once, so the data are usually grouped into intervals before the mode is found.
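As a small illustration of these two measures, the sketch below uses Python's standard `statistics` module. The `scores` list is hypothetical data chosen only for the example.

```python
from statistics import mean, multimode

# Hypothetical discrete data, chosen only for illustration.
scores = [2, 3, 3, 5, 7]

# Mean: the sum of the values divided by how many there are, (2 + 3 + 3 + 5 + 7) / 5.
average = mean(scores)

# Mode: multimode returns every value tied for the highest count, here [3].
most_common = multimode(scores)
```

`multimode` is used here rather than `mode` so that ties are visible instead of being silently resolved.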

The spread of data around its mean can be measured by the standard deviation. It is denoted by $S$. The standard deviation is equal to the square root of the variance.

The formula for the standard deviation is:

$S = \sqrt{\frac{\sum_{i=1}^{N}(x_{i}-x')^{2}}{N-1}}$

Where,

$S$ is the standard deviation

$x_{i}$ is each value in the sample

$x'$ is the mean of the values

$N$ is the number of values (the sample size)

The steps to find the standard deviation are:

**Step 1:** Find the mean of the given data.

**Step 2:** Subtract the mean from each data value and square each result.

**Step 3:** Add the squared differences from step 2 and divide the sum by $N-1$.

**Step 4:** Take the square root of the value from step 3. This is the standard deviation.
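The steps above can be sketched as a short Python function. This is a minimal illustration, not a reference implementation: the function name `sample_standard_deviation` and the data passed to it are hypothetical.

```python
import math

def sample_standard_deviation(values):
    """Sample standard deviation, dividing by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    # Step 1: mean of the data.
    mean = sum(values) / n
    # Step 2: squared difference of each value from the mean.
    squared_differences = [(x - mean) ** 2 for x in values]
    # Step 3: sum the squared differences and divide by N - 1.
    variance = sum(squared_differences) / (n - 1)
    # Step 4: the square root of that value is the standard deviation.
    return math.sqrt(variance)

# Hypothetical data: mean 5, sum of squared differences 32.
result = sample_standard_deviation([2, 4, 4, 4, 5, 5, 7, 9])
```

For this data the function returns the square root of 32/7, roughly 2.14.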

**Examples:**

Suppose four pirates have 6, 4, 8, and 10 silver coins respectively. We can calculate the standard deviation of the number of silver coins as shown below:

**Solution:**

We know that the formula for the standard deviation is:

$S = \sqrt{\frac{\sum_{i=1}^{N}(x_{i}-x')^{2}}{N-1}}$

Here $S$ is the standard deviation, $x_{i}$ is each value in the sample, $x'$ is the mean of the values, and $N$ is the number of values. Now we can calculate the standard deviation step by step.

First we calculate the mean of the given data. The formula for the mean is:

$x' = \frac{\sum_{i=1}^{N} x_{i}}{N}$

or

$x' = \frac{x_1 + x_2 + x_3 + x_4 + \dots + x_N}{N}$

$= \frac{6 + 4 + 8 + 10}{4}$

$= \frac{28}{4}$

$= 7$

So the mean value is 7.

Now we calculate $x_{i} - x'$ for each value in the given data:

$x_{1} - x' = 6 - 7 = -1$,

$x_{2} - x' = 4 - 7 = -3$,

$x_{3} - x' = 8 - 7 = 1$,

$x_{4} - x' = 10 - 7 = 3$.

Now we have to calculate $\sum_{i=1}^{N}(x_{i} - x')^{2}$:

$\sum_{i=1}^{N}(x_{i} - x')^{2} = (x_{1} - x')^{2} + (x_{2} - x')^{2} + \dots + (x_{N} - x')^{2}$

$= (-1)^{2} + (-3)^{2} + (1)^{2} + (3)^{2}$

$= 1 + 9 + 1 + 9 = 20$

Now put all the values into the standard deviation formula, with $N = 4$:

$S = \sqrt{\frac{\sum_{i=1}^{N}(x_{i}-x')^{2}}{N-1}}$

$S = \sqrt{\frac{20}{4-1}} = \sqrt{6.67} \approx 2.58$

So the standard deviation is approximately 2.58.

A line of best fit is a straight line drawn through the center of the points on a scatter plot. A scatter plot shows paired data graphically, and the line of best fit is used to judge whether the two variables are correlated and to show the relationship between them. The line of best fit can be determined in many ways; the least squares method is described here.

To find the line of best fit with the least squares method, follow the steps below:

**Step 1:** Find the mean of the x values and the mean of the y values.

**Step 2:** Find the sum of the squares of the x values, $\sum x^{2}$.

**Step 3:** Multiply each x value by its corresponding y value and find the sum of these products, $\sum xy$.

**Step 4:** Find the slope of the line. The formula for the slope of the line of best fit is:

$m = \frac{\sum xy -\frac{(\sum x)(\sum y)}{n}}{\sum x^{2} -\frac{(\sum x)^{2}}{n}}$

where $n$ is the total number of data points and $m$ is the slope of the line.

**Step 5:** Find the y-intercept of the line with the formula:

$b = y' - mx'$

where $x'$ and $y'$ are the means of the x and y coordinates of the data points respectively.

**Step 6:** Finally, use the slope and the y-intercept to write the equation of the line, $y = mx + b$.
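The least squares steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `least_squares_line` and the sample points are hypothetical, and the intercept is computed from the means of the x and y values.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # sum of the products x*y
    sum_x2 = sum(x * x for x in xs)              # sum of the squares of x
    denominator = sum_x2 - sum_x ** 2 / n
    if denominator == 0:
        raise ValueError("the x values must not all be equal")
    slope = (sum_xy - sum_x * sum_y / n) / denominator
    # The intercept uses the means of x and y: b = y' - m * x'.
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept

# Hypothetical points lying exactly on y = 2x + 1.
slope, intercept = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
```

Because the hypothetical points lie exactly on a line, the function recovers a slope of 2 and an intercept of 1, which makes it easy to check the formula by hand.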

Now we will see how to find the equation of the line of best fit for a set of data using the least squares method.

Suppose we have the following data:

X: | 5 | 6 | 8 | 6 |

Y: | 9 | 4 | 3 | 7 |

To find the equation we follow the steps above.

First we build a table of the values needed for the slope formula:

| X | Y | XY | $X^{2}$ |
|---|---|---|---|
| 5 | 9 | 45 | 25 |
| 6 | 4 | 24 | 36 |
| 8 | 3 | 24 | 64 |
| 6 | 7 | 42 | 36 |
| $\sum x = 25$ | $\sum y = 23$ | $\sum xy = 135$ | $\sum x^{2} = 161$ |

Then we find the slope, with $n = 4$:

$m = \frac{\sum xy -\frac{(\sum x)(\sum y)}{n}}{\sum x^{2} -\frac{(\sum x)^{2}}{n}}$

$m = \frac{135 -\frac{25 \times 23}{4}}{161 - \frac{(25)^{2}}{4}}$

$m = \frac{135 - 143.75}{161 - 156.25} = \frac{-8.75}{4.75}$

$m \approx -1.84$

Then find the means of the x and y values:

$x' = 25/4 = 6.25$

$y' = 23/4 = 5.75$

Then find the y-intercept:

$b = y' - mx'$

$b = 5.75 - (-1.842 \times 6.25)$

$b \approx 17.26$

So the equation of the line of best fit is $y = -1.84x + 17.26$.

A box and whisker plot is a figure drawn along a number line to represent the distribution of a set of data. The ends of the whiskers can denote several alternative values: most commonly the minimum and maximum of the data, or alternatively one standard deviation below and above the mean.

The steps for drawing a box and whisker plot are:

**Step 1:** Arrange the data in numerical order.

**Step 2:** Note the smallest and greatest values in the ordered data.

**Step 3:** Find the median of the data. It divides the data into a lower half and an upper half.

**Step 4:** Find the median of each of these two halves. These are the lower and upper quartiles.

**Step 5:** If a set has an even number of values, its median is the average of the two middle values.

**Example 2:** Given the data set 13, 12, 11, 14, 15, 17, and 16, construct a box and whisker plot.

**Step 1:** Arrange the data in order, either from greatest to least or from least to greatest.

**Step 2:** Writing the data from least to greatest, we get:

11, 12, 13, 14, 15, 16, 17

**Step 3:** The smallest value in the data set is 11, and the greatest value is 17.

**Step 4:** In the ordered set 11, 12, 13, 14, 15, 16, 17, the value 14 lies in the middle, so 14 is the median.

**Step 5:** Now find the lower quartile. Take all the data before the median: 11, 12, 13. The median of these values is 12, so 12 is the lower quartile.

**Step 6:** Now find the upper quartile. Take all the data after the median: 15, 16, 17. The median of these values is 16, so 16 is the upper quartile.

In this way we get the box and whisker plot: the box runs from 12 to 16 with a line at the median 14, and the whiskers extend to 11 and 17.
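The five values needed for the plot in Example 2 can also be computed with a short Python sketch. The function name `five_number_summary` is hypothetical, and the sketch assumes the convention used in the example, where the median itself is left out of both halves when the number of values is odd.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, lower-half median, median, upper-half median, maximum)."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("at least two values are required")
    half = n // 2
    lower_half = ordered[:half]
    # With an odd count, skip the middle value so it belongs to neither half.
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])

summary = five_number_summary([13, 12, 11, 14, 15, 17, 16])
print(summary)  # (11, 12, 14, 16, 17)
```

Other conventions for splitting the halves exist and can give slightly different box edges on other data sets, so the choice should be stated when a plot is published.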