
# Statistical Analysis

Statistical analysis refers to a wide range of techniques used to describe, explore, understand, prove, and predict, based on sample datasets collected from populations using some sampling strategy. It is a collection of methods for processing large amounts of data and reporting overall trends, and it plays an important role when dealing with noisy data.

The analysis determines the nature of the dataset and its relation to the underlying population; the nature of the dataset can be inferred from the available data. A model built from the sample can then be applied to unsampled entities in the underlying population, on the assumption that the sample is representative of the population for which the predictions are made. Statistical analysis is also used to investigate causality, drawing conclusions from the effect that changes in the predictor values have on the response. Well-known statistical methods include analysis of variance, correlation, factor analysis, the Mann-Whitney U test, regression, nonparametric methods, and categorical data analysis.

## Graphs in Statistical Analysis

Bar graph: A graphical display of data using bars of different heights. It compares how often different characteristics of the data occur and is very easy to understand. The bars can be horizontal or vertical, and the graph is best suited to a qualitative independent variable. Trends between bars are easy to see, but a slope cannot be calculated from the heights of the bars.

Line graph: A graph that uses points connected by lines to show how a value changes. It is typically drawn within two perpendicular lines called axes. The horizontal axis is the x axis and the vertical axis is the y axis, and each axis represents one of the data quantities being plotted. A line graph is commonly used to show change over a period of time. A broken scale can be used when the data starts at a large number. A line graph should have a title, labels, scales, points, and lines.

Box and whisker plot: A box and whisker plot is a histogram-like method of displaying the distribution of a dataset. It depicts groups of numerical data through their quartiles and can be drawn vertically or horizontally. It is handy when the data points cluster around some central value: the box contains and highlights the middle half of the data points, with the first and third quartiles at the ends of the box. The median is drawn as a line inside the box (a vertical line when the plot is drawn horizontally). The plot is helpful in interpreting the distribution of data.
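The quantities a box and whisker plot is built from can be computed directly. The following is a minimal sketch using only the Python standard library; the function name `five_number_summary` and the sample values are hypothetical, and the "inclusive" quartile method is an assumption, since several quartile conventions exist and the text does not name one. The minimum and maximum are included because they are one common choice for the whisker ends.

```python
import statistics


def five_number_summary(values):
    """Return the minimum, quartiles, and maximum of a numeric dataset."""
    data = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # The "inclusive" method is one of several quartile conventions.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],      # one common choice for the lower whisker end
        "q1": q1,            # lower edge of the box
        "median": median,    # line inside the box
        "q3": q3,            # upper edge of the box
        "max": data[-1],     # one common choice for the upper whisker end
    }


# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]
summary = five_number_summary(sample)
print(summary)
```

The box spans `q1` to `q3`, so it covers the middle half of the data points, as described above.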

## Types of Statistical Analysis

Statistical analysis summarizes a collection of data and is a method of representing statistical data. Because scientists rarely observe an entire population, sampling and statistical inference are essential. There is a strong emphasis on choosing methods of statistical inference that are appropriate to the data being used. The different types of statistical analysis are given below.

Regression: Regression is a statistical technique that determines the strength of the relationship between a dependent variable and a series of other changing (independent) variables. It is studied rigorously and used extensively in practical applications. Models that are linear in their unknown parameters are easier to fit than nonlinear models, because the statistical properties of the resulting estimators are easier to determine. Regression analysis with a single explanatory variable is termed “simple regression.”
A regression equation is of the form Y = a + bx + c, where

Y: Dependent variable
x: Independent variable
a: Y-intercept of the line
b: Slope
c: Regression residual
The purpose of regression is to find a formula that fits the relationship between the two variables.
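As an illustration of fitting Y = a + bx + c, the sketch below estimates a and b by ordinary least squares, which is a common fitting method but an assumption here, since the text does not name one. The function names `fit_line` and `residuals` and the data values are hypothetical.

```python
def fit_line(xs, ys):
    """Estimate the intercept a and slope b of Y = a + b*x by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # Y-intercept
    return a, b


def residuals(xs, ys, a, b):
    """Return the residual c for each point: observed Y minus fitted a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
c = residuals(xs, ys, a, b)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
print("residuals c =", [round(value, 3) for value in c])
```

With a least squares fit that includes an intercept, the residuals sum to zero up to rounding error, which gives a quick sanity check on the result.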

Categorical data analysis: A categorical variable has a measurement scale consisting of a set of categories. Its range is countable, for example the responses yes, no, and don't know. When data is collected in categories, we record counts. Categorical data can occur even in highly quantitative fields such as engineering sciences and industrial quality control. Variables are classified using the distinctions given below.
1. Response-explanatory variable distinction
2. Nominal-ordinal scale distinction
3. Continuous-discrete variable distinction
4. Quantitative-qualitative variable distinction
Example: Eye color, with the categories Blue, Green, Black, and Brown.
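Recording counts for a categorical variable can be sketched with the standard library's `collections.Counter`. The observations below are hypothetical values made up for the eye color example.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (eye color).
observations = [
    "Brown", "Blue", "Brown", "Green",
    "Black", "Brown", "Blue", "Black",
]

# Count how many times each category occurs.
counts = Counter(observations)
total = sum(counts.values())

# Report each category with its count and its share of all observations.
for category, count in counts.most_common():
    print(f"{category}: {count} ({count / total:.0%})")
```

The resulting table of counts is the usual starting point for categorical data analysis, and it is also the data a bar graph would display.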

A continuous variable is not a categorical variable: within a specified range it can take many values, varying from smaller to larger.
Example: Age of a person.

Correlation: Correlation measures the degree to which two variables vary together. For example, ice cream sales and temperature are related: during summer there is more demand. However, a third factor might be involved. It is also possible to compute the correlation between two dependent variables.
The formula for correlation is

r =  $\frac{N\sum ab-(\sum a)(\sum b)}{\sqrt{[N\sum a^{2}-(\sum a)^{2}] [N\sum b^{2}-(\sum b)^{2}]}}$
where a denotes the x scores, b denotes the y scores, and
r : Correlation coefficient, which lies between -1 and +1.
N : Number of pairs of scores.
$\sum$ab : Sum of the products of paired scores.
$\sum$a : Sum of x scores.
$\sum$b : Sum of y scores.
$\sum$a$^{2}$ : Sum of squared x scores.
$\sum$b$^{2}$ : Sum of squared y scores.
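The formula can be translated term by term into code. The following is a minimal sketch; the function name `pearson_r` and the temperature and sales figures are hypothetical and serve only to illustrate the calculation.

```python
import math


def pearson_r(a, b):
    """Compute the correlation r from paired scores a (x) and b (y)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("a and b must be paired and contain at least two scores")
    n = len(a)                                   # N: number of pairs of scores
    sum_a = sum(a)                               # sum of x scores
    sum_b = sum(b)                               # sum of y scores
    sum_ab = sum(x * y for x, y in zip(a, b))    # sum of products of paired scores
    sum_a2 = sum(x * x for x in a)               # sum of squared x scores
    sum_b2 = sum(y * y for y in b)               # sum of squared y scores

    numerator = n * sum_ab - sum_a * sum_b
    denominator = math.sqrt(
        (n * sum_a2 - sum_a ** 2) * (n * sum_b2 - sum_b ** 2)
    )
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical data: daily temperature and ice cream sales.
temperature = [20, 24, 28, 31, 35]
sales = [110, 135, 160, 190, 230]

r = pearson_r(temperature, sales)
print(f"r = {r:.3f}")
```

A value of r near +1 indicates that the two variables rise together, a value near -1 that one falls as the other rises, and a value near 0 that there is little linear relationship.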