# Numerical Descriptive Statistics

Numerical descriptive statistics is a collection of techniques for summarizing data: measures of central tendency, measures of variability, measures of relative standing and measures of linear relationship. Each group includes several methods, listed below.

1. Measures of central tendency include the mean, median and mode.
· Mean is the sum of the values divided by the total number of values.
· Mode is the most frequently occurring value in a data set.
· Median is the central value in an ordered data set.
2. Measures of variability include the range, standard deviation, variance and coefficient of variation.
· Variance measures how far the data values spread out from their mean.
· Standard deviation is the square root of the variance.
3. Measures of relative standing include percentiles and quartiles.
· A percentile is a value below which a given percentage of the observations fall.
· Quartiles are a set of 3 points which partition an ordered group into 4 equal parts.
4. Measures of linear relationship include covariance, correlation and the least squares line.
· Correlation shows the strength and direction of the linear relationship between two variables.
· Covariance shows the degree to which two random variables change together.

## Standard Deviation

In statistics, standard deviation is used to measure the diversity, or spread, of data. It is widely used in statistics and probability theory. The standard deviation shows how much variation exists and how far the data are dispersed from the average (mean) value. A standard deviation can be described as low or high:

1. Low standard deviation: This indicates that the data points tend to be very close to their average (mean or expected) value. The data values are clustered closely around the mean.

2. High standard deviation: This indicates that the data points are not close to the mean value but are spread out over a wide range of values.
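To make the contrast concrete, here is a minimal Python sketch. The two data sets (`tight` and `spread`) are made up for illustration; they share the same mean but differ in how far their points lie from it.

```python
import statistics

# Two hypothetical data sets with the same mean (5) but different spread.
tight = [4, 5, 5, 5, 6]
spread = [1, 3, 5, 7, 9]

mean_tight = statistics.mean(tight)
mean_spread = statistics.mean(spread)

# pstdev treats each list as a whole population (it divides by n).
sd_tight = statistics.pstdev(tight)
sd_spread = statistics.pstdev(spread)

print("means:", mean_tight, mean_spread)
print("standard deviations:", round(sd_tight, 3), round(sd_spread, 3))
```

The first data set has the lower standard deviation because its values sit closer to the shared mean.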
Unlike the variance, the standard deviation is expressed in the same units as the data. The standard deviation can also be calculated for a random variable, a statistical population or a probability distribution; in each case it is the square root of the variance. Besides expressing the variability of a population, the standard deviation is commonly used to measure confidence in statistical conclusions. For instance, the margin of error in polling data is determined by calculating the expected standard deviation in the results if the same poll were conducted multiple times. The reported margin of error is typically about twice the standard deviation.
The sample standard deviation is the form used when the data are a sample drawn from a larger population.

For example let’s assume that a sample has following data points:

1, 2, 4, 4, 5, 5, 5, 6, 9, 9

Now the average of these ten data values can be given as:

(1 + 2 + 4 + 4 + 5 + 5 + 5 + 6 + 9 + 9) / 10,

50 / 10,

So the mean or average value is 5.

To calculate the sample standard deviation, we first compute the difference of each data value from the mean and then square the result.

The data points are 1, 2, 4, 4, 5, 5, 5, 6, 9, 9 and the mean is 5, so:
$(1 - 5)^2 = (-4)^2 = 16$,
$(2 - 5)^2 = (-3)^2 = 9$,
$(4 - 5)^2 = (-1)^2 = 1$,
$(4 - 5)^2 = (-1)^2 = 1$,
$(5 - 5)^2 = (0)^2 = 0$,
$(5 - 5)^2 = (0)^2 = 0$,
$(5 - 5)^2 = (0)^2 = 0$,
$(6 - 5)^2 = (1)^2 = 1$,
$(9 - 5)^2 = (4)^2 = 16$,
$(9 - 5)^2 = (4)^2 = 16$.
Now we add these squared differences, divide by one less than the number of data points (n - 1 = 9, because the data are a sample) and take the square root:

$\sqrt{\frac{16 + 9 + 1 + 1 + 0 + 0 + 0 + 1 + 16 + 16}{9}}$ = $\sqrt{\frac{60}{9}}$ ≈ $\sqrt{6.67}$ ≈ 2.58.

This quantity is called the sample standard deviation. It is the square root of the sample variance.
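The same calculation can be sketched in Python. This is only an illustration of the steps for the ten data points of this example; it assumes the usual sample convention of dividing by n - 1 = 9, and it cross-checks the result against `statistics.stdev` from the standard library.

```python
import math
import statistics

data = [1, 2, 4, 4, 5, 5, 5, 6, 9, 9]

# Step 1: the mean of the sample.
mean = sum(data) / len(data)

# Step 2: squared difference of each value from the mean.
squared_diffs = [(x - mean) ** 2 for x in data]

# Step 3: divide by n - 1 (sample convention) and take the square root.
sample_variance = sum(squared_diffs) / (len(data) - 1)
sample_sd = math.sqrt(sample_variance)

# Cross-check with the standard library, which also divides by n - 1.
library_sd = statistics.stdev(data)

print(mean, sum(squared_diffs), round(sample_sd, 2))
```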

Now here is the definition for a population or a random variable:

Let Y be a random variable whose mean value is denoted by µ:

$E[Y] = \mu$.

Here E is the expectation operator, which gives the average or mean value of the random variable Y. The standard deviation of the random variable Y is:

$\sigma = \sqrt{E[(Y - \mu)^2]}$

Here the sigma sign σ denotes the standard deviation, which is the square root of the variance of Y; that is, the variance of Y is $E[(Y - \mu)^2]$.
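As a small illustration of this definition, the sketch below evaluates E[Y] and E[(Y - µ)²] for a hypothetical random variable: the outcome of one roll of a fair six-sided die. The die is an assumption chosen for the example, not something taken from the text above.

```python
import math
from fractions import Fraction

# Hypothetical random variable Y: one roll of a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
prob = Fraction(1, 6)  # each outcome is equally likely

# E[Y]: probability-weighted average of the outcomes.
mu = sum(y * prob for y in outcomes)

# Variance: E[(Y - mu)^2], the probability-weighted squared deviation.
variance = sum((y - mu) ** 2 * prob for y in outcomes)

# Standard deviation: square root of the variance.
sigma = math.sqrt(variance)

print(mu, variance, round(sigma, 4))
```

Using `Fraction` keeps the mean and variance exact (7/2 and 35/12); only the final square root is a floating-point value.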

The standard deviation has applications in many fields. Here are some of them:
1. Climate: Consider the average daily maximum temperature of two cities, city 'A' and city 'B'. It is helpful to know that the range of daily maximum temperatures for cities like 'B' is smaller than for cities like 'A'. Although the two cities may have the same average maximum temperature, the standard deviation of the daily maximum temperature for city 'A' will be greater than for city 'B', because on any particular day the actual temperature in city 'A' is more likely to be farther from the average.
2. Sports: Standard deviation can help in predicting which team will win on a given day. Looking at the standard deviations of the teams' ratings in various categories allows their strengths and weaknesses to be compared, to see which factor may be the stronger indicator of the result. In racing, a driver is timed on successive laps. A driver with a low standard deviation of lap times is more consistent than a driver with a high standard deviation of lap times.
3. Finance: In finance, standard deviation represents the level of risk associated with price fluctuations of a given asset (property, bonds, etc.). Risk is an important factor in deciding how to manage a portfolio of investments efficiently, and the standard deviation is used when estimating the risk premium.
4. Geometric interpretation: The standard deviation also has a geometric meaning. If n data values are viewed as a single point in n-dimensional space, the standard deviation is proportional to the distance from that point to the line on which all coordinates are equal.
Other applications of standard deviation include Chebyshev's inequality, which gives the minimum proportion of a population that must lie within a given number of standard deviations of the mean. It is also central to normally distributed data and to the central limit theorem.
These are some of the applications and uses of standard deviation in statistics.
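Chebyshev's inequality, mentioned above, states that at least 1 - 1/k² of a population lies within k standard deviations of the mean (for k > 1). The sketch below checks this bound on the ten data points from the earlier example, treated here as a complete population; the helper names `fraction_within` and `chebyshev_bound` are made up for this illustration.

```python
import statistics

data = [1, 2, 4, 4, 5, 5, 5, 6, 9, 9]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)  # population standard deviation


def fraction_within(k):
    """Fraction of data points within k standard deviations of the mean."""
    inside = [x for x in data if abs(x - mu) <= k * sigma]
    return len(inside) / len(data)


def chebyshev_bound(k):
    """Minimum fraction guaranteed by Chebyshev's inequality (k > 1)."""
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(k, fraction_within(k), round(chebyshev_bound(k), 3))
```

The observed fraction is never below the bound, although for well-behaved data it is usually far above it.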

## Central Tendency in Statistics

Measures of central tendency are usually described by three parameters: mean, median and mode. A measure of central tendency is a single value which describes a data set by identifying the central position within that data set. The central tendency can be measured by three terms which are:
1. Mean
2. Median
3. Mode
The mean is the most familiar measure, but all three are valid measures of central tendency. Each is more appropriate under some conditions than others. Now we will learn how to calculate these measures and under what conditions each should be used.
Mean: This is the most familiar and best known measure of central tendency. It is also known as the average or expected value. It can be used with both discrete and continuous data.
The mean is the sum of all the data values in a data set divided by the total number of values in that data set. Suppose we have n data values y1, y2, y3, ..., yn; then the mean of the data set is:
$\bar{y} = \frac{y_1 + y_2 + y_3 + \dots + y_n}{n}$
In the above formula $\bar{y}$ denotes the mean of the data set. We can also write this formula as:
$\bar{y} = \frac{\sum y}{n}$
Here ∑ is the Greek capital letter sigma, which means "sum of".
The formula above refers to the mean of a sample. Why is it called a sample mean instead of just a mean? In statistics there is a difference between samples and populations, and to make clear which one we are calculating we use the terms sample mean and population mean. If we are given a sample of data values we call the result the sample mean, and if we are given the whole population we call it the population mean. Both means are calculated in the same manner.
For the population mean we use the Greek lower case letter "mu", written µ, and N for the number of values in the population:
Population mean = µ = ∑y / N
The mean is essentially a model of the data set: a single value that is typical of the data, although it is often not one of the values actually observed. Its most useful property is that it minimizes the error in predicting any one value in the data set; that is, it produces the lowest total squared error of any single value. Another property is that its calculation uses every value in the data set. The mean is also the only measure of central tendency for which the sum of the deviations of each value from it is always zero.
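Two of these properties can be checked numerically. The sketch below reuses the ten data points from the standard deviation example. The list of `candidates` is an arbitrary set of guesses chosen for illustration, and "error" is taken to mean the sum of squared differences, which is an assumption about what the text intends.

```python
data = [1, 2, 4, 4, 5, 5, 5, 6, 9, 9]

# The mean: sum of all values divided by the number of values.
mean = sum(data) / len(data)

# Property 1: deviations from the mean always sum to zero.
deviations = [y - mean for y in data]
sum_of_deviations = sum(deviations)


def squared_error(guess):
    """Total squared error if every data point were predicted as `guess`."""
    return sum((y - guess) ** 2 for y in data)


# Property 2: among these candidate guesses, the mean has the lowest error.
candidates = [3, 4, 4.5, 5, 5.5, 6, 7]
best = min(candidates, key=squared_error)

print(mean, sum_of_deviations, best, squared_error(best))
```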
Disadvantage of mean: The main disadvantage of the mean is that it is strongly influenced by outliers. Outliers are values which are unusual compared with the rest of the data set, being especially small or especially large. In situations where this disadvantage outweighs the advantages of the mean, we use one of the other two measures of central tendency, the median or the mode. If the frequency distribution of the data set is skewed, we use the median. If the data are normally distributed, any of the three measures can be used, because for a normal distribution they are identical. The more skewed the data, the less well the mean describes the central location, and the median is then the better choice.
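The effect of an outlier is easy to see with a small made-up example. The `salaries` values below are hypothetical numbers (in thousands) chosen only to show how one extreme value moves the mean much more than the median.

```python
import statistics

# Hypothetical salaries (in thousands); the last list adds one outlier.
salaries = [30, 32, 35, 36, 38]
with_outlier = salaries + [300]

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The single outlier more than doubles the mean, while the median moves by only half a unit.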
Median: The median is another measure of central tendency. Where the mean is not an appropriate way to measure the central location of a data set, we use the median. It is the middle score of a data set that has been arranged in order of magnitude. It is less affected by outliers and skewed data than the mean. For a finite list of numbers, the median is found by arranging the list in order and taking the middle value.
When a sample or population has an even number of values there is no single middle value, so the median may not be identical to any value in the data. In that case the median is calculated as the average of the two middle values of the ordered data set. If a value that actually occurs in the data is required, the mode can be used instead.
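The rule for the median can be written as a short function. This is a minimal sketch rather than a replacement for `statistics.median`; the function name `median` and the sample inputs are chosen here for illustration.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle
    values when the number of values is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 1, 5])        # ordered: 1, 5, 7
even_case = median([16, 4, 10, 2])  # ordered: 2, 4, 10, 16

print(odd_case, even_case)
```

In the even case the result (7.0) is not one of the data values, which is the situation described above.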
Mode: The mode is also a measure of central tendency; it is the most frequently occurring value in the data set. Like the median and mean, it summarizes a finite population, data set or sample in a single value. On a histogram it corresponds to the highest bar. Unlike the median and mean, which require numerical values, the mode can be used for nominal data. In highly skewed distributions it can be very different from the mean and median. The mode may or may not be unique, depending on the data set: a data set can have more than one mode. For a normal distribution the mode coincides with the mean and the median.
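The sketch below illustrates two points from this paragraph with made-up data: the mode works on nominal (non-numeric) values, and a data set can have more than one mode. It relies on `statistics.mode` and `statistics.multimode` from the Python standard library (the latter requires Python 3.8 or newer).

```python
import statistics
from collections import Counter

# Nominal data: the mode is defined even though a mean or median is not.
colours = ["red", "blue", "red", "green", "red", "blue"]
mode_colour = statistics.mode(colours)

# Numeric data with two equally frequent values: the mode is not unique.
scores = [1, 2, 2, 3, 3, 4]
all_modes = statistics.multimode(scores)

# Counter gives the frequencies behind the "highest bar" of a histogram.
counts = Counter(colours)

print(mode_colour, all_modes, counts.most_common(1))
```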
These are the three most common measures of the central tendency of a data set.

## Dispersion

Statistics is defined as the science which deals with the collection, presentation, analysis and interpretation of numerical data. Statistics deals with groups and not with individuals. Statistical laws are not exact but are true on average.
The word dispersion has a technical meaning in statistics. The average describes one aspect of a set of observations. Another feature is how the observations are spread about the center of the set of values. The observations may be close to the center or they may be spread away from it. If the observations are close to the center, the dispersion (also called scatter or variation) is said to be small; if they are spread far from the center, the dispersion is said to be large. Dispersion is generally measured about the arithmetic mean or the median.
Statistical dispersion is also called statistical variation. A measure of statistical dispersion is a real number that is zero if all the data are identical. It cannot be less than zero. Most measures of dispersion have the same unit as the quantity being measured.
In a sample the data values are generally not all the same. The variation between these values is called dispersion. When the dispersion is large the values are widely scattered, and when it is small they are tightly clustered.
In other words, dispersion is the extent to which the observations in a population vary about their mean. It is visible in diagrams: for example, dot plots, box plots and stem-and-leaf plots are wider for samples with more dispersion. Some measures of dispersion are as follows.
· Standard deviation
· Interdecile range
· Range
· Mean difference
· Median absolute deviation
· Average absolute deviation or average deviation
· Distance standard deviation
Dispersion can be measured in several ways. These measures show to what degree the individual observations of a data set are dispersed; the most popular is the standard deviation.
· Another measure of dispersion is the range, which is the difference between the largest and the smallest observed values. Note that the range depends only on the maximum and minimum values, not on the values between them.
· The sample variance measures the spread within a set of sample data. It is the sum of the squared deviations from the average, divided by one less than the number of observations in the data set. For example, if there are n observations x1, x2, x3, …, xn, then the sample mean is
$\bar{x} = \frac{1}{n} \sum x_i$,
and the sample variance is given by
$s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2$.
· The standard deviation is another important measure of the spread of a set of data. It is calculated by taking the square root of the variance and is abbreviated as 's' for a sample. Mathematically it can be represented as
$s = \sqrt{s^2}$.
This implies that as the spread of the values about the mean increases, the standard deviation also increases.
· The coefficient of variation expresses the spread of a set of data as a proportion of its mean; it is generally expressed as a percentage.
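The four measures just described can be computed together in a few lines. The sketch below is an illustration using the ten data points from the standard deviation section, treated as a sample, so the variance divides by n - 1.

```python
import statistics

data = [1, 2, 4, 4, 5, 5, 5, 6, 9, 9]

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Sample mean, sample variance (divides by n - 1) and standard deviation.
mean = statistics.mean(data)
s2 = statistics.variance(data)
s = statistics.stdev(data)

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_percent = s / mean * 100

print(data_range, mean, round(s2, 3), round(s, 3), round(cv_percent, 1))
```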
Most of the measures of statistical dispersion listed above are location invariant and linear in scale: adding a constant to every value leaves them unchanged, and multiplying every value by a constant multiplies them by the absolute value of that constant.
Some other measures are dimensionless, or scale free, meaning that they have no units even if the variable itself has units, such as
· Coefficient of variation
· Quartile coefficient of dispersion
· Relative mean difference, which is equal to twice the Gini coefficient.
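As a rough illustration of "linear in scale" versus "scale free", the sketch below expresses the same made-up heights in metres and in centimetres. The data and the helper `coefficient_of_variation` are assumptions introduced for this example.

```python
import statistics

# Hypothetical heights, first in metres and then converted to centimetres.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
heights_cm = [h * 100 for h in heights_m]
shifted_m = [h + 10 for h in heights_m]  # every value moved by a constant


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (a unit-free ratio)."""
    return statistics.stdev(values) / statistics.mean(values)


sd_m = statistics.stdev(heights_m)
sd_cm = statistics.stdev(heights_cm)
sd_shifted = statistics.stdev(shifted_m)

cv_m = coefficient_of_variation(heights_m)
cv_cm = coefficient_of_variation(heights_cm)

print(round(sd_m, 4), round(sd_cm, 2), round(sd_shifted, 4))
print(round(cv_m, 4), round(cv_cm, 4))
```

The standard deviation is multiplied by 100 when the unit changes and is unaffected by the shift, while the coefficient of variation is the same in both units.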
Other measures of dispersion that are not linear in scale include:
· Variance, which is the square of the standard deviation
· Variance-to-mean ratio.
For nominal data, an index of qualitative variation is used as the measure of statistical dispersion.
A quantity that measures dispersion in a population is known as a measure of dispersion. Measures of dispersion are classified as absolute or relative. An absolute measure of dispersion is expressed in the same units as the data, or in the square of those units. A relative measure of dispersion is expressed as a ratio, coefficient or percentage and does not depend on the units of measurement.
Variance is the average of the squared deviations from the mean.
Standard deviation is the square root of the variance.

## Interpreting and Understanding Standard Deviation

In statistics the standard deviation of a population is represented by the symbol sigma (σ). The standard deviation describes how far the data deviate from the average, or mean. It is equal to the square root of the variance. Here we will see the process of interpreting standard deviation.
The z-score is also related to the standard deviation: it gives the distance of a value from the mean of the data, measured in standard deviations.

The standard deviation is given by the formula below:

S = $\sqrt{\frac{\sum (x - \bar{x})^{2}}{N}}$

where 'S' represents the standard deviation;

'x' represents each value in the data set;

'$\bar{x}$' denotes the mean of the values;

'N' denotes the number of values.

Because it divides by N rather than N - 1, this is the population form of the standard deviation.

Now we will see how to calculate the standard deviation, following these steps:

Step 1: Take the data items for which the standard deviation is required.
Step 2: Find the mean of the data values.
Step 3: Subtract the mean from each original value and square each difference.
Step 4: Put the sum of these squared differences into the formula to get the value of the standard deviation.

Suppose we have the data values 16, 4, 10 and 2; then we can find the standard deviation as shown below.

The formula for the standard deviation is:

S = $\sqrt{\frac{\sum (x - \bar{x})^{2}}{N}}$

where $\bar{x}$ is the mean of the data and N is the number of values. Now we can calculate the standard deviation step by step.

Step 1: The data values are 16, 4, 10 and 2, so N = 4.

Step 2: Find the mean of the given data:

$\bar{x}$ = $\frac{16+4+10+2}{4}$ = $\frac{32}{4}$ = 8

So the mean value is 8.

Step 3: Calculate $x - \bar{x}$ for each data value:
$x_1 - \bar{x}$ = 16 - 8 = 8;
$x_2 - \bar{x}$ = 4 - 8 = -4;
$x_3 - \bar{x}$ = 10 - 8 = 2;
$x_4 - \bar{x}$ = 2 - 8 = -6;

Then calculate $\sum (x_i - \bar{x})^2$:
$\sum (x_i - \bar{x})^2$ = $(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2$

= $(8)^2 + (-4)^2 + (2)^2 + (-6)^2$
= 64 + 16 + 4 + 36 = 120;

Step 4: Put all the values in the formula:

S = $\sqrt{\frac{\sum (x - \bar{x})^{2}}{N}}$

S = $\sqrt{\frac{120}{4}}$

S = $\sqrt{30}$

≈ 5.48

The above process helps us in understanding standard deviation.
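The worked example can be reproduced with a short Python sketch. It follows the same steps for the data values 16, 4, 10 and 2 and divides by N, as the formula in this section does; the comparison with `statistics.pstdev` (the population standard deviation in the standard library) is added here only as a cross-check.

```python
import math
import statistics

# Step 1: the data values.
data = [16, 4, 10, 2]
n = len(data)

# Step 2: the mean.
mean = sum(data) / n

# Step 3: deviations from the mean and the sum of their squares.
deviations = [x - mean for x in data]
sum_sq = sum(d ** 2 for d in deviations)

# Step 4: divide by N and take the square root.
s = math.sqrt(sum_sq / n)

print(mean, deviations, sum_sq, round(s, 3))
print(math.isclose(s, statistics.pstdev(data)))
```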

## Position

Position in statistics indicates where an element stands relative to the other values in a set of observations. The most common measures of position are:
· Percentiles
· Quartiles
· Standard score
Let's have a short introduction to each of these measures of position.
Percentiles: To find percentiles, the elements in a data set are first ranked, or ordered, from smallest to largest. The values which divide the ordered set into 100 equal parts are known as percentiles. An element with a percentile rank of Pi has a value larger than i percent of all the elements in the set. For example, the observation at the 60th percentile is written P60, and it is larger than 60 percent of the observations in the set. The observation at the 50th percentile corresponds to the median of the set.

Quartile:
Quartiles divide a rank-ordered data set into four equal parts. The values which separate the parts are called the first, second and third quartiles, written Q1, Q2 and Q3 respectively. In terms of percentiles, Q1 corresponds to P25, Q2 corresponds to P50 and Q3 corresponds to P75; Q2 is the median value of the set.
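The link between quartiles and percentiles (Q1, Q2 and Q3 are the 25th, 50th and 75th percentiles) can be checked with `statistics.quantiles` from the Python standard library (Python 3.8 or newer). This is only a sketch: the data set is reused from the standard deviation section, and textbooks and software use several slightly different interpolation rules, so the quartile values of a small data set can differ between tools.

```python
import statistics

data = [1, 2, 4, 4, 5, 5, 5, 6, 9, 9]

# n=4 returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

# n=100 returns the 99 cut points P1 ... P99 (index i - 1 holds Pi).
percentiles = statistics.quantiles(data, n=100)
p25, p50, p75 = percentiles[24], percentiles[49], percentiles[74]

print("quartiles:  ", q1, q2, q3)
print("percentiles:", p25, p50, p75)
print("median:     ", statistics.median(data))
```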
Now we will look at the concept of the standard score, or z-score.
The standard score or z-score tells how many standard deviations a value lies from the mean. The z-score is calculated by the following formula:
$Z = \frac{X - \mu}{\sigma}$, where Z is the z-score, X is the value, µ is the population mean and σ is the population standard deviation.
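A minimal sketch of the z-score calculation follows. The function name `z_score`, the example values (130 with mean 100 and standard deviation 15) and the small data set are all assumptions chosen for illustration.

```python
import statistics


def z_score(x, mu, sigma):
    """Distance of x from the mean mu, measured in standard deviations."""
    if sigma == 0:
        raise ValueError("z-score is undefined when the standard deviation is 0")
    return (x - mu) / sigma


# Hypothetical single value: 130 on a scale with mean 100 and sd 15.
example = z_score(130, 100, 15)

# z-scores for a whole data set, using its population mean and sd.
data = [16, 4, 10, 2]
mu = statistics.mean(data)
sigma = statistics.pstdev(data)
z_values = [z_score(x, mu, sigma) for x in data]

print(example)
print([round(z, 3) for z in z_values])
```

Once converted, the z-scores of a data set have a mean of 0 and a standard deviation of 1, which is what makes values from different scales comparable.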
This covers the common measures of position in statistics.