Statistics

We are surrounded by information around the clock, and it reaches us in many different ways and forms. How many calories did each of us eat for breakfast? How far from home did everyone travel today? Making sense of all this information requires certain tools and ways of thinking. The mathematical science called statistics helps us deal with this information overload.

Statistics is the study of how to collect, organize, analyze, and interpret numerical information from data. It covers all of these activities, including the planning of data collection through the design of surveys and experiments. Statistics is generally considered a distinct mathematical science rather than a branch of mathematics. Data quality can be improved by developing specific experimental designs and survey samples.

Statisticians apply statistical thinking to a wide variety of scientific, social, and business endeavors in areas such as astronomy, biology, education, economics, engineering, genetics, marketing, medicine, psychology, public health, and sports, among many others. "The best thing about being a statistician is that you get to play in everyone else's backyard."

Statistics Definition

Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data. It is a type of mathematical analysis involving the use of quantified representations, models, and summaries for a given set of real-world observations. The word statistics can be either singular or plural. In its singular form, it refers to the mathematical science; in its plural form, it refers to quantities (such as the mean) calculated from a set of data.

Statistical Topics

Given below is a list of statistical topics:

Actuarial science - The discipline that applies mathematical and statistical methods to assess risk in the insurance and finance industries.
ANOVA - Analysis of variance, a statistical test of whether or not the means of several groups are all equal.
Autocorrelation - A mathematical representation of the degree of similarity between a given time series and a lagged version of itself over successive time intervals.
Autoregression - A stochastic process used in statistical calculations in which future values are estimated from a weighted sum of past values.
Asymptotic distribution - A hypothetical distribution that is, in a sense, the limiting distribution of a sequence of distributions.
Bar chart - A chart with rectangular bars whose lengths are proportional to the values they represent. It shows comparisons among categories.
Biostatistics - The application of statistics to a wide range of topics in biology.
Categorical data - Data sorted into different categories according to the attributes of the data.
Confidence interval - A type of interval estimate of a population parameter, used to indicate the reliability of an estimate.
Correlation - The degree to which two numerically valued random variables change in value together.
Data - Values of qualitative or quantitative variables belonging to a set of items.
Data mining - The process of analyzing data from different perspectives and summarizing it into useful information.
Deviation - The difference between the value of an observation and the mean of the population.
Estimator - A rule for calculating an estimate of a given quantity based on observed data.
Factor analysis - A method that identifies underlying variables that explain the pattern of correlations within a set of observed variables.
Five number summary - A descriptive statistic that summarizes a set of observations with five values: minimum, first quartile, median, third quartile, and maximum.
Frequency distribution - A statistical table that distributes the total frequency among a number of classes.
Goodness of fit - A description of how well a statistical model fits a set of observations.
Histogram - A graph that shows the frequency of the values of a variable within successive ranges.
Kurtosis - A statistical measure that describes the shape of the distribution of observed data around the mean, in particular how heavy its tails are.
Latin square design - A design used where the researcher wants to control the variation in an experiment that is related to rows and columns in the field.
Lorenz curve - A curve that shows the degree of inequality that exists between the distributions of two variables.
Mean - The sum of a collection of numbers divided by the number of numbers in the collection.
Probability - The likelihood that a particular event will happen. Probability can be expressed as a fraction, ratio, or percentage.
Population mean - The mean of a numerical set that includes all the numbers within the entire group.
Range - The difference between the largest and the smallest values within a numerical set.
Standard deviation - A measure of the dispersion of a set of data from its mean.
Variance - A measure of how far a set of numbers is spread out from its mean.
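
Several of the quantities defined above (mean, deviation, range, variance, standard deviation) can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the data set is hypothetical and was chosen only so that the results are round numbers. It treats the data as a whole population, which is an assumption of this example.

```python
import statistics

# Hypothetical data set, treated here as an entire population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # sum of the values / number of values
deviations = [x - mean for x in data]   # each observation minus the mean
data_range = max(data) - min(data)      # largest value minus smallest value
variance = statistics.pvariance(data)   # mean of the squared deviations
std_dev = statistics.pstdev(data)       # square root of the variance

print("mean:", mean)
print("range:", data_range)
print("variance:", variance)
print("standard deviation:", std_dev)
```

For this data the mean is 5, the range is 7, the variance is 4, and the standard deviation is 2. For a sample rather than a population, `statistics.variance` and `statistics.stdev` would be the corresponding calls.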

Statistical Methods

Statistical methods are used to summarize, analyze, and represent a collection of data. Because scientists rarely observe an entire population, sampling and statistical inference are essential. Strong emphasis is placed on choosing methods of statistical inference that are appropriate to the data.
  • Data sample and experimental design
  • Level of measurement

Data sample and experimental design

A data sample is a set of data collected from a population by a defined procedure. Data are collected to estimate the values of characteristics of the parent population and to conduct hypothesis tests. An experiment attempts to establish a cause-and-effect relationship between two or more variables; experiments often aim to compare two subpopulations and determine whether there is a significant difference between them. How can samples be chosen from populations? Choosing a random sample is not easy, as there are two types of error associated with it: sampling errors and non-sampling errors. Sampling errors are due to the chance variation that arises from observing a sample rather than the whole population, while non-sampling errors are due to variation associated with improper sampling.
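
To make the idea of sampling error concrete, here is a minimal sketch of drawing a simple random sample in Python. The population (the integers 1 to 1000), the sample size of 50, and the seed are all hypothetical choices made for illustration; they do not come from any real study.

```python
import random
import statistics

# Hypothetical population: the integers 1..1000 stand in for measured values.
population = list(range(1, 1001))

rng = random.Random(42)                 # fixed seed so the run is repeatable
sample = rng.sample(population, k=50)   # simple random sample, without replacement

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

# The gap between the two means is the sampling error for this particular sample.
sampling_error = sample_mean - population_mean

print("population mean:", population_mean)
print("sample mean:", sample_mean)
print("sampling error:", sampling_error)
```

Changing the seed gives a different sample and therefore a different sampling error, which is the chance variation described above. Non-sampling errors, such as drawing only from part of the population, are not modeled in this sketch.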

In an experiment, we change one or more process variables to observe the effect the changes have on one or more response variables. Experimental design is an efficient procedure for planning experiments so that the data obtained can be analyzed to yield valid and objective conclusions. A good experimental design maximizes the amount of information obtained for a given amount of experimental effort.

Level of measurement

Level of measurement refers to the relationship among the values that are assigned to the attributes for a variable.

Nominal - Values assigned to variables represent descriptive categories and have no inherent numerical value with respect to magnitude. No ordering of cases is implied. For example, jersey numbers in basketball are measured at the nominal level. A player with number 30 is not more of anything than a player with number 15, and is certainly not twice whatever number 15 is. The nominal level is the weakest level of measurement.

Ordinal - Ordinal refers to order in measurement and indicates direction. Attributes can be ordered, but the differences between attributes do not have any meaning.
Examples:
Rank - First, second, ..., last
Level of agreement - No, maybe, yes
Political orientation - Left, center, right

Interval - The interval (or cardinal) scale has equal units of measurement, which makes it possible to interpret not only the order of scale scores but also the distance between them; distance is meaningful.
Example: Time of day on a 12-hour clock.
When we measure temperature, the distance from 30 to 40 degrees is the same as the distance from 70 to 80 degrees. Ratios don't make sense in interval measurements: 80 degrees is not twice as hot as 40 degrees.
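
One way to see why ratios are not meaningful on an interval scale is to change units. The sketch below assumes the temperatures in the example are in degrees Fahrenheit (the text does not say) and converts them to degrees Celsius. The helper function name is chosen for this illustration.

```python
def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Assumption: the values 40 and 80 from the text are degrees Fahrenheit.
low_f, high_f = 40.0, 80.0
low_c = fahrenheit_to_celsius(low_f)
high_c = fahrenheit_to_celsius(high_f)

ratio_f = high_f / low_f   # 2.0 when measured in Fahrenheit
ratio_c = high_c / low_c   # about 6.0 for the same two temperatures in Celsius

# Equal distances remain equal after the conversion.
diff_a = fahrenheit_to_celsius(40.0) - fahrenheit_to_celsius(30.0)
diff_b = fahrenheit_to_celsius(80.0) - fahrenheit_to_celsius(70.0)

print("ratio in Fahrenheit:", ratio_f)
print("ratio in Celsius:", round(ratio_c, 6))
print("differences equal:", abs(diff_a - diff_b) < 1e-9)
```

The ratio depends on the unit because neither scale has an absolute zero, while the equality of the two 10-degree differences survives the conversion.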

Ratio - The ratio scale is the highest level of measurement. In addition to the properties of the nominal, ordinal, and interval scales, a ratio scale has an absolute zero (a point where none of the quality being measured exists). Comparisons such as twice as high or one-half as much can be made.
Examples:
Time elapsed since midnight - 14 o'clock is twice as long from midnight as 7 o'clock.
Length measured with a ruler - Inches, centimeters

Statistics Problem

Given below are some of the example problems in statistics.

Solved Examples

Question 1: Find the mean and median of the data given below:
9, 12, 14, 17, 25, 34, 45, 72, 89, 95
Solution:
Mean = $\sum_{i=1}^{n}\frac{x_{i}}{n}$
= $\frac{9 + 12 + 14 + 17 + 25 + 34 + 45 + 72 + 89 + 95}{10}$
= 41.2

Median: The data are in ascending order and contain an even number of values (10), so the median is the average of the two middle values, 25 and 34.
Median = $\frac{25+34}{2}$ = 29.5
Therefore, the mean and median are 41.2 and 29.5 respectively.
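
The result can be checked with a few lines of Python. This is only a verification sketch using the standard `statistics` module, not part of the hand calculation.

```python
import statistics

data = [9, 12, 14, 17, 25, 34, 45, 72, 89, 95]

mean = statistics.mean(data)      # 412 / 10
median = statistics.median(data)  # average of the two middle values, 25 and 34

print(mean, median)  # 41.2 29.5
```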

Question 2: Given below are the marks of 20 students in a statistics paper. The maximum mark for the paper is 30. Construct a tally chart for the given data.
25, 25, 19, 19, 19, 18, 17, 29, 30, 29, 29, 30, 30, 26, 20, 15, 18, 25, 26, 27
Solution:
First, arrange the given data in ascending order, and then construct the tally chart.
Marks of students in Statistics
 Marks  Tally marks  Frequency
 15     |            1
 17     |            1
 18     ||           2
 19     |||          3
 20     |            1
 25     |||          3
 26     ||           2
 27     |            1
 29     |||          3
 30     |||          3
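
The same tally can be produced programmatically. The sketch below uses `collections.Counter` from the Python standard library to count each mark; the column widths in the printout are arbitrary formatting choices.

```python
from collections import Counter

marks = [25, 25, 19, 19, 19, 18, 17, 29, 30, 29,
         29, 30, 30, 26, 20, 15, 18, 25, 26, 27]

# Count how many times each mark occurs.
frequency = Counter(marks)

# Print one row per distinct mark, in ascending order.
print(f"{'Marks':<7}{'Tally':<7}{'Frequency'}")
for mark in sorted(frequency):
    count = frequency[mark]
    print(f"{mark:<7}{'|' * count:<7}{count}")
```

The frequencies sum to 20, the number of students, which is a quick check that no mark was missed.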