Statistical graphs are a way to represent statistical data in pictorial form, because pictorial figures are more noticeable and more easily understood than numerical ones and leave a more lasting impression on the observer. With the help of these graphs, data can be compared easily. There are many types of statistical graphs in mathematics, among them the bar graph, the histogram and the frequency polygon.

A bar graph is a visual display used to compare the amounts or frequencies of occurrence of different characteristics of data. This type of display lets us compare different groups of data and make predictions very quickly.

A bar graph is largely self-explanatory, which makes it one of the most common forms of graphical representation. It has several components, each with its own importance. We must remember the following names used with a bar graph:

**1. Graph Title:** It gives an overview of the information in the graph, and so it is always displayed at the top of the graph.

**2. Axes and their labeling:** There are two axes, the X-axis and the Y-axis. The labels on the axes tell us which field is represented on which axis, which makes the graph self-explanatory.

**3. Grouped Data Axis:** It is always placed at the bottom of the bars and shows the type of data being displayed.

Graphs can be drawn in different ways according to the needs of the presentation. Some of the common graphs are the bar graph, line graph, histogram and pie chart. A pie chart, also called a circular chart, divides a whole circle into sectors of different angles. In a pie chart the arc length of each sector, and therefore its central angle and its area, is proportional to the quantity it represents. The angle is measured in degrees. The first pie chart is credited to William Playfair.

Pie charts are most commonly seen in the business world, in election statistics and in the mass media. It is difficult, however, to compare the rise and fall of quantities such as profit, loss or income using pie charts, so many people advise against them. Even so, pie charts are preferred to tables and other charts in many comparison situations. Where pie charts do not work, for example with two-dimensional data, we use dot plots or bar charts instead.

Pie charts are a useful tool when we compare data in the form of fractions, that is, the different parts of a whole amount. They are most often used to represent financial information. Suppose a company 'X' has to present its annual expenses under different heads such as salaries, infrastructure, taxation, raw material, power, running costs and so on. The expenses are summed, and each head is divided by the total and then multiplied by 360 degrees. This gives the angle of each sector. Each sector then visually represents one item in the data set, sized to match that item's fraction of the whole.
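The angle calculation described above can be sketched in a few lines of Python. The expense heads and amounts below are hypothetical figures chosen only for illustration, and `sector_angles` is a name made up for this sketch.

```python
def sector_angles(expenses):
    """Return the central angle (in degrees) of each pie sector.

    Each amount is divided by the total and multiplied by 360.
    """
    total = sum(expenses.values())
    return {head: amount / total * 360 for head, amount in expenses.items()}


# Hypothetical annual expenses (illustrative values, not from the text).
expenses = {
    "salaries": 50,
    "infrastructure": 20,
    "taxation": 10,
    "raw material": 15,
    "power": 5,
}

angles = sector_angles(expenses)
for head, angle in angles.items():
    print(f"{head}: {angle:.1f} degrees")
```

Whatever the amounts are, the angles always add up to 360 degrees, which is a quick check on the arithmetic.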

Box plots in statistics are a method of representing numerical data graphically. They consist of 5 parts:

1. Smallest observation (minimum)

2. Lower quartile

3. Median

4. Upper quartile

5. Largest observation (maximum)

Box plots also show the values that lie far away from the rest of the data. They are mainly used to describe differences between populations without assuming any particular statistical distribution. The gaps between the different parts of the box describe the spread of the data. A box plot can be drawn horizontally as well as vertically. In every box plot the ends of the box are the lower quartile and the upper quartile, that is, the 25th and 75th percentiles of the data. The box therefore contains the middle 50% of the values, and the median is the 50th percentile.
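As a rough sketch, the five parts of a box plot can be computed with Python's standard `statistics` module. There are several conventions for computing quartiles. This example assumes the "inclusive" method of `statistics.quantiles`, and the sample values are illustrative only.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum).

    Assumption: quartiles use the "inclusive" interpolation method;
    other conventions give slightly different quartile values.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Illustrative sample values.
data = [22, 25, 28, 35, 44, 49, 63, 67, 69, 71, 73, 78]

summary = five_number_summary(data)
labels = ["minimum", "lower quartile", "median", "upper quartile", "maximum"]
for label, value in zip(labels, summary):
    print(f"{label}: {value}")
```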

The lines extending from the box (the whiskers) can represent any of the following:

1. The minimum and the maximum of the whole data.

2. One standard deviation above and below the mean of the data.

3. Any chosen percentile of the data.

Values that are not covered by the box and the lines extending from it, the outliers, are plotted as a dot, a small circle or a star. Some box plots use an additional character to mark the mean of the data. On some box plots a cross-hatch is placed on each line extending from the box, before its end.

There are two common variations of the box plot:

1. Variable width box plots

2. Notched box plots

Variable width box plots show the size of each group by making the width of the box proportional to the size of the group. Sometimes the width of the box is made proportional to the square root of the size of the group.

Notched box plots narrow the box around the median to give a rough indication of whether the medians of different groups differ. The width of the notches is proportional to the interquartile range (IQR) and inversely proportional to the square root of the size of the data, n. The notch extends over

median ± 1.58 × IQR / √n
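The notch formula above can be evaluated as in the following sketch. It assumes the "inclusive" quartile method of Python's `statistics.quantiles`. The function name `notch_limits` and the sample values are illustrative only.

```python
import math
import statistics


def notch_limits(data):
    """Return (lower, upper) notch limits: median ± 1.58 * IQR / sqrt(n)."""
    # Assumption: quartiles use the "inclusive" interpolation method.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    half_width = 1.58 * iqr / math.sqrt(len(data))
    return median - half_width, median + half_width


# Illustrative sample values.
data = [22, 25, 28, 35, 44, 49, 63, 67, 69, 71, 73, 78]

lower, upper = notch_limits(data)
print(f"notch from {lower:.2f} to {upper:.2f}")
```

Because the half-width is divided by the square root of n, the notch gets narrower as the group gets larger.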

The box plot is a very fast way of evaluating data graphically. The box plot was introduced later than the histogram.

Advantages of box plots:

1. They take up less space.

2. They are most useful for comparing data sets.

3. The width and the length of the box plot can be chosen by its user.

It is not easy to analyze a large amount of data manually. To deal with this kind of problem we use graphical methods from statistics and probability, one of which is called a stem and leaf plot. With a stem and leaf plot we can easily see the relative density of the data and the value of any particular data point, so it gives a quick summary of the data. We now discuss the method for creating a stem and leaf plot.

We use the following steps to create a stem and leaf plot:

**Step 1:** First, we look over all the data values. For example, if the data values run from 25 to 100, we check whether any number is repeated and, if so, how many times.

**Step 2:** Next we take the tens digit of each value. For example, for 61 we take 6. This tens digit is called the stem.

**Step 3:** After finding the stems, we collect the remaining digits of all the values that share a stem. For example, if the data values are 42, 43, 47 and 49, the stem is 4 and the digits 2, 3, 7 and 9 belong to it. These digits are called leaves.

Suppose we have the following data values:

22 25 28 44 49 35 67 63 69 71 73 78

Then:

**Step 1:** First we look over these numbers.

No number is repeated here, so we go to the next step.

**Step 2:** Now we find the stem values of the data above:

2, 3, 4, 6, 7

These are the stem values because they are the tens digits of the data values above.

**Step 3:** Now we find the values that belong to each stem, that is, the leaf values:

2 | 2 5 8

3 | 5

4 | 4 9

6 | 3 7 9

7 | 1 3 8

The values on the right-hand side are called leaves because they are attached to the stems shown on the left-hand side. This is why the display is called a stem and leaf plot.
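The three steps can be sketched in Python as follows. The sketch assumes non-negative whole numbers, with the last digit as the leaf and everything before it as the stem. `stem_and_leaf` is a name chosen for this example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group values by stem (all digits but the last), listing the leaves in order."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 61 -> stem 6, leaf 1
        plot[stem].append(leaf)
    return dict(plot)


# The data values from the example above.
data = [22, 25, 28, 44, 49, 35, 67, 63, 69, 71, 73, 78]

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Sorting the values first means the leaves of each stem come out in ascending order, which makes repeated values easy to spot.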

Data is much easier to understand in graphical form than as raw numbers, so we use different graphical methods in probability and statistics to represent large amounts of data. Here we discuss a graphical method based on frequencies, called the cumulative frequency graph. With the help of a cumulative frequency graph we can easily find the median of the data and percentages of the data, and we can see at a glance which values are larger and which are smaller. We now discuss the whole method for creating a cumulative frequency graph.

We use the following steps to create a cumulative frequency graph:

Step 1: First we check whether the frequency table is in ascending or descending order of value. If it is in no order, we sort it in ascending order, because that makes the cumulative frequencies easy to compute.

Step 2: Once the table is ordered, we add the frequencies one by one to get the cumulative total. Suppose we have the frequencies

A - 4

B - 5

C - 3

D - 2

Now we add them one by one:

A - 4,

B - 4 + 5 = 9,

C - 9 + 3 = 12,

D - 12 + 2 = 14.

So the cumulative total for the frequencies above is 14.

Step 3: After the first two steps, we plot the graph of cumulative frequency against value. Each running total computed in Step 2 is the cumulative frequency of the corresponding value. For the frequencies above:

| Value | Cumulative frequency |
| --- | --- |
| A | 4 |
| B | 9 |
| C | 12 |
| D | 14 |

Generally, in a cumulative frequency graph the values go on the horizontal axis and the cumulative frequencies on the vertical axis.
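The running totals in this small example can be reproduced with `itertools.accumulate` from the Python standard library. This is only a sketch, and the variable names are chosen for illustration.

```python
from itertools import accumulate

# Values and frequencies from the example above, already in order.
values = ["A", "B", "C", "D"]
frequencies = [4, 5, 3, 2]

# accumulate() yields the running sum after each frequency is added.
cumulative = list(accumulate(frequencies))
table = dict(zip(values, cumulative))

for value, running_total in table.items():
    print(value, running_total)
```

The last running total is the cumulative total of all the frequencies.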

Suppose we have the following frequency table and have to draw its cumulative frequency graph:

| Marks | Frequency |
| --- | --- |
| 21-30 | 11 |
| 41-50 | 15 |
| 11-20 | 2 |
| 31-40 | 19 |
| 71-80 | 14 |
| 91-100 | 40 |
| 81-90 | 6 |
| 51-60 | 42 |
| 61-70 | 31 |

Step 1: First we examine the frequency table. It is not in any proper order, so we sort it.

The frequency table in ascending order:

| Marks | Frequency |
| --- | --- |
| 11-20 | 2 |
| 21-30 | 11 |
| 31-40 | 19 |
| 41-50 | 15 |
| 51-60 | 42 |
| 61-70 | 31 |
| 71-80 | 14 |
| 81-90 | 6 |
| 91-100 | 40 |

Step 2: Once the table is ordered, we add the frequencies one by one to get the cumulative total.

| Marks | Frequency | Cumulative total |
| --- | --- | --- |
| 11-20 | 2 | 2 |
| 21-30 | 11 | 2 + 11 = 13 |
| 31-40 | 19 | 13 + 19 = 32 |
| 41-50 | 15 | 32 + 15 = 47 |
| 51-60 | 42 | 47 + 42 = 89 |
| 61-70 | 31 | 89 + 31 = 120 |
| 71-80 | 14 | 120 + 14 = 134 |
| 81-90 | 6 | 134 + 6 = 140 |
| 91-100 | 40 | 140 + 40 = 180 |

So the cumulative total of the frequency table above is 180.

Step 3: Each running sum calculated in the step above is the cumulative frequency of the corresponding class.

For each class:

| Marks | Frequency | Cumulative total | Cumulative frequency |
| --- | --- | --- | --- |
| 11-20 | 2 | 2 | 2 |
| 21-30 | 11 | 2 + 11 = 13 | 13 |
| 31-40 | 19 | 13 + 19 = 32 | 32 |
| 41-50 | 15 | 32 + 15 = 47 | 47 |
| 51-60 | 42 | 47 + 42 = 89 | 89 |
| 61-70 | 31 | 89 + 31 = 120 | 120 |
| 71-80 | 14 | 120 + 14 = 134 | 134 |
| 81-90 | 6 | 134 + 6 = 140 | 140 |
| 91-100 | 40 | 140 + 40 = 180 | 180 |

Now we draw the cumulative frequency graph of marks against cumulative frequency, with the marks on the horizontal axis and the cumulative frequency on the vertical axis.

With the help of this cumulative frequency graph we can easily find the median of the data. In the cumulative frequency table above:

The median position is 180 / 2 = 90, the midpoint of the cumulative frequency. The cumulative frequency reaches 89 at the end of the class 51-60 and 120 at the end of the class 61-70, so the 90th observation, and hence the median, lies in the class 61-70. On the graph, the median is the mark on the horizontal axis that corresponds to a cumulative frequency of 90.
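The median lookup can be sketched in Python: walk through the cumulative frequencies and stop at the first class whose running total reaches half of the cumulative total. The frequencies are those of the cumulative table above, and `median_class` is a name chosen for this example.

```python
from itertools import accumulate

# Classes and frequencies as used in the cumulative table above.
classes = ["11-20", "21-30", "31-40", "41-50", "51-60",
           "61-70", "71-80", "81-90", "91-100"]
frequencies = [2, 11, 19, 15, 42, 31, 14, 6, 40]


def median_class(classes, frequencies):
    """Return the first class whose cumulative frequency reaches total / 2."""
    if not frequencies:
        raise ValueError("empty frequency table")
    cumulative = list(accumulate(frequencies))
    half = cumulative[-1] / 2  # 180 / 2 = 90 for the table above
    for label, running_total in zip(classes, cumulative):
        if running_total >= half:
            return label


print(median_class(classes, frequencies))
```

This finds only the class that contains the median. Reading a single median mark off the graph, or interpolating within the class, is a further step.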

We can easily answer many questions from the cumulative frequency table above, such as how many students scored more than 40 marks and how many scored more than 90 marks. The answers are as follows.

Number of students who scored more than 40 marks

= [cumulative total] - [students who scored 40 marks or fewer]

= 180 - (2 + 11 + 19)

= 180 - 32

= 148

So there are 148 students who scored more than 40 marks.

Now we calculate the number of students who scored more than 90 marks.

From the cumulative frequency table, 180 - 140 = 40 students scored more than 90 marks.
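Both counts follow the same pattern: subtract the frequencies of the classes at or below the threshold from the cumulative total. A minimal sketch follows, assuming the threshold coincides with the upper limit of a class. `count_above` is a name chosen for this example.

```python
# Upper limits of the classes and their frequencies, as in the cumulative table above.
upper_limits = [20, 30, 40, 50, 60, 70, 80, 90, 100]
frequencies = [2, 11, 19, 15, 42, 31, 14, 6, 40]


def count_above(threshold, upper_limits, frequencies):
    """Count observations in classes lying entirely above `threshold`.

    Assumption: `threshold` is the upper limit of one of the classes,
    so no class is split by it.
    """
    total = sum(frequencies)
    at_or_below = sum(
        freq for limit, freq in zip(upper_limits, frequencies) if limit <= threshold
    )
    return total - at_or_below


print(count_above(40, upper_limits, frequencies))
print(count_above(90, upper_limits, frequencies))
```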

So, with the help of a cumulative frequency plot, we can easily answer many questions about the data, such as finding the median or counting the observations above or below a given value.

Experimental data is a set of raw data arranged in an organized form. Before any analysis can start, we first need data to process, so the collection of data is the basic requirement of any statistical study.

Organizing experimental data is necessary in order to reach conclusions, and the collected data must be correct, otherwise it may lead to wrong conclusions and results. First the raw data is collected; this is the source data, not yet processed, organized or manipulated. Processing and rearranging the experimental data is very important.

Raw data is unprocessed source data. It is then converted into output data, which is the processed and organized form of the data. This form of data helps us reach conclusions. Raw data processing is therefore a basic requirement of most surveys and experiments.

Statistical data sets thus form the basis on which conclusions are drawn. Processed data of this kind helps us to analyze, draw conclusions and make predictions.

Statistical data sets may record as much information as an experiment requires. For instance, if we are studying the relationship between height and age, we need to collect only these two parameters as raw data rather than all the other information about the group. On the other hand, if we record height together with date of birth, weight and family background in order to study a person's diet, the data collected will be more descriptive. Certain things are common to all statistical data; for example, the order of the data does not matter, meaning that the arrangement of the data within the data set is of little importance. What we interpret from the collected data matters more, as it is the output and the prediction drawn from the data. We call this analysis. It is not possible to analyze anything without first collecting raw data. Raw data is collected and processed, and then it helps us to predict and analyze. Such data sets thus support most research work, whether in medicine, information technology or social science.

Huge statistical data sets are already available for many areas.

A particular experimental data set can be used for a number of research studies. Census data, for example, contains comprehensive raw data about the demographics of a country, which can then be used by social scientists to study family structures, incomes and living costs.

An experimental data set is therefore not an end in itself; sometimes it works as the starting point for specific surveys. Data collection depends on the person or group of people collecting it for a specific purpose.

We represent data in different forms of graph, which help us to predict and to reach conclusions easily and quickly. For diagrammatic and pictorial presentation of data we use different types of figures and charts. These are all non-quantitative modes of presentation, but they are a good way to analyze data, because the visual presentation shows the proportional sizes of the figures. The uses of presenting data in pictorial form are as follows:

1. These presentations are an elegant and attractive way to represent figures, so they have a good visual impact.

2. They make reading large figures simple and easy.

3. They save time in comparison and analysis.

Here we define the histogram.

A histogram is a representation of a given frequency distribution by means of rectangles whose widths, usually constant, represent the class intervals and whose areas are proportional to the corresponding frequencies. To draw a histogram, the X-axis is first divided into intervals. We tabulate the data in these intervals, so the graph shows intervals rather than individual numbers on the axis.

A histogram is a kind of bar diagram. If the data is tabulated as intervals and frequencies, it is easy to draw. First we mark the class intervals on the X-axis and the frequencies on the Y-axis. For each class interval we mark the corresponding frequency and draw a rectangle over the interval up to that height. The same is repeated for all intervals, so that adjacent vertical rectangles are drawn side by side. The resulting rectangles are proportional to their frequencies.
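The tabulation into equal class intervals can be sketched as follows. The raw values, the starting point and the class width are hypothetical, and `tabulate` is a name chosen for this example. The text bars only stand in for the rectangles of a real histogram.

```python
def tabulate(values, start, width, n_classes):
    """Count how many values fall in each of `n_classes` equal-width intervals.

    Interval i covers [start + i * width, start + (i + 1) * width).
    Values outside all intervals are ignored.
    """
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


# Hypothetical raw data, tabulated into the intervals 0-10, 10-20, 20-30.
values = [3, 7, 12, 15, 18, 21, 29]
counts = tabulate(values, start=0, width=10, n_classes=3)

for i, count in enumerate(counts):
    print(f"{i * 10:>2}-{(i + 1) * 10:<2} | {'#' * count}")
```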

While drawing a histogram we must always check whether all the class intervals are equal, and if they are not, we adjust for the difference. We thus come across two types of histogram: 1. histograms with equal class intervals and 2. histograms with unequal class intervals. When the class intervals are equal, we take the frequency on the Y-axis and the variable on the X-axis and construct adjacent rectangles.

In such cases the height of each rectangle is proportional to its frequency. When the class intervals are unequal, a correction must be made. The correction consists of finding, for each class, the frequency density or the relative frequency density. The frequency density is the frequency of the class divided by the width of the class. A histogram constructed from these density values has the same general appearance as the corresponding display built from equal class intervals.
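A short sketch of the frequency density correction follows. The class boundaries and frequencies are hypothetical, and `frequency_densities` is a name chosen for this example.

```python
def frequency_densities(classes):
    """Return frequency / class width for each (lower, upper, frequency) class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


# Hypothetical classes with unequal widths: (lower limit, upper limit, frequency).
classes = [(0, 10, 20), (10, 30, 30), (30, 60, 15)]

densities = frequency_densities(classes)
for (lower, upper, freq), density in zip(classes, densities):
    print(f"{lower}-{upper}: frequency {freq}, density {density}")
```

With the density as the height of each rectangle, the area (height times width) equals the frequency of the class, which is what the definition of a histogram requires.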

To make the adjustment, we take the class with the smallest class interval and adjust the frequencies of the other classes as follows. If a class interval is twice as wide as the smallest one, we divide the height of its rectangle by two; if it is three times as wide, we divide the height by three, and so on.

Histograms are often used to plot the density of data and for density estimation. They let us analyze extremely large data sets by reducing them to a simple graph, which shows the primary, secondary and maximum peaks in the data and gives a visual representation of the data.

Bar graphs and histograms are often confused. In both, bars are drawn along the X-axis, so they appear to be the same. In a histogram, however, each column is positioned over a label that represents a quantitative variable. The column label can be a single value or a range of values, and the height of each column indicates the size of the group.

There is a difference between the bar graph and the histogram. The bars of a bar graph are not connected, whereas the bars of a histogram are joined together. In a bar graph, different shades of color can be used for different bars, corresponding to different numerical values. In a histogram the colors do not correspond to any numerical value.

The bar graph is an important statistical graph: it is a pictorial representation of statistical data. In a bar graph the independent variable can take only a few discrete values, while the dependent variable may be either continuous or discrete. The column graph, or vertical bar graph, is the most common form of bar graph.

There are many characteristics of bar graphs that help to present and understand statistical data better. A few of them are:

1. The bar graph makes comparisons among different variables easy to see and understand.

2. The vertical bars show trends in the data and clearly show the effect of one variable on another: if one variable falls or rises, we can see its effect on the other variables.

3. Using the value of one variable, we can easily identify the value of the other.

In a vertical bar graph the values of the independent variable are plotted along the horizontal axis from left to right, while in a horizontal bar graph they are plotted along the vertical axis from bottom to top.