Statistics refers to the scientific and systematic methods of collecting, recording, summarizing, analyzing and presenting numerical data in a precise manner.
Or
The study of methods of collecting, recording, summarizing, analyzing and presenting data in a precise manner by using numbers.
Or
A science of observing, collecting, recording, summarizing, analyzing and presenting data in a precise manner by using numbers.
Numerical data are understood as a body of information given in numbers, or as exact numerical facts or figures collected systematically and arranged for a certain purpose.
NATURE OF DATA
Statistical data according to their varied nature
Statistical data according to their varied nature include the following:-
Discrete data
It is a form of statistical data for variables whose values are expressed in whole numbers, i.e. the data describe cases that do not exist in fractions.
For instance, the number of people can be given as 102 people; people cannot be divided into decimals or fractions.
Continuous data
Data for variables whose values can be expressed in fractions or decimals. In this type of data, any value within the range can occur.
For instance, data for temperature, rainfall, pressure, distance and growth rate are presented continuously, in fractions or decimals.
Individual data
A set of data that gives a specific value to every item in a sample. For instance, Juma has a weight of 47 kg. Each item is treated as an important entity and presented singly.
Grouped data
It is a form of data that gives values in ranges or classes. This type of data is less precise, because exact figures are not quoted; instead, values are grouped into ranges.
The classic example of grouped data is the population distribution by age and sex, which may appear as follows:-
AGE | FEMALES | MALES |
0-9 | 14,897 | 14,567 |
10-19 | 15,432 | 14,329 |
20 – 29 | 17,987 | 13,098 |
30 – 39 | 16,876 | 17,654 |
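The step from individual data to grouped data can be sketched in Python. This is a minimal illustration, assuming whole-number, non-negative values and equal class widths of 10; the function name `group_into_classes` and the sample ages are hypothetical and are not taken from the census table above.

```python
from collections import Counter

def group_into_classes(values, width=10):
    """Tally whole-number values into classes of equal width (0-9, 10-19, ...)."""
    # The lower limit of each value's class is the value rounded down to a multiple of width
    counts = Counter((v // width) * width for v in values)
    # Label each class "lower-upper" in ascending order; empty classes are not listed
    return {f"{low}-{low + width - 1}": counts[low] for low in sorted(counts)}

# Hypothetical individual ages, for illustration only
ages = [3, 7, 12, 15, 18, 21, 25, 29, 33, 38]
print(group_into_classes(ages))  # {'0-9': 2, '10-19': 3, '20-29': 3, '30-39': 2}
```

Once grouped, the exact ages are no longer visible, which is why grouped data are less precise than individual data.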
Statistical data according to scale of measurement
This classification is based on how the values of statistical data are given.
The scales of measurement include the following:-
Nominal data
Data whose values are given according to the names (categories) of items in a given sample, e.g. 10 apples, 5 oranges, 7 mangoes, 5 bananas and 2 cherries.
Ordinal data
Data whose values are given in order of magnitude, so that the numbers indicate the rank order among objects, i.e. the values are commonly given in either ascending or descending order, e.g. 91, 82, 79, 74, 68, 67, 58, 54 and 49.
The interval data
Data whose values are given in ranges at regular intervals by being grouped, e.g. the population distribution by age and sex expressed on an interval scale.
Ratio data
Data whose values show how many times one item is relative to another, e.g. 1:3, 2:5, 3:7, etc.
VARIABLES
A variable is an attribute whose values fluctuate under given conditions. For instance, production is a variable because its values change under conditions such as policies, climate, technology and marketability.
Variables vary widely and are classified into dependent and independent variables.
Dependent variable
Dependent variable is the one whose values fluctuate due to the influence of another variable, i.e. the variable whose values change as controlled by another variable. For instance, production is one of the most pronounced dependent variables, as it changes under the influence of other variables such as climate, the level of technology applied and the demand for the products produced.
Independent variable
Independent variable is the one whose values change on their own without being influenced by another variable, i.e. the variable whose values change steadily and regularly, e.g. distance.
CLASSIFICATION OF STATISTICS
Statistics, being the scientific and systematic methods dealing with numerical facts, is broadly categorized into two depending on how data are handled. The two broad categories are descriptive and inferential statistics.
Descriptive Statistics
Descriptive statistics deal with recording, summarizing, analyzing and presenting numerical facts that have actually been collected, for example population data obtained by conducting a census.
Inferential statistics
Inferential statistics deal with recording, summarizing, analyzing and presenting numerical facts whose uncertainty is quantified through prediction, e.g. the likely harvest output in the next year or season.
STATISTICAL DATA
As already pointed out, statistical data are the exact numerical facts or figures collected systematically and arranged for a certain purpose, or a body of information usually treated in numerical values.
Statistical data are extremely varied and are therefore recognized as being of different types. The categories of statistical data are recognized with regard to their sources, their varied nature and their scales of measurement.
Statistical data according to their varied sources
Data are classified by source into two types: primary and secondary data.
Primary data
These are numerical facts collected from the field, i.e. handled for the first time. They are first-hand or original information that is not available in existing sources such as books. Primary statistical data are obtained through interviews, questionnaires, observation, counting, measurement and other methods.
Secondary data
These are numerical facts derived from stored sources; they were compiled by other people who carried out research. Sources of this type of data include textbooks, reference books, magazines, maps, video tapes, audio tapes and similar sources.
SOURCES OF STATISTICAL DATA
The sources of statistical data are simply the techniques employed to gather the numerical facts. They are broadly of two kinds: primary and secondary sources.
Some of the techniques (sources) providing statistical data include the following:-
- Interview method
- Questionnaire
- Scheduling
- Field observation method
- Literature review
Interview method
The interview technique involves collecting data by a researcher asking questions verbally to a respondent.
Or
It is a verbal interaction between an interviewer and an interviewee designed to elicit the information, news, opinions and feelings the interviewee holds. Generally, an interview is an oral set of questions asked to respondents by a researcher.
Questionnaire method
A questionnaire is a set of research questions printed on paper and presented to respondents, who reply to the questions in writing. The questionnaire method is thus a means of gathering statistical details by giving questionnaires to respondents to answer.
Field observation method
It is a method of gathering primary research data in which the researcher directly observes the phenomena. It is of two types: participant and non-participant observation.
Scheduling method
This method of data collection is very similar to the questionnaire method, with one difference: a schedule is a prepared set of questions filled in by enumerators who are specially appointed for the purpose and who are carefully selected and well trained to perform their job. This method of data collection is very useful for carrying out a population census. The secondary sources providing statistical data include the following:-
Literature review method
It is a systematic survey of past documentary sources prepared by other researchers that relate to the study. The documentary sources include textbooks, statistical abstracts, census reports, research articles, journals, newspapers and official reports.
Other methods of data collection include measurement, counting and carrying out experiments.
Strengths of statistics application in Geography
The application of statistics in geography is significant in the following ways:
It summarizes massive information into a simpler form and thus enables geographers to handle large sets of data.
Statistics make data computation techniques possible in geography.
Statistics ease the process of data comparison, since variables cannot be compared without statistics on them.
The application of statistics facilitates drawing relationships between geographical variables such as climate and production, population and time, or rainfall and temperature.
The application of statistics eases data storage in the form of numbers, tables, graphs, diagrams and maps.
The application of statistics makes geographical data clearly understood and easy to analyze and interpret.
Statistics enable geographical models, theories and concepts to be tested for validity against real-world situations.
STATISTICAL MEASURES
The numerical values that make up statistics are analyzed to judge their implications (results) by using statistical measures.
Thus, statistical measures are computed numerical values used to analyze data in relation to the other values in a given data set.
Statistical measures are numerous but, with regard to their nature and roles, they are broadly divided into the following categories.
- Measures of central tendency
- Measures of variability
MEASURES OF CENTRAL TENDENCY
These are measures that show the central value of a data set. They include the arithmetic mean, mode and median.
ARITHMETIC MEAN
The arithmetic mean is the average of all values in a distribution. It is determined by adding up all the values and dividing the total by the number of observations added. The arithmetic mean is used to assess whether the values of a distribution are high or low.
Computation of the arithmetic mean
Computation of the arithmetic mean depends on the nature of the data given, whether ungrouped or grouped.
For an ungrouped data set, the arithmetic mean is computed by applying the following formula:
$$\bar{X} = \frac{\sum X}{N}$$
Whereby:
X = each value in the data set
N = The total number of observations added.
Example:
Find the arithmetic mean for the following set of data.
5,7,10,12,13,14,15,7, and 2.
Solution
The arithmetic mean for the given set of data above is calculated as follow:
ΣX = 5 + 7 + 10 + 12 + 13 + 14 + 15 + 7 + 2 = 85
N = 9
Thus: The arithmetic mean = 85 ÷ 9 ≈ 9.4
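The ungrouped-mean calculation (sum of the values divided by their count, ΣX ÷ N) can be checked with a few lines of Python. This is a minimal sketch; the function name `arithmetic_mean` is hypothetical.

```python
def arithmetic_mean(values):
    """Arithmetic mean of ungrouped data: the sum of X divided by N."""
    return sum(values) / len(values)

data = [5, 7, 10, 12, 13, 14, 15, 7, 2]
print(sum(data), len(data))             # 85 9
print(round(arithmetic_mean(data), 1))  # 9.4
```

Python's standard `statistics.mean` returns the same value for this list.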
For a grouped data set, the arithmetic mean is calculated by the following formula:
$$\bar{X} = \frac{\sum fX}{\sum f}$$
Whereby;
X = Class mark
f = Frequency
Example:
Find the arithmetic mean for the following scores of marks.
Class interval | Frequency (f) | Class mark (X) | fX |
91-95 | 0 | 93 | 0 |
86-90 | 1 | 88 | 88 |
81-85 | 6 | 83 | 498 |
76-80 | 10 | 78 | 780 |
71-75 | 15 | 73 | 1095 |
66-70 | 34 | 68 | 2312 |
61-65 | 22 | 63 | 1386 |
56-60 | 10 | 58 | 580 |
51-55 | 2 | 53 | 106 |
Solution:-
According to the given data;
ΣfX = 6845
Σf = 100
Thus; the arithmetic mean = 6845 ÷ 100 = 68.45
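The grouped mean, ΣfX ÷ Σf with X as the class mark (midpoint), can be sketched the same way. This assumes each class is given as a hypothetical `(lower, upper, frequency)` tuple; the function name `grouped_mean` is illustrative only.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum(f * X) / sum(f), where X is the class mark."""
    total_f = sum(f for _, _, f in classes)
    # Class mark X is the midpoint of the lower and upper limits
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total_fx / total_f

marks = [(91, 95, 0), (86, 90, 1), (81, 85, 6), (76, 80, 10), (71, 75, 15),
         (66, 70, 34), (61, 65, 22), (56, 60, 10), (51, 55, 2)]
print(grouped_mean(marks))  # 68.45
```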
Advantages of the Arithmetic mean
It is easy to calculate, and most people understand it.
It is used to check whether values are high or low.
It can be used for further calculation. For instance, the arithmetic mean is used to calculate the standard deviation.
Disadvantages of the arithmetic mean
The arithmetic mean has the major weakness of being pulled towards outliers (extreme scores).
It requires considerable mathematical knowledge to calculate the arithmetic mean for a grouped data set.
MODE
The mode is the value that occurs most frequently in a given data set.
Or
It is the most commonly attained measurement value in a data set.
Or
It is the measurement value that appears most often for a particular variable among a sample of subjects.
The mode shows where values are concentrated, which can stimulate scientific investigation.
Calculation of a mode
Determination of the mode depends largely on whether the data set is ungrouped or grouped.
For an ungrouped data set, the mode is the value that appears most frequently, i.e. the one with a higher frequency than the rest.
Example;
Determine the mode for the following data set.
2, 4, 2, 2, 5, 6, 4
Value | Frequency |
2 | 3 |
4 | 2 |
5 | 1 |
6 | 1 |
Thus; the mode for the data set given = 2
Note
Sometimes a given data set may have more than one mode or no mode at all. A distribution with one mode is described as unimodal (monomodal); one with two modes is described as bimodal.
Example:
(1) 2, 5, 4, 3, 5, 6, 6, 8, 5, 6.
The modes for the data set are 5 and 6
(2) 4, 9, 8, 5, 6, 7
The given data set has no mode.
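The counting procedure for the ungrouped mode, including the bimodal and no-mode cases, can be sketched with `collections.Counter`. This is a minimal sketch; the function name `modes` is hypothetical, and it follows the convention used here that a data set in which every value occurs once has no mode.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 4, 2, 2, 5, 6, 4]))           # [2]
print(modes([2, 5, 4, 3, 5, 6, 6, 8, 5, 6]))  # [5, 6]
print(modes([4, 9, 8, 5, 6, 7]))              # []
```

Python's `statistics.multimode` is similar, but for a list with no repeated values it returns every value rather than an empty list, so the explicit check above matches the "no mode" convention.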
For grouped data, the mode is calculated by the following formula:
$$\text{Mode} = L + \left(\frac{t_1}{t_1 + t_2}\right) \times i$$
Whereby:
· L = The lower class boundary of the modal class (the class with the highest frequency)
· t1 = The excess of the modal frequency over the frequency of the class just below the modal class
· t2 = The excess of the modal frequency over the frequency of the class just above the modal class
· i = The class interval (class width)
Example;-
The table below shows the marks scored by Form V students in a geography test.
Class interval | Frequency |
40 – 44 | 7 |
45 – 49 | 8 |
50 – 54 | 11 |
55 – 59 | 10 |
60 – 64 | 4 |
Solution
The mode for the given data set above is calculated as follow:-
According to the given data set;
L = 49.5
t1 = 3
t2 = 1
i = 5
Then;
Mode = 49.5 + (3 ÷ (3 + 1)) × 5
= 49.5 + (0.75 × 5)
= 49.5 + 3.75 = 53.25
Thus; the mode = 53.25
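The grouped-mode calculation, L + t1 ÷ (t1 + t2) × i, can be sketched as follows. It assumes classes are hypothetical `(lower, upper, frequency)` tuples in ascending order with whole-number limits and equal widths, so the lower class boundary is `lower - 0.5`; the function name `grouped_mode` is illustrative. If the modal class ties with both neighbours (t1 + t2 = 0), the formula is undefined and this sketch raises `ZeroDivisionError`.

```python
def grouped_mode(classes):
    """Mode of grouped data: L + t1 / (t1 + t2) * i.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order,
    with whole-number limits and equal class widths.
    """
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))              # modal class (first one if tied)
    lower, upper, f_modal = classes[k]
    f_below = freqs[k - 1] if k > 0 else 0   # class just below the modal class
    f_above = freqs[k + 1] if k + 1 < len(freqs) else 0
    t1 = f_modal - f_below
    t2 = f_modal - f_above
    L = lower - 0.5                          # lower class boundary
    i = upper - lower + 1                    # class interval
    return L + t1 / (t1 + t2) * i

scores = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(grouped_mode(scores))  # 53.25
```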
Advantages of a mode
It helps determine the predominance of a certain geographical feature in a place.
It shows how often values occur in a data set.
Disadvantages of a mode
It requires considerable mathematical knowledge to calculate the mode for a grouped data set.
It is an unreliable measure of central tendency, as a data set may have more than one mode or no mode at all.
MEDIAN
The median is the value that divides a distribution into two equal parts after the values have been arranged in ascending or descending order.
Computation of the median
The computation of the median depends chiefly on whether the data set is ungrouped or grouped.
For an ungrouped data set, the calculation of the median must also take into account whether the number of values is odd or even.
If an ungrouped data set has an odd number of values, the median is simply the middle value, obtained after the values have been arranged in ascending or descending order.
E.g.
1, 2, 1, 4, 6, 5, 3
Solution
The ascending order of the values is as follows:-
1, 1, 2, 3, 4, 5, 6
Thus; the median = 3.
If the data set has an even number of values, the median is the average of the two middle values, obtained after the values have been arranged in ascending or descending order.
E.g.
1,4,5,2,7,8,3,2
The ascending order for the values is as follows:-
1,2,2,3,4,5,7,8
Thus; the median = 3.5
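Both ungrouped cases (odd and even counts) fit in one short function. This is a minimal sketch; the function name `median` is hypothetical, and Python's standard `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value after sorting; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 1, 4, 6, 5, 3]))     # 3
print(median([1, 4, 5, 2, 7, 8, 3, 2]))  # 3.5
```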
Median determination for the grouped data
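For grouped data, the median is determined by applying the following formula:-
$$\text{Median} = L + \left(\frac{\frac{N}{2} - n_b}{n_w}\right) \times i$$
Whereby:-
L = The lower class boundary of the median class (the class containing the (N/2)th observation)
N = Total number of observations
nb = the number of elements in the classes below the median class (their cumulative frequency)
nw = the number of elements in the median class
i = class interval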
Example:-
The table below shows the marks scored by Form V students in a geography test.
Class interval | Frequency |
40 – 44 | 7 |
45 – 49 | 8 |
50 – 54 | 11 |
55 – 59 | 10 |
60 – 64 | 4 |
According to the given data
L = 49.5
N = 40
nb = 15
nw = 11
i = 5
Median = 49.5 + ((40 ÷ 2 – 15) ÷ 11) × 5
= 49.5 + (0.4545 × 5)
= 49.5 + 2.27 = 51.77
Thus the median ≈ 51.77
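The grouped-median formula, L + ((N/2 – nb) ÷ nw) × i, can be sketched by walking the cumulative frequency until it reaches N/2. It assumes hypothetical `(lower, upper, frequency)` tuples in ascending order with whole-number limits; the function name `grouped_median` is illustrative. Evaluated without intermediate rounding, the marks table gives about 51.77 (rounding 5/11 to 0.45 first gives 51.75).

```python
def grouped_median(classes):
    """Median of grouped data: L + ((N/2 - nb) / nw) * i.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order,
    with whole-number limits.
    """
    N = sum(f for _, _, f in classes)
    if N == 0:
        raise ValueError("the distribution has no observations")
    half = N / 2
    nb = 0  # cumulative frequency of the classes below the current class
    for lower, upper, nw in classes:
        if nw > 0 and nb + nw >= half:  # the (N/2)th observation falls here
            L = lower - 0.5             # lower class boundary
            i = upper - lower + 1       # class interval
            return L + (half - nb) / nw * i
        nb += nw

scores = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(round(grouped_median(scores), 2))  # 51.77
```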
Advantages of median
It identifies the middle value among the numerous values in a data set.
It is easy to determine, particularly for a simple data set.
Disadvantages of the median
If the values are numerous, it becomes cumbersome to arrange them in ascending or descending order to get the median.
It requires considerable skill to determine the median for a grouped data set.
MEASURES OF VARIABILITY
These are measures that assess the variation of values in a data set. The common measures of variability include the following:-
- Range
- Standard deviation
- Variance
- Mean deviation
RANGE
The range is the difference between the highest and lowest values in a distribution. It is used to assess the variation between the highest and lowest scores.
Calculation of the range
The calculation of the range also depends on whether the data set is ungrouped or grouped.
For the ungrouped data set, range is calculated by subtracting the lowest value from the highest value in a data set given.
Example:-
Determine the range for the following data set: 4, 2, 3, 5, 6, 4, 8
Solution
The range for the data set given is computed as follows:-
Range = Highest value – lowest value
According to the given data set:-
· Highest value = 8
· Lowest value = 2
· 8 – 2 = 6
· Thus; The range = 6
A large range implies greater variation; a small range implies little variation.
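The ungrouped range is a one-line calculation. This minimal sketch uses the hypothetical name `value_range` to avoid shadowing Python's built-in `range`.

```python
def value_range(values):
    """Range of ungrouped data: highest value minus lowest value."""
    return max(values) - min(values)

print(value_range([4, 2, 3, 5, 6, 4, 8]))  # 6
```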
For grouped data, the range is calculated by subtracting the lowest class mark from the highest class mark, by subtracting the lowest lower boundary from the highest lower boundary, or by subtracting the lowest upper boundary from the highest upper boundary.
Example:-
Determine the range for the following data set.
Class interval |
10 – 14 |
15 – 19 |
20 – 24 |
25 – 29 |
30 – 34 |
35 – 39 |
Solution
The range for the data set given is calculated as follow:
Range = Highest class mark – Lowest class mark
Determination of the class mark
Class interval | Class mark |
10 – 14 | 12 |
15 – 19 | 17 |
20 – 24 | 22 |
25 – 29 | 27 |
30 – 34 | 32 |
35 – 39 | 37 |
According to the computed class marks
· Highest class mark = 37
· Lowest class mark = 12
37 – 12 = 25,
Thus, the range = 25
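The class-mark method for the grouped range can be sketched as below, assuming intervals are given as hypothetical `(lower, upper)` pairs; the function name `grouped_range` is illustrative.

```python
def grouped_range(intervals):
    """Range of grouped data from class marks: highest mark minus lowest mark."""
    marks = [(lower + upper) / 2 for lower, upper in intervals]
    return max(marks) - min(marks)

intervals = [(10, 14), (15, 19), (20, 24), (25, 29), (30, 34), (35, 39)]
print(grouped_range(intervals))  # 25.0
```

With equal class widths, the boundary-based methods give the same result, since every class mark sits the same distance from its boundaries.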
Advantages of a range
Range gives a quick, rough estimate of variability.
It is simple to calculate, and most people are familiar with it.
Disadvantages of a range
It considers only the highest and lowest values and is thus not sensitive to the whole distribution.
It is strongly affected by extreme values.
STANDARD DEVIATION
Deviation is the difference between a value and the mean. It is computed by subtracting the mean from the value:
$$d = X - \bar{X}$$
Whereby:-
X = value given in a distribution
X̄ = average of all values (the mean)
Standard deviation
Standard deviation refers to the typical difference of the values from the mean. It is the root mean square deviation from the mean, and it measures how far the values are scattered from the mean.
Standard deviation is represented by the Greek letter sigma (σ).
Computation of a standard deviation
Calculation of the standard deviation also depends on whether the data set is ungrouped or grouped.
For ungrouped data, the standard deviation is calculated by the following formula:
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
Whereby:-
X = value in a distribution
X̄ = the mean
N = The total number of observations
Example:-
Calculate the standard deviation for the following data set.
3, 2, 1, 4, 6
Solution
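Mean determination
X̄ = (3 + 2 + 1 + 4 + 6) ÷ 5 = 16 ÷ 5 = 3.2
X | 3 | 2 | 1 | 4 | 6 |
X – X̄ | -0.2 | -1.2 | -2.2 | 0.8 | 2.8 |
(X – X̄)² | 0.04 | 1.44 | 4.84 | 0.64 | 7.84 |
Σ(X – X̄)² = 0.04 + 1.44 + 4.84 + 0.64 + 7.84 = 14.8
Then;
σ = √(14.8 ÷ 5) = √2.96 ≈ 1.720
Hence; the SD ≈ 1.720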
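The ungrouped standard deviation, dividing by N as in these notes, can be checked in Python. This is a minimal sketch; the function name `standard_deviation` is hypothetical. For 3, 2, 1, 4, 6 it evaluates to √2.96 ≈ 1.720, and the standard library's `statistics.pstdev` (population SD, also dividing by N) agrees.

```python
import math
import statistics

def standard_deviation(values):
    """Population SD: sqrt(sum((X - mean)^2) / N)."""
    n = len(values)
    mean = sum(values) / n
    squared = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared / n)

data = [3, 2, 1, 4, 6]
print(round(standard_deviation(data), 3))  # 1.72
print(round(statistics.pstdev(data), 3))   # 1.72
```

Note that `statistics.stdev` divides by N – 1 (the sample SD) and would give a larger value, about 1.92 here.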
For grouped data, the standard deviation is computed by the following formula:-
$$\sigma = \sqrt{\frac{\sum f(X - \bar{X})^2}{\sum f}}$$
Example:-
Calculate the SD for the following set of grouped data.
Class interval | Frequency |
40 – 44 | 7 |
45 – 49 | 8 |
50 – 54 | 11 |
55 – 59 | 10 |
60 – 64 | 4 |
Procedure:
· Determination of the mean
Class interval | F | X | Fx |
40 – 44 | 7 | 42 | 294 |
45 – 49 | 8 | 47 | 376 |
50 – 54 | 11 | 52 | 572 |
55 – 59 | 10 | 57 | 570 |
60 – 64 | 4 | 62 | 248 |
ΣfX = 2060 and Σf = 40
Hence; X̄ = 2060 ÷ 40 = 51.5
Then:-
X | 42 | 47 | 52 | 57 | 62 |
X – X̄ | -9.5 | -4.5 | 0.5 | 5.5 | 10.5 |
(X – X̄)² | 90.25 | 20.25 | 0.25 | 30.25 | 110.25 |
f(X – X̄)² | 631.75 | 162 | 2.75 | 302.5 | 441 |
Σf(X – X̄)² = 1540
Σf = 40
σ = √(1540 ÷ 40) = √38.5 ≈ 6.205
Thus; the SD ≈ 6.205
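The grouped standard deviation, √(Σf(X – X̄)² ÷ Σf) with X as the class mark, can be sketched as follows. Classes are assumed to be hypothetical `(lower, upper, frequency)` tuples, and the function name `grouped_sd` is illustrative. For the marks table it evaluates to √38.5 ≈ 6.205.

```python
import math

def grouped_sd(classes):
    """Grouped SD: sqrt(sum(f * (X - mean)^2) / sum(f)), X = class mark."""
    total_f = sum(f for _, _, f in classes)
    marks = [((lower + upper) / 2, f) for lower, upper, f in classes]
    mean = sum(f * x for x, f in marks) / total_f
    squared = sum(f * (x - mean) ** 2 for x, f in marks)
    return math.sqrt(squared / total_f)

scores = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(round(grouped_sd(scores), 3))  # 6.205
```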
Note:-
The square of the SD is known as the variance (σ²). Its computation also considers whether the data set is ungrouped or grouped.
For ungrouped data, the variance is computed by the following formula:-
$$\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$$
For grouped data, the variance is computed as:
$$\sigma^2 = \frac{\sum f(X - \bar{X})^2}{\sum f}$$
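Since the variance is the square of the standard deviation, the same sums can be reused without taking the square root. This is a minimal sketch with hypothetical function names; grouped classes are assumed to be `(lower, upper, frequency)` tuples. The standard library's `statistics.pvariance` also divides by N.

```python
import statistics

def variance(values):
    """Ungrouped variance: sum((X - mean)^2) / N."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def grouped_variance(classes):
    """Grouped variance: sum(f * (X - mean)^2) / sum(f), X = class mark."""
    total_f = sum(f for _, _, f in classes)
    marks = [((lower + upper) / 2, f) for lower, upper, f in classes]
    mean = sum(f * x for x, f in marks) / total_f
    return sum(f * (x - mean) ** 2 for x, f in marks) / total_f

data = [3, 2, 1, 4, 6]
print(round(variance(data), 2))              # 2.96
print(round(statistics.pvariance(data), 2))  # 2.96
scores = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(grouped_variance(scores))              # 38.5
```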
MEAN DEVIATION
The mean deviation is the average of all deviation values, i.e. the amount by which the individual values deviate from the mean irrespective of sign. It is computed by dividing the sum of all deviations, ignoring their signs, by the number of observations.
Calculation of mean deviation
Calculation of the mean deviation also depends on whether the data set is ungrouped or grouped.
For an ungrouped data set, the mean deviation is calculated by the following formula:-
$$\text{MD} = \frac{\sum |X - \bar{X}|}{N}$$
Example:-
Determine the mean deviation for the following data set. 4, 7, 8, 2, 9, 6
Solution
Mean determination
4 + 7 + 8 +2 + 9 + 6 = 36
Hence; the mean = 6
Deviations determination
X | X – X̄ | D (sign ignored) |
4 | 4 – 6 | 2 |
7 | 7 – 6 | 1 |
8 | 8 – 6 | 2 |
2 | 2 – 6 | 4 |
9 | 9 – 6 | 3 |
6 | 6 – 6 | 0 |
The sum of deviations determination.
· 2 + 1 + 2 +4 + 3 + 0 = 12
Then;
Mean deviation = 12 ÷ 6 = 2
Thus; the mean deviation = 2
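The ungrouped mean deviation, Σ|X – X̄| ÷ N, can be sketched with Python's `abs` to ignore the signs. This is a minimal sketch; the function name `mean_deviation` is hypothetical.

```python
def mean_deviation(values):
    """Mean deviation of ungrouped data: sum(|X - mean|) / N."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

print(mean_deviation([4, 7, 8, 2, 9, 6]))  # 2.0
```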
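For a grouped data set, the mean deviation is computed by the following formula:-
$$\text{MD} = \frac{\sum f|X - \bar{X}|}{\sum f}$$
Example:-
Determine the mean deviation for the following grouped data.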
Class interval | Frequency |
40 – 44 | 7 |
45 – 49 | 8 |
50 – 54 | 11 |
55 – 59 | 10 |
60 – 64 | 4 |
Determination of the mean
Class interval | F | X | Fx |
40 – 44 | 7 | 42 | 294 |
45 – 49 | 8 | 47 | 376 |
50 – 54 | 11 | 52 | 572 |
55 – 59 | 10 | 57 | 570 |
60 – 64 | 4 | 62 | 248 |
Hence; The mean = 51.5
Determination of the deviations.
Whereby:
X = Class mark
X | X – X̄ | D (sign ignored) | f | fD |
42 | 42 – 51.5 | 9.5 | 7 | 66.5 |
47 | 47 – 51.5 | 4.5 | 8 | 36 |
52 | 52 – 51.5 | 0.5 | 11 | 5.5 |
57 | 57 – 51.5 | 5.5 | 10 | 55 |
62 | 62 – 51.5 | 10.5 | 4 | 42 |
The sum of (fd) determination
66.5 + 36 + 5.5 + 55 + 42 = 205
Then;
Mean deviation = 205 ÷ 40 = 5.125
Thus; the mean deviation = 5.125
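The grouped mean deviation, Σf|X – X̄| ÷ Σf with X as the class mark, follows the same pattern. Classes are assumed to be hypothetical `(lower, upper, frequency)` tuples, and the function name `grouped_mean_deviation` is illustrative.

```python
def grouped_mean_deviation(classes):
    """Grouped mean deviation: sum(f * |X - mean|) / sum(f), X = class mark."""
    total_f = sum(f for _, _, f in classes)
    marks = [((lower + upper) / 2, f) for lower, upper, f in classes]
    mean = sum(f * x for x, f in marks) / total_f
    return sum(f * abs(x - mean) for x, f in marks) / total_f

scores = [(40, 44, 7), (45, 49, 8), (50, 54, 11), (55, 59, 10), (60, 64, 4)]
print(grouped_mean_deviation(scores))  # 5.125
```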
...