MEASURES OF CENTRAL TENDENCY
I. Levels of Inquiry
a. Research studies are conducted to answer questions and to test
hypotheses. Research questions take on different forms depending on the level of
inquiry.
1) What is this?
2) What's happening here?
3) What will happen if?
4) How can I make .... happen?
b. Type of data collected and methods of statistical analysis must be
considered during the planning of the project.
II. Levels of Measurement
a. Measurement: assignment of numbers to characteristics according to some
rule. Types:
1) Nominal Scale: the first and lowest scale in the hierarchy. Used for
labeling and categorizing.
2) Ordinal Scale: Members of a set (e.g., objects, people) are ordered from
most to least with respect to some characteristic. Examples: measures of
performance, attitude, personality.
3) Interval Scale: In addition to ranking or ordering individuals or objects
on some characteristic, the distance between the points on the scale is
known. The distance between any two adjacent points on the scale is the same
as the distance between any other two adjacent points.
4) Ratio Scale: Most precise measurement scale. Includes all the
characteristics of the lower three scales and, in addition, has a true zero.
III. Presenting Descriptive Data
a. The purpose of statistics is to reduce data to a manageable and
understandable form. Descriptive statistics are used to describe
characteristics of the sample under study.
b. Frequency Distributions
1) List scores from highest to lowest and tally how many people got each
score.
2) Class intervals: If there are more than 10 to 20 scores, combine scores
into groups for presentation in tables.
1. Class intervals should be mutually exclusive and exhaustive.
2. Avoid open-ended intervals.
3) Absolute Frequency (f): the number of subjects who received a certain
score or whose score fell within a particular class interval. (Example: n = 50;
20 people get scores between 70 and 79, so f = 20.)
4) Relative Frequency: calculated by converting absolute frequencies to
percents. (Example from above: n = 50, 20/50 x 100 = 40%)
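The tallying steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the scores are hypothetical, and the helper name `class_interval` and the interval width of 10 are assumptions chosen to match the 70-79 example.

```python
from collections import Counter

# Hypothetical exam scores, used only for illustration.
scores = [72, 85, 91, 78, 64, 88, 79, 95, 70, 83]

def class_interval(score, width=10):
    """Return the (lower, upper) limits of the class interval containing score."""
    lower = (score // width) * width
    return (lower, lower + width - 1)

n = len(scores)
# Absolute frequency (f): how many scores fall in each class interval.
absolute = Counter(class_interval(s) for s in scores)
# Relative frequency: absolute frequency converted to a percent of n.
relative = {interval: 100 * f / n for interval, f in absolute.items()}

# List intervals from highest to lowest, as in a frequency table.
for interval in sorted(absolute, reverse=True):
    low, high = interval
    print(f"{low}-{high}: f = {absolute[interval]}, {relative[interval]:.0f}%")
```

Because each score maps to exactly one interval, the intervals are mutually exclusive and exhaustive by construction.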
c. Rounding Off
1) If the digit to be dropped is less than 5, round to the lower number; if
it is higher than 5, round to the higher number.
Example: 4.423 rounds to 4.42
4.426 rounds to 4.43
2) If the digit to be dropped is exactly 5, round to the nearest even number.
This avoids systematic bias up or down.
Example: 4.425 rounds to 4.42
4.435 rounds to 4.44
3) For rounding to whole numbers, the same rules prevail.
Example: 8.2 rounds to 8; 8.8 rounds to 9;
8.5 rounds to 8; 9.5 rounds to 10
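The round-to-even rule can be checked with Python's standard `decimal` module. This is a sketch under stated assumptions: the helper name `round_half_even` is hypothetical, and values are passed as strings so that binary floating-point representation does not disturb the exact "5" case.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value: str, places: str) -> Decimal:
    """Round a decimal string to the precision of `places` (e.g. "0.01" or "1"),
    sending exact halves to the nearest even digit."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)

# The two-decimal examples from the rules above.
rounded = [round_half_even(v, "0.01") for v in ["4.423", "4.426", "4.425", "4.435"]]
# The whole-number examples from the rules above.
whole = [round_half_even(v, "1") for v in ["8.2", "8.8", "8.5", "9.5"]]

print([str(r) for r in rounded])
print([str(w) for w in whole])
```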
d. Graphic Methods
1) Histogram: most commonly used graphical method. A type of bar graph used
with continuous rather than categorical data (interval and ratio data).
Factors to keep in mind:
1. Use "true" class intervals on the horizontal line.
2. If possible, have the class intervals equal.
3. With equal class intervals, all columns have the same width.
4. The vertical and horizontal axes should be about equal in length.
5. The graph should be clearly labeled with a title and labels for both
axes.
6. The graph should be understandable without reference to the text.
2) Frequency Polygon: also a commonly used graphic method, drawn separately
from the histogram. Useful if you want to place two or more distributions on
the same pair of axes for the sake of making comparisons.
3) Cumulative Percentage Polygon: used less often than the frequency
polygon, but preferred when trying to indicate position of a given score in
relation to the distribution rather than the overall form of the distribution.
Vertical axis is composed of cumulative percentages rather than frequencies.
Horizontal axis is the same as in the histogram.
4) Bar Graph: Used with categorical data arranged from lowest to highest.
Bars may be either horizontal or vertical.
5) Pie Charts: Not used very often in research reports, generally used by
newspapers.
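The vertical-axis values of a cumulative percentage polygon (item 3 above) can be computed as follows. This is a minimal sketch with hypothetical scores; it computes the points only and leaves the plotting to whatever graphing tool is at hand.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical scores, used only for illustration.
scores = [70, 75, 75, 80, 80, 80, 85, 90]

counts = Counter(scores)
values = sorted(counts)                      # horizontal axis: scores, low to high
cum_freq = list(accumulate(counts[v] for v in values))
cum_pct = [100 * c / len(scores) for c in cum_freq]   # vertical axis

for value, pct in zip(values, cum_pct):
    print(f"score {value}: {pct:.1f}% of scores at or below")
```

Reading across from a score to its cumulative percentage shows that score's position within the distribution.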
IV. Descriptive Statistics
a. Samples and Populations
1) Population: includes all members of a defined group.
2) Sample: subset of the target population.
3) Parameters: characteristics of populations (denoted by Greek letters).
4) Statistics: characteristics of samples (denoted by Roman letters).
b. Symbols
1) Greek letter sigma (∑) means "the sum of." Example: ∑X means the sum of X
when all scores in the X distribution are added up.
X Scores
X₁ = 4
X₂ = 4
X₃ = 10
X₄ = 5
X₅ = 7
_______
∑X = 30
Distinguish between ∑X, ∑X², and (∑X)².
To calculate ∑X², you must first square each of the numbers in the X
distribution. This results in 16, 16, 100, and so on. The sum of all these
squared numbers is ∑X².
To calculate (∑X)², simply square ∑X: (∑X)² = (30)² = 900.
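The difference between the three sums is easy to see in code. The sketch below uses the five X scores from the example; the variable names are illustrative only.

```python
X = [4, 4, 10, 5, 7]

sum_x = sum(X)                              # ∑X: add the scores
sum_of_squares = sum(x ** 2 for x in X)     # ∑X²: square each score, then add
square_of_sum = sum(X) ** 2                 # (∑X)²: add the scores, then square

print(sum_x, sum_of_squares, square_of_sum)
```

The order of operations is the whole point: squaring before summing and summing before squaring give very different numbers.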
c. Measures of Central Tendency: Measures of central tendency are single
points on the measurement scale for a given variable. Many of the variables
used in the behavioral sciences are distributed so that most scores fall in
the middle, with fewer scores falling on either side, in the "tails" of the
distribution. There are distributions, however, that do not follow such a
"normal" pattern. You need to know the shape of the distribution and the
dispersion of the scores in order to interpret the data correctly. The most
common measures are the mean, median, and mode. These measures describe the
"middle" of a group of scores.
1) Mean: The mean is the arithmetic average. Add up the scores and divide by
the number of scores in the distribution. The symbol X̄ ("x bar") stands for
the mean of the sample, and µ stands for the mean of the population. The
formula for the sample mean is X̄ = ∑X/n, where n = the number of scores in
the sample. Characteristics of the mean include:
1. Extreme values can distort the mean.
2. The sum of the deviations of the scores in the distribution from the mean
always equals zero.
X | x = X - X̄
4 | 4 - 6 = -2
4 | 4 - 6 = -2
10 | 10 - 6 = 4
5 | 5 - 6 = -1
7 | 7 - 6 = 1
∑X = 30 | ∑x = 0
Mean = 6
3. The sum of the squares of the deviations around the mean is smaller than
the sum of squares around any other value, such as the median or the mode.
X | x = X - X̄ | x² | X - median | (X - median)² | X - mode | (X - mode)²
4 | 4 - 6 = -2 | 4 | 4 - 5 = -1 | 1 | 4 - 4 = 0 | 0
4 | 4 - 6 = -2 | 4 | 4 - 5 = -1 | 1 | 4 - 4 = 0 | 0
10 | 10 - 6 = 4 | 16 | 10 - 5 = 5 | 25 | 10 - 4 = 6 | 36
5 | 5 - 6 = -1 | 1 | 5 - 5 = 0 | 0 | 5 - 4 = 1 | 1
7 | 7 - 6 = 1 | 1 | 7 - 5 = 2 | 4 | 7 - 4 = 3 | 9
∑ = 30 | 0 | 26 | 5 | 31 | 10 | 46
Mean = 6
Median = 5
Mode = 4
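Both properties of the mean can be verified numerically. The following sketch reuses the five scores from the table and relies on Python's standard `statistics` module; the helper name `sum_of_squares` is illustrative, not from the notes.

```python
from statistics import mean, median, mode

X = [4, 4, 10, 5, 7]

def sum_of_squares(scores, center):
    """Sum of squared deviations of the scores around a chosen center."""
    return sum((x - center) ** 2 for x in scores)

# Property 2: deviations from the mean sum to zero.
deviations = [x - mean(X) for x in X]

# Property 3: the sum of squares is smallest around the mean.
ss = {
    "mean": sum_of_squares(X, mean(X)),
    "median": sum_of_squares(X, median(X)),
    "mode": sum_of_squares(X, mode(X)),
}

print(sum(deviations), ss)
```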
2) Median: this is the midpoint in a set of ranked scores. In other words,
the median is the point below which one half of the scores lie. It is the
50th percentile. To calculate the median, you must first put the numbers in
order from lowest to highest. If the number of scores is odd, the median is
the middle score. If the number of scores is even, the median is halfway
between the two middle scores.
Scores | Scores in Rank Order | Median
2 7 6 3 | 2 3 6 7 | (3 + 6)/2 = 4.5
4 1 3 5 7 | 1 3 4 5 7 | 4
8 7 9 3 | 3 7 8 9 | (7 + 8)/2 = 7.5
6 2 5 3 1 | 1 2 3 5 6 | 3
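The procedure for the median (rank the scores, then take the middle score or the midpoint of the two middle scores) can be written out directly. This is a minimal sketch; the function name `median_of` is hypothetical, and a non-empty list is assumed.

```python
def median_of(scores):
    """Median of a non-empty list: rank the scores, then pick the midpoint."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of scores: the middle score is the median.
        return ranked[mid]
    # Even number of scores: halfway between the two middle scores.
    return (ranked[mid - 1] + ranked[mid]) / 2

for scores in ([2, 7, 6, 3], [4, 1, 3, 5, 7], [8, 7, 9, 3], [6, 2, 5, 3, 1]):
    print(scores, "->", median_of(scores))
```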
3) Mode: least frequently used, but it is the only measure applicable to
categorical data. It is the most frequently occurring score in a distribution.
Usually it is located at the center of the distribution, but this is not
always the case. If there are two modes, the distribution is called bimodal.
Scores | Mode
5 1 7 9 3 5 | 5
1 3 8 7 7 3 8 7 | 7
1 4 5 4 6 5 8 7 | 4 and 5
1 3 8 2 9 5 | No mode
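Finding the mode amounts to counting occurrences. The sketch below follows the convention in the table that a distribution in which every score occurs once has no mode; the function name `modes` is hypothetical, and a non-empty list is assumed.

```python
from collections import Counter

def modes(scores):
    """Return the most frequent score(s), or an empty list if no score repeats."""
    counts = Counter(scores)
    highest = max(counts.values())
    if highest == 1:
        return []  # every score occurs once: no mode
    return sorted(value for value, count in counts.items() if count == highest)

for scores in ([5, 1, 7, 9, 3, 5],
               [1, 3, 8, 7, 7, 3, 8, 7],
               [1, 4, 5, 4, 6, 5, 8, 7],
               [1, 3, 8, 2, 9, 5]):
    print(scores, "->", modes(scores) or "no mode")
```

A result with two entries corresponds to a bimodal distribution.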
4) Comparison of Mean, Median, and Mode
1. Selection of a measure for describing the central tendency of the data
depends in part on the scale of measurement of the variable. If data are
nominal, only the mode is used. With ordinal data, the mode or median is used;
if ordinal data are treated at the interval level, the mean may be used to
describe the center of the data. When data are at the interval or ratio level
of measurement, any of these measures may be used.
2. Statistically, the mean is more stable than the median or mode, and it is
considered the most sensitive because every score enters its calculation.
3. The median is helpful with extreme scores or truncated data.
4. The mode is used mostly for qualitative data.
V. Measures of Dispersion
a. Range
1) Simplest measure of dispersion to calculate; describes the spread between
the lowest and highest scores.
2) To calculate, simply subtract the lowest score from the highest. Example:
If Sally has the highest score (98) and Joan the lowest (65), subtracting
Joan's score from Sally's (98 - 65 = 33) gives a range of 33 for the
examination.
3) In reporting results, you might say that the scores ranged from 65 to 98,
which is more meaningful than saying the range was 33.
4) The range can be used to compare variability among distributions. For
example, the means for two exams were both 80, but scores on one went from 50
to 100 and on the other from 70 to 90. Although the means were identical, the
range of one was 50 and the range of the other was 20.
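The two-exam comparison can be reproduced with a short sketch. The individual scores below are hypothetical; they were chosen only so that both exams have a mean of 80 and the ranges given in the example.

```python
from statistics import mean

# Hypothetical scores matching the example: same mean, different spread.
exam_a = [50, 75, 85, 90, 100]
exam_b = [70, 75, 80, 85, 90]

def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

for name, scores in (("Exam A", exam_a), ("Exam B", exam_b)):
    print(f"{name}: mean = {mean(scores)}, "
          f"scores ranged from {min(scores)} to {max(scores)}, "
          f"range = {score_range(scores)}")
```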
b. Interquartile Range (IR)
1) Scores from standardized examinations often reported in percentiles. The
percentile rank for a score received on a particular test indicates what percent
of the scores fall below that score. For example, a score of 78 may represent a
percentile of 94, and you would know that you had done better than 94 percent of
the individuals on whom the test was standardized. Percentiles let you know how
you stand among your peers.
2) The 50th percentile is the midpoint, which is also the median. Percentiles
are often summarized in terms of quartiles: the 100 percentile points are
divided into four "quarters." The 25th percentile is called the first
quartile (Q1), the 50th percentile is the second quartile (Q2), the 75th
percentile is the third quartile (Q3), and the 100th percentile is the fourth
quartile (Q4).
3) The simple range may be unstable because of extreme values. The
interquartile range (IR) can be used to deal with this difficulty. The IR is
defined as Q3 - Q1. This gives the range of scores from the 25th to the 75th
percentile, the middle 50% of the data.
c. Semi-interquartile range (Q)
1) Defined as (Q3 - Q1)/2, or the average amount by
which these two quartiles vary from the median (50th percentile).
2) To locate the 25th percentile, take the number of cases in the
distribution (n) and divide by 4 (or multiply by .25); the result is the
position of Q1 among the ranked scores. To locate the 75th percentile, take
3/4 of n (or multiply n by .75).
3) IR and Q are reported when the range is not representative of the
distribution because of some extreme values.
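The interquartile and semi-interquartile ranges can be sketched with `statistics.quantiles` from the Python standard library. Two assumptions apply: the scores are illustrative, and `quantiles` interpolates between ranked scores by its own default method, so its Q1 and Q3 may differ slightly from values found by the "n/4" positional rule described above.

```python
from statistics import median, quantiles

# Illustrative scores only.
scores = [96, 81, 97, 97, 87, 70, 83, 77]

# n=4 asks for the three cut points that divide the data into quarters.
q1, q2, q3 = quantiles(scores, n=4)

iqr = q3 - q1          # interquartile range (IR): spread of the middle 50%
semi_iqr = iqr / 2     # semi-interquartile range (Q)

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
print(f"IR = {iqr}, Q = {semi_iqr}")
```

Q2 coincides with the median, which gives a quick sanity check on the output.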
d. Standard Deviation
1) Most commonly reported measure of variability. Usually, if mean is
reported as the measure of central tendency, the standard deviation is reported
as the measure of variability. Means and standard deviations are generally
reported together, whether in texts or tables. Standard deviation represents the
average amount by which the scores vary from the central score, the mean.
2) Reminder: lowercase x stands for the deviation of a given score from the
mean and is calculated as x = X - X̄. ∑x² stands for the sum of the squared
deviations. Lowercase n stands for the number of subjects in a sample, and
uppercase N stands for the number of subjects in a population. Greek letters
are used for measures of the population (parameters), and Roman letters for
measures of the sample (statistics). Lowercase sigma, σ, is used for the
parameter, and s is used for the statistic.
Standard deviation for the population:
$$\sigma = \sqrt{\frac{\sum x^2}{N}}$$
Standard deviation for the sample:
$$s = \sqrt{\frac{\sum x^2}{n - 1}}$$
1. Subtracting 1 from the number of subjects in the sample gives an
“unbiased” estimate of the population variance, and hence a better estimate
of the standard deviation of the population.
2. Standard deviation is a measure of squared deviations around the mean. It
is a measure of the “least squares” around the mean.
3) Variance
1. Variance is the average of the squared deviations.
2. The formula is the one for standard deviation without the square root
sign. Population variance is denoted σ² and sample variance is denoted s².
This is because the variance is the standard deviation squared. The formulas
are:
$$\sigma^2 = \frac{\sum x^2}{N} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$$
$$s^2 = \frac{\sum x^2}{n - 1} = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$$
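Each variance formula has a deviation form (using x, the deviations from the mean) and a raw-score form (using X). The sketch below checks that the two forms agree, reusing the five scores 4, 4, 10, 5, 7 from the earlier examples; the variable names are illustrative only.

```python
from math import isclose

X = [4, 4, 10, 5, 7]
n = len(X)
mean_x = sum(X) / n

# Deviation form: sum of squared deviations from the mean (∑x²).
ss_deviation = sum((x - mean_x) ** 2 for x in X)

# Raw-score form: ∑X² - (∑X)² / n, which avoids computing each deviation.
ss_raw = sum(x ** 2 for x in X) - sum(X) ** 2 / n

population_variance = ss_raw / n        # divide by N
sample_variance = ss_raw / (n - 1)      # divide by n - 1

print(ss_deviation, ss_raw, population_variance, sample_variance)
```

The raw-score form is convenient for hand calculation because it needs only two running totals, ∑X and ∑X².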
Standard Deviation Example
X | X²
96 | 9216
81 | 6561
97 | 9409
97 | 9409
87 | 7569
70 | 4900
83 | 6889
77 | 5929
∑X = 688 | ∑X² = 59882
n = 8
Mean = 86
Mode = 97
Median = 85
SD for Sample
$$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}} = \sqrt{\frac{59882 - \frac{(688)^2}{8}}{8 - 1}} = \sqrt{\frac{59882 - \frac{473344}{8}}{7}} = \sqrt{\frac{59882 - 59168}{7}} = \sqrt{102} = 10.10$$
SD for Population
$$\sigma = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}} = \sqrt{\frac{59882 - \frac{(688)^2}{8}}{8}} = \sqrt{\frac{59882 - \frac{473344}{8}}{8}} = \sqrt{\frac{59882 - 59168}{8}} = \sqrt{89.25} = 9.45$$
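The worked example can be double-checked in code. This sketch recomputes both standard deviations with the raw-score formula and compares them with `stdev` (sample) and `pstdev` (population) from Python's standard `statistics` module; the variable names are illustrative only.

```python
from math import isclose, sqrt
from statistics import pstdev, stdev

X = [96, 81, 97, 97, 87, 70, 83, 77]
n = len(X)

sum_x = sum(X)                          # ∑X
sum_x_sq = sum(x ** 2 for x in X)       # ∑X²

# Numerator shared by both formulas: ∑X² - (∑X)² / n
sum_sq_dev = sum_x_sq - sum_x ** 2 / n

s = sqrt(sum_sq_dev / (n - 1))          # sample SD, divides by n - 1
sigma = sqrt(sum_sq_dev / n)            # population SD, divides by N

print(f"s = {s:.2f}, sigma = {sigma:.2f}")
print(f"library check: stdev = {stdev(X):.2f}, pstdev = {pstdev(X):.2f}")
```

The sample value is always the larger of the two, since the same numerator is divided by n - 1 rather than n.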