MULTIPLE CORRELATION & REGRESSION
I. Multiple Correlations (Overview)
a. The relationship is measured between one variable and a combination of other
variables. In simple correlation (r), there is one independent variable (X) and
one dependent variable (Y). In multiple correlation (R), there is more than one
independent variable (X1, X2, X3, and so on) and one dependent variable (Y).
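
One common way to compute R is as the simple correlation between Y and the
best (least-squares) linear combination of the X variables. The following is a
minimal sketch of that idea in Python with NumPy. The function name
multiple_correlation and the data are made up for illustration and are not part
of the original notes.

```python
import numpy as np

def multiple_correlation(X, y):
    """Return R: the correlation between y and its least-squares prediction from X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Add a column of ones so the fitted combination includes an intercept.
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    y_pred = design @ coefs
    return np.corrcoef(y, y_pred)[0, 1]

# Illustrative, made-up data: two independent variables (X1, X2) and one Y.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]])
y = np.array([3, 4, 6, 8, 10, 11])

R = multiple_correlation(X, y)
r1 = np.corrcoef(X[:, 0], y)[0, 1]  # simple r between X1 and Y
r2 = np.corrcoef(X[:, 1], y)[0, 1]  # simple r between X2 and Y
print(f"r(X1, Y) = {r1:.3f}, r(X2, Y) = {r2:.3f}, R = {R:.3f}")
```

In a sketch like this, R is never smaller than the absolute value of any single
r, because the combination of variables can always do at least as well as one
variable alone.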
II. Regression
a. Introduction
1) Regression is a technique that makes use of the correlation between
variables and the notion of a straight line to develop a prediction equation.
Once a relationship has been established between two variables, it is possible
to develop an equation that allows us to predict the score on one of the
variables, given the score on the other.
2) In multiple correlation, regression is used to establish a prediction
equation in which each independent variable is assigned a weight based on its
relationship to the dependent variable.
3) Regression may be used in relation-searching and association-testing.
III. Simple Linear Regression
a. Simple Regression: A correlation between two variables is used to develop a
prediction equation. It is based on a linear relationship.
1) The stronger the correlation (the closer r is to +1 or -1), the more
accurate the prediction.
2) To be able to make predictions, the relationship between two variables, the
independent (X) and the dependent (Y), must be measured. If there is a
correlation, a regression equation can be developed that will allow prediction
of Y, given X.
3) "Regression" means literally a falling back toward the mean. Each prediction
"regresses" back toward the mean of Y, and how far depends on the strength of
the correlation: the weaker the correlation, the closer the predicted score
falls to the mean.
4) Prediction Equation
Y' = a + bX
1. Y' is the predicted score. Given data on X and Y from a sample of subjects
called the regression sample, a and b can be calculated. With those two
measures, Y can be predicted given X.
2. The letter a is called the intercept constant and is the predicted value of
Y when X = 0. It is the point at which the regression line intercepts the Y
axis.
3. The letter b is called the regression coefficient and is the change in the
predicted value of Y for each one-unit change in X. It is a measure of the
slope of the regression line.
4. The regression line is the "line of best fit" and is formed by a technique
called the method of least squares. Because the mean is the center of the data,
the sum of the deviations of the scores around the mean, ∑(X - M), adds up to
0. Also, if you square those deviations and add them, that number will be
smaller than the sum of the squared deviations around any other measure of
central tendency. In the same way, the regression line passes through the exact
center of the scatter diagram (the point defined by the mean of X and the mean
of Y); thus, it is the "line of best fit". The regression line represents the
predicted scores (Y'), but since prediction is not perfect, actual scores (Y)
deviate somewhat from predicted scores. Because the regression line passes
through the center of the pairs of scores, the deviations from the regression
line (Y - Y') add up to 0, and the sum of their squares is smaller than it
would be around any other straight line.
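
The calculation of a and b, and the least-squares properties described above,
can be checked numerically. The following is a minimal sketch in Python with
NumPy. The regression sample is made up for illustration, and the function name
fit_line is an assumption, not something taken from the original notes.

```python
import numpy as np

# Illustrative, made-up regression sample.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def fit_line(x, y):
    """Least-squares intercept a and slope b for Y' = a + bX."""
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    # Slope: sum of cross-products of deviations over sum of squared X deviations.
    b = np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)
    # Intercept: chosen so the line passes through (mean of X, mean of Y).
    a = y.mean() - b * x.mean()
    return a, b

a, b = fit_line(x, y)
y_pred = a + b * x          # predicted scores, Y'
residuals = y - y_pred      # deviations from the regression line, Y - Y'
sse = np.sum(residuals ** 2)

# Any other line, for example a slightly different a and b, has a larger
# sum of squared deviations.
sse_other = np.sum((y - (2.0 + 0.7 * x)) ** 2)

# Regression toward the mean: in standard-score units, the predicted score
# is r times the standard score on X, so it lies closer to the mean when |r| < 1.
r = np.corrcoef(x, y)[0, 1]
z_x = (x - x.mean()) / x.std()
z_pred = (y_pred - y.mean()) / y.std()

print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.3f}")
print(f"sum of residuals = {residuals.sum():.2e}")
print(f"SSE (least squares) = {sse:.2f}, SSE (other line) = {sse_other:.2f}")
print("z_pred equals r * z_x:", np.allclose(z_pred, r * z_x))
```

The comparison line (a = 2.0, b = 0.7) is arbitrary; any line other than the
fitted one would give a larger sum of squared deviations.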
IV. Multiple Regression: This is possible when there is a measurable
multiple correlation between a group of predictor variables and one dependent
variable. The prediction equation is:
Y' = a + b1X1 + b2X2 + b3X3 + ... + bkXk
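
As a rough illustration of how a and the weights b1 ... bk might be estimated,
the sketch below fits the equation by least squares with NumPy. The function
names fit_multiple_regression and predict are hypothetical, and the data are
constructed (not real) so that Y = 1 + 2*X1 + 0.5*X2 exactly, which makes it
easy to see that the fit recovers the intercept and weights.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Return (a, b) for Y' = a + b1*X1 + ... + bk*Xk, fitted by least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # A leading column of ones lets the intercept a be estimated with the weights.
    design = np.column_stack([np.ones(len(y)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[0], coefs[1:]

def predict(a, b, X_new):
    """Apply the prediction equation to new rows of predictor values."""
    return a + np.asarray(X_new, dtype=float) @ b

# Constructed data: Y = 1 + 2*X1 + 0.5*X2 with no error term.
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]], dtype=float)
y = 1 + 2 * X[:, 0] + 0.5 * X[:, 1]

a, b = fit_multiple_regression(X, y)
y_new = predict(a, b, [[7, 8]])  # predicted Y' for a new case with X1=7, X2=8
print("a =", round(a, 3), "weights =", np.round(b, 3), "prediction =", np.round(y_new, 3))
```

With real data the scores would not fall exactly on the fitted equation, and
the weights would only be estimates from the regression sample.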