Like standard multiple regression, hierarchical multiple regression (also known as sequential multiple regression) allows you to predict a dependent variable based on multiple independent variables. However, the procedure that it uses to do this in SPSS Statistics, and the goals of hierarchical multiple regression, are different from standard multiple regression. In standard multiple regression, all the independent variables are entered into the regression equation at the same time. By contrast, hierarchical multiple regression enables you to enter the independent variables into the regression equation in an order of your choosing. This has a number of advantages, such as allowing you to: (a) control for the effects of covariates on your results; and (b) take into account the possible causal effects of independent variables when predicting a dependent variable. Nonetheless, all hierarchical multiple regressions answer the same statistical question: How much extra variation in the dependent variable can be explained by the addition of one or more independent variables?
For example, you could use hierarchical multiple regression to understand whether exam performance can be predicted based on revision time, lecture attendance and prior academic achievement. Here, your continuous dependent variable would be “exam performance”, whilst you would have three continuous independent variables – “revision time”, measured in hours, “lecture attendance”, measured as a percentage of classes attended, and “prior academic achievement”, measured using SAT scores. Let’s say we know that, on average, students with higher scores on the SAT, an important university admission exam, also get higher marks in exams at university. If we didn’t take account of this prior difference in academic achievement between students, it could confound our results, such that we would not correctly estimate the variance in exam performance explained by our other two independent variables: revision time and lecture attendance. As such, you could use hierarchical multiple regression to separate the variation in exam performance that can be explained by revision time and lecture attendance from the variation explained by prior academic achievement by itself. You could also use this hierarchical regression model to predict exam performance based on different values of revision time, lecture attendance, and prior academic achievement.
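The idea of "extra variation explained" in the exam example can be sketched outside SPSS Statistics. The following is a minimal illustration in Python with NumPy, not the SPSS procedure itself. The data are synthetic and invented purely for demonstration, and the helper `r_squared` and all variable names are assumptions of this sketch. It fits a first model with prior academic achievement only, then a second model that adds revision time and lecture attendance, and reports the change in R².

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (an intercept is added here)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data, invented for illustration only (not real students).
rng = np.random.default_rng(0)
n = 200
sat = rng.normal(1100, 150, n)           # prior academic achievement
revision_hours = rng.normal(20, 5, n)    # revision time in hours
attendance_pct = rng.uniform(40, 100, n) # lecture attendance in percent
exam = (0.03 * sat + 0.8 * revision_hours + 0.2 * attendance_pct
        + rng.normal(0, 5, n))

# Model 1: covariate only. Model 2: covariate plus the two predictors of interest.
r2_model1 = r_squared(sat, exam)
r2_model2 = r_squared(np.column_stack([sat, revision_hours, attendance_pct]), exam)
delta_r2 = r2_model2 - r2_model1

print(f"R^2 model 1: {r2_model1:.3f}")
print(f"R^2 model 2: {r2_model2:.3f}")
print(f"R^2 change:  {delta_r2:.3f}")
```

Because the second model contains every variable in the first, its R² can never be lower; the question is whether the increase is large enough to matter.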
In order to run a hierarchical multiple regression, there are eight assumptions that need to be considered. The first two assumptions relate to your choice of study design and the measurements you chose to make, whilst the other six assumptions relate to how your data fits the hierarchical multiple regression model. These assumptions are:
- Assumption #1: You have one dependent variable that is measured at the continuous level (i.e., the interval or ratio level). Examples of continuous variables include height (measured in centimeters), temperature (measured in °C), salary (measured in US dollars), revision time (measured in hours), intelligence (measured using IQ score), firm size (measured in terms of the number of employees), age (measured in years), reaction time (measured in milliseconds), grip strength (measured in kg), weight (measured in kg), power output (measured in watts), test performance (measured from 0 to 100), sales (measured in number of transactions per month), academic achievement (measured in terms of GMAT score), and so forth.
Note 1: You should note that SPSS Statistics refers to continuous variables as Scale variables.
Note 2: The dependent variable can also be referred to as the “outcome”, “target” or “criterion” variable. It does not matter which of these you use, but we will continue to use “dependent variable” for consistency.
- Assumption #2: You have two or more independent variables that are measured either at the continuous or nominal level. Examples of continuous variables are provided above. Examples of nominal variables include gender (e.g., two categories: male and female), ethnicity (e.g., three categories: Caucasian, African American, and Hispanic), and profession (e.g., five categories: surgeon, doctor, nurse, dentist, therapist).
Note 1: The “categories” of the independent variable are also referred to as “groups” or “levels”, but the term “levels” is usually reserved for the categories of an ordinal variable (e.g., an ordinal variable such as “fitness level”, which has three levels: “low”, “moderate” and “high”). However, these three terms – “categories”, “groups” and “levels” – can be used interchangeably. We refer to them as categories in this guide.
Note 2: An independent variable with only two categories is known as a dichotomous variable whereas an independent variable with three or more categories is referred to as a polytomous or multinomial variable.
Important: If one of your independent variables was measured at the ordinal level, it can still be entered in a hierarchical multiple regression, but it must be treated as either a continuous or nominal variable. It cannot be entered as an ordinal variable. Examples of ordinal variables include Likert items (e.g., a 7-point scale from strongly agree through to strongly disagree), physical activity level (e.g., 4 groups: sedentary, low, moderate and high), customer liking a product (ranging from “Not very much”, to “It is OK”, to “Yes, a lot”), and so forth.
- Assumption #3: You should have independence of observations (i.e., independence of residuals). Checking this assumption in a multiple regression means testing for 1st-order autocorrelation, which is where adjacent observations (specifically, their errors) are correlated (i.e., not independent). This is largely a study design issue because the observations in a multiple regression must not be related; if they are, you would need a different approach, such as time series methods. In SPSS Statistics, independence of observations can be checked using the Durbin-Watson statistic.
- Assumption #4: There needs to be a linear relationship between (a) the dependent variable and each of your independent variables, and (b) the dependent variable and the independent variables collectively. The assumption of linearity in a multiple regression needs to be tested in two parts (but in no particular order). You need to establish whether a linear relationship exists between the dependent and independent variables collectively, which can be achieved by plotting a scatterplot of the studentized residuals against the (unstandardized) predicted values. You also need to establish whether a linear relationship exists between the dependent variable and each of your independent variables, which can be achieved using partial regression plots between each independent variable and the dependent variable (although you can ignore any categorical independent variables; e.g., gender).
- Assumption #5: Your data needs to show homoscedasticity of residuals (equal error variances). The assumption of homoscedasticity is that the variance of the residuals is equal for all values of the predicted dependent variable (i.e., the variances along the line of best fit remain similar as you move along the line). To check for heteroscedasticity, you can use the same plot that is used to check linearity under Assumption #4, namely the studentized residuals plotted against the unstandardized predicted values.
- Assumption #6: Your data must not show multicollinearity. Multicollinearity occurs when you have two or more independent variables that are highly correlated with each other. This leads to problems with understanding which independent variable contributes to the variance explained in the dependent variable, as well as technical issues in calculating a multiple regression model.
You can use SPSS Statistics to detect multicollinearity through an inspection of correlation coefficients and Tolerance/VIF values.
- Assumption #7: There should be no significant outliers, high leverage points or highly influential points. Outliers, leverage and influential points are different terms used to represent observations in your data set that are in some way unusual when you wish to perform a multiple regression analysis. These different classifications of unusual points reflect the different impact they have on the regression line. An observation can be classified as more than one type of unusual point. However, all these points can have a very negative effect on the regression equation that is used to predict the value of the dependent variable based on the independent variables. This can change the output that SPSS Statistics produces and reduce the predictive accuracy of your results as well as the statistical significance. Fortunately, when using SPSS Statistics to run multiple regression on your data, you can detect possible outliers, high leverage points and highly influential points.
- Assumption #8: You need to check that the residuals (errors) are approximately normally distributed. In order to be able to run inferential statistics (i.e., determine statistical significance), the errors in prediction – the residuals – need to be normally distributed. Two common methods you can use to check for the assumption of normality of the residuals are: (a) a histogram with superimposed normal curve and a P-P Plot; or (b) a Normal Q-Q Plot of the studentized residuals.
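Several of the diagnostics named in the assumptions above (the Durbin-Watson statistic, Tolerance/VIF values and leverage) follow standard textbook formulas. As a rough sketch of what those numbers represent, assuming plain NumPy rather than SPSS Statistics, they could be computed as below. The function names are hypothetical choices for this illustration and are not part of any SPSS Statistics output or API.

```python
import numpy as np

def _design(X):
    """Prepend an intercept column to the predictor matrix."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.column_stack([np.ones(X.shape[0]), X])

def ols_residuals(X, y):
    """Residuals from an ordinary least squares fit of y on X (with intercept)."""
    y = np.asarray(y, dtype=float)
    design = _design(X)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

def durbin_watson(residuals):
    """Durbin-Watson statistic: ranges from 0 to 4, with values near 2
    suggesting no 1st-order autocorrelation in the residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def vif(X):
    """Variance inflation factor for each predictor column.
    Tolerance is the reciprocal, 1 / VIF."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        resid = ols_residuals(others, target)
        r2 = 1.0 - np.sum(resid ** 2) / np.sum((target - target.mean()) ** 2)
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

def leverage(X):
    """Leverage of each observation: the diagonal of the hat matrix,
    obtained from a reduced QR decomposition of the design matrix."""
    q, _ = np.linalg.qr(_design(X))
    return np.sum(q ** 2, axis=1)

# Tiny made-up example: two uncorrelated predictors and alternating residuals.
X_demo = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(vif(X_demo))
print(leverage(X_demo))
```

The leverage values always sum to the number of estimated coefficients (predictors plus the intercept), which is a convenient sanity check on any implementation.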
After running the hierarchical multiple regression procedure and testing that your data meet the assumptions of a hierarchical multiple regression, SPSS Statistics will have generated a number of tables that contain all the information you need to report the results of your hierarchical multiple regression. We also show how to write up this output as you work through the section.
The main objective of a hierarchical multiple regression is to determine the proportion of the variation in the dependent variable explained by the addition of new independent variables. However, it is also possible to use hierarchical multiple regression to predict dependent variable values based on new values of the independent variables, as well as determining how much the dependent variable changes for a one unit change in the independent variables. In this section, we focus on the main objective of determining the proportion of the variation in the dependent variable explained by the addition of new independent variables.
When interpreting and reporting your results from a hierarchical multiple regression, we suggest working through three stages: (a) evaluate the regression models that you are comparing; (b) determine whether the hierarchical multiple regression model is a good fit for the data; and (c) understand the coefficients of the regression model. To recap:
- First, it is worth evaluating the regression models you are comparing in your hierarchical multiple regression. Hierarchical multiple regression is effectively the comparison of multiple regression models. Therefore, the first major point to understand is which models are in your analysis.
- Second, you need to determine whether the hierarchical multiple regression model is a good fit for the data, as well as evaluating the differences between the models and their statistical significance: There are a number of statistics you can use to determine whether the hierarchical multiple regression model is a good fit for the data (i.e., how well it explains the dependent variable). These are: (a) the percentage (or proportion) of variance explained; (b) the change in the R² value from the previous model; and (c) the statistical significance of the full model.
- Third, you need to understand the coefficients of the regression model: These coefficients are useful in order to understand whether there is a linear relationship between the dependent variable and the independent variables.
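The change in R² between two nested models is usually accompanied by an F statistic for that change. A minimal sketch of the standard formula is shown below; the function name `f_change` and the example values are hypothetical and chosen only for illustration. Here `k_reduced` and `k_full` are the numbers of independent variables in the earlier and later models, and `n` is the sample size.

```python
def f_change(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the R^2 change between two nested regression models.

    Returns the F value with its numerator and denominator degrees of freedom.
    """
    df1 = k_full - k_reduced   # number of independent variables added
    df2 = n - k_full - 1       # residual degrees of freedom of the full model
    f_value = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f_value, df1, df2

# Hypothetical values: R^2 rises from 0.20 to 0.50 when two variables
# are added to a one-variable model, with 100 observations.
f_value, df1, df2 = f_change(0.20, 0.50, n=100, k_reduced=1, k_full=3)
print(f"F change = {f_value:.2f} on ({df1}, {df2}) degrees of freedom")
```

A large F value relative to the F distribution with those degrees of freedom indicates that the added variables explain a statistically significant amount of extra variation.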