Variable selection in regression and other forms of modelling is an interesting topic I will discuss another day. Numerous warnings have been written around step-wise regression and other automated forms of regression (e.g., Thompson, 2006). The problem is that these methods capitalize on chance, and variable selection can be rather arbitrary. All-subsets regression will also tend to capitalize on chance. I don't want to discuss this in too much detail right now, but at a later date I will jot down some notes, including a description of suppression effects. When you look at the results below, you can probably see some of this in action.

Basically, we try to find the 'best' combination of variables, based on all of the variables in our data, to predict some outcome, our dependent variable. 'Best' can mean maximizing R-square (variance explained), for example, or optimizing other indicators of model fit. For now, it is important to proceed with caution if you choose this approach.

In the old days, I would test all subsets by running all combinations of the independent variables and examining the model R-square, Mallow's Cp and so on (see Kleinbaum et al., 2008, for a description of interpreting model fit and identifying the best subset). Of course, the number of possible subsets of n predictors is 2^n - 1, so the number of subsets grows exponentially as you add predictor variables. In the example below, 7 predictors give 2^7 - 1 = 127 different models from which to choose! Automated processing is therefore rather useful.

Here I am using the Regression Best Subsets extension available in SPSS. To use this extension you will need to install it first. You can do this by going to the Extensions menu and selecting the Extension Hub. The Extension Hub opens up and you can search for "subsets". (There are many cool extensions here; check them out.) This extension will run without the R or Python plug-ins, so you can go ahead and install it by ticking the Get Extension box and clicking OK.

Once installed, you can find the routine under the Analyze menu, as seen here: Selecting this option opens a simple window with space for a dependent variable and a set of independent variables. Note that the variables should be named with no more than 8 characters and no strange symbols. I had to change "PSOC.Sat" to "PSOCSat" because it did not like the dot.

If you click OK, the output looks like this: First is a big table with all of the subsets arranged by AIC (Akaike Information Criterion). As you can see, we have all of the key indicators of model fit, e.g., AIC, Mallow's Cp, adjusted R-square and so on. Below the table you also have a graph showing the difference between Mallow's Cp and the number of parameters in each model/subset, plotted against the model number.
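The subset arithmetic is easy to verify. Here is a minimal Python sketch (the predictor names are made up for illustration) that enumerates every non-empty subset of 7 predictors:

```python
from itertools import combinations

# Hypothetical predictor names; any 7 variables give the same count.
predictors = ["x1", "x2", "x3", "x4", "x5", "x6", "x7"]

# Every non-empty subset: choose k predictors for k = 1..7.
subsets = [c for k in range(1, len(predictors) + 1)
           for c in combinations(predictors, k)]

print(len(subsets))  # 127, i.e. 2**7 - 1
```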
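The extension does this search for you, but the idea behind the output table is straightforward to sketch. Below is a small Python illustration of all-subsets selection on simulated data; the data, the variable layout, and the AIC constant are assumptions for the example, not the extension's actual implementation. Mallow's Cp is computed in the usual way as SSE_p / MSE_full - (n - 2p):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Simulated stand-in data; the real analysis would use your SPSS variables.
n, p = 100, 4
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

def fit_stats(cols):
    """OLS with an intercept on the chosen columns; return SSE and parameter count."""
    Xs = np.column_stack([np.ones(n), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return float(resid @ resid), Xs.shape[1]

# MSE of the full model, needed as the reference for Mallow's Cp.
sse_full, k_full = fit_stats(range(p))
mse_full = sse_full / (n - k_full)

results = []
for k in range(1, p + 1):
    for cols in combinations(range(p), k):
        sse, kp = fit_stats(cols)
        aic = n * np.log(sse / n) + 2 * kp   # AIC up to an additive constant
        cp = sse / mse_full - (n - 2 * kp)   # Mallow's Cp
        results.append((aic, cp, cols))

results.sort()  # smallest AIC first, like the extension's table
best_aic, best_cp, best_cols = results[0]
print(best_cols)  # the predictor columns of the lowest-AIC subset
```

For the full model, Cp equals its parameter count by construction, which is one reason the Cp-versus-parameters graph in the output is a useful diagnostic: subsets whose Cp sits close to their own parameter count show little bias relative to the full model.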