Model Selection Tests
Statistical inference and forecasting, widely used in all of the sciences, are usually based on a statistical model. Yet the specification of an appropriate statistical model is a difficult problem that has yet to be satisfactorily solved, and model building remains as much an art as a science. All statistical models are necessarily false; their usefulness lies in how well they approximate the "true" model. A summary of the complementary approaches used in this important problem of model specification is provided below.
The conceptual approach bases model specification on subject-matter theory. However, there may be many competing theories leading to many alternative models; the many different macroeconomic models are an example. In other situations, theory may provide little, if any, information on model specification. For example, theory typically provides little guidance on the dynamic relations among variables.
Another source of model specification is the data. Previously unknown relations between variables can be suggested from the data. An example is given by the Phillips curve relation in macroeconomics.
The problem of model specification can also be addressed using statistical hypothesis testing. In this approach, the comparisons are generally between two competing models. If the two models are nested—that is, one model can be obtained as a special case of the other by imposing appropriate restrictions—standard hypothesis tests can be used to choose between them. Although easy to implement, this approach can be problematic. First, the significance level of the test is an arbitrary choice that can affect the conclusion. Moreover, at the conventional 5 percent significance level, the testing procedure favors the null hypothesis over the alternative.
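The nested comparison is usually carried out with the standard F-test on the restrictions. A minimal Python sketch on simulated data (the variables, seed, and sample size are all illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
y = 1.0 + 2.0 * x1 + rng.normal(size=T)  # true process omits x2

def ols_sse(y, X):
    """Sum of squared residuals from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Restricted model: intercept + x1.  The unrestricted model nests it by adding x2.
X_r = np.column_stack([np.ones(T), x1])
X_u = np.column_stack([np.ones(T), x1, x2])
sse_r, sse_u = ols_sse(y, X_r), ols_sse(y, X_u)

q = 1               # number of restrictions imposed by the restricted model
k_u = X_u.shape[1]  # parameters in the unrestricted model
F = ((sse_r - sse_u) / q) / (sse_u / (T - k_u))
print(f"F = {F:.3f}")  # compare with an F(q, T - k_u) critical value
```

The arbitrary choice discussed above enters at the last step: whether the restricted model is rejected depends on the chosen significance level for the F critical value.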
In situations where two models are nonnested, an artificial compound model can be formulated that includes both rival models as special cases, and the nested testing procedure above can then be applied. Common examples include the J-test of Russell Davidson and James MacKinnon (1981) and the Cox test (D. R. Cox 1961). Both the J and Cox tests can be generalized to situations with more than two models. However, these tests admit the additional possibility of accepting both models or rejecting both. In the hypothesis-testing approaches, the order of testing is important and can, and usually does, affect the final outcome. Thus, two different researchers with exactly the same models and data can arrive at different conclusions based on different orders of testing and significance levels.
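The idea behind the J-test can be sketched as follows, again on simulated data with illustrative names: fit the rival model, add its fitted values as an extra regressor in the first model, and t-test that coefficient. Reversing the roles of the two models gives the second test, which is how both models can end up accepted or rejected.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
x = rng.normal(size=T)
z = rng.normal(size=T)
y = 0.5 + 1.5 * z + rng.normal(size=T)  # here the rival model generates the data

def fit(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

X = np.column_stack([np.ones(T), x])  # model 1 regressors (H0)
Z = np.column_stack([np.ones(T), z])  # model 2, the nonnested rival (H1)

# Step 1: fitted values from the rival model.
yhat2 = Z @ fit(y, Z)

# Step 2: augment model 1 with those fitted values and t-test their coefficient.
X_aug = np.column_stack([X, yhat2])
beta = fit(y, X_aug)
resid = y - X_aug @ beta
s2 = resid @ resid / (T - X_aug.shape[1])
cov = s2 * np.linalg.inv(X_aug.T @ X_aug)
t_stat = beta[-1] / np.sqrt(cov[-1, -1])
print(f"t on rival fitted values = {t_stat:.2f}")  # large |t| rejects model 1
```

With this simulated process the rival's fitted values carry real explanatory power, so the t-statistic is large and model 1 is rejected; running the mirror-image test would then assess the rival.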
An alternative statistical approach to model specification is to construct a metric M, which measures the deviation of the data from the model. Model selection criteria have been devised to help choose the “best” model among a number of alternative models on the basis of the sample information. Both nested and nonnested models can be compared, and all models are treated symmetrically. Model simplicity and goodness of fit are both taken into account in choosing a “best” model. The principle of parsimony is an important requirement in modeling and forecasting. It is only worthwhile to adopt a more complex model if that model does a substantially better job of explaining the data than some simpler model.
The most commonly used selection criteria are the Akaike information criterion (AIC), introduced by Hirotugu Akaike (1969); the Schwarz information criterion, also known as the Bayesian information criterion (BIC), introduced by Gideon Schwarz (1978); and the final prediction error criterion (FPE), introduced by Akaike (1970). All three criteria, along with the conventional adjusted R-squared criterion (written as a function of S2, the estimated error variance), can be written as:

S2 = [T/(T − K)] × (SSE/T)
FPE = [(T + K)/(T − K)] × (SSE/T)
AIC = e^(2K/T) × (SSE/T)
BIC = T^(K/T) × (SSE/T)

where SSE = the sum of squared residuals in the sample, T = the sample size, and K = the number of parameters in the model. Minimizing S2 is equivalent to maximizing the adjusted R-squared, since adjusted R2 = 1 − S2/[TSS/(T − 1)], where TSS = the total sum of squares of the dependent variable Y.
The goal is to choose the model that minimizes the chosen criterion. The criteria differ only in the penalty factor that multiplies the common goodness-of-fit term SSE/T. This penalty factor for model complexity is a function of the number of parameters K and the sample size T. These criteria can be used for all data types; however, the sample must be the same for all models considered. In realistic samples (T > 8), the penalty factors are ordered S2 < AIC < FPE < BIC, with AIC and FPE nearly identical. Thus, BIC penalizes complex models the most, while the adjusted R-squared statistic penalizes complex models the least. Unlike the other three criteria, the adjusted R-squared statistic is not based on the explicit consideration of a loss function, and it is a poor choice, leading to models that are too complex.
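The criteria are simple functions of SSE, T, and K. A sketch using the standard multiplicative forms, with purely illustrative numbers for a simple model and a richer rival on the same sample:

```python
import numpy as np

def selection_criteria(sse, T, K):
    """Each criterion as its penalty factor times SSE/T (smaller is better)."""
    fit = sse / T
    return {
        "S2":  fit * T / (T - K),
        "FPE": fit * (T + K) / (T - K),
        "AIC": fit * np.exp(2 * K / T),
        "BIC": fit * T ** (K / T),
    }

# Illustrative numbers only: the richer model fits somewhat better in-sample.
T = 50
simple = selection_criteria(sse=120.0, T=T, K=3)
richer = selection_criteria(sse=100.0, T=T, K=8)

for name in ("S2", "FPE", "AIC", "BIC"):
    best = "simple" if simple[name] < richer[name] else "richer"
    print(f"{name}: prefers the {best} model")
```

With these numbers the weakly penalizing S2 criterion prefers the richer model, while FPE, AIC, and BIC all prefer the simpler one, illustrating both the ordering of the penalty factors and why the criteria can disagree.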
Other selection criteria are available based on similar theoretical arguments, and in principle one can go on inventing new criteria indefinitely using this approach. If all the criteria achieve their minimum at the same model, then we have a unique choice for the "best" model. In situations where the criteria choose different models, one must proceed with caution. Of the four criteria described above, only BIC is consistent: the probability that BIC chooses the best approximation to the true model approaches 1 as the sample size becomes infinitely large. The other criteria tend to choose models that are too complex in large samples. However, small-sample considerations imply that BIC tends to choose models that are overly simplistic in small samples; both Monte Carlo evidence and small-sample considerations suggest that AIC is then a better choice than BIC. The amount by which a selection criterion differs across models has no meaning, so any monotonic transformation of a criterion can be used in its place. A common choice is the logarithmic transformation. Generalized versions of these selection criteria are also available to compare multivariate models.
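The logarithmic transformation yields, for example, the familiar additive forms in which AIC and BIC are most often quoted (assuming the standard multiplicative forms with goodness-of-fit term SSE/T):

```latex
\ln \text{AIC} = \ln(\text{SSE}/T) + \frac{2K}{T}, \qquad
\ln \text{BIC} = \ln(\text{SSE}/T) + \frac{K}{T}\ln T
```

Because the logarithm is monotonic, minimizing these additive forms selects the same model as minimizing the original criteria.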
An alternative approach is to split the data into an in-sample and out-of-sample period and evaluate and compare the out-of-sample forecasting performance of the different models. This approach provides an independent check on the specification of the model suggested by the in-sample period. These different approaches to model selection serve only as an aid. Recall that one is choosing the best model in the set of models considered; thus a better model may exist outside of those considered. Also, in some situations it may be advisable to carry more than one model forward.
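The out-of-sample check can be sketched as follows, once more on simulated data with illustrative names and split point: each candidate is fit on the in-sample period only, and the candidates are compared by mean squared forecast error over the hold-out period.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 120
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + rng.normal(size=T)  # true process is linear

split = 90  # in-sample period ends here; the rest is the hold-out
X1 = np.column_stack([np.ones(T), x])        # candidate 1: linear
X2 = np.column_stack([np.ones(T), x, x**2])  # candidate 2: adds a quadratic term

def oos_mse(y, X, split):
    """Fit on the in-sample period, forecast the hold-out, return forecast MSE."""
    beta, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    err = y[split:] - X[split:] @ beta
    return float(np.mean(err**2))

mse1, mse2 = oos_mse(y, X1, split), oos_mse(y, X2, split)
print(f"hold-out MSE  linear: {mse1:.3f}  quadratic: {mse2:.3f}")
```

Because the hold-out data played no role in estimation, this comparison gives an independent check on the in-sample specification; typically the superfluous quadratic term buys little or nothing out of sample.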
SEE ALSO Distribution, Normal; Hypothesis and Hypothesis Testing; Instrumental Variables Regression; Loss Functions; Monte Carlo Experiments; Properties of Estimators (Asymptotic and Exact); Regression; Specification Error
BIBLIOGRAPHY
Akaike, Hirotugu. 1969. Fitting Autoregressive Models for Prediction. Annals of the Institute of Statistical Mathematics 21: 243–247.
Akaike, Hirotugu. 1970. Statistical Predictor Identification. Annals of the Institute of Statistical Mathematics 22: 203–217.
Cox, D. R. 1961. Tests of Separate Families of Hypotheses. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Berkeley: University of California Press.
Davidson, Russell, and James G. MacKinnon. 1981. Several Tests for Model Specification in the Presence of Alternative Hypotheses. Econometrica 49: 781–793.
Kennedy, Peter. 2003. A Guide to Econometrics. 5th ed. Cambridge, MA: MIT Press.
Schwarz, Gideon. 1978. Estimating the Dimension of a Model. Annals of Statistics 6 (2): 461–464.