Maximum Likelihood Regression
Maximum likelihood is a methodology used to estimate the parameters of an econometric or statistical model. It was first proposed by Ronald Aylmer Fisher (1890–1962) and is now considered the workhorse of modern econometrics, not only because of its flexibility but also due to the availability of computer power, which has permitted the resolution of complicated numerical problems associated with this technique. Maximum likelihood estimation seeks to determine the parameters of a statistical process that have the highest probability of generating the observed sample of data.
Consider the following regression model: $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \varepsilon_i$ for $i = 1, 2, \ldots, n$. In the simplest case, one can assume that the error term $\varepsilon_i$ is an independent and identically distributed (iid) normal random variable with mean zero and variance $\sigma^2$; that is, $\varepsilon_i \sim N(0, \sigma^2)$. It will be shown below that the assumption of a particular density function for $\varepsilon_i$ is essential for writing the likelihood function. Under the assumption of normality, the probability density function of the error term is written as

$$f(\varepsilon_i; \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\varepsilon_i^2}{2\sigma^2}\right).$$
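Since $\varepsilon_i = Y_i - \beta_0 - \beta_1 X_{1i} - \cdots - \beta_k X_{ki}$, the assumption of normality of the error term is equivalent to the assumption of normality for the conditional probability density function of $Y$ given $X$: $f(Y_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}; \beta_0, \beta_1, \ldots, \beta_k, \sigma^2) \sim N(\beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki}, \sigma^2)$; that is,

$$f(Y_i \mid X_{1i}, \ldots, X_{ki}; \beta_0, \ldots, \beta_k, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_i - \beta_0 - \beta_1 X_{1i} - \cdots - \beta_k X_{ki})^2}{2\sigma^2}\right).$$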
The objective is to estimate the parameter vector $\theta \equiv (\beta_0, \beta_1, \beta_2, \ldots, \beta_k, \sigma^2)'$. For a sample of size $n$, and because of the iid assumption, the joint probability density function of $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ is the product of the marginal densities: $f(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n; \theta) = f(\varepsilon_1; \theta) f(\varepsilon_2; \theta) \cdots f(\varepsilon_n; \theta)$. The likelihood function $L(\theta; \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)$ is based on the joint probability density function of $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$, where $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are taken as fixed data and the parameter vector $\theta$ is the argument of the function: $L(\theta; \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n) = f(\varepsilon_1; \theta) f(\varepsilon_2; \theta) \cdots f(\varepsilon_n; \theta)$. Note that $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ are functions of the data $(Y_i, X_{1i}, X_{2i}, \ldots, X_{ki})$ through the regression model; that is to say, $\varepsilon_i = Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i} - \cdots - \beta_k X_{ki}$.
Though the aforementioned regression model deals with a cross-sectional data set, the maximum likelihood principle also applies to time-series data and panel data. For instance, a time-series regression such as $Y_t = \Phi Y_{t-1} + \varepsilon_t$ for $t = 1, 2, \ldots, T$, with iid $\varepsilon_t \sim N(0, \sigma^2)$, implies that the conditional density function of $Y_t$ given $Y_{t-1}$ is also normal, $f(Y_t \mid Y_{t-1}) \sim N(\Phi Y_{t-1}, \sigma^2)$; that is,

$$f(Y_t \mid Y_{t-1}; \Phi, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(Y_t - \Phi Y_{t-1})^2}{2\sigma^2}\right).$$
As before, the object of interest is to estimate $\theta \equiv (\Phi, \sigma^2)'$, and the likelihood function for a sample of size $T$ is $L(\theta; Y_1, Y_2, \ldots, Y_T) = f(Y_1; \theta) f(Y_2 \mid Y_1; \theta) \cdots f(Y_T \mid Y_{T-1}; \theta)$. This function requires knowledge of the marginal density of the first observation, $f(Y_1; \theta)$. When the sample size is very large, the contribution of the first observation is negligible for practical purposes. Conditioning on the first observation ($Y_1$ is known), we define the conditional likelihood function as $L(\theta; Y_T, Y_{T-1}, \ldots, Y_2 \mid Y_1) = f(Y_2 \mid Y_1; \theta) \cdots f(Y_T \mid Y_{T-1}; \theta)$.
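Under normality, the conditional likelihood for this first-order autoregression has a closed-form maximizer: the estimate of $\Phi$ is the least squares coefficient of $Y_t$ on $Y_{t-1}$. The following is a minimal sketch of that calculation on simulated data. The function names, the true value $\Phi = 0.6$, the unit error variance, and the sample size are illustrative assumptions, not part of the original entry.

```python
import math
import random


def ar1_conditional_mle(y):
    """Conditional Gaussian MLE for Y_t = phi * Y_{t-1} + e_t, treating y[0] as known."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    phi = num / den
    resid = [y[t] - phi * y[t - 1] for t in range(1, len(y))]
    # The variance estimate averages over the T - 1 conditional terms.
    sigma2 = sum(e * e for e in resid) / len(resid)
    return phi, sigma2


def ar1_conditional_loglik(phi, sigma2, y):
    """Log of f(Y_2 | Y_1) * ... * f(Y_T | Y_{T-1}) under normal errors."""
    m = len(y) - 1
    ssr = sum((y[t] - phi * y[t - 1]) ** 2 for t in range(1, len(y)))
    return -0.5 * m * math.log(2 * math.pi * sigma2) - ssr / (2 * sigma2)


# Simulated series (hypothetical parameter values, fixed seed for repeatability).
rng = random.Random(1)
y = [0.0]
for _ in range(999):
    y.append(0.6 * y[-1] + rng.gauss(0.0, 1.0))

phi_hat, sigma2_hat = ar1_conditional_mle(y)
print(phi_hat, sigma2_hat, ar1_conditional_loglik(phi_hat, sigma2_hat, y))
```

Because the first observation is only conditioned on, the sums run over $T - 1$ terms; with a long series the difference from the exact likelihood is small.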
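Mathematically it is more convenient to work with the logarithm of the likelihood function. For the regression model, the log-likelihood function is defined as:

$$\ln L(\theta) = \sum_{i=1}^{n} \ln f(Y_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}; \theta).$$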
The maximum likelihood estimator (MLE) of $\theta$ is the value of $\theta$ that maximizes the likelihood of observing the sample $(Y_i, X_{1i}, X_{2i}, \ldots, X_{ki})$, $i = 1, \ldots, n$. In estimating a regression model using maximum likelihood, the question is which value of $\theta \equiv (\beta_0, \beta_1, \beta_2, \ldots, \beta_k, \sigma^2)'$, out of all the possible values, makes the probability of occurrence of the observed data the largest. Since the log transformation is monotonic, the value of $\theta$ that maximizes the likelihood function is the same as the one that maximizes the log-likelihood function. The statistical inference problem is reduced to a mathematical problem, which under the assumption of normality looks like:

$$\max_{\theta} \ln L(\theta) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_{1i} - \cdots - \beta_k X_{ki})^2.$$
Equating the first-order conditions (the score vector of first-order partial derivatives) to zero results in a system of $k + 2$ equations with $k + 2$ unknowns. The solution to this system is the maximum likelihood estimator. The solution is a maximum of the function if the Hessian (the matrix of second-order partial derivatives) is negative definite at that point.
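For the normal linear regression model, the first-order conditions can be written out explicitly. The sketch below uses matrix notation that is assumed here for compactness: $Y$ is the $n \times 1$ vector of observations, $X$ is the $n \times (k+1)$ matrix of regressors including a column of ones, and $\beta = (\beta_0, \beta_1, \ldots, \beta_k)'$.

```latex
\ln L(\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2
    - \frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)

% Score with respect to beta: k + 1 equations
\frac{\partial \ln L}{\partial \beta} = \frac{1}{\sigma^2} X'(Y - X\beta) = 0

% Score with respect to sigma^2: 1 equation
\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2}
    + \frac{1}{2\sigma^4}(Y - X\beta)'(Y - X\beta) = 0

% Solving the system
\hat{\beta} = (X'X)^{-1}X'Y, \qquad
\hat{\sigma}^2 = \frac{1}{n}(Y - X\hat{\beta})'(Y - X\hat{\beta})
```

The first block of $k + 1$ equations does not involve $\sigma^2$ once it is multiplied through by $\sigma^2$, which is why the coefficients can be solved first and the variance afterward.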
For a linear regression model, the MLE is very easy to compute. First, we compute the solution for the system of equations corresponding to the parameter vector $(\beta_0, \beta_1, \ldots, \beta_k)'$. This system is linear, and its solution is identical to the ordinary least squares (OLS) estimator; that is, $\hat{\beta}_{mle} = \hat{\beta}_{ols} = (X'X)^{-1}X'Y$. Second, the maximum likelihood estimator of the variance is straightforward to compute once the MLE $\hat{\beta}_j$'s are obtained. The MLE $\hat{\sigma}^2_{mle}$ corresponds to the sample variance of the residuals (the sum of squared residuals divided by $n$), which is a biased estimator of the population variance. The $\hat{\beta}_{mle}$ is identical to the $\hat{\beta}_{ols}$ when the likelihood function is constructed under the assumption of normality.
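The two-step calculation can be illustrated for the case of a single regressor ($k = 1$). This is a minimal sketch on simulated data; the function names, the true parameter values, and the sample size are assumptions made for the example.

```python
import math
import random


def fit_mle(x, y):
    """Closed-form Gaussian MLE for y = b0 + b1 * x + e; b0 and b1 equal OLS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_mle = ssr / n             # MLE divides by n (biased downward)
    sigma2_unbiased = ssr / (n - 2)  # degrees-of-freedom correction for k + 1 = 2
    return b0, b1, sigma2_mle, sigma2_unbiased


def log_likelihood(b0, b1, sigma2, x, y):
    """Normal log-likelihood of the simple regression at the given parameters."""
    n = len(x)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ssr / (2 * sigma2)


# Simulated sample (hypothetical true values b0 = 1, b1 = 2, sigma = 1.5).
rng = random.Random(0)
x = [rng.uniform(0.0, 10.0) for _ in range(200)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.5) for xi in x]

b0, b1, s2_mle, s2_unb = fit_mle(x, y)
ll_hat = log_likelihood(b0, b1, s2_mle, x, y)
print(b0, b1, s2_mle, s2_unb, ll_hat)
```

Moving any coefficient away from its estimate raises the sum of squared residuals and therefore lowers the log-likelihood, which is the sense in which least squares and maximum likelihood coincide here.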
For a nonlinear regression model, the system of equations is usually nonlinear, and numerical optimization methods are needed to obtain the MLE solution. In addition, there is the possibility of heteroscedasticity. This is the case when the variance of the error term is not constant but depends on a set of variables, for instance, $\sigma^2(X, \gamma)$. In this instance, there is an additional set of parameters $\gamma$ to estimate. The system of equations will be nonlinear and the solution will again be obtained by numerical optimization. In nonlinear models, the system of equations may have several solutions, for the likelihood function may exhibit a complicated profile with several local maxima. In this case, the researcher needs to make sure that the global maximum has been achieved, either by plotting the profile of the likelihood function (when possible) or by using different initial values to start the iterative procedures within the optimization routine.
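The use of different initial values can be sketched with a deliberately simple case. The example below is an assumption made for illustration, not a model from this entry: an intercept-only model with Cauchy errors, whose log-likelihood has two local maxima for the hypothetical data shown. The optimizer is a basic step-halving hill climb, standing in for whatever routine a statistical package would use.

```python
import math

# Hypothetical data with two well-separated clusters.
data = [-10.0, -9.5, 10.0, 10.5, 11.0]


def loglik(theta):
    """Cauchy (scale 1) location log-likelihood, dropping the constant term."""
    return -sum(math.log(1.0 + (xi - theta) ** 2) for xi in data)


def hill_climb(f, start, step=1.0, tol=1e-8):
    """Derivative-free local search: move uphill, halve the step when stuck."""
    x = start
    while step > tol:
        if f(x + step) > f(x):
            x += step
        elif f(x - step) > f(x):
            x -= step
        else:
            step /= 2.0
    return x


# Run the local search from several starting values and keep the best result.
starts = [-12.0, -6.0, 0.0, 6.0, 12.0]
candidates = [hill_climb(loglik, s) for s in starts]
best = max(candidates, key=loglik)

for s, c in zip(starts, candidates):
    print(f"start {s:6.1f} -> local maximum at {c:8.4f}, loglik {loglik(c):9.4f}")
print("selected estimate:", best)
```

Starts on the left converge to the lower peak near the smaller cluster, so a single run from a poor initial value would report a local rather than the global maximum.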
The significance of the maximum likelihood estimator derives from its optimal asymptotic properties. In large samples and under correct model specification, the MLE $\hat{\theta}_{mle}$ is in most cases consistent, asymptotically efficient, and asymptotically normal. Among these properties, maximum efficiency (the smallest asymptotic variance among estimators) is the most significant property of the ML estimator. Maximum efficiency is a very desirable property because it allows the construction of more powerful tests and narrower confidence intervals than those based on less efficient estimators.
Within the maximum likelihood framework and after obtaining the ML estimators, we can also perform hypothesis testing by comparing the value of the estimates with other fixed values. The likelihood ratio test assesses the likelihood that the data may have been generated by a different set of parameter values (those under the null hypothesis). The test compares the value of the likelihood function under the null hypothesis with that under the alternative. The test statistic is computed as two times the difference between the maximized log-likelihood functions under the alternative and under the null, and it is asymptotically distributed as a chi-square with as many degrees of freedom as the number of restrictions imposed under the null.
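As an illustration, the likelihood ratio statistic for the null hypothesis $\beta_1 = 0$ in a regression with one regressor can be sketched as follows. The data are simulated and the function names and parameter values are assumptions. With a single restriction the reference distribution is a chi-square with one degree of freedom, whose upper-tail probability can be written with the complementary error function.

```python
import math
import random


def max_loglik(ssr, n):
    """Maximized normal log-likelihood given the sum of squared residuals."""
    sigma2 = ssr / n  # MLE of the error variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1.0)


def lr_test_slope(x, y):
    """Likelihood ratio test of H0: b1 = 0 in y = b0 + b1 * x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    ssr_alt = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ssr_null = sum((yi - y_bar) ** 2 for yi in y)  # intercept-only model
    lr = 2.0 * (max_loglik(ssr_alt, n) - max_loglik(ssr_null, n))
    lr = max(lr, 0.0)  # guard against tiny negative values from rounding
    # Upper tail of a chi-square with 1 degree of freedom: P(Z^2 > lr).
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Simulated sample (hypothetical true slope of 0.5, so H0 is false).
rng = random.Random(2)
x = [rng.uniform(0.0, 10.0) for _ in range(100)]
y = [1.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

lr, p_value = lr_test_slope(x, y)
print(lr, p_value)
```

The null is rejected at the 5 percent level when the statistic exceeds 3.841, the corresponding critical value of the chi-square distribution with one degree of freedom.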
The maximum likelihood estimator is also known as a full information estimator because the estimation is based on the most comprehensive characterization of a random variable, which is the specification of its probability density function. In practice, it is not known what density function should be assumed, and a potential shortcoming of the maximum likelihood estimator is that one could write a likelihood function under a false probability density function. The consequences of this are severe, for the asymptotic properties will no longer hold. However, one can still apply quasi-maximum likelihood estimation (QMLE). The QML estimator requires that the conditional mean and conditional variance are correctly specified. It assumes that the error is normally distributed, though this may be a false assumption. The quasi-maximum likelihood estimator is still consistent, though the maximum efficiency property is lost. The efficiency loss depends on how far the normal density is from the true density. In practice, estimation is done within the quasi-maximum likelihood framework, because full knowledge of the density is rare. Other estimators can recover some of the efficiency loss, but these belong to the family of nonparametric and semiparametric estimators.
SEE ALSO Linear Regression; Models and Modeling; Probability Distributions; Properties of Estimators (Asymptotic and Exact); Regression; Regression Analysis
BIBLIOGRAPHY
González-Rivera, Gloria, and Feike C. Drost. 1999. Efficiency Comparisons of Maximum-Likelihood-Based Estimators in GARCH Models. Journal of Econometrics 93 (1): 93–111.
White, Halbert L. 1994. Estimation, Inference, and Specification Analysis. Cambridge, UK: Cambridge University Press.
Gloria González-Rivera
"Maximum Likelihood Regression." International Encyclopedia of the Social Sciences. . Encyclopedia.com. 23 Jan. 2019 <https://www.encyclopedia.com>.