## Semiparametric Estimation

Econometrics and other statistical sciences deal with the estimation of various functions (models), such as the conditional density function, the regression function (conditional mean), the heteroskedasticity function (conditional variance), and the autocovariance function (conditional covariance). For empirical research, economic theory suggests which variables should enter these models, but it often does not specify their functional form. For this reason, empirical and theoretical work in econometrics usually proceeds by assuming linear or nonlinear parametric functional forms for these models (see Hartley 1961 and Gallant 1987). However, these parametric models may be misspecified and may therefore yield biased and misleading estimates and inferences. Econometrics has consequently moved toward data-based local modeling for studying models of unknown functional form. This approach is also called the “nonparametric (NP) method” or “NP smoothing.” There are various NP methods, including spline, series, differencing, and neural-network methods, but NP kernel smoothing has become particularly popular because of its wide applicability, its simplicity, and its well-developed theory. NP kernel methods involve local averaging. For example, a consistent estimate of the regression function at a point is obtained by averaging those values of the dependent variable whose regressor values are “close” to that point, where closeness is determined by a “window width.” The NP kernel estimation procedure was developed in the seminal works of Murray Rosenblatt (1956), Elisbar Nadaraya (1964), and Geoffrey Watson (1964). (See also Fan and Gijbels 1996 and Pagan and Ullah 1999 for a detailed development of this subject.)
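The local averaging idea behind the kernel regression estimator of Nadaraya (1964) and Watson (1964) can be sketched in a few lines. This is a minimal illustration, assuming a single regressor and a Gaussian kernel (the text does not fix a particular kernel); the function names are hypothetical.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the weight function K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0: float, xs: list[float], ys: list[float], h: float) -> float:
    """Kernel-weighted local average of ys at the point x0 with window width h."""
    # Observations whose x is close to x0 (relative to h) get the largest weights.
    weights = [gaussian_kernel((xi - x0) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observation receives positive weight; increase h")
    return sum(w * yi for w, yi in zip(weights, ys)) / total


# Usage: noiseless data from y = x^2 on an evenly spaced grid.
xs = [i / 10 for i in range(-20, 21)]
ys = [x * x for x in xs]
print(nadaraya_watson(1.0, xs, ys, h=0.2))  # prints approximately 1.04
```

The estimate exceeds the true value 1.0 because averaging a convex function over a window pulls the estimate upward; shrinking `h` reduces this bias at the cost of averaging over fewer observations.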

A major complication in a purely NP method is the “curse of dimensionality”: the sample size needed for precise estimation grows very rapidly with the number of variables in the model. This problem suggests restricting some variables, say in a regression model, to have a linear parametric effect while allowing others to enter through an unknown functional form. For example, in a model of female wages, the wage may be taken to depend linearly on a woman’s personal characteristics, while its dependence on years of job experience is left unspecified. Estimation then combines parametric and NP methods, and the resulting estimators are described as semiparametric (SP). In general, SP models contain both parametric and NP components, and over the years a large class of SP models has appeared in the econometrics literature (see Robinson 1988; Linton 1995; and Pagan and Ullah 1999).
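The curse of dimensionality can be made concrete with a standard asymptotic result: for a twice-differentiable regression function, a second-order kernel, and an optimally chosen window width, the mean squared error of a kernel regression estimator with q continuous regressors shrinks at the rate n^(-4/(4+q)). The sketch below ignores all constants, so it only indicates orders of magnitude. It computes how many observations are needed with q regressors to match the rate that 100 observations achieve with one regressor; the function name is hypothetical.

```python
def equivalent_sample_size(n1: float, q: int) -> float:
    """Sample size with q regressors matching the MSE rate of n1 observations with one.

    Assumes the textbook rate MSE ~ n^(-4/(4+q)); constants are ignored.
    """
    # Solve n_q^(-4/(4+q)) = n1^(-4/5) for n_q.
    return n1 ** ((4 + q) / 5)


for q in (1, 2, 5, 10):
    print(q, round(equivalent_sample_size(100, q)))
# 1 100
# 2 251
# 5 3981
# 10 398107
```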

Suppose one wanted to estimate the function $g$ in the regression model

$$y_i = g(x_i) + u_i, \qquad i = 1, \ldots, n,$$

where $y_i$ is the dependent variable, $x_i$ is a vector of $q$ regressors, and $u_i$ is an additive error. A parametric approach fits the data to a parametric model $g(x_i) = g(x_i, \theta)$, often a linear model with $g(x_i, \theta) = \alpha + x_i\beta$, where $\theta$ is the parameter vector of the model. The least squares estimator $\hat{\theta}$ is obtained by minimizing

$$\sum_{i=1}^{n} \bigl(y_i - g(x_i, \theta)\bigr)^2$$

over $\theta$, and this estimator is consistent and asymptotically normally distributed (see Gallant 1987). In the NP estimation method, the regression function $g(x_i)$ is first approximated by a local polynomial, say a linear one: $y_i = \alpha(x) + (x_i - x)\beta(x) + u_i$ for $x_i$ in $x \pm h/2$. The NP local linear regression estimator then solves the weighted least squares problem

$$\min_{\alpha(x),\, \beta(x)} \sum_{i=1}^{n} \bigl(y_i - \alpha(x) - (x_i - x)\beta(x)\bigr)^2 K\!\left(\frac{x_i - x}{h}\right),$$

where $K(\cdot)$, a nonnegative weight (kernel) function, assigns weights that decrease as the distance of $x_i$ from the point $x$ increases, and $h$ is a window width that determines how rapidly the weights decrease with that distance (see Pagan and Ullah 1999). The resulting $\hat{\alpha}(x)$ estimates $g(x)$, and $\hat{\beta}(x)$ estimates its slope.
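For a single regressor, the weighted least squares problem of the local linear method has a closed-form solution. The sketch below solves the two normal equations directly. It assumes a scalar regressor and a Gaussian kernel (the text leaves $K$ unspecified), and the function name `local_linear` is hypothetical.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Weight function K; its normalizing constant cancels in the solution."""
    return math.exp(-0.5 * u * u)


def local_linear(x0: float, xs: list[float], ys: list[float], h: float) -> tuple[float, float]:
    """Return (alpha_hat, beta_hat): estimates of g(x0) and its slope at x0."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        d = xi - x0
        k = gaussian_kernel(d / h)
        s0 += k          # sum of weights
        s1 += k * d      # weighted first moment of (x_i - x0)
        s2 += k * d * d  # weighted second moment of (x_i - x0)
        t0 += k * yi
        t1 += k * d * yi
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("too few distinct points inside the window; increase h")
    # Solve s0*a + s1*b = t0 and s1*a + s2*b = t1 for (a, b).
    alpha = (s2 * t0 - s1 * t1) / det
    beta = (s0 * t1 - s1 * t0) / det
    return alpha, beta


# Usage: points on the exact line y = 2 + 3x, evaluated at the boundary x0 = 0.
xs = [i / 20 for i in range(21)]
ys = [2 + 3 * x for x in xs]
print(local_linear(0.0, xs, ys, h=0.1))  # approximately (2.0, 3.0)
```

Because the local fit is linear, it reproduces the line even at the boundary point $x_0 = 0$. A local average (local constant fit) at the same point would instead be pulled upward, since all the data lie to its right.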

The SP estimation method deals with SP models in which one component is parametric and the other is NP. A popular SP model is the partially linear model $g(x_i, \theta) = x_{i1}\beta + g(x_{i2})$, where $x_{i1}$ and $x_{i2}$ are sets of $q_1$ and $q_2$ variables ($q = q_1 + q_2$), and the model is linear in $x_{i1}$ but the functional form in $x_{i2}$ is unknown. Taking expectations conditional on $x_{i2}$ and subtracting eliminates the unknown function:

$$y_i - E(y_i \mid x_{i2}) = \bigl(x_{i1} - E(x_{i1} \mid x_{i2})\bigr)\beta + u_i .$$

The SP estimation of $\beta$ therefore involves the parametric least squares estimation of $\beta$ in the regression of $y_i - E(y_i \mid x_{i2})$ on $x_{i1} - E(x_{i1} \mid x_{i2})$, where these differences are generated by first estimating the conditional means appearing in them by the NP method. Another SP model arises when $g(x_i, \theta)$ is taken to be a known parametric model but the distribution of the error $u_i$ is unknown and not assumed to be normal. In this case, the SP estimation of $\theta$ is done by writing the likelihood function of the model with the error density replaced by its NP kernel estimator. The efficiency properties of these estimators have been extensively studied (see Robinson 1988; Bickel, Klaassen, Ritov, and Wellner 1992; and Pagan and Ullah 1999). Other SP estimation problems arise when $g(x_i, \theta) = x_i\beta$ but the conditional variance $V(u_i \mid x_i) = \sigma^2(x_i)$ or the serial correlation in $u_i$ is of unknown form. In addition, there is an extensive class of applied SP models with limited dependent variables (see Linton 1995; Horowitz and Lee 2002; and Pagan and Ullah 1999).

Extensive work on empirical applications of SP models has begun to appear in both cross-section and time-series econometrics, especially in labor and financial econometrics. Several challenging research issues remain, although work on some of them is under way. The first is the development of a unified approach to choosing the window width from the data, together with user-friendly software. Others include the systematic development of SP estimation for panel-data models, especially when the time-series component is nonstationary, and of the theory of SP estimation for models with both continuous and discrete variables (see Racine and Li 2004).
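One common data-driven way to choose the window width, though not the unified approach called for above, is least squares cross-validation. Each observation is predicted from a kernel fit that leaves it out, and the $h$ with the smallest average squared prediction error is selected. The sketch below applies this idea to a Nadaraya-Watson fit with a Gaussian kernel over a small grid of candidate values. The grid, the simulated data, and the function names are all hypothetical.

```python
import math
import random


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error of a Nadaraya-Watson fit."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            if j == i:
                continue  # leave observation i out of its own fit
            w = math.exp(-0.5 * ((xs[j] - xs[i]) / h) ** 2)
            num += w * ys[j]
            den += w
        if den == 0.0:
            return math.inf  # window too narrow: point i has no effective neighbours
        total += (ys[i] - num / den) ** 2
    return total / n


def select_bandwidth(xs, ys, grid):
    """Return the candidate window width with the smallest cross-validation score."""
    return min(grid, key=lambda h: loo_cv_score(xs, ys, h))


# Usage on simulated data from y = sin(2*pi*x) plus noise.
random.seed(1)
xs = [random.random() for _ in range(200)]
ys = [math.sin(2 * math.pi * x) + random.gauss(0, 0.3) for x in xs]
grid = [0.02, 0.05, 0.1, 0.2, 0.5]
print(select_bandwidth(xs, ys, grid))
```

Large window widths oversmooth the sine curve and inflate the score through bias, while very small ones average too few points and inflate it through variance; cross-validation trades these off on the data at hand.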

The SP estimation method is a fast-growing area of research in econometrics and statistics. With advances in computer technology, applications of the SP approach are increasing rapidly. The research frontier continues to move, and the field is expected to keep growing in both theory and applications.

**SEE ALSO** *Nonparametric Estimation; Parameters*

## BIBLIOGRAPHY

Bickel, Peter J., Chris A. J. Klaassen, Ya’acov Ritov, and Jon A. Wellner. 1992. *Efficient and Adaptive Estimation for Semiparametric Models*. Baltimore, MD: Johns Hopkins University Press.

Fan, Jianqing, and Irene Gijbels. 1996. *Local Polynomial Modelling and Its Applications*. London: Chapman & Hall.

Gallant, A. Ronald. 1987. *Nonlinear Statistical Models*. New York: Wiley.

Hartley, Herman O. 1961. The Modified Gauss-Newton Method for the Fitting of Nonlinear Regression Functions by Least Squares. *Technometrics* 3: 269–280.

Horowitz, Joel L., and Sokbae Lee. 2002. Semiparametric Methods in Applied Econometrics: Do the Models Fit the Data? *Statistical Modelling* 2: 3–22.

Linton, Oliver B. 1995. Estimation in Semiparametric Models: A Review. In *Advances in Econometrics and Quantitative Economics: Essays in Honor of C. R. Rao*, ed. Gangadharrao S. Maddala, Peter C. B. Phillips, and Thirukodikaval N. Srinivasan. Oxford: Blackwell.

Nadaraya, Elisbar. 1964. On Estimating Regression. *Theory of Probability and Its Applications* 9: 141–142.

Pagan, Adrian, and Aman Ullah. 1999. *Nonparametric Econometrics*. New York: Cambridge University Press.

Racine, Jeffrey S., and Qi Li. 2004. Nonparametric Estimation of Regression Functions with Both Categorical and Continuous Data. *Journal of Econometrics* 119: 99–130.

Robinson, Peter M. 1988. Semiparametric Econometrics: A Survey. *Journal of Applied Econometrics* 3: 35–51.

Rosenblatt, Murray. 1956. Remarks on Some Nonparametric Estimates of a Density Function. *Annals of Mathematical Statistics* 27: 832–837.

Watson, Geoffrey S. 1964. Smooth Regression Analysis. *Sankhyā*, Series A 26: 359–372.

*Aman Ullah*