Censoring, Left and Right

Censoring occurs when values of a variable within a certain range are unobserved, but it is known that the variable falls within this range. This differs from truncation, where values of a variable within a certain range are unobserved and it is not even known when a draw has fallen within this range, because such draws leave no record in the sample. Both phenomena represent a loss of information, but the loss is smaller with censoring than with truncation. The two are sometimes confused in the literature; some examples are given in Léopold Simar and Paul Wilson (2007). George Maddala (1983) and Takeshi Amemiya (1984) list a number of empirical applications where censoring occurs.
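The contrast between the two can be sketched with a handful of made-up numbers. In the illustration below, the list `latent` and the threshold `c` are hypothetical; the point is only that a censored sample keeps one record per draw, whereas a truncated sample silently loses the affected draws.

```python
# Hypothetical latent draws; in practice these are never fully observed.
latent = [-1.2, 0.4, 2.5, -0.3, 1.1]
c = 0.0  # assumed threshold: draws at or below c are affected

# Censoring: every draw leaves a record, but affected values are recorded as c.
censored = [max(y, c) for y in latent]

# Truncation: affected draws leave no record at all.
truncated = [y for y in latent if y > c]

# With censoring, the number of affected draws is still known.
n_censored = sum(1 for y in censored if y == c)

print(censored)
print(truncated)
print(n_censored)
```

The censored list has the same length as the latent list, so the analyst knows how many draws fell in the unobserved range; the truncated list carries no such information.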

Consider a sample of n draws Y_i, i = 1, …, n, from a distribution function F(y) = P(Y ≤ y). If the sample is left-censored at c₁, then the values Y_i are not observed; instead, values Y_i^* are observed, where Y_i^* = Y_i if Y_i > c₁, and Y_i^* = c₁ otherwise. For the cases where Y_i^* = c₁, all that is known about the underlying corresponding values Y_i is that they are less than or equal to c₁. Alternatively, if the sample is right-censored at c₂, then values Y_i^* are observed, where Y_i^* = Y_i if Y_i < c₂, and Y_i^* = c₂ otherwise. In this scenario, for the cases where Y_i^* = c₂, all that is known about the Y_i is that they are greater than or equal to c₂. Samples can also be both left- and right-censored.
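The mapping from underlying draws to observed values can be written as a small function. This is a minimal sketch; the function name `censor` and the status labels are illustrative choices, not part of any standard library, and it assumes c1 < c2 when both limits are supplied.

```python
def censor(values, c1=None, c2=None):
    """Apply left-censoring at c1 and/or right-censoring at c2.

    Returns the observed values and a parallel list of status labels.
    Assumes c1 < c2 when both limits are given.
    """
    observed, status = [], []
    for y in values:
        if c1 is not None and y <= c1:
            observed.append(c1)      # only "Y_i <= c1" is known
            status.append("left")
        elif c2 is not None and y >= c2:
            observed.append(c2)      # only "Y_i >= c2" is known
            status.append("right")
        else:
            observed.append(y)       # the draw itself is observed
            status.append("observed")
    return observed, status


y_star, flags = censor([-2.0, 0.5, 3.0], c1=0.0, c2=2.0)
print(y_star)
print(flags)
```

Keeping the status labels alongside the observed values matters in practice, because a likelihood for censored data treats limit observations differently from fully observed ones.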

In models of duration, right-censoring often occurs, but left-censoring can also occur. For example, if agents are observed in some state (e.g., unemployment, in the case of individuals, or solvency, in the case of firms) until either they are observed to exit the state or until the period of observation ends, then some agents may still be in the given state at the end of the observation window. Observations on these agents will be right-censored. Similarly, at the beginning of the study, some (perhaps all) agents are observed to be already in the state of interest; for any agents whose time of entry into the state is unknown, their duration in the given state is left-censored (and perhaps also right-censored).
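The duration case can be illustrated with a few invented spells and a fixed observation window. In this sketch the function `observe_spell`, the spell dates, and the window limits are all hypothetical, and it is assumed that entry times before the start of the window are unknown to the analyst and that every spell overlaps the window.

```python
def observe_spell(entry, exit_time, window_start, window_end):
    """Return (observed_duration, left_censored, right_censored) for one spell.

    Assumes the spell overlaps the observation window and that entry times
    earlier than window_start are unknown to the analyst.
    """
    left_censored = entry < window_start     # already in the state at the start
    right_censored = exit_time > window_end  # still in the state at the end
    observed_start = max(entry, window_start)
    observed_end = min(exit_time, window_end)
    return observed_end - observed_start, left_censored, right_censored


# Hypothetical (entry, exit) times, observed over the window [0, 10].
spells = [(-3, 4), (2, 6), (5, 15), (-2, 12)]
results = [observe_spell(a, b, 0, 10) for a, b in spells]
for spell, result in zip(spells, results):
    print(spell, result)
```

For a censored spell the observed duration is only a lower bound on the true duration, which is why such observations cannot be treated as complete spells.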

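To illustrate censoring in a regression context, suppose

$$Y_i = x_i'\beta + \varepsilon_i, \tag{1}$$

where x_i is a vector of explanatory variables, β is a vector of unknown coefficients, and E(ε_i) = 0. If Y_i is censored, then one must estimate the model

$$Y_i^* = x_i'\beta + \xi_i \tag{2}$$

after replacing Y_i in (1) with Y_i^*, which necessarily results in a new error term ξ_i in (2). Unless the censoring occurs in the extreme tails of the distribution of Y, ordinary least squares (OLS) estimation of the coefficients in (2) will yield biased and inconsistent estimates, since OLS does not account for the censoring and the new error term no longer has mean zero given the explanatory variables.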
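A small simulation can illustrate the bias. The sketch below assumes a single regressor, normal errors, and made-up parameter values (intercept 1, slope 1, error standard deviation 1, left-censoring at 0); the helper `ols_slope` and all variable names are illustrative only.

```python
import random


def ols_slope(x, y):
    """Slope from a least-squares fit of y on x with an intercept."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 5000
b0, b1, sigma, c1 = 1.0, 1.0, 1.0, 0.0  # assumed true values and censoring point

x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
y = [b0 + b1 * xi + rng.gauss(0.0, sigma) for xi in x]  # latent outcomes
y_star = [max(yi, c1) for yi in y]                      # left-censored outcomes

slope_latent = ols_slope(x, y)         # infeasible benchmark using the latent Y_i
slope_censored = ols_slope(x, y_star)  # OLS applied to the censored Y_i^*
share_censored = sum(1 for yi in y_star if yi == c1) / n

print(round(slope_latent, 3), round(slope_censored, 3), round(share_censored, 3))
```

Under these assumptions roughly a quarter of the sample sits at the censoring point, and the slope fitted to the censored outcomes is pulled toward zero relative to the slope fitted to the latent outcomes.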

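Censored regression models are typically estimated by the maximum likelihood method. If the errors in model (1) are assumed normally distributed with mean 0 and variance σ², then in the case of left-censoring at c₁ the likelihood function is given by

$$L = \prod_{i:\,Y_i^* = c_1} \Phi\!\left(\frac{c_1 - x_i'\beta}{\sigma}\right) \times \prod_{i:\,Y_i^* > c_1} \frac{1}{\sigma}\,\psi\!\left(\frac{Y_i^* - x_i'\beta}{\sigma}\right), \tag{3}$$

where x_i and β denote the explanatory variables and coefficients of the linear model (1), and ψ and Φ denote the standard normal density and distribution functions, respectively. This model was first proposed by James Tobin (1958), and is sometimes called the tobit model. The first product in (3) gives, for each observed value Y_i^* equal to c₁, the probability of obtaining a draw Y from F(y) less than c₁; the second product gives the normal density of each uncensored observation.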
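The logarithm of this likelihood is straightforward to code and maximize numerically. The sketch below assumes a single regressor with intercept b0 and slope b1, normal errors, and left-censoring at c1; the data are simulated with made-up parameter values, and `pattern_search` is a deliberately simple derivative-free maximizer written for this illustration, not a routine from any library. In applied work a standard numerical optimizer would normally be used instead.

```python
import math
import random


def norm_logpdf(z):
    """Log of the standard normal density."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def norm_logcdf(z):
    """Log of the standard normal distribution function.

    Uses Phi(z) = 0.5 * erfc(-z / sqrt(2)), floored to avoid log(0).
    """
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))


def tobit_loglik(params, x, y_star, c1):
    """Log-likelihood for left-censoring at c1; params = [b0, b1, log(sigma)]."""
    b0, b1, log_sigma = params
    sigma = math.exp(log_sigma)
    total = 0.0
    for xi, yi in zip(x, y_star):
        mu = b0 + b1 * xi
        if yi <= c1:   # censored: probability that the latent value is <= c1
            total += norm_logcdf((c1 - mu) / sigma)
        else:          # uncensored: normal density of the observed value
            total += norm_logpdf((yi - mu) / sigma) - log_sigma
    return total


def pattern_search(f, start, step=0.5, tol=1e-3):
    """Maximize f by trying +/- step on each coordinate, halving step on failure."""
    point, best = list(start), f(start)
    while step > tol:
        improved = False
        for j in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[j] += delta
                value = f(trial)
                if value > best:
                    point, best, improved = trial, value, True
                    break
        if not improved:
            step /= 2.0
    return point, best


# Simulated data under assumed true values b0 = 1, b1 = 1, sigma = 1, c1 = 0.
rng = random.Random(0)
c1 = 0.0
x = [rng.uniform(-2.0, 2.0) for _ in range(500)]
y_star = [max(1.0 + xi + rng.gauss(0.0, 1.0), c1) for xi in x]

start = [0.0, 0.0, 0.0]
estimate, best = pattern_search(lambda p: tobit_loglik(p, x, y_star, c1), start)
b0_hat, b1_hat, sigma_hat = estimate[0], estimate[1], math.exp(estimate[2])
print(round(b0_hat, 2), round(b1_hat, 2), round(sigma_hat, 2))
```

Optimizing over log(σ) rather than σ keeps the standard deviation positive without imposing an explicit constraint on the search.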

The models presented above potentially suffer from several problems. Heteroskedasticity in the error terms can lead to inconsistent estimation. D. Petersen and Donald Waldman (1981) proposed modifications of the tobit-type models involving specification of particular models for the error variances. John Cragg (1971) proposed a generalized version of the tobit model that allows the probability of censoring to be independent of the regression model for the uncensored data. Perhaps the most vexing problem is the requirement of a distributional assumption for the errors in (2). It is straightforward to assume distributions other than the normal distribution and then work out the resulting likelihood functions, but rather more difficult to avoid such assumptions altogether by using semi- or nonparametric methods. Adrian Pagan and Aman Ullah (1999) discuss several proposals, but these involve significant increases in computational burden or data requirements.

SEE ALSO Censoring, Sample; Heckman Selection Correction Procedure; Heteroskedasticity; Logistic Regression; Probabilistic Regression; Properties of Estimators (Asymptotic and Exact)

BIBLIOGRAPHY

Amemiya, Takeshi. 1984. Tobit Models: A Survey. Journal of Econometrics 24: 3–61.

Cragg, John G. 1971. Some Statistical Models for Limited Dependent Variables with Application to the Demand for Durable Goods. Econometrica 39 (5): 829–844.

Maddala, George S. 1983. Limited-Dependent and Qualitative Variables in Econometrics. Cambridge, U.K.: Cambridge University Press.

Pagan, Adrian, and Aman Ullah. 1999. Nonparametric Econometrics. Cambridge, U.K.: Cambridge University Press.

Petersen, D., and Donald Waldman. 1981. The Treatment of Heteroskedasticity in the Limited Dependent Variable Model. Unpublished working paper. Department of Economics. Chapel Hill: University of North Carolina.

Simar, Léopold, and Paul W. Wilson. 2007. Estimation and Inference in Two-stage, Semi-parametric Models of Productive Efficiency. Journal of Econometrics 136 (1): 31–64.

Tobin, James. 1958. Estimation of Relationships for Limited Dependent Variables. Econometrica 26 (1): 24–36.

Paul W. Wilson

International Encyclopedia of the Social Sciences