Log-Linear Models

Within the realm of regression modeling, the term log-linear is used in two distinct ways. In the first case, it refers to nonlinear model specifications that become linear after a logarithmic transformation and can then be estimated with the tools available for the classical linear regression model. Most importantly, they can be estimated by ordinary least squares under the assumption that the error terms are independent and normally distributed with mean 0 and constant variance σ². The multiplicative or double-log model and the exponential or semi-log model are two such nonlinear specifications. The multiplicative model takes on the form
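
$$y_i = e^{\beta_0} x_{i1}^{\beta_1} x_{i2}^{\beta_2} \cdots x_{im}^{\beta_m} e^{\varepsilon_i}$$

or, in its log-transformed equivalent,

$$\ln y_i = \beta_0 + \beta_1 \ln x_{i1} + \beta_2 \ln x_{i2} + \cdots + \beta_m \ln x_{im} + \varepsilon_i,$$
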
where ln y_i is the dependent variable for observation i, ln x_ij is the j-th covariate or predictor variable, β₀, …, β_m are the parameters to be estimated, and ε_i are the error terms. The multiplicative model implies that a 1 percent change in x_ij (not ln x_ij) yields an approximately β_j percent change in y_i, holding the other covariates constant. It is frequently used to model large-scale traffic flows and migration flows. A well-known application in economics is the Cobb-Douglas production function, which links output to capital and labor in a double-log specification whose parameters are the associated elasticities. The exponential or semi-log model takes on the form
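
$$y_i = e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_m x_{im} + \varepsilon_i}$$

or, equivalently,

$$\ln y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_m x_{im} + \varepsilon_i.$$
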
The exponential specification is best known for modeling unlimited and rapid population growth over time, where the dependent variable y represents the population and time is the only covariate. The parameter β associated with time is the continuous population growth rate: a one-unit increase in time changes the population by a factor of e^β, that is, by 100(e^β − 1) percent, which is approximately 100β percent when β is small.
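Because both specifications become linear after taking logarithms, they can be fitted by ordinary least squares on the transformed variables. The Python sketch below illustrates this under stated assumptions: the data are simulated (the constants 2.0, 0.3, 0.7, 1000 and 0.02 are made up for illustration), and `ols` is a hypothetical helper, a small normal-equations solver written only to keep the example free of third-party libraries. In practice one would use a statistics package.

```python
import math
import random

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * w for v, w in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]

random.seed(1)

# Double-log (Cobb-Douglas style): output = 2 * K**0.3 * L**0.7 * exp(eps).
# The "true" values 2, 0.3 and 0.7 are hypothetical.
X_cd, y_cd = [], []
for _ in range(200):
    K, L = random.uniform(1, 100), random.uniform(1, 100)
    out = 2.0 * K**0.3 * L**0.7 * math.exp(random.gauss(0, 0.01))
    X_cd.append([1.0, math.log(K), math.log(L)])  # regress ln y on ln x
    y_cd.append(math.log(out))
b_cd = ols(X_cd, y_cd)  # estimates of [ln 2, 0.3, 0.7]

# Semi-log (exponential growth): pop = 1000 * exp(0.02 * t + eps).
X_g, y_g = [], []
for t in range(50):
    pop = 1000.0 * math.exp(0.02 * t + random.gauss(0, 0.01))
    X_g.append([1.0, float(t)])  # regress ln y on t itself
    y_g.append(math.log(pop))
b_g = ols(X_g, y_g)

growth_pct = 100 * (math.exp(b_g[1]) - 1)  # exact % change per unit of time
print("double-log:", [round(v, 3) for v in b_cd])
print("semi-log slope:", round(b_g[1], 4), "% per period:", round(growth_pct, 3))
```

The double-log slopes are read directly as elasticities, while the semi-log slope is converted into a per-period percentage change with 100(e^β − 1), which is close to 100β for small β.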

In the second case, the term log-linear model refers to a particular class of generalized linear models (GLMs), the so-called “Poisson models.” GLMs share a common mathematical structure in which the expected value of the dependent variable is connected to a linear predictor through a link function. Unlike the multiplicative and exponential models, the Poisson regression model cannot be transformed into a classical linear regression model, whose link function is the identity function. Instead, the dependent variable is a Poisson-distributed count whose possible outcomes are the non-negative integers, and the link function connecting the expected value of Y with the linear predictor is the natural logarithm. For a Poisson-distributed random variable, the probability of observing k occurrences is

$$P(Y = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots$$

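where λ = E(Y) is the expected number of occurrences. A Poisson model stipulates a systematic relationship between the expected value and the predictor variables such that

$$\ln \lambda_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_m x_{im}$$

or, equivalently,

$$\lambda_i = e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_m x_{im}}$$

for observations i = 1, 2, …, n.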
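The probability function and the log link can be written out directly. The sketch below assumes hypothetical coefficients (an intercept of ln 2 and a slope of 0.5 for a single covariate); the function names `poisson_pmf` and `expected_count` are illustrative and not taken from any library. It also shows that, under the log link, a one-unit increase in a covariate multiplies the expected count by exp(β_j).

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) = lam**k * exp(-lam) / k! for a non-negative integer k."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def expected_count(beta, x):
    """Inverse of the log link: lambda_i = exp(beta_0 + sum_j beta_j * x_ij)."""
    eta = beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))  # linear predictor
    return math.exp(eta)

# Hypothetical coefficients: intercept ln(2), slope 0.5 for one covariate.
beta = [math.log(2.0), 0.5]
lam0 = expected_count(beta, [0.0])  # expected count when x = 0
lam1 = expected_count(beta, [1.0])  # expected count when x = 1
print(lam0, lam1, lam1 / lam0)      # the ratio is the rate ratio exp(0.5)
print([round(poisson_pmf(k, lam0), 4) for k in range(5)])
```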

The parameters β_j are estimated iteratively by maximizing the likelihood, typically with the Newton-Raphson procedure. Today, most statistical software programs include estimation routines for Poisson regression models. Several measures, most frequently the χ²-distributed log-likelihood ratio statistic, assess the goodness-of-fit of the estimated model to the observed data. The individual parameter estimates can be tested for significance with Wald tests, which rely on the approximate normality of maximum-likelihood estimates in large samples.

A broad range of phenomena are suitable for modeling in a Poisson regression setting. Characteristic of such phenomena is a comparatively low chance of occurrence per fixed unit of time (or space). This applies, for example, to counts of automobile fatalities per month, disease incidents per week, species found in an ecosystem, and severe weather events per year. The oldest and probably most cited application is the study by Ladislaus Josephovich von Bortkiewicz (1868–1931), in which he analyzed the probability of a soldier being killed by the kick of a horse. In his now-famous book on the law of small numbers, von Bortkiewicz showed that rare events follow a Poisson distribution. The book appeared sixty-one years after the French mathematician Siméon Denis Poisson (1781–1840) had published his famous treatise on the limiting distribution of the binomial distribution when the probability of occurrence is small and the number of trials is large, a limit that is now rightfully named the “Poisson distribution.”
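The iterative maximum-likelihood estimation, the log-likelihood ratio (deviance) goodness-of-fit statistic, and a Wald test can be sketched for the simplest case of one covariate. This is a minimal illustration, not a production routine: the counts are hypothetical, the helper names are made up, and the 2 × 2 information matrix is inverted in closed form. Each Newton-Raphson step adds (X'WX)⁻¹X'(y − μ) to the current estimates, with W = diag(μ); at convergence the score X'(y − μ) is zero, so the fitted counts sum to the observed counts.

```python
import math

def info_and_score(b, x, y):
    """Fitted means mu, score U = X'(y - mu), and information X'WX, W = diag(mu)."""
    mu = [math.exp(b[0] + b[1] * xi) for xi in x]
    u = (sum(yi - mi for yi, mi in zip(y, mu)),
         sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu)))
    info = (sum(mu),
            sum(xi * mi for xi, mi in zip(x, mu)),
            sum(xi * xi * mi for xi, mi in zip(x, mu)))
    return mu, u, info

def fit_poisson(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for ln(mu_i) = b0 + b1 * x_i with a single covariate."""
    b = [math.log(sum(y) / len(y)), 0.0]  # start from the intercept-only fit
    for _ in range(max_iter):
        _, (u0, u1), (i00, i01, i11) = info_and_score(b, x, y)
        det = i00 * i11 - i01 * i01
        step = ((i11 * u0 - i01 * u1) / det,  # closed-form 2x2 inverse times U
                (i00 * u1 - i01 * u0) / det)
        b = [b[0] + step[0], b[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break
    mu, _, (i00, i01, i11) = info_and_score(b, x, y)
    det = i00 * i11 - i01 * i01
    se = (math.sqrt(i11 / det), math.sqrt(i00 / det))  # sqrt of diag of inverse
    return b, se, mu

def deviance(y, mu):
    """Log-likelihood ratio statistic of the fitted model vs. the saturated model."""
    return 2 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                   for yi, mi in zip(y, mu))

# Hypothetical counts observed at covariate values 0..9
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 0, 2, 3, 2, 4, 6, 5, 8, 9]

b, se, mu = fit_poisson(x, y)
z = b[1] / se[1]        # Wald statistic for the slope
D = deviance(y, mu)     # compare with a chi-square distribution on n - 2 df
print("estimates:", [round(v, 4) for v in b])
print("z for slope:", round(z, 2), "deviance:", round(D, 3), "df:", len(y) - 2)
```

The Wald statistic z is referred to the standard normal distribution, and the deviance to a χ² distribution with n − 2 degrees of freedom; both reference distributions are large-sample approximations.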

Whereas the origins of regression models go back to the German mathematician and physicist Johann Carl Friedrich Gauss (1777–1855) and the French mathematician Adrien-Marie Legendre (1752–1833), the extension from the classical model to GLMs was made primarily since the early 1970s. Credit goes to the seminal 1972 article by the statisticians John Ashworth Nelder and Robert William MacLagan Wedderburn. Nelder’s 1983 book, coauthored with Peter McCullagh, still serves as the authoritative reference on GLMs. Recent refinements of Poisson models allow the equidispersion assumption to be relaxed (for a Poisson-distributed random variable, the expected value equals the variance). Further advances include mixed Poisson models, in which, for instance, normally distributed random effects are added, and zero-inflated Poisson models, which account for an excess of zero outcomes. Although these advances make the log-linear regression model more flexible and suitable for a wider range of phenomena, their applicability needs to be scrutinized thoroughly, especially where linearity on the log scale may be inappropriate, where the dispersion changes over time (or space), or where the observations are not independent.
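The zero-inflated Poisson model is commonly defined as a mixture: with probability π an observation is a structural zero, and otherwise it follows a Poisson(λ) distribution. The sketch below assumes that standard formulation with constant π and λ (in a regression setting both would typically depend on covariates) and uses the hypothetical values π = 0.25 and λ = 3; `zip_pmf` and `zip_moments` are illustrative names. It also shows why such a model departs from equidispersion: the variance (1 − π)λ(1 + πλ) exceeds the mean (1 − π)λ whenever π > 0.

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: a structural zero with probability pi,
    otherwise a draw from Poisson(lam)."""
    poisson = lam**k * math.exp(-lam) / math.factorial(k)
    return (pi if k == 0 else 0.0) + (1 - pi) * poisson

def zip_moments(lam, pi):
    """Mean (1 - pi) * lam and variance (1 - pi) * lam * (1 + pi * lam)."""
    return (1 - pi) * lam, (1 - pi) * lam * (1 + pi * lam)

lam, pi = 3.0, 0.25  # hypothetical parameter values
print("P(Y=0):", round(zip_pmf(0, lam, pi), 4), "vs Poisson:", round(math.exp(-lam), 4))
mean, var = zip_moments(lam, pi)
print("mean:", mean, "variance:", var)  # variance exceeds the mean when pi > 0
```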

SEE ALSO Linear Regression; Regression

BIBLIOGRAPHY

Bortkiewicz, Ladislaus Josephovich von. 1898. Das Gesetz der kleinen Zahlen [The Law of Small Numbers]. Leipzig, Germany: Teubner.

McCullagh, Peter, and John A. Nelder. 1983. Generalized Linear Models. London: Chapman and Hall.

Nelder, John A., and Robert William MacLagan Wedderburn. 1972. “Generalized Linear Models.” Journal of the Royal Statistical Society, Series A 135 (3): 370–384.

Poisson, Siméon Denis. 1837. Recherches sur la probabilité des jugements en matière criminelle et en matière civile [Research on the Probability of Judgments in Criminal and Civil Matters]. Paris: Bachelier, Imprimeur Libraire.