Loss Functions
The loss function (or cost function) is a crucial ingredient in all optimization problems, such as those arising in statistical decision theory, policymaking, estimation, forecasting, learning, classification, and financial investment. The discussion here is limited to the use of loss functions in econometrics, particularly in time series forecasting.
When a forecast f_{t,h} of a variable Y_{t+h} is made at time t for h periods ahead, a loss (or cost) arises if the forecast turns out to be different from the actual value. The loss function of the forecast error e_{t+h} = Y_{t+h} – f_{t,h} is denoted as c(Y_{t+h}, f_{t,h}). The loss function can depend on the time of prediction, and so it can be c_{t+h}(Y_{t+h}, f_{t,h}). If the loss function does not change with time and does not depend on the value of the variable Y_{t+h}, the loss can be written simply as a function of the error only, c_{t+h}(Y_{t+h}, f_{t,h}) = c(e_{t+h}).
Clive Granger (1999) discusses the following required properties for a loss function: (1) c (0) = 0 (no error and no loss); (2) min_{e} c (e) = 0, so that c (e) ≥ 0; and (3) c(e) is monotonically nondecreasing as e moves away from zero so that c (e_{1} ) ≥ c (e_{2} ) if e_{1} > e_{2} > 0 and if e_{1} < e_{2} < 0.
When c _{1}(e ), c _{2}(e ) are both loss functions, Granger (1999) shows that further examples of loss functions can be generated: c (e ) = ac _{1}(e ) + bc _{2}(e ), a ≥ 0, b ≥ 0 will be a loss function; c (e ) = c _{1}(e )^{a} c _{2}(e )^{b}, a > 0, b > 0 will be a loss function; and c (e ) = 1(e > 0)c _{1}(e ) + 1(e < 0)c _{2}(e ) will be a loss function. If h (·) is a positive monotonic nondecreasing function with h (0) finite, then c (e ) = h (c _{1}(e )) – h (0) is a loss function.
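The closure rules above can be checked numerically. The following is a minimal Python sketch, not taken from Granger (1999): the helper names (`combine_linear`, `combine_piecewise`, `looks_like_loss`) and the evaluation grid are illustrative assumptions, and the grid check is only a spot check of the three required properties, not a proof.

```python
from typing import Callable, List

Loss = Callable[[float], float]


def combine_linear(c1: Loss, c2: Loss, a: float, b: float) -> Loss:
    """Return c(e) = a*c1(e) + b*c2(e), which requires a >= 0 and b >= 0."""
    if a < 0 or b < 0:
        raise ValueError("weights must be nonnegative")
    return lambda e: a * c1(e) + b * c2(e)


def combine_piecewise(c1: Loss, c2: Loss) -> Loss:
    """Return c(e) = 1(e > 0)*c1(e) + 1(e < 0)*c2(e)."""
    return lambda e: c1(e) if e > 0 else (c2(e) if e < 0 else 0.0)


def looks_like_loss(c: Loss, grid: List[float]) -> bool:
    """Spot-check c(0) = 0, c(e) >= 0, and monotonicity away from zero."""
    if c(0.0) != 0.0 or any(c(e) < 0 for e in grid):
        return False
    right = sorted(e for e in grid if e > 0)                 # moving away from 0
    left = sorted((e for e in grid if e < 0), reverse=True)  # moving away from 0
    for side in (right, left):
        values = [c(e) for e in side]
        if any(later < earlier for earlier, later in zip(values, values[1:])):
            return False
    return True


def squared(e: float) -> float:
    return e * e


grid = [i / 10 for i in range(-50, 51)]          # hypothetical grid on [-5, 5]
mixed = combine_linear(squared, abs, 0.5, 2.0)   # 0.5*e^2 + 2*|e|
asym = combine_piecewise(squared, abs)           # e^2 for e > 0, |e| for e < 0
print(looks_like_loss(mixed, grid), looks_like_loss(asym, grid))
```

A function such as c(e) = e fails the check because it takes negative values, which is why the identity is not a loss function in this sense.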
LOSS FUNCTIONS AND RISK
Granger (2002) notes that an expected loss (a risk measure) of a financial return Y_{t+1} that has a conditional predictive distribution F_{t}(y) ≡ Pr(Y_{t+1} ≤ y | I_{t}) with X_{t} ∈ I_{t} may be written as
$$\mathrm{E}\,c(e) = A_{1}\int_{f}^{\infty} |y - f|^{\theta}\, dF_{t}(y) + A_{2}\int_{-\infty}^{f} |y - f|^{\theta}\, dF_{t}(y),$$
with A_{1}, A_{2} both > 0 and some θ > 0. Considering the symmetric case A_{1} = A_{2}, one has a class of volatility measures V_{θ} = E[|y – f|^{θ}], which includes the variance with θ = 2 and the mean absolute deviation with θ = 1.
Zhuanxin Ding, Clive Granger, and Robert Engle (1993) study the time series and distributional properties of these measures empirically and show that the absolute deviations have some particular properties, such as the longest memory. Granger remarks that, given that financial returns are known to come from a long-tailed distribution, θ = 1 may be preferable.
Another problem raised by Granger is how to choose the optimal L_{p} norm in empirical work, that is, which p to use when minimizing E[|ε_{t}|^{p}] to estimate the regression model Y_{t} = X_{t}β + ε_{t}. As the asymptotic covariance matrix of β̂ depends on p, the most appropriate value of p can be chosen to minimize the covariance matrix. In particular, Granger (2002) refers to a trio of papers (Nyquist 1983; Money et al. 1982; and Harter 1977) that find that the optimal p is 1 for the Laplace and Cauchy distributions, p = 2 for the Gaussian, and p = ∞ (min/max estimator) for a rectangular distribution. Granger (2002) also notes that, in terms of the kurtosis κ, H. L. Harter (1977) suggests using p = 1 for κ > 3.8; p = 2 for 2.2 ≤ κ ≤ 3.8; and p = 3 for κ < 2.2. In finance, the kurtosis of returns can be thought of as being well over 4, so p = 1 is preferred.
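As a rough illustration of Harter's kurtosis rule quoted above, the following Python sketch computes a sample kurtosis and maps it to a value of p. The function names and the use of the simple (non-bias-corrected) moment ratio m_4 / m_2^2 are assumptions made here for illustration; the cited papers may define the estimator differently.

```python
from typing import Sequence


def sample_kurtosis(x: Sequence[float]) -> float:
    """Moment-ratio kurtosis m4 / m2^2 (equals 3 for a Gaussian in the limit)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 ** 2)


def harter_p(kurtosis: float) -> int:
    """Map kurtosis to p using the thresholds quoted in the text."""
    if kurtosis > 3.8:
        return 1      # heavy tails: least absolute deviations
    if kurtosis >= 2.2:
        return 2      # near-Gaussian: least squares
    return 3          # light tails


# Hypothetical residuals with two large outliers (heavy tails).
residuals = [-10.0, -1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 10.0]
print(sample_kurtosis(residuals), harter_p(sample_kurtosis(residuals)))
```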
We consider some variant loss functions with θ = 1, 2 below.
LOSS FUNCTIONS AND REGRESSION FUNCTIONS
Optimal forecasting of a time series model depends extensively on the specification of the loss function. The symmetric quadratic loss function is the most prevalent in applications due to its simplicity. The optimal forecast under quadratic loss is simply the conditional mean, but an asymmetric loss function implies a more complicated forecast that depends on the distribution of the forecast error as well as the loss function itself (Granger 1999), as the expected loss function is formulated with the expectation taken with respect to the conditional distribution. Specification of the loss function defines the model under consideration.
Consider a stochastic process Z_{t} ≡ (Y_{t}, X′_{t})′, where Y_{t} is the variable of interest and X_{t} is a vector of other variables. Suppose there are T + 1 (≡ R + P) observations. We use the observations available at time t, R ≤ t < T + 1, to generate P forecasts using each model. For each time t in the prediction period, we use either a rolling sample {Z_{t–R+1}, …, Z_{t}} of size R or the whole past sample {Z_{1}, …, Z_{t}} to estimate model parameters β̂_{t}. We can then generate a sequence of one-step-ahead forecasts {f(Z_{t}, β̂_{t})}_{t=R}^{T}.
Suppose that there is a decision maker who takes a one-step point forecast f_{t,1} ≡ f(Z_{t}, β̂_{t}) of Y_{t+1} and uses it in some relevant decision. The one-step forecast error e_{t+1} ≡ Y_{t+1} – f_{t,1} will result in a cost of c(e_{t+1}), where the function c(e) will increase as e increases in size, but not necessarily symmetrically or continuously. The optimal forecast f*_{t,1} will be chosen to produce the forecast errors that minimize the expected loss
$$\min_{f_{t,1}} \int_{-\infty}^{\infty} c(y - f_{t,1})\, dF_{t}(y),$$
where F_{t}(y) ≡ Pr(Y_{t+1} ≤ y | I_{t}) is the conditional distribution function, with I_{t} being some proper information set at time t that includes Z_{t–j}, j ≥ 0. The corresponding optimal forecast error will be
$$e^{*}_{t+1} = Y_{t+1} - f^{*}_{t,1}.$$
Then the optimal forecast would satisfy
$$\frac{\partial}{\partial f_{t,1}} \int_{-\infty}^{\infty} c(y - f^{*}_{t,1})\, dF_{t}(y) = 0.$$
When we interchange the operations of differentiation and integration,
$$\int_{-\infty}^{\infty} \frac{\partial}{\partial f_{t,1}} c(y - f^{*}_{t,1})\, dF_{t}(y) \equiv \mathrm{E}\left( \frac{\partial}{\partial f_{t,1}} c(Y_{t+1} - f^{*}_{t,1}) \,\Big|\, I_{t} \right),$$
the generalized forecast error,
$$g_{t+1} \equiv \frac{\partial}{\partial f_{t,1}} c(Y_{t+1} - f^{*}_{t,1}),$$
forms the condition of forecast optimality:
$$H_{0}: \mathrm{E}(g_{t+1} \mid I_{t}) = 0 \quad \text{a.s.},$$
that is, a martingale difference (MD) property of the generalized forecast error. This forms the optimality condition of the forecasts and gives an appropriate regression function corresponding to the specified loss function c(·).
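To see this, consider the following two examples. First, when the loss function is the squared error loss
$$c(Y_{t+1} - f_{t,1}) = (Y_{t+1} - f_{t,1})^{2},$$
the generalized forecast error will be
$$g_{t+1} \equiv \frac{\partial}{\partial f_{t,1}} c(Y_{t+1} - f^{*}_{t,1}) = -2 e^{*}_{t+1},$$
and thus E(e*_{t+1} | I_{t}) = 0 a.s., which implies that the optimal forecast
$$f^{*}_{t,1} = \mathrm{E}(Y_{t+1} \mid I_{t})$$
is the conditional mean. Next, when the loss is the check function, c(e) = [α – 1(e < 0)] · e ≡ ρ_{α}(e_{t+1}), the optimal forecast f_{t,1}, for given α ∈ (0, 1), minimizing
$$\min_{f_{t,1}} \mathrm{E}[c(Y_{t+1} - f_{t,1}) \mid I_{t}]$$
can be shown to satisfy
$$\mathrm{E}[\alpha - 1(Y_{t+1} < f^{*}_{t,1}) \mid I_{t}] = 0 \quad \text{a.s.}$$
Hence, g_{t+1} ≡ α – 1(Y_{t+1} < f*_{t,1}) is the generalized forecast error (up to sign). Therefore,
$$\alpha = \mathrm{E}[1(Y_{t+1} < f^{*}_{t,1}) \mid I_{t}] = \Pr(Y_{t+1} < f^{*}_{t,1} \mid I_{t}),$$
and the optimal forecast f*_{t,1} = Q_{α}(Y_{t+1} | I_{t}) is the conditional α-quantile.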
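The two examples can be illustrated numerically. The sketch below is an assumption-laden toy, not part of the original derivation: it draws a skewed sample (unit exponential, seed chosen arbitrarily), ignores conditioning information, and finds the forecast minimizing the average loss by brute-force search over a hypothetical grid. Under squared error loss the minimizer should sit next to the sample mean, and under the check loss with α = 0.9 next to the sample 0.9-quantile.

```python
import random
import statistics
from typing import Callable, List


def squared_loss(e: float) -> float:
    return e * e


def check_loss(e: float, alpha: float) -> float:
    """Check (tick) loss: [alpha - 1(e < 0)] * e."""
    return (alpha - (1.0 if e < 0 else 0.0)) * e


def best_forecast(sample: List[float], loss: Callable[[float], float],
                  grid: List[float]) -> float:
    """Grid point f minimizing the average loss of the errors y - f."""
    return min(grid, key=lambda f: sum(loss(y - f) for y in sample) / len(sample))


random.seed(0)                                               # arbitrary seed
sample = [random.expovariate(1.0) for _ in range(1000)]      # skewed data
grid = [i / 100 for i in range(0, 401)]                      # candidates in [0, 4]

f_mse = best_forecast(sample, squared_loss, grid)
f_q90 = best_forecast(sample, lambda e: check_loss(e, 0.9), grid)

sample_mean = statistics.fmean(sample)
sample_q90 = sorted(sample)[int(0.9 * len(sample))]
print(f_mse, sample_mean, f_q90, sample_q90)
```

Because the sample is right-skewed, the two optimal forecasts differ markedly, which is the sense in which the loss function determines the regression function.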
LOSS FUNCTIONS FOR TRANSFORMATIONS
Granger (1999) notes that it is implausible to use the same loss function for forecasting Y_{t+h} and for forecasting h_{t+h} = h(Y_{t+h}), where h(·) is some function, such as the log or the square, if one is interested in forecasting volatility. Suppose the loss functions c_{1}(·), c_{2}(·) are used for forecasting Y_{t+h} and for forecasting h(Y_{t+h}), respectively. In the one-step case, the forecast error e_{t+1} ≡ Y_{t+1} – f_{t,1} results in a cost of c_{1}(e_{t+1}), for which the optimal forecast f*_{t,1} is chosen from
$$\min_{f_{t,1}} \int_{-\infty}^{\infty} c_{1}(y - f_{t,1})\, dF_{t}(y),$$
where F_{t}(y) ≡ Pr(Y_{t+1} ≤ y | I_{t}). Similarly, the error ε_{t+1} ≡ h_{t+1} – h_{t,1} results in a cost of c_{2}(ε_{t+1}), for which the optimal forecast h*_{t,1} is chosen from
$$\min_{h_{t,1}} \int_{-\infty}^{\infty} c_{2}(h - h_{t,1})\, dH_{t}(h),$$
where H_{t}(h) ≡ Pr(h_{t+1} ≤ h | I_{t}). Then the optimal forecasts for Y and h would respectively satisfy
$$\int_{-\infty}^{\infty} \frac{\partial}{\partial f_{t,1}} c_{1}(y - f^{*}_{t,1})\, dF_{t}(y) = 0, \qquad \int_{-\infty}^{\infty} \frac{\partial}{\partial h_{t,1}} c_{2}(h - h^{*}_{t,1})\, dH_{t}(h) = 0.$$
It is easy to see that the optimality condition for f*_{t,1} does not imply the optimality condition for h*_{t,1} in general. Under some strong conditions on the functional forms of the transformation h(·) and of the two loss functions c_{1}(·), c_{2}(·), the above two conditions may coincide. Granger (1999) remarks that it would be strange behavior to use the same loss function for Y and h(Y). This awaits further analysis in future research.
LOSS FUNCTIONS FOR ASYMMETRY
The most prevalent loss function for the evaluation of a forecast is the symmetric quadratic function. Negative and positive forecast errors of the same magnitude have the same loss. This functional form is assumed because mathematically it is very tractable, but from an economic point of view, it is not very realistic. For a given information set and under a quadratic loss, the optimal forecast is the conditional mean of the variable under study. The choice of the loss function is fundamental to the construction of an optimal forecast. For asymmetric loss functions, the optimal forecast can be more complicated as it will depend not only on the choice of the loss function but also on the characteristics of the probability density function of the forecast error (Granger 1999).
As Granger (1999) notes, the overwhelming majority of forecast work uses the cost function c(e) = ae^{2}, a > 0, largely for mathematical convenience. Asymmetric loss functions are often more relevant. A few examples from Granger (1999) follow. The cost of arriving ten minutes early at the airport is quite different from arriving ten minutes late. The cost of having a computer that is 10 percent too small for a task is different from the computer being 10 percent too big. The loss of booking a lecture room that has ten seats too many for your class is different from that of a room that has ten seats too few. In dam construction, an underestimate of the peak water level is usually much more serious than an overestimate (Zellner 1986).
There are some commonly used asymmetric loss functions. The check loss function c(y, f) ≡ [α – 1(y < f)] · (y – f), or c(e) ≡ [α – 1(e < 0)] · e, makes the optimal predictor f the conditional α-quantile. The check loss function is also known as the tick function or lin-lin loss. The asymmetric quadratic loss c(e) ≡ |α – 1(e < 0)| · e^{2} can also be considered. A value of α = 0.5 gives the symmetric squared error loss (up to the scale factor 0.5).
A particularly interesting asymmetric loss is the linex function of Hal Varian (1975), which takes the form
$$c_{1}(e, \alpha) = \exp(\alpha e_{t+1}) - \alpha e_{t+1} - 1,$$
where α is a scalar that controls the aversion toward either positive (α > 0) or negative (α < 0) forecast errors. The linex function is differentiable. If α > 0, the linex is exponential for e > 0 and linear for e < 0. If α < 0, the linex is exponential for e < 0 and linear for e > 0. To make the linex more flexible, it can be modified to the double linex loss function by
$$c_{2}(e) = c_{1}(e, \alpha) + c_{1}(e, -\beta) = \exp(\alpha e) + \exp(-\beta e) - (\alpha - \beta) e - 2, \quad \alpha > 0,\ \beta > 0,$$
which is exponential for all values of e (Granger 1999). When α = β, it becomes the symmetric double linex loss function.
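The shape of these losses is easy to inspect in code. The following Python sketch assumes the double linex is the sum c_{1}(e, α) + c_{1}(e, –β) with α, β > 0, which is how we read Granger (1999); the function names are illustrative.

```python
import math


def linex(e: float, alpha: float) -> float:
    """Linex loss exp(alpha*e) - alpha*e - 1; zero at e = 0, nonnegative elsewhere."""
    return math.exp(alpha * e) - alpha * e - 1.0


def double_linex(e: float, alpha: float, beta: float) -> float:
    """Assumed form: linex(e, alpha) + linex(e, -beta), with alpha, beta > 0."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return linex(e, alpha) + linex(e, -beta)


# With alpha > 0, a positive error costs more than a negative one of equal size.
print(linex(1.0, 1.0), linex(-1.0, 1.0))
# With alpha == beta the double linex is symmetric in e.
print(double_linex(0.7, 2.0, 2.0), double_linex(-0.7, 2.0, 2.0))
```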
LOSS FUNCTIONS FOR FORECASTING FINANCIAL RETURNS
Some simple examples of the loss function for evaluating the point forecasts of financial returns are the out-of-sample means of the following loss functions studied in Yongmiao Hong and Tae-Hwy Lee (2003): the squared error loss c(y, f) = (y – f)^{2}; the absolute error loss c(y, f) = |y – f|; the trading return c(y, f) = –sign(f) · y (when y is a financial asset return); and the correct direction c(y, f) = –sign(f) · sign(y), where sign(x) = 1(x > 0) – 1(x < 0) and 1(·) takes the value of 1 if the statement in the parentheses is true and 0 otherwise. The negative signs in the latter two make them losses to minimize (rather than gains to maximize). The out-of-sample means of these loss functions are the mean squared forecast errors (MSFE), mean absolute forecast errors (MAFE), mean forecast trading returns (MFTR), and mean correct forecast directions (MCFD):
$$\mathrm{MSFE} = P^{-1}\sum_{t=R}^{T} (Y_{t+1} - f_{t,1})^{2}, \qquad \mathrm{MAFE} = P^{-1}\sum_{t=R}^{T} |Y_{t+1} - f_{t,1}|,$$
$$\mathrm{MFTR} = -P^{-1}\sum_{t=R}^{T} \mathrm{sign}(f_{t,1}) \cdot Y_{t+1}, \qquad \mathrm{MCFD} = -P^{-1}\sum_{t=R}^{T} \mathrm{sign}(f_{t,1}) \cdot \mathrm{sign}(Y_{t+1}).$$
These loss functions may further incorporate issues such as interest differentials, transaction costs, and market depth. Because investors are ultimately trying to maximize profits rather than minimize forecast errors, MSFE and MAFE may not be the most appropriate evaluation criteria. Granger (1999) emphasizes the importance of model evaluation using economic measures such as MFTR rather than statistical criteria such as MSFE and MAFE. Note that MFTR for the buy-and-hold trading strategy with sign(f_{t,1}) = 1 is (the negative of) the unconditional mean return of an asset, because
$$\mathrm{MFTR} = -P^{-1}\sum_{t=R}^{T} Y_{t+1} \to -\mu \quad \text{in probability as } P \to \infty,$$
where µ ≡ E(Y_{t}). MCFD is closely associated with an economic measure as it relates to market timing. Mutual fund managers, for example, can adjust investment portfolios in a timely manner if they can predict the directions of changes, thus earning a return higher than the market average.
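The four criteria can be computed directly from paired realizations and forecasts. The Python sketch below follows the sign convention of the loss definitions in this section (trading return and direction enter with a negative sign, so smaller is better); the function name and the four toy return observations are hypothetical.

```python
from typing import Dict, Sequence


def sign(x: float) -> int:
    """sign(x) = 1(x > 0) - 1(x < 0)."""
    return (x > 0) - (x < 0)


def evaluate_point_forecasts(actual: Sequence[float],
                             forecast: Sequence[float]) -> Dict[str, float]:
    """Out-of-sample means of the four losses; all are losses to minimize."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    P = len(actual)
    pairs = list(zip(actual, forecast))
    return {
        "MSFE": sum((y - f) ** 2 for y, f in pairs) / P,
        "MAFE": sum(abs(y - f) for y, f in pairs) / P,
        "MFTR": -sum(sign(f) * y for y, f in pairs) / P,
        "MCFD": -sum(sign(f) * sign(y) for y, f in pairs) / P,
    }


# Hypothetical returns and one-step forecasts.
actual = [0.02, -0.01, 0.03, -0.02]
forecast = [0.01, 0.01, 0.02, -0.01]
scores = evaluate_point_forecasts(actual, forecast)
print(scores)
```

In this toy sample the forecast gets three of four directions right, so MCFD is negative, and the sign-following strategy earns a positive average return, so MFTR is negative as well.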
LOSS FUNCTIONS FOR ESTIMATION AND EVALUATION
When the forecast is based on an econometric model, the model needs to be estimated before the forecast can be constructed. Inconsistent choices of loss functions in estimation and forecasting are often observed. We may choose a symmetric quadratic objective function to estimate the parameters of the model, but the evaluation of the model-based forecast may be based on an asymmetric loss function. This logical inconsistency is not inconsequential for tests assessing the predictive ability of the forecasts. The error introduced by parameter estimation affects the uncertainty of the forecast and, consequently, any test based on it.
In applications, it is often the case that the loss function used for estimation of a model is different from the one(s) used in the evaluation of the model. This logical inconsistency can have significant consequences with regard to comparison of the predictive ability of competing models. The uncertainty associated with parameter estimation may result in invalid inference of predictive ability (West 1996). When the objective function in estimation is the same as the loss function in forecasting, the effect of parameter estimation vanishes. If one believes that a particular criterion should be used to evaluate forecasts, then it may also be used at the estimation stage of the modeling process. Gloria González-Rivera, Tae-Hwy Lee, and Emre Yoldas (2007) show this in the context of the VaR model of RiskMetrics, which provides a set of tools to measure market risk and eventually forecast the value-at-risk (VaR) of a portfolio of financial assets. RiskMetrics offers a prime example in which the loss function of the forecaster is very well defined. They point out that a VaR is a quantile of the return distribution, and thus the check loss function can be the objective function to estimate the parameters of the RiskMetrics model.
LOSS FUNCTION FOR BINARY FORECAST AND MAXIMUM SCORE
Given a series {Y_{t}}, consider the binary variable G_{t+1} ≡ 1(Y_{t+1} > 0). We consider the asymmetric risk function to discuss a binary prediction. To define the asymmetric risk with A_{1} ≠ A_{2} and p = 1, we consider the binary decision problem of Clive Granger and Hashem Pesaran (2000b), and Tae-Hwy Lee and Yang Yang (2006) with the following 2×2 payoff or utility matrix:
| Utility | G_{t+1} = 1 | G_{t+1} = 0 |
|---|---|---|
| G_{t,1}(X_{t}) = 1 | u_{11} | u_{01} |
| G_{t,1}(X_{t}) = 0 | u_{10} | u_{00} |
where u_{ij} is the utility when G_{t,1}(X_{t}) = j is predicted and G_{t+1} = i is realized (i, j = 0, 1). Assume u_{11} > u_{10} and u_{00} > u_{01}, and that the u_{ij} are constant over time; (u_{11} – u_{10}) > 0 is the utility gain from making the correct forecast G_{t,1}(X_{t}) = 1; and (u_{00} – u_{01}) > 0 is the utility gain from making the correct forecast G_{t,1}(X_{t}) = 0. Denote
$$\pi(X_{t}) \equiv \mathrm{E}_{Y_{t+1}}(G_{t+1} \mid X_{t}) = \Pr(G_{t+1} = 1 \mid X_{t}).$$
The expected utility of G_{t,1}(X_{t}) = 1 is u_{11}π(X_{t}) + u_{01}(1 – π(X_{t})), and the expected utility of G_{t,1}(X_{t}) = 0 is u_{10}π(X_{t}) + u_{00}(1 – π(X_{t})). Hence, to maximize utility, conditional on the values of X_{t}, the prediction G_{t,1}(X_{t}) = 1 will be made if
$$u_{11}\pi(X_{t}) + u_{01}(1 - \pi(X_{t})) > u_{10}\pi(X_{t}) + u_{00}(1 - \pi(X_{t})),$$
or
$$\pi(X_{t}) > \frac{u_{00} - u_{01}}{(u_{11} - u_{10}) + (u_{00} - u_{01})} \equiv 1 - \alpha.$$
By making a correct prediction, our net utility gain is (u_{00} – u_{01}) when G_{t+1} = 0, and (u_{11} – u_{10}) when G_{t+1} = 1. Put another way, our opportunity cost (in the sense that we lose the gain) of a wrong prediction is (u_{00} – u_{01}) when G_{t+1} = 0 and (u_{11} – u_{10}) when G_{t+1} = 1. Since a multiple of a utility function represents the same preference, (1 – α) can be viewed as the utility gain from correct prediction when G_{t+1} = 0, or the opportunity cost of a false alert. Similarly,
$$\alpha \equiv 1 - \frac{u_{00} - u_{01}}{(u_{11} - u_{10}) + (u_{00} - u_{01})} = \frac{u_{11} - u_{10}}{(u_{11} - u_{10}) + (u_{00} - u_{01})}$$
can be treated as the utility gain from correct prediction when G_{t+1} = 1 is realized, or the opportunity cost of a failure-to-alert. We thus can define a cost function c(e_{t+1}) with e_{t+1} = G_{t+1} – G_{t,1}(X_{t}):
| Cost | G_{t+1} = 1 | G_{t+1} = 0 |
|---|---|---|
| G_{t,1}(X_{t}) = 1 | 0 | 1 – α |
| G_{t,1}(X_{t}) = 0 | α | 0 |
That is,
$$c(e_{t+1}) = \begin{cases} \alpha & \text{if } e_{t+1} = 1, \\ 1 - \alpha & \text{if } e_{t+1} = -1, \\ 0 & \text{if } e_{t+1} = 0, \end{cases}$$
which can be equivalently written as c(e_{t+1}) = ρ_{α}(e_{t+1}), where ρ_{α}(e) ≡ [α – 1(e < 0)] · e is the check function. Hence, the optimal binary predictor maximizing the expected utility minimizes the expected cost E(ρ_{α}(e_{t+1}) | X_{t}).
The optimal binary prediction that minimizes E_{Y_{t+1}}(ρ_{α}(e_{t+1}) | X_{t}) is the conditional α-quantile of G_{t+1}, denoted as
$$G^{\dagger}_{t,1}(X_{t}) \equiv Q_{\alpha}(G_{t+1} \mid X_{t}) = \arg\min_{G_{t,1}(X_{t})} \mathrm{E}_{Y_{t+1}}\big(\rho_{\alpha}(G_{t+1} - G_{t,1}(X_{t})) \mid X_{t}\big).$$
This is a maximum score problem of Charles Manski (1975).
Also, as noted by James Powell (1986), using the fact that for any monotonic function h(·), Q_{α}(h(Y_{t+1}) | X_{t}) = h(Q_{α}(Y_{t+1} | X_{t})), which follows immediately from observing that Pr(Y_{t+1} < y | X_{t}) = Pr[h(Y_{t+1}) < h(y) | X_{t}], and noting that the indicator function is monotonic, Q_{α}(G_{t+1} | X_{t}) = Q_{α}(1(Y_{t+1} > 0) | X_{t}) = 1(Q_{α}(Y_{t+1} | X_{t}) > 0).
Hence,
$$G^{\dagger}_{t,1}(X_{t}) = 1\big(Q_{\alpha}(Y_{t+1} \mid X_{t}) > 0\big),$$
where Q_{α}(Y_{t+1} | X_{t}) is the α-quantile function of Y_{t+1} conditional on X_{t}. Note that Q_{α}(G_{t+1} | X_{t}) = arg min E_{Y_{t+1}}(ρ_{α}(e_{t+1}) | X_{t}) with e_{t+1} ≡ G_{t+1} – Q_{α}(G_{t+1} | X_{t}), and Q_{α}(Y_{t+1} | X_{t}) = arg min E_{Y_{t+1}}(ρ_{α}(u_{t+1}) | X_{t}) with u_{t+1} ≡ Y_{t+1} – Q_{α}(Y_{t+1} | X_{t}). Therefore, the optimal binary prediction can be made from binary quantile regression for G_{t+1}. Binary prediction can also be made from a binary function of the α-quantile for Y_{t+1}.
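The decision rule of this section can be sketched in a few lines of Python. Rearranging the expected-utility inequality gives a probability threshold of 1 – α, with α the normalized gain (u_{11} – u_{10}) / [(u_{11} – u_{10}) + (u_{00} – u_{01})]. The function names and the utility values below are hypothetical, and the conditional probability π(X_{t}) is taken as given rather than estimated.

```python
def alpha_from_utilities(u11: float, u10: float, u01: float, u00: float) -> float:
    """Normalized utility gain from a correct prediction when the event occurs."""
    gain_event = u11 - u10       # gain from predicting 1 when G = 1
    gain_no_event = u00 - u01    # gain from predicting 0 when G = 0
    if gain_event <= 0 or gain_no_event <= 0:
        raise ValueError("require u11 > u10 and u00 > u01")
    return gain_event / (gain_event + gain_no_event)


def predict_binary(pi: float, alpha: float) -> int:
    """Predict 1 when Pr(G = 1 | X) exceeds the threshold 1 - alpha."""
    return 1 if pi > 1.0 - alpha else 0


def expected_utility(pred: int, pi: float, u11: float, u10: float,
                     u01: float, u00: float) -> float:
    """Expected utility of a prediction given the event probability pi."""
    if pred == 1:
        return u11 * pi + u01 * (1.0 - pi)
    return u10 * pi + u00 * (1.0 - pi)


def check_cost(e: int, alpha: float) -> float:
    """Check function [alpha - 1(e < 0)] * e for e = G - prediction."""
    return (alpha - (1.0 if e < 0 else 0.0)) * e


# Hypothetical utilities: missing an event is three times as costly as a false alert.
u = dict(u11=3.0, u10=0.0, u01=0.0, u00=1.0)
alpha = alpha_from_utilities(**u)
print(alpha, predict_binary(0.3, alpha), predict_binary(0.2, alpha))
```

With these utilities the forecaster issues an alert whenever the event probability exceeds one quarter, which reflects the higher cost attached to a failure-to-alert.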
LOSS FUNCTIONS FOR PROBABILITY FORECASTS
Francis Diebold and Glenn Rudebusch (1989) consider probability forecasts for business-cycle turning points and measure the accuracy of the predicted probabilities, that is, the average distance between the predicted probabilities and the observed realizations (as measured by a zero-one dummy variable). Suppose we have a time series of P probability forecasts {p_{t}}_{t=R+1}^{T}, where p_{t} is the probability of the occurrence of a turning point at date t. Let {d_{t}}_{t=R+1}^{T} be the corresponding realizations, with d_{t} = 1 if a business-cycle turning point (or any defined event) occurs in period t and d_{t} = 0 otherwise. The loss function analogous to the squared error is Brier's score, or the quadratic probability score (QPS):
$$\mathrm{QPS} = P^{-1}\sum_{t=R+1}^{T} 2(p_{t} - d_{t})^{2}.$$
The QPS ranges from 0 to 2, with 0 for perfect accuracy. As noted by Diebold and Rudebusch (1989), the use of a symmetric loss function may not be appropriate, as a forecaster may be penalized more heavily for missing a call (making a Type II error) than for signaling a false alarm (making a Type I error). Another loss function is given by the log probability score (LPS),
$$\mathrm{LPS} = -P^{-1}\sum_{t=R+1}^{T} \ln\big(p_{t}^{d_{t}} (1 - p_{t})^{1 - d_{t}}\big),$$
which is similar to the loss for the interval forecast. Major mistakes are penalized more heavily under LPS than under QPS. Further loss functions are discussed in Diebold and Rudebusch (1989).
Another loss function useful in this context is the Kuipers score (KS), which is defined by
KS = Hit Rate – False Alarm Rate,
where the hit rate is the fraction of the bad events that were correctly predicted as bad events (power, or one minus the probability of Type II error), and the false alarm rate is the fraction of good events that were incorrectly predicted as bad events (probability of Type I error).
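The three scores can be sketched in Python as follows. The formulas assumed here are QPS as the mean of 2(p_{t} – d_{t})^{2} and LPS as the negative mean of d_{t} ln p_{t} + (1 – d_{t}) ln(1 – p_{t}), which is our reading of Diebold and Rudebusch (1989). Turning a probability into an event call for the Kuipers score requires a cutoff; the default of 0.5 used below is an assumption, as are the function names and the toy data.

```python
import math
from typing import Sequence


def qps(probs: Sequence[float], events: Sequence[int]) -> float:
    """Quadratic probability score; 0 is perfect, 2 is the worst value."""
    return sum(2.0 * (p - d) ** 2 for p, d in zip(probs, events)) / len(probs)


def lps(probs: Sequence[float], events: Sequence[int]) -> float:
    """Log probability score; requires 0 < p < 1 so the logarithms are finite."""
    total = sum(d * math.log(p) + (1 - d) * math.log(1.0 - p)
                for p, d in zip(probs, events))
    return -total / len(probs)


def kuipers_score(probs: Sequence[float], events: Sequence[int],
                  threshold: float = 0.5) -> float:
    """Hit rate minus false alarm rate, calling an event when p > threshold."""
    calls = [1 if p > threshold else 0 for p in probs]
    n_events = sum(events)
    n_non_events = len(events) - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need at least one event and one non-event")
    hits = sum(1 for c, d in zip(calls, events) if c == 1 and d == 1)
    false_alarms = sum(1 for c, d in zip(calls, events) if c == 1 and d == 0)
    return hits / n_events - false_alarms / n_non_events


# Hypothetical probability forecasts and realized events.
probs = [0.9, 0.2, 0.7, 0.1]
events = [1, 0, 0, 0]
print(qps(probs, events), lps(probs, events), kuipers_score(probs, events))
```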
LOSS FUNCTION FOR INTERVAL FORECASTS
Suppose Y_{t} is a stationary series. Let the one-period-ahead conditional interval forecast made at time t from a model be denoted as
$$J_{t,1}(\alpha) = (L_{t,1}(\alpha), U_{t,1}(\alpha)), \quad t = R, \ldots, T,$$
where L_{t,1}(α) and U_{t,1}(α) are the lower and upper limits of the ex ante interval forecast for time t + 1 made at time t with the coverage probability α. Define the indicator variable X_{t+1}(α) = 1[Y_{t+1} ∈ J_{t,1}(α)]. When the interval forecast is correctly specified, the sequence {X_{t+1}(α)}_{t=R}^{T} is IID Bernoulli(α). The optimal interval forecast would satisfy E(X_{t+1}(α) | I_{t}) = α, so that {X_{t+1}(α) – α} will be an MD. A better model has a larger expected Bernoulli log-likelihood
$$\mathrm{E} \ln\big[\alpha^{X_{t+1}(\alpha)} (1 - \alpha)^{1 - X_{t+1}(\alpha)}\big].$$
Hence, we can choose a model for interval forecasts with the smallest out-of-sample mean of the negative predictive log-likelihood defined by
$$-P^{-1}\sum_{t=R}^{T} \ln\big[\alpha^{x_{t+1}(\alpha)} (1 - \alpha)^{1 - x_{t+1}(\alpha)}\big].$$
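A minimal Python sketch of this criterion follows, assuming the negative predictive log-likelihood takes the Bernoulli form –P^{-1} Σ ln[α^{x} (1 – α)^{1–x}], where x is the hit indicator. The function names, the open-interval convention, and the toy data are illustrative assumptions.

```python
import math
from typing import List, Sequence


def coverage_indicators(actual: Sequence[float], lower: Sequence[float],
                        upper: Sequence[float]) -> List[int]:
    """x = 1 if the realization falls inside the (open) interval forecast."""
    return [1 if lo < y < up else 0 for y, lo, up in zip(actual, lower, upper)]


def interval_loss(hits: Sequence[int], alpha: float) -> float:
    """Out-of-sample mean of the negative Bernoulli predictive log-likelihood."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    total = sum(x * math.log(alpha) + (1 - x) * math.log(1.0 - alpha) for x in hits)
    return -total / len(hits)


# Hypothetical realizations and 90 percent interval forecasts.
actual = [0.1, -0.5, 2.0, 0.3]
lower = [-1.0, -1.0, -1.0, -1.0]
upper = [1.0, 1.0, 1.0, 1.0]
hits = coverage_indicators(actual, lower, upper)
empirical_coverage = sum(hits) / len(hits)
print(hits, empirical_coverage, interval_loss(hits, 0.9))
```

The empirical coverage is printed alongside the loss because the MD condition in the text concerns how closely the hit frequency tracks the nominal coverage α.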
LOSS FUNCTION FOR DENSITY FORECASTS
Consider a financial return series {y_{t}}_{t=1}^{T}. This observed data on a univariate series is a realization of a stochastic process Y^{T} ≡ {Y_{τ} : Ω → ℝ, τ = 1, 2, …, T} on a complete probability space (Ω, ℱ_{T}, P_{0}^{T}), where Ω = ℝ^{T} ≡ ×_{τ=1}^{T} ℝ and ℱ_{T} = B(ℝ^{T}) is the Borel σ-field generated by the open sets of ℝ^{T}, and the joint probability measure P_{0}^{T}(B) ≡ P_{0}[Y^{T} ∈ B], B ∈ B(ℝ^{T}), completely describes the stochastic process. A sample of size T is denoted as y^{T} ≡ (y_{1}, …, y_{T})′.
Let a σ-finite measure ν^{T} on B(ℝ^{T}) be given. Assume P_{0}^{T}(B) is absolutely continuous with respect to ν^{T} for all T = 1, 2, …, so that there exists a measurable Radon-Nikodým density g^{T}(y^{T}) = dP_{0}^{T}/dν^{T}, unique up to a set of zero measure-ν^{T}.
Following Halbert White (1994), we define a probability model P as a collection of distinct probability measures on the measurable space (Ω, ℱ_{T}). A probability model P is said to be correctly specified for Y^{T} if P contains P_{0}^{T}. Our goal is to evaluate and compare a set of parametric probability models {P_{θ}^{T}}, where P_{θ}^{T}(B) ≡ P_{θ}[Y^{T} ∈ B]. Suppose there exists a measurable Radon-Nikodým density f^{T}(y^{T}) = dP_{θ}^{T}/dν^{T} for each θ ∈ Θ, where θ is a finite-dimensional vector of parameters and is assumed to be identified on Θ, a compact subset of ℝ^{k} (see White 1994, Theorem 2.6).
In the context of forecasting, instead of the joint density g^{T}(y^{T}), we consider forecasting the conditional density of Y_{t}, given the information ℱ_{t–1} generated by Y^{t–1}. Let φ_{t}(y_{t}) ≡ φ_{t}(y_{t} | ℱ_{t–1}) ≡ g^{t}(y^{t})/g^{t–1}(y^{t–1}) for t = 2, 3, … and φ_{1}(y_{1}) ≡ φ_{1}(y_{1} | ℱ_{0}) ≡ g^{1}(y^{1}) = g^{1}(y_{1}). Thus the goal is to forecast the (true, unknown) conditional density φ_{t}(y_{t}).
For this, we use a one-step-ahead conditional density forecast model ψ_{t}(y_{t}; θ) ≡ ψ_{t}(y_{t} | ℱ_{t–1}; θ) ≡ f^{t}(y^{t})/f^{t–1}(y^{t–1}) for t = 2, 3, … and ψ_{1}(y_{1}) ≡ ψ_{1}(y_{1} | ℱ_{0}) ≡ f^{1}(y^{1}) = f^{1}(y_{1}). If ψ_{t}(y_{t}; θ_{0}) = φ_{t}(y_{t}) almost surely for some θ_{0} ∈ Θ, then the one-step-ahead density forecast is correctly specified, and it is said to be optimal because it dominates all other density forecasts for any loss functions as discussed in the previous section (see Granger and Pesaran 2000a, 2000b; Diebold et al. 1998; Granger 1999).
In practice, it is rarely the case that we can find an optimal model. As it is very likely that “the true distribution is in fact too complicated to be represented by a simple mathematical function” (Sawa 1978), all the models proposed by different researchers can be possibly misspecified and thereby we regard each model as an approximation of the truth. Our task is then to investigate which density forecast model can approximate the true conditional density most closely. We have to first define a metric to measure the distance of a given model to the truth, and then compare different models in terms of this distance.
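The adequacy of a density forecast model can be measured by the conditional Kullback-Leibler information criterion (KLIC) (1951) divergence measure between the true conditional density φ_{t} and the model conditional density ψ_{t},
$$\mathbb{I}_{t}(\varphi : \psi, \theta) = \mathrm{E}_{\varphi_{t}}\big[\ln \varphi_{t}(y_{t}) - \ln \psi_{t}(y_{t}; \theta)\big],$$
where the expectation is with respect to the true conditional density φ_{t}(· | ℱ_{t–1}), and both E_{φ_{t}} ln φ_{t}(y_{t} | ℱ_{t–1}) and E_{φ_{t}} ln ψ_{t}(y_{t} | ℱ_{t–1}; θ) are assumed to be finite. Following White (1994), we define the distance between a density model and the true density as the minimum of the KLIC
$$\mathbb{I}_{t}(\varphi : \psi, \theta^{*}_{t-1}) = \mathrm{E}_{\varphi_{t}}\big[\ln \varphi_{t}(y_{t}) - \ln \psi_{t}(y_{t}; \theta^{*}_{t-1})\big],$$
where θ*_{t–1} = arg min 𝕀_{t}(φ : ψ, θ) is the pseudo-true value of θ (Sawa 1978). We assume that θ*_{t–1} is an interior point of Θ. The smaller this distance is, the closer the density forecast ψ_{t}(· | ℱ_{t–1}; θ*_{t–1}) is to the true density φ_{t}(· | ℱ_{t–1}).
However, 𝕀_{t}(φ : ψ, θ*_{t–1}) is unknown since θ*_{t–1} is not observable. We need to estimate θ*_{t–1}. If our purpose is to compare the out-of-sample predictive abilities among competing density forecast models, we split the data into two parts, one for estimation and the other for out-of-sample validation. At each period t in the out-of-sample period (t = R + 1, …, T), we estimate the unknown parameter vector θ*_{t–1} and denote the estimate as θ̂_{t–1}. Using {θ̂_{t–1}}_{t=R+1}^{T}, we can obtain the out-of-sample estimate of E 𝕀_{t}(φ : ψ, θ*_{t–1}) by
$$\mathbb{I}_{P}(\varphi : \psi) \equiv \frac{1}{P}\sum_{t=R+1}^{T} \ln\big[\varphi_{t}(y_{t}) / \psi_{t}(y_{t}; \hat{\theta}_{t-1})\big],$$
where P = T – R is the size of the out-of-sample period. Note that
$$\mathbb{I}_{P}(\varphi : \psi) = \frac{1}{P}\sum_{t=R+1}^{T} \ln\big[\varphi_{t}(y_{t}) / \psi_{t}(y_{t}; \theta^{*}_{t-1})\big] + \frac{1}{P}\sum_{t=R+1}^{T} \ln\big[\psi_{t}(y_{t}; \theta^{*}_{t-1}) / \psi_{t}(y_{t}; \hat{\theta}_{t-1})\big],$$
where the first term in 𝕀_{P}(φ : ψ) measures model uncertainty (the distance between the optimal density φ_{t}(y_{t}) and the model ψ_{t}(y_{t}; θ*_{t–1})) and the second term measures parameter estimation uncertainty due to the distance between θ*_{t–1} and θ̂_{t–1}.
Since the KLIC measure takes on a smaller value when a model is closer to the truth, we can regard it as a loss function and use 𝕀_{P}(φ : ψ) to formulate the loss-differential. The out-of-sample average of the loss-differential between model 1 and model 2 is
$$\mathbb{I}_{P}(\varphi : \psi^{1}) - \mathbb{I}_{P}(\varphi : \psi^{2}) = \frac{1}{P}\sum_{t=R+1}^{T} \ln\big[\psi^{2}_{t}(y_{t}; \hat{\theta}^{2}_{t-1}) / \psi^{1}_{t}(y_{t}; \hat{\theta}^{1}_{t-1})\big],$$
which is the average log ratio of the two predictive likelihoods, that is, the difference between the two predictive log-likelihood functions. Treating model 1 as a benchmark model (for model selection) or as the model under the null hypothesis (for hypothesis testing), 𝕀_{P}(φ : ψ^{1}) – 𝕀_{P}(φ : ψ^{2}) can be considered as a loss function to minimize. To sum up, the KLIC differential can serve as a loss function for density forecast evaluation as discussed in Yong Bao, Tae-Hwy Lee, and Burak Saltoglu (2007).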
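Because the unknown true density cancels in the differential, the comparison only needs each model's predictive log-density evaluated at the realizations. The Python sketch below is a simplified illustration under stated assumptions: the two competing models are fixed zero-mean normal densities with standard deviations 3 and 1 (no parameter estimation step), and the function names and data are hypothetical.

```python
import math
from typing import Callable, Sequence


def normal_logpdf(y: float, mu: float, sigma: float) -> float:
    """Log density of N(mu, sigma^2) at y."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - (y - mu) ** 2 / (2.0 * sigma ** 2)


def klic_differential(y: Sequence[float],
                      logpdf_1: Callable[[float], float],
                      logpdf_2: Callable[[float], float]) -> float:
    """Mean of ln psi2(y_t) - ln psi1(y_t); the true log density cancels out.

    A positive value means model 1 has the larger KLIC loss (model 2 is closer).
    """
    return sum(logpdf_2(v) - logpdf_1(v) for v in y) / len(y)


def wide_model(v: float) -> float:       # hypothetical model 1: N(0, 3^2)
    return normal_logpdf(v, 0.0, 3.0)


def tight_model(v: float) -> float:      # hypothetical model 2: N(0, 1)
    return normal_logpdf(v, 0.0, 1.0)


# Hypothetical out-of-sample realizations.
returns = [-1.2, 0.3, 0.8, -0.5, 1.1, -0.1]
differential = klic_differential(returns, wide_model, tight_model)
print(differential)
```

In an actual application each log density would be evaluated at parameters re-estimated from data up to t – 1, and the differential would be tested for statistical significance rather than simply compared with zero.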
LOSS FUNCTIONS FOR VOLATILITY FORECASTS
Gloria González-Rivera, Tae-Hwy Lee, and Santosh Mishra (2004) analyze the predictive performance of various volatility models for stock returns. To compare the performance, they choose loss functions for which volatility estimation is of paramount importance. They deal with two economic loss functions (an option pricing function and a utility function) and two statistical loss functions (the check loss for a value-at-risk calculation and a predictive likelihood function of the conditional variance).
LOSS FUNCTIONS FOR TESTING GRANGER-CAUSALITY
In time series forecasting, a concept of causality is due to Granger (1969), who defined it in terms of conditional distribution. Tae-Hwy Lee and Weiping Yang (2007) use loss functions to test for Granger-causality in conditional mean, in conditional distribution, and in conditional quantiles. The causal relationship between money and income (output) has been an important topic that has been extensively studied. However, those empirical studies are almost entirely on Granger-causality in the conditional mean. Compared to the conditional mean, conditional quantiles give a broader picture of a variable in various scenarios. Lee and Yang (2007) explore whether forecasting the conditional quantile of output growth may be improved using money. They compare the check (tick) loss functions of the quantile forecasts of output growth with and without using the past information on money growth, and assess the statistical significance of the loss-differential of the unconditional and conditional predictive abilities. As conditional quantiles can be inverted to the conditional distribution, they also test for Granger-causality in the conditional distribution (using a nonparametric copula function). Using U.S. monthly series of real personal income and industrial production for income, and M1 and M2 for money, for 1959 to 2001, they find that out-of-sample quantile forecasting for output growth, particularly in the tails, is significantly improved by accounting for money. On the other hand, money-income Granger-causality in the conditional mean is quite weak and unstable. Their results have important implications for monetary policy, showing that the effectiveness of monetary policy has been underestimated by merely testing Granger-causality in mean. Money-income Granger-causality is stronger than has been known, and therefore the information on money growth can (and should) be more widely utilized in implementing monetary policy.
SEE ALSO Autoregressive Models; Generalized Least Squares; Least Squares, Ordinary; Logistic Regression; Maximum Likelihood Regression; Optimizing Behavior; Regression; Regression Analysis; Time Series Regression
BIBLIOGRAPHY
Bao, Yong, Tae-Hwy Lee, and Burak Saltoglu. 2007. Comparing Density Forecast Models. Journal of Forecasting 26: 203–225.
Diebold, Francis X., and Glenn D. Rudebusch. 1989. Scoring the Leading Indicators. Journal of Business 62 (3): 369–391.
Diebold, Francis X., Todd A. Gunther, and Anthony S. Tay. 1998. Evaluating Density Forecasts with Applications to Financial Risk Management. International Economic Review 39: 863–883.
Ding, Zhuanxin, Clive W. J. Granger, and Robert F. Engle. 1993. A Long Memory Property of Stock Market Returns and a New Model. Journal of Empirical Finance 1: 83–106.
González-Rivera, Gloria, Tae-Hwy Lee, and Santosh Mishra. 2004. Forecasting Volatility: A Reality Check Based on Option Pricing, Utility Function, Value-at-Risk, and Predictive Likelihood. International Journal of Forecasting 20 (4): 629–645.
González-Rivera, Gloria, Tae-Hwy Lee, and Emre Yoldas. 2007. Optimality of the RiskMetrics VaR Model. Unpublished manuscript, University of California, Riverside.
Granger, Clive W. J. 1969. Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica 37: 424–438.
Granger, Clive W. J. 1999. Outline of Forecast Theory Using Generalized Cost Functions. Spanish Economic Review 1: 161–173.
Granger, Clive W. J. 2002. Some Comments on Risk. Journal of Applied Econometrics 17: 447–456.
Granger, Clive W. J., and M. Hashem Pesaran. 2000a. A Decision Theoretic Approach to Forecasting Evaluation. In Statistics and Finance: An Interface, eds. Wai-Sum Chan, Wai Keung Li, and Howell Tong. London: Imperial College Press.
Granger, Clive W. J., and M. Hashem Pesaran. 2000b. Economic and Statistical Measures of Forecast Accuracy. Journal of Forecasting 19: 537–560.
Harter, H. L. 1977. Nonuniqueness of Least Absolute Values Regression. Communications in Statistics - Theory and Methods A6: 829–838.
Hong, Yongmiao, and Tae-Hwy Lee. 2003. Inference on Predictability of Foreign Exchange Rates via Generalized Spectrum and Nonlinear Time Series Models. Review of Economics and Statistics 85 (4): 1048–1062.
Koenker, Roger, and Gilbert Bassett Jr. 1978. Regression Quantiles. Econometrica 46 (1): 33–50.
Kullback, L., and R. A. Leibler. 1951. On Information and Sufficiency. Annals of Mathematical Statistics 22: 79–86.
Lee, Tae-Hwy, and Weiping Yang. 2007. Money-Income Granger-Causality in Quantiles. Unpublished manuscript, University of California, Riverside.
Lee, Tae-Hwy, and Yang Yang. 2006. Bagging Binary and Quantile Predictors for Time Series. Journal of Econometrics 135: 465–497.
Manski, Charles F. 1975. Maximum Score Estimation of the Stochastic Utility Model of Choice. Journal of Econometrics 3 (3): 205–228.
Money, A. H., J. F. Affleck-Graves, M. L. Hart, and G. D. I. Barr. 1982. The Linear Regression Model and the Choice of p. Communications in Statistics - Simulation and Computation 11 (1): 89–109.
Nyquist, Hans. 1983. The Optimal Lp-norm Estimation in Linear Regression Models. Communications in Statistics - Theory and Methods 12: 2511–2524.
Powell, James L. 1986. Censored Regression Quantiles. Journal of Econometrics 32: 143–155.
Sawa, Takamitsu. 1978. Information Criteria for Discriminating among Alternative Regression Models. Econometrica 46: 1273–1291.
Varian, Hal R. 1975. A Bayesian Approach to Real Estate Assessment. In Studies in Bayesian Econometrics and Statistics: In Honor of Leonard J. Savage, eds. Stephen E. Fienberg and Arnold Zellner, 195–208. Amsterdam: North Holland.
West, Kenneth D. 1996. Asymptotic Inference about Prediction Ability. Econometrica 64: 1067–1084.
White, Halbert. 1994. Estimation, Inference, and Specification Analysis. Cambridge, U.K.: Cambridge University Press.
Zellner, Arnold. 1986. Bayesian Estimation and Prediction Using Asymmetric Loss Functions. Journal of the American Statistical Association 81: 446–451.
Tae-Hwy Lee
"Loss Functions." International Encyclopedia of the Social Sciences. . Encyclopedia.com. 24 Sep. 2018 <http://www.encyclopedia.com>.