Latent Structure

A scientist is often interested in quantities that are not directly observable but can be investigated only via observable quantities that are probabilistically connected with those of real interest. Latent structure models relate to one such situation in which the observable or manifest quantities are multivariate multinomial observations, for example, answers by a subject or respondent to dichotomous or trichotomous questions. Models relating polytomous observable variables to unobservable or latent variables go back rather far; some early references are Cournot (1838), Weinberg (1902), Benini (1928), and DeMeo (1934). These models typically express the multivariate distribution of the observable variables as a mixture of multivariate distributions, where the distribution of the latent variable is the mixing distribution [see Distributions, Statistical, article on Mixtures of Distributions].

Lazarsfeld (1950) first introduced the term latent structure model for those models in which the variables distributed according to any of the component multivariate distributions of the mixture are assumed to be stochastically independent. (Thus a latent structure model of a subject’s answers to 50 dichotomous questions—the latent class model of this article—assumes that subjects fall into relatively few classes, called latent classes, with the variable that relates the subject to his class being the latent variable. The distribution of this latent variable, that is, the distribution of the subjects among latent classes, is the mixing distribution. Within each class it is assumed that the responses to the 50 dichotomous questions are stochastically independent.) A basic reference for the general form of latent structure models is Anderson (1959).

The present article—restricted to the case of dichotomous questions—emphasizes the problems of identifiability and efficient statistical estimation of the parameters of latent structure models, points out difficulties with methods that have been proposed, and summarizes doubts currently held about the possibility of good estimation.

The simplest of the latent structure models and almost the only one in which the problem of parameter estimation has been carefully addressed is the latent class model. In this model, each observation in the sample is a vector x with p two-valued items or coordinates, conveniently coded by writing each either as 0 or as 1. The latent class model postulates that there is a small number m of classes, called latent classes, into which potential observations on the population can be classified such that within each class the p coordinates of the vector x are statistically independent. This is not to say that all identical observations in the sample are automatically considered as coming from the same class. Rather, associated with each class is a probability distribution on the 2^p possible vectors x, such that the p coordinates of x are (conditionally) independent. An observation vector x thus has a probability distribution that is a mixture of the probability distributions of x associated with each of the latent classes.

An example of the above model comes from the study (Lazarsfeld 1950) of the degree of ethnocentrism of American soldiers during World War II. Because it is not known how to measure ethnocentrism directly, a sample of soldiers was asked the following three questions: Do you believe that our European allies are much superior to us in strategy and fighting morale? Do you believe that the majority of all equipment used by all the allies comes from American lend-lease shipment? Do you believe that neither we nor our allies could win the war if we didn’t have each other’s help?

Here p = 3 and x is the vector of responses to the three questions, with Yes coded as 1 and No coded as 0. A suitable latent class model would postulate that there are two latent classes (so that m = 2), such that within each class the answers to the three questions are stochastically independent. Postulating the existence of any more than two latent classes would, as will be seen later, lead to difficulties, since the parameters of such a latent class model could not be consistently estimated. The two latent classes would probably be composed of ethnocentric and nonethnocentric soldiers, respectively. However, this need not be the case, and in fact it may happen that the two latent classes will have no reasonable interpretation, let alone the hoped-for interpretation. This phenomenon of possible noninterpretability is characteristic not only of the latent class model but also of the factor analysis and other mixture-of-distributions models.

The latent class model

Let σ denote an unordered subset of the integers (1, 2, …, p), possibly the null subset ø. (Other subsets will, for concreteness, be denoted by writing their members in customary numerical order.) Let π_σ denote the probability that, for a randomly chosen individual, each coordinate of x whose index is a member of σ is a 1, and define π_ø = 1. For example, π_{2,7,19} is the probability that the second, seventh, and nineteenth coordinates of x are all 1, marginally with respect to (that is, disregarding) the values of the other coordinates of x.

Since the order of coordinates is immaterial for such a probability, one is justified in dealing with the 2^p unordered σ’s, but a specific order in naming the subset is helpful for exposition. The π_σ’s are notationally a more convenient set of parameters than what might be considered the 2^p natural parameters of the multinomial distribution of x.

A concise description of the natural parameters of the distribution of x is the following. Let σ̄ denote that subset of the integers (1, 2, …, p) which is the complement of σ. Let π_{σ:σ̄} denote the probability that, for a randomly chosen individual, each coordinate of x whose index is a member of σ is a 1 and each coordinate of x whose index is a member of σ̄ is a 0. The 2^p π_{σ:σ̄}’s are the natural parameters of the multinomial distribution of x, since they are the probabilities of each of the 2^p possible observation values. For example, in the ethnocentrism case, π_{12:3} would be the probability that the first two questions are answered Yes, while the third question is answered No. The π_σ’s and π_{σ:σ̄}’s are related by a nonsingular linear transformation.
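The linear transformation can be made concrete in a short sketch. The one below assumes the usual summation and inclusion-exclusion relations between the two parameter sets (π_σ is the sum of the cell probabilities whose set of 1’s contains σ, and the inverse alternates signs over supersets of σ); the function names are illustrative only, and the numbers are the cell probabilities listed in Table 2 of this article.

```python
from fractions import Fraction
from itertools import combinations

ITEMS = (1, 2, 3)

def subsets(items):
    """All subsets of the item indices, as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def cells_to_marginals(cells, items=ITEMS):
    """pi_sigma: sum of the cell probabilities whose set of 1's contains sigma."""
    return {s: sum(prob for ones, prob in cells.items() if s <= ones)
            for s in subsets(items)}

def marginals_to_cells(pi, items=ITEMS):
    """Inverse map, by inclusion-exclusion over the supersets tau of sigma."""
    return {s: sum((-1) ** (len(t) - len(s)) * pi[t] for t in pi if s <= t)
            for s in subsets(items)}

# Cell probabilities from Table 2, keyed by the set of items answered 1.
cells = {frozenset(k): Fraction(v, 192) for k, v in [
    ((1, 2, 3), 10), ((2, 3), 14), ((1, 3), 17), ((1, 2), 22),
    ((3,), 19), ((2,), 34), ((1,), 35), ((), 41)]}

pi = cells_to_marginals(cells)
assert marginals_to_cells(pi) == cells   # the round trip recovers every cell
print(pi[frozenset({1, 2})])             # 1/6
```

Exact fractions are used so that the round trip can be checked with equality rather than a tolerance.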

Let ν_α be the probability that the observation vector x is a member of the αth latent class, where α = 1, 2, …, m and Σ_α ν_α = 1. Let λ_{ασ} be the probability that, if x is a vector chosen at random from the αth class, each coordinate of x whose index is a member of σ is a 1. Clearly π_σ = Σ_α ν_α λ_{ασ}.

Let σ_i denote the ith member of σ, with the members of σ arranged in some order, say numerical. The fundamental independence assumption of the latent class model then says that for each α

    λ_{ασ} = ∏_i λ_{ασ_i}

for all σ. That is, the probability (conditional on x being in the αth latent class) of any given set of coordinates of x being all 1’s is the product of the probabilities of each of these coordinates being a 1. Then

    π_σ = Σ_α ν_α ∏_i λ_{ασ_i}

for all σ. These equations are called the accounting equations of the latent class model. Thus the m(p + 1) parameters of the model are the latent parameters λ_αi and ν_α, α = 1, …, m, i = 1, …, p. These completely determine the 2^p manifest parameters, the π_σ, via the accounting equations.
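As a small illustration of the accounting equations, the sketch below computes every manifest parameter π_σ as the ν-weighted sum, over latent classes, of the product of the within-class item probabilities. The function name and data layout are illustrative assumptions; the parameter values are those of the two-class, three-item example in Table 1 of this article, taking the mixing probability listed there as ν₁ (so that ν₂ = 1/4).

```python
from itertools import combinations
from math import prod

def manifest_parameters(nu, lam):
    """Accounting equations: pi_sigma = sum_a nu[a] * prod_{i in sigma} lam[a][i-1]."""
    p = len(lam[0])
    pi = {}
    for size in range(p + 1):
        for sigma in combinations(range(1, p + 1), size):
            # prod over an empty sigma is 1, so pi_() = sum(nu) = 1.
            pi[sigma] = sum(n * prod(row[i - 1] for i in sigma)
                            for n, row in zip(nu, lam))
    return pi

nu = [3 / 4, 1 / 4]               # class proportions nu_1, nu_2
lam = [[1 / 2, 1 / 3, 1 / 3],     # lambda_11, lambda_12, lambda_13
       [1 / 4, 2 / 3, 1 / 4]]     # lambda_21, lambda_22, lambda_23

pi = manifest_parameters(nu, lam)
for sigma in sorted(pi, key=lambda s: (len(s), s)):
    print(sigma, round(pi[sigma], 4))
```

The printed values agree with the π_σ column of Table 4 (for example, π_1 = .4375 and π_123 = .0521).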

Parameter estimation

Suppose that the number of latent classes, m, is known to the investigator. (This assumption is made because it underlies all the theoretical work on the estimation of parameters of the latent class model. In practice m is unknown, but a pragmatic approach is to assume a particular small value of m, proceed with the estimation, see how well the estimated model fits the manifest data, and alter m and begin again if the fit is poor.) Then a central statistical problem is that of estimating the parameters of the model, the ν’s and λ’s, from a random sample of n vectors x. (The typical sample in survey work is a stratified rather than a simple random sample. However, the problem of estimating latent parameters from such samples is much more complicated, and as yet has hardly been touched.)

Let n_σ be the number of vectors in the sample with 1’s in each component whose index is a member of σ, and let p_σ = n_σ/n. If the model were simply a multinomial model with the π_σ’s as parameters, then the p_σ’s would be maximum likelihood estimators of the π_σ’s. If for each set of 2^p π_σ’s there is a unique set of latent parameters, ν_α’s and λ_αi’s, α = 1, …, m, i = 1, …, p, then the ν’s and λ’s are functions of the π_σ’s, and evaluating these functions at the p_σ’s as arguments will yield estimators (actually consistent estimators) of the latent parameters. But the “if” in the last sentence is most critical; it is the identifiability condition, common to all models relating distributions of observable random variables to distributions of unobservable random variables. Consequently, most of the work on parameter estimation in latent class analysis is really a by-product of work on finding constructive procedures (that is, procedures that explicitly derive the unique latent parameters as functions of the π’s) for proving the identifiability of a latent class model associated with a given m and p. With such a constructive procedure available, one can replace the π’s by their estimates, the p’s, and use the procedure to determine estimates of the ν’s and λ’s. The following description of estimation procedures based on constructive proofs of identifiability will thus really be a description of the constructive procedure for determining the ν’s and λ’s from a subset of the π’s.
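The sample quantities n_σ and p_σ = n_σ/n can be computed directly from a tally of response patterns. The following is a minimal sketch, with an illustrative function name and data layout, applied to the n = 40 tally given in Table 3 of this article.

```python
from itertools import combinations

# Observed counts from Table 3, keyed by the tuple of items answered 1.
counts = {(1, 2, 3): 2, (2, 3): 3, (1, 3): 4, (1, 2): 4,
          (3,): 4, (2,): 7, (1,): 7, (): 9}

def sample_proportions(counts, p):
    """p_sigma = n_sigma / n, where n_sigma counts the vectors with 1's on all of sigma."""
    n = sum(counts.values())
    props = {}
    for size in range(1, p + 1):
        for sigma in combinations(range(1, p + 1), size):
            n_sigma = sum(c for ones, c in counts.items()
                          if set(sigma) <= set(ones))
            props[sigma] = n_sigma / n
    return props

p_hat = sample_proportions(counts, p=3)
for sigma, value in p_hat.items():
    print(sigma, value)
```

The results reproduce the p_σ column of Table 4, for example p_1 = 17/40 = .425 and p_123 = 2/40 = .050.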

Green’s method of estimation. The earliest constructive procedure was given by Green (1951). Let D_i be the m × m diagonal matrix with λ_αi, α = 1, …, m, on the diagonal, and let L be the (p + 1) × m matrix with first row a vector of 1’s and jth row (j = 2, …, p + 1) the vector (λ_{1,j−1}, …, λ_{m,j−1}). Let N be the m × m diagonal matrix with ν_α, α = 1, …, m, on the diagonal. For σ a subset of (1, 2, …, p), define D_σ = ∏_{σ_j∈σ} D_{σ_j}. Form the matrix Π_σ = LND_σL′, where the prime denotes the matrix transpose. With the rows and columns indexed by 0, 1, …, p (index 0 corresponding to the row of 1’s in L, so that λ_α0 = 1), the (i, j)th element of this matrix is

    Σ_α ν_α λ_αi λ_αj ∏_{k∈σ} λ_αk.

If i ≠ j and i, j ∉ σ, then the (i, j)th element of this matrix is the manifest parameter π_{ijσ}. Otherwise the (i, j)th element of this matrix can formally be defined as a quantity called π_{ijσ}, where the subscript of π may have repeated elements. Since π’s with repeated subscripts are not manifest parameters and have no empirical counterpart but are merely formal constructs based on the latent parameters, they are not estimable directly from the n_σ’s. However, Green provided some rules for guessing at values of these π’s (one rule is given below), so that the matrix Π_σ can be partly estimated and partly guessed at, given data.

Let D be the m × m diagonal matrix with Σ_k λ_αk, α = 1, …, m, on the diagonal, and let A = LN^{1/2}. Then Π₀ = LNL′ = AA′ and Π = Σ_k Π_k = ADA′. Under the assumptions that m ≤ p + 1, rank A = m, and all the diagonal elements of D are different and nonzero, the following procedure determines the matrices L and N of latent parameters.

Factor Π₀ as Π₀ = BB′ and Π as Π = CC′. (The matrices B and C are not unique, but any factorization will do.) Let T = (B′B)^{−1}B′C. A complete principal component analysis of TT′ will yield an orthogonal matrix Q, and it can be shown that A = BQ. Since the first row of L is a vector of 1’s, the first row of A is an estimate of the vector (ν₁^{1/2}, …, ν_m^{1/2}), so that N is easily determined. The matrix L is then just AN^{−1/2}.

The major shortcoming of this procedure is the problem of how to guess at values of the π’s bearing repeated subscripts. No one has yet devised a rule which, when applied to a set of p’s, will yield consistent estimators of L and N. For example, Green suggests using p_ii = p_i² + max_{j≠i}(p_ij − p_i p_j) as a guess at π_ii. Yet in the case m = 2, p = 3 with latent parameters ν₁ = ν₂ = .5, λ₁₁ = .9, λ₁₂ = .2, λ₁₃ = .8, λ₂₁ = .7, λ₂₂ = .9, λ₂₃ = .4, if i = 2, max_{j≠2}(p_2j − p₂p_j) is a consistent estimator of −.07, so that p₂₂ is a consistent estimator of something smaller than π₂². But π₂₂ = Σ_α ν_α λ_α2² ≥ π₂², so that p₂₂ is not a consistent estimator of π₂₂.

Determinantal method of estimation. A matricial procedure that does not have the above shortcoming, since it involves only estimable π’s, was first suggested by Lazarsfeld and Dudman (see Lazarsfeld 1951) and independently by Koopmans (1951), developed by Anderson (1954), and extended by Gibson (1955; 1962) and Madansky (1960). For ease of exposition, the procedure will be described only for the cases treated by Anderson.

Assume that p ≥ 2m − 1. In that case, 2m − 1 different items can be selected from the p items (say, the first 2m − 1) and the following matrices of π’s involving only these items formed. Let

    Π* = [ 1          π_m          …   π_{2m−2}
           π_1        π_{1,m}      …   π_{1,2m−2}
           ⋮          ⋮                 ⋮
           π_{m−1}    π_{m−1,m}    …   π_{m−1,2m−2} ]

and let Π̃ be the matrix Π* with the 1 replaced by π_{2m−1} and all the π’s having the additional subscript 2m − 1. Let Λ₁ be an m × m matrix with first row a vector of 1’s and jth row (j = 2, …, m) the vector (λ_{1,j−1}, …, λ_{m,j−1}), and let Λ₂ be an m × m matrix with first row a vector of 1’s and jth row (j = 2, …, m) the vector (λ_{1,m+j−2}, …, λ_{m,m+j−2}). Let N and D_{2m−1} be defined as above. Then Π* = Λ₁NΛ₂′ and Π̃ = Λ₁ND_{2m−1}Λ₂′. Thus, if the diagonal elements of D_{2m−1} are distinct and if Λ₁, N, and Λ₂ are of full rank, then the diagonal elements of D_{2m−1} are the roots θ of the determinantal equation |Π̃ − θΠ*| = 0.

*Table 1*
*Parameter*	*Value*	*Asymptotic variance*
ν₁	3/4	1115.42/n
λ₁₁	1/2	39.00/n
λ₁₂	1/3	60.89/n
λ₁₃	1/3	4.96/n
λ₂₁	1/4	303.00/n
λ₂₂	2/3	611.53/n
λ₂₃	1/4	31.00/n

If Z is the matrix of characteristic vectors corresponding to the roots θ_1, …, θ_m, then the columns of Π*Z are proportional to the columns of Λ₁, with the constant of proportionality determined by the condition that the first row of Λ₁ is a vector of 1’s. A similar argument using the transposes of Π̃ and Π* yields Λ₂, and N is determined by

    N = Λ₁^{−1} Π* (Λ₂′)^{−1}.

A difficulty with this procedure is that it depends critically on which 2m − 1 items are chosen from the p items, on which of these 2m − 1 is chosen as the additional subscript defining Π̃, and on the allocation of the remaining 2m − 2 items to the rows and columns defining Π*. That is, it depends critically on the ordering of the items. There are no general rules available for an ordering of the items that will yield relatively efficient estimators of the latent parameters.

The most important shortcoming of this procedure and of its extensions (which involve more of the π’s) is that there is no guarantee that when the procedure is used with a set of p’s it will produce permissible estimates of the latent parameters, that is, estimates that are real numbers between 0 and 1. In four sampling experiments with n = 1,000, m = 3, and p = 8, Anderson and Carleton (1957) found that of 2,240 determinantal equations only 33.7 per cent had all roots between 0 and 1. Madansky (1959) computed the asymptotic variance of the determinantal estimates for the case m = 2, p = 3, a case in which these estimators, if permissible, are the maximum likelihood estimators of the latent parameters, and found the results presented in Table 1, where n is the sample size. Thus, sample sizes must be at least 1,116 for the variance of the estimators of all the parameters to be less than 1.
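The arithmetic behind the sample-size remark is easy to check. The sketch below takes the asymptotic variance numerators from Table 1 (variance = numerator/n), reports the implied asymptotic standard deviations for a chosen n, and finds the smallest integer n for which every variance falls below a given bound. The dictionary keys and function names are illustrative only.

```python
import math

# Numerators c of the asymptotic variances c/n listed in Table 1.
AVAR_NUMERATOR = {
    "nu1": 1115.42, "lam11": 39.00, "lam12": 60.89, "lam13": 4.96,
    "lam21": 303.00, "lam22": 611.53, "lam23": 31.00,
}

def asymptotic_sd(n):
    """Asymptotic standard deviation sqrt(c / n) of each estimator at sample size n."""
    return {name: math.sqrt(c / n) for name, c in AVAR_NUMERATOR.items()}

def min_n_for_variance_below(bound=1.0):
    """Smallest integer n with c / n < bound for every parameter."""
    return math.floor(max(AVAR_NUMERATOR.values()) / bound) + 1

print(min_n_for_variance_below())   # 1116
print(asymptotic_sd(1000)["nu1"])   # still above 1 at n = 1,000
```

Since these parameters are probabilities, a standard deviation near 1 is uninformative; the bound can be lowered (for example, to 0.01) to see how large n must be for a usable estimate under this asymptotic approximation.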

*Table 2*
π_123:ø = 10/192
π_23:1 = 14/192
π_13:2 = 17/192
π_12:3 = 22/192
π_3:12 = 19/192
π_2:13 = 34/192
π_1:23 = 35/192
π_ø:123 = 41/192

*Table 3*
*Response pattern*	*Number observed*
123:ø	2
23:1	3
13:2	4
12:3	4
3:12	4
2:13	7
1:23	7
ø:123	9

Rounding error also affects the estimates greatly. The parameters of the multinomial distribution for the above model are given in Table 2.

For a sample of size 40, if one had actually observed the expected number of respondents for each of the response patterns (rounded to the nearest integer), then the sample would have the composition shown in Table 3. Table 4 shows the p_σ’s based on these data (π_σ being given for comparison). The determinantal estimates of the latent parameters are given in the third column of Table 5. (The fourth column will be discussed below.)
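For the case m = 2, p = 3 the determinantal procedure reduces to solving a quadratic, which makes it easy to sketch. The code below is an illustration under stated assumptions, not the published computation: it takes Π* = [[1, p_2], [p_1, p_12]] and Π̃ = [[p_3, p_23], [p_13, p_123]] (item 3 supplying the additional subscript), solves |Π̃ − θΠ*| = 0 for the two roots, labels the class with the larger root as class 1, and then recovers the remaining parameters from the accounting equations. The function and key names are hypothetical.

```python
import math

# Sample proportions p_sigma from Table 4 (n = 40).
p = {"1": 0.425, "2": 0.400, "3": 0.325,
     "12": 0.150, "13": 0.150, "23": 0.125, "123": 0.050}

def determinantal_m2(p):
    """Determinantal estimates for two classes and three items."""
    # det(P_tilde - theta * P_star) = a*theta**2 + b*theta + c
    a = p["12"] - p["1"] * p["2"]
    b = p["1"] * p["23"] + p["2"] * p["13"] - p["123"] - p["3"] * p["12"]
    c = p["3"] * p["123"] - p["13"] * p["23"]
    disc = b * b - 4 * a * c
    if a == 0 or disc <= 0:
        raise ValueError("no two distinct real roots; no permissible estimate")
    r = math.sqrt(disc)
    th1, th2 = sorted(((-b + r) / (2 * a), (-b - r) / (2 * a)), reverse=True)
    gap = th1 - th2
    # From pi_{sigma,3} - th2 * pi_sigma = nu1 * lam_{1,sigma} * (th1 - th2).
    nu1 = (p["3"] - th2) / gap
    lam11 = (p["13"] - th2 * p["1"]) / (gap * nu1)
    lam12 = (p["23"] - th2 * p["2"]) / (gap * nu1)
    lam21 = (p["1"] - nu1 * lam11) / (1 - nu1)
    lam22 = (p["2"] - nu1 * lam12) / (1 - nu1)
    return {"nu1": nu1, "lam11": lam11, "lam12": lam12, "lam13": th1,
            "lam21": lam21, "lam22": lam22, "lam23": th2}

est = determinantal_m2(p)
for name, value in est.items():
    print(name, round(value, 2))
```

Rounded to two places, these values agree with the determinantal column of Table 5. Feeding in the exact π_σ’s of Table 4 instead of the p_σ’s returns the true parameter values, which illustrates how strongly rounding a sample of 40 perturbs the estimates. The check for distinct real roots is only a partial guard; roots outside the interval from 0 to 1 would still signal an impermissible solution.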

Partitioning method of estimation. A third estimation procedure (Madansky 1959) looks at the problem in a different light. Since the latent classes are defined as those classes within which the p components of the vector x are statistically independent, one might (at least conceptually) look at all possible assignments of the n observations into m classes and find that assignment for which the usual χ² test statistic for independence is smallest. The estimates of the latent parameters would then just be the appropriate proportions based on this best assignment. They would always be permissible. Although for finite samples they would not be identical with minimum χ² estimates, they would have the same asymptotic properties and thus be asymptotically equivalent to maximum likelihood estimates.

Madansky (1959) introduced another measure of independence, simpler to compute than χ², and found that the asymptotic efficiency of the estimators of the latent parameters from this procedure, in the example described above, is about .91. The obvious shortcoming of this idea is that it is too time consuming to carry out all the possible assignments, even for moderate samples on an electronic computer. In the example described above, for a sample of size 40 it took four hours of computation on the IBM 704 to enumerate and assess all the assignments into two classes. The resulting estimates are shown in the fourth column of Table 5.

*Table 4*
σ	p_σ	π_σ
1	.425	.4375
2	.400	.4167
3	.325	.3125
12	.150	.1667
13	.150	.1406
23	.125	.1250
123	.050	.0521

*Table 5 – Parameter estimates for two methods (n = 40)*
*Parameter*	*Value*	*Determinantal estimate*	*Partitioning estimate*
ν₁	.75	.23	.58
λ₁₁	.50	.82	.00
λ₁₂	.33	.23	.43
λ₁₃	.33	.42	.30
λ₂₁	.25	.30	1.00
λ₂₂	.67	.45	.35
λ₂₃	.25	.29	.35
Source: Madansky 1959, p. 21.

Scoring methods. Current activity on estimation procedures for the latent class model (Henry 1964) is directed toward writing computer routines using the scoring procedure described by McHugh (1956) to obtain best asymptotically normal estimates of the latent parameters. The scoring procedure will yield estimators with the same large asymptotic variances as those indicated by the above example of the maximum likelihood estimators’ asymptotic variances. Also, the scoring procedure has the same permissibility problem associated with it as did the determinantal approach described above. However, the problem can be alleviated for this procedure by using a set of consistent permissible estimators for initial values in the scoring procedure.

Albert Madansky

[See also Scaling. Directly related are the entries Distributions, Statistical, article on Mixtures of Distributions; Factor Analysis; Statistical Identifiability.]

BIBLIOGRAPHY

Anderson, T. W. 1954 On Estimation of Parameters in Latent Structure Analysis. Psychometrika 19:1–10.

Anderson, T. W. 1959 Some Scaling Models and Estimation Procedures in the Latent Class Model. Pages 9–38 in Ulf Grenander (editor), Probability and Statistics. New York: Wiley.

Anderson, T. W.; and Carleton, R. O. 1957 Sampling Theory and Sampling Experience in Latent Structure Analysis. Journal of the American Statistical Association 52:363 only.

Benini, Rodolfo 1928 Gruppi chiusi e gruppi aperti in alcuni fatti collettivi di combinazioni. International Statistical Institute, Bulletin 23, no. 2:362–383.

Cournot, A. A. 1838 Mémoire sur les applications du calcul des chances à la statistique judiciaire. Journal de mathématiques pures et appliquées 3:257–334.

DeMeo, G. 1934 Su di alcuni indici atti a misurare l’attrazione matrimoniale in classificazioni dicotome. Accademia delle Scienze Fisiche e Matematiche, Naples, Rendiconto 73:62–77.

Gibson, W. A. 1955 An Extension of Anderson’s Solution for the Latent Structure Equations. Psychometrika 20:69–73.

Gibson, W. A. 1962 Extending Latent Class Solutions to Other Variables. Psychometrika 27:73–81.

Green, Bert F., Jr. 1951 A General Solution for the Latent Class Model of Latent Structure Analysis. Psychometrika 16:151–166.

Henry, Neil 1964 The Computation of Efficient Estimates in Latent Class Analysis. Unpublished manuscript, Columbia Univ., Bureau of Applied Social Research.

Koopmans, T. C. 1951 Identification Problems in Latent Structure Analysis. Cowles Commission Discussion Paper: Statistics, No. 360. Unpublished manuscript.

Lazarsfeld, Paul F. 1950 The Logical and Mathematical Foundation of Latent Structure Analysis. Pages 362–412 in Samuel A. Stouffer et al., Measurement and Prediction. Princeton Univ. Press.

Lazarsfeld, Paul F. 1951 The Use of Mathematical Models in the Measurement of Attitudes. Research Memorandum RM-455. Santa Monica (Calif.): RAND Corporation.

Lazarsfeld, Paul F. 1959 Latent Structure Analysis. Pages 476–543 in Sigmund Koch (editor), Psychology: A Study of a Science. Volume 3: Formulations of the Person and the Social Context. New York: McGraw-Hill.

McHugh, Richard B. 1956 Efficient Estimation and Local Identification in Latent Class Analysis. Psychometrika 21:331–347.

McHugh, Richard B. 1958 Note on “Efficient Estimation....” Psychometrika 23:273–274. → This is a correction to McHugh 1956.

Madansky, Albert 1959 Partitioning Methods in Latent Class Analysis. Paper P-1644. Santa Monica (Calif.): RAND Corporation.

Madansky, Albert 1960 Determinantal Methods in Latent Class Analysis. Psychometrika 25:183–198.

Weinberg, Wilhelm 1902 Beiträge zur Physiologie und Pathologie der Mehrlingsgeburten beim Menschen. Pflüger’s Archiv für die gesamte Physiologie des Menschen und der Tiere 88:346–430.

International Encyclopedia of the Social Sciences