Psychometrika. DOI: 10.1007/s11336-015-9448-y

MODELLING TRENDS IN ORDERED CORRESPONDENCE ANALYSIS USING ORTHOGONAL POLYNOMIALS

Rosaria Lombardo, SECOND UNIVERSITY OF NAPLES

Eric J. Beh, UNIVERSITY OF NEWCASTLE

Pieter M. Kroonenberg, LEIDEN UNIVERSITY

The core of the paper consists of the treatment of two special decompositions for correspondence analysis of two-way ordered contingency tables: the bivariate moment decomposition and the hybrid decomposition, both using orthogonal polynomials rather than the commonly used singular vectors. To this end, we will detail and explain the basic characteristics of a particular set of orthogonal polynomials, called Emerson polynomials. It is shown that such polynomials, when used as bases for the row and/or column spaces, can enhance the interpretation via linear, quadratic and higher-order moments of the ordered categories. To aid such interpretations, we propose a new type of graphical display, the polynomial biplot.

Key words: symmetric and non-symmetric correspondence analysis, generalized singular value decomposition, bivariate moment decomposition, hybrid decomposition, polynomial biplots.

1. Introduction

For categorical variables, various measures and modelling strategies are now available to the analyst. One may consult, for example, Agresti (1996, 2010) for a description of many of these. Similarly, graphing the association structure between categorical variables has received just as much attention; see, for example, Beh (1997, 1998), Beh and Lombardo (2014), Gifi (1990), Greenacre (1984, 2007), Lebart, Morineau and Warwick (1984), Lombardo, Beh and D’Ambra (2007), Lombardo and Meulman (2010), Meulman, Van der Kooij and Heiser (2004), Nishisato (1980, 2007) and Ter Braak (1986). The use of external information (concomitant variables) in the form of constraints has resulted in simplified multidimensional displays in correspondence analysis; see Böckenholt and Böckenholt (1990), Takane, Yanai and Mayekawa (1991), and Takane and Jung (2009). A particularly simple application of constrained correspondence analysis is due to Böckenholt and Takane (1990), in which the row and column scores follow a linear order.

A primary goal in analysing contingency tables is to quantify each category such that the spacing between the quantifications is optimal for modelling the association and for portraying the association between the categorical variables in a graphical display. The Gifi (1990) system contains such a quantification strategy for homogeneity analysis and for multiple correspondence analysis of multi-way contingency tables. In this paper, we will discuss and describe the benefits of using orthogonal polynomials when modelling the association in two-way tables with one or two ordered variables.

(Correspondence should be made to Rosaria Lombardo, Economics Department, Second University of Naples, Corso Gran Priorato di Malta, 81043 Capua, CE, Italy. Email: [email protected])

© 2015 The Psychometric Society


These polynomials will replace the commonly used singular vectors from a generalized singular value decomposition (GSVD) as the orthogonal bases for the column space, the row space, or both. It will be shown that orthogonal polynomials can model the association between the categorical variables, both in standard (symmetric) correspondence analysis and in non-symmetric correspondence analysis. In particular, we will use Emerson’s (1968) orthogonal polynomials for this purpose. These polynomials have been used earlier in the literature to quantify ordered variables; see Beh (1997, 2001), Best and Rayner (1996), D’Ambra, Lombardo and Amenta (2002), Corbellini, Riani and Donatini (2008), Lombardo et al. (2007), Lombardo and Meulman (2010), Lombardo and Beh (2010) and Manté, Bernard, Bonhomme and Nerini (2013). To enhance the interpretation of the orthogonal polynomials, we will also propose a new graphical display, the polynomial biplot.

Our approach to analysing the association between the ordinal variables of a contingency table may be considered similar to the linearly constrained approach to correspondence analysis of Böckenholt and Takane (1990). This is especially so when the matrix of external variables is given as the polynomial basis generated using Emerson’s polynomials. However, the novelty of our approach is that, by considering these polynomials, we obtain greater insight into the interpretation of the resulting decompositions and biplots.

Two types of correspondence analysis will be considered within a unified framework: symmetric correspondence analysis (both variables fulfil the same function; they are interdependent) and non-symmetric correspondence analysis (one variable is a response variable and the other a predictor variable). Both types can have zero, one or two ordered variables, but we restrict ourselves here to those cases where at least one of the variables has ordered categories, and such ordered variables will always be modelled by orthogonal polynomials. When one of the variables has ordered categories and the other has not, we will consider the hybrid decomposition (HD) (Beh, 2001; Lombardo, Beh & D’Ambra, 2011), and when both have ordered categories, we will use the bivariate moment decomposition (BMD) (Beh, 1997; Lombardo et al., 2007).

The paper is organized as follows. After describing our notation in Section 2, Section 3 provides a brief description of symmetric and non-symmetric correspondence analysis in a unified fashion. Section 4 discusses a general framework for ordered symmetric and non-symmetric correspondence analysis using Emerson polynomials for the ordered variables. Section 5 provides the expressions for the coordinates and distances for the ordered variants of correspondence analysis. In Section 6, the different proposals are illustrated with two examples. The first example concerns two ordered, symmetrically related variables, and the second example concerns a nominal and an ordered variable in an asymmetric context.

2. Notation

Consider a two-way contingency table $\mathbf{N}$ of dimension $I \times J$ that cross-classifies a sample of $n$ individuals/units according to two variables, $Y$ and $X$, consisting of $I$ row and $J$ column categories, spanning the real spaces $\mathbb{R}^I$ and $\mathbb{R}^J$, respectively. Denote the matrix of joint relative frequencies by $\mathbf{P} = (p_{ij})$, so that $\sum_{i=1}^{I}\sum_{j=1}^{J} p_{ij} = 1$. Let $\mathbf{I}$ be the diagonal identity matrix, whose general element is one. Let $\mathbf{D}_I$ be the $I \times I$ diagonal matrix whose $(i, i)$th element is $p_{i\bullet}$. Similarly, denote by $\mathbf{D}_J$ the $J \times J$ diagonal matrix whose $(j, j)$th element is $p_{\bullet j}$. The conditional proportion that the individuals/units are classified into row $i$ given that they belong to column $j$ is $p_{ij}/p_{\bullet j}$; this is referred to as the $j$th column profile. Furthermore, let the elements of the $I \times J$ matrix $\boldsymbol{\Pi}$ be

$$\pi_{ij} = \frac{p_{ij}}{p_{\bullet j}} - p_{i\bullet}. \qquad (1)$$


Then $\boldsymbol{\Pi}$ is the matrix of differences between the conditional proportion $p_{ij}/p_{\bullet j}$ (or the $i$th element of the $j$th column profile) and the unconditional marginal proportion $p_{i\bullet}$ (the $i$th row marginal proportion). Hence, $\boldsymbol{\Pi}$ is referred to as the centred column profile matrix.

3. Symmetric and Non-symmetric Correspondence Analysis

There are many variants of correspondence analysis (CA) based on a family of statistics derived from Cressie and Read’s (1984) divergence statistic. One may refer to, for example, Beh and Lombardo (2012, 2014), Gifi (1990), Greenacre (1984) and Nishisato (2007, Chapter 3) for a discussion of these variants. The most common measure of symmetric association in a two-way table is Pearson’s chi-squared statistic

$$X^2 = n \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{\left(p_{ij} - p_{i\bullet}\, p_{\bullet j}\right)^2}{p_{i\bullet}\, p_{\bullet j}}.$$

The statistic may also be expressed in terms of the weighted sum-of-squares of the centred column profiles $\pi_{ij}$. In correspondence analysis, $X^2/n$ is commonly referred to as the (total) inertia, where

$$X^2 = n \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{p_{\bullet j}}{p_{i\bullet}} \left( \frac{p_{ij}}{p_{\bullet j}} - p_{i\bullet} \right)^2 = n \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{p_{\bullet j}}{p_{i\bullet}}\, \pi_{ij}^2. \qquad (2)$$
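To make Equations (1) and (2) concrete, the following is a minimal numerical sketch (not from the original paper; the counts and variable names are ours):

```python
import numpy as np

# Toy I x J contingency table; any table of counts works here.
N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 25, 20]])
n = N.sum()
P = N / n                       # joint relative frequencies p_ij
p_i = P.sum(axis=1)             # row margins p_i.
p_j = P.sum(axis=0)             # column margins p_.j

# Centred column profiles, Equation (1): pi_ij = p_ij / p_.j - p_i.
Pi = P / p_j - p_i[:, None]

# Pearson chi-squared via Equation (2): X^2 = n * sum (p_.j / p_i.) * pi_ij^2
X2 = n * np.sum((p_j[None, :] / p_i[:, None]) * Pi**2)

# Cross-check against the classical definition of X^2.
X2_check = n * np.sum((P - np.outer(p_i, p_j))**2 / np.outer(p_i, p_j))
assert np.isclose(X2, X2_check)
```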

When $\mathbf{N}$ consists of the cross-classification of a column predictor variable and a row response variable, the dependence relationship (asymmetric association) should not be assessed using Pearson’s chi-squared statistic, but by the Goodman–Kruskal tau index (Goodman & Kruskal, 1954),

$$\tau = \frac{\tau_{\mathrm{num}}}{\tau_{\mathrm{den}}} = \frac{\displaystyle\sum_{i=1}^{I}\sum_{j=1}^{J} p_{\bullet j}\left( \frac{p_{ij}}{p_{\bullet j}} - p_{i\bullet} \right)^{2}}{\displaystyle 1 - \sum_{i=1}^{I} p_{i\bullet}^{2}} = \frac{\displaystyle\sum_{i=1}^{I}\sum_{j=1}^{J} p_{\bullet j}\, \pi_{ij}^{2}}{\displaystyle 1 - \sum_{i=1}^{I} p_{i\bullet}^{2}} = \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{p_{\bullet j}}{\tau_{\mathrm{den}}}\, \pi_{ij}^{2}. \qquad (3)$$

Here, the numerator of $\tau$, $\tau_{\mathrm{num}}$, is the overall measure of the increase in predictability of the rows given the columns. The denominator, $\tau_{\mathrm{den}}$, measures the overall error in prediction and does not depend on the predictor categories (D’Ambra & Lauro, 1989; Lauro & D’Ambra, 1984). When all distributions of the predictor variable are identical to the overall marginal distribution, there is no relative increase in predictability, i.e. in this case $\tau$ is zero (Agresti, 1996; Kroonenberg & Lombardo, 1999). The statistical significance of this index can be assessed by comparing $C = (n-1)(I-1)\tau$ against $\chi^2_\alpha$, which is the $1-\alpha$ percentile of a chi-squared distribution with $(I-1)(J-1)$ degrees of freedom (see Light & Margolin, 1971).

The key mathematical difference between Pearson’s chi-squared statistic (Equation (2)) and the tau index numerator (Equation (3)) lies in the different metrics used in $\mathbb{R}^I$, while in $\mathbb{R}^J$ the metric is the same. To express this difference more explicitly, let $\mathbf{V}_I$ and $\mathbf{W}_J$ be the diagonal weight (metric) matrices in the spaces $\mathbb{R}^I$ and $\mathbb{R}^J$, respectively. In the case of correspondence analysis, $\mathbf{V}_I = \mathbf{D}_I^{-1}$ and $\mathbf{W}_J = \mathbf{D}_J$, i.e. the weights are embodied in the term $p_{\bullet j}/p_{i\bullet}$. For non-symmetric correspondence analysis, the weight matrices are $\mathbf{V}_I = \mathbf{I}$ and $\mathbf{W}_J = \mathbf{D}_J$, i.e. the weights are embodied in the term $p_{\bullet j}$. If we define the weighted centred column profile matrix as $\tilde{\boldsymbol{\Pi}} = \mathbf{V}_I^{1/2} \boldsymbol{\Pi} \mathbf{W}_J^{1/2}$, then we obtain a generalized formulation valid for both correspondence analysis and non-symmetric correspondence analysis. This formulation shows that a unifying link can be established between the total inertia for the symmetric association, $X^2/n$, and the asymmetric association, $\tau_{\mathrm{num}}$. The two cases differ only in the weights used in the scaling of the squared centred column profiles, $\pi_{ij}^2$.
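A minimal sketch of this unified formulation follows (the helper name weighted_profiles and the toy counts are ours; the CA and NSCA weights are set exactly as just described):

```python
import numpy as np

N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 25, 20]])
P = N / N.sum()
p_i, p_j = P.sum(axis=1), P.sum(axis=0)
Pi = P / p_j - p_i[:, None]                 # Equation (1)

def weighted_profiles(Pi, v, w):
    """Pi_tilde = V_I^{1/2} Pi W_J^{1/2} with diagonal weights v, w."""
    return np.sqrt(v)[:, None] * Pi * np.sqrt(w)[None, :]

Pi_ca   = weighted_profiles(Pi, 1 / p_i, p_j)             # CA:   V_I = D_I^{-1}, W_J = D_J
Pi_nsca = weighted_profiles(Pi, np.ones(p_i.size), p_j)   # NSCA: V_I = I,        W_J = D_J

# The plain sum of squares of Pi_tilde is the total inertia in both cases.
inertia_ca   = np.sum(Pi_ca**2)     # equals X^2 / n
inertia_nsca = np.sum(Pi_nsca**2)   # equals tau_num, the numerator of Equation (3)

X2_over_n = np.sum((P - np.outer(p_i, p_j))**2 / np.outer(p_i, p_j))
tau_num   = np.sum(p_j[None, :] * Pi**2)
assert np.isclose(inertia_ca, X2_over_n) and np.isclose(inertia_nsca, tau_num)
```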

3.1. A General Framework for CA and NSCA: The Row and Column Spaces

In order to provide a numerical or visual summary of the symmetric or asymmetric association between the column/row (predictor/response) variables, we apply the generalized singular value decomposition (GSVD) to decompose the weighted centred column profile matrix $\tilde{\boldsymbol{\Pi}}$, that is,

$$\mathrm{GSVD}\left(\tilde{\boldsymbol{\Pi}}\right) = \mathbf{A} \boldsymbol{\Lambda} \mathbf{B}', \qquad (4)$$

where $\boldsymbol{\Lambda} = \mathbf{A}' \tilde{\boldsymbol{\Pi}} \mathbf{B}$ is the generalized singular value matrix, and $\mathbf{A} = (a_{im})$ and $\mathbf{B} = (b_{jm})$ are the $I \times M$ and $J \times M$ generalized singular vector matrices in the spaces $\mathbb{R}^I$ and $\mathbb{R}^J$, respectively, with $M = \min(I, J)$. As in Greenacre’s terminology, the singular values and vectors are generalized because the metrics imposed in both row and column spaces are no longer simple Euclidean, but generalized (or weighted) Euclidean metrics, incorporated in the matrix $\tilde{\boldsymbol{\Pi}}$; see Greenacre (1984, p. 344). Furthermore, $\boldsymbol{\Lambda}$ indicates the strength of the correlation between the right and left generalized singular vectors, which are centred with unit norm. This matrix is diagonal and implies that the generalized correlation between the vectors $\mathbf{a}_m$ and $\mathbf{b}_{m'}$ for $m \neq m'$ (with $m, m' = 1, \ldots, M$) is zero. As a result, the generalized singular vectors are referred to as biorthogonal (see Dieudonné, 1953). With these definitions, we can rewrite the total inertia defined in Equation (2) as

$$\mathrm{Inertia}(\boldsymbol{\Pi}) = \sum_{i=1}^{I}\sum_{j=1}^{J} v_{i\bullet}\, w_{\bullet j} \left( \frac{p_{ij}}{p_{\bullet j}} - p_{i\bullet} \right)^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} v_{i\bullet}\, w_{\bullet j}\, \pi_{ij}^2 = \sum_{i=1}^{I}\sum_{j=1}^{J} \tilde{\pi}_{ij}^2. \qquad (5)$$

From Equation (4), it follows that $X^2/n$ or $\tau_{\mathrm{num}}$ (i.e. the total inertia of $\boldsymbol{\Pi}$) can be expressed as the sum of the squared singular values,

$$\mathrm{Inertia}(\boldsymbol{\Pi}) = \mathrm{trace}\left(\boldsymbol{\Lambda}^2\right) = ||\boldsymbol{\Pi}||^2_{\mathbf{V}_I;\, \mathbf{W}_J} = ||\tilde{\boldsymbol{\Pi}}||^2, \qquad (6)$$

where $||\boldsymbol{\Pi}||^2_{\mathbf{V}_I;\, \mathbf{W}_J}$ indicates the norm of $\boldsymbol{\Pi}$ with respect to the weighted metrics as defined in Section 3, and $||\tilde{\boldsymbol{\Pi}}||^2$ is the norm of $\tilde{\boldsymbol{\Pi}}$ with respect to the unitary weighted metrics.

Note that, due to the different weight matrices for CA and NSCA, the actual values for the inertia of $\boldsymbol{\Pi}$, i.e. $X^2/n$ and $\tau_{\mathrm{num}}$, are not the same. However, to avoid cumbersome notation and language, we will not explicitly refer to this in the following sections. In addition, due to the different weights, the interpretation of the deviation from independence for regular correspondence analysis is in terms of association, while in the case of non-symmetric correspondence analysis it is in terms of increase in predictability (see, e.g., Kroonenberg & Lombardo, 1999). However, for simplicity, we will often use only one of the two terms.
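Since the GSVD under diagonal metrics can be obtained from an ordinary SVD of the weighted matrix, the identity in Equation (6) can be checked numerically. A sketch on a toy table, assuming this standard SVD-based computation (our own code, not the authors'):

```python
import numpy as np

N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 25, 20]])
P = N / N.sum()
p_i, p_j = P.sum(axis=1), P.sum(axis=0)
Pi = P / p_j - p_i[:, None]
v, w = 1 / p_i, p_j                              # CA metrics V_I, W_J

# GSVD via an ordinary SVD of the weighted matrix Pi_tilde, Equation (4).
Pi_t = np.sqrt(v)[:, None] * Pi * np.sqrt(w)[None, :]
U, lam, Vt = np.linalg.svd(Pi_t, full_matrices=False)
A = U / np.sqrt(v)[:, None]                      # satisfies A' V_I A = I
B = Vt.T / np.sqrt(w)[:, None]                   # satisfies B' W_J B = I

# Pi = A Lambda B', and Equation (6): inertia = trace(Lambda^2).
assert np.allclose(Pi, A @ np.diag(lam) @ B.T)
assert np.isclose(np.sum(Pi_t**2), np.sum(lam**2))
```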


4. An Ordinal Framework for the Row and Column Spaces

When the two variables of $\mathbf{N}$ consist of ordered categories, it is important to preserve their ordered structure in a decomposition, since this order provides valuable insight into the association between them. For a correspondence analysis of $\mathbf{N}$, numerous techniques have been proposed to force the values of the singular vectors $\mathbf{a}_m$ and/or $\mathbf{b}_m$ to be ordered. However, this puts unnecessary restrictions on the association structure. Other proposals consist of carrying out further computations on the scores obtained from the singular value decomposition and of constraining the graphical representation; see, for example, Böckenholt and Takane (1990), Nair (1986), Nishisato and Arri (1975) and Nishisato (1980; 2007, Chaps. 2 and 11). These authors proposed a variety of ways in which a correspondence analysis can be performed for contingency tables with ordered categories.

4.1. Orthogonal Polynomials

Emerson (1968) discussed in detail how orthogonal polynomials can be generated using general recurrence formulae; these polynomials turn out to be especially well suited for modelling ordinal categorical data with a priori chosen, although not necessarily equally spaced, sets of ordered scores. Emerson's highly efficient short-cut algorithm was explicitly presented to compute orthogonal polynomials for ordered categorical variables. The Emerson recurrence formulae also provide a computationally simple and efficient means of orthogonalisation that is akin to the Gram–Schmidt orthogonalisation algorithm; see, for instance, Gifi (1990, p. 167). The Emerson formulae were actually proposed earlier by Robson (1959) (see Hudson, 1969).

4.2. How to Compute the Orthogonal Polynomials: The Recurrence Formulae

Let $\beta_{jv}$ be the $j$th element of the column orthogonal polynomial of order $v$, i.e. the $(j, v)$th element of the matrix $\mathbf{B}$. Consider consecutive integer values starting from unity as a priori chosen natural scores for an ordinal variable. Typically, the sets of integers for the two variables are indicated by $i = 1, 2, \ldots, I$ for the row categories and $j = 1, 2, \ldots, J$ for the column categories. Note that the choice of scores has an impact on the final set of orthogonal polynomials generated. Therefore, when using orthogonal polynomials to perform correspondence analysis, the choice of scores also has an impact on the configuration of points (Beh, 1998). The following exposition is in terms of the column categories, but it is essentially the same for the row categories.

At the heart of the computation of the $v$th-order orthogonal polynomial, for $v = 0, 1, 2, \ldots, J-1$, lie Emerson's general recurrence formulae, which consist of three simple terms, $M_v$, $V_v$ and $S_v$. It will become evident from the computation of a polynomial of any degree that the first term, $M_v$, represents a general mean, the second term a general covariance between two consecutive polynomials, and the third term the inverse of a general standard deviation of the ordered variable. The generic element, $\beta_{jv}$, of the orthonormal polynomial vector $\boldsymbol{\beta}_v$ is computed by

$$\beta_{jv} = S_v\left[ (j - M_v)\, \beta_{j,v-1} - V_v\, \beta_{j,v-2} \right], \qquad (7)$$

where $\beta_{jv} \equiv \beta_{j,v}$. One may consider that this element is obtained by subtracting the weighted covariance structure from the weighted centred score, $(j - M_v)\beta_{j,v-1}$, and then normalizing. To start the recursive process, we set, for $j = 1, 2, \ldots, J$, the elements $\beta_{j,-1} = 0$ (for $v = -1$) and $\beta_{j,0} = 1$ (for $v = 0$). This latter case is the "trivial" orthogonal polynomial and is akin to the vector of ones that, depending on the scaling of the matrices involved,


often arises from performing an SVD in classical correspondence analysis. Each of the three terms in Equation (7) is calculated such that

$$M_v = \sum_{j=1}^{J} p_{\bullet j}\, j\, \beta_{j,v-1}^2, \qquad (8)$$

$$V_v = \sum_{j=1}^{J} p_{\bullet j}\, j\, \beta_{j,v-1}\, \beta_{j,v-2}, \qquad (9)$$

$$S_v = \left( \sum_{j=1}^{J} p_{\bullet j}\, j^2\, \beta_{j,v-1}^2 - M_v^2 - V_v^2 \right)^{-1/2}. \qquad (10)$$

For the first-degree polynomial ($v = 1$), Equation (8) leads to the weighted mean of the scores of a variable, i.e. $M_1 = \sum_{j=1}^{J} p_{\bullet j}\, j = \mu_J$, since $\beta_{j,0} = 1$. Furthermore, Equation (9) leads to $V_1 = 0$ since $\beta_{j,-1} = 0$. By substituting $M_1 = \mu_J$ and $V_1 = 0$ into Equation (10), $S_1$ becomes

$$S_1 = \left[ \sum_{j} p_{\bullet j}\, j^2 - \mu_J^2 \right]^{-1/2} = \sigma_J^{-1},$$

where $\sigma_J$ is the standard deviation of the weighted ordered categories in the set of original scores and allows for the normalization of the scores. Therefore, the general element of the first non-trivial column polynomial is equivalent to the standardization of the $j$th score, such that

$$\beta_{j1} = \breve{z}_j = \frac{j - \mu_J}{\sigma_J}. \qquad (11)$$

For the second-degree polynomial ($v = 2$), the term in (8) becomes

$$M_2 = \sum_{j} p_{\bullet j}\, j\, \breve{z}_j^2.$$

Similarly, Equation (9) leads to

$$V_2 = \sum_{j} p_{\bullet j}\, j\, \breve{z}_j,$$

since $\beta_{j,1} = \breve{z}_j$ and $\beta_{j,0} = 1$. Therefore, for $v = 2$, Equation (10) becomes

$$S_2 = \left\{ \sum_{j} p_{\bullet j}\, j^2\, \breve{z}_j^2 - M_2^2 - V_2^2 \right\}^{-1/2},$$

so that the generic element of the second-degree polynomial becomes

$$\beta_{j2} = \frac{\left( j - \sum_{j=1}^{J} j\, p_{\bullet j}\, \breve{z}_j^2 \right) \breve{z}_j - \sum_{j=1}^{J} p_{\bullet j}\, j\, \breve{z}_j}{\left( \sum_{j=1}^{J} p_{\bullet j}\, j^2\, \breve{z}_j^2 - M_2^2 - V_2^2 \right)^{1/2}}. \qquad (12)$$
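The recurrence of Equations (7)-(10) is straightforward to implement. Below is a minimal sketch (the function name emerson_polynomials is ours, not from the paper); it also checks the orthonormality property of Section 4.3 and the standardized-score form of Equation (11):

```python
import numpy as np

def emerson_polynomials(p, scores=None):
    """Emerson's (1968) recurrence, Equations (7)-(10).

    p      : marginal proportions p_.j (1-D array summing to one).
    scores : a priori ordered scores; natural scores 1..J by default.
    Returns the J x (J-1) matrix B of non-trivial orthonormal
    polynomials, so that B' D_J B = I.
    """
    p = np.asarray(p, dtype=float)
    J = p.size
    s = np.arange(1, J + 1, dtype=float) if scores is None \
        else np.asarray(scores, dtype=float)
    B = np.zeros((J, J - 1))
    b_prev, b_curr = np.zeros(J), np.ones(J)   # beta_{j,-1} = 0, beta_{j,0} = 1
    for v in range(1, J):
        M = np.sum(p * s * b_curr**2)                              # Eq. (8)
        V = np.sum(p * s * b_curr * b_prev)                        # Eq. (9)
        S = (np.sum(p * s**2 * b_curr**2) - M**2 - V**2) ** -0.5   # Eq. (10)
        b_next = S * ((s - M) * b_curr - V * b_prev)               # Eq. (7)
        B[:, v - 1] = b_next
        b_prev, b_curr = b_curr, b_next
    return B

# The polynomials are orthonormal in the weighted space (Section 4.3),
# and the first one reproduces the standardized scores of Equation (11).
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
B = emerson_polynomials(p)
assert np.allclose(B.T @ np.diag(p) @ B, np.eye(4))
s = np.arange(1, 6, dtype=float)
mu = p @ s
sigma = np.sqrt(p @ s**2 - mu**2)
assert np.allclose(B[:, 0], (s - mu) / sigma)
```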


From Equation (11), we see that the first-degree polynomial is linear with respect to the set of ordered standardized scores. The second-degree polynomial (Equation (12)) is quadratic in the standardized scores, and so on. In terms of interpretation, any quantity that involves the polynomial $\boldsymbol{\beta}_1$ describes the variation of the categories in terms of differences from their mean; the scores related to this linear (first-degree) polynomial are always ordered. Similarly, when $v = 2$, the polynomial $\boldsymbol{\beta}_2$ describes the variation of the categories in terms of their differences in dispersion. Variations in higher-order moments can be assessed by considering those column orthogonal polynomials for which $v > 2$. Variations in terms of mean, dispersion and higher-order moments can be derived in a similar fashion for the row categories by considering the related polynomial matrix $\mathbf{A}$ with general elements $\alpha_{iu}$, for orders $u = 1$ to $u = I-1$. With these polynomials, we code each of the two ordered variables separately. To evaluate the association between the variables using optimal scores, one may perform the bivariate moment decomposition or the hybrid decomposition, depending on whether two variables or only one variable has ordered scores.

4.3. The Orthonormality Constraints of the Polynomials

All polynomials are normalized and orthogonal to their lower-order polynomials in the weighted space. In matrix notation, the row and column polynomials have the characteristics $\mathbf{A}' \mathbf{D}_I \mathbf{A} = \mathbf{I}$ and $\mathbf{B}' \mathbf{D}_J \mathbf{B} = \mathbf{I}$, respectively. However, unlike the generalized singular vectors, they are not biorthogonal, since the generalized correlation between the polynomials $\boldsymbol{\alpha}_u$ and $\boldsymbol{\beta}_v$ for $u \neq v$ (with $u = 1, \ldots, I-1$ and $v = 1, \ldots, J-1$) need not be zero. As in Section 3.1, a unifying link can be established between the total inertia of ordered CA and the total inertia of ordered NSCA. To establish this link, we define $\acute{\mathbf{A}} = \mathbf{D}_I^{1/2} \mathbf{A}$ and $\acute{\mathbf{B}} = \mathbf{D}_J^{1/2} \mathbf{B}$, so that $\acute{\mathbf{A}}' \acute{\mathbf{A}} = \mathbf{I}$ and $\acute{\mathbf{B}}' \acute{\mathbf{B}} = \mathbf{I}$.

4.4. One-Way Ordered Analysis and the Hybrid Decomposition

4.4.1. The Decomposition. When only one of the two variables is ordered, say the column variable, rather than considering the GSVD of $\tilde{\boldsymbol{\Pi}}$, one may replace the (generalized) singular vectors of the column space in Equation (4) by the linear combination of the column Emerson polynomials. The bases of the row and column spaces are then no longer biorthogonal, as they are in the GSVD. In this case, the hybrid decomposition (HD) (Beh, 2001; Lombardo et al., 2011) of $\tilde{\boldsymbol{\Pi}}$ takes the form

$$\mathrm{HD}\left(\tilde{\boldsymbol{\Pi}}\right) = \mathbf{A} \mathbf{Z}^{*} \acute{\mathbf{B}}',$$

where $\mathbf{A} = (a_{im})$ is the $I \times M$ generalized singular vector matrix, $\mathbf{Z}^{*} = \mathbf{A}' \tilde{\boldsymbol{\Pi}} \acute{\mathbf{B}}$ is the matrix of generalized correlations, and $\acute{\mathbf{B}} = (p_{\bullet j}^{1/2}\, \beta_{jv})$ is the weighted $J \times (J-1)$ column polynomial matrix. The matrix $\mathbf{Z}^{*}$, of dimension $M \times (J-1)$, need not be square or diagonal, as the singular vectors are not orthogonal to the polynomial vectors. It contains as off-diagonal elements the generalized correlations between the singular vectors and the orthogonal polynomials (see the next section for details).

4.4.2. Inertia in the Hybrid Decomposition. In the hybrid decomposition, not only can the total inertia of the symmetric or asymmetric analysis be partitioned, but so can the eigenvalue associated with each principal axis (generalized singular vector). That is, we can partition


the square of the $m$th largest singular value of classical correspondence analysis into sources of inertia (Beh, 2001; Lombardo et al., 2011) such that

$$\lambda_m^2 = \sum_{v=1}^{J-1} z_{mv}^{*2}.$$

For example, the square of the first singular value ($m = 1$) can be partitioned into linear, quadratic and higher-order sources of inertia, such that $\lambda_1^2 = z_{11}^{*2} + z_{12}^{*2} + \cdots + z_{1,J-1}^{*2}$. By comparing the terms in this equation, it is easy to see which polynomial contributes most to each principal axis.

For symmetric or non-symmetric correspondence analysis, the total inertia of the points in the space $\mathbb{R}^I$ (for the ordered column profiles) and $\mathbb{R}^J$ (for the unordered row profiles) is given by

$$\mathrm{Inertia}(\boldsymbol{\Pi}) = \mathrm{trace}\left(\mathbf{Z}^{*\prime} \mathbf{Z}^{*}\right) = \mathrm{trace}\left(\mathbf{Z}^{*} \mathbf{Z}^{*\prime}\right) = \mathrm{trace}\left(\boldsymbol{\Lambda}^2\right).$$

From $\mathrm{trace}(\mathbf{Z}^{*\prime} \mathbf{Z}^{*})$, we obtain $J-1$ inertia values associated with the $J-1$ column polynomials, and from $\mathrm{trace}(\mathbf{Z}^{*} \mathbf{Z}^{*\prime})$, we get $M$ inertia values related to the $M$ generalized singular vectors. In particular, by calculating $\sum_{m=1}^{M} z_{m1}^{*2} = z_{\bullet 1}^{*2}$, we get the inertia linked to the first polynomial axis. Similarly, by computing $\sum_{v=1}^{J-1} z_{1v}^{*2} = z_{1\bullet}^{*2} = \lambda_1^2$, we get the inertia of the first principal axis.
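The following sketch illustrates this partition on a toy table, reusing the emerson_polynomials function from the sketch in Section 4.2. Here the left factor is taken as the orthonormalized singular vectors of $\tilde{\boldsymbol{\Pi}}$ (i.e. $\mathbf{V}_I^{1/2}\mathbf{A}$), an assumption under which the partition of $\lambda_m^2$ holds numerically:

```python
import numpy as np

# Toy one-way ordered table: ordered columns, nominal rows (our own counts).
N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 25, 20]])
P = N / N.sum()
p_i, p_j = P.sum(axis=1), P.sum(axis=0)
Pi = P / p_j - p_i[:, None]
Pi_t = (1 / np.sqrt(p_i))[:, None] * Pi * np.sqrt(p_j)[None, :]  # CA metrics

# Orthonormal left singular vectors of Pi_tilde and the weighted
# column polynomials B-acute = D_J^{1/2} B.
U, lam, _ = np.linalg.svd(Pi_t, full_matrices=False)
B_acc = np.sqrt(p_j)[:, None] * emerson_polynomials(p_j)

# Hybrid-decomposition correlations and the partition of each eigenvalue:
# lambda_m^2 = sum_v z*_mv^2.
Z_star = U.T @ Pi_t @ B_acc
assert np.allclose(lam**2, np.sum(Z_star**2, axis=1))
```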

4.5. Two-Way Ordered Analysis and the Bivariate Moment Decomposition

4.5.1. The Decomposition. When both variables are ordered, one can replace both sets of generalized singular vectors in Equation (4) by linear combinations of orthogonal polynomials. The resulting bivariate moment decomposition (BMD) of $\tilde{\boldsymbol{\Pi}}$ has the form

$$\mathrm{BMD}\left(\tilde{\boldsymbol{\Pi}}\right) = \acute{\mathbf{A}} \mathbf{Z} \acute{\mathbf{B}}',$$

where $\mathbf{Z} = \acute{\mathbf{A}}' \tilde{\boldsymbol{\Pi}} \acute{\mathbf{B}}$ is the generalized correlation matrix between the row and column polynomials. Thus, each element $z_{uv}$ of $\mathbf{Z}$ is

$$z_{uv} = \sum_{i=1}^{I}\sum_{j=1}^{J} \tilde{\pi}_{ij}\, \acute{\alpha}_{iu}\, \acute{\beta}_{jv}. \qquad (13)$$

Clearly, the generalized correlation matrix $\mathbf{Z}$ is not diagonal (see also Rayner & Beh, 2009), as all $z_{uv}$ are defined. The non-diagonality is due to the fact that the bases of the column and row spaces are no longer biorthogonal. From Equation (11), we can see that the generalized correlation of the first two linear polynomials, $z_{11}$, is equal to

$$z_{11} = \sum_{i=1}^{I}\sum_{j=1}^{J} p_{ij} \left( \frac{i - \mu_I}{\sigma_I} \right) \left( \frac{j - \mu_J}{\sigma_J} \right).$$


Thus, $z_{11}$ is the weight of the linear-by-linear association between the two sets of ordered categories, since $\alpha_{i1}$ and $\beta_{j1}$ represent the standardized coefficients of the linear polynomials for the $i$th row and the $j$th column, respectively. Similarly, a high value of, for instance, $z_{12}$ indicates that the linear trend of the rows ($u = 1$) is highly associated with the quadratic trend of the columns ($v = 2$). Best and Rayner (1996) show that $z_{11}$ is equal to the Pearson product-moment correlation for natural scores in the symmetric case and that it is equal to the Spearman rank correlation coefficient if the category scores are midranks.

4.5.2. Testing Total Inertia. Based on the definitions of the BMD, the total inertia of the points in the spaces $\mathbb{R}^I$ and $\mathbb{R}^J$ can be expressed in terms of the squared elements of the generalized correlation matrix,

$$\mathrm{Inertia}(\boldsymbol{\Pi}) = \sum_{u=1}^{I-1}\sum_{v=1}^{J-1} z_{uv}^2. \qquad (14)$$

Thus, in matrix form, the total inertia may be written as

$$\mathrm{Inertia}(\boldsymbol{\Pi}) = \mathrm{trace}\left(\mathbf{Z}'\mathbf{Z}\right) = \mathrm{trace}\left(\mathbf{Z}\mathbf{Z}'\right) = \mathrm{trace}\left(\boldsymbol{\Lambda}^2\right). \qquad (15)$$

From $\mathrm{trace}(\mathbf{Z}'\mathbf{Z})$, we obtain the $J-1$ inertia values associated with the $J-1$ column polynomials. Similarly, from $\mathrm{trace}(\mathbf{Z}\mathbf{Z}')$, we get the $I-1$ inertia values associated with the $I-1$ row polynomials. For example, the fit or inertia of the linear component for the row variable is equal to $\sum_{u=1}^{I-1} z_{u1}^2 = z_{\bullet 1}^2$, and the fit or inertia of the linear component for the column variable is equal to $\sum_{v=1}^{J-1} z_{1v}^2 = z_{1\bullet}^2$. Thus, from the matrix of generalized correlations, we can obtain the inertia of each polynomial axis by adding the sum-of-squares of $z_{uv}$ over either $u$ or $v$. As a consequence, the symmetric and asymmetric inertias ($X^2/n$ and $\tau_{\mathrm{num}}$) can be partitioned into sources of inertia associated with both row and column polynomials.

The statistical significance of the column (or row) sources of inertia in the symmetric case is tested by comparing $n \times z_{\bullet v}^2$ (or $n \times z_{u\bullet}^2$) against $\chi^2_\alpha$ with $I-1$ (or $J-1$) degrees of freedom; see Beh (1997). In the asymmetric case, the row (or column) sources of inertia are tested by comparing the statistic $\frac{(n-1)(I-1)}{\tau_{\mathrm{den}}} \times z_{u\bullet}^2$ (or $\frac{(n-1)(I-1)}{\tau_{\mathrm{den}}} \times z_{\bullet v}^2$) against $\chi^2_\alpha$ with $J-1$ (or $I-1$) degrees of freedom; see Lombardo et al. (2007) and D’Ambra, Beh and Amenta (2005).
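A sketch of the BMD computations (again on our own toy counts, reusing emerson_polynomials from Section 4.2) that verifies Equation (13), the Best–Rayner identity for $z_{11}$, the inertia partition of Equations (14)-(15), and the chi-squared screening of the column components:

```python
import numpy as np
from scipy.stats import chi2

# Toy two-way ordered table; both margins get Emerson polynomials.
N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 25, 20]])
n = N.sum()
P = N / n
p_i, p_j = P.sum(axis=1), P.sum(axis=0)
Pi = P / p_j - p_i[:, None]
Pi_t = (1 / np.sqrt(p_i))[:, None] * Pi * np.sqrt(p_j)[None, :]  # CA metrics

A = emerson_polynomials(p_i)      # row polynomials alpha_iu
B = emerson_polynomials(p_j)      # column polynomials beta_jv
Z = (np.sqrt(p_i)[:, None] * A).T @ Pi_t @ (np.sqrt(p_j)[:, None] * B)  # Eq. (13)

# z_11 equals the Pearson correlation of the natural scores (Best & Rayner).
si = np.arange(1, N.shape[0] + 1, dtype=float)
sj = np.arange(1, N.shape[1] + 1, dtype=float)
zi = (si - p_i @ si) / np.sqrt(p_i @ si**2 - (p_i @ si)**2)
zj = (sj - p_j @ sj) / np.sqrt(p_j @ sj**2 - (p_j @ sj)**2)
assert np.isclose(Z[0, 0], np.sum(P * np.outer(zi, zj)))

# Equations (14)-(15): the total inertia is the sum of all squared z_uv.
assert np.isclose(np.sum(Z**2), np.sum(Pi_t**2))

# Symmetric case: test each column component n * z_.v^2 on I-1 df.
col_components = n * np.sum(Z**2, axis=0)
p_values = chi2.sf(col_components, df=N.shape[0] - 1)
```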

5. Coordinates and Distances

In general, graphical displays allow the researcher to identify row and/or column categories that are relatively similar or different based on their proximities to one another. One of the primary reasons for considering correspondence analysis when investigating symmetric or asymmetric associations between categorical variables is that graphical summaries of the data can be made; the same remains true when orthogonal polynomials are used. For Emerson polynomials, graphical displays need to be interpreted differently from those based on the (generalized) singular vectors, because the polynomial axes are not biorthogonal and, as polynomials, they have intrinsic meaning. The axes allow the analyst to represent the variation between categories in terms of linear trends using the first axis and non-linear trends using the higher-order axes. Adopting the terminology used by Greenacre (1984), we will refer to standard polynomial coordinates for coordinates purely based on the polynomial axes themselves. Similarly, we will


refer to principal polynomial coordinates if the orthogonal polynomials are weighted by the generalized correlations. Separate graphs are needed for the rows and columns when they are based on principal polynomial coordinates, because the inertia of the axes in $\mathbb{R}^I$ and $\mathbb{R}^J$ is different. In order to portray row and column categories simultaneously in a graph, we introduce in this section polynomial biplots, which are based on row standard polynomial coordinates and column principal polynomial coordinates, or vice versa.

5.1. Coordinates in One-Way Ordered Correspondence Analysis

The graphical representation derived from a one-way ordered correspondence analysis via the hybrid decomposition needs careful consideration, as one variable is modelled using generalized singular vectors and the other using orthogonal polynomials. In this section, we discuss the construction of polynomial biplots for one-way ordered correspondence analysis.

5.1.1. Column-Metric Preserving Biplots. In polynomial column-metric preserving biplots for contingency tables where the column variable consists of ordered categories, the standard row coordinates are defined as

$$\mathbf{F} = \mathbf{V}_I^{1/2}\mathbf{A} \qquad \left( f_{im} = v_{i\bullet}^{1/2}\, a_{im} \right),$$

and the column principal polynomial coordinates are

$$\mathbf{G} = \mathbf{W}_J^{-1/2}\acute{\mathbf{B}}\mathbf{Z}^{*\prime} = \boldsymbol{\Pi}'\mathbf{V}_I^{1/2}\mathbf{A} \qquad \left( g_{jm} = \sum_{i=1}^{I} v_{i\bullet}^{1/2}\, a_{im}\, \pi_{ij} \right).$$

Both the row and column coordinates depend only on the singular vectors $\mathbf{A}$ of the row variable. This means that the coordinates of the column-isometric biplot for a one-way ordered correspondence analysis are exactly the same as those of the classical technique. Thus, when the ordered variable is the column variable, there is no gain in using column-isometric biplots to portray the features of a one-way ordered analysis. For this reason, we recommend using row-metric preserving biplots.

5.1.2. Row-Metric Preserving Polynomial Biplots. In row-metric preserving biplots for contingency tables where the column variable consists of ordered categories, the row principal polynomial coordinates are defined as

$$\mathbf{F} = \mathbf{A}\mathbf{Z}^{*} = \mathbf{V}_I^{-1/2}\tilde{\boldsymbol{\Pi}}\acute{\mathbf{B}} \qquad \left( f_{iv} = \sum_{m=1}^{M} a_{im}\, z^{*}_{mv} = \sum_{j=1}^{J} w_{\bullet j}\, \beta_{jv}\, \pi_{ij} \right), \qquad (16)$$

and the column standard polynomial coordinates coincide with the column orthogonal polynomials, that is,

$$\mathbf{G} = \mathbf{W}_J^{-1/2}\acute{\mathbf{B}} = \mathbf{B} \qquad \left( g_{jv} = \beta_{jv} \right). \qquad (17)$$

Here, $f_{iv}$ is the profile coordinate of the $i$th row category along the $v$th polynomial axis; it is the $(i, v)$th element of $\mathbf{F}$. In contrast to the column-isometric biplot, it is evident that the row principal polynomial coordinates depend on the column orthogonal polynomials, i.e. they depend on the location and dispersion variation that exists between the categories of the ordered column variable. For example, when a row coordinate on the first axis is very far from the origin, the row category is associated with those ordered column categories whose scores are higher or lower than the mean. In addition, when a row coordinate on the second axis is far from the origin, the corresponding row is associated with those ordered column categories whose variances (scores on the second axis) are large. Thus, the axes, the principal coordinates and their proximity to the origin in ordered correspondence analysis have a distinctive interpretation that the results of the GSVD do not provide.

5.2. Coordinates in Two-Way Ordered Correspondence Analysis

In this section, we discuss for two-way ordered correspondence analysis the computation of polynomial biplots, in which the $z_{uv}$ values (the generalized correlations, see Equation (13)) play a crucial role.

5.2.1. Coordinates of Polynomial Biplots. In polynomial biplots, the row and column categories can be displayed in a single graph using principal and standard polynomial coordinates, since the row and column scores are computed with respect to the same set of orthogonal polynomial axes. In what follows, we only describe the column-metric preserving polynomial biplot, as the row-metric preserving polynomial biplot is identical to the one described in Section 5.1.2.

5.2.2. Column-Metric Preserving Polynomial Biplots. For a column-metric preserving polynomial biplot, the row standard polynomial coordinates are given by the row polynomials, that is,

$$\mathbf{F} = \mathbf{V}_I^{1/2}\acute{\mathbf{A}} = \mathbf{A} \qquad \left( f_{iu} = \alpha_{iu} \right),$$

and the column principal polynomial coordinates are

$$\mathbf{G} = \mathbf{W}_J^{-1/2}\acute{\mathbf{B}}\mathbf{Z}' = \mathbf{B}\mathbf{Z}' = \boldsymbol{\Pi}'\mathbf{A} \qquad \left( g_{ju} = \sum_{i=1}^{I} \pi_{ij}\, \alpha_{iu} \right). \qquad (18)$$

As in row-metric preserving polynomial biplots, both the row and column polynomial coordinates depend on the same set of polynomials; here, the association is interpreted using the row polynomials.
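The coordinate sets of Equations (16)-(18) can be computed directly from their elementwise definitions. A sketch under CA metrics (our own toy counts, reusing emerson_polynomials from Section 4.2); it also checks the distance-to-origin identity given in Section 5.2.4 below:

```python
import numpy as np

N = np.array([[20, 10,  5,  5],
              [10, 25, 10,  5],
              [ 5, 10, 25, 20]])
P = N / N.sum()
p_i, p_j = P.sum(axis=1), P.sum(axis=0)
Pi = P / p_j - p_i[:, None]

A = emerson_polynomials(p_i)    # row polynomials alpha_iu
B = emerson_polynomials(p_j)    # column polynomials beta_jv

# Row-metric preserving biplot, Equations (16)-(17):
F_principal = Pi @ (p_j[:, None] * B)   # f_iv = sum_j w_.j beta_jv pi_ij
G_standard  = B                         # g_jv = beta_jv

# Column-metric preserving biplot, Equation (18):
F_standard  = A                         # f_iu = alpha_iu
G_principal = Pi.T @ A                  # g_ju = sum_i pi_ij alpha_iu

# Section 5.2.4: d_J^2(j, 0) = sum_u g_ju^2 equals the weighted profile
# distance to the origin with v_i. = 1/p_i. under CA metrics.
d2_origin = np.sum(G_principal**2, axis=1)
assert np.allclose(d2_origin, np.sum(Pi**2 / p_i[:, None], axis=0))
```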


5.2.3. Principal Polynomial Coordinates and Inertia. From these results, it is evident that the row principal polynomial coordinates depend on the column orthogonal polynomials, while the column principal polynomial coordinates depend on the row orthogonal polynomials. Thus, the latter depend on the location and dispersion variation that exists between the categories of the ordered row variable. As is the case with the traditional biplot, the inner product of the row and column coordinates in a polynomial biplot allows for the reconstruction of the data profiles. Standard biplots have been shown to be suitable for depicting the measure of the increase in predictability of the rows given the columns in classic NSCA (Kroonenberg & Lombardo, 1999), and the rules for the interpretation of the column-isometric biplot (and conversely for the row-isometric biplot) apply here also. In particular:

– Only distances between column points can be properly interpreted from a polynomial biplot.
– The larger the inner product between row and column categories, the larger the predictive power of that column category for that row.
– The origin represents the row margin; responses with large distances from the origin show a good increase in predictability.

Polynomial biplots provide a straightforward visual interpretation of the symmetric and asymmetric association of one variable given the (linear and quadratic) trends of the other variable. The reason for this is that both the row and column coordinates refer to the same polynomial axes. When using the orthogonal polynomial axes, the total inertia for both $X^2/n$ (CA) and $\tau_{\mathrm{num}}$ (NSCA) can be expressed in terms of the principal polynomial coordinates (see Equation (6)), such that

$$\mathrm{Inertia}(\boldsymbol{\Pi}) = ||\boldsymbol{\Pi}||^2_{\mathbf{V}_I, \mathbf{W}_J} = \sum_{v=1}^{J-1}\sum_{i=1}^{I} v_{i\bullet}\, f_{iv}^2 = \sum_{u=1}^{I-1}\sum_{j=1}^{J} w_{\bullet j}\, g_{ju}^2.$$

5.2.4. Distances. The distance of the $j$th column (or $i$th row) category from the origin plays an important part in identifying where the strength (or lack) of the association between the two variables lies and in determining the increase in predictability. Recall that column points situated close to the origin do not contribute strongly to the increase in predictability between the variables. Similarly, those columns that are strongly related to specific rows will be far from the origin. In particular, the squared Euclidean distance of the $j$th predictor category from the origin is identical to the squared Euclidean distance computed from the principal polynomial coordinates, such that

$$d_J^2(j, 0) = \sum_{i=1}^{I} v_{i\bullet} \left( \frac{p_{ij}}{p_{\bullet j}} - p_{i\bullet} \right)^2 = \sum_{u=1}^{I-1} g_{ju}^2.$$

This distance indicates the general influence of the $j$th column category on the overall inertia. The magnitude of the squared distances between the profiles of the $j$th and $j'$th column categories identifies those column categories that are similar or different. The squared Euclidean distance between the $j$th and $j'$th column categories is defined as

$$d_J^2(j, j') = \sum_{i=1}^{I} v_{i\bullet} \left[ \left( \frac{p_{ij}}{p_{\bullet j}} - p_{i\bullet} \right) - \left( \frac{p_{ij'}}{p_{\bullet j'}} - p_{i\bullet} \right) \right]^2 = \sum_{u=1}^{I-1} \left( g_{ju} - g_{j'u} \right)^2.$$

Therefore, two column categories that have a similar contribution to the association (whether it be symmetric or asymmetric) will be situated close to one another in a symmetric plot. Similarly, the weighted squared Euclidean distance between the $i$th and $i'$th centred row profiles is identical to the squared Euclidean distance between their principal polynomial coordinates, such that

$$d_I^2(i, i') = \sum_{j=1}^{J} w_{\bullet j} \left[ \left( \frac{p_{ij}}{p_{\bullet j}} - p_{i\bullet} \right) - \left( \frac{p_{i'j}}{p_{\bullet j}} - p_{i'\bullet} \right) \right]^2 = \sum_{v=1}^{J-1} \left( f_{iv} - f_{i'v} \right)^2.$$

The distance of a row, or column, point along each dimension can also be determined. In particular, for the $v$th dimension, the squared distance between the $i$th and $i'$th row profile coordinates on that dimension is

$$d_I^2(i, i' \mid v) = \left( f_{iv} - f_{i'v} \right)^2,$$


so that the sum of these distances over the axes is just the squared distance between the two profiles in the plot. Therefore, the rules of interpretation for points along the axes of a polynomial analysis can be summarized as follows:

– Two row (column) categories situated close to one another along a particular axis depend similarly on the column (row) polynomial trend associated with that axis.
– When a coordinate on the linear polynomial axis is negative, the row (column) is affected by a column (row) whose score is lower than the mean. Similarly, if the coordinate is positive, then the row (column) is affected by a column (row) whose score is higher than the mean of the variable.
– Furthermore, if the most extreme coordinate on a non-linear (for example, quadratic) polynomial axis is negative, then the row (column) is affected by a column (row) whose score represents a minimum of the column (row) non-linear trend. When the most extreme value is positive, the non-linear distribution will have a peak or maximum.

A practical demonstration of these rules will be presented in the examples of Section 6.

6. Examples of Orthogonal Polynomials for Ordered Contingency Tables

To demonstrate the applicability of the correspondence analysis of a two-way contingency table with ordered categories using orthogonal polynomials, we present the following two examples. The first example considers the correspondence analysis of a two-way ordered contingency table from the database of the Spanish National Health Survey of 1997 (Greenacre, 2007, Chap. 6, p. 41). The second example is a study of shoplifting in the Netherlands (Israëls, 1987), which illustrates the non-symmetric correspondence analysis of a one-way ordered table.

6.1. Perceived Health in Spain: A Two-Way Ordered Symmetric Correspondence Analysis

6.1.1. Data. As an illustrative example of two-way ordered symmetric correspondence analysis, we consider a straightforward cross-tabulation generated from the database of the Spanish National Health Survey of 1997, analysed earlier by Greenacre (2007) using classical correspondence analysis. The data consist of a cross-tabulation of perceived health and age for a Spanish sample (Table 1), and their symmetric association will be modelled using orthogonal polynomials. Perceived health was judged in five categories: Very good, Good, Moderate, Bad and Very bad; we will use natural scores for this variable, i.e. 1, 2, 3, 4 and 5. Age was divided into seven classes or groups, for which we will use the (rounded) mean age of a group as the score, i.e. 20, 30, 40, 50, 60, 70 and 80.

6.1.2. Purpose of the Study: Perceived Health Patterns. We start by assuming that, as in Greenacre’s treatment, there exists a symmetric relationship between the row and column categories of Table 1. A simple representation would be that perceived health is a linear function of age. However, as the relationship between the ordered variables is symmetric, we could also consider age as a linear function of perceived health. Such linear patterns can be well described by first-order polynomials. A more complex situation would exist if a perceived health state is specific to a particular age group, so that it peaks for that group and is lower for both younger and older persons. Such patterns can be modelled with second-order polynomials. If there are several peaks in the relationship between perceived health and age, we would need higher-order polynomials as well. Alternatively, given a particular age group, one may investigate how perceived health is distributed over its categories. To determine how many and which polynomials should be

Table 1. 1997 Spanish National Health Survey: Cross-classification of perceived health versus age.

                           Age groups (in years)
Perceived health   16–24  25–34  35–44  45–54  55–64  65–74   75+  Total
Very bad               3      2      4      6     12      4     8     39
Bad                    5     13     24     22     53     35    25    177
Moderate              84     74     82    102    119    110    65    636
Good                 402    414    331    231    219    125    67   1789
Very good            145    112     80     54     30     18     9    448
Total                639    615    521    415    433    292   174   3089
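For readers who wish to recompute the results that follow, Table 1 can be entered as an array (the variable names are ours; the health scores 1 to 5 are attached in the array's row order, Very bad to Very good, consistent with the signs of the row polynomial values reported in Table 2):

```python
import numpy as np

# Table 1: rows are perceived health (Very bad ... Very good),
# columns are the seven age groups.
health_by_age = np.array([
    [  3,   2,   4,   6,  12,   4,   8],   # Very bad
    [  5,  13,  24,  22,  53,  35,  25],   # Bad
    [ 84,  74,  82, 102, 119, 110,  65],   # Moderate
    [402, 414, 331, 231, 219, 125,  67],   # Good
    [145, 112,  80,  54,  30,  18,   9],   # Very good
])
age_scores    = np.array([20, 30, 40, 50, 60, 70, 80])  # rounded group means
health_scores = np.array([1, 2, 3, 4, 5])               # natural scores
```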

Table 2. Centred column profiles of Table 1.

                          Age groups (in years)
Perceived health   16–24  25–34  35–44  45–54  55–64  65–74    75+    α_i1   α_i3
Very bad           −0.01  −0.01   0.00   0.00   0.02   0.00   0.03   −3.46   5.33
Bad                −0.05  −0.04  −0.01   0.00   0.07   0.06   0.09   −2.22   1.09
Moderate           −0.07  −0.09  −0.05   0.04   0.07   0.17   0.17   −0.98   1.05
Good                0.05   0.09   0.06  −0.02  −0.07  −0.15  −0.19    0.26  −0.61
Very good           0.08   0.04   0.01  −0.01  −0.08  −0.08  −0.09    1.51   0.96
β_j1               −1.24  −0.72  −0.18   0.37   0.92   1.47   2.02
g_j1                0.35   0.28   0.12  −0.06  −0.40  −0.48  −0.66
g_j3               −0.04  −0.10  −0.06   0.03   0.03   0.25   0.12

used in modelling, we need to look at the contributions of each of the polynomials to the inertia; this is done, of course, only after establishing that there is a significant association in the table to start with. For this example, we will present a compact description of the results to give a feel for the practice of the modelling procedure. In the next example, we will go into more detail.

6.1.3. Visual Inspection. The patterns in a table like Table 1 are difficult to inspect visually because of the unequal marginal totals. Therefore, there is merit in transforming the frequencies into deviation scores, in particular into scores which show the increase in predictability, here via the centred column profiles $\pi_{ij}$ as defined in Equation (1). They express to what extent the prediction of a perceived health status is improved by knowing the age of a person. One could create centred row profiles as well, but for understanding the patterns in the table, only one of the two possibilities is necessary. Using Table 2 with the centred column profiles, one may inspect the differences in perceived health for a particular age group and, for each perceived health category, the differences across age groups. We will mainly concentrate on the latter view. Note that in the last columns of this table, we report the values of the first and third row polynomials ($\alpha_{i1}$ and $\alpha_{i3}$). In the last rows of Table 2, the linear column polynomial ($\beta_{j1}$) and the principal column coordinates on the first and third polynomial axes ($g_{j1}$ and $g_{j3}$) are given.

6.1.4. Evaluation of the Inertia. The total inertia, or strength of the association, in the present table is tested by the $X^2$ statistic. Its value is 392.4 with 24 degrees of freedom; thus, there exists a strong association between the two variables, and it is meaningful to analyse the nature of this association further. The next task is to sort out, both for the rows and the columns, how many and which polynomials are necessary to model the association. There are two ways to evaluate

Table 3. Generalized correlations between the row and column polynomials of Table 1.

Row                         Column polynomials
polynomials     v=1     v=2     v=3     v=4     v=5     v=6
u=1           −0.33   −0.03    0.04    0.00   −0.02   −0.03
u=2            0.00    0.03    0.00    0.03    0.02    0.02
u=3            0.08    0.03   −0.03   −0.01   −0.03   −0.02
u=4            0.01    0.03   −0.01    0.02   −0.01   −0.04

Note: The generalized correlations, i.e. the $z_{uv}$ values, are computed according to Equation (13). The most sizeable correlations are highlighted in bold.

Table 4. Sources of variation of inertia for perceived health.

             u=1      u=2     u=3     u=4    % Diagonal/total    P    Source
u=1       343.72     8.00                          (88%)
u=2         8.00     7.78                           (2%)
u=3       −85.87     0.71                           (8%)
u=4        12.68     1.95                           (2%)
Total                                              (100%)
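Finally, a sketch that recomputes the key quantities of this example from the data of Table 1, reusing health_by_age from the sketch after Table 1 and emerson_polynomials from Section 4.2. It should reproduce the $X^2$ of 392.4 reported in Section 6.1.4 and the generalized correlations of Table 3 (e.g. $z_{11} = -0.33$); note that if the a priori scores are attached in the reverse category order, the signs of the odd-order polynomials, and hence of the corresponding $z_{uv}$, flip:

```python
import numpy as np

N = health_by_age                       # Table 1 counts
n = N.sum()                             # 3089
P = N / n
p_i, p_j = P.sum(axis=1), P.sum(axis=0)

# Total inertia: X^2 = 392.4 on (I-1)(J-1) = 24 degrees of freedom.
X2 = n * np.sum((P - np.outer(p_i, p_j))**2 / np.outer(p_i, p_j))

# Generalized correlations of Table 3 via Equation (13), with the
# a priori scores of Section 6.1.1 (health 1..5, age group means 20..80).
A = emerson_polynomials(p_i, scores=health_scores)
B = emerson_polynomials(p_j, scores=age_scores)
Pi = P / p_j - p_i[:, None]
Pi_t = (1 / np.sqrt(p_i))[:, None] * Pi * np.sqrt(p_j)[None, :]
Z = (np.sqrt(p_i)[:, None] * A).T @ Pi_t @ (np.sqrt(p_j)[:, None] * B)

print(round(X2, 1))     # expected: 392.4
print(Z.round(2))       # compare with Table 3; z_11 should be about -0.33
```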

601KB Sizes 0 Downloads 5 Views