PSYCHOMETRIKA

2013 DOI: 10.1007/s11336-013-9348-y

AN ANALYSIS OF ITEM RESPONSE THEORY AND RASCH MODELS BASED ON THE MOST PROBABLE DISTRIBUTION METHOD

STEFANO NOVENTA
ASSESSMENT CENTER, UNIVERSITY OF VERONA

LUCA STEFANUTTI
FISSPA, UNIVERSITY OF PADOVA

GIULIO VIDOTTO
DEPARTMENT OF GENERAL PSYCHOLOGY, UNIVERSITY OF PADOVA

The most probable distribution method is applied to derive the logistic model as the distribution accounting for the maximum number of possible outcomes in a dichotomous test while introducing latent traits and item characteristics as constraints to the system. The item response theory logistic models, with a particular focus on the one-parameter logistic model, or Rasch model, and their properties and assumptions, are discussed for both infinite and finite populations.

Key words: Rasch model, item response theory, most probable distribution.

Requests for reprints should be sent to Stefano Noventa, Assessment Center, University of Verona, Verona, Italy. E-mail: [email protected]

© 2013 The Psychometric Society

1. Introduction

Logistic models are widely applied in several fields of science, from physics and chemistry to social and behavioral sciences. Typical examples are the Fermi–Dirac distribution in physics and the autocatalytic reactions in chemistry (Huang, 1987), while examples in psychology are the psychometric curve, pairwise comparisons in individual choice behavior, and the assessment and description of latent traits, mental abilities, and attitudes (Luce, 1959; Rasch, 1960; Lord & Novick, 1968).

Item response theory (IRT) and Rasch models (RM) are the most important methodological frameworks for modeling and verifying the process of measurement in psychological testing. Logistic IRT models and RM are formally close to the point that the latter is often considered a one-parameter IRT model (Fischer & Molenaar, 1995). In both cases, the item response function (the curve describing the probability of a response given a specific latent trait and item characteristic) has a logistic shape that shows fundamental features of continuity, strict monotonicity, and asymptotic behavior (Lord & Novick, 1968; Fischer & Molenaar, 1995). A formal derivation of RM requires additional assumptions like local stochastic independence, sufficiency of the statistics, or the application of general criteria like specific objectivity. The existence of a dense set of items is also needed to account for interval scales (Fischer & Molenaar, 1995).

In other fields of science the derivation of logistic models is often achieved by means of different methodologies. For instance, in statistical mechanics a classical derivation is based on the most probable distribution (MPD) method (Clinton & Massa, 1972; Huang, 1987; Landsberg, 1954). In principle, this method is a way of obtaining a probability distribution by
constraining the possible outcomes of the system with some external requirement on the nature of the system itself. The method is closely related to the principle of maximum entropy of Bayesian probability theory, in which the most probable distribution of the system is the one maximizing the information entropy (Jaynes, 1957, 1968; Bernardo & Smith, 1994).

As an example, the method is applied in physics to derive the distribution of a gas. The average number of particles with a certain energy is obtained by accounting for all the possible ways in which these particles can distribute themselves among the energy levels and constraining the total number of equivalent microscopic states by means of requirements on the total energy and number of particles. The constraints are introduced using the Lagrangian multiplier method and the result is the distribution that maximizes the number of equivalent microscopic states, or equivalently the MPD among all the possible ones (Huang, 1987). Hence, if a psychological test is conceived as a distribution of responses with several possible outcomes that are constrained by means of requirements on latent traits and item characteristics, the probability of a correct response can be derived.

The method is applied with the twofold purpose of analyzing an alternative way of deriving the logistic IRT models and of exploring whether such a derivation helps to improve the understanding of their measurement properties. Whether psychological attributes should be considered qualitative rather than quantitative has indeed been debated (Michell, 1990, 2009). Confirmation of a quantitative scale can be obtained using additive or polynomial conjoint measurement (Luce & Tukey, 1964; Tversky, 1967). Necessary and sufficient conditions for additivity in finite structures have been given in the form of a denumerable set of independence or cancellation conditions (Scott, 1964; Adams, 1965). Conditions for conjoint attributes whose components are a mixture of finite and infinite systems have also been given (Fishburn, 1981; Gonzales, 2000). Similarities and differences between RM and the theory of fundamental measurement have been recently debated (see, for instance, Perline, Wright, & Wainer, 1979; Karabatsos, 2001; Kyngdon, 2008, 2011). Mistaking a qualitative attribute for a quantitative one has, however, been defined as “the psychometricians’ fallacy” (Michell, 2009). Indeed, the total number of possible ordered relations of a conjoint attribute is not exhausted by the strict order of its components (Michell, 2009). Higher-order cancellation conditions must be satisfied to ensure a quantitative nature (Scott, 1964; Michell, 2009).

After introducing notation for population, latent traits, item characteristics, and probability, equivalence classes based on latent traits and item characteristics are used to partition the population. The number of possible outcomes in each of these classes is given, and a general family of constraints based on latent traits and item characteristics is introduced. Finally, the MPD method is applied to both cases of finite and infinite populations and its implications are discussed. An estimate of the finite population size at which the results for an infinite population can be applied is also given.

2. Basic Definitions and Notation

2.1. Population of a Test

Given a set of $n$ subjects, indexed by $\nu \in \{1, \ldots, n\}$, and a set of $m$ items, indexed by $i \in \{1, \ldots, m\}$, the response matrix of a test is an $n \times m$ matrix $\{x_{\nu i}\}$, where each element $x_{\nu i}$ corresponds to the response of the $\nu$th subject to the $i$th item, as in

$$
\{x_{\nu i}\} :=
\begin{pmatrix}
x_{11} & \cdots & x_{1i} & \cdots & x_{1m} \\
\vdots & & \vdots & & \vdots \\
x_{\nu 1} & \cdots & x_{\nu i} & \cdots & x_{\nu m} \\
\vdots & & \vdots & & \vdots \\
x_{n1} & \cdots & x_{ni} & \cdots & x_{nm}
\end{pmatrix}.
\tag{1}
$$

In the dichotomous case, each $x_{\nu i}$ is the realization of a random variable $X_{\nu i}$, which can be either one or zero (e.g., correct or wrong responses). The generalization of response matrix (1) to the population can be either a finite or an infinite matrix $\{x_{\nu i}\}$ with $\nu \in \mathcal{S}$ and $i \in \mathcal{I}$, where $\mathcal{S}$ and $\mathcal{I}$ are, respectively, the population sets of all the possible subjects and items. Any real test is applied to samples drawn from these populations.

2.2. Latent Traits, Item Characteristics and Equivalence Classes

The fundamental idea behind the definition of assessment in unidimensional IRT is that the items possess some characteristic that allows one to measure a subject’s latent trait. Thus, in the end, latent traits and item characteristics should be comparable on one and the same scale of measurement. Let then $\mathcal{P} = \mathcal{S} \cup \mathcal{I}$ be the union of the populations of subjects and items. Let also $\precsim$ be a weak order over $\mathcal{P}$ that allows comparisons between subjects $\nu \in \mathcal{S}$ and items $i \in \mathcal{I}$. The pair $\langle \mathcal{P}, \precsim \rangle$ can be regarded as a relational empirical system (Suppes & Zinnes, 1963; Pfanzagl, 1971; Luce, Krantz, Suppes, & Tversky, 1990). An equivalence relation and a strict order relation are thus defined, for any $x, y \in \mathcal{P}$, as

$$
x \sim y \iff x \precsim y \ \text{ and } \ y \precsim x,
\tag{2}
$$

$$
x \prec y \iff x \precsim y \ \text{ and } \ y \nsim x.
\tag{3}
$$

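A common scale for latent trait and item parameters can then be set by considering a relational numerical system $\langle \mathcal{M}, \leq \rangle$, with $\mathcal{M} \subseteq \mathbb{R}$, and a homomorphism $\varphi : \mathcal{P} \to \mathcal{M}$ such that, for any $x, y \in \mathcal{P}$,

$$
x \prec y \iff \varphi(x) < \varphi(y),
\tag{4}
$$

$$
x \sim y \iff \varphi(x) = \varphi(y).
\tag{5}
$$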

The triple $\langle \mathcal{P}, \mathcal{M}, \varphi \rangle$ defines the measurement scale. Its type is characterized by the set of admissible transformations of the homomorphism $\varphi$ (Luce et al., 1990). Equivalence classes of all the subjects and of all the items possessing the same position on the measurement scale can also be defined as

$$
S_{\alpha} = \{\nu \in \mathcal{S} : \varphi(\nu) = \alpha \in \mathcal{A} \subseteq \mathcal{M}\},
\tag{6}
$$

$$
I_{\delta} = \{i \in \mathcal{I} : \varphi(i) = \delta \in \mathcal{D} \subseteq \mathcal{M}\},
\tag{7}
$$

where $\alpha \in \mathcal{A}$ and $\delta \in \mathcal{D}$ are the values associated with the levels of the latent trait and of the item characteristic. Let $\mathcal{N} = \mathcal{A} \cap \mathcal{D}$ be their intersection. If $\mathcal{A} = \mathcal{D}$, there is a one-to-one relation between latent trait and item characteristic since each latent trait value $\alpha$ corresponds to an item characteristic value $\delta$. If instead $\mathcal{A} \neq \mathcal{D}$, with $\mathcal{N} \neq \emptyset$, there is a non-symmetry between subjects and items. For instance, while subjects’ latent traits are easily thought of as values ordered along a metric dimension $\mathcal{A}$, often only a finite universe $\mathcal{D}$ of items can be defined (Fischer, 1995; Fischer & Molenaar, 1995).

Suppose that $\mathcal{A}$ and $\mathcal{D}$ have at least countable cardinalities $A$ and $D$; then there are $A$ equivalence classes for latent traits and $D$ equivalence classes for item characteristics. Let $j \in \{1, \ldots, A\}$ and $k \in \{1, \ldots, D\}$ be the indexes spanning the elements of $\mathcal{A}$ and $\mathcal{D}$; then the equivalence classes can be labeled as $S_{\alpha_j}$ and $I_{\delta_k}$. Each element of the intersection $\mathcal{N} \subseteq \mathcal{M}$ is instead associated with both a latent trait value $\alpha$ and an item characteristic value $\delta$ on the scale $\langle \mathcal{P}, \mathcal{M}, \varphi \rangle$, and labeled by $j, k \in \{1, \ldots, N\}$ where $N = |\mathcal{N}|$.

2.3. Probabilities for Finite and Infinite Populations

A response matrix in the population can be ordered by considering increasing levels of the latent trait and the item characteristic. This is equivalent to a clustering of the matrix into different blocks of elements $x_{jk}^{\nu i}$, where the superscripts refer to a specific subject $\nu$ and a specific item $i$, while the subscripts refer to a specific value of the latent trait, $\alpha_j$, and a specific value of the item characteristic, $\delta_k$.

The total number of correct responses within each cluster is given by $n_{jk} = \sum_{\nu, i} x_{jk}^{\nu i}$. Considering an ordering based on the values of the set $\mathcal{N}$, the corresponding $N \times N$ square matrix $\{n_{jk}\}$ can be defined as

$$
\{n_{jk}\} :=
\begin{pmatrix}
n_{11} & \cdots & n_{1k} & \cdots & n_{1N} \\
\vdots & & \vdots & & \vdots \\
n_{j1} & \cdots & n_{jk} & \cdots & n_{jN} \\
\vdots & & \vdots & & \vdots \\
n_{N1} & \cdots & n_{Nk} & \cdots & n_{NN}
\end{pmatrix}.
\tag{8}
$$

Notice that such a matrix is square only if the set $\mathcal{N}$ of latent trait and item characteristic values is considered. A rectangular matrix is obtained otherwise. Let $X_{jk}^{\nu i}$ be the response variable corresponding to the subject $\nu$ with latent trait $\alpha_j$ and the item $i$ with characteristic $\delta_k$, of which $x_{jk}^{\nu i}$ is the realization. Let also $g_{jk}$ be the total number of cells in the specific $(jk)$th cluster. Then the ratio $n_{jk}/g_{jk}$ gives the proportion of cells that are filled in that cluster or, equivalently, the proportion of subjects giving a correct response to a certain class of items. Hence, the probability of drawing from the population a single subject with latent trait in the equivalence class of $\alpha_j$, correctly answering an item with a characteristic value of $\delta_k$, is given for a finite population by

$$
P\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) := \frac{n_{jk}}{g_{jk}}
\qquad \forall \nu \in S_{\alpha_j},\ i \in I_{\delta_k},
\tag{9}
$$

or, for an infinite population, by

$$
P\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) := \lim_{g_{jk} \to \infty} \frac{n_{jk}}{g_{jk}}
\qquad \forall \nu \in S_{\alpha_j},\ i \in I_{\delta_k}.
\tag{10}
$$

Notice that probabilities (9) and (10) are defined for all $\nu \in S_{\alpha_j}$ and $i \in I_{\delta_k}$; hence, they are independent of the specific subject or item considered in the cluster, and depend only on the values of the latent trait and the item characteristic. Each subject in the $(jk)$th cluster has the same a priori probability of giving a correct answer. That is, subjects with the same value of the latent trait have the same probability of correctly answering items with the same value of the item characteristic. Besides, while probability (9) takes values on the discrete set $\{0, \ldots, n_{jk}/g_{jk}, \ldots, 1\}$, probability (10) is defined on subsets of the interval $[0, 1]$ for any value of $\alpha \in \mathcal{A}$ and $\delta \in \mathcal{D}$, or $\alpha, \delta \in \mathcal{N}$.
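As a small illustration of definition (9), the following Python sketch counts correct responses and cells per cluster for a toy response matrix. The function name, the data, and the class labels are hypothetical and serve only to make the bookkeeping of $n_{jk}$ and $g_{jk}$ concrete.

```python
from collections import defaultdict


def cluster_proportions(responses, trait_level, item_level):
    """Return {(j, k): (n_jk, g_jk, n_jk / g_jk)} for a 0/1 response matrix.

    responses[v][i] is the response of subject v to item i;
    trait_level[v] and item_level[i] are hypothetical class labels j and k.
    """
    n = defaultdict(int)  # correct responses per cluster
    g = defaultdict(int)  # total number of cells per cluster
    for v, row in enumerate(responses):
        for i, x in enumerate(row):
            key = (trait_level[v], item_level[i])
            n[key] += x
            g[key] += 1
    return {key: (n[key], g[key], n[key] / g[key]) for key in g}


# Toy data: 4 subjects in two trait classes, 3 items in two item classes.
responses = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
]
trait_level = [1, 1, 2, 2]
item_level = [1, 2, 2]

proportions = cluster_proportions(responses, trait_level, item_level)
for cluster, (n_jk, g_jk, p_jk) in sorted(proportions.items()):
    print(cluster, n_jk, g_jk, p_jk)
```

Every subject and item in the same cluster shares the same proportion, which is the sense in which probability (9) depends only on the subscripts.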


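It is also important to notice that the law of total probability implies

$$
P\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) + P\left(X_{jk}^{\nu i} = 0; \alpha_j, \delta_k\right) = 1,
\tag{11}
$$

and it is usually expected to be

$$
P\left(X_{jk}^{\nu i} = 1; \alpha_j = \delta_k\right) = \frac{1}{2},
\tag{12}
$$

so that, if one defines $P_{jk} := P(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k)$ for simplicity of notation, an $N \times N$ matrix of probabilities can be directly derived from matrix (8) in the case of $\alpha, \delta \in \mathcal{N}$, and written as

$$
\{P_{jk}\} :=
\begin{pmatrix}
\tfrac{1}{2} & \cdots & P_{1k} & \cdots & P_{1N} \\
\vdots & \ddots & \vdots & & \vdots \\
P_{j1} & \cdots & \tfrac{1}{2} & \cdots & P_{jN} \\
\vdots & & \vdots & \ddots & \vdots \\
P_{N1} & \cdots & P_{Nk} & \cdots & \tfrac{1}{2}
\end{pmatrix}.
\tag{13}
$$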

Instead, in the general case of a non-symmetry between $\alpha \in \mathcal{A}$ and $\delta \in \mathcal{D}$, matrix (13) has dimension $A \times D$. If, for instance, there were only a finite universe of items, there could be values of latent trait $\alpha \in \mathcal{A} \setminus \mathcal{N}$ (corresponding to additional rows in the previous matrix) that are not associated with any item characteristic $\delta \in \mathcal{N}$. As an example, the strict order $\alpha_1 < \alpha_2 < \alpha_3$, with the only items $\delta_1 < \delta_3$, would imply that subjects of the second equivalence class are systematically better performers than the first class ($P_{21} > P_{11} = \frac{1}{2}$) and worse performers than the third class ($P_{23} < P_{33} = \frac{1}{2}$).

Finally, the quantities $n_{jk}$ and $P_{jk}$ have so far been interpreted in the so-called random sampling view, so that Equations (9) and (10) define the proportions of subjects (with the same position on the latent trait scale) correctly answering a specific class of items. A different interpretation can be given in the stochastic subject view (Lord & Novick, 1968; Holland, 1990) by considering, in place of the equivalence class $S_{\alpha}$, the class of all the repetitions of all the subjects with the same position on the latent trait scale, so that Equations (9) and (10) can be interpreted as the propensity distribution of a subject possessing a latent trait value $\alpha_j$ and answering an item with characteristic value $\delta_k$. In the most general case, then, one could consider $x_{jk}^{\nu i r}$, that is, the $r$th repetition of the response of subject $\nu$ with latent trait $\alpha_j$ to the item $i$ with characteristic $\delta_k$; however, the superscripts do not influence probabilities (9) and (10), which depend only on the subscripts.
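The rectangular case in the example above can be made concrete with a short Python sketch. It assumes, purely for illustration, a logistic response function of the difference between latent trait and item characteristic; the numerical values and the function name are hypothetical.

```python
import math


def p_correct(alpha, delta):
    """Probability of a correct response (assumed logistic in alpha - delta)."""
    return 1.0 / (1.0 + math.exp(delta - alpha))


alphas = [-1.0, 0.0, 1.0]  # hypothetical alpha_1 < alpha_2 < alpha_3
deltas = [-1.0, 1.0]       # only delta_1 = alpha_1 and delta_3 = alpha_3 exist

# Rectangular 3 x 2 matrix: one row per trait class, one column per item class.
P = [[p_correct(a, d) for d in deltas] for a in alphas]

for row in P:
    print([round(p, 3) for p in row])
```

The middle row has no entry equal to one half, because no item shares the scale position of the second class of subjects.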

3. The Most Probable Distribution for a Test

The MPD method relies on the idea of deriving the probability distribution by finding the values $n_{jk}$ that maximize the total number of possible outcomes when the system is bounded by some constraint. A first step is to calculate this number of possible outcomes and then to define sound constraints to relate its variations to the values of latent trait and item characteristic.

3.1. Permutations

The total number of different ways in which $n_{jk}$ correct responses can fill the $g_{jk}$ cells of the cluster is given by the binomial coefficient

$$
\binom{g_{jk}}{n_{jk}} = \frac{g_{jk}!}{n_{jk}!\,(g_{jk} - n_{jk})!},
\tag{14}
$$

so that the number of possible ways in which all the clusters can be filled is given by

$$
W(\{n_{jk}\}) = \prod_{jk} \binom{g_{jk}}{n_{jk}} = \prod_{jk} \frac{g_{jk}!}{n_{jk}!\,(g_{jk} - n_{jk})!}
\tag{15}
$$

and is often called “multiplicity”. It is also common, in the MPD method, to consider a monotonic transformation like the natural logarithm,

$$
\ln W(\{n_{jk}\}) = \sum_{jk} \ln \frac{g_{jk}!}{n_{jk}!\,(g_{jk} - n_{jk})!}.
\tag{16}
$$

The logarithmic transformation is taken for a double purpose: on the one hand, it is a monotonic transformation that preserves the maximum and minimum points of its argument $W(\{n_{jk}\})$; on the other hand, it is a homomorphism from a multiplicative group to an additive one, so it is useful to turn the product $\prod_{jk}$ into a sum $\sum_{jk}$ for ease of calculation. Moreover, the logarithm of the number of equivalent states of the system, $S = k \ln W$ (where $k$ here denotes Boltzmann’s constant, not the item index), is known as Boltzmann’s entropy and its maximum characterizes a situation in which there is the least quantity of information (or equivalently, the maximum uncertainty or randomness) about the system (Huang, 1987). The approach is strictly connected to the maximum entropy principle in Bayesian theory, in which the chosen probability distribution among all the trial distributions is the one that maximizes information entropy (Jaynes, 1957, 1968).

3.2. Constraints

The number of possible outcomes defined by multiplicity (15) is independent of the specific values of latent trait and item characteristic associated with each cluster. In general, however, multiplicity is constrained by $\alpha$ and $\delta$. Indeed, the number of correct responses $n_{jk}$ is expected to decrease with increasing values of the item characteristic $\delta$ and increase with increasing values of the latent trait $\alpha$. In principle, such a constraint can be modeled as an implicit function of all the scale values $\alpha_j$, $\delta_k$, and of all the numbers $n_{jk}$ of cells filled in the clusters, namely, $H(\{n_{jk}\}, \{\alpha_j\}, \{\delta_k\}) = \mu$, with $\mu \in \mathbb{R}$. Such a form allows, however, for interactions between different clusters: a latent trait $\alpha_j$ and an item characteristic $\delta_k$ associated with a particular cluster $(j, k)$ could affect the number of responses in a different cluster $(j', k')$, and vice versa. Since the number of correct responses in any cluster is usually expected to depend only on the performance of subjects with a specific value of a latent trait on an item with a specific characteristic, the number of correct responses in each cluster should be independent of the latent trait and item characteristic associated with other clusters.

In what follows, an additive independence is then assumed, that is,

$$
H(\{n_{jk}\}, \{\alpha_j\}, \{\delta_k\}) = \sum_{jk} h(n_{jk}, \alpha_j, \delta_k) = \mu.
\tag{17}
$$

Any $h_{jk} := h(\alpha_j, \delta_k, n_{jk})$ is then a generic constraint for the specific $(jk)$th cluster and is assumed to be a monotonic function of $\alpha_j$ and $\delta_k$. Notice also that the additive independence (17) can be obtained as a logarithmic transformation of a multiplicative independence assumption. Given the generic constraint

$$
\tilde{H} = \prod_{jk} \tilde{h}_{jk},
\tag{18}
$$

the natural logarithm allows a switch from a multiplicative structure to an additive one:

$$
H = \ln \tilde{H} = \ln \prod_{jk} \tilde{h}_{jk} = \sum_{jk} \ln \tilde{h}_{jk} = \sum_{jk} h_{jk}.
\tag{19}
$$
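The multiplicity of Section 3.1 can be explored numerically. The following Python sketch evaluates the logarithm of the binomial coefficient (14) through the log-gamma function and sums it over clusters as in (16); the function names and the cluster sizes are hypothetical. Scanning a single unconstrained cluster shows where the multiplicity peaks.

```python
import math


def log_binomial(g, n):
    """ln of g! / (n! (g - n)!), computed with log-gamma to avoid overflow."""
    return math.lgamma(g + 1) - math.lgamma(n + 1) - math.lgamma(g - n + 1)


def log_multiplicity(n_by_cluster, g_by_cluster):
    """ln W({n_jk}): sum over clusters of ln binomial(g_jk, n_jk)."""
    return sum(
        log_binomial(g_by_cluster[c], n_by_cluster[c]) for c in g_by_cluster
    )


# Hypothetical single cluster with g = 100 cells: scan every admissible n.
g = 100
best_n = max(range(g + 1), key=lambda n: log_binomial(g, n))
print(best_n)

# Two hypothetical clusters, labelled by (j, k).
g_by_cluster = {(1, 1): 4, (1, 2): 6}
n_by_cluster = {(1, 1): 2, (1, 2): 3}
ln_w = log_multiplicity(n_by_cluster, g_by_cluster)
print(ln_w)
```

With no constraint acting on the cluster, the scan peaks at half of the cells being filled; the second part only checks that the sum in (16) agrees with the logarithm of the product in (15).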


3.3. Derivation of the Most Probable Distribution

The MPD can now be calculated by maximizing the natural logarithm of the total number of possible outcomes (16) under the effect of constraint (17). Since this is an extremum problem under external constraints, the Lagrangian multipliers method can be applied; and the maximization of (16) under a generic constraint becomes the unconstrained maximization of the function

$$
\Lambda(\{n_{jk}\}, \lambda) = \ln W(\{n_{jk}\}) + \lambda \left[ H(\{\alpha_j\}, \{\delta_k\}, \{n_{jk}\}) - \mu \right],
\tag{20}
$$

where $\lambda \in \mathbb{R}$ is the Lagrangian multiplier. Notice that the sets $\{\alpha_j\}$ and $\{\delta_k\}$ enter the equation as parameters. By substitution of multiplicity (15) and of the additive constraint (17),

$$
\Lambda(\{n_{jk}\}, \lambda) = \sum_{jk} \ln \frac{g_{jk}!}{n_{jk}!\,(g_{jk} - n_{jk})!} + \lambda \left[ \sum_{jk} h_{jk} - \mu \right].
\tag{21}
$$

However, since the variables $n_{jk}$ are non-negative integer quantities, with $n_{jk} \in \{0, \ldots, g_{jk}\}$, the previous function is not differentiable, and its maximum cannot be obtained by simply considering its derivatives. Nevertheless, following the same derivation given by Clinton & Massa (1972), if the function (21) attains a maximum for a given set $\{n_{jk}\}$, then for a variation of $n_{jk}$ to $n_{jk} \pm 1$ it must hold that

$$
\Lambda(\ldots, n_{jk}, \ldots, \lambda) \geq \Lambda(\ldots, n_{jk} + 1, \ldots, \lambda),
\tag{22}
$$

$$
\Lambda(\ldots, n_{jk}, \ldots, \lambda) \geq \Lambda(\ldots, n_{jk} - 1, \ldots, \lambda).
\tag{23}
$$

In particular, inequalities (22) and (23), once applied to Equation (21), yield, respectively (see Appendix A), a forward finite difference inequality and a backward difference inequality, related to lower and upper bounds on the proportion of correct responses in the $(jk)$th cluster, that is,

$$
\frac{1 - \frac{1}{g_{jk}} \exp(-\lambda \Delta h_{jk})}{1 + \exp(-\lambda \Delta h_{jk})}
\leq \frac{n_{jk}}{g_{jk}} \leq
\frac{1 + \frac{1}{g_{jk}}}{1 + \exp(-\lambda \nabla h_{jk})},
\tag{24}
$$

where forward and backward differences of $h_{jk}$ are defined as

$$
\Delta h_{jk} = h(\alpha_j, \delta_k, n_{jk} + 1) - h(\alpha_j, \delta_k, n_{jk}),
\tag{25}
$$

$$
\nabla h_{jk} = h(\alpha_j, \delta_k, n_{jk}) - h(\alpha_j, \delta_k, n_{jk} - 1).
\tag{26}
$$

Notice that condition (24) gives different results depending on the actual shape of the constraint $h_{jk}$ and on whether the population $g_{jk}$ is infinite or finite.

4. The Infinite Population Case

It follows directly that in an infinite population, namely in the limit $g_{jk} \to \infty$, Equation (24) and the definition of probability (10) yield

$$
\frac{1}{1 + \exp(-\lambda \Delta h_{jk})} \leq P\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) \leq \frac{1}{1 + \exp(-\lambda \nabla h_{jk})},
\tag{27}
$$

so that, if the additional condition $\Delta h_{jk} = \nabla h_{jk}$ holds, the squeeze theorem allows one to define a value of probability shaped as a logistic model,

$$
P\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) = \frac{1}{1 + \exp(-\lambda \Delta h_{jk})}
\qquad \forall \nu \in S_{\alpha_j},\ i \in I_{\delta_k}.
\tag{28}
$$
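The passage from the bounds (24) to the logistic form (28) can be checked numerically for a single cluster. The Python sketch below assumes a constraint that is linear in the number of correct responses, $h = n \cdot h_1$, so that forward and backward differences coincide and equal $h_1$; the function names and the values of $\lambda$ and $h_1$ are hypothetical. It maximizes the single-cluster version of (21) by brute force over the integers and compares the resulting proportion with the bounds and with the logistic value.

```python
import math


def log_binomial(g, n):
    """ln of g! / (n! (g - n)!), computed with log-gamma."""
    return math.lgamma(g + 1) - math.lgamma(n + 1) - math.lgamma(g - n + 1)


def most_probable_n(g, lam, h1):
    """Integer n in {0, ..., g} maximizing ln C(g, n) + lam * h1 * n.

    Assumes a constraint linear in n (h = n * h1); the constant term
    -lam * mu is dropped because it does not move the maximum.
    """
    return max(range(g + 1), key=lambda n: log_binomial(g, n) + lam * h1 * n)


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


lam, h1 = 1.0, 0.8  # hypothetical multiplier and constraint value
for g in (10, 100, 1000, 10000):
    n_star = most_probable_n(g, lam, h1)
    lower = (1.0 - math.exp(-lam * h1) / g) / (1.0 + math.exp(-lam * h1))
    upper = (1.0 + 1.0 / g) / (1.0 + math.exp(-lam * h1))
    print(g, lower, n_star / g, upper, logistic(lam * h1))
```

Under this assumption the two bounds differ by exactly $1/g$, so the maximizing proportion is squeezed toward the logistic value as the cluster grows.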


This form is unique up to the choice of the function $h(\alpha_j, \delta_k, n_{jk})$. Besides, the Lagrangian multiplier $\lambda$ is a scale factor common to each and every cluster, that is, to any pair of latent trait and item characteristic. Whether the condition $\Delta h_{jk} = \nabla h_{jk}$ is satisfied or not depends on the shape of $h(\alpha_j, \delta_k, n_{jk})$. Hence, for a generic shape of $h_{jk}$, probability is not uniquely defined. However, since probability should not depend on the specific number of correct responses $n_{jk}$, the constraint $h_{jk}$ is expected to be linear in $n_{jk}$. Indeed, the condition $\Delta h_{jk} = \nabla h_{jk}$ corresponds to the functional equation $h(n_{jk} + 1) - 2h(n_{jk}) + h(n_{jk} - 1) = 0$ on the integers $n_{jk}$, and can be solved as a second-order linear homogeneous recurrence relation with constant coefficients. Since its characteristic equation is $r^2 - 2r + 1 = 0$ for a generic root $r$, which has the double root $r = 1$, the solution takes the form $h(n_{jk}) = c_1 + c_2 n_{jk}$, with $c_1, c_2 \in \mathbb{R}$. Hence, if the constraint $h(\alpha_j, \delta_k, n_{jk})$ is linear in the variable $n_{jk}$, the requirement $\Delta h_{jk} = \nabla h_{jk}$ is always satisfied. Namely, the requirement is satisfied by a shape like

$$
h(\alpha_j, \delta_k, n_{jk}) = n_{jk}\, h^{(1)}(\alpha_j, \delta_k) + h^{(2)}(\alpha_j, \delta_k),
\tag{29}
$$

or, in the short form,

$$
h_{jk} = n_{jk}\, h_{jk}^{(1)} + h_{jk}^{(2)},
\tag{30}
$$

where $h_{jk}^{(1)}$ and $h_{jk}^{(2)}$ are generic functions of the latent trait and item characteristic values. Notice that if the variable $n_{jk}$ were treated as continuous, condition $\Delta h_{jk} = \nabla h_{jk}$ would give a second-order linear homogeneous difference equation with constant coefficients. In such a case the solutions would be the same as (29), where both $h_{jk}^{(1)}$ and $h_{jk}^{(2)}$ could, however, be periodic functions of $n_{jk}$, namely, $h^{(m)}(n_{jk}) = h^{(m)}(n_{jk} \pm 1)$, with $m = 1, 2$. Notice also that, considering a finite difference of Equation (29), the term $h_{jk}^{(2)}$ always cancels out so that it can be generally set equal to zero. This happens because a constraint that does not affect $n_{jk}$ does not affect the number of possible outcomes (15). Thus, Equation (29) leads to the logistic model

$$
P\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) = \frac{1}{1 + \exp\left(-\lambda h_{jk}^{(1)}\right)}
\qquad \forall \nu \in S_{\alpha_j},\ i \in I_{\delta_k},
\tag{31}
$$

so that, using the short notation $P_{jk}$, the constraint $h_{jk}^{(1)}$ is, up to the scale factor $\lambda$, the logarithm of the odds, namely the logit,

$$
\ln \left( \frac{P_{jk}}{1 - P_{jk}} \right) = \lambda h_{jk}^{(1)}.
\tag{32}
$$

As already noted in Section 2.3, probability (10) is defined for all $\nu \in S_{\alpha_j}$ and $i \in I_{\delta_k}$; hence, each subject in the $(jk)$th cluster has the same a priori probability (31) of giving a correct answer, and such a probability depends only on the values of the latent trait and the item characteristic. Furthermore, probability (31) takes values in the interval $[0, 1]$. In particular, the cardinality of the set $\{P_{jk}\}$ corresponds to the cardinality of the set of the values taken by the function $h_{jk}^{(1)}$. The latter, in its turn, depends on the cardinality of the Cartesian product $\mathcal{A} \times \mathcal{D}$ (or $\mathcal{N} \times \mathcal{N}$ whenever $\mathcal{A} = \mathcal{D}$). Indeed, each pair $(\alpha_j, \delta_k)$ gives a value of $h_{jk}^{(1)}$ that corresponds to a value of $P_{jk}$. Hence, a weak order on $h_{jk}^{(1)}$ is a situation to which the theory of conjoint measurement applies (Luce & Tukey, 1964; Tversky, 1967; Krantz, Luce, Suppes, & Tversky, 1971; Luce et al., 1990). Moreover, probability $P_{jk}$ is defined in the entire $[0, 1]$ interval if at least a dense set of real numbers is achieved by the constraint $h_{jk}^{(1)}$, so that probability can be extended by continuity to the entire interval $[0, 1]$. This requires the set $\mathcal{N}$ to be at least an infinite countable set, or the


set of latent traits $\mathcal{A}$ to be dense if only a finite universe of items $\mathcal{D}$ exists. In this latter case, however, the existence of latent trait values that do not correspond to any item in the population has to be hypothesized. Hence, in spite of the probabilities being theoretically defined in $[0, 1]$, the actual discriminable values of the latent trait should be those related to the finite set of values $\delta \in \mathcal{N}$, as if a finite number of equivalence classes of items implied a finite precision in measurement. One could wonder whether a finer structure of $\mathcal{A}$ compared to $\mathcal{D}$ is an experimentally falsifiable hypothesis or not.

4.1. Further Requirements and Considerations on Probability

General considerations about the shape of probability (31) lead to specific consequences for the constraint $h_{jk}^{(1)}$. For instance, the requirement for $P_{jk}$ to be strictly increasing in $\alpha_j$ and strictly decreasing in $\delta_k$ implies that the logit is likewise strictly increasing in $\alpha_j$ and strictly decreasing in $\delta_k$, since the logit is a strictly increasing function of $P_{jk}$. That is, the monotonicity of constraint $h_{jk}^{(1)}$ is required by the monotonicity of probability. Similarly, it is reasonable to assume that the probability of a correct response given specific values of latent trait $\alpha_j$ and item characteristic $\delta_k$ is the same as the probability of a wrong response with values of item characteristic $\delta_j$ and latent trait $\alpha_k$. That is, $P(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k) = P(X_{kj}^{\nu i} = 0; \alpha_k, \delta_j)$. Switching the values of the latent trait and the item characteristic leads to complementary events, and the law of total probability (11) can be rewritten as $P_{jk} + P_{kj} = 1$, which implies that the logit is antisymmetric, that is,

$$
h^{(1)}(\alpha_j, \delta_k) = -h^{(1)}(\alpha_k, \delta_j)
\quad \text{or, in the short form,} \quad
h_{jk}^{(1)} = -h_{kj}^{(1)}.
\tag{33}
$$
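Condition (33) and its consequences can be verified on a small grid of scale values. The Python sketch below assumes, for illustration only, the simplest antisymmetric constraint, a plain difference between latent trait and item characteristic; the function names, the multiplier, and the scale values are hypothetical.

```python
import math


def p_correct(h1, lam=1.0):
    """Logistic probability 1 / (1 + exp(-lam * h1)) for a constraint value h1."""
    return 1.0 / (1.0 + math.exp(-lam * h1))


def h1_difference(alpha, delta):
    """Hypothetical antisymmetric constraint: h1(alpha, delta) = alpha - delta."""
    return alpha - delta


scale = [-1.5, 0.0, 0.7, 2.0]  # hypothetical common values for traits and items

for a in scale:
    for d in scale:
        p_jk = p_correct(h1_difference(a, d))
        p_kj = p_correct(h1_difference(d, a))
        # Swapping trait and item values gives complementary probabilities.
        assert abs(p_jk + p_kj - 1.0) < 1e-12

# Equal trait and item values give a constraint of zero and probability 1/2.
diagonal = [p_correct(h1_difference(a, a)) for a in scale]
print(diagonal)
```

Any other odd function of the difference would pass the same checks, so the sketch does not single out a particular model.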

Notice, however, that such a result holds only for $\alpha, \delta \in \mathcal{N}$ since $P_{kj}$ is not defined for a nonsquare matrix, a situation in which there could be values $\alpha_j \in \mathcal{A}$ that do not correspond to any $\delta_j \in \mathcal{D}$. Nevertheless, if condition (33) holds and $\delta_k = \alpha_j$ (or equivalently $j = k$), then it is straightforward to verify that $P_{jj} = \frac{1}{2}$ and $h_{jj}^{(1)} = 0$. That is, equivalent levels of latent trait and item characteristic lead to correct responses half of the time, as required by hypothesis (12). As a further consequence,

$$
\mu = \sum_{jk} n_{jk} h_{jk}^{(1)}
= \sum_{j} n_{jj} h_{jj}^{(1)} + \sum_{j \neq k} n_{jk} h_{jk}^{(1)}
= \sum_{j > k} n_{jk} h_{jk}^{(1)} + \sum_{j < k} n_{jk} h_{jk}^{(1)}
= \sum_{j > k} (n_{jk} - n_{kj})\, h_{jk}^{(1)},
\tag{34}
$$

so that, since $n_{jk}$ is in general expected to be different from $n_{kj}$, it is $\mu \neq 0$ even when $h_{jk}^{(2)} = 0$, where $\mu$ is the actual value taken by constraint (17).

Finally, it is important to notice that hypothesis (12) is actually not a real hypothesis but the solution of Equation (21) when there are no constraints involved. It is indeed straightforward to verify that (for an infinite population) the solution of (21) is given by $n_{jk} = g_{jk}/2$, which states that the condition under which the maximum number of possible outcomes is achieved is the condition of maximum randomness in which half of the population gives a correct response. This is also the equilibrium solution in which the maximum entropy corresponds to the maximum uncertainty about the system (Huang, 1987). Similarly, a logistic shape of probability like (31) is the distribution accounting for the minimum amount of information under the constraints defined by (17). The central idea behind this approach is that other probability distributions might be possible, but they describe nonequilibrium solutions; that is, they depict different distributions of responses $\{x_{jk}^{\nu i}\}$ whose occurrences fluctuate around the average values $\{n_{jk}\}$ that maximize Equation (21). In statistical


mechanics these fluctuations are considered negligible in the infinite population limit (Huang, 1987).

It is also worth noting that a logistic distribution is achieved independently of the nature of the constraint. Indeed, the logistic function is the inverse parameter mapping for the exponential family of Bernoulli distributions, and exponential families can be derived as maximum entropy solutions in the presence of linear constraints on the expected values of their sufficient statistics (Kagan, Linnik, & Rao, 1973). Hence, several logistic IRT models can be derived (see Appendix A and Sect. 4.4) by using the MPD method, but such a derivation does not account for their measurement properties. However, since a specific choice of the constraint $h^{(1)}(\alpha_j, \delta_k)$ leads to a unique model of probability, its measurement properties should establish the scale level of latent traits and item characteristics. Indeed, a weak order on the constraint $h^{(1)}(\alpha_j, \delta_k)$ appears to be a situation in which conjoint measurement can be applied to assess the quantitative nature of the attributes underlying the scale value parameters $\alpha$ and $\delta$. In what follows, the Rasch model is discussed as an example derived by requiring specific objectivity of the system.

4.2. Rasch Model for an Infinite Population

A general statement about “odds” is that they can be expressed as the ratio between two quantities, $A_{jk}$ and $D_{jk}$, that are monotonic functions of the latent trait and the item characteristic, namely,

$$
O_{jk} := \frac{P_{jk}}{1 - P_{jk}} = \frac{A_{jk}}{D_{jk}}.
\tag{35}
$$

A stronger common assumption in IRT and RM is that $A_{jk} := A_j$ is independent of the item characteristic while $D_{jk} := D_k$ is independent of the latent trait, thus yielding

$$
O_{jk} = \frac{P_{jk}}{1 - P_{jk}} = \frac{A_j}{D_k}
\quad \text{such that} \quad
\ln \left( \frac{P_{jk}}{1 - P_{jk}} \right) = \tilde{\alpha}_j - \tilde{\delta}_k,
\tag{36}
$$

where $\tilde{\alpha}_j = \ln A_j$ and $\tilde{\delta}_k = \ln D_k$ are usually considered the latent trait and item characteristic parameters. The previous requirement can also be considered a strong form of specific objectivity; that is, the relation between the parameters of two subjects with different latent trait values is independent of the items’ parameters, and vice versa (Rasch, 1960, 1972). Indeed, the ratio between the odds $O_{jk}$ and $O_{j'k}$ is independent of the item index $k$, while a summation of the logits leads to the difference $\tilde{\alpha}_j - \tilde{\alpha}_{j'}$. From a functional equations perspective, the logit (36) obeys Cantor’s first equation, that is, $f(x, y) + f(y, z) = f(x, z)$, while odds (35) obey Cantor’s second equation, $f(x, y) f(y, z) = f(x, z)$ (Aczel & Dhombres, 1989). In particular, equating logits (32) and (36) leads to

$$
h^{(1)}(\alpha_j, \delta_k) = \frac{1}{\lambda} \left( \tilde{\alpha}_j - \tilde{\delta}_k \right) = \frac{1}{\lambda} \left[ g(\alpha_j) - g(\delta_k) \right],
\tag{37}
$$

since the solution of Cantor’s first equation has the form $f(x, y) = g(x) - g(y)$, for an arbitrary continuous function $g$. Thus, there is no a priori rationale to identify the latent trait $\alpha_j$ and item characteristic $\delta_k$, used to order subjects and items in the population, with the parameters $\tilde{\alpha}_j$ and $\tilde{\delta}_k$ that appear in logit (36). However, if the sets of $\alpha_j$ and $\delta_k$ are dense in $\mathbb{R}$, so that functional equation methods can be applied, it has been shown that the function $g$ is an affine transformation, thus making the scale of $\alpha_j$ and $\delta_k$ an interval scale (Pfanzagl, 1971; Fischer & Molenaar, 1995). Also, any further monotonic transformation of $\alpha_j$ and $\delta_k$ does not influence the uniqueness of the scale since it just relates different suitable numerical models representing the same empirical structure (Irtel, 1993; Fischer, 1995; Fischer & Molenaar, 1995). The one-parameter simple logistic IRT model, or Rasch model, can then be written as

$$
P\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) = \frac{1}{1 + \exp(\delta_k - \alpha_j)}
\qquad \forall \nu \in S_{\alpha_j},\ i \in I_{\delta_k}.
\tag{38}
$$
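The strong form of specific objectivity behind model (38) can be illustrated with a few lines of Python. In this sketch the subject and item values are hypothetical; the point is only that the ratio between the odds of two subjects comes out the same whichever item is used for the comparison.

```python
import math


def rasch_probability(alpha, delta):
    """Probability of a correct response under model (38)."""
    return 1.0 / (1.0 + math.exp(delta - alpha))


def odds(p):
    """Odds p / (1 - p) of a correct response."""
    return p / (1.0 - p)


alpha_a, alpha_b = 1.2, -0.4          # two hypothetical subjects
deltas = [-2.0, -0.5, 0.0, 0.9, 2.5]  # hypothetical item characteristics

# Odds ratio between the two subjects, computed separately on each item.
ratios = [
    odds(rasch_probability(alpha_a, d)) / odds(rasch_probability(alpha_b, d))
    for d in deltas
]
expected = math.exp(alpha_a - alpha_b)

for d, r in zip(deltas, ratios):
    print(d, r, expected)
```

Up to floating-point rounding, each ratio equals the exponential of the difference between the two latent trait values, so the item characteristic drops out of the comparison.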

More general considerations can be given by rewriting specific objectivity in a less restrictive form, that is,

$$
U\left[ F(\alpha_j, \delta_k), F(\alpha_{j'}, \delta_k) \right] := V(\alpha_j, \alpha_{j'}),
\tag{39}
$$

where $U : \mathbb{R}^2 \to \mathbb{R}$ is a continuous and strictly monotonic function (increasing in the first argument and decreasing in the second one) called the comparator, and $F : \mathbb{R}^2 \to \mathbb{R}$ is a continuous function, increasing bijective in the first argument and decreasing bijective in the second, called the reaction function (Fischer & Molenaar, 1995). According to specific objectivity, $V$ is independent of the items’ parameters. Since the previous equation is a special case of the generalized functional equation of transitivity (Aczel, 1966), the general solution is of the form $F(\alpha_j, \delta_k) = f[g(\alpha_j) - m(\delta_k)]$ and $V(\alpha_j, \alpha_{j'}) = u[g(\alpha_j) - g(\alpha_{j'})]$ for arbitrary continuous bijective increasing functions $m, g, f, u$ (Fischer & Molenaar, 1995).

Since the function $h_{jk}^{(1)}$ is a strictly monotonic function as required in Section 4.1, it can be taken as the reaction function $F(\alpha_j, \delta_k) = h^{(1)}(\alpha_j, \delta_k)$. It follows that $m$ and $g$ are each a constant shift of the other. Indeed, hypothesis (12) implies $h^{(1)}(\alpha_j, \delta_j) = 0$ and, hence, $f[g(\alpha_j) - m(\delta_j)] = 0$, which happens when $g(\alpha_j) - m(\delta_j) = f^{-1}(0)$, or equivalently $g(\alpha_j) = m(\delta_j) + c$, where $c = f^{-1}(0)$ is just a constant since $f$ is a continuous and strictly monotonic function and since $\alpha_j = \delta_j$ for $\alpha, \delta \in \mathcal{N}$. If $c \neq 0$, the general solution has the form $f[g(\alpha_j) - m(\delta_k)] = \Phi(g(\alpha_j) - m(\delta_k), g(\delta_k) - m(\alpha_j))$, where $\Phi$ is any antisymmetric function of its arguments. However, since $m$ and $g$ are arbitrary functions corresponding to scale transformations of latent trait and item characteristic, $m$ can always be redefined as $m - c$, so that $g = m$; and the system is equivalent to the case $c = 0$, in which the solution has a simpler form. If $c = 0$, a general shape of $h_{jk}^{(1)}$ can be written as

$$
h^{(1)}(\alpha_j, \delta_k) = f\left[ g(\alpha_j) - g(\delta_k) \right]
\tag{40}
$$

with the requirement for $f$ to be an odd function so that condition (33) is satisfied for all $\alpha_j, \delta_k \in N$ (notice that such a requirement would not be needed for any $\alpha \in A \setminus N$). The previous equation is again a special case of the general result given by Pfanzagl (1971): its solutions are affine transformations $g(\alpha) = a\alpha + b$, for $a, b \in \mathbb{R}$, thus leading to an interval scale for the latent trait and item characteristic (in the specific case of Rasch model (38) the function $f$ becomes a dilation by $\frac{1}{a\lambda}$ that cancels out the Lagrangian multiplier and compensates for any arbitrary unit change). This situation appears to be rich enough so that $h^{(1)}(\alpha_j, \delta_k)$ satisfies the axioms of conjoint measurement (Luce & Tukey, 1964; Tversky, 1967; Krantz et al., 1971). As noted in Sect. 4.1, a dense set of values of $h^{(1)}_{jk}$ also allows one to define by continuity a probability in $[0, 1]$. However, the application of functional equations requires a dense universe for latent traits and item characteristics (Fischer & Molenaar, 1995; Fischer, 1995). What happens if this density requirement is not met or if some mixed condition in which $A$ and $D$ have different cardinalities is achieved?

4.3. Measurement Properties for the Rasch Model

If only a finite set of equivalence classes of items is given, a dense set of latent traits $\alpha \in A$ is required to define a dense probability. In such a case, probability (38) is defined for any $\alpha, \delta \in N$,

but it could be theoretically assumed to hold also for all those $\alpha \in A \setminus N$ that do not have a corresponding value $\delta \in N$. For these values, the condition of antisymmetry (33) does not hold, thus leaving to constraint (40) a certain freedom outside the set $N$. In particular, the functions $g(\alpha)$ and $m(\delta)$ are not necessarily equal. It has been shown that in such a case Equation (40) decomposes into a set of equations, one for each constant $m(\delta_k) = \beta_k \in \mathbb{R}$, leading to a scale that is somewhat weaker than an interval scale. When items are such that all the differences $\delta_k - \delta_{k'}$ are multiples of some real value (so that ratios of differences are rational), the admissible transformations $g(\alpha_j)$ are not uniquely defined; but the interval properties still hold for an infinite set of equally spaced points on the latent continuum. The transformation instead is affine if there are some $\delta_k, \delta_{k'}, \delta_{k''} \in \mathbb{R}$ such that the ratio $(\delta_k - \delta_{k'})/(\delta_k - \delta_{k''})$ is irrational (Fischer & Molenaar, 1995; Fischer, 1995). The same result has been found for additive conjoint measurement (Fishburn, 1981). More generally, it has been shown that when such an asymmetry between $A$ and $D$ exists, and restricted or unrestricted solvability is defined with respect to a single attribute, any order of cancellation axiom is required for an additive conjoint representation to exist, but uniqueness is weakened as previously discussed (Gonzales, 2000). It appears that if the structure of a latent trait can be hypothesized to be rich enough, it makes up (although not completely) for the finite number of equivalence classes of item characteristics, thus bringing the scale close to an interval one while yielding a continuous probability. Indeed, the higher the cardinality of $D$ and the smaller the spacing between the values $\delta$, the less freedom is left to the constraint and hence to the admissible transformation $g(\alpha)$ (Fischer & Molenaar, 1995; Fischer, 1995).
As a consequence, a dense set of values $h^{(1)}_{jk}$ can also be achieved if $A = D = N$ is a countably infinite set but is not dense in $\mathbb{R}$. In such a case latent traits and item characteristics are on an ordinal scale but can be promoted to an interval scale. Constraints like (37) or (40) describe an additive conjoint measurement structure of the form $f(\tilde{\alpha} + (-\tilde{\delta}))$, thus yielding interval scales for their arguments, $\tilde{\alpha} = g(\alpha_j)$ and $\tilde{\delta} = g(\delta_k)$. But latent trait $\alpha$ and item characteristic $\delta$ might still be on an ordinal scale. Since $N$ is a well-ordered infinite set, there is a unique monotonic transformation $g$ that leads its values into a chosen interval scale so that each step of the scale corresponds to a value in $N$. Hence, the ordinal scale of $\alpha$ and $\delta$ that has been used to order the population can be considered an initial ordinal scale that can be transformed into a chosen metric interval scale $\tilde{\alpha}, \tilde{\delta}$ by means of a unique monotonic transformation (Fischer & Molenaar, 1995). This is again a situation in which a structure of infinite equally spaced values appears to be rich enough to satisfy the axioms of conjoint measurement (Krantz et al., 1971). So it appears that when probability is a continuous quantity, the system should be rich enough to allow for interval measurement. However, when the parameters $\alpha$ and $\delta$ do not define a dense set for the constraint $h^{(1)}_{jk}$, then probability (38) is defined only on a set of discrete points in the interval $[0, 1]$ and cannot be extended by continuity to the entire interval. In this case, the measurement scale could be either interval or ordinal. Indeed, if at least one of the sets $A$ or $D$ is countably infinite, then there is a unique transformation $g$ leading its parameters into a chosen interval scale. In such a case the scale of the parameters used to order the population can be considered an initial ordinal scale that can always be promoted into a chosen metric interval scale.
But, again, an asymmetry between $A$ and $D$ has to be hypothesized so that solvability holds with respect to a single attribute, thus weakening the uniqueness properties of the scale. Instead, if there are finite equivalence classes of both latent traits and item characteristics, although $\tilde{\alpha} = g(\alpha_j)$ and $\tilde{\delta} = g(\delta_k)$ are on interval scales when $h^{(1)}_{jk}$ satisfies any cancellation condition, latent trait $\alpha$ and item characteristic $\delta$ might still be on an ordinal scale since there is not a unique monotonic transformation $g$ leading the parameters into a chosen interval scale $\tilde{\alpha}, \tilde{\delta}$. This corresponds to the situation in which a psychometricians’ fallacy could occur, since inferring quantitative measurement from the fit of the model would not ensure that the attributes

are indeed quantitative. Unless there is a strong theoretical basis (or empirical support) to assume a rich structure of latent traits and item characteristics (or the existence of latent traits that do not have corresponding values of item characteristics), the only way to assess the quantitative nature of the data structure would likely be to probabilistically test the data themselves, checking whether, under the composition rule given by constraint $h^{(1)}_{jk}$, they satisfy cancellation conditions up to the last empirically testable finite order (Karabatsos, 2001; Michell, 2009; Kyngdon, 2011). It is finally interesting to notice that the previous results closely parallel (although generalized to equivalence classes rather than data) the results achieved with non-parametric item response models (Scheiblechner, 1995, 1999; Karabatsos, 2001). Cancellation axioms up to the last empirically testable finite order are required to obtain ordered-metric scales for respondents and items that lie between ordinal and interval metrics. The higher the cardinality of the sets of items and subjects, the more these scales approximate an interval scale (Scheiblechner, 1999; Karabatsos, 2001). Indeed, if specific objectivity is redefined to account for discrete items and subjects, only an ordering of latent traits and item characteristics appears to be possible so that any metric property is lost (Irtel, 1987; Fischer, 1995; Scheiblechner, 1995; Karabatsos, 2001).

4.4. Interpretation of Rasch’s Constraint and Generalization to IRT Models

As suggested in the previous sections, when an infinite population $P$ is partitioned by means of equivalence classes $S_\alpha$ and $I_\delta$, IRT’s simple logistic model, or Rasch model, is the one that maximizes the number of possible outcomes (15) of the system when the constraint $h^{(1)}_{jk}$ has the form

\[
h^{(1)}(\alpha_j, \delta_k) = \frac{1}{\lambda}(\alpha_j - \delta_k). \tag{41}
\]

In particular, the RM follows from the general constraint

\[
H\big(\{\alpha_j\}, \{\delta_k\}, \{n_{jk}\}\big) = \sum_{jk} \frac{n_{jk}}{\lambda}(\alpha_j - \delta_k) = \mu, \tag{42}
\]

where $h^{(2)}_{jk} = 0$ has been kept for handiness of calculation. Since $n = \sum_{jk} n_{jk}$ is the total number of correct responses, the previous constraint can be related to the difference between the average of the latent trait, $\mu_\alpha = \sum_{jk} \frac{n_{jk}}{n}\alpha_j$, and the average of the item characteristic, $\mu_\delta = \sum_{jk} \frac{n_{jk}}{n}\delta_k$, over all the correct responses. That is,

\[
H\big(\{\alpha_j\}, \{\delta_k\}, \{n_{jk}\}\big) = \frac{n}{\lambda}\sum_{jk} \frac{n_{jk}}{n}(\alpha_j - \delta_k) = \frac{n}{\lambda}(\mu_\alpha - \mu_\delta) = \mu. \tag{43}
\]
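The identity between the cell-wise constraint (42) and its rewriting (43) in terms of averages can be checked numerically. The following is a minimal Python sketch under stated assumptions: the function names, the small matrix of correct-response counts, and the value of the multiplier are hypothetical and serve only to illustrate the algebra.

```python
def constraint_direct(alphas, deltas, counts, lam):
    """Equation (42): sum over cells of n_jk * (alpha_j - delta_k) / lambda."""
    return sum(
        counts[j][k] * (alphas[j] - deltas[k]) / lam
        for j in range(len(alphas))
        for k in range(len(deltas))
    )


def constraint_from_means(alphas, deltas, counts, lam):
    """Equation (43): (n / lambda) * (mu_alpha - mu_delta)."""
    cells = [(j, k) for j in range(len(alphas)) for k in range(len(deltas))]
    n = sum(counts[j][k] for j, k in cells)  # total number of correct responses
    mu_alpha = sum(counts[j][k] * alphas[j] for j, k in cells) / n
    mu_delta = sum(counts[j][k] * deltas[k] for j, k in cells) / n
    return n / lam * (mu_alpha - mu_delta)


if __name__ == "__main__":
    alphas = [-1.0, 0.0, 1.0]          # hypothetical latent trait classes
    deltas = [-0.5, 0.5]               # hypothetical item classes
    counts = [[3, 1], [5, 4], [8, 6]]  # hypothetical n_jk
    lam = 2.0                          # hypothetical multiplier
    print(constraint_direct(alphas, deltas, counts, lam))
    print(constraint_from_means(alphas, deltas, counts, lam))
```

The two functions return the same value up to floating-point rounding, which is the shift between the weighted averages scaled by $n/\lambda$.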

Hence, the constraint value $\mu$ corresponds to a shift between the averages. Notice that such a shift can derive both from the term $h^{(1)}_{jk}$, since $\sum_{j>k}(n_{jk} - n_{kj})(\alpha_j - \delta_k) \neq 0$ if $n_{jk} \neq n_{kj}$ as is usually expected, and from the term $h^{(2)}_{jk} \neq 0$, if present. In addition, the constraint (42) implies that the latent trait and item characteristic are measured on an interval scale so that the set of their differences, $\alpha_j - \delta_k$, is an infinite difference system. It is also worth noticing that a straight generalization of constraint (42) is given by

\[
\sum_{jk} \frac{n_{jk}}{\lambda}\, a_k(\alpha_j - \delta_k) = 0, \tag{44}
\]

where $a_k$ is the Birnbaum parameter for the item $k$ (see Birnbaum in Lord & Novik, 1968) and represents the discrimination power of a single item. Such a constraint corresponds to a weighted mean of both latent trait and item parameters. Another direct generalization, once one sets $\delta_k = \sum_l \omega_{kl}\eta_l$, leads to the linear logistic test model (LLTM, Scheiblechner, 1972; Fischer, 1973),

\[
\sum_{jk} \frac{n_{jk}}{\lambda}\Big(\alpha_j - \sum_l \omega_{kl}\eta_l\Big), \tag{45}
\]

where the $\eta_l$ are the basic parameters quantifying the cognitive operations that decompose the difficulty parameters of the items. The parameter $\omega_{kl}$ is just an occurrence or weight parameter. Once rewritten as

\[
\sum_{jk} \frac{n_{jk}}{\lambda}\alpha_j - \sum_l \tilde{\omega}_l \eta_l \quad \text{with} \quad \tilde{\omega}_l = \sum_{jk} \frac{n_{jk}}{\lambda}\omega_{kl}, \tag{46}
\]

it compares the average latent trait with the average of the cognitive operations. More generally (see Appendix C), using constraint (44) and accounting for lucky guesses and careless errors as a variation in the number of correct responses (15), the two-, three-, and four-parameter logistic IRT models (Barton & Lord, 1981) can be derived, corresponding to probability

\[
P\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) = b_k + \frac{c_k - b_k}{1 + \exp\big(a_k(\delta_k - \alpha_j)\big)}, \tag{47}
\]

where the parameters $b_k$, $c_k$ account for lucky guesses and careless errors, respectively. Finally, in the framework of MPD other important properties of RM, such as local stochastic independence and the sufficiency of the raw score statistics to identify the latent trait and item characteristic parameters, can still be identified. Local stochastic independence has been indeed assumed, although in a different shape, through the structure of multiplicity given in Equation (15) and by the additive independence in constraint (17). The number of possible outcomes is given for any cluster independently of the number of correct responses in all the other clusters. At the same time, the constraint works at a local level bounding the number of correct responses in each cluster. These local combinatorial and algebraic assumptions on the nature of the system imply local stochastic independence from a probabilistic point of view. Sufficiency of raw score statistics is, instead, a direct consequence of the very shape of Equation (38) and, hence, need not be assumed.

4.5. Rasch Models as Incomplete Pairwise Comparisons for an Infinite Population

It is also interesting to notice that, once one defines $A_j, D_k \in \mathbb{R}$ such that $A_j = \exp(\alpha_j)$ and $D_k = \exp(\delta_k)$, a multiplicative structure as in (18) can be given to constraint (42):

\[
\sum_{jk} \frac{n_{jk}}{\lambda}(\alpha_j - \delta_k) = \sum_{jk} \frac{n_{jk}}{\lambda}(\ln A_j - \ln D_k) = \frac{1}{\lambda}\sum_{jk} \ln\left(\frac{A_j}{D_k}\right)^{n_{jk}} = \frac{1}{\lambda}\ln \prod_{jk}\left(\frac{A_j}{D_k}\right)^{n_{jk}}, \tag{48}
\]

which allows the definition of a multiplicative independence constraint,

\[
\tilde{H}\big(\{A_j\}, \{D_k\}, \{n_{jk}\}\big) = \frac{1}{\lambda}\prod_{jk}\left(\frac{A_j}{D_k}\right)^{n_{jk}}, \tag{49}
\]

which is related to the probability

\[
P\left(X_{jk}^{\nu i} = 1; A_j, D_k\right) = \frac{1}{1 + \dfrac{D_k}{A_j}} \qquad \forall \nu \in S_{\alpha_j},\ i \in I_{\delta_k}, \tag{50}
\]

which implies measurability of $A_j$ and $D_k$ on ratio scales. Such a shape is followed, for instance, by the Bradley–Terry–Luce model (Bradley & Terry, 1952; Luce, 1959). Matrices (8) and (13) can indeed be extended to pairwise comparisons. If the population is defined as a generic set of objects $O$, the quantity $n_{jk}$ can be interpreted as the number of times (during $g_{jk}$ comparisons) in which object $O_j$ is preferred to $O_k$, so that the probability $P(O_j, O_k)$, defined by Equation (10), expresses the probability that the object $O_j$ is preferred to the object $O_k$ in the population (notice that in such a case the matrix is always square). Indeed, the simple logistic model can be seen as an incomplete case of pairwise comparisons when the population is split into two subsets $P = S \cup I$, and a multiplicative constraint is scaled to an additive one in order to maximize the multiplicity of the outcomes. Finally, it is interesting to note that the considerations previously given about the cardinality of $\{P_{jk}\}$ can be easily verified in probability (50): it is indeed sufficient to have countably infinite sets for $\{A_j\}$ and $\{D_k\}$ (or at least one of the two dense if the other is not) to define a dense set for their ratio (namely the odds) so that probability is defined on a dense set in the interval $[0, 1]$ and can then be extended by continuity.
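The equivalence between the additive form (38) and the ratio form (50) can be illustrated with a short Python sketch. It assumes only the substitutions $A_j = \exp(\alpha_j)$ and $D_k = \exp(\delta_k)$ stated above; the function names and numerical values are hypothetical.

```python
import math


def rasch_probability(alpha: float, delta: float) -> float:
    """Additive form, Equation (38)."""
    return 1.0 / (1.0 + math.exp(delta - alpha))


def ratio_form_probability(a_j: float, d_k: float) -> float:
    """Ratio form, Equation (50), with a_j = exp(alpha_j) and d_k = exp(delta_k)."""
    return 1.0 / (1.0 + d_k / a_j)


def preference_probability(a_j: float, d_k: float) -> float:
    """The same quantity written as a_j / (a_j + d_k), the pairwise-comparison shape."""
    return a_j / (a_j + d_k)


if __name__ == "__main__":
    alpha, delta = 0.7, -0.3  # hypothetical parameter values
    a_j, d_k = math.exp(alpha), math.exp(delta)
    print(rasch_probability(alpha, delta))
    print(ratio_form_probability(a_j, d_k))
    print(preference_probability(a_j, d_k))
```

All three expressions agree up to rounding, and the result depends on $A_j$ and $D_k$ only through the odds $A_j / D_k$, which is why a common rescaling of both leaves the probability unchanged.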

5. The Finite Population Case

It is not unusual to have finite populations in clinical psychology, health sciences, epidemiology, biology, sociology, etc. In such a case the number of subjects and items in the $(jk)$th cell, $g_{jk}$, is a finite number, and inequalities (24) are not satisfied by a unique value of $n_{jk}$, even if condition $\Delta h_{jk} = \nabla h_{jk}$ holds. There are indeed lower and upper bounds, respectively:

\[
n^{+}_{jk} = \frac{g_{jk} - \exp(-\lambda \Delta h_{jk})}{1 + \exp(-\lambda \Delta h_{jk})} \quad \text{and} \quad n^{-}_{jk} = \frac{1 + g_{jk}}{1 + \exp(-\lambda \nabla h_{jk})}, \tag{51}
\]

such that

\[
n^{+}_{jk} \leq n_{jk} \leq n^{-}_{jk}. \tag{52}
\]

In particular, $n^{+}_{jk}$ is the solution of the forward finite difference corresponding to the case $n_{jk} \to n_{jk} + 1$, while $n^{-}_{jk}$ is the solution of the backward finite difference corresponding to the case $n_{jk} \to n_{jk} - 1$. Hence, if one sets a value of $\lambda$ and is given a specific choice of $h_{jk}$ in the $(jk)$th cluster, different integers $n_{jk}$ might be acceptable as average numbers of correct responses since they would correspond to solutions that maximize the multiplicity (15). Indeed, when $\Delta h_{jk} = \nabla h_{jk} = h^{(1)}_{jk}$, inequality (52) can be rewritten as

\[
g_{jk} a_{jk} - b_{jk} \leq n_{jk} \leq g_{jk} a_{jk} + a_{jk} \tag{53}
\]

with

\[
a_{jk} = \frac{1}{1 + \exp\big(-\lambda h^{(1)}_{jk}\big)}, \qquad b_{jk} = \frac{\exp\big(-\lambda h^{(1)}_{jk}\big)}{1 + \exp\big(-\lambda h^{(1)}_{jk}\big)}, \tag{54}
\]

such that $a_{jk} + b_{jk} = 1$ and $a_{jk}, b_{jk} \in [0, 1]$. The previous equation can also be rewritten as

\[
g_{jk} a_{jk} \leq n_{jk} + b_{jk} \leq g_{jk} a_{jk} + 1. \tag{55}
\]

Since now $n_{jk} \in \{0, \ldots, g_{jk}\}$ and $g_{jk} \in \mathbb{N}$ is a finite value, if $h^{(1)}_{jk}$ is such that $g_{jk} a_{jk} = N \in \mathbb{N}$, then the previous inequality has a unique solution given by $n_{jk} = N \in \{0, \ldots, g_{jk}\}$ so that probability can be defined as in Equation (31) only for discrete values. If, instead, inequality (53) is rewritten as

\[
(g_{jk} + 1) a_{jk} \leq n_{jk} + 1 \leq (g_{jk} + 1) a_{jk} + 1, \tag{56}
\]

then, given $(g_{jk} + 1) a_{jk} = N \in \mathbb{N}$, there are two solutions, $n_{jk} = N - 1 = (g_{jk} + 1) a_{jk} - 1 \in \{0, \ldots, g_{jk}\}$ and $n_{jk} = N = (g_{jk} + 1) a_{jk} \in \{0, \ldots, g_{jk}\}$ (corresponding to $n^{+}_{jk}$ and $n^{-}_{jk}$), so that there are two different values of the probability that satisfy, at the same time, inequality (52). This result can also be generalized to any $(g_{jk} + \tau) a_{jk} = N \in \mathbb{N}$ with $\tau \neq 0$ such that $g_{jk} + \tau \in \mathbb{N}$. Finally, for any other value of $a_{jk}, b_{jk} \in [0, 1]$, such that $g_{jk} a_{jk} \notin \mathbb{N}$ or $(g_{jk} + \tau) a_{jk} \notin \mathbb{N}$, it can be seen that there is only one solution $n_{jk}$ that is associated with infinitely many different values of $h^{(1)}_{jk}$.
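The three situations just described can be enumerated directly. The following minimal Python sketch lists the integers $n_{jk}$ that satisfy inequality (53) for a given cell size and a given value of $a_{jk}$. As an assumption made only to keep the arithmetic exact, $a_{jk}$ is passed as a rational number rather than computed from $\lambda h^{(1)}_{jk}$; the function name `admissible_counts` is hypothetical.

```python
import math
from fractions import Fraction


def admissible_counts(g: int, a: Fraction) -> list:
    """Integers n in {0, ..., g} with g*a - b <= n <= g*a + a, where b = 1 - a.

    This is inequality (53) with a and b as defined in (54).
    """
    b = 1 - a
    lower = math.ceil(g * a - b)    # smallest integer above the lower bound
    upper = math.floor(g * a + a)   # largest integer below the upper bound
    return list(range(max(lower, 0), min(upper, g) + 1))


if __name__ == "__main__":
    half = Fraction(1, 2)
    print(admissible_counts(10, half))            # g * a is an integer
    print(admissible_counts(9, half))             # (g + 1) * a is an integer
    print(admissible_counts(10, Fraction(1, 3)))  # neither product is an integer
```

With these illustrative inputs the first and third calls return a single count, while the second returns two adjacent counts, matching the case in which $(g_{jk} + 1) a_{jk}$ is an integer.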

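Indeed, a bijective relation between the argument $h^{(1)}_{jk}$ and the value of $n_{jk}$ can be established only for the finite set of values corresponding to the ratios $n_{jk}/g_{jk}$. Furthermore, since $n^{+}_{jk}, n^{-}_{jk} \in \{0, \ldots, g_{jk}\}$ are defined on different domains $D_{n^{+}_{jk}}$ and $D_{n^{-}_{jk}}$, solutions $n_{jk}$ should be defined only in the intersection $D_{n^{+}_{jk}} \cap D_{n^{-}_{jk}}$. These problems of identifiability and domain propagate to the probability. Given the definitions

\[
P^{+}\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) = \frac{1 - \frac{1}{g_{jk}}\exp(-\lambda \Delta h_{jk})}{1 + \exp(-\lambda \Delta h_{jk})}, \tag{57}
\]

\[
P^{-}\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) = \frac{1 + \frac{1}{g_{jk}}}{1 + \exp(-\lambda \nabla h_{jk})}, \tag{58}
\]

which lead to the interval

\[
P^{+}\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) \leq P\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) \leq P^{-}\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right), \tag{59}
\]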

probability $P(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k)$ is uniquely defined only for a finite set of values and only in the intersection $D_{P^{+}_{jk}} \cap D_{P^{-}_{jk}}$. Hence, even if condition $\Delta h_{jk} = \nabla h_{jk}$ holds and $h_{jk}$ follows a linear shape like (29), when $g_{jk}$ is finite the bounds are different and the squeeze theorem cannot be applied. As a result, in between these bounds, probability possesses a unique analytic definition only for a finite set of values. Moreover, probability is defined on a limited domain $D_{P_{jk}}$. A direct example is given for RM.

5.1. Rasch Model and IRT Models for a Finite Population

In the specific case of constraint (41), the two bounds are given by the probabilities

\[
P^{+}\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) = \frac{1 - \frac{1}{g_{jk}}\exp(\delta_k - \alpha_j)}{1 + \exp(\delta_k - \alpha_j)}, \tag{60}
\]

\[
P^{-}\left(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k\right) = \frac{1 + \frac{1}{g_{jk}}}{1 + \exp(\delta_k - \alpha_j)}, \tag{61}
\]

that frame the simple logistic model as in Figure 1.

Figure 1. Simple logistic model (SLM) (infinite population case, $g_{jk} \to \infty$) and probability curves $P^{+}$ and $P^{-}$ plotted for $g_{jk} = 1$ and $g_{jk} = 10$ (finite population case), as a function of the difference $\delta - \alpha$.

Since probabilities $P^{+}_{jk}$ and $P^{-}_{jk}$ lie in the interval $[0, 1]$, it can be seen that their domains are defined as $D_{P^{+}_{jk}} = (-\infty, \ln g_{jk}]$ and $D_{P^{-}_{jk}} = [-\ln g_{jk}, \infty)$. Hence, $P(X_{jk}^{\nu i} = 1; \alpha_j, \delta_k)$ can be defined only in the intersection $D_{P_{jk}} = D_{P^{+}_{jk}} \cap D_{P^{-}_{jk}} = [-\ln g_{jk}, \ln g_{jk}]$ and takes shape (38) only for a finite set of values. This result can also be seen as an indeterminacy in the value of $\Delta_{jk} = \delta_k - \alpha_j$ given an expected value of $n_{jk}$, or of probability $P_{jk}$. The existence of a limited domain $[-\ln g_{jk}, \ln g_{jk}]$ in which a countable set of differences $\Delta_{jk}$ can be defined suggests that a finite population is not enough to define a unique value of probability out of a finite countable set of latent traits and item characteristics. Indeed, in the extreme case of $g_{jk} = 1$, probability is defined only for $\Delta_{jk} = 0$. As noticed by Landsberg (1954), when $g_{jk} = 1$ the multiplicity takes a value of one for all the admissible values of $n_{jk}$; and, hence, all the states are equally likely and there are no admissible solutions. It is, however, straightforward to see that higher values of $g_{jk}$ define tighter bounds around the simple logistic model and shape larger domains $D_{P_{jk}}$ so that the simple logistic model appears to be uniquely defined only when differences $\Delta_{jk}$ are allowed to vary in $\mathbb{R}$, namely in the limit of an infinite population $g_{jk} \to \infty$. A population size can be set above which the bounds can be considered to be tight enough to consider the Rasch model uniquely defined and to allow its application as a good approximation, thus neglecting, as in the infinite population case, the fluctuations around the equilibrium solution that in the finite population are not negligible (Huang, 1987).
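The behavior of the two bounds and of their common domain can be explored numerically. The following is a minimal Python sketch of Equations (60) and (61), written as functions of the difference $\delta_k - \alpha_j$; the function names and the grid of evaluation points are illustrative assumptions.

```python
import math


def slm(x: float) -> float:
    """Simple logistic model (38) as a function of x = delta - alpha."""
    return 1.0 / (1.0 + math.exp(x))


def p_plus(x: float, g: int) -> float:
    """Lower bound, Equation (60)."""
    return (1.0 - math.exp(x) / g) / (1.0 + math.exp(x))


def p_minus(x: float, g: int) -> float:
    """Upper bound, Equation (61)."""
    return (1.0 + 1.0 / g) / (1.0 + math.exp(x))


def common_domain(g: int) -> tuple:
    """Interval of x on which both bounds lie in [0, 1]."""
    return (-math.log(g), math.log(g))


if __name__ == "__main__":
    for g in (1, 10, 60):
        lo, hi = common_domain(g)
        # Width of the band between the bounds at x = 0.
        width = p_minus(0.0, g) - p_plus(0.0, g)
        print(f"g = {g:3d}: domain = [{lo:.3f}, {hi:.3f}], band width at 0 = {width:.4f}")
```

Under these definitions the band between the bounds narrows and the admissible domain widens as $g_{jk}$ grows, while for $g_{jk} = 1$ the domain collapses to the single point $\delta_k - \alpha_j = 0$.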
Nonetheless, it must be kept in mind that in a finite population the number of equivalence classes is also finite; so, although the logistic models could be applied, their fit should not be considered evidence of interval measurement since likely no more than an ordering of the attributes is achieved unless higher-order cancellation conditions are satisfied (Michell, 2009).

5.2. An Approximate Size for the Finite Population

It can be seen (see Appendix B) that, in the limit $n_{jk} \to \infty$, the variable $n_{jk}$ can be considered continuous and the condition $\Delta h_{jk} = \nabla h_{jk}$ requires the function $h(\alpha_j, \delta_k, n_{jk})$ to be differentiable in $n_{jk}$, so that the right and left derivatives both exist and are equal, $\partial^{+}_{n_{jk}} = \partial^{-}_{n_{jk}}$. Although in general such an approximation introduces spurious solutions, it yields the same results as the discrete case (while entailing handier calculations) when $h_{jk}$ is linear as in (29). In particular, it can be used to define a threshold size above which the results of the infinite population can be applied to a finite population. Indeed, the Rasch model (38) in this continuous case can be found by applying the Stirling approximation to the factorial, that is,

\[
N! \approx \sqrt{2\pi N}\left(\frac{N}{e}\right)^{N}, \tag{62}
\]

and then taking the derivatives with respect to $n_{jk}$ (see Appendix B). This is a common approximation that holds in the asymptotic limit of large $N$. Hence, when in a finite population the number of items and subjects is large enough, the logistic model can be considered a good equilibrium approximation. Usually, a value of $N \approx 30$ is considered to be acceptable since it leads to a discrepancy slightly lower than 0.3%. Thus, Equation (38) is a good description for a finite population when, on average, $g_{jk} - n_{jk} \approx 30$ and $n_{jk} \approx 30$, that is, $g_{jk} \approx 60$. In other words, if in the $(jk)$th cluster there are at least 60 cells, a finite population should already follow the same behavior as an infinite population. Such a requirement might be fulfilled in different ways, for instance, by 10 subjects having similar positions on the latent trait scale and six items having very close positions on the item characteristic scale or, equivalently, by 60 subjects in the population having similar positions on the latent trait scale for each single item.
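The size of the discrepancy quoted for $N \approx 30$ can be checked against the exact factorial. The following minimal Python sketch computes the relative error of approximation (62); the function names are hypothetical, and the relative error is taken here as the natural reading of "discrepancy", which is an assumption.

```python
import math


def stirling(n: int) -> float:
    """Stirling approximation (62): sqrt(2 pi n) * (n / e) ** n."""
    return math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n


def relative_discrepancy(n: int) -> float:
    """Relative error of (62) with respect to the exact factorial."""
    exact = math.factorial(n)
    return (exact - stirling(n)) / exact


if __name__ == "__main__":
    for n in (5, 10, 30, 60):
        print(f"N = {n:2d}: relative discrepancy = {100 * relative_discrepancy(n):.3f} %")
```

The leading term of this relative error is $1/(12N)$, so at $N = 30$ it falls just below 0.3% and keeps decreasing for larger cells.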

6. Conclusions

The most probable distribution method is a way of deriving a probability distribution by maximizing the multiplicity of the outcomes under the presence of some constraint on the system. Such a methodology corresponds to finding the distribution associated with the minimum available quantity of information regarding the system or, alternatively, the maximum uncertainty about its actual state (Huang, 1987). In the case of a dichotomous test, multiplicity (15) is the number of ways in which a certain number of correct responses to the items can be given by the subjects. The system is constrained using the latent traits and item characteristics. For an infinite population, logistic models are those that maximize this number of possible outcomes or, alternatively, that describe an equilibrium solution accounting for the maximum lack of information, or maximum randomness. Indeed, a logistic function is the inverse parameter mapping for a Bernoulli distribution, and exponential families can be derived as the maximum entropy solution in the presence of linear constraints on the expected value of their sufficient statistics (Kagan et al., 1973). In particular, multiplicity (15) describes a series of independent binomial distributions, each one corresponding to a cluster $(j, k)$; and within each cluster there are $g_{jk}$ Bernoulli trials, hence the result is a logistic function. This suggests that other non-logistic IRT models, in the infinite population limit $g_{jk} \to \infty$, might just be descriptions of systems with a different combinatorial nature, or out of this equilibrium condition, so they could be used to describe situations in which a different state of information is available. In addition, several different logistic IRT models can be derived, independently of their measurement properties, which should be otherwise assessed by testing the axioms of conjoint measurement.
Besides, the models have been derived without any assumption of continuity, monotonicity, and asymptotical behavior of the item response function. The same result does not hold, however, for a finite population since there is not a unique definition except for a finite set of values. Most of all, fluctuations around the average value are not negligible in a finite population so that distributions describing different states are not negligible. Nonetheless, an estimate of the number of elements in the population required to apply the logistic model with good approximation can be given. However, unless cancellation conditions of any order are tested and satisfied, latent traits and item characteristics in the finite case cannot be granted more than a qualitative ordering (Michell, 2009).

Indeed, it appears that quantitative measurement is not, in general, achieved unless the populations of latent traits and item characteristics can be assumed to be rich enough, and the constraint satisfies the axioms of conjoint measurement. The metric of latent traits and item characteristics appears to be connected to the metric of the constraint taken into consideration that, in its turn, affects the probability. In particular, the latter can be extended by continuity to the whole interval $[0, 1]$ only as long as a dense set of values for the constraint can be defined. Besides, a weak or strict order on the constraint $h^{(1)}(\alpha_j, \delta_k)$ is a situation in which conjoint measurement can be applied to assess the quantitative nature of the attributes. Notice also that testing the set of constraints $\{h^{(1)}_{jk}\}$ or the set of probabilities $\{P_{jk}\}$ should lead to the same result since there is a one-to-one relation between the two. In particular, probability in the model has been built as an expected proportion of subjects; thus, it can be the base of a numerical empirical structure since it can be empirically obtained (Luce & Narens, 1994). Besides, the constraint follows an additive composition rule given by Cantor’s first equation; thus, this should circumvent any other difficulty in conceiving probability as an empirical relational system (Kyngdon, 2008). On the other hand, such a test would correspond to a direct test of the composition rules that are hypothesized to underlie individual differences in test performance, as has already been suggested by Kyngdon (2011). On a general ground, it could also be conceived as a test of the composition rule of cognitive attributes that impose restrictions on the full randomness of the system.
For instance, the one-parameter logistic model, or Rasch model, can be derived as the model describing the minimum amount of information, under the constraint imposed by latent traits and item characteristics parameters, with the additional assumption of specific objectivity. This leads to the particular shape of the constraint (40) that can be used to assess the properties of the underlying attributes. For a dense population of latent traits and item characteristics, the metric appears to be on an interval scale. In this situation, indeed, the structure appears to be rich enough to satisfy the axioms of conjoint measurement. In particular, a ratio scale can be achieved when the constraint is given in a multiplicative form so that the simple logistic model becomes an incomplete case of pairwise comparisons. In such a case a dense set of values for the constraint is also immediately achieved, granting a probability defined on the whole interval $[0, 1]$. However, if only latent trait values are dense, while the item characteristic universe of equivalence classes is finite, a quasi-interval metric can be achieved, although the existence of latent trait values that do not correspond to any item must be hypothesized. Such a metric turns out to be somewhat weaker than an interval scale (Fishburn, 1981; Fischer & Molenaar, 1995) but gets stronger and stronger as the number of equivalence classes of item characteristics increases, approaching an interval scale in the infinite limit. Furthermore, in such an asymmetric situation, a perfect interval measurement is granted only if there are at least some item characteristics $\delta_k, \delta_{k'}, \delta_{k''} \in \mathbb{R}$ such that the ratio $(\delta_k - \delta_{k'})/(\delta_k - \delta_{k''})$ is irrational. If, instead, both latent traits and item characteristics are on an ordinal level but possess countably infinite levels, the constraint (and the probability) is still defined on a dense set and the metric can be promoted to an interval scale.
There is a unique monotonic transformation of the ordinal latent trait and item characteristic parameters used to order the population into a chosen equal interval scale as if they were a sort of initial scale (Fischer & Molenaar, 1995). Furthermore, it appears that a weakened interval metric scale can still be achieved, even when only one of the sets of latent trait and item characteristic values is countably infinite; however, probability becomes defined only on a discrete set of values. Finally, if there are finite sets of equivalence classes of latent traits and item characteristics, only a discrete set of probabilities can be defined; and there is no unique transformation leading into an interval scale so that the scale turns out to be ordinal. In such a situation, the quantitative nature of the attributes cannot be granted; and there is the risk of incurring the psychometricians’ fallacy (Michell, 2009), unless the data structure itself

is tested according to some compositional rule (Kyngdon, 2011). Besides, as noted in Section 4.3, the previous results closely parallel (although applied to equivalence classes and populations rather than data structures) the results of non-parametric item response models in which cancellation axioms up to the last empirically testable finite order are required to obtain ordered-metric scales that approximate, better and better, an interval scale with the increase in the number of respondents, items, and response categories (Scheiblechner, 1999; Karabatsos, 2001). It is also worth noticing how local stochastic independence, sufficiency of raw score for the statistics, and the criterion of specific objectivity can be described in this framework. The first can still be considered a fundamental assumption of any logistic model, yet it is introduced in an algebraic rather than a probabilistic way: multiplicity is given in terms of the number of possible responses for any combination of latent trait and item characteristic, but independently of the other combinations. The same holds for the constraint. This local algebraic independence implies local stochastic independence when probability is defined. Sufficiency of the raw statistic in the RM is, instead, a consequence of the shape of the model, given the specific constraint (42), and does not need to be assumed. Finally, specific objectivity is a property possessed only by a family of constraints that defines a family of models of which the Rasch model is one possible case. Another important observation is that the previous results are general, in that they could be applied to several situations depending on how populations and multiplicity are defined. If multiplicity is taken as the number of times an object from a given population is preferred to another one, the final result is the pairwise comparisons model for individual choices behavior (Luce, 1959). 
If, instead, multiplicity is a measure of the number of times a stimulus is correctly identified above the threshold, the result is the psychometric logistic curve. This generality relies on the fact that the most probable distribution method (or, more generally, maximum entropy) derives a probability distribution from an abstract variational principle; and, hence, it should be possible to extend the previous results beyond the dichotomous test to include different cases like the dispersion location model (Andrich, 1982), and the polytomous Rasch model or the partial credit model (Andrich, 1978, 1982; Masters, 1982). Most of all, the possibility of modifying the constraints to account for different effects could lead to alternative ways of shaping the probability distribution as the result of specific needs in the construction of a test. For instance, analogous derivations have been given for other logistic models of item response theory, or for the LLTM and the 2PL/Birnbaum model. As already noticed, however, their fit does not grant interval measurement since quantitative properties should be assessed by means of conjoint measurement testing of their constraints (Michell, 2009; Kyngdon, 2011). This also underlines the importance of defining a probabilistic framework to test the axioms of conjoint measurement, as in order-restricted statistical inference methods (Karabatsos, 2001; Davis-Stober, 2009). As a final note, the most probable distribution method is neither the only way of deriving the logistic function, nor the most elegant (Landsberg, 1954). For instance, in statistical mechanics, logistic functions can be derived by considering the probability distribution of a statistical ensemble, thus describing the mean properties of a system as averages over an ensemble of systems (Huang, 1987). Similarly, the approach of the maximum entropy principle in Bayesian theory would have been more general (Jaynes, 1957, 1968; Bernardo & Smith, 1994).
Although these would actually be more suitable and elegant frameworks, the older method of the most probable distribution does not require a formal definition of the concepts of ensemble or of entropy; it is often used in introductory courses on statistical mechanics since it has fewer formal requirements and is very useful for exploring the behavior of the system (Landsberg, 1954). This, however, suggests that IRT could have a more general and deeper connection with Bayesian probability modeling and maximum entropy, beyond the problem of parameters, a priori probabilities, or model estimation.
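The remark that sufficiency of the raw score follows from the shape of the model, rather than being assumed, can be illustrated numerically. The following is a minimal sketch, assuming the Rasch form $1/(1+\exp(\delta-\alpha))$ with the multiplier set to one and local independence across items; the function names and the item difficulties are hypothetical and chosen only for illustration. It computes the probability of each response pattern conditional on its raw score for two very different trait values and compares them.

```python
import itertools
import math


def rasch_prob(alpha, delta):
    """Rasch probability of a correct response, 1 / (1 + exp(delta - alpha))."""
    return 1.0 / (1.0 + math.exp(delta - alpha))


def pattern_given_score(alpha, deltas):
    """Probability of each response pattern conditional on its raw score."""
    joint = {}
    for pattern in itertools.product((0, 1), repeat=len(deltas)):
        prob = 1.0
        for x, delta in zip(pattern, deltas):
            p = rasch_prob(alpha, delta)
            prob *= p if x == 1 else 1.0 - p  # local independence (assumed)
        joint[pattern] = prob
    # Total probability of each raw score, used to condition on the score.
    score_totals = {}
    for pattern, prob in joint.items():
        score = sum(pattern)
        score_totals[score] = score_totals.get(score, 0.0) + prob
    return {pat: prob / score_totals[sum(pat)] for pat, prob in joint.items()}


deltas = [-1.0, 0.0, 0.5, 2.0]  # hypothetical item difficulties
low = pattern_given_score(-1.5, deltas)   # low latent trait
high = pattern_given_score(2.0, deltas)   # high latent trait
max_gap = max(abs(low[pat] - high[pat]) for pat in low)
print(max_gap)
```

Under these assumptions the two conditional distributions agree up to floating-point error: once the raw score is known, the response pattern carries no further information about the latent trait.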


Acknowledgements We wish to thank the two anonymous reviewers of the journal for their insight into the work and their helpful comments and suggestions.

Appendix A. Derivation of Condition (24)

Inequalities (22) and (23) in the case of Equation (21) (and omitting the constant term $\mu$ for simplicity of notation) become

\[
\ln\left[\frac{g_{jk}!}{n_{jk}!\,(g_{jk}-n_{jk})!}\right] + \lambda h(\alpha_j,\delta_k,n_{jk}) \ge \ln\left[\frac{g_{jk}!}{(n_{jk}+1)!\,(g_{jk}-(n_{jk}+1))!}\right] + \lambda h(\alpha_j,\delta_k,n_{jk}+1),
\]

\[
\ln\left[\frac{g_{jk}!}{n_{jk}!\,(g_{jk}-n_{jk})!}\right] + \lambda h(\alpha_j,\delta_k,n_{jk}) \ge \ln\left[\frac{g_{jk}!}{(n_{jk}-1)!\,(g_{jk}-(n_{jk}-1))!}\right] + \lambda h(\alpha_j,\delta_k,n_{jk}-1),
\]

so that expanding the logarithm and simplifying the common terms gives

\[
-\ln n_{jk}! - \ln(g_{jk}-n_{jk})! + \lambda h(\alpha_j,\delta_k,n_{jk}) \ge -\ln(n_{jk}+1)! - \ln(g_{jk}-n_{jk}-1)! + \lambda h(\alpha_j,\delta_k,n_{jk}+1),
\]

\[
-\ln n_{jk}! - \ln(g_{jk}-n_{jk})! + \lambda h(\alpha_j,\delta_k,n_{jk}) \ge -\ln(n_{jk}-1)! - \ln(g_{jk}-n_{jk}+1)! + \lambda h(\alpha_j,\delta_k,n_{jk}-1).
\]

Since, now, by definition of factorial, $\ln(n+1)! = \ln(n+1) + \ln n!$, the previous inequalities become

\[
-\ln(g_{jk}-n_{jk}) \ge -\ln(n_{jk}+1) + \lambda\left[h(\alpha_j,\delta_k,n_{jk}+1) - h(\alpha_j,\delta_k,n_{jk})\right],
\]

\[
-\ln n_{jk} + \lambda\left[h(\alpha_j,\delta_k,n_{jk}) - h(\alpha_j,\delta_k,n_{jk}-1)\right] \ge -\ln(g_{jk}-n_{jk}+1),
\]

which, by setting the forward and backward differences of $h_{jk}$ as

\[
\Delta h_{jk} = h(\alpha_j,\delta_k,n_{jk}+1) - h(\alpha_j,\delta_k,n_{jk}), \qquad \nabla h_{jk} = h(\alpha_j,\delta_k,n_{jk}) - h(\alpha_j,\delta_k,n_{jk}-1),
\]

can finally be rewritten as

\[
\frac{g_{jk} - \exp(-\lambda\Delta h_{jk})}{1+\exp(-\lambda\Delta h_{jk})} \le n_{jk} \le \frac{1+g_{jk}}{1+\exp(-\lambda\nabla h_{jk})},
\]

which when divided by $g_{jk}$ leads to condition (24).
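The double inequality above can be checked by brute force. The following is a minimal sketch, assuming the linear form $h(\alpha_j,\delta_k,n_{jk}) = n_{jk}(\alpha_j-\delta_k)$, so that both differences equal $d = \alpha_j - \delta_k$; this choice of $h$, the function names, and the numerical values are assumptions made only for illustration. The code maximizes $\ln W + \lambda h$ over all integers $0 \le n \le g$ and verifies that the maximizer falls between the two bounds.

```python
import math


def log_objective(n, g, lam, d):
    """ln of the binomial multiplicity plus lam * h, with assumed h = n * d."""
    log_w = math.lgamma(g + 1) - math.lgamma(n + 1) - math.lgamma(g - n + 1)
    return log_w + lam * n * d


def discrete_mode(g, lam, d):
    """Integer n in 0..g that maximizes the objective (first one on ties)."""
    return max(range(g + 1), key=lambda n: log_objective(n, g, lam, d))


def bounds(g, lam, d):
    """Lower and upper bounds on n from the double inequality."""
    e = math.exp(-lam * d)  # forward and backward differences of h both equal d
    return (g - e) / (1.0 + e), (1.0 + g) / (1.0 + e)


g, lam, d = 20, 1.0, 0.5  # hypothetical cluster size, multiplier, and gap
mode = discrete_mode(g, lam, d)
lower, upper = bounds(g, lam, d)
print(lower, mode, upper)
```

With a linear $h$ the two bounds are exactly one unit apart, so the interval pins down the most probable integer count except in the case of ties.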


Appendix B. Derivation of Equation (38) Applying Stirling Formula

The case of an infinite population can be described by the limits $n_{jk}\to\infty$, $g_{jk}\to\infty$ and $g_{jk}-n_{jk}\to\infty$. Under these limits, variations $\Delta n_{jk}$ in the discrete variables $n_{jk}$ are often considered to be negligible compared to the value of $n_{jk}$, namely $\Delta n_{jk} \ll n_{jk}$, so that one can directly consider the limit $\Delta n_{jk}\to 0$. In such a case the forward difference equation can be written as

\[
\Delta\left[\sum_{jk}\ln\frac{g_{jk}!}{n_{jk}!\,(g_{jk}-n_{jk})!} + \lambda\left(\sum_{jk} h(\alpha_j,\delta_k,n_{jk}) - \mu\right)\right] = 0,
\]

while the backward difference equation can be given by switching the operator $\Delta$ with $\nabla$. Since, now, due to linearity, variation passes under the sign of summation, the previous equation becomes

\[
\sum_{jk}\Delta\ln\left[\frac{g_{jk}!}{n_{jk}!\,(g_{jk}-n_{jk})!}\right] + \lambda\,\Delta\left(\sum_{jk} h(\alpha_j,\delta_k,n_{jk}) - \mu\right) = 0,
\]

and the summations can be dropped, leading to

\[
\Delta\left[\ln\frac{g_{jk}!}{n_{jk}!\,(g_{jk}-n_{jk})!} + \lambda\left(\sum_{jk} h(\alpha_j,\delta_k,n_{jk}) - \mu\right)\right] = 0,
\]

which can be rewritten as

\[
\Delta\left[\ln g_{jk}! - \ln n_{jk}! - \ln(g_{jk}-n_{jk})! + \lambda\left(\sum_{jk} h(\alpha_j,\delta_k,n_{jk}) - \mu\right)\right] = 0.
\]

The previous equation is usually approximated by means of the Stirling formula, $\ln N! \approx N\ln N - N$, in order to get rid of the factorials. In passing, notice that the Stirling formula misses a term $\tfrac{1}{2}\ln 2\pi N$ with respect to the approximation (62). This is not a problem since this additional term becomes negligible with respect to the leading terms in the limit $N\to\infty$. Hence,

\[
\Delta\left[g_{jk}\ln g_{jk} - g_{jk} - n_{jk}\ln n_{jk} + n_{jk} - (g_{jk}-n_{jk})\ln(g_{jk}-n_{jk}) + (g_{jk}-n_{jk}) + \lambda\left(\sum_{jk} h(\alpha_j,\delta_k,n_{jk}) - \mu\right)\right] = 0,
\]

which simplified gives

\[
\Delta\left[g_{jk}\ln g_{jk} - n_{jk}\ln n_{jk} - (g_{jk}-n_{jk})\ln(g_{jk}-n_{jk}) + \lambda\left(\sum_{jk} h(\alpha_j,\delta_k,n_{jk}) - \mu\right)\right] = 0.
\]

Since the forward difference implies $n_{jk}\to n_{jk}+\Delta n_{jk}$ (while the backward difference implies $n_{jk}\to n_{jk}-\Delta n_{jk}$), the equation becomes

\[
\begin{aligned}
&-(n_{jk}+\Delta n_{jk})\ln(n_{jk}+\Delta n_{jk}) + n_{jk}\ln n_{jk} - (g_{jk}-n_{jk}-\Delta n_{jk})\ln(g_{jk}-n_{jk}-\Delta n_{jk}) \\
&\quad + (g_{jk}-n_{jk})\ln(g_{jk}-n_{jk}) + \lambda\left[h(\alpha_j,\delta_k,n_{jk}+\Delta n_{jk}) - h(\alpha_j,\delta_k,n_{jk})\right] = 0,
\end{aligned}
\]


which, dividing by $\Delta n_{jk}$ and taking the limit $\Delta n_{jk}\to 0$, gives

\[
\ln\frac{g_{jk}-n_{jk}}{n_{jk}} + \lambda\,\partial^{+}_{n_{jk}} h(\alpha_j,\delta_k,n_{jk}) = 0,
\]

where, by definition of a derivative,

\[
\partial^{+}_{n_{jk}} h(\alpha_j,\delta_k,n_{jk}) = \lim_{\Delta n_{jk}\to 0}\frac{h(\alpha_j,\delta_k,n_{jk}+\Delta n_{jk}) - h(\alpha_j,\delta_k,n_{jk})}{\Delta n_{jk}},
\]

so that the forward finite variation $\Delta$ has been switched with the right derivative $\partial^{+}_{n_{jk}}$, thus giving

\[
n_{jk} = \frac{g_{jk}}{1+\exp\left(-\lambda\,\partial^{+}_{n_{jk}} h(\alpha_j,\delta_k,n_{jk})\right)}. \tag{B.1}
\]

A similar result can be obtained for the backward difference equation, so that the operator $\nabla$ can be switched with the left derivative $\partial^{-}_{n_{jk}}$, giving

\[
n_{jk} = \frac{g_{jk}}{1+\exp\left(-\lambda\,\partial^{-}_{n_{jk}} h(\alpha_j,\delta_k,n_{jk})\right)}. \tag{B.2}
\]

Hence, both Equations (B.1) and (B.2) are satisfied when the function $h(\alpha_j,\delta_k,n_{jk})$ is differentiable, so that both right and left derivatives exist and coincide, $\partial^{+}_{n_{jk}} h = \partial^{-}_{n_{jk}} h$, thus giving Equation (38).
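The passage to the continuous limit can be illustrated numerically. The following is a minimal sketch, again assuming the linear form $h = n_{jk}(\alpha_j-\delta_k)$ so that the derivative of $h$ is the constant $d = \alpha_j-\delta_k$; the function names and the values of $\lambda$ and $d$ are hypothetical. For growing cluster sizes it reports the relative error of the leading-order Stirling formula and the gap between the exact discrete maximizer, expressed as a proportion, and the logistic value implied by (B.1) and (B.2).

```python
import math


def stirling(n):
    """Leading-order Stirling approximation, ln N! ~ N ln N - N."""
    return n * math.log(n) - n


def discrete_mode(g, lam, d):
    """Integer n in 0..g maximizing ln C(g, n) + lam * n * d (assumed linear h)."""
    def objective(n):
        log_w = math.lgamma(g + 1) - math.lgamma(n + 1) - math.lgamma(g - n + 1)
        return log_w + lam * n * d
    return max(range(g + 1), key=objective)


def continuous_limit(lam, d):
    """Proportion n/g from (B.1)/(B.2) when the derivative of h equals d."""
    return 1.0 / (1.0 + math.exp(-lam * d))


lam, d = 1.0, 0.5  # hypothetical multiplier and trait-minus-difficulty gap
stirling_errors = {}
proportion_gaps = {}
for size in (10, 100, 1000, 10000):
    exact = math.lgamma(size + 1)  # ln(size!)
    stirling_errors[size] = abs(exact - stirling(size)) / exact
    mode = discrete_mode(size, lam, d)
    proportion_gaps[size] = abs(mode / size - continuous_limit(lam, d))
    print(size, stirling_errors[size], proportion_gaps[size])
```

Both quantities shrink as the cluster grows, which is the sense in which the discrete problem and its continuous approximation agree in the population limit.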

Appendix C. Other IRT Logistic Models

Other parameters in the logistic item response model can be introduced by modifying the constraint and the multiplicity in order to account for lucky guesses, careless errors, and single item discriminability.

C.1. Permutations

If in the $(jk)$th cluster there are $l_{jk}$ lucky guesses and $c_{jk}$ careless errors, it must be considered how $n_{jk}-l_{jk}$ responses can distribute themselves inside $g_{jk}-c_{jk}-l_{jk}$ cells in the matrix, so the total number of possible outcomes (15) becomes

\[
W\{n_{jk}\} = \prod_{jk}\binom{g_{jk}-c_{jk}-l_{jk}}{n_{jk}-l_{jk}} = \prod_{jk}\frac{(g_{jk}-c_{jk}-l_{jk})!}{(n_{jk}-l_{jk})!\,(g_{jk}-n_{jk}-c_{jk})!}.
\]
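The modified count can be made concrete with a small computation. The following is a minimal sketch of the multiplicity formula above; the function names and the example clusters are hypothetical, and the convention of returning zero for infeasible configurations is an assumption added for convenience.

```python
import math


def cluster_multiplicity(g, n, lucky, careless):
    """Ways to place n - lucky responses in g - careless - lucky free cells."""
    free_cells = g - careless - lucky
    free_responses = n - lucky
    if free_cells < 0 or not 0 <= free_responses <= free_cells:
        return 0  # infeasible configuration (convention assumed here)
    return math.comb(free_cells, free_responses)


def total_multiplicity(clusters):
    """Product of the cluster multiplicities over all (g, n, lucky, careless)."""
    return math.prod(cluster_multiplicity(*cluster) for cluster in clusters)


# Hypothetical clusters given as (g, n, lucky guesses, careless errors).
clusters = [(10, 5, 2, 1), (4, 2, 0, 0)]
single = cluster_multiplicity(10, 5, 2, 1)
total = total_multiplicity(clusters)
print(single, total)
```

When there are no lucky guesses and no careless errors the count reduces to the plain binomial coefficient of the dichotomous case.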

C.2. Constraints

Items possessing different discriminating power can be accounted for by means of constraint (44),

\[
\sum_{jk}\frac{n_{jk}}{\lambda}\,a_k(\alpha_j-\delta_k) = \mu,
\]

where the coefficient $a_k$ is the Birnbaum parameter (see Birnbaum in Lord & Novik, 1968) and becomes just a dilation parameter for all the items when $a_k = a$ for every item as in (44).


C.3. Derivation

For ease of calculation, one can follow the same continuous derivation given in Appendix B (omitting the constant parameter $\mu$ for simplicity of calculation) in the case of $n_{jk}\to\infty$, $g_{jk}\to\infty$ and $g_{jk}-n_{jk}\to\infty$. Due to linearity, the finite variation $\Delta$ passes under the sign of summation, which gives

\[
\sum_{jk}\Delta\left[\ln\frac{(g_{jk}-c_{jk}-l_{jk})!}{(n_{jk}-l_{jk})!\,(g_{jk}-n_{jk}-c_{jk})!} + \lambda\,\frac{n_{jk}}{\lambda}\,a_k(\alpha_j-\delta_k)\right] = 0;
\]

and the summation can be dropped, leading to

\[
\Delta\left[\ln\frac{(g_{jk}-c_{jk}-l_{jk})!}{(n_{jk}-l_{jk})!\,(g_{jk}-n_{jk}-c_{jk})!} + n_{jk}\,a_k(\alpha_j-\delta_k)\right] = 0,
\]

which can be rewritten as

\[
\Delta\left[\ln(g_{jk}-c_{jk}-l_{jk})! - \ln(n_{jk}-l_{jk})! - \ln(g_{jk}-n_{jk}-c_{jk})! + n_{jk}\,a_k(\alpha_j-\delta_k)\right] = 0.
\]

Since in the population limit the Stirling approximation can be applied to the logarithm of a factorial, the difference formula becomes

\[
\begin{aligned}
\Delta\big[&(g_{jk}-c_{jk}-l_{jk})\ln(g_{jk}-c_{jk}-l_{jk}) - (g_{jk}-c_{jk}-l_{jk}) - (n_{jk}-l_{jk})\ln(n_{jk}-l_{jk}) + (n_{jk}-l_{jk}) \\
&- (g_{jk}-n_{jk}-c_{jk})\ln(g_{jk}-n_{jk}-c_{jk}) + (g_{jk}-n_{jk}-c_{jk}) + n_{jk}\,a_k(\alpha_j-\delta_k)\big] = 0,
\end{aligned}
\]

which simplified gives

\[
\begin{aligned}
\Delta\big[&(g_{jk}-c_{jk}-l_{jk})\ln(g_{jk}-c_{jk}-l_{jk}) - (n_{jk}-l_{jk})\ln(n_{jk}-l_{jk}) \\
&- (g_{jk}-n_{jk}-c_{jk})\ln(g_{jk}-n_{jk}-c_{jk}) + n_{jk}\,a_k(\alpha_j-\delta_k)\big] = 0.
\end{aligned}
\]

Consider, now, $n_{jk}$ as a continuous variable, as has already been done in Appendix B, so that the finite variation $\Delta$ can be switched into the derivative $\partial_{n_{jk}}$, thus giving

\[
-\ln(n_{jk}-l_{jk}) - \frac{n_{jk}-l_{jk}}{n_{jk}-l_{jk}} + \ln(g_{jk}-n_{jk}-c_{jk}) + \frac{g_{jk}-n_{jk}-c_{jk}}{g_{jk}-n_{jk}-c_{jk}} + a_k(\alpha_j-\delta_k) = 0,
\]

which, simplifying, is

\[
\ln\left[\frac{g_{jk}-n_{jk}-c_{jk}}{n_{jk}-l_{jk}}\right] + a_k(\alpha_j-\delta_k) = 0,
\]

so that

\[
\frac{g_{jk}-n_{jk}-c_{jk}}{n_{jk}-l_{jk}} = \exp\left[a_k(\delta_k-\alpha_j)\right],
\]

which can be rewritten as

\[
g_{jk} - c_{jk} + l_{jk}\exp\left[a_k(\delta_k-\alpha_j)\right] = n_{jk}\left\{1+\exp\left[a_k(\delta_k-\alpha_j)\right]\right\},
\]

or equivalently as

\[
g_{jk} - (l_{jk}+c_{jk}) + l_{jk}\left\{1+\exp\left[a_k(\delta_k-\alpha_j)\right]\right\} = n_{jk}\left\{1+\exp\left[a_k(\delta_k-\alpha_j)\right]\right\},
\]


which gives

\[
n_{jk} = l_{jk} + \frac{g_{jk}-(c_{jk}+l_{jk})}{1+\exp\left(a_k(\delta_k-\alpha_j)\right)}.
\]
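The closed form for $n_{jk}$ can be verified against the stationarity condition from which it was obtained. The following is a minimal sketch; the function names and all numerical values are hypothetical and serve only to check the algebra. It evaluates the closed form and substitutes it back into $\ln[(g_{jk}-n_{jk}-c_{jk})/(n_{jk}-l_{jk})] + a_k(\alpha_j-\delta_k)$, which should vanish.

```python
import math


def most_probable_count(g, lucky, careless, a, alpha, delta):
    """Closed form n = l + (g - (c + l)) / (1 + exp(a * (delta - alpha)))."""
    return lucky + (g - careless - lucky) / (1.0 + math.exp(a * (delta - alpha)))


def stationarity_residual(n, g, lucky, careless, a, alpha, delta):
    """Left-hand side of ln[(g - n - c) / (n - l)] + a * (alpha - delta) = 0."""
    return math.log((g - n - careless) / (n - lucky)) + a * (alpha - delta)


# Hypothetical cluster: size, lucky guesses, careless errors.
g, lucky, careless = 1000, 150, 50
# Hypothetical discrimination, latent trait, and item difficulty.
a, alpha, delta = 1.2, 0.3, -0.4

n_star = most_probable_count(g, lucky, careless, a, alpha, delta)
residual = stationarity_residual(n_star, g, lucky, careless, a, alpha, delta)
print(n_star, residual)
```

The count always stays between $l_{jk}$ and $g_{jk}-c_{jk}$, which is what produces the lower and upper asymptotes once the expression is divided by $g_{jk}$.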

Since the proportion $c_{jk}/g_{jk}$ gives the probability of a careless error $P^{C}_{jk}$, and the proportion $l_{jk}/g_{jk}$ gives the probability of a lucky guess $P^{L}_{jk}$, dividing the previous equation by $g_{jk}$ returns the most probable distribution for the number of cells filled in a cluster,

\[
P\left(X_{j\nu ik} = 1;\ \alpha_j,\delta_k\right) = P^{L}_{jk} + \frac{1-\left(P^{L}_{jk}+P^{C}_{jk}\right)}{1+\exp\left(a_k(\delta_k-\alpha_j)\right)},
\]

which is exactly model (47), if one defines $b_k = P^{L}_{jk}$ and $c_k = 1 - P^{C}_{jk}$. Notice that the choice of omitting the index $j$ is just a simplification in which lucky guesses and careless errors are considered to depend only on the item difficulty.

References

Adams, E.W. (1965). Elements of a theory of inexact measurement. Philosophy of Science, 32(3), 205–228.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561–573.
Andrich, D. (1982). An extension of the Rasch model for ratings providing both location and dispersion parameters. Psychometrika, 47(1), 105–113.
Aczel, J. (1966). Lectures on functional equations and their applications. New York: Academic Press.
Aczel, J., & Dhombres, J. (1989). Functional equations in several variables. Cambridge: Cambridge University Press.
Bradley, R.A., & Terry, M.E. (1952). Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4), 324–345.
Bernardo, J.M., & Smith, A.F.M. (1994). Bayesian theory. Chichester: Wiley.
Barton, M.A., & Lord, F.M. (1981). An upper asymptote for the three-parameter logistic item-response model. Princeton: Educational Testing Service.
Clinton, W.L., & Massa, L.J. (1972). Derivation of a statistical mechanical distribution function by a method of inequalities. American Journal of Physics, 40, 608–610.
Davis-Stober, C.P. (2009). Analysis of multinomial models under inequalities constraints: applications to measurement theory. Journal of Mathematical Psychology, 53, 1–13.
Fischer, G.H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359–374.
Fischer, G.H. (1995). Some neglected problems in IRT. Psychometrika, 60(4), 459–487.
Fischer, G.H., & Molenaar, I.W. (1995). Rasch models: foundations, recent developments, and applications. New York: Springer.
Fishburn, P.C. (1981). Uniqueness properties in finite-continuous additive measurement. Mathematical Social Sciences, 1(2), 145–153.
Gonzales, C. (2000). Two factor additive conjoint measurement with one solvable component. Journal of Mathematical Psychology, 44, 285–309.
Holland, P.W. (1990). On the sampling theory foundations of item response theory models. Psychometrika, 55(4), 577–601.
Huang, K. (1987). Statistical mechanics. New York: Wiley.
Irtel, H. (1987). On specific objectivity as a concept in measurement. In E.E. Roskam & R. Suck (Eds.), Progress in mathematical psychology-1. Amsterdam: Elsevier.
Irtel, H. (1993). The uniqueness of simple latent trait models. In G.H. Fischer & D. Laming (Eds.), Contributions to mathematical psychology, psychometrics, and methodology. New York: Springer.
Jaynes, E.T. (1957). Information theory and statistical mechanics. The Physical Review, 106(4), 620–630.
Jaynes, E.T. (1968). Prior probabilities. IEEE Transactions on Systems Science and Cybernetics, 4(3), 227–241.
Kagan, A.M., Linnik, V.Y., & Rao, C.R. (1973). Characterization problems in mathematical statistics. New York: Wiley.
Karabatsos, G. (2001). The Rasch model, additive conjoint measurement, and new models of probabilistic measurement theory. Journal of Applied Measurement, 2(4), 389–423.
Kyngdon, A. (2008). The Rasch model from the perspective of the representational theory of measurement. Theory & Psychology, 18, 89–109.
Kyngdon, A. (2011). Plausible measurement analogies to some psychometric models of test performance. British Journal of Mathematical & Statistical Psychology, 64, 478–497.
Krantz, D.H., Luce, R.D., Suppes, P., & Tversky, A. (1971). Foundations of measurement. Vol. 1: Additive and polynomial representations. San Diego: Academic Press.
Landsberg, P.T. (1954). On most probable distributions. Proceedings of the National Academy of Sciences, 40, 149–154.
Lord, F.M., & Novik, M.R. (1968). Statistical theories of mental test scores. London: Addison-Wesley.
Luce, R.D. (1959). Individual choice behavior: a theoretical analysis. New York: Wiley.
Luce, R.D., Krantz, D.H., Suppes, P., & Tversky, A. (1990). Foundations of measurement. Vol. 3: Representation, axiomatization and invariance. San Diego: Academic Press.
Luce, R.D., & Narens, L. (1994). Fifteen problems concerning the representational theories of measurement. In P. Humphreys (Ed.), Patrick Suppes: scientific philosopher (Vol. 2). Dordrecht: Kluwer Academic.
Luce, R.D., & Tukey, J.W. (1964). Simultaneous conjoint measurement: a new scale type of fundamental measurement. Journal of Mathematical Psychology, 1, 1–27.
Masters, G.N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149–174.
Michell, J. (1990). An introduction to the logic of psychological measurement. Hillsdale: Erlbaum.
Michell, J. (2009). The psychometricians' fallacy: too clever by half? British Journal of Mathematical & Statistical Psychology, 62, 41–55.
Perline, R., Wright, B.D., & Wainer, H. (1979). The Rasch model as additive conjoint measurement. Applied Psychological Measurement, 3, 237–255.
Pfanzagl, J. (1971). Theory of measurement. Würzburg and Vienna: Physica-Verlag.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Nielsen & Lydiche.
Rasch, G. (1972). On specific objectivity. An attempt at formalizing the request for generality and validity of scientific statements. In M. Blegvad (Ed.), The Danish yearbook of philosophy. Copenhagen: Munksgaard.
Scott, D. (1964). Measurement structures and linear inequalities. Journal of Mathematical Psychology, 1, 233–247.
Suppes, P., & Zinnes, J.L. (1963). Basic theory of measurement. In R.D. Luce, R.R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology: Vol. 1. New York: Wiley.
Scheiblechner, H. (1972). Das Lernen und Lösen komplexer Denkaufgaben [The learning and solving of complex reasoning items]. Zeitschrift für Experimentelle und Angewandte Psychologie, 3, 456–506.
Scheiblechner, H. (1995). Isotonic psychometrics models. Psychometrika, 60, 281–304.
Scheiblechner, H. (1999). Additive conjoint isotonic probabilistic models. Psychometrika, 64, 295–316.
Tversky, A. (1967). A general theory of polynomial conjoint measurement. Journal of Mathematical Psychology, 4, 1–20.

Manuscript Received: 1 SEP 2012
Final Version Received: 1 FEB 2013
