This article was downloaded by: [New York University] On: 06 May 2015, At: 17:32 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Research Quarterly for Exercise and Sport Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/urqe20

Effects of Violating Local Independence on IRT Parameter Estimation for the Binomial Trials Model

Marilyn A. Looney (a) & Judith A. Spray (b)

(a) Department of Physical Education, Northern Illinois University, DeKalb, IL, USA

(b) Support, Technological Applications, and Research Department, Development Division, American College Testing, Iowa City, IA, USA

Published online: 26 Feb 2013.

To cite this article: Marilyn A. Looney & Judith A. Spray (1992) Effects of Violating Local Independence on IRT Parameter Estimation for the Binomial Trials Model, Research Quarterly for Exercise and Sport, 63:4, 356-359, DOI: 10.1080/02701367.1992.10608756 To link to this article: http://dx.doi.org/10.1080/02701367.1992.10608756


Research Quarterly for Exercise and Sport

© 1992 by the American Alliance for Health, Physical Education, Recreation and Dance

Vol. 63, No. 4, pp. 356-359

Effects of Violating Local Independence on IRT Parameter Estimation for the Binomial Trials Model


Marilyn A. Looney and Judith A. Spray

The appropriateness of the Binomial Trials Model for test data that consist of multiple attempts of the same item needs to be determined because the presence of learning or fatigue effects may violate the model's assumption of local independence. The purpose of this study was to determine what effect the severity of the violation of local independence (VLI), coupled with different sample size (SS), test length (TL), and test difficulty (TD), had on the estimation of the model difficulty parameter, b, using computer simulation techniques. Each of the following conditions was replicated 100 times under a completely crossed design: SS (100, 200, 500, 2,000); TL (5, 10, 20, 25 attempts); TD (-1.2, 0.0, 1.2); and VLI (from no violation to complete violation). Examinee ability or latent trait was pseudorandomly drawn from a standard normal distribution, and the b-parameter was estimated using a maximum likelihood procedure on generated test scores. Regardless of SS, TL, and TD, the b-parameter tended to be overestimated for situations in which the VLI condition simulated fatigue and underestimated when the VLI condition simulated a late-test learning or practice effect. The findings suggest that violations of local independence, at least as simulated in this study, could seriously bias the difficulty parameter estimates if all examinees tested exhibited the dependency.

Key words: item response theory, binomial trials model, local independence, Rasch Model

Submitted: November 11, 1991. Revision accepted: May 21, 1992. Marilyn A. Looney is an associate professor in the Department of Physical Education, Northern Illinois University, DeKalb, IL. Judith A. Spray is the assistant director of the Support, Technological Applications, and Research Department, Development Division, American College Testing, Iowa City, IA.

Discussion began in 1987 concerning the advantages and difficulties of applying item response theory (IRT) to psychomotor data (Disch, 1987; Safrit, 1987; Spray, 1987; Wood, 1987). Since that time few articles have appeared in Research Quarterly for Exercise and Sport to further the discussion and application of IRT (Safrit, Cohen, & Costa, 1989; Spray, 1990). The limited attention given to IRT by physical educators is troublesome because two of the advantages IRT has over classical test theory are (a) an estimate of an examinee's true score (i.e., latent trait) can be obtained that is independent of the test administered, and (b) item characteristics can be estimated that are independent of the populations tested. These are the invariant principles of IRT that make adaptive testing possible (i.e., a test can be "tailor made" for each examinee). Additionally, issues such as determining test bias, validating cutoff scores for criterion-referenced mastery tests, and measuring change can be addressed more satisfactorily with IRT (Spray, 1990; Wood, 1987). These issues can only be addressed after test items have been calibrated on a large group of examinees (200-500) (Hambleton, 1989, p. 171). However, item calibration should not occur unless IRT model assumptions are met.

Many IRT models assume that the examinees have some latent or unobserved unidimensional ability. This assumption, however, cannot be strictly met because other factors, such as motivation and anxiety, affect test performance. The assumption implies that there is a dominant factor, the ability measured by the test (Hambleton, 1989).

Local independence is another assumption common to many IRT models. For any given ability level (latent trait), the examinee's performance on one item is not influenced by the performance on any other item on the test (Hambleton, 1989). Within a specific psychomotor framework, this implies that there is no practice or fatigue effect in multiple trials of the same item (e.g., shooting 10 free throws in basketball). Previously, Spray (1990) stated that "for the purpose or usefulness of the (IRT) models, this assumption of local independence can usually be waived" (p. 166). However, this referred to the nominal trial-to-trial dependency that results from the effect of the knowledge of the results of a previous trial. It did not include a serious practice or fatigue effect, which could contaminate a series of repeated trials of some psychomotor task or skill.

The binomial trials model has been proposed for use with data generated by administering repeated trials of the same item, which is scored 1 or 0 (Masters & Wright, 1984; Spray, 1990). Before applying this model to empirical data, it is important to know how sensitive the model is to violations of the local independence assumption. This is particularly important because most multitrial data probably do manifest some degree of practice or fatigue effects. The purpose of this study was to determine what effect the severity of the violation of local independence, coupled with different sample size, test length, and test difficulty, had on the estimation of the difficulty parameter (b) for the binomial trials model.

Method

Two hundred forty computer simulations were conducted, where each simulation represented a condition from a completely crossed 4 x 4 x 3 x 5 design. Each condition was replicated 100 times.

Sample sizes of 100, 200, 500, and 2,000 were used. These sample sizes represent those that can be acquired more easily for psychomotor data (100, 200) and those that are commonly seen in the literature relating to cognitive data (500, 2,000). Larger sample sizes should result in smaller standard errors of item parameter estimates.

Test lengths of 5, 10, 20, and 25 attempts were utilized because they represent what is seen in practice. It is common for instructors to have students serve 5 times each to the right and left service court in tennis, serve 10 times in volleyball, and shoot 20 free throws in basketball. Twenty-five was viewed as the maximum number of attempts that an instructor would require.

Test difficulty represents the latent trait ability associated with a .5 probability of being successful given one attempt. Because values of b typically fall between -2.0 and 2.0 (Hambleton, Swaminathan, & Rogers, 1991, p. 13), -1.2, 0.0, and 1.2 were selected for the study. The easiest item is represented by -1.2, and 1.2 represents the most difficult item of the three.

A general model for item dependency developed by Spray (Ackerman & Spray, 1987) was used to generate dependent item responses during the last 20% of the trials. This model is described briefly in the appendix. The degree of dependency was controlled by manipulating the alpha and beta parameters ($\alpha$, $\beta$) of the dependency model. An independent trial condition (1, 1) was introduced into the data along with simulated practice effects, or increased probability of a correct response following a correct response ([1, .6], [1, 0]), and simulated fatigue, or increased probability of an incorrect response regardless of the previous response ([0, .6], [0, 1]). Dependency was introduced for all simulated examinees, which depicts the worst case scenario.

Examinee ability or latent trait was pseudorandomly drawn from a standard normal distribution (i.e., a random number subroutine generated ability scores), and the b-parameter of the binomial trials model was estimated using a maximum likelihood procedure on generated test scores. The binomial trials model is defined as:

$$
P(X_i = x_i \mid \theta_i, b) = \binom{n}{x_i} \frac{\exp[x_i(\theta_i - b)]}{[1 + \exp(\theta_i - b)]^{n}}, \tag{1}
$$

where $\theta$ is the latent trait, $b$ is test difficulty, $X$ is the number of successes, and $n$ is the number of attempts (Spray, 1990, p. 163).

Results and Discussion

The results were similar for all simulations of the specified true b-parameters (-1.2, 0.0, 1.2); therefore, only the results associated with $b = 0$ are presented. Estimation bias ($\hat{b} - b$) for $b = 0$ appears in Table 1. As would be expected, the bias in the b-estimates was negligible or close to zero when the trial responses were independent of each other.

Table 1. Estimation bias ($\hat{b} - b$) by degree of trial dependency

N       Trials   Independence  Simulated practice        Simulated fatigue
                 (1, 1)a       (1, .6)b     (1, 0)b      (0, .6)c     (0, 1)c
100     5        -.004         -.182        -.516        .313         .492
100     10       .001          -.143        -.439        .284         .427
100     20       .003          -.116        -.421        .321         .426
100     25       .003          -.114        -.419        .325         .423
200     5        -.005         -.185        -.518        .308         .492
200     10       .004          -.135        -.429        .292         .430
200     20       .009          -.111        -.412        .321         .429
200     25       .006          -.108        -.414        .330         .427
500     5        .000d         -.180        -.509        .324         .505
500     10       .005          -.133        -.429        .299         .437
500     20       .001          -.118        -.419        .313         .420
500     25       .003          -.112        -.418        .323         .424
2,000   5        .001          -.175        -.504        .326         .506
2,000   10       .001          -.138        -.433        .295         .432
2,000   20       .001          -.118        -.419        .312         .420
2,000   25       .001          -.115        -.420        .323         .423

Note. (1, 1) represents $\alpha$ and $\beta$ values for the dependency model. a Independent trial condition. b Increased probability of correct response following a correct response. c Increased probability of incorrect response following either a correct or incorrect response. d Rounded to zero.

The bias was not negligible, however, when the probability of getting a correct response following a previous correct response increased. The magnitude of the bias found under all simulated dependent conditions was greater than .10. When standardized (i.e., when the bias was divided by the standard error of estimate as given in Table 2), only 7 out of 64 simulated dependency conditions showed values less than 2.5. Therefore, this was considered significant bias. Under the simulated practice conditions, regardless of sample size and test length, test difficulty was underestimated, so that the test appeared to be easier than the nominal value of b would imply. Bias was greater for the most extreme case of trial dependence; however, the magnitude of the bias in the b-estimates did not shrink appreciably as sample size increased, and it decreased only slightly as test length increased.

The increased probability of getting an incorrect response following either a correct or incorrect response was viewed as a fatigue effect. Regardless of sample size and test length, test difficulty was overestimated; thus, the test appeared to be more difficult than the nominal value of b would imply. Once again, an increase in sample size had little effect on the magnitude of the bias in these estimates of b. Similarly, bias did not shrink appreciably when test length was increased from 5 to 25 trials. In other words, the magnitude of the bias did not reach the near-zero definition of negligible bias given previously.

Standard deviations of the b-estimates are presented in Table 2 for the b-parameter of 0.0. A decrease in standard deviations of .01 was chosen to be practically significant. As would be expected, standard deviations decreased with increasing sample size and varied little across conditions of dependency (.009 to .126). As test length increased from 5 to 10 trials for all sample sizes, standard error decreased. Decreases in standard errors were also seen as test length increased from 10 to 20 trials for sample sizes of 100, 200, and 500. An acceptable standard error is determined subjectively based on the types of decisions to be made with the data. The information in Table 2 can help researchers and test developers determine the sample sizes needed for their particular applications of the binomial trials model.

Table 2. Standard deviation of $\hat{b}$ by degree of trial dependency

N       Trials   Independence  Simulated practice        Simulated fatigue
                 (1, 1)a       (1, .6)b     (1, 0)b      (0, .6)c     (0, 1)c
100     5        .093          .102         .116         .126         .120
100     10       .068          .075         .080         .093         .083
100     20       .049          .050         .058         .059         .053
100     25       .045          .047         .051         .057         .052
200     5        .065          .071         .080         .093         .085
200     10       .053          .057         .061         .063         .059
200     20       .037          .037         .037         .043         .039
200     25       .033          .036         .035         .036         .033
500     5        .049          .051         .060         .056         .055
500     10       .033          .034         .038         .040         .038
500     20       .024          .025         .027         .030         .028
500     25       .021          .021         .022         .025         .023
2,000   5        .022          .024         .028         .032         .029
2,000   10       .015          .014         .016         .018         .017
2,000   20       .009          .010         .011         .013         .011
2,000   25       .009          .010         .010         .014         .012

Note. (1, 1) represents $\alpha$ and $\beta$ values for the dependency model. a Independent trial condition. b Increased probability of correct response following a correct response. c Increased probability of incorrect response following either a correct or incorrect response.

In Figure 1 the item characteristic curve for one trial is presented for two severe conditions of bias (N = 100, trials = 10). The vertical axis is the probability of being successful for any single trial. The binomial trials model assumes the probability for one attempt is constant for each subsequent attempt. When the assumption of local independence is violated, this probability is not constant across trials. Note how the logistic function shifts from the true condition ($b = 0$) to either one of positive bias ($\hat{b} = .43$) or negative bias ($\hat{b} = -.44$) when the assumption is violated by fatigue and practice effects, respectively.

[Figure 1. Examples of biased item characteristic curves when b = 0. Horizontal axis: $\theta$, from -3.0 to 3.0. Vertical axis: probability of success on a single trial, from 0.0 to 1.0.]

The major implication of the findings is that violations of local independence, at least as simulated in this study, could seriously bias the difficulty parameter estimates. It must be remembered, however, that this study simulated the worst case scenario, where all examinees tested exhibited dependency during the last 20% of the trials. Further work should investigate the bias that occurs if fewer than 100% of the examinees exhibit the dependency, which would be more realistic in most testing situations.

Several questions still need to be addressed. Will the bias in b-estimates be negligible when only a subset of examinees' responses violates the assumption of local independence? What is the maximum number of trials that can have item dependency without severely affecting estimates of test difficulty? How will the answers to these two questions interact to affect the b-estimates?

The binomial trials model has potential for use with psychomotor data, but the amount of dependency in item responses that can exist without seriously biasing the b-estimates still needs to be determined. Researchers cannot assume that practice and fatigue effects are negligible and proceed to calibrate psychomotor tests. At the very least they must demonstrate that trial responses are free of severe practice and fatigue effects before fitting the data to the binomial trials model.
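The estimation step described in the Method section can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' program: the article does not specify its maximum likelihood procedure, so the sketch treats the simulated abilities as known and maximizes the likelihood of Equation 1 over b alone. All function names (`p_success`, `binomial_trials_pmf`, `simulate_scores`, `estimate_b`, `standardized_bias`) are hypothetical.

```python
import math
import random


def p_success(theta, b):
    """Single-trial success probability under the one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def binomial_trials_pmf(x, n, theta, b):
    """Equation 1: probability of x successes in n independent attempts."""
    p = p_success(theta, b)
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def simulate_scores(thetas, b, n, rng):
    """Draw a number-correct score out of n independent attempts per examinee."""
    return [sum(rng.random() < p_success(t, b) for _ in range(n)) for t in thetas]


def estimate_b(scores, thetas, n, lo=-6.0, hi=6.0, tol=1e-8):
    """Maximum likelihood estimate of b with abilities treated as known.

    ASSUMPTION: known abilities are a simplification made for this sketch.
    The derivative of the log-likelihood in b is sum_i (n * p_i - x_i),
    which decreases as b grows, so bisection locates its root.
    """
    total = sum(scores)

    def score(b):
        return sum(n * p_success(t, b) for t in thetas) - total

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid  # expected successes exceed observed ones: b must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


def standardized_bias(bias, sd):
    """Bias divided by the standard deviation of the estimates."""
    return bias / sd


if __name__ == "__main__":
    rng = random.Random(1992)
    thetas = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # standard normal abilities
    scores = simulate_scores(thetas, 0.0, 10, rng)       # independent trials, b = 0
    print(estimate_b(scores, thetas, 10))
```

Under independent trials the estimate should fall close to the generating value of b. The last function mirrors the standardization used in the Results: for N = 100 and 5 trials under the (1, .6) condition, the Table 1 bias of -.182 divided by the Table 2 standard deviation of .102 gives about -1.78, which is below 2.5 in absolute value.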

References

Ackerman, T., & Spray, J. A. (1987). A general model for item dependency. ACT Research Report Series 87-9. Iowa City: American College Testing.

Disch, J. (1987). Recent developments in measurement and possible applications to the measurement of psychomotor behavior: A response. Research Quarterly for Exercise and Sport, 58, 210-212.

Hambleton, R. K. (1989). Principles and selected applications of item response theory. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 147-200). New York: Macmillan.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.

Masters, G. N., & Wright, B. D. (1984). The essential process in a family of measurement models. Psychometrika, 49, 529-544.

Safrit, M. J. (1987). The applicability of item response theory to tests of motor behavior. Research Quarterly for Exercise and Sport, 58, 213-215.

Safrit, M. J., Cohen, A. S., & Costa, M. G. (1989). Item response theory and the measurement of motor behavior. Research Quarterly for Exercise and Sport, 60, 325-335.

Spray, J. A. (1987). Recent developments in measurement and possible applications to the measurement of psychomotor behavior. Research Quarterly for Exercise and Sport, 58, 203-209.

Spray, J. A. (1990). One-parameter item response theory models for psychomotor tests involving repeated, independent attempts. Research Quarterly for Exercise and Sport, 61, 162-168.

Wood, T. M. (1987). Putting item response theory into perspective. Research Quarterly for Exercise and Sport, 58, 216-220.

Appendix

The item or trial dependency model is defined as follows. Let $P_j(\theta_i)$ represent the probability of an examinee with ability $\theta_i$ obtaining a successful performance on any trial $j$ according to Equation 1. To link the trials or any subset of trials in a dependent fashion, define transition probabilities between two trials, $j$ and $j-1$, as pictured below. Rows give the outcome on trial $j-1$ and columns give the outcome on trial $j$.

$$
\begin{array}{c|cc}
\text{Trial } j-1 & \text{Trial } j:\ 0 & \text{Trial } j:\ 1 \\
\hline
0 & 1 - \alpha'_{ij} & \alpha'_{ij} \\
1 & \beta'_{ij} & 1 - \beta'_{ij}
\end{array}
$$

In this model, $\alpha'_{ij}$ represents the probability that an examinee will move from an unsuccessful attempt or 0 on trial $j-1$ to a successful attempt or 1 on trial $j$. Similarly, $\beta'_{ij}$ represents the probability that an examinee will move from a successful attempt on trial $j-1$ to an unsuccessful attempt on trial $j$. The model then specifies that

$$
\alpha'_{ij} = \alpha P_j(\theta_i)
$$

and

$$
\beta'_{ij} = \beta Q_j(\theta_i),
$$

where $Q_j(\theta_i) = 1 - P_j(\theta_i)$.

The parameters, $\alpha$ and $\beta$, are used as dependency weights with $0 \le \alpha \le 1$ and $0 \le \beta \le 1$. These parameters establish the amount and direction of dependency in the trials. When $\alpha = \beta = 1$, the trials are independent; when $\alpha = \beta = 0$, the trials are completely dependent. Obviously, different degrees of trial-to-trial dependency can be achieved by letting $\alpha$ or $\beta$ independently take on values between 0 and 1.
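The transition rule above can be sketched in Python as follows. This is a hedged illustration rather than the authors' simulation code: the function names (`p_success`, `transition_prob_success`, `simulate_trials`) and the `dependent_fraction` argument are hypothetical, and the sketch assumes that the dependent segment is the final 20% of trials, rounded to a whole number of trials, as the Method section describes.

```python
import math
import random


def p_success(theta, b):
    """Single-trial success probability P_j(theta_i) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def transition_prob_success(prev, p, alpha, beta):
    """Probability of success on trial j given the outcome prev on trial j-1."""
    if prev == 0:
        return alpha * p              # alpha'_ij = alpha * P_j(theta_i)
    return 1.0 - beta * (1.0 - p)     # 1 - beta'_ij, with beta'_ij = beta * Q_j(theta_i)


def simulate_trials(theta, b, n, alpha, beta, rng, dependent_fraction=0.2):
    """Generate n scored trials (1 or 0); only the last trials are dependent.

    ASSUMPTION: the number of dependent trials is round(n * dependent_fraction).
    """
    p = p_success(theta, b)
    first_dependent = n - round(n * dependent_fraction)
    trials = []
    for j in range(n):
        if j > 0 and j >= first_dependent:
            prob = transition_prob_success(trials[-1], p, alpha, beta)
        else:
            prob = p                  # independent trial
        trials.append(1 if rng.random() < prob else 0)
    return trials


if __name__ == "__main__":
    rng = random.Random(7)
    # (alpha, beta) = (1, 0): a success in the dependent segment is always
    # followed by another success, the extreme simulated practice condition.
    print(simulate_trials(theta=0.0, b=0.0, n=10, alpha=1.0, beta=0.0, rng=rng))
```

With $\alpha = \beta = 1$ both transition probabilities reduce to $P_j(\theta_i)$, so the trials are independent; with $\alpha = \beta = 0$ each dependent trial simply repeats the outcome of the trial before it.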

Authors' Note

Please send all correspondence to Marilyn A. Looney, Department of Physical Education, Northern Illinois University, DeKalb, IL 60115.
