
Article

Using Rasch analysis to evaluate the validity of the diabetes-specific version of the Illness Perception Questionnaire–Revised

Journal of Health Psychology 0(0) 1­–17 © The Author(s) 2013 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/1359105313511840 hpq.sagepub.com

Gwyneth Rees1, Ecosse L Lamoureux1,2,3, Jing Xie1, Bonnie A Sturrock1 and Eva K Fenwick1

Abstract

This study used Rasch analysis to examine the psychometric validity of the Illness Perception Questionnaire–Revised to assess beliefs about diabetes in 470 participants with Type 2 diabetes and 71 participants with Type 1 diabetes. All Illness Perception Questionnaire–Revised scales had psychometric issues, which included poorly utilised response categories, poor scale precision and multidimensionality. Following reengineering, only four of the eight scales (Consequences, Illness coherence, Timeline cyclical and Emotional representations) were psychometrically adequate according to the Rasch model. The diabetes-specific version of the Illness Perception Questionnaire–Revised provides suboptimal assessment of beliefs held by patients with diabetes.

Keywords

diabetes, illness perception, psychometric evaluation, quantitative methods, Rasch analysis

Introduction

There is growing evidence to suggest that individuals’ beliefs about diabetes are associated with self-management behaviours and clinical outcomes such as glycaemic control (Broadbent et al., 2011; Hagger and Orbell, 2003; Hampson et al., 1990; Mc Sharry et al., 2011; Skinner et al., 2006). Tools to assess beliefs about diabetes have been developed, for example, the Personal Models of Diabetes Interview (Hampson et al., 1990) and the Diabetes Specific Illness Representations Questionnaire (Skinner et al., 2003). However, the most frequently used tool to assess patients’ beliefs about diabetes

1Centre for Eye Research Australia, Royal Victorian Eye and Ear Hospital, University of Melbourne, Australia 2Singapore Eye Research Institute, National University of Singapore, Singapore 3Duke-National University of Singapore Graduate Medical School, Singapore

Corresponding author: Gwyneth Rees, Health Services and Ocular Epidemiology Research Unit, Centre for Eye Research Australia, University of Melbourne, Royal Victorian Eye and Ear Hospital, Locked Bag 8, East Melbourne, VIC 8002, Australia. Email: [email protected]

Downloaded from hpq.sagepub.com at University of Texas Libraries on June 5, 2015


is the Illness Perception Questionnaire (IPQ) (Weinman et al., 1996). The IPQ was developed and validated across a number of patient groups, providing a generic (i.e. not disease-specific) quantitative assessment of illness perceptions outlined in theory (Leventhal et al., 2003; Weinman et al., 1996). A revised version (Illness Perception Questionnaire–Revised (IPQ-R)) was devised in an attempt to improve the measurement properties of the initial version, and a short version of the measure has also been formulated (i.e. Brief IPQ (BIPQ)) (Broadbent et al., 2006).

The IPQ and its more recent versions are widely used in diabetes research to gather patient beliefs about diabetes and, more recently, to inform intervention research. A recent review examining the link between patient beliefs and glycaemic control identified that the IPQ/IPQ-R/BIPQ had been used in 13 published studies (Mc Sharry et al., 2011). The IPQ-R was used most frequently, including in two intervention studies aimed at changing beliefs about diabetes (Mc Sharry et al., 2011).

Despite widespread use of these measures in diabetes-related research, the psychometric properties of these instruments have received little attention specifically within a population with diabetes. We are aware of only one validation study, which conducted confirmatory factor analysis on data from Turkish patients with diabetes and/or cardiovascular disease (Brzoska et al., 2012). All versions of the IPQ were developed and validated using Classical Test Theory (CTT), and recent studies have highlighted issues with content validity (French and Weinman, 2008; McCorry et al., 2013). CTT has a number of well-known limitations (Massof, 2004; Vanalphen et al., 1994), including the assumptions that (1) all items are of equal difficulty and therefore scored alike, (2) the change between Likert response categories (e.g. strongly disagree, disagree, agree, strongly agree) is uniform and therefore allocated scores can be summed to provide an overall score, and (3) the summed overall score represents interval-level measurement. Furthermore, CTT lacks strong indicators of scale validity or unidimensionality (Lamoureux and Pesudovs, 2011). Modern

psychometric methodologies, such as Rasch analysis (described further in the section ‘Research design and methods’), offer an alternative approach that overcomes the methodological problems associated with CTT, as well as providing a transformation of ordinal scores to data approximating interval measures. The use of Rasch analysis to examine the psychometric properties of measures commonly used in health psychology research is becoming more widespread (e.g. see Peter et al., 2013 for a report on Rasch analysis of the General Self-Efficacy Scale). However, to date, Rasch analysis has not been applied to any version of the IPQ-R, IPQ or BIPQ.

Given the growing interest in assessing and targeting patient beliefs about diabetes in intervention research, it is critical that rigorously validated instruments are used. Therefore, this study aimed to assess the validity, reliability and measurement characteristics of the IPQ-R using Rasch analysis in a large clinical sample of people with Type 1 and Type 2 diabetes. A secondary aim was to attempt to maximise the psychometric properties of the IPQ-R if required.

Research design and methods

Study design and participants

This cross-sectional study was part of the Diabetes Management Project (DMP), a large prospective, longitudinal study of English-speaking adults (aged >18 years) with Type 1 or 2 diabetes which aimed to determine barriers to optimal diabetes management. In total, 609 participants were recruited from specialised eye clinics at the Royal Victorian Eye and Ear Hospital (RVEEH) and Diabetes Australia–Victoria, Melbourne, Australia, between 2009 and 2010. Participants were eligible for the study if they were aged 18 years or older, English-speaking, free of significant hearing and cognitive impairment and living independently. Participants underwent a comprehensive assessment which included clinical, biochemical and


anthropometric measures, and interviewer-administered questionnaires on lifestyle, psychosocial factors, diabetes knowledge and self-care activities. The DMP cohort had a mean age of 64.60 years (±11.6 years), and 65.5 per cent (n = 399) were male. The majority (n = 510, 83.7%) had Type 2 diabetes. Details of the methodology have been fully outlined previously (Lamoureux et al., 2012). Each participant provided written informed consent, and ethical approval for the DMP was provided by the RVEEH Human Research and Ethics Committee (Project number 08/815H). The DMP protocol adhered to the tenets of the Declaration of Helsinki.

The IPQ-R

Beliefs about diabetes were measured using the diabetes-specific version of the IPQ-R (Moss-Morris et al., 2002). The diabetes-specific version of the IPQ-R has the same items as the original English version of the measure, but participants answer questions in relation to ‘diabetes’ rather than to their ‘illness’. The IPQ-R and items referred to in this article can be accessed on the IPQ website: http://www.uib.no/ipq/. The IPQ-R is a quantitative measure of illness perceptions covering nine dimensions in three sections. The first section is the Identity scale, which is assessed by a list of 14 symptoms (e.g. ‘Nausea’). Respondents are asked whether or not they believe the symptoms to be specifically related to their diabetes using a yes/no response format. The second section covers seven dimensions, all rated on a 5-point Likert scale: strongly disagree, disagree, neither agree nor disagree, agree and strongly agree. Scores are summed for each scale (after reverse scoring as necessary), and the mean value is calculated. The Timeline acute/chronic scale is a measure of the perceived duration of diabetes (5 items, for example, ‘I expect to have diabetes for the rest of my life’). The Timeline cyclical scale measures the extent of illness variability and unpredictability (4 items, for example, ‘The symptoms of my diabetes change a great deal from day to day’). The Consequences scale is a measure of the perceived impact the illness has on

a patient’s life (6 items, for example, ‘My diabetes is a serious condition’). The Personal control scale measures the patient’s perceived control over diabetes (6 items, for example, ‘Nothing I do will affect my diabetes’). The Treatment control scale measures the patient’s perception that treatment is effective in controlling diabetes (5 items, for example, ‘My treatment can control my diabetes’). The Emotional representations scale measures the emotional states that patients associate with their diabetes (6 items, for example, ‘When I think about my diabetes I get upset’). The Illness coherence scale measures the extent to which patients believe they have a coherent understanding of their diabetes (5 items, for example, ‘My diabetes doesn’t make any sense to me’). The third section comprises the Cause items, which ask respondents to rate the extent to which they believe the 18 items are the cause of their diabetes (e.g. ‘overwork’) using the same Likert scale described above. The Cause items are analysed in a different manner from the other eight scales, and therefore analysis of the Cause items is not reported here.

Psychometric validation of the IPQ-R

Psychometric evaluation of questionnaires using traditional methods, such as CTT, involves adding up a series of coded responses to items to produce a total raw score. As these raw scores are ordinal (rather than interval), they may not exhibit the essential properties of measurement. By using modern psychometric methods such as Rasch analysis, ordinal-level raw score data are converted into data approximating interval-level measures that demonstrate the essential features of measurement (Mallinson, 2007). We provide a summary of Rasch analysis here. The interested reader may also wish to refer to Tennant and Conaghan (2007), Da Rocha et al. (2013) and Pallant and Tennant (2007) for a more detailed introduction and explanation of the Rasch measurement model. Online courses are also available at http://www.statistics.com/rasch1/
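The ordinal-versus-interval point can be made concrete with the log-odds (logit) transformation that underlies Rasch scaling. The toy function below is our own illustration, not part of IPQ-R scoring, and ignores the joint person–item estimation that real Rasch software performs:

```python
import math

def raw_score_to_logit(raw_score: int, max_score: int) -> float:
    """First-approximation person logit: the log-odds of the proportion of
    the maximum raw score obtained. Extreme scores (0 or max) have no
    finite logit, which real Rasch software handles by extrapolation."""
    if raw_score <= 0 or raw_score >= max_score:
        raise ValueError("extreme scores have no finite logit estimate")
    p = raw_score / max_score
    return math.log(p / (1.0 - p))

# Equal raw-score steps are not equal logit steps: the interval stretches
# towards the extremes, which is why summed ordinal scores are not interval.
gaps = [raw_score_to_logit(r + 1, 10) - raw_score_to_logit(r, 10) for r in (5, 8)]
```

A one-point raw-score change near the middle of the range corresponds to a smaller logit change than the same raw-score change near the extremes.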


Under the Rasch model, items and persons are ranked on the same linear scale according to their ‘difficulty’ and ‘ability’, respectively, via a series of responses to items (Elliott et al., 2007). The Rasch model assumes that the probability of a given respondent affirming an item is a logistic function of the relative distance between the item’s location (i.e. difficulty) and the respondent’s location (i.e. ability) on this linear scale. Through the pattern of responses from respondents to items in an instrument, person ability and item difficulty calibrations are iteratively determined (Bond and Fox, 2001). The resulting response scale calibrations and person measures are expressed in log of the odds units, or Logits, positioned along a hierarchical scale (Prieto et al., 2003). This predictable pattern of responses is referred to as a Guttman scale and represents the expected pattern of data. Because the Rasch model is probabilistic not deterministic, some random variation in responses is expected. During Rasch analysis, the observed pattern of responses is compared to the pattern expected by the Rasch model using various fit statistics (outlined in detail below). If the observed responses differ markedly from the expected responses, then the items or response options in the questionnaire may be problematic and the scale may lack validity (Massof, 2002, 2004; Massof and Ahmadian, 2007; Vanalphen et al., 1994). As Rasch analysis is a strict form of item response theory where the data must fit the model, it ensures that a hierarchical structure of internal measurement is provided which enables subjective patient-reported outcomes to resemble objective scientific measurement. This is crucial for instruments used in intervention research, such as the IPQ-R. Rasch analysis also provides significant insight into the psychometric properties of the scale (Lamoureux and Pesudovs, 2011), as assessed by the following key parameters.
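The logistic relationship between person ability and item difficulty described above can be written down directly. The sketch below is our own illustration of the dichotomous Rasch model (the function name is not from the paper):

```python
import math

def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability that a person located at `ability` (in logits) affirms a
    dichotomous item located at `difficulty` (in logits) under the Rasch
    model: a logistic function of the person-item distance."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))
```

When ability equals difficulty the probability is exactly 0.5, and each additional logit of separation multiplies the odds of affirmation by e ≈ 2.72.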

Response category functioning

Category thresholds represent the boundaries between adjacent response categories and correspond to the point at which these two adjacent categories have an equal probability of selection. Theoretically, as participants with incrementally higher ‘ability’ are observed, each category of the scale in turn should be the most likely to be chosen. This can be inspected visually using category probability curves, in which categories are represented by peaks (Figure 1). If a peak is submerged, then at no point is this particular category ever likely to be chosen over its adjacent categories. Failure of the thresholds to advance in the expected order (i.e. ‘threshold disordering’) can be the result of over- or under-utilisation of response categories and an inability of respondents to discriminate between them. Disordered thresholds can be detrimental to measurement, and collapsing of adjacent categories may be necessary if it is logical to do so and the precision of the scale improves (Bond and Fox, 2001; Linacre, 2005). Rating scales with a ‘neutral’ category in the middle (e.g. neither agree nor disagree) are often disproportionately endorsed and are almost always disordered. This is because the neutral middle category acts like a cut-off point and does not have a measurement range like the other categories. Consequently, these types of rating scales usually need repairing. However, as it is not logical to collapse the middle category with either of the two adjacent categories (i.e. ‘agree’ and ‘disagree’), repairing such scales can be problematic. Two methods have been proposed, namely, to recode the middle response option as missing data (Pesudovs et al., 2010) and to merge 50 per cent of the data in the middle category to the adjacent categories on either side (assuming that the respondents might respond ‘agree’ or ‘disagree’ 50 per cent of the time if they were forced to choose when the neutral category did not exist) (Court et al., 2007).

Figure 1.  Category probability curves for the Emotional representations scale (Type 2 sample) showing (a) disordered thresholds and an underused category (Category 3, bottom line) and (b) ordered thresholds following the deletion of Category 3. Initially, Category 3 is never the most probable category to be chosen, irrespective of the respondent’s location along the scale. IPQ-R: Illness Perception Questionnaire–Revised.

Scale precision

The capacity of the IPQ-R to discriminate between respondents with different levels of ‘illness perception’ was assessed by the Person Separation Index (PSI) and Person Reliability (PR) scores. Values of >2.0 and >0.8, respectively, are considered minimally acceptable levels and represent the capacity of the scale to distinguish at least three levels of person ‘perception’ (Bond and Fox, 2001).
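The PSI and PR cut-offs are two views of the same quantity: under the standard Rasch definitions, a separation of G corresponds to a reliability of G²/(1 + G²), so PSI > 2.0 and PR > 0.8 describe the same level of precision. A sketch of these relations (function names are ours):

```python
import math

def separation_from_reliability(r: float) -> float:
    """Person separation G implied by person reliability R: G = sqrt(R / (1 - R))."""
    return math.sqrt(r / (1.0 - r))

def reliability_from_separation(g: float) -> float:
    """Person reliability R implied by separation G: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)

def distinguishable_strata(g: float) -> float:
    """Number of statistically distinct person strata: (4G + 1) / 3."""
    return (4.0 * g + 1.0) / 3.0

# PSI = 2.0 and PR = 0.8 are the same criterion, and correspond to the
# 'at least three levels of person perception' noted in the text.
```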

Unidimensionality

The Rasch model requires that a scale measures a single latent trait, that is, that it is unidimensional. Unidimensionality was assessed by testing the items for local independence using principal components analysis (PCA) of residuals. The raw variance explained by the first factor should exceed 50 per cent, and the unexplained variance in the first contrast should be <2.0 eigenvalues; a first contrast of >2.0 eigenvalues suggests evidence of another dimension in the scale with the strength of ~2 items, which is bigger than that expected by chance. How well each item ‘fits’ the underlying trait was determined through an ‘infit’ mean square (MnSq) statistic (Pesudovs et al., 2007). A value of 0.7–1.3 is considered productive for measurement (Linacre, 2002). Values below 0.7 may indicate redundancy, and values over 1.3 indicate an unacceptable level of measurement ‘noise’ in the responses.
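The infit statistic can be made concrete for the dichotomous case: it is the sum of squared score residuals divided by the sum of the model variances. The sketch below is our own simplified illustration (not the polytomous version that applies to the IPQ-R Likert scales, and the function name is ours):

```python
import math

def infit_mnsq(observed, abilities, difficulty):
    """Information-weighted (infit) mean-square for one dichotomous item:
    sum of squared score residuals divided by the sum of the model
    variances p * (1 - p) across persons."""
    sq_resid = 0.0
    total_var = 0.0
    for x, theta in zip(observed, abilities):
        p = 1.0 / (1.0 + math.exp(-(theta - difficulty)))  # Rasch expectation
        sq_resid += (x - p) ** 2
        total_var += p * (1.0 - p)
    return sq_resid / total_var

# Responses consistent with the model give MnSq near 1.0; overly
# deterministic (Guttman-like) response strings push MnSq below 0.7.
```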

Person fit

In addition to item fit, it is critical to assess whether participants respond to items in the manner predicted by the Rasch model. Misfitting person response strings were assessed through the ‘infit’ MnSq statistics for person measures. Values greater than 2.0 indicate data that degrade measurement, as the level of ‘noise’ is greater than the useful information. This may indicate that these participants did not understand the items, guessed, or lost interest in the questionnaire. Persons with an infit MnSq score > 2.0 were iteratively removed so long as this improved measurement precision (Linacre, 2010).
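The iterative person-removal step might be sketched as below. Note that real Rasch software re-estimates all measures and fit statistics after each removal; this simplified stand-in (names are ours) omits that re-estimation:

```python
def prune_misfitting_persons(infit_by_person, cutoff=2.0):
    """Iteratively drop the person with the largest infit MnSq until no
    remaining person exceeds the cutoff. `infit_by_person` maps a person
    identifier to that person's infit mean-square."""
    kept = dict(infit_by_person)
    while kept:
        worst_id, worst = max(kept.items(), key=lambda kv: kv[1])
        if worst <= cutoff:
            break
        del kept[worst_id]
    return kept
```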


Scale targeting

The difference between the mean of item ‘difficulty’ (defined as 0.0 logits) and the mean of person ‘ability’ gives an indication of a scale’s targeting. Generally, a difference between the mean person and item score of >1.0 logits indicates notable mis-targeting (Pesudovs et al., 2007). Poor targeting occurs when persons generally have a higher or lower ‘ability’ than the highest or lowest item difficulty threshold, or when items are clustered at particular levels of difficulty, leaving large gaps (Pesudovs et al., 2007), which can be observed through visual inspection of the person–item map.
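The basic targeting check reduces to simple arithmetic on the calibrated measures. A sketch under the convention that mean item difficulty is anchored at 0.0 logits (function names are ours):

```python
def targeting_offset(person_measures, item_measures):
    """Mean person 'ability' minus mean item 'difficulty', both in logits.
    By convention, Rasch software anchors the item mean at 0.0 logits."""
    mean_person = sum(person_measures) / len(person_measures)
    mean_item = sum(item_measures) / len(item_measures)
    return mean_person - mean_item

def is_mistargeted(person_measures, item_measures, threshold=1.0):
    """Apply the >1.0 logit rule of thumb for notable mis-targeting."""
    return abs(targeting_offset(person_measures, item_measures)) > threshold
```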

Differential Item Functioning

Differential Item Functioning (DIF) is a statistical method for detecting whether sample subgroups (e.g. gender, age groups) systematically respond differently to certain items, despite having similar underlying ability. A DIF contrast of >1.0 logits for an item was considered large enough that the DIF effect has a substantive impact on measures, and a p-value < 0.05 was considered small enough to rule out a spurious finding (Linacre, 2005). DIF for the IPQ-R was assessed for gender and age group (…)

… >0.7 and <2.0, as these responses are degrading to measurement. Item IP36 was deleted due to misfit, which considerably improved all other fit statistics. This resulted in good precision (PSI = 2.42), no evidence of multidimensionality, no misfitting persons or items, good targeting and no DIF for any of the assessed parameters. Similar modifications also improved the Consequences, Illness coherence, Timeline cyclical and Emotional representations scales in the Type 1 sample, although some minor issues with item misfit and targeting remained (Table 6). Modifications were iteratively applied to all other scales. Despite this, the Timeline acute/chronic, Personal control, Treatment control and Identity scales did not achieve adequate fit to Rasch model parameters, with poor precision and multidimensionality remaining problematic.
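The two-part DIF decision rule described here can be expressed directly. A sketch (the function name is ours, and the p-value would come from the Rasch software's DIF significance test):

```python
def has_substantive_dif(difficulty_group_a: float, difficulty_group_b: float,
                        p_value: float, contrast_cutoff: float = 1.0,
                        alpha: float = 0.05) -> bool:
    """Two-part DIF rule used in the study: the item's difficulty, calibrated
    separately in two subgroups, must differ by more than 1.0 logits AND the
    difference must be significant at p < 0.05."""
    contrast = abs(difficulty_group_a - difficulty_group_b)
    return contrast > contrast_cutoff and p_value < alpha
```

Requiring both conditions guards against flagging large but unstable contrasts in small subgroups, and against flagging statistically significant but trivially small contrasts in large ones.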



Figure 2.  Person–item map for the Timeline acute/chronic scale (Type 2 sample). To the left of the dashed line are the participants, represented by ‘#’ signifying nine participants and ‘.’ signifying one to eight participants, and on the right are the items, denoted by their item number. Participants with lower perceptions of this trait and the most ‘difficult’ items are near the top of the diagram, and participants with higher perceptions of this trait and items that can be performed with least ‘difficulty’ are near the bottom. This figure demonstrates that the means of person perception and item difficulty are very close. Importantly, however, 5 of the 6 items are located at the extreme ends of the spectrum and only 1 item, IP18, adequately targets participants’ responses on this trait, leaving large gaps in targeting. The poor targeting of item ‘difficulty’ to participant level of perception may explain the lack of precision displayed by this scale (PSI = 0.0). M: mean; PSI: Person Separation Index; SD: standard deviation; S: 1 SD from the mean; T: 2 SD from the mean.

