0895-4356/91 $3.00 + 0.00
Copyright © 1991 Pergamon Press plc

J Clin Epidemiol Vol. 44, No. 7, pp. 633-640, 1991
Printed in Great Britain. All rights reserved

RELIABILITY OF INTERVIEWER AND SUBJECT ASSESSMENTS OF NEVUS COUNTS IN A STUDY OF MELANOMA

S. D. WALTER,¹* L. D. MARRETT² and C. HERTZMAN³

¹Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Ontario, ²Ontario Cancer Foundation Epidemiology Research Unit, Department of Preventive Medicine and Biostatistics, University of Toronto, Toronto, Ontario and ³Department of Health Care and Epidemiology, University of British Columbia, Vancouver, B.C., Canada

(Received in revised form 21 December 1990)

Abstract. Several types of data are presented concerning the reliability of counting or estimating the density of nevi (moles), a major risk factor for melanoma, using methods typically employed in epidemiologic studies. First, interviewer-derived counts of nevi on the arm produced estimates of inter-observer, inter-subject, temporal and random variability, and their interactions. Second, interviewer-derived arm counts and respondent self-reports of whole body nevus density were compared. Finally, we compared male and female cases and controls with respect to their reported rates of having a relative with a malignant mole. Overall, the intra-observer reliability ranged from 55 to 81%, and was better for observers with more experience. The correlation between the interviewer counts and respondents’ self-reported estimates was 0.41. The data on malignant moles in relatives suggest higher reporting rates in male cases and lower reporting in male controls relative to their female counterparts, but there is little difference by sex in the reporting of one’s own nevus density.

Melanoma

Nevus

Reliability

INTRODUCTION

Having a large number of cutaneous nevi (moles) has been suggested as an important risk factor for melanoma. Although the precise role of the nevus in the etiology of cutaneous malignant melanoma is unknown, it apparently acts either as a precursor or as a risk marker. Several case-control studies have shown higher numbers of nevi in melanoma cases than in controls, and melanoma lesions are often found at the same body site as a nevus [1, 4]. Despite the potential importance of nevi in the etiology of melanoma, there is little information available on the reliability of reporting or counting them. The objective of this paper is, therefore, to present several types of data concerning self-reports, interview responses and physical counts of nevi and associated variables. The emphasis will be on the consistency and comparability of the various types of data, which were derived using methods typically employed in epidemiologic studies of melanoma. The data presented here are the first to be published on intra-observer, inter-observer, inter-subject and random variability of nevus assessment in the context of a case-control study, the main design used for etiologic studies of melanoma. We can also add to the single previous report on the consistency of different counting methods, including, for the first time, information on consistency in population controls.

*All correspondence should be addressed to: Dr S. D. Walter, Department of Clinical Epidemiology and Biostatistics, Health Sciences Centre, McMaster University, Hamilton, Ontario, Canada L8N 3Z5.


Previous literature on nevus assessment methods

Bain et al. [3] described the self-reported counts of all moles and of palpably raised moles for the arms, thighs and legs in a large cohort of U.S. nurses. Case-control relative risks were then derived for various indices of mole frequency, but there were no direct attempts to estimate the reliability of the data. The authors concluded that “concentrating on one limited and accessible site would presumably promote compliance and give more exact absolute counts. . . Our results . . . suggest that the arm or forearm may be the best site. . .”. Some information on the reliability of interviewer-derived nevus counts in the context of a general population survey is available from New Zealand [4]. Trained observers obtained whole body counts of pigmented nevi with a maximum diameter of at least 2 mm for approximately half the sample, and unrestricted counts for the remainder. Randomly selected participants were independently recounted; the inter-observer correlation for the counts on the upper limbs and back was 0.81, and the correlation for the unrestricted count of all moles was 0.89. Over 90% of a sample of flat, tan nevi ≥4 mm in diameter, initially identified by the interviewer, were confirmed from a photograph by a clinician as being consistent with a pigmented nevus classification. This appears to be the only previous study that has estimated the inter-observer reliability of nevus counts. Some limited information on the consistency of different methods exists from a case-control study [5] in which a subset of the cases first gave a self-assessment of their whole body mole density (by choosing one of three schematic diagrams) and were then also physically examined by a clinician. Concordance was very good, with 88% crude agreement on the mole density category. Unfortunately, the corresponding clinician information for controls is not available, because the physical examination was considered impractical.
A recent review [2] of studies relating nevi to melanoma identified two further studies where patient self-assessments had been used, but concluded that there was potential for bias in each. Three other studies relied on clinical observers (nurses, physicians or dermatologists), but each used different counting criteria, rendering them non-comparable. An overall conclusion of the review was that “evidence is lacking on the repeatability of the various procedures used for counting nevi in published studies”. The present paper is an attempt to fill this important gap in the literature.

METHODS

We present here three different types of data bearing on the reliability of mole reports, all obtained during an actual case-control study of melanoma. First, we give data on the reliability of nevus counts obtained by study interviewers during a quality control workshop. These data allow estimation of the inter-subject, inter-observer (interviewer) and random error components of variability in the nevus counts, from which reliability coefficients are then derived. Second, we present data from the main study on the relationship between counts of nevi made by interviewers at a single body site and self-reported estimates of whole body nevus density. Finally, we present information on the proportions of males and females who report having a parent or sib with a malignant mole. We would not expect these percentages to vary substantially by sex, so differences may indicate reporting bias.

Reliability of interviewer observer data

The interviewers were participating in a large population-based case-control study of melanoma in a six-county area of southern Ontario [6]. The bulk of the study data was obtained through a questionnaire administered by the interviewers. Additionally, the interviewers made physical assessments of skin colour and nevus density. We have reported elsewhere on the reliability of the skin colour data [7] and on other aspects of the interview methodology [8]. The interviewers were trained initially in the recognition of nevi by an expert dermatologist/pathologist, who was also a co-investigator on the project, and they periodically consulted the expert on nevus definition during the study.

A few months after the start of the study, a routine meeting of the investigators and staff was held to review various aspects of the study procedures and administration. One of the activities at the meeting was a workshop to evaluate the reliability of the physical measurements taken by interviewers. The nevus counts were made at the same time as the skin colour assessments, using the same methods as were employed in the actual study interviews. Five female interviewers participated as the observers in the workshop. Two had clinical backgrounds (in nursing) but the others had no clinical training; however, all had had experience with other health interview studies. The subjects were a convenience sample of 8 volunteers (4 male, 4 female) who were not participants in the main study. Mole density was not considered in their selection. However, to provide a range of skin colours, the sample included one person of Indian ethnicity and one of Philippine ethnicity; the other six subjects were white, with a variety of European ancestries.

The observers assessed the subjects in random order. Each subject was assessed twice by each observer over a 2-hour period, under constant lighting conditions. Several other activities took place between the first and second observation periods, which reduced the likelihood of observers remembering their initial assessments. The observers were instructed to record the total number of pigmented and palpably raised nevi on all surfaces of the left arm, within 4 inches above and below the elbow. A total of 80 counts were taken (5 interviewers × 8 subjects × 2 replicates).

The balanced factorial layout of the observations allows the use of analysis of variance (ANOVA) to estimate the components of variability associated with each of these factors and their interactions [9]. Subject and observer were taken as random factors. Their variance components were derived by equating each mean square to its expectation, expressed as an appropriate weighted combination of the variance components and the random error [9], similar to the approach described in our earlier report on skin colour [7]. The variance components estimate the absolute and relative amount of variability in the data that is associated with subjects, observers and random error, and they are also required for the estimation of reliability coefficients.

Comparison of interviewer counts and self-reports

Because of constraints of feasibility and acceptability, we had elected to use a nevus count from only a single body site during the main study. It has been shown [2, 3, 8, 10] that the relationship to melanoma is broadly similar for nevus counts at particular body sites or for the whole body, so that the arm count is probably the most reasonable practical alternative [3]. Nevertheless, we were aware that restricting the physical count to the arm might limit the usefulness of the data, because of the relatively small expected number of nevi. Therefore, during the main study, participants were also shown four mole diagrams (Fig. 1), and asked to say which one most closely resembled their body. These diagrams were based on those used in a previous study [5]. The question asked of participants was: “Could you please tell me which of these diagrams best describes the number of moles on your body, that is, any brown spots, either raised or flat, which do not come or go with sunlight?”

Interviewers informed respondents, when necessary, that the diagrams were not intended to correspond exactly to the location of nevi, but only to give a general impression of the overall density. The self-report question was designed for simple understanding, even though it includes flat spots, some of which may not be nevi. Although the self-report thus differs from the definition of a nevus used for the interviewer count, it is still possible to compare the two variables for ordinal correspondence. This comparison was made using the cases and controls in the main study.

Rates of reporting relatives with a malignant mole

As part of the interview during the main study, participants were asked if any of their parents, sisters or brothers had had a malignant mole. Although males have a lower incidence of, and a higher mortality from, melanoma than females, the proportions of males and females in the general population with an affected relative should be similar. We therefore compared our male and female cases and controls to examine the possibility of reporting bias. Such bias, if present, might also suggest bias in the self-reported nevus counts.

RESULTS

Reliability of interviewer observer data

Table 1 shows the ANOVA table derived from the quality control workshop data on the interviewer-derived nevus counts. The overall mean count was 1.12, and the overall standard deviation was 0.99. The effect of observer was not significant, but there was a significant interaction between subjects and observers (interviewers). In other words, some subjects were counted with significantly more nevi by some observers than by other observers. The overall error variance was 0.23, indicating an error (residual) standard deviation of 0.48 nevi, after allowing for the variation between subjects and observers, and their interaction.
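The variance-component arithmetic described under Methods can be sketched as follows. This is a minimal illustration, not the authors' software: it assumes a balanced subjects × observers × replicates layout with both subjects and observers random, and the standard expected mean squares for that design (of the kind tabulated by Winer [9]). The function names `anova_two_way_random`, `variance_components` and `single_observer_icc` are hypothetical, and truncating negative estimates at zero is a common convention assumed here rather than one stated in the text. The paper does not report the overall components, so the values derived from the rounded Table 1 entries are only illustrative; the single-observer calculation reproduces the 81% reliability shown for observer A in Table 2.

```python
import numpy as np


def anova_two_way_random(y):
    """Sums of squares and df for a balanced subjects x observers x replicates array.

    y[i, j, k] is the k-th count of subject i by observer j (hypothetical helper).
    """
    a, b, n = y.shape
    grand = y.mean()
    subj = y.mean(axis=(1, 2))   # subject means, shape (a,)
    obs = y.mean(axis=(0, 2))    # observer means, shape (b,)
    cell = y.mean(axis=2)        # subject-by-observer cell means, shape (a, b)
    inter = cell - subj[:, None] - obs[None, :] + grand
    return {
        "subject": (b * n * np.sum((subj - grand) ** 2), a - 1),
        "observer": (a * n * np.sum((obs - grand) ** 2), b - 1),
        "interaction": (n * np.sum(inter ** 2), (a - 1) * (b - 1)),
        "error": (np.sum((y - cell[:, :, None]) ** 2), a * b * (n - 1)),
        "total": (np.sum((y - grand) ** 2), a * b * n - 1),
    }


def variance_components(ms, a, b, n):
    """Method of moments: equate each mean square to its expectation.

    Assumed expected mean squares (both factors random):
      E(MS_error)       = s2_e
      E(MS_interaction) = s2_e + n*s2_so
      E(MS_subject)     = s2_e + n*s2_so + n*b*s2_s
      E(MS_observer)    = s2_e + n*s2_so + n*a*s2_o
    Negative estimates are truncated at zero (an assumed convention).
    """
    s2_e = ms["error"]
    s2_so = max((ms["interaction"] - ms["error"]) / n, 0.0)
    s2_s = max((ms["subject"] - ms["interaction"]) / (n * b), 0.0)
    s2_o = max((ms["observer"] - ms["interaction"]) / (n * a), 0.0)
    return {"subject": s2_s, "observer": s2_o, "interaction": s2_so, "error": s2_e}


def single_observer_icc(ms_subject, ms_error, n):
    """Intra-class correlation for one observer: E(MS_subject) = s2_e + n*s2_s."""
    s2_s = max((ms_subject - ms_error) / n, 0.0)
    return s2_s / (s2_s + ms_error)


# Mean squares rebuilt from the sums of squares and df in Table 1
# (8 subjects, 5 observers, 2 replicates).
table1_ms = {"subject": 32.5 / 7, "observer": 8.7 / 4,
             "interaction": 26.3 / 28, "error": 9.5 / 40}
vc = variance_components(table1_ms, a=8, b=5, n=2)

# Observer A (Table 2): MS for subjects 2.9, MS for error 0.3, two replicates.
icc_a = single_observer_icc(2.9, 0.3, n=2)

print({k: round(v, 3) for k, v in vc.items()})
print(round(icc_a, 2))
```

Given the 80 raw workshop counts (not reproduced in the paper) arranged as an array of shape (8, 5, 2), `anova_two_way_random` would return the sums of squares and degrees of freedom laid out as in Table 1; dividing each sum of squares by its df gives the mean squares that `variance_components` expects.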

Fig. 1. Diagrams used for respondent self-reports of whole body nevus density.

Table 1. ANOVA for interviewer nevus counts

Effect                             SS     df    MS     F ratio
Subject                            32.5    7    4.6    4.9† (**)
Observer                            8.7    4    2.2    2.3† (NS)
Subject by observer interaction    26.3   28    0.9    4.03‡ (**)
Error                               9.5   40    0.2
Total                              77.0   79

SS = sum of squares; df = degrees of freedom; MS = mean square. †Effect compared to interaction mean square. ‡Effect compared to error mean square. **p < 0.01; NS = not significant.

In view of the subject-observer interaction, we next carried out separate ANOVAs for each observer. The objective was to estimate the reliability of each observer, and to relate reliability to their level of previous experience with the observation method. At the time of the workshop, interviewers had completed between 2 and 19 field interviews. Table 2 shows the reduced ANOVA for the first observer (A), with highly significant variation between subjects. Also shown are the calculations for the variance components associated with subjects and random error, which are derived as follows. The expected mean square for error is simply $\sigma_e^2$, the residual variation, so the estimate of $\sigma_e^2$ is 0.3. The expected mean square for subjects is $\sigma_e^2 + 2\sigma_s^2$, where $\sigma_s^2$ is the between-subject variance [9]; the factor of 2 arises because there are two replicates for each subject by each observer. The estimate of $\sigma_s^2$ is therefore $(MS_S - MS_E)/2$, which is $(2.9 - 0.3)/2 = 1.3$ in these data.

To compute the reliability of single observations, we use the intra-class correlation [9], $\sigma_s^2/(\sigma_s^2 + \sigma_e^2)$, which is the percentage of variation associated with the subjects. The intra-class correlation is the analogue of the kappa statistic used with categorical data. In the data of Table 2, it is $1.3/(1.3 + 0.3) = 1.3/1.6$, or 81%. The remaining 19% is associated with residual (random) variation, after allowing for between-subject variation.

Table 2. ANOVA of nevus counts by one observer and associated components of variance

Effect      SS     df    MS     F          E(MS)                         Variance component
Subjects    20.4    7    2.9    9.3 (**)   $\sigma_e^2 + 2\sigma_s^2$    1.3 (81%)
Error        2.5    8    0.3               $\sigma_e^2$                  0.3 (19%)
Total       22.9   15

SS = sum of squares; df = degrees of freedom; MS = mean square; E(MS) = expected mean square. **p < 0.01.

Table 3 shows a synopsis of the same calculations for each of the observers, together with the number of interviews completed at the time of the workshop. There is a suggestion that reliability does increase with experience, although observer B is an exception to this. Observer B had the lowest error variance, and also the lowest mean nevus count. Re-examination of the original data showed that she had counted one particular subject as having no nevi on both occasions, whereas the other observers found an average of 1.6, with at least one nevus being found each time by each other observer. This discrepancy was largely responsible for the observer by subject interaction in Table 1. The subject in question was fairly heavily freckled, and later discussion with the expert dermatologist/pathologist revealed that observer B had had difficulty in distinguishing nevi from freckles in this subject.

Further ANOVAs were carried out to investigate the effect of time, i.e. the differences between the first and second rounds of observation. Such an effect, if significant, would suggest a learning effect or “drift” in the mean count. Time and the observer-time interaction were added to the overall analysis (of Table 1), but neither effect was significant. Similarly, the time effect was not significant in any of the observer-specific analyses (of Table 3), and the variance components for time were always very small. The subject-time interaction was not considered because subjects would not have changed their status during the short period of the observations.

In summary, three of the five interviewers had reliability of 75% or better. The other two had reliabilities of 55-60%; one of these (E) had only done 2 previous interviews, and the other (B) appeared to have had difficulty in differentiating nevi and freckles.

Table 3. Reliability of nevus counts, by observer

                                                          Variance components (%)
Observer   Number of interviews completed   Mean count    Subjects      Error
A          19                               0.9           1.3 (81%)     0.3 (19%)
B          16                               0.6           0.3 (57%)     0.1 (43%)
C           7                               1.5           0.9 (78%)     0.3 (22%)
D           7                               1.3           0.9 (76%)     0.3 (24%)
E           2                               0.8           0.3 (55%)     0.2 (45%)

Comparison of interviewer counts and self-reports

Table 4 shows the relationship between the mole diagram selected by respondents in the main study and the corresponding arm nevus count obtained by the interviewer. Diagram 1 was much more likely to be selected by cases and controls with no arm nevi found, and diagrams 3 or 4 were much more often chosen by subjects with 3 or more arm nevi counted. Overall, the contingency coefficient [11] (which measures the correlation between the two dimensions of the table, without regarding either of them as error-free) is 0.41, indicating a modest level of association. The contingency coefficient was similar in subgroups of the data defined by sex and disease status. Also, the pattern of mole diagram choice was almost identical for males and females in both the case and control groups.

Table 4. Relationship between self-reported estimate of whole body mole count (mole diagram) and physical mole count of the arm

                     Mole diagram
Mole count     1      2      3      4     Total
0            156    348     63      5      572
1-2           30    239     63      8      340
3+            12    130     88     33      263
Total        198    717    214     46     1175

Rates of reporting of relatives with a malignant mole

Table 5 shows the numbers of cases and controls who reported having a parent, sib or child with a malignant mole. Female controls were significantly more likely to report a parent with a malignant mole than were male controls. Also, significantly (p < 0.01) more male cases reported a parent with a malignant mole than did male controls, whereas the rates for female cases and controls were similar. None of the other differences between sexes within case and control groups was statistically significant. The data on sibs and children were more sparse, presumably because of their younger age and shorter time at risk of developing a malignant mole; however, these limited data were also in the direction of more reporting by the male cases relative to controls.

DISCUSSION

The workshop data are limited because the overall mean nevus count per subject was small. This means that the general variability is low, and higher counts in a few subjects could potentially have had a substantial impact on the overall analysis. Because of potential non-normality in the data (arising from the discrete nature of the outcome variable, with small expected nevus counts), further analyses were carried out using the data transformed by a logarithm or square root. Essentially the same conclusions were reached as those presented here; we have elected to show the untransformed data because the natural scale is easier to interpret. The counts might have been more stable if nevi had been counted on the entire arm, rather than on the more restricted skin area we used. Despite this limitation, the workshop data showed very good reliability for most of the observers. Improvement may be possible for less reliable observers if the responsible factor(s) can be identified. In our study, poorer reliability in two observers was probably due to inexperience and an incomplete understanding of the distinction between moles and freckles: both of these factors are avoidable in the future.

In the main interview study, there was a modest level of association between the subject self-description of whole body mole density (through the diagrams) and the interviewer-derived mole count on the arm. However, the association seemed strongest when considering people with very few or relatively many nevi; the latter group are at particularly high risk of melanoma. Bain et al. [3] have suggested using categories such as 0, 1, 2 and 3+ for self-reports of nevi, and for analysis. When interviewers obtain the data, we recommend making an exact count, even though the data may be grouped subsequently. Also, because there is less than perfect association between self-reports and interviewer assessments, further work is needed to clarify which of these measures (or both) might be most useful in predicting melanoma risk, both for etiologic research and for population screening purposes.

The data on relatives provide only indirect evidence of reporting bias for self-reports of mole density or counts, because these data refer to malignant moles in relatives, as opposed to any moles in the respondent him/herself. Accuracy of reports concerning other people (the relatives) may or may not be related to accuracy of self-reports. Also, we ultimately have no way of distinguishing more frequent reporting of true events from over-reporting (e.g. by including non-malignant skin conditions or non-melanoma skin cancer (NMSC)). The overall rates of reporting malignant moles in parents (3% and 7% in male and female controls, respectively) do suggest over-reporting. The cumulative risk of melanoma (from birth to age 74) is approximately 0.5% [11], which, since each respondent has two parents, would yield an expected rate of 1% or less for having at least one affected parent in the control group. However, the cumulative risk of non-melanoma skin cancer is much higher. In Australia [12] the incidence ratio of treated NMSC to melanoma is approximately 40; in Canada, the cumulative risk of NMSC to age 74 ranges from 4 to 13% [13]. Assuming a nominal figure of 10% for the cumulative risk, we might, by the same reasoning, expect up to 20% of the population to have a parent eventually affected by NMSC. It is therefore likely that at least some of the reported malignancies in our data were actually NMSC.

With this in mind, we note that there are two possible interpretations of the data. First, the greater reporting rates by female controls relative to male controls are consistent with females being more aware of health problems (either true malignancies or aesthetic effects) in others. Also, cases might have been expected to show higher reporting rates than controls because of their motivation to make enquiries concerning family members. In fact, this happened only in males.

Table 5. Numbers of cases and controls reporting a relative with a malignant mole, by sex

                          Controls                                   Cases
                   Males         Females       p-Value*     Males         Females       p-Value*
Affected parent
  Yes               9 (3.3%)     23 (7.3%)                   26 (9.4%)     20 (6.4%)
  No              264 (96.7%)   290 (92.7%)                 251 (90.6%)   291 (93.6%)
  Total           273 (100%)    313 (100%)     0.04         277 (100%)    311 (100%)     0.22
Affected sib
  Yes               2 (0.7%)      6 (1.9%)                   13 (4.8%)     14 (4.5%)
  No              273 (99.3%)   306 (98.1%)                 260 (95.2%)   294 (95.5%)
  Total           275 (100%)    312 (100%)     0.29         273 (100%)    308 (100%)     1.00
Affected child
  Yes               0 (0%)        4 (1.3%)                    4 (1.4%)      3 (1.0%)
  No              274 (100%)    310 (98.7%)                 275 (98.6%)   308 (99.0%)
  Total           274 (100%)    314 (100%)     0.13         279 (100%)    311 (100%)     0.71

*Fisher’s exact test.

So one conclusion is that females are more likely to report accurately, and that (relative to females) males under-report when controls and/or over-report when cases. A second possibility is that genetic and lifestyle variables that are risk factors for melanoma and that are correlated within families (e.g. vacation exposure to sunshine) could be expected to produce a larger percentage of affected relatives among the cases than among the controls. If this were so, we might infer that
it is the males who are relatively more accurate, and that female controls over-report more frequently than the other groups of subjects. Subgroup analysis of the data in Table 4 showed very little difference between male and female controls in their mole diagram choices or physical mole counts. The New Zealand survey [4] also indicated very little difference by sex in the whole body nevus count. In total, therefore, we feel that there is only limited evidence for a sex-specific bias in reporting one’s own nevi; however, both sexes appear to over-report malignant moles in others, with this tendency being more pronounced in females.

The preferred choice between self-reports and interviewer counts of nevi is unclear. There could be differential (case-control) bias in the self-reports, which would bias the comparison if the whole body diagrams (Fig. 1) are used. On the other hand, self-reports of nevus density using diagrams can be easily collected in mail or interview surveys, can convey information on the whole body, and are predictive of melanoma risk [3]. These advantages and the potential bias of the diagram method must be weighed against the relative unbiasedness of interviewer-derived mole counts for only a small portion of the body, and against how well each method predicts melanoma risk.

As Bain et al. noted [3], there is a general lack of information concerning the reliability of nevus counts. Only inter-observer reliability has been discussed previously [4], and that was in the context of a special community survey using whole body counts by trained clinical observers. There have been no previously published estimates of intra-observer variation, or of inter-observer and inter-subject variation in the context of a case-control study. There has also been only one previous report [5] on the consistency of two different methods of assessing nevi (self-report and physical examination); this was for a subset of cases only, so it could not address this issue for controls. Overall, our interviewer reliability on the nevus count for a limited area of one arm appears to be moderate to good. Although Bain et al. recommended using one limited and accessible site, we suggest that counting as much as possible of both arms may be desirable, in order to increase the average count per subject and thus yield more informative data for people with intermediate nevus densities. We also recommend that interviewer-examiners be well trained and practised in making nevus counts. Finally, additional quality control exercises of the kind described here might be contemplated during the execution of a study, so that reliability may be assessed and monitored.

Although other authors have not addressed the issue of nevus count reliability in detail, it is reasonable to suppose that less than perfect reliability in assessing nevi has prevailed in most previous case-control studies of melanoma. Despite this, nevus density has emerged as a relatively strong risk factor. One might imagine, therefore, that methods to obtain nevus counts with greater accuracy and reliability would provide an even stronger indicator of melanoma risk.

Acknowledgements. This study was supported in part by the National Cancer Institute of Canada and the Department of Health and Welfare, and by National Health Scientist and Scholar awards from the National Health Research and Development Program to SDW and LDM respectively. We would like to acknowledge the cooperation of Norma Mishkel (study coordinator), the study interviewers and the subjects in the workshop, and to thank Drs B. Pasternack and N. Dubin of the New York Medical Centre for permission to use their mole diagrams. The helpful comments of the referees in clarifying certain parts of the analysis are also acknowledged.

REFERENCES

1. Armstrong BK, English DR. The epidemiology of acquired melanocytic naevi and their relationship to malignant melanoma. In: Melanoma and Naevi: Incidence, Interrelationships and Implications. Pigment Cell 1988; 9: 27-47.
2. Swerdlow AJ, Green A. Melanocytic naevi and melanoma: an epidemiological perspective. Br J Dermatol 1987; 117: 137-146.
3. Bain C, Colditz GA, Willett WC et al. Self-reports of mole counts and cutaneous malignant melanoma in women: methodologic issues and risk of disease. Am J Epidemiol 1988; 127: 703-712.
4. Cooke KR. Frequency of benign pigmented naevi in the general population. In: Melanoma and Naevi: Incidence, Interrelationships and Implications. Pigment Cell 1988; 9: 8-26.
5. Dubin N, Moseson M, Pasternack BS. Epidemiology of malignant melanoma: pigmentary traits, ultraviolet radiation, and the identification of high-risk populations. In: Recent Results in Cancer Research, Vol. 102: Epidemiology of Malignant Melanoma. Berlin: Springer-Verlag; 1986: 56-75.
6. Walter SD, Marrett LD, From L et al. The association of cutaneous malignant melanoma with the use of sunbeds and sunlamps. Am J Epidemiol 1990; 131: 232-243.
7. Hertzman C, Walter SD, From L et al. Observer perception of skin color in a study of malignant melanoma. Am J Epidemiol 1987; 126: 901-911.
8. Walter SD, Marrett LD, Mishkel N. Effect of contact letter on control response rates in cancer studies. Am J Epidemiol 1988; 127: 691-694.
9. Winer BJ. Statistical Principles in Experimental Design, 2nd edn. New York: McGraw-Hill; 1971.
10. Swerdlow AJ, English J, MacKie RM et al. Benign melanocytic naevi as a risk factor for malignant melanoma. Br Med J 1986; 292: 1555-1559.
11. Cancer in Ontario 1988. Toronto: Ontario Cancer Treatment and Research Foundation; 1988.
12. Giles GG, Marks R, Foley P. Incidence of non-melanocytic skin cancer treated in Australia. Br Med J 1988; 296: 13-17.
13. Muir C, Waterhouse J, Mack T et al. Cancer Incidence in Five Continents, Vol. 5. IARC Sci. Publ. No. 88. Lyon: IARC; 1987.
