International Journal of Epidemiology © International Epidemiological Association 1992

Vol. 21, No. 6 Printed In Great Britain

Validity and Reliability of a Questionnaire on Fecundability

GERHARD A ZIELHUIS, MARLIES E J L HULSCHER AND ELLIE I M FLORACK

Infertility is a medical problem which is of growing interest to the public and the medical profession. In a decade when pregnancy is often postponed because of professional careers, the time span in which successful pregnancies are expected becomes shorter. In addition, the probability of conceiving decreases with increasing age. As a result, the proportion of couples who experience conception difficulties will increase and the demand for medical help will grow. Although several new therapies for infertility have become available in recent years, the overall success rate of therapy is still low, thereby increasing the need for prevention of infertility. Apart from infection, knowledge about risk factors for infertility which can be used in preventive actions is only fragmentary. Important risk factors include pelvic inflammatory disease, reproductive behaviour and age, but there is also some evidence that lifestyle factors (e.g. smoking, caffeine intake, psychotropic substances) and occupational factors may affect fertility.1-9

In clinical practice, infertility is defined arbitrarily as not being able to conceive within a period of 12 months of unprotected intercourse. Fertility, however, is in essence a continuous phenomenon: the probability of becoming pregnant in a given menstrual cycle. In biological terms this is referred to as fecundity, which is an individual characteristic. For the characterization of a population, 'fecundability' is used, i.e. the proportion of couples that conceive per cycle. This parameter has to be estimated indirectly by measuring the length of the period between the start of unprotected intercourse and the occurrence of pregnancy. This is called the time to pregnancy (TTP).4

Although the TTP has been studied in several selected populations in the USA and Great Britain,4,5,10,11 the validity of the instrument (personal or telephone interviews) has not yet been well established. Recently Baird et al. reported a correlation of 0.82 between a short, mailed questionnaire and a detailed telephone interview on TTP in a group of 523 dental assistants.12 In essence, however, both methods of data collection were retrospective.

Some years ago, a European Concerted Action was started (project leader: S Juul, Aarhus University, Denmark) with the assignment of studying the variability in fecundability among European countries in relation to the variability in potential risk factors. The general idea in this study is that, in each participating country, a random sample of about 1000 women of reproductive age will be questioned with a short standard questionnaire. A draft questionnaire was developed during a workshop in Aarhus in 1989. This paper describes a study on the validity and repeatability of the core questions of this questionnaire. A major issue was to study these characteristics with three different methods of data collection: face-to-face interviews, telephone interviews and mailed questionnaires.
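The link between the TTP and fecundability can be illustrated with a minimal sketch. It assumes, as a simplification that is not made in the text, that all couples share the same constant per-cycle probability of conception, so that the TTP counted in cycles follows a geometric distribution. The function name and the sample values are hypothetical.

```python
def estimate_fecundability(ttp_cycles):
    """Estimate the per-cycle conception probability from observed TTP values.

    Assumption (not from the paper): a homogeneous population with constant
    fecundability, so TTP in cycles (first cycle = 1) is geometric.
    Only couples who eventually conceived are represented.
    """
    if not ttp_cycles or any(t < 1 for t in ttp_cycles):
        raise ValueError("TTP values must be at least 1 cycle")
    # Maximum-likelihood estimate for a geometric distribution: n / sum(t_i)
    return len(ttp_cycles) / sum(ttp_cycles)


# Hypothetical TTP values (in cycles) for five couples
example_ttp = [1, 2, 2, 5, 10]
p_hat = estimate_fecundability(example_ttp)
print(f"Estimated fecundability: {p_hat:.2f} per cycle")
```

In real populations fecundability differs between couples, so an estimate of this kind is best read as a rough population average rather than a property of any individual couple.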

Department of Epidemiology, University of Nijmegen, PO Box 9101, 6500 HB Nijmegen, The Netherlands.


Downloaded from at Virginia Tech on May 14, 2015

Zielhuis G A (Department of Epidemiology, University of Nijmegen, PO Box 9101, 6500 HB Nijmegen, The Netherlands), Hulscher M E J L and Florack E I M. Validity and reliability of a questionnaire on fecundability. International Journal of Epidemiology 1992; 21: 1151-1156. The validity of a short questionnaire on fertility problems was tested on a group of 151 moderately educated women who had participated in a prospective study on early pregnancy failure some years before. Part of this group was approached by a personal interview, another part by telephone interviews and a third part was sent a questionnaire by mail. Reliability was tested on another population of 89 women with at least one child. The questionnaire was presented to these women twice with an interval of 3-5 weeks. The validity study showed no systematic errors for any of the approaches. Random errors were fairly large except for the personal interviews. The reliability study showed great stability for the telephone interviews as well as for the mailed questionnaire. It is concluded that the questionnaire performs reasonably well, but on some items the draft questionnaire should be rephrased. The choice of a specific approach to the study population depends on the research question.



MATERIALS AND METHODS The Questionnaire In designing the questionnaire on fecundability, an explicit attempt was made to keep it as short as possible. First, the at-risk status was established (ever been pregnant, ever experienced periods of unprotected intercourse). Then the accidental pregnancies were sorted out (pregnant while using a 'safe' birth control method). Subsequently, the respondent was informed about the term 'being at risk for pregnancy', followed by the two questions on the TTP:

Of the total group, 30 women who lived nearby were approached for a personal interview. Of the remaining population, 61 women were randomly selected for a telephone interview and 60 were sent a questionnaire by mail. The distribution of the differences between the two estimates of the TTP (the gold standard and the new questionnaire) formed the basis for statistical analysis. The different methods of measurement were compared with respect to the distribution of these differences.
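The analysis of paired differences described above can be sketched as follows. This is an illustration only: the sign convention (reference minus reported), the function name, the `tolerance` parameter and the sample values are assumptions and are not taken from the study.

```python
from statistics import mean, stdev


def summarize_differences(reference_ttp, reported_ttp, tolerance=5):
    """Summarize paired differences between two TTP measurements (months).

    Assumed convention: difference = reference (gold standard) - reported.
    `tolerance` is a hypothetical threshold for counting close agreement.
    """
    if len(reference_ttp) != len(reported_ttp):
        raise ValueError("Both lists must describe the same women")
    diffs = [ref - rep for ref, rep in zip(reference_ttp, reported_ttp)]
    # Share of women whose two values differ by less than `tolerance` months
    share_within = sum(abs(d) < tolerance for d in diffs) / len(diffs)
    return {
        "n": len(diffs),
        "mean": mean(diffs),        # systematic error
        "sd": stdev(diffs),         # random error (sample SD)
        "share_within": share_within,
    }


# Hypothetical data for four women
reference = [2, 6, 12, 1]
reported = [3, 6, 8, 1]
print(summarize_differences(reference, reported))
```

A mean close to zero points to little systematic error, while the SD describes the size of the random error, which is the distinction used throughout the results.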

The Validity Study The validity of this questionnaire was studied on a population of women who had participated in a prospective study on early pregnancy failures 1 or 2 years before (Florack et al., University of Nijmegen, unpublished manuscript). The original population consisted of non-medical hospital personnel planning to become pregnant and recruited from 39 hospitals in the Netherlands. Only women who did become pregnant during the follow-up period of the original study (n = 151) were selected for participation in the current validity study. From the original data set the TTP was calculated as the period between the first month of unprotected intercourse and the first positive pregnancy test (standard two-site immunoenzymometric 'sandwich-type' assay for human chorionic gonadotrophin, done 5 days or more after the expected menstruation date for every month that menstruation failed to come). These TTP values (in months) from the original study were used as the gold standard against which the new questionnaire was validated. Note that the start of these reference TTP values was also based on the interview information provided by the respondents at enrolment in the original study. At that time the women were not yet pregnant but many had been trying to conceive for several months.

RESULTS Validity Of the population of 151 women approached, 132 (87%) participated in the validity study. The response was slightly different in the three subgroups: face-to-face 77%, telephone 85% and mailed 95% (after two reminders). The three groups of respondents were comparable with respect to age (mean 30.2, 30.6 and 29.8 respectively), number of pregnancies (primiparae in 65%, 58% and 61% of the populations respectively) and education (57%, 63% and 54% secondary school or less). For 24 of the respondents (18%) a difference in the number of pregnancies between the two data sets was noticed. These respondents were excluded from further analysis, because the exact match of pregnancies in both data sets was lost. The pregnancies that were prospectively followed in the original study are called the index pregnancies. Figure 1 shows the distribution of TTP as reported in response to the questionnaire in the group of 100 women for whom complete data were available and for whom there was no disagreement between the two data sets on the number of pregnancies. Table 1 summarizes the results on the differences in the TTP for each measurement method in the same subset of 100 women.


The questions were repeated for each pregnancy. Subsequently, the respondents were questioned about the occurrence of periods of 6 months or longer of unprotected intercourse which did not result in a pregnancy. Some questions on potential risk factors and help-seeking behaviour were added to this core questionnaire.

The Reliability Study The reliability of the questionnaire was studied in another population of 89 mothers with children of primary school age. These women were recruited via one school in the centre of Holland without any restriction. With respect to parity, social class and type of work, the group was representative of Dutch women with children in this age group. For example, the median educational level was three on a scale of one (primary school) to six (at least high school). Only the telephone and mailed versions of the questionnaire were tested. The questionnaire was presented twice to the population with an interval of 3-5 weeks. The reliability of the TTP is described by means of the distribution of paired differences.

- Did you become pregnant in the first month of being at risk?
- How many months did it take to get pregnant, estimated from the first month of unprotected intercourse, without lactation, to the recognition of pregnancy?



FIGURE 1 Distribution of questionnaire-reported TTP on all index pregnancies with complete and interpretable data (n = 100). [Histogram: number of women (vertical axis) by time to pregnancy (horizontal axis).]

TABLE 1 Difference between prospectively measured time to pregnancy (TTP) of index pregnancy and retrospectively measured TTP according to method of data collection. Only women with complete information and agreement about number of pregnancies are included

                        Difference in TTP (months)
Method            n     Mean     95% CI          SD
Face-to-face      18     0.3     -0.4 to 1.0     1.4
Telephone         40    -1.4     -3.5 to 0.7     6.5
Mail              42    -0.1     -1.8 to 1.7     5.6
Total            100    -0.6     -1.7 to 0.5     5.5
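As a check on how the intervals in Table 1 relate to the reported means and SDs, the following sketch recomputes approximate 95% confidence intervals from the summary statistics. The paper does not state which method was used; the normal approximation below is an assumption, and the function name is hypothetical. The resulting intervals are close to, but slightly narrower than, the published ones for the smaller groups.

```python
from math import sqrt
from statistics import NormalDist


def mean_diff_ci(mean_diff, sd, n, level=0.95):
    """Approximate confidence interval for a mean difference.

    Uses a normal approximation (an assumption; a t-based interval
    would be somewhat wider for small n).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / sqrt(n)
    return mean_diff - half_width, mean_diff + half_width


# Summary statistics as reported in Table 1: (mean, SD, n)
table1 = {
    "Face-to-face": (0.3, 1.4, 18),
    "Telephone": (-1.4, 6.5, 40),
    "Mail": (-0.1, 5.6, 42),
    "Total": (-0.6, 5.5, 100),
}

for method, (m, sd, n) in table1.items():
    low, high = mean_diff_ci(m, sd, n)
    print(f"{method:<13} {low:5.2f} to {high:5.2f}")
```

All four intervals include zero, which agrees with the statement that the mean differences were not significantly different from zero.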

The mean difference between the 'gold standard' TTP and the 'questionnaire' TTP in these index pregnancies was small and not significantly different from zero. Although the standard deviations (SD) were rather high for the telephone and mail approaches, the disagreement was less than 5 months in either direction for 80% of the total group. A comparison of the TTP data from both sources of information about non-index pregnancies gave similar results. Note that in this case both sources relied on retrospective recall in interview situations. For the women's first pregnancy (n = 122), for instance, the mean differences in the TTP were 0.06 months for

face-to-face interviews (SD = 1.1), -2.5 for telephone interviews (SD = 6.5) and -0.6 for mailed questionnaires. Only the mean difference for telephone interviews was significantly different from zero. Table 2 presents the validity of the questionnaire for each of the three modes of approach after dichotomization at different cutoff levels. These results are based on the same 100 index pregnancies. It shows that for high cutoff levels (12 months or higher) sensitivity and specificity are high for each method of approach. At cutoff levels of 3 or 6 months TTP, the performance of the personal interview remains good, while it reduces for telephone and mailed approaches. With respect to other periods of at least 6 months of unprotected intercourse which did not lead to pregnancy, the agreement between the two data sets was very poor. For example, one woman stated in the original study that she had experienced a waiting period of 18 months before the index pregnancy. In the second study, the same woman reported a period of 30 months. In addition, 19 other women (15 in the mailed questionnaire group) who did not mention such waiting periods in the original study, reported waiting periods varying from 6 to 48 months. Reliability In the reliability study 89 women were contacted. Of these, 76 (85%; 34 by telephone and 42 by mailed ques-





TABLE 2 Sensitivity and specificity (%) at different cutoff levels for the estimation of reduced fecundability by method of data collection. Only index pregnancies with complete information are included (n = 100)

Method: Face-to-face (n = 18)
TTP cutoff level    Sensitivity      Specificity
3 months            91% (10/11)      86% (6/7)
6 months            80% (4/5)        92% (12/13)
12 months           100% (2/2)       100% (16/16)
24 months           100% (0/0)       100% (18/18)
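The dichotomization behind Table 2 can be sketched as follows. The paper does not state whether a TTP equal to the cutoff counts as reduced fecundability; the strict inequality used here, the function name and the sample data are assumptions made for illustration.

```python
def sensitivity_specificity(reference_ttp, reported_ttp, cutoff):
    """Sensitivity and specificity of the reported TTP after dichotomization.

    Assumption: 'reduced fecundability' means TTP > cutoff (months), judged
    against the reference (gold standard) TTP.
    Returns None for a measure whose denominator is zero.
    """
    tp = fn = tn = fp = 0
    for ref, rep in zip(reference_ttp, reported_ttp):
        ref_pos = ref > cutoff
        rep_pos = rep > cutoff
        if ref_pos and rep_pos:
            tp += 1
        elif ref_pos:
            fn += 1
        elif rep_pos:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical reference and reported TTP values (months) for six women
reference = [1, 2, 4, 8, 14, 30]
reported = [1, 5, 3, 9, 12, 28]
for cutoff in (3, 6, 12, 24):
    print(cutoff, sensitivity_specificity(reference, reported, cutoff))
```

Where no woman exceeds the cutoff according to the reference, sensitivity is undefined; the sketch returns None in that case, whereas Table 2 prints such a cell as 100% (0/0).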

TABLE 3 Reliability of time to pregnancy: disagreement in duplicate measurements with an interval of 3-5 weeks

Pregnancy    Approach                Mean difference in
number       (twice)          n      reported TTP (months)    SE*
1            Telephone        33     -0.21                    0.66
             Mail             37      0.08                    1.08
             Total            70     -0.06                    0.90
2            Telephone        28     -0.28                    0.90
             Mail             31      0.06                    1.39
             Total            59     -0.10                    1.18
3            Telephone        14      0.00                    1.21
             Mail              9      0.33                    0.71
             Total            23      0.13                    1.03

* SE = standard error of the differences = √(0.5 …), with SD = sample variation of the difference between two measurements
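The reliability measure in Table 3 can be illustrated with a short sketch. The formula in the table footnote is incomplete in this copy; the code assumes the usual expression for the measurement error of a single measurement derived from duplicates, SE = √(0.5 × s²) = s/√2, where s is the SD of the paired differences. That reading, the function name and the sample data are assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def measurement_error(first, second):
    """Mean difference, measurement error and approximate 95% limit.

    Assumption: SE of a single measurement = sqrt(0.5 * s**2), with s the
    sample SD of the differences between the two administrations.
    """
    if len(first) != len(second):
        raise ValueError("Both administrations must cover the same women")
    diffs = [a - b for a, b in zip(first, second)]
    s = stdev(diffs)
    se = sqrt(0.5 * s ** 2)   # equivalent to s / sqrt(2)
    limit_95 = 1.96 * se      # about 95% of errors fall within +/- this value
    return mean(diffs), se, limit_95


# Hypothetical TTP answers (months) from two administrations
first_round = [3, 6, 1, 12, 4]
second_round = [3, 5, 1, 14, 4]
mean_diff, se, limit = measurement_error(first_round, second_round)
print(f"mean difference {mean_diff:.2f}, SE {se:.2f}, 95% limit {limit:.2f}")
```

With an SE of about 1 month, as in Table 3, the 95% limit is about 2 months, which is the reproducibility quoted in the results.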

DISCUSSION Our study results show that the validity of measuring the TTP by means of a questionnaire was reasonably good. Systematic errors were small for each of the approaches, but small random errors were only encountered in the face-to-face interviews.

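TABLE 2 (continued)

Method: Telephone (n = 40)
TTP cutoff level    Sensitivity      Specificity
3 months            83% (24/29)      45% (5/11)
6 months            86% (12/14)      88% (23/26)
12 months           78% (7/9)        94% (29/31)
24 months           100% (4/4)       100% (36/36)

Method: Mail (n = 42)
TTP cutoff level    Sensitivity      Specificity
3 months            92% (22/24)      67% (12/18)
6 months            61% (11/18)      100% (24/24)
12 months           75% (3/4)        97% (37/38)
24 months           100% (1/1)       100% (41/41)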

This is the first validity study on measuring the time to pregnancy which uses a gold standard and different questionnaire approaches. Other authors have compared the TTP distributions from different study populations or used consistency checks within interviews as indicators of the validity of estimated TTP.4,10,11 Furthermore, most studies on the TTP were done on highly educated volunteers with a presumed higher probability of providing valid information. Only Joffe10 studied women working in industry. Baird et al.12 presented data on dental assistants. Our study included moderately educated women (median 10 years of education), selected on the basis of having a non-medical job (kitchen, cleaning and clerical) in a hospital and being willing to participate in a prospective study. These women appeared to have fertility problems more often than would be expected from figures in the total population. This selection of the study group could have overestimated the validity of the questionnaire in general because infertile women could have better recall due to fixation on their problem. Using a relative indicator like TTP, this overestimation would be less and at least equal for the three methods of data collection. It should be noted that the number of available subjects was limited and the recall period was rather short (3-20 months), with a possibility that respondents recall their TTP as reported in the original study. However, the focus of the original study was not particularly on the occurrence of pregnancy but on several aspects of pregnancy (including outcome), which makes it unlikely that there was a large effect on validity estimation. Moreover, because the recall period was equal for the three approaches, the comparison between the personal interview, telephone interview and postal questionnaire was unaffected.


tionnaire) answered the questions twice, leading to 70 interpretable comparisons. Table 3 shows the comparability of the answers with respect to the time to pregnancy. There were no significant systematic differences in either the telephone interviews or the mailed questionnaires. The SD of the measurement error for TTP was about 1 month in all instances. This means that the TTP can be reproduced with an accuracy of about 2 months in 95% of the cases. The telephone and mail approach did not differ systematically in this respect.

Telephone (n = 40)



FIGURE 2 Summary of the data with respect to validity of TTP questioning (difference between standard and reported TTP, for index pregnancies only, n = 100). [Histogram: number of women (vertical axis) by difference in months (horizontal axis), shown separately by approach.]

The gold standard was TTP partially based on retrospective and partially on prospective data, and therefore not perfectly gold. The population was too small, however, to restrict analysis to the subgroup of women for whom the original information was collected prospectively. Nearly one-fifth of the women were not consistent in the number of pregnancies they reported. Pregnancies which occurred between the time of the two studies could explain why some women report higher pregnancy numbers in retrospect. Lower numbers are most likely explained by pregnancies not ending in a livebirth. The questionnaire could be improved on this point by adding dates of births and by explicitly asking about the number of spontaneous and induced abortions, ectopic pregnancies and stillbirths. It appeared from the data and from the interviews that the question on 'other periods of unprotected intercourse of 6 months or longer that did not lead to pregnancy' was not understood at all. Because a pregnancy test at a fixed point after delayed menstruation was part of the original protocol, the validity results may not exactly generalize to the inherently noisier measure of time to recognition of pregnancy in general. Pregnancy recognition depends on pregnancy testing behaviour and this may even be related to exposures of interest like lifestyle factors. Moreover it should be taken into account that

the study population was restricted to women who participated in the survey with agreement on number of pregnancies etc. This might have led to an overestimation of validity, but again this will not disturb the comparison of modes of data collection. This study confirms the observation of Baird et al.12 that estimation of the TTP as a measure of fecundability is possible with reasonable accuracy. The questionnaire needs revision on some items before being used in large-scale epidemiological surveys. It was shown (Tables 1 and 2; Figure 2) that the face-to-face approach provided the most valid data. Considering the SD of the difference in the TTP (Table 1), written questionnaires were only slightly better than telephone interviews. The response rates for all three approaches were high in this population, but it should be noted that the original population was already selected with respect to willingness to participate in a scientific study. Although the response figures are based on small numbers, the differences between the three modes of data collection are quite notable. In particular, the high response rate for the mailed approach adds to its potential as a useful research instrument. With respect to reliability, the additional study with repeated measurements after 3-5 weeks showed that the answers were stable within that period for telephone interviews as well as for mailed questionnaires. Finally, there was a remarkable difference in the cost of data collection for each of the three approaches in favour of the mailed questionnaire. It depends on the specific research question whether a cheaper but less valid method (approach by mail or telephone) will be sufficient or whether a more valid but also more expensive method is indicated. For example, in studying the distribution of the TTP in several European countries per se, the random error will not lead to biased estimates of the central parameters of this distribution. In aetiological research, on the other hand, the measurement error could seriously mask relevant associations that exist in reality.13 Furthermore, it is possible that higher educated groups communicate better in written form, while less educated groups would prefer face-to-face interviews because this best matches their usual method of communication.

REFERENCES

1 Baird D D, Wilcox A J. Cigarette smoking associated with delayed conception. JAMA 1985; 253: 2979-83.
2 Baird D D, Wilcox A J. Future fertility after prenatal exposure to cigarette smoking. Fertil Steril 1986; 46: 368-72.
3 Baird D D, Wilcox A J. Effects of occupational exposures on the fertility of couples. Occupational Medicine: State of the Art Reviews 1986; 1: 361-74.
4 Baird D D, Wilcox A J, Weinberg C R. Use of time to pregnancy to study environmental exposures. Am J Epidemiol 1986; 124: 470-80.
5 Hull M G R, Glazener C M A, Kelly N J et al. Population study of causes, treatment, and outcome of infertility. Br Med J 1985; 291: 1693-97.
6 Mosher W D. Fecundity and infertility in the United States. Am J Public Health 1988; 78: 181-82.
7 Wilcox A J, Weinberg C R, Baird D D. Caffeinated beverages and decreased fertility. Lancet 1988; ii: 1453-55.
8 Wilcox A J, Baird D D, Weinberg C R. Do women with childhood exposure to cigarette smoking have increased fecundability? Am J Epidemiol 1989; 129: 1079-83.
9 Weinberg C R, Wilcox A J, Baird D D. Reduced fecundability in women with prenatal exposure to cigarette smoking. Am J Epidemiol 1989; 129: 1072-78.
10 Joffe M. Feasibility of studying subfertility using retrospective self-reports. J Epidemiol Community Health 1989; 43: 268-74.
11 Page H. Estimation of the prevalence and incidence of infertility in a population: a pilot study. Fertil Steril 1989; 51: 571-77.
12 Baird D D, Weinberg C R, Rowland A S. Reporting errors in time-to-pregnancy data collected with a short questionnaire. Am J Epidemiol 1991; 133: 1282-90.
13 Kleinbaum D G, Kupper L L, Morgenstern H. Epidemiologic Research: Principles and Quantitative Methods. Belmont: Lifetime Learning Publications, 1982.

(Revised version received June 1992)
