Journal of Applied Bacteriology 1992, 73, 94-98
A multiple logistic model for predicting the occurrence of Campylobacter jejuni and Campylobacter coli in water
E. Skjerve and O. Brennhovd
Department of Food Hygiene, Norwegian College of Veterinary Medicine, Oslo, Norway
3596/2/91: accepted 15 February 1992
E. SKJERVE AND O. BRENNHOVD. 1992. A multiple logistic regression model was established to predict the occurrence of Campylobacter jejuni/coli, related to index bacteria such as faecal coliforms, faecal streptococci, and sulphite-reducing clostridia, in a water source in southern Norway. The fitted model indicated that faecal coliforms were strong predictors for C. jejuni/coli, although the water temperature also had a strong influence on results. Sulphite-reducing clostridia, faecal streptococci, and season of the year had no significant influence on the results, in spite of their apparent predictor value as separate variables. The model employed offers a new approach to the relationship between index bacteria and the occurrence of pathogenic bacteria in water. Similar models can also be established in general food microbiology.
INTRODUCTION
Pathogenic microorganisms in water are of major concern for human health. Although waterborne pathogens are particularly important causes of infections in developing countries, frequently leading to death, they also pose significant health hazards in industrialized countries if present in drinking water or swimming pools. Campylobacter jejuni/coli are recognized as major causes of human enteritis. Several waterborne outbreaks have been reported, including two from Norway (Mentzing 1981; Vogt et al. 1982; Taylor et al. 1983; Rogol et al. 1983; Dahl & Melby 1987). Although direct testing for most enteric bacterial pathogens is possible, their detection in water or foods is often laborious. Such analytical problems have led to the supplementation of tests for pathogens with appropriate tests for indicator bacteria (Mossel 1982a). The principle of employing indicator organisms was introduced almost 100 years ago by Schardinger (1892) and Smith (1899), who independently suggested using Escherichia coli as an indicator of enteric pathogens such as Salmonella typhi. According to Mossel (1981), Ingram introduced the term ‘marker organisms’, and distinguished between indicator bacteria, which suggest inadequate bacteriological quality of a general nature, and index organisms, whose presence provides evidence of the potential occurrence of specific pathogens or toxin-formers (Mossel 1981, 1982b). This terminology will be used throughout this paper, the term index bacteria being employed for those which indicate the presence of C. jejuni/coli.
Correspondence to: Dr Eystein Skjerve, Department of Food Hygiene, The Norwegian College of Veterinary Medicine, PO Box 8146, DEP, 0033 Oslo, Norway.
Several bacteria have been suggested as markers of faecal pollution of water, among them coliform bacteria (E. coli and related lactose-fermenting, Gram-negative bacteria), faecal coliforms (mostly, though not exclusively, E. coli), faecal streptococci, Enterobacteriaceae, and sulphite-reducing clostridia (mostly Clostridium perfringens). Each bacterial group has had its proponents and opponents, but relevant markers for different environments have not been established (Hendriksen 1955; Buttiaux & Mossel 1961; Bonde 1963; Geldreich & Kenner 1969; Dutka 1973; Smith et al. 1973; Mossel 1978, 1982a, b; Bisson & Cabelli 1980; Cabelli 1982; Muller & Mossel 1982). Thus, the question of which marker organisms are best for different kinds of water is still controversial, and there is an obvious need for a new approach to the matter. Mossel (1982a) suggested the use of a ratio, colony forming units (cfu) of marker organism/cfu of pathogenic organism (&factor), to ascertain the predictor value of the marker (index bacteria), and to determine from this the acceptable level for the marker organism in question. A linear relationship is difficult to use in this case, as most analyses for pathogens in water or food are qualitative (present or absent), quantification demanding a substantially increased workload. An alternative is to use statistical methods which can handle such dichotomous (±) outcomes. Logistic regression is one of these techniques, which is being increasingly applied in the analysis of epidemiological data and in
medical decision making, adequate computer software having become available (Breslow & Day 1980; Schmitz 1986; Hosmer & Lemeshow 1989). There is also increasing agreement among statisticians that inferences based on regression methods often allow a better interpretation of data than analysis of variance approaches and simple hypothesis testing (Oakes 1986). In the present study, we introduce a multiple logistic regression model for deciding which variables can serve as predictor variables for the occurrence of C. jejuni/coli in water. With the established model the probability of isolating C. jejuni/coli in specific water samples may be predicted, based on results from analyses for index bacteria and other markers.
MATERIALS AND METHODS
Biological data
Data were obtained from a study on the occurrence of index bacteria and C. jejuni/coli, through 1 year, in a water source in southern Norway (Brennhovd & Kapperud 1991). Water samples (246) were examined for faecal coliforms (FCB), faecal streptococci (FS), sulphite-reducing clostridia (SRC), and C. jejuni/coli (CAMP). Filter methods as described by Brennhovd et al. (1991) were used. Briefly, samples of 100 ml were filtered through separate nitrocellulose membrane filters (Millipore) with a pore size of 0.45 µm and a diameter of 47 mm. Membranes were placed face up on selective/indicator agar plates, after which the plates were incubated and typical colonies were counted. For enumeration of CAMP, Preston agar (Bolton & Robertson 1982) was incubated for 24 and 48 h at 42-43°C in a microaerobic atmosphere; for FCB, m-Endo agar LES (Difco) was incubated at 44°C for 24 h; for FS, m-Enterococcus agar (Difco) was incubated at 44°C for 24 h; and for SRC, SFP agar (Difco) with D-cycloserine (Sigma) was incubated anaerobically at 44°C for 24 h. Water temperature and dates were also recorded. An external validation data set (n = 96) containing the same variables was obtained from three different water sources around Oslo, Norway, as described by Brennhovd et al. (1991).
Table 1 Information in the data set used in establishing a multiple logistic model for prediction of the occurrence of Campylobacter jejuni/coli in a Norwegian water source
Season: December-February (1), March-May (2), June-August (3), September-November (4)
Temperature: 0-2°C (1), 2-7°C (2), 7-12°C (3), > 12°C (4)
Faecal coliform bacteria (FCB)/100 ml (log10)
Faecal streptococci (FS)/100 ml (log10)
Sulphite-reducing clostridia (SRC)/100 ml (log10)
Campylobacter jejuni/coli present in 100 ml of water (Y/N)

Table 2 Counts of faecal coliform bacteria (FCB)/100 ml, faecal streptococci (FS)/100 ml and sulphite-reducing clostridia (SRC)/100 ml given as median and range counts
Variable                                     Median   Range
Faecal coliform bacteria (FCB)/100 ml        25       (0-1000)
Faecal streptococci (FS)/100 ml              23       (0-500)
Sulphite-reducing clostridia (SRC)/100 ml    8        (0-500)

Statistical analysis
A programme for logistic regression in epidemiology was used in the analyses (Multreg, Ludwig Institute of Cancer Research, Epidemiology and Statistical Unit, São Paulo, Brazil). Data for FCB, FS and SRC were transformed to logarithms, and dummy variables for seasons and temperature were created. The structure of the final data set is shown in Table 1. Before logarithmic transformation, a constant of 1 was added to FCB, FS and SRC to avoid the undefined logarithm of 0. Of all 246 samples, 84 (34%) were found to harbour campylobacters. FCB were detected in 90% of samples, SRC in 97% and FS in 85% of all samples. The levels of FCB, SRC and FS are shown in Table 2. The Kruskal-Wallis test was performed with the statistical/epidemiological programme EPI-Info (Centers for Disease Control, Atlanta, GA/World Health Organization, Geneva, Switzerland). The logistic model was built mainly as suggested by Hosmer & Lemeshow (1989):
(1) Univariate logistic analysis of each variable.
(2) Variables with a P-value < 0.25 from Wald's test (coefficient/standard error, S.E.) in the univariate analysis were included in the model building steps.
(3) Selection of variables in the logistic model was done in a forward selection process, using the likelihood ratio for the model with and without the new variable as the criterion. An improvement of the model was accepted for an increase in the likelihood ratio with a P-value of 0.20. FCB, FS and SRC were tried both as continuous variables and after creation of dummy variables (groups: 0, 1-10, 11-100 and > 100 cfu/100 ml).
(4) Goodness of fit of the model was assessed by the Hosmer-Lemeshow χ²-test.
(5) The model was validated on the second data set using the Hosmer-Lemeshow χ²-test for goodness of fit.
(6) The Kruskal-Wallis test was run to determine the influence of season and temperature on the index bacteria (FCB, FS, SRC).

Establishing a prediction model
From the results of the logistic analysis, a probability function was established as described by Hosmer & Lemeshow (1989):
$$P^* = P(\text{C. jejuni/coli}) = \frac{e^{T}}{1 + e^{T}}$$
where
$$T = \text{constant} + (\text{coefficient}_1 \times \text{var}_1) + (\text{coefficient}_2 \times \text{var}_2) + \dots$$
P* in the prediction equation represents the overall predicted probability of the occurrence of C. jejuni/coli.

Table 4 Prediction of the occurrence of Campylobacter jejuni/coli using a multiple logistic regression model. Predicted and observed frequencies of detection for different cut-off levels for the prediction function P*
P* interval   n    Predicted   Observed
0.0-0.1       54   3.1         3
0.1-0.2       48   6.9         11
0.2-0.3       25   6.2         4
0.3-0.4       26   8.9         7
0.4-0.5       33   13.9        14
0.5-0.6       11   5.9         5
0.6-0.7       12   7.8         6
0.7-0.8       10   7.3         9
0.8-0.9       17   14.7        16
0.9-1.0       10   9.2         9
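As an illustration of the goodness-of-fit check, the Hosmer-Lemeshow statistic can be recomputed from the grouped values in Table 4. The sketch below is not the authors' Multreg procedure: it assumes the usual grouped form of the statistic, the sum over groups of (O - E)² / (E(1 - E/n)), and the variable and function names are hypothetical. Because Table 4 lists rounded predicted values, the result should only be expected to come close to the reported χ² of 8.33.

```python
# Table 4 values as printed: group sizes, predicted and observed numbers of
# Campylobacter-positive samples in each P* interval (0.0-0.1 ... 0.9-1.0).
n_samples = [54, 48, 25, 26, 33, 11, 12, 10, 17, 10]
predicted = [3.1, 6.9, 6.2, 8.9, 13.9, 5.9, 7.8, 7.3, 14.7, 9.2]
observed = [3, 11, 4, 7, 14, 5, 6, 9, 16, 9]


def hosmer_lemeshow_chi2(n, expected, obs):
    """Grouped Hosmer-Lemeshow statistic: sum of (O - E)^2 / (E * (1 - E/n))."""
    total = 0.0
    for n_g, e_g, o_g in zip(n, expected, obs):
        mean_p = e_g / n_g  # average predicted probability within the group
        total += (o_g - e_g) ** 2 / (e_g * (1.0 - mean_p))
    return total


chi2 = hosmer_lemeshow_chi2(n_samples, predicted, observed)
print(f"samples: {sum(n_samples)}, observed positives: {sum(observed)}")
print(f"Hosmer-Lemeshow chi-square from Table 4: {chi2:.2f}")
```

The group totals also serve as a consistency check on the table: the group sizes add up to the 246 samples and the observed positives to the 84 samples reported as harbouring campylobacters.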
RESULTS
Logarithms of counts gave a better fit than original counts. Univariate logistic regression analysis revealed Wald statistics with P-values of less than 0.25 for all variables. All initial variables were therefore tested in the model. The main results from the model building are shown in Table 3. The fitted model included temperature intervals and FCB as a continuous variable. SRC, FS and season had no significant predictor value in the overall model. No interactions between the predictor variables in the model were detected in the analyses (P > 0.15). Table 4 shows the predicted number of samples with C. jejuni/coli compared with observed values. The Hosmer-Lemeshow 'goodness of fit' statistic based on data from Table 4 gave a χ² of 8.33 (P = 0.50), indicating a very good fit with empirical data. Applying the same model to the external validation data set gave a χ² of 10.67 (P = 0.30).

Table 3 Multiple logistic model for prediction of probability of finding Campylobacter jejuni/coli in water. Model given with coefficients for each parameter with standard error (S.E.), and Wald test for deviation from zero (coefficient/S.E.) with corresponding P-value
Variable            Coefficient (S.E.)   Wald (P)
Temp. 0-2°C         0
Temp. 2-7°C         2.17 (0.45)
Temp. 7-12°C        2.65 (0.51)
Temp. > 12°C        1.12 (0.49)
FCB/100 ml (log)    1.62 (0.25)
Constant            -4.38 (0.57)
FCB, Faecal coliform bacteria.

The probability function was thus established with the T-function from the final multiple regression model:
$$P^* = \frac{e^{T}}{1 + e^{T}}, \qquad T = -4.38 + (1.62 \times \log \text{FCB}) + C_{\text{Temp}}$$
where $C_{\text{Temp}}$ = 0 (0-2°C), 2.18 (2-7°C), 2.65 (7-12°C), 1.12 (> 12°C).
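A minimal sketch of how the fitted probability function could be evaluated for a single water sample follows. It is an illustration under stated assumptions, not the authors' software. The function names are hypothetical, the logarithm is taken as log10 of the count plus 1 (as in the data preparation), the constant is negative (otherwise no sample could fall in the low P* intervals of Table 4), and each temperature interval is assumed to include its upper bound. The temperature term for 2-7°C uses 2.18 as printed in the equation; Table 3 lists 2.17. The second function simply inverts the same equation to give the FCB count at which P* reaches a chosen level.

```python
import math

# Coefficients from the fitted equation; the negative constant is an assumption
# consistent with the low predicted probabilities in Table 4.
CONSTANT = -4.38
COEF_LOG_FCB = 1.62
# (upper bound in degrees C, temperature term); inclusive upper bounds are assumed.
TEMP_TERMS = [(2.0, 0.0), (7.0, 2.18), (12.0, 2.65), (math.inf, 1.12)]


def temperature_term(temp_c):
    """Return C_Temp for the interval containing the water temperature."""
    for upper, term in TEMP_TERMS:
        if temp_c <= upper:
            return term
    raise ValueError("temperature must be a number")


def predicted_probability(fcb_per_100ml, temp_c):
    """P* = e^T / (1 + e^T) with T = constant + 1.62 * log10(FCB + 1) + C_Temp."""
    log_fcb = math.log10(fcb_per_100ml + 1)
    t = CONSTANT + COEF_LOG_FCB * log_fcb + temperature_term(temp_c)
    return 1.0 / (1.0 + math.exp(-t))  # algebraically equal to e^T / (1 + e^T)


def fcb_limit(max_probability, temp_c):
    """FCB count per 100 ml at which P* equals max_probability (inverse of the model)."""
    logit = math.log(max_probability / (1.0 - max_probability))
    log_fcb = (logit - CONSTANT - temperature_term(temp_c)) / COEF_LOG_FCB
    return max(10.0 ** log_fcb - 1.0, 0.0)


# Example: a sample with 25 FCB/100 ml taken at 5 degrees C.
print(round(predicted_probability(25, 5.0), 3))
# Example: FCB level corresponding to a 10% predicted probability at 5 degrees C.
print(round(fcb_limit(0.10, 5.0), 1))
```

Inverting the equation in this way is only meaningful within the range of counts and temperatures covered by the original data set.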
Figure 1 shows an idealized graphical interpretation of the predicted probabilities, P*, of finding C. jejuni/coli in the four different temperature intervals. If an acceptable prevalence of C. jejuni/coli is established, acceptable levels of FCB can also be read from the same figure. The
4.84 (