Journal of Environmental Radioactivity 129 (2014) 121e132

Contents lists available at ScienceDirect

Journal of Environmental Radioactivity journal homepage: www.elsevier.com/locate/jenvrad

Determination of radon prone areas by optimized binary classification P. Bossew German Federal Office for Radiation Protection, Köpenicker Allee 120 e 130, 10318 Berlin, Germany

a r t i c l e i n f o

a b s t r a c t

Article history: Received 23 August 2013 Received in revised form 10 December 2013 Accepted 19 December 2013 Available online 10 January 2014

Geogenic radon prone areas are regions in which for natural reasons elevated indoor radon concentrations must be expected. Their identification is part of radon mitigation policies in many countries, as radon is acknowledged a major indoor air pollutant, being the second cause of lung cancer after smoking. Defining and estimating radon prone areas is therefore of high practical interest. In this paper a method is presented which uses the geogenic radon potential as predictor and thresholds of indoor radon concentration for defining radon prone areas, from which thresholds for the geogenic radon potential are deduced which decide whether a location is flagged radon prone or not, in the absence of actual indoor observations. The overall results are different maps of radon prone areas, derived from the geogenic radon map, and depending (1) on the criterion which defines what a radon prone area is; and (2) on the choice of score whose maximization defines the optimal classifier. Such map is not the result of a transfer model (geogenic to indoor radon), but of the optimization of a classification rule. The method is computationally simple but has its caveats and statistical traps, some of which are also addressed. Ó 2013 Elsevier Ltd. All rights reserved.

Keywords: Radon prone area Geogenic radon potential ROC graph

1. Introduction The basic concept of geogenic radon prone area (RPA) is a region where for natural, i.e. geogenic reasons elevated indoor radon (Rn) levels or an elevated probability of their occurrence must be expected. Geogenic factors are high radium concentrations in rock and soil, high permeability due to the coarse texture of the ground or due to fracturing of rocks, hydrological peculiarities and others. Knowing RPAs can assist allocating resources more efficiently for denser surveys, remediation of affected houses and regional implementation of stricter building codes for new houses. Therefore considerable work is being invested into methods of estimating such regions from observed data of geogenic quantities and/or indoor Rn measurements. Although RPA as understood here is, by its very concept, a geogenic entity independent of anthropogenic factors, such as house type or living habits, its quantification e that is, decision whether a certain location is flagged RPA or not e should be linked to quantities related to Rn risk. The latter is often quantified through a proxy, namely the probability that indoor Rn concentration exceeds a threshold. (More precisely the chance of lung cancer depends on exposure to Rn progenies which is however more difficult to measure, and which is substituted by the average Rn concentration, or an exceedance probability.) Since indoor Rn

E-mail addresses: [email protected], peter.bossew@reflex.at. 0265-931X/$ e see front matter Ó 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.jenvrad.2013.12.015

concentration is to a high degree controlled by anthropogenic factors, the practically defined RPA also contains these factors, contrary to its concept. One tries to reduce the influence of these factors by standardizing the indoor Rn values used for “calibration” of what is a RPA, e.g. by restricting to ground floor rooms in houses with basement (to be applied in this study), by recalculating indoor values to standard conditions (Austrian approach; Friedmann, 2005), or to restricting to ground floor rooms in single-family houses which have not been remediated with respect to Rn (Belgian approach, Cinelli et al., 2011; B. Dehandschutter, personal comm.). Several methods for defining RPAs in practice have been proposed. As spatial supports administrative or geological units or grid cells are common. Criteria to flag them as RPA include exceedance of mean indoor concentrations, or high probabilities that indoor Rn concentration thresholds are exceeded, or geological units which are known for high Rn potential (the geogenic property which leads to high indoor concentration in houses which allow Rn infiltration), etc. Quantities known to be related to indoor Rn, such as soil Rn (as in this article), geochemistry or dose rate may substitute indoor Rn for assessing RPAs. In this article a binary classification based on ROC (receiver operating characteristic) curve analysis is proposed. It establishes an optimal classification of the predictor (radon potential) by optimizing the number of cells that have been classified correctly according to a target criterion, related to indoor Rn concentration. The optimum is defined by a score derived from the ROC statistic.

122

P. Bossew / Journal of Environmental Radioactivity 129 (2014) 121e132

2. Materials and methods 2.1. Data (2) In Germany the following data are available currently (2013): a set of about 4000 measurements of Rn concentration in soil air together with soil permeability; about 40,000 measurements of indoor Rn concentrations, ca. 15,000 of which in ground floor rooms of houses with basement; and geology on 1:1 M resolution, or finer. The soil Rn sampling points are reasonably distributed over most of the territory, though being strongly clustered in certain regions. The indoor samples originate from various, mainly uncoordinated surveys. They are highly clustered and their representativeness is questionable in some cases. Some regions are very poorly covered. It is therefore difficult to derive estimates of RPAs directly from indoor Rn data, although of course they give an indication about which regions are affected. The idea is therefore to use the geogenic quantities e geology as a categorical variable with area support and soil Rn as a continuous one with point support e as predictors and indoor Rn as “calibration” data. Based on these geogenic data one should be able to decide whether a location e in practice an area, either a grid cell or a geological polygon e is labelled RPA or not, given a certain definition of RPA. 2.2. The radon potential From the geogenic data a radon potential (RP) is defined and a RP map of Germany created. The RP definition follows a proposal by Neznal et al. (2004):

RP : ¼ CðsoilÞ=ð  10 logðkÞ  10Þ; with C(soil) is the Rn concentration in soil determined by the “Kemski protocol” (Kemski et al., 2002) in kBq/m3 and k, the permeability (same protocol) in m2. The RP is essentially, for medium permeability, proportional to the advective flux component normalized to the pressure gradient across an interface (e.g. ground e house). For very high and low permeability the RP is smoothed against this quantity. The RP is mapped for Germany on a 10 km  10 km grid by a method described in Bossew (2013a,b); put shortly, geological classes are used as deterministic predictors on top of which cell estimation is performed by conditional (to the point values of RP) simulation. This yields cell-wise statistics of the RP (the local ccdf, or conditional cumulative distribution function), from which statistics such as the expectation or confidence intervals and exceedance probabilities can be derived. An alternative would be ordinary kriging which would however not allow estimating local distributions. (Other, more complicated varieties of kriging would.) Whereas 3506 cells of area 10 km  10 km are available which contain a value of the RP, covering the whole of Germany (357,121 km2) with a few exceptions, the numbers of cells which contain a minimum number of indoor data is much smaller, as a result of their strong spatial clustering. While 1428 cells with at least one observation are available, only 407 are for at least 10, and 126 cells have at least 30 observations.

(3)

(4)

(5)

(6)

geometrical mean or the empirical probability to exceed a threshold, e.g. prob(C > 100 Bq/m3) ¼ number of C > 100 divided by total number of samples in the cell. Establish a criterion CRIT for a cell to be labelled RPA, such as: E[C] > 100 Bq/m3; or: E[prob(C > 100 Bq/m3)] > 10%, where E denotes the expectation. For each cell x* decide whether the criterion is met for the empirical estimates. This results in a binary coding of the cells, positive/negative, or {1,0}. For a given value rp of the quantity RP, decide for each cell x*, whether RP(x*)  rp or RP(x*) < rp. This results in another binary coding of the cells. Next, the two codings or classifications are compared, and the value rp0 which classifies RP(x*) such that the coincidence becomes optimal, is determined. To this end one calculates the following statistics:  True positives, TP: number of cells classified or “predicted” as RPA through RP and through C;  False positives, FP: number of cells classified as RPA though RP, but not through C;  True negatives, TN: number of cells classified as non-RPA through both RP and C;  False negatives, FN, number of cells classified a non-RPA through RP, although they are RPA as classified by C. The logic is summarized in Fig. 1. Evidently, TP þ FP þ TN þ FN ¼ total number of cells. From these numbers several statistics can be derived, in particular:  True positive rate, or “sensitivity”, also “recall”:

TPR : ¼ TP=ðobserved positivesÞ ¼ TP=ðTP þ FNÞ; and  False positive rate, or 1-“specificity”:

FPR : ¼ FP=ðobserved negativesÞ ¼ FP=ðFP þ TNÞ

TPR shall be as high, while FPR as low as possible; they cannot be chosen independently, however. Therefore one has to find a tradeoff considered optimal, after some score. (7) Plotting TPR against FPR results in a graph called ROC, or receiver operating characteristic, an example of which (for the data discussed here) is shown in Fig. 2. The shape of the ROC graph depends on the chosen criterion CRIT; points of the graph correspond to values of the threshold rp for classification according to the RP. By tuning the value rp the classification can be optimized such as to minimize misclassifications (essentially by false positives or negatives), yielding an optimum, rp0. The diagonal indicates a random association between the classifications, whereas a perfect classification would be a point at (0,1). (8) Several scores derived from the ROC have been proposed whose optimization lead to an optimal classifier rp0. The corresponding point on the curve can be considered the best observed classification, according criterion CRIT pos

neg

pos

TP

FP

neg

FN

TN

2.3. Classification by ROC analysis The classification of cells x* containing estimates RP(x*) is performed as follows. (1) In those cells where enough indoor Rn data C are available, calculate empirical statistics of C, such as arithmetical or

predicted classification, according value of rp.

Fig. 1. Classification table and definition of true and false positives and negatives (TP, FP, TN, FN).

P. Bossew / Journal of Environmental Radioactivity 129 (2014) 121e132

Introductory information on ROC analysis and references can be found in the Wikipedia entry, http://en.wikipedia.org/wiki/ Receiver_operating_characteristic (visited 18.7.2013). An introduction with examples from medical diagnostic is given by Metz (1978).

1 d01

0.9 0.8

123

increasing value of classifier rp

0.7

3. Results

0.5 Y 0.4

CRIT1: C>100

0.3

CRIT2: prob(C>100)>0.1

0.2 random

0.1 0 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

FPR Fig. 2. ROC graph for two criteria (CRIT) of radon prone area: Blue curve: mean indoor Rn concentration in cell >100 Bq/m3 (CRIT1); pink curve: probability of exceeding 100 Bq/m3 > 10% (CRIT2). The curved dashed arrow indicates how the operation point on the graph moves as the classifier rp increases. Also shown are the geometric meanings of scores Y and d01. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

trade-off between wished high TPR and wished low FPR in the sense of the selected score. A large variety of scores is conceivable. The ones used here are:  The Y-score: Y: ¼ TPR  FPR, or the distance from the diagonal; the further away from the diagonal, the more significant the classification in the sense of being non-random.  The F-score: Fa: ¼ (1 þ a2)(precision $ sensitivity)/(a2 $ precision þ sensitivity), where precision: ¼ TP/(predicted positives) ¼ TP/(TP þ FP), and a measures the importance which is given to sensitivity relative to precision. Like Y, to be maximized.  The d01-score: introduced here ad hoc, d01: ¼ O((1  TPR)2 þ FPR2), distance from the ideal point (0,1), to be minimized. The summands could be weighted to attribute them different importance, similar to the Fa. Obviously the optimal rp0 is a function of the criterion CRIT, but also of the type of score which is used for optimization. In practice, the procedure has been implemented by a rather simple home-made Basic program (using the QB64 compiler, www. qb64.net/ e acc. 18.7.2013). Historically, the technique has been developed in the second world war in attempting to evaluate noisy radar signals (hence the name ROC), by classifying them into real or spurious events, related to real incoming enemy aircraft, or rather artefacts on the radar screen. Later the method has been applied in many other, fortunately more peaceful fields of signal detection, in particular in medical diagnostic. In this paper it is proposed to apply it to classification of grid cells into radon prone areas, for the first time to my knowledge (a first version has been presented at the 8th Fachgespräch Radon, Berlin 23 May 2013; Bossew, 2013c). A similar idea, however without embarking ROC analysis proper, has been proposed recently by García-Talavéra et al. (2013), who identify a dose rate threshold (which plays the role that the RP does here) for classifying RPAs. Quality of classification is investigated not by TPR vs. FPR, as we do, but by the complementary TP/(predicted positive) and TN/(predicted negative).

We select two criteria CRIT for defining a radon prone area: CRIT1: E[C] > 100 Bq/m3, in ground floor rooms of houses with basement (expectation over 10 km  10 km cells), and CRIT2: E[prob(C > 100 Bq/m3)] > 10%, chosen as examples only. (Which values should be chosen for regulatory or administrative purposes is a political, but not scientific decision. Here only the method shall be demonstrated.) The results are shown in blue for the first criterion in the following figures, the ones for the second criterion, in pink. The ROC graphs are shown in Fig. 2. Both curves are strongly different from the diagonal, indicating that the RP can indeed be used as classifier for defining radon prone areas, defined through the indoor concentration. (This does of course not come as a surprise, since the relationship between geogenic and indoor Rn has been known for long.) Since the blue curve, based on CRIT1 ¼ classifying the mean indoor concentration, is “more concave” than the pink curved based on CRIT2 ¼ classifying the exceedance probability, the former method is so to say “sharper”. The area under the curve (AUC), greater for the former than for the latter criterion, is a measure for the classification power of a method. A possible reason could be that the empirical probability may be a bad estimate of the true probability, for low numbers of samples in a cell. Further discussion can be found in the Technical annex. According the Y statistic, Fig. 3, the optimal classifying value rp0 of the radon potential RP would be 32 kBq/m3 for the criterion CRIT1: E[C] > 100 Bq/m3, and 22 kBq/m3 for CRIT2: E[prob(C > 100 Bq/m3)] > 10%, respectively. Coding the cells RP(x*) according to these thresholds into {coloured, grey} leads to the maps shown in Fig. 4. (Blank cells ore ones whose centre has not been geologically assigned.) The F1 statistic (sensitivity and precision given same importance) yields the same result for 0.5 0.45

CRIT1: C>100 CRIT2: prob(C>100)>0.1

0.4 0.35 0.3

Y

TPR

0.6

0.25 0.2 0.15 0.1 0.05 0 0

10

20

30

40

50

60

70

80

90

100

110

rp Fig. 3. Y-scores in dependence of the classifying threshold rp. The optimal value rp0 is found for the maximum of Y.

124

P. Bossew / Journal of Environmental Radioactivity 129 (2014) 121e132

the first criterion, rp0 ¼ 32, but allows no conclusion for the second one. For CRIT1 and a ¼ 2 (sensitivity, i.e. rate of predicted positives of observed positives, taken twice as important as precision, i.e. rate of correctly out of all predicted positives) the optimal rp0 is again 32 and for a ¼ 0.5 (precision more important), rp0 ¼ 74. For a ¼ 1 no maximum can be found. A summary of the results is given in Table 1, for the Y score and five values of a, for the Fa scores. Staying with the results according the Y score (identical for d01), for CRIT1 and its optimal rp0 ¼ 32, the true positive and the false positive rates equal 0.66 and 0.19; for CRIT2 and its optimal rp0 ¼ 22, TRP ¼ 0.70 and FPR ¼ 0.37, respectively. This is to say that one has to live with relatively high false positive rates, or “false alarms”. A true positive rate TPR ¼ 0.70 also says that the false negative rate, or rate of cells erroneously not labelled RPA, equals 1  TPR ¼ 0.30. Different reasoning, instead of the Y statistic, gives different optimal rp0 and thus different trade-offs between TPR and FPR. Further discussion can be found in Technical annex A.3. A summary of the statistical results is given in Table 2. Several additional statistics are included:  The accuracy (ACC) equals the rate of correct predictions,

ACC : ¼ ðTP þ TNÞ=ðnumber of cellsÞ:  Cohen’s kappa (k) statistic measures the degree of agreement between raters; classification of the cells into RPA/non-RPA by observation and prediction can be interpreted as assessment by two raters who agree only partly. (Details including references in the Wikipedia entry, http://en.wikipedia.org/wiki/Cohen’s_ kappa and in http://kappa.chez-alice.fr/, both acc. 20.7.13). In a qualitative scale, k ¼ 0.79 is commonly being considered “substantial“, k ¼ 0.59, a “moderate“ agreement.

Table 1 Optimally classifying rp0 for the two criteria which define the radon prone area, and different statistics which define what means “optimal”. Symbol “e”“: no conclusion possible. Criterion

CRIT1 CRIT2

Score Y

F0.5

F0.75

F1

F1.5

F2

d01

32 22

74 30

74 22

e 22

32 18

32 e

32 22

 The Matthews correlation coefficient is derived from contingency c2 analysis and also quantifies agreement, based on the notion of the truth table as contingency table (for details and references also see the Wikipedia entry, http://en.wikipedia.org/ wiki/Matthews_correlation_coefficient). One technical problem is the following. A low number of indoor Rn data per cell implies high uncertainty of the empirical arithmetical mean (for CRIT1) as estimate of the true mean, and for the empirical exceedance probability (CRIT2) as estimate of the true one. On the other hand, selecting only cells with higher number of data reduces the number of cells available for evaluation (see last paragraph of section 2.2). Both effects are unfavourable for the ROC analysis but act in the opposite direction; therefore a compromise has to be found. For the analysis presented here cells with at least 5 indoor values were selected. Further discussion of the effect of minimum number of data per cell on the result can be found in the Technical annex. As a further caveat, selecting highly populated cells only, allowing more robust estimation of mean and exceedance probability, may introduce a bias as such cells can be expected to predominantly represent urban environments where the sampling density is often higher. Since the transfer properties geogenic / indoor Rn may be different in different environments,

Fig. 4. Maps of radon prone areas of Germany for the two criteria CRIT given in Fig. 2, and derived from optimization of the ROC graph.

P. Bossew / Journal of Environmental Radioactivity 129 (2014) 121e132 Table 2 Statistical parameters of the results achieved through ROC analysis. Statistic

CRIT1

CRIT2

Optimal classifier rp0 True positive rate TPR False positive rate FPR Y score Accuracy ACC Cohen’s k Matthew correlation MCC

32 0.66 0.19 0.47 0.79 0.79 0.309

22 0.70 0.37 0.33 0.65 0.59 0.313

favouring urban environments may lead to results which do not represent the “average” transfer. Geologically, RPAs in Germany are mainly situated in granitic regions, in certain regions with organic bearing and deformed sedimentary rocks or in regions with specific glacial structures such as moraines from the last ice age. 4. Discussion 4.1. Possible reasons for high misclassification rates The relatively high, but inevitable misclassification rate (Table 2) is a consequence of the relatively weak correlation between soil Rn (quantified by the RP) and indoor Rn. Remember that “indoor Rn” refers here to living rooms in ground floors in houses with basement. While the factors “floor level” and “presence of basement” are known as dominant control factors, also others contribute strongly. For example age of a house (as a proxy to insulation against the ground), attached or non-attached building, urban, sub-urban or rural environment, climate and socio-economic status of the inhabitants have influence. For Germany a significant factor is whether a house is located in former East or West Germany, as formerly systematically different building styles still prevail. However, further filtering the “calibration sample” of indoor Rn concentrations according to more control factors would have reduced its size so that the reduction of the “semantic uncertainty” would have gone on the expense of increasing estimation uncertainty due to smaller samples. Non-accounted factors result in additional variability which can furthermore have a spatial trend (climate; E-/W-Germany), and hence regionally systematic deviations from the mean transfer geogenic / indoor Rn. Therefore, as a summary, mean transfer geogenic / indoor is subject to considerable uncertainty, resulting (a) from “underspecification” of the quantity “indoor Rn” (allowing variability of the remaining non-accounted factors), which is (b), however finely specified, always subject to accidental misspecification.

The result may be compared with a RPA map based on the full analysis (left graph in Fig. 5), shown in Fig. 6, and using the same criterion for defining the RPA, prob(C > 100 Bq/m3) > 0.1. It appears that the RPA map based on the simple approach corresponds to a relatively high quantile of the probability as estimated by the full stochastic transfer model (the probability is itself a random variable with expectation, quantiles and so on). Putting it the other way, the estimated expected probability (map (a)) does not correspond to the one resulting from optimal classification in the sense of optimal Y-statistic of the ROC graph. Yet, both methods are “correct” in their respective logics. Whereas the transfer analyses assumes a linear physical transfer mechanism geogenic / indoor Rn (physically reasonably), no such assumption is made for the method presented here. 4.3. Uncertainty and validation Optimization by different scores leads to the optimal classification threshold, rp0, and at the same time to rates of erroneous classification, given in Table 2. However, these values have to be regarded as estimates; one must expect that they are affected by uncertainty originating in different sources.  Firstly, the input data are uncertain. This holds for the indoor concentrations C (measurement uncertainty, possibly semantic uncertainty, i.e. whether a value has been correctly classified as belonging to a room in ground floor of a house with basement) as well for the estimated RP assigned to cells which result from a geostatistical estimation procedure. One can expect that these uncertainties propagate into the result.  Next, these “calibration data” of indoor Rn, i.e. observations, allow estimating the true rates (see Fig. 1) only up to some uncertainty which will also propagate into the result.  Thirdly, as one reviewer has stressed, the procedure itself can give rise to uncertainty. Some aspects have been addressed in the Technical annex. In particular the trade-off between number of cells available for analysis, and uncertainty of mean (CRIT1) or exceedance probability (CRIT2) of indoor Rn within cells due to low number of data in a cell could be a sensitive issue.  Finally, the anthropogenic factors which in the causal chain lie in between geogenic RP and indoor concentration C, may themselves be subject to a geographical trend; this would imply that the classifier (rp0) is itself not geographically constant, as assumed here, but subject to a geographical trend. Quantifying these uncertainties, i.e. establishing an uncertainty budget of the result, is not possible so far. (It is small comfort that

4.2. Comparison with a different method Since the wanted result is a binary coding of the territory only (RPA/non-RPA) relatively little information contained in the original data is needed which causes the simplicity of the method proposed in this paper. It establishes a kind of optimized transfer model between classifications, instead of one between continuous quantities (geogenic RP and indoor Rn concentration). The conceptual difference of the approaches is shown in Fig. 5. The approach on the left side keeps the full information until the last step (binary coding), while the simpler one retains only the binary information from a very early step on. It is however clear that also the full information, Cin(x*) and local exceedance probability of the indoor concentration, are needed for other purposes, for example if a Rn risk map should be created (e.g. Bossew, 2013 a,b). The method may be considered as a shortcut against full-blown stochastic analysis with all its notorious methodical complications, traps and pitfalls.

125

full transfer model

this study

data: {RP} {Cindoor}

data: RP(x*) cell estimate RP(x*) cell estimate transfer model result: Cin(x*), prob(Cin(x*)>c),…

criterion for RPA

{Cindoor} ROC analysis

code {0,1} according CRIT RPA map

{RP}

criterion for RPA RPA map

Fig. 5. Two approaches to estimate radon prone areas (RPAs).

126

P. Bossew / Journal of Environmental Radioactivity 129 (2014) 121e132

Fig. 6. RPA map based on the more complicated approach, Fig. 5, left graph; to be compared with right map of Fig. 4. (a) Expectation, (b) 95% quantile of the probability.

this shortcoming holds for the results of many complex, in particular of geostatistical procedures.) Likewise, I am not able at present to perform significance tests of the results. These tasks will require further research, probably involving extensive simulation exercises with artificially contaminated and reshuffled data. Also validation is not easy for two reasons. Firstly the traditional way of partitioning the data into a calibration and a validation did not appear viable here because otherwise not sufficient cells would be available for analysis. Secondly, validation is again affected by the problem of uncertain data due to often low cell occupation. Alternatives will have to be designed in the future. 5. Conclusions A conceptually and computationally simple method for estimating geogenic radon prone areas given a map of the radon potential has been presented. While a disadvantage, at current state of development, is that uncertainties resulting from the procedure are not easy to quantify, a favourable feature is that is allows easy tuning of importance of first vs. second kind errors (through the parameter a). As an outlook, the following will be tried in the future. Instead of RP(x*), the estimated expectation of the RP at the centre of each cell, one may use block estimates (computationally more demanding), other measures of the central tendency (GM, median), quantiles or percentiles, as local predictors. Next, different criteria CRIT for labelling a cell RPA should be studied. Then, further tools of analyzing the ROC graphs can be applied. Also significance tests for the scores should be explored. Further, the uncertainty budget of such classification still remains to be investigated. Also validation is still an open issue. We also want to see what the result would be if the analysis is not based on the RP but on geochemical quantities such as eU (equivalent

uranium concentration, assessed by ground based or in situ or aerogamma spectrometry) or external terrestrial dose rate; data bases of both quantities are available in Germany but have not yet been explored for their potential to predict the Rn hazard. (Dose rate as predictor is used for assessing RPAs in Spain, but partly through a different method, essentially a type of transfer model; García-Talavera et al., 2013; also see the remarks in section 2.3.) Furthermore, geological units themselves may serve as predictors; this appears however mathematically a bit more complicated since “geology” is by its nature no continuous, but a categorical non-ordinal variable. As an overall caveat, one must keep in mind that any estimate of radon prone areas out of predictor quantities (the RP in this case) depends also on methodology, in particular on the type of score which is used to optimize the classification. As shown on this example, optimal classification of cells may lead to a different result from coding estimated response quantities (mean indoor concentration, exceedance probability) even if the criterion is the same (here the following criteria used as examples: CRIT1: C > 100 Bq/m3 or CRIT2: prob(C > 100 Bq/m3) > 0.1). This means that estimated RPAs, to be interpretable, must always be accompanied by information not only about the criterion but also about the estimation method. This may appear a disadvantage as it seems that there is no “natural” score to optimize the classification. But that situation could be compared with the freedom in choosing a loss or cost function in certain optimization algorithms. In any case, this should be discussed before using this e otherwise intriguingly simple e method more widely. Technical annex In the annex issues of sensitivity against variation of parameters are further discussed. It seems that such effects are not trivial and should be considered to avoid mis-interpretation of results. At

P. Bossew / Journal of Environmental Radioactivity 129 (2014) 121e132

current state of development of the method, it is difficult to give concrete guidelines for a best practice, as the reviewers have suggested adding. Qualitatively, it consists in being aware that certain complications exist, considering them in specific cases, and try to find reasonable compromises. I would like to stress again that the complications are not a consequence of the method, but of the data realm, in particular relatively low correlation geogenic e indoor Rn, partly low spatial data density and possibly of heterogeneity of environments to which differently populated cells belong. However

127

associated to such poor estimate cancels over many cells (a consequence of the central limit theorem, as the arithmetic mean of samples is an unbiased estimator of the true mean). It follows, that for low m0, on the one hand, due to inclusion of unreliable observed means the classification power is lower (lower AUC), but since more cells are available, on the other hand, interpretation of the resulting ROC graph is more reliable. How the effect of low numbers of observations translates into uncertainty of classification is further discussed in section A.4.

1

0.9

0.8

0.7

TPR

0.6

m0=1 m0=2 m0=3 m0=5 m0=7 m0=10 m0=12 m0=15 m0=17 m0=20 m0=25 m0=30

0.5

0.4

0.3

0.2

0.1

0 0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

FPR

Fig. A-1. ROC graphs for CRIT1: C > 100 Bq/m3, calculated for different minimal numbers (m0) of observations per cell.

the method has to be optimized to cope with this reality; but this holds for any method which aims at modelling based on imperfect data. A.1. Sensitivity against observation density

0.95 AUC_C AUC_p

0.9 0.85

AUC

The observed data are samples within 10 km  10 km grid cells. The density of German indoor data is very variable, with numbers of observations per cell between zero and hundreds. Some figures have been given at the end of section 2.2. The fewer cells, the more problematic the interpretation of the respective ROC graphs, shown in Fig. A-1 for CRIT1. For lower minimum occupation of cells (m0), the curves are smoother and easier to interpret. On the other hand, their area under the curve (AUC), and hence the classification power is lower if inclusion of scarcely occupied cells (lower m0) is allowed. Note that already slight variation of m0 results in significant changes of the ROC graph. Fig. A-2 shows the AUC in dependence of m0, for CRIT1 and CRIT2. This may be interpreted as follows. The GSD (geometric standard deviation) of indoor concentration within cells equals typically about 2 or somewhat less (Tollefsen et al., 2011 report 1.85 as median GSD over Europe). This corresponds a coefficient of variation CV ¼ O(exp((ln GSD)2)  1) ¼ 79% for GSD ¼ 2, under the LN (lognormal) assumption. This indicates that one observation only is a poor estimate of the cell mean E[C]. On the other hand, the error

1

0.8 0.75 0.7 0.65 0.6 0

5

10

15

20

25

30

m0 Fig. A-2. Area under curve (AUC) as measure of classification power, in dependence of minimum number of observations, m0, per cell.

128

P. Bossew / Journal of Environmental Radioactivity 129 (2014) 121e132

low m0 a robust conclusion can be drawn from Y and d01, much less so from MCC. Theoretically, the optimal rp0 should of course be independent of the occupation density of cells and hence of m0.

Fig. A-3 shows the Y-, MCC- and d01 scores in dependence of the classifier rp, for CRIT1. The optimal value rp0 is the one corresponding to the maximum (Y, MCC) or minimum (d01) of the curves. It is evident that only for the ROC curves corresponding to

0.8

m0=1 m0=2 m0=3 m0=5 m0=7 m0=10 m0=12 m0=15 m0=17 m0=20 m0=25 m0=30

0.7

0.6

Y

0.5

0.4

0.3

0.2

0.1

0 0

10

20

30

40

50

60

70

80

90

100

110

60

70

80

90

100

110

rp 0.45 m0=1 m0=2 m0=3 m0=5 m0=7 m0=10 m0=12 m0=15 m0=17 m0=20 m0=25 m0=30

0.4

0.35

0.3

MCC

0.25

0.2

0.15

0.1

0.05

0 0

10

20

30

40

50 rp

Fig. A-3. Scores of the ROC graphs, CRIT1, in dependence of the value rp of the classifier. From top: Y-score, MCC-score and d01-score.

P. Bossew / Journal of Environmental Radioactivity 129 (2014) 121e132

129

1

0.9

0.8 0.7

d01

0.6

m0=1 m0=2 m0=3 m0=5 m0=7 m0=10 m0=12 m0=15 m0=17 m0=20 m0=25 m0=30

0.5

0.4

0.3 0.2

0.1

0 0

10

20

30

40

50

60

70

80

90

100

110

rp

Fig. A-3. (continued).

From Fig. A-4 we recognize that there is indeed dependence, although difficult to interpret. For CRIT1, the most robust scores appear Y and d01, which is the reason that in the main text a rp0 ¼ 32, resulting from m0 ¼ 5 was chosen as a compromise.

areas. Filtering the data for the respective type of environment and repeating the analysis may give a clue; however this would mean further reducing the number of cases, rendering the result even more uncertain. 23.5

120

C-F1 C-Y C-d01 C-MCC

100 80

p-F1 p-Y p-d01 p-MCC

23

rp0

rp0

22.5

60

22

40 21.5

20 0

21

0

5

10

15

20

25

30

m0

0

5

10

15

20

25

30

m0

Fig. A-4. Optimal value of the classifier, rp0, in dependence of the minimal number of observations per cell, m0. Left: CRIT1, right: CRIT2.

We are not able at the moment to present an uncertainty budget for the estimated rp0. However the effect discussed here appears to be one contributor to its uncertainty: the available observations of indoor Rn allow identification of an optimal classifier rp0, and as a consequence of the resulting pattern of the radon prone area RPA which is a function of rp0, only up to uncertainty. An additional, so far not quantified effect may be that higher m0 indicate cells preferentially over towns or cities. There the relation between radon potential and indoor Rn concentration can be assumed to be different than in predominantly rural or sub-urban

A.2. Effect of the defining threshold One would expect that the threshold which defines the radon prone area via CRIT, namely c0 in E[C > c0] or p0 in E[C > 100 Bq/m3] > p0 e for which we had previously set c0 ¼ 100 Bq/m3 and p0 ¼ 0.1 e influences the classifier rp0 which results from the classification procedure. For the threshold c0 the dependence is shown in Fig. A-5. In fact the classifier rp0 increases with c0 in tendency, however not really smoothly and linearly as one would expect. Also there is again a strong dependence on the minimum number of observations per

130

P. Bossew / Journal of Environmental Radioactivity 129 (2014) 121e132

cell, m0 (shown only for m0 ¼ 1, 5 and 10 in the graph), in particular for high thresholds c0.

c0 ¼ 100e150 Bq/m3 which may indicate that these values are good choices for defining the criterion CRIT1.

120

100

m0=1; rp0_Y m0=1; rp0_d01 m0=5; rp0_Y m0=5; rp0_d01 m0=10; rp0_Y m0=10; rp0_d01

optimal rp0

80

60

40

20

0 0

50

100

150

200

250

300

350

400

threshold c0 (Bq/m³) Fig. A-5. Dependence of the optimal classifier rp0 on the threshold c0 which defines the criterion for the RPA. “m0 ¼ 1; rp0_Y” denotes the optimal rp0 which results from maximizing Y, for a minimum of m0 ¼ 1 data per cell, and so on.

Without going into detail of analyzing the respective, individual ROC curves, one can understand that again the low number of cases e i.e. number of cells with high mean Rn concentration C, and even less if restricted to densely occupied cells (high m0) e is responsible for this somewhat erratic behaviour.

The case has yet a different pitfall for CRIT2: E[prob(C > 100 Bq/m3)] > p0. The empirical estimate of this quantity based on observations requires a relatively high number of observations; otherwise quantiles cannot be estimated reliably. For example, in a cell with only one observation, the possible empirical probabilities

1 0.9 0.8 0.7

AUC

score

0.6

m0=1; AUC m0=1; Y m0=1; d01 m0=5; AUC m0=5; Y m0=5; d01 m0=10; AUC m0=10; Y m0=10; d01

0.5 0.4

Y

0.3

d01

0.2 0.1 0 0

50

100

150

200

250

300

350

400

threshold c0 (Bq/m³) Fig. A-6. Dependence of the scores which quantify the classification power on the threshold c0 which defines the criterion for the RPA.

Fig. A-6 shows the scores associated to the rp0 resulting from maximization (Y)/minimization (d01) of the scores, as well as the AUC as measure of the classification power. The quality appears to increase with threshold rp0, but a close look to the ROC graphs shows that this is again an effect of the increasingly coarse shape of the graphs; the same effect as in Fig. A-1 for the effect of increasing m0. The curves remain relatively stable in the range around

are 0 or 1 only, whether the value is above or below 100 Bq/m3. For two observations, prob ¼ 0, 0.5 and 1 are possible, and so on. This means that for a cell with 2 observations and empirical probability equalling, say, 0.5, criteria prob(C > 100) > 0.1 are fulfilled, as are prob > 0.05 or prob > 0.3, and so on. It follows that one cannot expect much sensitivity against p0, and the resulting RPA may be much the same for criteria E[prob] > 0.05, 0.1 or 0.3,

P. Bossew / Journal of Environmental Radioactivity 129 (2014) 121e132

131

that more cells are flagged radon prone area, than with lower a, or more relative importance of avoiding 1. kind errors, or false alarms.

which is of course counter-intuitive because these criteria represent very different thresholds for defining an RPA, and is therefore not satisfying. One would therefore have to rely on densely occupied cells (high m0) only, for which only few cells are available for evaluation and the ROC graphs are unreliable and difficult to interpret. As a conclusion, given the currently available data e in particular their strong clustering which results in having few cells with high, but many with low occupancy e a classification based approach should not be based on the exceedance probability as criterion of RPA, but preferably on the mean concentration itself.

100

rp0

m0=1 m0=2 m0=5 m0=10

A.3. Relative importance of false negatives and positives The optimal classifier rp0 is detewrmined from maximizing or minimizing scores which have been, for this analysis, defined such as to attribute same importance to minimizing the false positive rate FPR (first kind errors) and maximizing the true positive rate TPR (or minimizing the false negative rate 1  TPR, second kind errors). One may argue that committing a 1. kind error, that is, allowing a false alarm by flagging a cell radon prone although it is not, according to the observations, is less relevant than a 2. kind error, i.e. declaring a cell non-radon prone while it is according to the observations. Remember, though, that the observations can be very uncertain estimates of the cell mean, and the flag of a cell (RPA y/n) based on observations itself very uncertain, as discussed above and further in section A.4. In the end this is of course a political decision. We shall shortly explore the effect of different importance of FPR vs. 1  TPR, for one example: CRIT1: E[C > 100 Bq/m3] and several values of the minimum number of observations per cell (m0). The score to be maximized is a simple modification of Y:

10 0.1

1

10

alpha Fig. A-7. Optimal classifier rp0, in dependence of the weight a given to first, relative to second kind errors; for 4 different minimal numbers of observations (m0) per cell. For CRIT1: EC > 100 Bq/m3.

Again there is dependence on the minimum number of observations allowed, for the same reasons as already discussed. A.4. A simulation exercise

Ya : ¼ að1  TPRÞ  FPR;

To better understand the uncertainty of classification based on a finite number of observations, perform the following simulation

In Ya the 2. kind (false negative) error is weighted a times as serious as the 1. kind error. The result, shown in Fig. A-7 for the range a ¼ 0.3 to 3, is according to what one would expect: higher importance of avoiding false negatives (2. kind error), by high values of a, means lower classifier rp0: this has the consequence

1. Take n independent random samples Cn: ¼ {c1,..,cn} from C w LN(m,s), with m ¼ 0 (without loss of generality) and s ¼ ln(GSD) ¼ ln 2. The true expectation equals EC ¼ GM exp(s2/2). with GM ¼ exp(m) the geometrical mean. Here GM ¼ 1 and EC ¼ 1.2715.

which for a ¼ 1 equals the Y score as above:

1

n=1 n=2 n=5 n=10 n=20 n=50 n=100 n=1, asympt. n=10, asympt.

0.9

prob(sample AM > threshold)

0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

1.8

2

2.2

relative threshold Fig. A-8. Probability that the sample mean is classified above a threshold, in dependence of the threshold (multiples of the true mean), under LN(0,ln(2)) assumption; for different sample sizes. The asymptotical curves are according to the central limit theorem.

132

P. Bossew / Journal of Environmental Radioactivity 129 (2014) 121e132

2. Calculate the empirical mean of Cn, AM{c1,.,cn} ¼ : AMn. Due to the central limit theorem AMn is an unbiased estimator of EC. 3. Decide whether AMn exceeds a threshold c. If so, set zn ¼ 1, otherwise 0. 4. Repeat this many times (here 10,000 times) and calculate the “ergodic” mean of the random variable Zn of which zn is a realization, over simulations u, AMu(zn). This should be 0 if EC < c and 1 if EC > c. Asymptotically with increasing n, AMn w N(EC, SDCn) by virtue of the central limit theorem, with SDCn ¼ SDC/On and SDC the true standard deviation, SDC ¼ EC O(exp(s2)  1), so that Zn y F((EC  c)/SDn) asymptotically, with F the standard normal. The probabilities Zn are shown in Fig. A-8 in dependence of a relative threshold, defined as multiples of the true mean, i.e. in units a: ¼ c/EC, thus preserving generality in choosing m ¼ 0. In addition the asymptotical curves for n ¼ 1 and n ¼ 10 are shown. Of course the asymptotical curves approach the real ones with increasing n, but for low sample numbers the difference is important. We observe in particular: For a ¼ 1, or c ¼ EC, the probability should equal 0.5 which it does not. The bias increases with decreasing sample number. This means in practice, that with n ¼ 1 observation and the threshold c equalling the true mean, or a ¼ 1, the probability is 37% that the sample mean (one value in this case) is classified as above the threshold. For a threshold twice the true mean (a ¼ 2), still with 8.8% probability the sample mean would be flagged as above threshold. This is the 2. kind error connected to classification with one observation only. On the other hand, for a ¼ 0.5 and one sample, or the threshold half the true mean, the sample mean would be classified as above the threshold with 75% probability, or a 1. kind error of 25% of erroneously not flagging as above the threshold. These errors decrease obviously with increasing sample number. The conclusion of this exercise is that cells flagged as radon prone area through a small number of observations, are subject to a

substantial mis-classification probability. This may help explain the lower classification power (lower AUC) if cells with few observations (small m0) are included in the ROC analysis. References Bossew, P., 2013a. Stochastic dependence of Rn-related quantities. Romanian Journal of Physics 68 (Suppl.), S44 (accessed 18.07.13). www.nipne.ro/rjp/2013_58_ Suppl.html. Bossew, P., 2013b. A radon risk map of Germany based on the geogenic radon potential. In: Pardo-Igúzquiza, E., et al. (Eds.), Mathematics of Planet Earth, Lecture Notes in Earth System Sciences. Springer, pp. 527e531. Pres., IAMG (Intl. Ass. Math. Geosciences) 2013, Madrid 2e6 Sept 2013. http://dx.doi.org/10.1007/9783-642-32408-6_115. Bossew, P., 2013c. Wer fürchtet sich vorm radon prone area?. In: Presentation, Fachgespräch Radon, BfS Berlin, 22.-23.5.2013 (in German; unpublished). Cinelli, G., Tondeur, F., Dehandschutter, B., 2011. Development of an indoor radon risk map of the Walloon region of Belgium, integrating geological information. Environmental Earth Sciences 62 (4), 809e819. http://dx.doi.org/10.1007/ s12665-010-0568-5. Friedmann, H., 2005. Final results of the Austrian radon project. Health Physics 89 (4), 339e348. García-Talavera, M., García-Pérez, A., Rey, C., Ramos, L., 2013. Mapping radon-prone areas using g-radiation dose rate and geological information. Journal of Radiological Protection 33, 605e620. http://dx.doi.org/10.1088/0952-4746/33/ 3/605. Kemski, J., Klingel, R., Siehl, A., Stegemann, R., Valdiva-Manchego, M., 2002. Transferfunktion für die Radonkonzentration in der Bodenluft und Wohnraumluft. Abschlussbericht zu den Forschungsvorhaben St. Sch. 4186 und St. Sch. 4187: Ermittlung einer Transferfunktion für die Radonkonzentration in der Bodenluft und der Wohnraumluft incl. Radonmessungen in Häusern zur Validierung des geologisch induzierten Radonpotenzials. Teil A: Bodenuntersuchungen zum geogenen Radonpotenzial. Teil B: Validierung der geologischen Prognose durch Messungen der Radonkonzentration in Gebäuden. Schriftenreihe Reaktorsicherheit und Strahlenschutz, BMU-2002-598. e Excerpt. measurement protocol soil air. www.kemski-bonn.de/downloads/ MessanleitungBodenluft_Web.pdf (accessed 29.06.12). Metz, C.E., 1978. Basic principles of ROC analysis. Seminars in Nuclear Medicine VIII (4), 283e298. http://www.ncbi.nlm.nih.gov/pubmed/112681 (accessed 18.07.13). Neznal, M., Neznal, M., Matolin, M., Barnet, I., Miksova, J., 2004. The New Method for Assessing the Radon Risk of Building Sites. Czech Geol. Survey Special Papers, 16. Czech Geol. Survey, Prague, p. 47. http://www.radon-vos.cz/pdf/ metodika.pdf (accessed 29.03.13). Tollefsen, T., Gruber, V., Bossew, P., De Cort, M., 2011. Status of the European indoor radon map. Radiation Protection Dosimetry 145 (2e3), 110e116.

Determination of radon prone areas by optimized binary classification.

Geogenic radon prone areas are regions in which for natural reasons elevated indoor radon concentrations must be expected. Their identification is par...
2MB Sizes 1 Downloads 0 Views