Environmental Toxicology and Chemistry, Vol. 33, No. 10, pp. 2399–2406, 2014
© 2014 SETAC
Printed in the USA

SELECTING THE BEST DESIGN FOR NONSTANDARD TOXICOLOGY EXPERIMENTS

JENNIFER M. WEBB, BYRAN J. SMUCKER, and A. JOHN BAILER*
Department of Statistics, Miami University, Oxford, Ohio, USA

(Submitted 20 January 2014; Returned for Revision 5 March 2014; Accepted 16 June 2014)

Abstract: Although many experiments in environmental toxicology use standard statistical experimental designs, there are situations that arise where no such standard design is natural or applicable because of logistical constraints. For example, the layout of a laboratory may suggest that each shelf serve as a block, with the number of experimental units per shelf either greater than or less than the number of treatments in a way that precludes the use of a typical block design. In such cases, an effective and powerful alternative is to employ optimal experimental design principles, a strategy that produces designs with precise statistical estimates. Here, a D-optimal design was generated for an experiment in environmental toxicology that has 2 factors, 16 treatments, and constraints similar to those described above. After initial consideration of a randomized complete block design and an intuitive cyclic design, it was decided to compare a D-optimal design and a slightly more complicated version of the cyclic design. Simulations were conducted generating random responses under a variety of scenarios that reflect conditions motivated by a similar toxicology study, and the designs were evaluated via D-efficiency as well as by a power analysis. The cyclic design performed well compared to the D-optimal design. Environ Toxicol Chem 2014;33:2399–2406. © 2014 SETAC

Keywords: Ecotoxicology; Dose–response modeling; Effects-based monitoring; D-optimal design

All Supplemental Data may be found in the online version of this article.
* Address correspondence to [email protected]
Published online 18 June 2014 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/etc.2671

INTRODUCTION

Often in aquatic toxicology tests, the size and design of experiments are constrained by limitations in the amount of physical space available. But what if the availability of space exceeds the needs of an experiment? How can this excess space be used efficiently to enhance the accompanying statistical analysis? Alternatively, what if the available space is less than the amount needed to employ standard statistical experimental designs? A strategy for addressing questions such as these is described in the present study.

An experiment was being designed by colleagues with a context that did not easily succumb to a standard experimental design (S. Rumschlag and M. Boone, personal communication). In that study, the physical layout of the laboratory imposed constraints on the experiment (Figure 1). Mass at metamorphosis of a tadpole was the response, and there were 16 treatments to be replicated: the presence of a pesticide at a particular concentration (present/absent; 2 levels) crossed with timing of pathogen exposure (8 levels). The experiment was conducted in a room containing 3 racks, each rack containing 4 shelves, each shelf accommodating up to 27 beakers, and each beaker housing a single tadpole. Shelf was a natural blocking factor because there was concern that light levels, proximity to the door, and/or other characteristics that differ between shelves might impact the response (S. Rumschlag and M. Boone, personal communication). Therefore, space for 324 (= 3 racks × 4 shelves/rack × 27 beakers/shelf) experimental units was available, which allowed for approximately 20 (324/16) replicates of each treatment.

Figure 1. Laboratory setup for tadpole study: 3 racks with 4 shelves, each shelf large enough to accommodate 27 beakers.

A complementary problem would be the situation where there was room for fewer than the number of treatments on each shelf, say only 10 beakers; however, there were 16 treatments to be assigned. In some block-treatment configurations, this leads to incomplete block designs. This issue can also be addressed using the techniques described in the present study; however, for the remainder of this article, we focus on the situation where a block is able to hold more experimental units than the number of specified treatments.

This situation presents an interesting experimental design problem. In particular, how should we assign 16 treatments to 324 experimental units that are arranged in 12 blocks of size 27? A completely randomized design is inappropriate because of the block structure imposed by the shelves. A randomized complete block design could be considered if we only wanted to use 16 units per shelf. However, this clearly wastes an opportunity to collect more data, because there is a capacity of 27 experimental units on each shelf but only 16 treatments, and the limiting factor is not the overall number of tadpoles but the physical restrictions of the experimental environment. Balanced or partially balanced incomplete block designs are more promising, except that they are typically employed when the number of treatments exceeds the size of the blocks. In this case, the number of experimental units per block is larger than the number of treatments, but not in a way that divides evenly. One might even consider splitting the design into 2 parts, but this reduces the power of the experiment to detect differences in treatments, and it would still involve designing an experiment for which the size of the blocks does not nicely divide the number of treatments. An intuitive solution would involve assigning the treatments to blocks in a cyclic way. The statistical performance of this design is unclear, but we explore such an approach later in the present study.

We propose D-optimal designs as a solution to this problem. These designs are based on a well-known, flexible design construction criterion developed in the statistics literature [1,2] that attempts to choose a design that minimizes the determinant of the variance–covariance matrix of the regression parameter estimates. In other words, such a design estimates these parameters precisely. To obtain this design, the experimenter must specify the nature of the predictors (i.e., whether they are continuous or categorical, and if categorical how many levels they possess), the form of the statistical model to be fit (e.g., which predictors will appear in the model, as well as interactions), the number of observations, and the blocking structure of the design. The designs can then be constructed, via algorithm, in common commercial software.

To demonstrate the flexibility of this approach, consider again the motivating experiment that had 16 treatments but blocks of size 27. The D-optimal design, and indeed any design for this experiment, will indicate which treatments should be included in each block. The D-optimal approach, however, works for any treatment/block size combination. For instance, instead of 16 treatments and 12 blocks of size 27, we might, in a different situation in which resources are scarce, only have 6 blocks of size 11. The possible situations that preclude the routine use of standard blocking designs are endless, but the D-optimal approach would provide designs with good statistical properties regardless. Optimal designs are often applied in industrial statistics settings, as alternatives to classical response surface designs, where process or product design and improvement are the objectives. Most of the applications of optimal design to the chemical and biological sciences have been in the context of nonlinear regression models (environmental toxicology [3], systems biology [4–6], microbiology [7], and environmental engineering [8]), although chemometrics applications [9,10] use it in the context of linear regression. More on optimal design can be found in the aforementioned books by Goos and Jones [1] and Atkinson et al. [2]; see also a description in an environmetrics context by Fedorov [11].

In the present study, we describe in more detail this optimal approach to experimental design. Specifically, we define and discuss the benefits of D-optimal designs and show how the D-criterion can be used to compare designs. Finally, we conduct a small computer simulation study to compare the D-optimal design with an alternative design in terms of power to detect effects of interest.


METHODS

In this section, we first provide additional background for the motivating example in environmental toxicology and describe the statistical model of interest. We then relate the model to more general regression models, which are described in preparation for the description of the D-criterion and D-optimal designs. Subsequently, we describe D-optimal designs along with other design possibilities for the toxicology experiment. Finally, we describe a simulation experiment that was conducted to compare alternative designs.

Statistical model for analyzing the impact of treatments on mass at metamorphosis

The mass at metamorphosis of tadpoles may be affected by the presence of a pesticide. Furthermore, tadpoles may be simultaneously exposed to a pathogen at a specific time prior to or at metamorphosis. Each combination of pesticide concentration and timing of pathogen exposure defines a unique treatment that might impact the biological response, mass at metamorphosis. Evaluating the effects of these 2 factors (as well as their interaction) on a biological endpoint is the ultimate goal of such a study.

Linear statistical models are a standard tool for detecting differences among treatments. In this case, we wish to evaluate the effect of a pesticide and of the timing of pathogen introduction on mass at metamorphosis. If we assume no interaction between the 2 factors, an appropriate model is

Mass_ijk = μ + Pesticide_i + Timing_j + Block_k + ε_ijk    (1)

where ε_ijk ~ N(0, σ_e²); Mass_ijk is the response for tadpole k exposed to pesticide level i and pathogen timing level j; μ is the overall mean; Pesticide_i is the pesticide effect for i = present, absent; Timing_j is the timing of pathogen exposure effect for j = 1 (= no pathogen), 2, 3, 4, 5, 6, 7, 8 (= pathogen introduced early); Block_k is the block/shelf effect for k = 1, 2, …, 12; and ε_ijk is the random error term. The response variable, Mass, is measured at the metamorphosis of the tadpole. As an aside, when estimating the model some restrictions need to be imposed; for example, Pesticide_absent = Timing_1 = Block_1 = 0. Then the overall mean, μ, represents the mean mass at metamorphosis for a tadpole that was not exposed to a pesticide, was not exposed to a pathogen, and was in a beaker located in the first block. Therefore, all effects are interpreted in relation to these reference levels of the 2 factors and the block. The various times of pathogen introduction prior to metamorphosis are labeled 1 to 8, where 1 signifies that no pathogen was introduced, and 2 is the time of introduction at which tadpoles have been exposed to the pathogen for the shortest amount of time. The random error term, ε_ijk, is assumed to be independently normally distributed with mean 0 and constant variance σ_e².

The second model is an extension of the first, this time assuming that an interaction effect exists between the factors:

Mass_ijk = μ + Pesticide_i + Timing_j + Block_k + (Pesticide × Timing)_ij + ε_ijk    (2)

where the (Pesticide × Timing)_ij term represents the interaction effect between pesticide and timing levels. The lm() function in R [12] was used to fit these models using data simulated as explained below.
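As an illustration (not part of the original article), the following R sketch shows the model formulas that correspond to models 1 and 2 when fit with lm(), as the authors describe. The data frame is a placeholder with one unit per treatment-block combination; Mass is filled with noise around the 2.5 g mean reported later in the simulation section.

```r
# Minimal sketch: fitting models 1 and 2 with lm() on placeholder data.
dat <- expand.grid(Pesticide = factor(c("absent", "present")),
                   Timing    = factor(1:8),
                   Block     = factor(1:12))
dat$Mass <- 2.5 + rnorm(nrow(dat), sd = 0.11)   # placeholder response

fit_main <- lm(Mass ~ Pesticide + Timing + Block, data = dat)   # model 1
fit_int  <- lm(Mass ~ Pesticide * Timing + Block, data = dat)   # model 2

anova(fit_main)           # F tests for pesticide, timing, and block effects
anova(fit_main, fit_int)  # test of the pesticide-by-timing interaction
```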


Relating the statistical model to a design

We note that models 1 and 2 can be reformulated as regression models in matrix form. We present this version of the model because it is necessary for the subsequent explanation of D-optimal design; more details are given in Supplemental Data, Section 1 (see also Kutner et al. [13] for additional background). The general form of the model is

Y = Xβ + ε    (3)

where Y is a vector of responses with n rows and 1 column (i.e., an n × 1 vector), n is the total number of observations, β is a p × 1 vector of model parameters, X is an n × p model matrix, and ε is an n × 1 vector of error terms such that ε ~ N(0, σ²I), with 0 representing the n × 1 vector of 0's and I an n × n identity matrix. The matrix X is also called the expanded design matrix, and a row in this matrix corresponds to a replicate of a unique treatment-block combination. The optimal design problem can be framed as: which rows should be included in the design matrix? Using ordinary least squares to estimate β gives a vector of estimates β̂ = (XᵀX)⁻¹XᵀY. This formulation is particularly useful in designing experiments because the variance–covariance matrix of β̂ is σ²(XᵀX)⁻¹, and we call XᵀX the information matrix. Minimizing the generalized variance of these estimators can be achieved by minimizing the determinant of the variance–covariance matrix of the estimators, which is equivalent to maximizing the determinant of the information matrix. Rows of the design matrix X reflect replicates of particular unique combinations of treatments (pesticide, timing of pathogen) and block (shelf), and the design implemented by the experimenter determines the rows of this matrix. This is a key observation because the rows (conditions in the experiment) can be chosen to minimize the determinant of the variance–covariance matrix of β̂ (i.e., the D-criterion), which results in relatively small variances for the estimates of the model parameters. We will revisit this in the subsequent section discussing experimental designs. More details regarding how model 3 is equivalent to models 1 and 2 are given in Supplemental Data, Section 1.

Next, we have a slightly more complicated situation, because we wish to make inferences on just the pesticide and timing factors, while the blocks exist in the model but inference on them is unimportant. Because this is the case, we split the matrix X into 2 smaller matrices: X_1 is n × p_1 and includes the columns in X corresponding to the intercept and the pesticide/timing factors, and X_2 is n × p_2 and corresponds to the columns in X associated with the blocks (p_1 is the number of parameters associated with the intercept and treatment factors, and p_2 is the number of parameters associated with blocks; see Supplemental Data, Section 1). Then, X can be partitioned as X = [X_1 | X_2], where the vertical bar | denotes the concatenation of the 2 matrices; in a similar way, β = [β_1 | β_2], where β_1 is a p_1-vector and β_2 is a p_2-vector. In our experiment, the total number of parameters for the first-order model is p = 20 (1 for the intercept β_0, 1 for the pesticide factor, 7 for the timing factor, and 11 for the blocks), so that p_1 = 9 and p_2 = 11 (again, see Supplemental Data, Section 1). Given the above development, we can rewrite model 3 as

Y = X_1β_1 + X_2β_2 + ε    (4)

As noted previously, the goal of the experiment is to detect treatment effects, and these effects are associated with the parameters in β_1. The variance–covariance matrix for β̂_1 is

Var(β̂_1) = σ²[X_1ᵀX_1 − X_1ᵀX_2(X_2ᵀX_2)⁻¹X_2ᵀX_1]⁻¹    (5)

as shown in Cook and Nachtsheim [14].
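To illustrate Equations 3 to 5 (this sketch is ours, not the authors' code), the R code below builds the expanded model matrix for a simple complete-block stand-in design, rather than the actual 27-unit-per-shelf designs, and computes the information matrix for β_1 up to the factor σ².

```r
# Illustrative sketch: information matrix for beta_1 (Equations 3-5),
# using one replicate of every treatment in each of the 12 blocks.
design <- expand.grid(Pesticide = factor(1:2),
                      Timing    = factor(1:8),
                      Block     = factor(1:12))

X  <- model.matrix(~ Pesticide + Timing + Block, data = design)  # n x p, p = 20
X1 <- X[, 1:9]    # intercept + pesticide + timing columns (p1 = 9)
X2 <- X[, 10:20]  # block columns (p2 = 11)

# Var(beta1_hat)/sigma^2 from Equation 5, and the corresponding information matrix
V1 <- solve(t(X1) %*% X1 - t(X1) %*% X2 %*% solve(t(X2) %*% X2) %*% t(X2) %*% X1)
M1 <- solve(V1)
det(M1)  # |M(beta1_hat)|, the quantity compared across designs in Equation 6
```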


We reiterate that this is relevant because in the next section we discuss how a design can be chosen to make this variance–covariance matrix small by minimizing its determinant, thus increasing the precision of our parameter estimates.

Experimental designs

We considered several candidate designs for this experimental situation. First, a randomized complete block design was evaluated in which only the first 16 beakers on each shelf were utilized. In this type of design, every block contains the same number of treatments, which means that each treatment would be randomly assigned once per shelf. Tailoring the experiment to this design has a significant drawback: there are 11 unutilized spaces for beakers per shelf (44 per rack). The additional data would increase the power to detect biologically significant effects, so assuming that cost is not a limiting factor, it is desirable to use the extra spaces if possible.

Second, we considered the so-called cyclic designs. These represent designs that experimenters might construct intuitively. First, each of the 16 treatments was assigned once per shelf, leaving 11 experimental units to be filled. The treatments were then cycled through the remaining 11 beakers on each shelf, for a total of 27 units in each block. The first 8 treatments consist of all timing levels where pesticide is absent (labeled 1–8), and the next 8 of all timing levels where pesticide is present (labeled 9–16). An initial cyclic design was considered where treatments 1 to 11 filled the leftover 11 beakers in the first block. For the second block, we began with treatments 12 to 16 and started over with treatment 1 (therefore, treatments 12–16 and 1–6 would be assigned here), and so on; an illustrative sketch of this cycling is given following this discussion. Because this approach resulted in 1 pesticide level (absent or present) appearing many more times per block than the other, a more balanced cyclic design was considered, where each of the 2 pesticide levels appears only 13 or 14 times per block. A detailed description of this design's construction is in Supplemental Data, Section 2. The last 11 beakers on the first rack of 4 shelves for the balanced cyclic design can be viewed in Table 1. Note that once the treatments have been assigned to a block, they must be randomized within that block before the experiment is conducted. In subsequent simulation comparisons, we used the balanced cyclic design, although results for the cyclic design were not much different.

The last class of designs we considered was D-optimal. In contrast to traditional designs, which are typically selected from a catalog and are appropriate for particular experimental scenarios, these designs are flexibly constructed to reflect particularities in the available resources, constraints, and context for conducting the experiment. These designs require that a criterion be specified that reflects the goal of the experiment. For instance, if the goal is to detect and interpret treatment effects, the D-criterion might be used because it will produce designs with relatively precise parameter estimates, which translates into more power for hypothesis tests of interest [1,2]. Typically, these designs are used to produce models with precise estimates of all regression parameters. However, in the present case our goal was to detect significant pesticide and timing effects, while also accommodating block effects in the model. Then, a slightly more complicated version of the D-criterion, the Ds-criterion [2], can be used to construct designs that focus only on the factors of interest and not the blocks. In this case, the design would be chosen so that the determinant of the variance–covariance matrix of β̂_1 is minimized. Recall that β_1 contains only the parameters associated with the pesticide and timing effects, and that the variance–covariance matrix of β̂_1 is Var(β̂_1), given in Equation 5.
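The sketch below (ours, not from the article) reproduces only the simple, unbalanced cyclic assignment described above; the balanced variant (Supplemental Data, Section 2) and the within-block randomization are omitted.

```r
# Illustrative sketch: simple cyclic assignment of 16 treatments to 12 blocks
# (shelves) of 27 units each. Every block gets one replicate of each treatment,
# and the 11 extra units per block are filled by cycling through the labels.
n_trt <- 16; n_block <- 12; extra <- 27 - n_trt   # 11 extra units per block

extra_stream <- rep(1:n_trt, length.out = n_block * extra)
cyclic <- lapply(seq_len(n_block), function(b) {
  extras <- extra_stream[((b - 1) * extra + 1):(b * extra)]
  sort(c(1:n_trt, extras))   # the 27 treatment labels assigned to block b
})

cyclic[[1]]  # block 1: one of each treatment plus extras 1-11
cyclic[[2]]  # block 2: one of each treatment plus extras 12-16 and 1-6
```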


Table 1. Treatment assignment for the last 11 experimental units on the first rack of shelves for the balanced cyclic design(a)

           Treatments
Shelf 1:   1  2  3  4  5  6  9 10 11 15 16
Shelf 2:   1  2  3  7  8  9 12 13 14 15 16
Shelf 3:   1  4  5  6  7  8 10 11 12 13 14
Shelf 4:   2  3  4  5  6  9 10 11 12 15 16

(a) The total number of experimental units on each shelf is 27; in each shelf, in addition to the 11 experimental units shown, 1 replication of each of the 16 treatments is included.

Thus, the Ds-criterion is |Var(β̂_1)|, where |A| denotes the determinant of the matrix A, and we wish to minimize it (or, equivalently, maximize |M(β̂_1)| = |Var(β̂_1)⁻¹|, the determinant of the information matrix for β_1). In the end, it turns out that the design minimizing |Var(β̂_1)| is the same as the one maximizing the determinant of the information matrix, |XᵀX/σ²|, of the original model (Equation 3) [15]. Thus, from here on we will refer to D-optimal designs and not Ds-optimal designs, because of this equivalence. Although the preceding discussion is somewhat technical, we have retained it because of its ramifications for the efficiency measures described below.

D-optimal designs can only be generated with a prestated model form, which is defined by the columns of the X matrix. Each column in this matrix corresponds to a parameter in the model. Thus, there will be a certain set of columns associated with treatment effects, other columns associated with block effects, and other columns associated with interaction effects. Because the researcher is generally unsure in advance what the true model form is (e.g., whether interaction terms are active or not), 2 D-optimal designs were considered, both including blocks: 1 assuming no interaction of pesticide and pathogen exposure timing, and the other assuming that an interaction exists. Ultimately, we chose to present results only for the D-optimal design constructed assuming interaction. We did this to simplify the output and discussion, and because the results for the 2 designs were not strikingly different. Furthermore, in the face of model uncertainty, it is typically prudent to assume a larger model as long as there are adequate resources to estimate it. The treatment assignment for the first rack of shelves in the D-optimal design assuming interaction is shown in Table 2, and the full design is represented in Table 4.

We constructed the D-optimal design in JMP version 10 (SAS Institute), which uses the coordinate-exchange algorithm [15]. This heuristic optimization procedure starts with a random initial design in which levels for pesticide and timing are assigned to each experimental unit in every block. Then, changes are made to these assignments, over and over again, as long as there is improvement in the criterion. The result is a locally D-optimal design; that is, the design cannot be further improved given the algorithm and the particular random start. This process can be repeated with many random starts, with the criterion values of the resulting designs compared and the best design chosen. The D-optimal designs in the present study were generated with 2 random starts, but others generated with 100 random starts were essentially equivalent with respect to the criterion value. Supplemental Data, Section 4, provides an outline of the steps followed to produce an optimal design in JMP.

Relative D-efficiency is a measure of the relative quality of 2 designs in terms of the D-criterion. Even though the cyclic design and the randomized complete block design were not generated as D-optimal designs, the appropriate information matrices can be constructed and the designs compared. The relative D-efficiency of design 1 with respect to design 2, for the parameters in β_1, is computed as follows:

Rel D-Eff = ( |M(β̂_1)_1| / |M(β̂_1)_2| )^(1/p)    (6)

where M(β̂_1)_1 and M(β̂_1)_2 are the information matrices for designs 1 and 2, respectively, and p is the number of elements in β_1. If the relative D-efficiency is less than 1, then design 2 is the more efficient design. (Technically, this measure might be better termed relative Ds-efficiency because it is with respect to the parameters in β_1, not all parameters in the model; however, for the sake of simplicity we will overlook this slight abuse of notation.)

Although we later compare the designs via simulation, an initial, theoretical comparison can be made by evaluating their relative D-efficiencies. This measure roughly compares the amount of information in each design. For instance, if 1 design is 90% D-efficient with respect to another, it indicates that the first design would need approximately 10% more observations to attain the same amount of information as the second design. The D-criterion values shown in Table 3 were calculated for 2 scenarios: 1) assuming the main effects model is the truth; and 2) assuming the interaction model is the truth. These values were calculated for the D-optimal design constructed assuming interaction, the balanced cyclic design, and the randomized complete block design.
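Equation 6 can be computed directly once the two information matrices are available; the following small helper (an illustration of ours, not the authors' code) does exactly that.

```r
# Illustrative helper: relative D-efficiency of design 1 with respect to
# design 2 (Equation 6), given the two p x p information matrices for beta_1.
rel_d_eff <- function(M1, M2) {
  p <- nrow(M1)
  (det(M1) / det(M2))^(1 / p)
}
```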

Table 2. Treatment assignment for the last 11 experimental units on the first rack of shelves for the D-optimal design constructed assuming interaction(a)

           Treatments
Shelf 1:   1  2  3  4  7  8  9 12 14 15 16
Shelf 2:   1  2  3  4  5  6  9 11 13 15 16
Shelf 3:   2  3  4  6  7  8  9 10 11 12 13
Shelf 4:   2  3  4  5  6  8  9 10 14 15 16

(a) The total number of experimental units on each shelf is 27; in each shelf, in addition to the 11 experimental units shown, 1 replication of each of the 16 treatments is included.


Table 3. Determinant of M(β̂_1) and relative D-efficiency for 3 designs, assuming models 1 and 2

                          Main effects model                               Interaction model
Design                    Determinant of M(β̂_1)   Relative efficiency(a)   Determinant of M(β̂_1)   Relative efficiency(a)
D-optimal interaction     4.68 × 10^13             0.998                    5.60 × 10^20             1
Balanced cyclic           4.78 × 10^13             1                        5.42 × 10^20             0.998
RCBD                      4.40 × 10^11             0.594                    1.35 × 10^17             0.594

(a) Relative D-efficiency is with respect to the design with the largest determinant for the indicated model. RCBD = randomized complete block design.
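As a quick arithmetic check (using only the values in Table 3 and p = 9 parameters in β_1 for the main effects model), the relative D-efficiency of the randomized complete block design with respect to the balanced cyclic design can be recomputed from Equation 6:

```r
(4.40e11 / 4.78e13)^(1/9)  # = 0.594, matching Table 3 and close to 16/27 = 0.593
```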

From the efficiency results, along with Table 4, we have initial indications that there is little difference between the D-optimal and balanced cyclic designs. Thus, we would expect no obvious winner between these designs in terms of their power to detect effects. Because the randomized complete block design has only 16 units/shelf and the other designs have 27 units/shelf, we expect that the randomized complete block design has approximately 16/27 (59%) of the information available relative to the cyclic or D-optimal designs.

As we have noted, the randomized complete block design contained the lowest number of runs per block; that is, 1 replicate of each treatment is assigned to each block, for a total of 16 runs per block. The number of times each unique treatment appears in each block varied from block to block for the other designs, and the difference between these designs and the randomized complete block design can be seen in Table 4. For example, the treatment "pesticide absent, pathogen not introduced" (timing level 1) appears twice in block 1 for the D-optimal interaction design, so the difference between the number of replicates of this treatment in this design and in the randomized complete block design is 1. If a space is left blank, the difference is 0 (no difference in the number of replicates). The balanced cyclic design is similar to the D-optimal interaction design in that the difference between these designs and the randomized complete block design is 0 or 1 for every treatment.

Simulation study

To compare the designs in terms of the ultimate goal of the experiment (detecting pesticide and/or pathogen exposure timing effects), a simulation study was implemented, and the designs were evaluated based on statistical power. The power, or sensitivity, of a statistical test is the probability of detecting an effect of a particular size. Random responses were generated from model 1; that is, Mass ~ N(μ + Pesticide + Timing + Block, σ²). Each simulation required the specification of pesticide, timing, and block effects. A variety of effect sizes (low, medium, and high) with different power properties were specified. Model 2 was also simulated, and the power was calculated to compare the designs in this case as well.

For the simulations of model 1, we considered 9 conditions, each a different combination of effect sizes for pesticide and timing. For each of these conditions, the designs were compared and the main effects analyzed. We also simulated data from interaction model 2, in which case we analyzed only the interaction. Two different types of interaction were considered: synergistic (the response is more than expected from the additive pesticide and pathogen exposure timing effects) and antagonistic (the response is less than expected from the additive pesticide and pathogen exposure timing effects).
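The following R sketch is our illustration, not the authors' simulation code: it estimates power for the pesticide main effect under model 1 for a single design and a single (medium/medium) effect-size condition. The design is a complete-block stand-in, and the effect sizes follow the description in the Experimental conditions section, with σ_e = 0.11.

```r
# Illustrative power estimate for the pesticide effect under model 1.
set.seed(2014)
sigma_e <- 0.11
design  <- expand.grid(Pesticide = factor(1:2),
                       Timing    = factor(1:8),
                       Block     = factor(1:12))

pest_eff  <- c(0, 0.25 * sigma_e)                      # medium pesticide effect
time_eff  <- seq(0, -0.55 * sigma_e, length.out = 8)   # medium timing effects
block_eff <- seq(-0.25, 0.25, length.out = 12) * sigma_e

mu <- 2.5 +
  pest_eff[as.integer(design$Pesticide)] +
  time_eff[as.integer(design$Timing)] +
  block_eff[as.integer(design$Block)]

reject <- replicate(1800, {
  dat <- transform(design, Mass = rnorm(nrow(design), mean = mu, sd = sigma_e))
  fit <- lm(Mass ~ Pesticide + Timing + Block, data = dat)
  anova(fit)["Pesticide", "Pr(>F)"] < 0.05
})
mean(reject)  # estimated power for the pesticide main effect
```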

Table 4. Comparison of the D-optimal interaction design, the balanced cyclic design, and the randomized complete block design (RCBD)

[Rows: pesticide level (absent, present) crossed with pathogen exposure timing (levels 1–8); columns: blocks 1–12 for the D-optimal interaction design and blocks 1–12 for the balanced cyclic design. The individual cell entries are not reproduced here.]

Each entry represents the number of replicates of a treatment in a particular block for a particular design. A 1 represents an extra treatment assigned to a particular block compared with the RCBD; an empty space represents the same number of treatments as the RCBD. For instance, in the RCBD, the first treatment appears once in block 1, but twice in both the D-optimal and balanced cyclic designs (denoted by a 1 in the upper left corner for each design).


For each of the 9 simulation conditions, power was estimated by calculating the proportion of simulations resulting in pesticide, timing, and/or interaction effects that were significantly different from 0. For each of the designs, the simulation was replicated 1800 times with the type I error rate controlled at α = 0.05. Using 1800 simulation experiments per condition provides an approximate margin of error of ±1% when a true 5% probability is estimated. The simulation experiments were implemented in R.

Experimental conditions simulated

For all simulations, an overall mean mass (μ) of 2.5 g with a standard deviation (σ_e) of 0.11 was assumed. These values were observed in previous work in which the effect of exposure to carbaryl (an insecticide) on southern leopard frog tadpoles was studied [16]. Effect sizes, interaction effects, and block effects were chosen based on these quantities. The size of an effect can be measured relative to the amount of variation in the data: a large effect for a system with little inherent variability may be smaller in absolute terms than a small effect in a noisy system. Because σ_e is a measure of the variation in the data, we chose effect sizes in relation to σ_e.

Pesticide effects were established by choosing 3 different effect sizes. For example, a medium pesticide effect of σ_e × 0.25 refers to a condition where, if the mean response in the absence of pesticide was μ = 2.5 (at a particular timing level, assuming the main effects model), then the mean response in the presence of pesticide would be μ + 0.25 × σ_e = 2.5 + 0.25(0.11) = 2.528 (at the same timing level). For pathogen exposure timing effects, a standard deviation multiple was chosen for the timing level that had the largest effect on mass; in this case, the earlier the pathogen was introduced, the greater the reduction in mass. The remaining levels were equally spaced between 0 and this maximum effect size. For instance, for the medium pathogen effect size, the first pathogen exposure timing level (pathogen not introduced at all) was assumed to have no effect, while the effect of the eighth pathogen exposure timing level (earliest introduction of the pathogen) was the largest, with a size of σ_e × 0.55 = 0.0605. The remaining 6 pathogen exposure timing levels were equally spaced between 0 and 0.0605. For these medium pathogen effects, along with the small and large effects, Table 5 shows the maximum effect sizes for both the pesticide and timing predictors.

We explored 2 sizes of block effects. First, block effects were assumed to be small and approximately equally spaced, ranging from −0.25 × σ_e to 0.25 × σ_e. Second, large block effects were assumed, ranging from −σ_e to σ_e. For the first condition, for example, the 16 levels of the block/shelf effect are (Shelf 1 = −0.25σ_e; Shelf 2 = −0.2167σ_e; …; Shelf 16 = 0.25σ_e).

For the interaction model, we explored 2 types of interaction: synergistic and antagonistic. If no interaction between the effects was assumed, then as the amount of time the tadpole was exposed to the pathogen increased, the difference between pesticide-exposed and non-pesticide-exposed tadpoles stayed the same; that is, the pesticide effect on mass was the same for all pathogen timing conditions. If a synergistic interaction was present, the difference in mean mass between pesticide absent and pesticide present increased as the time of exposure to the pathogen lengthened. With the antagonistic interaction, as the time of exposure to the pathogen lengthened, pesticide exposure moved from producing a higher mass to producing a lower mass compared with the no-pesticide condition. As shown in Equation A2 in Supplemental Data, Section 1, the interaction model required the specification of each of the interaction parameters. For both interaction types, these parameter values were specified based on the timing and pesticide parameter values. Details are provided in Supplemental Data, Section 3.

RESULTS

The sensitivity to detect significant effects (i.e., power) was computed for each of the 3 designs for each of the 9 simulation conditions. For the no-interaction model 1, these results are displayed graphically in Figures 2 to 5 for pesticide and timing effects under small and large block effects. The conditions, each representing a specific combination of true effect sizes, are shown along the horizontal axis. The randomized complete block design had lower power to detect pesticide or pathogen timing effects compared with the other designs (randomized complete block design results not shown). This was expected because this design provides approximately 40% less information than the other designs. The segments within each plot represent the power of the D-optimal (constructed assuming interaction) and balanced cyclic designs. In Figure 2, the D-optimal design appears better than the cyclic design in terms of power to detect pesticide effects when the differences between blocks are large and the true model contains main effects only. However, the cyclic design appears better for detecting timing effects (Figure 3). For small block effects (Figures 4 and 5), there is no great difference, although the D-optimal design may be slightly preferred. The power to detect synergistic and antagonistic interactions is almost identical for the balanced cyclic and D-optimal designs (results not shown).

Table 5. Effect sizes for the pesticide effect and for the maximum timing effect (level 8), for small, medium, and large effect sizes

            Pesticide_present   Timing_8
Low         σ_e × 0.15          σ_e × 0.35
Medium      σ_e × 0.25          σ_e × 0.55
High        σ_e × 0.35          σ_e × 0.75

Figure 2. Power to detect pesticide effects for each design across each effect level combination (pesticide/timing) for large blocks, when the true model includes no interaction. Effect size (pesticide/timing): L = low; M = medium; H = high.


Figure 3. Power to detect timing effects for each design across each effect level combination (pesticide/timing) for large blocks, when the true model includes no interaction. Effect size (timing/pesticide): L = low; M = medium; H = high.

Figure 4. Power to detect pesticide effects for each design across each effect level combination (pesticide/timing) for small blocks, when the true model includes no interaction. Effect size (pesticide/timing): L = low; M = medium; H = high.

Figure 5. Power to detect timing effects for each design across each effect level combination (pesticide/timing) for small blocks, when the true model includes no interaction. Effect size (timing/pesticide): L = low; M = medium; H = high.

DISCUSSION AND CONCLUSIONS

Optimal design is a powerful strategy that can be flexibly adapted to nonstandard experimental scenarios. In the ecotoxicology study that motivated this investigation, the size of the blocks and the number of treatments were incompatible, and a standard design was either unavailable or did not utilize the available space and thus had less power to detect effects of interest. D-optimal designs can accommodate such experimental constraints, and so this design methodology was introduced, and a D-optimal design was constructed, simulated, and evaluated in terms of power to detect effects of interest. Along with the D-optimal design, an inefficient randomized complete block design and a relatively intuitive balanced cyclic design were explored. We compared these 3 designs and found, as expected, that the randomized complete block design had less power to detect treatment effects because it fails to utilize all available experimental units. Increasing the number of experimental units observed increases the precision of the estimates, and therefore the power to detect existing effects.

In a small simulation study, the balanced cyclic and D-optimal designs produced similar results in terms of both D-efficiency and power. Although this may be surprising at first glance, because the optimal design was constructed expressly to make parameter estimates precise, a review of Table 4 gives insight into why the cyclic design performed so well. We see that both the optimal and cyclic designs added treatments to blocks in a similar manner: in each block, the 11 additional units were allocated among the 16 treatments in both designs, and there was no discernible pattern to the pairing of added treatments.

The use of optimal experimental design ideas is beneficial for ecotoxicologists considering how to assign treatments to experimental units in situations where the familiar standard designs do not apply. Although this approach assumes the specification of a statistical model for analysis, as well as knowledge of the number of units to be assigned and the experimental constraints, the experimenter often has a good idea about each of these in the planning stages of the experiment. The computing tools for obtaining these designs are widely available, and we believe that this makes their implementation accessible.

As noted, a strength of optimal design is its adaptability to the experiment at hand. For instance, it is possible that, for an experiment similar to the one described, only 14 replications of each of the 16 treatments are necessary, as a consequence of a sample size calculation suggesting that this is sufficient to detect biologically significant effects. The randomized complete block design is still not appropriate because there are still only 12 blocks (shelves) available. A D-optimal design can still easily be constructed as long as the correct number of experimental units (14 × 16 = 224) and the number and size of blocks are correctly specified. In fact, the optimal design strategy will work regardless of the number of treatments, the number of replications, or the number or size of blocks.

One final note: although the designs considered in the present study may be unfamiliar to many readers, the associated analysis is not. Once the data are collected using the chosen design, a standard analysis of variance or regression-type analysis can be performed to determine whether the timing and/or pesticide factors have an effect, in the presence of blocks.

SUPPLEMENTAL DATA

Sections S1–S4. (105 KB DOC).

Acknowledgment—The authors express their thanks to S. Rumschlag for providing the motivating example that inspired the present study, and to J. Oris for his comments on a previous version of this article. The study by Rumschlag and Boone received the 2013 Herpetologists' League Graduate Research Award (Herpetologica 2013;69:493–494), and we expect that this article will appear in print soon. In addition, the authors thank the anonymous journal reviewers for their helpful comments and critiques.

REFERENCES

1. Goos P, Jones B. 2011. Optimal Design of Experiments: A Case Study Approach. John Wiley & Sons, West Sussex, UK.
2. Atkinson AC, Donev AN, Tobias R. 2007. Optimum Experimental Designs, with SAS. Oxford University Press, Oxford, UK.
3. Wright SE, Bailer AJ. 2006. Optimal experimental design for a nonlinear response in environmental toxicology. Biometrics 62:886–892.
4. Faller D, Klingmüller U, Timmer J. 1995. Simulation methods for optimal experimental design in systems biology. Simulation 79:717–725.
5. Balsa-Canto E, Alonso AA, Banga JR. 2008. Computational procedures for optimal experimental design in biological systems. IET Syst Biol 2:163–172.
6. Bandara S, Schlöder JP, Eils R, Bock HG, Meyer T. 2009. Optimal experimental design for parameter estimation of a cell signaling model. PLOS Comput Biol 5:1–12.
7. Versyck KJ, Bernaerts K, Geeraerd AH, Van Impe JF. 1999. Introducing optimal experimental design in predictive modeling: A motivating example. Int J Food Microbiol 51:39–51.
8. Insel G, Orhon D, Vanrolleghem PA. 2003. Identification and modelling of aerobic hydrolysis—application of optimal experimental design. J Chem Technol Biotechnol 78:437–445.
9. de Aguiar PF, Bourguignon B, Khots MS, Massart DL, Phan-Than-Luu R. 1995. D-optimal designs. Chemometr Intell Lab 30:199–210.
10. Olsson I-M, Gottfries J, Wold S. 2004. D-optimal onion designs in statistical molecular design. Chemometr Intell Lab 73:37–46.
11. Fedorov V. 2002. Optimal design. In Encyclopedia of Environmetrics, Vol 2. John Wiley & Sons, New York, NY, USA, pp 1474–1481.
12. R Core Team. 2013. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. [cited 2014 January 20]. Available from: http://www.R-project.org/
13. Kutner M, Nachtsheim C, Neter J, Li W. 2004. Applied Linear Statistical Models, 5th ed. McGraw-Hill Irwin, New York, NY, USA.
14. Cook RD, Nachtsheim CJ. 1989. Computer-aided blocking of factorial and response-surface designs. Technometrics 31:339–346.
15. Meyer RK, Nachtsheim CJ. 1995. The coordinate-exchange algorithm for constructing exact optimal experimental designs. Technometrics 37:60–69.
16. Boone M, James S. 2003. Interactions of an insecticide, herbicide, and natural stressors in amphibian community mesocosms. Ecol Appl 13:829–841.
