Time series analysis of contaminant transport in the subsurface: applications to conservative tracer and engineered nanomaterials.

Time Series Analysis of Contaminant Transport in the Subsurface: Applications to Conservative Tracer and Engineered Nanomaterials Chunmei Bai, Yusong Li PII: DOI: Reference:

S0169-7722(14)00083-7 doi: 10.1016/j.jconhyd.2014.06.002 CONHYD 3009

To appear in:

Journal of Contaminant Hydrology

Received date: Revised date: Accepted date:

30 August 2013 29 May 2014 3 June 2014

Please cite this article as: Bai, Chunmei, Li, Yusong, Time Series Analysis of Contaminant Transport in the Subsurface: Applications to Conservative Tracer and Engineered Nanomaterials, Journal of Contaminant Hydrology (2014), doi: 10.1016/j.jconhyd.2014.06.002

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT Time Series Analysis of Contaminant Transport in the Subsurface:

SC

RI

PT

Applications to Conservative Tracer and Engineered Nanomaterials

MA

NU

Chunmei Bai and Yusong Li*

D

Department of Civil Engineering, University of Nebraska – Lincoln

AC CE P

TE

362R Whittier Building, 2200 Vine Street, Lincoln, NE, 68583

Submitted to:

Journal of Contaminant Hydrology August 30th, 2013 Revision: May 30th , 2014

* Corresponding

author phone: (402) 472-5972; fax: (402) 472-8934; e-mail: [email protected].

1

ACCEPTED MANUSCRIPT ABSTRACT Accurately predicting the transport of contaminants in the field is subject to multiple

PT

sources of uncertainty due to the variability of geological settings, the complexity of field

RI

measurements, and the scarcity of data. Such uncertainties can be amplified when modeling

SC

some emerging contaminants, such as engineered nanomaterials, when a fundamental understanding of their fate and transport is lacking. Typical field work includes collecting

NU

concentration at a certain location for an extended period of time, or measuring the movement of

MA

plume for an extended period time, which would result in a time series of observation data. This work presents an effort to evaluate the possibility of applying time series analysis, particularly,

D

Autoregressive Integrated Moving Average (ARIMA) models, to forecast contaminant transport

TE

and distribution in the subsurface environment. ARIMA modeling was first assessed in terms of its capability to forecast tracer transport at two field sites, which had different levels of

AC CE P

heterogeneity. After that, this study evaluated the applicability of ARIMA modeling to predict the transport of engineered nanomaterials at field sites, including field measured data of nanoscale zero valent iron and (nZVI) and numerically generated data for the transport of nanofullerene aggregates (nC60). This proof-of-concept effort demonstrates the possibility of applying ARIMA to predict the contaminant transport in the subsurface environment. Like many other statistical models, ARIMA modeling is only descriptive and not explanatory. The limitation and the challenge associated with applying ARIMA modeling to contaminant transport in the subsurface are also discussed. Keywords: Transport; Subsurface; MT3DMS; Time series analysis, Autoregressive Integrated Moving Average; Engineered nanomaterials

2

ACCEPTED MANUSCRIPT 1. Introduction Although the theory of modeling contaminant transport in the subsurface is well

PT

developed and validated in laboratory settings (Fetter 2008), accurately predicting the transport

RI

of contaminants in the field is still a challenge. Variability of geological settings, such as

SC

multiscale heterogeneity of hydraulic conductivity fields and existence of fractured formations, can lead to uncertainties for subsurface characterization (Freyberg 1986; Gwo, Jardine et al.

NU

2005). The scarcity of data due to high cost associated with subsurface characterization often

MA

brings in another level of uncertainty in predictive modeling. Such uncertainties can be amplified when modeling some emerging contaminants, such as engineered nanomaterials, when a

D

fundamental understanding of their fate and transport is lacking.

TE

Due to large production and the wide application of engineered nanomaterials

AC CE P

(WWICS ; LuxResearch 2009), these novel materials may enter the environment, including the subsurface, via various pathways (Benn and Westerhoff 2008; Kaegi, Ulrich et al. 2008; Zhang, Chen et al. 2008; Geranio, Heuberger et al. 2009; Benn, Cavanagh et al. 2010); such as direct leaking from underground waste sites, reuse of wastewater, and agricultural use of biosolids containing engineered nanomaterials (Benn and Westerhoff 2008; Kaegi, Ulrich et al. 2008; Zhang, Chen et al. 2008; Geranio, Heuberger et al. 2009; Benn, Cavanagh et al. 2010). Accurately predicting the transport of engineered nanomaterials in the subsurface will provide critical information for risk assessment and policy development for these emerging contaminants. Predicting the fate and transport of engineered nanomaterials in a natural subsurface environment is complicated by the unique properties of nanomaterials and their interaction with aquifer materials.

3

ACCEPTED MANUSCRIPT The majority of the available nanomaterial transport models (Petosa, Jaisi et al. 2010) focused on simulation of one-dimensional column experiments by modifying the classic

PT

filtration theory (CFT) (Yao, Habibian et al. 1971). While these models can provide valuable

RI

insight on the mechanisms of nanoparticle transport in well-controlled experimental conditions, field scale uncertainties were typically not considered. Based upon column scale mechanistic

SC

understanding, several multiple dimensional numerical models (Cullen, O'Carroll et al. 2010; Bai

NU

and Li 2012; Bai, Eskridge et al. 2013)were developed to simulate nanomaterial transport at the field scale. Random hydraulic conductivity fields were used for transport simulations to

MA

incorporate field scale uncertainty. None of these models were validated at field scale. Very recently, a field validated model (Krol, Oleniuk et al. 2013) was developed to simulate in situ

TE

D

transport of nanoscale zerovalent iron (nZVI). The success of such a field scale model is dependent on a careful characterization of the field site, detailed laboratory study on the

AC CE P

mechanisms involved, and also extensive field scale tracer tests. Should the release of nanomaterials occur in another location where detailed characterization was lacking, it be difficult to achieve timely prediction of contamination if solely relying on such complicated models.

In this work, we evaluated an alternative statistically based time series analysis approach, namely autoregressive integrated moving average (ARIMA) modeling (George E. P. Box 1976), to predict contaminant transport in the subsurface. A time series refers to a set of measurements collected sequentially over time, where the observation points are not independent of each other but show significant autocorrelation between subsequent observations. These autocorrelations prohibit using traditional statistical methods (e.g. regression) to fit the data, because those methods assume that the observation points are independent. Transport of contaminants,

4

ACCEPTED MANUSCRIPT including nanomaterials, in the subsurface is a time dependent process. Typical field work includes collecting concentration at a certain location for an extended period of time, or

PT

measuring the movement of plume for an extended period time, which would result in a time

RI

series of observation data. For example, the contaminant concentration measured in an observation well on a certain day is correlated to the concentration measured a day before in the

SC

same well.

NU

Presumably, ARIMA models should be able to applicable to predict time series data for

MA

contaminant transport in the subsurface, as long as a time series of observation data are available. While we fully understand that ARIMA models will not be able to provide mechanistic

D

explanations on the contaminant transport mechanisms, it could still be a valuable tool for a

TE

quick prediction of contaminant transport in the near future based on historical observation data. There is currently no such study available, although ARIMA models have been previously used

AC CE P

in various areas in environmental studies, including air pollution (Sharma, Chandra et al. 2009; Mann, Balmes et al. 2010), noise forecasting (Kumar and Jain 1999), ecology (Jia, Zhao et al. 2010), and hydrology (Hameed, Marino et al. 1996). In this work, ARIMA modeling was first assessed in terms of its capability to forecast tracer transport at two field sites, which had different levels of heterogeneity. After that, this study evaluates the applicability of ARIMA modeling to predict the transport of engineered nanomaterials at the field site, where the physics of contaminant transport was more complicated than for conservative tracers. Two cases studies, one based on numerically generated data for the transport of nano-fullerene aggregates (nC60) (Bai and Li 2012), another based on the field data of nZVI transport (Krol, Oleniuk et al. 2013) were studied. Finally, the manuscript concludes

5

ACCEPTED MANUSCRIPT with a discussion and recommendations for applying ARIMA modeling for contaminant

PT

transport in the subsurface.

RI

2. Description of the ARIMA Models

SC

2.1 ARIMA Theory

ARIMA models consist of two parts: an autoregressive (AR) model and a moving

NU

average (MA) model. In addition, the middle letter “I” in ARIMA, means integrated and

MA

represents the order of differencing.

Let yt, t=1, … T , denote a sequence of measurements of a variable y at subsequent and

D

equally spaced times t. If yt is a nonstationary time series, differencing, or the derivative of the

TE

series over time, will need to be applied to remove trends from the data: (1)

AC CE P

,

where, dt is typically chosen as a unit time step. Differencing can be continued until the series processes a constant mean and variance. The AR and MA model can then be applied to the stationary data series xt.

The AR model represents the regression of xt on time lags of itself. An AR model with an order of p , i.e. AR (p), can be expressed as a linear combination of its previous values: ,

(2)

where Ф1, ... , Ф p are the coefficients of AR (p), µ is the mean of xt, and wt is the residual error (or white noise) at time t, which follows a normal distribution with zero mean and variance of σ2. Ф values with absolute values closer to 1 would indicate a higher autocorrelation in the x series. 6

ACCEPTED MANUSCRIPT The MA part of the ARIMA model denotes the structure on the residual term. A MA model with an order of q, i.e. MA (q), can be expressed as a linear combination of white noises of

,

(3)

RI

PT

its previous times:

SC

where, θ1, …, θq are the coefficients of the MA(q) parameters, θ values with absolute values

NU

closer to 1 would indicate a higher autocorrelation of the residual term. An ARIMA model is generally presented as ARIMA(p,d,q), in which p and q represent

MA

the orders of AR and MA models, respectively, and d denotes the order of differencing. The

D

general ARIMA(p,d,q) can be expressed as:

TE

(4)

;

AC CE P

Here, B is the backshift operator defined as

the AR (p) operator;

is

is the MA (q) operator.

2.2 ARIMA Model Development Procedure Three key steps are involved in the development of ARIMA models (George E. P. Box 1976), including model identification, parameter estimation, and diagnostic checking. The model identification step is the process to find values of p, d, and q for an ARIMA (p, d, q) model. First, the stationarity of the time series will be checked. A data series is considered as stationary, if the key parameters such as the mean and the variance are constant over time. If the time series is not stationary, it is then transformed to be stationary either by differencing or by some other means of transformation (e.g. logarithmic transformation). The value of d is 7

ACCEPTED MANUSCRIPT determined as the number of differencing steps necessary to make the data series stationary. After obtaining a stationary data series, the orders p and q of the AR and MA models will be

PT

determined. Here, plots of the autocorrelation function (ACF) and partial autocorrelation

RI

function (PACF) are typically used to identify the values of p and q. The ACF at time lag h represents the correlation of a variable with itself at different time points (e.g. correlation

SC

between xt and xt+h). The PACF at time lag h represents the remaining autocorrelation between xt

NU

and xt+h after the linear dependency through xt+1 to xt+h-1 has been removed. The value of p is determined by the number of significant spikes in the PACF plot, and the value of q is

MA

determined by looking at after how many lags the ACF plot cuts off.

D

The second step of ARIMA model development is parameter estimation, which is to , as defined in equation

TE

estimate the values of µ and the coefficients in operators

AC CE P

(4). A nonlinear least squares scheme, which minimizes the forecasting errors, is typically used for parameter estimation. Sometimes more than one ARIMA model, i.e. ARIMA models with different values of p, q, and d, may seem appropriate. In such situations, the Akaike Information Criteria (AIC) (Shumway 1988), can be used to determine the best model. The AIC is a widely used model selection criteria, which selects the models by rewarding goodness of fit while imposing a penalty for each additional model parameter. In addition to AIC, the residual variances of different models can be used for the model selection. The last step of ARIMA model development is diagnostic checking, which is to check the adequacy of the model. Particularly, this step is to check whether the residuals follow a normal distribution with constant variance. The ACF and PACF of the residuals from a satisfactory ARIMA model should not be significantly different from zero at any lag. In addition, the Ljung-

8

ACCEPTED MANUSCRIPT Box-Pierce Q-statistic test (Ljung and Box 1978) can be applied to further confirm the independence among the residuals.

PT

In this study, the statistical environment R (R Development Core Team 2012) (Team

RI

2011)were used to conduct ARIMA time series modeling.

SC

3. ARIMA Model for Predicting Tracer Transport

NU

3.1 Field Tracer Data

The first set of field tracer data was based on a long term large scale natural gradient field

MA

experiment conducted at the aquifer that underlies an abandoned sand quarry at Canadian Forces Base Borden in Ontario, Canada (Mackay, Freyberg et al. 1986). In the experiment, 12 cubic

D

meters of solution containing two inorganic nonreactive tracers, i.e. bromide and chloride, were

TE

slowly injected into the aquifer at a rate of about 0.8m3/hr. After injection, the plume of tracer

AC CE P

moved under the action of natural gradient at the site, and was monitored using a dense network of multilevel point samplers. Based on nearly 10,000 samples from within and around the tracer plume, the locations of center of mass of the plume was estimated for three years by Freyberg (Freyberg 1986) (Figure 1). This field site is herein called the Borden site. The second set of field tracer data was based on tracer release and monitoring experiments at a low-level waste disposal site in eastern Tennessee (Gwo, Jardine et al. 2005). The aquifer has a fractured bedrock formation with a preferred zone underlain by a bedrock zone. The conservative tracer bromide was released through an upstream source well for 180 day, and its concentration was monitored at a downstream observation well with a distance of 16.8 m away from the source well. The observed bromide relative concentration breakthrough curve at

9

ACCEPTED MANUSCRIPT one of the observation wells was plotted in Figure 2. This field site is herein called Tennessee site.

PT

3.2 ARIMA Prediction of Field Tracer Transport

RI

The historical data at the Borden Site included eleven unequally spaced points for the

SC

locations of the center of the plume from day 9 to day 447. Interpolations were applied to obtain a time series of 87 data with equal space of 5 days, which were used to predict the location of the

NU

center of mass at day 647 of the experiments. Application of the ARIMA modeling procedure

MA

identified an ARIMA (2, 1, 0) model. Model parameters with p-value less than 0.05 are provided in Table 1. Here, the value of q equals to 0, indicating that the model is not dependent on the

D

error structure from previous time steps. In other words, noise from a previous step does not

TE

influence the location of the center of mass in the next step, which could be due to the

AC CE P

homogeneous nature of the site.

Figure 1 plots the original experimental data and the ARIMA model simulated results. The ARIMA model can provide very accurate simulation of the historical data, while successfully predicting the last two experimental data. The mean squared error (MSE) values for historical data and prediction are 8.66e-4 and 6.72e-2, respectively. The trend of the original data indicates a remarkably linear horizontal trajectory of tracer plume over the duration of the experiments, indicating a largely homogeneous aquifer on this site. In an early publication (Freyberg 1986), a straight line which minimizes the sum of squared orthogonal distances between the data points and the lines was used to fit the data, which gave a deviation of 1.56 m at the last sampling point (day 647). In comparison, the prediction by the ARIMA model presented here is only 0.4 m different from the measurement. Clearly, the ARIMA model was able to capture a small change of slope of the original data set during the time period, which may be due 10

ACCEPTED MANUSCRIPT to some uncertainties associated with groundwater flow field changes or anisotropic properties of the aquifer.

PT

At the Tennessee site (Figure 2), 38 measurements were available for tracer concentration

RI

in an observation well over a period of 106 days. The DigitizeIt software (Bormisoft, Germany)

SC

was used to obtain the value of each measurement from a previously published paper (Gwo, Jardine et al. 2005). Interpolations were applied to obtain a time series with equal space of 1 day.

NU

83 data points in the period from day 8 to day 90 were used as historical data, to predict tracer

MA

concentration until day 100. Application of the ARIMA modeling procedure identified an ARIMA (0, 1, 4) model. Model parameters with p-value less than 0.05 are provided in Table 2.

D

Here, a p value of 0 and a d value of 1together indicate that the data in the series are not

TE

autocorrelated after a first order differencing. A q value of 4 indicates that error structure from the previous 4 time steps will influence the prediction at a certain time, which is consistent with

AC CE P

the heterogeneous nature of the site. According to previous work (Gwo, Jardine et al. 2005), the aquifer on this site presents multiscale heterogeneity due to the presence of fractures in porous medium.

Figure 2 provides the comparison of originally measured data series and the model simulated results. The ARIMA model can accurately simulate the rising concentration, including some anomalous lower concentration points. The MSE value for historical data was 5.24E-6. Up until day 100 (red line in Figure 3), the model can reasonably predict the tracer concentration, keeping all the interpolated and observed data points within a 95% prediction confidence interval with a prediction MSE of 8.09E-05. The prediction, however, will be out of the 95% confidence interval when the prediction period is extended. Further discussion on the prediction length of ARIMA will be provided below. 11

ACCEPTED MANUSCRIPT 3.3 Implementation Issues 3.3.1 Interpolation interval

PT

The first challenge of developing ARIMA models for field tracer transport in the

RI

subsurface is to satisfy the requirement of data. To successfully conduct time series analysis, it is

SC

typically suggested to start with a series of at least 50 equally spaced data (George E. P. Box 1976), which can be difficult to obtain for monitoring contaminant transport in the subsurface

NU

field. At the Borden site, eleven data points were available for the locations of the center of the

MA

plume during two years of monitoring. At the Tennessee site, 38 points were available for tracer concentration over a period of 106 days. More importantly, field data are typically not equally

D

spaced, so that ARIMA modeling cannot be directly applied. In this work, relatively dense

TE

interpolations were conducted, i.e. a 5 day interval for the Borden site, and a 1 day interval for

AC CE P

the Tennessee Site. To explore how the interpolation interval influences the model performance, we examined different interpolation intervals for both the Borden site and the Tennessee site. Tables 1 and 2 provide the identified ARIMA models based on various time intervals for both the Borden site and the Tennessee site, as well as the MSE based on the historical data and prediction. Interestingly, the different time intervals lead to different ARIMA models, although all the cases explored here were able to provide very good predictions, as demonstrated by the low MSE values. For the Borden site, ARIMA models identified for intervals of 10 days and 15 days require a p value of 1, but the ARIMA model for an interval of 5 days requires a p value of 2. The smaller time interval size requires higher p values because the autoregression process is dependent on not only on the previous time step, but also on another prior time step. All three ARIMA models provided very small MSE for history data and prediction data. Among them the 12

ACCEPTED MANUSCRIPT model based on an interval of 5 days provided the lowest MSE to simulate historical data, but the model based on an interval of 10 days provided the lowest MSE for the prediction.

PT

For the Tennessee site, ARIMA models identified for intervals of 1.5 days and 2 days

RI

require a q value of 2, but the ARIMA model for an interval of 1 day requires a q value of 4.

SC

Higher q values indicate that error structures from previous time steps will influence the model, which is consistent with the heterogeneity nature of the site (Gwo, Jardine et al. 2005). All three

NU

ARIMA models provided very small MSE for history data and prediction. Among them, the

MA

model based on an interval of 1 day provided the lowest MSE values for both history data and prediction.

D

Findings from this part of the work indicate that a smaller time interval generally may

TE

introduce additional parameters for ARIMA models. However, models based on smaller time

AC CE P

intervals indeed will provide improved accuracy for modeling. 3.3.2 Order of differencing

One of the fundamental assumptions of ARIMA modeling is that the data series can be transformed to stationary. If the time series is not stationary, differencing is typically conducted to make the data series stationary. During the ARIMA modeling of field tracer tests, we found that the stationarity process is one of the rather subjective steps, which is explained below using tracer transport data from the Borden site. Typically, if the mean and the variance of the time sequence are not consistent over time, it is a sign of nonstationarity. Figure 3(A) provides a visual check on the consistency of the Borden site tracer data over time after first order differencing. The data is generally consistent, but with a visible slightly decreasing trend. The mean was estimated at 0.436 for up to day 117,

13

ACCEPTED MANUSCRIPT while it decreased to 0.414 for up to day 417. Another indicator of stationarity is based on the autocorrelation. When the sample autocorrelation is relatively high at the beginning (>0.8) and

PT

decays very slowly to zero (e.g. over more than ten time lags), it is an indicator of nonstationarity

RI

(Entink, Fransman et al. 2011). Figure 3(B) provides a plot of the autocorrelation of the data after a first order differencing. As shown here, the ACF gradually decreased from 0.6 to about

NU

criteria of stationarity, with a weak trend over time.

SC

0.1 after ten time lags. Based on Figure 3 (A) and (B), it seems that the data barely meets the

MA

Because of that, a second order differencing was conducted, and Figure 3 (C) and (D) plot the trend and autocorrelation of data after second order differencing. Data after second order

D

differencing showed much stronger stationary features than after only one differencing, indicated

TE

by a more consistent mean and faster decreasing of ACF over time. As such, an ARIMA (0, 2, 1) model was identified based on the second order differencing. Figure 4 provides a comparison

AC CE P

between ARIMA (2, 1, 0) and ARIMA (0, 2, 1) models for prediction. Surprisingly, ARIMA (0, 2, 1) provided much higher MSE and wider 95% confidence interval than ARIMA (2, 1, 0), although ARIMA (0, 2, 1) was based on a much more stationary data set. For predicting tracer transport in the subsurface, this finding indicates that higher data stationarity does not necessarily mean the model will perform better. Additional differencing actually may introduce artificial errors and higher values of MSE for the model or forecasting. Because the criteria for stationarity is somewhat subjective, we suggest minimizing the number of differencing to reduce the artificial errors. 3.3.3 Prediction length Traditionally, ARIMA modeling is recommended for short-term forecasting (e.g. forecasting 5 points based on every 100 historical data points) (Shumway and Stoffer 2010). 14

ACCEPTED MANUSCRIPT Actual engineering applications, however, typically require a longer prediction period. Here, we evaluate the possible length that a model can provide accurate predictions based on the field

PT

tracer sites.

RI

In the ARIMA model for the Borden site (Figure 2), the ratio of the prediction length to

SC

the history data length is about 46.5%, which is remarkably higher than the typically suggested prediction length for an ARIMA model (George E. P. Box 1976). During this prediction length,

NU

all the observation data were within the 95% confidence interval. Due to the limitation of data

MA

availability, it is not possible to check whether an even longer prediction length is reasonable for the Borden site.

D

For the Tennessee site, the ratio of the prediction length to the history data length is 12%

TE

for predicting from day 91 to day 100 (the red line in Figure 2). Up until this range, all the

AC CE P

observation data are within the 95% confidence interval of prediction. We applied the ARIMA model to forecast a longer time period. As show in Figure 2, two observation points will be outside of the 95% confidence interval of prediction if the prediction period is extended to day 109. Therefore, the model is not capable of predicting as long as the Borden site. We attribute the reduced predictive performance of the model at the Tennessee site to the heterogeneity nature of the site. The higher heterogeneity results in higher uncertainty, so that the prediction length of the ARIMA model was relatively smaller at the Tennessee site. This finding indicates that the acceptable prediction length of ARIMA models is very site specific for contaminant transport in the subsurface, which is highly dependent on the heterogeneity of the site and the uncertainty of the data.

15

ACCEPTED MANUSCRIPT 4. ARIMA Model for Predicting Nanomaterial Transport 4.1 nC60 transport

PT

Due to the limited availability of actual field site data for nanoparticle transport, this

RI

evaluation was first accomplished based on a hypothetical scenario of injecting nC60 into a field

SC

site. This field site was based on a study site in Oscoda, MI, which was extensively characterized in multiple previous studies (Lemke, Abriola et al. 2004; Lemke, Abriola et al. 2004; Phelan,

NU

Lemke et al. 2004; Ramsburg, Abriola et al. 2004; Christ, Lemke et al. 2005; Cullen, O'Carroll et

MA

al. 2010). The hydraulic conductivity estimates varied from 1 to 48 m/d, with a mean of 16.8 m/d and a lnK variance of 0.29. Although 167 homogenized core subsamples from fourteen aquifer

D

cores were analyzed in the field experiment, the amount of hydraulic conductivity measurements

TE

is significantly below the required resolution for accurate simulation of groundwater flow and

AC CE P

contaminant transport. The hydraulic conductivity distribution in this field site was therefore treated as a random field. A conditional Sequential Gaussian Simulation (SGS) (Lemke, Abriola et al. 2004; Lemke, Abriola et al. 2004) was applied to generate a series of random hydraulic conductivity fields based on the statistics and spatial correlation information from the available measurements. In this work, fifteen realizations of the non-uniform permeability random fields were constructed. An example hydraulic conductivity field is provided in Figure 5. Numerical experiments were conducted to study scenarios of nanoparticle transport at this site. First, a steady state groundwater flow condition was applied through a hydraulic gradient of 0.01 along the x direction. nC60 nanoparticle solution was released through a 1.2 m long well screen located upstream of the site for a period of 48 hours. Initial concentration of nanoparticles was assumed to be 1 mg/L. Nanoparticles were assumed to have a particle size of 100nm and a collision efficiency factor of 0.001. A modified Modular Three-Dimensional 16

ACCEPTED MANUSCRIPT Transport Model for Multispecies (MT3DMS) (Zheng and Wang 1999)was used to simulate nC60 nanoparticle transport at the site (Bai and Li 2012). As detailed in a previous publication

PT

(Bai and Li 2012), transport of nC60 nanoparticles were simulated based on a modified filtration

RI

theory(Li, Wang et al. 2008; Mauter and Elimelech 2008). In this model, transport of nanoparticles in the subsurface was influenced by advection, dispersion and attachment

SC

processes. An effective attachment rate was considered as linearly related to the available soil

NU

surfaces and also was subject to a maximum retention capacity (Li, Wang et al. 2008). Figure 5 provides an example simulated distribution of nC60 nanoparticle in the field site at the end of

MA

simulation (i.e. 48 hours continuous release of nC60 nanoparticle). Based on the simulated concentration distributions at every 20 minutes, two time series were generated for nanoparticle

TE

D

transport, including (1) the x location of aqueous plume far front (C/C0 = 0.001), or C front; (2) the x location of attached phase far front (S = 0.001μg/g), or S front.

AC CE P

As illustrated in Figure 6, 129 equal spaced data points were obtained from the first 43 hours of nanoparticle injection, which were used to predict the locations of C front and S front for the next 5 hours. Figure 6 illustrates C front and S front time series data, which was averaged from fifteen hydraulic conductivity realizations, and the error bars indicate the variances among the realizations. As expected, the locations of C front gradually increase, as the aqueous plume moves forward. Similarly, the locations of S front also gradually increase with a slower rate than the C front. Following the ARIMA modeling procedure, an ARIMA (0,1,0) model and an ARIMA (1,1,0) model were identified for C front and S front, respectively. The model parameters and MSE values are provided in Table 3. Here, a first order differencing was able to remove the time dependent trend for both the C front and S front time series. Because both C front and S front

17

ACCEPTED MANUSCRIPT reflect the locations of the plumes, the first order differencing, or dy/dt, should be dependent on the groundwater velocity, which is in steady state during the modeling period. Therefore, it is not

PT

surprising that a first order differencing is sufficient to remove the time dependent trend.

RI

For modeling C front, no significant spike was found in the ACF plot after a first order

SC

differencing, which means both p and q are equal to zero. These results lead to the conclusion that the first order differenced C front location follows a random walk model (ARIMA (0,1,0))

NU

with a drift of 0.0191 (µ in Table 3). This model has an upward increasing trend, which was

MA

reflected by the positive sign of the drift. The surprisingly simple ARIMA model for predicting nanoparticle C front can be attributed to the nature of the numerical hypothetical scenario, where

D

the uncertain only comes from the permeability random fields. As previously discussed, the

TE

Bachman Road site is a comparably homogeneous field site, so that the error structure originated

nanoparticles.

AC CE P

from the local scale heterogeneity does not have significant impact on the movement of nC60

The ARIMA model for the S front is slightly more complicated than the model for the C front with a p value of 1. As briefly described in the nanoparticle transport model (Bai and Li 2012), the attachment of nanoparticles was considered as a function of the previously attached concentration. Therefore, it is reasonable in the ARIMA model that the movement rate (dy/dt) of the S front is influenced by the rate of one time step ahead, indicating a p value of 1. Figure 6 compares the observed and ARIMA model simulated C front and S front. Please note that simulations were based on averaged values at each time step for 15 realizations of the random field. The error bars in the figure denote the standard deviations of the observation at each time point, due to the variations of the hydraulic conductivity fields. ARIMA models are

18

ACCEPTED MANUSCRIPT able to provide good fits to the historical data; the 95% confidence intervals for the prediction period are within the standard deviation of the observation data.

PT

4.2 nZVI transport in the field

RI

In several remediation applications, nZVI was injected into the subsurface to treat

SC

contaminated groundwater. However, most of these studies have suffered from poor nZVI mobility, and many of them have well clogging issues due to particle aggregation and settling, so

NU

that very limited data are available for nZVI transport in the field. In a recent field test (Bennett,

MA

He et al. 2010), carboxylmethyl cellulose (CMC) stabilized nZVI was delivered into the subsurface. A series of push-pull tests (PPTs) were performed to assess the mobility of CMC-

D

nZVI. The transport of CMC-nZVI in the subsurface was controlled by many factors, such as

TE

aquifer characteristics, injection velocities, and interaction between particles and aquifer

AC CE P

materials. Based on a tremendous amount of site characterization, tracer tests, and well controlled PPT operation, a recent study (Krol, Oleniuk et al. 2013) published a three dimensional finite difference based numerical model that was able to fit CMC-nZVI transport data from several PPTs by treating collision efficiency as a fitting parameter. Here, applicability of ARIMA to predict CMC-nZVI transport in this relatively well controlled field experiment will be evaluated. The observed CMC-nZVI data in the PPT-1 was used for the simulation (Figure 7). The DigitizeIt software (Bormisoft, Germany) was used to extract seven CMC-nZVI concentration observation data from a published paper (Krol, Oleniuk et al. 2013) . Interpolations were applied to obtain 83 equal spaced time series data points based on the measurements from beginning to the second last data point, which is when the volume extracted is about 4 times volume injected. These 83 interpolated data were used as historical data to predict CMC-nZVI concentration for the last data point. Application of the ARIMA modeling procedure identified 19

ACCEPTED MANUSCRIPT an ARIMA (2, 1, 0) model. Model parameters with p-value less than 0.05 are provided in Table 3. Figure 7 plots the original experimental data and the ARIMA model simulated results.

PT

Consistent with ARIMA models for tracer and numerically generated nC 60 data, a first

RI

order differencing was sufficient to eliminate the correlation between data points. In this model,

SC

the value of q is 0, or the model is not dependent on the error structure. Consistent with the Borden site and the Bachman site, a q value of 0 may imply a relatively homogenous aquifer.

NU

According to the discussion in the original study (Krol, Oleniuk et al. 2013), the aquifer in this

MA

site consists of distinct layers without interlayer flows, and the PPT-1 was located in a homogeneous and isotropic sandy layer. Finally, the p value of this ARIMA model is 2, in

D

contrast to a 0 and 1 for the C front and S front of numerically generated nC60 transport data,

TE

which inidcates more complicated physical chemical processes occurred during the transport and deposition of nZVI in the aquifer. According to Figure 7, the ARIMA model can provide very

AC CE P

accurate simulation of the historical data. The last measured point was just within the 95% confidence interval of the prediction. 5. Discussion

While acceptable results were demonstrated from applying ARIMA models to simulate transport of tracers and nanomaterials in the subsurface, it is very important to point out the drawback of such model. Like many other statistical models, ARIMA modeling is only descriptive and not explanatory. Therefore, ARIMA models will not be able to provide mechanistic analysis of the processes controlling contaminant transport in the subsurface. However, ARIMA models could serve as a reasonable tool to quickly predict the transport of contaminant when limited site characterization is available. Another limitation of ARIMA modeling is the requirement of some statistical experience from the researcher. However, these 20

ACCEPTED MANUSCRIPT models are implemented in most standard statistical software packages. The challenging part is the evaluation of the collected data, examining the trends and peaks in the data, and choosing

In order to satisfy the requirement of data for ARIMA modeling, interpolation is

RI



PT

appropriate interpolation scheme and prediction length. During this endeavor, we found that:

SC

necessary in many cases. Finer interpolation will provide improved accuracy, which however may also introduce additional parameters for ARIMA models. Although differencing might be necessary to convert data to make it stationary,

NU



MA

we found that over differencing would introduce additional error. We suggest minimizing the number of times that differencing is applied when data are weakly

The acceptable predictive length can be varied from site to site, which is highly

TE



D

stationary.

AC CE P

dependent on the heterogeneity of the site and the uncertainty of the data.

Acknowledgements

We thank Dr. Anne M. Parkhurst at the University of Nebraska-Lincoln, Drs. Linda M. Abriola and Kurt D. Pennell at Tufts University for their helpful suggestions and discussions. This research was supported by the National Science Foundation Award No. CBET-0854136 and CBET-1133502

21

ACCEPTED MANUSCRIPT References

PT

Bai, C. M., K. M. Eskridge, et al. (2013). "Analysis of the fate and transport of nC(60) nanoparticles in the subsurface using response surface methodology." Journal of Contaminant Hydrology 152: 60-69.

RI

Bai, C. M. and Y. S. Li (2012). "Modeling the transport and retention of nC(60) nanoparticles in the subsurface under different release scenarios." Journal of Contaminant Hydrology 136: 43-55.

SC

Benn, T., B. Cavanagh, et al. (2010). "The Release of Nanosilver from Consumer Products Used in the Home." Journal of Environmental Quality 39(6): 1875-1882.

NU

Benn, T. M. and P. Westerhoff (2008). "Nanoparticle silver released into water from commercially available sock fabrics." Environmental Science & Technology 42(11): 4133-4139.

MA

Bennett, P., F. He, et al. (2010). "In situ testing of metallic iron nanoparticle mobility and reactivity in a shallow granular aquifer." Journal of Contaminant Hydrology 116(1-4): 35-46.

TE

D

Christ, J. A., L. D. Lemke, et al. (2005). "Comparison of two-dimensional and three-dimensional simulations of dense nonaqueous phase liquids (DNAPLs): Migration and entrapment in a nonuniform permeability field." Water Resources Research 41(1). Cullen, E., D. M. O'Carroll, et al. (2010). "Simulation of the subsurface mobility of carbon nanoparticles at the field scale." Advances in Water Resources 33(4): 361-371.

AC CE P

Entink, R. H. K., W. Fransman, et al. (2011). "How to statistically analyze nano exposure measurement results: using an ARIMA time series approach." Journal of Nanoparticle Research 13(12): 6991-7004. Fetter, C. W. (2008). Contaminant Hydrogeology, Waveland Pr Inc. Freyberg, D. L. (1986). "A Natural Gradient Experiment on Solute Transport in a Sand Aquifer .2. Spatial Moments and the Advection and Dispersion of Nonreactive Tracers." Water Resources Research 22(13): 2031-2046. George E. P. Box, G. M. J., Gregory C. Reinsel (1976). Time series analysis, forecasting and control, Holden Day. Geranio, L., M. Heuberger, et al. (2009). "The Behavior of Silver Nanotextiles during Washing." Environmental Science & Technology 43(21): 8113-8118. Gwo, J. P., P. M. Jardine, et al. (2005). "Modeling field-scale multiple tracer injection at a lowlevel waste disposal site in fractured rocks: Effect of multiscale heterogeneity and source term uncertainty on conceptual understanding of mass transfer processes." Journal of Contaminant Hydrology 77(1-2): 91-118. Hameed, T., M. A. Marino, et al. (1996). "Time series modeling of channel transmission losses." Agricultural Water Management 29(3): 283-298. Jia, J. S., J. Z. Zhao, et al. (2010). "Ecological footprint simulation and prediction by ARIMA model-A case study in Henan Province of China." Ecological Indicators 10(2): 538-544. 22

ACCEPTED MANUSCRIPT Kaegi, R., A. Ulrich, et al. (2008). "Synthetic TiO2 nanoparticle emission from exterior facades into the aquatic environment." Environmental Pollution 156(2): 233-239.

PT

Krol, M. M., A. J. Oleniuk, et al. (2013). "A Field-Validated Model for In Situ Transport of Polymer-Stabilized nZVI and Implications for Subsurface Injection." Environmental Science & Technology 47(13): 7332-7340.

RI

Kumar, K. and V. K. Jain (1999). "Autoregressive integrated moving averages (ARIMA) modelling of a traffic noise time series." Applied Acoustics 58(3): 283-294.

SC

Lemke, L. D., L. M. Abriola, et al. (2004). "Dense nonaqueous phase liquid (DNAPL) source zone characterization: Influence of hydraulic property correlation on predictions of DNAPL infiltration and entrapment." Water Resources Research 40(1).

NU

Lemke, L. D., L. M. Abriola, et al. (2004). "Influence of hydraulic property correlation on predicted dense nonaqueous phase liquid source zone architecture, mass recovery and contaminant flux." Water Resources Research 40(12).

MA

Li, Y., Y. Wang, et al. (2008). "Investigation of the Transport and Deposition of Fullerene (C60) Nanoparticles in Quartz Sands under Varying Flow Conditions." Environmental Science & Technology 42(19): 7174-7180.

D

Ljung, G. M. and G. E. P. Box (1978). "Measure of Lack of Fit in Time-Series Models." Biometrika 65(2): 297-303.

TE

LuxResearch (2009) "Nanomaterials State of the Market Q1 2009: Cleantech's Dollar Investments, Penny Returns."

AC CE P

Mackay, D. M., D. L. Freyberg, et al. (1986). "A Natural Gradient Experiment on Solute Transport in a Sand Aquifer .1. Approach and Overview of Plume Movement." Water Resources Research 22(13): 2017-2029. Mann, J. K., J. R. Balmes, et al. (2010). "Short-Term Effects of Air Pollution on Wheeze in Asthmatic Children in Fresno, California." Environmental Health Perspectives 118(10): 1497-1502. Mauter, M. S. and M. Elimelech (2008). "Environmental applications of carbon-based nanomaterials." Environmental Science & Technology 42(16): 5843-5859. Petosa, A. R., D. P. Jaisi, et al. (2010). "Aggregation and Deposition of Engineered Nanomaterials in Aquatic Environments: Role of Physicochemical Interactions." Environmental Science & Technology 44(17): 6532-6549. Phelan, T. J., L. D. Lemke, et al. (2004). "Influence of textural and wettability variations on predictions of DNAPL persistence and plume development in saturated porous media." Advances in Water Resources 27(4): 411-427. Ramsburg, C. A., L. M. Abriola, et al. (2004). "Stimulated microbial reductive dechlorination following surfactant treatment at the Bachman Road site." Environmental Science & Technology 38(22): 5902-5914. Sharma, P., A. Chandra, et al. (2009). "Forecasts using Box-Jenkins models for the ambient air quality data of Delhi City." Environmental Monitoring and Assessment 157(1-4): 105112. 23

ACCEPTED MANUSCRIPT Shumway, R. H. (1988). Applied Statistical Time Series Analysis, Prentice-Hall, Englewood Cliffs, NJ. Shumway, R. H. and D. S. Stoffer (2010). Time Series Analysis and Its Applications: With R Examples, Third Edition, Springer.

PT

Team, R. D. C. (2011). R: a language and environment for statistical computing. R Foundation for Statistical Computing: Vienna. http://www.r-project.org/ "Nanotechnology Consumer Product Inventory. http://www.nanotechproject.org/inventories/consumer.

RI

WWICS.

."

2012,

from

SC

Yao, K. M., M. M. Habibian, et al. (1971). "Water and Waste Water Filtration - Concepts and Applications." Environmental Science & Technology 5(11): 1105-1112.

NU

Zhang, Y., Y. S. Chen, et al. (2008). "Stability of commercial metal oxide nanoparticles in water." Water Research 42(8-9): 2204-2212.

AC CE P

TE

D

MA

Zheng, C. and P. Wang (1999). MT3DMS: A Modular Three-Dimensional Multispecies Model for Simulation of Advection, Dispersion and Chemical Reactions of Contaminants in Groundwater Systems; Documentation and User's Guide. Vicksburg, Mississippi.

24

ACCEPTED MANUSCRIPT Table1. ARIMA models based on various interpolation intervals for the Borden Site.

10

(1,1,0)

15

(1,1,0)

µ = 0.8205 ± 0.0243 φ1= 0.6244 ± 0.12581 µ = 1.2257 ± 0.0431 φ1= 0.6386 ± 0.1537

PT

(2,1,0)

Model MSE 8.66E-04

RI

5

Parameters µ = 0.4088 ± 0.0134 φ1= 0.468 ± 0.1021 φ2= 0.3033 ± 0.3033

3.75E-03

SC

ARIMA

AC CE P

TE

D

MA

NU

Interpolation Intervals (day)

25

7.61E-03

Prediction MSE 6.71E-02

2.10E-02 1.39E-01

ACCEPTED MANUSCRIPT

1.5

(0,1,2)

2

(0,1,2)

SC

(0,1,4)

Model MSE

NU

1

Parameters µ = 0.0012 ± 0.0002 θ1= 0.6447 ± 0.1035 θ2= 0.1282 ± 0.1148 θ3=-0.4649 ± 0.1301 θ4= -0.4723 ± 0.0963 µ = 0.0017 ± 0.0004 θ1= 0.0481 ± 0.1237 θ2= -0.499 ± 0.1252 µ = 0.0023 ± 0.0005 θ1= 0.2437 ± 0.1358 θ2= -0.6378 ± 0.1302

MA

ARIMA

AC CE P

TE

D

Interpolation Intervals (day)

RI

PT

Table2. ARIMA models based on various interpolation intervals for the Tennessee Site.

26

Prediction MSE

5.24E-06

8.09E-05

2.17E-05

1.29E-04

1.88E-05

1.28E-04

ACCEPTED MANUSCRIPT

PT

Table 3. ARIMA models for nanoparticles transport

Parameters

Model MSE

nC60 C front

(0,1,0)

µ = 0.0191 ± 0.0026

8.95E-04

2.01E-02

-531.12

nC60 S front

(1,1,0)

4.61E-04

7.43E-03

-614.05

nZVI

(2,1,0)

µ = 0.0162 ± 0.0025 φ1= 0.2410 ± 0.0856 µ = -0.1316 ± 0.0377 φ1= 1.2323 ± 0.1139 φ2 = -0.3154 ± 0.1141

7.98E-4

1.18

-277.53

AC CE P

TE

D

MA

NU

SC

RI

ARIMA

Prediction MSE

27

AIC

TE

D

MA

NU

SC

RI

PT

ACCEPTED MANUSCRIPT

AC CE P

Figure 1. The comparison of the observed and the ARIMA model predicted plum center of mass at the Borden Site

28

D

MA

NU

SC

RI

PT

ACCEPTED MANUSCRIPT

AC CE P

TE

Figure 2. Comparison of the observed and the ARIMA model predicted concentration change in an observation well in the Tennessee Site. The red line represents the prediction ending at day 100.

29

ACCEPTED MANUSCRIPT

(A)

(C) 1st-order difference of Bromide

PT

RI

0.10

SC

0.05

200

300

-0.10

MA

100

-0.05

NU

0.00

BromideDiff2

0.45 0.40 0.35

BromideDiff1

0.50

2nd-order difference of Bromide

400

100

TE

(B)

2nd-difference autocorrelation

0.0 0.2 0.4 0.6 0.8 1.0

ACF

AC CE P

1.0 0.8 0.6 0.4

400

-0.4

-0.2

0.0

0.2

300

(D)

1st-difference autocorrelation

ACF

200

time(day)

D

time(day)

0

2

4

6

8

10

0

Lag

2

4

6

8

10

Lag

Figure 3. First-order differenced time series and its ACF- PACF (plots A and B); Secondorder differenced time series and its ACF- PACF (plots C and D)

30

TE

D

MA

NU

SC

RI

PT

ACCEPTED MANUSCRIPT

AC CE P

Figure 4. Comparison of the performance of ARIMA(2,1,0) and ARIMA(0,2,1) for predicting center of mass of a tracer plume at the Borden site

31

ACCEPTED MANUSCRIPT (B)

TE

D

MA

NU

SC

RI

PT

(A)

AC CE P

Figure 5. An example (A) aqueous-phase and (B) attached-phase nC60 concentration distribution at the Bachman Road site after 48 hours continuous release of nC60. The following parameters were used: α = 0.001, dp = 100 nm, i = 0.01, and C0 = 1 mg/L.

32

ACCEPTED MANUSCRIPT

MA

NU

SC

RI

PT

(A)

AC CE P

TE

D

(B)

Figure 6. Comparison of the observed and the ARIMA model predicted (A) C front and (B) S front for nanoparticle transport at a field site.

33

AC CE P

TE

D

MA

NU

SC

RI

PT

ACCEPTED MANUSCRIPT

Figure 7. Comparison of the observed and the ARIMA model predicted CMC-nZVI concentration in the PPT-1 of a field site.

34

ACCEPTED MANUSCRIPT Highlights Time series analysis of tracer and nanoparticle transport in the field.



ARIMA implementation issues for predicting transport in the subsurface.



The first effort of applying ARIMA modeling for transport in the subsurface.

AC CE P

TE

D

MA

NU

SC

RI

PT



35

Engineered Nanomaterials in the Environment.

Applications of time-series analysis to mood fluctuations in bipolar disorder to promote treatment innovation: a case series.

Applications of Computational Intelligence in Time Series.

Analysis of conservative tracer measurement results using the Frechet distribution at planted horizontal subsurface flow constructed wetlands filled with coarse gravel and showing the effect of clogging processes.

Biological Responses to Engineered Nanomaterials: Needs for the Next Decade.

The current state of engineered nanomaterials in consumer goods and waste streams: the need to develop nanoproperty-quantifiable sensors for monitoring engineered nanomaterials.

Introduction to the special issue: nanomaterials-- toxicology and medical applications.

Subsurface transport behavior of micro-nano bubbles and potential applications for groundwater remediation.

Contribution of engineered nanomaterials physicochemical properties to mast cell degranulation.

Achieving Control of Occupational Exposures to Engineered Nanomaterials.

Nanomaterials for Biosensing Applications.

Meta-analysis of transcriptomic responses as a means to identify pulmonary disease outcomes for engineered nanomaterials.

Magnetic Nanomaterials and Their Applications.

Pharmacological potential of bioactive engineered nanomaterials.

Measuring isotropic subsurface light transport.

Overview of Risk Management for Engineered Nanomaterials.

Exposure to Engineered Nanomaterials: Impact on DNA Repair Pathways.

Analytical applications of nanomaterials in electrogenerated chemiluminescence.

Modeling microorganism transport and survival in the subsurface.

Voltage gated ion and molecule transport in engineered nanochannels: theory, fabrication and applications.

Species sensitivity distributions for engineered nanomaterials.

Time-series analysis--spectral analysis and the search for cycles.

Biomarkers of susceptibility: State of the art and implications for occupational exposure to engineered nanomaterials.

Metal halide perovskite nanomaterials: synthesis and applications.