Time Series Analysis of Contaminant Transport in the Subsurface: Applications to Conservative Tracer and Engineered Nanomaterials Chunmei Bai, Yusong Li PII: DOI: Reference:
S0169-7722(14)00083-7 doi: 10.1016/j.jconhyd.2014.06.002 CONHYD 3009
To appear in:
Journal of Contaminant Hydrology
Received date: Revised date: Accepted date:
30 August 2013 29 May 2014 3 June 2014
Please cite this article as: Bai, Chunmei, Li, Yusong, Time Series Analysis of Contaminant Transport in the Subsurface: Applications to Conservative Tracer and Engineered Nanomaterials, Journal of Contaminant Hydrology (2014), doi: 10.1016/j.jconhyd.2014.06.002
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT Time Series Analysis of Contaminant Transport in the Subsurface:
SC
RI
PT
Applications to Conservative Tracer and Engineered Nanomaterials
MA
NU
Chunmei Bai and Yusong Li*
D
Department of Civil Engineering, University of Nebraska – Lincoln
AC CE P
TE
362R Whittier Building, 2200 Vine Street, Lincoln, NE, 68583
Submitted to:
Journal of Contaminant Hydrology August 30th, 2013 Revision: May 30th , 2014
* Corresponding
author phone: (402) 472-5972; fax: (402) 472-8934; e-mail:
[email protected].
1
ACCEPTED MANUSCRIPT ABSTRACT Accurately predicting the transport of contaminants in the field is subject to multiple
PT
sources of uncertainty due to the variability of geological settings, the complexity of field
RI
measurements, and the scarcity of data. Such uncertainties can be amplified when modeling
SC
some emerging contaminants, such as engineered nanomaterials, when a fundamental understanding of their fate and transport is lacking. Typical field work includes collecting
NU
concentration at a certain location for an extended period of time, or measuring the movement of
MA
plume for an extended period time, which would result in a time series of observation data. This work presents an effort to evaluate the possibility of applying time series analysis, particularly,
D
Autoregressive Integrated Moving Average (ARIMA) models, to forecast contaminant transport
TE
and distribution in the subsurface environment. ARIMA modeling was first assessed in terms of its capability to forecast tracer transport at two field sites, which had different levels of
AC CE P
heterogeneity. After that, this study evaluated the applicability of ARIMA modeling to predict the transport of engineered nanomaterials at field sites, including field measured data of nanoscale zero valent iron and (nZVI) and numerically generated data for the transport of nanofullerene aggregates (nC60). This proof-of-concept effort demonstrates the possibility of applying ARIMA to predict the contaminant transport in the subsurface environment. Like many other statistical models, ARIMA modeling is only descriptive and not explanatory. The limitation and the challenge associated with applying ARIMA modeling to contaminant transport in the subsurface are also discussed. Keywords: Transport; Subsurface; MT3DMS; Time series analysis, Autoregressive Integrated Moving Average; Engineered nanomaterials
2
ACCEPTED MANUSCRIPT 1. Introduction Although the theory of modeling contaminant transport in the subsurface is well
PT
developed and validated in laboratory settings (Fetter 2008), accurately predicting the transport
RI
of contaminants in the field is still a challenge. Variability of geological settings, such as
SC
multiscale heterogeneity of hydraulic conductivity fields and existence of fractured formations, can lead to uncertainties for subsurface characterization (Freyberg 1986; Gwo, Jardine et al.
NU
2005). The scarcity of data due to high cost associated with subsurface characterization often
MA
brings in another level of uncertainty in predictive modeling. Such uncertainties can be amplified when modeling some emerging contaminants, such as engineered nanomaterials, when a
D
fundamental understanding of their fate and transport is lacking.
TE
Due to large production and the wide application of engineered nanomaterials
AC CE P
(WWICS ; LuxResearch 2009), these novel materials may enter the environment, including the subsurface, via various pathways (Benn and Westerhoff 2008; Kaegi, Ulrich et al. 2008; Zhang, Chen et al. 2008; Geranio, Heuberger et al. 2009; Benn, Cavanagh et al. 2010); such as direct leaking from underground waste sites, reuse of wastewater, and agricultural use of biosolids containing engineered nanomaterials (Benn and Westerhoff 2008; Kaegi, Ulrich et al. 2008; Zhang, Chen et al. 2008; Geranio, Heuberger et al. 2009; Benn, Cavanagh et al. 2010). Accurately predicting the transport of engineered nanomaterials in the subsurface will provide critical information for risk assessment and policy development for these emerging contaminants. Predicting the fate and transport of engineered nanomaterials in a natural subsurface environment is complicated by the unique properties of nanomaterials and their interaction with aquifer materials.
3
ACCEPTED MANUSCRIPT The majority of the available nanomaterial transport models (Petosa, Jaisi et al. 2010) focused on simulation of one-dimensional column experiments by modifying the classic
PT
filtration theory (CFT) (Yao, Habibian et al. 1971). While these models can provide valuable
RI
insight on the mechanisms of nanoparticle transport in well-controlled experimental conditions, field scale uncertainties were typically not considered. Based upon column scale mechanistic
SC
understanding, several multiple dimensional numerical models (Cullen, O'Carroll et al. 2010; Bai
NU
and Li 2012; Bai, Eskridge et al. 2013)were developed to simulate nanomaterial transport at the field scale. Random hydraulic conductivity fields were used for transport simulations to
MA
incorporate field scale uncertainty. None of these models were validated at field scale. Very recently, a field validated model (Krol, Oleniuk et al. 2013) was developed to simulate in situ
TE
D
transport of nanoscale zerovalent iron (nZVI). The success of such a field scale model is dependent on a careful characterization of the field site, detailed laboratory study on the
AC CE P
mechanisms involved, and also extensive field scale tracer tests. Should the release of nanomaterials occur in another location where detailed characterization was lacking, it be difficult to achieve timely prediction of contamination if solely relying on such complicated models.
In this work, we evaluated an alternative statistically based time series analysis approach, namely autoregressive integrated moving average (ARIMA) modeling (George E. P. Box 1976), to predict contaminant transport in the subsurface. A time series refers to a set of measurements collected sequentially over time, where the observation points are not independent of each other but show significant autocorrelation between subsequent observations. These autocorrelations prohibit using traditional statistical methods (e.g. regression) to fit the data, because those methods assume that the observation points are independent. Transport of contaminants,
4
ACCEPTED MANUSCRIPT including nanomaterials, in the subsurface is a time dependent process. Typical field work includes collecting concentration at a certain location for an extended period of time, or
PT
measuring the movement of plume for an extended period time, which would result in a time
RI
series of observation data. For example, the contaminant concentration measured in an observation well on a certain day is correlated to the concentration measured a day before in the
SC
same well.
NU
Presumably, ARIMA models should be able to applicable to predict time series data for
MA
contaminant transport in the subsurface, as long as a time series of observation data are available. While we fully understand that ARIMA models will not be able to provide mechanistic
D
explanations on the contaminant transport mechanisms, it could still be a valuable tool for a
TE
quick prediction of contaminant transport in the near future based on historical observation data. There is currently no such study available, although ARIMA models have been previously used
AC CE P
in various areas in environmental studies, including air pollution (Sharma, Chandra et al. 2009; Mann, Balmes et al. 2010), noise forecasting (Kumar and Jain 1999), ecology (Jia, Zhao et al. 2010), and hydrology (Hameed, Marino et al. 1996). In this work, ARIMA modeling was first assessed in terms of its capability to forecast tracer transport at two field sites, which had different levels of heterogeneity. After that, this study evaluates the applicability of ARIMA modeling to predict the transport of engineered nanomaterials at the field site, where the physics of contaminant transport was more complicated than for conservative tracers. Two cases studies, one based on numerically generated data for the transport of nano-fullerene aggregates (nC60) (Bai and Li 2012), another based on the field data of nZVI transport (Krol, Oleniuk et al. 2013) were studied. Finally, the manuscript concludes
5
ACCEPTED MANUSCRIPT with a discussion and recommendations for applying ARIMA modeling for contaminant
PT
transport in the subsurface.
RI
2. Description of the ARIMA Models
SC
2.1 ARIMA Theory
ARIMA models consist of two parts: an autoregressive (AR) model and a moving
NU
average (MA) model. In addition, the middle letter “I” in ARIMA, means integrated and
MA
represents the order of differencing.
Let yt, t=1, … T , denote a sequence of measurements of a variable y at subsequent and
D
equally spaced times t. If yt is a nonstationary time series, differencing, or the derivative of the
TE
series over time, will need to be applied to remove trends from the data: (1)
AC CE P
,
where, dt is typically chosen as a unit time step. Differencing can be continued until the series processes a constant mean and variance. The AR and MA model can then be applied to the stationary data series xt.
The AR model represents the regression of xt on time lags of itself. An AR model with an order of p , i.e. AR (p), can be expressed as a linear combination of its previous values: ,
(2)
where Ф1, ... , Ф p are the coefficients of AR (p), µ is the mean of xt, and wt is the residual error (or white noise) at time t, which follows a normal distribution with zero mean and variance of σ2. Ф values with absolute values closer to 1 would indicate a higher autocorrelation in the x series. 6
ACCEPTED MANUSCRIPT The MA part of the ARIMA model denotes the structure on the residual term. A MA model with an order of q, i.e. MA (q), can be expressed as a linear combination of white noises of
,
(3)
RI
PT
its previous times:
SC
where, θ1, …, θq are the coefficients of the MA(q) parameters, θ values with absolute values
NU
closer to 1 would indicate a higher autocorrelation of the residual term. An ARIMA model is generally presented as ARIMA(p,d,q), in which p and q represent
MA
the orders of AR and MA models, respectively, and d denotes the order of differencing. The
D
general ARIMA(p,d,q) can be expressed as:
TE
(4)
;
AC CE P
Here, B is the backshift operator defined as
the AR (p) operator;
is
is the MA (q) operator.
2.2 ARIMA Model Development Procedure Three key steps are involved in the development of ARIMA models (George E. P. Box 1976), including model identification, parameter estimation, and diagnostic checking. The model identification step is the process to find values of p, d, and q for an ARIMA (p, d, q) model. First, the stationarity of the time series will be checked. A data series is considered as stationary, if the key parameters such as the mean and the variance are constant over time. If the time series is not stationary, it is then transformed to be stationary either by differencing or by some other means of transformation (e.g. logarithmic transformation). The value of d is 7
ACCEPTED MANUSCRIPT determined as the number of differencing steps necessary to make the data series stationary. After obtaining a stationary data series, the orders p and q of the AR and MA models will be
PT
determined. Here, plots of the autocorrelation function (ACF) and partial autocorrelation
RI
function (PACF) are typically used to identify the values of p and q. The ACF at time lag h represents the correlation of a variable with itself at different time points (e.g. correlation
SC
between xt and xt+h). The PACF at time lag h represents the remaining autocorrelation between xt
NU
and xt+h after the linear dependency through xt+1 to xt+h-1 has been removed. The value of p is determined by the number of significant spikes in the PACF plot, and the value of q is
MA
determined by looking at after how many lags the ACF plot cuts off.
D
The second step of ARIMA model development is parameter estimation, which is to , as defined in equation
TE
estimate the values of µ and the coefficients in operators
AC CE P
(4). A nonlinear least squares scheme, which minimizes the forecasting errors, is typically used for parameter estimation. Sometimes more than one ARIMA model, i.e. ARIMA models with different values of p, q, and d, may seem appropriate. In such situations, the Akaike Information Criteria (AIC) (Shumway 1988), can be used to determine the best model. The AIC is a widely used model selection criteria, which selects the models by rewarding goodness of fit while imposing a penalty for each additional model parameter. In addition to AIC, the residual variances of different models can be used for the model selection. The last step of ARIMA model development is diagnostic checking, which is to check the adequacy of the model. Particularly, this step is to check whether the residuals follow a normal distribution with constant variance. The ACF and PACF of the residuals from a satisfactory ARIMA model should not be significantly different from zero at any lag. In addition, the Ljung-
8
ACCEPTED MANUSCRIPT Box-Pierce Q-statistic test (Ljung and Box 1978) can be applied to further confirm the independence among the residuals.
PT
In this study, the statistical environment R (R Development Core Team 2012) (Team
RI
2011)were used to conduct ARIMA time series modeling.
SC
3. ARIMA Model for Predicting Tracer Transport
NU
3.1 Field Tracer Data
The first set of field tracer data was based on a long term large scale natural gradient field
MA
experiment conducted at the aquifer that underlies an abandoned sand quarry at Canadian Forces Base Borden in Ontario, Canada (Mackay, Freyberg et al. 1986). In the experiment, 12 cubic
D
meters of solution containing two inorganic nonreactive tracers, i.e. bromide and chloride, were
TE
slowly injected into the aquifer at a rate of about 0.8m3/hr. After injection, the plume of tracer
AC CE P
moved under the action of natural gradient at the site, and was monitored using a dense network of multilevel point samplers. Based on nearly 10,000 samples from within and around the tracer plume, the locations of center of mass of the plume was estimated for three years by Freyberg (Freyberg 1986) (Figure 1). This field site is herein called the Borden site. The second set of field tracer data was based on tracer release and monitoring experiments at a low-level waste disposal site in eastern Tennessee (Gwo, Jardine et al. 2005). The aquifer has a fractured bedrock formation with a preferred zone underlain by a bedrock zone. The conservative tracer bromide was released through an upstream source well for 180 day, and its concentration was monitored at a downstream observation well with a distance of 16.8 m away from the source well. The observed bromide relative concentration breakthrough curve at
9
ACCEPTED MANUSCRIPT one of the observation wells was plotted in Figure 2. This field site is herein called Tennessee site.
PT
3.2 ARIMA Prediction of Field Tracer Transport
RI
The historical data at the Borden Site included eleven unequally spaced points for the
SC
locations of the center of the plume from day 9 to day 447. Interpolations were applied to obtain a time series of 87 data with equal space of 5 days, which were used to predict the location of the
NU
center of mass at day 647 of the experiments. Application of the ARIMA modeling procedure
MA
identified an ARIMA (2, 1, 0) model. Model parameters with p-value less than 0.05 are provided in Table 1. Here, the value of q equals to 0, indicating that the model is not dependent on the
D
error structure from previous time steps. In other words, noise from a previous step does not
TE
influence the location of the center of mass in the next step, which could be due to the
AC CE P
homogeneous nature of the site.
Figure 1 plots the original experimental data and the ARIMA model simulated results. The ARIMA model can provide very accurate simulation of the historical data, while successfully predicting the last two experimental data. The mean squared error (MSE) values for historical data and prediction are 8.66e-4 and 6.72e-2, respectively. The trend of the original data indicates a remarkably linear horizontal trajectory of tracer plume over the duration of the experiments, indicating a largely homogeneous aquifer on this site. In an early publication (Freyberg 1986), a straight line which minimizes the sum of squared orthogonal distances between the data points and the lines was used to fit the data, which gave a deviation of 1.56 m at the last sampling point (day 647). In comparison, the prediction by the ARIMA model presented here is only 0.4 m different from the measurement. Clearly, the ARIMA model was able to capture a small change of slope of the original data set during the time period, which may be due 10
ACCEPTED MANUSCRIPT to some uncertainties associated with groundwater flow field changes or anisotropic properties of the aquifer.
PT
At the Tennessee site (Figure 2), 38 measurements were available for tracer concentration
RI
in an observation well over a period of 106 days. The DigitizeIt software (Bormisoft, Germany)
SC
was used to obtain the value of each measurement from a previously published paper (Gwo, Jardine et al. 2005). Interpolations were applied to obtain a time series with equal space of 1 day.
NU
83 data points in the period from day 8 to day 90 were used as historical data, to predict tracer
MA
concentration until day 100. Application of the ARIMA modeling procedure identified an ARIMA (0, 1, 4) model. Model parameters with p-value less than 0.05 are provided in Table 2.
D
Here, a p value of 0 and a d value of 1together indicate that the data in the series are not
TE
autocorrelated after a first order differencing. A q value of 4 indicates that error structure from the previous 4 time steps will influence the prediction at a certain time, which is consistent with
AC CE P
the heterogeneous nature of the site. According to previous work (Gwo, Jardine et al. 2005), the aquifer on this site presents multiscale heterogeneity due to the presence of fractures in porous medium.
Figure 2 provides the comparison of originally measured data series and the model simulated results. The ARIMA model can accurately simulate the rising concentration, including some anomalous lower concentration points. The MSE value for historical data was 5.24E-6. Up until day 100 (red line in Figure 3), the model can reasonably predict the tracer concentration, keeping all the interpolated and observed data points within a 95% prediction confidence interval with a prediction MSE of 8.09E-05. The prediction, however, will be out of the 95% confidence interval when the prediction period is extended. Further discussion on the prediction length of ARIMA will be provided below. 11
ACCEPTED MANUSCRIPT 3.3 Implementation Issues 3.3.1 Interpolation interval
PT
The first challenge of developing ARIMA models for field tracer transport in the
RI
subsurface is to satisfy the requirement of data. To successfully conduct time series analysis, it is
SC
typically suggested to start with a series of at least 50 equally spaced data (George E. P. Box 1976), which can be difficult to obtain for monitoring contaminant transport in the subsurface
NU
field. At the Borden site, eleven data points were available for the locations of the center of the
MA
plume during two years of monitoring. At the Tennessee site, 38 points were available for tracer concentration over a period of 106 days. More importantly, field data are typically not equally
D
spaced, so that ARIMA modeling cannot be directly applied. In this work, relatively dense
TE
interpolations were conducted, i.e. a 5 day interval for the Borden site, and a 1 day interval for
AC CE P
the Tennessee Site. To explore how the interpolation interval influences the model performance, we examined different interpolation intervals for both the Borden site and the Tennessee site. Tables 1 and 2 provide the identified ARIMA models based on various time intervals for both the Borden site and the Tennessee site, as well as the MSE based on the historical data and prediction. Interestingly, the different time intervals lead to different ARIMA models, although all the cases explored here were able to provide very good predictions, as demonstrated by the low MSE values. For the Borden site, ARIMA models identified for intervals of 10 days and 15 days require a p value of 1, but the ARIMA model for an interval of 5 days requires a p value of 2. The smaller time interval size requires higher p values because the autoregression process is dependent on not only on the previous time step, but also on another prior time step. All three ARIMA models provided very small MSE for history data and prediction data. Among them the 12
ACCEPTED MANUSCRIPT model based on an interval of 5 days provided the lowest MSE to simulate historical data, but the model based on an interval of 10 days provided the lowest MSE for the prediction.
PT
For the Tennessee site, ARIMA models identified for intervals of 1.5 days and 2 days
RI
require a q value of 2, but the ARIMA model for an interval of 1 day requires a q value of 4.
SC
Higher q values indicate that error structures from previous time steps will influence the model, which is consistent with the heterogeneity nature of the site (Gwo, Jardine et al. 2005). All three
NU
ARIMA models provided very small MSE for history data and prediction. Among them, the
MA
model based on an interval of 1 day provided the lowest MSE values for both history data and prediction.
D
Findings from this part of the work indicate that a smaller time interval generally may
TE
introduce additional parameters for ARIMA models. However, models based on smaller time
AC CE P
intervals indeed will provide improved accuracy for modeling. 3.3.2 Order of differencing
One of the fundamental assumptions of ARIMA modeling is that the data series can be transformed to stationary. If the time series is not stationary, differencing is typically conducted to make the data series stationary. During the ARIMA modeling of field tracer tests, we found that the stationarity process is one of the rather subjective steps, which is explained below using tracer transport data from the Borden site. Typically, if the mean and the variance of the time sequence are not consistent over time, it is a sign of nonstationarity. Figure 3(A) provides a visual check on the consistency of the Borden site tracer data over time after first order differencing. The data is generally consistent, but with a visible slightly decreasing trend. The mean was estimated at 0.436 for up to day 117,
13
ACCEPTED MANUSCRIPT while it decreased to 0.414 for up to day 417. Another indicator of stationarity is based on the autocorrelation. When the sample autocorrelation is relatively high at the beginning (>0.8) and
PT
decays very slowly to zero (e.g. over more than ten time lags), it is an indicator of nonstationarity
RI
(Entink, Fransman et al. 2011). Figure 3(B) provides a plot of the autocorrelation of the data after a first order differencing. As shown here, the ACF gradually decreased from 0.6 to about
NU
criteria of stationarity, with a weak trend over time.
SC
0.1 after ten time lags. Based on Figure 3 (A) and (B), it seems that the data barely meets the
MA
Because of that, a second order differencing was conducted, and Figure 3 (C) and (D) plot the trend and autocorrelation of data after second order differencing. Data after second order
D
differencing showed much stronger stationary features than after only one differencing, indicated
TE
by a more consistent mean and faster decreasing of ACF over time. As such, an ARIMA (0, 2, 1) model was identified based on the second order differencing. Figure 4 provides a comparison
AC CE P
between ARIMA (2, 1, 0) and ARIMA (0, 2, 1) models for prediction. Surprisingly, ARIMA (0, 2, 1) provided much higher MSE and wider 95% confidence interval than ARIMA (2, 1, 0), although ARIMA (0, 2, 1) was based on a much more stationary data set. For predicting tracer transport in the subsurface, this finding indicates that higher data stationarity does not necessarily mean the model will perform better. Additional differencing actually may introduce artificial errors and higher values of MSE for the model or forecasting. Because the criteria for stationarity is somewhat subjective, we suggest minimizing the number of differencing to reduce the artificial errors. 3.3.3 Prediction length Traditionally, ARIMA modeling is recommended for short-term forecasting (e.g. forecasting 5 points based on every 100 historical data points) (Shumway and Stoffer 2010). 14
ACCEPTED MANUSCRIPT Actual engineering applications, however, typically require a longer prediction period. Here, we evaluate the possible length that a model can provide accurate predictions based on the field
PT
tracer sites.
RI
In the ARIMA model for the Borden site (Figure 2), the ratio of the prediction length to
SC
the history data length is about 46.5%, which is remarkably higher than the typically suggested prediction length for an ARIMA model (George E. P. Box 1976). During this prediction length,
NU
all the observation data were within the 95% confidence interval. Due to the limitation of data
MA
availability, it is not possible to check whether an even longer prediction length is reasonable for the Borden site.
D
For the Tennessee site, the ratio of the prediction length to the history data length is 12%
TE
for predicting from day 91 to day 100 (the red line in Figure 2). Up until this range, all the
AC CE P
observation data are within the 95% confidence interval of prediction. We applied the ARIMA model to forecast a longer time period. As show in Figure 2, two observation points will be outside of the 95% confidence interval of prediction if the prediction period is extended to day 109. Therefore, the model is not capable of predicting as long as the Borden site. We attribute the reduced predictive performance of the model at the Tennessee site to the heterogeneity nature of the site. The higher heterogeneity results in higher uncertainty, so that the prediction length of the ARIMA model was relatively smaller at the Tennessee site. This finding indicates that the acceptable prediction length of ARIMA models is very site specific for contaminant transport in the subsurface, which is highly dependent on the heterogeneity of the site and the uncertainty of the data.
15
ACCEPTED MANUSCRIPT 4. ARIMA Model for Predicting Nanomaterial Transport 4.1 nC60 transport
PT
Due to the limited availability of actual field site data for nanoparticle transport, this
RI
evaluation was first accomplished based on a hypothetical scenario of injecting nC60 into a field
SC
site. This field site was based on a study site in Oscoda, MI, which was extensively characterized in multiple previous studies (Lemke, Abriola et al. 2004; Lemke, Abriola et al. 2004; Phelan,
NU
Lemke et al. 2004; Ramsburg, Abriola et al. 2004; Christ, Lemke et al. 2005; Cullen, O'Carroll et
MA
al. 2010). The hydraulic conductivity estimates varied from 1 to 48 m/d, with a mean of 16.8 m/d and a lnK variance of 0.29. Although 167 homogenized core subsamples from fourteen aquifer
D
cores were analyzed in the field experiment, the amount of hydraulic conductivity measurements
TE
is significantly below the required resolution for accurate simulation of groundwater flow and
AC CE P
contaminant transport. The hydraulic conductivity distribution in this field site was therefore treated as a random field. A conditional Sequential Gaussian Simulation (SGS) (Lemke, Abriola et al. 2004; Lemke, Abriola et al. 2004) was applied to generate a series of random hydraulic conductivity fields based on the statistics and spatial correlation information from the available measurements. In this work, fifteen realizations of the non-uniform permeability random fields were constructed. An example hydraulic conductivity field is provided in Figure 5. Numerical experiments were conducted to study scenarios of nanoparticle transport at this site. First, a steady state groundwater flow condition was applied through a hydraulic gradient of 0.01 along the x direction. nC60 nanoparticle solution was released through a 1.2 m long well screen located upstream of the site for a period of 48 hours. Initial concentration of nanoparticles was assumed to be 1 mg/L. Nanoparticles were assumed to have a particle size of 100nm and a collision efficiency factor of 0.001. A modified Modular Three-Dimensional 16
ACCEPTED MANUSCRIPT Transport Model for Multispecies (MT3DMS) (Zheng and Wang 1999)was used to simulate nC60 nanoparticle transport at the site (Bai and Li 2012). As detailed in a previous publication
PT
(Bai and Li 2012), transport of nC60 nanoparticles were simulated based on a modified filtration
RI
theory(Li, Wang et al. 2008; Mauter and Elimelech 2008). In this model, transport of nanoparticles in the subsurface was influenced by advection, dispersion and attachment
SC
processes. An effective attachment rate was considered as linearly related to the available soil
NU
surfaces and also was subject to a maximum retention capacity (Li, Wang et al. 2008). Figure 5 provides an example simulated distribution of nC60 nanoparticle in the field site at the end of
MA
simulation (i.e. 48 hours continuous release of nC60 nanoparticle). Based on the simulated concentration distributions at every 20 minutes, two time series were generated for nanoparticle
TE
D
transport, including (1) the x location of aqueous plume far front (C/C0 = 0.001), or C front; (2) the x location of attached phase far front (S = 0.001μg/g), or S front.
AC CE P
As illustrated in Figure 6, 129 equal spaced data points were obtained from the first 43 hours of nanoparticle injection, which were used to predict the locations of C front and S front for the next 5 hours. Figure 6 illustrates C front and S front time series data, which was averaged from fifteen hydraulic conductivity realizations, and the error bars indicate the variances among the realizations. As expected, the locations of C front gradually increase, as the aqueous plume moves forward. Similarly, the locations of S front also gradually increase with a slower rate than the C front. Following the ARIMA modeling procedure, an ARIMA (0,1,0) model and an ARIMA (1,1,0) model were identified for C front and S front, respectively. The model parameters and MSE values are provided in Table 3. Here, a first order differencing was able to remove the time dependent trend for both the C front and S front time series. Because both C front and S front
17
ACCEPTED MANUSCRIPT reflect the locations of the plumes, the first order differencing, or dy/dt, should be dependent on the groundwater velocity, which is in steady state during the modeling period. Therefore, it is not
PT
surprising that a first order differencing is sufficient to remove the time dependent trend.
RI
For modeling C front, no significant spike was found in the ACF plot after a first order
SC
differencing, which means both p and q are equal to zero. These results lead to the conclusion that the first order differenced C front location follows a random walk model (ARIMA (0,1,0))
NU
with a drift of 0.0191 (µ in Table 3). This model has an upward increasing trend, which was
MA
reflected by the positive sign of the drift. The surprisingly simple ARIMA model for predicting nanoparticle C front can be attributed to the nature of the numerical hypothetical scenario, where
D
the uncertain only comes from the permeability random fields. As previously discussed, the
TE
Bachman Road site is a comparably homogeneous field site, so that the error structure originated
nanoparticles.
AC CE P
from the local scale heterogeneity does not have significant impact on the movement of nC60
The ARIMA model for the S front is slightly more complicated than the model for the C front with a p value of 1. As briefly described in the nanoparticle transport model (Bai and Li 2012), the attachment of nanoparticles was considered as a function of the previously attached concentration. Therefore, it is reasonable in the ARIMA model that the movement rate (dy/dt) of the S front is influenced by the rate of one time step ahead, indicating a p value of 1. Figure 6 compares the observed and ARIMA model simulated C front and S front. Please note that simulations were based on averaged values at each time step for 15 realizations of the random field. The error bars in the figure denote the standard deviations of the observation at each time point, due to the variations of the hydraulic conductivity fields. ARIMA models are
18
ACCEPTED MANUSCRIPT able to provide good fits to the historical data; the 95% confidence intervals for the prediction period are within the standard deviation of the observation data.
PT
4.2 nZVI transport in the field
RI
In several remediation applications, nZVI was injected into the subsurface to treat
SC
contaminated groundwater. However, most of these studies have suffered from poor nZVI mobility, and many of them have well clogging issues due to particle aggregation and settling, so
NU
that very limited data are available for nZVI transport in the field. In a recent field test (Bennett,
MA
He et al. 2010), carboxylmethyl cellulose (CMC) stabilized nZVI was delivered into the subsurface. A series of push-pull tests (PPTs) were performed to assess the mobility of CMC-
D
nZVI. The transport of CMC-nZVI in the subsurface was controlled by many factors, such as
TE
aquifer characteristics, injection velocities, and interaction between particles and aquifer
AC CE P
materials. Based on a tremendous amount of site characterization, tracer tests, and well controlled PPT operation, a recent study (Krol, Oleniuk et al. 2013) published a three dimensional finite difference based numerical model that was able to fit CMC-nZVI transport data from several PPTs by treating collision efficiency as a fitting parameter. Here, applicability of ARIMA to predict CMC-nZVI transport in this relatively well controlled field experiment will be evaluated. The observed CMC-nZVI data in the PPT-1 was used for the simulation (Figure 7). The DigitizeIt software (Bormisoft, Germany) was used to extract seven CMC-nZVI concentration observation data from a published paper (Krol, Oleniuk et al. 2013) . Interpolations were applied to obtain 83 equal spaced time series data points based on the measurements from beginning to the second last data point, which is when the volume extracted is about 4 times volume injected. These 83 interpolated data were used as historical data to predict CMC-nZVI concentration for the last data point. Application of the ARIMA modeling procedure identified 19
ACCEPTED MANUSCRIPT an ARIMA (2, 1, 0) model. Model parameters with p-value less than 0.05 are provided in Table 3. Figure 7 plots the original experimental data and the ARIMA model simulated results.
PT
Consistent with ARIMA models for tracer and numerically generated nC 60 data, a first
RI
order differencing was sufficient to eliminate the correlation between data points. In this model,
SC
the value of q is 0, or the model is not dependent on the error structure. Consistent with the Borden site and the Bachman site, a q value of 0 may imply a relatively homogenous aquifer.
NU
According to the discussion in the original study (Krol, Oleniuk et al. 2013), the aquifer in this
MA
site consists of distinct layers without interlayer flows, and the PPT-1 was located in a homogeneous and isotropic sandy layer. Finally, the p value of this ARIMA model is 2, in
D
contrast to a 0 and 1 for the C front and S front of numerically generated nC60 transport data,
TE
which inidcates more complicated physical chemical processes occurred during the transport and deposition of nZVI in the aquifer. According to Figure 7, the ARIMA model can provide very
AC CE P
accurate simulation of the historical data. The last measured point was just within the 95% confidence interval of the prediction. 5. Discussion
While acceptable results were demonstrated from applying ARIMA models to simulate transport of tracers and nanomaterials in the subsurface, it is very important to point out the drawback of such model. Like many other statistical models, ARIMA modeling is only descriptive and not explanatory. Therefore, ARIMA models will not be able to provide mechanistic analysis of the processes controlling contaminant transport in the subsurface. However, ARIMA models could serve as a reasonable tool to quickly predict the transport of contaminant when limited site characterization is available. Another limitation of ARIMA modeling is the requirement of some statistical experience from the researcher. However, these 20
ACCEPTED MANUSCRIPT models are implemented in most standard statistical software packages. The challenging part is the evaluation of the collected data, examining the trends and peaks in the data, and choosing
In order to satisfy the requirement of data for ARIMA modeling, interpolation is
RI
PT
appropriate interpolation scheme and prediction length. During this endeavor, we found that:
SC
necessary in many cases. Finer interpolation will provide improved accuracy, which however may also introduce additional parameters for ARIMA models. Although differencing might be necessary to convert data to make it stationary,
NU
MA
we found that over differencing would introduce additional error. We suggest minimizing the number of times that differencing is applied when data are weakly
The acceptable predictive length can be varied from site to site, which is highly
TE
D
stationary.
AC CE P
dependent on the heterogeneity of the site and the uncertainty of the data.
Acknowledgements
We thank Dr. Anne M. Parkhurst at the University of Nebraska-Lincoln, Drs. Linda M. Abriola and Kurt D. Pennell at Tufts University for their helpful suggestions and discussions. This research was supported by the National Science Foundation Award No. CBET-0854136 and CBET-1133502
21
ACCEPTED MANUSCRIPT References
PT
Bai, C. M., K. M. Eskridge, et al. (2013). "Analysis of the fate and transport of nC(60) nanoparticles in the subsurface using response surface methodology." Journal of Contaminant Hydrology 152: 60-69.
RI
Bai, C. M. and Y. S. Li (2012). "Modeling the transport and retention of nC(60) nanoparticles in the subsurface under different release scenarios." Journal of Contaminant Hydrology 136: 43-55.
SC
Benn, T., B. Cavanagh, et al. (2010). "The Release of Nanosilver from Consumer Products Used in the Home." Journal of Environmental Quality 39(6): 1875-1882.
NU
Benn, T. M. and P. Westerhoff (2008). "Nanoparticle silver released into water from commercially available sock fabrics." Environmental Science & Technology 42(11): 4133-4139.
MA
Bennett, P., F. He, et al. (2010). "In situ testing of metallic iron nanoparticle mobility and reactivity in a shallow granular aquifer." Journal of Contaminant Hydrology 116(1-4): 35-46.
TE
D
Christ, J. A., L. D. Lemke, et al. (2005). "Comparison of two-dimensional and three-dimensional simulations of dense nonaqueous phase liquids (DNAPLs): Migration and entrapment in a nonuniform permeability field." Water Resources Research 41(1). Cullen, E., D. M. O'Carroll, et al. (2010). "Simulation of the subsurface mobility of carbon nanoparticles at the field scale." Advances in Water Resources 33(4): 361-371.
AC CE P
Entink, R. H. K., W. Fransman, et al. (2011). "How to statistically analyze nano exposure measurement results: using an ARIMA time series approach." Journal of Nanoparticle Research 13(12): 6991-7004. Fetter, C. W. (2008). Contaminant Hydrogeology, Waveland Pr Inc. Freyberg, D. L. (1986). "A Natural Gradient Experiment on Solute Transport in a Sand Aquifer .2. Spatial Moments and the Advection and Dispersion of Nonreactive Tracers." Water Resources Research 22(13): 2031-2046. George E. P. Box, G. M. J., Gregory C. Reinsel (1976). Time series analysis, forecasting and control, Holden Day. Geranio, L., M. Heuberger, et al. (2009). "The Behavior of Silver Nanotextiles during Washing." Environmental Science & Technology 43(21): 8113-8118. Gwo, J. P., P. M. Jardine, et al. (2005). "Modeling field-scale multiple tracer injection at a lowlevel waste disposal site in fractured rocks: Effect of multiscale heterogeneity and source term uncertainty on conceptual understanding of mass transfer processes." Journal of Contaminant Hydrology 77(1-2): 91-118. Hameed, T., M. A. Marino, et al. (1996). "Time series modeling of channel transmission losses." Agricultural Water Management 29(3): 283-298. Jia, J. S., J. Z. Zhao, et al. (2010). "Ecological footprint simulation and prediction by ARIMA model-A case study in Henan Province of China." Ecological Indicators 10(2): 538-544. 22
ACCEPTED MANUSCRIPT Kaegi, R., A. Ulrich, et al. (2008). "Synthetic TiO2 nanoparticle emission from exterior facades into the aquatic environment." Environmental Pollution 156(2): 233-239.
PT
Krol, M. M., A. J. Oleniuk, et al. (2013). "A Field-Validated Model for In Situ Transport of Polymer-Stabilized nZVI and Implications for Subsurface Injection." Environmental Science & Technology 47(13): 7332-7340.
RI
Kumar, K. and V. K. Jain (1999). "Autoregressive integrated moving averages (ARIMA) modelling of a traffic noise time series." Applied Acoustics 58(3): 283-294.
SC
Lemke, L. D., L. M. Abriola, et al. (2004). "Dense nonaqueous phase liquid (DNAPL) source zone characterization: Influence of hydraulic property correlation on predictions of DNAPL infiltration and entrapment." Water Resources Research 40(1).
NU
Lemke, L. D., L. M. Abriola, et al. (2004). "Influence of hydraulic property correlation on predicted dense nonaqueous phase liquid source zone architecture, mass recovery and contaminant flux." Water Resources Research 40(12).
MA
Li, Y., Y. Wang, et al. (2008). "Investigation of the Transport and Deposition of Fullerene (C60) Nanoparticles in Quartz Sands under Varying Flow Conditions." Environmental Science & Technology 42(19): 7174-7180.
D
Ljung, G. M. and G. E. P. Box (1978). "Measure of Lack of Fit in Time-Series Models." Biometrika 65(2): 297-303.
TE
LuxResearch (2009) "Nanomaterials State of the Market Q1 2009: Cleantech's Dollar Investments, Penny Returns."
AC CE P
Mackay, D. M., D. L. Freyberg, et al. (1986). "A Natural Gradient Experiment on Solute Transport in a Sand Aquifer .1. Approach and Overview of Plume Movement." Water Resources Research 22(13): 2017-2029. Mann, J. K., J. R. Balmes, et al. (2010). "Short-Term Effects of Air Pollution on Wheeze in Asthmatic Children in Fresno, California." Environmental Health Perspectives 118(10): 1497-1502. Mauter, M. S. and M. Elimelech (2008). "Environmental applications of carbon-based nanomaterials." Environmental Science & Technology 42(16): 5843-5859. Petosa, A. R., D. P. Jaisi, et al. (2010). "Aggregation and Deposition of Engineered Nanomaterials in Aquatic Environments: Role of Physicochemical Interactions." Environmental Science & Technology 44(17): 6532-6549. Phelan, T. J., L. D. Lemke, et al. (2004). "Influence of textural and wettability variations on predictions of DNAPL persistence and plume development in saturated porous media." Advances in Water Resources 27(4): 411-427. Ramsburg, C. A., L. M. Abriola, et al. (2004). "Stimulated microbial reductive dechlorination following surfactant treatment at the Bachman Road site." Environmental Science & Technology 38(22): 5902-5914. Sharma, P., A. Chandra, et al. (2009). "Forecasts using Box-Jenkins models for the ambient air quality data of Delhi City." Environmental Monitoring and Assessment 157(1-4): 105112. 23
ACCEPTED MANUSCRIPT Shumway, R. H. (1988). Applied Statistical Time Series Analysis, Prentice-Hall, Englewood Cliffs, NJ. Shumway, R. H. and D. S. Stoffer (2010). Time Series Analysis and Its Applications: With R Examples, Third Edition, Springer.
PT
Team, R. D. C. (2011). R: a language and environment for statistical computing. R Foundation for Statistical Computing: Vienna. http://www.r-project.org/ "Nanotechnology Consumer Product Inventory. http://www.nanotechproject.org/inventories/consumer.
RI
WWICS.
."
2012,
from
SC
Yao, K. M., M. M. Habibian, et al. (1971). "Water and Waste Water Filtration - Concepts and Applications." Environmental Science & Technology 5(11): 1105-1112.
NU
Zhang, Y., Y. S. Chen, et al. (2008). "Stability of commercial metal oxide nanoparticles in water." Water Research 42(8-9): 2204-2212.
AC CE P
TE
D
MA
Zheng, C. and P. Wang (1999). MT3DMS: A Modular Three-Dimensional Multispecies Model for Simulation of Advection, Dispersion and Chemical Reactions of Contaminants in Groundwater Systems; Documentation and User's Guide. Vicksburg, Mississippi.
24
ACCEPTED MANUSCRIPT Table1. ARIMA models based on various interpolation intervals for the Borden Site.
10
(1,1,0)
15
(1,1,0)
µ = 0.8205 ± 0.0243 φ1= 0.6244 ± 0.12581 µ = 1.2257 ± 0.0431 φ1= 0.6386 ± 0.1537
PT
(2,1,0)
Model MSE 8.66E-04
RI
5
Parameters µ = 0.4088 ± 0.0134 φ1= 0.468 ± 0.1021 φ2= 0.3033 ± 0.3033
3.75E-03
SC
ARIMA
AC CE P
TE
D
MA
NU
Interpolation Intervals (day)
25
7.61E-03
Prediction MSE 6.71E-02
2.10E-02 1.39E-01
ACCEPTED MANUSCRIPT
1.5
(0,1,2)
2
(0,1,2)
SC
(0,1,4)
Model MSE
NU
1
Parameters µ = 0.0012 ± 0.0002 θ1= 0.6447 ± 0.1035 θ2= 0.1282 ± 0.1148 θ3=-0.4649 ± 0.1301 θ4= -0.4723 ± 0.0963 µ = 0.0017 ± 0.0004 θ1= 0.0481 ± 0.1237 θ2= -0.499 ± 0.1252 µ = 0.0023 ± 0.0005 θ1= 0.2437 ± 0.1358 θ2= -0.6378 ± 0.1302
MA
ARIMA
AC CE P
TE
D
Interpolation Intervals (day)
RI
PT
Table2. ARIMA models based on various interpolation intervals for the Tennessee Site.
26
Prediction MSE
5.24E-06
8.09E-05
2.17E-05
1.29E-04
1.88E-05
1.28E-04
ACCEPTED MANUSCRIPT
PT
Table 3. ARIMA models for nanoparticles transport
Parameters
Model MSE
nC60 C front
(0,1,0)
µ = 0.0191 ± 0.0026
8.95E-04
2.01E-02
-531.12
nC60 S front
(1,1,0)
4.61E-04
7.43E-03
-614.05
nZVI
(2,1,0)
µ = 0.0162 ± 0.0025 φ1= 0.2410 ± 0.0856 µ = -0.1316 ± 0.0377 φ1= 1.2323 ± 0.1139 φ2 = -0.3154 ± 0.1141
7.98E-4
1.18
-277.53
AC CE P
TE
D
MA
NU
SC
RI
ARIMA
Prediction MSE
27
AIC
TE
D
MA
NU
SC
RI
PT
ACCEPTED MANUSCRIPT
AC CE P
Figure 1. The comparison of the observed and the ARIMA model predicted plum center of mass at the Borden Site
28
D
MA
NU
SC
RI
PT
ACCEPTED MANUSCRIPT
AC CE P
TE
Figure 2. Comparison of the observed and the ARIMA model predicted concentration change in an observation well in the Tennessee Site. The red line represents the prediction ending at day 100.
29
ACCEPTED MANUSCRIPT
(A)
(C) 1st-order difference of Bromide
PT
RI
0.10
SC
0.05
200
300
-0.10
MA
100
-0.05
NU
0.00
BromideDiff2
0.45 0.40 0.35
BromideDiff1
0.50
2nd-order difference of Bromide
400
100
TE
(B)
2nd-difference autocorrelation
0.0 0.2 0.4 0.6 0.8 1.0
ACF
AC CE P
1.0 0.8 0.6 0.4
400
-0.4
-0.2
0.0
0.2
300
(D)
1st-difference autocorrelation
ACF
200
time(day)
D
time(day)
0
2
4
6
8
10
0
Lag
2
4
6
8
10
Lag
Figure 3. First-order differenced time series and its ACF- PACF (plots A and B); Secondorder differenced time series and its ACF- PACF (plots C and D)
30
TE
D
MA
NU
SC
RI
PT
ACCEPTED MANUSCRIPT
AC CE P
Figure 4. Comparison of the performance of ARIMA(2,1,0) and ARIMA(0,2,1) for predicting center of mass of a tracer plume at the Borden site
31
ACCEPTED MANUSCRIPT (B)
TE
D
MA
NU
SC
RI
PT
(A)
AC CE P
Figure 5. An example (A) aqueous-phase and (B) attached-phase nC60 concentration distribution at the Bachman Road site after 48 hours continuous release of nC60. The following parameters were used: α = 0.001, dp = 100 nm, i = 0.01, and C0 = 1 mg/L.
32
ACCEPTED MANUSCRIPT
MA
NU
SC
RI
PT
(A)
AC CE P
TE
D
(B)
Figure 6. Comparison of the observed and the ARIMA model predicted (A) C front and (B) S front for nanoparticle transport at a field site.
33
AC CE P
TE
D
MA
NU
SC
RI
PT
ACCEPTED MANUSCRIPT
Figure 7. Comparison of the observed and the ARIMA model predicted CMC-nZVI concentration in the PPT-1 of a field site.
34
ACCEPTED MANUSCRIPT Highlights Time series analysis of tracer and nanoparticle transport in the field.
ARIMA implementation issues for predicting transport in the subsurface.
The first effort of applying ARIMA modeling for transport in the subsurface.
AC CE P
TE
D
MA
NU
SC
RI
PT
35