Hao Yea,1, Richard J. Beamishb, Sarah M. Glaserc, Sue C. H. Grantd, Chih-hao Hsiehe, Laura J. Richardsb, Jon T. Schnuteb, and George Sugiharaa,1 a Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA 92093; bPacific Biological Station, Fisheries and Oceans Canada, Nanaimo, BC V9T 6N7, Canada; cJoseph S. Korbel School of International Studies, University of Denver, Denver, CO 80210; dFisheries and Oceans Canada, Delta, BC V3M 6A2, Canada; and eInstitute of Oceanography and Institute of Ecology and Evolutionary Biology, National Taiwan University, Taipei, Taiwan 10617

PNAS PLUS SEE COMMENTARY

Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling

It is well known that current equilibrium-based models fall short as predictive descriptions of natural ecosystems, and particularly of fisheries systems that exhibit nonlinear dynamics. For example, model parameters assumed to be fixed constants may actually vary in time, models may fit well to existing data but lack out-of-sample predictive skill, and key driving variables may be misidentified due to transient (mirage) correlations that are common in nonlinear systems. With these frailties, it is somewhat surprising that static equilibrium models continue to be widely used. Here, we examine empirical dynamic modeling (EDM) as an alternative to imposed model equations and that accommodates both nonequilibrium dynamics and nonlinearity. Using time series from nine stocks of sockeye salmon (Oncorhynchus nerka) from the Fraser River system in British Columbia, Canada, we perform, for the the first time to our knowledge, real-data comparison of contemporary fisheries models with equivalent EDM formulations that explicitly use spawning stock and environmental variables to forecast recruitment. We find that EDM models produce more accurate and precise forecasts, and unlike extensions of the classic Ricker spawner–recruit equation, they show significant improvements when environmental factors are included. Our analysis demonstrates the strategic utility of EDM for incorporating environmental influences into fisheries forecasts and, more generally, for providing insight into how environmental factors can operate in forecast models, thus paving the way for equation-free mechanistic forecasting to be applied in management contexts.

|

|

ecosystem forecasting fisheries ecology physical–biological interactions empirical dynamic modeling nonlinear dynamics

|

these two time periods, conflicting with the assumption of a fixed equilibrium and constant parameter values. Indeed, Beamish et al. (5) found that the Ricker model fit better when constrained by climate regimes, suggesting that the spawner–recruit relationship does vary in time, a fact consistent with the general notion of nonlinear state dependence (6, 7). At its core, nonlinear dynamics [which are known to be ubiquitous in marine species (8, 9)] occur when variables have interdependent effects; this can be problematic when applying a reductionist approach to understand nonlinear systems. For example, in laboratory experiments, guppies (Poecilia reticulatus) preferentially eat Drosophila or tubificid worms depending on which prey is more abundant (10). Thus, the strength of predation on, say, Drosophila, will change depending on the abundance of tubificid worms. This prey-switching behavior typifies nonlinear state dependence, whereby different components cannot be treated independently, as would be the case in a linear system or even a nonlinear system approximated at equilibrium. Consequently, applying a model that assumes separability of effects [e.g., vector autoregression (11)] to a system that is actually nonlinear can give the appearance of nonstationarity or stochasticity even when the underlying mechanisms are unchanged and deterministic. Nonlinearity is also known to affect the correct identification of causal drivers—a key prerequisite for understanding and predicting system behavior. In nonlinear systems, because interacting variables

|

Significance

O

The conventional parametric approach to modeling relies on hypothesized equations to approximate mechanistic processes. Although there are known limitations in using an assumed set of equations, parametric models remain widely used to test for interactions, make predictions, and guide management decisions. Here, we show that these objectives are better addressed using an alternative equation-free approach, empirical dynamic modeling (EDM). Applied to Fraser River sockeye salmon, EDM models (i) recover the mechanistic relationship between the environment and population biology that fisheries models dismiss as insignificant, (ii) produce significantly better forecasts compared with contemporary fisheries models, and (iii) explicitly link control parameters (spawning abundance) and ecosystem objectives (future recruitment), producing models that are suitable for current management frameworks.

ne of the fundamental challenges of environmental science is to understand and predict the behavior of complex natural ecosystems. This task can be especially difficult when multiple drivers (e.g., species interactions, environmental influences) interact in a nonlinear state-dependent way to produce dynamics that appear to be erratic and nonstationary (1). In the standard parametric approach, which implicitly assumes that the selected model and its equations are essentially correct, the equations (really just mechanistic hypotheses) can lack the flexibility to describe the nonlinear dynamics that occur in nature. Consequently, these parametric models tend to perform poorly as descriptions of reality, with little explanatory or predictive power (2, 3), and limited usefulness for prediction and management.

Parametric Models as Hypotheses A common problem when applying the parametric approach to nonlinear systems is that of ephemeral fitting. That is, although population models may assume that demographic parameters such as growth rate or carrying capacity are fixed constants, these quantities are often observed to vary in time or in relation to other variables (e.g., resource availability, changing climate regimes) when tested on actual data (4). This principle is illustrated in Fig. 1A, where the Ricker spawner–recruit relationship is fit to the early (1948–1976) and late (1977–2005) halves of the time series from the Seymour stock. Very different relationships emerge in www.pnas.org/cgi/doi/10.1073/pnas.1417063112

Author contributions: H.Y., S.M.G., C.-h.H., and G.S. designed research; H.Y. performed research; H.Y. analyzed data; and H.Y., R.J.B., S.M.G., S.C.H.G., C.-h.H., L.J.R., J.T.S., and G.S. wrote the paper. The authors declare no conflict of interest. This article is a PNAS Direct Submission. Freely available online through the PNAS open access option. See Commentary on page 3856. 1

To whom correspondence may be addressed. Email: [email protected] or [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. 1073/pnas.1417063112/-/DCSupplemental.

PNAS | Published online March 2, 2015 | E1569–E1576

ECOLOGY

Edited by Stephen R. Carpenter, University of Wisconsin–Madison, Madison, WI, and approved January 28, 2015 (received for review September 17, 2014)

A

B

idealized Ricker model surface

C empirical (non-parametric) model surface

0.8

0.4

0.2

0.0 0.00

0.03 0.06 Spawners (x 106)

0.09

Recruits (x 106)

0.8

0.6

Recruits (x 106)

Recruits (x 106)

0.8

0.6 0.4 0.2

0.10 0.08 Spa wn 0.060.04 ers (x 1 0.02 0 6)

7.0 7.5 8.0 C) 8.5 T (° 9.0 SS

0.6 0.4 0.2

0.10 Spa 0.08 0.06 wn 0.04 ers 0.02 (x 1 0 6)

7.0 7.5 8.0 C) ° 8.5 T ( S S 9.0

Fig. 1. Model output for the Ricker, extended Ricker, and multivariate EDM models. (A) Ricker curves for the Seymour stock of Fraser River sockeye salmon are quite different for the early (blue; 1948–1976 brood years) vs. later (red; 1977–2005) time segments (triangles are observed data). Even fit to the whole time series (gray line), large errors remain. (B and C) Model output (surfaces; points are observed data) from the extended Ricker model (B) and EDM (C) using spawner abundance and Pine Island April SST to forecast recruitment of Seymour sockeye salmon. Although the Ricker model varies smoothly, it can forecast recruitment to be many times higher than the historical maximum. In the EDM model, however, the relationship between temperature and spawners is defined empirically by the data and, thus, more realistically depicted.

can exhibit transient (mirage) correlations that change in magnitude or sign (6, 7), the use of correlation to identify causal environmental variables can be misleading, producing both false positives (i.e., correlation does not imply causation) and false negatives (i.e., lack of correlation does not imply a lack of causation). Given the prevalence of nonlinear interactions in ecology, mirage correlations can be misleading. Indeed, a metaanalysis examining the robustness of correlations between recruitment and the environment (12) found that only 28 out of 74 initially significant correlations were upheld when subsequent data were included. Even when causal variables are known, their inclusion into improperly formulated models can produce conflicting results. For example, with sockeye salmon in the Fraser River, although anomalous oceanic conditions experienced by juveniles are thought to be responsible for the low abundance of returning adults in 2009 (13–15), extensions of the standard Ricker model that explicitly include environmental factors surprisingly show no significant improvements in the actual forecasts (16–18). A simple explanation for this apparent contradiction is that the extended Ricker model does not accurately portray the relevant interaction between the oceanic environment and sockeye salmon. Indeed, the model naively assumes that the environment acts on recruitment dynamics independently with a constant multiplicative effect (e.g., a 1 °C decrease in temperature always doubles recruitment regardless of other factors important to the state of the system). Although temperature, in all likelihood, does affect recruitment, it probably does not follow this arbitrary form. We demonstrate this by fitting the model to Pine Island sea surface temperature (SST) and Seymour spawner–recruit data (Fig. 1B), finding that the model predicts unrealistically high recruitment (much higher than the historically observed maximum) for hypothetical (but plausible) conditions of high spawner abundance and low temperature. Thus, although the equation may appear reasonable as a hypothesis, it apparently does not incorporate the environment realistically. Empirical Dynamic Modeling In contrast to fitting an assumed set of equations, empirical dynamic modeling (EDM) instead relies on time series data to reveal the dynamic relationships among variables as they occur (1, 6, 19–21). By extracting these relationships empirically, EDM accommodates potentially complex and changing interactions E1570 | www.pnas.org/cgi/doi/10.1073/pnas.1417063112

that cannot be described in a simple set of equations. Thus, prediction accuracy with EDM is constrained by the quantity and quality of data rather than by the hypotheses represented in a set of equations [which may be subject to process error due to false or incomplete specification (22)]. Fundamental to EDM is the concept of a time series as an observation on a dynamic system. Broadly speaking, a dynamic system can be viewed as a set of “states” (d-dimensional vectors where each coordinate is a system variable) and deterministic rules (governing dynamics) for how the states evolve over time. Collectively, the set of states and their trajectories forms an “attractor manifold,” and projecting the motion on this manifold to a coordinate axis produces a time series of the corresponding variable (SI Appendix, Fig. S1A). For example, in a simple predator– prey system where the system evolves as a function of the two abundances, the system state could be represented as the ordered pair of predator and prey abundances. This system state can be projected onto the prey coordinate axis to produce a time series of prey abundance, although many other observation functions are also possible (e.g., predator abundance, average number of prey for each predator). In theory, with time series for all of the system variables, it would be possible to reconstruct the original attractor manifold by plotting each time series as a separate coordinate. In practice, however, we typically do not have these data or know the identity of all relevant variables. Fortunately, a fundamental mathematical result proves that information about the entire system is contained in any one variable (23, 24), meaning that a shadow version of the original attractor can be constructed from just a single time series. This is accomplished by substituting lags of that time series for the unknown or unobserved variables (SI Appendix, Fig. S1B). These essential mechanics of EDM are detailed in ref. 6 and crisply summarized in a short animation (Movie S1). Although a single time series is usually sufficient to reconstruct a system’s dynamics, there are exceptions (e.g., it is not a closed system). In the case of sockeye salmon, abundance alone may not skillfully predict future returns because they are influenced by external environmental factors. Here, the environment may be thought to act as stochastic external forcing, necessitating its inclusion as an additional coordinate in a multivariate reconstruction (1, 7, 24). We demonstrate this by using spawners and Ye et al.

PNAS PLUS

SST to predict recruitment (Fig. 1C). Unlike a parametric model in which a hypothesized interaction must be specified in advance (the extended Ricker model; Fig. 1B), the empirical surface in Fig. 1C makes no assumptions about the relationship between variables, but instead captures the interaction between density dependence and environmental conditions as revealed by the data: ocean temperatures have a stronger effect on recruitment when spawner abundance is low.

SEE COMMENTARY

Pine Island Lighthouse

Stellako

Quesnel Queen Charlotte Sound Seymour Chilko Queen Charlotte Strait

ser r Rive

Returns (millions of fish)

Entrance Island Lighthouse

Fra

Birkenhead Va nc

Late Shuswap

Strait of Georgia ou

ve

rI

sla

nd

0

Weaver

50

100

kilometers

Canada U.S.A.

Fig. 3. Early ocean environment for Fraser River sockeye salmon. Upon exiting the Fraser River, juvenile sockeye salmon migrate north through the Strait of Georgia, spending up to a month moving through this ecosystem (31), before continuing through Queen Charlotte Strait and into Queen Charlotte Sound. Red labels for the nine stocks studied in this work are located at the approximate spawning sites. Blue triangles denote the locations of the two lighthouses where SST is recorded. Image courtesy of DFO.

Methods), which enables us to focus on just the natural population dynamics. Second, to investigate the causal influence of the oceanic environment, we consider forecasts produced by the extended Ricker model and equivalent multivariate EDM formulations. In both cases, if the inclusion of environmental variables significantly improves forecasts (Materials and Methods), those variables are taken to have a causal influence on salmon recruitment. Last, to avoid arbitrary fitting and to obtain a robust measure of forecast skill, we apply a fourfold cross-validation scheme for each model: the model is fit to three-fourths of the data to predict the remaining one-fourth out-of-sample, and the procedure is repeated for each one-fourth segment of the time series. Results

20

Comparison of Spawner–Recruit Forecast Models. As a fair com-

10

0 1950

1960

1970

1980 Year

1990

2000

2010

Fig. 2. Combined returns of Fraser River sockeye salmon. Total returns (Dataset S1) for Fraser River sockeye salmon combined across stocks (1954 cycle line in black). Although not all stocks exhibit cyclic dominance, and those that do are not synchronized, cycles are still visible in the aggregated returns.

Ye et al.

parison with the standard Ricker model where spawner abundance is used to predict recruitment, we examine an equivalent EDM spawner–recruit model, but which actually has fewer fitted parameters (Materials and Methods). Fig. 4 shows that this simple EDM model has significantly higher accuracy (ρ, correlation between observations and predictions) than the Ricker model, with more accurate forecasts in eight of nine cases and significantly lower error overall [mean absolute error (MAE); SI Appendix, Fig. S3]. Nonetheless, predictions for several stocks (Birkenhead, Chilko, Stellako, and Weaver) are not very skillful (ρ < 0.3), suggesting that in these cases, there is no simple spawner–recruit relationship (parametric or otherwise). Instead, environmental factors (e.g., SST, food availability) may dominate, and better performance can be obtained by accounting for these external drivers. PNAS | Published online March 2, 2015 | E1571

ECOLOGY

Fraser River Sockeye Salmon In this work, we perform a real-world test comparing EDM and the standard parametric paradigm, by forecasting returns for the nine most historically abundant stocks of sockeye salmon from the Fraser River system in British Columbia, Canada (Fig. 2), of significance to Canada’s iconic fisheries. Total returns in this system are highly variable and can span over an order of magnitude: a record low of 1.6 million in 2009 was followed by a record high of 28.3 million in 2010 (Fig. 3). Although some of this variability occurs because of cyclic dominance (25, 26), large interannual fluctuations in mortality and productivity (recruits-perspawner) are difficult to predict, leading to considerable uncertainties in current parametric forecast models (27). This is suggestive of nonlinear dynamics in this fishery, and indeed, a Canadian federal inquiry (13, 14) concluded that recent declines in productivity could not be attributed to any single mechanism but were likely caused by the interaction of multiple stressors (e.g., predators, food availability, environment). Applying a simple S-map test (P = 0.002) (SI Appendix, Fig. S2), we confirm the presence of nonlinear dynamics among returns of Fraser River sockeye salmon. Thus, we apply EDM methods to unravel the mechanisms by which the environment may affect sockeye salmon recruitment. First, we compare the classical Ricker spawner–recruit model with equivalent EDM spawner–recruit models. With nearly all adults returning as age 4 or age 5 fish, we can consider the total returns in a single calendar year to be composed of age 4 and age 5 recruits from different spawning broods. Following ref. 16, we predict annual returns by first estimating total recruitment for each spawning brood year. This recruitment is then partitioned by age, and the age 4 and age 5 estimates from separate brood years combined appropriately to forecast returns (Materials and Methods). Note that the time series of spawning abundance and recruitment already account for the effects of the fishery (this information is contained within the time series; Materials and

Fraser River mouth

Early Stuart Late Stuart

Incorporating Environmental Influences. As in the actual forecast models (16), we further consider three environmental variables [the Pacific Decadal Oscillation (PDO), SST, and Fraser River discharge] observed at different times and locations (12 time series in total). Each of these factors is believed to have a potential effect on recruitment, although significance has yet to be demonstrated in practice. For each stock, we compare the relative performance of the extended Ricker and corresponding multivariate EDM models that incorporate these environmental variables (Table 1; Materials and Methods). Fig. 4 shows that multivariate EDM is consistently and significantly better at forecasting than the extended Ricker model for all nine stocks, and is true for both accuracy and precision metrics (SI Appendix, Fig. S3). Here, the relevant causal influence of these environmental variables is verified by the fact that multivariate EDM models that include them perform significantly better than their simple EDM spawner– recruit counterparts. By contrast, the extended Ricker models show no significant improvement over the simple Ricker models in any of the stocks. The difference between EDM and Ricker is especially visible for Late Stuart, Quesnel, Stellako, and Weaver, indicating that these particular environmental factors (currently considered in assessments) can explain much of the variability in these stocks, provided they are incorporated reasonably (i.e., with the minimal assumptions of EDM). For Birkenhead and Chilko, however, multivariate EDM models performed no better than the simplified stock– recruit versions, hinting that variables other than these are required to understand the dynamics of those stocks.

Discussion Nonlinearity of Fraser River Sockeye Salmon. Similar to many marine species (8, 9), Fraser River sockeye salmon show strong evidence for nonlinear dynamics (SI Appendix, Fig. S2 and Table S1).

0.8

Ricker extended Ricker simple EDM multivariate EDM

0.6

0.4

0.2

0.0

Birkenhead Early Stuart Late Stuart Seymour Weaver Chilko Late Shuswap Quesnel Stellako Significant simple EDM vs. Ricker > (P = 0.039)

multivariate EDM vs. extended Ricker > (P = 0.014)

Non-significant extended Ricker vs. Ricker (P = 0.10)

multivariate EDM vs. simple EDM > (P = 0.0024)

Fig. 4. Comparison of forecast accuracy. Comparisons between equivalent EDM and Ricker models show better forecast accuracy for the EDM models [simple EDM vs. Ricker, t(492) = 1.77, P = 0.039; multivariate EDM vs. extended Ricker, t(492) = 2.20, P = 0.014]. Additionally, including environmental data significantly improves accuracy for EDM [t(492) = 2.83, P = 0.0024], but not for the Ricker models [t(492) = 1.26, P = 0.10].

E1572 | www.pnas.org/cgi/doi/10.1073/pnas.1417063112

Table 1. Forecast skill of models incorporating the environment Stock

Model

Birkenhead

Ricker EDM Ricker EDM Ricker EDM Ricker EDM Ricker EDM Ricker EDM Ricker EDM Ricker EDM Ricker EDM

Chilko Early Stuart Late Shuswap Late Stuart Quesnel Seymour Stellako Weaver

Predictors S, S S, S S, S, S, S, S, S, S, S, S, S, S, S, S, S,

ETjun ETmay ETapr Dapr, Djun ETjun Dmay, PTjul Djun Djun, ETapr ETjun PTmay, PDO PTapr PTjul ETmay PTapr, PDO PDO Dapr, Dmay

No. predictions

ρ

MAE

57 57 57 57 57 57 57 57 56 56 57 57 57 57 57 57 39 39

−0.111 0.156 0.268 0.264 0.737 0.878 0.875 0.923 0.552 0.783 0.387 0.861 0.571 0.734 0.191 0.531 −0.094 0.573

0.251 0.259 0.825 0.839 0.172 0.140 0.842 0.821 0.423 0.250 2.057 0.729 0.076 0.065 0.250 0.217 0.215 0.176

D, Fraser River discharge; ET, Entrance Island SST; PDO, Pacific Decadal Oscillation; PT, Pine Island SST.

Thus, it should not be surprising that a simple EDM model, which accommodates nonlinearity, would outperform the assumed spawner–recruit equation of the Ricker model. Furthermore, because sockeye salmon are exposed to different sources of environmentally driven mortality and because it is likely that they integrate these effects in a nonlinear fashion, it should not be surprising that multivariate EDM models that explicitly accommodate relevant environmental factors would show dramatically improved performance. In contrast, the extended Ricker model cannot resolve the nonlinear effect of the environment and shows only nonsignificant improvements (that might be expected from having additional degrees of freedom). Identifying Environmental Drivers. It is believed that growth during the early marine stage for Pacific salmon is a critical period that determines subsequent mortality and recruitment (28, 29). Thus, it is reasonable to expect that including related environmental variables into models should improve predictions. However, the extended Ricker model did not improve when river discharge, SST, or the PDO (Fig. 4 and SI Appendix, Fig. S3) were included. Rather than suggesting that these variables have no effect, it is more likely that the extended Ricker model is incorrectly specified. This is borne out by the fact that these factors produce improved forecasts for many stocks when included nonparametrically in EDM (Fig. 4 and SI Appendix, Fig. S3). Thus, our analysis suggests that the tested variables are indeed informative about the relevant environmental conditions experienced by juvenile sockeye salmon. For example, river discharge and SST may indicate primary productivity in the Strait of Georgia and other areas through which juveniles migrate (Fig. 2) (15, 30, 31), and the associations between large-scale oceanic climate indicators such as the PDO and Pacific salmon productivity are well known from other studies (32, 33). Although these variables do not reflect direct causal mechanisms, they may be useful as simple indicators of processes that influence salmon survival, thereby improving forecasts when included in the EDM approach. Although individual stocks appear to be sensitive to different environmental factors (Table 1), we did observe some general patterns: for example, two of the nine stocks (Stellako and Quesnel) identified the PDO as an informative variable (the first-ranked EDM models for these stocks include the PDO as a coordinate), Ye et al.

Nonuniqueness of Models. We note that, for a given stock, different EDM models can show similar performance (SI Appendix, Table S4). Although somewhat counterintuitive, this phenomenon is expected, because the tested variables (river discharge, SST, the PDO) are proxy indicators of the environment. Thus, they may contain redundant information such that different variable combinations are equally informative even as they represent alternative perspectives on the system. This reflects a fundamental property of EDM in that forecast performance depends solely on the information content of the data rather than on how well assumed equations match reality. To clarify the concept of nonuniqueness, consider the canonical Lorenz attractor (SI Appendix, Fig. S1A). The behavior of this system is governed by three differential equations (SI Appendix, Eq. S1). However, the axes can be rotated to produce three new coordinates, x′, y′, and z′, and the equations rewritten in terms of these new coordinates, allowing the system to be described using either representation (x, y, and z or x′, y′, and z′) as well as mixed combinations (e.g., x, y, and z′). Thus, with an infinite number of ways to rotate the system, there are an unlimited number of “true variables” and “true models.” In the case of sockeye salmon, the similar performance of different models (SI Appendix, Table S4) does not mean that one or the other model is incorrect; instead, it reflects the fact that the environmental variables are indicators of the same general mechanism, and so different variable combinations can be equally informative for forecasting recruitment. Again, we emphasize that including a variable does not imply a direct causal link—variables in an EDM model improve forecasts because they are informative; it does not mean that the included variables are proximate causes. Importantly, the converse does not hold either: a variable could be causal and yet not appear in the multivariate EDM; this might occur when multiple stochastic drivers affect recruitment in an interdependent way, necessitating that a model include measurements of all of the drivers to account for their combined effect. For example, although none of the tested variables seem to improve forecasts for Ye et al.

Data Requirements of EDM. Using EDM is fundamentally a datadriven approach: thus, it is important to ensure that time series are of sufficient length to recover dynamics. For example, Sugihara et al. (6) suggest that at least 35–40 points might be necessary as a rough minimum, although methods exist for using dynamically similar replicates in cases where time series are shorter (36). For many systems, however, the data requirements of EDM mean that increased budgets and additional sampling effort will be important to support long-term continuous observations and generate sufficient time series. We note, although, that it is not necessary to sample all putatively relevant drivers, because different measurements are often substitutable as proxies for true proximal causes. When data requirements are met, however, we note that collecting additional data can further improve accuracy and precision of EDM models. Consider the simplex projection method, which uses nearest-neighbor analogs to approximate system behavior. With each new data point, more analogs are available (the reconstructed manifold becomes denser), and so these approximations become more precise. Thus, EDM models will improve with longer time series. In contrast, a parametric model will benefit from more data only when the assumed equations are essentially correct. In the case of the classic Ricker model in Fig. 1A, it is clear that similar levels of spawner abundance yield very different levels of recruitment, and so any simple function relating the two cannot fully explain the scattered observations. Adding more data may result in more “precise” parameter estimates, but individual errors will remain large when the underlying process is more complex than the assumed model can portray. Alternative Parametric Models. In this work, we use the classical and extended Ricker models as examples of the parametric approach, but acknowledge that there are alternative models considered by Fisheries and Oceans Canada (DFO) for Fraser River sockeye salmon (37–40). Although some of these models may fit the data better, this does not always reflect a model’s true performance in out-of-sample forecasting. For example, a modified Ricker model that allows parameters to randomly drift over time (37) will explain variations in the data better than the static alternative, because doing so can indirectly track nonlinear state dependence. However, instead of a mechanism for why parameters change, such models based on the Kalman filter (41) typically use forward information (i.e., observations at time t + 1 help to estimate the growth rate at time t) and thus do not actually “predict.” Consequently, the actual forecast performance of such models will be overestimated by their fit to historical data. A more fundamental concern with the parametric approach is that it requires explicit equations to model the effects of included variables. Such equations may be overly simplified (e.g., linear correlations) and unable to accommodate the state-dependent effects that occur in nonlinear systems. Final Remarks. EDM addresses two important challenges for modeling natural systems. First, EDM identifies relevant variables and interactions empirically and dynamically (6); this is in contrast to the conventional approach where the use of parametric equations poses the dual risks of model misspecification (22) as well as variable misidentification (6, 7, 12). Importantly, EDM allows proxy variables to be used, which can be a boon when observations on key processes (e.g., mortality) are lacking but indirect measurements (e.g., SST) are available. Second, the equation-free approach of EDM produces more accurate forecasts than equivalent parametric models using the same data. As Perretti et al. (2) have shown, even when a correct parametric PNAS | Published online March 2, 2015 | E1573

PNAS PLUS SEE COMMENTARY

the Birkenhead stock (SI Appendix, Table S4), this does not mean that these sockeye salmon are insensitive to SST, river discharge, and the PDO. Rather, it suggests that the effect of these variables may be modulated by other factors not considered here.

ECOLOGY

yet the predictability for these stocks is further improved when other variables (river discharge or SST) are included in addition to the PDO. This suggests that the PDO is an incomplete observation on the relevant environment for sockeye, and that local-scale measures of the environment can enhance the information in the PDO index (an ocean basin-scale indicator) (see SI Appendix for details). Although our models confirm a general influence of the environment on sockeye salmon recruitment, some stocks appear to be skillfully predicted using only spawner abundance. One explanation for this is that the stocks experience unique environments: they are exposed to different freshwater conditions in their respective nursery lakes, and they exhibit different timings and migration routes as they travel through the Fraser River, the Strait of Georgia (34), and along the west coast of North America (35). Even with shared environmental influences (e.g., food availability in the Strait of Georgia), nonlinear state dependence can produce dynamics unique to each stock. Consequently, if these myriad effects are strongly density dependent, recruitment could be successfully predicted using just spawner abundance. However, if these effects are stochastic (i.e., environmentally driven), then it will be necessary to include informative indicator variables to improve forecasts. Apart from multivariate models, an alternative approach to determine causal environmental variables would be to apply the method of convergent cross-mapping (CCM) (6). However, due to data limitations (in particular, the absence of annual monitoring of each cycle line including the oceanic phase), CCM may not be sufficiently sensitive to resolve causality here (see SI Appendix for details).

model is known, fitting parametric models can be problematic and is an important concern with many systems exhibiting nonlinear behavior (8, 9). In contrast, EDM models can capture dynamic information and explain behavior that may be misclassified as random by parametric fitting procedures. Consequently, the dynamic perspective of EDM has much to offer for modeling nonlinear systems, representing a viable framework (with minimal assumptions) for system identification and robust forecasting. When parametric models are required, EDM can also be used in a complementary role to identify causal links, recover variable relationships, and even guide the construction of reliable equation-based models (42). This represents a practical way to perform data-driven modeling instead of starting with complex parametric models [end-to-end ecosystem models, such as Ecopath with Ecosim (43)], which often make strong assumptions and require large amounts of data to parameterize. Moreover, EDM models can also serve as direct substitutes for their parametric equivalents. Here, our simple and multivariate EDM models are formulated similarly to their Ricker-based counterparts: using spawner abundance (and the environment) to forecast returns. Some simple extensions to the methods, such as the development of uncertainty estimates (SI Appendix, Fig. S5), will enable these models to be integrated into the current management framework that uses parametric models. Thus, we believe that EDM has great potential as a tool for understanding and forecasting nonlinear ecosystems: by operating without assumed equations, it can be beneficial when exact mathematical descriptions are not available. Materials and Methods Data. We analyze yearly time series data for the nine historically most abundant stocks (Birkenhead, Chilko, Early Stuart, Late Shuswap, Late Stuart, Quesnel, Seymour, Stellako, and Weaver) of sockeye salmon from the Fraser River system (Dataset S2). Data span brood years 1948–2005, except for Late Stuart and Weaver, where data begin in 1949 and 1966, respectively. We consider only single-stock models, so notation and equations are given as for a single stock. St is the number of effective female spawners in brood year t, and Rt is the corresponding recruitment (returning adults). Recruitment is partitioned by age: Ra,t is the number spawned in year t and returning at age a in year t + a. Following ref. 16, total recruitment is the sum of age 4 and age 5 recruits: Rt = R4,t + R5,t. In contrast, total returns, Ny, are the adults that return to spawn in calendar year y, and computed as Ny = R4,y-4 + R5,y-5. As explained below, recruitment is forecast from spawner abundance, and age 4 and age 5 recruits (from different brood years) are summed to estimate total returns in a given calendar year. Note that both recruitment and returns are computed as catch plus escapement plus en route loss, whereas spawner abundance is based on observations of escapement and egg production (27). Thus, both spawner abundance and recruitment account for the effects of catch, and the models we consider here focus just on the population dynamics of this system. We investigate three environmental variables: the PDO, SST, and Fraser River discharge (Dataset S3). For the PDO, one annual time series is constructed as the average of monthly values from November to March (32). SST measures are monthly averages from two lighthouse stations (Entrance Island: April to June; Pine Island: April to July). River discharge is measured at Hope; we include peak daily flow and monthly averages (April to June). Fraser River sockeye salmon enter the ocean at age 2, so the environmental data are lagged 2 years to line up with ocean entry time. Attractor Reconstruction. The goal of attractor reconstruction is to approximate the originating dynamic system using time series data. The simplest construction uses successive lags of a single time series (23, 44): given time series {xt}, E-dimensional vectors xt are composed of E lags of x, each separated by a time step τ: x t = Æxt , xt−τ , . . . , xt−ðE−1Þτ æ. Generalizations of Takens’ theorem (24, 45) permit attractor reconstructions using multiple time series. For example, with {xt} and {yt} observed from the same system, one possible reconstruction forms vectors as Æxt , yt , yt−τ æ. To account for different scaling between variables, each time series is first linearly transformed to have mean = 0 and variance = 1. Simplex Projection and S-Map. Simplex projection estimates the trajectory (i.e., forecasts) of a novel system state by computing a weighted average of

E1574 | www.pnas.org/cgi/doi/10.1073/pnas.1417063112

the trajectories of that state’s nearest neighbors (19). Given an attractor reconstruction, and a novel state xs, we first find the b nearest neighbors (typically setting b = E + 1) that are closest to xs: these neighbors are the vectors xn(s,i ), where n(s,i ) designates the time index of the ith closest neighbor to xs. So, xn(s,1) is the closest neighbor to xs, xn(s,2) is the second closest neighbor, etc. We then evolve the neighbors forward, and compute a weighted average of the forward evolutions (h time steps into the future) to estimate xs+h: ^ s+h = x

b X

!, wi ðsÞx nðs,iÞ+h

i=1

b X

wi ðsÞ:

[1]

i=1

The weights, wi(s), are based on the distance between xs and its ith neighbor, xn(s,i), scaled to the distance to the nearest neighbor: wi ðsÞ = expð−dðx s ,x nðs,iÞ Þ=dðx s ,x nðs,1Þ ÞÞ, and d(xs, xt) is the Euclidean distance between the vectors xs and xt. In most cases, we desire forecasts of a scalar value rather than of the full system state. This is possible when the variable to be forecast, y, is an observation on the same dynamic system. As such, there will be a correspondence between xt and the scalar value of yt, and we can adjust Eq. 1 to compute a weighted average of the corresponding values of y: y^s+h =

b X

!, wi ðsÞynðs,iÞ+h

i=1

b X

wi ðsÞ:

[2]

i=1

The S-map procedure computes a local linear map between laggedcoordinate vectors and a target variable and is often used to test for nonlinear state dependence (22). It includes a tuning parameter, θ, that controls the weights associated with individual vectors: θ = 0 reduces the S-map to a linear autoregressive model of order E, whereas θ > 0 gives more weight to nearby states when computing the local linear map, thus allowing for nonlinear behavior. Following refs. 9 and 46, we test for nonlinearity by computing the decrease in forecast error (MAE) as θ is tuned to be greater than 0 (see SI Appendix for details). Model Descriptions. We formulate EDM models to forecast recruitment from spawner abundance, combining age 4 and age 5 recruits (from different brood years) to estimate total returns in a given calendar year. Acknowledging the persistent 4-year quasicycle, the time series of recruits and spawners are scaled so that each cycle line has mean 0 and variance 1: S′t = ðSt − μk ðSÞÞ=σ k ðSÞ and R′a,t = ðRa,t − μk ðRa ÞÞ=σ k ðRa Þ, where k = 1, 2, 3, or 4, depending on cycle line and can be computed as k = 1 + ((t − 1) mod 4). σ k and μk are the mean and SD, respectively, for the kth cycle line. The simple EDM model approximates the system state with 1 lag of the transformed spawner abundance: x t = ÆS′t æ:

[3]

Forecasts of the age 4 and age 5 recruits, R′4,t and R′5,t , are made using simplex projection. Here, the two nearest neighbors of S′t are identified, and the corresponding values of R′4,t (or R′5,t ) are combined in a weighted average to produce a forecast. These forecasts are transformed back into raw values, ^ a:t = R ^ a,t ′ · σ k ðRa Þ + μk ðRa Þ, and age 4 and age 5 recruits are combined to R ^y = R ^ 4,y−4 + R ^ 5,y−5 . produce a forecast of total returns, N The multivariate models combine spawner data with up to two environmental indicators: x t = ÆS′t , U′t +2 æ x t = ÆS′t , U′t+2 , U″t+2 æ,

[4]

where U′t+2 (or U″t+2 ) is one of the environmental time series described previously, normalized to have mean = 0 and variance = 1. Just as in the simple EDM model, forecasts of the age 4 and age 5 recruits are made using simplex projection and combined to produce a forecast of total returns. However, because including environmental variables increases the embedding dimension, three nearest neighbors are used for models that include one environmental coordinate, and four nearest neighbors for models that include two environmental coordinates. Following ref. 16, we use the standard Ricker model to estimate total recruitment and then partition it into age 4 and age 5 fish. Age 4 and age 5 fish from separate brood years are combined to forecast the number of returns:

Ye et al.

where p4 is the average fraction of recruits that return as age 4 fish. The extended Ricker model is similar but includes an additional term in the exponent for an environmental covariate: ^ t = St expðα − βSt + γUt+2 Þ: R

[6]

Fitting Procedure and Performance Measures. To avoid testing all combinations of (and overfitting) the environmental variables in the EDM model, we sequentially add the environmental variable that most improves forecast accuracy (ρ, the correlation between observed and predicted values). If none of the variables improves forecasts when added, then no further environmental variables are included. Thus, the best EDM model for some stocks may have only 0 or 1 environmental variable (Table 1). Similarly, for the extended Ricker model, we choose the environmental variable that gives the highest ρ. The Ricker models were fit using R 3.0.2 (www.r-project.org/), the Rjags package (cran.r-project.org/web/packages/rjags/index.html), and JAGS 3.2.0 (Just Another Gibbs Sampler; mcmc-jags.sourceforge.net/) following the procedure outlined in ref. 16. Medians of the posterior distribution are used to obtain point estimates suitable for comparison. The EDM models were constructed using R 3.0.2 and the rEDM package (https://github.com/ha0ye/rEDM). The package can be installed with the following lines of R code: > library(devtools) > install_github(“ha0ye/rEDM”) 1. Dixon PA, Milicich MJ, Sugihara G (1999) Episodic fluctuations in larval supply. Science 283(5407):1528–1530. 2. Perretti CT, Munch SB, Sugihara G (2013) Model-free forecasting outperforms the correct mechanistic model for simulated and experimental data. Proc Natl Acad Sci USA 110(13):5253–5257. 3. Wood SN, Thomas MB (1999) Super-sensitivity to structure in biological models. Proc R Soc Lond B Biol Sci 266(1419):565–570. 4. Walters CJ (1987) Nonstationarity of production relationships in exploited populations. Can J Fish Aquat Sci 44(S2):156–165. 5. Beamish RJ, Schnute JT, Cass AJ, Neville CM, Sweeting RM (2004) The influence of climate on the stock and recruitment of pink and sockeye salmon from the Fraser River, British Columbia, Canada. Trans Am Fish Soc 133(6):1395–1412. 6. Sugihara G, et al. (2012) Detecting causality in complex ecosystems. Science 338(6106): 496–500. 7. Deyle ER, et al. (2013) Predicting climate effects on Pacific sardine. Proc Natl Acad Sci USA 110(16):6430–6435. 8. Hsieh CH, Glaser SM, Lucas AJ, Sugihara G (2005) Distinguishing random environmental fluctuations from ecological catastrophes for the North Pacific Ocean. Nature 435(7040):336–340. 9. Glaser SM, et al. (2014) Complex dynamics may limit prediction in marine fisheries. Fish Fish 15(4):616–633. 10. Murdoch WW, Avery S, Smyth MEB (1975) Switching in predatory fish. Ecology 56(5): 1094–1105. 11. Engle R, Granger C (1987) Co-integration and error correction: Representation, estimation, and testing. Econometrica 55(2):251–276. 12. Myers RA (1998) When do environment-recruitment correlations work? Rev Fish Biol Fish 8(3):285–305. 13. Peterman RM, Dorner B (2011) Fraser River Sockeye Production Dynamics (The Cohen Commission of Inquiry into the Decline of Sockeye Salmon in the Fraser River, Vancouver), Technical Report 10. 14. Cohen BI (2012) The Uncertain Future of Fraser River Sockeye (Commission of Inquiry into the Decline of Sockeye Salmon in the Fraser River, Ottawa), Final Report, Vol 2. 15. Thomson RE, et al. (2012) Anomalous ocean conditions may explain the recent extreme variability in Fraser River sockeye salmon production. Mar Coast Fish 4(1):415–437. 16. Grant SCH, Michielsens CGJ, Porszt EJ, Cass A (2010) Pre-season Run Size Forecasts for Fraser River Sockeye Salmon (Oncorhynchus nerka) in 2010 (Canadian Science Advisory Secretariat, Ottawa), DFO Canadian Science Advisory Secretariat Research Document 2010/042. 17. MacDonald BL, Grant SCH (2012) Pre-season Run Size Forecasts for Fraser River Sockeye Salmon (Oncorhynchus nerka) in 2012 (Canadian Science Advisory Secretariat, Ottawa), DFO Canadian Science Advisory Secretariat Research Document 2012/011. 18. Grant S, MacDonald B (2013) Pre-season Run Size Forecasts for Fraser River Sockeye (Oncorhynchus nerka) and Pink (O. gorbuscha) Salmon in 2013 (Canadian Science Advisory Secretariat, Ottawa), DFO Canadian Science Advisory Secretariat Research Document 145. 19. Sugihara G, May RM (1990) Nonlinear forecasting as a way of distinguishing chaos from measurement error in time series. Nature 344(6268):734–741. 20. Liu H, et al. (2012) Nonlinear dynamic features and co-predictability of the Georges Bank fish community. Mar Ecol Prog Ser 464:195–207.

Ye et al.

ACKNOWLEDGMENTS. We thank Michael Fogarty, Terry Beacham, Brad Werner, Ethan Deyle, Charles Perretti, and two anonymous reviewers for their feedback on this work. This research is supported by a National Science Foundation Grant DEB-1020372 (to G.S. and H.Y.), Foundation for the Advancement of Outstanding Scholarship and Ministry of Science and Technology of Taiwan (C.-h.H.), National Science Foundation–National Oceanic and Atmospheric Administration Comparative Analysis of Marine Ecosystem Organization (CAMEO) Program Grant NA08OAR4320894/ CAMEO (to G.S.), a National Science Foundation Graduate Research Fellowship (to H.Y.), the Sugihara Family Trust (G.S.), the Deutsche Bank– Jameson Complexity Studies Fund (G.S.), and the McQuown Chair in Natural Science (G.S.).

21. Glaser SM, Ye H, Sugihara G (2014) A nonlinear, low data requirement model for producing spatially explicit fishery forecasts. Fish Oceanogr 23(1):45–53. 22. Sugihara G (1994) Nonlinear forecasting for the classification of natural time series. Philos Trans Phys Sci Eng 348(1688):477–495. 23. Takens F (1981) Detecting strange attractors in turbulence. Dynamical Systems and Turbulence. Lecture Notes in Mathematics (Springer, Berlin Heidelberg), Vol 898, pp 366–381. 24. Deyle ER, Sugihara G (2011) Generalized theorems for nonlinear state space reconstruction. PLoS One 6(3):e18295. 25. Ricker WE (1950) Cycle dominance among the Fraser sockeye. Ecology 31(1): 6–26. 26. Cass AJ, Wood CC (1994) Evaluation of the depensatory fishing hypothesis as an explanation for population cycles in Fraser River sockeye salmon (Oncorhynchus nerka). Can J Fish Aquat Sci 51(8):1839–1854. 27. Grant SCH, et al. (2011) Evaluation of Uncertainty in Fraser Sockeye (Oncorhynchus nerka) Wild Salmon Policy Status Using Abundance and Trends in Abundance Metrics (Canadian Science Advisory Secretariat, Ottawa), DFO Canadian Science Advisory Secretariat Research Document 2011/087. 28. Beamish RJ, Mahnken C (2001) A critical size and period hypothesis to explain natural regulation of salmon abundance and the linkage to climate and climate change. Prog Oceanogr 49(1):423–437. 29. Beamish RJ, Mahnken C, Neville CM (2004) Evidence that reduced early marine growth is associated with lower marine survival of coho salmon. Trans Am Fish Soc 133(1):26–33. 30. Beamish RJ, Neville CEM, Thomson BL, Harrison PJ, John MS (1994) A relationship between Fraser River discharge and interannual production of Pacific salmon (Oncorhynchus spp.) and Pacific herring (Clupea pallasi ) in the Strait of Georgia. Can J Fish Aquat Sci 51(12):2843–2855. 31. Preikshot D, Beamish RJ, Sweeting RM, Neville CM, Beacham TD (2012) The residence time of juvenile Fraser River sockeye salmon in the Strait of Georgia. Mar Coast Fish 4(1):438–449. 32. Mantua N, Hare S, Zhang Y, Wallace J (1997) A Pacific interdecadal climate oscillation with impacts on salmon production. Bull Am Meteorol Soc 78(6):1069–1079. 33. Beamish RJ, Neville CEM, Cass AJ (1997) Production of Fraser River sockeye salmon (Oncorhynchus nerka) in relation to decadal-scale changes in the climate and the ocean. Can J Fish Aquat Sci 54(3):543–554. 34. Beacham TD, et al. (2014) Stock-specific size of juvenile sockeye salmon in British Columbia Waters and the Gulf of Alaska. Trans Am Fish Soc 143(4):876–889. 35. Tucker S, et al. (2009) Seasonal stock-specific migrations of juvenile sockeye salmon along the West Coast of North America: Implications for growth. Trans Am Fish Soc 138(6):1458–1480. 36. Hsieh CH, Anderson C, Sugihara G (2008) Extending nonlinear analysis to short ecological time series. Am Nat 171(1):71–80. 37. Peterman RM, Pyper BJ, Grout Ja (2000) Comparison of parameter estimation methods for detecting climate-induced changes in productivity of Pacific salmon (Oncorhynchus spp.). Can J Fish Aquat Sci 57(1):181–191. 38. Haeseker SL, Dorner B, Peterman RM, Su Z (2007) An improved sibling model for forecasting chum salmon and sockeye salmon abundance. N Am J Fish Manage 27(2): 634–642. 39. Haeseker SL, Peterman RM, Su Z, Wood CC (2008) Retrospective evaluation of preseason forecasting models for sockeye and chum salmon. N Am J Fish Manage 28(1): 12–29.

PNAS | Published online March 2, 2015 | E1575

PNAS PLUS

R scripts for the models can be found in Dataset S4. Data files can be found in Datasets S1–S3. All forecasts are made using a fourfold cross-validation procedure. To quantify model performance, we use Pearson’s correlation coefficient (ρ) between observed and predicted returns as a measure of accuracy and MAE as a measure of error. Comparisons of ρ between models uses a one-sided t test with SE calculated using the HC4 estimator from (47) and with adjusted degrees of freedom as suggested by ref. 48. Improvement in MAE is computed using a one-sided paired t test for the difference, treating each forecast as an independent sample. To compute an aggregate statistic combining all nine stocks, we first scale the observations and predictions for each stock so that the observed returns have mean = 0, variance = 1, and then combine the normalized values across stocks. The comparisons of ρ and MAE are done using this combined set of observations and predictions (Fig. 4 and SI Appendix, Fig. S3).

SEE COMMENTARY

[5]

ECOLOGY

^ t = St expðα − βSt Þ, R ^ y−4 · p4 + R ^ y−5 · ð1 − p4 Þ, ^y = R N

40. Grant SCH, MacDonald BL (2012) Pre-season Run Size Forecasts for Fraser River Sockeye (Oncorhynchus nerka) and Pink (O. gorbuscha) Salmon in 2011 (Canadian Science Advisory Secretariat, Ottawa), DFO Canadian Science Advisory Secretariat Research Document 2011/134. 41. Kalman RE (1960) A new approach to linear filtering and prediction problems. J Basic Eng 82(1):35–45. 42. Crutchfield JP, McNamara BS (1987) Equations of motion from a data series. Complex Systems 1:417–452. 43. Christensen V, Walters CJ (2004) Ecopath with Ecosim: Methods, capabilities and limitations. Ecol Modell 172(2):109–139.

E1576 | www.pnas.org/cgi/doi/10.1073/pnas.1417063112

44. Packard NH, Crutchfield JP, Farmer JD, Shaw RS (1980) Geometry from a time series. Phys Rev Lett 45(9):712–716. 45. Sauer T, Yorke JA, Casdagli M (1991) Embedology. J Stat Phys 65(3-4):579–616. 46. Hsieh CH, Ohman MD (2006) Biological responses to environmental forcing: The linear tracking window hypothesis. Ecology 87(8):1932–1938. 47. Cribari-Neto F (2004) Asymptotic inference under heteroskedasticity of unknown form. Comput Stat Data Anal 45(2):215–233. 48. Wilcox RR (2009) Comparing Pearson correlations: Dealing with heteroscedasticity and nonnormality. Comm Stat Simul Comput 38(10):2220–2234.

Ye et al.

Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling.

It is well known that current equilibrium-based models fall short as predictive descriptions of natural ecosystems, and particularly of fisheries syst...
900KB Sizes 0 Downloads 5 Views