Psychological Methods 2013, Vol. 18, No. 4, 514–534

© 2013 American Psychological Association 1082-989X/13/$12.00 DOI: 10.1037/a0025812

Hypothesis Testing on the Fractal Structure of Behavioral Sequences: The Bayesian Assessment of Scaling Methodology

Fermín Moscoso del Prado Martín


Centre National de la Recherche Scientifique at Aix-Marseille Université I, Centre National de la Recherche Scientifique at Université de Lyon II, and Institut Rhône-Alpin des Systèmes Complexes

I introduce the Bayesian assessment of scaling (BAS), a simple but powerful Bayesian hypothesis contrast methodology that can be used to test hypotheses on the scaling regime exhibited by a sequence of behavioral data. Rather than comparing parametric models, as is typically done in previous approaches, the BAS offers a direct, nonparametric way to test whether a time series exhibits fractal scaling. The BAS provides a simpler and faster test than previous methods, and the code for making the required computations is provided. The method also enables testing of finely specified hypotheses on the scaling indices, something that was not possible with the previously available methods. I then present 4 simulation studies showing that the BAS methodology outperforms the other methods used in the psychological literature. I conclude with a discussion of methodological issues concerning fractal analyses in experimental psychology.

Keywords: Bayesian method, fractal scaling, Hurst exponent, hypothesis contrast, stochastic time series

Supplemental materials: http://dx.doi.org/10.1037/a0025812.supp

In the last 2 decades, the study of behavior has been enriched by the progressive incorporation of techniques from the nascent field of complexity theory. In early results, researchers noticed that sequences of behavioral events often exhibited the infinite autocorrelation structure that is the hallmark of fractals (Collins & De Luca, 1993; Gilden, 1997; Gilden, Thornton, & Mallon, 1995; Gottschalk, Bauer, & Whybrow, 1995; Hausdorff, Peng, Ladin, Wei, & Goldberger, 1995; Schmidt, Beek, Treffner, & Turvey, 1991). These findings are of notable importance in that they draw attention to the somewhat overlooked fact that behavior is a dynamic process that unfolds in time. Fractal processes (also variously referred to as 1/f^α noises or long-memory processes) are a type of stochastic process that exhibits self-similarity: These processes result in sequences whose appearance and statistical properties are preserved irrespective of the time scale at which the sequences are considered. Such self-similarity implies the presence of long-range correlations; that is, the values of any two observations (i.e., responses) tend to be correlated, even

when they are separated by a large number of intervening trials. The correlations do decrease with increasing numbers of intervening trials, but they do so very slowly, following an inverse power law of the magnitude of the separation. This decrease is substantially slower than the exponential decrease in the correlations between successive responses that is generally assumed in psychological experiments. Traditionally, in our field, the dynamic aspects of responses, their trends, and their fine temporal interrelations were considered mostly spurious, and therefore great effort was devoted to averaging them away. However, the evidence for fractal structure in behavioral sequences suggests that those dynamic properties are not at all spurious but rather are fundamental properties of the system under study and, thus, most informative about its nature and organization (Gilden, 2001; Slifkin & Newell, 1998; Riley & Van Orden, 2005; but see also Farrell, Wagenmakers, & Ratcliff, 2006b; Wagenmakers, Farrell, & Ratcliff, 2004, 2005, for a more skeptical opinion). Since then, fractal scaling (FS) has been observed in a wide range of psychological tasks (Collins & De Luca, 1993; Gilden, 1997, 2001; Gilden et al., 1995; Kello, Anderson, Holden, & Van Orden, 2007, 2008; Moscoso del Prado Martín, 2011; Shelhamer & Joiner, 2003; Schmidt et al., 1991; Thornton & Gilden, 2005; Van Orden, Holden, & Turvey, 2003, 2005).

As was first pointed out by Wagenmakers et al. (2004), many studies arrived at the conclusion that sequences exhibit fractal properties by estimating the value of a parameter: usually the spectral slope, the Hurst exponent, a fractal dimension, and so on. However, it is well established in the tradition of experimental psychology that parameter estimation is not to be taken as hypothesis testing or contrast. For instance, few researchers would accept that the mean reaction times elicited by two cognitive manipulations differ unless some statistical test of the reliability of this difference is provided. However, with

Fermín Moscoso del Prado Martín, Laboratoire de Physique Corpusculaire, Unité Mixte de Recherche 6146, Centre National de la Recherche Scientifique at Aix-Marseille Université I, Marseille, France; Laboratoire Dynamique du Langage, Unité Mixte de Recherche 5596, Centre National de la Recherche Scientifique at Institut de Sciences de l'Homme, Université de Lyon II, Lyon, France; and Institut Rhône-Alpin des Systèmes Complexes, Lyon, France.

I am indebted to L. B. Feldman, J. Grainger, and B. Tuller for helpful suggestions on a previous version of this article. R code to perform the BAS analyses is provided in the supplemental materials.

Correspondence concerning this article should be addressed to Fermín Moscoso del Prado Martín, who is now at the Department of Linguistics, Campus Mail Code 3100, University of California–Santa Barbara, Santa Barbara, CA 93106. E-mail: [email protected]



respect to fractals, many of the arguments are made without much statistical backing other than the estimation itself. More important, as also noticed by Wagenmakers et al. (2005), in most cases the results are not systematically compared with any psychologically plausible alternative hypothesis. As a solution, Wagenmakers et al. (2005) proposed a model comparison methodology for assessing whether one could accept the hypothesis that FS was present in a time series or whether simple, short autocorrelations could account for the observations just as well. In the same spirit, while refuting some of the postulates of Wagenmakers and his coworkers, Thornton and Gilden (2005) proposed an alternative Bayesian Monte Carlo method for testing the FS hypothesis. Both of these methods constitute advances in terms of improving hypothesis testing for claims of fractal structure in behavior.

As theoretical and experimental work on the fractal properties of behavior becomes more common, the issues will soon go beyond arguing whether FS is present or not, to include specific predictions about the particular values of the scaling parameters across different types of experimental manipulations. At that point, the request by Wagenmakers et al. (2005) for explicit hypothesis testing mechanisms will become even more urgent. At present, the few techniques enabling hypothesis contrast, those of Wagenmakers et al. (2005) and Thornton and Gilden (2005), are not sufficiently developed to contrast hypotheses other than the presence or absence of FS.

In this study, I develop a specific methodology, the Bayesian assessment of scaling (BAS), for deciding whether a time series exhibits FS in a simple, objective, and accurate way. The methodology enables not only assessments of whether FS is present or absent in a series but also contrasts between more finely specified hypotheses on the specific scaling regime exhibited by a behavioral sequence. The BAS relies on developing a complete Bayesian model comparison framework around the variation in values of the Hurst exponent (Hurst, 1951; Mandelbrot & Van Ness, 1968; Mandelbrot & Wallis, 1969). For this, I exploit a fundamental property of fractal time series that I refer to as the scaling property (cf. Scafetta & Grigolini, 2002, and references therein). The BAS is simple and efficient to compute (code to perform all the necessary computations is provided in the supplemental materials) and is at least as accurate as the existing options. The whole derivation of the BAS is fully analytical; therefore, in contrast with previous methods, all required computations are expressed in simple closed form (i.e., a set of formulae), the full derivation of which is provided in the Appendix. The methodology can be used generally, without particular assumptions about the nature of the process or the need for parameter tuning or subjective decisions at any stage.

In what follows, I present a brief overview of some basic concepts of stochastic fractals and the methods that exist for identifying them. This is followed by a description, from a user's point of view, of the computations that are necessary for applying the BAS method (the mathematical derivation of the expressions is provided in the Appendix). Importantly, I also discuss some methodological aspects here, because some recommendations in the psychological literature are incorrect and can thus lead to incorrect psychological inferences.
Next, I discuss three simulation studies investigating the accuracy and reliability of the new method using artificially generated sequences.


These are useful for evaluating the method because their properties are known a priori; the simulations also include a direct comparison with the two existing hypothesis contrast methods used in psychology. Finally, I discuss further methodological issues in the assessment of FS in behavioral data.

Overview of the Basic Concepts of FS

This section provides an overview of the most basic concepts and relations that are crucial for the fractal analysis of stochastic time series. This summary is necessarily superficial, limited in scope to the concepts and techniques that will be used in the rest of the article. More detailed, but still accessible, introductions to this topic can be found in the articles by Gisiger (2001) and Eke, Herman, Kocsis, and Kozak (2002). For detailed comparisons of the different estimators from a psychological perspective, see Delignières et al. (2006) and Stroe-Kunold, Stadnytska, Werner, and Braun (2009).

Self-Affinity and the Hurst Exponent

The most salient property of deterministic fractals is their self-similarity: Their shape does not depend on the scale at which the fractal is represented. In stochastic time series, this strict self-similarity is relaxed to account for the intrinsically random nature of the data and is replaced by the weaker property of self-affinity. This property was formalized by Mandelbrot and Van Ness (1968), such that for a time series starting at zero [x(t_0 = 0) = 0],

x(ht) \overset{d}{=} h^{H} x(t),    (1)

for any h > 0, where \overset{d}{=} denotes equality in distribution (i.e., the probability distributions of the random variables at the two sides of the equation are identical). Therefore, self-affinity refers to self-similarity of probability distributions, which is the most that one can hope for in a variable that is random. The value 0 < H < 1 is referred to as the Hurst exponent (Hurst, 1951; Mandelbrot & Van Ness, 1968; Mandelbrot & Wallis, 1969), and it provides an index of the fractal properties of the series. Under the assumption that the series is stationary (i.e., its statistical properties, such as mean and variance, remain constant in time) or has stationary increments, the value of H captures the correlational structure in the series. Series with H = .5 exhibit no memory; each element depends at most on a small number of previous elements. On the other hand, series with .5 < H < 1 are fractal series with infinite, persistent autocorrelations, meaning that each element is correlated with every previous and future element in the series. Notice that for a stationary and ergodic Gaussian process, there is no question about the existence of a Hurst exponent; the question is whether its value is different from .5 and the series, thus, fractal.

Fractal processes are of increasing importance in many fields of science and have recently received wide attention in behavioral research. They are thought to signal important physical properties of a system in a critical state. These processes are variously said to exhibit FS, long-range memory (LRM), or 1/f^α spectra, each term stressing one of the characteristic properties that these processes share (i.e., scaling in distribution, infinite autocorrelations, and



inverse power law Fourier spectra, respectively). For the purposes of this article, I regard the three terms above as interchangeable. H is therefore a diagnostic parameter to determine whether, and in what way, a series exhibits fractal structure. For that reason, much effort has been devoted to developing techniques for estimating its value from empirical data. In stochastic processes for which the variance is defined, Mandelbrot and Van Ness (1968) noted that the self-affinity property should produce a power-law scaling in the variance of the cumulative process, leading to a curious property that had been observed empirically by Hurst (1951) when studying the yearly fluctuations of the river Nile,

\langle y(t)^2 \rangle \propto t^{2H},    (2)

where the angular brackets ⟨·⟩ denote the expected value of a function, and ∝ denotes direct proportionality. By cumulative process, I refer to the process that results from computing the cumulative sum of a stationary process (note that a cumulative sum is usually not stationary; the more elements one adds, the higher the variability). In sum, the most salient property of stochastic fractals is that power-law scaling relations, such as those found in the variance, arise in most other statistics of the sequence, such as the autocorrelation function or the spectral density.

The Scaling Property: The Crucial Relation Exploited by the BAS

As noticed by Peng et al. (1994), the elements in a stationary time series can be thought of as the fluctuations of a particle following a diffusion process. From this perspective, for any process X(τ) generating values with a stationary distribution f_X, one can consider the diffusion process Y(τ) associated with X(τ), which is defined by dY(τ)/dτ = X(τ). As what is normally available is not the continuous evolution of X but rather a discretely sampled time series x = {x(1), . . ., x(N)}, one can instead define the corresponding random walk y on discrete time t as

y_t = \sum_{i=1}^{t} x(i), \quad t = 1, \ldots, N.    (3)

If the generating process X is stationary, what I refer to as the scaling property (Scafetta & Grigolini, 2002) indicates that for large times, the probability density of the walk is

p(y_t) = \frac{1}{t^{\delta}} f_L\!\left(\frac{y_t}{t^{\delta}}\right),    (4)

where δ is the scaling exponent associated with the time series and f_L is a limiting, often unknown, probability density function.¹ The scaling exponent is very closely related to the Hurst exponent H, with the particular relation being determined by the shape of the probability density functions f_X and f_L (Scafetta & Grigolini, 2002). When X is a stationary Gaussian process with long correlations (f_X is the normal distribution), it is referred to as a fractional Gaussian noise (fGn; Mandelbrot & Van Ness, 1968; Mandelbrot & Wallis, 1969). In contrast, processes that are the cumulative sum of an fGn, such as Y, are referred to as fractional Brownian motions (fBm). For these two types of series, the scaling exponent is identical to the Hurst exponent (δ = H). In more complex series, the relation between δ and H may be different. In this study, I show that the property of Equation 4 can be directly used as a Bayesian hypothesis contrast method, hence providing a strict, objective criterion for deciding whether an observed sequence exhibits FS.

¹ This property derives directly from generalized versions of the central limit theorem (Gnedenko & Kolmogorov, 1954). The usual version of the central limit theorem corresponds to the case of a Gaussian distribution and a scaling exponent of δ = .5.

Common Estimators of Scaling Indices

There are a large number of different methods for estimating the scaling regime of a time series (see Eke et al., 2002, for a comprehensive review). Among these, the most commonly used in the psychological literature are

• rescaled range analysis (R/S; Hurst, 1951; Mandelbrot & Wallis, 1969);
• the spectral power density family of estimators (SPD; cf. Eke et al., 2002);
• scaled windowed variance (SWV; Peng et al., 1994) and dispersion analysis (DA; Bassingthwaighte & Raymond, 1995);
• detrended fluctuation analysis (DFA; Peng et al., 1994) and signal summation conversion (SSC; Eke et al., 2002), which are modifications of SWV and DA particularly suited for dealing with nonstationary sequences; and
• methods based on fitting autoregressive models, such as the local Whittle estimator (cf. Taqqu, Teverovsky, & Willinger, 1995).

In general, except for those that rely on fitting autoregressive models, all these methods compute a statistic of the sequence across multiple scales (either in the time or in the frequency domain) and then perform a regression in the double logarithmic plane between the value of the statistic and the scale at which it was computed. The slopes of such regressions are directly related to the Hurst exponent (see Eke et al., 2002, for a conversion table between the multiple scaling indices).

The Direct Assessment of Scaling

A recently developed group of methods (see Ignaccolo, Allegrini, Grigolini, Hamilton, & West, 2004, and references therein) makes direct use of the scaling property in Equation 4. These methods are less broadly used outside the physics domain, but they are important here because the method that I propose relies heavily on them. In Ignaccolo and colleagues' (2004) approach, multiple methods are used to provide candidate values for the scaling parameter δ. These candidate scaling values do not necessarily reflect a true scaling. To determine whether a scaling regime is real, they proposed a detailed methodology combining multiple techniques. After the estimation step, a final verification is performed using the direct assessment of scaling method (DAS; already used by Peng et al., 1994, but not referred to by any particular name). Unlike the previous methods, which test for the



scaling in different descriptive statistics of the distribution, the DAS method directly investigates the equivalence of the distributions at different time lags; that is, it is a direct test of Equation 4. The method consists of selecting a reference time t_0 and overplotting the estimated distribution of the cumulative process at different times t_0 < t_1 < . . . < t_k. Before plotting, the distribution at each time t_i is rescaled on both axes by a factor R_i = (t_0/t_i)^δ: The scale of the horizontal axis is multiplied by R_i, and that of the vertical axis is divided by R_i. A scaling is determined to be real if the resulting plot consists of a set of lines that show a good overlap; however, this decision has a large subjective component. Ignaccolo and colleagues (2004) argued that the DAS is the most powerful method for testing the scaling property but added that its general application requires the previous identification of scaling candidates. The method I propose is simply a Bayesian formalization of the DAS, removing the subjective decisions on whether the distributions overlap.
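To give a feel for the DAS before formalizing it, the short R sketch below (my own toy illustration, not code from any of the cited studies) applies the rescaling to a white noise, for which δ = .5 is the true exponent:

set.seed(1)
x <- rnorm(10000)                         # a white noise, so delta = .5 holds
delta <- .5; t0 <- 100; times <- c(100, 400, 1600)
cs <- c(0, cumsum(x))
dens <- lapply(times, function(t) {
  # t-th order running sums serve as an ensemble of diffusion samples at time t
  yt <- cs[(t + 1):(length(x) + 1)] - cs[1:(length(x) - t + 1)]
  # multiplying the variable by R_i = (t0/t_i)^delta rescales the horizontal
  # axis by R_i and automatically divides the density height by R_i
  density(yt * (t0 / t)^delta)
})
# overplotting dens[[1]], dens[[2]], and dens[[3]] shows near-perfect overlap;
# with a wrong delta, the three curves visibly separate

Plotting the three estimated densities on common axes reproduces the overlap check, including its main weakness: Deciding how much overlap counts as "good" remains a subjective call.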

Psychological Methods With Explicit Hypothesis Testing

As part of a discussion on the presence or absence of FS in human reaction times, Wagenmakers et al. (2004) pointed out that many arguments were being made on the sole basis of the apparent straightness of the Fourier spectrum, in double logarithmic scale, of sequences of reaction times. They argued that comparing the FS hypothesis only to uncorrelated white noise represents rather weak evidence in support of FS; instead, it is necessary to compare the FS hypothesis with more psychologically plausible ones, particularly ones allowing the presence of transient correlations with finite correlation length (i.e., the elements of the sequence are correlated with each other up to a finite lag, as opposed to completely uncorrelated white noise). As a candidate alternative model of this type, they proposed using autoregressive moving average (ARMA) models. ARMA models exhibit finite correlation lengths (that is, they are not fractal) but can nevertheless often appear to mimic 1/f^α Fourier spectra, especially for sequences of limited length, as is often the case in experimental psychology (but see also Thornton & Gilden, 2005, for arguments that the ARMA model may not be very plausible even as a psychological model). To represent the FS case, in contrast to the transient correlations, Wagenmakers et al. (2004) proposed using a fractional autoregressive integrated moving average (fARIMA)² model. A fARIMA model is similar to an ARMA model, with an additional parameter d that controls the degree of long-range correlations present in the data (see Haslett & Raftery, 1989, for details). For Gaussian processes, corresponding to fARIMA(0, d, 0), the d parameter is directly related to the Hurst exponent (H = d + .5). Wagenmakers and his colleagues proposed that, in order to decide whether FS is really present in a time series, one should fit ARMA and fARIMA models for a fixed range of their parameters and conclude that FS is present if the best fitting model is of the fARIMA type; FS is taken to be absent if the best fit is provided by an ARMA model. In their suggestion, the goodness of the fits is assessed by evaluating either Akaike's information criterion (AIC; Akaike, 1974) or the Bayesian information criterion (BIC; Schwarz, 1978). Between any two models, the one with the lowest criterion (either AIC or BIC) is considered the better one.
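For concreteness, the contrast can be sketched in R as follows (my own minimal illustration, assuming the fracdiff package; the function name and the exact parameter-counting convention are mine, and the published studies explored additional implementation details):

library(fracdiff)

classify.arma.farima <- function(x, max.p = 2, max.q = 2) {
  # Label a series fractal if the best BIC over a grid of ARMA(p, q) and
  # fARIMA(p, d, q) fits belongs to a fARIMA model; nonfractal otherwise
  n <- length(x)
  best.bic <- Inf; fractal <- NA
  for (p in 0:max.p) for (q in 0:max.q) {
    a <- try(arima(x, order = c(p, 0, q)), silent = TRUE)
    if (!inherits(a, "try-error")) {
      bic <- -2 * a$loglik + (p + q + 2) * log(n)          # + mean, variance
      if (bic < best.bic) { best.bic <- bic; fractal <- FALSE }
    }
    f <- try(fracdiff(x, nar = p, nma = q), silent = TRUE)
    if (!inherits(f, "try-error")) {
      bic <- -2 * f$log.likelihood + (p + q + 3) * log(n)  # + the d parameter
      if (bic < best.bic) { best.bic <- bic; fractal <- TRUE }
    }
  }
  fractal
}

Replacing the BIC penalty (k log n) with the AIC penalty (2k) yields the AIC variant of the contrast.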


Thornton and Gilden (2005) remarked that Wagenmakers et al.'s test of the FS hypothesis was too stringent. They argued that a candidate model providing a less biased test of FS should take into account that real psychological data will be contaminated by white thermal noise of many origins; thus, it will rarely be a pure case of FS. Therefore, they proposed using the whitened fractional Brownian motion (wfBm) model. The wfBm model assumes that the series of interest is a weighted sum of an uncorrelated white noise thermal contamination and an fGn, with the relative amplitude of the fGn controlled as a parameter. This type of process gives rise to Fourier spectra that, in double logarithmic scale, show a straight-line part at the low frequencies but are attenuated toward a flat line at the higher frequencies. The point where the attenuation begins is governed by the relative amplitude of the two components. In this way, the comparison is now between two 2-parameter models. In order to compare the two types of models, they introduced the spectral classifier method. This method takes a machine learning approach to the problem: From an artificially generated set of examples, two large probability distributions (approximated as multidimensional Gaussians) are fitted to approximate the distributions of possible shapes of fractal and nonfractal Fourier spectra. Classification of a new time series is then performed by comparing the log-likelihoods of the Fourier spectrum of the new series according to the two general distributions.

While constituting significant advances in the direction of hypothesis testing, the methods of Wagenmakers et al. (2005) and Thornton and Gilden (2005) retain some difficulties. Wagenmakers et al.'s (2005) ARMA/fARIMA contrast depends on the success of many autoregression model fits, and the algorithms to fit these are known to be quite unstable, often leading to optimization failures and making the technique difficult for general use. In addition, the technique requires defining a priori the possible structures of both fractal and nonfractal series, not leaving much room for other models that either are unknown or simply have more parameters. With respect to the parameters, the contrast represents fractal processes as having one more degree of freedom than nonfractal ones. Of course, as the AIC and BIC weight each additional degree of freedom heavily, this may result in a bias against fractal processes. In fact, it is not clear that it is statistically justified to assume that fractal processes are more complex than nonfractal ones in terms of degrees of freedom. Also, as noted by Thornton and Gilden (2005), this method assumes that the only type of FS structure that can be present in a psychological series is of the kind that fARIMA series can generate, but this is not necessarily the case. Note here that Farrell et al. (2006b) lodged the same criticism against Thornton and Gilden's (2005) spectral classifier: Why would all psychologically plausible fractals need to be wfBm? For this reason, both methods seem to suffer from rigidity with respect to their own assumptions: Their performance will depend on the sequences under study approximating the specific models on which the methods were explicitly trained or tested. Finally, both techniques are rather complex to implement and have a substantial number of free parameters (which distributions to include, which models to fit, how to fit them, how many points to include in the spectra, etc.).

² Also referred to as ARFIMA in the literature.
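For concreteness, one plausible reading of the wfBm mixture can be sketched as follows (my own illustration; it assumes the fArma package's fgnSim() generator, used again in the simulations below, and Thornton and Gilden's actual parameterization may differ in detail):

wfbm.series <- function(N, H, beta) {
  # weighted sum of an fGn and uncorrelated thermal noise; beta is the
  # relative amplitude of the white-noise component
  f <- as.numeric(fArma::fgnSim(n = N, H = H))
  s <- (1 - beta) * f / sd(f) + beta * rnorm(N)
  (s - mean(s)) / sd(s)      # standardize the mixture
}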


The BAS Method

Here I introduce the basic idea of the BAS, and I describe the computations that are necessary for applying it to a behavioral sequence. For the sake of clarity, the core of the derivations that lead to these expressions is omitted here but appears in full detail in the Appendix. R code implementing the methods introduced here and instructions for using it are provided in the supplemental materials.


Determination of the Evidence

The main problem is to determine whether a time series of N discrete samples, x = {x(1), . . ., x(N)}, exhibits FS. In a more general formulation, the question that an experimenter will face is to objectively determine which hypothesis, H₁ or H₂, better describes the scaling exponent of one or more given sequences of behavioral measurements. The hypotheses themselves can be defined either as single values of the scaling exponent (simple hypotheses) or as intervals of possible values (composite hypotheses). This amounts to determining the log ratio between the probabilities of the two hypotheses given the data (denoted by λ_{1,2}). If this ratio is greater than zero, the data support hypothesis H₁ more strongly than hypothesis H₂, and the reverse holds if it is smaller than zero. By Bayes' theorem, λ_{1,2} can be decomposed into the sum of the a priori information α_{1,2} (obtained from previous experience) and the evidence provided by the actual sequence under study, ℓ_{1,2}(x):

\lambda_{1,2}(x) = \ln \frac{p(H_1)}{p(H_2)} + \ln \frac{p(x \mid H_1)}{p(x \mid H_2)} = \alpha_{1,2} + \ell_{1,2}(x).    (5)

In this equation, the key term is the log-likelihood ratio ℓ_{1,2}(x), which represents the amount of support for either hypothesis that the data themselves provide. The term α_{1,2} corresponds to our prior preference for either hypothesis, which in the absence of prior knowledge we can set to zero (as I explain below, we do not even need to worry about the complexity or degrees of freedom of the hypotheses; the method takes this into account automatically). Therefore, we concentrate only on estimating ℓ_{1,2}(x).

The first step is to compute the running sums y_t of all possible orders t = 2, . . ., N.³ The tth order running sum of x is

y_t(i) = \sum_{j=i}^{i+t-1} x(j).    (6)

For this, we slide a window of width t through the original series, and we sum the values within each application of the window. Repeating this process for t = 2, . . . , N, we obtain N − 1 new series y_t, each of length N − t + 1. This process, for a series of 10 elements, is illustrated in Table 1. The columns y_t represent the running sums at different orders corresponding to the original sequence x. These running sums can be taken as samples at time t of an ensemble of N − 1 trajectories (the rows in Table 1) from a diffusion process (Scafetta & Grigolini, 2002; Ignaccolo et al., 2004), and the scaling condition in Equation 4 should apply to them. By exploiting this property, the expressions below for each particular case provide a direct analytical estimate of the log-likelihood ratio provided by each summing order, which I denote by ℓ_{i,j}(x|t) = ℓ_{i,j}(y_t). Then, after computing the likelihood ratio from each summing order, all orders can be recombined to obtain the final log-likelihood ratio provided by the series using the expression (the derivations for this expression and the ones to follow are provided in the Appendix)

\ell_{i,j}(x) = \sum_{t=2}^{N} \ell_{i,j}(y_t).    (7)

From here, one just needs to compute the ℓ_{i,j}(y_t) for all summing orders. The expression for ℓ_{1,2}(x|t) depends on the type of hypotheses being compared. There are three possibilities:

Two simple hypotheses. The simplest case is when one is interested in comparing two a priori candidate values for the scaling exponent. That is, we want to compare two simple hypotheses, H₁ ≡ δ = δ₁ and H₂ ≡ δ = δ₂. As is detailed in the Appendix, for a standard Gaussian process (i.e., the data follow a standard normal distribution of zero mean and unit variance), the log-likelihood ratio in this case is very well approximated by

\ell_{1,2}(x \mid t) \approx -\frac{N\left(t^{-2\delta_1} - t^{-2\delta_2}\right)}{2t(N-t+1)} \sum_{i=1}^{N-t+1} y_t(i)^2 + \frac{N}{t}(\delta_2 - \delta_1)\ln t,    (8)

which is just computed for each summing order and plugged into Equation 7 to solve the problem.

One simple and one composite hypothesis. In the more general case when a candidate value for the scaling exponent δ is not available but rather a range of values can be specified, our task consists of comparing a simple hypothesis H₀ ≡ δ = δ₀ with a composite hypothesis H₁ ≡ δ₁ ≤ δ ≤ δ₂. Note that this includes the typical case of a hypothesis that the data exhibit long persistent correlations (H₁ ≡ .5 < δ ≤ 1) versus a null hypothesis that they are just not fractal (H₀ ≡ δ = .5). In this case, for a Gaussian process, the expression for the evidence in favor of H₁ provided by summing order t > 1 is well approximated by

\ell_{1,0}(y_t) \approx \frac{N}{t(N-t+1)} \sum_{i=1}^{N-t+1} \ln \frac{\Phi[y_t(i)\,t^{-\delta_1}] - \Phi[y_t(i)\,t^{-\delta_2}]}{y_t(i)\,\varphi[y_t(i)\,t^{-\delta_0}]\,\ln t} + \frac{N}{t}\ln\frac{t^{\delta_0}}{\delta_2 - \delta_1},    (9)

where φ is the standard normal density function and Φ(x) = ∫_{−∞}^{x} φ(z) dz is its cumulative probability function. Once more, this is computed for all summing orders and used in Equation 7.

³ In practice, it is usually sufficient to compute a sample of these across the full range of orders and then interpolate the results at the remaining points.
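To make these recipes concrete, the following R sketch (my own illustration, not the supplemental code; all function names are hypothetical) implements the running sums of Equation 6 and the per-order evidence of Equations 8 and 9, combining orders as in Equation 7. It assumes that the sequence has already been coerced to a standard normal distribution (see the Practical Issues section below):

running.sum <- function(x, t) {
  # t-th order running sums (Equation 6): y_t(i) = x(i) + ... + x(i + t - 1)
  cs <- c(0, cumsum(x))
  cs[(t + 1):(length(x) + 1)] - cs[1:(length(x) - t + 1)]
}

ell.simple.t <- function(x, t, d1, d2) {
  # Equation 8: evidence from order t for H1: delta = d1 vs. H2: delta = d2
  N <- length(x); yt <- running.sum(x, t)
  -N * (t^(-2 * d1) - t^(-2 * d2)) / (2 * t * (N - t + 1)) * sum(yt^2) +
    (N / t) * (d2 - d1) * log(t)
}

ell.interval.t <- function(x, t, d0, d1, d2) {
  # Equation 9: evidence from order t for H1: d1 <= delta <= d2 vs.
  # H0: delta = d0 (zeros or extreme values in yt would need numerical care)
  N <- length(x); yt <- running.sum(x, t)
  num <- pnorm(yt * t^(-d1)) - pnorm(yt * t^(-d2))
  den <- yt * dnorm(yt * t^(-d0)) * log(t)
  N / (t * (N - t + 1)) * sum(log(num / den)) +
    (N / t) * log(t^d0 / (d2 - d1))
}

evidence.dB <- function(x, d0 = .5, d1 = .5, d2 = 1) {
  # Equation 7: combine all summing orders (a grid plus interpolation, as in
  # Footnote 3, is faster for long series); convert natural-log units to dB
  e <- sum(sapply(2:length(x), function(t) ell.interval.t(x, t, d0, d1, d2)))
  10 * e / log(10)
}

With the default arguments, evidence.dB() contrasts the persistent-FS hypothesis (.5 < δ < 1) against the white noise null (δ = .5), which is the comparison used repeatedly in the simulations below.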

Table 1
Illustration of the Computation of the Running Sums

x     y2    y3    y4    y5    y6    y7    y8    y9    y10
5     1     −3    −17   −9    2     6     22    18    44
−4    −8    −22   −14   −3    1     17    13    39    —
−4    −18   −10   1     5     21    17    43    —     —
−14   −6    5     9     25    21    47    —     —     —
8     19    23    39    35    61    —     —     —     —
11    15    31    27    53    —     —     —     —     —
4     20    16    42    —     —     —     —     —     —
16    12    38    —     —     —     —     —     —     —
−4    22    —     —     —     —     —     —     —     —
26    —     —     —     —     —     —     —     —     —

Note. The first column (x) is the original sequence. The remaining columns are the running sums (y_t) of order t. The rows of the matrix can be taken as an ensemble of diffusion trajectories, which are just cumulative sums of the original sequence starting at each of its elements; for instance, the fourth row (−14, −6, 5, 9, 25, 21, 47) is the sequence of cumulative sums of the elements from the fourth one onward (−14, 8, 11, 4, 16, −4, 26).

Two composite hypotheses. Finally, the fully general case is when both hypotheses are given by intervals, that is, H₁ ≡ δ₁ ≤ δ ≤ δ₂ and H₂ ≡ δ₃ ≤ δ ≤ δ₄. For a Gaussian series, the log-likelihood in favor of H₁ over H₂ provided by order t is approximated by

\ell_{1,2}(y_t) \approx \frac{N}{t(N-t+1)} \sum_{i=1}^{N-t+1} \ln \frac{\Phi[y_t(i)\,t^{-\delta_1}] - \Phi[y_t(i)\,t^{-\delta_2}]}{\Phi[y_t(i)\,t^{-\delta_3}] - \Phi[y_t(i)\,t^{-\delta_4}]} + \frac{N}{t} \ln \frac{\delta_4 - \delta_3}{\delta_2 - \delta_1},    (10)

which is, as before, used in combination with Equation 7 to obtain the log-likelihood ratio across all orders. Equation 10 illustrates how the log-likelihood ratios already incorporate a natural penalty for hypothesis complexity. Generally, the wider the interval that a hypothesis predicts, the worse, or less precise, is the prediction associated with that hypothesis. As is made evident in the Appendix, for composite hypotheses, the expressions for the log-likelihoods are obtained by integrating across the whole interval. This integration averages the log-likelihood across the whole interval. Therefore, a large interval will be heavily penalized by the parts of the interval that provide a worse fit, even when the interval actually contains a good fit. This is what the last term in Equation 10 expresses. Notice that the term inside the logarithm corresponds to the reciprocal of the ratio between the interval widths (i.e., the width corresponding to H₂ is in the numerator). Therefore, the wider hypothesis will be penalized, and this penalty will increase linearly with sample size. This is a more detailed equivalent of the penalty terms for the degrees of freedom used in the AIC and BIC; in our case, both models have equal degrees of freedom, but the wider interval models are certainly more complex.
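The two-composite contrast of Equation 10 translates into code along the same lines (again my own hypothetical sketch, reusing the running.sum() helper above; both intervals must have nonzero width, the point-hypothesis case being handled by Equation 9):

ell.composite.t <- function(x, t, d1, d2, d3, d4) {
  # Equation 10: evidence from order t for H1: d1 <= delta <= d2 over
  # H2: d3 <= delta <= d4; the closing term is the interval-width penalty
  N <- length(x); yt <- running.sum(x, t)
  num <- pnorm(yt * t^(-d1)) - pnorm(yt * t^(-d2))
  den <- pnorm(yt * t^(-d3)) - pnorm(yt * t^(-d4))
  N / (t * (N - t + 1)) * sum(log(num / den)) +
    (N / t) * log((d4 - d3) / (d2 - d1))
}

Widening the H₁ interval makes the closing term more negative, and widening the H₂ interval makes it more positive, which is exactly the automatic complexity penalty just discussed.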

Estimation of the Scaling Exponent

When one has sufficient evidence for the presence of FS, it is often necessary to estimate the most likely value of the scaling exponent. A simple way to obtain such an estimate is to find the value δ̂ that maximizes the posterior probability of the exponent given the observed data. The simplest approach is to assume a uniform prior for δ, so that the problem reduces to finding the value δ₀ that maximizes the log-likelihood of the data given an exponent. The scaling condition provides an expression of the log-likelihood for each summing order (ℓ[y_t|δ]) as a function of δ. It is shown in the Appendix that ℓ[y_t|δ] peaks around a single maximum for all values of t > 1. For standard Gaussian processes, the location of this maximum, which provides the maximum likelihood (ML) estimator for δ₀, is accurately approximated by

\hat{\delta}_0(t) = \frac{\ln\!\left(\sum_{i=1}^{N-t+1} y_t(i)^2\right) - \ln(N-t)}{2\ln t} = \frac{\ln \hat{\sigma}_t}{\ln t},    (11)

where σ̂_t denotes the estimated population standard deviation of the tth order running sum,

\hat{\sigma}_t^2 = \frac{1}{N-t-1} \sum_{i=1}^{N-t+1} y_t(i)^2.    (12)

The error of the estimator is approximated by

\varepsilon(t) = \frac{1}{\ln t}\sqrt{\frac{t}{2N}}.    (13)

As is shown in the Appendix, the estimated error is always minimal at t = 7. Hence, it suffices to compute these expressions for the seventh running sum, and the ML estimator for δ is δ̂₀(7), with an error of ε(7).⁴ This δ estimate is the estimate of the Hurst exponent H provided by the method.

⁴ A more standard approach would have been to combine the ML estimators obtained from all summing orders. However, I experimented with this, and it gave considerably worse results than the approach used here.
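In code, the estimator and its error bound reduce to a few lines (a hypothetical sketch building on running.sum()):

delta.hat <- function(x) {
  # ML estimate of the scaling exponent from the seventh running sum
  # (Equations 11-13); the error bound is minimal at t = 7
  N <- length(x)
  y7 <- running.sum(x, 7)
  c(delta = (log(sum(y7^2)) - log(N - 7)) / (2 * log(7)),
    error = sqrt(7 / (2 * N)) / log(7))
}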

Practical Issues

Units of evidence and significance. Something that has often deterred psychologists from using Bayesian inference methods is the different way in which the evidence for a hypothesis is presented, relative to the still more commonly used frequentist methods. This is often characterized by questions like "yeah sure,





but where are the p values?" (Unfortunately, such questions often come from journal editors and reviewers, creating an additional hurdle for publication that discourages potential users of Bayesian methods in our field.) A useful criterion for evaluating the significance of the evidence favoring one hypothesis⁵ is to use the simplified version of Jeffreys's evidence scale provided by Kass and Raftery (1995). The first step is to convert the evidence values into decibel units (dB).⁶ If E_e is the evidence, in natural logarithm units, provided by the equations above, the same evidence in dB is given by E_dB = 10E_e/ln 10 ≈ 4.3429E_e. In dB units, the scale given by Kass and Raftery (1995) is

• |E_dB| ≤ 5 dB: not worth more than a bare mention,
• 5 dB < |E_dB| ≤ 10 dB: positive,
• 10 dB < |E_dB| ≤ 20 dB: strong, and
• |E_dB| > 20 dB: very strong.

Of these, the third level (strong) can serve the role of the usual p < .05 significance level, and the second (positive) would be used in the place of the marginally significant level of p < .1. Notice the absolute value signs on the evidence. By this, I am signaling that the scale applies to evidence for either of the hypotheses (i.e., positive or negative values). When |E_dB| ≤ 5 dB, we can only conclude that the data are not sufficient to draw a strong conclusion in either direction.

⁵ In fact, something similar to a p value can be obtained from the evidence in natural units E_e. The relation p = 1/[exp(E_e) + 1] indicates the support for one hypothesis in terms of probability, and 1 − p indicates the support for the other. However, I strongly recommend using the evidence values directly; these p values have a wholly different interpretation than the traditional ones, as in this case both tails of the distribution are meaningful in opposite ways (i.e., they can actually provide support for the "null" hypothesis). Therefore, the values obtained by this method should not be directly equated to traditional p values.

⁶ Decibels are sometimes also referred to as decibans, the name used by the Bletchley Park group during World War II. Both terms refer to the same thing: dimensionless magnitudes expressed in tenths of a base 10 logarithmic scale. The origin of the decibel scale is actually psychophysical; half a decibel is thought to approximate the minimum contrast discernible by humans (cf. Jaynes, 2003).

Stationarity and normalization. The method described here assumes stationary stochastic time series following a normal distribution. If these assumptions are not met, the data first need to be preprocessed with methods that ensure stationarity (see, e.g., Gilden et al., 1995; Ignaccolo et al., 2004; von Bünau, Meinecke, Király, & Müller, 2009, for some methodologies). With this preprocessing, the method can be applied to most time series that occur in the behavioral sciences. Particularly important types of nonstationarity are processes that correspond to cumulative sums (i.e., fractional Brownian motions instead of fractional Gaussian noises; Mandelbrot & Van Ness, 1968). The BAS method is only valid for fractional noises. In the cases in which the process in question is a cumulative sum (i.e., a motion: the value of each element is the sum of all preceding values plus some additional increment), before any further processing (i.e., before removing other nonstationarities), one would have to differentiate the process, that is, substitute the original sequence by the sequence of differences between consecutive elements (see Eke et al., 2002, for a detailed discussion of how to distinguish a noise from a motion). For instance, if one is analyzing sequences of eye movements, one should not use the BAS to analyze the raw sequence of eye positions; instead, one should take the actual movements (i.e., the differences between one position and the next) as the sequences to be analyzed by the BAS.

Psychological data rarely follow a simple normal distribution, so coercing the sequences into normality will often be a necessary preprocessing step. One can construct a surrogate series that preserves the autocorrelation structure and possible FS of the data but that is normally distributed. A simple way to accomplish this is to use the first step of the amplitude adjusted Fourier transform method of generating surrogates (AAFT; Theiler, Eubank, Longtin, Galdrikian, & Farmer, 1992).⁷ This method constructs a surrogate whose Fourier spectrum is matched to the original after rescaling to a desired distribution. It consists of sampling N points from a standard normal distribution and rearranging them according to the ordering of the original data. For this, one computes the ranks of the original series and reorders the normal sample according to the ordering of these ranks. The resulting series preserves the structure of the original but is normally distributed, constituting a standard fGn on which the expressions above can be safely applied. This transformation enables the use of the BAS method in a nonparametric manner. As a general rule, I would recommend applying this transformation to the data as a matter of course, even when one thinks that they are normally distributed (for normally distributed data, the transformation is harmless).

⁷ As noted by Schreiber and Schmitz (2000), this method may have a small bias toward flat spectra in some cases. If this were suspected to affect the results, one can revert to the iterative method of Schreiber and Schmitz to correct for it.
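The rank-remapping step just described takes only a couple of lines of R (a sketch; the function name is mine):

gaussianize <- function(x) {
  # first step of AAFT: reorder a sorted standard normal sample according to
  # the ranks of x, preserving x's ordering and thus its correlation structure
  z <- sort(rnorm(length(x)))
  z[rank(x, ties.method = "first")]
}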

Bias. It is important to note that the expressions provided above for estimating the evidence and the exponent are both slightly biased against departures from the white noise situation. The reason is that in all cases, the limit distribution f_L of Equation 4 has been approximated by the actual distribution of the data. In Gaussian cases, this is nearly exact for δ ≤ .5, and its error increases with δ for δ > .5, giving rise to a bias against fractal noises. This bias is, however, very small, as I show in the simulations; for values of δ < .9, the BAS estimates, although biased, are more accurate than those provided by the other known methods. For .9 ≤ δ ≤ 1, the performance of the BAS deteriorates, but its accuracy remains equivalent to that of other common methods. In fact, I take this bias as providing a slightly conservative tendency to the technique, which is from some perspectives desirable. Finally, as the cases in which performance deteriorates are rather far from the white noise situation, this bias will not affect the criterion for deciding whether a series exhibits FS in any substantial way but will only result in a slight underestimation of its scaling exponent.

Dependence on prior assumptions. The accuracy of the method depends to some degree on the choice of prior for the values of δ [p(δ) = 1/(δ₂ − δ₁)]. However, if the analyst cannot




provide any better definition of the hypothesis he or she wants to test on the value of δ than that it lies within some interval, this is the exact mathematical transcription of that information. If, on the other hand, the analyst can give more detail on the hypothesis, saying, for instance, that the value not only lies within an interval but also is most likely to be around the center of that interval with some expected error, one can incorporate this knowledge into the equations by setting a normal prior centered on the middle of the interval, and the integrals would still be solvable (but a bit more involved). In any case, even with a moderate amount of data, the choice of prior, as long as it is not zero, is not going to have any noticeable effect on the final results, as it will very quickly be overwhelmed by the data.

Summary of the BAS Method

I provide here a summary of the steps that need to be followed for analyzing a sequence using the BAS. These steps are implemented in the code provided in the supplemental materials; a condensed sketch combining the helper functions from the previous sections follows the list.

Step 0. From theoretical predictions or previous experimental results, define the hypotheses one wants to contrast.
Step 1. Decide whether the sequences correspond to motions or to noises. If they are motions, differentiate them. See Eke et al. (2002) for a methodology.
Step 2. Remove further nonstationary trends from the sequences. See Ignaccolo et al. (2004) for a methodology.
Step 3. Coerce the sequences into a normal distribution with zero mean and unit standard deviation.
Step 4. Compute the running sums (usually, for long sequences, it suffices to compute them for a grid of the possible orders and interpolate the results at the remaining points).
Step 5. Compare the hypotheses using the equations provided.
Step 6. If one has found support for an FS hypothesis, compute the ML estimator of the exponent (otherwise, one should directly conclude that most likely H = .5).
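Under the same assumptions and hypothetical helper names as the sketches above, a condensed run of Steps 3 through 6 for a sequence x that is already a detrended noise looks like this:

x <- gaussianize(x)        # Step 3: coerce to a standard normal fGn
e <- evidence.dB(x)        # Steps 4-5: FS (.5 < delta < 1) vs. delta = .5
if (e > 10) {
  delta.hat(x)             # Step 6: strong evidence for FS, so estimate H
} else {
  c(delta = .5)            # otherwise conclude that most likely H = .5
}

The 10-dB cutoff corresponds to the strong level of the Kass and Raftery (1995) scale given above.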

Simulations

Simulation 1: Accuracy of the BAS Relative to Other Methods

In order to explore the power and accuracy of the BAS criterion and the associated H estimator, I generated 2,500 artificial fGn series⁸ of lengths randomly chosen from among 128, 256, 512, 1,024, and 4,096 elements and real values of H = δ uniformly sampled from the (0, 1) interval. Figure 1 plots the results of using the BAS criterion to contrast the hypotheses that FS is present versus absent. For persistent series, I tested the hypothesis of a persistent fractal process (.5 < δ < 1) against the null hypothesis of δ = .5, and for antipersistent sequences, I tested 0 < δ < .5 versus the white noise hypothesis. The figure shows that when the real value of δ was


either lower than .32 or higher than .61 (these limits are plotted as the dashed vertical lines), the BAS clearly detects the FS. However, for values of δ within the interval [.32, .61] (see the zoomed right panel), the BAS strongly supports the null hypothesis. This means that in this region, the theory δ = .5 provides a better description of the data than stating that nothing is known about δ. Notice that this does not imply that δ = .5 is the correct scaling but rather that it is the best among the theories compared. Choosing a better specified theory to test against would enable contrasting hypotheses at a higher resolution. This shows one of the main advantages of the BAS: It permits the comparison of theories of different complexities and naturally weighs the theories by their quality (i.e., stating that nothing is known about δ is a very bad theory, so any not-too-bad approximation should be considered an improvement).

Figure 2 compares, for a random sample of 500 of the sequences above, the accuracy of the estimator proposed here with some of the most commonly used estimators,⁹ including DFA (Peng et al., 1994), SWV, and SPD (in their respective improved variants SSC and lowSPDw,e, as recommended by Eke et al., 2002). Even for series of moderate length, the BAS estimator shows substantially more accurate estimation than all the other methods, in terms of both error and consistency, across all values. Only for H > .9 did the performance of the BAS estimator decrease, and even then, it was not significantly worse than that of the other estimators.

Figure 2. Comparison of the error of Hurst exponent estimators for 500 artificially generated fGn series of increasing length (from top to bottom). The boxes plot the interquartile range, with the median marked, and the whiskers cover 95% of the observed values. BAS = Bayesian assessment of scaling; FS = fractal scaling; DFA = detrended fluctuation analysis; SDF = spectral density function; SWV = scaled windowed variance.
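A reduced-scale version of this simulation is easy to reproduce with the helpers sketched earlier (fgnSim() comes from the fArma package described in Footnote 8; everything else is hypothetical illustration code):

library(fArma)
set.seed(1)
one.run <- function() {
  N <- sample(c(128, 256, 512, 1024, 4096), 1)
  H <- runif(1)                            # true delta = H, uniform on (0, 1)
  x <- gaussianize(as.numeric(fgnSim(n = N, H = H)))
  # evidence for .5 < delta < 1 vs. delta = .5 (Simulation 1 used the
  # antipersistent interval instead when H < .5; omitted here for brevity)
  c(H = H, N = N, dB = evidence.dB(x), delta = delta.hat(x)[["delta"]])
}
res <- data.frame(t(replicate(100, one.run())))   # 100 series, not 2,500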

Simulation 2: Robustness to Other Noises With Transient Correlations

The next issue is that real-world time series are hardly ever "clean" examples of FS; rather, the FS structure is usually corrupted by noise of different types. Of special importance are short-term correlations, which can sometimes give the false impression of FS or can mask it. To investigate the robustness of the BAS to this type of signal contamination, I generated 1,000 fGn time series of length 1,000 with H = .75 that were contaminated by random samples from a simulated ARMA(2, 1) process. The proportion of signal amplitude driven by the ARMA component was randomly sampled from (0, 1). Figure 3 plots the resulting evidence (comparing .5 < δ < 1 with δ = .5) and the error in estimating δ for each series. The BAS robustly detects the presence of FS up to the point where the ARMA component accounts for more than half of the signal power. The contamination drives down δ̂, and when this estimator goes below the .61 limit mentioned above, a plain short-term correlation provides a better account of the data.

Figure 3. Robustness of the Hurst estimation (left) and BAS criterion (right) for 1,000 artificial fGn series (H = .75) of length 1,000 with different proportions of contamination by an ARMA(2, 1) process. The solid lines are nonparametric smoothers, and the dashed horizontal lines plot the ±10 dB threshold. BAS = Bayesian assessment of scaling; fGn = fractional Gaussian noise; FS = fractal scaling; ARMA = autoregressive moving average; dB = decibel.
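A Simulation-2-style contaminated series can be produced along these lines (a sketch; the ARMA coefficients shown are arbitrary stand-ins, whereas the simulation sampled them randomly):

contaminate <- function(N = 1000, H = .75, p.arma = runif(1)) {
  f <- as.numeric(fArma::fgnSim(n = N, H = H))
  a <- as.numeric(arima.sim(list(ar = c(.5, -.3), ma = -.2), n = N))
  # p.arma is the proportion of the signal amplitude driven by the ARMA part
  (1 - p.arma) * f / sd(f) + p.arma * a / sd(a)
}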

⁸ All fGn series in the three simulations were generated directly from the FFT by the algorithm of Beran (1994), using the R (R Development Core Team, 2005) implementation available in the function fgnSim of the package fArma. Briefly, the method samples frequencies from the theoretical spectrum of a fractal process, adds noise to them, and transforms the frequencies into a sequence in the time domain using the inverse FFT algorithm.

⁹ A wider range of estimators was tested; I report only the three that performed best on this data set.




Figure 1. Performance of the BAS criterion in detecting FS for 2,500 artificially generated fGn series of increasing length (from top to bottom). The left panel plots the overall results; the right panel zooms into the results for the lower values of the Hurst exponent. The solid lines plot nonparametric regression smoothers. The horizontal dashed black lines indicate the ±10 dB "significance" threshold, and the grey dashed vertical lines highlight where the BAS method never detects the scaling (between .32 and .61). BAS = Bayesian assessment of scaling; dB = decibel; FS = fractal scaling; fGn = fractional Gaussian noise.

Simulation 3: Comparison With the Other Hypothesis Contrast Methods

Above, I have shown that the BAS detects the presence of FS in fGns with or without the presence of noise, even when this noise exhibits transient correlations. Also, the H estimator is more accurate than the estimates obtained by other methods. It remains to compare the reliability of the BAS as a criterion for diagnosing the presence or absence of FS with the other two hypothesis testing mechanisms that have been proposed: Wagenmakers et al.'s (2005) fARIMA/ARMA regression contrast and Thornton and Gilden's (2005) spectral classifier. To investigate this, I generated 1,000 random sequences of 1,024 elements, 500 of which were not fractal (i.e., H = .5) and 500 of which were fractal (i.e., H = .75). The fractal and nonfractal subsets were further subdivided into five groups of 100 sequences each: 100 were pure fGn (just plain white noise in the H = .5 case), 100 were generated from an ARMA(2, 1) process in the nonfractal case or from a fARIMA(2, 1) process in the fractal case (the coefficients of the ARMA and fARIMA models were the same pairwise), and 100 were generated from the noisy fractal wfBm process of Thornton and Gilden (2005; which, in the nonfractal case, is again plain white noise), randomly sampling the white noise amplitude β from the (0, 1) interval. The parameters a₂ and

b₁ for each ARMA and fARIMA process were randomly sampled from the (−.5, 0) uniform interval, and the coefficient a₁ was sampled from the (0, 1) interval. Finally, to investigate the performance of the ARMA/fARIMA contrast and the spectral classifier when dealing with types of series on which they had not been explicitly trained or tested, I generated two types of sequences designed to create problems for those methods. First, I generated 100 fractal and 100 nonfractal sequences from an fGn and transformed them by taking the reciprocal of the values. This transformation produces a distribution that models human reaction times well (the recinormal distribution; cf. Reddi & Carpenter, 2000). This produces series without any transient correlations, that is, ARMA(0, 0) or fARIMA(0, .25, 0), but that do not conform to the distributions that ARMA or fARIMA models generate. On the other hand, as the spectral classifier relies only on the spectrum of the series, its performance should not be particularly affected by this transformation. A second group of series was generated from a fractal fARIMA(4, .25, 2) or a nonfractal ARMA(4, 2) process, and their elements were transformed by taking their cube. Furthermore, to make all these sequences (both the fractal and the nonfractal) appear to be just nonfractal, transient-correlation series, I chose the parameters of the ARMA/fARIMA as those that would best fit the ARMA(2, 1) series generated





above. This distribution is problematic for the ARMA/fARIMA contrast for two reasons. The first is that, as above, the nonlinear transformation generates a distribution that cannot be generated by a linear autoregressive model. The second is that even in the cases in which the series is fractal, it comes from a more complex model than those tested by the method, and it masquerades as a nonfractal ARMA process. Therefore, the contrast's preference for smaller, nonfractal models should lead to trouble with these sequences. These latter series are also problematic for the spectral classifier, as their spectral properties do not fit well with those on which it was trained. In addition, all test sequences were centered at zero and scaled to have unit standard deviation.

To each of these series, following Wagenmakers et al. (2005), I fitted¹⁰ 18 regression models, nine ARMA(p, q) and nine fARIMA(p, q), with all possible combinations of p and q values






For each of the regressions, I computed both the AIC and BIC criteria. As indicated by Wagenmakers et al. (2005), I classified the series according to each information criterion (IC); a series was labeled as fractal if the model exhibiting the lowest IC value was of the fARIMA type and as not fractal if the lowest IC was obtained by an ARMA regression. The series were also classified with the implementation of Thornton and Gilden's (2005) spectral classifier made available by Farrell, Wagenmakers, and Ratcliff (2006a; see Footnote 11). Finally, each series was also labeled as fractal/nonfractal according to the BAS, comparing the hypotheses H0 ≡ δ = .5 and HFS ≡ .5 < δ < 1. Prior to applying the BAS, all series were normalized by the AAFT method described above. Table 2 summarizes the results of the model comparisons, given as percentages of correct classification. Both the BAS and the ARMA/fARIMA contrast (using the BIC criterion) performed rather well, with a slight advantage for the BAS method (90% vs. 88%) that was significant according to a logistic regression model (Wald's z = −10.11, p < .0001). On the other hand, the performance of the spectral classifier (69%) and of the ARMA/fARIMA contrast using the AIC criterion (72%) was less than acceptable. Separate examination of the results for the fractal and nonfractal sequences reveals that the spectral classifier has a bias against labeling a sequence as fractal, whereas the ARMA/fARIMA contrast tends to be too liberal in classifying sequences as fractal. The mild difference between the performance of the BAS and the ARMA/fARIMA contrast arises mostly from their performance on the two additional types of series that cannot be modeled by plain ARMA or fARIMA processes. On the recinormal series, the ARMA/fARIMA contrast performed slightly worse than the BAS (99% vs. 87% correct), but was still at more than acceptable levels. The very complex cubic ARMA/fARIMA(4, 2) sequences were problematic for both methods, again with an advantage for the BAS (75% vs. 65%).

On the other hand, the ARMA/fARIMA contrast did exhibit a slight advantage over the BAS for the particular type of series on which this contrast was originally tested, pure ARMA/fARIMA(2, 1) sequences (93% vs. 90%), but the performance of both methods was excellent for those sequences. An additional advantage of the BAS method is that it not only provides a classification with respect to the hypotheses compared but also provides an indication of how certain we should be in our assessment of the hypotheses. As discussed above, values of evidence below 20 dB should not be taken as conclusive but rather indicate that the evidence provided by the data is weak, and one should perhaps refrain from making strong inferences. This is illustrated by Figure 4. The figure plots the results of the classification, separating those cases when the evidence is too weak (labeled "indecisive"). Notice that, except for the nonfractal cubic process, the indecisive responses account for almost all cases that were labeled as misses or false positives in Table 2. This is to say, if one had made inferences only when sufficient evidence was available, the classification results would have been virtually perfect. However, the nonfractal cubic processes still show a considerable number of false positives, indicating that with sufficient complexity (an abundance of short-term correlations of many types, paired with an unusual probability distribution), it is still possible to fool the method. The performance of the spectral classifier was remarkably bad outside the domain of sequences on which it had explicitly been trained, and even for those, its performance lagged behind that of the other methods. I should note, however, that the comparison with the spectral classifier presented here should be taken as illustrative rather than strict.
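For reference, the decibel scale of evidence is defined earlier in the article (not reproduced here); it follows the standard convention of Jaynes (2003), which in my restatement reads

$$e_{1,2} = 10 \log_{10} \frac{p(x \mid \mathcal{H}_1)}{p(x \mid \mathcal{H}_2)}\ \mathrm{dB},$$

so that the 20 dB criterion corresponds to one hypothesis being 100 times more likely than the other, and the ±10 dB band of Figure 3 to likelihood ratios between 1:10 and 10:1.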
11 The code is available from http://seis.bristol.ac.uk/~pssaf


Table 2
Results From Simulation 3

                                                   ARMA vs. fARIMA
H     Type of sequence              BAS %      AIC %      BIC %      Spectral classifier %
.5    fGn (white noise)               99         64         91         87
      ARMA(2, 1)                      85         71         95        100
      wfBm (white noise)             100         59         97         89
      Recinormal white noise         100         63         83         83
      Cubic ARMA(4, 2)                58         59         73         85
.75   fGn                             99         87         99         69
      fARIMA(2, 1)                    95         95         92         22
      wfBm                            77         87         98         67
      Recinormal fGn                  99         66         92         69
      Cubic fARIMA(4, 2)              92         69         57         19
Total hits (% of fractals)            92         81         88         49
Total misses (% of fractals)           8         19         12         51
Total correct rejections
  (% of nonfractals)                  88         63         88         89
Total false positives
  (% of nonfractals)                  12         37         12         11
Total correct                         90         72         88         69

Note. Summary of the accuracy (percentage of sequences correctly classified) of the different hypothesis testing methods in classifying whether a series is fractal. BAS = Bayesian assessment of scaling; ARMA = autoregressive moving average; fGn = fractional Gaussian noise; fARIMA = fractional autoregressive integrated moving average; wfBm = whitened fractional Brownian motion; AIC = Akaike information criterion; BIC = Bayesian information criterion.

As in the ARMA/fARIMA contrast, there are a large number of free parameters in the method itself, and its implementation is rather complex and computationally very expensive (see Footnote 12). I therefore suspect that the results in Table 2 may not do full justice to this method. The theoretical inference tools are the same in the BAS and the spectral classifier, and hence I would expect that, with an adequate choice of training parameters, one could eventually obtain results very similar to those of the BAS, at least for sequences of the wfBm type.

12 Before any classification can be performed, one has to train the classifier on thousands of time series (of the exact same length as the target) from each model and compute four-dimensional covariance matrices for the spectra of the training signals.

Simulation 4: Repeated Measurements

In most experimental situations common in psychological research, it is difficult if not outright impossible to collect sequences as long as 1,000 elements. However, it is often possible to collect multiple realizations of the same type of sequence, either from the same person or from multiple people. Here, I investigate whether the BAS is suitable for analyzing such repeated measures designs. The simulations above suggest that although the BAS is generally accurate, its performance decreases for series shorter than 500 elements. As I discussed when introducing the method, the evidence provided by the BAS is additive; the results obtained for two sequences can be summed to reach a joint conclusion from both of them. To investigate the performance of this type of joint inference, I generated 400 sequences of 4,096 elements each. One hundred of these corresponded to plain white noises, while the remaining 300 were fGns. The fGns were evenly split with regard to their H index: one third had an index of .65, one third had .75, and one third had .85. For each of the generated sequences, I computed the evidence for FS provided by the BAS (comparing HFS ≡ .5 < δ < 1 with H0 ≡ δ = .5).

Then, the original sequences were progressively split into smaller and smaller subsequences, descending in powers of two; I split the series first into two subsequences of 2,048 elements, then into four subsequences of 1,024 elements, then into eight subsequences of 512, and so on, up to splitting them into 64 subsequences of 64 elements each (see Footnote 13). On each level of splitting, I computed the sum of the evidence contrasting HFS and H0 obtained by the BAS from each of the subsequences. Figure 5 plots the distribution of the summed evidence for each type of series at each level of splitting. Two issues need to be noted in the figure. First, the overall degree of positivity or negativity of the evidence values reflects our certainty about the conclusion drawn. Second, the location of the boxes above or below the significance lines gives the correctness of the classification outcome; if a box and its whiskers are fully above or below the significance thresholds, the classification performance should be taken as in all respects perfect, irrespective of the actual degree of positivity or negativity of the median. Table 3 provides the average values (across the 100 realizations) of the summed evidence obtained at each level. As one can see, for the longer sequences the BAS is very accurate in deciding that FS is present when it really is, but the performance of the method decreases as one splits the original into many shorter subsequences. The reason for this is that shorter sequences lose the information provided by the lower frequencies in the original sequence, and their inference is based only on the higher frequencies.

13 I am indebted to E.-J. Wagenmakers for suggesting this design to me.
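The splitting-and-summing scheme can be sketched in a few lines of R. The function bas_evidence below is hypothetical shorthand for the supplemental BAS routine that returns the evidence (in dB) for HFS against H0 on a single series; everything else is ordinary bookkeeping.

    ## Sketch: summed evidence over equal subsequences (additivity of the BAS
    ## evidence). bas_evidence() is a hypothetical stand-in for the
    ## supplemental BAS code.
    summed_evidence <- function(x, n_splits) {
      stopifnot(length(x) %% n_splits == 0)
      pieces <- split(x, rep(seq_len(n_splits), each = length(x) / n_splits))
      sum(vapply(pieces, bas_evidence, numeric(1)))
    }
    ## e.g., for a 4,096-element series, summed_evidence(x, 8) sums the
    ## evidence from eight 512-element subsequences.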


Figure 4. Classification results of the BAS method in Simulation 3, separating the proportions of indecisive responses (those with an absolute value of evidence below 20 dB). BAS = Bayesian assessment of scaling; ARMA = autoregressive moving average; fGn = fractional Gaussian noise; fARIMA = fractional autoregressive integrated moving average; wfBm = whitened fractional Brownian motion; Ev = evidence.

Nevertheless, the method is remarkably robust; for clearly fractal sequences (H indices of .75 or .85), the performance of the method does not deteriorate significantly as long as one considers subsequences of at least 100 elements. The summed evidence obtained from the split sequences adds up to roughly the evidence provided by the original long sequences. Moreover, there appears to be even a slight beneficial effect of some moderate splitting; splitting down to subsequences of as few as 256 elements actually increases the combined evidence for FS. The reason for this is that in a single realization of a sequence, the evidence from the lower frequencies is based on a very small sample and is thus less reliable. By splitting up, these lower frequencies are not considered in the conclusions. The situation is very different for the series that are only mildly fractal (H = .65), close to the detectability limit that was found in Simulation 1. In this case, with the full-length sequences, the evidence supporting FS is significant, but it deteriorates very quickly in the splitting process, to the point that sequences of 512 elements already provide evidence against FS. This indicates that for these types of "close call" sequences, it is crucial to have access to the lower frequencies. Note here that the negative average evidence value implies that if one considered many sequences of the same length, the result would become even more negative. In short, for shorter sequences, the area where FS is not detectable becomes broader. This is in fact a reasonable result. Making the claim that a sequence exhibits FS amounts to stating that the elements in the sequence are positively correlated up to infinite lags. Of course, to make such a strong claim one needs relatively strong evidence as well. If the deviation from the central limit theorem (CLT; which is represented by H = .5) is not that big and one does not have access to elements at very long lags, the conclusion that FS is present is not justified; one would rather just say that there are some transient correlations in the data causing a slight deviation from the CLT. When one looks at longer and longer sequences and they still follow the regular scaling law of a fractal—even if this is just a slight deviation from the CLT situation—the explanation by transient correlations becomes weaker. One can, however, still reach reliable conclusions even for these very short, only slightly fractal sequences. Crucially, this requires very precisely specified theories. As discussed above, stating only that a sequence is fractal is a very poor theory, and the BAS method penalizes such broad theories. However, if one has a more precise hypothesis stating that the FS regime, if present, is rather mild (say, in the region from .6 to .7), then there is more power to differentiate this hypothesis from the hypothesis that H = .5. To illustrate this point, I repeated the computations above for the H = .65 sequences, reformulating the FS hypothesis as H′FS ≡ .6 < δ < .7. The results are now radically different. As is shown in Figure 6, at all but the very short 64-element sequences, one finds significant evidence in favor of the more accurately defined FS hypothesis. Even with this very finely specified hypothesis, at the shortest, 64-element sequences, one fails to detect the scaling. This is sensible. It would not be reasonable to make any inferences on the power-law scaling of a sequence when one examines only one order of magnitude of the possible scales. At least two orders of magnitude (i.e., more than 100 elements) are necessary for obtaining any meaningful results on the long-term scaling of sequences.
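In code, the reformulated test is a one-liner. As with bas_evidence above, bas_evidence_range is a hypothetical wrapper (not the article's function name) around the supplemental BAS routines, returning the evidence in dB for a bounded hypothesis on δ against δ = .5:

    ## Sketch: testing the finely specified hypothesis H'_FS (.6 < delta < .7)
    ## against H0 (delta = .5); bas_evidence_range() is hypothetical shorthand.
    ev <- bas_evidence_range(x, lo = .6, hi = .7)
    conclusive <- abs(ev) > 20   # the 20 dB criterion used in the text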

Figure 5. Repeated measurements. BAS = Bayesian assessment of scaling; dB = decibel.

General Discussion

I have introduced a simple and accurate method for deciding whether—and to what degree—it is justifiable to claim that an observed series indicates FS and, more generally, for contrasting specific hypotheses about the values of the scaling exponent. The method is quite easy to apply (R code to perform the techniques developed here is provided in the supplemental materials); in fact, the BAS calculations were among the fastest of all those studied here. Part of this speed is due to the method being fully analytical; the criteria and the estimator are obtained directly from evaluating a mathematical expression. A further advantage of the BAS is that it enables a single-pass analysis of time series, thereby avoiding the problem of multistep methods that rely on successive approximations, where each step introduces error that compounds from step to step. Instead, the method I propose relies only on computing the running sums, from which in a single approximation one obtains either the relative evidence supporting one of two hypotheses or an estimate of the scaling exponent.
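To make the single-pass character concrete, the following R sketch implements the ML estimator of the scaling exponent as derived in the Appendix (Equations A17–A21): compute the seventh-order running sums and read the exponent off their variance. The function name is mine, and the sketch omits the AAFT normalization applied in the simulations.

    ## Sketch: BAS ML estimator of the scaling exponent (Appendix, Equation
    ## A21, evaluated at the analytically optimal order t = 7). Assumes x has
    ## been centered and standardized; AAFT normalization is omitted here.
    bas_estimate <- function(x, t = 7) {
      N <- length(x)
      y <- as.numeric(stats::filter(x, rep(1, t), sides = 1))[t:N]  # t-th order running sums
      (log(sum(y^2)) - log(N - t)) / (2 * log(t))                   # estimate of delta
    }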

The BAS directly fulfills the justified requirement of Wagenmakers et al. (2004) that explicit hypothesis testing, and not just parameter estimation, is necessary for deciding whether a series exhibits FS. In this respect, the BAS is—in my opinion—simpler than both of the alternatives that exist in the literature, the ARMA/fARIMA technique of Wagenmakers et al. (2005) and the spectral classifier (Thornton & Gilden, 2005). With respect to the former, the results of Simulation 3 make it clear that the BAS classifies series with accuracy equivalent to that of the ARMA/fARIMA contrast, even when those series belong to the domain to which the ARMA/fARIMA contrast is explicitly fine-tuned. However, when one considers "neutral" types of series—less specific to the particular contrasts of the previous models—the BAS is more robust than either the ARMA/fARIMA contrast or the spectral classifier, as it does not depend on testing a particular group of models.

Table 3
Results of Simulation 4

Number of              Nonfractal    Fractal      Fractal     Fractal      Fractal (H = .65),
repetitions   Length   (H = .5)      (H = .65)    (H = .75)   (H = .85)    H′FS ≡ .6 < δ < .7
64            64       −516.5 dB     −253.5 dB      7.5 dB    284.5 dB     −18.5 dB
32            128      −371.5 dB     −133.5 dB    130.5 dB    468.5 dB      23 dB
16            256      −251.5 dB      −46.5 dB    222.5 dB    572.5 dB      52 dB
8             512      −163 dB         −5 dB      225 dB      592 dB        60.5 dB
4             1,024    −101 dB         20 dB      209 dB      607 dB        59 dB
2             2,048     −61.5 dB       31 dB      185 dB      561.5 dB      52 dB
1             4,096     −38 dB         24.5 dB    152 dB      455 dB        39.5 dB

Note. Summed evidence in favor of the FS hypothesis obtained by progressively splitting a sequence into shorter and shorter subsequences. The last column repeats the test for the weaker fractal signal using a better specified theory to describe the FS situation. FS = fractal scaling; dB = decibel.


Of course, one can always increase the training corpus of the spectral classifier or the number of models fitted in the ARMA/fARIMA contrast to include more cases, but the issue is that the range of possible types of series is very large and unknown a priori; thus, one cannot explicitly include all possible models ad hoc. Therefore, a method that depends on training on, or explicit testing of, any specific models runs a heavy risk of coming up against a type of series that was not thought of when implementing the method. The BAS methodology is fully analytical; its application depends only on the evaluation of a few mathematical expressions that are given in closed form; this is to say, I provide direct formulae for obtaining the results. This contrasts with the previous methods in that it does not have parameters that need to be adjusted and does not involve subjective decisions at any stage. In this sense, the method has basically no degrees of freedom; it could be described as nonparametric, which I think adds significant value through the generality it implies. Despite its simplicity, the BAS enables one to compare finely defined hypotheses on both individual values and ranges of values of the scaling exponent, something that is not possible—or at least whose possibility has not yet been described—in the methods of Thornton and Gilden (2005) or of Wagenmakers et al. (2004, 2005). Also, the method can both deal with and separate persistent (H > .5) fractals, as do the previous hypothesis testing methods; it can also deal with antipersistent (H < .5) options, which the ARMA/fARIMA method does not cover but which have also been reported in cognitive processes (Engbert & Kliegl, 2004; Liang et al., 2005; Mergenthaler & Engbert, 2007). I now return to two important methodological issues that were mentioned in passing in the text. The validity of the conclusions achieved by the BAS depends completely on the stationarity assumption. In fact, this assumption is shared with nearly all methods for the analysis of stochastic fractal series. A partial exception are methods that, as DFA (Peng et al., 1994) or SSC (Eke et al., 2002) do, include an explicit detrending step in the process, but problems with nonstationarity remain even in these methods (Chen, Ivanov, Hu, & Stanley, 2002; Hu, Ivanov, Chen, Carpena, & Stanley, 2001). As part of an argument on the presence or absence of FS in series of reaction times, Van Orden et al. (2005) argued that detrending techniques are damaging for the detection of fractal structure in time series; they concluded that, at least in some situations, one can and should avoid detrending behavioral sequences prior to their fractal analysis. Their argument is that detrending procedures filter out much of the low frequencies contained in the original sequences. Given that behavioral sequences are very short, this makes it more difficult to detect FS when, in fact, what appears as a trend in a short series could correspond to a fractal oscillation at a scale longer than the sequence. Van Orden et al. (2005) are correct on this point. It is indeed possible that what appears as a trend is part of a longer period oscillation, but it is equally possible that it is not. The point is that it is simply not possible to make reasonable inferences on things that happen at scales longer than one has actually observed. This is very important.
Stationarity is not only an assumption of the methods but, more fundamentally, an assumption behind the meaning of the estimators themselves. The Hurst exponent (or its equivalent spectral index, as used by Van Orden et al., 2005, and others) unambiguously indicates the presence of FS and LRM only in series that either are stationary or have stationary increments (i.e., they are the cumulative sum of a stationary series).

In response to similar arguments in the financial domain, Bassler, Gunaratne, and McCauley (2006) showed that one can easily construct a Markovian (i.e., without any memory) process whose Hurst exponent is different from .5. That is, it is not that the process masks itself as having a fractal-like Hurst exponent; such is the actual value of the real Hurst exponent of the process. In terms of the spectra, it is not that the spectrum appears to be a power law; it is in fact a power law, just not one generated by a fractal. However, to build such processes, one needs to use nonstationary series. The implication is that if the series under study is not stationary, the value of the Hurst coefficient is wholly uninformative with respect to the fractal nature of the series. Hence, one must accept the loss in inferential power that may come from detrending procedures; if finding fractal-like indices of a sequence depends on not detrending, one cannot reach any safe conclusions on whether these indices reflect the fractal or nonfractal nature of the process that generated it. In essence, the loss caused by detrending is a necessary price to pay for being able to draw inferences. Of course, one can collect as much additional data as possible to alleviate this problem. This brings me to the second methodological issue. As made clear by Simulations 1 and 2, the BAS is not able to reach the conclusion that a series is fractal in nature either when the value of its Hurst exponent is in the close vicinity of .5 or when the fractal component of the sequence accounts for less than one half of its power. These are the fundamental limits of detectability of the method. Actually, I see this as an advantage rather than an inconvenience; it is beneficial for theory building, as it forces us to make well-specified theories. What the limit on Hurst exponent values close to .5 means is that if one's theory is so vaguely specified as to predict only that the value of the exponent will be greater than .5, a more parsimonious theory stating that the value is .5 is always a better approximation when the deviation is small. This does not mean that one cannot obtain solid evidence for small deviations from .5. What it does imply is that in order to make such inferences, one needs very well-specified theories; one cannot just "go on a fishing trip." In the frequent case when what we know about the system does not enable us to make very specific predictions, the data can be used as a guide in the search. For instance, consider that one has a set of behavioral blocks from an experiment and a vague hypothesis that they might be fractal. One can test the fractal hypothesis directly, and as shown above, when the deviation from white noise is small, one is most likely to fail. However, one can do parameter estimation on one half of the data. This will provide a candidate value of the exponent. As was done in Simulation 4, this estimate can then be used to build a finer hypothesis stating that the true exponent lies within a small interval around the estimate. This finer hypothesis can then be tested on the second half of the data with improved chances of success. If the original estimate reflected just spurious variation, the hypothesis testing will still reject it. A final property of the BAS that requires comment is the simple additivity of the evidence estimates. This is one of the main strengths of Bayesian methods, as they enable the accumulation of evidence in a straightforward way.
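The estimate-then-test strategy just described can be sketched as follows, reusing bas_estimate from above together with the hypothetical bas_evidence_range wrapper; the ±.05 half-width of the refined hypothesis is an illustrative choice, not a recommendation from the article.

    ## Sketch: estimate the exponent on one half of the data, then test a
    ## finer hypothesis on the held-out half. x: an observed behavioral
    ## series; the half-width .05 is illustrative.
    n <- length(x)
    first_half  <- x[seq_len(n %/% 2)]
    second_half <- x[(n %/% 2 + 1):n]
    d_hat <- bas_estimate(first_half)          # candidate exponent
    ev <- bas_evidence_range(second_half,      # hypothetical wrapper (see above)
                             lo = d_hat - .05, hi = d_hat + .05)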
If two different experiments by different groups give opposite conclusions as to the contrast between two theories, the evidence values obtained by each group can be added to reflect what one should conclude from jointly considering both results. As mentioned, this can also be exploited within typical psychological experiments. Although it is generally difficult or even impossible to obtain very long sequences of behavioral events to enable detailed inference, the usual repeated measures approach can be used to obtain many short sequences.


Once more, one can then reach a conclusion about the whole by merely summing (not averaging) the evidence values provided by each short block (see Footnote 14). Notice here that using short blocks is only meaningful up to a limit. If one only has sequences of fewer than about 100 elements, it is not sensible to make any claims about long-range correlations, as such correlations cannot be investigated from such short data. The BAS methodology seems particularly suitable for the typical repeated measures experiments used in our field, as long as sequences of at least moderate length are available.

14 Notice that for this additivity to hold, it is assumed that the prior for each of the hypotheses was set, as was done throughout this article, to zero. In this way, the results of one experiment serve as a prior for the next one, as what are being added are log-likelihoods. If one had used a prior that gave more initial plausibility to one of the hypotheses, this prior should be added only to the evidence of the first experiment, so that the rest just perform updates on this initial result. Notice also that in order to use additive inference from different sequences, one has to assume that all sequences have Hurst exponents within the range of the theories tested.
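In symbols (my restatement of the footnote's convention), the joint evidence after K experiments is

$$e_{\text{joint}} = e_{\text{prior}} + \sum_{k=1}^{K} e_k, \qquad e_{\text{prior}} = 0\ \text{dB},$$

so each successive experiment simply updates the running total, and a nonzero prior would enter only once.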

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19, 716–723. doi:10.1109/TAC.1974.1100705
Bassingthwaighte, J. B., & Raymond, G. M. (1995). Evaluation of the dispersional analysis method for fractal time series. Annals of Biomedical Engineering, 23, 491–505. doi:10.1007/BF02584449
Bassler, K. E., Gunaratne, G. H., & McCauley, J. L. (2006). Markov processes, Hurst exponents, and nonlinear diffusion equations with application to finance. Physica A: Statistical and Theoretical Physics, 369, 343–353. doi:10.1016/j.physa.2006.01.081
Beran, J. (1994). Statistics for long memory processes. New York, NY: Chapman & Hall.
Borland, L. (1998). Microscopic dynamics of the nonlinear Fokker-Planck equation: A phenomenological model. Physical Review E, 57, 6634–6642. doi:10.1103/PhysRevE.57.6634
Cajueiro, D. O., & Tabak, B. M. (2007). Is the expression H = 1/(3 − q) valid for real financial data? Physica A: Statistical and Theoretical Physics, 373, 593–602. doi:10.1016/j.physa.2006.05.054
Chen, Z., Ivanov, P. C., Hu, K., & Stanley, H. E. (2002). Effect of nonstationarities on detrended fluctuation analysis. Physical Review E, 65, 041107. doi:10.1103/PhysRevE.65.041107
Collins, J. J., & De Luca, C. D. (1993). Open-loop and closed-loop control of posture: A random-walk analysis of center-of-pressure trajectories. Experimental Brain Research, 95, 308–318. doi:10.1007/BF00229788
Delignières, D., Ramdani, S., Lemoine, L., Torre, K., Fortes, M., & Ninot, G. (2006). Fractal analyses for "short" time series: A re-assessment of classical methods. Journal of Mathematical Psychology, 50, 525–544. doi:10.1016/j.jmp.2006.07.004
Eke, A., Hermán, P., Kocsis, L., & Kozak, L. R. (2002). Fractal characterization of complexity in temporal physiological signals. Physiological Measurement, 23, R1–R38. doi:10.1088/0967-3334/23/1/201
Engbert, R., & Kliegl, R. (2004). Microsaccades keep the eyes' balance during fixation. Psychological Science, 15, 431–436. doi:10.1111/j.0956-7976.2004.00697.x
Farrell, S., Wagenmakers, E.-J., & Ratcliff, R. (2006a). Methods for detecting 1/f noise. Unpublished manuscript, School of Experimental Psychology, University of Bristol, Bristol, England. Retrieved from http://seis.bristol.ac.uk/~pssaf
Farrell, S., Wagenmakers, E.-J., & Ratcliff, R. (2006b). 1/f noise in human cognition: Is it ubiquitous, and what does it mean? Psychonomic Bulletin & Review, 13, 737–741. doi:10.3758/BF03193989
Gilden, D. L. (1997). Fluctuations in the time required for elementary decisions. Psychological Science, 8, 296–301. doi:10.1111/j.1467-9280.1997.tb00441.x
Gilden, D. L. (2001). Cognitive emissions of 1/f noise. Psychological Review, 108, 33–56. doi:10.1037/0033-295X.108.1.33
Gilden, D. L., Thornton, T., & Mallon, M. W. (1995). 1/f noise in human cognition. Science, 267, 1837–1839. doi:10.1126/science.7892611
Gisiger, T. (2001). Scale invariance in biology: Coincidence or footprint of a universal mechanism? Biological Reviews of the Cambridge Philosophical Society, 76, 161–209. doi:10.1017/S1464793101005607
Gnedenko, B. V., & Kolmogorov, A. N. (1954). Limit distributions for sums of independent random variables. Reading, MA: Addison-Wesley.
Gottschalk, A., Bauer, M. S., & Whybrow, P. C. (1995). Evidence of chaotic mood variation in bipolar disorder. Archives of General Psychiatry, 52, 947–959. doi:10.1001/archpsyc.1995.03950230061009
Haslett, J., & Raftery, A. E. (1989). Space-time modelling with long-memory dependence: Assessing Ireland's wind power resource. Applied Statistics, 38, 1–50. doi:10.2307/2347679
Hausdorff, J. M., Peng, C. K., Ladin, Z., Wei, J. Y., & Goldberger, A. R. (1995). Is walking a random walk? Evidence for long-range correlations in stride interval of human gait. Journal of Applied Physiology, 78, 349–358.
Hu, K., Ivanov, P. C., Chen, Z., Carpena, P., & Stanley, H. E. (2001). Effect of trends on detrended fluctuation analysis. Physical Review E, 64, 011114. doi:10.1103/PhysRevE.64.011114
Hurst, H. E. (1951). Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116, 770–808.
Ignaccolo, M., Allegrini, P., Grigolini, P., Hamilton, P., & West, B. J. (2004). Scaling in non-stationary time series (I & II). Physica A: Statistical and Theoretical Physics, 336, 595–622. doi:10.1016/j.physa.2003.12.034
Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge, England: Cambridge University Press.
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795. doi:10.1080/01621459.1995.10476572
Kello, C. T., Anderson, G. G., Holden, J. G., & Van Orden, G. C. (2007). The pervasiveness of 1/f scaling in speech reflects the metastable basis of cognition. Cognitive Science, 32, 1217–1231. doi:10.1080/03640210801944898
Kello, C. T., Anderson, G. G., Holden, J. G., & Van Orden, G. C. (2008). The emergent coordination of cognitive function. Journal of Experimental Psychology: General, 136, 551–568. doi:10.1037/0096-3445.136.4.551
Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79–86. doi:10.1214/aoms/1177729694
Liang, J. R., Moshel, S., Zivotofsky, A., Caspi, A., Engbert, R., Kliegl, R., & Havlin, S. (2005). Scaling of horizontal and vertical fixational eye movements. Physical Review E, 71, 031909. doi:10.1103/PhysRevE.71.031909
Mandelbrot, B. B., & Van Ness, J. W. (1968). Fractional Brownian motions, fractional noises, and applications. SIAM Review, 10, 422–437. doi:10.1137/1010093
Mandelbrot, B. B., & Wallis, J. R. (1969). Computer experiments with fractional Gaussian noises. Water Resources Research, 5, 228–241. doi:10.1029/WR005i001p00228
Mergenthaler, K., & Engbert, R. (2007). Modeling the control of fixational eye movements with neurophysiological delays. Physical Review Letters, 98, 138104. doi:10.1103/PhysRevLett.98.138104
Moscoso del Prado Martín, F. (2011). Causality, criticality, and reading words: Distinct sources of fractal scaling in behavioral sequences. Cognitive Science, 35, 785–837. doi:10.1111/j.1551-6709.2011.01184.x
Peng, C. K., Buldyrev, S. V., Havlin, S., Simons, M., Stanley, H. E., & Goldberger, A. L. (1994). Mosaic organization of DNA nucleotides. Physical Review E, 49, 1685–1689. doi:10.1103/PhysRevE.49.1685
R Development Core Team. (2005). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Reddi, B. A. J., & Carpenter, R. H. S. (2000). The influence of urgency on decision time. Nature Neuroscience, 3, 827–830. doi:10.1038/77739
Riley, M. A., & Van Orden, G. C. (Eds.). (2005). Tutorials in contemporary nonlinear methods for the behavioral sciences. Retrieved from National Science Foundation website: http://www.nsf.gov/sbe/bcs/pac/nmbs/nmbs.jsp
Scafetta, N., & Grigolini, P. (2002). Scaling detection in time series: Diffusion entropy analysis. Physical Review E, 66, 036130. doi:10.1103/PhysRevE.66.036130
Schmidt, R. C., Beek, P. J., Treffner, P. J., & Turvey, M. T. (1991). Dynamical substructure of coordinated rhythmic movements. Journal of Experimental Psychology: Human Perception and Performance, 17, 635–651. doi:10.1037/0096-1523.17.3.635
Schreiber, T., & Schmitz, A. (2000). Surrogate time series. Physica D: Nonlinear Phenomena, 142, 346–382. doi:10.1016/S0167-2789(00)00043-9
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464. doi:10.1214/aos/1176344136
Shelhamer, M., & Joiner, W. (2003). Saccades exhibit abrupt transition between reactive and predictive; predictive saccade sequences have long-term correlations. Journal of Neurophysiology, 90, 2763–2769. doi:10.1152/jn.00478.2003
Sivia, D. S., & Skilling, J. (2006). Data analysis: A Bayesian tutorial. Cambridge, England: Cambridge University Press.
Slifkin, A. B., & Newell, K. M. (1998). Is variability in human performance a reflection of system noise? Current Directions in Psychological Science, 7, 170–177. doi:10.1111/1467-8721.ep10836906
Stroe-Kunold, E., Stadnytska, T., Werner, J., & Braun, S. (2009). Estimating long-range dependence in time series: An evaluation of estimators implemented in R. Behavior Research Methods, 41, 909–923. doi:10.3758/BRM.41.3.909
Taqqu, M. S., Teverovsky, V., & Willinger, W. (1995). Estimators for long-range dependence: An empirical study. Fractals, 3, 785–798. doi:10.1142/S0218348X95000692
Theiler, J., Eubank, S., Longtin, A., Galdrikian, B., & Farmer, J. D. (1992). Testing for nonlinearity in time series: The method of surrogate data. Physica D: Nonlinear Phenomena, 58, 77–94. doi:10.1016/0167-2789(92)90102-S
Thornton, T. L., & Gilden, D. L. (2005). Provenance of correlations in psychological data. Psychonomic Bulletin & Review, 12, 409–441. doi:10.3758/BF03193785
Van Orden, G. C., Holden, J. G., & Turvey, M. T. (2003). Self-organization of cognitive performance. Journal of Experimental Psychology: General, 132, 331–350. doi:10.1037/0096-3445.132.3.331
Van Orden, G. C., Holden, J. G., & Turvey, M. T. (2005). Human cognition and 1/f scaling. Journal of Experimental Psychology: General, 134, 117–123. doi:10.1037/0096-3445.134.1.117
von Bünau, P., Meinecke, F. C., Király, F. C., & Müller, K. (2009). Finding stationary subspaces in multivariate time series. Physical Review Letters, 103, 214101–214104. doi:10.1103/PhysRevLett.103.214101
Wagenmakers, E.-J., Farrell, S., & Ratcliff, R. (2004). Estimation and interpretation of 1/f^α noise in human cognition. Psychonomic Bulletin & Review, 11, 579–615. doi:10.3758/BF03196615
Wagenmakers, E.-J., Farrell, S., & Ratcliff, R. (2005). Human cognition and a pile of sand: A discussion on serial correlations and self-organized criticality. Journal of Experimental Psychology: General, 134, 108–116. doi:10.1037/0096-3445.134.1.108

Appendix

Derivation of the Expressions for the BAS Criteria and ML Estimator

To obtain the final evidence comparing the hypotheses, it is necessary to combine the evidence values from all the summing orders. Notice that all the running sums were computed from a single realization of the series, so a plain sum would exaggerate the evidence. In order to rescale the evidence, one should consider that the tth running sum was computed on t − 1 elements. However, our initial sample had just N elements. Therefore, the rescaling should be a factor ρN taking the sum from considering ∑ (t − 1) elements (summed over t = 1, …, N) into considering just N. For simplicity, we consider here also the first-order running sum (t = 1), which provides no evidence, ℓ1,2(x|1) = 0, but simplifies the calculations; this leaves

$$\rho_N = \frac{N}{\sum_{t=1}^{N} (t-1)} = \frac{2}{N-1},$$

so that the combined evidence is

$$\ell_{1,2}(x) = \rho_N \sum_{t=1}^{N} \ell_{1,2}(x \mid t) = \frac{2}{N-1} \sum_{t=1}^{N} \ell_{1,2}(x \mid t). \tag{A1}$$

This rescaling is also important because it enables results from different experiments to be directly combined; the evidence each provides will be directly proportional to the number of points considered. Note, however, that splitting one long sequence into many shorter ones will result in a loss of information. Although the combined length of the series is the same, all information contained in the lower frequencies will be lost. Although the overall contribution to the evidence of the lower frequencies is relatively small, their per-item contribution is usually greater than that of the higher frequencies, making the whole series more informative than the sum of its parts. Nevertheless, this additivity enables the use of a repeated-measures approach when it is not possible to collect longer sequences.

Two Simple Hypotheses

The first issue is to compute ℓ1,2(x|t). One can write this in terms of the tth running sum y_t,

$$\ell_{1,2}(x \mid t) = \ell_{1,2}(y_t) = \ell(y_t \mid \delta_1) - \ell(y_t \mid \delta_2), \tag{A2}$$

where ℓ refers to the log-likelihood. Therefore, one needs an expression for the log-likelihood of x as a function of δ, which is usually not available. One can make use of the scaling property in Equation 4 to obtain such an expression. If y_t is the tth-order running sum of x,

$$\ell[y_t(i) \mid \delta] = \ln p[y_t(i) \mid \delta] = -\delta \ln t + \ln f_L[y_t(i)\, t^{-\delta}], \tag{A3}$$


which is a function of δ. On the other hand, the overlap between running sum windows introduces much redundancy in the combined evidence; that is, the elements of the running sum are far from being independent. A simple way to correct this is to rescale the contribution of each sum to the log-likelihood from its length of N − t + 1 elements down to its maximum effective length of N/t independent elements. The combined evidence is therefore

$$\ell_{1,2}(y_t) \approx \frac{N}{(N-t+1)\,t} \sum_{i=1}^{N-t+1} \left\{ \ell[y_t(i) \mid \delta_1] - \ell[y_t(i) \mid \delta_2] \right\}. \tag{A4}$$
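As a concrete aid to reading Equations A1–A4, the following R sketch computes the running sums; it assumes (consistently with the N − t + 1 summands of Equation A4) that y_t(i) is the sum over the overlapping window x(i), …, x(i + t − 1). The independence correction discussed next applies on top of these sums.

    ## Sketch: the t-th order running sums y_t of a series x, assuming
    ## overlapping windows of length t (N - t + 1 of them, as in Equation A4).
    running_sum <- function(x, t) {
      as.numeric(stats::filter(x, rep(1, t), sides = 1))[t:length(x)]
    }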

This step relies on an approximation under an assumption of independence (i.e., part of the dependence is already dealt with by the rescaling; this independence is not guaranteed by fractals, but the approximation does not have too strong effects on the results; see Footnote 15). Combining Equation A3 with Equation A4, the log-likelihood is

$$\ell(y_t \mid \delta) \approx -\frac{N}{t}\,\delta \ln t + \frac{N}{(N-t+1)\,t} \sum_{i=1}^{N-t+1} \ln f_L[y_t(i)\, t^{-\delta}]. \tag{A5}$$

Finally, using Equation A5 in Equation A2 gives the expression for the log-likelihood ratio

$$\ell_{1,2}(x \mid t) \approx \frac{N}{t}\,(\delta_2 - \delta_1) \ln t + \frac{N}{(N-t+1)\,t} \sum_{i=1}^{N-t+1} \ln \frac{f_L[y_t(i)\, t^{-\delta_1}]}{f_L[y_t(i)\, t^{-\delta_2}]}. \tag{A6}$$

The limit distribution f_L can sometimes be difficult to know. Further, the limit distribution is only observed in the infinite time limit. The original distribution of the data f_X is progressively being changed into f_L at higher diffusion times (details below). Therefore, one can approximate f_L using f_X without much error (the error bounds are provided below). In a Gaussian process, f_X is the normal density function. Putting the normal density function into Equation A6 and simplifying, one obtains Equation 8 from the main text.

Including Composite Hypotheses

In a more general case, when a candidate value for the scaling exponent δ is not available but rather a range of possible values is provided, our task consists in comparing the simple hypothesis δ = δ0 and the composite hypothesis δ1 ≤ δ ≤ δ2. As before, the evidence for each hypothesis is expressed in terms of its t-order running sum. In the likelihood for the composite hypothesis, δ is now a nuisance parameter, and the estimation of the likelihood requires integrating it away;

$$p(y_t \mid \delta_1 \leq \delta \leq \delta_2) = \int_{\delta_1}^{\delta_2} p(y_t \mid \delta)\, p(\delta \mid t)\, d\delta \tag{A7a}$$

$$= \frac{1}{\delta_2 - \delta_1} \int_{\delta_1}^{\delta_2} p(y_t \mid \delta)\, d\delta. \tag{A7b}$$

Assuming nothing about the value of δ apart from it falling in the interval [δ1, δ2] requires the use of an uninformative uniform prior for it, p(δ|t) = 1/(δ2 − δ1), changing Equation A7a into Equation A7b. For each individual point in the running sum, the scaling condition enables writing the likelihood as

$$p[y_t(i) \mid \delta_1 \leq \delta \leq \delta_2] = \frac{1}{\delta_2 - \delta_1} \int_{\delta_1}^{\delta_2} p[y_t(i) \mid \delta]\, d\delta \tag{A8a}$$

$$= \frac{1}{\delta_2 - \delta_1} \int_{\delta_1}^{\delta_2} \frac{1}{t^{\delta}}\, f_L\!\left[\frac{y_t(i)}{t^{\delta}}\right] d\delta. \tag{A8b}$$

Now, if we change the variables in Equation A8b by

$$z = y_t(i)\, t^{-\delta}, \qquad d\delta = -\frac{dz}{z \ln t} = -\frac{t^{\delta}\, dz}{y_t(i) \ln t},$$

we obtain

$$p[y_t(i) \mid \delta_1 \leq \delta \leq \delta_2] = \frac{1}{(\delta_2 - \delta_1)\, y_t(i) \ln t} \int_{y_t(i)\, t^{-\delta_2}}^{y_t(i)\, t^{-\delta_1}} f_L(z)\, dz \tag{A9a}$$

$$= \frac{1}{(\delta_2 - \delta_1)\, y_t(i) \ln t} \left\{ F_L[y_t(i)\, t^{-\delta_1}] - F_L[y_t(i)\, t^{-\delta_2}] \right\}, \tag{A9b}$$

where $F_L(x) = \int_{-\infty}^{x} f_L(z)\, dz$ is the cumulative probability function of f_L.

15 The use of this independence approximation introduces a bias against fractal scaling whose maximal expected magnitude is around 1% of the magnitude of the total evidence against fractal scaling.


Combining the logarithm of Equation A9b with the log-likelihood for a simple hypothesis given in Equation A5, the expression for the log-likelihood ratio in favor of the composite hypothesis provided by summing order t > 1 can be written as

$$\ell(x \mid t) = \ell(y_t \mid \delta_1 < \delta < \delta_2) - \ell(y_t \mid \delta = \delta_0)$$

$$= \frac{N}{t(N-t+1)} \sum_{i=1}^{N-t+1} \ln p[y_t(i) \mid \delta_1 \leq \delta \leq \delta_2] \; - \; \ell(y_t \mid \delta = \delta_0)$$

$$= \frac{N}{t}\left[\delta_0 \ln t - \ln(\delta_2 - \delta_1)\right] + \frac{N}{(N-t+1)\,t} \sum_{i=1}^{N-t+1} \ln \frac{F_L[y_t(i)\, t^{-\delta_1}] - F_L[y_t(i)\, t^{-\delta_2}]}{y_t(i)\, \ln(t)\, f_L[y_t(i)\, t^{-\delta_0}]}. \tag{A10}$$

If one uses f_X in place of f_L, then in the Gaussian case this leads to Equation 9 in the main text. The case comparing two composite hypotheses can be derived trivially by subtracting two different instances of Equation A10 relative to an arbitrary common baseline δ0, with the Gaussian case leading to Equation 10.

Estimation of the Scaling Exponent

In the case when one has sufficient evidence for the presence of FS, it is often necessary to determine what this scaling exponent is. A simple way to obtain such an estimate of δ is to find the value δ̂ that maximizes the log posterior probability of the exponent given the observed data:

$$\hat{\delta}_0 = \arg\max_{\delta} \{\ell(\delta \mid x)\}.$$

By application of Bayes' theorem, and as before, assuming a uniform prior for δ,

$$\hat{\delta}_0 = \arg\max_{\delta} \{\ell(x \mid \delta) + \ell(\delta)\} = \arg\max_{\delta} \{\ell(x \mid \delta)\}.$$

Note that I have omitted the normalization constant, as it does not affect the location of the maximum or its error estimation. The ML method relies on finding the extrema of the log-likelihood function. The extrema of this function are found where its first derivative with respect to δ is zero, and they correspond to maxima if the second derivative at those points is negative. If δ̂(t) is a maximum of the likelihood then, assuming that the log-likelihood is peaked around the single maximum δ̂(t) with a shape that is roughly Gaussian, the standard error of the estimate is approximated by

$$\varepsilon(t)^2 = -\left[\left.\frac{\partial^2 \ell(y_t \mid \delta)}{\partial \delta^2}\right|_{\delta = \hat{\delta}}\right]^{-1}. \tag{A13}$$

Using the approximation of f_L by f_X described above, the explicit expressions for δ̂0(t) and ε(t) can be derived analytically for some distributions. A problem that arises is that by this method one obtains N − 1 different estimators δ̂0(t), and there is still the need to choose a single value. A simple—but somewhat heuristic—method to choose the best value of δ̂ given many estimates is to choose the one that is the most accurate, that is, the one with the smallest estimation error;

$$\hat{\delta}_0 = \hat{\delta}_0(t^{\star}), \qquad t^{\star} = \arg\min_t \{\varepsilon(t)\}. \tag{A14}$$

If f_X is a Gaussian distribution with zero mean and unit variance, putting the standard normal into Equation A5 gives the log-likelihood

$$\ell(y_t \mid \delta) = -\frac{N}{t}\,\delta \ln t - \frac{N}{(N-t+1)\,t} \sum_{i=1}^{N-t+1} \left[ \frac{y_t(i)^2}{2\,t^{2\delta}} + \ln\sqrt{2\pi} \right]. \tag{A15}$$

Differentiating by δ and equating to zero,

$$\frac{\partial \ell(y_t \mid \delta)}{\partial \delta} = \frac{N \ln t}{t} \left[ -1 + \frac{t^{-2\delta}}{N-t+1} \sum_{i=1}^{N-t+1} y_t(i)^2 \right] = 0, \tag{A16}$$

whose solution is the explicit (uncorrected) expression for the estimator

$$\hat{\delta}_0(t) = \frac{\ln\left(\sum_{i=1}^{N-t+1} y_t(i)^2\right) - \ln(N-t+1)}{2 \ln t} = \frac{\ln s_t}{\ln t}, \tag{A17}$$

and

$$\varepsilon(t)^2 = \frac{(N-t+1)\, t^{2\hat{\delta}_0(t)+1}}{2N \ln^2(t) \sum_{i=1}^{N-t+1} y_t(i)^2} = \frac{t}{2N \ln^2 t}, \tag{A18}$$

where Equation A18 is obtained by applying Equation A13 and simplifying using Equation A17, and s_t refers to the sample standard deviation of the tth summing order:

$$s_t^2 = \frac{1}{N-t+1} \sum_{i=1}^{N-t+1} y_t(i)^2. \tag{A19}$$
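Since ε(t)² in Equation A18 varies with the summing order only through t/ln² t, a quick R check (a sketch anticipating the analytical minimum derived next) confirms where the minimum over integer orders falls:

    ## epsilon(t)^2 is proportional to t / (ln t)^2 (Equation A18).
    eps2 <- function(t) t / log(t)^2
    round(c(eps2(7), eps2(8)), 4)   # 1.8486 1.8501: order 7 wins narrowly
    (2:20)[which.min(eps2(2:20))]   # 7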

Notice that the expression for the error in Equation A18 is independent of the actual data. One can thus find where its minimum t⋆ lies in all cases,

$$\left.\frac{\partial \varepsilon(t)}{\partial t}\right|_{t = t^{\star}} = \frac{1}{\sqrt{2 N t^{\star}}\, \ln t^{\star}} \left( \frac{1}{2} - \frac{1}{\ln t^{\star}} \right) = 0, \qquad t^{\star} \geq 2, \tag{A20}$$


Figure A1. Deviation from normality: numerically estimated KLD between the Gaussian distribution (f_X) and the distribution of the running sums of increasing order. For each value of H, the points were estimated using simulated fGn of 5 · 10^5 elements. KLD = Kullback-Leibler divergence; fGn = fractional Gaussian noise.

whose only solution is t⋆ = e² ≈ 7.39. Comparing the values of the error at the two closest integers to t⋆, ε(7) and ε(8), one finds that the expected minimum error will always be at the seventh running order, and therefore one can directly set δ̂0 = δ̂0(7). Not surprisingly, from the expression of the estimator given in Equation A17, one nearly recovers directly the variance scaling relation given in Equation 2, corresponding to the properties of stochastic fractals originally detected by Hurst (1951) and Mandelbrot and Van Ness (1968). One can then correct ad hoc the small difference between both equations. If one replaces the sample variance by the estimated population variance, one obtains

$$\hat{\delta}_0(t) = \frac{\ln \hat{\sigma}_t}{\ln t} = \frac{\ln\left(\sum_{i=1}^{N-t+1} y_t(i)^2\right) - \ln(N-t)}{2 \ln t}, \tag{A21}$$

which is valid for all t < N. This corrected estimator is therefore very similar to what one would obtain through the slope of a regression between ln t and ln σ̂_t, which is used in many methods. The difference is that instead of a regression across many values of t, in this method the value of t that will provide the best estimator is directly selected analytically. This difference leads to a slight improvement over the regression-based estimator and also provides a direct estimation of the expected error.

Error in Approximating f_L by f_X in the Gaussian Case

Calculating the evidences in the expressions above requires an analytical expression for the limit distribution f_L. Strictly speaking, the scaling condition applies exactly only in the infinite time limit. As mentioned above, the process of convergence to this limit is one by which the original distribution of the data f_X is gradually being transformed into the limit distribution f_L. Figure A1 illustrates this point. It plots the departure from normality of the diffusion associated with an fGn as time increases. As expected, one finds a progressive increase in the Kullback-Leibler divergence (KLD; Kullback & Leibler, 1951) between f_X and the observed distribution. Even for this simple fGn case, full convergence to f_L is not attained at very large summing orders, but the divergence from normality is rather small (see Footnote 16).

16 In fact, for the Gaussian case, one could perhaps get a slightly better estimator using for f_L the Tsallis generalized maximum-entropy distribution, setting Tsallis' entropic index to q = 3 − 1/δ (cf. Borland, 1998; Cajueiro & Tabak, 2007). In practice, this would at best introduce a very small improvement for very long summing orders but would greatly complicate all derivations.

By the scaling property, if δ⋆ is the "true" scaling exponent, the expected maximum log-likelihood of the sum y_t is

$$\left\langle \ell[y_t \mid N, t, \delta^{\star}] \right\rangle_{f_L} = -\frac{N}{t}\,\delta^{\star} \ln t + \frac{N}{t} \int_{-\infty}^{\infty} \frac{1}{t^{\delta^{\star}}}\, f_L\!\left(\frac{y_t}{t^{\delta^{\star}}}\right) \ln f_L\!\left(\frac{y_t}{t^{\delta^{\star}}}\right) dy_t \tag{A22a}$$

$$= -\frac{N}{t}\,\delta^{\star} \ln t + \frac{N}{t} \int_{-\infty}^{\infty} f_L(z) \ln f_L(z)\, dz, \tag{A22b}$$

where the change from Equation A22a into Equation A22b is obtained by a change of variable z = y_t/t^{δ⋆}, dy_t = t^{δ⋆} dz. Using f_X in place of f_L to estimate the likelihood would have resulted in an estimate at the maximum,

$$\left\langle \ell[y_t \mid N, t, \delta^{\star}] \right\rangle_{f_X} = -\frac{N}{t}\,\delta^{\star} \ln t + \frac{N}{t} \int_{-\infty}^{\infty} f_L(z) \ln f_X(z)\, dz, \tag{A23}$$

so that the estimation error would have been

$$\Delta(t) = \left\langle \ell[y_t \mid N, t, \delta^{\star}] \right\rangle_{f_L} - \left\langle \ell[y_t \mid N, t, \delta^{\star}] \right\rangle_{f_X} = \frac{N}{t} \int_{-\infty}^{\infty} f_L(z) \ln \frac{f_L(z)}{f_X(z)}\, dz = \frac{N}{t}\, K[f_L \| f_X], \tag{A24}$$

where K[·‖·] refers to the KLD between two distributions. Equation A24 reflects the expected error in estimating the log-likelihood. In general, for two summing orders t1 < t2, it should hold that 0 ≤ t1Δ(t1)/N ≤ t2Δ(t2)/N ≤ K[f_L‖f_X]. Notice that as the likelihood is always sharply peaked around δ⋆, if the true maximum lies within the studied interval (δ1 < δ⋆ < δ2), one can very well approximate the integral of the likelihood as an unnormalized Gaussian distribution with standard deviation ε(t) from Equation A18 (this is sometimes referred to as Laplace's approximation, and it follows directly from a second-order Taylor expansion of the log-likelihood around δ⋆; see, e.g., Sivia & Skilling, 2006). In this case, with the usual uniform prior on δ, the log integrated likelihood is accurately approximated as

$$\ell[y_t \mid N, t, \delta_1 \leq \delta \leq \delta_2]_{f_L} = \ln \int_{\delta_1}^{\delta_2} p(y_t \mid N, t, \delta)\, p(\delta \mid N, t)\, d\delta \approx \left\langle \ell[y_t \mid N, t, \delta^{\star}] \right\rangle_{f_L} + \ln \varepsilon(t) + \frac{1}{2} \ln 2\pi - \ln(\delta_2 - \delta_1). \tag{A25}$$

On the other hand, as the ML estimate is going to be extremely close to the true maximum (as mentioned, it recovers exactly the variance scaling relation of Equation 2), the log-likelihood is most likely going to be underestimated. Using f_X instead of f_L to estimate the log-likelihood of the running sums should result in an underestimation of the likelihood, with the amount of underestimation bounded by the KLD between the two distributions (the KLD between two distributions is always nonnegative; Kullback & Leibler, 1951). In the fGn case, f_X is the Gaussian distribution, and the approximation is exact for δ⋆ = .5; otherwise, the underestimation increases with δ⋆. Therefore, comparing the approximated log-likelihoods for δ⋆ ≠ .5 against the hypothesis δ⋆ = .5 will be slightly biased in favor of the latter null hypothesis (which is perhaps desirable).

Received March 4, 2010
Revision received June 28, 2011
Accepted July 11, 2011
