Downloaded from http://jnnp.bmj.com/ on April 13, 2015 - Published by group.bmj.com

JNNP Online First, published on March 26, 2015 as 10.1136/jnnp-2015-310307 Neurosurgery

RESEARCH PAPER

Improving fMRI reliability in presurgical mapping for brain tumours

M Tynan R Stevens,1,2 David B Clarke,3,4 Gerhard Stroink,1 Steven D Beyea,1,2 Ryan CN D’Arcy5

1 Department of Physics, Dalhousie University, Halifax, Nova Scotia, Canada
2 Biomedical Translational Imaging Centre, IWK Health Sciences Centre, Halifax, Nova Scotia, Canada
3 Division of Neurosurgery, QEII Health Sciences Centre, Halifax, Nova Scotia, Canada
4 Division of Surgery, QEII Health Sciences Centre, Halifax, Nova Scotia, Canada
5 Department of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada

Correspondence to Ryan CN D’Arcy, Surrey Memorial Hospital, NeuroTech Lab, Barham Building, 13750 96th Avenue, Surrey, British Columbia, Canada V3V 1Z2; [email protected]

Received 6 January 2015
Revised 17 February 2015
Accepted 27 February 2015

ABSTRACT

Purpose Functional MRI (fMRI) is becoming increasingly integrated into clinical practice for presurgical mapping. Current efforts are focused on validating data quality, with reliability being a major factor. In this paper, we demonstrate the utility of a recently developed approach that uses receiver operating characteristic-reliability (ROC-r) to: (1) identify reliable versus unreliable data sets; (2) automatically select processing options to enhance data quality; and (3) automatically select individualised thresholds for activation maps.

Methods Presurgical fMRI was conducted in 16 patients undergoing surgical treatment for brain tumours. Within-session test–retest fMRI was conducted, and ROC-reliability of the patient group was compared to a previous healthy control cohort. Individually optimised preprocessing pipelines were determined to improve reliability. Spatial correspondence was assessed by comparing the fMRI results to intraoperative cortical stimulation mapping, in terms of the distance to the nearest active fMRI voxel.

Results The average ROC-r reliability for the patients was 0.58±0.03, as compared to 0.72±0.02 in healthy controls. For the patient group, this increased significantly to 0.65±0.02 by adopting optimised preprocessing pipelines. Co-localisation of the fMRI maps with cortical stimulation was significantly better for more reliable versus less reliable data sets (8.3±0.9 vs 29±3 mm, respectively).

Conclusions We demonstrated ROC-r analysis for identifying reliable fMRI data sets, choosing optimal postprocessing pipelines, and selecting patient-specific thresholds. Data sets with higher reliability also showed closer spatial correspondence to cortical stimulation. ROC-r can thus identify poor fMRI data at time of scanning, allowing for repeat scans when necessary. ROC-r analysis provides optimised and automated fMRI processing for improved presurgical mapping.

To cite: Stevens MTR, Clarke DB, Stroink G, et al. J Neurol Neurosurg Psychiatry Published Online First: [please include Day Month Year] doi:10.1136/jnnp-2015-310307

INTRODUCTION

Presurgical mapping validity and reliability

Functional MRI (fMRI) is increasingly being used to map eloquent cortex prior to surgical treatment for brain tumours.1 2 The goal of presurgical mapping is to identify functional brain regions near the tumour, to plan surgical approach, identify risks and potentially render intraoperative electrocortical stimulation (CS) unnecessary. fMRI is attractive for this purpose due to non-invasiveness, repeatability, high spatial resolution and broad availability.3 4

Validated presurgical fMRI protocols were demonstrated first for sensory-motor function,5 followed by language localisation6 7 and more recently memory mapping.8 These validation studies compare fMRI localisation with a gold standard measure such as CS. The concordance of fMRI and CS is influenced by the matching criteria used,9 field strength,2 presurgical tasks employed10 and the threshold used during fMRI analysis.11 A recent review asserts that this heterogeneity has led to widely varying estimates of the accuracy of fMRI compared with CS.2

fMRI results have a high degree of variability,12–14 and although intersubject variability is higher than intrasubject variability, a single-scan fMRI experiment includes a substantial number of false positives and false negatives. Repeating scans is thus useful for producing more reliable activation maps for an individual patient. For example, Beisteiner et al15 restricted fMRI activity to only those voxels that survived high correlation thresholds in all repetitions of a motor task. This resulted in fewer active voxels, with improved reliability and closer spatial correspondence to CS results.

Variability in fMRI can also be mitigated by using individualised data-driven postprocessing strategies. Gonzalez-Ortiz et al16 showed that built-in scanner analysis software was often sufficient for presurgical mapping, but that third-party packages offered superior flexibility and reduced noise, and were preferred by radiologists. However, they were unable to provide quantitative guidelines for determining the best pipeline for a given fMRI data set. Quality assessment tools are clearly needed in order to objectively determine the optimal processing settings on a case-by-case basis. In this context, tools such as NPAIRS (Non-parametric Prediction, Activation, Influence and Reproducibility reSampling),17–19 and empirical ROC (receiver operating characteristic) analysis20–22 are used to determine optimised, subject-specific processing pipelines. These techniques are especially important in patient populations, as clinical disorders generally decrease fMRI reliability.12 23 24

Thresholds for presurgical mapping

In fMRI analysis, statistical thresholds are used to estimate the extent of activation, impacting reliability and accuracy of the resulting maps.25 It has been argued that fixed statistical thresholds do not account for individual variability, differences in scanning hardware or software strategies, functional tasks or modalities or habituation to testing conditions.2 10 11 26 Our group also showed that fixed error rate thresholds do not provide optimal reliability for individual participants,22 and validation studies of concordance with CS have demonstrated that optimal thresholds vary between individuals and functional tasks.10 11 In practice, manual adjustment of threshold levels is often used, with an implicit risk of inter-rater differences. Data-driven thresholds address this problem by using quantitative and reproducible methods.20 22 27 These methods are sensitive to variations in fMRI activation levels, and have demonstrated reliable fMRI results across a variety of experimental conditions. Crucially, these approaches can be applied at the individual patient level.

Stevens MTR, et al. J Neurol Neurosurg Psychiatry 2015;0:1–8. doi:10.1136/jnnp-2015-310307

Copyright Article author (or their employer) 2015. Produced by BMJ Publishing Group Ltd under licence.

The ROC-reliability framework

We recently introduced a ROC-reliability (ROC-r) analysis framework,22 which summarises test–retest reliability through plots of the area under the ROC curve (AUC) versus analysis threshold (figure 1C). We demonstrated that ROC-r is useful for assessing fMRI reliability, selecting processing pipelines and determining optimal analysis thresholds. The ROC-r method is uniquely capable of automating the production of activation maps, producing reliable push-button results.

In this study, we demonstrate the application of ROC-r fMRI analysis to a group of patients who also received intraoperative CS mapping. The study was designed to address the following three hypotheses: (1) ROC-r reliability will be lower for patients compared to healthy controls;12 (2) reliability of single-subject maps will be improved by optimising analysis pipelines; and (3) when the ROC-r optimised fMRI activations are compared with CS mapping results, spatial correspondence will be higher in data sets with higher reliability. We expect ROC-r to be beneficial for presurgical mapping, by combining clinically relevant quality assurance with push-button activation map production in a single framework.

METHODS

Participants

Sixteen patients (39±13 years of age; 9 female, 7 male; 13 right-handed, 2 left-handed, 1 mixed-handed) receiving surgical intervention for brain tumours volunteered for the fMRI study. All volunteers underwent presurgical fMRI, and most received CS during surgery (n=13). This study was done in compliance with the local research ethics board (CDHA REB, Halifax, Nova Scotia, Canada), and participants provided informed consent prior to enrolment. Tumour types and locations were heterogeneous. For a complete list of age, sex, handedness, tumour location and type refer to table 1. A control group for this study was previously described.22

Table 1 Characteristics for the 16 patient volunteers included in this study

| Patient | Age | Sex | Hand | Tumour type | Tumour location |
|---|---|---|---|---|---|
| 1 | 35 | F | Rt | Mixed oligoastrocytoma (Gr. IV) | Left inferior frontal |
| 2 | 46 | F | Rt | Meningioma | Left anterior temporal |
| 3 | 24 | F | Mx | Anaplastic oligoastrocytoma (Gr. III) | Left anterior temporal |
| 4 | 62 | M | Lt | Glioblastoma multiforme (Gr. IV) | Left inferior frontal/anterior parietal |
| 5 | 20 | M | Rt | Glioblastoma multiforme (Gr. IV) | Left inferior parietal |
| 6 | 26 | F | Rt | Oligodendroglioma (Gr. II) | Right inferior frontal |
| 7 | 51 | F | Rt | Cavernous angioma | Right frontal/central |
| 8 | 47 | F | Lt | Ruptured cyst | Right superior-posterior temporal |
| 9 | 48 | F | Rt | Glioblastoma multiforme (Gr. IV) | Left inferior frontal |
| 10 | 45 | F | Rt | Glioblastoma multiforme (Gr. IV) | Left inferior frontal |
| 11 | 22 | M | Rt | Diffuse astrocytoma (Gr. II) | Left frontal/temporal |
| 12 | 32 | M | Rt | Glioblastoma multiforme (Gr. IV) | Left anterior temporal |
| 13 | 44 | M | Rt | Oligoastrocytoma (Gr. II) | Left anterior temporal |
| 14 | 23 | M | Rt | Dysembryoplastic neuroepithelial | Left inferior temporal |
| 15 | 59 | F | Rt | Glioblastoma multiforme (Gr. IV) | Right inferior frontal |
| 16 | 41 | M | Rt | Pleomorphic xanthoastrocytoma (Gr. II) | Left middle temporal |

CS, cortical stimulation (performed in 13 of the 16 patients); F, female; Gr., grade; Lt, left; M, male; Mx, mixed; Rt, right.

MRI acquisition details

All 16 volunteers were scanned using a 4 T scanner (Varian INOVA). Structural images were collected with an MP-FLASH sequence: TI (inversion time)=500 ms, TR (repetition time)=10 ms, TE (echo time)=5 ms, α=11°, 256×256 matrix, 190 slices, and 0.94×0.94×1 mm voxels (FOV (field of view)=24×24×19 cm). Functional images were collected with a two-shot spiral-out sequence, using TR=2 s, TE=15 ms, α=90°, 64×64 matrix, 22–25 slices, and 3.75×3.75×4.0 mm voxels, with a 0.5 mm gap (FOV=24×24×10–11 cm). Test–retest imaging was performed within-session. A variety of tasks were included in this study, depending on the brain tumour location and planned CS investigations for each patient (table 2).
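As a quick consistency check on the acquisition parameters reported under MRI acquisition details, the field of view can be recomputed from the matrix size, voxel size and slice gap. The short Python sketch below is illustrative only: the helper name `extent_cm` is hypothetical, and it assumes that the 0.5 mm gap applies between neighbouring slices only (n−1 gaps for n slices), which the text does not state.

```python
def extent_cm(n, size_mm, gap_mm=0.0):
    """Extent in cm spanned by n voxels (or slices) of size_mm.

    Assumption: gap_mm is inserted between neighbours only, giving n - 1 gaps.
    """
    return (n * size_mm + (n - 1) * gap_mm) / 10.0


if __name__ == "__main__":
    # Structural: 256 x 256 matrix, 190 slices, 0.94 x 0.94 x 1 mm voxels
    structural = (extent_cm(256, 0.94), extent_cm(256, 0.94), extent_cm(190, 1.0))

    # Functional: 64 x 64 matrix, 3.75 x 3.75 x 4.0 mm voxels, 0.5 mm gap,
    # with 22 to 25 slices depending on the patient
    functional_in_plane = extent_cm(64, 3.75)
    functional_through_plane = (extent_cm(22, 4.0, 0.5), extent_cm(25, 4.0, 0.5))

    print(structural, functional_in_plane, functional_through_plane)
```

Under these assumptions, the computed extents agree with the reported fields of view (24×24×19 cm structural, 24×24×10–11 cm functional) after rounding.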

fMRI analysis

fMRI analysis was performed using the AFNI software package,28 in combination with tools written in the Python programming language. Initial preprocessing steps were applied universally, including rigid body motion correction. Segmentation isolated the brain from the functional and anatomical images. Down-sampled anatomical images were registered to the functional image using a 12-parameter affine transformation. The remaining preprocessing options were optimised individually, including: (1) spatial smoothing (3, 6 or 9 mm full width at half maximum (FWHM)), (2) motion parameter regression (MPR: on/off), and (3) autocorrelation correction (AC: on/off). All combinations (n=12) of these options were analysed using the ROC-r methodology described below. The default pipeline (6 mm smoothing, no MPR and no AC) was used unless significant reliability improvements were observed with an alternative combination. Statistical analysis was carried out using 3dDeconvolve/3dREMLfit in AFNI. Low frequency signal fluctuations were removed by second order polynomial regression.
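The pipeline search described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the `min_gain` cut-off is only a stand-in for the significance criterion, which is not specified in this section. The sketch does not call AFNI; it only enumerates the 3×2×2=12 option combinations and applies the "default unless clearly better" rule to a table of reliability scores.

```python
from itertools import product

FWHM_MM = (3, 6, 9)           # spatial smoothing kernels (mm FWHM)
MPR_OPTIONS = (False, True)   # motion parameter regression off/on
AC_OPTIONS = (False, True)    # autocorrelation correction off/on
DEFAULT = (6, False, False)   # 6 mm smoothing, no MPR, no AC


def enumerate_pipelines():
    """Return every (fwhm_mm, mpr, ac) combination of the optional steps."""
    return list(product(FWHM_MM, MPR_OPTIONS, AC_OPTIONS))


def select_pipeline(reliability, min_gain=0.05):
    """Pick the most reliable pipeline, falling back to the default.

    reliability: dict mapping (fwhm_mm, mpr, ac) to a reliability score.
    min_gain: hypothetical improvement required to abandon the default
              (an assumption standing in for the paper's significance test).
    """
    best = max(reliability, key=reliability.get)
    if reliability[best] - reliability[DEFAULT] > min_gain:
        return best
    return DEFAULT


if __name__ == "__main__":
    pipelines = enumerate_pipelines()
    print(len(pipelines), "candidate pipelines")
```

In use, each key of `reliability` would hold the reliable fraction obtained from the ROC-r analysis of the data processed with that option combination.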

Table 2 Summary of the functional tasks used in this study

| Task | Type | Active condition | Active blocks | Control condition | Control blocks | Fixation condition | Fixation blocks | Block length (s) | Total duration (s) |
|---|---|---|---|---|---|---|---|---|---|
| Finger tapping | M | Four-finger ascending/descending paced (2 Hz) tapping | 4 | NA | 0 | Fixation cross | 5 | 20 | 180 |
| Object naming | L | Overt object naming of 3D colour images, 8/block* | 6 | 3D non-sense images† | 6 | Fixation cross | 7 | 16 | 304 |
| Sentence reading | L | Correct/incorrect written sentences (eg, ‘She swept the floor with a sand’), 4/block | 6 | Correct/incorrect math statements (eg, ‘2+2=5’ vs ‘2+2=4’) | 6 | Fixation cross | 7 | 18 | 342 |

A variety of tasks were included in this study, including both language and motor tasks, with various block and scan lengths, using both active and passive control conditions. *http://wiki.cnbc.cmu.edu/Objects. †http://wiki.cnbc.cmu.edu/Novel_Objects. 3D images, three dimensional images; CS, cortical stimulation; L, language; M, motor; NA, not applicable.

ROC-reliability analysis

ROC-r analysis measures test–retest reliability in terms of the overlap of active/inactive regions in the activation maps as a function of image thresholds. Briefly, one of the images is designated as the template image, acting as a measure of the true activation pattern. At a fixed threshold on the template image, the retest image is assessed against the template for true-positive and false-positive detections, and the resulting true-positive and false-positive rates are calculated as a function of retest image threshold. This creates an ROC-reliability curve for the retest image, and the process is repeated for each template image threshold (in increments of 0.1). From this, the retest AUC is plotted as a function of template threshold, and the procedure is repeated with the roles of template and retest image reversed. Currently, the ROC-r calculation takes only a few seconds on typical fMRI data.

The ROC-r metric ‘reliable fraction’ was used as a measure of overall data set reliability. The reliable fraction is the proportion of the image t-value range for which the AUC is more than the mid-range (ie, AUC>(AUCmax+AUCmin)/2). The reliable fraction was measured for each processing pipeline, and the best pipeline was identified for each data set. Changes in reliability with pipeline optimisation were assessed with paired t tests at the group level, whereas comparisons between controls and patients used independent samples t tests. Images were divided into ‘reliable’ and ‘unreliable’ categories based on whether the reliable fraction was above or below the group mean, respectively.

ROC-r was also used for automated threshold selection. The threshold was set where the AUC curve satisfied two conditions: (1) above average AUC (ie, AUC>α(AUCmax+AUCmin)/2), and (2) below average AUC derivative (ie, dAUC/dt
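The template/retest procedure and the reliable fraction described in the ROC-reliability analysis section can be sketched with NumPy as shown below. This is an illustrative reading of the text, not the published implementation, and it rests on several assumptions: activation maps are passed as 1D arrays of in-brain t-values, the retest image is swept over the same 0.1-spaced threshold grid as the template, the ROC curve is integrated with the trapezoid rule, and the reliable fraction is taken over the template thresholds at which the AUC is defined. All function names are hypothetical. Because the second threshold-selection condition is cut off in the text, automated threshold selection is not sketched.

```python
import numpy as np


def roc_auc(truth, scores, thresholds):
    """Trapezoidal area under the ROC curve of `scores` against boolean `truth`."""
    pos = np.sort(scores[truth])     # retest values in template-active voxels
    neg = np.sort(scores[~truth])    # retest values in template-inactive voxels
    if pos.size == 0 or neg.size == 0:
        return np.nan                # ROC is undefined when one class is empty
    thr = np.sort(thresholds)[::-1]  # descending, so both rates increase
    # Fraction of each class strictly above every retest threshold.
    tpr = 1.0 - np.searchsorted(pos, thr, side="right") / pos.size
    fpr = 1.0 - np.searchsorted(neg, thr, side="right") / neg.size
    # Anchor the curve at (0, 0) and (1, 1) before integrating.
    tpr = np.concatenate(([0.0], tpr, [1.0]))
    fpr = np.concatenate(([0.0], fpr, [1.0]))
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def auc_vs_template_threshold(template, retest, step=0.1):
    """Retest AUC as a function of the threshold applied to the template."""
    lo = min(template.min(), retest.min())
    hi = max(template.max(), retest.max())
    thresholds = np.arange(lo, hi + step, step)
    aucs = np.array([roc_auc(template > t, retest, thresholds)
                     for t in thresholds])
    return thresholds, aucs


def reliable_fraction(aucs):
    """Share of thresholds with AUC above the mid-range (AUCmax + AUCmin) / 2."""
    valid = aucs[~np.isnan(aucs)]    # assumption: ignore undefined AUC values
    mid_range = (valid.max() + valid.min()) / 2.0
    return float(np.mean(valid > mid_range))


if __name__ == "__main__":
    # Synthetic test-retest pair: 200 "active" voxels plus independent noise.
    rng = np.random.default_rng(0)
    signal = np.zeros(2000)
    signal[:200] = 5.0
    run1 = signal + rng.normal(size=signal.size)
    run2 = signal + rng.normal(size=signal.size)
    # Evaluate both directions, swapping the template and retest roles.
    for template, retest in ((run1, run2), (run2, run1)):
        _, aucs = auc_vs_template_threshold(template, retest)
        print(round(reliable_fraction(aucs), 2))
```

The sketch reports one reliable fraction per template/retest direction. How the two directions are combined into a single value is not stated in this section, so that step is left to the reader.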
