Journal: Research Ideas and Outcomes (RIO)

ISSN: 2367-7163

Publisher: Pensoft Publishers

DOI: 10.3897/rio.3.e12569

Article IDs: 12569, 7069

urn:lsid:arphahub.com:pub:8E638694-B4E0-570A-856A-746FF325BF6B

Project Report

How much motion is too much motion? Determining motion thresholds by sample size for reproducibility in developmental resting-state MRI

Julia Leonard (1), John Flournoy (2), Christine Paula Lewis-de los Angeles (3), Kirstie Whitaker (4, 5; kw401@cam.ac.uk)

1 Massachusetts Institute of Technology, Cambridge, United States of America

2 University of Oregon, Eugene, United States of America

3 Northwestern, Evanston, United States of America

4 University of Cambridge, Cambridge, United Kingdom

5 The Alan Turing Institute, London, United Kingdom

Corresponding author: Kirstie Whitaker (kw401@cam.ac.uk).

Academic editor:

2017

08 03 2017

e12569

94A1EAB2-4FE6-502E-8E6A-F950B0A69AFE 375859 06 03 2017

Julia Leonard, John Flournoy, Christine Paula Lewis-de los Angeles, Kirstie Whitaker

This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords: Head motion; Developmental neuroimaging; Replicability

This project report refers to work initiated at Neurohackweek 2016. Neurohackweek was supported through a grant from the Gordon & Betty Moore Foundation and the Alfred P. Sloan Foundation to the University of Washington eScience Institute Data Science Environment. KJW is funded by a Mozilla Science Lab fellowship.

Background

A constant problem for developmental imagers is in-scanner head motion (Poldrack et al. 2002, Raschle et al. 2012). Children move more than adults, which has led to concerns that developmental changes in resting-state connectivity measures may be artefactual (Van Dijk et al. 2011, Satterthwaite et al. 2012). Furthermore, both typically-developing children and children with developmental disorders are challenging to recruit into studies, so researchers may engage in extensive mock scanner motion training with participants and/or take a permissive stance when setting exclusion criteria on head motion (de Bie et al. 2010, Yerys et al. 2009). Yet no one has systematically examined which motion cutoffs should be used to make reliable inferences in developmental data, or how these might vary by both sample size and age range.

Here, we systematically examine the effects of multiple motion exclusion criteria at different sample sizes and age ranges in a large, openly available developmental cohort (ABIDE; Di Martino et al. 2013, Craddock et al. 2013; http://preprocessed-connectomes-project.org/abide) on both the reliability of resting-state functional magnetic resonance imaging (rs-fMRI) pairwise connectivity and autism/healthy control prediction accuracy.

Methods

In a cohort of 743 children (aged 6 to 18 years, 620 male), we varied motion cutoffs and sample size to explore how these variables affected both split-half reliability and the accuracy of machine-learning prediction of autism diagnosis. Specifically, we varied the sample size (from 10 to 100 participants) and the permitted percentage of volumes whose displacement from the previous volume exceeded 0.2 mm (from 0 to 100%; details at http://preprocessed-connectomes-project.org/abide/quality_assessment.html). The input data for all analyses were individual pairwise correlation matrices computed over the 116 regions of interest (ROIs) defined in the Automated Anatomical Labeling (AAL) atlas (Tzourio-Mazoyer et al. 2002). For both analyses described below, we selected two matched groups according to our sample size and motion criteria and ensured they were balanced for age, sex, diagnosis, and scanning site. Data and all code to reproduce the analyses can be found on GitHub (Flournoy and Leonard 2017).
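The motion criterion described above can be illustrated with a minimal sketch. It is not the authors' code: the function names, participant IDs, and displacement values are hypothetical, and it omits the matching on age, sex, diagnosis, and scanning site. It only shows how a "percentage of volumes above 0.2 mm" cutoff could be combined with a target sample size.

```python
import random

FD_THRESHOLD_MM = 0.2  # displacement threshold quoted in the text


def percent_high_motion(fd_series, threshold=FD_THRESHOLD_MM):
    """Percentage of volumes whose displacement from the previous volume exceeds threshold."""
    if not fd_series:
        raise ValueError("fd_series must contain at least one volume")
    n_bad = sum(1 for fd in fd_series if fd > threshold)
    return 100.0 * n_bad / len(fd_series)


def select_participants(fd_by_participant, max_percent, sample_size, seed=0):
    """Keep participants at or below the motion cutoff, then draw a random sample."""
    eligible = sorted(
        pid for pid, fd in fd_by_participant.items()
        if percent_high_motion(fd) <= max_percent
    )
    if len(eligible) < sample_size:
        raise ValueError("not enough participants pass the motion cutoff")
    rng = random.Random(seed)
    return rng.sample(eligible, sample_size)


# Toy, made-up displacement traces (mm) for three hypothetical participants.
toy_fd = {
    "sub-01": [0.05, 0.10, 0.08, 0.12],  # 0% of volumes above 0.2 mm
    "sub-02": [0.05, 0.30, 0.08, 0.12],  # 25% of volumes above 0.2 mm
    "sub-03": [0.40, 0.30, 0.25, 0.12],  # 75% of volumes above 0.2 mm
}
chosen = select_participants(toy_fd, max_percent=25, sample_size=2)
print(sorted(chosen))
```

With a 25% cutoff, only the first two toy participants remain eligible. Raising `max_percent` towards 100 corresponds to the most lenient exclusion criterion.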

For the split-half reliability analyses, we averaged the individual correlation matrices within each group to give the average connectivity for each ROI-ROI pair. We then computed R-squared values for the fit between the two groups' average pairwise correlations, assuming the two groups were equal (Fig. 1). For each sample size and motion cutoff, we ran 100 permutations and took the median R-squared value as a measure of “reliability” between two samples at that motion threshold and sample size.
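The split-half procedure can be sketched as follows. This is an illustration under stated assumptions rather than the published implementation: it splits participants at random instead of forming matched groups, it interprets "assuming the two groups were equal" as an R-squared computed against the identity line, and all function names and the synthetic data are hypothetical.

```python
import numpy as np


def group_mean_edges(matrices):
    """Average subject-level correlation matrices and return the unique ROI-ROI edges."""
    mean_matrix = np.mean(matrices, axis=0)
    upper = np.triu_indices_from(mean_matrix, k=1)  # exclude the diagonal
    return mean_matrix[upper]


def split_half_r2(matrices, rng):
    """R-squared of group B's mean edges against group A's, assuming identity (y = x)."""
    order = rng.permutation(len(matrices))
    half = len(matrices) // 2
    edges_a = group_mean_edges(matrices[order[:half]])
    edges_b = group_mean_edges(matrices[order[half:2 * half]])
    ss_res = np.sum((edges_b - edges_a) ** 2)
    ss_tot = np.sum((edges_b - edges_b.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def median_reliability(matrices, n_permutations=100, seed=0):
    """Median split-half R-squared across random permutations of the participants."""
    rng = np.random.default_rng(seed)
    values = [split_half_r2(matrices, rng) for _ in range(n_permutations)]
    return float(np.median(values))


# Synthetic data: a shared connectivity pattern plus subject-specific noise.
data_rng = np.random.default_rng(42)
n_subjects, n_rois = 40, 10
shared = data_rng.uniform(-0.5, 0.5, size=(n_rois, n_rois))
shared = (shared + shared.T) / 2
noise = data_rng.normal(scale=0.3, size=(n_subjects, n_rois, n_rois))
noise = (noise + noise.transpose(0, 2, 1)) / 2
toy_matrices = shared + noise

print(median_reliability(toy_matrices))       # halves of 20 participants
print(median_reliability(toy_matrices[:10]))  # halves of 5 participants
```

In this toy setting the larger sample gives the higher median R-squared, because averaging over more participants suppresses subject-level noise in each half.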

Another measure of how motion thresholds change the replicability of an analysis is out-of-sample predictive accuracy. We used the participants' resting-state functional connectivity matrices as features to predict diagnostic category (autism spectrum disorder vs typically developing controls). We designated one half of the data as a training set and reserved the other half for testing our model. Training generated a support vector machine (SVM) classifier with an L1 penalty tuned using 10-fold cross-validation (Pedregosa et al. 2011). This classifier was then used to predict diagnosis labels in the test set, with classification accuracy as our outcome of interest. Both the train-test split and the 10-fold splits within the training data were stratified so that the proportions of cases and controls were roughly equivalent in each split. For each sample size and motion cutoff we ran 500 permutations. We compared the estimated prediction accuracy to the baseline rate that would be achieved by assigning every participant the most prevalent diagnostic category. For example, in a sample of 90 controls and 10 cases, one could achieve 90% accuracy by predicting that every participant is a control.
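One iteration of this prediction analysis might look like the sketch below, written with scikit-learn. Several details are assumptions not stated in the text: the tuned hyperparameter is taken to be the regularisation strength `C`, the grid of `C` values is arbitrary, the baseline is computed on the test half, and the features are synthetic stand-ins for connectivity edges. The function name `evaluate_split` is hypothetical.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import LinearSVC


def evaluate_split(features, labels, seed=0):
    """Train an L1-penalised linear SVM on one half and score it on the other half."""
    x_train, x_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.5, stratify=labels, random_state=seed
    )
    # Stratified 10-fold cross-validation inside the training half tunes C (assumed grid).
    inner_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    search = GridSearchCV(
        LinearSVC(penalty="l1", dual=False, max_iter=10000),
        param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
        cv=inner_cv,
    )
    search.fit(x_train, y_train)
    accuracy = float(search.score(x_test, y_test))
    # Naive baseline: label every test participant with the most prevalent class.
    baseline = float(np.bincount(y_test).max() / len(y_test))
    return accuracy, baseline


# Synthetic example: 80 hypothetical participants, 45 edge features,
# with a group difference injected into the first five features.
rng = np.random.default_rng(0)
toy_labels = np.repeat([0, 1], 40)
toy_features = rng.normal(size=(80, 45))
toy_features[toy_labels == 1, :5] += 2.0

accuracy, baseline = evaluate_split(toy_features, toy_labels)
print(f"accuracy={accuracy:.2f}, baseline={baseline:.2f}")
```

Repeating `evaluate_split` with different seeds for each sample size and motion cutoff would give a distribution of accuracies to compare against the baseline.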

Results

The split-half reliability analysis showed that reliability is primarily sensitive to the number of participants considered, with more participants leading to higher reliability (Fig. 2). Motion cutoffs did not appear to have a strong effect on reliability. Although this is comforting, it is important to note that while some studies still average across subjects to look at group differences, many are moving towards predicting individual differences. Our results do not speak to the sensitivity of individual difference analyses to motion.

The out-of-sample predictive accuracy analyses show that prediction accuracy depends not only on sample size but also on motion cutoffs. The best prediction was found with larger sample sizes and lower motion thresholds (Fig. 3). In sample sizes of 60 or more, median prediction accuracy is consistently above the baseline of a naive classifier that assumes that all participants share the modal diagnosis (in this case, non-ASD). However, out-of-sample prediction accuracy varies across the different permutations of the data within each combination of sample size and motion threshold, and a large proportion of classifiers perform worse than baseline. We only tested one machine learning strategy, and it is likely that the exact model will also affect the prescribed “best” motion cutoff and sample size.

As expected, larger sample sizes improved both of our reliability measures (R² and prediction accuracy). We also found that prediction accuracy decreased as the motion exclusion criteria were made more lenient.

Conclusions and future directions

While this project is far from complete, we have shown that motion cutoffs and sample sizes do affect reliability in developmental data. In future work, we would also like to explore how motion thresholds and sample sizes might affect reliability differently by age range. Our end goal is to provide a tool for authors to check their own datasets against our findings, so that they can make informed decisions when designing future developmental neuroimaging studies.

In a larger sense, though, we have shown that bringing together people who work in a similar field (cognitive neuroscience) but come from diverse backgrounds (developmental psychology, psychiatry, computational modeling, developmental cognitive neuroscience) for a one-week hackathon can foster novel solutions to old problems. This cross-pollination of ideas brought a much-needed fresh, rigorous methodological approach to developmental imaging, and the week of fast learning inspired and prepared the next generation of cognitive neuroscientists to create thoughtful and reproducible work in the future.

Acknowledgements

References

Craddock, Cameron; Benhajali, Yassine; Chu, Carlton; Chouinard, Francois; Evans, Alan; Jakab, András; Khundrakpam, Budhachandra; Lewis, John; Qingyang; Milham, Michael; Yan, Chaogan; Bellec, Pierre (2013) The Neuro Bureau Preprocessing Initiative: open sharing of preprocessed neuroimaging data and derivatives. Frontiers in Neuroinformatics 7. http://dx.doi.org/10.3389/conf.fninf.2013.09.00041

de Bie, Henrica M. A.; Boersma, Maria; Wattjes, Mike P.; Adriaanse, Sofie; Vermeulen, R. Jeroen; Oostrom, Kim J.; Huisman, Jaap; Veltman, Dick J.; Delemarre-Van de Waal, Henriette A. (2010) Preparing children with a mock scanner training protocol results in high quality structural and functional MRI scans. European Journal of Pediatrics 169 (9): 1079-1085. http://dx.doi.org/10.1007/s00431-010-1181-z

Di Martino; Yan, C-G; Denio; Castellanos, F X; Alaerts; Anderson, J S; Assaf; Bookheimer, S Y; Dapretto; Deen; Delmonte; Dinstein; Ertl-Wagner; Fair, D A; Gallagher; Kennedy, D P; Keown, C L; Keysers; Lainhart, J E; Lord; Luna; Menon; Minshew, N J; Monk, C S; Mueller; Müller, R-A; Nebel, M B; Nigg, J T; O'Hearn; Pelphrey, K A; Peltier, S J; Rudie, J D; Sunaert; Thioux; Tyszka, J M; Uddin, L Q; Verhoeven, J S; Wenderoth; Wiggins, J L; Mostofsky, S H; Milham, M P (2013) The autism brain imaging data exchange: towards a large-scale evaluation of the intrinsic brain architecture in autism. Molecular Psychiatry 19 (6): 659-667. http://dx.doi.org/10.1038/mp.2013.78

Flournoy, John; Leonard, Julia (2017) Kids_Rsfmri_Motion: Brainhack Proceedings Submission - 20160201. Zenodo. http://dx.doi.org/10.5281/ZENODO.266920

Pedregosa; Varoquaux; Gramfort; Michel; Thirion; Grisel; Blondel; Prettenhofer; Weiss; Duchesnay, É. (2011) Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (10): 2825-2830. http://www.jmlr.org/papers/v12/pedregosa11a.html

Poldrack, Russell A; Paré-Blagoev, E Juliana; Grant, P Ellen (2002) Pediatric functional magnetic resonance imaging: progress and challenges. Topics in Magnetic Resonance Imaging: TMRI 13 (1): 61-70. http://doi.org/10.1097/00002142-200202000-00005

Raschle, Nora; Zuk, Jennifer; Ortiz-Mantilla, Silvia; Sliva, Danielle D; Franceschi, Angela; Grant, P Ellen; Benasich, April A; Gaab, Nadine (2012) Pediatric neuroimaging in early childhood and infancy: challenges and practical guidelines. Annals of the New York Academy of Sciences 1252: 43-50. http://dx.doi.org/10.1111/j.1749-6632.2012.06457.x

Satterthwaite, Theodore D; Elliott, Mark A; Gerraty, Raphael T; Ruparel, Kosha; Loughead, James; Calkins, Monica E; Eickhoff, Simon B; Hakonarson, Hakon; Gur, Ruben C; Gur, Raquel E; Wolf, Daniel H (2012) An improved framework for confound regression and filtering for control of motion artifact in the preprocessing of resting-state functional connectivity data. NeuroImage 64: 240-56. http://dx.doi.org/10.1016/j.neuroimage.2012.08.052

Tzourio-Mazoyer; Landeau; Papathanassiou; Crivello; Etard; Delcroix; Mazoyer; Joliot (2002) Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage 15 (1): 273-89. http://dx.doi.org/10.1006/nimg.2001.0978

Van Dijk, Koene R A; Sabuncu, Mert R; Buckner, Randy L (2011) The influence of head motion on intrinsic functional connectivity MRI. NeuroImage 59 (1): 431-8. http://dx.doi.org/10.1016/j.neuroimage.2011.07.044

Yerys, Benjamin E.; Jankowski, Kathryn F.; Shook, Devon; Rosenberger, Lisa R.; Barnes, Kelly Anne; Berl, Madison M.; Ritzl, Eva K.; VanMeter, John; Vaidya, Chandan J.; Gaillard, William D. (2009) The fMRI success rate of children and adolescents: Typical development, epilepsy, attention deficit/hyperactivity disorder, and autism spectrum disorders. Human Brain Mapping 30 (10): 3426-3435. http://dx.doi.org/10.1002/hbm.20767

Figure 1.

To investigate the effects of age range, motion exclusion threshold and sample size on functional connectivity reliability, we split the data into two matched samples. For the reliability analysis we averaged all participants in each sample and then calculated how well aligned the two groups were in terms of each pairwise regional connectivity measure. For the out-of-sample prediction analysis we used one half of the data to train a model and then tested it on the other half.

Figure 2.

Split-half reliability results showing how sample size (N) has a large effect on R squared (median R squared from 100 permutations) while motion threshold does not. Error bars represent average 95% confidence intervals across 100 permutations. Code and output can be found on GitHub (Flournoy and Leonard 2017).

Figure 3.

Out-of-sample prediction accuracy of autism diagnosis using resting-state data as a function of sample size and motion-based exclusion criteria (percentage of whole-brain fMRI volumes exceeding threshold). The red line is a naive classifier that assumes that all participants share the modal diagnosis (in this case, non-ASD). The black line spans the 5th to 95th percentile of accuracy across iterations using a linear SVM, with the black points at the median value. Code and output can be found on GitHub (Flournoy and Leonard 2017).