publications
2025
- [Nature] Longer scans boost prediction and cut costs in brain-wide association studies. Nature, Jul 2025
A pervasive dilemma in brain-wide association studies (BWAS) is whether to prioritize functional magnetic resonance imaging (fMRI) scan time or sample size. We derive a theoretical model showing that individual-level phenotypic prediction accuracy increases with sample size and total scan duration (sample size × scan time per participant). The model explains empirical prediction accuracies well across 76 phenotypes from nine resting-fMRI and task-fMRI datasets (R² = 0.89), spanning diverse scanners, acquisitions, racial groups, disorders and ages. For scans of ≤20 min, accuracy increases linearly with the logarithm of the total scan duration, suggesting that sample size and scan time are initially interchangeable. However, sample size is ultimately more important. Nevertheless, when accounting for the overhead costs of each participant (such as recruitment), longer scans can be substantially cheaper than larger sample sizes for improving prediction performance. To achieve high prediction performance, 10-min scans are cost-inefficient. In most scenarios, the optimal scan time is at least 20 min. On average, 30-min scans are the most cost-effective, yielding 22% savings over 10-min scans. Overshooting the optimal scan time is cheaper than undershooting it, so we recommend a scan time of at least 30 min. Compared with resting-state whole-brain BWAS, the most cost-effective scan time is shorter for task-fMRI and longer for subcortical-to-whole-brain BWAS. In contrast to standard power calculations, our results suggest that jointly optimizing sample size and scan time can boost prediction accuracy while cutting costs. Our empirical reference is available online for future study design (https://thomasyeolab.github.io/OptimalScanTimeCalculator/index.html).
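The budget trade-off described above can be illustrated with a few lines of arithmetic. This is a minimal sketch, not the paper's theoretical model or the online calculator: it assumes a hypothetical fixed budget, a hypothetical per-participant overhead and per-minute scan cost, and a placeholder log-linear accuracy curve (`intercept` and `slope` are made-up parameters). It only targets the ≤20 min regime in which the abstract reports that sample size and scan time are interchangeable; all function names are illustrative.

```python
import math


def affordable_participants(budget, overhead_per_participant, cost_per_minute, scan_minutes):
    """Whole participants a fixed budget covers at a given scan time per participant."""
    cost_per_participant = overhead_per_participant + cost_per_minute * scan_minutes
    return int(budget // cost_per_participant)


def total_scan_duration(n_participants, scan_minutes):
    """Total scan duration as defined in the abstract: sample size x scan time per participant."""
    return n_participants * scan_minutes


def log_linear_accuracy(total_minutes, intercept, slope):
    """Placeholder log-linear curve (<=20 min regime); intercept and slope are hypothetical."""
    return intercept + slope * math.log(total_minutes)


if __name__ == "__main__":
    # Hypothetical costs: budget 100,000; overhead 500 per participant; 10 per scan minute.
    for minutes in (10, 20):
        n = affordable_participants(100_000, 500, 10, minutes)
        print(minutes, n, total_scan_duration(n, minutes))
```

With these placeholder costs, going from 10-min to 20-min scans shrinks the affordable sample from 166 to 142 participants but raises the total scan duration from 1,660 to 2,840 min, because each participant's fixed overhead is paid fewer times.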
- [Med. Image Anal.] DeepResBat: deep residual batch harmonization accounting for covariate distribution differences. Medical Image Analysis, Sep 2025
Pooling MRI data from multiple datasets requires harmonization to reduce undesired inter-site variability, while preserving effects of biological variables (or covariates). The popular harmonization approach ComBat uses a mixed effect regression framework that explicitly accounts for covariate distribution differences across datasets. There is also significant interest in developing harmonization approaches based on deep neural networks (DNNs), such as the conditional variational autoencoder (cVAE). However, current DNN approaches do not explicitly account for covariate distribution differences across datasets. Here, we provide mathematical results suggesting that not accounting for covariates can lead to suboptimal harmonization. We propose two DNN-based covariate-aware harmonization approaches: covariate VAE (coVAE) and DeepResBat. The coVAE approach is a natural extension of cVAE that concatenates covariates and site information with site- and covariate-invariant latent representations. DeepResBat adopts a residual framework inspired by ComBat. DeepResBat first removes the effects of covariates with nonlinear regression trees, followed by eliminating site differences with cVAE. Finally, covariate effects are added back to the harmonized residuals. Using three datasets from three continents with a total of 2787 participants and 10,085 anatomical T1 scans, we find that DeepResBat and coVAE outperform ComBat, CovBat and cVAE in terms of removing dataset differences, while enhancing biological effects of interest. However, coVAE hallucinates spurious associations between anatomical MRI and covariates even when no association exists. Future studies proposing DNN-based harmonization approaches should be aware of this false positive pitfall. Overall, our results suggest that DeepResBat is an effective deep learning alternative to ComBat.
Code for DeepResBat can be found here: https://github.com/ThomasYeoLab/CBIG/tree/master/stable_projects/harmonization/An2024_DeepResBat
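DeepResBat's three steps (remove covariate effects, remove site differences from the residuals, add covariate effects back) can be sketched in NumPy. This is not the released DeepResBat code linked above: for brevity it swaps in ordinary least squares for the nonlinear regression trees and a per-site mean and variance alignment for the cVAE, so only the residual structure matches the abstract. The function name `residual_harmonize` is hypothetical.

```python
import numpy as np


def residual_harmonize(features, covariates, sites):
    """Residual-style harmonization sketch.

    features:   (n_scans, n_features) array, e.g. regional volumes
    covariates: (n_scans,) or (n_scans, n_covariates) array, e.g. age
    sites:      (n_scans,) array of site labels
    Stand-ins: OLS replaces regression trees; per-site location/scale
    alignment replaces the cVAE used by DeepResBat.
    """
    features = np.asarray(features, dtype=float)
    sites = np.asarray(sites)
    design = np.column_stack([np.ones(len(features)), covariates])

    # Step 1: estimate and remove covariate effects (OLS stand-in).
    beta, *_ = np.linalg.lstsq(design, features, rcond=None)
    covariate_effects = design @ beta
    residuals = features - covariate_effects

    # Step 2: remove site differences from the residuals (stand-in for cVAE).
    centered = residuals.copy()
    for s in np.unique(sites):
        mask = sites == s
        centered[mask] -= residuals[mask].mean(axis=0)
    target_std = centered.std(axis=0)  # pooled within-site spread
    harmonized = np.empty_like(centered)
    for s in np.unique(sites):
        mask = sites == s
        site_std = centered[mask].std(axis=0)
        site_std = np.where(site_std > 0, site_std, 1.0)  # guard single-scan sites
        harmonized[mask] = centered[mask] / site_std * target_std

    # Step 3: add covariate effects back to the harmonized residuals.
    return harmonized + covariate_effects
```

Removing covariate effects before touching site differences is what lets the site step ignore covariate distribution differences across sites; in this linear stand-in that only holds as far as the OLS model captures the covariate effects.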
- [Nat. Ment. Health] Distinct brain network features predict internalizing and externalizing traits in children, adolescents and adults. Nature Mental Health, Feb 2025
The distinction between externalizing and internalizing traits has been a classic area of study in psychiatry. However, whether shared or unique brain network features predict internalizing and externalizing behaviors remains poorly understood. Using a sample of 5,260 children from the Adolescent Brain Cognitive Development study, 229 adolescents from the Healthy Brain Network and 423 adults from the Human Connectome Project, we show that predictive network features are, at least in part, distinct across internalizing and externalizing behaviors. Across all three samples, behaviors within internalizing and externalizing categories exhibited more similar predictive feature weights than behaviors between categories. These data suggest shared and unique brain network features account for individual variation within broad internalizing and externalizing categories across developmental stages.
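One way to make the within- versus between-category comparison concrete is to correlate the predictive feature weight vectors of every pair of behaviors and average those correlations separately for same-category and different-category pairs. The sketch below assumes Pearson correlation as the similarity measure and hypothetical inputs; it is not necessarily how the study quantified similarity.

```python
import numpy as np


def within_between_similarity(weights, categories):
    """Mean pairwise correlation of feature weights within vs between categories.

    weights:    (n_behaviors, n_features) array, one weight vector per behavior
    categories: length-n_behaviors labels, e.g. "internalizing" / "externalizing"
    """
    corr = np.corrcoef(weights)  # behavior-by-behavior similarity
    labels = np.asarray(categories)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)  # drop self-correlations
    within = corr[same & off_diag].mean()
    between = corr[~same].mean()
    return within, between
```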
2024
- [PNAS] In vivo whole-cortex marker of excitation-inhibition ratio indexes cortical maturation and cognitive ability in youth. S. Zhang*, B. Larsen*, V. Sydnor*, and others. PNAS, Apr 2024
A balanced excitation-inhibition ratio (E/I ratio) is critical for healthy brain function. Normative development of cortex-wide E/I ratio remains unknown. Here, we noninvasively estimate a putative marker of whole-cortex E/I ratio by fitting a large-scale biophysically plausible circuit model to resting-state functional MRI (fMRI) data. We first confirm that our model generates realistic brain dynamics in the Human Connectome Project. Next, we show that the estimated E/I ratio marker is sensitive to the gamma-aminobutyric acid (GABA) agonist benzodiazepine alprazolam during fMRI. Alprazolam-induced E/I changes are spatially consistent with positron emission tomography measurement of benzodiazepine receptor density. We then investigate the relationship between the E/I ratio marker and neurodevelopment. We find that the E/I ratio marker declines heterogeneously across the cerebral cortex during youth, with the greatest reduction occurring in sensorimotor systems relative to association systems. Importantly, among children with the same chronological age, a lower E/I ratio marker (especially in the association cortex) is linked to better cognitive performance. This result is replicated across North American (8.2 to 23.0 y old) and Asian (7.2 to 7.9 y old) cohorts, suggesting that a more mature E/I ratio indexes improved cognition during normative development. Overall, our findings open the door to studying how disrupted E/I trajectories may lead to cognitive dysfunction in psychopathology that emerges during youth.
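The finding that, among children of the same chronological age, a lower E/I ratio marker goes with better cognition is an age-controlled association. A minimal way to express such an association is a partial correlation that regresses age out of both variables. This sketch assumes a single linear age covariate and hypothetical variable names; it does not reproduce the study's statistical model.

```python
import numpy as np


def partial_correlation(x, y, control):
    """Pearson correlation between x and y after regressing `control` out of both."""
    design = np.column_stack([np.ones(len(control)), control])

    def residualize(v):
        # Remove the linear effect of the control variable (e.g. age).
        v = np.asarray(v, dtype=float)
        coef, *_ = np.linalg.lstsq(design, v, rcond=None)
        return v - design @ coef

    return np.corrcoef(residualize(x), residualize(y))[0, 1]
```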
- [Commun. Biol.] Macroscale intrinsic dynamics are associated with microcircuit function in focal and generalized epilepsies. S. Yang, Y. Zhou, C. Peng, and others. Communications Biology, Feb 2024
Epilepsies are a group of neurological disorders characterized by abnormal spontaneous brain activity, involving multiscale changes in brain functional organization. However, it is not clear to what extent epilepsy-related perturbations of spontaneous brain activity affect macroscale intrinsic dynamics and microcircuit organization in ways that would support their pathological relevance. We collect a sample of patients with temporal lobe epilepsy (TLE) and genetic generalized epilepsy with tonic-clonic seizures (GTCS), as well as healthy controls. We extract a large set of temporal features from fMRI BOLD time series to characterize macroscale intrinsic dynamics, and simulate microcircuit neuronal dynamics using a large-scale biological model. Here we examine whether macroscale intrinsic dynamics and microcircuit function differ in epilepsies, and how these changes are linked. Differences in the macroscale gradient of time-series features are prominent in the primary network and default mode network in TLE and GTCS. Biophysical simulations indicate reduced recurrent connections within somatomotor microcircuits in both subtypes, with a greater reduction in GTCS. We further demonstrate strong spatial correlations between differences in the gradient of macroscale intrinsic dynamics and microcircuit dysfunction in epilepsies. These results emphasize the impact of abnormal neuronal activity on the primary network and high-order networks, suggesting a systematic abnormality of brain hierarchical organization.
- [Imaging Neurosci.] Multilayer meta-matching: translating phenotypic prediction models from multiple datasets to small data. Imaging Neuroscience, Jul 2024
Resting-state functional connectivity (RSFC) is widely used to predict phenotypic traits in individuals. Large sample sizes can significantly improve prediction accuracies. However, for studies of certain clinical populations or focused neuroscience inquiries, small-scale datasets often remain a necessity. We have previously proposed a “meta-matching” approach to translate prediction models from large datasets to predict new phenotypes in small datasets. We demonstrated a large improvement over classical kernel ridge regression (KRR) when translating models from a single source dataset (UK Biobank) to the Human Connectome Project Young Adults (HCP-YA) dataset. In the current study, we propose two meta-matching variants (“meta-matching with dataset stacking” and “multilayer meta-matching”) to translate models from multiple source datasets across disparate sample sizes to predict new phenotypes in small target datasets. We evaluate both approaches by translating models trained from five source datasets (with sample sizes ranging from 862 participants to 36,834 participants) to predict phenotypes in the HCP-YA and HCP-Aging datasets. We find that multilayer meta-matching modestly outperforms meta-matching with dataset stacking. Both meta-matching variants perform better than the original “meta-matching with stacking” approach trained only on the UK Biobank. All meta-matching variants outperform classical KRR and transfer learning by a large margin. In fact, KRR is better than classical transfer learning when fewer than 50 participants are available for fine-tuning, suggesting the difficulty of classical transfer learning in the very small sample regime. The multilayer meta-matching model is publicly available at https://github.com/ThomasYeoLab/Meta_matching_models/tree/main/rs-fMRI/v2.0.
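The core idea of meta-matching with stacking is to apply a model trained on a large source dataset to the small target dataset, then learn the new phenotype from the predicted source phenotypes using only the few target participants. The sketch below is a simplified linear illustration under stated assumptions: closed-form ridge regression stands in for the models used in the published work, there is a single source dataset, and the dataset-stacking and multilayer variants are not implemented. Function names are hypothetical; the released models are at the GitHub link above.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean(axis=0)
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def meta_matching_stacking(source_X, source_Y, target_X_train, target_y_train,
                           target_X_test, alpha=1.0):
    # 1) Large source dataset: map RSFC features to many source phenotypes at once.
    W_src, b_src = ridge_fit(source_X, source_Y, alpha)
    # 2) Apply the source model to the small target dataset.
    pred_train = target_X_train @ W_src + b_src
    pred_test = target_X_test @ W_src + b_src
    # 3) Stacking: predict the new target phenotype from predicted source phenotypes.
    w_stack, b_stack = ridge_fit(pred_train, target_y_train, alpha)
    return pred_test @ w_stack + b_stack
```

The stacking step only has to estimate one coefficient per source phenotype instead of one per RSFC feature, which is why it can be fit on very few target participants.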
- [Nat. Neurosci.] Ventral attention network connectivity is linked to cortical maturation and cognitive ability in childhood. H. Dong, X. Zhang, L. Labache, S. Zhang, and others. Nature Neuroscience, Aug 2024
The human brain experiences functional changes through childhood and adolescence, shifting from an organizational framework anchored within sensorimotor and visual regions into one that is balanced through interactions with later-maturing aspects of association cortex. Here, we link this profile of functional reorganization to the development of ventral attention network connectivity across independent datasets. We demonstrate that maturational changes in cortical organization link preferentially to within-network connectivity and heightened degree centrality in the ventral attention network, whereas connectivity within network-linked vertices predicts cognitive ability. This connectivity is associated closely with maturational refinement of cortical organization. Children with low ventral attention network connectivity exhibit adolescent-like topographical profiles, suggesting that attentional systems may be relevant in understanding how brain functions are refined across development. These data suggest a role for attention networks in supporting age-dependent shifts in cortical organization and cognition across childhood and adolescence.
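Within-network connectivity and degree centrality are standard graph summaries of a functional connectivity (FC) matrix. The sketch below shows one common way to compute both for a single network; the binarization threshold, the region-level (rather than vertex-level) input and the label `"VentAttn"` are assumptions for illustration, not the definitions used in the study.

```python
import numpy as np


def network_metrics(fc, labels, network, threshold=0.2):
    """Within-network connectivity and mean degree centrality for one network.

    fc:        (n_regions, n_regions) symmetric FC matrix
    labels:    (n_regions,) network label per region
    network:   label of the network of interest (hypothetical, e.g. "VentAttn")
    threshold: FC value above which an edge counts toward degree (hypothetical)
    """
    fc = np.asarray(fc, dtype=float)
    labels = np.asarray(labels)
    in_net = labels == network

    # Mean FC among regions of the network, excluding the diagonal.
    block = fc[np.ix_(in_net, in_net)]
    off_diag = ~np.eye(block.shape[0], dtype=bool)
    within = block[off_diag].mean()

    # Binary degree: number of supra-threshold edges per region (no self-loops).
    adjacency = (fc > threshold) & ~np.eye(len(fc), dtype=bool)
    degree = adjacency.sum(axis=1)
    return within, degree[in_net].mean()
```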
2023
- [NeuroImage] Relationship between prediction accuracy and feature importance reliability: An empirical and theoretical study. NeuroImage, Apr 2023
There is significant interest in using neuroimaging data to predict behavior. The predictive models are often interpreted by the computation of feature importance, which quantifies the predictive relevance of an imaging feature. Tian and Zalesky (2021) suggest that feature importance estimates exhibit low split-half reliability, as well as a trade-off between prediction accuracy and feature importance reliability across parcellation resolutions. However, it is unclear whether the trade-off between prediction accuracy and feature importance reliability is universal. Here, we demonstrate that, with a sufficient sample size, feature importance (operationalized as Haufe-transformed weights) can achieve fair to excellent split-half reliability. With a sample size of 2600 participants, Haufe-transformed weights achieve average intra-class correlation coefficients of 0.75, 0.57 and 0.53 for cognitive, personality and mental health measures, respectively. Haufe-transformed weights are much more reliable than original regression weights and univariate FC-behavior correlations. Original regression weights are not reliable even with 2600 participants. Intriguingly, feature importance reliability is strongly positively correlated with prediction accuracy across phenotypes. Within a particular behavioral domain, there is no clear relationship between prediction performance and feature importance reliability across regression models. Furthermore, we show mathematically that feature importance reliability is necessary, but not sufficient, for low feature importance error. In the case of linear models, lower feature importance error is mathematically related to lower prediction error. Therefore, higher feature importance reliability might yield lower feature importance error and higher prediction accuracy. Finally, we discuss how our theoretical results relate to the reliability of imaging features and behavioral measures.
Overall, the current study provides empirical and theoretical insights into the relationship between prediction accuracy and feature importance reliability.
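For a linear model, the Haufe transform (Haufe et al., 2014) turns regression weights into an activation pattern; for a single predicted target this pattern is proportional to the covariance between each imaging feature and the model's predictions. The sketch below computes that covariance form; it is a minimal illustration, and how the study scaled or aggregated the weights across folds is not specified here.

```python
import numpy as np


def haufe_transform(X, y_pred):
    """Covariance between each feature (column of X) and the model's predictions.

    For a linear model y_pred = X @ w + b this equals cov(X) @ w, i.e. the
    Haufe activation pattern up to the positive factor 1 / var(y_pred).
    """
    Xc = X - X.mean(axis=0)
    yc = y_pred - y_pred.mean()
    return Xc.T @ yc / (len(y_pred) - 1)
```

Unlike raw regression weights, this pattern does not depend on how the model distributes weight among correlated features, which is one intuition for why it can be more reliable across split halves.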
2022
- [PLOS Biology] Population heterogeneity in clinical cohorts affects the predictive accuracy of brain imaging. PLOS Biology, Apr 2022
Brain imaging research enjoys increasing adoption of supervised machine learning for single-participant disease classification. Yet, the success of these algorithms likely depends on population diversity, including demographic differences and other factors that may be outside of primary scientific interest. Here, we capitalize on propensity scores as a composite confound index to quantify diversity due to major sources of population variation. We delineate the impact of population heterogeneity on the predictive accuracy and pattern stability in 2 separate clinical cohorts: the Autism Brain Imaging Data Exchange (ABIDE, n = 297) and the Healthy Brain Network (HBN, n = 551). Across various analysis scenarios, our results uncover the extent to which cross-validated prediction performances are interlocked with diversity. The instability of extracted brain patterns attributable to diversity is located preferentially in regions that are part of the default mode network. Collectively, our findings highlight the limitations of prevailing deconfounding practices in mitigating the full consequences of population diversity.
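Propensity scores are conventionally estimated as the probability of group membership given observed covariates, typically with logistic regression. The sketch below fits such a model with Newton's method in NumPy; it assumes the group is a binary label (for example patient versus control) and that the confounds are numeric columns such as age and sex. Which variables and which estimator the study used is not stated in the abstract, and the function name is hypothetical.

```python
import numpy as np


def propensity_scores(confounds, group, n_iter=25):
    """Logistic-regression propensity scores P(group = 1 | confounds), fit by Newton's method."""
    X = np.column_stack([np.ones(len(group)), confounds])  # intercept + confounds
    y = np.asarray(group, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                         # gradient of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-X @ beta))
```

Collapsing several confounds into one score per participant is what makes it usable as a single composite index of how far each participant sits from the typical population profile.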
- [NeuroImage] Comparison of individualized behavioral predictions across anatomical, diffusion and functional connectivity MRI. LQR Ooi, J. Chen, S. Zhang, and others. NeuroImage, Sep 2022
A fundamental goal across the neurosciences is the characterization of relationships linking brain anatomy, functioning, and behavior. Although various MRI modalities have been developed to probe these relationships, direct comparisons of their ability to predict behavior have been lacking. Here, we compared the ability of anatomical T1, diffusion and functional MRI (fMRI) to predict behavior at an individual level. Cortical thickness, area and volume were extracted from anatomical T1 images. Diffusion Tensor Imaging (DTI) and approximate Neurite Orientation Dispersion and Density Imaging (NODDI) models were fitted to the diffusion images. The resulting metrics were projected to the Tract-Based Spatial Statistics (TBSS) skeleton. We also ran probabilistic tractography for the diffusion images, from which we extracted the stream count, average stream length, and the average of each DTI and NODDI metric across tracts connecting each pair of brain regions. Functional connectivity (FC) was extracted from both task and resting-state fMRI. Individualized prediction of a wide range of behavioral measures was performed using kernel ridge regression, linear ridge regression and elastic net regression. The consistency of the results was investigated using the Human Connectome Project (HCP) and Adolescent Brain Cognitive Development (ABCD) datasets. In both datasets, FC-based models gave the best prediction performance, regardless of regression model or behavioral measure. This was especially true for the cognitive component. Furthermore, all modalities were able to predict cognition better than other behavioral components. Combining all modalities improved prediction of cognition, but not other behavioral components. Finally, across all behaviors, combining resting and task FC yielded prediction performance similar to combining all modalities.
Overall, our study suggests that in the case of healthy children and young adults, behaviorally-relevant information in T1 and diffusion features might reflect a subset of the variance captured by FC.
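Kernel ridge regression (KRR) is one of the three regression models named above. This minimal sketch assumes a Pearson-correlation kernel between participants' feature vectors and a fixed regularization strength `alpha`; a real analysis would tune `alpha` with nested cross-validation, and the study's exact kernel and preprocessing are not given in the abstract. Function names are hypothetical.

```python
import numpy as np


def correlation_kernel(A, B):
    """Pearson correlation between every row (participant) of A and every row of B."""
    A = (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)
    B = (B - B.mean(axis=1, keepdims=True)) / B.std(axis=1, keepdims=True)
    return A @ B.T / A.shape[1]


def krr_predict(X_train, y_train, X_test, alpha=1.0):
    """Kernel ridge regression; y is centered so the kernel models deviations from the mean."""
    K = correlation_kernel(X_train, X_train)
    y_mean = y_train.mean()
    coef = np.linalg.solve(K + alpha * np.eye(len(K)), y_train - y_mean)
    return correlation_kernel(X_test, X_train) @ coef + y_mean
```

Because the kernel is participant-by-participant, the linear system has one row per training participant regardless of how many features a modality has, which makes it convenient to compare modalities of very different dimensionality.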
2021
- [Nat. Commun.] Sensory-motor cortices shape functional connectivity dynamics in the human brain. Nature Communications, Nov 2021
Large-scale biophysical circuit models provide mechanistic insights into the micro-scale and macro-scale properties of brain organization that shape complex patterns of spontaneous brain activity. We developed a spatially heterogeneous large-scale dynamical circuit model that allowed for variation in local synaptic properties across the human cortex. Here we show that parameterizing local circuit properties with both anatomical and functional gradients generates more realistic static and dynamic resting-state functional connectivity (FC). Furthermore, empirical and simulated FC dynamics demonstrate remarkably similar sharp transitions in FC patterns, suggesting the existence of multiple attractors. Time-varying regional fMRI amplitude may track multi-stability in FC dynamics. Causal manipulation of the large-scale circuit model suggests that sensory-motor regions are a driver of FC dynamics. Finally, the spatial distribution of sensory-motor drivers matches the principal gradient of gene expression that encompasses certain interneuron classes, suggesting that heterogeneity in excitation-inhibition balance might shape multi-stability in FC dynamics.
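Dynamic FC is often summarized by computing FC in sliding windows and then correlating the FC patterns of every pair of windows, giving a functional connectivity dynamics (FCD) matrix in which block boundaries indicate sharp transitions between FC patterns. The sketch below assumes that sliding-window approach with a hypothetical window length and step; the abstract does not specify the exact dynamic FC measure, and the function names are illustrative.

```python
import numpy as np


def sliding_window_fc(ts, window, step=1):
    """Upper-triangular FC vectors from sliding windows over a (time, regions) array."""
    n_time, n_regions = ts.shape
    iu = np.triu_indices(n_regions, k=1)  # unique region pairs
    starts = range(0, n_time - window + 1, step)
    return np.array([np.corrcoef(ts[s:s + window].T)[iu] for s in starts])


def fc_dynamics(ts, window, step=1):
    """FCD matrix: correlation between the FC patterns of every pair of windows."""
    return np.corrcoef(sliding_window_fc(ts, window, step))
```

Comparing the FCD matrices of empirical and simulated time series is one way to check whether a circuit model reproduces dynamic, and not only static, FC.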