Data in Action Spotlight: Advancing the Study of the Body’s Metabolic Systems through Bold Approaches in Data Repository Management

Have you ever wondered how the “normal” resting heart rate for your age group, used by your doctor, was determined or how the American Diabetes Association established the “normal” fasting glucose value? Establishing these “normal” ranges for metabolic data has presented challenges for medical researchers, practitioners, and those that share the information with the public for many years. Understanding these “normal” ranges for metabolism is complex, especially because the human body may contain tens of thousands of metabolites at any one time; each individual molecule could be tied to a variety of processes that together make up the human metabolism. Even so, researchers and practitioners desire a basis of comparison while the field of metabolomics develops. Insights into these processes, combined with advances in genomics, proteomics, the microbiome, and other -omics, are poised to transform human, animal and plant health over the next several years.

Thanks to funding from the National Institutes of Health (NIH), a group of scientists – including San Diego Supercomputer Center’s bioinformatics researcher Eoin Fahy – have recently made a significant leap in ensuring that “normal” numbers are better understood for individuals rather than the human population as a whole. Fahy has been working with UC San Diego Bioengineering Professor Shankar Subramaniam on developing and hosting the National Metabolomics Data Repository (NMDR) for the past several years. Rather than remain content to host data for others’ uses, the NMDR team realized they were in a unique situation – with access to hundreds of metabolomics studies in the repository – and developed a tool, MetStat, that provides summary information for “normal” ranges across more than 1800 metabolomics studies. Instead of examining the “normal” ranges for an overall population, MetStat allows users to look at data regarding specific metabolites within large groups of datasets and to see what is ‘“typical”.

For instance, one of the studies found in the repository encompasses a diabetes study containing data from 12,000 worldwide patient samples. The metadata includes metabolite measurements that can help practitioners better understand individual “typical” ranges for those suffering from diabetes.

Founded on the principles of FAIR (findable, accessible, interoperable, and reusable) data, the National Metabolomics Data Repository has implemented a robust set of studies that provide metadata on a range of worldwide research including 195 cases involving cancer and 66 regarding diabetes. 

“Metabolite markers in blood is our first window into human physiology,” said Subramaniam, principal investigator for the NIH-funded repository. “Our project takes this information and creates the ability to study human metabolomics.”

Not only does the repository allow users to access the metadata, there are multiple user-friendly tools in place for researchers to compare and analyze the data combined with their own data. The project is a true representation of one following the FAIR principles and Fahy further explained how specific scientists are using the repository.

“Having a large open-source repository of studies covering a broad range of species, tissue sources, disease associations and analytical instrumentation provides a valuable resource for metabolomics researchers around the world” said Fahy. “Free access to the underlying raw data and experimental conditions also supports re-analysis and comparative analysis across multiple studies and other ‘omics’ platforms which is likely to lead to key insights in systems biology in the future”.

The NMDR is funded by NIH grant number U2C-DK119886.

About the West Big Data Innovation Hub: 

The West Big Data Innovation Hub is one of four regional hubs funded by the National Science Foundation (NSF) to build and strengthen strategic partnerships across industry, academia, nonprofits, and government. The West Hub community aims to catalyze and scale data science for societal needs – connecting research, education, and practice in thematic areas such as natural resources and hazards, metro data science, health, and data-enabled discovery and learning. Coordinated by UC Berkeley’s Division of Computing, Data Science, and Society, the San Diego Supercomputer Center, and the University of Washington, the West Hub region includes contributors and data enthusiasts from Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, Wyoming, and a global network of partners.