To compare the two approaches directly, researchers assigned identical tasks to separate groups. Some teams relied solely on human expertise, working through the data with established methods. Others were made up of scientists working with AI tools. Both groups faced the same challenge: predicting preterm birth using data from more than 1,000 pregnant women.
Even a junior research pair, UCSF master’s student Reuben Sarwal and high school student Victor Tarca, developed working prediction models with the support of AI. The AI systems generated functional computer code in minutes, a task that would typically take experienced programmers several hours or even days.
This advantage came from the AI’s ability to generate analytical code from concise but highly specific natural language prompts. Not all AI systems performed this well: only 4 of the 8 AI chatbots evaluated produced usable code. The systems that succeeded, however, did so without extensive guidance from large teams of specialists.
The speed of generative AI allowed the junior researchers to complete their experiments, verify their findings, and submit their results to a scientific journal within a few months. "These AI tools could relieve one of the biggest bottlenecks in data science: building our analysis pipelines. The speed-up couldn’t come sooner for patients who need help now," said Marina Sirota, PhD, a professor of Pediatrics and interim director of the Bakar Computational Health Sciences Institute (BCHSI) at UCSF, who is also the principal investigator of the March of Dimes Prematurity Research Center at UCSF. Dr. Sirota is the co-senior author of the study, which was published in Cell Reports Medicine on February 17.
The Critical Importance of Preterm Birth Research
Faster data analysis could help improve diagnostic tools for preterm birth. Preterm birth is the leading cause of newborn mortality and a significant contributor to long-term motor and cognitive challenges in children. In the United States alone, approximately 1,000 babies are born prematurely every day.
The causes of preterm birth are still not fully understood. To investigate potential risk factors, Dr. Sirota’s team compiled microbiome data from approximately 1,200 pregnant women whose pregnancy outcomes were tracked across nine studies. "This kind of work is only possible with open data sharing, pooling the experiences of many women and the expertise of many researchers," said Tomiko T. Oskotsky, MD, co-director of the March of Dimes Preterm Birth Data Repository, an associate professor in UCSF BCHSI, and a co-author of the paper.
Analyzing such a large and complex dataset was a substantial challenge. To address it, the researchers turned to a global crowdsourcing competition known as DREAM (Dialogue on Reverse Engineering Assessment and Methods). Dr. Sirota co-led one of three DREAM pregnancy challenges, which focused on vaginal microbiome data. More than 100 teams worldwide took part, developing machine learning models to identify patterns associated with preterm birth. Most of the groups completed their work within the three-month competition window, but consolidating the findings and publishing them took nearly two years.
Rigorous Testing of Generative AI on Pregnancy and Microbiome Data
Curious whether generative AI could shorten such timelines, Dr. Sirota’s group partnered with researchers led by Adi L. Tarca, PhD, a co-senior author and professor in the Center for Molecular Medicine and Genetics at Wayne State University in Detroit, MI. Dr. Tarca had led the other two DREAM challenges, which focused on improving methods for estimating pregnancy stage.
The researchers instructed eight AI systems to independently generate algorithms using the same datasets from the three DREAM challenges, without any direct human coding. As with ChatGPT and similar platforms, the chatbots were guided by carefully crafted natural language prompts, written to steer their analysis of the health data in ways that mirrored the approaches taken by the original DREAM participants.
The AI systems were given the same objectives as the earlier challenges. They were tasked with analyzing vaginal microbiome data to identify potential indicators of preterm birth, and with examining blood or placental samples to estimate gestational age. Dating a pregnancy is almost always an estimate, yet it strongly influences the care women receive as their pregnancies progress. Inaccurate estimates can complicate preparation for labor and delivery.
After the AI systems generated their code, the researchers ran it on the DREAM datasets. Of the 8 AI tools, 4 produced models that performed comparably to those of the human teams, and some AI-generated models performed better. The entire generative AI project, from conception to submission of a research paper, took just six months.
The scientists emphasize that AI systems still need careful human oversight. These tools can produce misleading results, and human expertise remains essential for interpretation and validation. But by rapidly sorting through massive health datasets, generative AI could free researchers from time-consuming code troubleshooting, leaving them more time to interpret findings and formulate scientific questions.
"Thanks to generative AI, researchers with a limited background in data science won’t always need to form wide collaborations or spend hours debugging code," Dr. Tarca said. "They can focus on answering the right biomedical questions."
Contributing Authors and Funding Acknowledgements
Other UCSF authors are Reuben Sarwal, Claire Dubin, Sanchita Bhattacharya, MS, and Atul Butte, MD, PhD. Additional authors are Victor Tarca (Huron High School, Ann Arbor, MI), Nikolas Kalavros and Gustavo Stolovitzky, PhD (New York University), Gaurav Bhatti (Wayne State University), and Roberto Romero, MD, D(Med)Sc (National Institute of Child Health and Human Development (NICHD)).
This work was funded by the March of Dimes Prematurity Research Center at UCSF and ImmPort. The data used in this study were generated, in part, with support from the Pregnancy Research Branch of the NICHD.

