The study was designed as a direct comparison in which distinct groups received identical predictive tasks. One set of teams relied exclusively on human expertise, while the other consisted of scientists working with AI tools. The core challenge was to build accurate predictive models for preterm birth from a dataset covering more than 1,000 pregnant women. The results pointed to the potential of AI to accelerate complex data analysis.
A particularly compelling outcome came from a junior research pair: Reuben Sarwal, a master’s student at UCSF, and Victor Tarca, a high school student. With AI support, they developed working prediction models. The AI systems generated the necessary computer code in minutes, a task that would typically take experienced programmers several hours, if not days. This efficiency comes from the AI’s ability to turn concise yet highly specific natural language prompts into working analysis code. Not every AI system proved effective (only 4 of the 8 AI chatbots produced usable code), but those that succeeded did so without large teams of human specialists guiding them.
The speed of generative AI allowed this junior research team to complete their experiments, verify their findings, and submit their results to a journal within a few months. Marina Sirota, PhD, a professor of Pediatrics and the interim director of the Bakar Computational Health Sciences Institute (BCHSI) at UCSF, who is also the principal investigator of the March of Dimes Prematurity Research Center at UCSF, described the implications of this advance: "These AI tools could relieve one of the biggest bottlenecks in data science: building our analysis pipelines. The speed-up couldn’t come sooner for patients who need help now." Dr. Sirota is a co-senior author of the study.
The case for accelerating preterm birth research is strong. Preterm birth is the leading cause of newborn death and a major contributor to long-term motor and cognitive challenges in children. In the United States alone, an estimated 1,000 babies are born prematurely each day, which underscores the urgent need for better diagnostic tools and preventive strategies. Despite extensive research, the causes of preterm birth remain incompletely understood.
To address this knowledge gap, Dr. Sirota’s team compiled microbiome data from approximately 1,200 pregnant women whose birth outcomes had been tracked across nine separate studies. "This kind of work is only possible with open data sharing, pooling the experiences of many women and the expertise of many researchers," said Tomiko T. Oskotsky, MD, co-director of the March of Dimes Preterm Birth Data Repository, associate professor in UCSF BCHSI, and a co-author of the paper.
However, analyzing such a large and complex dataset was a formidable challenge. The researchers first turned to a global crowdsourcing competition known as DREAM (Dialogue on Reverse Engineering Assessment and Methods). Dr. Sirota co-led one of three DREAM pregnancy challenges, the one focused on the analysis of vaginal microbiome data. More than 100 teams worldwide took part, developing machine learning models to identify patterns linked to preterm birth. While most teams completed their work within the three-month competition window, consolidating the findings and publishing them took nearly two years.
Intrigued by the potential of generative AI to drastically shorten such timelines, Dr. Sirota’s group forged a partnership with researchers led by Adi L. Tarca, PhD, a co-senior author and professor in the Center for Molecular Medicine and Genetics at Wayne State University in Detroit, MI. Dr. Tarca, who had previously led the other two DREAM challenges focused on refining methods for estimating pregnancy stage, brought invaluable expertise to the collaboration.
Together, the research teams tasked eight AI systems with independently generating algorithms using the same datasets from the three DREAM challenges, without any direct human coding. The AI chatbots were given carefully written natural language instructions, much like the prompts a user would type into a system such as ChatGPT. These detailed prompts were designed to steer the AI toward analyzing the health data in a manner comparable to the original DREAM participants.
The objectives assigned to the AI systems mirrored those of the earlier human-led challenges. Specifically, the AI models were instructed to analyze vaginal microbiome data to identify indicators of preterm birth. Additionally, they were tasked with examining blood or placental samples to estimate gestational age. Accurate pregnancy dating is paramount, as it dictates the type of care pregnant women receive throughout their gestation. Inaccurate estimates can significantly complicate preparations for labor and delivery.
After the AI systems generated the code, researchers ran the resulting algorithms on the original DREAM datasets. The results showed that 4 of the 8 AI tools produced models that matched, and in some instances outperformed, the results achieved by the human teams. The entire generative AI project, from conception to the submission of a research paper, was completed in six months.
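To make the comparison step concrete, here is a minimal Python sketch of how two sets of preterm-birth predictions could be scored against the same known outcomes. It is an illustration only, not the study's method. The article does not name the evaluation metric, so the use of AUROC (area under the ROC curve) is an assumption. The function name `auroc`, the variable names, and all numbers are hypothetical toy values.

```python
from itertools import product


def auroc(labels, scores):
    """Chance that a random positive case is scored above a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up values for illustration only (1 = preterm, 0 = term).
labels = [1, 0, 1, 0, 0, 1]
human_scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.8]  # hypothetical human-team model
ai_scores = [0.8, 0.3, 0.7, 0.4, 0.1, 0.9]     # hypothetical AI-generated model

human_auc = auroc(labels, human_scores)
ai_auc = auroc(labels, ai_scores)
print(f"human-team model AUROC:   {human_auc:.3f}")
print(f"AI-generated model AUROC: {ai_auc:.3f}")
```

Scoring every model on the same held-out outcomes with the same metric is what makes a "matched or outperformed" claim meaningful. A real analysis would use an established library and far more data than this toy example.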
Despite these impressive advancements, the scientists involved are keen to emphasize that AI still necessitates careful human oversight. They acknowledge that these systems can, at times, produce misleading or inaccurate results, underscoring the enduring importance of human expertise in the research process. Nevertheless, by rapidly sifting through massive health datasets, generative AI holds the promise of allowing researchers to dedicate less time to the often-tedious task of troubleshooting code and more time to interpreting critical findings and formulating impactful scientific questions.
Dr. Tarca highlighted the democratizing effect of this technology, stating, "Thanks to generative AI, researchers with a limited background in data science won’t always need to form wide collaborations or spend hours debugging code. They can focus on answering the right biomedical questions." This sentiment points towards a future where cutting-edge medical research is more accessible to a broader range of scientists.
The study’s authorship includes Reuben Sarwal, Claire Dubin, Sanchita Bhattacharya, MS, and Atul Butte, MD, PhD, from UCSF. Other contributing authors are Victor Tarca (Huron High School, Ann Arbor, MI), Nikolas Kalavros and Gustavo Stolovitzky, PhD (New York University), Gaurav Bhatti (Wayne State University), and Roberto Romero, MD, D(Med)Sc (National Institute of Child Health and Human Development (NICHD)).
This groundbreaking work was generously funded by the March of Dimes Prematurity Research Center at UCSF and by ImmPort. The data utilized in this study was partially generated with support from the Pregnancy Research Branch of the NICHD, further illustrating the collaborative and well-supported nature of this critical research. The findings represent a significant leap forward in harnessing the power of AI to accelerate medical discovery and address some of the most pressing health challenges facing humanity.

