This study, published in Cell Reports Medicine on February 17th, pitted generative AI against seasoned human researchers in a critical challenge: predicting preterm birth. The findings suggest that generative AI could substantially accelerate medical discovery, a prospect that matters most for patients facing urgent health concerns.
The research team, led by Marina Sirota, PhD, interim director of the Bakar Computational Health Sciences Institute (BCHSI) at UCSF and principal investigator of the March of Dimes Prematurity Research Center at UCSF, and Adi L. Tarca, PhD, professor in the Center for Molecular Medicine and Genetics at Wayne State University, designed a direct comparison of human and AI-driven approaches. Different groups were assigned the same complex tasks. Some teams relied entirely on human expertise, writing their own analytical code and interpreting the results. Others used AI tools, with scientists writing specific prompts to guide the AI in generating code and performing analyses.
The specific challenge involved predicting preterm birth using data from more than 1,000 pregnant women. This is a particularly pressing area of research, as preterm birth is the leading cause of newborn death and a significant contributor to long-term motor and cognitive challenges in children. In the United States alone, approximately 1,000 babies are born prematurely each day, highlighting the urgent need for faster and more effective diagnostic tools and preventative strategies.
One of the most striking outcomes of the study was the success of a junior research pair made up of Reuben Sarwal, a UCSF master’s student, and Victor Tarca, a high school student. With the help of AI tools, they developed working prediction models with remarkable speed. The AI generated the necessary code in minutes, work that could take experienced programmers several hours or even days. This acceleration stems from the AI’s ability to translate short, highly specific natural-language prompts into functional analytical code.
Not every AI system succeeded: only 4 of the 8 AI chatbots tested produced usable code. But those that performed well did not require large teams of specialists to guide them. This suggests a potential democratization of data science, in which people with less extensive coding backgrounds can harness powerful analytical capabilities.
The speed of generative AI allowed the junior team not only to complete their experiments and verify their findings but also to submit their results to a journal within a few months. This stands in stark contrast to the timelines typical of complex data analysis in health research.
Dr. Sirota emphasized the impact of this acceleration: "These AI tools could relieve one of the biggest bottlenecks in data science: building our analysis pipelines," she said. "The speed-up couldn’t come sooner for patients who need help now." Faster research can translate into quicker development of life-saving interventions and better patient care.
The study delved into the complexities of preterm birth research, an area where the precise causes remain elusive. To investigate potential risk factors, Dr. Sirota’s team had previously compiled extensive microbiome data from approximately 1,200 pregnant women whose pregnancy outcomes were meticulously tracked across nine separate studies. This endeavor relied heavily on open data sharing and the pooling of expertise from a multitude of researchers, a testament to the collaborative nature of scientific advancement.
"This kind of work is only possible with open data sharing, pooling the experiences of many women and the expertise of many researchers," commented Tomiko T. Oskotsky, MD, co-director of the March of Dimes Preterm Birth Data Repository, associate professor in UCSF BCHSI, and a co-author of the paper. However, the sheer scale and complexity of such datasets posed a significant analytical hurdle. To tackle it, researchers had previously turned to DREAM (Dialogue for Reverse Engineering Assessments and Methods), a global crowdsourcing competition.
Dr. Sirota co-led one of three DREAM pregnancy challenges, which focused on vaginal microbiome data. More than 100 teams worldwide took part, each developing machine learning models to identify patterns associated with preterm birth. Although most teams finished their analyses within the three-month competition window, consolidating the findings and publishing them took nearly two years.
Driven by the question of whether generative AI could significantly shorten such timelines, Dr. Sirota’s group collaborated with Dr. Tarca’s team at Wayne State University. Dr. Tarca had previously led two other DREAM challenges focused on improving methods for estimating pregnancy stage, a critical factor in determining appropriate prenatal care.
Together, the researchers gave eight different AI systems the same datasets from the three DREAM challenges. The AI systems were asked to generate algorithms on their own, without humans writing any code, guided only by carefully crafted natural-language prompts. Much like instructions typed into a conversational model such as ChatGPT, these prompts steered the systems toward analyzing the health data in ways comparable to the original human participants in the DREAM challenges.
The objectives for the AI systems mirrored those of the earlier human competitions. They were tasked with analyzing vaginal microbiome data to detect early signs of preterm birth and examining blood or placental samples to accurately estimate gestational age. Accurate pregnancy dating is crucial, as it directly influences the type of care a woman receives throughout her pregnancy. Inaccurate estimates can complicate labor preparation and potentially impact maternal and infant outcomes.
After the AI tools generated their code, the researchers ran it on the original DREAM datasets. Of the eight AI tools, four produced models that matched, and in some cases surpassed, the performance of the human teams. The entire generative AI effort, from initial concept and prompt design to submission of a research paper, took just six months.
The researchers emphasize that generative AI still requires careful human oversight. These systems can produce misleading or erroneous results, so human expertise remains essential for interpreting findings and ensuring scientific rigor. However, by dramatically speeding up the analysis of massive health datasets, generative AI can free researchers from the time-consuming work of troubleshooting code and building analytical pipelines, letting them devote more time to interpreting results, formulating new hypotheses, and pursuing deeper scientific questions.
"Thanks to generative AI, researchers with a limited background in data science won’t always need to form wide collaborations or spend hours debugging code," Dr. Tarca concluded. "They can focus on answering the right biomedical questions." This statement encapsulates the transformative potential of generative AI in democratizing access to advanced data analysis and empowering a broader range of researchers to contribute to critical health advancements.
The study’s authorship includes UCSF contributors Reuben Sarwal, Claire Dubin, Sanchita Bhattacharya, MS, and Atul Butte, MD, PhD. Other authors are Victor Tarca (Huron High School, Ann Arbor, MI); Nikolas Kalavros and Gustavo Stolovitzky, PhD (New York University); Gaurav Bhatti (Wayne State University); and Roberto Romero, MD, D(Med)Sc (National Institute of Child Health and Human Development, NICHD).
The research was funded by the March of Dimes Prematurity Research Center at UCSF and by ImmPort. The data used in the study were generated in part with support from the Pregnancy Research Branch of the NICHD, highlighting the collaborative ecosystem that underpins such scientific progress.

