The company has outlined a clear roadmap, with a near-term goal of developing an "autonomous AI research intern" by September. This intern will be a precursor to a comprehensive multi-agent research system slated for debut in 2028. The fully realized AI researcher, as OpenAI envisions it, will be able to tackle challenges that currently elude human researchers because of their sheer scale or complexity.

The potential applications for such a system are broad, spanning disciplines from mathematics and physics, where it could generate novel proofs and conjectures, to life sciences such as biology and chemistry, and on to complex business and policy questions. In essence, any problem that can be expressed in text, code, or even rudimentary visual form could theoretically be posed to this advanced AI tool.

OpenAI has long been a driving force in the AI landscape, with its pioneering work in large language models (LLMs) profoundly shaping the technology adopted by billions globally. However, the company now faces escalating competition from formidable rivals like Anthropic and Google DeepMind, underscoring the critical importance of its next strategic moves for both its own trajectory and the broader evolution of artificial intelligence.

A pivotal figure in charting OpenAI’s long-term research agenda is Chief Scientist Jakub Pachocki. Alongside Chief Research Officer Mark Chen, Pachocki plays a crucial role in defining the company’s future scientific pursuits. His contributions have been instrumental in the development of GPT-4, a landmark LLM released in 2023, and the subsequent emergence of reasoning models, a technology that now forms the bedrock of contemporary chatbots and agent-based systems.

In an exclusive interview, Pachocki articulated OpenAI’s latest vision, stating, "I think we are getting close to a point where we’ll have models capable of working indefinitely in a coherent way just like people do." He elaborated, "Of course, you still want people in charge and setting the goals. But I think we will get to a point where you kind of have a whole research lab in a data center."

This ambition to solve humanity’s most pressing problems is a shared objective among leading AI firms. Demis Hassabis of Google DeepMind cited it as the genesis of his company, while Anthropic CEO Dario Amodei envisions creating a "country of geniuses in a data center." Sam Altman, OpenAI’s CEO, has publicly expressed aspirations to cure cancer. Pachocki, however, believes OpenAI is now close to having the components needed to achieve these grand objectives.

OpenAI’s recent release of Codex, an agent-based application capable of generating code on demand for various tasks, serves as an early testament to this vision. Codex can analyze documents, create charts, curate daily digests of communications, and perform a multitude of other functions. Similar tools have been introduced by competitors, such as Anthropic’s Claude Code and Claude Cowork.

According to OpenAI, a significant portion of its technical staff now integrates Codex into their daily workflows. Pachocki views Codex as a nascent iteration of the AI researcher, stating, "I expect Codex to get fundamentally better."

The crucial factor for advancing towards an autonomous researcher lies in developing systems that can operate for extended durations with minimal human intervention. Pachocki explained, "What we’re really looking at for an automated research intern is a system that you can delegate tasks [to] that would take a person a few days."

Doug Downey, a research scientist at the Allen Institute for AI who is not affiliated with OpenAI, acknowledges the widespread enthusiasm for systems capable of undertaking long-term scientific research. He attributes this surge in interest to the success of coding agents like Codex. "The fact that you can delegate quite substantial coding tasks to tools like Codex is incredibly useful and incredibly impressive. And it raises the question: Can we do similar things outside coding, in broader areas of science?"

Pachocki emphatically answers yes, positing that continued advances in general capabilities will naturally yield models with extended operational autonomy. He points to the leap in the duration and complexity of tasks that models could handle between OpenAI’s GPT-3 (2020) and GPT-4 (2023), noting that GPT-4 could engage with problems for considerably longer without specialized training.

The incorporation of reasoning models has further enhanced this capability. Training LLMs to work through problems step by step, identify errors, and backtrack has improved their ability to sustain prolonged task execution. Pachocki is confident that OpenAI’s reasoning models will continue to evolve.

Furthermore, OpenAI is actively training its systems for extended autonomous operation by exposing them to specific, complex tasks: challenging puzzles from mathematics and coding competitions, which force models to learn to manage large volumes of text and to break problems into manageable subtasks.
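To make that concrete, here is a minimal, hypothetical sketch of the check-and-backtrack loop with subtask decomposition. It is our illustration rather than OpenAI’s training setup, and `call_model` is a stub standing in for any LLM call.

```python
# Hypothetical sketch of the decompose-check-backtrack loop described above.
# This is our illustration, not OpenAI's training setup; `call_model` is a
# stub standing in for any LLM call.

def call_model(prompt: str) -> str:
    """Placeholder for an LLM call; a real system would query a model here."""
    return ""

def solve(task: str, max_retries: int = 3) -> list[str]:
    # Step 1: ask the model to break the task into smaller subtasks.
    subtasks = call_model(f"Break this task into subtasks:\n{task}").splitlines()

    results: list[str] = []
    for subtask in subtasks:
        for _attempt in range(max_retries):
            answer = call_model(f"Solve: {subtask}\nContext so far: {results}")
            # Step 2: ask the model to check its own intermediate work.
            verdict = call_model(f"Does this answer contain an error?\n{answer}")
            if verdict.strip().lower().startswith("no"):
                results.append(answer)
                break
            # Step 3: a detected error means backtrack and retry the subtask.
        else:
            raise RuntimeError(f"Could not solve subtask: {subtask!r}")
    return results
```

The point of the structure is the inner loop: the model checks its own intermediate work and retries a subtask rather than carrying an error forward into the rest of the plan.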

The ultimate objective is not merely to create AI that excels in specific domains like mathematics competitions. "That lets you prove that the technology works before you connect it to the real world," Pachocki explained. "If we really wanted to, we could build an amazing automated mathematician. We have all the tools, and I think it would be relatively easy. But it’s not something we’re going to prioritize now because, you know, at the point where you believe you can do it, there’s much more urgent things to do." He emphasized, "We are much more focused now on research that’s relevant in the real world."

Currently, this focus involves extending Codex’s coding prowess to broader problem-solving applications. "There’s a big change happening, especially in programming," Pachocki observed. "Our jobs are now totally different than they were even a year ago. Nobody really edits code all the time anymore. Instead, you manage a group of Codex agents." The underlying premise is that if Codex can conquer coding challenges, it can, in principle, address any problem.

Recent months have indeed seen remarkable achievements by OpenAI. Researchers have leveraged GPT-5, the LLM powering Codex, to devise novel solutions for several persistent mathematical problems and to overcome significant hurdles in various biology, chemistry, and physics challenges.

"Just looking at these models coming up with ideas that would take most PhDs weeks, at least, makes me expect that we’ll see much more acceleration coming from this technology in the near future," Pachocki stated.

However, Pachocki acknowledges that the path forward is not without its uncertainties and understands the skepticism some may harbor regarding the transformative potential of this technology. He believes its utility is contingent upon individual work styles and requirements. "I can believe some people don’t find it very useful yet," he conceded.

He shared a personal anecdote: until a year ago he did not even use autocomplete, the most rudimentary form of generative coding technology, preferring to write code by hand in Vim, a text editor favored by many programmers for its keyboard-centric interface. That changed when he saw what current models could do. While he still wouldn’t delegate complex design tasks, he finds the technology an invaluable time-saver for rapid idea exploration. "I can have it run experiments in a weekend that previously would have taken me like a week to code," he remarked.

He further clarified, "I don’t think it is at the level where I would just let it take the reins and design the whole thing. But once you see it do something that would take a week to do—I mean, that’s hard to argue with."

Pachocki’s strategy centers on augmenting the existing problem-solving capabilities of tools like Codex and applying them across scientific disciplines.

Downey concurs that the concept of an automated researcher is compelling. "It would be exciting if we could come back tomorrow morning and the agent’s done a bunch of work and there’s new results we can examine," he commented.

However, he cautions that the development of such a system might present greater challenges than Pachocki suggests. Downey and his colleagues recently evaluated several leading LLMs on scientific tasks, with OpenAI’s GPT-5 emerging as the top performer, albeit still exhibiting numerous errors. "If you have to chain tasks together, then the odds that you get several of them right in succession tend to go down," he noted. Downey also acknowledged the rapid pace of advancement, admitting his findings might already be outdated, as OpenAI released GPT-5.4 only two weeks prior.
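Downey’s caution is simple compounding arithmetic: if each step in a chain succeeds independently with probability p, the whole chain succeeds with probability p raised to the number of steps. In the snippet below, the 90 percent per-step figure is purely illustrative, not a measured accuracy for any model.

```python
# If each step succeeds independently with probability p, a chain of n
# steps succeeds with probability p ** n. The 90% per-step figure is
# illustrative only, not a measured accuracy for any model.
p = 0.9
for n in (1, 5, 10, 20):
    print(f"{n:>2} chained steps at {p:.0%} each -> {p ** n:.0%} overall")
# Output:  1 -> 90%,  5 -> 59%, 10 -> 35%, 20 -> 12%
```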

Significant unanswered questions loom regarding the potential risks associated with systems capable of independently solving complex problems with minimal human oversight. Pachocki affirmed that these concerns are a constant topic of discussion within OpenAI.

"If you believe that AI is about to substantially accelerate research, including AI research, that’s a big change in the world. That’s a big thing," he stated. "And it comes with some serious unanswered questions. If it’s so smart and capable, if it can run an entire research program, what if it does something bad?"

Pachocki outlined several scenarios through which such negative outcomes could manifest: the system could deviate from its intended course, be compromised by malicious actors, or simply misinterpret its instructions.

OpenAI’s primary method for addressing these risks involves training its reasoning models to provide detailed accounts of their operations. This approach, known as chain-of-thought monitoring, involves LLMs documenting their decision-making processes as they execute tasks. Researchers can then scrutinize these records to ensure the model is behaving as expected. OpenAI recently published new details on its internal use of chain-of-thought monitoring to study Codex.

"Once we get to systems working mostly autonomously for a long time in a big data center, I think this will be something that we’re really going to depend on," Pachocki said. Because the inner workings of LLMs are still not fully understood, the aim is not to prevent undesirable behavior outright but to have other LLMs watch the "scratch pads" of AI researchers and flag such behavior before it escalates into a problem.
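In outline, such a monitor might look like the hypothetical sketch below. It is not OpenAI’s published implementation, and `call_monitor_model` is a stub standing in for the supervising LLM.

```python
# Hypothetical sketch of chain-of-thought monitoring. A second model reads
# the worker agent's scratch pad and flags suspect steps for human review.
# `call_monitor_model` is a stub, not OpenAI's actual implementation.

def call_monitor_model(prompt: str) -> str:
    """Placeholder for the supervising LLM; a real system would query a model."""
    return "OK"

def review_scratch_pad(scratch_pad: list[str]) -> list[tuple[int, str]]:
    flags: list[tuple[int, str]] = []
    for i, step in enumerate(scratch_pad):
        verdict = call_monitor_model(
            "Reply OK if this reasoning step looks benign; otherwise explain "
            f"the concern (deception, goal deviation, misuse):\n\n{step}"
        )
        if verdict.strip().upper() != "OK":
            # Flag for review rather than halting the agent outright:
            # the monitor itself can be wrong.
            flags.append((i, verdict))
    return flags
```

The design choice Pachocki describes is visible in the last step: suspect reasoning is flagged for review rather than blocked, since the monitor itself can be wrong.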

"I think it’s going to be a long time before we can really be like, okay, this problem is solved," he added. "Until you can really trust the systems, you definitely want to have restrictions in place." Pachocki advocates for deploying highly potent models within controlled environments, or "sandboxes," isolated from any systems they could potentially damage or exploit.
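The mechanics of that idea can be shown with a bare-bones sketch: run untrusted, model-generated code in a separate process with a hard timeout and a throwaway working directory. Real sandboxes add OS-level isolation (containers, virtual machines, blocked network access) that this toy version omits.

```python
import subprocess
import sys
import tempfile

# Bare-bones illustration of the sandboxing idea: run untrusted,
# model-generated code in a separate process with a hard timeout and a
# throwaway working directory. Production sandboxes add OS-level isolation
# (containers, VMs, blocked network access) that this sketch omits.

def run_sandboxed(untrusted_code: str, timeout_s: int = 10) -> str:
    with tempfile.TemporaryDirectory() as scratch_dir:
        try:
            result = subprocess.run(
                [sys.executable, "-I", "-c", untrusted_code],  # -I: isolated mode
                cwd=scratch_dir,      # confine file writes to a temp directory
                capture_output=True,
                text=True,
                timeout=timeout_s,    # kill runaway processes
            )
        except subprocess.TimeoutExpired:
            return "ERROR: timed out"
        return result.stdout if result.returncode == 0 else f"ERROR: {result.stderr}"

print(run_sandboxed("print(2 + 2)"))  # prints "4"
```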

Concerns about the misuse of AI are not hypothetical. AI tools have already been employed to devise sophisticated cyberattacks, and there are worries about their potential use in designing synthetic pathogens for bioweapons. Pachocki acknowledged, "I definitely think there are worrying scenarios that we can imagine."

"It’s going to be a very weird thing. It’s extremely concentrated power that’s in some ways unprecedented," Pachocki observed. "Imagine you get to a world where you have a data center that can do all the work that OpenAI or Google can do. Things that in the past required large human organizations would now be done by a couple of people." He concluded, "I think this is a big challenge for governments to figure out."

However, some argue that government involvement complicates matters. The US government’s interest in employing AI on the battlefield, for instance, highlights a societal divide over the ethical boundaries of AI use. The recent conflict between Anthropic and the Pentagon revealed a lack of consensus on where those red lines lie and who should draw them. In the aftermath, OpenAI secured a deal with the Pentagon, a development that seemed to confirm some of Anthropic’s fears. The situation remains unresolved.

When pressed on his trust in external entities to navigate these challenges, Pachocki asserted, "I do feel personal responsibility. But I don’t think this can be resolved by OpenAI alone, pushing its technology in a particular way or designing its products in a particular way. We’ll definitely need a lot of involvement from policymakers."

This leaves a fundamental question: are we truly on a trajectory toward the AI envisioned by Pachocki? Downey, when asked, expressed his inability to accurately predict the timeline for such capabilities, stating, "I’ve been in this field for a couple of decades and I no longer trust my predictions for how near or far certain capabilities are."

OpenAI’s stated mission is to ensure that artificial general intelligence (AGI)—a hypothetical future technology capable of matching human cognitive abilities across most tasks—benefits all of humanity. Its strategy involves being the first to achieve AGI. However, the one time Pachocki mentioned AGI, he quickly qualified it with a reference to "economically transformative technology."

He differentiated LLMs from human brains, stating, "They are superficially similar to people in some ways because they’re kind of mostly trained on people talking. But they’re not formed by evolution to be really efficient."

"Even by 2028, I don’t expect that we’ll get systems as smart as people in all ways. I don’t think that will happen," he added. "But I don’t think it’s absolutely necessary. The interesting thing is you don’t need to be as smart as people in all their ways in order to be very transformative."