Pokémon Go, the augmented-reality game that captivated the globe on its 2016 release, has outgrown its original purpose of virtual creature collection. Developed by Niantic, a spinout from Google, the game superimposed fantastical creatures onto the real world, sending millions of players out to explore their surroundings in pursuit of Pokémon like Jigglypuff and Squirtle, or the elusive Galarian Zapdos. That meant countless people were pointing their smartphones at buildings and landmarks all over the world. Brian McClendon, CTO at Niantic Spatial, a company Niantic spun out in May of last year, highlights the scale of adoption: "Five hundred million people installed that app in 60 days." Eight years after its debut, in 2024, the game still had more than 100 million active players, according to Scopely, the video-game firm that acquired Pokémon Go from Niantic.

This unparalleled trove of crowdsourced data – images of urban landmarks tagged with highly accurate location markers, captured by the phones of hundreds of millions of Pokémon Go players worldwide – is now being put to work by Niantic Spatial. The company is using it to build a "world model," a new kind of technology intended to ground the intelligence of large language models (LLMs) in real-world environments. Niantic Spatial’s latest model can pinpoint a user’s location on a map to within a few centimeters from just a handful of snapshots of surrounding buildings or landmarks. The firm aims to use the technology to help robots navigate precisely in environments where GPS is notoriously unreliable.

In a first application of the technology, Niantic Spatial has partnered with Coco Robotics, a startup that operates last-mile delivery robots in cities across the United States and Europe. "Everybody thought that AR was the future, that AR glasses were coming," remarks McClendon. "And then robots became the audience." The immersive experiences pioneered by AR games are finding practical, real-world utility in the fast-growing field of robotics.

From Pikachu to Pizza Delivery: A New Chapter in Logistics

Coco Robotics currently operates around 1,000 robots, each roughly the size of a flight case and able to carry up to eight extra-large pizzas or four grocery bags. The autonomous couriers serve customers in cities including Los Angeles, Chicago, Jersey City, Miami, and Helsinki. Zach Rash, CEO of Coco Robotics, says the fleet has completed more than half a million deliveries to date, covering millions of miles in all kinds of weather.

But to compete with human delivery workers, Coco’s robots, which trundle along sidewalks at around five miles per hour, must be exceptionally reliable. "The best way we can do our job is by arriving exactly when we told you we were going to arrive," says Rash. That hinges on the robots’ ability to navigate accurately and never get lost.

The main challenge Coco faces is the limitations of GPS in dense urban landscapes, where radio signals reflect off buildings and interfere with one another, degrading accuracy. "We do deliveries in a lot of dense areas with high-rises and underpasses and freeways, and those are the areas where GPS just never really works," explains Rash. McClendon calls the "urban canyon" the most problematic terrain for GPS: "If you look at that blue dot on your phone, you’ll often see it drift 50 meters, which puts you on a different block going a different direction on the wrong side of the street." This is where Niantic Spatial comes in.

For the past several years, Niantic Spatial has been processing the vast quantities of data collected from players of Pokémon Go and Ingress, Niantic’s earlier AR mobile game, launched in 2013. That data underpins its visual positioning system, a technology that determines a device’s location from its visual surroundings. John Hanke, CEO of Niantic Spatial, draws the parallel: "It turns out that getting Pikachu to realistically run around and getting Coco’s robot to safely and accurately move through the world is actually the same problem."

"Visual positioning is not a very new technology," acknowledges Konrad Wenzel from ESRI, a leading company in digital mapping and geospatial analysis software. "But it’s obvious that the more cameras we have out there, the better it becomes." Niantic Spatial has trained its model on 30 billion images of urban environments. Many are clustered around "hot spots" – locations, such as Pokémon battle arenas, that featured in Niantic’s games and were therefore visited frequently by players. "We had a million-plus locations around the world where we can locate you precisely," says McClendon. "We know where you’re standing within several centimeters of accuracy and, most importantly, where you’re looking."

As a result, for each of those million-plus locations, Niantic Spatial holds thousands of images taken from slightly different angles, at different times of day, and in different weather. Each image carries metadata recording the phone’s position and orientation at the moment of capture, including its movement, speed, and direction. That dataset has let Niantic Spatial train a model that can predict its exact location from what it sees, even outside the original hot spots, where high-quality image and location data is scarcer.
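The principle behind this kind of visual positioning can be sketched in a few lines of code. This is an illustrative toy, not Niantic Spatial’s actual pipeline: each mapped location is reduced to an image descriptor (a feature vector) paired with a known pose, and a query image is localized by finding its nearest neighbor in that database. All descriptors, coordinates, and landmarks below are invented; a real system would use learned descriptors and refine the match geometrically.

```python
import math

# Toy database mapping an image descriptor to a known pose.
# Each entry: (descriptor, (latitude, longitude, heading_degrees)).
MAPPED_LOCATIONS = [
    ([0.9, 0.1, 0.3], (34.0522, -118.2437, 90.0)),   # hypothetical LA landmark
    ([0.2, 0.8, 0.5], (41.8781, -87.6298, 180.0)),   # hypothetical Chicago landmark
    ([0.4, 0.4, 0.9], (25.7617, -80.1918, 270.0)),   # hypothetical Miami landmark
]

def descriptor_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def localize(query_descriptor):
    """Return the pose of the best-matching mapped location."""
    best = min(MAPPED_LOCATIONS,
               key=lambda entry: descriptor_distance(query_descriptor, entry[0]))
    return best[1]

# A query image whose descriptor most resembles the first landmark:
pose = localize([0.85, 0.15, 0.25])
print(pose)  # (34.0522, -118.2437, 90.0)
```

The point of the sketch is that localization becomes a lookup: once enough views of a place have been indexed, a fresh photo only needs to match one of them.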

In addition to GPS, Coco Robotics’ robots, which carry four cameras, will now use Niantic Spatial’s model to sharpen their sense of where they are and which way they are facing. The robots’ cameras sit at hip height and capture a 360-degree view – a perspective quite different from a Pokémon Go player’s – but Rash says adapting the data was "straightforward." Other companies also use visual positioning in robotics: Starship Technologies, a robot delivery firm founded in Estonia in 2014, uses its robots’ sensors to build 3D maps of their surroundings, picking out features like building edges and streetlights.
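How a precise visual fix can correct a drifting GPS estimate can be sketched as a confidence-weighted blend. This is a simplified stand-in for the Kalman-style sensor fusion real robots typically use, and all the numbers are invented: each position estimate comes with an uncertainty, and the less uncertain source dominates the fused result.

```python
def fuse_estimates(gps_pos, gps_sigma_m, visual_pos, visual_sigma_m):
    """Inverse-variance weighted average of two position estimates.

    Each estimate is an (east_m, north_m) offset in a local frame,
    with a standard deviation in meters. The source with the smaller
    uncertainty receives the larger weight.
    """
    w_gps = 1.0 / gps_sigma_m ** 2
    w_vis = 1.0 / visual_sigma_m ** 2
    total = w_gps + w_vis
    return tuple((w_gps * g + w_vis * v) / total
                 for g, v in zip(gps_pos, visual_pos))

# GPS drifting by tens of meters in an urban canyon (sigma ~ 25 m)
# versus a centimeter-level visual fix (sigma ~ 0.05 m):
fused = fuse_estimates((40.0, -15.0), 25.0, (2.0, 3.0), 0.05)
print(fused)  # lands almost exactly on the visual fix, ~(2.0002, 2.9999)
```

With a 50-meter GPS drift but a centimeter-accurate visual fix, the fused position sits essentially on the visual estimate, which is why a visual positioning layer can rescue navigation in places where the blue dot wanders.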

Still, Rash believes Niantic Spatial’s technology will give Coco a competitive edge. He expects it to let his robots position themselves precisely at designated pickup spots outside restaurants, where they won’t block the way, and to deliver more accurately to the doorstep – an improvement on past instances where a robot might stop a few steps short of the customer’s door.

A Cambrian Explosion in Robotics: The Future of Autonomous Navigation

When Niantic Spatial initially embarked on developing its visual positioning system, the primary objective was its application in augmented reality. "If you are wearing AR glasses and you want the world to lock in to where you’re looking, then you need some method for doing that," explains Hanke. "But now we’re seeing a Cambrian explosion in robotics." This observation highlights the rapid and diverse evolution of robotic applications across various industries.

Many of these emerging robots are designed to share space with humans, on construction sites and sidewalks. "If robots are ever going to assimilate into that environment in a way that’s not disruptive for human beings, they’re going to have to have a similar level of spatial understanding," Hanke asserts. "We can help robots find exactly where they are when they’ve been jostled and bumped."

The partnership with Coco Robotics represents just the initial phase of Niantic Spatial’s broader vision. Hanke describes their ongoing work as laying the foundation for what he terms a "living map" – a hyper-detailed, dynamic virtual simulation of the world that continuously updates in response to real-world changes. As robots from Coco and other companies navigate their environments, they will generate new streams of map data, contributing to increasingly sophisticated and accurate digital replicas of the planet.

Hanke and McClendon perceive a fundamental shift in the purpose of maps. Historically, maps have served as tools for human navigation and orientation. While the transition from 2D to 3D and now to 4D representations (encompassing real-time simulations and digital twins) has increased their complexity, the core principle remains: points on a map correspond to specific locations or moments in space and time. However, the advent of machine-driven navigation necessitates a redefinition of what maps are for.

Maps designed for machines may need to become something closer to guidebooks, packed with information that humans grasp intuitively but machines must be told explicitly. Companies like Niantic Spatial and ESRI are working to add descriptive elements that tell machines what they are looking at, with each object tagged and annotated with its properties. "This era is about building useful descriptions of the world for machines to comprehend," states Hanke. "The data that we have is a great starting point in terms of building up an understanding of how the connective tissue of the world works."
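What a map entry annotated for machines might look like can be illustrated with a simple data structure. The schema below is invented for illustration, not Niantic Spatial’s or ESRI’s actual format: each feature carries explicit properties that a human would infer at a glance but a robot must be told.

```python
from dataclasses import dataclass, field

@dataclass
class MapFeature:
    """One annotated object in a machine-readable 'guidebook' map."""
    feature_id: str
    kind: str          # e.g. "door", "curb_ramp", "stairs"
    position: tuple    # (east_m, north_m) in a local frame
    properties: dict = field(default_factory=dict)

# Facts a sidewalk robot needs spelled out:
door = MapFeature(
    feature_id="feat-0042",
    kind="door",
    position=(12.4, 3.1),
    properties={"opens": "outward", "traversable_by_robot": False},
)
curb_ramp = MapFeature(
    feature_id="feat-0043",
    kind="curb_ramp",
    position=(15.0, 0.0),
    properties={"traversable_by_robot": True, "max_slope_deg": 8},
)

# A delivery robot planning a route keeps only what it can traverse:
traversable = [f for f in (door, curb_ramp)
               if f.properties.get("traversable_by_robot")]
print([f.feature_id for f in traversable])  # ['feat-0043']
```

The design choice here is that the map stores semantics, not just geometry: the same coordinates would be useless to a route planner without the `traversable_by_robot` annotation.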

The concept of "world models" is generating significant excitement across the AI landscape, and Niantic Spatial is keenly aware of this burgeoning interest. While LLMs may appear omniscient, they often lack the common-sense understanding necessary to interpret and interact with everyday environments effectively. World models aim to bridge this gap. Some organizations, including Google DeepMind and World Labs, are developing models that generate virtual, simulated worlds on demand, serving as training grounds for AI agents.

Niantic Spatial, however, approaches the challenge from a different angle. "I’m very focused on trying to re-create the real world," says McClendon. "We’re not there yet, but we want to be there." The company’s emphasis is on reconstructing the physical world from real-world data into an accurate digital counterpart – one that lets robots like Coco’s navigate and operate with far greater precision and reliability.