The latest AI models are remarkably human-like in their ability to produce text, audio, and video on demand. So far, however, these algorithms have largely remained confined to the digital world, rather than the physical, three-dimensional world in which we live. In fact, whenever we try to apply these models to the real world, even the most sophisticated struggle to perform adequately. Consider, for example, how difficult it has been to develop safe and reliable self-driving cars. Although artificially intelligent, these models not only lack any understanding of physics but also often hallucinate, which causes them to make inexplicable mistakes.
Taking AI beyond its digital boundaries and into the world we inhabit requires rethinking how machines think, combining the digital intelligence of AI with the mechanical prowess of robotics. This is the so-called "physical intelligence", a new form of intelligent machine that can understand dynamic environments, cope with unpredictability, and make decisions in real time. Unlike the models used by standard AI, physical intelligence is rooted in physics: in an understanding of the basic principles of the real world, such as cause and effect. These features allow physical intelligence models to interact with and adapt to different environments.
In our research group at MIT, we are developing a physical intelligence model called liquid networks. In one experiment, for example, we trained two drones, one operated by a standard AI model and the other by a liquid network, to find objects in a forest during summer, using data captured by human pilots. Both drones performed well when tasked with exactly what they had been trained to do. When asked to find objects in different circumstances, such as during winter or in an urban setting, only the liquid network drone was able to complete the task. This experiment shows that, unlike traditional AI systems that stop developing after the initial training phase, liquid networks continue to learn and adapt from experience, just as humans do.
Physical intelligence can also interpret and carry out complex commands that come from text or images, bridging the gap between digital instructions and real-world execution. For example, in my lab, we have developed a physically intelligent system that, in less than a minute, can iteratively design and then 3D print a small robot based on prompts like "a robot that can walk forward" or "a robot that can hold an object".
Other labs are also making significant breakthroughs. For example, the robotics startup Covariant, founded by UC Berkeley researcher Pieter Abbeel, is developing chatbots, similar to ChatGPT, that can control robotic arms on demand. The company has raised more than $222 million to develop and deploy sorting robots in warehouses around the world. A team at Carnegie Mellon University also recently demonstrated that a robot with just one camera and imprecise actuation can perform dynamic and complex parkour moves, including jumping onto obstacles twice its height and across gaps twice its length, using a single neural network trained through reinforcement learning.
If 2023 was the year of text-to-image and 2024 was the year of text-to-video, then 2025 will mark the era of physical intelligence, with a new generation of devices (not just robots, but everything from power grids to smart homes) that can interpret what they are told and perform tasks in the real world.
