How LLMs Are Changing Robotics
How "Robotic Foundation Models" merge LLM-style common sense with real-world physical skill
Let’s be very clear. For the last 30 years, most robots have been quite dumb.
You have the factory robot, which does one single task, again and again, like a machine. Then you have your home 'robots', like the robot vacuum, which just bumps around and gets stuck.
These are not ‘intelligent’. They are just pre-programmed.
But now, a major shift is happening. Researchers, like those at Physical Intelligence, are no longer trying to build a robot that only serves tea.
They are trying to build a general-purpose system that can learn to do anything.
They are calling this the 'Robotic Foundation Model', and they see it as a fundamental part of solving AI itself.
Fully Autonomous Robot
What is the end goal? It is not to give a robot a simple command like, “Please pick up that bottle.”
The real vision is to give the robot a responsibility.
Imagine telling your robot: “Listen, you are in charge of the kitchen. Make sure breakfast is ready by 8 AM and the kitchen is clean before I leave for work.”
And the robot just... does it. For months. It learns your preferences. It knows what to do when you run out of milk. It handles problems. That is the level of autonomy we are talking about. This requires the robot to have common sense, to learn, and to fix its own mistakes. It’s a proper agent.
“Flywheel” Strategy for Robots
So, how will we get there? The main strategy is called the “flywheel”.
The idea is to get these robots into the real world as soon as possible—maybe in just one or two years. The full ‘house-running’ robot? The median estimate is around five years.
The “flywheel” means the robot learns by doing. It will make mistakes, and it will learn from them. This is the biggest advantage robotics has over a field like self-driving cars.
Think about it. If a self-driving car makes one small mistake, it’s a major, catastrophic accident. The learning is very costly and very dangerous.
But what happens if a home robot makes a mistake?
Say it's trying to make a pizza for the first time. It messes up the dough.
Or it’s trying to sort your groceries and puts the ice cream in the regular fridge.
Is it a disaster? No. It’s a recoverable error. The robot can see the result, understand “this is not right,” and do it better the next time. This ability to make small, safe mistakes is a super-fast way to learn.
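The flywheel loop can be sketched in a few lines. This is a toy model, not the actual training pipeline: the task, the success probability, and the "learn a little from each failure" rule are all invented for illustration. Real systems would log full episodes and fine-tune a large policy on them.

```python
import random

def attempt_task(skill: float) -> bool:
    """Toy task: succeed with probability equal to the current skill level."""
    return random.random() < skill

def run_flywheel(episodes: int, skill: float = 0.2, lr: float = 0.05) -> float:
    """Attempt, observe the outcome, improve on failure -- the flywheel."""
    experience = []  # logged outcomes; a real system would store full episodes
    for _ in range(episodes):
        success = attempt_task(skill)
        experience.append(success)
        if not success:
            # A recoverable error: the robot sees "this is not right"
            # and gets a little better for next time.
            skill = min(1.0, skill + lr)
    return skill

random.seed(0)
print(f"skill after 100 episodes: {run_flywheel(100):.2f}")
```

The point of the sketch is the shape of the loop: because each mistake is cheap and safe, every episode is usable training data, which is exactly the advantage over self-driving cars.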
The Need for a Common-Sense Brain
The biggest change today is that we don’t have to teach the robot everything from zero. We can use the ‘brain’ from existing AI models, like the ones that power ChatGPT or Google’s AI.
These VLMs (Vision-Language Models) act as the ‘common sense’ part. They are like the Head Chef in a kitchen. They already have all the world knowledge:
They know what a banana is.
They know that milk is kept in the fridge.
They know the logical steps for cleaning.
This VLM part does the high-level planning. It can see the kitchen and think, "To clean up, I must first pick up the plates and put them in the sink."
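The Head Chef / Junior Chef split can be sketched as an interface. The `plan_with_vlm` function below is a hand-written stand-in for a real vision-language model call, and the clean-up rules inside it are invented for illustration; the real point is the division of labor — the VLM emits discrete subtasks, and a separate low-level policy would execute each one.

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    verb: str
    target: str

def plan_with_vlm(goal: str, scene: list[str]) -> list[Subtask]:
    """Stand-in for a VLM: turn a goal plus a scene into subtasks."""
    plan = []
    if "clean" in goal.lower():
        for obj in scene:
            if obj in ("plate", "cup", "bowl"):  # toy common-sense knowledge
                plan.append(Subtask("pick_up", obj))
                plan.append(Subtask("place_in", "sink"))
    return plan

steps = plan_with_vlm("Clean up the kitchen", ["plate", "banana", "cup"])
for s in steps:
    print(s.verb, s.target)
```

Note what the planner does not do: it never specifies joint angles or gripper forces. That is the Junior Chef's job, covered next.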
And the Action Expert
But having a Head Chef (the brain) is not enough. You need the “Junior Chef” who can actually do the work—chop the vegetables, move the pans.
This is the action expert, and it is the hardest part. This is the robot’s ‘motor cortex’. Its job is to turn a plan (like “pick up the plate”) into a smooth, continuous stream of physical motions.
This is not like an LLM, which predicts the next word (a discrete token). Physical motion is continuous. To do this, it uses a technique called Diffusion.
So, what is a Diffusion Model?
This is the same amazing tech used to create those AI images (like in DALL-E or Midjourney). Here’s a simple analogy:
Noise: Imagine you have a perfect, clear photograph. You add a little bit of ‘noise’ or ‘static’ to it. Then a little more. You keep adding noise until the photo is just a complete jumble of static.
De-Noise: A diffusion model is trained to do the exact opposite. It learns to start with a screen full of pure static and, step-by-step, ‘de-noise’ it until a perfect, clear image appears.
How it works: It’s like a sculptor starting with a random block of marble and slowly carving away the ‘noise’ to reveal the statue hidden inside.
The robot’s “action expert” does the same thing, but for motion. It starts with a ‘noisy’ or random set of possible movements and, in a fraction of a second, ‘de-noises’ them into one single, smooth, and precise action—like the perfect, gentle arc to pick up an egg without breaking it.
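The start-from-noise, refine-step-by-step idea can be sketched numerically. This is only the shape of the process: a real diffusion policy replaces the hand-coded update below with a trained network that predicts what noise to remove at each timestep, and the 7-dimensional "arm pose" here is a made-up example.

```python
import random

def denoise_actions(target, steps=50, noise_seed=0):
    """Start from pure noise and iteratively refine toward a target action.

    `target` stands in for what a trained denoiser would converge to;
    in a real diffusion policy there is no explicit target, only a
    learned noise-prediction network.
    """
    rng = random.Random(noise_seed)
    # Step 1: begin with pure Gaussian noise, one value per joint.
    action = [rng.gauss(0.0, 1.0) for _ in target]
    for _ in range(steps):
        # Step 2: each pass removes a fraction of the remaining "noise",
        # standing in for one learned denoising step.
        action = [a + 0.2 * (g - a) for a, g in zip(action, target)]
    return action

target_pose = [0.1, -0.4, 0.7, 0.0, 0.2, -0.1, 0.5]  # hypothetical 7-DoF arm
smooth_action = denoise_actions(target_pose)
print([round(x, 3) for x in smooth_action])
```

After fifty refinement passes the random starting jumble has collapsed onto one precise action — the numerical version of the sculptor carving the statue out of the marble.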
The Big Paradox: Why Dexterity is Harder Than Strategy
This new architecture highlights a famous old idea in AI: Moravec’s Paradox.
The paradox is simple: For AI, the things humans find hard (like playing chess, doing complex maths, strategic planning) are actually easy. And the things humans find easy (like walking, picking up a pen, seeing a face) are extremely hard.
LLMs have basically solved the ‘hard’ part (planning). Now, all the focus is on the ‘easy’ part: physical dexterity.
Surprisingly, researchers found that a robot can do a complex one-minute task (like neatly arranging books on a shelf) with only one second of context. It’s like a trained cricketer. When he’s about to hit a six, he’s not thinking about his long-term career. He’s in the moment, relying on pure, ‘baked-in’ skill. This is dexterity.
So, the strategy is: Master dexterity first. Once the robot can physically do things reliably, adding the long-term memory and planning (the LLM brain) is the simple part.
About Author
I'm Shailesh Sharma. I help PMs and business leaders excel in Product, Strategy, and AI using First Principles Thinking. For more, check out my live cohort course, PM Interview Mastery Course, Cracking Strategy, and other resources.