Humanoid Foundation Models
The brains are being rebuilt – from VLAs to predictive world models.
Humanoid world models are the new brains of embodied AI. The systems that turn perception into action are shifting from vision-language-action (VLA) models to predictive world models that let a robot imagine the consequences of an action before taking it.
This report maps that shift end to end, independently scores the 40 foundation models that matter, and asks – soberly – when the technology will actually be reliable.


Why the intelligence layer – and why now
Hardware is consolidating; the open contest now is the “brain.” The shift from VLAs to humanoid world models is happening in months, not years, and the architecture that wins will shape the whole stack. This report tells you what is changing, who is ahead, and what it means – for six audiences.
Investors & corporate strategy
Where the USD 13.8B funding wave is flowing, which architectures are winning, and how to read valuations priced years ahead of revenue.
Robotics & AI engineers
The four architectural paradigms, the System 1 / System 2 convergence, and why the field is moving from VLAs to predictive world models.
Foundation-model builders
The data moat, scaling laws for embodied data, simulation and sim-to-real – and where the open challengers are closing the gap.
Humanoid OEMs & suppliers
Which “brain” to license or build, the split-stack reality of Western models on Chinese bodies, and the compute beneath it all.
Policy & compute teams
Export controls, state-backed scale, and the geopolitics that decide who trains and who only assembles.
Anyone betting on embodied AI
An independent, sober read on capability, safety and timing – not a hype reel and not a teardown.
The central finding
Humanoid world models are the most exciting idea in robot learning – and the least finished. The advances are real: predictive world-action architectures more than double generalization over VLA baselines, video-trained policies now top manipulation benchmarks, and embodied data has begun to show clean scaling laws.
But demos are not deployment. Reliable, unattended generalist autonomy in unstructured settings is, on our base case, a 2028–2030 proposition – gated less by ideas than by the last 10% of reliability, safety validation and the lab-to-field gap. The likely equilibrium is a split stack: Western brains on Chinese bodies. The question is not whether world models will reshape embodied AI – it is who controls the architecture, the data and the compute when they do.
Inside humanoid world models: the four paradigms
Under the banner of “world models” sit four distinct ways to generate an action. The report sorts them on two axes – discrete vs. continuous control, and whether the model predicts the future at all – and shows why the strongest systems are converging on a two-speed design that pairs a fast “System 1” control layer with a slower “System 2” reasoning layer.
Autoregressive
Emits discrete action tokens like a language model. Simple and general, but with no explicit model of how the world will change.
Diffusion / flow matching
Generates smooth, continuous action trajectories by denoising. The basis of the fast “System 1” layer, running above 200 Hz.
Latent world models (JEPA)
Predict how the environment evolves in a compact latent space, so a robot can plan against an imagined future rather than raw pixels.
World-action models (WAM)
Combine continuous control with world prediction – the strongest recipe so far, more than doubling generalization over VLA baselines.
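As a rough illustration of the latent-world-model idea above, the toy Python sketch below plans by imagining outcomes rather than acting blindly. Everything in it is an assumption made for illustration: `predict_next` is a hand-written stand-in for a learned latent predictor, the two-dimensional latent state is arbitrary, and random-shooting search is only one simple way to plan. None of it is taken from the report or from any specific model.

```python
import random


def predict_next(latent, action):
    """Hypothetical latent dynamics: a hand-written stand-in for a learned predictor."""
    return [z + 0.1 * a for z, a in zip(latent, action)]


def imagine(latent, actions):
    """Roll the model forward through an action sequence without touching the real world."""
    for action in actions:
        latent = predict_next(latent, action)
    return latent


def squared_distance(a, b):
    """Cost of an imagined outcome: squared distance to the goal latent."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def plan(latent, goal, horizon=5, candidates=256, seed=0):
    """Random-shooting planner: sample action sequences, imagine each, keep the best."""
    rng = random.Random(seed)
    dim = len(latent)
    best_cost, best_actions = float("inf"), None
    for _ in range(candidates):
        actions = [
            [rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(horizon)
        ]
        cost = squared_distance(imagine(latent, actions), goal)
        if cost < best_cost:
            best_cost, best_actions = cost, actions
    return best_actions, best_cost


start, goal = [0.0, 0.0], [0.3, -0.2]
actions, cost = plan(start, goal)
```

In this sketch the robot never executes a candidate plan; it only compares imagined end states and would act on the first step of the best one. A world-action model, as described above, would pair this kind of prediction with a continuous action generator instead of random sampling.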
A fair verdict needs both columns
This report sets the genuine advances against the unsolved failures, and dates the turn – so you can tell real capability from a good demo.
The advances
Predictive architectures, generative simulators and JEPA-style models that let robots imagine consequences before acting.
The failures
Hallucination, drift, the sim-to-real gap and new alignment risks – the reasons deployment is still hard.
The timeline
Benchmarks, unit-cost crossovers and a base case for when a reliable generalist actually arrives.
What’s inside the humanoid world models report
Twenty-four chapters and 16 figures – a complete reference on the “brains” of humanoid robots: the architectures, the players, the compute and geopolitics behind them, and a sober read on capability, safety and timing.
Foundations
- 01 – Reshaping Physical AI
- 02 – From VLAs to World Models
- 03 – Inside a World Model
- 04 – The Four Architectural Paradigms
- 05 – JEPA & the Predictive Architectures
- 06 – Generative World Simulators
- 07 – Scaling Laws & the Data Question
The Models & Makers
- 08 – Foundation Models I: NVIDIA’s Platform
- 09 – Foundation Models II: The Challengers
- 10 – The Humanoid Makers
- 11 – Training Data & the Data Moat
- 12 – Simulation & Sim-to-Real
Compute & Geopolitics
- 13 – Compute & Infrastructure
- 14 – USA: Innovation vs. Volume
- 15 – China: State-Backed Scale
- 16 – Europe, Japan & Korea
- 17 – Geopolitics
Applications & Evaluation
- 18 – Industrial Applications
- 19 – Domestic & Service Robots
- 20 – Benchmarks & Evaluation
- 21 – Strengths, Shortcomings & Safety
Outlook
- 22 – Business & Investment
- 23 – Scenarios
- 24 – Playbook & the 2026–2030 Window
The Brain Score directory
All 40 foundation models tracked by humanoid.guide – each independently rated 0–10 across ten capability dimensions, so you can compare the whole field at a glance. Yellow means the model has the skill.
16 figures, sharp in the report
Each chapter pairs deep narrative with the visual frameworks that strategy and engineering teams use to communicate findings internally. The previews below are intentionally blurred – the full, readable versions, with the data behind them, come with the report.






Heavy on illustration, light on filler
Every chapter ties an idea to what it means for builders, buyers and investors – with original editorial illustration throughout.
Get the report
Single-user license for individual analysts and engineers. Enterprise license for strategy, research and product teams.
Single User License
- Full 100-page report (PDF)
- All 24 chapters & 16 figures
- The 40-model Brain Score directory
- Single-user license
- Free updates through the 2026 cycle
Enterprise License
- Everything in Single User
- Unlimited internal users at one organization
- Right to quote in internal strategy documents
- Priority email support
- Optional 60-min briefing call with the authors
Frequently asked questions
- What exactly is a humanoid world model?
- A humanoid world model is a foundation model that learns how an environment changes, so a robot can predict – and imagine – the consequences of an action before it acts. The report explains how humanoid world models differ from vision-language-action (VLA) models, and why the field is shifting toward them.
- Who is the report for?
- Investors and corporate strategy teams, robotics and AI engineers, foundation-model builders, humanoid OEMs and suppliers, policy and compute teams, and anyone betting on embodied AI who needs an independent read on the “brain” layer of the stack.
- How is this different from the Market Report and the Supply Chain report?
- The Market Report and the Supply Chain report cover the full market and the hardware. This report goes deep on the intelligence layer – humanoid world models, VLAs, the architectures and players, and an independent 40-model directory.
- What is the Brain Score?
- humanoid.guide’s editorial 0–10 rating of each model across ten capability dimensions, including locomotion, whole-body control, manipulation, navigation, reasoning, sim-to-real, cross-embodiment, real-time inference and long-horizon planning.
- What is the methodology?
- The report synthesizes published model results, benchmarks and primary industry sources, with every claim cited and a full source index in the appendix. Brain Scores are humanoid.guide’s own editorial evaluations.
- Will it be updated?
- Yes – buyers receive free updates throughout the 2026 cycle as the landscape shifts.
- Can I get a preview?
- Yes – get in touch for a sample or a briefing.
The brains are being rebuilt – right now.
Understand the architectures, the players and the timeline before the field consolidates in the 2026–2030 window.
Buy full report →