Humanoid World Models – The 2026 Report & Directory
New · 2026

Humanoid Foundation Models

The brains are being rebuilt – from VLAs to predictive world models.

Just released · 2026

Humanoid world models are the new brains of embodied AI. The systems that turn perception into action are shifting from vision-language-action models to predictive world models that let a robot imagine the consequences of an action before it takes it.

This report maps that shift end to end, independently scores the 40 foundation models that matter, and asks – soberly – when the technology will actually be reliable.

100 pages · 24 chapters · 40 models scored across 10 capabilities · 16 figures · forecasts to 2030
$13.8B
Record robotics venture funding in 2025 – humanoid investment up ~143× in four years
Report ch.22, industry funding data
World-action models more than double generalization over VLA baselines (DreamZero 62.2% vs 27.4%)
Report ch.8
~90% / 70%
China’s share of humanoid unit sales and of the component supply chain – “Western brains, Chinese bodies”
Report ch.15
2028–30
Base-case window for reliable, unattended generalist autonomy
humanoid.guide base case

Why the intelligence layer – and why now

Hardware is consolidating; the open contest now is the “brain.” The shift from VLAs to humanoid world models is happening in months, not years, and the architecture that wins will shape the whole stack. This report tells you what is changing, who is ahead, and what it means – for six audiences.

Investors & corporate strategy

Where the $13.8B funding wave is flowing, which architectures are winning, and how to read valuations priced years ahead of revenue.

Robotics & AI engineers

The four architectural paradigms, the System 1 / System 2 convergence, and why the field is moving from VLAs to predictive world models.

Foundation-model builders

The data moat, scaling laws for embodied data, simulation and sim-to-real – and where the open challengers are closing the gap.

Humanoid OEMs & suppliers

Which “brain” to license or build, the split-stack reality of Western models on Chinese bodies, and the compute beneath it all.

Policy & compute teams

Export controls, state-backed scale, and the geopolitics that decide who trains and who only assembles.

Anyone betting on embodied AI

An independent, sober read on capability, safety and timing – not a hype reel and not a teardown.

The central finding

Humanoid world models are the most exciting idea in robot learning – and the least finished. The advances are real: predictive, world-action architectures more than double generalization over VLA baselines, video-trained policies now top manipulation benchmarks, and embodied data has begun to show clean scaling laws.
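As a quick check on the "more than double" claim, the arithmetic can be run on the DreamZero figures quoted on this page (62.2% vs 27.4%). This is only a sketch: the variable names are illustrative, and the page does not say here which benchmark the two percentages come from.

```python
# Generalization figures quoted on this page (report ch.8), in percent.
wam_score = 62.2   # DreamZero, a world-action model
vla_score = 27.4   # VLA baseline

ratio = wam_score / vla_score
print(f"World-action model vs. VLA baseline: {ratio:.2f}x")  # about 2.27x
```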

But demos are not deployment. Reliable, unattended generalist autonomy in unstructured settings is, on our base case, a 2028–2030 proposition – gated less by ideas than by the last 10% of reliability, safety validation and the lab-to-field gap. The likely equilibrium is a split stack: Western brains on Chinese bodies. The question is not whether world models will reshape embodied AI – it is who controls the architecture, the data and the compute when they do.

Inside humanoid world models: the four paradigms

Under the banner of “world models” sit four distinct ways to generate an action. The report sorts them on two axes – discrete vs. continuous control, and whether the model predicts the future at all – and shows why the strongest systems are converging on a two-speed design.

1

Autoregressive

Emits discrete action tokens like a language model. Simple and general, but with no explicit model of how the world will change.

2

Diffusion / flow matching

Generates smooth, continuous action trajectories by denoising. The basis of the fast “System 1” layer, running above 200 Hz.

3

Latent world models (JEPA)

Predict how the environment evolves in a compact latent space, so a robot can plan against an imagined future rather than raw pixels.

4

World-action models (WAM)

Combine continuous control with world prediction – the strongest recipe so far, more than doubling generalization over VLA baselines.
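The two-axis sorting described above can be sketched as a small lookup table. This is a reading of the page, not the report's own data: the `Paradigm` class and `select` helper are hypothetical, the control axis for latent world models is left as `None` because the page does not state it, and marking diffusion policies as non-predictive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Paradigm:
    name: str
    control: Optional[str]    # "discrete", "continuous", or None if the page does not say
    predicts_future: bool     # does the model predict how the world will change?


PARADIGMS = [
    Paradigm("Autoregressive", "discrete", False),
    # Assumption: the page describes diffusion only as an action generator.
    Paradigm("Diffusion / flow matching", "continuous", False),
    Paradigm("Latent world models (JEPA)", None, True),
    Paradigm("World-action models (WAM)", "continuous", True),
]


def select(control=None, predicts_future=None):
    """Return paradigm names matching the given axis values (None means no filter)."""
    return [
        p.name
        for p in PARADIGMS
        if (control is None or p.control == control)
        and (predicts_future is None or p.predicts_future == predicts_future)
    ]


print(select(control="continuous", predicts_future=True))
# ['World-action models (WAM)']
```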

The convergence: the field is settling on a System 1 / System 2 pattern – a slow 7–34B reasoner sets a latent plan while a fast diffusion policy closes the loop at ~200 Hz. The world model increasingly supplies the predictive substrate that makes the slow half worth consulting. There is no single “humanoid world model” architecture; the winning recipe mixes paradigms.
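A minimal sketch of that two-speed pattern, using a toy one-dimensional state. Everything here is a stand-in: `slow_reasoner` and `fast_policy` are hypothetical placeholders for the large reasoner and the diffusion policy, and the 10 Hz replanning rate is an assumption (the page gives only the ~200 Hz figure for the fast loop).

```python
FAST_HZ = 200                      # from the page: fast policy closes the loop at ~200 Hz
SLOW_HZ = 10                       # assumption: the page gives no rate for the slow reasoner
TICKS_PER_PLAN = FAST_HZ // SLOW_HZ


def slow_reasoner(observation):
    """Stand-in for the System 2 model: returns a 'latent plan' (here, just a target)."""
    return {"target": observation + 1.0}


def fast_policy(observation, plan):
    """Stand-in for the System 1 policy: a small step toward the current target."""
    return 0.1 * (plan["target"] - observation)


def run(seconds=1.0, state=0.0):
    """Simulate the loop tick by tick (no real-time sleeping)."""
    plan = None
    replans = 0
    steps = int(seconds * FAST_HZ)
    for tick in range(steps):
        if tick % TICKS_PER_PLAN == 0:   # slow half: refresh the plan occasionally
            plan = slow_reasoner(state)
            replans += 1
        state += fast_policy(state, plan)  # fast half: act on every tick
    return state, replans, steps


state, replans, steps = run(1.0)
print(f"{steps} control steps, {replans} replans, final state {state:.2f}")
```

The design point is that the expensive model is consulted once per 20 control ticks in this sketch, while the cheap policy runs on every tick against the most recent plan.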

A fair verdict needs both columns

This report sets the genuine advances against the unsolved failures, and dates the turn – so you can tell real capability from a good demo.

The advances

Predictive architectures, generative simulators and JEPA-style models that let robots imagine consequences before acting.

The failures

Hallucination, drift, the sim-to-real gap and new alignment risks – the reasons deployment is still hard.

The timeline

Benchmarks, unit-cost crossovers and a base case for when a reliable generalist actually arrives.

What’s inside the humanoid world models report

Twenty-four chapters and 16 figures – a complete reference on the “brains” of humanoid robots: the architectures, the players, the compute and geopolitics behind them, and a sober read on capability, safety and timing.

PART I

Foundations

  • 01 – Reshaping Physical AI
  • 02 – From VLAs to World Models
  • 03 – Inside a World Model
  • 04 – The Four Architectural Paradigms
  • 05 – JEPA & the Predictive Architectures
  • 06 – Generative World Simulators
  • 07 – Scaling Laws & the Data Question

PART II

The Models & Makers

  • 08 – Foundation Models I: NVIDIA’s Platform
  • 09 – Foundation Models II: The Challengers
  • 10 – The Humanoid Makers
  • 11 – Training Data & the Data Moat
  • 12 – Simulation & Sim-to-Real

PART III

Compute & Geopolitics

  • 13 – Compute & Infrastructure
  • 14 – USA: Innovation vs. Volume
  • 15 – China: State-Backed Scale
  • 16 – Europe, Japan & Korea
  • 17 – Geopolitics

PART IV

Applications & Evaluation

  • 18 – Industrial Applications
  • 19 – Domestic & Service Robots
  • 20 – Benchmarks & Evaluation
  • 21 – Strengths, Shortcomings & Safety

PART V

Outlook

  • 22 – Business & Investment
  • 23 – Scenarios
  • 24 – Playbook & the 2026–2030 Window

The Brain Score directory

All 40 foundation models tracked by humanoid.guide – each independently rated 0–10 across ten capability dimensions, so you can compare the whole field at a glance. Yellow means the model has the skill.

Locomotion
Whole Body
Bimanual
Dexterous
Navigation
Reasoning
Sim-to-Real
Cross-Embodiment
Real-Time
Long-Horizon
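One way to picture a directory entry is a record holding a 0–10 rating for each of the ten capabilities listed above. This is a hypothetical sketch, not humanoid.guide's schema: the `validate` and `rank_by` helpers, the model names and all scores below are placeholders for illustration.

```python
CAPABILITIES = (
    "Locomotion", "Whole Body", "Bimanual", "Dexterous", "Navigation",
    "Reasoning", "Sim-to-Real", "Cross-Embodiment", "Real-Time", "Long-Horizon",
)


def validate(scores):
    """Check that a score record covers exactly the ten capabilities, each rated 0-10."""
    if set(scores) != set(CAPABILITIES):
        raise ValueError("record must contain exactly the ten capability names")
    for capability, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{capability} score {value} is outside 0-10")
    return scores


def rank_by(directory, capability):
    """Return model names sorted from highest to lowest on one capability."""
    return sorted(directory, key=lambda name: directory[name][capability], reverse=True)


# Placeholder entries: names and scores are invented for the example.
directory = {
    "model-a": validate({c: 5 for c in CAPABILITIES}),
    "model-b": validate({**{c: 3 for c in CAPABILITIES}, "Reasoning": 9}),
}

print(rank_by(directory, "Reasoning"))   # ['model-b', 'model-a']
```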

16 figures, sharp in the report

Each chapter pairs deep narrative with the visual frameworks that strategy and engineering teams use to communicate findings internally. The previews below are intentionally blurred – the full, readable versions, with the data behind them, come with the report.

Figure 2.1 – The speed–generalization trade-off
Figure 4.1 – The four architectural paradigms
Figure 8.1 – World-action models vs. VLA baselines
Figure 21.1 – World models vs. VLAs across six capabilities
Figure 22.1 – Robotics venture funding by year
Figure 23.1 – Three (plus one) scenarios on the AI-progress × hardware axes

Heavy on illustration, light on filler

Every chapter ties an idea to what it means for builders, buyers and investors – with original editorial illustration throughout.

End-to-end manipulation
Real-world training & the data moat
Safety, drift & the sim-to-real gap
When the technology becomes reliable
Inside: scenarios & timeline
Inside: the model directory
A robot imagining an outcome

Get the report

Single-user license for individual analysts and engineers. Enterprise license for strategy, research and product teams.

Single User License

$390
One-time purchase · PDF delivered immediately
  • Full 100-page report (PDF)
  • All 24 chapters & 16 figures
  • The 40-model Brain Score directory
  • Single-user license
  • Free updates through the 2026 cycle
Buy single license →

Enterprise License

$1,590
One-time purchase · Internal distribution rights
  • Everything in Single User
  • Unlimited internal users at one organization
  • Right to quote in internal strategy documents
  • Priority email support
  • Optional 60-min briefing call with the authors
Buy enterprise license →

Frequently asked questions

What exactly is a humanoid world model?
A humanoid world model is a foundation model that learns how an environment changes, so a robot can predict – and imagine – the consequences of an action before it acts. The report explains how humanoid world models differ from vision-language-action (VLA) models, and why the field is shifting toward them.
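The idea of imagining consequences before acting can be illustrated with a toy planner. This is a sketch under heavy assumptions, not a method from the report: `world_model` is a hand-written stand-in for a learned dynamics model, the state is a single number, and the search is plain random shooting over candidate action sequences.

```python
import random


def world_model(state, action):
    """Toy stand-in for a learned model: predicts the next state from state and action."""
    return state + action


def imagine(state, actions):
    """Roll the model forward over a sequence of actions without touching the real world."""
    for action in actions:
        state = world_model(state, action)
    return state


def plan(state, goal, horizon=5, candidates=200, seed=0):
    """Sample action sequences, imagine each outcome, keep the one ending nearest the goal."""
    rng = random.Random(seed)
    best_sequence, best_cost = None, float("inf")
    for _ in range(candidates):
        sequence = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = abs(imagine(state, sequence) - goal)
        if cost < best_cost:
            best_sequence, best_cost = sequence, cost
    return best_sequence, best_cost


sequence, cost = plan(state=0.0, goal=2.0)
print(f"Best imagined outcome misses the goal by {cost:.3f}")
```

Only the first action of the chosen sequence would normally be executed before replanning; that step is left out here to keep the sketch short.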
Who is the report for?
Investors and corporate strategy teams, robotics and AI engineers, foundation-model builders, humanoid OEMs and suppliers, and policy and compute teams – anyone evaluating the “brain” layer of the stack.
How is this different from the Market Report and the Supply Chain report?
The Market Report and the Supply Chain report cover the full market and the hardware. This report goes deep on the intelligence layer – humanoid world models, VLAs, the architectures and players, and an independent 40-model directory.
What is the Brain Score?
humanoid.guide’s editorial 0–10 rating of each model across ten capabilities – locomotion, whole-body control, bimanual manipulation, dexterous manipulation, navigation, reasoning, sim-to-real, cross-embodiment, real-time inference and long-horizon planning.
What is the methodology?
The report synthesizes published model results, benchmarks and primary industry sources, with every claim cited and a full source index in the appendix. Brain Scores are humanoid.guide’s own editorial evaluations.
Will it be updated?
Yes – buyers receive free updates throughout the 2026 cycle as the landscape shifts.
Can I get a preview?
Yes – get in touch for a sample or a briefing.
The next platform race

The brains are being rebuilt – right now.

Understand the architectures, the players and the timeline before the field consolidates in the 2026–2030 window.

Buy full report →
www.humanoid.guide · Humanoid Foundation Models · World Models Report 2026 · © 2026 human@humanoid.guide