Updated June 2026

Humanoid Foundation Models – World Models Report – humanoid.guide

Foundation Models · The Robot Brains

Humanoid Foundation Models

The brains are being rebuilt – from VLAs to predictive world models.

Just released · 2026

Humanoid world models are the new brains of embodied AI. The systems that turn perception into action are shifting from vision-language-action models to predictive world models that let a robot imagine the consequences of an action before it takes it.

This report maps that shift end to end, independently scores the 40 foundation models that matter, and asks – soberly – when the technology will actually be reliable.


100 pages · 24 chapters · 40 models scored across 10 capabilities · 16 figures · forecasts to 2030

Free preview

Get the free preview

Read the executive summary and a sample of the 40-model Brain Score directory before you buy – delivered to your inbox as a PDF.

$13.8B

Record robotics venture funding in 2025 – humanoid investment up ~143× in four years

Report ch.22, industry funding data

2×

World-action models more than double generalization over VLA baselines (DreamZero 62.2% vs 27.4%)

Report ch.8
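
As a quick check on the "2×" headline, the two DreamZero figures quoted above can be compared directly. This is a minimal sketch: the function name is hypothetical, and the page describes the metric only as "generalization", so the numbers are treated here as plain percentages.

```python
from typing import Tuple


def generalization_gain(wam_pct: float, vla_pct: float) -> Tuple[float, float]:
    """Return (ratio, percentage-point gap) between two success rates."""
    if vla_pct <= 0:
        raise ValueError("baseline must be positive")
    return wam_pct / vla_pct, wam_pct - vla_pct


# Figures quoted on this page: 62.2% for the world-action model, 27.4% for the VLA baseline.
ratio, gap = generalization_gain(62.2, 27.4)
print(f"{ratio:.2f}x, +{gap:.1f} points")  # prints "2.27x, +34.8 points"
```

On these figures the ratio is about 2.27, which is consistent with "more than double".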

~90% / 70%

China’s share of humanoid unit sales and of the component supply chain – “Western brains, Chinese bodies”

Report ch.15

2028–30

Base-case window for reliable, unattended generalist autonomy

humanoid.guide base case

Why now

Why the intelligence layer – and why now

Hardware is consolidating; the open contest now is the “brain.” The shift from VLAs to humanoid world models is happening in months, not years, and the architecture that wins will shape the whole stack. This report tells you what is changing, who is ahead, and what it means – for six audiences.

Investors & corporate strategy

Where the $13.8B funding wave is flowing, which architectures are winning, and how to read valuations priced years ahead of revenue.

Robotics & AI engineers

The four architectural paradigms, the System 1 / System 2 convergence, and why the field is moving from VLAs to predictive world models.

Foundation-model builders

The data moat, scaling laws for embodied data, simulation and sim-to-real – and where the open challengers are closing the gap.

Humanoid OEMs & suppliers

Which “brain” to license or build, the split-stack reality of Western models on Chinese bodies, and the compute beneath it all.

Policy & compute teams

Export controls, state-backed scale, and the geopolitics that decide who trains and who only assembles.

Anyone betting on embodied AI

An independent, sober read on capability, safety and timing – not a hype reel and not a teardown.

The central finding

The most exciting idea – and the least finished

Humanoid world models are the most exciting idea in robot learning – and the least finished. The advances are real: predictive, world-action architectures more than double generalization over VLA baselines, video-trained policies now top manipulation benchmarks, and embodied data has begun to show clean scaling laws.

But demos are not deployment. Reliable, unattended generalist autonomy in unstructured settings is, on our base case, a 2028–2030 proposition – gated less by ideas than by the last 10% of reliability, safety validation and the lab-to-field gap. The likely equilibrium is a split stack: Western brains on Chinese bodies. The question is not whether world models will reshape embodied AI – it is who controls the architecture, the data and the compute when they do.

Inside the architecture

The four paradigms

Under the banner of “world models” sit four distinct ways to generate an action. The report sorts them on two axes – discrete vs. continuous control, and whether the model predicts the future at all – and shows why the strongest systems are converging on a two-speed design.

Autoregressive

Emits discrete action tokens like a language model. Simple and general, but with no explicit model of how the world will change.

Diffusion / flow matching

Generates smooth, continuous action trajectories by denoising. The basis of the fast “System 1” layer, running above 200 Hz.

Latent world models (JEPA)

Predict how the environment evolves in a compact latent space, so a robot can plan against an imagined future rather than raw pixels.

World-action models (WAM)

Combine continuous control with world prediction – the strongest recipe so far, more than doubling generalization over VLA baselines.

The convergence: the field is settling on a System 1 / System 2 pattern, in which a slow 7–34B-parameter reasoner sets a latent plan while a fast diffusion policy closes the loop at ~200 Hz. The world model increasingly supplies the predictive substrate that makes the slow half worth consulting. There is no single “humanoid world model” architecture; the winning recipe mixes paradigms.
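
As a rough illustration of the two-speed pattern, the sketch below runs a slow planner and a fast policy inside one loop. Only the ~200 Hz figure comes from the text. The 2 Hz planner rate, the class name `TwoSpeedController`, the one-number "state" and "latent plan", and both stand-in functions are assumptions made for illustration, not any vendor's implementation.

```python
from dataclasses import dataclass, field
from typing import List

FAST_HZ = 200  # from the text: the fast policy closes the loop at ~200 Hz
SLOW_HZ = 2    # assumption: the text gives no rate for the slow reasoner


@dataclass
class TwoSpeedController:
    fast_hz: int = FAST_HZ
    slow_hz: int = SLOW_HZ
    latent_plan: float = 0.0
    plan_updates: int = 0
    actions: List[float] = field(default_factory=list)

    def slow_reasoner(self, state: float, goal: float) -> float:
        """Stand-in for the large reasoner: the 'plan' is the remaining error."""
        return goal - state

    def fast_policy(self) -> float:
        """Stand-in for the diffusion policy: a small step along the current plan."""
        return 0.005 * self.latent_plan

    def run(self, goal: float, seconds: float) -> float:
        state = 0.0
        ticks_per_plan = self.fast_hz // self.slow_hz
        for tick in range(int(seconds * self.fast_hz)):
            # System 2: refresh the latent plan only every ticks_per_plan ticks.
            if tick % ticks_per_plan == 0:
                self.latent_plan = self.slow_reasoner(state, goal)
                self.plan_updates += 1
            # System 1: act on every tick using the most recent plan.
            action = self.fast_policy()
            self.actions.append(action)
            state += action
        return state


controller = TwoSpeedController()
final_state = controller.run(goal=1.0, seconds=1.0)
print(controller.plan_updates, len(controller.actions), round(final_state, 3))
```

In one simulated second the planner runs twice while the policy emits 200 actions. The point of the design is that the expensive model is consulted rarely, and the cheap one keeps the control loop closed in between.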

A fair verdict

A fair verdict needs both columns

This report sets the genuine advances against the unsolved failures, and dates the turn – so you can tell real capability from a good demo.

The advances

Predictive architectures, generative simulators and JEPA-style models that let robots imagine consequences before acting.

The failures

Hallucination, drift, the sim-to-real gap and new alignment risks – the reasons deployment is still hard.

The timeline

Benchmarks, unit-cost crossovers and a base case for when a reliable generalist actually arrives.

The contents

What’s inside the report

Twenty-four chapters and 16 figures – a complete reference on the “brains” of humanoid robots: the architectures, the players, the compute and geopolitics behind them, and a sober read on capability, safety and timing.

Part I

Foundations

01 – Reshaping Physical AI
02 – From VLAs to World Models
03 – Inside a World Model
04 – The Four Architectural Paradigms
05 – JEPA & the Predictive Architectures
06 – Generative World Simulators
07 – Scaling Laws & the Data Question

Part II

The Models & Makers

08 – Foundation Models I: NVIDIA’s Platform
09 – Foundation Models II: The Challengers
10 – The Humanoid Makers
11 – Training Data & the Data Moat
12 – Simulation & Sim-to-Real

Part III

Compute & Geopolitics

13 – Compute & Infrastructure
14 – USA: Innovation vs. Volume
15 – China: State-Backed Scale
16 – Europe, Japan & Korea
17 – Geopolitics

Part IV

Applications & Evaluation

18 – Industrial Applications
19 – Domestic & Service Robots
20 – Benchmarks & Evaluation
21 – Strengths, Shortcomings & Safety

Part V

Outlook

22 – Business & Investment
23 – Scenarios
24 – Playbook & the 2026–2030 Window

The Brain Score directory

All 40 foundation models tracked by humanoid.guide – each independently rated 0–10 across ten capability dimensions, so you can compare the whole field at a glance. Yellow means the model has the skill.

Locomotion · Whole-Body · Bimanual · Dexterous · Navigation · Reasoning · Sim-to-Real · Cross-Embodiment · Real-Time · Long-Horizon
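
One way to picture a single directory entry is a mapping from the ten capability names to a 0–10 score. The sketch below is an assumption-heavy illustration: the page does not say what score turns a cell yellow, so `SKILL_THRESHOLD` is a hypothetical cut-off, and the example scores are invented placeholders rather than ratings from the report.

```python
from typing import Dict, List

# The ten capability dimensions listed in the directory.
CAPABILITIES = (
    "Locomotion", "Whole-Body", "Bimanual", "Dexterous", "Navigation",
    "Reasoning", "Sim-to-Real", "Cross-Embodiment", "Real-Time", "Long-Horizon",
)

SKILL_THRESHOLD = 5.0  # assumption: the real "yellow" cut-off is not published here


def check_scores(scores: Dict[str, float]) -> None:
    """Raise ValueError unless every capability has exactly one 0-10 score."""
    if set(scores) != set(CAPABILITIES):
        raise ValueError("scores must cover exactly the ten capabilities")
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} score {value} is outside 0-10")


def skills(scores: Dict[str, float], threshold: float = SKILL_THRESHOLD) -> List[str]:
    """Return the capabilities scoring at or above the threshold, in directory order."""
    check_scores(scores)
    return [name for name in CAPABILITIES if scores[name] >= threshold]


# Invented placeholder scores for a hypothetical model.
example = {name: 3.0 for name in CAPABILITIES}
example.update({"Bimanual": 8.0, "Reasoning": 6.5})
print(skills(example))  # prints ['Bimanual', 'Reasoning']
```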

The figures

16 figures, sharp in the report

Each chapter pairs deep narrative with the visual frameworks that strategy and engineering teams use to communicate findings internally. The previews below are intentionally blurred – the full, readable versions, with the data behind them, come with the report.

Figure 2.1 – The speed–generalization trade-off

Figure 4.1 – The four architectural paradigms

Figure 8.1 – World-action models vs. VLA baselines

Figure 21.1 – World models vs. VLAs across six capabilities

Figure 22.1 – Robotics venture funding by year

Figure 23.1 – Three (plus one) scenarios on the AI-progress × hardware axes

Get the report

Choose your license

Single-user license for individual analysts and engineers. Enterprise license for strategy, research and product teams.

Single User License

$390

One-time purchase · PDF delivered immediately

Full 100-page report (PDF)
All 24 chapters & 16 figures
The 40-model Brain Score directory
Single-user license
Free updates through the 2026 cycle

Buy single license →

Enterprise License

$1,590

One-time purchase · Internal distribution rights

Everything in Single User
Unlimited internal users at one organization
Right to quote in internal strategy documents
Priority email support
Optional 60-min briefing call with the authors

Buy enterprise license →

License terms

Single User License · Terms

What you can and cannot do with the report as an individual buyer – one reader, one organization.

Read the terms · PDF

Enterprise License · Terms

Company-wide use: internal sharing, presentations and citation rights across your organization.

Read the terms · PDF

By purchasing, you agree to the terms of the license you select. Questions? human@humanoid.guide

Questions

Frequently asked questions

What exactly is a humanoid world model?

A humanoid world model is a foundation model that learns how an environment changes, so a robot can predict – and imagine – the consequences of an action before it acts. The report explains how humanoid world models differ from vision-language-action (VLA) models, and why the field is shifting toward them.

Who is the report for?

Six audiences: investors and corporate strategy teams, robotics and AI engineers, foundation-model builders, humanoid OEMs and suppliers, policy and compute teams, and anyone else betting on embodied AI and evaluating the “brain” layer of the stack.

How is this different from the Market Report and the Supply Chain report?

The Market Report and the Supply Chain report cover the full market and the hardware. This report goes deep on the intelligence layer – humanoid world models, VLAs, the architectures and players, and an independent 40-model directory.

What is the Brain Score?

humanoid.guide’s editorial 0–10 rating of each model across ten capabilities: locomotion, whole-body control, bimanual manipulation, dexterous manipulation, navigation, reasoning, sim-to-real, cross-embodiment, real-time inference and long-horizon planning.

What is the methodology?

The report synthesizes published model results, benchmarks and primary industry sources, with every claim cited and a full source index in the appendix. Brain Scores are humanoid.guide’s own editorial evaluations.

Will it be updated?

Yes – buyers receive free updates throughout the 2026 cycle as the landscape shifts.

Can I get a preview?

Yes – get in touch for a sample or a briefing.

The next platform race

The brains are being rebuilt – right now.

Understand the architectures, the players and the timeline before the field consolidates in the 2026–2030 window.

Buy full report →

New! 2026 Humanoid Robot Market Report

198 pages of exclusive insight from global robotics experts: funding trends, technology challenges, leading manufacturers, supply chain shifts, and surveys and forecasts on future humanoid applications.

Featuring insights from Aaron Saunders, former CTO of Boston Dynamics, now at Google DeepMind

