1X says NEO is “starting to learn on its own” with a new video-based World Model
1X Technologies has announced what it calls a major update to its humanoid AI stack: the 1X World Model (1XWM), designed to let NEO turn natural-language prompts into new robot skills on demand, including tasks it has never been explicitly trained on.
From “tell the robot” to “show the robot the future”
The core idea is simple to describe but hard to execute: instead of mapping text and images directly to motor commands (the common vision-language-action, or VLA, route), 1XWM generates a text-conditioned video rollout of what should happen next in the scene, then converts that imagined future into real robot motion.
According to 1X, the pipeline has two main components:
- A world model backbone, built on a 14B-parameter generative video model, pretrained on web-scale video, then adapted with egocentric human video, and finally fine-tuned on NEO-specific sensorimotor logs.
- An inverse dynamics model (IDM) that predicts the action sequence needed to move from one video frame to the next, bridging “pixels to actuators” (see the toy sketch after this list).
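To make the IDM concrete, here is a toy PyTorch sketch: a shared convolutional encoder looks at two consecutive frames and regresses the action that connects them. The architecture, layer sizes, and 22-dimensional action space are illustrative assumptions on our part, not 1X's published design.

```python
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    """Toy IDM: given two consecutive frames, predict the transition action.
    Sizes and layout are illustrative guesses, not 1X's architecture."""

    def __init__(self, action_dim: int = 22):
        super().__init__()
        # Shared convolutional encoder, applied to each frame independently.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(action_dim)  # infers input width on first call

    def forward(self, frame_t: torch.Tensor, frame_t1: torch.Tensor) -> torch.Tensor:
        # Concatenate both frame embeddings and regress the action between them.
        z = torch.cat([self.encoder(frame_t), self.encoder(frame_t1)], dim=-1)
        return self.head(z)

idm = InverseDynamicsModel()
action = idm(torch.randn(1, 3, 96, 96), torch.randn(1, 3, 96, 96))
print(action.shape)  # torch.Size([1, 22])
```

Such a model would presumably be trained on NEO's own sensorimotor logs, where the true actions are recorded; at runtime it is applied to generated frames, which is what lets an imagined video drive real actuators.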
At runtime, the system takes a starting frame plus a text or voice prompt, rolls out a short future, extracts an action trajectory, and executes it.
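Put together, the runtime loop described above might look like the following sketch. Both world_model and idm are hypothetical stand-ins for the components described in the post; 1X has not published this API.

```python
import numpy as np

def act_from_prompt(world_model, idm, camera_frame: np.ndarray, prompt: str,
                    horizon: int = 16) -> list[np.ndarray]:
    """Sketch of the imagine-translate-execute loop (all interfaces assumed)."""
    # 1. Imagine: roll out a short text-conditioned video of the desired future.
    frames = world_model.rollout(camera_frame, prompt, horizon=horizon)
    # 2. Translate: run the IDM over consecutive frame pairs to recover actions.
    trajectory = [idm.predict(frames[t], frames[t + 1]) for t in range(horizon - 1)]
    # 3. Execute: hand the extracted trajectory to the robot's low-level controller.
    return trajectory
```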
What 1X showed in demos
In the accompanying announcement and demo material, 1X claims NEO can generalize across household tasks like packing a lunch box, and can handle more novel interactions (for example, manipulating a toilet seat, opening a sliding door, ironing a shirt, or brushing a person’s hair) without prior examples for those exact tasks.
The company frames this as a shift away from humanoid models that rely heavily on teleoperation data collection. The stated goal is a “flywheel” where NEO can collect more of its own experience and continuously expand its capabilities.
Why this matters for humanoids
One of the biggest constraints on real-world humanoid progress is not compute but data: collecting diverse, high-quality robot demonstrations is slow and expensive. 1X’s bet is that internet-scale video already contains physical common sense, and that a sufficiently grounded world model can transfer that knowledge into robotic behavior, especially when the robot’s embodiment is close to human form.
In the blog post, 1X explicitly positions 1XWM against mainstream VLA approaches and argues that video generation can better capture physical dynamics, as long as the rollout is grounded in the robot’s viewpoint, kinematics, and real-world constraints.
Limits and open questions
1X also notes remaining failure modes: generated rollouts can look plausible while still violating real-world constraints (depth, contact, geometry), and some dexterous tasks remain challenging.
They also explore “best-of-N” rollouts (generate several candidate futures, pick the best) as one path to higher success rates, and suggest the selection could eventually be automated with a model-based evaluator.
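The selection step itself is easy to sketch; what the evaluator actually scores (physical plausibility, task progress) is the hard part 1X suggests a learned model could eventually take over. The interfaces below are assumed stand-ins, not 1X's implementation.

```python
def best_of_n(world_model, evaluator, frame, prompt, n: int = 8):
    """Best-of-N sketch: sample several imagined futures, keep the highest scoring."""
    rollouts = [world_model.rollout(frame, prompt) for _ in range(n)]
    scores = [evaluator.score(rollout, prompt) for rollout in rollouts]
    best = max(range(n), key=lambda i: scores[i])  # index of the best-scoring future
    return rollouts[best]
```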
Availability context
1X continues to position NEO as a consumer home humanoid, with Early Access ownership priced at $20,000 (priority delivery targeted for 2026) and a $499/month subscription option.
For Humanoid Guide readers, the key question is whether world-model-driven control can scale faster than today’s demonstration-heavy methods. If 1XWM holds up outside curated demos, it could be an important step toward robots that learn new household skills at internet speed rather than operator speed.
