UniDomain

Pretraining a Unified PDDL Domain from Real-World Demonstrations for Generalizable Robot Task Planning

Haoming Ye1,2*, Yunxiao Xiao2,3*, Cewu Lu1,2, Panpan Cai1,2†
1Shanghai Jiao Tong University 2Shanghai Innovation Institute 3Beijing University of Posts and Telecommunications

Abstract

Robotic task planning in real-world environments requires reasoning over implicit constraints from language and vision. While LLMs and VLMs offer strong priors, they struggle with long-horizon structure and symbolic grounding. Existing methods that combine LLMs with symbolic planning often rely on handcrafted or narrow domains, limiting generalization. We propose UniDomain, a framework that pretrains a PDDL domain from robot manipulation demonstrations and applies it to online robotic task planning. It extracts atomic domains from 12,393 manipulation videos to form a unified domain with 3,137 operators, 2,875 predicates, and 16,481 causal edges. Given a target class of tasks, it retrieves relevant atomic domains from the unified domain and systematically fuses them into a high-quality meta-domain to support compositional generalization in planning. Experiments on diverse real-world tasks show that UniDomain solves complex, unseen tasks in a zero-shot manner, achieving up to 58% higher task success and a 160% improvement in plan optimality over state-of-the-art LLM and LLM-PDDL baselines.

Highlights

Teach robots a reusable planning world from demonstrations.

Pretraining from Real-World Demonstrations
12,393 demonstrations

Pretrain once, then reuse symbolic knowledge for unseen tasks with zero-shot planning.

Unified Domain as Planning World
3,137 operators, 2,875 predicates, 16,481 causal edges

One reusable symbolic graph built from real robot demonstrations.

State-of-the-Art Performance
Up to +58% success, +160% optimality

On unseen long-horizon tasks, UniDomain beats strong LLM-only and LLM-PDDL baselines.

Real-World Drink Serving
Dual-arm humanoid robot serving

Users say what they want, and the robot prepares the drink and serves it at the table.

Inside the Unified Domain

  • One graph, two kinds of knowledge. In the unified domain, predicate nodes in purple describe world states, while operator nodes in green describe reusable robot actions.
  • Causality is explicit. The edges capture the semantic relationships used in planning, including how predicates and operators connect through preconditions and effects.
  • Local clusters reveal reusable patterns. Nearby nodes often reflect predicates and operators that frequently appear together, exposing reusable planning motifs across tasks.
Visualization of Unified Domain
Visualization of our pretrained unified domain, with 3,137 operator nodes (green) and 2,875 predicate nodes (purple).
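To make the graph structure concrete, here is a minimal Python sketch of how predicate nodes, operator nodes, and precondition/effect edges could be assembled from a list of operators. The `Operator` class, the `build_causal_graph` function, and the two toy operators are illustrative assumptions, not UniDomain's actual data structures, and delete effects are ignored for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator record: a name plus precondition and effect predicates."""
    name: str
    preconditions: frozenset
    effects: frozenset


def build_causal_graph(operators):
    """Return (predicate_nodes, operator_nodes, edges) for a bipartite graph.

    Each edge is a (source, target, kind) triple:
      predicate -> operator for a precondition,
      operator -> predicate for an effect.
    """
    predicate_nodes, operator_nodes, edges = set(), set(), set()
    for op in operators:
        operator_nodes.add(op.name)
        for pred in op.preconditions:
            predicate_nodes.add(pred)
            edges.add((pred, op.name, "precondition"))
        for pred in op.effects:
            predicate_nodes.add(pred)
            edges.add((op.name, pred, "effect"))
    return predicate_nodes, operator_nodes, edges


# Toy operators, invented for illustration only.
OPS = [
    Operator("open_drawer",
             frozenset({"drawer_closed", "hand_empty"}),
             frozenset({"drawer_open"})),
    Operator("pick_towel",
             frozenset({"drawer_open", "hand_empty"}),
             frozenset({"holding_towel"})),
]

preds, ops, edges = build_causal_graph(OPS)
print(len(preds), "predicates,", len(ops), "operators,", len(edges), "edges")
```

In this toy graph, `drawer_open` is an effect of one operator and a precondition of the other, which is the kind of shared predicate that links operators into chains.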

Navigate in the Graph

Interaction Tips
  • Hover to inspect a node neighborhood.
  • Select a node to focus on its local structure.
  • Click empty space to clear the current selection.
  • Drag nodes to adjust positions.
  • Use the mouse wheel to zoom in and out.
Unified Domain (10% Sample): A sampled view of the full graph, showing the scale and connectivity learned from demonstrations.

The Vocabulary of the Graph

  • Broad Coverage. UniDomain spans 170 action categories, from everyday verbs like push and stir to fine-grained behaviors such as scrunch and rub. The compact meta-domain still preserves rich semantic knowledge for efficient planning.
Action type distribution of the synthesized meta-domain
Meta Domain

See Planning in Action

Given a scene image and a natural-language instruction, UniDomain grounds a PDDL problem and solves it to produce an executable plan.
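As a rough illustration of the "solve" step, the sketch below runs a breadth-first forward search over a hand-written, already-grounded STRIPS-style problem. It is a toy under stated assumptions: the facts, the action names, and the `plan` function are invented here, and a real system would typically pass the generated PDDL problem to an off-the-shelf planner rather than use this search.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset      # facts that must hold before the action
    add: frozenset      # facts made true by the action
    delete: frozenset   # facts made false by the action


def act(name, pre, add, delete=()):
    """Convenience constructor that converts iterables to frozensets."""
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


def plan(init, goal, actions):
    """Breadth-first forward search; returns a shortest list of action names, or None."""
    start, goal = frozenset(init), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [a.name]))
    return None


# Hypothetical ground actions, loosely modeled on a tabletop tidy-up task.
ACTIONS = [
    act("pick_corn", {"corn_in_pot", "hand_empty"}, {"holding_corn"},
        {"corn_in_pot", "hand_empty"}),
    act("place_corn_in_bowl", {"holding_corn"}, {"corn_in_bowl", "hand_empty"},
        {"holding_corn"}),
    act("open_drawer", {"drawer_closed", "hand_empty"}, {"drawer_open"},
        {"drawer_closed"}),
    act("pick_towel", {"drawer_open", "towel_in_drawer", "hand_empty"},
        {"holding_towel"}, {"towel_in_drawer", "hand_empty"}),
    act("wipe_table", {"holding_towel"}, {"table_clean"}),
    act("put_towel_in_drawer", {"holding_towel", "drawer_open"},
        {"towel_in_drawer", "hand_empty"}, {"holding_towel"}),
    act("close_drawer", {"drawer_open", "hand_empty"}, {"drawer_closed"},
        {"drawer_open"}),
]

INIT = {"corn_in_pot", "hand_empty", "drawer_closed", "towel_in_drawer"}
GOAL = {"corn_in_bowl", "table_clean", "towel_in_drawer", "drawer_closed"}

steps = plan(INIT, GOAL, ACTIONS)
print(steps)
```

Note that two of the goal facts already hold in the initial state; the planner still has to open the drawer, use the towel, and restore both facts, which is the kind of implicit constraint a symbolic domain makes explicit.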

Task input image
Task Instruction
Move the corn from the pot into the orange bowl, wipe the table with the towel in the drawer and put it back to the closed drawer.

PDDL Problem Generated by UniDomain

Executable Plan

From Demonstrations to Planning

  • Pretrain symbolic domains from demonstrations. UniDomain segments each video into keyframes, proposes an initial domain, and refines it through closed-loop verification.
  • Fuse relevant domain fragments into a meta-domain. UniDomain retrieves the atomic domains relevant to a task family and merges them into a compact planning graph.
  • Ground the scene and plan online. UniDomain grounds a scene image and user instruction into a PDDL problem, which is then solved into a plan.
Overview of UniDomain
UniDomain first learns reusable domains from demonstrations, then fuses the right planning graph for a task family, and finally grounds the scene to produce a plan.
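The closed-loop refinement in the first step can be pictured as a check-and-revise loop. The sketch below is only a schematic under stated assumptions: `check_domain` performs a simple structural check (undeclared predicates, operators without effects) rather than the full verification described on this page, and `revise` is a hypothetical callback standing in for whatever proposes a corrected domain.

```python
def check_domain(declared_predicates, operators):
    """Return a list of human-readable problems; an empty list means the check passed.

    `operators` maps an operator name to a (preconditions, effects) pair of sets.
    """
    problems = []
    declared = set(declared_predicates)
    for name, (pre, eff) in operators.items():
        for pred in sorted((set(pre) | set(eff)) - declared):
            problems.append(f"{name}: undeclared predicate '{pred}'")
        if not eff:
            problems.append(f"{name}: operator has no effects")
    return problems


def refine(domain, revise, max_rounds=3):
    """Check the domain, hand any problems to `revise`, and repeat until clean."""
    for _ in range(max_rounds):
        problems = check_domain(*domain)
        if not problems:
            return domain, True
        domain = revise(domain, problems)
    return domain, not check_domain(*domain)


def declare_missing(domain, problems):
    """Toy reviser (an assumption): declare every predicate the operators use."""
    predicates, operators = domain
    used = {p for pre, eff in operators.values() for p in (*pre, *eff)}
    return set(predicates) | used, operators


# A draft domain whose operator uses a predicate that was never declared.
draft = ({"drawer_closed"},
         {"open_drawer": ({"drawer_closed"}, {"drawer_open"})})

first_report = check_domain(*draft)
fixed, ok = refine(draft, declare_missing)
print(first_report, ok)
```

The design point is that the verifier's report is fed back into the next proposal, so errors are repaired rather than merely detected.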

State-of-the-Art Performance

Across 100 unseen long-horizon tasks in four domains, UniDomain outperforms both direct LLM/VLM planners and hybrid LLM-PDDL baselines on success, plan quality, and efficiency.

Comparison results of UniDomain and baselines
Main comparison on unseen tasks: UniDomain leads on core planning metrics while maintaining competitive runtime and fewer LLM calls among top-performing methods.
Shared legend for specific result figures
Specific comparison result
Success Rate breakdown

Why It Works

  • Verification keeps the learned domains usable. Without closed-loop verification, atomic domains become brittle in syntax, solvability, and task logic.
  • Hierarchical fusion builds a coherent planning graph. Naive union or direct LLM-only merging produces domains that do not compose cleanly.
  • Task-relevant grounding makes the planner much stronger. Predicate grouping and task-relevant filtering significantly improve planning performance.
Ablation study on domain generation
Closed-loop verification and hierarchical fusion are essential for building usable atomic domains and compact meta-domains.
Ablation study of the UniDomain planner
Predicate grouping and task-relevant filtering significantly improve planning performance on compositional tasks.
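One common way to filter a large domain down to what a task needs is a backward relevance closure from the goal. The sketch below shows that generic heuristic; it is an assumption used for illustration, not necessarily the filtering UniDomain performs, and the operator names are invented.

```python
def relevant_operators(goal, operators):
    """Keep only operators that can contribute, directly or indirectly, to the goal.

    `operators` maps a name to a (preconditions, add_effects) pair of sets.
    Starting from the goal facts, repeatedly keep any operator that adds a
    needed fact, then mark that operator's preconditions as needed too.
    """
    needed = set(goal)
    kept = {}
    changed = True
    while changed:
        changed = False
        for name, (pre, add) in operators.items():
            if name not in kept and needed & set(add):
                kept[name] = (pre, add)
                needed |= set(pre)
                changed = True
    return kept


# Hypothetical operators; only three of them matter for wiping the table.
ALL_OPS = {
    "open_drawer": ({"drawer_closed"}, {"drawer_open"}),
    "pick_towel": ({"drawer_open"}, {"holding_towel"}),
    "wipe_table": ({"holding_towel"}, {"table_clean"}),
    "stir_pot": ({"holding_spoon"}, {"pot_stirred"}),
    "pour_water": ({"holding_cup"}, {"cup_full"}),
}

kept = relevant_operators({"table_clean"}, ALL_OPS)
print(sorted(kept))
```

Pruning of this kind shrinks the search space before planning, which is one plausible reason filtering helps on compositional tasks.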

From Language to a Drink on the Table

UniDomain can be seamlessly integrated into a real robot system. In our drink-making setup, a dual-arm humanoid robot takes spoken requests, reasons over ingredients and preparation steps, and serves the finished drink to the user.

Rather than choosing from a single fixed recipe, the user can customize the drink from available ingredients such as tea, milk, water, floral teas, and fruit syrups (mango, lychee, and kumquat lemon).

Example request: “Make me a cup of milk tea with mango juice.”

BibTeX

@inproceedings{ye2025unidomain,
    title={UniDomain: Pretraining a Unified {PDDL} Domain from Real-World Demonstrations for Generalizable Robot Task Planning},
    author={Haoming Ye and Yunxiao Xiao and Cewu Lu and Panpan Cai},
    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
    year={2025},
}