Abstract
The lack of sufficiently diverse data, coupled with limited data efficiency, remains a major bottleneck for generalist robotic models, yet systematic strategies for collecting and curating such data are not fully explored. One major challenge in achieving generalization lies in the inherently high-dimensional nature of robotic data. Task diversity arises from implicit factors that are sparsely distributed across multiple dimensions and are difficult to define explicitly. To address this challenge, we introduce F-ACIL, a heuristic Factor-Aware Compositional Iterative Learning framework for data factorization and compositional generalization.
Data Distribution
We compare three representative data distributions. The 3D surfaces (blue-purple) represent the different training data distributions, while the contour maps (warm amber) show the shared evaluation distribution. The three distributions are: (a) a narrow Gaussian-like distribution with limited coverage, (b) a quasi-uniform distribution with full coverage but low efficiency, and (c) F-ACIL with multiple Gaussian modes, which achieves efficient, broad coverage via factor-wise composition.
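To make the three shapes concrete, here is a minimal sketch that samples toy 2D versions of them and measures how many cells of a coarse grid each one touches. Everything in it is an assumption made for illustration: the unit-square domain, the function names, the standard deviations, and the grid-based `coverage` metric do not come from the paper, and the multi-mode sampler is only a stand-in for the shape of distribution (c), not the F-ACIL procedure itself.

```python
import random

def sample_narrow(n, rng, center=(0.5, 0.5), sigma=0.05):
    """(a) A single narrow Gaussian mode (hypothetical parameters)."""
    return [(rng.gauss(center[0], sigma), rng.gauss(center[1], sigma))
            for _ in range(n)]

def sample_quasi_uniform(n, rng):
    """(b) Uniform samples over the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def sample_multi_mode(n, rng, centers, sigma=0.08):
    """(c) Several Gaussian modes, one per chosen center."""
    points = []
    for _ in range(n):
        cx, cy = rng.choice(centers)
        points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
    return points

def coverage(points, bins=4):
    """Fraction of cells in a bins x bins grid over [0, 1)^2 that hold a sample."""
    occupied = set()
    for x, y in points:
        if 0.0 <= x < 1.0 and 0.0 <= y < 1.0:
            occupied.add((int(x * bins), int(y * bins)))
    return len(occupied) / (bins * bins)

rng = random.Random(0)
# Hypothetical mode centers: the middle of each cell in a 4 x 4 grid.
centers = [((i + 0.5) / 4, (j + 0.5) / 4) for i in range(4) for j in range(4)]
for name, pts in [
    ("narrow Gaussian", sample_narrow(200, rng)),
    ("quasi-uniform", sample_quasi_uniform(200, rng)),
    ("multi-mode", sample_multi_mode(200, rng, centers)),
]:
    print(f"{name}: coverage = {coverage(pts):.2f}")
```

The grid metric only captures breadth of coverage. It says nothing about data efficiency, which is the property that separates (b) from (c) in the comparison above.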
Experiment
We show how F-ACIL helps to achieve compositional generalization across high-dimensional spaces on two representative manipulation skill families: Pick-and-Place and Open-and-Close. We consider three groups of strategies to evaluate the improvement in data efficiency and the performance differences:
- F-ACIL-Factors-Ratio: Constructs demonstrations by progressively increasing the coverage of task-relevant factors while controlling the ratio among factor spaces.
- F-ACIL-Factors-Mixture: Increases the overall number of demonstrations drawn from a quasi-uniform distribution without explicitly considering the factor structure.
- Gaussian: Samples demonstrations purely at random from a Gaussian distribution without exploiting factor-aware data composition.
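The sketch below illustrates why factor structure matters for data efficiency. It compares enumerating every combination of factor values with varying one factor at a time around a base configuration. The factor names and the one-factor-at-a-time scheme are assumptions made for illustration, with values borrowed from the rollout tables on this page. It is not the F-ACIL algorithm and does not model the ratio control used by F-ACIL-Factors-Ratio.

```python
from itertools import product

# Hypothetical factor spaces; the grouping and names are illustrative only.
FACTORS = {
    "texture": ["Transparent", "Specular", "Absorptive"],
    "geometry": ["Cylindrical", "Rod-like", "Irregular"],
    "position_x": ["Left", "Middle", "Right"],
    "position_y": ["Top", "Bottom"],
    "light": ["Warm Light", "Cool Light"],
}

def full_grid(factors):
    """Every combination of factor values; the count grows multiplicatively."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]

def factor_wise(factors, base=None):
    """Vary one factor at a time around a base config; the count grows additively."""
    base = base or {name: values[0] for name, values in factors.items()}
    configs = [dict(base)]
    for name, values in factors.items():
        for value in values:
            if value != base[name]:
                configs.append({**base, name: value})
    return configs

print(len(full_grid(FACTORS)), "configurations in the full grid")
print(len(factor_wise(FACTORS)), "configurations when varying one factor at a time")
```

Both sets contain every individual factor value at least once, but the second relies on the policy generalizing compositionally to combinations it has never seen, which is the property the experiments below evaluate.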
1. Compositional Generalization Results
For Pick-and-Place trials in the object factor space, F-ACIL-Factors-Ratio requires 2–3x less data than F-ACIL-Factors-Mixture to reach an 80–90% success rate. In the object–action setting of Open-and-Close, F-ACIL-Factors-Ratio requires 4x less data than F-ACIL-Factors-Mixture to reach an 80–85% success rate. In the most complex object–action–environment space, both skills trained with F-ACIL-Factors-Ratio reach approximately 85–95% success rate with 3–4x less data than F-ACIL-Factors-Mixture. Models trained with either structured factor-wise strategy outperform the Gaussian baseline while using 5–10x less data in all cases.
2. Scaling Laws with Increasing Dimension
a. Success rate
Scaling even simple tasks requires extensive data to reach baseline performance, and doing so is far more difficult for complex skills such as Open-and-Close.
Changes in success rate across dimensions.
b. Power Law
Model performance improves with dataset size according to a power-law scaling law, but the scaling exponent (the slope of the power law) can vary dramatically depending on the dimensionality of the data manifold or task space. Blindly scaling up dataset volume without accounting for dimensionality often yields substantially diminished returns, a manifestation of the curse of dimensionality in high-dimensional regimes.
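As a rough illustration of how the exponent controls returns on data, the sketch below fits a power law of the assumed form error(N) = a · N^(−α) by least squares in log-log space, then computes how much more data is needed to halve the error. The functional form, the function names, and the synthetic exponents 0.8 and 0.2 are all assumptions chosen for illustration. They are not measurements from the paper.

```python
import math

def fit_power_law(sizes, errors):
    """Least-squares fit of log(error) = log(a) - alpha * log(size).

    Returns (a, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def data_multiplier_to_halve_error(alpha):
    """Factor by which N must grow to halve the error under error = a * N**(-alpha)."""
    return 2.0 ** (1.0 / alpha)

# Synthetic curves with made-up exponents (not results from the paper).
sizes = [50, 100, 200, 400, 800]
low_dim_errors = [2.0 * n ** -0.8 for n in sizes]
high_dim_errors = [2.0 * n ** -0.2 for n in sizes]

for label, errors in [("low-dim", low_dim_errors), ("high-dim", high_dim_errors)]:
    a, alpha = fit_power_law(sizes, errors)
    factor = data_multiplier_to_halve_error(alpha)
    print(f"{label}: alpha = {alpha:.2f}, data needed to halve error = {factor:.1f}x")
```

Under this assumed form, a smaller exponent means each halving of the error costs a much larger multiple of data, which is one way to read the diminished returns described above.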
Rollout Exhibition (Fully Autonomous 1x Speed)
We conduct extensive experiments below for Pick-and-Place and Open-and-Close. Note that for Open-and-Close, the texture of a hinged object is defined by the texture of its manipulated component.
F-ACIL-Object
| Factor | Variation 1 | Variation 2 | Variation 3 |
|---|---|---|---|
| Texture | Transparent | Specular | Absorptive |
| Geometry | Cylindrical | Rod-like | Irregular |

| Factor | Variation 1 | Variation 2 | Variation 3 |
|---|---|---|---|
| Texture | Specular | Diffuse | Transparent |
| Size | Large | Medium | Small |
F-ACIL-Action
Here, {Left, Middle, Right} discretize positions along the horizontal direction of the tabletop (x-axis), and {Top, Bottom} discretize positions along the vertical direction of the tabletop (y-axis).
Rollout videos are arranged in two grids, each with rows Top and Bottom and columns Left, Middle, and Right.
F-ACIL-Environment
Rollout videos are arranged in two grids, each with rows Warm Light and Cool Light and columns Toward Right and Toward Left.
Citation
@article{xiao2026generalizableroboticdataflywheel,
  title={Towards Generalizable Robotic Data Flywheel: High-Dimensional Factorization and Composition},
  author={Yuyang Xiao and Yifei Zhou and Haoran Wang and Wenxuan Ou and Yuxiao Liu},
  year={2026},
  eprint={2603.25583},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2603.25583},
}