Learning High-Frequency Continuous Action Chunks in Latent Space
AIPR assessment
This is a hard and competitive robotics control problem, not a toy benchmark: smooth contact-rich manipulation at 60 Hz with real hardware, asynchronous inference, and latency-sensitive execution. The strongest aspects reinforce one another: the latent representation improves within-chunk precision and smoothness, Reuse-then-Refine (RTR) improves cross-chunk continuity, and the real-robot experiments show that the combination matters operationally. The main weaknesses also interact: the study is limited to one platform
Abstract
Modern robotic policies increasingly rely on action chunking to execute complex tasks in the physical world. While action chunking improves temporal consistency at moderate action frequencies, it becomes insufficient when the action frequency is increased further (e.g., to 60 Hz). At such high frequencies, policies often fail to generate actions that are both temporally smooth and spatially consistent. We address this challenge by shifting high-frequency action learning from the action space to a latent space learned with a variational autoencoder (VAE). This formulation significantly improves both the temporal and spatial consistency of high-frequency control. To enable smooth real-time execution, we further introduce Reuse-then-Refine, a chunk-level refinement strategy that improves continuity between adjacent action chunks under asynchronous inference. As a result, robots controlled by our policy can execute complex contact-rich tasks continuously, with fewer pauses and jerky motions. Experiments on three real-world contact-rich robotic tasks show that our approach consistently completes tasks with smooth motions. Our code and data are available at https://github.com/tars-robotics/RTR.
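To make the chunk-boundary problem concrete, the sketch below works through the timing arithmetic of asynchronous inference at 60 Hz and shows a generic reuse-and-crossfade stitch between two chunks. It is not the paper's Reuse-then-Refine algorithm, whose details the abstract does not give. The 80 ms latency, the scalar actions, and the names `ticks_during_inference` and `stitch_chunks` are all assumptions made for illustration.

```python
def ticks_during_inference(latency_ms: int, control_hz: int) -> int:
    """Control ticks that elapse while one inference call runs (rounded up).

    Integer arithmetic avoids floating-point rounding surprises.
    """
    return (latency_ms * control_hz + 999) // 1000


def stitch_chunks(prev_tail, new_chunk, delay, blend):
    """Generic crossfade between two action chunks (illustrative only).

    prev_tail : actions of the previous chunk, indexed from the tick at
                which inference for new_chunk started.
    new_chunk : newly predicted actions, indexed from that same tick.
    delay     : ticks already executed from prev_tail while waiting, so
                the first `delay` entries of new_chunk are stale.
    blend     : number of ticks over which to fade from old to new.
    Returns the actions to execute from the tick the new chunk arrives.
    """
    out = []
    for i in range(delay, len(new_chunk)):
        k = i - delay  # ticks since the new chunk became available
        if k < blend and i < len(prev_tail):
            w = (k + 1) / (blend + 1)  # weight on the new chunk grows linearly
            out.append((1 - w) * prev_tail[i] + w * new_chunk[i])
        else:
            out.append(new_chunk[i])
    return out


# Assumed numbers: 80 ms inference latency at the 60 Hz rate from the abstract.
delay = ticks_during_inference(80, 60)
# Two deliberately inconsistent chunks (all 0.0 vs. all 1.0) expose the jump.
executed = stitch_chunks([0.0] * 8, [1.0] * 8, delay, blend=2)
print(delay, executed)
```

With these assumed numbers, several actions of every new chunk are already out of date when it arrives, which is why naive switching at the boundary can produce a visible jump. The crossfade here only smooths that jump; a learned refinement step, as the abstract describes, would presumably do more than a linear blend.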