Jun 1 – Jun 7, 2026

Preprint Report: Robot manipulation, agent safety, and efficient long-context inference

Nearly 4,000 CS preprints landed on arXiv this week. Approximately 8% touched robotics, roughly 30% touched LLM agents, and close to 30% touched model efficiency. Robot manipulation systems made up roughly 13% of the robotics work. Tool-using agent safety made up about 3% of the LLM agent work, with evaluations centered on workflow abuse and trace-level failures. Efficient long-context inference made up about 3% of the model efficiency work, where memory budgets are forcing more selective computation.

Structured robot control

Robot learning keeps circling back to a practical question: how much structure has to be built into a policy before it stops falling apart on real hardware. Spline Policy: A Structured Representation for Robot Policies addresses the usual problem of jittery, hard-to-interpret action outputs by representing behavior with smoother policy primitives that are easier to execute and inspect. RGB-S: Image-Aligned Tactile Saliency for Robust Dexterous Manipulation tackles the common failure of vision-only grasping under contact by highlighting the tactile cues that matter at the moment of manipulation. vla.cpp: A Unified Inference Runtime for Vision-Language-Action Models goes after another bottleneck, the difficulty of running large VLA models on modest hardware, by providing a C++ runtime that makes those policies more portable at the edge.
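To make the idea of a smoother action representation concrete, here is a minimal sketch, not the Spline Policy method itself. It assumes a policy that emits four control points per action chunk and expands them into a dense trajectory with a cubic Bezier curve. The function name bezier_actions, the control-point values, and the step count are all hypothetical.

```python
import numpy as np

def bezier_actions(control_points, num_steps):
    """Evaluate a cubic Bezier curve at num_steps evenly spaced times in [0, 1]."""
    p = np.asarray(control_points, dtype=float)  # shape (4, action_dim)
    t = np.linspace(0.0, 1.0, num_steps)[:, None]
    # Cubic Bernstein basis, one column per control point.
    basis = np.hstack([
        (1 - t) ** 3,
        3 * (1 - t) ** 2 * t,
        3 * (1 - t) * t ** 2,
        t ** 3,
    ])
    return basis @ p  # shape (num_steps, action_dim)

# Hypothetical 2-D action chunk: four control points stand in for policy output.
control_points = [[0.0, 0.0], [0.2, 0.5], [0.8, 0.5], [1.0, 0.0]]
actions = bezier_actions(control_points, num_steps=50)
```

The curve starts at the first control point and ends at the last, and the four points are easier to inspect than fifty raw per-step actions, which is the kind of benefit a structured representation is meant to offer.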

Tool-using agent safety

Safety work around agents is becoming less about whether an answer looks right and more about how the system reached it. GitInject: Real-World Prompt Injection Attacks in AI-Powered CI/CD Pipelines takes on the abstractness of many prompt-injection demos by showing attacks inside live software automation workflows and testing pipeline-level mitigations. Where Instruction Hierarchy Breaks: Diagnosing and Repairing Failures in Reasoning Language Models focuses on a different weak point, models losing track of which instructions should outrank others, and proposes targeted repairs for those hierarchy failures. Beyond Pass/Fail: Using Process Mining to Understand How LLMs Resist (and Fail) Red Team Attacks addresses the blindness of single success metrics by tracing the sequence of agent actions, so unsafe behavior can be diagnosed before it shows up as a final visible failure.

Efficient long-context inference

Inference efficiency now looks less like generic compression and more like budgeting attention for the parts of context that still matter. STAR-KV: Low-Rank KV Cache Compression via Soft Thresholding for Adaptive Rank Control targets the swelling key-value cache, model memory that grows with every token, by compressing it adaptively instead of storing everything at full size. IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference tackles the same memory problem from the agent side by dropping cached state that no longer serves the user's active goal. Look Less, Reason More: Block-wise Attention Skipping for Efficient Multimodal LLMs attacks wasted compute in long multimodal prompts by skipping attention blocks that contribute little, a direct response to models that are expensive because they inspect too much rather than because they know too little.
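As a rough illustration of adaptive low-rank cache compression, the sketch below soft-thresholds the singular values of a cache matrix so that the retained rank follows from the data instead of being fixed in advance. It is a generic technique under stated assumptions, not the STAR-KV algorithm. The function name soft_threshold_compress, the threshold tau, and the synthetic cache are all hypothetical.

```python
import numpy as np

def soft_threshold_compress(kv, tau):
    """Return low-rank factors of kv after soft-thresholding its singular values."""
    u, s, vt = np.linalg.svd(kv, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)      # shrink, then zero out small values
    rank = int(np.count_nonzero(s_shrunk))   # rank is chosen by the threshold
    left = u[:, :rank] * s_shrunk[:rank]     # shape (tokens, rank)
    right = vt[:rank, :]                     # shape (rank, dim)
    return left, right

# Synthetic stand-in for a key or value cache: low-rank signal plus small noise.
rng = np.random.default_rng(0)
tokens, dim, true_rank = 256, 64, 4
kv = rng.normal(size=(tokens, true_rank)) @ rng.normal(size=(true_rank, dim))
kv += 0.01 * rng.normal(size=(tokens, dim))

left, right = soft_threshold_compress(kv, tau=1.0)
approx = left @ right
rel_error = np.linalg.norm(kv - approx) / np.linalg.norm(kv)
print(left.shape, right.shape, rel_error)
```

Storing the two factors takes rank × (tokens + dim) numbers instead of tokens × dim, so the saving grows as the threshold removes more singular values. A larger tau trades reconstruction accuracy for memory.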
