May 18 – May 24, 2026

Preprint Report: 3D video vision, LLM agent workflows, and efficient LLM inference

Nearly 4,000 CS preprints landed on arXiv this week. Roughly 23% touched computer vision, around 30% touched LLM agents, and approximately 8% fell into efficient ML systems. 3D video vision models were a visible slice of the vision work, close to 4% of it, pushing generation toward scene-consistent structure. LLM agent workflows accounted for about 3% of the LLM agents work, shifting from single prompts to managed multi-turn systems. Efficient LLM inference accounted for about 13% of the efficient ML systems work, with runtime bottlenecks replacing retraining as the main target.
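Because the topic shares are quoted relative to their parent areas rather than to the week's total, a rough conversion to paper counts can help with scale. The sketch below is back-of-the-envelope only: it assumes a round total of 4,000 (the report says "nearly 4,000") and treats the quoted percentages as exact, so the outputs are estimates, not reported figures. The names `TOTAL`, `SHARES`, and `estimate_counts` are illustrative.

```python
# Assumption: 4,000 is a round stand-in for "nearly 4,000" preprints.
TOTAL = 4000

# (parent area's share of all preprints, topic's share of its parent area),
# using the approximate percentages quoted in the paragraph above.
SHARES = {
    "3D video vision": (0.23, 0.04),
    "LLM agent workflows": (0.30, 0.03),
    "efficient LLM inference": (0.08, 0.13),
}


def estimate_counts(total, shares):
    """Return {topic: (parent_count, topic_count)}, rounded to whole papers."""
    out = {}
    for topic, (parent_share, topic_share) in shares.items():
        parent = total * parent_share
        out[topic] = (round(parent), round(parent * topic_share))
    return out


if __name__ == "__main__":
    for topic, (parent, count) in estimate_counts(TOTAL, SHARES).items():
        print(f"{topic}: ~{count} of ~{parent} parent-area preprints")
```

Under these assumptions, each of the three topics comes out at a few dozen preprints for the week, even though their parent areas differ in size by almost a factor of four.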

Geometry-aware vision models

A lot of vision work is trying to make generative models behave more like structured scene models. The pressure is clear: free-form image synthesis can look good while breaking geometry, camera motion, or object consistency. GenRecon: Bridging Generative Priors for Multi-View 3D Scene Reconstruction tackles sparse-view reconstruction by using generative priors to fill in missing scene detail without drifting away from the observed views. DeltaCam: Differential Intrinsic Camera Modeling for Video Generation addresses camera-control errors by explicitly modeling changing camera intrinsics, which helps generated video respect lens behavior instead of faking it frame by frame. Where Detectors Fail: Probing Generative Space for Generalizable AI-Generated Image Detection goes after brittle fake-image detectors by testing them in the generative space itself, aiming to surface failures that expose model shortcuts before deployment.

Managed LLM agent systems

Agent preprints are increasingly about the scaffolding around the model, not just the model's reply. The recurring problem is that a competent chat model can still fail once memory, tools, and long task trajectories enter the loop. Polar: Agentic RL on Any Harness at Scale tries to solve the narrow-benchmark problem by making reinforcement learning work across many agent harnesses, so training can target full workflows instead of handcrafted demos. Found in Conversation: LLMs Teach Themselves to Close the Multi-Turn Gap addresses the mismatch between one-shot training and real dialogue by letting models learn from conversation traces that expose turn-to-turn errors. MemMorph: Tool Hijacking in LLM Agents via Memory Poisoning studies how long-term memory can become an attack surface, showing that agent reliability now depends on memory hygiene as much as prompt filtering.

Runtime efficiency for LLMs

Efficiency work this week is less about shrinking models in the abstract and more about removing the specific runtime costs that make large models expensive to serve. Optimus: Elastic Decoding for Efficient Diffusion LLM Serving targets uneven decoding workloads by changing how work is scheduled during serving, which helps keep hardware busy instead of waiting on stragglers. Approaching I/O-optimality for Approximate Attention goes after memory traffic, a hidden cost in long-context models, by reorganizing approximate attention so data movement gets closer to the hardware limit. CONF-KV: Confidence-Aware KV Cache Eviction with Mixed-Precision Storage for Long-Horizon LLM tackles growth of the KV cache (the stored key-value history used during generation) by keeping the most useful states at higher fidelity and compressing the rest before long contexts become too expensive to serve.
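To make the KV cache idea concrete, here is a minimal sketch of score-based eviction with two storage tiers. It is not CONF-KV's actual algorithm, which this report does not describe in detail. It assumes each cached position already has a usefulness score (the `scores` argument is a hypothetical placeholder for whatever confidence signal a real system would compute), keeps the top entries at full precision, stores the next tier as int8, and evicts the rest. `Entry`, `quantize_int8`, `dequantize`, and `compact_cache` are illustrative names, and plain Python lists stand in for tensors.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    position: int            # token position this cached state belongs to
    values: list             # floats at full precision, ints once quantized
    scale: float = 1.0       # dequantization scale (unused if not quantized)
    quantized: bool = False


def quantize_int8(values):
    """Symmetric int8 quantization: returns (ints in [-127, 127], scale)."""
    peak = max((abs(v) for v in values), default=0.0)
    scale = peak / 127 if peak > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize(entry):
    """Recover approximate float values from a cache entry."""
    if not entry.quantized:
        return list(entry.values)
    return [v * entry.scale for v in entry.values]


def compact_cache(entries, scores, hi_budget, lo_budget):
    """Keep the top `hi_budget` entries by score at full precision,
    the next `lo_budget` as int8, and evict everything else."""
    ranked = sorted(range(len(entries)), key=lambda i: scores[i], reverse=True)
    keep_hi = set(ranked[:hi_budget])
    keep_lo = set(ranked[hi_budget:hi_budget + lo_budget])
    out = []
    for i, entry in enumerate(entries):
        if i in keep_hi or (i in keep_lo and entry.quantized):
            out.append(entry)
        elif i in keep_lo:
            ints, scale = quantize_int8(entry.values)
            out.append(Entry(entry.position, ints, scale, True))
        # entries in neither tier are evicted
    return out


if __name__ == "__main__":
    cache = [Entry(p, [1.0, -0.5, 0.25]) for p in range(4)]
    scores = [0.9, 0.1, 0.5, 0.3]  # hypothetical per-position scores
    for e in compact_cache(cache, scores, hi_budget=1, lo_budget=2):
        print(e.position, "int8" if e.quantized else "full", dequantize(e))
```

The two budgets make the trade-off explicit: raising `hi_budget` spends memory on fidelity, while raising `lo_budget` retains more history at reduced precision. A production system would also need to decide when to recompute scores and whether a demoted entry can ever be restored, both of which this sketch leaves out.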
