May 11 – May 17, 2026

Preprint Report: Controllable vision, agentic LLM workflows, and LLM security auditing

Nearly 4,600 CS preprints landed on arXiv this week. Approximately 23% touched computer vision research, around 7% touched LLM agent research, and roughly 30% touched AI security research. Controllable vision systems took the largest of the three slices, at close to 4% of computer vision research. Work on agentic LLM workflows makes up about 14% of LLM agent research and is pushing benchmarks toward longer, more realistic workflows. Work on LLM security auditing makes up about 3% of AI security research, with attention shifting to how systems fail after deployment.
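Each topic share above is a fraction of a parent area, not of all preprints, so turning the percentages into rough paper counts takes one multiplication per slice. The sketch below uses only the report's rounded figures; the `SLICES` layout and the helper name `approx_count` are illustrative assumptions, not part of any tooling behind this report.

```python
# Rough paper counts from the report's nested, rounded percentages.
# Illustrative only: "nearly 4,600" is treated as 4,600, so these are upper-end estimates.
TOTAL = 4600

# (parent area's share of all preprints, topic's share of that parent area)
SLICES = {
    "controllable vision systems": (0.23, 0.04),
    "agentic LLM workflows": (0.07, 0.14),
    "LLM security auditing": (0.30, 0.03),
}


def approx_count(total: int, parent_share: float, topic_share: float) -> float:
    """Share of a share: total * parent_share * topic_share."""
    return total * parent_share * topic_share


if __name__ == "__main__":
    for name, (parent, topic) in SLICES.items():
        print(f"{name}: ~{approx_count(TOTAL, parent, topic):.0f} preprints")
```

With inputs this rounded, the three slices come out within a few preprints of one another (roughly 41 to 45 each), so a precise ordering depends on unrounded counts that are not listed here.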

Controllable vision systems

A lot of vision work now assumes the hard part is not seeing an image once, but steering a visual model over time without losing fidelity or blowing up compute. AtlasVid: Efficient Ultra-High-Resolution Long Video Generation via Decoupled Global-Local Modeling by Mai et al. at Dartmouth splits long video generation into global and local pieces, so ultra-high-resolution clips no longer collapse under sequence length and memory pressure. HierEdit: Region-Aware Hierarchical Diffusion for Efficient High-Resolution Editing by Zhang et al. at Dartmouth tackles a common editing problem, local fixes spilling into the rest of the image, using region-aware diffusion to keep edits targeted. Attend Locally, Remember Linearly: Linear Attention as Cross-Frame Memory for Autoregressive Video Diffusion by Li et al. at the University of Central Florida addresses another recurring failure, long videos forgetting earlier frames, by replacing heavier cross-frame memory with linear attention, whose cost grows only linearly with video length.
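The cross-frame memory idea in the third paper can be illustrated with the textbook recurrent form of linear attention (Katharopoulos et al., 2020): every past key/value pair is folded into a fixed-size state instead of being stored and re-attended. This is a generic sketch, not Li et al.'s architecture; the elu(x)+1 feature map, the class name `LinearAttentionMemory`, and writing one key/value pair per finished frame are assumptions chosen for illustration, in plain Python so it runs without dependencies.

```python
import math
from typing import List

Vector = List[float]


def phi(x: Vector) -> Vector:
    """elu(x) + 1 feature map: strictly positive, so the normalizer stays positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


class LinearAttentionMemory:
    """Hypothetical cross-frame memory; its size is d_k * d_v regardless of frame count."""

    def __init__(self, d_k: int, d_v: int) -> None:
        self.S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
        self.z = [0.0] * d_k                        # running sum of phi(k), the normalizer

    def write(self, k: Vector, v: Vector) -> None:
        """Fold one key/value pair (e.g. a summary token of a finished frame) into the state."""
        fk = phi(k)
        for i, fki in enumerate(fk):
            self.z[i] += fki
            for j, vj in enumerate(v):
                self.S[i][j] += fki * vj

    def read(self, q: Vector) -> Vector:
        """Attend over everything written so far in O(d_k * d_v), not O(history length)."""
        fq = phi(q)
        denom = sum(a * b for a, b in zip(fq, self.z))
        if denom == 0.0:  # nothing has been written yet
            return [0.0] * len(self.S[0])
        return [
            sum(fq[i] * self.S[i][j] for i in range(len(fq))) / denom
            for j in range(len(self.S[0]))
        ]


if __name__ == "__main__":
    memory = LinearAttentionMemory(d_k=2, d_v=2)
    # Toy keys/values standing in for three earlier frames.
    for key, value in [([0.5, -1.0], [1.0, 0.0]), ([2.0, 0.1], [0.0, 1.0]), ([-0.3, 0.7], [2.0, -1.0])]:
        memory.write(key, value)
    print(memory.read([0.2, -0.4]))  # current frame's query reads the compressed history
```

A read returns exactly what normalized attention over all stored pairs would give with the same feature map, but the work per read stays proportional to d_k * d_v however many frames have been written. The trade-off is that softmax's sharp, content-specific weighting is replaced by a kernel approximation.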

Agentic LLM workflows

Agent work is increasingly judged by whether a system can keep its footing across a whole task, not whether it can answer one prompt well. GraphMind: From Operational Traces to Self-Evolving Workflow Automation by Zhu et al. at the University of Illinois Chicago starts from execution traces, then turns them into workflows that can revise themselves as repeated jobs reveal new branches and failure points. Argus: Evidence Assembly for Scalable Deep Research Agents addresses the brittle habit of research agents making early claims before they have enough support, organizing evidence collection so later steps are grounded in assembled source material. Skim: Speculative Execution for Fast and Efficient Web Agents by Wong et al. at Princeton cuts another bottleneck, web agents waiting on every action in sequence, by letting likely next steps run ahead and rolling back when the guess was wrong.
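Skim's core idea, overlapping a slow planning step with a guessed action and discarding the guess when it turns out wrong, follows the general speculative-execution pattern. The sketch below is hypothetical: `run_with_speculation`, the `plan`/`guess`/`act` callables, and the assumption that agent state can be snapshotted with `copy.deepcopy` are illustrative choices, not Wong et al.'s implementation.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Tuple

State = Dict[str, Any]
Action = str


def run_with_speculation(
    state: State,
    plan: Callable[[State], Action],        # slow, authoritative choice (e.g. an LLM call)
    guess: Callable[[State], Action],       # cheap predictor of the next action
    act: Callable[[State, Action], State],  # executes an action and returns the new state
    steps: int,
) -> Tuple[State, Dict[str, int]]:
    """Hypothetical agent loop: execute a guessed action while the planner is still thinking."""
    stats = {"hits": 0, "misses": 0}
    with ThreadPoolExecutor(max_workers=2) as pool:
        for _ in range(steps):
            predicted = guess(state)
            # Run the guess on a private snapshot so a wrong guess never touches real state.
            speculative = pool.submit(act, copy.deepcopy(state), predicted)
            planned = pool.submit(plan, state).result()
            if planned == predicted:
                state = speculative.result()  # commit: the action already ran
                stats["hits"] += 1
            else:
                speculative.result()          # wait for it, then discard it (roll back)
                state = act(state, planned)
                stats["misses"] += 1
    return state, stats


if __name__ == "__main__":
    def act(s: State, a: Action) -> State:
        return {"log": s["log"] + [a]}

    def plan(s: State) -> Action:
        return "click" if len(s["log"]) % 2 == 0 else "type"

    final, stats = run_with_speculation({"log": []}, plan, lambda s: "click", act, steps=4)
    print(final["log"], stats)  # ['click', 'type', 'click', 'type'] {'hits': 2, 'misses': 2}
```

Because a guess is committed only when the planner independently picks the same action, the final state matches strictly sequential execution; the speedup comes from hits, where the action is already done by the time the plan arrives. Real web actions such as form submissions can have side effects a snapshot cannot undo, which a production system would have to handle separately.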

LLM security auditing

Security preprints are treating language models less like isolated chatbots and more like software stacks that can be steered, poisoned, or tricked through their surrounding tools. Securing LLM Agents Need Intent-to-Execution Integrity by Qu et al. at the National University of Singapore focuses on the gap between what an agent appears to intend and what its tool calls actually do, which is where many real failures hide. Privacy Policy Enforcement Guardrails for Data-Sensitive Retrieval-Augmented Generation by Zafar et al. at Case Western Reserve University targets RAG (retrieval-augmented generation) systems that can pull the right document but still reveal the wrong detail, adding policy checks tied to the retrieval flow itself. Auditing Reasoning-Trace Memorization Claims after Unlearning with Head-Conditioned Canaries by Li et al. at Northeastern University examines a different weak point, whether unlearning really removes sensitive traces, using canaries to test if supposedly erased reasoning patterns still leak through.
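Canary-based audits usually come down to one question: does the model still rank a planted secret far above look-alike random strings? A common way to quantify that is the exposure metric from Carlini et al.'s "The Secret Sharer" (2019). The sketch below implements only that generic metric; it does not model Li et al.'s head conditioning or reasoning traces, and the loss functions in the demo are toy stand-ins for a real model's per-sequence loss.

```python
import math
from typing import Callable, Sequence


def exposure(canary: str, candidates: Sequence[str], loss: Callable[[str], float]) -> float:
    """Exposure = log2 |R| - log2 rank, where rank is the canary's position when every
    candidate in the randomness space R is sorted by model loss (lowest loss first)."""
    if canary not in candidates:
        raise ValueError("the canary must be one of the candidates")
    canary_loss = loss(canary)
    # Ties count against the canary, so a model that cannot tell it apart scores 0.
    rank = sum(1 for c in candidates if loss(c) <= canary_loss)
    return math.log2(len(candidates)) - math.log2(rank)


if __name__ == "__main__":
    space = [format(i, "010b") for i in range(1024)]  # 2**10 equally likely candidates
    secret = space[357]
    still_memorized = lambda s: 0.1 if s == secret else 5.0  # toy model that recalls it
    truly_forgotten = lambda s: 5.0                          # toy model with no preference
    print(exposure(secret, space, still_memorized))  # 10.0
    print(exposure(secret, space, truly_forgotten))  # 0.0
```

In an unlearning audit, exposure would be measured before and after unlearning: a canary whose exposure stays near log2 |R| afterward suggests the supposedly erased content is still recoverable.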
