GitInject: Real-World Prompt Injection Attacks in AI-Powered CI/CD Pipelines
AIPR assessment
The problem is hard and active, set within a crowded and fast-moving security literature on prompt injection, AI agents, and CI/CD automation. The paper's strongest parts reinforce each other: real execution on live ephemeral repositories, concrete attack reproductions, and workflow-level mitigations make the results credible and useful. Its weaknesses also compound, though only mildly: the study is bounded to selected providers and workflows, and some broad structural claims lean on extrapolation.
Abstract
AI-powered agents are increasingly embedded in continuous integration and continuous delivery/deployment (CI/CD) pipelines to autonomously review pull requests (PRs), triage issues, and maintain codebases. These agents ingest untrusted content while operating with elevated repository permissions, making them a natural target for prompt injection attacks with supply chain consequences. We present GitInject, an open-source framework for evaluating prompt injection vulnerabilities in real, live GitHub workflows, a widely deployed instance of CI/CD pipelines. Unlike prior agent security benchmarks that simulate tool calls, GitInject provisions ephemeral repositories and triggers actual workflow runs, so that sandbox constraints, credential handling, and permission boundaries behave exactly as in production. Using GitInject, we study workflow configurations across four AI providers and document eleven named attacks spanning config-file injection, credential exfiltration, judgment manipulation, and availability. We find that all tested providers are susceptible to at least one attack class in their default configuration, and that the most critical vulnerabilities are structural: they arise from how CI/CD infrastructure handles credentials and configuration files, not from any specific model's behavior. For each confirmed attack class, we identify the minimum-cost workflow-level countermeasure and analyze its coverage and limitations. GitInject is released publicly to facilitate further research in this direction.
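To make the abstract's point about structural risk concrete, the sketch below flags one risky combination in a GitHub Actions workflow: a trigger that can carry attacker-controlled text together with a write-scoped token. It is an illustration under stated assumptions, not GitInject's method or one of the paper's countermeasures. The function names, the `UNTRUSTED_TRIGGERS` set, and the finding format are all hypothetical, and the workflow is assumed to be already parsed into a Python dict with the usual `on`, `permissions`, and `jobs` keys.

```python
# Illustrative audit only; not part of GitInject. All names here are hypothetical.

# Triggers whose event payload can contain text written by outside contributors.
# This set is an assumption chosen for the example, not an exhaustive list.
UNTRUSTED_TRIGGERS = {"pull_request_target", "issue_comment", "issues"}


def trigger_names(on_field):
    """Normalize the workflow `on` field (string, list, or mapping) to a set of names."""
    if isinstance(on_field, str):
        return {on_field}
    if isinstance(on_field, (list, tuple, set)):
        return set(on_field)
    if isinstance(on_field, dict):
        return set(on_field.keys())
    return set()


def write_scopes(permissions):
    """Return the scopes that grant write access under a `permissions` value."""
    if permissions is None:
        # No explicit block: the repository or organization default applies,
        # which may be permissive, so it is reported rather than assumed safe.
        return ["<unspecified: repository default applies>"]
    if isinstance(permissions, str):
        return ["<all scopes>"] if permissions == "write-all" else []
    return sorted(scope for scope, level in permissions.items() if level == "write")


def audit_workflow(workflow):
    """List (job, triggers, write scopes) for jobs that mix untrusted input with write access."""
    triggers = trigger_names(workflow.get("on")) & UNTRUSTED_TRIGGERS
    if not triggers:
        return []
    top_level = workflow.get("permissions")
    findings = []
    for job_name, job in workflow.get("jobs", {}).items():
        # A job-level `permissions` block overrides the workflow-level one.
        scopes = write_scopes(job.get("permissions", top_level))
        if scopes:
            findings.append((job_name, sorted(triggers), scopes))
    return findings


example = {
    "on": {"issue_comment": {"types": ["created"]}},
    "permissions": {"contents": "read"},
    "jobs": {
        "review": {"permissions": {"contents": "write", "pull-requests": "write"}},
        "lint": {},
    },
}

for finding in audit_workflow(example):
    print(finding)
```

In the example, only the `review` job is reported, because `lint` inherits the read-only workflow-level permissions. A check like this covers configuration only; it says nothing about how a given agent reacts to injected text, which is the part that requires live runs of the kind the paper describes.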