The CIFAR Synthetic Evidence Corpus for Detecting AI-Generated Evidence
AIPR assessment
Problem difficulty is moderately high. This is not a saturated benchmark in the sense of ImageNet-like competition, but it sits at an intersection where legal evidence, document forensics, and synthetic media detection all matter, so a well-designed corpus is genuinely useful. The strongest features reinforce each other: source-disjoint splits, controlled manipulation tiers, per-item metadata, and provenance markers together make the benchmark more credible than a simple synthetic dataset. The m
Abstract
The growing ability of generative models to produce realistic documents poses a direct challenge to evidentiary workflows in courts and the wider justice system, where decisions increasingly depend on the authenticity of evidence such as receipts, communications, and administrative records. Unlike content in social media or academic settings, evidentiary documents are often only subtly altered, with small, localized edits that preserve overall plausibility while changing legal meaning. Yet progress on automated detection remains limited, largely because training and evaluation data suited to the requirements of the justice system are lacking. Existing resources focus either on photos of human faces or natural scenery, or on narrowly scoped academic or social media document types, and do not capture the structure, diversity, or manipulation patterns characteristic of real-world evidentiary data. As a result, current detection systems do not necessarily learn signals that are meaningful for the justice system. We introduce the CIFAR Synthetic Evidence Corpus, a dataset designed to enable rigorous evaluation of evidence verification under realistic and controlled conditions. The corpus spans multiple document families and a spectrum of manipulation strategies, from small field-level edits to complete document fabrication, and is constructed using a diverse set of state-of-the-art generative tools. It is organized to systematically vary both manipulation complexity and generation method, while enforcing source-level separation between training and test data to reflect real-world generalization challenges.
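The source-level separation mentioned in the abstract can be illustrated with a minimal sketch. It is not the corpus authors' procedure, which the abstract does not describe. It assumes each item carries a hypothetical `source_id` field naming the original document it was derived from, and it assigns whole sources to a split by hashing that ID, so that every manipulated variant of one source lands on the same side. The field names, the function names, and the 20% test fraction are all illustrative assumptions.

```python
import hashlib


def assign_split(source_id: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a source ID to 'train' or 'test'.

    Hypothetical helper: hashing the source ID (not the item) means all
    items derived from the same source share one split.
    """
    digest = hashlib.sha256(source_id.encode("utf-8")).digest()
    # Read the first 8 bytes as an integer in [0, 2**64) and scale to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"


def split_by_source(items, test_fraction: float = 0.2):
    """Partition items into train/test lists keyed on their 'source_id' field."""
    splits = {"train": [], "test": []}
    for item in items:
        split = assign_split(item["source_id"], test_fraction)
        splits[split].append(item)
    return splits


if __name__ == "__main__":
    # Toy data: 50 hypothetical sources, each with two manipulation tiers.
    items = [
        {"source_id": f"src-{i}", "tier": tier}
        for i in range(50)
        for tier in ("field_edit", "full_fabrication")
    ]
    splits = split_by_source(items)
    train_sources = {it["source_id"] for it in splits["train"]}
    test_sources = {it["source_id"] for it in splits["test"]}
    # No source contributes items to both splits.
    print(train_sources.isdisjoint(test_sources))
```

Because the assignment depends only on the source ID, adding new manipulated variants of an existing source later does not move that source between splits. The realized test share only approximates `test_fraction` when the number of sources is small.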