SCOPE: Cost-Efficient Model Selection for Compound AI Systems under Quality Constraints
AIPR assessment
This is a hard, competitive optimization problem: a large combinatorial search over many LLM assignments under an explicit quality constraint, with expensive evaluations and practical cost pressure. The strengths reinforce each other well: the query-level search idea, the proof-based feasibility guarantee, the open code, and the strong benchmark gains all point in the same direction. The main weaknesses also interact: the algorithm depends on kernelized surrogate modeling, tuned bound parameters,
Abstract
A compound AI system consists of multiple LLM modules that together handle complex, multi-step tasks exceeding the capabilities of a single model. Existing systems often use a single expensive LLM across all modules to maximize the end-to-end result quality. However, this configuration incurs prohibitive costs, particularly for data management and analytics tasks at scale, such as data manipulation. To this end, we formalize the problem of constrained LLM selection for compound AI systems, leveraging the diverse pricing and capabilities of different LLMs to achieve competitive quality at lower cost. Given a query dataset and a user-specified quality threshold, we aim to select an LLM for each module to minimize the system's average cost while ensuring that overall quality meets the required threshold. To solve this problem, we propose SCOPE, a cost-efficient optimization algorithm. Unlike existing approaches that rely on expensive dataset-level evaluations, SCOPE exploits per-query results to rapidly estimate the system's cost and quality, and constructs confidence bounds to guide the search for promising LLM combinations. Furthermore, SCOPE provides theoretical guarantees for meeting the quality threshold and achieving near-optimal average cost. We evaluate SCOPE against 7 baselines on three data processing tasks, where it outperforms all of them. Under the same search budget and quality constraint, it finds solutions with up to $20\times$ lower cost than the best competitor during the search and achieves up to $6\times$ lower final cost in the returned solution.
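To make the problem statement concrete: for an assignment $a$ (one LLM per module), a query set $Q$, per-query quality $s(q,a) \in [0,1]$ and per-query cost $c(q,a)$, the goal is to minimize $\frac{1}{|Q|}\sum_{q} c(q,a)$ subject to $\frac{1}{|Q|}\sum_{q} s(q,a) \ge \tau$. The sketch below is a minimal illustration of that constraint check from per-query records, not SCOPE itself: it enumerates every candidate exhaustively rather than searching, and it uses one-sided Hoeffding bounds with a union bound as a stand-in for whatever confidence bounds SCOPE actually constructs. It assumes i.i.d. queries and quality in $[0,1]$; all names (`select_assignment`, `toy_records`, `PRICE`, the models "small" and "large") and all numbers are hypothetical.

```python
import math
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Assignment = Tuple[str, ...]  # hypothetical: one LLM name per module
Records = Dict[Assignment, List[Tuple[float, float]]]  # per-query (quality, cost)


def hoeffding_lower_bound(values: Sequence[float], delta: float) -> float:
    """One-sided Hoeffding lower bound on the mean of values in [0, 1]."""
    n = len(values)
    return sum(values) / n - math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def select_assignment(records: Records, threshold: float,
                      delta: float = 0.05) -> Optional[Assignment]:
    """Cheapest assignment whose quality lower bound clears the threshold.

    delta is split evenly across candidates (union bound), so all the
    per-candidate bounds hold simultaneously with probability >= 1 - delta.
    Returns None when no candidate is certified feasible.
    """
    per_candidate_delta = delta / len(records)
    best, best_cost = None, math.inf
    for assignment, obs in records.items():
        qualities = [q for q, _ in obs]
        avg_cost = sum(c for _, c in obs) / len(obs)
        lcb = hoeffding_lower_bound(qualities, per_candidate_delta)
        if lcb >= threshold and avg_cost < best_cost:
            best, best_cost = assignment, avg_cost
    return best


# Toy, made-up numbers for illustration only: 2 modules, 2 hypothetical models.
PRICE = {"small": 0.1, "large": 1.0}  # hypothetical per-call price
MEAN_QUALITY = {("small", "small"): 0.80, ("small", "large"): 0.95,
                ("large", "small"): 0.90, ("large", "large"): 1.00}


def toy_records(n: int = 200) -> Records:
    """Build synthetic per-query records with binary quality outcomes."""
    records: Records = {}
    for assignment in product(PRICE, repeat=2):  # |models| ** |modules| candidates
        cost = sum(PRICE[m] for m in assignment)
        hits = round(MEAN_QUALITY[assignment] * n)
        records[assignment] = [(1.0, cost)] * hits + [(0.0, cost)] * (n - hits)
    return records


if __name__ == "__main__":
    recs = toy_records()
    for tau in (0.80, 0.85, 0.90):
        print(tau, select_assignment(recs, tau))
```

With these toy numbers, a threshold of 0.80 certifies the cheaper mixed assignment, 0.85 forces both modules onto the large model, and 0.90 certifies nothing because the bound's width still exceeds the remaining slack. Exhaustive enumeration grows as $|\text{models}|^{|\text{modules}|}$, which is why guiding the search with bounds, rather than evaluating every combination, matters at realistic scale.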