TubiFM: Unified Item, Carousel, and Search Ranking for Streaming Discovery
AIPR assessment
Problem difficulty is high: this is a saturated industrial recommendation and search setting where many teams optimize ranking quality, latency, and serving complexity under real traffic. The strongest evidence is the combination of offline gains, live A/B tests, and latency reduction. These signals reinforce one another: the unified representation is not only better offline but also cheaper to serve. The main weaknesses interact in a familiar way for industrial papers, proprietary con
Abstract
Personalized discovery systems often train separate models for item ranking, carousel ranking, and search, even though these tasks expose complementary signals from the same viewer journey: watches shape carousel and item ranking, search queries reveal intent even when they do not lead to a catalog match, and watch history helps interpret search as rewatching, continuation, or new discovery. We introduce the user story, a serialized representation that turns a user's cross-surface history - attributes, sessions, watch events with surface and carousel context, and search events - into a single token sequence. By interleaving pretrained language tokens with domain-specific event tokens, user stories let heterogeneous recommendation and search tasks be expressed as prompted next-token prediction over a shared grammar. TubiFM is one instantiation of this approach: a Llama 3.2 1B-based model trained on user stories and prompted to rank items, carousels, or search results without task-specific architectures. In offline evaluation, this single model outperforms specialist baselines across item, carousel, and search ranking. In online A/B tests, TubiFM significantly improves search total viewing time (TVT) by $+3.9\%$ and carousel TVT by $+0.30\%$. Item ranking is statistically neutral on TVT ($+0.14\%$), but matches a mature production stack; across all three tasks, TubiFM serves on L40S GPUs and reduces p99 ranking latency from 500ms to 200ms. These results show that shared user stories can improve discovery while simplifying ranking systems.
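To make the user-story idea concrete, here is a minimal sketch of how a cross-surface history might be serialized into one token sequence and turned into a ranking prompt. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the grammar, so the token spellings (`<watch>`, `<surface:...>`, `<item:...>`), the 30-minute session gap, and every class and function name below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class WatchEvent:
    ts: int        # event time in seconds
    item_id: str
    surface: str   # where the watch started, e.g. "home" or "search"
    carousel: str  # carousel the item was shown in


@dataclass
class SearchEvent:
    ts: int
    query: str     # free text, kept as ordinary language tokens


Event = Union[WatchEvent, SearchEvent]


def serialize_event(event: Event) -> List[str]:
    """Map one event to tokens (hypothetical grammar, not the paper's)."""
    if isinstance(event, WatchEvent):
        return [
            "<watch>",
            f"<surface:{event.surface}>",
            f"<carousel:{event.carousel}>",
            f"<item:{event.item_id}>",
        ]
    # Query words stay as plain text so a pretrained LM can read them.
    return ["<search>", *event.query.split()]


def build_user_story(
    attributes: Dict[str, str],
    events: List[Event],
    session_gap: int = 1800,  # assumed 30-minute session boundary
) -> List[str]:
    """Serialize attributes and time-ordered events into one sequence."""
    tokens = ["<user>"]
    for key in sorted(attributes):
        tokens.append(f"<{key}:{attributes[key]}>")
    prev_ts = None
    for event in sorted(events, key=lambda e: e.ts):
        # Open a new session at the start and after a long idle gap.
        if prev_ts is None or event.ts - prev_ts > session_gap:
            tokens.append("<session>")
        tokens.extend(serialize_event(event))
        prev_ts = event.ts
    return tokens


def item_ranking_prompt(story: List[str], surface: str, carousel: str) -> List[str]:
    """End the story just before an item token, so candidate items can be
    scored by the model's next-token distribution."""
    return story + ["<watch>", f"<surface:{surface}>", f"<carousel:{carousel}>"]


story = build_user_story(
    {"device": "tv"},
    [
        SearchEvent(100, "space documentary"),
        WatchEvent(50, "i42", "home", "trending"),
        WatchEvent(5000, "i7", "search", "results"),
    ],
)
prompt = item_ranking_prompt(story, "home", "trending")
```

In this sketch the same story could be cut off at a different point, for example right after `<surface:home>`, to ask for a carousel instead of an item. That is one way to read the claim that a single model handles several tasks through prompting alone.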