Yan 2025 - LLM for Recsys

Recsys Keynote: Improving Recommendation Systems & Search in the Age of LLMs - Eugene Yan, Amazon

This talk covers the future of recsys and how LLMs can be incorporated into it. Three challenges:

Cold-start challenge of hash-based item IDs
Lack of metadata
Task-specific models duplicate engineering, increase maintenance cost, and don't benefit from transfer learning
- Benefits of unifying: simpler systems, lower maintenance, and transfer learning across tasks
- But there may be an alignment tax

Trainable, multimodal semantic IDs @ Kuaishou

Challenge: hash-based item IDs don't encode item content, so they struggle with cold-start and sparsity problems.
Solution: Semantic IDs based on multimodal content

Kuaishou is a short-video platform. The main problem they wanted to tackle was helping users discover new items faster.

Idea:

Train standard ID-based embeddings for users and items
Create cluster ID from concatenated content embeddings
- Text: BERT
- Video: ResNet
- Audio: VGGish
Run k-means over 100 million items to form around 1k clusters
- Each cluster gets an ID and also an embedding
- Incorporate cluster embedding in final embedding
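The steps above can be sketched as follows, under stated assumptions: tiny 1-d vectors stand in for the BERT/ResNet/VGGish outputs, farthest-first initialization replaces whatever seeding Kuaishou used, and the ID embedding is combined with the cluster embedding by concatenation. All function names are hypothetical, and in the real model the cluster embedding table is trained jointly rather than randomly initialized.

```python
import random

def concat_content(text_emb, video_emb, audio_emb):
    # Concatenate per-modality embeddings (BERT text, ResNet video, VGGish audio in the talk)
    return list(text_emb) + list(video_emb) + list(audio_emb)

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_cluster(vec, centroids):
    # Cluster ID = index of the closest centroid
    return min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))

def init_centroids(points, k):
    # Farthest-first initialization: a deterministic stand-in for k-means++
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans(points, k, iters=20):
    centroids = init_centroids(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each item goes to its nearest centroid
        assign = [nearest_cluster(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign

def final_item_embedding(item_id_emb, cluster_id, cluster_table):
    # Assumed combination: concatenate the ID embedding with the cluster embedding
    return list(item_id_emb) + list(cluster_table[cluster_id])

# Toy usage: 6 items with 1-d text/video/audio embeddings forming two groups
items = [concat_content([t], [v], [a]) for t, v, a in
         [(0.0, 0.1, 0.0), (0.1, 0.0, 0.2), (0.2, 0.1, 0.1),
          (5.0, 5.1, 4.9), (4.9, 4.9, 5.2), (5.1, 4.7, 5.0)]]
centroids, cluster_ids = kmeans(items, k=2)
rng = random.Random(0)
# Cluster embedding table: trainable in the real model, random init here
cluster_table = {c: [rng.gauss(0, 0.1) for _ in range(4)] for c in range(2)}
emb = final_item_embedding([0.3, -0.2], cluster_ids[0], cluster_table)
```

A new item with no interaction history still gets a content-derived cluster ID from `nearest_cluster`, which is what lets it borrow signal from similar items at cold start.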

Result:

+3.4% clicks
+3.0% likes
+3.6% cold-start coverage (% of item impressions that are new items)
+1.2% cold-start velocity (% of new items that were able to hit some threshold of views)
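The two cold-start metric definitions above can be written out as follows. This is a sketch only: the talk does not state the view threshold, so `threshold` is a hypothetical parameter, and the function names are illustrative.

```python
def cold_start_coverage(impressions, new_items):
    # Share of item impressions that went to new items
    if not impressions:
        return 0.0
    return sum(1 for item in impressions if item in new_items) / len(impressions)

def cold_start_velocity(view_counts, new_items, threshold):
    # Share of new items whose view count reached the (unspecified) threshold
    if not new_items:
        return 0.0
    hits = sum(1 for item in new_items if view_counts.get(item, 0) >= threshold)
    return hits / len(new_items)

# Toy usage with made-up data
coverage = cold_start_coverage(["a", "b", "n1", "n2", "a"], {"n1", "n2", "n3"})
velocity = cold_start_velocity({"n1": 120, "n2": 30}, {"n1", "n2", "n3"}, threshold=100)
```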

Filtering bad job recommendations @ Indeed

Problem: poor user experience with emailed job recommendations, and lost user trust due to low-quality recommendations.

Solution: a lightweight classifier, trained on GPT-4o-annotated data, to filter out bad recommendations.

Process:

Started with evals: 250 labelled examples with confidence labels
Started with open LLMs like Llama 2, but performance was very poor
Used GPT-4, which was very accurate but too slow (22 s) and costly
Used GPT-3.5, but it had poor precision (0.63) on job recommendations
Fine-tuned GPT-3.5 and reached 0.9 precision at 1/4 of GPT-4's cost and latency
Distilled a lightweight classifier from the fine-tuned GPT-3.5's labels
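The final distillation step can be sketched as training a small student model on the teacher LLM's 0/1 labels, then scoring it with precision and AUC-ROC, the metrics quoted in these notes. Assumptions: logistic regression stands in for Indeed's lightweight classifier, and the two features per (user, job) pair are hypothetical; the real model's features and architecture are not described in the talk.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    # Student's probability that a recommendation is bad
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_student(features, teacher_labels, lr=0.5, epochs=200):
    # Logistic regression fit with batch gradient descent on the teacher's 0/1 labels
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(features, teacher_labels):
            err = predict_proba(w, b, x) - y  # gradient of log loss w.r.t. the logit
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def precision(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0

def auc_roc(y_true, scores):
    # Probability a random positive outscores a random negative (ties count as 0.5)
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical features per (user, job) pair, e.g. [title_mismatch, seniority_match]
features = [[1.0, 0.2], [0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.8], [0.0, 0.7]]
teacher = [1, 1, 1, 0, 0, 0]  # labels from the fine-tuned LLM (1 = bad recommendation)
w, b = train_student(features, teacher)
scores = [predict_proba(w, b, x) for x in features]
preds = [1 if s >= 0.5 else 0 for s in scores]
```

The appeal of this setup is that the expensive LLM only runs offline to produce labels, while serving uses a model cheap enough to meet a sub-200 ms latency budget.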

The lightweight classifier achieved 0.86 AUC-ROC with latency under 200 ms. Results:

-18% bad recommendations
- Expected application rates to drop because fewer items were recommended
Unsubscribe rate -5%
Application rate +5% (despite fewer recommendations)

Enriching exploratory search queries @ Spotify

Problem: help users discover new kinds of items (podcasts, audiobooks) in a catalogue they know for other items (e.g. songs, artists)

How to solve the cold-start problem for new categories?
Exploratory search was essential to expand beyond music

Solution: Query recommendation system

Generate queries from new items' metadata (e.g. podcast title, author) and ask an LLM to rewrite them as natural-language queries
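A minimal sketch of that query-generation step, assuming item metadata arrives as a dict with hypothetical `type`, `title`, and `author` fields. The prompt wording is illustrative, not Spotify's, and `llm` is any string-to-string function; the `fake_llm` below exists only to make the sketch runnable.

```python
from typing import Callable, Dict, List

def seed_query(item: Dict[str, str]) -> str:
    # Naive keyword query from item metadata; the field names are assumptions
    return " ".join(item[k] for k in ("title", "author") if item.get(k))

def build_rewrite_prompt(item: Dict[str, str]) -> str:
    # Hypothetical prompt wording; the last line carries the keyword query
    return (
        "Rewrite these keywords as a short natural-language search query that "
        f"a listener might type to discover this {item['type']}:\n"
        f"{seed_query(item)}"
    )

def suggest_queries(items: List[Dict[str, str]], llm: Callable[[str], str]) -> List[str]:
    # Case-insensitive de-duplication, order preserved
    seen, out = set(), []
    for item in items:
        q = llm(build_rewrite_prompt(item)).strip()
        if q and q.lower() not in seen:
            seen.add(q.lower())
            out.append(q)
    return out

def fake_llm(prompt: str) -> str:
    # Stand-in for a real LLM call
    return "podcasts about " + prompt.splitlines()[-1].lower()

new_items = [
    {"type": "podcast", "title": "Deep Sea Stories", "author": "Jane Doe"},
    {"type": "podcast", "title": "Deep Sea Stories", "author": "Jane Doe"},
]
suggestions = suggest_queries(new_items, fake_llm)
```

The generated queries can then be surfaced as search suggestions, which gives new items a path to discovery before they have any engagement history.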

Unified Ranker for Search & Recsys @ Netflix

Joint Modeling of Search and Recommendations Via an Unified Contextual Recommender (UniCoRn)

(Aside: Stripe built a transformer-based foundation model over sequences of transactions to identify fraud.)

Problem: teams dealt with the complexity of bespoke models for search, similar-item recs, and pre-query recs

High operational cost and missed transfer learning opportunities

UniCoRn (Unified Contextual Recommender) takes in a unified input schema and returns a prediction. Unified inputs:

User ID
Item ID
Search query
Task

One clever trick: reframe item-to-item recommendations as search by using the last item's title as the query.
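The unified input schema and the item-to-item trick can be sketched as below. The field names follow the inputs listed above, but the `task` strings, the helper names, and the token-overlap `toy_score` are hypothetical stand-ins; the real UniCoRn model is a learned ranker, not keyword overlap.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class UnifiedInput:
    # Unified input schema: user ID, item ID, search query, task
    user_id: Optional[str]
    item_id: Optional[str]
    query: Optional[str]
    task: str

def search_request(user_id: str, query: str) -> UnifiedInput:
    return UnifiedInput(user_id, None, query, task="search")

def item_to_item_request(user_id: str, last_item_id: str,
                         titles: Dict[str, str]) -> UnifiedInput:
    # Trick: the last item's title becomes the query, so the request looks like search
    return UnifiedInput(user_id, last_item_id, titles[last_item_id], task="item_to_item")

def toy_score(req: UnifiedInput, candidate_title: str) -> float:
    # Stand-in for the learned model: fraction of query tokens found in the title
    if not req.query:
        return 0.0
    q = set(req.query.lower().split())
    return len(q & set(candidate_title.lower().split())) / len(q)

def rank(req: UnifiedInput, candidates: List[str], score_fn=toy_score) -> List[str]:
    # One scoring function serves every task
    return sorted(candidates, key=lambda t: score_fn(req, t), reverse=True)

titles = {"m1": "Space Heist"}
req = item_to_item_request("u1", "m1", titles)
ranked = rank(req, ["Cooking Show", "Space Heist 2", "Heist Night"])
```

Because every task is expressed in the same schema, adding a new surface means writing a new request constructor rather than training and maintaining a new model.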

The unified model is used for search, pre-query filtering, video-to-video recs, and more. It was able to match or exceed the previous task-specific models, and the team can iterate much faster with a single model.

Unified Embeddings @ Etsy

Problem: how to help users get better results for highly specific or very broad queries on an ever-changing inventory.

The query "mother's day gift" does not match product vocabulary
Lexical retrieval does not account for user preferences

Solution: Unified embedding and retrieval model

Two-tower architecture with a user tower and a product tower
Concatenate a quality vector (rating, freshness, conversion rate) to the product vector
Concatenate a constant vector on the user side so the dimensions match
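The concatenation trick can be sketched with plain lists standing in for the two towers' outputs. Assumption: the constant user-side vector is all ones (`const=1.0`, a hypothetical choice). Under that assumption the dot product splits into the usual user-product similarity plus the sum of the quality features, so quality boosts retrieval without a separate ranking stage.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def product_side(product_emb, quality):
    # Product tower output with a quality vector (e.g. rating, freshness, conversion rate) appended
    return list(product_emb) + list(quality)

def user_side(user_emb, n_quality, const=1.0):
    # User tower output padded with a constant vector so both sides have equal length
    return list(user_emb) + [const] * n_quality

def score(user_emb, product_emb, quality, const=1.0):
    # dot([u, c], [p, q]) = dot(u, p) + c * sum(q)
    return dot(user_side(user_emb, len(quality), const), product_side(product_emb, quality))

u = [0.5, 0.5]
p = [1.0, 0.0]
high_q, low_q = [0.2, 0.3], [0.0, 0.1]
```

With the same product embedding, the product with higher quality features scores higher for every user, which is the intended effect of appending the quality vector.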