Yan 2025 - LLM for Recsys

Recsys Keynote: Improving Recommendation Systems & Search in the Age of LLMs - Eugene Yan, Amazon

This talk covers the future of recommendation systems (recsys) and how LLMs can be incorporated into them. It highlights 3 challenges:

  • Cold start problem of hash-based item IDs
  • Lack of metadata
  • Task-specific models duplicate engineering effort, increase maintenance cost and don't benefit from transfer learning
    • Benefits of a unified model: simpler systems, less maintenance, and transfer learning across tasks
    • But there may be an alignment tax

Kuaishou Example for semantic IDs

  • Challenge: Hash-based item IDs don't encode item content and struggle with cold start and sparsity.
  • Solution: Semantic IDs based on multimodal content

Kuaishou is a short-video platform. The main problem they wanted to tackle was helping users discover new items faster.

Idea:

  • Train standard ID-based embeddings for users and items
  • Create cluster ID from concatenated content embeddings
    • Text: BERT
    • Video: ResNet
    • Audio: VGGish
  • Run k-means to group 100 million items into around 1k clusters
    • Each cluster gets an ID and also an embedding
    • Incorporate cluster embedding in final embedding
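
The clustering steps above can be sketched on toy data. This is a minimal illustration, not Kuaishou's implementation: the tiny 2-dimensional vectors stand in for BERT, ResNet and VGGish outputs, the k-means is a plain Lloyd's loop with a deterministic farthest-first initialisation, and `cluster_table` and `final_embedding` are hypothetical names. Combining the ID embedding with the cluster embedding by concatenation is an assumption, since the notes only say the cluster embedding is incorporated into the final embedding.

```python
def concat(*vectors):
    """Concatenate per-modality embeddings into one content vector."""
    return [x for v in vectors for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    """Lloyd's k-means; returns (centroids, cluster ID per point)."""
    # Farthest-first initialisation keeps this toy example deterministic.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    def nearest(p):
        return min(range(k), key=lambda c: sq_dist(p, centroids[c]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p) for p in points]

# Toy (text, video, audio) embeddings; real ones would come from the content encoders.
content = {
    "v1": ([0.0, 0.1], [0.0, 0.0], [0.1, 0.0]),
    "v2": ([0.1, 0.0], [0.0, 0.1], [0.0, 0.0]),
    "v3": ([5.0, 5.1], [5.0, 5.0], [5.1, 5.0]),
    "v4": ([5.1, 5.0], [5.0, 5.1], [5.0, 5.0]),
}
item_ids = list(content)
points = [concat(*content[i]) for i in item_ids]
centroids, cluster_ids = kmeans(points, k=2)
cluster_of = dict(zip(item_ids, cluster_ids))

# Hypothetical cluster embedding table with fixed toy values
# (in a real system these would be trained with the model).
cluster_table = {0: [1.0, 0.0], 1: [0.0, 1.0]}

def final_embedding(id_embedding, cluster_id):
    """Assumed combination: ID embedding concatenated with cluster embedding."""
    return concat(id_embedding, cluster_table[cluster_id])

# A brand-new video has no learned ID embedding yet (zeros here),
# but its content still maps it to an existing cluster.
new_content = concat([5.0, 5.0], [5.1, 5.0], [5.0, 5.1])
new_cluster = min(range(len(centroids)), key=lambda c: sq_dist(new_content, centroids[c]))
new_embedding = final_embedding([0.0, 0.0], new_cluster)
```

The point of the sketch is the last three lines: a new item with no interaction history still receives a meaningful cluster embedding from its content alone, which is what helps with cold start.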

Result:

  • +3.4% clicks

  • +3.0% likes

  • +3.6% cold start coverage (% of item impressions which are new items)

  • +1.2% cold start velocity (% of new items that were able to hit some threshold of views)

  • Example:

    • trainable, multimodal, semantic IDs @ Kuaishou
    • Short videos platform
    • Problem: help users discover new items faster
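
The two cold-start metrics reported in the results can be computed roughly as follows. This is a sketch based only on the definitions given in the notes; the function names, the toy impression log and the view threshold of 100 are assumptions for illustration.

```python
def cold_start_coverage(impressions, new_items):
    """Share of item impressions that went to new items."""
    if not impressions:
        return 0.0
    return sum(1 for item in impressions if item in new_items) / len(impressions)

def cold_start_velocity(view_counts, new_items, threshold):
    """Share of new items whose view count reached the threshold."""
    if not new_items:
        return 0.0
    hits = sum(1 for item in new_items if view_counts.get(item, 0) >= threshold)
    return hits / len(new_items)

# Toy data: "n1".."n3" are new items, "a" and "b" are established ones.
new_items = {"n1", "n2", "n3"}
impressions = ["a", "n1", "b", "n1", "n2", "a", "a", "b"]
view_counts = {"n1": 120, "n2": 40, "n3": 0}

coverage = cold_start_coverage(impressions, new_items)              # 3 of 8 impressions
velocity = cold_start_velocity(view_counts, new_items, threshold=100)  # 1 of 3 new items
```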

Filtering Bad job recommendations at Indeed

Problem: Poor user experience of email job recommendations and lost trust due to low-quality job recommendations

Solution: Lightweight classifier trained from GPT-4o annotated data to filter bad recs

Process:

  • Start with evals - 250 labelled examples with confidence labels
  • Started with open LLMs like Llama2, but performance was very bad
  • Used GPT-4, which was very accurate but too slow and costly (22 secs)
  • Used GPT-3.5, but had poor precision (0.63) on job recommendations
  • Finetuned GPT-3.5 and got 0.9 precision at a quarter of GPT-4's cost and latency
  • Distilled lightweight classifier on finetuned GPT-3.5 labels
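
The final distillation step can be sketched as training a small model on labels produced by the larger one. Everything here is illustrative: `teacher_label` is a hard-coded stand-in for the finetuned LLM, the two features (`title_match`, `location_match`) are hypothetical, and the student is a plain logistic regression rather than whatever classifier Indeed used.

```python
import math

def teacher_label(features):
    """Stand-in for the finetuned LLM labeller (hypothetical rule): 1 = bad rec."""
    title_match, _location_match = features
    return 1 if title_match < 0.5 else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(model, x):
    """Probability that a recommendation is bad, according to the student."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_student(xs, ys, lr=1.0, epochs=2000):
    """Logistic regression fitted with full-batch gradient descent."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = score((w, b), x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_bad(model, x, threshold=0.5):
    return score(model, x) >= threshold

def precision(preds, labels):
    """Of the recs flagged as bad, the share that really are bad."""
    tp = sum(1 for p, l in zip(preds, labels) if p and l)
    fp = sum(1 for p, l in zip(preds, labels) if p and not l)
    return tp / (tp + fp) if tp + fp else 0.0

# Label a grid of unlabelled examples with the teacher, then fit the student on them.
train_x = [[t / 10, float(loc)] for t in range(11) for loc in (0, 1)]
train_y = [teacher_label(x) for x in train_x]
student = train_student(train_x, train_y)

# Precision of the student measured against the teacher's labels.
train_precision = precision([predict_bad(student, x) for x in train_x], train_y)
```

In practice the student would be evaluated against the human-labelled eval set rather than the teacher's own labels, so that errors inherited from the teacher are visible.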

The lightweight classifier reached 0.86 AUC-ROC with latency under 200 ms. Results:

  • Bad recommendations: -18%
    • Application rates were expected to drop because fewer items were being recommended
  • Unsubscribe rate: -5%
  • Application rate: +5%

Enriching exploratory search queries @ Spotify

Problem: Help users search for new items (podcasts, audiobooks) in a catalogue of known items (e.g. songs, artists)

  • How to solve cold start issue for new categories?
  • Exploratory search was essential to expand beyond music

Solution: Query recommendation system

Start by creating queries from the metadata of new items (e.g. podcast title, author) and ask an LLM to rewrite each one as a natural-language query.
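
A rough sketch of that flow, under heavy assumptions: the metadata fields, the sample item and the function names are hypothetical, and `stub_rewrite` is a trivial template standing in for the LLM call so that the example runs without a model.

```python
def seed_queries(item):
    """Raw candidate queries taken straight from catalogue metadata."""
    fields = ("title", "author")
    return [item[f] for f in fields if item.get(f)]

def stub_rewrite(seed, item_type):
    """Placeholder for the LLM rewrite step (a fixed template, no model call)."""
    return f"{seed} {item_type}"

def recommend_queries(item, rewrite=stub_rewrite):
    """Turn one new catalogue item into a list of suggested search queries."""
    return [rewrite(seed, item["type"]) for seed in seed_queries(item)]

# Hypothetical new item from a category users rarely search for.
item = {"type": "podcast", "title": "Deep Sea Mysteries", "author": "Jane Doe"}
queries = recommend_queries(item)
```

Passing the rewrite step in as a function keeps the query generation logic testable and lets the template be swapped for a real model call later.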

Unified Ranker for Search & Recsys @ Netflix

Joint Modeling of Search and Recommendations Via an Unified Contextual Recommender (UniCoRn)

Related example: Stripe built a transformer-based foundation model on sequences of transactions to identify fraud.

Problem: teams deal with complexity from bespoke models for search, similar item recs, pre-query recs

  • High operational cost and missed transfer learning opportunities

Unified Contextual Ranker (UniCoRn) takes in a unified input schema and returns a prediction. Unified inputs:

  • User ID
  • Item ID
  • Search Query
  • Task

The model uses some clever tricks, such as reframing item-to-item recommendations as search by using the title of the last item as the query.
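
The unified input schema and the item-to-item reframing might look like this. It is a sketch only: the four fields come from the list above, while the class name, the task labels and the helper functions are assumptions rather than Netflix's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UnifiedInput:
    user_id: str
    item_id: str  # the candidate item being scored
    query: str
    task: str     # tells the shared model which task it is serving

def search_input(user_id, candidate_id, query):
    """Ordinary search: the query is what the user typed."""
    return UnifiedInput(user_id, candidate_id, query=query, task="search")

def item_to_item_input(user_id, candidate_id, last_item_title):
    """Item-to-item recs reframed as search: the last item's title becomes the query."""
    return UnifiedInput(user_id, candidate_id, query=last_item_title, task="item_to_item")

typed = search_input("u1", "title_42", "space documentaries")
similar = item_to_item_input("u1", "title_42", "Our Planet")
```

Because both tasks produce the same record type, a single ranker can be trained on and serve both.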

The unified model is used for search, pre-query filtering, video-to-video recs and more. It matches or exceeds the previous task-specific models, and teams can iterate much faster with a single model.

Unified Embeddings @ Etsy

Problem: How to help users get better results with highly specific or broad queries, on ever-changing inventory.

  • A query such as "mother's day gift" does not match product vocabulary
  • Lexical retrieval does not account for user preferences

Solution: Unified embedding and retrieval model

  • Two tower architecture for user and product side
  • Add a quality vector on the product side (rating, freshness, conversion rate) concatenated to the product vector
  • Add a constant vector on the user side just to make dimensions match
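
To see why padding the user side works, the sketch below scores products with a plain dot product. The embeddings are toy values, the function names are hypothetical, and using a constant of 1.0 is an assumption; with that choice the score splits into a relevance term plus the sum of the quality features.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def product_vector(product_emb, quality):
    """Product tower output with the quality features appended."""
    return product_emb + quality

def user_vector(user_emb, n_quality, constant=1.0):
    """User tower output padded with a constant so dimensions match the product side."""
    return user_emb + [constant] * n_quality

user_emb = [1.0, 0.0]
# Quality features in the order (rating, freshness, conversion rate), toy values.
good = product_vector([0.5, 0.5], [0.5, 0.25, 0.25])
poor = product_vector([0.5, 0.5], [0.0, 0.0, 0.0])

u = user_vector(user_emb, n_quality=3)
score_good = dot(u, good)  # relevance 0.5 + quality 1.0
score_poor = dot(u, poor)  # relevance 0.5 + quality 0.0
```

Both products are equally relevant to the user, but the one with better quality signals scores higher, and retrieval can still run as a single nearest-neighbour lookup over the concatenated vectors.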