Guo 2017 - DeepFM
Guo 2017 is an important innovation for recommender systems. The task tackled in this paper is to predict the click-through rate (CTR) of recommended items.
The paper argues that it is important to learn implicit interaction signals between user and item features. Suppose we define the order of a signal as the number of raw features it involves:

- Recency of an item is an order-1 signal.
- Item category together with timestamp is an order-2 signal, since e.g. demand for food items spikes at meal times.
- Gender, game category and user age together form an order-3 signal, since e.g. male teenagers prefer shooting games.
One can see that both low- and high-order signals are important for modelling CTR, and manually engineering all such interaction rules is cumbersome. Hence we would like a deep learning model that can learn such signals directly from raw user/item features and interaction data. This paper proposes to use a Factorization Machine to model low-order feature interactions and a deep neural network to model high-order feature interactions.
Setup
Suppose we have a dataset comprising $n$ instances of tuples $(\chi, y)$, where $\chi$ is an $m$-field raw feature record (each field may be categorical or numerical) and $y \in \{0, 1\}$ records whether the user clicked. Each categorical field is represented as a one-hot encoded vector, and each continuous field is represented as-is. Let $x_j$ denote the vector representation of field $j$ (where the dimension is 1 for numerical fields and the number of categories for categorical fields), and let $x = [x_1, x_2, \dots, x_m]$ denote the flattened vector of the $x_j$ laid out horizontally.
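For concreteness, here is a minimal sketch of this encoding; the field names and vocabularies are hypothetical, chosen only to illustrate the layout:

```python
import numpy as np

# Hypothetical record with m = 3 fields: two categorical, one numerical.
category_vocab = ["food", "games", "news"]  # item category field
gender_vocab = ["male", "female"]           # user gender field

def one_hot(value, vocab):
    vec = np.zeros(len(vocab))
    vec[vocab.index(value)] = 1.0
    return vec

record = {"category": "games", "gender": "male", "recency_days": 3.2}

x_1 = one_hot(record["category"], category_vocab)  # categorical, dimension 3
x_2 = one_hot(record["gender"], gender_vocab)      # categorical, dimension 2
x_3 = np.array([record["recency_days"]])           # numerical, dimension 1

# x is the x_j laid out horizontally: dimension 3 + 2 + 1 = 6.
x = np.concatenate([x_1, x_2, x_3])
```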
DeepFM
DeepFM comprises two distinct components: the FM component and the deep component, whose outputs are simply summed together and passed through a sigmoid:

$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$

We go into each component below.
The FM component is a factorization machine that captures order-1 and order-2 interactions between the raw features. We first project each feature field to a $k$-dimensional latent vector using a learned embedding matrix (in the paper, the authors set $k = 10$). The latent vector representation of field $j$ is denoted as $e_j$. We compute the FM output as follows:

$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{m} \sum_{j=i+1}^{m} \langle e_i, e_j \rangle$$
The first term represents the order-1 representation of the features as-is. The second term is a pairwise inner product between the embedding representations of each feature field, which represents order-2 interactions between the features.
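A minimal PyTorch sketch of this component follows. For simplicity it assumes each field has already been mapped to a single feature index (as with one-hot categorical fields); the class and variable names are mine, not the paper's:

```python
import torch
import torch.nn as nn

class FMComponent(nn.Module):
    """Order-1 weights plus pairwise inner products of field embeddings."""

    def __init__(self, num_features: int, k: int = 10):
        super().__init__()
        self.order1 = nn.Embedding(num_features, 1)  # scalar weight w_i per feature
        self.embed = nn.Embedding(num_features, k)   # latent vector e_i per feature

    def forward(self, feature_ids: torch.Tensor) -> torch.Tensor:
        # feature_ids: (batch, m) -- one active feature index per field.
        first_order = self.order1(feature_ids).sum(dim=(1, 2))  # <w, x>, shape (batch,)

        e = self.embed(feature_ids)   # (batch, m, k)
        sum_sq = e.sum(dim=1).pow(2)  # (sum_i e_i)^2, shape (batch, k)
        sq_sum = e.pow(2).sum(dim=1)  # sum_i e_i^2,   shape (batch, k)
        second_order = 0.5 * (sum_sq - sq_sum).sum(dim=1)  # sum_{i<j} <e_i, e_j>
        return first_order + second_order
```

The second term uses the standard FM identity $\sum_{i<j} \langle e_i, e_j \rangle = \frac{1}{2}\big[(\sum_i e_i)^2 - \sum_i e_i^2\big]$ (summed over the $k$ dimensions), which computes all pairwise interactions in $O(mk)$ rather than $O(m^2 k)$ time.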
The deep component tries to capture higher-order interactions between feature fields. This is done simply by laying out the embedding vectors horizontally to form a flat input vector of size $k \times m$, which we call $a^{(0)}$. This fixed-size vector is then fed into a multi-layer perceptron (a dense neural network) that outputs the scalar $y_{DNN}$. The standard forward equation for layer $\ell$ of the MLP is:

$$a^{(\ell + 1)} = \sigma\left(W^{(\ell)} a^{(\ell)} + b^{(\ell)}\right)$$

Note that the embedding layer is shared between the deep and FM components, allowing the deep component to benefit from the faster learning of the FM component.
In the paper, the MLP is of the form 400 -> 400 -> 400, and dropout is set to 0.5.
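A matching sketch of the deep component, using the paper's layer sizes and dropout (again, the names are my own):

```python
import torch
import torch.nn as nn

class DeepComponent(nn.Module):
    """MLP over the concatenated field embeddings a^(0), of size m * k."""

    def __init__(self, m: int, k: int = 10, hidden: int = 400, dropout: float = 0.5):
        super().__init__()
        layers, in_dim = [], m * k
        for _ in range(3):  # three hidden layers of width 400, as in the paper
            layers += [nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, 1))  # scalar output y_DNN
        self.mlp = nn.Sequential(*layers)

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        # e: (batch, m, k) field embeddings; flatten into a^(0) of size m * k.
        a0 = e.reshape(e.size(0), -1)
        return self.mlp(a0).squeeze(-1)  # (batch,)
```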
Implementation
Huawei has an implementation of DeepFM (amongst other models) in PyTorch.
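For reference, a minimal end-to-end sketch tying the two components together with a shared embedding layer; this reuses the FMComponent and DeepComponent classes sketched above and is an illustration, not Huawei's implementation:

```python
class DeepFM(nn.Module):
    """y_hat = sigmoid(y_FM + y_DNN), with the embedding shared across components."""

    def __init__(self, num_features: int, m: int, k: int = 10):
        super().__init__()
        self.fm = FMComponent(num_features, k)
        self.deep = DeepComponent(m, k)

    def forward(self, feature_ids: torch.Tensor) -> torch.Tensor:
        # Shared embedding: the deep component reads the FM's latent vectors.
        e = self.fm.embed(feature_ids)               # (batch, m, k)
        logit = self.fm(feature_ids) + self.deep(e)  # y_FM + y_DNN
        return torch.sigmoid(logit)                  # predicted CTR in (0, 1)

model = DeepFM(num_features=10_000, m=8)
ids = torch.randint(0, 10_000, (32, 8))  # batch of 32 records with 8 fields
ctr = model(ids)                         # (32,) predicted click probabilities
```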