Guo 2017 - DeepFM
Guo 2017 is an important innovation for recommender systems. The task tackled in this paper is to predict the click-through rate (CTR) of recommended items.
The paper argues that it is important to learn implicit interaction signals between user and item features. Suppose we define the order of a signal as the number of raw features it involves:

- Recency of an item is an order-1 signal.
- Item category together with timestamp is an order-2 signal, since e.g. demand for food items spikes at meal times.
- Gender, game category and user age together form an order-3 signal, since e.g. male teenagers prefer shooting games.
One can see that both low- and high-order signals are important for modelling CTR, and manually engineering all such interaction rules is cumbersome. Hence we would like a deep learning model that can learn such signals directly from raw user/item features and interaction data. This paper proposes to use a Factorization Machine to model low-order feature interactions and a deep neural network to model high-order feature interactions.
Setup
Suppose we have a dataset comprising $n$ instances of tuples $(\chi, y)$, where $\chi$ is an $m$-field raw feature record (each field may be categorical or numerical) and $y \in \{0, 1\}$ records whether the user clicked. Each categorical field is represented as a one-hot encoded vector, and each continuous field is represented as-is. Let $x_j$ denote the vector representation of field $j$ (where the dimension is 1 for numerical fields and the number of categories for categorical fields), and let $x = [x_1, x_2, \dots, x_m]$ denote the flattened vector of the $x_j$ laid out horizontally.
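For concreteness, here is a minimal sketch of this encoding; the field names and vocabularies are hypothetical, chosen only to illustrate the layout:

```python
import numpy as np

# Hypothetical record with m = 3 fields: two categorical, one numerical.
category_vocab = ["food", "games", "news"]  # item category field
gender_vocab = ["male", "female"]           # user gender field

def one_hot(value, vocab):
    vec = np.zeros(len(vocab))
    vec[vocab.index(value)] = 1.0
    return vec

record = {"category": "games", "gender": "male", "recency_days": 3.2}

x_1 = one_hot(record["category"], category_vocab)  # categorical, dimension 3
x_2 = one_hot(record["gender"], gender_vocab)      # categorical, dimension 2
x_3 = np.array([record["recency_days"]])           # numerical, dimension 1

# x is the x_j laid out horizontally: dimension 3 + 2 + 1 = 6.
x = np.concatenate([x_1, x_2, x_3])
```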
DeepFM
DeepFM comprises two distinct components: the FM component and the deep component, whose outputs are simply summed together and passed through a sigmoid:

$$\hat{y} = \mathrm{sigmoid}(y_{FM} + y_{DNN})$$

We go into each component below.
The FM component is a factorization machine that captures order-1 and order-2 interactions between the raw features. We first project each feature field to a $k$-dimensional latent vector using a learned embedding matrix (in the paper, the authors set $k = 10$). The latent vector representation of field $j$ is denoted as $e_j$. We compute the FM output as follows:

$$y_{FM} = \langle w, x \rangle + \sum_{i=1}^{m} \sum_{j=i+1}^{m} \langle e_i, e_j \rangle$$
The first term represents the order-1 representation of the features as-is. The second term is a pairwise inner product between the embedding representations of each feature field, which represents order-2 interactions between the features.
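A minimal PyTorch sketch of this component follows. For simplicity it assumes each field has already been mapped to a single feature index (as with one-hot categorical fields); the class and variable names are mine, not the paper's:

```python
import torch
import torch.nn as nn

class FMComponent(nn.Module):
    """Order-1 weights plus pairwise inner products of field embeddings."""

    def __init__(self, num_features: int, k: int = 10):
        super().__init__()
        self.order1 = nn.Embedding(num_features, 1)  # scalar weight w_i per feature
        self.embed = nn.Embedding(num_features, k)   # latent vector e_i per feature

    def forward(self, feature_ids: torch.Tensor) -> torch.Tensor:
        # feature_ids: (batch, m) -- one active feature index per field.
        first_order = self.order1(feature_ids).sum(dim=(1, 2))  # <w, x>, shape (batch,)

        e = self.embed(feature_ids)   # (batch, m, k)
        sum_sq = e.sum(dim=1).pow(2)  # (sum_i e_i)^2, shape (batch, k)
        sq_sum = e.pow(2).sum(dim=1)  # sum_i e_i^2,   shape (batch, k)
        second_order = 0.5 * (sum_sq - sq_sum).sum(dim=1)  # sum_{i<j} <e_i, e_j>
        return first_order + second_order
```

The second term uses the standard FM identity $\sum_{i<j} \langle e_i, e_j \rangle = \frac{1}{2}\big[(\sum_i e_i)^2 - \sum_i e_i^2\big]$ (summed over the $k$ dimensions), which computes all pairwise interactions in $O(mk)$ rather than $O(m^2 k)$ time.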
The deep component tries to capture higher-order interactions between feature fields. This is done simply by laying out the embedding vectors horizontally to form a flat input vector of size $k \times m$, which we call $a^{(0)}$. This fixed-size vector is then fed into a multi-layer perceptron (a dense neural network) that outputs the scalar $y_{DNN}$. The standard forward equation for layer $\ell$ of the MLP is:

$$a^{(\ell + 1)} = \sigma\left(W^{(\ell)} a^{(\ell)} + b^{(\ell)}\right)$$

Note that the embedding layer is shared between the deep and FM components, allowing the deep component to benefit from the faster learning of the FM component.
In the paper, the MLP is of the form 400 -> 400 -> 400, and dropout is set to 0.5.
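A matching sketch of the deep component, using the paper's layer sizes and dropout (again, the names are my own):

```python
import torch
import torch.nn as nn

class DeepComponent(nn.Module):
    """MLP over the concatenated field embeddings a^(0), of size m * k."""

    def __init__(self, m: int, k: int = 10, hidden: int = 400, dropout: float = 0.5):
        super().__init__()
        layers, in_dim = [], m * k
        for _ in range(3):  # three hidden layers of width 400, as in the paper
            layers += [nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, 1))  # scalar output y_DNN
        self.mlp = nn.Sequential(*layers)

    def forward(self, e: torch.Tensor) -> torch.Tensor:
        # e: (batch, m, k) field embeddings; flatten into a^(0) of size m * k.
        a0 = e.reshape(e.size(0), -1)
        return self.mlp(a0).squeeze(-1)  # (batch,)
```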
Implementation
Huawei has an implementation of DeepFM (amongst other models) in PyTorch.
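For reference, a minimal end-to-end sketch tying the two components together with a shared embedding layer; this reuses the FMComponent and DeepComponent classes sketched above and is an illustration, not Huawei's implementation:

```python
class DeepFM(nn.Module):
    """y_hat = sigmoid(y_FM + y_DNN), with the embedding shared across components."""

    def __init__(self, num_features: int, m: int, k: int = 10):
        super().__init__()
        self.fm = FMComponent(num_features, k)
        self.deep = DeepComponent(m, k)

    def forward(self, feature_ids: torch.Tensor) -> torch.Tensor:
        # Shared embedding: the deep component reads the FM's latent vectors.
        e = self.fm.embed(feature_ids)               # (batch, m, k)
        logit = self.fm(feature_ids) + self.deep(e)  # y_FM + y_DNN
        return torch.sigmoid(logit)                  # predicted CTR in (0, 1)

model = DeepFM(num_features=10_000, m=8)
ids = torch.randint(0, 10_000, (32, 8))  # batch of 32 records with 8 fields
ctr = model(ids)                         # (32,) predicted click probabilities
```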