Ranking in a two-sided matching platform (Part 2)

Aug 28, 2023 learning to rank uplift modelling choice modelling matching market

Recap #

This is the second post about the ranking problem in Profi, a services marketplace where customers can find tutors, beauty masters, plumbers, and other professionals, and professionals can find customers.

Here is the short recap of the previous post:

In Profi, professionals search, choose orders, and send proposals to customers. And then the customers review the proposals and choose the professionals, or don't choose anyone. If a professional sends a proposal to a customer, and the customer chooses the professional then this is a deal.
The objective was to increase the number of deals by implementing a new order ranking algorithm in the app for professionals.
Ranking orders by the probability of a proposal did increase the number of proposals, but didn't increase the number of deals in the marketplace.
Ranking orders by the uplift in the probability of a deal significantly increased the number of deals in the marketplace.

In the second post I'll explain the solution. This post is more mathematical. Please bear with me while I go through some details of the model.

Mathematical formulation #

First, let's express the uplift in the probability of a deal in a mathematical form:

$$P(Y_i = 1 \mid S_{ij} = 1) - P(Y_i = 1 \mid S_{ij} = 0)$$

$Y_i$ is an outcome for the order $i$. If there is a deal, then $Y_i = 1$, otherwise $Y_i = 0$.
$S_{ij}$ indicates whether the professional $j$ has seen the order $i$ or not.
$P(Y_i = 1 \mid S_{ij} = 1)$ is the probability of a deal in the order $i$, given that the professional $j$ has seen the order.
$P(Y_i = 1 \mid S_{ij} = 0)$ is the probability of a deal in the order $i$, given that the professional $j$ hasn't seen the order.

It seems to be impossible to solve this problem directly. Hundreds of professionals should see the order for a deal to happen. Estimation of the effect of showing an order to a professional would be very noisy.

Let's decompose the problem by expanding the expression using the law of total probability:

$$P(T_{ij} = 1 \mid S_{ij} = 1)(P(Y_i = 1 \mid T_{ij} = 1) - P(Y_i = 1 \mid T_{ij} = 0))$$

$T_{ij}$ indicates whether the professional $j$ has sent a proposal to the customer $i$ or not.
$P(T_{ij} = 1 \mid S_{ij} = 1)$ is the probability that the professional $j$ will send a proposal to the customer $i$, given that the professional has seen the order.
$P(Y_i = 1 \mid T_{ij} = 1)$ is the probability of a deal in the order $i$, given that the professional $j$ has sent a proposal to the customer.
$P(Y_i = 1 \mid T_{ij} = 0)$ is the probability of a deal in the order $i$, given that the professional $j$ hasn't sent a proposal to the customer.

Derivation

Let's expand the left and the right part of the uplift expression:

$$ \small \begin{split} P(Y_i = 1 \mid S_{ij} = 1) & = P(Y_i = 1 \mid T_{ij} = 1)P(T_{ij} = 1 \mid S_{ij} = 1) \\ & \quad + P(Y_i = 1 \mid T_{ij} = 0)P(T_{ij} = 0 \mid S_{ij} = 1) \\ & = P(Y_i = 1 \mid T_{ij} = 1)P(T_{ij} = 1 \mid S_{ij} = 1) \\ & \quad + P(Y_i = 1 \mid T_{ij} = 0)(1 - P(T_{ij} = 1 \mid S_{ij} = 1)) \\ & = P(T_{ij} = 1 \mid S_{ij} = 1)(P(Y_i = 1 \mid T_{ij} = 1) - P(Y_i = 1 \mid T_{ij} = 0)) \\ & \quad + P(Y_i = 1 \mid T_{ij} = 0) \\ \\ P(Y_i = 1 \mid S_{ij} = 0) & = P(Y_i = 1 \mid T_{ij} = 1)P(T_{ij} = 1 \mid S_{ij} = 0) \\ & \quad + P(Y_i = 1 \mid T_{ij} = 0)P(T_{ij} = 0 \mid S_{ij} = 0) \\ & = P(Y_i = 1 \mid T_{ij} = 0) \\ \end{split} $$

The last equation takes into account that a professional cannot send a proposal without seeing the order, i.e.:

$$ \small \begin{split} & P(T_{ij} = 1 \mid S_{ij} = 0) = 0 \\ & P(T_{ij} = 0 \mid S_{ij} = 0) = 1 \end{split} $$

Now, we can derive that:

$$ \small \begin{split} & P(Y_i = 1 \mid S_{ij} = 1) - P(Y_i = 1 \mid S_{ij} = 0) \\ = P(T_{ij} = 1 \mid S_{ij} = 1)(& P(Y_i = 1 \mid T_{ij} = 1) - P(Y_i = 1 \mid T_{ij} = 0)) \end{split} $$

We already have a model that predicts $P(T_{ij} = 1 \mid S_{ij} = 1)$, the probability that the professional $j$ will send a proposal to the customer $i$.

So, now the problem reduces to predicting the uplift in the probability of a deal caused by a proposal from the professional:

$$P(Y_i = 1 \mid T_{ij} = 1) - P(Y_i = 1 \mid T_{ij} = 0)$$

To solve the problem, we'll create a model for the probability of a deal in the order. The difference between probabilities of a deal with and without a proposal from the professional is the uplift we want to predict.

Discrete choice model #

A customer can choose one of the professionals who sent a proposal, or not choose anyone.

Let's model customer's choice with the softmax function:

$$P(Y_{ik}=1) = \frac{e^{U_{ik}}}{e^{U_{i0}} + \sum_{j \in J_i} e^{U_{ij}}} = \frac{e^{U_{ik}-U_{i0}}}{1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}}}$$

$Y_{ik}$ indicates whether the customer $i$ chooses to the professional $k$ or not.
$P(Y_{ik}=1)$ is the probability that the customer $i$ will choose the professional $k$.
$U_{ij}$ (or $U_{ik}$) is the utility the customer $i$ obtains from choosing the professional $j$ (or $k$).
$U_{i0}$ is the utility the customer $i$ obtains from not choosing anyone.
$J_i$ is a set of professionals who sent a proposal to the customer $i$.

This model assumes no correlation among alternatives. This might not be 100% true. But we decided to go further with this model:

Given that customers and professionals have unique preferences that only they can discover, this assumption seems realistic.
Taking into account the sequential number of the proposal in the order can capture correlations between proposals.

Now, we can express the probability that the customer will choose one of the professionals who sent a proposal:

$$P(Y_i=1) = \sum_{j \in J_i} P(Y_{ij}=1) = \frac{\sum_{j \in J_i} e^{U_{ij}-U_{i0}}}{1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}}}$$

Suppose, the customer $i$ has received proposals from a set of professionals $J_i$. Then the uplift in the probability of a deal caused by a proposal from the professional $k$ will be:

$$ \begin{split} & P(Y_i = 1 \mid T_{ik} = 1) - P(Y_i = 1 \mid T_{ik} = 0) \\ & = \frac{\sum_{j \in J_i} e^{U_{ij}-U_{i0}} + e^{U_{ik}-U_{i0}}}{1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}} + e^{U_{ik}-U_{i0}}} - \frac{\sum_{j \in J_i} e^{U_{ij}-U_{i0}}}{1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}}} \\ & = \frac{e^{U_{ik}-U_{i0}}}{(1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}})(1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}} + e^{U_{ik}-U_{i0}})} \end{split} $$

If we can model the utility, we can estimate the uplift in the probability of a deal.

Utility function #

The utility a customer obtains from choosing a professional depends on:

How much time has passed from the moment of the order to the proposal from the professional.
The rating and the number of reviews in the professional's profile.
When the customer was last online.
And other factors.

If we represent the utility as a function of these variables, then the parameters of the function can be estimated by minimizing the cross-entropy (maximum likelihood estimation):

$$ \begin{split} H(Y, P(Y)) & = -\sum_i (Y_i \log P(Y_i = 1) + (1 - Y_i) \log P(Y_i = 0)) \\ & = \sum_i (\log (1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}}) - Y_i \log (\sum_{j \in J_i} e^{U_{ij}-U_{i0}})) \end{split} $$

We started with a basic model that accounts only for how much time has passed from the moment of the order to the proposal from the professional. And this basic model helped us significantly increase the number of deals.

Combined target #

Let's review the components of the successful solution:

A model $f(X_1)$ that takes a feature set $X_1$ and predicts the probability that the professional will send a proposal to the customer.
A model $g(X_2)$ that takes a feature set $X_2$ and predicts the uplift in the probability of a deal in the order caused by a proposal from the professional.
Orders are are sorted by the product of these models' predictions $f(X_1) g(X_2)$.

This solution has its disadvantages:

Using two models in production increases the complexity of the solution.
The model $f(X_1)$ should predict the probability. This limits how the model can be improved. For example, the loss function is the cross-entropy. We cannot use loss functions that are more suitable for the learning to rank problem.

These disadvantages can be resolved by training a model on a combined target. Instead of training a model $f(X_1)$ on an indicator of a proposal $y$, we can train a model $h(X_1, X_2)$ on a combined target $y g(X_2)$. This solution resulted in a further increase in the number of deals.

Conclusions #

In two posts I explored the ranking problem in a services marketplace. It's a two-sided matching platform where both sides have to choose each other for a deal to happen. I explained the solution that significantly increased the number of deals in the marketplace.

The right problem formulation is more important than a sophisticated solution. When one side searches and proposes, and other chooses among the proposals, it's not sufficient to maximize the probability of a proposal. Ranking by the uplift in the probability of a deal increased the number of deals in the marketplace.

When two sides choose, it makes sense to model two processes separately:

Probability of a proposal from the side that chooses first.
Uplift in the probability of a match caused by a proposal.

A discrete choice model can be used to estimate the uplift in the probability of a match.

If one of the models should predict the probability measure, it limits the model improvements. Training a single model on a combined target can resolve this issue.