learning to rank uplift modelling choice modelling matching market
This is the second post about the ranking problem in Profi, a services marketplace where customers can find tutors, beauty masters, plumbers, and other professionals, and professionals can find customers.
Here is the short recap of the previous post:
In the second post I'll explain the solution. This post is more mathematical. Please bear with me while I go through some details of the model.
First, let's express the uplift in the probability of a deal in a mathematical form:
$$P(Y_i = 1 \mid S_{ij} = 1) - P(Y_i = 1 \mid S_{ij} = 0)$$
It seems to be impossible to solve this problem directly. Hundreds of professionals should see the order for a deal to happen. Estimation of the effect of showing an order to a professional would be very noisy.
Let's decompose the problem by expanding the expression using the law of total probability:
$$P(T_{ij} = 1 \mid S_{ij} = 1)(P(Y_i = 1 \mid T_{ij} = 1) - P(Y_i = 1 \mid T_{ij} = 0))$$
Let's expand the left and the right part of the uplift expression:
$$ \small \begin{split} P(Y_i = 1 \mid S_{ij} = 1) & = P(Y_i = 1 \mid T_{ij} = 1)P(T_{ij} = 1 \mid S_{ij} = 1) \\ & \quad + P(Y_i = 1 \mid T_{ij} = 0)P(T_{ij} = 0 \mid S_{ij} = 1) \\ & = P(Y_i = 1 \mid T_{ij} = 1)P(T_{ij} = 1 \mid S_{ij} = 1) \\ & \quad + P(Y_i = 1 \mid T_{ij} = 0)(1 - P(T_{ij} = 1 \mid S_{ij} = 1)) \\ & = P(T_{ij} = 1 \mid S_{ij} = 1)(P(Y_i = 1 \mid T_{ij} = 1) - P(Y_i = 1 \mid T_{ij} = 0)) \\ & \quad + P(Y_i = 1 \mid T_{ij} = 0) \\ \\ P(Y_i = 1 \mid S_{ij} = 0) & = P(Y_i = 1 \mid T_{ij} = 1)P(T_{ij} = 1 \mid S_{ij} = 0) \\ & \quad + P(Y_i = 1 \mid T_{ij} = 0)P(T_{ij} = 0 \mid S_{ij} = 0) \\ & = P(Y_i = 1 \mid T_{ij} = 0) \\ \end{split} $$
The last equation takes into account that a professional cannot send a proposal without seeing the order, i.e.:
$$ \small \begin{split} & P(T_{ij} = 1 \mid S_{ij} = 0) = 0 \\ & P(T_{ij} = 0 \mid S_{ij} = 0) = 1 \end{split} $$
Now, we can derive that:
$$ \small \begin{split} & P(Y_i = 1 \mid S_{ij} = 1) - P(Y_i = 1 \mid S_{ij} = 0) \\ = P(T_{ij} = 1 \mid S_{ij} = 1)(& P(Y_i = 1 \mid T_{ij} = 1) - P(Y_i = 1 \mid T_{ij} = 0)) \end{split} $$
We already have a model that predicts $P(T_{ij} = 1 \mid S_{ij} = 1)$, the probability that the professional $j$ will send a proposal to the customer $i$.
So, now the problem reduces to predicting the uplift in the probability of a deal caused by a proposal from the professional:
$$P(Y_i = 1 \mid T_{ij} = 1) - P(Y_i = 1 \mid T_{ij} = 0)$$
To solve the problem, we'll create a model for the probability of a deal in the order. The difference between probabilities of a deal with and without a proposal from the professional is the uplift we want to predict.
A customer can choose one of the professionals who sent a proposal, or not choose anyone.
Let's model customer's choice with the softmax function:
$$P(Y_{ik}=1) = \frac{e^{U_{ik}}}{e^{U_{i0}} + \sum_{j \in J_i} e^{U_{ij}}} = \frac{e^{U_{ik}-U_{i0}}}{1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}}}$$
This model assumes no correlation among alternatives. This might not be 100% true. But we decided to go further with this model:
Now, we can express the probability that the customer will choose one of the professionals who sent a proposal:
$$P(Y_i=1) = \sum_{j \in J_i} P(Y_{ij}=1) = \frac{\sum_{j \in J_i} e^{U_{ij}-U_{i0}}}{1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}}}$$
Suppose, the customer $i$ has received proposals from a set of professionals $J_i$. Then the uplift in the probability of a deal caused by a proposal from the professional $k$ will be:
$$ \begin{split} & P(Y_i = 1 \mid T_{ik} = 1) - P(Y_i = 1 \mid T_{ik} = 0) \\ & = \frac{\sum_{j \in J_i} e^{U_{ij}-U_{i0}} + e^{U_{ik}-U_{i0}}}{1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}} + e^{U_{ik}-U_{i0}}} - \frac{\sum_{j \in J_i} e^{U_{ij}-U_{i0}}}{1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}}} \\ & = \frac{e^{U_{ik}-U_{i0}}}{(1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}})(1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}} + e^{U_{ik}-U_{i0}})} \end{split} $$
If we can model the utility, we can estimate the uplift in the probability of a deal.
The utility a customer obtains from choosing a professional depends on:
If we represent the utility as a function of these variables, then the parameters of the function can be estimated by minimizing the cross-entropy (maximum likelihood estimation):
$$ \begin{split} H(Y, P(Y)) & = -\sum_i (Y_i \log P(Y_i = 1) + (1 - Y_i) \log P(Y_i = 0)) \\ & = \sum_i (\log (1 + \sum_{j \in J_i} e^{U_{ij}-U_{i0}}) - Y_i \log (\sum_{j \in J_i} e^{U_{ij}-U_{i0}})) \end{split} $$
We started with a basic model that accounts only for how much time has passed from the moment of the order to the proposal from the professional. And this basic model helped us significantly increase the number of deals.
Let's review the components of the successful solution:
This solution has its disadvantages:
These disadvantages can be resolved by training a model on a combined target. Instead of training a model $f(X_1)$ on an indicator of a proposal $y$, we can train a model $h(X_1, X_2)$ on a combined target $y g(X_2)$. This solution resulted in a further increase in the number of deals.
In two posts I explored the ranking problem in a services marketplace. It's a two-sided matching platform where both sides have to choose each other for a deal to happen. I explained the solution that significantly increased the number of deals in the marketplace.
The right problem formulation is more important than a sophisticated solution. When one side searches and proposes, and other chooses among the proposals, it's not sufficient to maximize the probability of a proposal. Ranking by the uplift in the probability of a deal increased the number of deals in the marketplace.
When two sides choose, it makes sense to model two processes separately:
A discrete choice model can be used to estimate the uplift in the probability of a match.
If one of the models should predict the probability measure, it limits the model improvements. Training a single model on a combined target can resolve this issue.
© Evgeny Ivanov 2023