I have a few questions about using Spotlight for an item-item problem involving gr

Formulation and usage questions about spotlight HOT 5 CLOSED

maciejkula commented on August 21, 2024

Formulation and usage questions

from spotlight.

Comments (5)

arita37 commented on August 21, 2024 1

For negative sampling, what about using a GAN (as in the IRGAN paper)?

lesterlitch commented on August 21, 2024

@travisbrady, this sounds like a typical use case for implicit matrix factorization, i.e. one where a user hasn't explicitly rated an item. One solution is to create your own rating system for user-item pairs, e.g. 1 for a pageview, 3 for a like, 5 for a purchase.
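A minimal sketch of that rating scheme, in plain Python. The event names, the `EVENT_RATINGS` table and the `build_ratings` helper are hypothetical, and keeping the strongest signal per user-item pair is just one possible choice (summing would be another):

```python
# Hypothetical mapping from event type to an implicit "rating".
# The 1 / 3 / 5 values follow the example in the comment above.
EVENT_RATINGS = {"pageview": 1, "like": 3, "purchase": 5}


def build_ratings(events):
    """Collapse (user, item, event) triples into one rating per (user, item).

    Assumption: when a pair has several events, keep the strongest one.
    """
    ratings = {}
    for user, item, event in events:
        score = EVENT_RATINGS[event]
        key = (user, item)
        ratings[key] = max(ratings.get(key, 0), score)
    return ratings


events = [
    ("u1", "i1", "pageview"),
    ("u1", "i1", "purchase"),
    ("u2", "i1", "like"),
]
ratings = build_ratings(events)
```

The resulting dictionary can then be unpacked into parallel user, item and rating arrays for whichever library you train with.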

For cold start, check out LightFM, which does a nice job of factorization machines and is by the same author.

maciejkula commented on August 21, 2024

@travisbrady I have discussed this and similar problems with quite a few people; I think that while a principled solution is not very difficult to implement, I haven't seen any implementations.

In Spotlight, I think the first way you suggest (weights) is the better one; unfortunately, weights are not currently correctly wired in the models. (I am working on it, but it's tied into a bigger piece of work around hybrid models.) To accomplish the same effect you could train for more epochs on those subsets of your data that should carry more weight.

As for cold start:

- Support for folding in new users is very good via the sequential models; however, these models are still pure collaborative filtering.
- Support for using user metadata is not currently implemented, but I am working on it in https://github.com/maciejkula/spotlight/tree/hybrid_models

I think you could get that latency quite easily, depending on the complexity of the model. The best course of action is to try to get some timings offline and extrapolate from there.

MLnick commented on August 21, 2024

For implicit feedback data I've successfully used a similar type of weighting system to that mentioned by @lesterlitch (but in the context of ALS for matrix factorization).

Technically there the weighting is c_ui = 1 + α * r_ui, where c_ui is the confidence and r_ui is the "rating" (the 1, 3, or 5 in this case). α is a hyperparameter.
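As a quick illustration of that formula, here is a sketch in plain Python. The `confidence` helper is hypothetical and the value of `alpha` is arbitrary; in practice it would be tuned like any other hyperparameter:

```python
def confidence(rating, alpha=2.0):
    """Confidence c_ui = 1 + alpha * r_ui for a single user-item pair.

    `alpha` is a hyperparameter; 2.0 is an arbitrary illustrative value.
    """
    return 1.0 + alpha * rating


# Ratings from the 1 / 3 / 5 scheme, plus 0 for an unobserved pair.
confidences = {r: confidence(r) for r in (0, 1, 3, 5)}
```

Note that an unobserved pair (r_ui = 0) still gets a confidence of 1, so it contributes to the loss, only less than an observed interaction.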

It would be interesting to evaluate a few options for weighting schemes, e.g. (1) simply weighting the interactions as 1, 3, 5 (or whatever the values are), (2) a confidence-based weighting formula like the one above, and (3) weighting for a ranking loss. On (3): GraphLab / Dato / Turi had (has?) a factorization machine trained with ranking loss where weights could be given for the negative samples but, IIRC, not independently for the positive ones. Their implementation was closed source, so it is difficult to say.

I tend to agree that feeding it in through weights is the best approach (and with different loss functions it would allow you to re-create implicit MF as per the paper with MSE, or weighted binary classification with sigmoid, or weighted ranking loss, etc).

maciejkula commented on August 21, 2024

I wonder if the Turi implementation is to simulate a negative sampling scheme where sampling probability isn't uniform. (As an aside, something like this would be a valuable addition to Spotlight.)
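To make "non-uniform negative sampling" concrete, here is a sketch in plain Python that draws negatives in proportion to item popularity. This is not Turi's scheme (which is unknown) and not something Spotlight provides; the `make_negative_sampler` helper and the `power` smoothing exponent are assumptions made for illustration:

```python
import random
from collections import Counter


def make_negative_sampler(item_ids, power=0.75, seed=0):
    """Build a sampler that draws negatives with probability ~ count ** power.

    `power` is an assumed smoothing exponent: 1.0 samples by raw popularity,
    0.0 falls back to uniform sampling.
    """
    counts = Counter(item_ids)
    items = sorted(counts)
    weights = [counts[item] ** power for item in items]
    rng = random.Random(seed)

    def sample(user_positives, k):
        # Guard against looping forever when every item is a positive.
        if all(item in user_positives for item in items):
            raise ValueError("no candidate negatives for this user")
        negatives = []
        while len(negatives) < k:
            candidate = rng.choices(items, weights=weights, k=1)[0]
            if candidate not in user_positives:
                negatives.append(candidate)
        return negatives

    return sample


# Item "a" was interacted with three times, "b" once, "c" twice.
sample_negatives = make_negative_sampler(["a", "a", "a", "b", "c", "c"])
negatives = sample_negatives({"a"}, 5)
```

Weighting the negatives in the loss and changing their sampling probability are closely related ways of emphasising some negatives over others.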

As the project stands at the moment, your best bet is to weigh observations by repeating them in the training set. This is equivalent to weights (but less computationally efficient).
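A sketch of that repetition trick in plain Python. The `repeat_by_weight` helper is hypothetical, and it assumes the weights are positive integers (fractional weights would need rounding or rescaling first):

```python
def repeat_by_weight(interactions):
    """Expand (user, item, weight) triples so each pair appears `weight` times.

    Assumption: weights are positive integers.
    """
    expanded = []
    for user, item, weight in interactions:
        if weight < 1 or int(weight) != weight:
            raise ValueError("weights must be positive integers")
        expanded.extend([(user, item)] * int(weight))
    return expanded


weighted = [("u1", "i1", 1), ("u1", "i2", 3), ("u2", "i1", 5)]
training_pairs = repeat_by_weight(weighted)
```

If the training loop shuffles examples between epochs, the repeated pairs end up spread across minibatches, so in expectation each one contributes to the gradient in proportion to its weight.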
