Comments (5)
For negative sampling, what about using a GAN, as in the IRGAN paper?
from spotlight.
@travisbrady, this sounds like a typical use case for implicit matrix factorization, i.e. where a user hasn't explicitly rated an item. One solution is to create your own rating scheme for user-item pairs, e.g. 1 for a pageview, 3 for a like, 5 for a purchase, etc.
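A minimal sketch of the suggestion above, assuming illustrative event names and values (nothing here is part of Spotlight's API); when the same user-item pair has several events, it keeps the strongest signal:

```python
# Hypothetical event-to-rating mapping for implicit feedback;
# the event names and scores are illustrative, not from Spotlight.
EVENT_RATINGS = {"pageview": 1, "like": 3, "purchase": 5}

def build_ratings(events):
    """Convert (user_id, item_id, event) triples into (user, item, rating)
    triples, keeping the strongest signal per user-item pair."""
    best = {}
    for user, item, event in events:
        key = (user, item)
        best[key] = max(best.get(key, 0), EVENT_RATINGS[event])
    return [(u, i, r) for (u, i), r in best.items()]

events = [
    (0, 10, "pageview"),
    (0, 10, "purchase"),  # same pair: purchase outranks the pageview
    (1, 11, "like"),
]
print(sorted(build_ratings(events)))  # [(0, 10, 5), (1, 11, 3)]
```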
For cold start, check out LightFM, which does a nice job with factorization machines and is by the same author.
@travisbrady I have discussed this and similar problems with quite a few people; I think that while a principled solution is not very difficult to implement, I haven't seen any implementations.
In Spotlight, I think the first way you suggest (weights) is the better one; unfortunately, weights are not currently wired up correctly in the models. (I am working on it, but it's tied into a bigger piece of work around hybrid models.) To accomplish the same effect, you could train for more epochs on the subsets of your data that should carry more weight.
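The epoch-based workaround could be sketched like this. The `fit(data)` interface is a stand-in (Spotlight models take an `Interactions` object); `CountingModel` is a toy just to show the effect:

```python
# Sketch of the workaround: instead of per-observation weights, train for
# extra epochs on the subsets that should carry more weight. The fit(data)
# interface is a hypothetical stand-in for any incrementally-trainable model.
def fit_with_subset_weights(model, weighted_subsets):
    """weighted_subsets: iterable of (subset, epochs) pairs, where a
    higher epoch count stands in for a higher observation weight."""
    for subset, epochs in weighted_subsets:
        for _ in range(epochs):
            model.fit(subset)

class CountingModel:
    """Toy model that only records how often each subset was trained on."""
    def __init__(self):
        self.seen = []
    def fit(self, data):
        self.seen.append(data)

model = CountingModel()
fit_with_subset_weights(model, [("purchases", 3), ("pageviews", 1)])
print(model.seen.count("purchases"))  # 3: trained on 3x as often
```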
As for cold start:
- support for folding-in of new users is very good via the sequential models; however, these models are still pure collaborative filtering
- support for using user metadata is not currently implemented, but I am working on it in https://github.com/maciejkula/spotlight/tree/hybrid_models
I think you could get that latency quite easily, depending on the complexity of the model. The best course of action is to try to get some timings offline and extrapolate from there.
For implicit feedback data I've successfully used a similar type of weighting system to that mentioned by @lesterlitch (but in the context of ALS for matrix factorization).
Technically, the weighting there is c_ui = 1 + α * r_ui, where c_ui is the confidence, r_ui is the "rating" (the 1, 3, or 5 in this case), and α is a hyperparameter.
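The confidence formula is a one-liner; this sketch just makes the mapping concrete (the α value here is arbitrary for illustration; the original implicit-ALS paper uses much larger values, e.g. 40):

```python
import numpy as np

# Confidence weighting for implicit feedback: c_ui = 1 + alpha * r_ui.
# alpha is a hyperparameter; 2.0 below is purely illustrative.
def confidence(ratings, alpha=40.0):
    """Map raw implicit 'ratings' (e.g. the 1/3/5 event scores) to
    confidence weights."""
    return 1.0 + alpha * np.asarray(ratings, dtype=float)

print(confidence([1, 3, 5], alpha=2.0).tolist())  # [3.0, 7.0, 11.0]
```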
It would be interesting to evaluate a few options for weighting schemes, e.g. (1) using the interaction values directly as weights (1, 3, 5, or whatever they are); (2) a confidence-based formula like the one above; (3) weighting in a ranking loss (GraphLab / Dato / Turi had (has?) a factorization machine trained with a ranking loss where weights could be given for the negative samples, but IIRC not independently for the positive ones; their implementation was closed source, so it's difficult to say).
I tend to agree that feeding this in through weights is the best approach; with different loss functions, it would let you re-create implicit MF as per the paper (with MSE), weighted binary classification (with sigmoid), weighted ranking loss, etc.
I wonder if the Turi implementation is to simulate a negative sampling scheme where sampling probability isn't uniform. (As an aside, something like this would be a valuable addition to Spotlight.)
As the project stands at the moment, your best bet is to weight observations by repeating them in the training set. This is equivalent to using weights, but less computationally efficient.
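The repetition trick could be sketched as follows, assuming integer weights and plain (user, item) ID arrays (the function name is my own, not Spotlight's):

```python
import numpy as np

# Sketch of the repetition trick: emulate integer observation weights by
# duplicating rows of the training arrays before fitting the model.
def repeat_by_weight(user_ids, item_ids, weights):
    """Repeat the k-th interaction weights[k] times (weights must be
    positive integers). Equivalent to weighting, just less efficient."""
    weights = np.asarray(weights, dtype=int)
    return (
        np.repeat(np.asarray(user_ids), weights),
        np.repeat(np.asarray(item_ids), weights),
    )

users, items = repeat_by_weight([0, 1], [10, 11], [1, 3])
print(users.tolist(), items.tolist())  # [0, 1, 1, 1] [10, 11, 11, 11]
```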