ankane / disco

Recommendations for Ruby and Rails using collaborative filtering

License: MIT License

Ruby 100.00%

recommender-system recommendation-engine

Disco

🔥 Recommendations for Ruby and Rails using collaborative filtering

Supports user-based and item-based recommendations
Works with explicit and implicit feedback
Uses high-performance matrix factorization

Installation

Add this line to your application’s Gemfile:

gem "disco"

Getting Started

Create a recommender

recommender = Disco::Recommender.new

If users rate items directly, this is known as explicit feedback. Fit the recommender with:

recommender.fit([
  {user_id: 1, item_id: 1, rating: 5},
  {user_id: 2, item_id: 1, rating: 3}
])

IDs can be integers, strings, or any other data type

If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Leave out the rating.

recommender.fit([
  {user_id: 1, item_id: 1},
  {user_id: 2, item_id: 1}
])

Each user_id/item_id combination should only appear once
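Raw event logs often contain the same user/item pair more than once. A minimal sketch, in plain Ruby, of collapsing repeats before calling fit. The events array is made-up sample data, and keeping the first occurrence is an assumption (for explicit feedback you may prefer the latest or the average rating).

```ruby
# Hypothetical raw events; user 1 interacted with item 1 twice.
events = [
  {user_id: 1, item_id: 1},
  {user_id: 1, item_id: 1},
  {user_id: 2, item_id: 1}
]

# Keep the first occurrence of each user_id/item_id pair.
data = events.uniq { |event| [event[:user_id], event[:item_id]] }
```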

Get user-based recommendations - “users like you also liked”

recommender.user_recs(user_id)

Get item-based recommendations - “users who liked this item also liked”

recommender.item_recs(item_id)

Use the count option to specify the number of recommendations (default is 5)

recommender.user_recs(user_id, count: 3)

Get predicted ratings for specific users and items

recommender.predict([{user_id: 1, item_id: 2}, {user_id: 2, item_id: 4}])

Get similar users

recommender.similar_users(user_id)

Examples

MovieLens

Load the data

data = Disco.load_movielens

Create a recommender and get similar movies

recommender = Disco::Recommender.new(factors: 20)
recommender.fit(data)
recommender.item_recs("Star Wars (1977)")

Ahoy

Ahoy is a great source for implicit feedback

views = Ahoy::Event.where(name: "Viewed post").group(:user_id).group_prop(:post_id).count

data =
  views.map do |(user_id, post_id), _|
    {
      user_id: user_id,
      item_id: post_id
    }
  end
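To make the mapping above easier to follow, here is a plain-Ruby sketch that uses a stand-in hash instead of a real Ahoy query. The shape of views (array keys of user ID and post ID, with counts as values) is an assumption about what the grouped count returns, and the sample values are invented.

```ruby
# Stand-in for the grouped count (assumed shape: [user_id, post_id] => count).
views = {
  [1, "10"] => 3,
  [1, "11"] => 1,
  [2, "10"] => 2
}

# One row per user/post pair; the count is ignored for implicit feedback.
data = views.map do |(user_id, post_id), _count|
  {user_id: user_id, item_id: post_id}
end
```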

Create a recommender and get recommended posts for a user

recommender = Disco::Recommender.new
recommender.fit(data)
recommender.user_recs(current_user.id)

Storing Recommendations

Disco makes it easy to store recommendations in Rails.

rails generate disco:recommendation
rails db:migrate

For user-based recommendations, use:

class User < ApplicationRecord
  has_recommended :products
end

Change :products to match the model you’re recommending

Save recommendations

User.find_each do |user|
  recs = recommender.user_recs(user.id)
  user.update_recommended_products(recs)
end

Get recommendations

user.recommended_products

For item-based recommendations, use:

class Product < ApplicationRecord
  has_recommended :products
end

Specify multiple types of recommendations for a model with:

class User < ApplicationRecord
  has_recommended :products
  has_recommended :products_v2, class_name: "Product"
end

And use the appropriate methods:

user.update_recommended_products_v2(recs)
user.recommended_products_v2

Storing Recommenders

If you’d prefer to perform recommendations on-the-fly, store the recommender

json = recommender.to_json
File.write("recommender.json", json)

The serialized recommender includes user activity from the training data (to avoid recommending previously rated items), so be sure to protect it. You can save it to a file, database, or any other storage system, or use a tool like Trove. Also, user and item IDs should be integers or strings for this.

Load a recommender

json = File.read("recommender.json")
recommender = Disco::Recommender.load_json(json)

Alternatively, you can store only the factors and use a library like Neighbor. See the examples.

Algorithms

Disco uses high-performance matrix factorization.

For explicit feedback, it uses stochastic gradient descent
For implicit feedback, it uses coordinate descent
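For intuition only, here is a toy stochastic gradient descent factorization in plain Ruby. It is not Disco's or LIBMF's implementation: the function names, learning rate, regularization, and sample ratings are all assumptions, and it omits details such as bias terms.

```ruby
# Toy SGD matrix factorization: learn one vector per user and per item so
# that their dot product approximates the observed rating.
def fit_sgd(ratings, factors: 2, epochs: 500, learning_rate: 0.02, reg: 0.02, seed: 42)
  rng = Random.new(seed)
  new_vector = -> { Array.new(factors) { rng.rand * 0.1 } }
  user_factors = Hash.new { |hash, key| hash[key] = new_vector.call }
  item_factors = Hash.new { |hash, key| hash[key] = new_vector.call }

  epochs.times do
    ratings.shuffle(random: rng).each do |row|
      pu = user_factors[row[:user_id]]
      qi = item_factors[row[:item_id]]
      error = row[:rating] - pu.zip(qi).sum { |a, b| a * b }

      factors.times do |f|
        pu_f = pu[f]
        qi_f = qi[f]
        # Gradient step on squared error with L2 regularization.
        pu[f] += learning_rate * (error * qi_f - reg * pu_f)
        qi[f] += learning_rate * (error * pu_f - reg * qi_f)
      end
    end
  end

  [user_factors, item_factors]
end

def predict_rating(user_factors, item_factors, user_id, item_id)
  user_factors[user_id].zip(item_factors[item_id]).sum { |a, b| a * b }
end

ratings = [
  {user_id: 1, item_id: 1, rating: 5}, {user_id: 1, item_id: 2, rating: 1},
  {user_id: 2, item_id: 1, rating: 4}, {user_id: 2, item_id: 2, rating: 1}
]
user_factors, item_factors = fit_sgd(ratings)
```

After fitting, predict_rating(user_factors, item_factors, 1, 1) should land near the observed rating of 5, and the prediction for item 2 near 1.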

Specify the number of factors and epochs

Disco::Recommender.new(factors: 8, epochs: 20)

If recommendations look off, try changing factors. The default is 8, but 3 could be good for some applications and 300 good for others.

Validation

Pass a validation set with:

recommender.fit(data, validation_set: validation_set)
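One simple way to obtain a validation set is a seeded random split. The sketch below is plain Ruby; the helper name train_validation_split, the 80/20 ratio, and the sample rows are assumptions, not part of Disco.

```ruby
# Shuffle with a fixed seed, then hold out a fraction of rows for validation.
def train_validation_split(data, validation_ratio: 0.2, seed: 42)
  shuffled = data.shuffle(random: Random.new(seed))
  validation_size = (shuffled.size * validation_ratio).round
  [shuffled.drop(validation_size), shuffled.first(validation_size)]
end

# Made-up ratings, one row per user.
data = (1..10).map { |i| {user_id: i, item_id: i % 3, rating: (i % 5) + 1} }
train_set, validation_set = train_validation_split(data)

# With Disco loaded, the two sets would then be passed as:
#   recommender.fit(train_set, validation_set: validation_set)
```

A purely random split can leave users or items in the validation set that never appear in the training set, so a per-user split may be a better fit for some datasets.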

Cold Start

Collaborative filtering suffers from the cold start problem. It’s unable to make good recommendations without data on a user or item, which is problematic for new users and items.

recommender.user_recs(new_user_id) # returns empty array

There are a number of ways to deal with this, but here are some common ones:

For user-based recommendations, show new users the most popular items
For item-based recommendations, make content-based recommendations with a gem like tf-idf-similarity
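The first option can be sketched in plain Ruby as a fallback: use the personalized list when it has entries and a popular-items list otherwise. The helper name recs_with_fallback and the sample data are hypothetical.

```ruby
# Return personalized recommendations when present, otherwise popular items.
def recs_with_fallback(personalized, popular, count: 5)
  (personalized.empty? ? popular : personalized).first(count)
end

# Hypothetical popular items, most popular first.
popular = [{item_id: 7}, {item_id: 3}, {item_id: 9}]

new_user_recs = recs_with_fallback([], popular, count: 2)
known_user_recs = recs_with_fallback([{item_id: 1}], popular)
```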

Get top items with:

recommender = Disco::Recommender.new(top_items: true)
recommender.fit(data)
recommender.top_items

This uses Wilson score for explicit feedback and item frequency for implicit feedback.
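For reference, here is the textbook Wilson score lower bound for a proportion, along with a plain item-frequency count. Both are plain-Ruby sketches. How Disco maps ratings onto the Wilson score is not shown here and may differ, and the method names are assumptions.

```ruby
# Wilson score lower bound for `positive` successes out of `total` trials.
# z = 1.96 corresponds to roughly a 95% confidence level.
def wilson_lower_bound(positive, total, z: 1.96)
  return 0.0 if total.zero?

  p_hat = positive.to_f / total
  z2 = z * z
  centre = p_hat + z2 / (2 * total)
  margin = z * Math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * total)) / total)
  (centre - margin) / (1 + z2 / total)
end

# Item frequency for implicit feedback: count rows per item, highest first.
def item_frequency(data)
  data.map { |row| row[:item_id] }.tally.sort_by { |_, count| -count }
end
```

The lower bound rewards evidence: 90 positives out of 100 scores higher than 9 out of 10, even though the raw proportion is the same.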

Data

Data can be an array of hashes

[{user_id: 1, item_id: 1, rating: 5}, {user_id: 2, item_id: 1, rating: 3}]

Or a Rover data frame

Rover.read_csv("ratings.csv")

Or a Daru data frame

Daru::DataFrame.from_csv("ratings.csv")

Performance

If you have a large number of users or items, you can use an approximate nearest neighbors library like Faiss to improve the performance of certain methods.

Add this line to your application’s Gemfile:

gem "faiss"

Speed up the user_recs method with:

recommender.optimize_user_recs

Speed up the item_recs method with:

recommender.optimize_item_recs

Speed up the similar_users method with:

recommender.optimize_similar_users

This should be called after fitting or loading the recommender.

Reference

Get ids

recommender.user_ids
recommender.item_ids

Get the global mean

recommender.global_mean

Get factors

recommender.user_factors
recommender.item_factors

Get factors for specific users and items

recommender.user_factors(user_id)
recommender.item_factors(item_id)
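Factor vectors can be compared directly. The sketch below computes cosine similarity between two plain Ruby arrays. It assumes the factors have been converted to arrays (for example with to_a), and it is an illustration rather than a statement of how Disco scores similarity internally.

```ruby
# Cosine similarity between two equal-length numeric arrays.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end
```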

Credits

Thanks to:

LIBMF for providing high performance matrix factorization
Implicit for serving as an initial reference for user and item similarity
@dasch for the gem name

History

View the changelog

Contributing

Everyone is encouraged to help improve this project. Here are a few ways you can help:

Report bugs
Fix bugs and submit pull requests
Write, clarify, or fix documentation
Suggest or add new features

To get started with development:

git clone https://github.com/ankane/disco.git
cd disco
bundle install
bundle exec rake test

disco's People

m1lt0n stjordanis kimsuelim alexanderjeurissen cxz hacker0x01 nhsykym jcalem youssef-sobhy iq-scm zkan

disco's Issues

Can I control the output to the server log?

I want to suppress the server log message "[ahoy] Visit excluded".
Is there such a setting?

I cannot find a function or setting for this.

Updating recommendations

Is there a way we can update the model after fitting it once, like

I collect the data
Perform recommender.fit(data) and build the model
Now if I get some new_data or the rating between an item and a user changes
If I perform recommender.fit(new_data), then the old user_factors and item_factors get overwritten by the new ones and the score becomes NaN for many items

Is there any way to resolve that or perform incremental model updates?

Multiple factors?

On my project I can have users follow an author and like an author's post. How would I be able to integrate disco so it takes those two explicit ratings and recommend them more posts?

Attributes in different model

Hi,

This looks great, and I appreciate the effort. I stumbled upon this gem while I was planning my own recommendation engine. I've got data about my products that's stored in different models: everything from the number of times an item has been viewed, to how the user interacted with it, whether they purchased it, its ranking, etc.

Can I use this gem to fit that data? Is there a wiki, a tutorial, or something like that? Is there more comprehensive documentation for this gem somewhere?

Value param not working for Implicit feedback

I've been using this recommendation engine for implicit feedback and it seems like the value param isn't doing anything.

For example:

recommender.fit([
  {user_id: 1, item_id: 1, value: 1.0},
  {user_id: 2, item_id: 1, value: 1.0},
  {user_id: 2, item_id: 2, value: 1.0},
  {user_id: 2, item_id: 3, value: 0.0}
])

yields the same result as:

recommender.fit([
  {user_id: 1, item_id: 1, value: 1.0},
  {user_id: 2, item_id: 1, value: 1.0},
  {user_id: 2, item_id: 2, value: 1.0},
  {user_id: 2, item_id: 3, value: 1.0}
])

A higher value should indicate a stronger preference but that doesn't appear to be happening in any examples I try.

Combining multiple "values" with weights

Is it possible to use the implicit feedback recommendation system, where you specify a "value", but with that value based on a number of different values, with weights?

It's still unclear to me how the "value" option works in terms of the ranking. Does a higher value mean a user "likes" this item more, or what's the correlation?

In my particular scenario, I'm trying to do:

Recommend product categories to users
Recommend users to users

And what I considered to be my inputs for the values are:

likes: number of times I've hearted products in this category
page_views: number of times I've viewed products in this category
purchase_count: number of times I've purchased products in this category

If I wanted to combine the values for the three parameters I noted above with particular weights, say on purchase_count, how exactly would this be possible, and what suggestions would you make?

Questions

Not an issue but a question. If you have another place to discuss questions, then of course I'll move it there.

The discussion in #1 suggests that the score is normalized to 0..1?
I did a quick example and this does not seem to be the case for me. Is there something I need to consider to have normalized scores?

Is there a built-in way to store/retrieve "similar users" for ActiveRecord? If not, is this something that would be welcome or even make sense as a PR?

Thanks for the fantastic library. I was able to implement a recommendation system in no time.

Storing recommendations in Redis

Here it has been mentioned that we can store the recommender to a file, database, or any other storage system.

Can we also save the .bin file to Redis and perform recommendations?

Incremental updates

Is there a plan to add incremental updates or is there a recommended workaround for it? Thanks in advance 🙏

Scope of recommendation

I've got a use case where I would like to put a scope on the recommendation. My setup involves two separate domains that are interconnected in the backend. Here it is important to filter out any items that are not set as visible on the current domain of the request.

So:

{ item: 1, sites: [example.com, example.eu] },
{ item: 2, sites: [example.eu] },
{ item: 3, sites: [example.com] },
{ item: 4, sites: [] },

In the case that the visitor accesses example.com I want only item 1 and 3 to be considered in the recommendation and if they visit example.eu, only item 1 and 2 should be considered.

I'm currently using a workaround: filtering the items after running the recommendation. It's not ideal, as I can't guarantee a good result when some of the outputs are filtered out. Is there another way to do this filtering beforehand that I've overlooked?
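The post-filtering workaround described above can be sketched in plain Ruby as follows. One way to make it less lossy is to request more recommendations than needed (via the count option) and trim after filtering. The helper name visible_recs, the visibility hash, and the scores are hypothetical.

```ruby
# Keep only recommendations visible on the given site, then trim to `count`.
def visible_recs(recs, visibility, site, count: 5)
  recs
    .select { |rec| visibility.fetch(rec[:item_id], []).include?(site) }
    .first(count)
end

# Item visibility per site, mirroring the example above.
visibility = {
  1 => ["example.com", "example.eu"],
  2 => ["example.eu"],
  3 => ["example.com"],
  4 => []
}

# Hypothetical over-fetched recommendations, best first.
recs = [
  {item_id: 2, score: 0.9},
  {item_id: 1, score: 0.8},
  {item_id: 4, score: 0.7},
  {item_id: 3, score: 0.6}
]

com_recs = visible_recs(recs, visibility, "example.com", count: 2)
eu_recs = visible_recs(recs, visibility, "example.eu", count: 2)
```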

"Thematic content recommendations"

So I understand how Disco can tell me, "Similar users with similar preferences liked this content"

What I'm unclear on is if Disco can tell me, "You liked this video of a guy playing guitar. Here is another video of a different guy playing the guitar".

Is there an industry-standard way to direct users towards similar content without a large volume of user-generated data? Is this a manual tagging process, or is there something automated that can be utilized? I kinda feel like this XKCD, and am fully prepared to learn that this will be exceptionally difficult.
Would I need to store this additional recommendation in a different way? Could I pipe this thematic linking of data into Disco?

BTW great name, I too am Andrew

Confused about ranking in relation to the rating

Hi, I am wondering if I am misunderstanding how this is supposed to work.

I have say a number of users with different interests (the "items" in my case).

User 1 likes a few items, including item X with a rating of 5; all the others have a much lower rating. User 2 also likes a few items and also likes X with a rating of 5, and the other items have a much lower rating. User 3 likes completely different items with a rating of 1.

Why does user 3 rank higher than user 2? What could explain this? Thanks

Couple of clarifications

Are there any minimum hardware requirements to implement this gem?
Right now I'm using Ruby 2.5.x in a project (that's in production). #1 says that support for Ruby < 2.6 will be removed. Will that happen in the near future?
Will the gem be maintained for a couple of years?
Are there any benchmarks or statistics on the working of the gem?

Global Item Rank

This may be a question out of ignorance, but when ranking items with explicit feedback, is there a way to get disco to generate a global rank for all the items?

So basically "top 10 ranked items" given the factors that are produced.

Do the factors produced indicate rank without specifying an id?

I'm wondering if a query like this makes sense:

SELECT
	items.id
FROM
	items
WHERE
	items.neighbor_vector IS NOT NULL
ORDER BY
	items.neighbor_vector <-> CUBE (ARRAY [1, 1, 1, 1, 1, 1, 1, 1])

Questions on whether this gem can help me with my problem

Hi! I can't believe how many awesome gems I am finding authored by you, you are a machine :D

I have a problem that I would like to solve with machine learning, also to learn this subject (I still know close to zero about it).

It's about recommendations. But not like "recommended products for a user", rather "recommended users for a given user".

So I have an app that manages users with their interests, and these users can have meetings between each two of them. These meetings can have different statuses like accepted, pending, declined etc.

I would like to be able to pick a user, and recommend to them other users that might be worth contacting for a meeting because they have compatible interests/intents.

Each user can have one or more interests, such as "Investments", "Jobs", etc., and intents, such as "Looking for..." or "Offering". So, for example, if one user is looking for an investment and another is offering an investment, the two users have "compatible" interests.
I would consider for the training each group of { target interest, target intent, user, meetings ranking } where
- target interest and intent are those of the user I want the recommendations for (so if I want recommendations for User X, I would do this for each of their interest/intent )
- user would be any other user in the database (I would filter those with compatible interests and intents to the target ones)
- ranking would be a number calculated depending on the statuses of meetings arranged by the user in the group with users having the target interests and intents. So for a group say that I find 10 meetings arranged with people with the target interest/intent, and let's say that I assign 10 points for accepted meetings, 8 for rescheduled, 6 for pending and 0 for cancelled or something like that. The ranking would be the sum of the points for all the user's meetings.
the recommendation would then include the users with the highest ranking for the target interest/intent
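The scoring rule described above can be written down in a few lines of plain Ruby. The point values come from the example in the text. The constant and method names are hypothetical, and unknown statuses are assumed to score 0.

```ruby
# Points per meeting status, as proposed above.
POINTS = {accepted: 10, rescheduled: 8, pending: 6, cancelled: 0}.freeze

# Sum the points for all of a user's meetings with the target group.
def meeting_ranking(statuses)
  statuses.sum { |status| POINTS.fetch(status, 0) }
end

ranking = meeting_ranking([:accepted, :pending, :cancelled])
```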

Would this problem be solvable with this gem? If yes, would you mind giving me pointers on how exactly?

Thanks a lot in advance for any help. I am new to this so I am still wrapping my head around with the many possibilities.

Stored recommendations and similar users

Hi! I am using this awesome gem for recommending people to other people with same interests and it's working great. Due to performance reasons, I am caching the recommender in Redis but I was wondering if I should store recommendations instead.

If I store them in the database table that Disco can create in Postgres, how do I get the similar users given those recommendations? At the moment I am calling similar_users on the recommender.

Thanks!

Disco v0.2.5 item_recs does not return related items

Hello,

I am trying to upgrade to v0.2.5, but doing so breaks some tests that were originally passing in the previous version. Here is a simplified version of what I'm doing:

data = [{:user_id=>952, :item_id=>2057}, {:user_id=>952, :item_id=>2060}, {:user_id=>953, :item_id=>2063}]
recommender = Disco::Recommender.new(factors: 50)
recommender.fit(data)

As you can see, user 952 has items 2057 and 2060. So if I pass in item 2057, I would expect the recommender to return 2060.

recommender.item_recs(2057).pluck(:item_id)

But this returns [2057, 2063]. It returns itself and the non-related item. Am I doing something wrong here?

How to validate results?

Background

Our website is a typical e-commerce site with around 700 orders each day. To boost our sales, we are trying to implement "people who bought this also bought XYZ".

Using Disco

Our current implementation is something like:

data = []

orders.each do |o|
  o.items.each do |i|
    # we actually also put item quantity into consideration. E.g. if user purchases 3 item A, we add 3 entries into the data array
    data << {user_id: current_user.id, item_id: i.id}
  end
end

recommender.fit(data)

recommendations = recommender.item_recs(current_item, count: 10)

However, the results seem way off. Changing factors to 500 makes it better but still the results don't make a lot of sense.

Using Predictor

We also tried Predictor. The results look much better and seem logical to us. But to be honest, we don't know exactly how to validate them.

Validation

Please excuse my ignorance, but the only way I can think of is to count the item occurrences.

For example, if we want to find recommendations for item A:

Get orders that have item A
Get items from each order
Count their occurrences
Sort by occurrences

We then have a list of the most popular items sold along with item A:

Item B (sold 100 times)
Item C (sold 97 times)
Item D (sold 90 times)
...

This list does make sense to us in terms of people's purchase preferences.
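The counting procedure described above can be sketched in plain Ruby. Orders are represented here as arrays of item names, which is an assumption about the data shape. The helper name co_purchased and the sample orders are made up.

```ruby
# Count how often other items appear in orders that contain `target`,
# most frequent first.
def co_purchased(orders, target)
  orders
    .select { |items| items.include?(target) }
    .flat_map { |items| items.uniq - [target] }
    .tally
    .sort_by { |_, count| -count }
end

# Hypothetical orders, each an array of item names.
orders = [%w[A B C], %w[A B], %w[B C], %w[A B C D]]
counts = co_purchased(orders, "A")
```

With these sample orders, B appears in three orders alongside A, C in two, and D in one.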

Questions

A couple of questions regarding the process:

Is the validation correct?
If yes, then why are the results from Disco so off?
If we can get recommendations by simply counting occurrences, why do we need a recommendation engine in the first place?

Thank you, and apologies for the super long issue.

Ideas

Please create a new issue to discuss any ideas or share your own.

0.5.0

Drop support for deprecated marshal serialization (use JSON instead)

Ideas

Use this approach for top_items
Add option for index type to optimize_* methods (IVFFlat, HNSW)
Add docs on how to create a good validation set and metrics

user_recs returns recommendations that a user already knows

I want to use disco to compare users that each have a list of services. For example:

User A has Services [1,2,3,4]
User B has Services [3,4,5,6]

If I put this data into the recommender and try to get the user recs for A, I would expect it to return only services [5, 6] because User A already knows about [3, 4]. But it instead returns some services that the user knows about as well as some new services. So it's half working the way I would expect.

Does your algorithm ignore what a user already knows about according to the data given to recommender.fit?

Getting similar users based on item

~~Hi, is it possible to store similar users after training and fetching them using the recommender.similar_users(:user_id) method similar to storing recommended items/products?~~
Hi again, what I asked above was not really my issue but it would help in getting similar users when searching through recommended items with an item_id. Do you have any suggestion on how to achieve that?
What I have in mind is the following:

training_data = ...
recommender.fit(training_data)
recs = recommender.item_recs(item_id)
# the following part is missing
similar_users = ??

I'm looking for something like:

similar_users = recommender.similar_users_for_item(:user_id, :item_id)

that would combine item_recs and similar_users and return only user_ids

Fit on stored Recommender

Does a fit with new data on an already fitted and stored recommender add to the model, or does it completely remove the existing training and "start from scratch" with the new data?

Building a Hybrid Approach

Hey, thanks for this library! I'm trying to work out how I could go about building a hybrid recommendation system that combines collaborative filtering, content based filtering and also knowledge based recommendations rather than one of them exclusively.
Do you have any suggestions on how to go about this, or where to start? I've read through the README, and you suggested using the https://github.com/jpmckinney/tf-idf-similarity gem for content-based recommendations to remedy the cold-start issue, but what would this look like in a real-world example as a hybrid system?

Ability to explicitly set range (error: comparison of float with 0 failed)

I just want to start by saying that I love this gem. Thanks, @ankane, for all of your work on it.

Recently, I've been building a recommendation system for a small project that I'm working on. I set top_items to true so that my top items can be more accurate, instead of just sorting by average review. Right now, just for testing, I have a small number of items (n < 10) with a review range of 1 to 5. I went in and manually added reviews to each of the items for a couple of different test users.

After the first three reviews (one item with a 4 and 5 star and another with a 5 star), I kept getting the error Comparison of float with 0 failed when calling top_items. I fiddled around with more reviews by giving another item a review. I changed the rating for the review to a couple of different numbers, just to see if that would fix anything. Once I changed the rating to 4 or less, the error went away.

From my inspection of disco's code and a few guesses, I'm assuming that there are requirements for the range of reviews that are not met by only having two possible numbers because the min_rating and max_rating are inferred based on the min and max of the training set. To fix this, I believe that it could be beneficial to be able to set min_rating and max_rating on the initialization of Disco::Recommender. This would help projects that are just kicking off not to run into unexpected errors. Would this be possible? Thank you!

Support for STI

Hey @ankane, would you be open to supporting STI? I can't find a way to get recommended_items if I have a model inheriting from another model. For example, with AdminUser inheriting from User, the subject_type will be AdminUser, but it will query User when using the recommended_items function. Below you can find the change I made to make it work for this use case. I can open a PR if you're open to this change.

Then I used it like this

has_recommended :items, subject_type: 'User'

Ideas

0.3.0

(breaking) Change item_id to user_id for similar_users
(breaking) Change warning to an error when value passed
(breaking, maybe) Make Faiss the default library (and possibly remove support for NGT)
Remove wilson_score dependency for top_items (slightly different calculation, just needs uncommented)
Drop support for Ruby < 2.6
