
MMSSL: Multi-Modal Self-Supervised Learning for Recommendation

PyTorch implementation for WWW 2023 paper Multi-Modal Self-Supervised Learning for Recommendation.


MMSSL is a new multimedia recommender system that integrates generative modality-aware collaborative self-augmentation with contrastive cross-modality dependency encoding. It achieves better performance than existing state-of-the-art multi-modal recommenders.

Dependencies
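The original dependency list is not preserved here. As a rough assumption for a PyTorch codebase of this kind (not the authors' pinned versions), an environment along these lines should suffice:

pip install torch numpy scipy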

Usage

Start training and inference as:

cd MMSSL
python ./main.py --dataset {DATASET}

Supported datasets: Amazon-Baby, Amazon-Sports, Tiktok, Allrecipes

Datasets

├─ MMSSL/ 
    ├── data/
      ├── tiktok/
      ...
Dataset        Amazon-Baby   Amazon-Sports   Tiktok          Allrecipes
Modality       V      T      V      T        V    A    T     V     T
Embed Dim      4096   1024   4096   1024     128  128  768   2048  20
User           35598         19445           9319            19805
Item           18357         7050            6710            10067
Interactions   256308        139110          59541           58922
Sparsity       99.961%       99.899%         99.904%         99.970%
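As a quick sanity check after downloading, the per-modality feature files can be loaded directly with NumPy. This is a minimal sketch assuming the directory layout above and per-modality .npy files; the exact file names are assumptions:

import numpy as np

# TikTok carries three modalities (V/A/T) per the table above; file names assumed.
image_feat = np.load('./data/tiktok/image_feat.npy')  # visual features, 128-d
audio_feat = np.load('./data/tiktok/audio_feat.npy')  # acoustic features, 128-d
text_feat = np.load('./data/tiktok/text_feat.npy')    # textual features, 768-d
print(image_feat.shape, audio_feat.shape, text_feat.shape)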
  • 2024.3.20 baselines LATTICE and MICRO uploaded: 📢📢🔥🔥🚀🚀 Because the baselines LATTICE and MICRO require some minor modifications, we provide code that can be run easily by simply modifying the dataset path.

  • 2023.11.1 new multi-modal datasets uploaded: 📢📢🔥🔥🌹🌹 We provide the new multi-modal datasets Netflix and MovieLens (i.e., CF training data and multi-modal data including item text and posters) from our new multi-modal work LLMRec on Google Drive. 🌹 We hope to contribute to our community and facilitate your research~

  • 2023.3.23 update (all datasets uploaded): We provide the processed data at Google Drive.

  • 2023.3.24 update: The official website of the Tiktok dataset has been closed, so we also provide several other versions of the preprocessed Tiktok data. We spent a lot of time preprocessing this dataset, so if you use our preprocessed Tiktok data in your work, please cite our paper.

🚀🚀 The provided datasets are compatible with multi-modal recommender models such as MMSSL, LATTICE, and MICRO, and require no additional preprocessing. They include (1) basic user-item interactions and (2) multi-modal features.

Multi-modal Datasets

🌹🌹 Please cite our paper if you use the 'netflix' dataset~ ❤️

We collected a multi-modal dataset using the original Netflix Prize Data released on the Kaggle website. The data format is directly compatible with state-of-the-art multi-modal recommendation models like LLMRec, MMSSL, LATTICE, MICRO, and others, without requiring any additional data preprocessing.

Textual Modality: We have released the item information curated from the original dataset in the "item_attribute.csv" file. Additionally, we have incorporated textual information enhanced by LLM into the "augmented_item_attribute_agg.csv" file. (The following three images represent (1) information about Netflix as described on the Kaggle website, (2) textual information from the original Netflix Prize Data, and (3) textual information augmented by LLMs.)

[three images: Kaggle dataset description, original textual attributes, LLM-augmented textual attributes]
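A minimal sketch for peeking at these two attribute files (assumes a plain CSV layout; the actual column schema is not documented in this README):

import pandas as pd

# Inspect the released attribute files; the column schema is an assumption.
items = pd.read_csv('item_attribute.csv')
augmented = pd.read_csv('augmented_item_attribute_agg.csv')
print(items.shape)
print(augmented.shape)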

Visual Modality: We have released the visual information obtained from web crawling in the "Netflix_Posters" folder. (The following image displays the poster acquired by web crawling using item information from the Netflix Prize Data.)

[image: crawled movie poster]

Original Multi-modal Datasets & Augmented Datasets


Download the Netflix dataset.

🚀🚀 We provide the processed data (i.e., CF training data/basic user-item interactions, original multi-modal data including item images and text, encoded visual/textual features, and LLM-augmented text/embeddings). 🌹 We hope to contribute to our community and facilitate your research 🚀🚀 ~

Encoding the Multi-modal Content

We use CLIP-ViT and Sentence-BERT as the encoders for visual and textual side information, respectively.
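For reference, a minimal sketch of this encoding step, assuming Hugging Face checkpoints ('openai/clip-vit-base-patch32' and 'all-mpnet-base-v2') and illustrative file paths; these names are assumptions, not necessarily the exact models the authors used:

import numpy as np
import torch
from PIL import Image
from sentence_transformers import SentenceTransformer
from transformers import CLIPModel, CLIPProcessor

# Textual side information -> 768-d embeddings (matches text_feat.npy's 768 columns).
text_encoder = SentenceTransformer('all-mpnet-base-v2')  # assumed checkpoint
texts = ['A romantic comedy about ...', 'A documentary about ...']  # illustrative item texts
text_feat = text_encoder.encode(texts)                   # (num_items, 768) ndarray

# Visual side information -> 512-d embeddings (matches image_feat.npy's 512 columns).
clip = CLIPModel.from_pretrained('openai/clip-vit-base-patch32')       # assumed checkpoint
processor = CLIPProcessor.from_pretrained('openai/clip-vit-base-patch32')
images = [Image.open(p) for p in ['poster_0.jpg', 'poster_1.jpg']]     # hypothetical poster paths
with torch.no_grad():
    inputs = processor(images=images, return_tensors='pt')
    image_feat = clip.get_image_features(**inputs).numpy()             # (num_items, 512) ndarray

np.save('text_feat.npy', text_feat)
np.save('image_feat.npy', image_feat)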

Citing

If you find this work helpful to your research, please kindly consider citing our paper.

@inproceedings{wei2023multi,
  title={Multi-Modal Self-Supervised Learning for Recommendation},
  author={Wei, Wei and Huang, Chao and Xia, Lianghao and Zhang, Chuxu},
  booktitle={Proceedings of the ACM Web Conference 2023},
  pages={790--800},
  year={2023}
}

Acknowledgement

The structure of this code is largely based on LATTICE and MICRO. Thanks to the authors for their work.

mmssl's People

Contributors

hkuds, weiwei1206


mmssl's Issues

Raw dataset about Tiktok

Thanks for sharing the code for your great work.
I've noticed that you provide a pre-processed Tiktok dataset, which seems different from the one used in DualGNN.

Recall@20, MMSSL: 0.0921 < Recall@10 DualGNN: 0.1318

As this dataset is used inconsistently across papers, could you also provide the raw TikTok dataset and describe how you preprocess it into multi-modal features? Your efforts are greatly appreciated. Thanks.

About the baselines for MMSSL.

Thank you very much for your team's excellent work;

I have some confusion about the baselines of this paper. Are the SGL and LightGCN baselines covered in the paper implemented using https://github.com/HKUDS/SSLRec?

When I ran the tiktok dataset with SGL in SSLRec, the final result was surprisingly good and surpassed most of the baselines. Key parameters: {'keep_rate': 0.5, 'layer_num': 3, 'reg_weight': 1e-05, 'cl_weight': 1.0, 'temperature': 0.5, 'embedding_size': 32, 'augmentation': 'edge_drop'}
Test set [recall@10: 0.0577, recall@20: 0.0856] [ndcg@10: 0.0321, ndcg@20: 0.0391]

Very much looking forward to your reply, sincerely.

Hyperparameter settings

Hello, I recently saw your work and was very interested, but when I reproduce the paper the results are always a little worse than those reported in the original paper. Do you have more sensitive hyperparameter settings or anything similar? I hope you can reply to me, thank you!!

Thanks for your outstanding work!

The study is great, and thank you very much for providing the dataset. I believe it is an important contribution to recommendation research!

About dataset statistics

Hi!

Thank you for your novel work and processed datasets.
I downloaded tiktok and allrecipes from the given links and found that their dataset statistics are as follows:

tiktok: #Users: 9308; #Items: 6710;
allrecipes: #Users: 19805; #Items: 10068.
They are different from the reported statistics. Have the datasets been changed?

Thanks!

Some questions about your `Multi-Modal High-Order Connectivity` module.

An excellent paper, but I was confused by your Multi-Modal High-Order Connectivity module:
[screenshot of the formula]
In this formula, if I have deduced correctly, $\hat{E}_u^l$ comes from the output of modality-wise dependency modeling. Its dimension is $m \times d$, supposing $m$ is the number of users. $A \in \mathbb{R}^{n \times m}$ is the user-item interaction matrix, where $n$ denotes the number of items. From this, it follows that the output representation $\hat{E}_u^{l+1}$ has dimension $n \times d$, which corresponds to the representations of items rather than users.
Could you help resolve my confusion? Thanks.
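A quick shape check of the deduction above (toy sizes; the semantics of A follow the questioner's reading, not necessarily the paper's):

import numpy as np

m, n, d = 4, 3, 8       # m users, n items, d embedding dim
E_u = np.zeros((m, d))  # \hat{E}_u^l: user-side representations, m x d
A = np.zeros((n, m))    # user-item interaction matrix as stated, n x m

E_next = A @ E_u        # \hat{E}_u^{l+1}
print(E_next.shape)     # (3, 8), i.e. n x d: item-side, not user-side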

Raw Data

Thanks for the great work!

I noticed that you have provided the processed features. I am wondering if the raw data (such as the images, videos, and text) will be made publicly available? Thanks!

The tiktok dataset

I really appreciate your great work on multi-modal recommendation! I am working on the multi-modal encoding, and I want to see whether it can achieve higher performance with other feature extractors. Is it possible to get access to the raw data? Thank you!

Question about no acoustic modal

Hi, Weiwei, I noticed that the paper reports MMSSL's performance on the Tiktok dataset using the acoustic modality. However, the code only contains the text and image modalities. Has the code for the TikTok dataset not been fully released yet?

HELP!! Experimental data issue

Thank you very much for your contribution to multimodal recommendation systems!
When I try to reproduce your paper, the experimental results I obtain are always worse than those you report in the paper. Have you made any further adjustments to the hyperparameters of your experiments? I would be grateful if you could explain your method in detail.

Processed data about Allrecipes

Thanks for your excellent work! Can you please share the processed data of the Allrecipes dataset? I cannot find it at the shared Google Drive link.

The file path of datasets

Hi, Weiwei:
I debugged the code and found that I can't reproduce the result on Tiktok because I am confused by the file paths of the processed datasets on Google Drive.
The file path for Allrecipes is easy to find: the JSON files and Mat files are in the top-level directory, and there are no other files or folders, so I reproduced those results successfully.
But the other three datasets are a little confusing, with many files and folders. Can you show the paths of the JSON and Mat files as you did for Allrecipes?

Allrecipes is easy to find [screenshot], but the others are difficult for me to locate [screenshot].

The use of validate set

Hi, I notice the datasets are split into train, validation, and test sets, but the validation set is not used. The model that achieves the best performance on the test set is selected as the best model. I think we should select the best-performing model on the validation set and report its performance on the test set. What's your opinion?
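A minimal sketch of the selection protocol proposed here (metric values are illustrative placeholders, not real results):

# Per-epoch recall@20; illustrative numbers only.
val_recall = [0.050, 0.070, 0.060]
test_recall = [0.060, 0.065, 0.070]

best_epoch = max(range(len(val_recall)), key=val_recall.__getitem__)  # choose on validation
print(f"epoch {best_epoch}: report test recall {test_recall[best_epoch]}")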

Embedding question about text_feat.npy and image_feat.npy

Thanks for your wonderful contribution of the embedded Netflix item data.

In Python, when I load your Netflix data, text_feat.npy and image_feat.npy each contain a NumPy ndarray. To be more exact:

import numpy as np

text_feat = np.load('text_feat.npy')
image_feat = np.load('image_feat.npy')

print(text_feat.shape)   # -> (17366, 768)
print(image_feat.shape)  # -> (17366, 512)

May I ask how the rows of text_feat and image_feat are ordered? Is it by item ID, i.e.,
item 1, [embedding 1];
item 2, [embedding 2]; # item-ID order
...
or by the order in which items appear in item_attribute.csv, i.e.,
item 9733, [embedding 9733];
item 14147, [embedding 14147]; # row order of item_attribute.csv
...

Thanks! I am carrying out embedding-based i2i similarity recommendation.
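For context, a minimal sketch of that kind of embedding-based i2i retrieval with the released features (the row-to-item mapping is exactly the open question above, so the row index here is a placeholder):

import numpy as np

text_feat = np.load('text_feat.npy')  # (17366, 768)
feat = text_feat / np.linalg.norm(text_feat, axis=1, keepdims=True)

query_row = 0                         # placeholder row index
scores = feat @ feat[query_row]       # cosine similarities against all rows
top_10 = np.argsort(-scores)[1:11]    # nearest rows, skipping the query itself
print(top_10)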

Raw dataset processing details

Can you detail how you preprocess the raw data into the V/T/A features stored in the *.npy files? Only the textual features are mentioned in your paper.

result reproduction settings

Hello, thanks for sharing the code. Could you report the specific settings of each dataset for best result reproduction? Thanks.

A code error

Hi, Weiwei, there may be a small error in the code. In main.py, at line 453, the first parameter of the test function should be users_to_val.
Now I reproduce the results successfully. Thanks for your careful and patient answer!
