RecEngineMF


Data

Commands to Download and Extract Data

  1. Download MovieLens 20M Data

    wget --output-document=./ml-20m.zip https://files.grouplens.org/datasets/movielens/ml-20m.zip
  2. Once the download is complete, extract the dataset

    unzip ml-20m.zip
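
On systems without wget or unzip (a stock Windows install, for example), the same two steps can be done with Python's standard library. The following is a minimal sketch and not part of the repository; the helper names download_file and extract_zip are hypothetical.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path
from typing import List

ML_20M_URL = "https://files.grouplens.org/datasets/movielens/ml-20m.zip"


def download_file(url: str, dest: str) -> Path:
    """Stream the resource at `url` into the file `dest` and return its path."""
    dest_path = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # copyfileobj streams in chunks, so the archive is never held in memory.
        shutil.copyfileobj(response, out)
    return dest_path


def extract_zip(zip_path: str, target_dir: str = ".") -> List[str]:
    """Extract every member of the archive into `target_dir`; return member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

Calling download_file(ML_20M_URL, "ml-20m.zip") and then extract_zip("ml-20m.zip") mirrors the wget and unzip commands above.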

Usage

Install Dependencies

  1. Install Python: Make sure Python is installed on your system. If not, you can download and install Python from the official Python website: https://www.python.org/downloads/

  2. Create a virtual environment:

    python -m venv myenv
  3. Activate the virtual environment

    For Windows CMD Users

    .\myenv\Scripts\Activate.bat

    For Windows Powershell Users

    .\myenv\Scripts\Activate.ps1

    For macOS/Linux Users

    source myenv/bin/activate
  4. Install the dependencies

    pip install -r requirements.txt
  5. Add wandb API key

    Sign in to https://wandb.ai and get your API key.
    Create a file named secrets.json in the root directory and put your wandb API key in it:

    {
    	"WANDB_API_KEY": "YOUR_API_KEY"
    }
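
How the training script reads secrets.json is not shown here. As a hedged sketch, the key could be loaded and exported through the WANDB_API_KEY environment variable, which the wandb client checks when authenticating. The helper name load_wandb_key is hypothetical.

```python
import json
import os
from pathlib import Path


def load_wandb_key(secrets_path: str = "secrets.json") -> str:
    """Read WANDB_API_KEY from a JSON file and export it to the environment."""
    path = Path(secrets_path)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; create it as described above")
    with path.open(encoding="utf-8") as fh:
        secrets = json.load(fh)
    key = secrets.get("WANDB_API_KEY", "")
    if not key or key == "YOUR_API_KEY":
        raise ValueError("WANDB_API_KEY is missing or still the placeholder value")
    # The wandb client reads this variable, so the key never appears in code.
    os.environ["WANDB_API_KEY"] = key
    return key
```

Because secrets.json holds a credential, it is worth keeping it out of version control, for example by listing it in .gitignore.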

Train

python train.py --data_path DATA_PATH [--emb_size EMB_SIZE] [--random_seed RANDOM_SEED] 
                [--batch_size BATCH_SIZE] [--epochs EPOCHS] [--learning_rate LEARNING_RATE] 
                [--weight_decay WEIGHT_DECAY] [--step_size STEP_SIZE] [--gamma GAMMA] 
                [--patience PATIENCE] [--model_name MODEL_NAME] [--metrics_csv_name METRICS_CSV_NAME]
                [--silent] [--log_wandb]

Required Flag

  • --data_path: Path to the CSV file containing the ratings data (e.g. ml-20m/ratings.csv after extracting the dataset above).

Optional Flags

  • --emb_size: Size of the embedding for users and items. Default is 100.
  • --random_seed: Random seed for reproducibility. Default is 42.
  • --batch_size: Batch size for training. Default is 64000.
  • --epochs: Number of epochs for training. Default is 100.
  • --learning_rate: Learning rate for optimizer. Default is 0.001.
  • --weight_decay: Weight decay for optimizer. Default is 1e-5.
  • --step_size: Step size for learning rate scheduler. Default is 10.
  • --gamma: Gamma value for learning rate scheduler. Default is 0.1.
  • --patience: Patience for early stopping based on validation loss. Default is 3.
  • --model_name: Name of the trained model file to be saved. Default is 'mf_model.pth'.
  • --metrics_csv_name: Name of the CSV file to save the training metrics. Default is 'metrics.csv'.
  • --silent: Hide verbose output during training.
  • --log_wandb: Log metrics to Weights & Biases (wandb.ai).
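
The --step_size, --gamma and --patience flags interact, so a small standalone sketch may help. It assumes a step-decay schedule (the learning rate is multiplied by gamma every step_size epochs, as in PyTorch's StepLR) and an early-stopping rule that halts after patience consecutive epochs without a new best validation loss. Both are assumptions inferred from the flag names, not taken from train.py, and the function names are hypothetical.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 10, gamma: float = 0.1) -> float:
    """Learning rate at a 0-based epoch under step decay (assumed schedule)."""
    return base_lr * gamma ** (epoch // step_size)


def early_stop_epoch(val_losses, patience: int = 3):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once `patience` consecutive epochs fail to improve on
    the best validation loss seen so far (assumed rule).
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# With the default flags, the rate shrinks tenfold at 0-based epochs 10 and 20.
for epoch in (0, 9, 10, 20):
    print(epoch, step_lr(0.001, epoch))

# Validation loss bottoms out at epoch 3, so patience=3 stops at epoch 6;
# the later improvement at epoch 7 is never reached.
print(early_stop_epoch([0.90, 0.80, 0.75, 0.76, 0.77, 0.78, 0.70]))  # 6
```

Under these assumptions, a small --patience combined with a large --step_size can end training before the first learning-rate drop ever happens.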

Test

python test.py --data_path DATA_PATH --model_path MODEL_PATH [--batch_size BATCH_SIZE] [--random_seed RANDOM_SEED]

Required Flags

  • --data_path: Path to the CSV file containing the ratings data.
  • --model_path: Path to the trained model file to be loaded for testing.

Optional Flags

  • --batch_size: Batch size for testing. Default is 64000.
  • --random_seed: Random seed for reproducibility. Default is 42.
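
test.py accepts the same --random_seed flag as train.py. One plausible reason, stated here as an assumption rather than something the source confirms, is that both scripts derive the train/test split from the seed, in which case testing with a different seed would evaluate on rows the model was trained on. The sketch below illustrates that effect with a hypothetical split_indices helper and a made-up test fraction of 0.2.

```python
import random


def split_indices(n_rows: int, test_fraction: float = 0.2, seed: int = 42):
    """Shuffle row indices with a seeded RNG and split them into (train, test)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_a, test_a = split_indices(1000, seed=42)
train_b, test_b = split_indices(1000, seed=42)
train_c, test_c = split_indices(1000, seed=7)

# Same seed: the held-out rows are identical across runs.
print(test_a == test_b)  # True

# Different seed: count of "test" rows that were in the other run's training set.
print(len(set(test_c) & set(train_a)))
```

If the scripts do work this way, passing the training seed to test.py keeps the evaluation rows disjoint from the training rows.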

Run Inference

python inference.py --data_path DATA_PATH --model_path MODEL_PATH --user_id USER_ID [--n_items N_ITEMS]

Required Flags

  • --data_path: Path to the CSV file containing the ratings data.
  • --model_path: Path to the trained model file to be loaded for inference.
  • --user_id: ID of the user to generate recommendations for.

Optional Flags

  • --n_items: Number of top-ranked items to recommend to the user. Default is 10.
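
In a matrix factorization model, a user's predicted affinity for an item is typically the dot product of the two embedding vectors, and recommending means ranking items by that score. The sketch below illustrates the idea with plain Python lists. The toy embeddings, the recommend helper, and the choice to skip already-rated items are illustrative assumptions, not the behaviour of inference.py.

```python
import heapq


def recommend(user_vec, item_vecs, seen_items=(), n_items=10):
    """Return the ids of the n_items highest-scoring unseen items.

    user_vec:   embedding of one user (list of floats)
    item_vecs:  dict mapping item id -> embedding of the same length
    seen_items: ids the user has already rated, excluded from the result
    """
    seen = set(seen_items)
    scores = {
        item_id: sum(u * v for u, v in zip(user_vec, vec))
        for item_id, vec in item_vecs.items()
        if item_id not in seen
    }
    # nlargest iterates over the dict keys and ranks them by their score.
    return heapq.nlargest(n_items, scores, key=scores.get)


# Toy 2-dimensional embeddings (made-up numbers).
user = [1.0, 0.5]
items = {
    10: [0.9, 0.1],   # score 0.95
    20: [0.2, 0.8],   # score 0.60
    30: [1.0, 1.0],   # score 1.50
    40: [-0.5, 0.3],  # score -0.35
}
print(recommend(user, items, seen_items=[30], n_items=2))  # [10, 20]
```

Item 30 has the highest score but is left out because the user has already rated it, so the next two items are returned.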

Plot Curve

python plot.py --metrics_csv_path METRICS_CSV_PATH [--patience PATIENCE] [--file_name FILE_NAME]

Required Flags

  • --metrics_csv_path: Path to the CSV file containing the metrics data. The file must have the columns 'Epoch', 'Train Loss', and 'Val Loss'.

Optional Flags

  • --patience: Patience for early stopping. Default is None.
  • --file_name: The name for saving the plot. Default is loss_curve.png.
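
Before plotting, it can be useful to check that a metrics file has the expected columns and to see which epoch had the lowest validation loss. A minimal sketch using only the standard library follows; summarize_metrics is a hypothetical helper and not part of plot.py.

```python
import csv

EXPECTED_COLUMNS = ("Epoch", "Train Loss", "Val Loss")


def summarize_metrics(csv_path):
    """Validate the metrics CSV and return (best_epoch, best_val_loss)."""
    with open(csv_path, newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        missing = [col for col in EXPECTED_COLUMNS if col not in header]
        if missing:
            raise ValueError(f"metrics CSV is missing columns: {missing}")
        rows = [(int(float(r["Epoch"])), float(r["Val Loss"])) for r in reader]
    if not rows:
        raise ValueError("metrics CSV contains no rows")
    # Lowest validation loss wins; on a tie, the first such row is returned.
    return min(rows, key=lambda row: row[1])
```

For a file written by train.py under the default name, summarize_metrics("metrics.csv") would return the epoch number and loss of the best validation result.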

Contributors

sunilgolden
