RAG-Driver

The original repository is here. Thanks to YuanJianhao508 for such amazing work.

This repository focuses only on running the program on the CSC cluster.

Project Structure

Installation

NOTE: The virtual environment requires two modules to be loaded: tykky and cuda.

git clone https://github.com/timbrist/RAG-Driver.git
cd RAG-Driver # this folder is the WORKSPACE from now on; it is the only directory change needed
export CW_DEBUG_KEEP_FILES=$(pwd)
bash rag_env/create_rag_env.sh

After the installation, we need to add some additional packages. This is done separately because tykky cannot run pip install flash-attn --no-build-isolation while creating the environment, and it does not accept extra parameters when creating a new environment.

Please check the create_rag_env.sh file to make sure the CUDA version is 11.7 or above.

module spider cuda # lists the CUDA versions your system supports; please use cuda/11.7 or newer
module load cuda
conda-containerize update --post-install rag_env/restpackages.sh ./rag_env
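
The version requirement above can be checked before running the update. The following is only a minimal sketch, assuming nvcc is on the PATH once the cuda module is loaded and that GNU sort (with -V) is available. The helper names version_ge and cuda_release are hypothetical and are not part of this repository.

```shell
# version_ge A B: succeed when version A >= version B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# cuda_release: read `nvcc --version` output on stdin and print e.g. "11.8".
# Assumes a line of the form "... release 11.8, V11.8.89".
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Minimum version taken from the note above.
REQUIRED_CUDA="11.7"

if command -v nvcc >/dev/null 2>&1; then
    found="$(nvcc --version | cuda_release)"
    if [ -n "$found" ] && version_ge "$found" "$REQUIRED_CUDA"; then
        echo "CUDA $found is new enough"
    else
        echo "CUDA '${found}' does not satisfy >= $REQUIRED_CUDA" >&2
    fi
else
    echo "nvcc not found; load a cuda module first" >&2
fi
```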

Data Preparation

Download checkpoint models

This step is for people who cannot use git lfs to download files from Hugging Face. The script downloads the checkpoint models automatically.

NOTE: The checkpoint models take up at least 46 GB. If the current directory does not have enough space, change MODELS_DIR in download.sh to the desired directory: export MODELS_DIR=path/to/models. In that case, you must also set the model path in scripts/finetune.sh.

bash models/download.sh
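
Since the download needs at least 46 GB, it may help to check the free space first. This is a rough sketch, not part of the repository: free_gb and has_space are hypothetical helpers, and the fallback to the current directory when MODELS_DIR is unset is an assumption. On clusters with per-project quotas, df may report more space than the quota actually allows.

```shell
# free_gb DIR: print the free space of the filesystem holding DIR, in whole GB.
free_gb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# has_space DIR NEEDED_GB: succeed when at least NEEDED_GB are free under DIR.
has_space() {
    [ "$(free_gb "$1")" -ge "$2" ]
}

# Assumption: check MODELS_DIR if it is set, otherwise the current directory.
target="${MODELS_DIR:-.}"

if has_space "$target" 46; then
    echo "Enough space in $target for the checkpoints"
else
    echo "Less than 46 GB free in $target; set MODELS_DIR to another directory" >&2
fi
```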

Download processed BDD-X dataset

A processed version of BDD-X is available from here.

If you download the dataset manually, please unzip the file into the video_process folder.

Then run the following command:

bash video_process/download_bdd.sh

Usage

This step could not be automated. Several settings have to be configured by hand.

NOTE: Please remember to change:
  • MODELS_DIR in scripts/finetune.sh
  • CACHESPACE in run_rag.sh and test_rag.sh
  • --account=<project>: specify your project name, such as project_2010795

The scripts follow the example for Puhti.
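
To review these settings before submitting, a small helper can list every line that mentions them. This is only a sketch: show_settings is a hypothetical helper, and the file paths in the usage comment are assumed from the commands in this README.

```shell
# show_settings FILE...: print each line that mentions one of the settings
# named above, prefixed with file name and line number.
show_settings() {
    grep -nH -E -e 'MODELS_DIR|CACHESPACE|--account' -- "$@"
}

# Example usage (paths assumed from this README):
#   show_settings scripts/finetune.sh slurm_jobs/run_rag.sh slurm_jobs/test_rag.sh
```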

Testing

We first test that everything works before submitting the job to the expensive GPU cluster.

sbatch ./slurm_jobs/test_rag.sh
tail -f slurm-*

Finetuning

sbatch ./slurm_jobs/run_rag.sh
tail -f slurm-*

Finetuning takes about 12 hours.
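
As an alternative to submitting the two jobs by hand, Slurm can start the finetuning job only if the test job succeeds. The following is a sketch under assumptions, not the workflow this repository ships: submit_chain is a hypothetical helper, and the SBATCH variable exists only so that the command can be replaced with a stub for a dry run. It uses the sbatch options --parsable and --dependency=afterok.

```shell
# SBATCH can be overridden (e.g. with a stub) for a dry run; defaults to the real sbatch.
SBATCH="${SBATCH:-sbatch}"

# submit_chain TEST_SCRIPT RUN_SCRIPT: submit TEST_SCRIPT, then submit RUN_SCRIPT
# so that it starts only after TEST_SCRIPT finishes successfully.
# Prints the sbatch output for the second job.
submit_chain() {
    test_id="$("$SBATCH" --parsable "$1")" || return 1
    test_id="${test_id%%;*}"   # --parsable may print "jobid;cluster"
    "$SBATCH" --parsable --dependency="afterok:${test_id}" "$2"
}

# Example usage:
#   submit_chain ./slurm_jobs/test_rag.sh ./slurm_jobs/run_rag.sh
```

If the test job fails, the dependent job never starts and stays in the queue until it is cancelled, unless sbatch is given --kill-on-invalid-dep=yes.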
