ulamaca / 1st-place-solution-single-cell-pbs

This project forked from jean-kouagou/1st-place-solution-single-cell-pbs

This repository implements the winning solution for the single cell perturbation problems

License: MIT License

Python 100.00%

1st-place-solution-single-cell-pbs

Installation

Make sure Anaconda3 is installed, then run the following steps:

Clone this repository: git clone https://github.com/Jean-KOUAGOU/1st-place-solution-single-cell-pbs.git
Create and activate a conda environment: conda create -n single_cell_env python==3.9.0 --y && conda activate single_cell_env
Install all required packages in the environment (run from the repository's main directory): pip install -r requirements.txt

Dependencies

python 3.9.0
pandas 2.1.3
pyarrow 14.0.1
tqdm 4.66.1
scikit-learn 1.3.2
torch 2.1.1
transformers 4.35.2
matplotlib 3.8.2
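
To confirm that the active environment matches the versions listed above, a small check along the following lines may help. It is a minimal sketch and not part of the repository: the helper names installed_version and find_mismatches are illustrative, and only the standard-library importlib.metadata module is used.

```python
from importlib import metadata

# Pinned versions copied from the dependency list above.
PINNED = {
    "pandas": "2.1.3",
    "pyarrow": "14.0.1",
    "tqdm": "4.66.1",
    "scikit-learn": "1.3.2",
    "torch": "2.1.1",
    "transformers": "4.35.2",
    "matplotlib": "3.8.2",
}


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package that differs from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # A local build tag such as "2.1.1+cu121" still counts as a match.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Running the script inside single_cell_env prints one line per package that is missing or installed at a different version, and prints nothing when everything matches.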

Hardware

Ubuntu 20.04.6 LTS (Kaggle): AMD EPYC 7B12 CPU @ 2.25 GHz (4 CPUs), 30 GB RAM, 1x Tesla P100 GPU (16 GB), 73 GB disk
Also tested on Debian GNU/Linux 11: 1x Nvidia RTX 3090 GPU (24 GB), 252 GB RAM, 500 GB disk

Preprocessing

Create a folder called data/ in the main directory.
Add the training data in parquet format (e.g., de_train.parquet, as in the competition) to data/ and set its path in config/config_train.json. Enter the same path in config/config_test.json, since the training data is needed to read columns at inference time.
Add the test data and a sample submission file (both CSV files) to the same data/ directory and edit config/config_test.json accordingly. config/config_test.json must also specify the paths to the trained models and to the training data.
Run python prepare_data.py to complete all required preprocessing steps.
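
Because several paths have to agree across the two config files, it can be useful to check them before running anything. The sketch below is an illustration under stated assumptions, not the repository's code: it assumes each config file is a flat JSON object, and it treats every string value as a path, so a string setting that is not a path would be reported as well. The key names used by the real config files are not shown in this README, and missing_paths is a hypothetical helper.

```python
import json
from pathlib import Path


def missing_paths(config_file):
    """Return config entries whose string value is not an existing path.

    Assumption: path settings are stored as top-level string values.
    """
    config = json.loads(Path(config_file).read_text())
    missing = {}
    for key, value in config.items():
        if isinstance(value, str) and not Path(value).exists():
            missing[key] = value
    return missing


if __name__ == "__main__":
    for cfg in ("config/config_train.json", "config/config_test.json"):
        if not Path(cfg).exists():
            print(f"{cfg}: config file not found")
            continue
        for key, value in missing_paths(cfg).items():
            print(f"{cfg}: '{key}' points to a missing path: {value}")
```

Run from the main directory, the script reports each entry that points to a file or folder that does not exist yet, such as a trained-models directory before training has been run.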

Training

Make sure you are in the main directory.

Run python train.py to train the models. This automatically creates a directory called trained_models and stores the trained models there. The number of folds in the K-fold cross-validation scheme can be set with python train.py --kf_n_splis, and the number of epochs with --epochs.
To skip training, pretrained models can also be downloaded (see the link on Kaggle).
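
For readers unfamiliar with K-fold cross-validation, the following sketch shows how a dataset is partitioned into folds, with each fold serving once as the validation set. It is a generic, unshuffled illustration in plain Python and is not taken from train.py; the function name k_fold_indices is hypothetical. In practice, scikit-learn's KFold class provides the same kind of split.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_indices, validation_indices) for each of n_splits folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


if __name__ == "__main__":
    for fold, (train, validation) in enumerate(k_fold_indices(10, 3)):
        print(f"fold {fold}: train={train} validation={validation}")
```

With 10 samples and 3 folds, the validation sets are [0, 1, 2, 3], [4, 5, 6] and [7, 8, 9], and each model is trained on the remaining indices. More folds mean more models to train, so training time grows roughly in proportion to the number of folds.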

Predicting

Check that there is a non-empty directory named trained_models and that its path is specified in config/config_test.json.

Run python predict.py to predict on the test data whose path is specified in config/config_test.json. This automatically creates a submissions directory and stores the predictions in submissions/submission.csv.
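
The two conditions described above (a non-empty trained_models directory, and an output file at submissions/submission.csv) can be sketched as follows. This is a hedged illustration, not the logic of predict.py: write_submission is a hypothetical helper, and the column names in the usage example are placeholders rather than the competition's actual header.

```python
import csv
from pathlib import Path


def write_submission(rows, header, models_dir="trained_models", out_dir="submissions"):
    """Write rows to <out_dir>/submission.csv after checking that models exist."""
    models = Path(models_dir)
    if not models.is_dir() or not any(models.iterdir()):
        raise FileNotFoundError(
            f"{models_dir} is missing or empty; train or download models first"
        )
    out_path = Path(out_dir) / "submission.csv"
    # Create the output directory on first use, as the prediction step does.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return out_path
```

For example, write_submission([[0, 0.1], [1, -0.2]], ["id", "gene_a"]) writes a three-line CSV file, provided trained_models contains at least one file; otherwise it raises FileNotFoundError before anything is written.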
