idospringer / ergo-ii

ERGO-II, an updated version of ERGO including more features for TCR-peptide binding prediction

License: MIT License

Python 100.00%

ergo-ii's Issues

Using ERGO

I have CDR3 sequences from MiXCR.

I want to use ERGO.

ERGO needs two columns, CDR3 and peptide.

I don't know where to get the peptide column.

Could you please give me some guidance on this?
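
One possible way to fill the peptide column is to pair each CDR3 with every candidate peptide you want to screen against. The sketch below assumes that approach; it is not taken from the ERGO-II documentation. The column order is copied from the example file quoted in a later issue on this page, the sequences are placeholders, and whether the tool accepts empty optional fields is an assumption.

```python
import csv
import io
from itertools import product

# Column order copied from the example input quoted later on this page.
HEADER = ["TRA", "TRB", "TRAV", "TRAJ", "TRBV", "TRBJ",
          "T-Cell-Type", "Peptide", "MHC"]


def build_pairs(cdr3_betas, peptides):
    """Pair every CDR3 beta sequence with every candidate peptide."""
    rows = []
    for cdr3, peptide in product(cdr3_betas, peptides):
        row = dict.fromkeys(HEADER, "")  # unknown fields stay empty (assumption)
        row["TRB"] = cdr3
        row["Peptide"] = peptide
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with the header line first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=HEADER, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder inputs: CDR3s exported from MiXCR and peptides of interest.
cdr3_from_mixcr = ["CASGAQGTNTEAFF", "CSAAVGRVTGELFF"]
candidate_peptides = ["KLLPENNVL"]
print(to_csv(build_pairs(cdr3_from_mixcr, candidate_peptides)))
```

The candidate peptides have to come from the biological question (for example, epitopes of a pathogen of interest); the code only handles the pairing.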

Thank you in advance

Q1) I tried to run ERGO-II, but it failed with an error message (UnboundLocalError: local variable 'version' referenced before assignment).
May I ask what is going wrong?
Q2) http://tcr2.cs.biu.ac.il/home is not working. Is it down for maintenance?

Thanks,
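
For readers unfamiliar with this error, here is a generic Python illustration of how it typically arises: a variable is assigned only inside conditional branches, and none of them runs. Whether Predict.py follows this exact pattern is an assumption; the function names and version strings below are hypothetical.

```python
def pick_version(dataset):
    # 'version' is assigned only inside the branches below.
    if dataset == "vdjdb":
        version = "v1"  # placeholder value
    elif dataset == "mcpas":
        version = "v2"  # placeholder value
    return version  # UnboundLocalError if neither branch ran


def pick_version_checked(dataset):
    """Same lookup, but with an explicit error for unknown names."""
    versions = {"vdjdb": "v1", "mcpas": "v2"}
    if dataset not in versions:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {sorted(versions)}"
        )
    return versions[dataset]


try:
    pick_version("VDJdb")  # wrong spelling, so no branch matches
except UnboundLocalError as err:
    print("reproduced:", type(err).__name__)
```

If the cause is similar, checking the spelling of the dataset argument on the command line would be a reasonable first step.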

Instructions for training a new model

Hi there, thanks a lot for your contribution to ERGO and ERGO-II, which are really useful.
I want to train a new model on different datasets; however, the training instructions are missing from the README file.
I would appreciate it if you could add it. Thank you very much.

Performance on the Dash et al. dataset is below expectations.

Hi there, thanks a lot for your efforts on ERGO and ERGO II, which have helped me a lot.
ERGO performs quite well in my testing, while ERGO II seems to perform below my expectations.
Here are my results.

I have double-checked the input data with my colleagues and there seems to be nothing wrong.
The original dataset, input files and output files are attached below.
all files of ERGO2.zip
I would appreciate it if you could check whether there is anything wrong. Thanks a lot in advance for any replies.

Reproduce training

Hello,

Could you please document in the repository how to run the training in order to reproduce the paper results?

You only documented how to run

python Predict.py dataset file

It would be nice to see how to train the models in order to obtain the paper results.

Thanks.

Some code and model path errors.

Dear author.

Hello.

Thanks for your nice tools.

While using ERGO-II, I found some errors in the code.

TCR ae model location error.

python3 ./Predict.py vdjdb ./example.csv
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/decorators.py:13: UserWarning: data_loader decorator deprecated in 0.7.0. Will remove 0.9.0
  warnings.warn(w)
Traceback (most recent call last):
  File "./Predict.py", line 104, in <module>
    df = predict(sys.argv[1], sys.argv[2])
  File "./Predict.py", line 86, in predict
    model, train_file = get_model(dataset)
  File "./Predict.py", line 72, in get_model
    model = load_model(hparams, checkpoint)
  File "./Predict.py", line 43, in load_model
    model = ERGOLightning(hparams)
  File "/opt/ERGO-II/Trainer.py", line 49, in __init__
    self.tcra_encoder = AE_Encoder(encoding_dim=self.encoding_dim, tcr_type='alpha', max_len=34)
  File "/opt/ERGO-II/Models.py", line 102, in __init__
    self.init_ae_params(train_ae)
  File "/opt/ERGO-II/Models.py", line 110, in init_ae_params
    checkpoint = torch.load(ae_file)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 525, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 212, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 193, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'TCR_Autoencoder/tcra_ae_dim_100.pt'

So I copied the directory (from inside the ERGO-II directory):
cp -r ./Models/AE ./TCR_Autoencoder
and now it works.
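
As an alternative to copying the directory, the lookup could try both locations. This is a minimal sketch, not the repository's code: `find_ae_file` is a hypothetical helper, and the two directory names are taken from the traceback and the `cp` command above.

```python
from pathlib import Path


def find_ae_file(root, filename, candidates=("TCR_Autoencoder", "Models/AE")):
    """Return the first existing path to `filename` under the candidate dirs."""
    tried = []
    for sub in candidates:
        path = Path(root) / sub / filename
        if path.is_file():
            return path
        tried.append(str(path))
    raise FileNotFoundError(
        f"{filename} not found; looked in: {', '.join(tried)}"
    )


# Example call (file name taken from the traceback above):
# ae_file = find_ae_file(".", "tcra_ae_dim_100.pt")
```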

Torch CPU-only error.

python3 ./Predict.py vdjdb ./example.csv 
/usr/local/lib/python3.8/dist-packages/pytorch_lightning/core/decorators.py:13: UserWarning: data_loader decorator deprecated in 0.7.0. Will remove 0.9.0
  warnings.warn(w)
Traceback (most recent call last):
  File "./Predict.py", line 105, in <module>
    df = predict(sys.argv[1], sys.argv[2])
  File "./Predict.py", line 87, in predict
    model, train_file = get_model(dataset)
  File "./Predict.py", line 73, in get_model
    model = load_model(hparams, checkpoint)
  File "./Predict.py", line 43, in load_model
    model = ERGOLightning(hparams)
  File "/opt/ERGO-II/Trainer.py", line 49, in __init__
    self.tcra_encoder = AE_Encoder(encoding_dim=self.encoding_dim, tcr_type='alpha', max_len=34)
  File "/opt/ERGO-II/Models.py", line 102, in __init__
    self.init_ae_params(train_ae)
  File "/opt/ERGO-II/Models.py", line 110, in init_ae_params
    checkpoint = torch.load(ae_file)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 529, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 702, in _legacy_load
    result = unpickler.load()
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 665, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 156, in default_restore_location
    result = fn(storage, location)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 132, in _cuda_deserialize
    device = validate_cuda_device(location)
  File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 116, in validate_cuda_device
    raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I am using a CPU-only machine, so torch.load needs map_location to be set.
In ERGO-II/Models.py: checkpoint = torch.load(ae_file) -> checkpoint = torch.load(ae_file, map_location=torch.device('cpu'))
With this change it works well.

I hope this is helpful.

Best regards.

Jeongjun Chae

Website Not working

Hello,
I see the website is back online for using ERGO-II. However, every time I try to submit a prediction I get the bad input error message.
My file has exactly the same format as the example file:

TRA,TRB,TRAV,TRAJ,TRBV,TRBJ,T-Cell-Type,Peptide,MHC
CATKTPTGTASKLTF,CSAAVGRVTGELFF,TRAV17,TRAJ44,TRBV20-1,TRBJ2-2,CD8,AAAAAIFVI,
CATKTPTGTASKLTF,CSAAVGRVTGELFF,TRAV17,TRAJ44,TRBV20-1,TRBJ2-2,CD8,AAAAALDKKQRNFDKILA,
CATKTPTGTASKLTF,CSAAVGRVTGELFF,TRAV17,TRAJ44,TRBV20-1,TRBJ2-2,CD8,AAAACTTMK,
CATKTPTGTASKLTF,CSAAVGRVTGELFF,TRAV17,TRAJ44,TRBV20-1,TRBJ2-2,CD8,AAAAGWQTL,
CATKTPTGTASKLTF,CSAAVGRVTGELFF,TRAV17,TRAJ44,TRBV20-1,TRBJ2-2,CD8,AAAAKLAGLVFPQPPAPIAV,
CATKTPTGTASKLTF,CSAAVGRVTGELFF,TRAV17,TRAJ44,TRBV20-1,TRBJ2-2,CD8,AAAASTYTGF,

...
It is 48K lines long; maybe that is too much? Whatever parameters I choose, the error remains.
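
One way to test the size hypothesis is to split the file into smaller pieces that each keep the header, then submit them one at a time. The sketch below assumes nothing about the server: whether it enforces a row limit is unknown, and `split_csv` and the chunk size are hypothetical.

```python
import csv
import io


def split_csv(text, max_rows):
    """Split CSV text into chunks of at most `max_rows` data rows.

    Every chunk repeats the header line so it is a valid file on its own.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    chunks, current = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        current.append(row)
        if len(current) == max_rows:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)

    outputs = []
    for chunk in chunks:
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(chunk)
        outputs.append(buf.getvalue())
    return outputs


# Example: write each piece to its own file (file names are placeholders).
# with open("input.csv", newline="") as f:
#     for i, part in enumerate(split_csv(f.read(), max_rows=1000)):
#         with open(f"input_part{i}.csv", "w", newline="") as out:
#             out.write(part)
```

If a small chunk is accepted while the full file is not, size is the likely cause; if even a small chunk fails, the problem lies elsewhere.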

Dataset name - `mcpas_human`?

Hello, in the code, I see you write: mcpas_human_train_samples.pickle.

In the repo, the file has a different name: mcpas_train_samples.pickle.

Do these two names refer to the same files? Or is a file missing?

Thanks

Inquiry about pre-trained AE models in ERGO-II

Hi Ido,

I recently came across your “ERGO-II” paper and I found the idea of using TCR alpha and beta CDR3, MHC typing, VJ genes in peptide binding problems intriguing. Also, the performance in your paper is very impressive!

I noticed that two pre-trained AE models (one for TCR beta, the other for TCR alpha) were used in the paper. I am very interested in training the models myself; however, I need to compare my models with yours to make sure I did it correctly. Could you kindly share those pre-trained models with me? I was able to easily find the pre-trained AE model for the original ERGO, but I can’t seem to find the pre-trained AE models that were used for ERGO-II.

I appreciate your time and look forward to hearing from you soon.
Best regards,

Pengfei Zhang

LICENSE

Hi there, thanks for making ERGO-II public. Could you please clarify the license? Thanks a lot!

ERGO-II output Score

Hi,

Can you give me some information about the meaning of the score provided in the ERGO-II result table?
In particular, I'm wondering whether I should interpret it as a binding probability (the higher the score, the better the prediction) or like a p-value (the lowest score is the best prediction).

Thank you,

Elisa

Suggestion to handle model deserialization on CPU-only machines

I recently encountered an issue when using your software on a CPU-only machine. The problem occurs when attempting to load a model that was trained on a CUDA device while running the code on a machine without GPU support. The specific error message is:

  warnings.warn(w)
Traceback (most recent call last):
  File "Predict.py", line 103, in <module>
    df = predict(sys.argv[1], sys.argv[2])
  File "Predict.py", line 85, in predict
    model, train_file = get_model(dataset)
  File "Predict.py", line 71, in get_model
    model = load_model(hparams, checkpoint)
  File "Predict.py", line 42, in load_model
    model = ERGOLightning(hparams)
...
To make your software more versatile and compatible with CPU-only machines, I suggest modifying the line in Models.py where the model is being loaded:

checkpoint = torch.load(ae_file)

Change it to:

checkpoint = torch.load(ae_file, map_location=torch.device('cpu'))

By adding the map_location parameter, the model will be loaded on the CPU even if it was originally trained on a CUDA device. This small change will make your software more accessible to users without GPU support.
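
A variant of this suggestion that keeps the default behavior on GPU machines could look like the sketch below. The helper names are hypothetical and not part of ERGO-II; `torch.load`, its `map_location` parameter, and `torch.cuda.is_available()` are the only PyTorch calls used. `torch` is imported lazily so the device-selection helper can be used on its own.

```python
def pick_map_location(cuda_available):
    """Return the map_location argument to pass to torch.load.

    None keeps PyTorch's default (tensors are restored to the device they
    were saved from); "cpu" forces every storage onto the CPU.
    """
    return None if cuda_available else "cpu"


def load_checkpoint(path):
    """Load a checkpoint on either a GPU or a CPU-only machine."""
    import torch  # imported here so pick_map_location works without torch

    location = pick_map_location(torch.cuda.is_available())
    return torch.load(path, map_location=location)


# Hypothetical replacement for the line in Models.py:
# checkpoint = load_checkpoint(ae_file)
```

Compared with hard-coding `torch.device('cpu')`, this leaves GPU users unaffected while still fixing the CPU-only case.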

Thank you for your attention to this matter, and I hope this suggestion is helpful.

Error with "Bad input-file, please choose a different input file"

Dear Springer,

Thanks for developing this fantastic package. It is really useful.

When I use the web server, it says "Bad input-file, please choose a different input file". Here is my test CSV file:

TRA,TRB,TRAV,TRAJ,TRBV,TRBJ,T-Cell-Type,Peptide,MHC
,CASGAQGTNTEAFF,,,TRBV5-4,TRBJ1-1,CD8,KLLPENNVL,HLA-B08
CAAGYFTGGGNKLTF,,TRAV23/DV6,TRAJ10,,,CD8,KLLPENNVL,HLA-B08
CAVSYSGYSTLTF,,TRAV1-2,TRAJ11,,,CD8,KLLPENNVL,HLA-B08
,CSAVDTTSSTDTQYF,,,TRBV20-1,TRBJ2-3,CD8,KLLPENNVL,HLA-B08
CAVRPRGTGGFKTIF,,TRAV1-1,TRAJ9,,,CD8,KLLPENNVL,HLA-B08
,CASSQAQGGYEQYF,,,TRBV14,TRBJ2-7,CD8,KLLPENNVL,HLA-B08
,CASSLTDNSYEQYF,,,TRBV11-2,TRBJ2-7,CD8,KLLPENNVL,HLA-B08
CLVGGGYSGSARQLTF,,TRAV4,TRAJ22,,,CD8,KLLPENNVL,HLA-B08
CAENGGISSGSARQLTF,,TRAV13-2,TRAJ22,,,CD8,KLLPENNVL,HLA-B08
CAMRPSGGYQKVTF,,TRAV14/DV4,TRAJ13,,,CD8,KLLPENNVL,HLA-B08
,CASSLDRNLDTGELFF,,,TRBV11-2,TRBJ2-2,CD8,KLLPENNVL,HLA-B08
CAMRLLYNFNKFYF,,TRAV14/DV4,TRAJ21,,,CD8,KLLPENNVL,HLA-B08
,CASSKRQQVNEQFF,,,TRBV19,TRBJ2-1,CD8,KLLPENNVL,HLA-B08
,CASSLVLGATGELSF,,,TRBV13,TRBJ2-2,CD8,KLLPENNVL,HLA-B08
CATDALAAGNKLTF,,TRAV17,TRAJ17,,,CD8,KLLPENNVL,HLA-B08
,CASSFARGRADTQYN,,,TRBV28,TRBJ2-3,CD8,KLLPENNVL,HLA-B08
CAGRASGTSYGKLTF,,TRAV35,TRAJ52,,,CD8,KLLPENNVL,HLA-B08
CALTRANSKLTF,,TRAV9-2,TRAJ56,,,CD8,KLLPENNVL,HLA-B08
CAASPNTGNQFYF,,TRAV29/DV5,TRAJ49,,,CD8,KLLPENNVL,HLA-B08
,CASSLSGGGHGYTF,,,TRBV28,TRBJ1-2,CD8,KLLPENNVL,HLA-B08
CAMRENHRDDKIIF,,TRAV14/DV4,TRAJ30,,,CD8,KLLPENNVL,HLA-B08
,CACDEWGVSTDKLIF,,,TRDV2,TRDJ1,CD8,KLLPENNVL,HLA-B08
,CSAPITGSYEQYF,,,TRBV20-1,TRBJ2-7,CD8,KLLPENNVL,HLA-B08
CAASPSMNRQLTF,,TRAV23/DV6,TRAJ22,,,CD8,KLLPENNVL,HLA-B08
,CASSLLPGHLTDPGGEQYF,,,TRBV7-2,TRBJ2-7,CD8,KLLPENNVL,HLA-B08
CALSDELTGANNLFF,,TRAV9-2,TRAJ36,,,CD8,KLLPENNVL,HLA-B08

Would you kindly tell me what is going wrong?

Best,
Yingcheng
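
The validation rules applied by the web server are not documented on this page, so the cause of the rejection cannot be determined from the file alone. As a first step, a local sanity check can rule out basic formatting problems. The sketch below assumes the nine-column header shown above and the 20 standard amino-acid letters; `check_input` is a hypothetical helper and does not reproduce the server's checks.

```python
import csv
import io

EXPECTED_HEADER = ["TRA", "TRB", "TRAV", "TRAJ", "TRBV", "TRBJ",
                   "T-Cell-Type", "Peptide", "MHC"]
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard residues


def check_input(text):
    """Return a list of human-readable problems found in the CSV text."""
    problems = []
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        problems.append("line 1: unexpected header")
    for lineno, row in enumerate(reader, start=2):
        if not row:  # ignore blank lines
            continue
        if len(row) != len(EXPECTED_HEADER):
            problems.append(
                f"line {lineno}: expected {len(EXPECTED_HEADER)} fields, "
                f"got {len(row)}"
            )
            continue
        record = dict(zip(EXPECTED_HEADER, row))
        for col in ("TRA", "TRB", "Peptide"):
            bad = set(record[col]) - AMINO_ACIDS
            if bad:
                problems.append(
                    f"line {lineno}: {col} has non-standard characters "
                    f"{sorted(bad)}"
                )
        if not record["TRA"] and not record["TRB"]:
            problems.append(f"line {lineno}: both TRA and TRB are empty")
        if not record["Peptide"]:
            problems.append(f"line {lineno}: Peptide is empty")
    return problems
```

An empty result only means the file passes these basic checks; the server may still reject it for reasons not covered here, such as gene or MHC naming.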

Training Data

Hi Ido,

Could you please share the training data for ERGO-II? Also, is the training data for the previous ERGO model the same as for ERGO-II?