kavrakilab / spec2mol

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

spec2mol's Introduction

Spec2Mol

Spec2Mol is a deep learning architecture for recommending molecular structures from MS/MS spectra.

Spec2Mol is an encoder-decoder architecture: the encoder creates an embedding from a given set of MS/MS spectra, and the decoder reconstructs the molecular structure, in SMILES format, from the embedding that the encoder generates.
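To make the data flow concrete, here is a hypothetical toy PyTorch sketch of an encoder-decoder of this kind. It is not Spec2Mol's actual model: the class names, the 1,000-bin spectrum vectors, the layer sizes and the tanh-bounded embedding are illustrative assumptions. The vocabulary size of 39 matches the token dictionary printed in the first issue on this page.

```python
import torch
from torch import nn


class ToySpectraEncoder(nn.Module):
    """Hypothetical encoder: maps four binned spectra to one embedding vector."""

    def __init__(self, n_bins: int = 1000, n_spectra: int = 4, emb_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins * n_spectra, 1024),
            nn.ReLU(),
            nn.Linear(1024, emb_dim),
            nn.Tanh(),  # assumption: bounded embedding with values in [-1, 1]
        )

    def forward(self, spectra: torch.Tensor) -> torch.Tensor:
        # spectra: (batch, n_spectra, n_bins); concatenate the four spectra per sample
        return self.net(spectra.flatten(start_dim=1))


class ToySmilesDecoder(nn.Module):
    """Hypothetical decoder: a GRU predicting the next SMILES token from the embedding."""

    def __init__(self, vocab_size: int, emb_dim: int = 512, hidden: int = 256):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden)
        self.init_hidden = nn.Linear(emb_dim, hidden)  # embedding -> initial GRU state
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, z: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        # z: (batch, emb_dim); tokens: (batch, seq_len) token ids (teacher forcing)
        h0 = torch.tanh(self.init_hidden(z)).unsqueeze(0)  # (num_layers=1, batch, hidden)
        y, _ = self.gru(self.token_emb(tokens), h0)
        return self.out(y)  # (batch, seq_len, vocab_size) next-token logits


encoder = ToySpectraEncoder()
decoder = ToySmilesDecoder(vocab_size=39)
spectra = torch.rand(2, 4, 1000)          # two samples, four spectra each
tokens = torch.randint(0, 39, (2, 20))    # random token ids standing in for SMILES
logits = decoder(encoder(spectra), tokens)
print(logits.shape)  # torch.Size([2, 20, 39])
```

The point of the split is that the decoder only ever sees the embedding, so a SMILES decoder can be reused with any encoder that produces embeddings in the same space.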

The implementation of the Spec2Mol architecture is based on the PyTorch library.

Processing of the chemical data is based on the RDKit software.

Installation

Create a conda environment:

conda create -n spec2mol python=3.7
source activate spec2mol
conda install rdkit -c rdkit
conda install pytorch=1.6.0 torchvision -c pytorch
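After activating the environment, a quick check like the following confirms that rdkit and torch resolve and whether CUDA is available. It is a sketch using only the standard library; check_environment is a hypothetical helper name. If cuda reports False, GPU-only code paths will not run on that machine.

```python
import importlib.util
import sys


def check_environment() -> dict:
    """Report whether the packages installed above are importable here."""
    report = {
        "python": "%d.%d" % sys.version_info[:2],
        "rdkit": importlib.util.find_spec("rdkit") is not None,
        "torch": importlib.util.find_spec("torch") is not None,
        "cuda": False,
    }
    if report["torch"]:
        import torch  # only imported when it is actually installed

        report["torch_version"] = torch.__version__
        report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```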

Generate spectra embeddings:

python predict_embs.py -pos_low_file 'sample_data/[M+H]_low.csv' \
                       -pos_high_file 'sample_data/[M+H]_high.csv' \
                       -neg_low_file 'sample_data/[M-H]_low.csv' \
                       -neg_high_file 'sample_data/[M-H]_high.csv'

where pos_low_file, pos_high_file, neg_low_file and neg_high_file are the CSV files containing the four input spectra:

pos_low_file: precursor [M+H]+, energy 35% NCE (Normalized Collision Energy)

pos_high_file: precursor [M+H]+, energy 130% NCE

neg_low_file: precursor [M-H]-, energy 35% NCE

neg_high_file: precursor [M-H]-, energy 130% NCE
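The shell command quotes the file names because [ and ] are glob characters. If you prefer launching the script from Python, a minimal sketch can build the same argument list; passing a list to subprocess.run bypasses the shell, so no quoting is needed. SPECTRA_INPUTS and build_predict_command are hypothetical names, and the mapping only restates the list above.

```python
import shlex
import sys

# Flag name -> (precursor ion, collision energy in % NCE), as listed above.
SPECTRA_INPUTS = {
    "pos_low_file": ("[M+H]+", 35),
    "pos_high_file": ("[M+H]+", 130),
    "neg_low_file": ("[M-H]-", 35),
    "neg_high_file": ("[M-H]-", 130),
}


def build_predict_command(files: dict) -> list:
    """Build the argv list for predict_embs.py, failing early if a spectrum is missing."""
    missing = [name for name in SPECTRA_INPUTS if name not in files]
    if missing:
        raise ValueError("missing input spectra: " + ", ".join(missing))
    argv = [sys.executable, "predict_embs.py"]
    for name in SPECTRA_INPUTS:  # dicts keep insertion order (Python 3.7+)
        argv += ["-" + name, files[name]]
    return argv


files = {
    "pos_low_file": "sample_data/[M+H]_low.csv",
    "pos_high_file": "sample_data/[M+H]_high.csv",
    "neg_low_file": "sample_data/[M-H]_low.csv",
    "neg_high_file": "sample_data/[M-H]_high.csv",
}
# Show the equivalent shell command (shlex.join needs 3.8+, so quote manually).
print(" ".join(shlex.quote(arg) for arg in build_predict_command(files)))
```

To actually run it: subprocess.run(build_predict_command(files), check=True) from the repository root.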

Each CSV file has the m/z values in the first column and the intensity values in the second column, separated by commas. See the sample_data directory for examples.
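A minimal reader for this two-column format, as a sketch: read_spectrum is a hypothetical helper, and skipping non-numeric rows assumes any such row is a header line. The repository's own loader may handle headers differently.

```python
import csv
from typing import List, Tuple


def read_spectrum(path: str) -> List[Tuple[float, float]]:
    """Read a spectrum CSV: m/z in column 1, intensity in column 2; return peaks sorted by m/z."""
    peaks = []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:
                continue  # skip blank lines
            try:
                mz, intensity = float(row[0]), float(row[1])
            except ValueError:
                continue  # assumption: a non-numeric row is a header
            peaks.append((mz, intensity))
    return sorted(peaks)
```

Usage: read_spectrum("sample_data/[M+H]_low.csv") returns a list of (m/z, intensity) pairs.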

Dataset

The spectra encoder has been trained on the NIST Tandem Mass Spectral Library 2020, which is a commercial dataset.

Citation

@article{metatrans,
  author = {Litsa, Eleni E. and Chenthamarakshan, Vijil and Das, Payel and Kavraki, Lydia E.},
  title = {An end-to-end deep learning framework for translating mass spectra to de-novo molecules},
  journal = {Communications Chemistry},
  year = {2023},
}


spec2mol's Issues

"--device cpu" does not work when running decode_embeddings.py

Hello,

I am running on my laptop, a device that does not have a GPU.

When I ran the following command:
"python -m scripts.decode_embeddings --output_file decoded_output.csv --predicted_embeddings sample.pt --model translation --device cpu --model_load models/model.pt --vocab_load models/vocab.nb --config_load models/config.nb --n_batch 65 --num_variants 3"

I got this error:
ArgumentParser(prog='decode_embeddings.py', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)
Namespace(config_load='models/config.nb', device='cpu', model='translation', model_load='models/model.pt', n_batch=65, num_variants=3, output_file='decoded_output.csv', predicted_embeddings='sample.pt', vocab_load='models/vocab.nb')
{'#': 0, '%': 1, '(': 2, ')': 3, '+': 4, '-': 5, '0': 6, '1': 7, '2': 8, '3': 9, '4': 10, '5': 11, '6': 12, '7': 13, '8': 14, '9': 15, '=': 16, 'B': 17, 'Br': 18, 'C': 19, 'Cl': 20, 'F': 21, 'H': 22, 'I': 23, 'N': 24, 'O': 25, 'P': 26, 'S': 27, '[': 28, ']': 29, 'c': 30, 'n': 31, 'o': 32, 'p': 33, 's': 34, '': 35, '': 36, '': 37, '': 38}
0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\pearsor5\AppData\Local\miniconda3\envs\spec2mol\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\pearsor5\AppData\Local\miniconda3\envs\spec2mol\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\pearsor5\Documents\Scripts\Spec2mol_Test\Spec2Mol\decoder\scripts\decode_embeddings.py", line 176, in <module>
    main(config.model, config)
  File "C:\Users\pearsor5\Documents\Scripts\Spec2mol_Test\Spec2Mol\decoder\scripts\decode_embeddings.py", line 166, in main
    config.num_variants,
  File "C:\Users\pearsor5\Documents\Scripts\Spec2mol_Test\Spec2Mol\decoder\scripts\decode_embeddings.py", line 45, in decode
    mu_predicted = mu_predicted.cuda()
  File "C:\Users\pearsor5\AppData\Local\miniconda3\envs\spec2mol\lib\site-packages\torch\cuda\__init__.py", line 186, in _lazy_init
    _check_driver()
  File "C:\Users\pearsor5\AppData\Local\miniconda3\envs\spec2mol\lib\site-packages\torch\cuda\__init__.py", line 68, in _check_driver
    http://www.nvidia.com/Download/index.aspx""")
AssertionError:
Found no NVIDIA driver on your system. Please check that you
have an NVIDIA GPU and installed a driver from
http://www.nvidia.com/Download/index.aspx

To fix my issue, I edited line 45 in the decode_embeddings.py script.

Line 45 was changed from "mu_predicted = mu_predicted.cuda()" to "mu_predicted = mu_predicted.to(config.device)".
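A device-agnostic version of the same idea, as a sketch: resolve the --device value once and move tensors with .to(device) instead of calling .cuda(). pick_device and its "auto" option are hypothetical; torch.device, torch.cuda.is_available and Tensor.to are standard PyTorch.

```python
import torch


def pick_device(requested: str = "auto") -> torch.device:
    """Resolve a --device value; the hypothetical 'auto' falls back to CPU without CUDA."""
    if requested == "auto":
        return torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return torch.device(requested)


device = pick_device("cpu")
mu_predicted = torch.zeros(3, 512)      # stand-in for the loaded embeddings
mu_predicted = mu_predicted.to(device)  # works on CPU-only machines, unlike .cuda()
print(mu_predicted.device)  # cpu
```

When a checkpoint was saved on a GPU, torch.load(path, map_location=device) plays the same role at load time.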

After some minor changes to the code, I was able to get the example working and the output was the following:

Thanks for making the code available!

I can't find smiles_NISTall_tanh.pt

Hello, I can't find smiles_NISTall_tanh.pt. Where can I get the file?

unexpected indent in predict_embs.py

command:

python3 predict_embs.py -pos_low_file 'sample_data/[M+H]_low.csv' -pos_high_file 'sample_data/[M+H]_high.csv' -neg_low_file 'sample_data/[M-H]_low.csv' -neg_high_file 'sample_data/[M-H]_high.csv'

error:

 File "/content/Spec2Mol/predict_embs.py", line 103
    args = parser.parse_args()
IndentationError: unexpected indent
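The offending line can be located without running the script: python -m py_compile predict_embs.py reports the same error. The sketch below does the equivalent from Python with the built-in compile(); find_indentation_error is a hypothetical helper name.

```python
from typing import Optional, Tuple


def find_indentation_error(source: str, filename: str = "<string>") -> Optional[Tuple[int, str]]:
    """Compile source without executing it; return (line number, message) of an indentation error."""
    try:
        compile(source, filename, "exec")
    except IndentationError as exc:  # TabError is a subclass, so mixed tabs/spaces are caught too
        return exc.lineno, exc.msg
    return None


broken = (
    "import argparse\n"
    "if __name__ == '__main__':\n"
    "    parser = argparse.ArgumentParser()\n"
    "        args = parser.parse_args()\n"  # indented one level too deep
)
print(find_indentation_error(broken))  # (4, 'unexpected indent')
```

Usage on the real file: find_indentation_error(open("predict_embs.py").read(), "predict_embs.py").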

Predicted embds

Hi,

In the predict_embs.py file, there are several unnecessary empty lines that cause the indentation error. They can be deleted as follows:

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-pos_low_file', type=str, default=None, help='csv file with positive mode [M+H]+ low energy spectrum')
    parser.add_argument('-pos_high_file', type=str, default=None, help='csv file with positive mode [M+H]+ high energy spectrum')
    parser.add_argument('-neg_low_file', type=str, default=None, help='csv file with negative mode [M-H]- low energy spectrum')
    parser.add_argument('-neg_high_file', type=str, default=None, help='csv file with negative mode [M-H]- high energy spectrum')
    args = parser.parse_args()
    main(args)

In addition, a command for saving pred_emb is missing from predict_embs.py (e.g. torch.save(pred_emb, 'sample_my.pt')).
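A sketch of such save and load helpers, assuming pred_emb is a torch.Tensor: save_embeddings and load_embeddings are hypothetical names. Moving the tensor to the CPU before saving, and using map_location when loading, keeps the file usable on machines without a GPU.

```python
import torch


def save_embeddings(pred_emb: torch.Tensor, path: str) -> None:
    """Persist predicted embeddings as a plain CPU tensor, detached from any autograd graph."""
    torch.save(pred_emb.detach().cpu(), path)


def load_embeddings(path: str, device: str = "cpu") -> torch.Tensor:
    """Load embeddings saved above; map_location remaps tensors saved on another device."""
    return torch.load(path, map_location=device)
```

For example, save_embeddings(pred_emb, 'sample_my.pt') at the end of predict_embs.py produces a file that can be passed to the decoder via --predicted_embeddings.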

Finally, it would be very helpful to write a wrapper that couples the embedding predictor with the decoder, and to set the unknown keys automatically when processing the example input data.
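One possible shape for such a wrapper, as a hypothetical sketch: it assumes predict_embs.py has been patched to write its embeddings to sample_my.pt in the repository root (the torch.save call suggested above), and that the decoder/ directory with models/ sits next to it, as in the paths from the first issue. The decoder flags are copied from the command in that issue; build_pipeline, run_pipeline, EMB_FILE and SPECTRA_FLAGS are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: predict_embs.py has been patched to save its embeddings to this file.
EMB_FILE = "sample_my.pt"
SPECTRA_FLAGS = ("pos_low_file", "pos_high_file", "neg_low_file", "neg_high_file")


def build_pipeline(spectra, output_file, device="cpu", repo=Path(".")):
    """Return [(argv, working_dir), ...] for the two stages without running anything."""
    repo = Path(repo)
    predict = [sys.executable, "predict_embs.py"]
    for flag in SPECTRA_FLAGS:
        predict += ["-" + flag, spectra[flag]]  # KeyError if a spectrum is missing
    decode = [
        sys.executable, "-m", "scripts.decode_embeddings",
        # Absolute paths, because the decoder stage runs inside decoder/.
        "--predicted_embeddings", str((repo / EMB_FILE).resolve()),
        "--output_file", str(Path(output_file).resolve()),
        "--model", "translation",
        "--device", device,
        "--model_load", "models/model.pt",
        "--vocab_load", "models/vocab.nb",
        "--config_load", "models/config.nb",
        "--n_batch", "65",
        "--num_variants", "3",
    ]
    return [(predict, repo), (decode, repo / "decoder")]


def run_pipeline(spectra, output_file, device="cpu", repo=Path(".")):
    """Run both stages in order; check=True stops at the first stage that fails."""
    for argv, cwd in build_pipeline(spectra, output_file, device, repo):
        subprocess.run(argv, cwd=str(cwd), check=True)
```

Keeping command construction separate from execution makes the wrapper easy to inspect or dry-run before launching the two stages.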
