paccmann / titan
Code for "T Cell Receptor Specificity Prediction with Bimodal Attention Networks" (https://doi.org/10.1093/bioinformatics/btab294, ISMB 2021)
License: MIT License
Hi,
I'm using flexible_model_eval.py to predict on my own input data, but the output has fewer rows than the input, and I don't know which rows are dropped. The affinity file should not contain any receptor or ligand that is missing from the TCR and epitope files.
Part of the reason may be that drop_last=True is set in the DataLoader, but setting it to False still doesn't solve the problem completely. Could you help check the issue?
Thanks
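To narrow down which rows disappear, one option is to check every affinity row's IDs against the TCR and epitope files, and separately count the samples that drop_last=True discards (PyTorch's `DataLoader` drops the final incomplete batch, i.e. `n % batch_size` samples). The sketch below uses only the standard library and assumes comma-separated files with a header row; the column names `ligand_name`, `sequence_id` and `id`, and the helper names, are hypothetical placeholders that may not match TITAN's actual file layout.

```python
import csv
import io


def read_ids(handle, column):
    """Return the set of values found in `column` of a CSV file handle."""
    return {row[column] for row in csv.DictReader(handle)}


def find_unmatched_rows(affinity_handle, tcr_ids, epitope_ids,
                        tcr_col="sequence_id", epitope_col="ligand_name"):
    """Return (row_index, row) for affinity rows whose TCR or epitope ID is unknown.

    The default column names are hypothetical; adapt them to your files.
    """
    unmatched = []
    for i, row in enumerate(csv.DictReader(affinity_handle)):
        if row[tcr_col] not in tcr_ids or row[epitope_col] not in epitope_ids:
            unmatched.append((i, row))
    return unmatched


def rows_kept(n_rows, batch_size, drop_last):
    """Samples a DataLoader yields; drop_last discards the final partial batch."""
    return n_rows - n_rows % batch_size if drop_last else n_rows


# Usage with small in-memory CSVs standing in for the real files.
affinity = io.StringIO("ligand_name,sequence_id,label\nep1,tcr1,1\nep2,tcr9,0\n")
tcrs = read_ids(io.StringIO("id,seq\ntcr1,CASS\n"), "id")
epitopes = read_ids(io.StringIO("id,seq\nep1,GIL\nep2,NLV\n"), "id")
print(find_unmatched_rows(affinity, tcrs, epitopes))  # row 1 references unknown tcr9
print(rows_kept(1000, 128, drop_last=True))  # → 896
```

If the unmatched list is empty and drop_last=False, the missing rows are being lost somewhere else in the pipeline, which this sketch does not cover.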
Hi Team,
So I have a dataset with training and test examples, and I want to try out your model.
How should I approach this?
Should I still go for the semi-frozen model and train it, or should I train it from scratch?
I would be grateful if you could give me some instructions or tips on how to do it.
Best cheers,
Cedric
Hi, I was trying to use TITAN for TCR-epitope prediction by calling the flexible_model_eval.py script; my TCR and epitope inputs are both .csv files. I got an error.
I tried changing protein_language to protein_languages in flexible_model_eval.py, but got another error.
I installed all the requirements listed in requirements.txt, but I doubt that pytoda==1.0.2 is the correct version for this script.
Could you help check and test the code?
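Before digging further into the protein_language error, it can help to confirm which pytoda version is actually installed in the active environment. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` and `check_pin` are hypothetical helper names, and whether 1.0.2 is the version the script needs is not established here.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pin(package, pinned):
    """Compare the installed version of `package` with a pinned version string."""
    found = installed_version(package)
    return found, found == pinned


if __name__ == "__main__":
    found, matches = check_pin("pytoda", "1.0.2")
    print(f"pytoda installed: {found}; matches 1.0.2 pin: {matches}")
```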
So I was trying out TITAN with my own dataset.
Essentially, I preprocessed my files to have the same column names, labelled my CDR3 and epitope sequences, and converted the epitopes to SMILES.
So when running the semi-frozen training, I expected no problems.
Instead, I got the following message:
Here are the problems:
Provided arg add_start_and_stop:True does not match the smiles_language value: False NOTE: smiles_language value takes preference!!
Provided arg padding:True does not match the smiles_language value: False NOTE: smiles_language value takes preference!!
Provided arg padding_length:500 does not match the smiles_language value: None NOTE: smiles_language value takes preference!!
To get rid of this, adapt the smiles_language *offline*, feed itready for intended usage, and adapt the constructor args to be identical with their equivalents in the language object
Since you provided a smiles_language, the following parameters to this class will be ignored: canonical, augment, kekulize, all_bonds_explicit, selfies, sanitize, all_hs_explicit, remove_bonddir, remove_chirality, randomize, add_start_and_stop, padding, padding_length, device.
Here are the problems:
Provided arg add_start_and_stop:True does not match the smiles_language value: False NOTE: smiles_language value takes preference!!
Provided arg padding:True does not match the smiles_language value: False NOTE: smiles_language value takes preference!!
Provided arg padding_length:500 does not match the smiles_language value: None NOTE: smiles_language value takes preference!!
To get rid of this, adapt the smiles_language *offline*, feed itready for intended usage, and adapt the constructor args to be identical with their equivalents in the language object
I looked through the code but couldn't find where this message gets printed.
It would be very helpful if you could help me understand the message and how I should preprocess my data.
Thanks a lot in advance,
Cedric
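Read literally, the message says that when a smiles_language object is passed, its own settings override the matching constructor arguments, and each disagreement is reported. The hypothetical `resolve_settings` below only mimics that precedence rule with plain dictionaries, using the values from the log; it is an illustration, not pytoda's implementation. Under this reading, the mismatches are resolved in favor of the language object rather than stopping the run, and the message should disappear once both sets of values agree.

```python
def resolve_settings(constructor_args, language_settings):
    """Mimic the precedence described in the warning: language values win.

    Returns the effective settings and a list of mismatch messages.
    Illustration only; this is not pytoda's code.
    """
    effective = dict(constructor_args)
    problems = []
    for key, value in constructor_args.items():
        if key in language_settings and language_settings[key] != value:
            problems.append(
                f"Provided arg {key}:{value} does not match the "
                f"smiles_language value: {language_settings[key]}"
            )
            effective[key] = language_settings[key]  # language object takes preference
    return effective, problems


# Values taken from the log above.
args = {"add_start_and_stop": True, "padding": True, "padding_length": 500}
language = {"add_start_and_stop": False, "padding": False, "padding_length": None}
effective, problems = resolve_settings(args, language)
for line in problems:
    print(line)
print(effective)
```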
Hi! This is really great work. I have a question for you.
To use this model, is it necessary to provide the full-length TCR sequence reconstructed from the V and J genes?
If I need to predict for a TCR sequence that is incomplete, lacking the full-length information as well as the specific V and J genes involved, what should I do? @annaweber209
Hi there,
Many thanks for developing TITAN and making it public!
This sounds very interesting and I would like to give it a go, but I already fail to run it on the provided test data. Specifically, I'm looking at the "Run trained TITAN model on data" section, which I've translated to:
python3 scripts/flexible_model_eval.py \
tutorial/data/test_small.csv \
tutorial/data/tcr_full.csv \
tutorial/data/epitopes.csv \
../TITAN-dataset/trained_model/ \
bimodal_mca test20230303
I'm unsure about ../TITAN-dataset/trained_model/, which points to the externally provided data (https://ibm.box.com/v/titan-dataset), but I couldn't find a trained model in the code repository itself. Anyway, the above fails with:
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [128, 1, 500, 26, 32]
Would it be possible to expand the documentation to show how to use https://ibm.box.com/v/titan-dataset?
Many thanks,
Andreas
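The error states that conv2d received a 5D tensor, whereas it accepts only 3D (C, H, W) or 4D (N, C, H, W) input. A minimal, dependency-free check of that rule, applied to the reported shape, shows there is exactly one dimension too many; `check_conv2d_input` is a hypothetical helper, and which preprocessing step adds the extra dimension is not established here.

```python
def check_conv2d_input(shape):
    """Check a tensor shape against the layouts torch conv2d accepts.

    conv2d takes (C, H, W) unbatched or (N, C, H, W) batched input.
    Returns a short diagnostic string.
    """
    ndim = len(shape)
    if ndim in (3, 4):
        return f"ok: {ndim}D input {list(shape)}"
    if ndim > 4:
        return (f"too many dims: {ndim}D input {list(shape)}, "
                f"{ndim - 4} more than (N, C, H, W)")
    return f"too few dims: {ndim}D input {list(shape)}, expected at least (C, H, W)"


# Shape reported in the RuntimeError above.
print(check_conv2d_input((128, 1, 500, 26, 32)))
```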
Hi,
I would like to run experiments with the shared data, but the Box link below is no longer valid.
https://ibm.box.com/v/titan-dataset
Could you share the data with us again?
Thank you
Hi,
First, thank you for compiling and providing a benchmark 10-fold CV dataset.
MHC information is also important for the binding of the pMHC complex to the TCR.
Could you please provide additional information on the MHC-I molecules (HLA-X*XX:XX, e.g. HLA-A*02:01) corresponding to each epitope-TCR pair?
Thanks
Hi,
Thanks for your help with the Python version issue.
I had a question about a workflow for training models from 'scratch'. From what I gather, the 'flexible_training.py' script permits training of a TITAN model. Does this script pretrain the model on BindingDB? I assume that from there, one would fine-tune this model on TCR sequence data and epitope data of choice, e.g., using the semi_frozen_finetuning.py script?
Best,
Paul
As I was reviewing the project, I noticed that you have provided access to the code and the VDJdb dataset. However, the BindingDB data was not included. Would it be possible for you to share this processed BindingDB dataset with me?
Hi! I am very interested in your article “TITAN: T-cell receptor specificity prediction with bimodal attention networks”. Regarding Fig. 4c, I wonder which tools I can use to visualize the epitope peptides based on the attention scores.
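As a quick, tool-free starting point, per-residue attention can be min-max normalized and rendered as text bars. The sketch below assumes you already have exactly one attention score per residue (mapping SMILES-token attention back to residues is not covered); the helper names and the example scores are made up for illustration and are not TITAN output.

```python
def normalize(scores):
    """Min-max scale attention scores to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def attention_bars(residues, scores, width=20):
    """Render one text bar per residue, with length proportional to normalized attention."""
    if len(residues) != len(scores):
        raise ValueError("need one score per residue")
    rows = []
    for residue, weight in zip(residues, normalize(scores)):
        bar = "#" * round(weight * width)
        rows.append(f"{residue} {bar:<{width}} {weight:.2f}")
    return rows


# Made-up scores for a 9-residue peptide, purely to show the output format.
for line in attention_bars("GILGFVFTL", [0.02, 0.10, 0.05, 0.30, 0.15, 0.08, 0.12, 0.06, 0.12]):
    print(line)
```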