
Comments (8)

cirdeCyL commented on September 25, 2024

Hi, a little reminder.
Shall I ignore the warning from above?
Where does this message get printed?

Cheers. Cedric

from titan.

jannisborn commented on September 25, 2024

Hi @cirdeCyL, sorry for the long cycle time.

First of all, you should always expect problems ;)

However, this is a logging message, so it might be possible to ignore it safely. I can't judge from a distance whether you really can.

Here's some context: TITAN models rely on a class called SMILESLanguage from the pytoda library. This class is responsible for 1) preprocessing the SMILES (padding, start/stop tokens, canonicalization, etc.), 2) splitting the strings into tokens and 3) converting the tokens into integers that can be embedded via nn.Embedding. If you finetune TITAN on your own dataset, you should ideally set every value that the warning flags to the same value that was used during pretraining. If you don't, you make it harder for the model to transfer to your dataset. The effect can be drastic, e.g., if your molecules contain tokens that are not available in the SMILES language.
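To make the three steps concrete, here is a minimal sketch in plain Python. It is not pytoda's SMILESLanguage: the regex tokenizer, the special-token names and the function names are all simplified assumptions chosen for illustration. It only shows why a token that was never seen during pretraining is a problem: it has no index of its own and collapses onto an unknown token.

```python
import re

# Hypothetical, simplified tokenizer; this is NOT pytoda's SMILESLanguage.
# Bracket atoms, two-letter halogens and ring-closure labels are kept whole,
# everything else is split character by character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Step 2: split a SMILES string into tokens."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocab(smiles_list):
    """Assign an integer index to every token seen in the given molecules."""
    vocab = {"<PAD>": 0, "<UNK>": 1, "<START>": 2, "<STOP>": 3}
    for smi in smiles_list:
        for tok in tokenize(smi):
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(smiles, vocab, add_start_and_stop=True):
    """Steps 1 and 3: wrap in start/stop tokens and map tokens to integers."""
    ids = [vocab.get(tok, vocab["<UNK>"]) for tok in tokenize(smiles)]
    if add_start_and_stop:
        ids = [vocab["<START>"]] + ids + [vocab["<STOP>"]]
    return ids


# Vocabulary built from the (toy) pretraining molecules only.
pretrain_vocab = build_vocab(["CCO", "c1ccccc1"])

print(encode("CCO", pretrain_vocab))   # every token is known
print(encode("CCBr", pretrain_vocab))  # "Br" was never seen and maps to <UNK>
```

Switching add_start_and_stop off changes the integer sequence for the same molecule, which is the kind of mismatch between pretraining and finetuning that the warning points at.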

I'd say in most cases you can safely ignore this warning, but if you experience issues with your model, these messages are a good hint as to where to start debugging.


cirdeCyL commented on September 25, 2024

Hmm, maybe I am stupid.
So I am trying to finetune the model. I don't get a warning for each Ligand and Sequence example. The message from above occurs once at the beginning and then the model just runs through. So I can't set those values where the warning is raised, because the message does not say which example raised it.
I am also not sure what you mean by "set those values (where the warning is raised), to the same value as during pretraining".
Do you mean I should skip finetuning and train from scratch instead? I suspect that the model will perform worse in that case.


jannisborn commented on September 25, 2024

Haha, no worries, you are not stupid. The warnings are raised intentionally during dataset setup, where the configuration you provide is compared to the one used for pretraining.

Hence, you don't see logging messages during training except in drastic cases, e.g., if you set padding_length to 20 but then provide a molecule that has 200 tokens. In that case, the molecule is cropped (only the first 20 tokens are fed to the model) and you will see a warning during training.
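The cropping behaviour can be sketched as follows. This is an illustration under assumptions, not pytoda's implementation: the function name pad_or_crop, the logger name, the padding index 0 and the choice to pad on the right are all made up for the example.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("padding_sketch")  # hypothetical logger name


def pad_or_crop(token_ids, padding_length, pad_id=0):
    """Return exactly `padding_length` token ids.

    Longer sequences are cropped to their first `padding_length` tokens and a
    warning is logged; shorter ones are filled up with `pad_id`.
    Padding on the right is an assumption made for this sketch.
    """
    if len(token_ids) > padding_length:
        logger.warning(
            "Sequence has %d tokens but padding_length is %d; cropping.",
            len(token_ids),
            padding_length,
        )
        return token_ids[:padding_length]
    return token_ids + [pad_id] * (padding_length - len(token_ids))


# A molecule with 200 tokens and padding_length=20: only 20 tokens survive.
cropped = pad_or_crop(list(range(1, 201)), padding_length=20)
print(len(cropped))
```

A sequence that fits is returned unchanged apart from the padding, so no message is logged for it. That matches the observation that training usually runs through silently.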

The message occurs twice since you set up a train and a test dataset. The reported problems concern configuration parameters (like whether to surround the sequences with <START> and <STOP> tokens). These parameters can be configured via the parameter .json (--params_filepath), so you should be able to fix them, I think. This is what I meant when saying "set those values (where the warning is raised), to the same value as during pretraining".
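One way to find which settings differ is to compare the two parameter files directly. The snippet below is only a sketch: padding_length is mentioned in this thread, while the other key names and all values are hypothetical placeholders, so substitute whatever keys your warning message lists.

```python
import json


def diff_params(pretrain_params, finetune_params, keys):
    """Return {key: (pretraining value, finetuning value)} for every mismatch."""
    mismatches = {}
    for key in keys:
        old, new = pretrain_params.get(key), finetune_params.get(key)
        if old != new:
            mismatches[key] = (old, new)
    return mismatches


# In practice these would be read from the two .json files with json.load;
# inline strings keep the sketch self-contained. All values are made up.
pretrain_params = json.loads(
    '{"padding_length": 500, "add_start_and_stop": true, "canonical": false}'
)
finetune_params = json.loads(
    '{"padding_length": 20, "add_start_and_stop": false, "canonical": false}'
)

# Hypothetical key names, apart from padding_length.
keys_to_check = ["padding_length", "add_start_and_stop", "canonical"]
mismatches = diff_params(pretrain_params, finetune_params, keys_to_check)

for key, (old, new) in mismatches.items():
    print(f"{key}: pretraining={old!r}, finetuning={new!r}")
```

Every key that is printed is a candidate to set back to its pretraining value in the finetuning .json.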

Hope this helps 👋🏼


cirdeCyL commented on September 25, 2024

Perfect. I will try it out.
I was thinking about increasing the token size in the param file, but I was afraid that this might change the model, which is what I was trying to avoid.
Thanks a lot.


Lihua1990 commented on September 25, 2024

Hi @jannisborn, I am interested in how you converted the epitope.csv file (which contains the amino acid sequences of the epitopes) into the epitope.smi file (which contains the SMILES of the epitopes, used as input for training or finetuning the model). Could you please elaborate on how you prepared the .smi file? Which tool did you use to generate it? The warning is probably due to a different method being used to generate the .smi file here.

Thanks for your reply in advance!
Lihua


jannisborn commented on September 25, 2024

Hi Lihua,

This was done with RDKit:

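from rdkit import Chem
# Build a molecule from the amino acid sequence (one-letter codes).
# Note: the RDKit function is spelled MolFromFASTA.
mol = Chem.MolFromFASTA('EFG')
# Write the molecule out as a SMILES string.
smi = Chem.MolToSmiles(mol)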


Lihua1990 commented on September 25, 2024

Thanks for your reply. Will try this.

Best,
Lihua
