Comments (8)
Hi, a little reminder.
Shall I ignore the warning from above?
Where does this message get printed?
Cheers, Cedric
from titan.
Hi @cirdeCyL, sorry for the long cycle time.
First of all, you should always expect problems ;)
However, this is a logging message, so it might be possible to ignore it safely. I can't judge from a distance whether you really can.
Here's some context: TITAN models rely on a class called SMILESLanguage from the pytoda library. This class is responsible for 1) preprocessing the SMILES (padding, start/stop tokens, canonicalizing, etc.), 2) splitting the strings into tokens, and 3) converting them into integers that can be embedded via nn.Embedding. If you finetune TITAN on your own dataset you should, ideally, set all those values where the warning is raised to the same value as during pretraining. If you don't do that, you make it harder for the model to transfer to your dataset. This can get quite drastic, e.g., if you bring molecules that have tokens that are not available in the SMILES language.
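The three steps above (tokenize, add start/stop tokens and pad, map to integers) can be sketched in plain Python. This is an illustration, not pytoda's actual code; the vocabulary and the token regex here are toy assumptions, while the real SMILESLanguage builds its vocabulary from the data:

```python
import re

# Hypothetical minimal vocabulary; pytoda derives the real one from the dataset.
VOCAB = {"<PAD>": 0, "<START>": 1, "<STOP>": 2, "<UNK>": 3,
         "C": 4, "O": 5, "N": 6, "(": 7, ")": 8, "=": 9}

# Toy SMILES tokenizer: multi-character atoms first, then single characters.
TOKEN_RE = re.compile(r"Cl|Br|[A-Za-z]|\d|\(|\)|=|#|\[|\]|@|\+|-")

def encode(smiles, padding_length=12):
    """Tokenize, surround with start/stop tokens, pad/crop, map to integer ids."""
    tokens = ["<START>"] + TOKEN_RE.findall(smiles) + ["<STOP>"]
    ids = [VOCAB.get(t, VOCAB["<UNK>"]) for t in tokens]
    # Crop to padding_length, then right-pad with the <PAD> id.
    ids = ids[:padding_length]
    return ids + [VOCAB["<PAD>"]] * (padding_length - len(ids))

print(encode("C(=O)N"))  # -> [1, 4, 7, 9, 5, 8, 6, 2, 0, 0, 0, 0]
```

The resulting integer list is what would be fed to an nn.Embedding layer; a token outside the vocabulary falls back to the UNK id, which is roughly the failure mode described above.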
I'd say in most cases you can safely ignore this warning, but if you experience issues with your model, these messages are a good hint as to where to start debugging.
Hmm, maybe I am stupid.
So I am trying to finetune the model. I don't get a warning for each example of ligand and sequence. The message above occurs once at the beginning, and then the model just runs through. So I can't set those values where the warning is raised, because the message does not say for which example it was raised.
I am also not sure what you mean by "set those values (where the warning is raised) to the same value as during pretraining".
Do you mean not to use finetuning and to train directly from scratch? I suspect that the model would perform worse in that case.
Haha, no worries, you are not stupid. The warnings are raised intentionally upon dataset setup, where the provided configuration is compared to the previous one.
Hence, you don't see logging messages during training except in very drastic instances, e.g., you set padding_length to 20 but then provide a molecule that has 200 tokens. In that case, the molecule is cropped (only the first 20 tokens are fed to the model) and you will see a warning during training.
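The cropping behavior can be illustrated with a small standalone sketch (this is not pytoda's actual implementation, just the idea): tokens beyond padding_length are silently dropped, shorter inputs are right-padded.

```python
def pad_or_crop(token_ids, padding_length, pad_id=0):
    """Crop a token-id list to padding_length, or right-pad it with pad_id.
    Illustrative only; the real logic lives in pytoda's transforms."""
    cropped = token_ids[:padding_length]
    return cropped + [pad_id] * (padding_length - len(cropped))

print(pad_or_crop([1, 2, 3, 4, 5, 6, 7, 8], 5))  # -> [1, 2, 3, 4, 5] (tail dropped)
print(pad_or_crop([1, 2, 3], 5))                 # -> [1, 2, 3, 0, 0]
```

So with padding_length=20 and a 200-token molecule, 90% of the molecule never reaches the model, which is why that case is warned about at training time.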
The message occurs twice since you set up a train and a test dataset. The reported problems are about configuration parameters (like whether to surround the sequences with <START> and <STOP> tokens). These params can be configured via the parameter .json (--params_filepath), so you should have the control to fix those, I think. This is what I meant when saying "set those values (where the warning is raised) to the same value as during pretraining".
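One way to act on that advice is to diff the two parameter files before finetuning. A minimal sketch, assuming the params are plain JSON dicts; the key names below (padding_length, add_start_and_stop, canonical) are illustrative, so check them against your own pretraining params file:

```python
def mismatched_params(pretrain, finetune,
                      keys=("padding_length", "add_start_and_stop", "canonical")):
    """Return {key: (pretraining_value, finetuning_value)} for every key
    whose value differs between the two configurations."""
    return {k: (pretrain.get(k), finetune.get(k))
            for k in keys if pretrain.get(k) != finetune.get(k)}

# In practice, load both dicts with json.load from the two params .json files.
print(mismatched_params({"padding_length": 500, "add_start_and_stop": True},
                        {"padding_length": 200, "add_start_and_stop": True}))
# -> {'padding_length': (500, 200)}
```

Any key reported here is a candidate source of the dataset-setup warnings and should be aligned with the pretraining value.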
Hope this helps 👋🏼
Perfect. I will try it out.
I was thinking about increasing the token size in the param file, but I was afraid that this might change the model, and that is what I tried to avoid.
Thank you a lot.
Hi @jannisborn, I am interested in how you converted the epitope.csv file (which contains the amino acid sequences of the epitopes) into the epitope.smi file (which contains the SMILES of the epitopes as input for training or finetuning the model). Could you please elaborate on how you prepared the .smi file? Which tool did you use for generating the epitope.smi files? The warning is probably due to different methods of generating the .smi file here.
Thanks for your reply in advance!
Lihua
Hi Lihua,
This was done with RDKit:
from rdkit import Chem

# Parse the peptide sequence as FASTA into an RDKit molecule, then write SMILES.
mol = Chem.MolFromFASTA('EFG')
smi = Chem.MolToSmiles(mol)
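For a whole file, the same call just needs to be applied per row. A sketch of the epitope.csv-to-epitope.smi conversion, under assumptions: the CSV has (sequence, name) columns and the output uses the tab-separated "SMILES<TAB>identifier" layout that .smi readers typically expect; check both against the actual TITAN data. The sequence-to-SMILES step is passed in as a function so the I/O logic runs standalone:

```python
import csv
import io

def csv_to_smi(csv_text, seq_to_smiles):
    """Read 'sequence,name' CSV rows and emit one 'SMILES<TAB>name' line each."""
    reader = csv.reader(io.StringIO(csv_text))
    lines = [f"{seq_to_smiles(seq)}\t{name}" for seq, name in reader]
    return "\n".join(lines) + "\n"

# With RDKit installed, one would pass:
#   seq_to_smiles = lambda s: Chem.MolToSmiles(Chem.MolFromFASTA(s))
# Here a stub converter is used so the sketch runs without RDKit.
print(csv_to_smi("EFG,epitope_1\n", lambda s: f"SMILES({s})"))
```

In practice you would read epitope.csv from disk and write the returned string to epitope.smi.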
Thanks for your reply. Will try this.
Best,
Lihua