In develop and on Google Colab, the BasicAnnotationExtractor class is not loading wei

Error in BasicAnnotationExtractor (bio_embeddings, closed)

sacdallago commented on May 27, 2024

Error in BasicAnnotationExtractor

from bio_embeddings.

Comments (6)

sacdallago commented on May 27, 2024

Hey @sacombs, thanks for using the pipeline and raising this issue.

The usual disclaimer
Caution: develop is unstable and can break, so please be aware that installing from develop may lead to headaches at times :) This mostly has to do with the fact that there are a few people working on a few things at any given time, and we work in small PR cycles (this happens on a different channel... maybe we should port our development to GitHub so it's more "public"). I try my best to push semi-stable versions to develop, but this doesn't always play out as I wish :P

Nevertheless:
A very recent change (two days ago, b7118c4) generalized supervised predictions from SeqVec to Bert (since they are almost identical; emphasis on almost :)). Unfortunately, there was an oversight: a few details of how the prediction models are constructed differ between SeqVec and Bert. Specifically, in the case of your issue, batch_norm isn't applied on Bert embeddings, while it is on SeqVec embeddings. This issue was fixed in 7ff6a9b.

HOWEVER, another issue we encounter is that the training of the secondary structure model was done in batches, so passing single embeddings will fail because the "batch dimension" is missing from the input.
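
To illustrate what a missing "batch dimension" means, here is a minimal sketch using NumPy. The shape (7 residues, 1024 features) and the helper name `add_batch_dimension` are assumptions made up for this example; this is not the fix that went into the pipeline.

```python
import numpy as np


def add_batch_dimension(embedding: np.ndarray) -> np.ndarray:
    """Return a view of `embedding` with a leading batch axis of size 1."""
    return embedding[np.newaxis, ...]


# Assumed shape for illustration only: 7 residues, 1024 features per residue.
single_embedding = np.zeros((7, 1024), dtype=np.float32)
batched_embedding = add_batch_dimension(single_embedding)

print(single_embedding.shape)   # (7, 1024)
print(batched_embedding.shape)  # (1, 7, 1024)
```

A model trained on batches expects the second form, so a single embedding has to be wrapped into a batch of size one before it is passed in.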

@mheinzinger is working on a fix for this and we'll update you on this issue once that fix is merged.

Thanks!!! :)

sacombs commented on May 27, 2024

Hi,

Thanks for the information!

As an update, there is now an error in the annotations section of the Google Colab, where you iterate through the embeddings to get the annotations:

for id, embedding in zip([s.id for s in sequences], tqdm(embeddings, total=number_of_sequences-1)):
    annotations = annotations_extractor.get_annotations(embeddings)

There is an error with the generator...not sure what is going on.

konstin commented on May 27, 2024

Oh, it looks like this should be get_annotations(embedding) instead of get_annotations(embeddings). I'll check if I can fix the colabs.
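
A minimal sketch of the corrected loop, for illustration. `ToyAnnotationExtractor` and `toy_embeddings` are hypothetical stand-ins invented for this example so that it runs without the pipeline installed; only the `get_annotations(embedding)` call mirrors the snippet above.

```python
from typing import Dict, Iterator, List

Embedding = List[List[float]]  # one row of features per residue


class ToyAnnotationExtractor:
    """Hypothetical stand-in for the real extractor: it only reports the length."""

    def get_annotations(self, embedding: Embedding) -> Dict[str, int]:
        if not isinstance(embedding, list):
            raise TypeError("expected one embedding, got " + type(embedding).__name__)
        return {"residues": len(embedding)}


def toy_embeddings(sequences: List[str]) -> Iterator[Embedding]:
    """Lazily yield one fake embedding (4 features per residue) per sequence."""
    for sequence in sequences:
        yield [[0.0] * 4 for _ in sequence]


ids = ["seq_a", "seq_b"]
sequences = ["MKT", "MKTAY"]
annotations_extractor = ToyAnnotationExtractor()

results = {}
for seq_id, embedding in zip(ids, toy_embeddings(sequences)):
    # Pass the loop variable (one embedding), not the whole generator.
    results[seq_id] = annotations_extractor.get_annotations(embedding)

print(results)  # {'seq_a': {'residues': 3}, 'seq_b': {'residues': 5}}
```

In this toy version, passing the generator itself instead of the loop variable raises a `TypeError`, which is the same kind of mix-up as `embeddings` versus `embedding` in the loop above.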

sacombs commented on May 27, 2024

I thought so, but it still leads to an error. I think it has to do with the input tensors being the wrong size.

sacdallago commented on May 27, 2024

Hey :) sorry, we are still working on fixing various issues around this particular aspect of the pipeline and will get back to you once we have it ready (including an updated & fully functional colab using Bert)

sacdallago commented on May 27, 2024

We seem to have fixed all outstanding issues with extract, and tested the pipeline with bert and seqvec (as in use_case_four in examples).

IMPORTANT, in the process of fixing this, we have uploaded new weights for the bert.zip model, here: http://maintenance.dallago.us/public/embeddings/embedding_models/bert/

If you were using a local copy of these weights, please update them; otherwise the pipeline will still throw the error you saw in #47. The difference between these bert weights and the previous ones is that this bert version was trained on BFD, while the previous one was trained on UniRef. From our results (https://doi.org/10.1101/2020.07.12.199554), the BFD-trained model performs better than the UniRef-trained model, so we swapped the weights out.

The Google Colab has also been fixed and updated according to the latest changes.

If you experience any other issues, let us know!

Best regards,
Chris.
