
Comments (6)

sacdallago commented on May 27, 2024

Hey @sacombs, thanks for using the pipeline and raising this issue.

The usual disclaimer
Caution: develop is unstable and can break, so please be aware that installing from develop may lead to headaches at times :) This mostly has to do with the fact that there are a few people working on a few things at any given time & we work in small PR cycles (this happens on a different channel... maybe we should port our development to GitHub so it's more "public"). I try my best to push semi-stable versions to develop, but this doesn't always play out as I wish :P

Nevertheless:
A very recent change (two days ago, b7118c4) generalized supervised predictions from SeqVec to Bert (since they are almost identical; emphasis on almost :)). Unfortunately, there was an oversight: a few details of how the prediction models are constructed differ between SeqVec and Bert. Specifically, in the case of your issue, batch_norm is applied to SeqVec embeddings but not to Bert embeddings. This issue was fixed in 7ff6a9b.
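To illustrate the kind of difference described above, here is a minimal sketch, assuming numpy. It is not the pipeline's code: `MODELS_WITH_BATCH_NORM`, `batch_norm_inference` and `prepare_features` are hypothetical names. It applies inference-mode batch normalization (each feature normalized with stored running statistics) only for the model type whose predictor was trained with it, and passes other embeddings through untouched.

```python
import numpy as np

# Hypothetical: which embedders' predictors were trained with batch norm.
# Per the comment above, SeqVec's were and Bert's were not.
MODELS_WITH_BATCH_NORM = {"seqvec"}


def batch_norm_inference(x, running_mean, running_var, gamma, beta, eps=1e-5):
    """Normalize each feature (last axis) with stored statistics, then scale and shift."""
    return gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta


def prepare_features(embedding, model_type, stats):
    """Apply batch norm only if the predictor for this model type expects it."""
    if model_type in MODELS_WITH_BATCH_NORM:
        return batch_norm_inference(embedding, **stats)
    return embedding  # e.g. Bert: passed through unchanged


# Usage with toy statistics for 4 features (real embeddings have 1024).
d = 4
stats = dict(
    running_mean=np.ones(d),
    running_var=np.full(d, 4.0),
    gamma=np.ones(d),
    beta=np.zeros(d),
)
emb = np.full((3, d), 3.0)  # 3 residues x 4 features
out_seqvec = prepare_features(emb, "seqvec", stats)  # every entry is (3 - 1) / sqrt(4 + eps)
out_bert = prepare_features(emb, "bert", stats)  # the same array object, untouched
```

Sharing one code path between both embedders is what makes this easy to get wrong: if the flag is set for the wrong model type, the predictor silently sees features on a different scale than it was trained on.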

HOWEVER, we encountered another issue: the secondary structure model was trained on batches, so passing single embeddings will fail because the "batch dimension" is missing from the input.
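A minimal sketch of what "missing batch dimension" means, assuming numpy; `batched_predictor` and `predict_single` are hypothetical stand-ins, not the pipeline's functions. A model trained on batches expects input shaped (batch, length, features), while a single protein embedding is (length, features), so one fix is to add a leading axis of size 1 before predicting and drop it afterwards. In PyTorch the equivalent of `embedding[np.newaxis, ...]` is `tensor.unsqueeze(0)`.

```python
import numpy as np


def batched_predictor(x):
    """Hypothetical stand-in for a predictor trained on batches."""
    if x.ndim != 3:
        raise ValueError(f"expected (batch, length, features), got shape {x.shape}")
    # Dummy per-residue output: one score per residue, keeping the batch axis.
    return x.mean(axis=-1)


def predict_single(embedding):
    """Add a leading batch axis of size 1, predict, then drop it again."""
    batch = embedding[np.newaxis, ...]  # (length, features) -> (1, length, features)
    return batched_predictor(batch)[0]  # (1, length) -> (length,)


rng = np.random.default_rng(0)
embedding = rng.random((120, 1024))  # one protein: 120 residues, 1024 features
scores = predict_single(embedding)
print(scores.shape)  # (120,)
```

Calling `batched_predictor(embedding)` directly raises the shape error, which is the same class of failure as passing a single unbatched embedding to the trained model.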

@mheinzinger is working on a fix for this and we'll update you on this issue once that fix is merged.

Thanks!!! :)

from bio_embeddings.

sacombs commented on May 27, 2024

Hi,

Thanks for the information!

As an update, there is now an error in the annotations section of the Google Colab, where you iterate through the embeddings to get the annotations:

for id, embedding in zip([s.id for s in sequences], tqdm(embeddings, total=number_of_sequences-1)):
    annotations = annotations_extractor.get_annotations(embeddings)

There is an error with the generator... I'm not sure what is going on.


konstin commented on May 27, 2024

Oh, it looks like this should be get_annotations(embedding) instead of get_annotations(embeddings). I'll check if I can fix the Colabs.
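A minimal sketch of the corrected loop. The extractor here is a hypothetical stub (`StubAnnotationsExtractor`), not the real bio_embeddings annotations extractor; it only shows that each call should receive the single `embedding` from the current iteration rather than the whole `embeddings` generator. It also uses `seq_id` instead of `id` to avoid shadowing the Python builtin.

```python
class StubAnnotationsExtractor:
    """Hypothetical stand-in for the Colab's annotations_extractor."""

    def get_annotations(self, embedding):
        # A real extractor would run the predictors; the stub reports the length,
        # which fails with TypeError if it is handed a generator instead.
        return {"length": len(embedding)}


ids = ["seq1", "seq2"]
# Generator of per-sequence embeddings (lists of per-residue vectors).
embeddings = ([[0.0] * 4] * n for n in (5, 7))
annotations_extractor = StubAnnotationsExtractor()

results = {}
for seq_id, embedding in zip(ids, embeddings):
    # Pass this iteration's `embedding`, not the `embeddings` generator.
    results[seq_id] = annotations_extractor.get_annotations(embedding)

print(results)  # {'seq1': {'length': 5}, 'seq2': {'length': 7}}
```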


sacombs commented on May 27, 2024

I thought so, but it still leads to an error. I think it has to do with the input tensors being the wrong size.


sacdallago commented on May 27, 2024

Hey :) Sorry, we are still working on fixing various issues around this particular aspect of the pipeline and will get back to you once we have it ready (including an updated & fully functional Colab using Bert).


sacdallago commented on May 27, 2024

We seem to have fixed all outstanding issues with extract, and have tested the pipeline with Bert and SeqVec (as in use_case_four in examples).

IMPORTANT: in the process of fixing this, we have uploaded new weights for the bert.zip model here: http://maintenance.dallago.us/public/embeddings/embedding_models/bert/

If you were using a local copy of these weights, please update them; otherwise the pipeline will still throw the error you saw in #47. The difference between these Bert weights and the previous ones is that this Bert version was trained on BFD, while the previous one was trained on UniRef. In our results (https://doi.org/10.1101/2020.07.12.199554), the BFD-trained model performs better than the UniRef-trained one, so we swapped the weights.

The Google Colab has also been fixed and updated according to the latest changes.

If you experience any other issues, let us know!

Best regards,
Chris.
