Comments (6)
Hey @sacombs, thanks for using the pipeline and raising this issue.
The usual disclaimer
Caution: develop is unstable and can break, so please be aware that installing from develop may lead to headaches at times :) This mostly has to do with the fact that there are a few people working on a few things at any given time & we work in small PR cycles (this happens on a different channel... maybe we should port our development to github so it's more "public"). I try my best to push semi-stable versions to develop, but this doesn't always play out as I wish :P
Nevertheless:
A very recent change (two days ago, b7118c4) introduced the generalization of supervised predictions to Bert from SeqVec (since they are almost identical; emphasis on almost :)). Unfortunately, there was an oversight: a few details of how the prediction models are constructed differ between SeqVec and Bert. Specifically, in the case of your issue, batch_norm isn't applied on Bert embeddings, while it is on SeqVec embeddings. This issue was fixed in 7ff6a9b.
HOWEVER, another issue we encounter is that the training of the secondary structure model was done in batches, so passing single embeddings will fail because the "batch dimension" is missing from the input.
@mheinzinger is working on a fix for this and we'll update you on this issue once that fix is merged.
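To illustrate the missing "batch dimension" mentioned above, here is a minimal sketch using NumPy. The embedding shape (sequence length 120, 1024 features) is an assumption made for illustration only, and this is not necessarily how the fix in the pipeline is implemented.

```python
import numpy as np

# Hypothetical per-residue embedding for a single protein.
# The (length, features) shape of (120, 1024) is an assumption for illustration.
embedding = np.zeros((120, 1024), dtype=np.float32)

# A model trained on batches expects (batch, length, features),
# so a single embedding needs a leading axis of size 1.
batched = embedding[np.newaxis, ...]

print(embedding.shape)  # (120, 1024)
print(batched.shape)    # (1, 120, 1024)
```

The same idea applies to PyTorch tensors, where `unsqueeze(0)` adds the leading axis.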
Thanks!!! :)
from bio_embeddings.
Hi,
Thanks for the information!
As an update, there is now an error in the annotations section of the Google Colab, where you iterate through the embeddings for the annotations:
for id, embedding in zip([s.id for s in sequences], tqdm(embeddings, total=number_of_sequences-1)):
    annotations = annotations_extractor.get_annotations(embeddings)
There is an error with the generator...not sure what is going on.
from bio_embeddings.
Oh, it looks like this should be get_annotations(embedding) instead of get_annotations(embeddings). I'll check if I can fix the colabs.
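The loop-variable mix-up described above can be sketched as follows. The `get_annotations` function here is a hypothetical stand-in, not the library's extractor, and the ids and embeddings are made-up placeholder data. The sketch covers only the variable name, not any tensor-shape requirements of the real model.

```python
# Stand-in data: the real notebook uses sequence records and an embedding generator.
sequence_ids = ["seq_a", "seq_b"]
embeddings = ([0.1 * i] * 4 for i in range(2))  # a generator, as in the report


def get_annotations(embedding):
    """Hypothetical stand-in for annotations_extractor.get_annotations."""
    return {"length": len(embedding)}


results = {}
for sequence_id, embedding in zip(sequence_ids, embeddings):
    # Pass the loop variable (one embedding), not the whole generator.
    results[sequence_id] = get_annotations(embedding)

print(results)
```

Passing `embeddings` inside the loop would hand the whole generator to the function on every iteration, which is why the original snippet failed with a generator-related error.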
from bio_embeddings.
I thought so, but it still leads to an error. I think it has to do with the input tensors being the wrong size.
from bio_embeddings.
Hey :) sorry, we are still working on fixing various issues around this particular aspect of the pipeline and will get back to you once we have it ready (including an updated & fully functional colab using Bert)
from bio_embeddings.
We seem to have fixed all outstanding issues with extract, and tested the pipeline with bert and seqvec (as in use_case_four in examples).
IMPORTANT: in the process of fixing this, we have uploaded new weights for the bert.zip model, here: http://maintenance.dallago.us/public/embeddings/embedding_models/bert/
If you were using a local copy of these weights, please update them; otherwise the pipeline will still throw the error you saw in #47. The difference between these bert weights and the previous ones is that this bert version was trained on BFD, while the previous one was trained on UniRef. From our results (https://doi.org/10.1101/2020.07.12.199554), the BFD-trained model performs better than the UniRef model, and thus we swapped the weights out.
The Google Colab has also been fixed and updated according to the latest changes.
If you experience any other issues, let us know!
Best regards,
Chris.
from bio_embeddings.