Comments (5)
Sure @sacdallago! Nice work 💯 I also work on protein embeddings; I'll let you know when I can publish the code. It would be useful for the community to have all the embeddings in one repo.
Btw: for me it's straightforward to use SeqVec and average out to get "MNTPA" -> [3,5,1024] -> [5,1024] -> [1024]. However, I would also like to get the embedding for each character, without averaging out. I guess I need a transformer tokenizer, not only the weights matrix; like the BERT example here?
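To make the shape reductions above concrete, here is a shape-only sketch with a random numpy array standing in for the real embedder output (the actual values would come from SeqVec's `embed_sentence`; the array here is fake):

```python
import numpy as np

# Stand-in for seqvec.embed_sentence(list("MNTPA")):
# 3 ELMo layers x 5 residues x 1024 features.
elmo_layers = np.random.rand(3, 5, 1024)

# Per-residue embeddings: sum over the 3 ELMo layers,
# keeping one 1024-d vector per amino acid.
per_residue = elmo_layers.sum(axis=0)   # [5, 1024]

# Per-protein embedding: additionally average over residues.
per_protein = per_residue.mean(axis=0)  # [1024]

print(per_residue.shape, per_protein.shape)
```

The point is that the per-character embeddings already exist at the `[5, 1024]` stage; only the final mean over the residue axis collapses them.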
WIP is the PhD life ;)
Thanks @damianosmel; this is super alpha, so all help is appreciated!! I'm working on a very big backend change, which does not break the workflow if you just use
from bio_embeddings import SeqVec
Check out: https://github.com/sacdallago/bio_embeddings/tree/pipeline
I think I'll be through with this by the end of the year.
Sure @damianosmel , that'd be awesome: the more models, the better!
RE your second question: we haven't really tried many things; per char (aka AA), we directly slapped a CNN on the sum (see
and https://github.com/sacdallago/bio_embeddings/blob/master/bio_embeddings/embedders/elmo/feature_inference_models.py#L32).
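The idea of putting a CNN directly on the summed layers can be sketched roughly as follows. This is a hypothetical minimal head, not the code in `feature_inference_models.py`; the class name, channel sizes, and kernel width are all assumptions for illustration:

```python
import torch
import torch.nn as nn

class PerResidueCNN(nn.Module):
    """Hypothetical per-residue head: a CNN over the summed ELMo layers,
    producing one prediction per amino acid instead of one per protein."""

    def __init__(self, emb_dim=1024, n_classes=3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, 32, kernel_size=7, padding=3)
        self.out = nn.Conv1d(32, n_classes, kernel_size=7, padding=3)

    def forward(self, x):                 # x: [batch, L, 1024]
        x = x.transpose(1, 2)             # Conv1d expects [batch, channels, L]
        x = torch.relu(self.conv(x))
        return self.out(x).transpose(1, 2)  # [batch, L, n_classes]

model = PerResidueCNN()
residue_emb = torch.randn(1, 5, 1024)  # summed layers for a 5-AA protein
print(model(residue_emb).shape)        # one class score vector per residue
```

Because the convolution slides along the residue axis, no averaging is needed; the sequence length is preserved end to end.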
Sure @sacdallago! Currently it's WIP (the PhD's favorite phrase :P). If all goes well, I would like to submit the work at the end of January; hope it goes smoothly.
RE: I see; and I also see that you nicely share the weights between tasks, as shown in the SeqVec paper. To use your embeddings I currently do:
seqvec_embeddings = []
count_progress = 0
for seq in self.SEQUENCE.vocab.itos:
    if seq == "<unk>" or seq == "<pad>":
        seqvec_embeddings.append(torch.zeros(self.emb_dim))
    else:
        elmo_layers = seqvec.embed_sentence(list(seq))
        # elmo_layers: [3, L, 1024]
        residue_emb = torch.tensor(elmo_layers).sum(dim=0)
        # residue_emb: [L, 1024]
        prot_emb = residue_emb.mean(dim=0)
        # prot_emb: [1024]
        seqvec_embeddings.append(prot_emb)
    count_progress += 1
return torch.stack(seqvec_embeddings)
and then I follow the architecture in the SeqVec paper, like the one you shared.
Many thanks again; I'll let you know if my idea goes through. Keep it up 🎄 🎄