Comments (5)

damianosmel commented on June 7, 2024

Sure @sacdallago! Nice work 💯 I also work on protein embeddings and will let you know when I can publish the code. It would be useful for the community to have all the embeddings in one repo.

Btw, it's clear to me how to use SeqVec and average to get a single vector per protein: "MNTPA" -> [3,5,1024] -> [5,1024] -> [1024]. However, I would also like to get the embedding for each character, without averaging. I guess I need a transformer tokenizer and not only the weights matrix, like in the BERT example here?
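Going by the shapes quoted above, the intermediate [5,1024] array already holds one row per character, so a per-character embedding may simply be the result before the final average. A minimal sketch, assuming a random NumPy array as a stand-in for the real embedder output (the variable names are illustrative and not part of bio_embeddings):

```python
import numpy as np

sequence = "MNTPA"
rng = np.random.default_rng(0)

# Stand-in for the embedder output (assumption: random values, real shape):
# 3 layers x L residues x 1024 features.
elmo_layers = rng.normal(size=(3, len(sequence), 1024))

# Collapse only the layer axis: one 1024-d vector per residue.
per_residue = elmo_layers.sum(axis=0)    # shape (5, 1024)

# Additionally average over the residue axis: one vector per protein.
per_protein = per_residue.mean(axis=0)   # shape (1024,)

# Row i of per_residue belongs to character i of the sequence.
for aa, vec in zip(sequence, per_residue):
    print(aa, vec.shape)
```

Stopping after the sum over layers keeps the residue axis intact, which is what a per-character model would consume.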

from bio_embeddings.

sacdallago commented on June 7, 2024

WIP is the PhD life ;)

sacdallago commented on June 7, 2024

Thanks @damianosmel; this is super alpha, so all help is appreciated! I'm working on a very big backend change, which does not break the workflow if you just use:

from bio_embeddings import SeqVec

Check out: https://github.com/sacdallago/bio_embeddings/tree/pipeline

I think I'll be through with this by the end of the year.

sacdallago commented on June 7, 2024

Sure @damianosmel, that'd be awesome: the more models, the better!

RE your second question: we haven't really tried many things. For per-character (i.e. per amino acid) predictions, we directly slapped a CNN on the sum over the layers. See this line:

embedding = torch.tensor(self._embedding).to(self._device).sum(dim=0, keepdim=True).permute(0, 2, 1).unsqueeze(dim=-1)

and https://github.com/sacdallago/bio_embeddings/blob/master/bio_embeddings/embedders/elmo/feature_inference_models.py#L32.
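The chain of tensor operations in that line can be followed shape by shape. This is a sketch under assumptions: a random tensor stands in for `self._embedding`, and the `Conv2d` layer at the end is hypothetical, added only to show that the resulting 4-D tensor fits a convolution over the residue axis. It is not the repository's actual model.

```python
import torch

L = 5  # sequence length, e.g. "MNTPA"

# Stand-in for self._embedding (assumption): 3 layers x L residues x 1024.
layers = torch.randn(3, L, 1024)

# Sum the layers but keep the axis, which then acts as a batch axis of 1.
summed = layers.sum(dim=0, keepdim=True)       # (1, L, 1024)

# Move the 1024 features into the channel position expected by conv layers.
channels_first = summed.permute(0, 2, 1)       # (1, 1024, L)

# Add a trailing width axis so the tensor is 4-D: (N, C, H, W).
cnn_input = channels_first.unsqueeze(dim=-1)   # (1, 1024, L, 1)

# Hypothetical conv layer: slides along the residue axis only and pads
# so that every residue still gets its own output position.
conv = torch.nn.Conv2d(1024, 32, kernel_size=(7, 1), padding=(3, 0))
out = conv(cnn_input)                          # (1, 32, L, 1)

print(cnn_input.shape, out.shape)
```

Because the residue axis is never reduced, the output keeps length L, so each character gets its own prediction.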

damianosmel commented on June 7, 2024

Sure @sacdallago! Currently it's WIP (the best PhD phrase :P). If all goes well, I would like to submit the work at the end of January; I hope it goes smoothly.
RE the CNN: I see. I also see that you nicely share the weights between tasks, as shown in the SeqVec paper. To use your embeddings I currently do:

		seqvec_embeddings = []
		count_progress = 0
		for seq in self.SEQUENCE.vocab.itos:
			if seq in ("<unk>", "<pad>"):
				# special tokens have no sequence to embed, so use a zero vector
				seqvec_embeddings.append(torch.zeros(self.emb_dim))
			else:
				# elmo_layers: [3, L, 1024]
				elmo_layers = seqvec.embed_sentence(list(seq))

				# sum over the 3 layers -> residue_emb: [L, 1024]
				residue_emb = torch.tensor(elmo_layers).sum(dim=0)

				# average over the residues -> prot_emb: [1024]
				prot_emb = residue_emb.mean(dim=0)

				seqvec_embeddings.append(prot_emb)
			count_progress += 1
		# one row per vocabulary entry
		return torch.stack(seqvec_embeddings)

and then I follow the architecture in the SeqVec paper, like the one you shared.
Many thanks again! I'll let you know if my idea goes through. Keep it up 🎄 🎄
