fhalab / embeddings_reproduction
License: Other
Thank you so much for your great work.
I read a paper called "DeepPrime2Sec: Deep Learning for Protein Secondary Structure Prediction from the Primary Sequences" by Asgari, E., Poerner, N., McHardy, A., & Mofrad, M. (https://github.com/ehsanasgari/DeepPrime2Sec)
In the paper, the authors say they used five kinds of features to predict protein secondary structure from the primary sequence. These five features are:
However, the paper does not show how to extract these features. I am not sure whether you compared your embedding with their work.
By the way,
in my ML project I want to embed a protein as a vector and then use DL models to predict drug-protein interactions. Do you have an example showing how to use this library in a way similar to RDKit, e.g.
fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=512)?
Many thanks!
As a user, I would find it helpful to have some examples directly in the README showing how to use this library. There are two simple scenarios that would personally benefit me:
I'm 7GB+ into trying to clone this repo and my computer is incredibly upset. I would suggest making a separate repository to house all of the data so the code can be downloaded and used independently.
Hi,
Somewhat related to #3, I cannot work out where to find the actual pretrained protein embedding. In gensim,
I would like to use Word2Vec.load(path/to/embedding.model)
-- where can I find this?
Thank you
I tried to recreate the original doc2vec models in train_docvec_models.ipynb, but ran into the following error at "model.build_vocab(documents)" when using "merge=True" in the kmer_hypers:
TypeError: unhashable type: 'list'
Do you have any suggestions? Thanks!
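For context on the message itself: in Python, `unhashable type: 'list'` is raised whenever a list is used as a dictionary key or set member. Vocabulary building counts tokens in a dictionary, so the error would appear if a document's token sequence contains nested lists instead of plain strings. Whether `merge=True` produces such nesting in this notebook is an assumption that the thread does not confirm. The sketch below reproduces the error with `collections.Counter` and shows one level of flattening; `flatten_tokens` and the sample tokens are hypothetical and not part of the repository.

```python
from collections import Counter


def flatten_tokens(doc):
    """Flatten one level of nesting so every token is a plain string."""
    flat = []
    for item in doc:
        if isinstance(item, (list, tuple)):
            flat.extend(item)
        else:
            flat.append(item)
    return flat


# Hypothetical tokenized sequence: two groups of k-mers kept as separate lists.
nested_doc = [["MKT", "AYI"], ["KTA", "YIA"]]

# Counting tokens in a dict requires hashable tokens, so nested lists fail.
try:
    Counter(nested_doc)
    error_message = None
except TypeError as err:
    error_message = str(err)

# After flattening, every token is a string and counting works.
flat_doc = flatten_tokens(nested_doc)
counts = Counter(flat_doc)
```

Printing `type(documents[0].words[0])` before calling `build_vocab` is a quick way to check whether the tokens are strings or lists.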
Hello,
the URL above is returning a 404. Can you provide an alternate URL?
Thanks
from embeddings_reproduction import embedding_tools
embeds = embedding_tools.get_embeddings_new(['ABCFFFFFFFFFFFF','EFGHQWERRTTUIIO'], seqs, k=5, overlap=False)
I am getting the following error:
'Doc2Vec' object has no attribute 'running_training_loss'
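For readers unfamiliar with the `k` and `overlap` arguments in the call above, here is a minimal sketch of the general idea of splitting a sequence into k-mers. It is not the repository's implementation: the function name `split_kmers` is hypothetical, and the library may handle non-overlapping k-mers differently (for example, by also using shifted reading frames).

```python
def split_kmers(seq, k=3, overlap=True):
    """Split a sequence into k-mers.

    With overlap=True the window moves one residue at a time;
    with overlap=False it moves k residues at a time, starting at position 0.
    Trailing residues that do not fill a whole k-mer are dropped.
    """
    step = 1 if overlap else k
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]


# The first sequence from the call above, split into non-overlapping 5-mers.
non_overlapping = split_kmers("ABCFFFFFFFFFFFF", k=5, overlap=False)

# A shorter hypothetical sequence, split into overlapping 3-mers.
overlapping = split_kmers("ABCDEFG", k=3, overlap=True)
```

Each list of k-mers then plays the role of a "sentence" of "words" for a doc2vec-style model.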
Can this model be used to generate features for a large number (more than 10,000) of protein sequences? Also, has the low efficiency of model prediction been improved?
Hi, I was really interested in your paper, but this repository isn't very user friendly. It would be wonderful to add a setup.py so it can be installed with pip, along with some documentation for users on how to access the embeddings.
I would be happy to send a PR for the setup.py; we could then discuss further on the PR.
Hello,
Many thanks for the GitHub repository.
When I run test_predictions, I get the following errors:
UnpicklingError Traceback (most recent call last)
in
1 with open('../inputs/X_aaindex_64_cosine.pkl', 'rb') as f:
----> 2 X_aa = pickle.load(f)
UnpicklingError: invalid load key, 'v'.
also,
UnpicklingError Traceback (most recent call last)
in
13 # Sequence and structure
14 with open('../inputs/T50_seq_struct.pkl', 'rb') as f:
---> 15 X, _ = pickle.load(f)
16 evals, mu = evaluate(df_train, df_test, X, y_col, 'seq_struc', guesses=(1, 100))
17 res = pd.concat((res, evals), ignore_index=True)
UnpicklingError: invalid load key, 'v'.
My versions:
print(np.__version__)
1.18.5
print(pd.__version__)
1.1.0
Or are the pkl files corrupted?
thanks,
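One common cause of `invalid load key, 'v'` is that the .pkl file on disk is a Git LFS pointer (a small text file whose first line begins with `version https://git-lfs.github.com/spec/v1`) and not the real binary. Whether that is what happened here is an assumption, since the thread does not confirm it. The sketch below checks a file for that header; `is_git_lfs_pointer` is a hypothetical helper, and the temporary files only stand in for the real inputs.

```python
import os
import pickle
import tempfile

# First line of every Git LFS pointer file, per the Git LFS specification.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_git_lfs_pointer(path):
    """Return True if the file at `path` starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_PREFIX)) == LFS_PREFIX


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a pointer file that was never replaced by the real data.
    pointer_path = os.path.join(tmp, "pointer.pkl")
    with open(pointer_path, "wb") as f:
        f.write(LFS_PREFIX + b"\noid sha256:" + b"0" * 64 + b"\nsize 12345\n")

    # Stand-in for a genuine pickle file.
    real_path = os.path.join(tmp, "real.pkl")
    with open(real_path, "wb") as f:
        pickle.dump([1, 2, 3], f)

    pointer_is_lfs = is_git_lfs_pointer(pointer_path)
    real_is_lfs = is_git_lfs_pointer(real_path)

    # Unpickling the pointer file fails because 'v' is not a pickle opcode.
    try:
        with open(pointer_path, "rb") as f:
            pickle.load(f)
        load_error = None
    except pickle.UnpicklingError as err:
        load_error = str(err)
```

If the check returns True for a file such as ../inputs/X_aaindex_64_cosine.pkl, running `git lfs pull` inside the clone is the usual way to download the real content, assuming the repository stores these files with Git LFS.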
As some users have noted before, in other issues, it is unclear how to use the final model to generate embeddings for a new set of protein sequences. I have identified the files located at http://cheme.caltech.edu/~kkyang/models/ and I have found the script embedding_tools.py from which I suppose the function get_embeddings_new() is the relevant one. But which doc2vec_file should I use to compute embeddings for my set of sequences? Which one is the "final" one?
As previously noted, if a minimal example of this was included in the main README file I am sure it would enable many more users to benefit from your work.
Hi
When I run all of the scripts, I get this error message:
in plot_ChRs()
4 df = pd.read_csv('../inputs/localization.txt')
5 with open('../inputs/localization_seq.pkl', 'rb') as f:
----> 6 X_1, terms = pickle.load(f)
7 X_p = pd.read_csv('../inputs/localization_profet.tsv', delimiter='\t')
8 X_p.index = X_p['name']
UnpicklingError: invalid load key, 'v'