Comments (4)
Hi, Shenggeng!
For the first problem, this is because each drug-drug pair is recorded twice in the data, once in each order. For example, (sildenafil, Isosorbide mononitrate) appears in one record and (Isosorbide mononitrate, sildenafil) in another, but they are in fact the same pair. So we delete half of them.
For the second problem, try learning how to use the RDKit package. For example, for the drug Isosorbide mononitrate, we can collect its SMILES, [H][C@]12OC[C@@H](O[N+]([O-])=O)[C@@]1([H])OC[C@@H]2O, from DrugBank.
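As a minimal sketch of that deduplication, assuming the pairs are available as a list of (drug_a, drug_b) tuples (the function name and sample data are hypothetical, not taken from the DDIMDL code), the unordered pair can serve as a key so that only the first orientation is kept:

```python
def dedupe_symmetric_pairs(pairs):
    """Keep one record per unordered drug pair, preserving first-seen order."""
    seen = set()
    unique = []
    for a, b in pairs:
        key = frozenset((a, b))  # (a, b) and (b, a) map to the same key
        if key not in seen:
            seen.add(key)
            unique.append((a, b))
    return unique


# Hypothetical input: the same interaction recorded in both orders.
pairs = [
    ("sildenafil", "Isosorbide mononitrate"),
    ("Isosorbide mononitrate", "sildenafil"),
]
print(dedupe_symmetric_pairs(pairs))
```

If every pair really does appear in both orders, this leaves exactly half of the records.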
So here is the code:
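from rdkit import Chem
from rdkit.Chem import AllChem
# SMILES for Isosorbide mononitrate, as collected from DrugBank
smile = '[H][C@]12OC[C@@H](O[N+]([O-])=O)[C@@]1([H])OC[C@@H]2O'
mol = Chem.MolFromSmiles(smile)
# Morgan fingerprint with radius 2, folded to 881 bits
morgan_hashed = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=881)
# String of '0'/'1' characters, one per bit
morgan_hashed.ToBitString()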
The result is a bit vector of length 881.
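To inspect which bits are set, the bit string can be turned into a list of positions. The sketch below is plain Python and works on any string of '0' and '1' characters such as the one ToBitString() returns; the helper name, the short example string, and the offset parameter (0-based or 1-based numbering) are assumptions for illustration only:

```python
def on_bit_indices(bit_string, offset=0):
    """Return the positions of '1' characters, numbered starting at `offset`."""
    return [i + offset for i, ch in enumerate(bit_string) if ch == "1"]


# Hypothetical 10-bit string; a real fingerprint here would have 881 characters.
example = "0100100001"
print(on_bit_indices(example))                      # 0-based positions
print("|".join(map(str, on_bit_indices(example))))  # pipe-separated form
```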
from ddimdl.
Hello, Yifan!
Thank you very much for your reply. I understand the first question now.
But I still have a question about the second one.
For drug DB01296, the SMILES is 'N[C@H]1C(O)OC@HC@@H[C@@h]1O'. With the code you provided, I did get an 881-dimensional vector. But in event.db, its SMILES feature is 9|10|14|18|19|20|178|181|283|284|285|286|299|308|332|338|339|340|341|344|345|346|347|351|352|365|366|367|380|393|405|406|528|563|566|567|571|582|592|614|615|617|637|638|639|643|661|662|663|679|680|681|682|683|689|690|691|701|703.
What do these numbers mean? Do they mean that these positions are 1 in the 881-dimensional vector? If so, for drug DB01296 the ninth bit of the vector I computed is 0, yet 9 appears in these numbers; and the 16th bit is 1, yet 16 does not appear.
from ddimdl.
Yes, you are right.
The mismatch is because the fingerprint methods are different. The fingerprint in the current dataset was obtained by a former student, who used RDKit in Java.
The code I posted uses the Morgan fingerprint, which is the most common method. I have tested the results, and there is little difference between the current dataset's fingerprint and the Morgan fingerprint.
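For reference, here is a rough sketch of how a pipe-separated field like the one in event.db could be expanded into an 881-length 0/1 vector and checked position by position against another vector. The helper names are hypothetical, and whether the stored indices are 0-based or 1-based is an assumption exposed through the offset parameter. Bit positions from two different fingerprint methods do not describe the same substructures, so disagreements like the ones noted above are expected.

```python
def decode_indices(field, n_bits=881, offset=0):
    """Expand a pipe-separated list of on-bit positions into a 0/1 list."""
    bits = [0] * n_bits
    for token in field.split("|"):
        bits[int(token) - offset] = 1
    return bits


def differing_positions(a, b, offset=0):
    """List the positions at which two equal-length vectors disagree."""
    return [i + offset for i, (x, y) in enumerate(zip(a, b)) if x != y]


# First few indices quoted in the question above.
stored = decode_indices("9|10|14|18|19|20")
# Hypothetical second vector, standing in for a fingerprint from another method.
other = decode_indices("9|10|16")
print(sum(stored), differing_positions(stored, other))
```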
from ddimdl.
OK, I see. Thank you very much for your reply!
from ddimdl.