
lucagl / moad_ligandfinder


A script that extracts ligands from pdb structure(s) and removes them from the original structure. Ligands are identified according to the *valid* ligands listed in the Binding MOAD database. Each ligand can be extracted as a pdb or as an xyz (coordinate) file. Other options, such as converting the queried pdb to pqr, are available.

License: MIT License

Python 100.00%
pdb-files ligands pqr-files binding-moad extract-ligand remove-ligands

moad_ligandfinder's Introduction

MOAD Ligand Finder

This script was originally conceived to build reliable databases of structures (in PQR format) and of the ligands extracted from them. The structures are given as input in PDB format.

However, this tool can be handy in general. It is not always obvious how to identify the ligand within a pdb, and the naming scheme can be cumbersome (a combination of Residue ID, Chain and Residue Name may or may not represent a single ligand). I hope this script helps to navigate this. Ligand names are fetched from the Binding MOAD database: "http://bindingmoad.org/pdbrecords/index/". Only valid ligands (according to MOAD's criteria) are considered. The naming scheme is unambiguous and inspired by the Binding MOAD database. Note: if a (ligand) file with the same name exists in the working directory, a number will be appended to the name of the created ligand (pdb or xyz) file (see also below).

NOTE: The extracted ligands contain only HEAVY atoms (hydrogens are removed). If this behavior is not always desirable, please send me feedback and I will make it user-defined.
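To illustrate what "heavy atoms only" means at the level of a PDB file, here is a minimal sketch. It is not the code used by lfetch.py: the helper names `is_hydrogen` and `heavy_atoms` are hypothetical, and the sketch assumes standard fixed-column PDB records, with the element symbol in columns 77-78 and the atom name in columns 13-16.

```python
def is_hydrogen(pdb_line: str) -> bool:
    """Return True if an ATOM/HETATM record describes a hydrogen (or deuterium)."""
    # Element symbol lives in columns 77-78 of a standard PDB record.
    element = pdb_line[76:78].strip().upper()
    if not element:
        # Heuristic fallback when the element column is empty: use the atom
        # name (columns 13-16) without leading digits, e.g. "1HB" -> "HB".
        # Caveat: this would misread a mercury atom named "HG" as hydrogen.
        name = pdb_line[12:16].strip().lstrip("0123456789")
        element = name[:1].upper()
    return element in {"H", "D"}


def heavy_atoms(pdb_lines):
    """Keep only the ATOM/HETATM records that are not hydrogens."""
    return [
        line
        for line in pdb_lines
        if line.startswith(("ATOM", "HETATM")) and not is_hydrogen(line)
    ]
```

The fallback on the atom name is only a heuristic for files that lack the element column; files written by the PDB itself normally carry it.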

Requirements:

Python3 and:

A few non-builtin libraries:

  • Numpy
  • Requests

Only if the option --PQR is activated:

Pdb2Pqr installed:

This software can be found here: https://github.com/Electrostatics/pdb2pqr. I do not take any credit for the very nice pdb2pqr software.

Usage:

"Manual" Mode

Running python3 lfetch.py prompts the user for the name of the PDB structure. If the pdb is not found in the working directory, it is downloaded automatically. The user can keep entering further queries: the behavior is identical and the new info is appended to the ligandMap file (see below). To exit, press ctrl-C.

  • Output:
    1. ligandMap.txt file containing ligand(s) name for queried structure(s).
    2. pdb or xyz file containing the ligand heavy atoms (ligand extraction filters out hydrogens). Ligand names are fetched from the MOAD database: "http://bindingmoad.org/pdbrecords/index/". Only valid ligands are considered. The naming scheme is unambiguous and inspired by the MOAD database. CAREFUL: if a (ligand) file with the same name exists in the working directory, a number will be appended to the name of the created ligand (pdb or coordinate) file.
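The "number appended on name clash" behavior can be pictured with the following sketch. This is an illustration only: the helper `unique_path` is hypothetical, and the exact suffix format (here `name_1.pdb`, `name_2.pdb`, ...) is an assumption, since the format actually used by the script is not stated above.

```python
from pathlib import Path


def unique_path(directory: Path, stem: str, suffix: str) -> Path:
    """Return a path in `directory` that does not clash with an existing file.

    Assumed numbering format: "<stem>_<n><suffix>", with n starting at 1.
    """
    candidate = directory / f"{stem}{suffix}"
    counter = 1
    while candidate.exists():
        candidate = directory / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate
```

Calling `unique_path(Path("."), "EZZ_A204", ".pdb")` returns `EZZ_A204.pdb` when no such file exists, and the first free numbered variant otherwise.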

"Database" Mode

python3 lfetch.py -d looks for all pdb files in the working directory (whose names do not contain "_"), automatically produces a ligandMap.txt file for them (1 line per structure followed by its ligands) and extracts all valid ligands into separate files. The ligands are placed in an "output" folder created by the script. If the --PQR option is present, the pqr files are also placed in the "output" folder.
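The file selection rule of the database mode can be sketched as follows. This is not the script's own code: `find_input_structures` is a hypothetical helper, and it assumes that the "_" test applies to the file name without its `.pdb` extension.

```python
from pathlib import Path


def find_input_structures(workdir="."):
    """List the .pdb files in `workdir` whose name contains no underscore."""
    return sorted(
        path
        for path in Path(workdir).glob("*.pdb")
        if "_" not in path.stem  # skips e.g. files such as <name>_clean.pdb
    )
```

Skipping names with "_" means that files such as `<original name>_clean.pdb` (see --purgePDB below) are not picked up again as input on a later run.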

Other options

  • NEW --purgePDB : A new pdb named <original name>_clean.pdb is created, containing the original structure purged of every HETATM and of the extracted ligand(s). Ligands that are not HETATM and were not extracted because the matching scheme failed will still be present in the purged file (note: the --safe option makes such cases more frequent).
  • --PQR : A pqr file purged of all found ligands and HETATM will be produced using pdb2pqr with the AMBER force field (if needed, it could be extended to any force field; please post an issue and I will modify the code).
  • --XYZ: The ligands are saved in coordinate files (.xyz) containing only their coordinates.
  • -c: "Chunk mode", working only if the -d option is also activated. The ligand files (and pqr files, if the --PQR option is set) and their corresponding ligandMap are distributed in separate folders, each one containing the number of structures indicated by the user. The folders are simply named with numbers. This option was conceived to create a database split into chunks for parallel use. This functionality is still very basic: if some structures are skipped (for instance due to the --excludeLarge option below), the folder content is not rearranged.
  • --safe: Partial matches with respect to what is expected from the MOAD database are excluded. This means that such ligands are not extracted and their entry in the ligandMap.txt file is commented out.
  • --quiet: do not print info while running.
  • Advanced: --excludeLarge : large pqr files (more than 10000 lines) will be excluded. This means that the line concerning those structures in ligandMap is commented out and their ligands are not extracted.
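As an illustration of the chunk mode (-c), the sketch below splits a list of structure names into consecutive groups of a user-chosen size. It is only a sketch: `split_into_chunks` is a hypothetical helper, the structure names are placeholders, and numbering the folders from 1 is an assumption.

```python
def split_into_chunks(names, chunk_size):
    """Split `names` into consecutive groups of at most `chunk_size` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    return [names[i:i + chunk_size] for i in range(0, len(names), chunk_size)]


# Placeholder structure names, used only to show the grouping.
chunks = split_into_chunks(["s1", "s2", "s3", "s4", "s5"], 2)
for folder_number, members in enumerate(chunks, start=1):
    print(folder_number, members)
```

With a fixed split like this, a structure skipped later on simply leaves its folder with fewer entries, which matches the "no rearrangement" caveat above.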

Please do not hesitate to post issues, suggest improvements or report bugs. I will try to answer them.

Recommendations

Check the "error_log.txt" file after running. It reports the reasons why a ligand or a full structure was skipped, along with other useful info.

Example of usage and ligandMap file produced:

python3 lfetch.py

Insert pdb name

6gj6

^C

TOTAL NUMBER OF STRUCTURES SUCCESFULLY PROCESSED = 1

Content of ligandMap.txt: the name of the structure, preceded by the number of valid ligands found (according to Binding MOAD), followed by the names of the ligands (one line per distinct ligand). The name of each ligand appearing in ligandMap.txt is identical to the name of the file containing the ligand coordinates. Example of the ligandMap.txt produced by the above call:

# ************** PDB ligand Map ***************

# Created using '*BuildMap module*

# Ligand validation based on binding MOAD database.

# Author L. Gagliardi, Istituto Italiano di Tecnologia

# LOCAL TIME = ---

# --------------

2 6gj6

EZZ_A:204

GCP_A:203 ~
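A file with this layout can be read back with a few lines of Python. The parser below is a sketch written against the example above, not part of the repository: `parse_ligand_map` is a hypothetical name, and it assumes that comment lines start with "#", that a structure line has the form "<count> <name>", and that only the first token of a ligand line is the ligand name.

```python
def parse_ligand_map(text):
    """Map each structure name to the list of its ligand names."""
    ligands_by_structure = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, the header and commented entries
        tokens = line.split()
        if len(tokens) == 2 and tokens[0].isdigit():
            # Structure line: "<number of ligands> <structure name>"
            current = tokens[1]
            ligands_by_structure[current] = []
        elif current is not None:
            # Ligand line: keep the first token, ignore anything trailing
            ligands_by_structure[current].append(tokens[0])
    return ligands_by_structure
```

Because commented lines are skipped, entries commented out by --safe or --excludeLarge would not appear in the result.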

ACKNOWLEDGMENTS:

This is a small accessory project in my work, but I would still very much appreciate an acknowledgment if this helped you save some time :). Luca Gagliardi, MOAD ligandFinder, (2021) GitHub repository, and/or:

Cite

(*) L. Gagliardi et al, SHREC 2022: Protein-ligand binding site recognition, Computers & Graphics https://doi.org/10.1016/j.cag.2022.07.005 (2022)


moad_ligandfinder's Issues

lfinder is only partially robust to mismatches between MOAD naming schemes and pdb structure lines

The naming scheme of Binding MOAD associates each residue name appearing on a line with an increasing residue number, for each chain. For instance, for 1cqf (BGC GAL GLA F:1 G:1 I:1 ..) the script will look for ligand1=(BGC F 1, GAL F 2, GLA F 3), ligand2=(BGC G 1, GAL G 2, GLA G 3), ligand3=(BGC I 1, GAL I 2, GLA I 3), etc.

So far, the strategy used to overcome mismatches is to progressively skip, in order (following the MOAD naming scheme), the name-chainID-ResNumber entries. This works most of the time but not always. An example is 4z0u.pdb, where the pdb skips the first 4 names and starts from the 5th but keeps the residue number at the initial index (usually everything is shifted accordingly).

In general, one could devise more flexible strategies to overcome mismatches.
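The expansion described in this issue can be written down as a short sketch. This is not the code of lfetch.py: `expand_moad_entry` is a hypothetical helper and the input layout (a list of residue names plus a list of "chain:start" strings) is an assumption based on the 1cqf example.

```python
def expand_moad_entry(residue_names, chain_starts):
    """Expand a MOAD-style entry into one (name, chain, number) list per ligand.

    residue_names: e.g. ["BGC", "GAL", "GLA"]
    chain_starts:  e.g. ["F:1", "G:1", "I:1"]
    """
    ligands = []
    for chain_start in chain_starts:
        chain, start = chain_start.split(":")
        first_number = int(start)
        # Each residue name gets the next residue number on the same chain.
        ligands.append(
            [
                (name, chain, first_number + offset)
                for offset, name in enumerate(residue_names)
            ]
        )
    return ligands


expected_ligands = expand_moad_entry(["BGC", "GAL", "GLA"], ["F:1", "G:1", "I:1"])
```

A mismatch arises whenever the (name, chain, number) triplets produced this way are not all found among the lines of the pdb file.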

Ligand file name error in win64

Hello, on Windows the lfetch.py script generates an illegal ligand file name (a colon can occur in the file name) and therefore cannot extract the ligand correctly. It runs perfectly on Linux, see below:

windows 10, 64

(gpu02_py) G:\05Coding\GNN_DTI-master\dpi_v4\pretrain\MOAD_ligandFinder-1.1>python lfetch.py
Insert pdb name
1FWE
Valid ligands found: 1
Excluded: 0
Ligands: ['HAE_C:989']
Insert pdb name

linux

(/home/dejun/work_spaces/py/conda_env/py365) [dejun@mu01 MOAD_ligandFinder-main]$ python lfetch.py
Insert pdb name
1FWE
Valid ligands found: 1
Excluded: 0
Ligands: ['HAE_C989']
Insert pdb name
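One possible way to avoid the problem, sketched here as a suggestion rather than as the fix adopted in the repository, is to strip the characters that Windows forbids in file names before writing the ligand file. The helper name `safe_ligand_filename` is hypothetical.

```python
# Characters that are not allowed in Windows file names.
WINDOWS_FORBIDDEN = '<>:"/\\|?*'


def safe_ligand_filename(ligand_name, replacement=""):
    """Replace characters that Windows forbids in file names."""
    return "".join(
        replacement if char in WINDOWS_FORBIDDEN else char
        for char in ligand_name
    )


print(safe_ligand_filename("HAE_C:989"))  # prints HAE_C989
```

Note that dropping the colon changes the ligand name with respect to the one written in ligandMap.txt, so the map and the file names would need to use the same convention.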
