
DConStruct

Hybridized distance- and contact-based hierarchical protein folding

Installation

Installing DConStruct is straightforward. The following instructions should work on a 64-bit Linux system:

  • Make sure you have Python with NumPy, SciPy and Scikit-learn installed. DConStruct has been tested on Python 2.7.5 (numpy version 1.15.2, scipy version 0.12.1 and scikit-learn version 0.19.2), but it should run on higher versions as well.
  • Install MODELLER (a license key is required). It can be installed with the command conda install modeller -c salilab. DConStruct has been tested on MODELLER version 9.20.
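Before running DConStruct, it can help to confirm that the Python packages listed above are importable. The following is a minimal sketch, not part of DConStruct: the function name installed_versions is hypothetical, and MODELLER is left out of the check because DConStruct locates it through the -m path rather than through a Python import.

```python
import importlib

# Import names of the packages listed above ("sklearn" is scikit-learn).
REQUIRED = ["numpy", "scipy", "sklearn"]


def installed_versions(names):
    """Map each module name to its version string, or None if it is not importable."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in sorted(installed_versions(REQUIRED).items()):
        print("%-8s %s" % (name, version if version else "NOT INSTALLED"))
```

Any package reported as NOT INSTALLED needs to be installed in the same Python environment that will run DConStruct.py.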

That's it! DConStruct is ready to be used.

Usage

To see the usage instructions, run python DConStruct.py -h


*************************************************************************
*                            DConStruct                                 *
*  Hybridized distance- and contact-based hierarchical protein folding  *
*  For comments, please email to [email protected]               *
*************************************************************************

Usage: DConStruct.py [options]

Options:
  -h, --help  show this help message and exit
  -r RR       rr file in CASP format containing the contact map (mandatory)
  -a AA       fasta file containing the amino acid sequence (mandatory)
  -s SS       secondary structure file (mandatory)
  -m M        MODELLER program path that contains modpy.sh script (mandatory)
  -o OUTPUT   existing output directory path (mandatory)
  -n NO       positive integer to be used as seed (optional); default 7
  -c CTYPE    contact type ca or cb (optional); default cb
  -x L        top xL contacts, where L is the sequence length (optional);
              default 8

File formats and parameters

  • Amino Acid (-a): The first line contains the header of the target protein and the second line contains the amino acid sequence. For example, see ./examples/input/T0968s2.fasta
  • Contacts (-r): The first line contains the amino acid sequence, followed by a list of contact rows in a five-column format similar to the CASP RR format. In each contact row, the first two columns are the indices of the two residues in contact, the third and fourth columns are the lower and upper bounds of their distance (in Å), and the fifth column is a real number giving the probability that the two residues are in contact. For example, see ./examples/input/T0968s2.rr
  • Secondary structure (-s): Single line containing a sequence of 3-state secondary structure (i.e. 'H', 'E' and 'C'). For example, see ./examples/input/T0968s2.ss
  • Modeller path (-m): Modeller program path that contains modpy.sh script.
  • Output (-o): Output directory path. The directory must exist.
  • Random seed (-n): Positive integer to be used as random seed.
  • Contact type (-c): To define whether the contact is Cα–Cα or Cβ–Cβ. Use -c ca for Cα–Cα contacts and -c cb for Cβ–Cβ contacts.
  • Contact cutoff (-x): To select the top xL contacts, where L is the sequence length of the protein. For example, use -x 8 to select the top 8L contacts.
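The file formats above can be illustrated with a short reader. This is a minimal sketch based only on the descriptions in this section; it is not DConStruct's own parser. The helper names (parse_rr, top_xl, check_ss) are hypothetical, the example data are made up, and ranking contacts by descending probability is an assumption about what "top xL" means.

```python
def parse_rr(text):
    """Parse rr-style text: sequence on line 1, then 'i j lower upper prob' rows."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    sequence = lines[0]
    contacts = []
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != 5:
            raise ValueError("expected 5 columns, got: %r" % ln)
        i, j = int(fields[0]), int(fields[1])
        lower, upper, prob = (float(v) for v in fields[2:])
        contacts.append((i, j, lower, upper, prob))
    return sequence, contacts


def top_xl(contacts, seq_len, x=8):
    """Keep the x*L most probable contacts (assumed ranking: probability, descending)."""
    ranked = sorted(contacts, key=lambda c: c[4], reverse=True)
    return ranked[: int(x * seq_len)]


def check_ss(sequence, ss):
    """Check that a 3-state string matches the sequence length and the H/E/C alphabet."""
    return len(ss) == len(sequence) and set(ss) <= set("HEC")


# Made-up example data, for illustration only.
example = """MKVLA
1 5 0 8.0 0.91
2 4 0 8.0 0.35
1 3 0 8.0 0.77
"""
sequence, contacts = parse_rr(example)
print(top_xl(contacts, len(sequence), x=1))
print(check_ss(sequence, "CHHEC"))
```

A check like check_ss is a cheap way to catch a secondary structure file whose length does not match the fasta sequence before starting a long run.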

To run DConStruct with predicted distance information, we provide a helper script that generates a distance-based 3-class contact file (rr file) from the distance histogram (distogram) in the rawdistpred.current file produced by DMPfold. The script is available here.

Test DConStruct

We give an example of running DConStruct on CASP13 FM target T0968s2.

Create an output directory: mkdir output/

Run python DConStruct.py -r examples/input/T0968s2.rr -a examples/input/T0968s2.fasta -s examples/input/T0968s2.ss -o output/ -c cb -x 8 -m your/modeller/path

The top predicted model will be written to output/T0968s2_model1.pdb. The predicted 3D model is given here and the output screen should look like this.

DConStruct is reasonably fast. Depending on the sequence length of the target protein, DConStruct takes only a few minutes to a few hours to complete.
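The test run above can also be scripted. The sketch below is an assumption-laden wrapper, not part of DConStruct: build_command and run_dconstruct are hypothetical names, and the code only assembles the options documented in the usage message and creates the output directory if it is missing.

```python
import os
import subprocess


def build_command(rr, fasta, ss, modeller_path, out_dir, ctype="cb", x=8, seed=7):
    """Assemble the DConStruct.py argument list from the documented options."""
    return [
        "python", "DConStruct.py",
        "-r", rr, "-a", fasta, "-s", ss,
        "-m", modeller_path, "-o", out_dir,
        "-c", ctype, "-x", str(x), "-n", str(seed),
    ]


def run_dconstruct(rr, fasta, ss, modeller_path, out_dir, **options):
    """Create the output directory if needed, then run DConStruct and return its exit code."""
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)  # -o must point to an existing directory
    cmd = build_command(rr, fasta, ss, modeller_path, out_dir, **options)
    return subprocess.call(cmd)


# Same arguments as the example run; "your/modeller/path" is a placeholder.
cmd = build_command(
    "examples/input/T0968s2.rr",
    "examples/input/T0968s2.fasta",
    "examples/input/T0968s2.ss",
    "your/modeller/path",
    "output/",
)
print(" ".join(cmd))
```

Calling run_dconstruct with the same arguments, from the directory that contains DConStruct.py, would launch the run.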

Data

  1. Download input data for running DConStruct on FRAGFOLD 150 dataset, CASP12 and CASP13 FM dataset, 510 Membrane protein dataset, and 15 EVfold dataset
  2. Download DConStruct predicted models for FRAGFOLD 150 dataset, CASP12 and CASP13 FM dataset, 510 Membrane protein dataset, and 15 EVfold dataset

Cite

If you find DConStruct useful, please cite our PLOS Computational Biology paper.

dconstruct's Issues

how to generate the input files

Hi,

It's an interesting module for calculating the quality of structures generated from homology modeling.
I would like to know how to generate the input files, say if I have a model1.pdb generated from Modeller.
Is there any script to generate the model1.rr, model1.fasta and model1.ss files?
Could you please let me know? It would be very helpful for using the module.

Thank you

Inability to reproduce DMPfold result reported in the bioRxiv paper

Table 3 and Table S5 in the bioRxiv manuscript for DConStruct show that DConStruct outperforms the best state-of-the-art method, DMPfold, on 40 CASP FM targets. However, in my own benchmark on a subset of 30 CASP FM targets, DMPfold using DeepMSA significantly outperforms DConStruct in both average and median first-model TM-score; according to the manuscript, DConStruct presumably also uses DeepMSA. Is there any misunderstanding? Thank you.

Target      DMPfold+DeepMSA (my run)  DMPfold (reported in preprint)  DConStruct (reported in preprint)
T0859-D1    0.2409                    0.2446                          0.1930
T0862-D1    0.8145                    0.2755                          0.5056
T0863-D1    0.5063                    0.2933                          0.5030
T0864-D1    0.7319                    0.4792                          0.6950
T0866-D1    0.7768                    0.7369                          0.5820
T0869-D1    0.7054                    0.7729                          0.7448
T0870-D1    0.6807                    0.4912                          0.6724
T0886-D1    0.3616                    0.3214                          0.3042
T0886-D2    0.6716                    0.6894                          0.6944
T0892-D2    0.6401                    0.6411                          0.6960
T0897-D2    0.6338                    0.2187                          0.2031
T0898-D1    0.6875                    0.3716                          0.6463
T0900-D1    0.5865                    0.6207                          0.6251
T0912-D3    0.6457                    0.5599                          0.5784
T0918-D1    0.5534                    0.5858                          0.5547
T0918-D2    0.5853                    0.5632                          0.3157
T0918-D3    0.5430                    0.4087                          0.5148
T0953s1-D1  0.2932                    0.3567                          0.3997
T0953s2-D2  0.5296                    0.5258                          0.6466
T0957s1-D1  0.4685                    0.2140                          0.3496
T0957s2-D1  0.4391                    0.5481                          0.7022
T0968s1-D1  0.7228                    0.6635                          0.6823
T0968s2-D1  0.8098                    0.5256                          0.7371
T0960-D2    0.5688                    0.2500                          0.3601
T0980s1-D1  0.3798                    0.2611                          0.2905
T0990-D1    0.5153                    0.5156                          0.3254
T0990-D3    0.3686                    0.2625                          0.2479
T1021s3-D1  0.6117                    0.3951                          0.4904
T1021s3-D2  0.2806                    0.2975                          0.3345
T1022s1-D1  0.5381                    0.5253                          0.2682
Average     0.5630                    0.4538                          0.4954
Median      0.5771                    0.4852                          0.5102
