
deepcov's Introduction

DeepCov v1.0

Fully convolutional neural networks for protein residue-residue contact prediction

David T. Jones and Shaun M. Kandathil

University College London

Requirements:

  • Bash shell

  • Working C and C++ compilers (tested with GCC 4.8.5)

  • Python 2 (tested on 2.7.5) or 3 (tested on 3.4.5) with development libraries and headers

  • The following Python modules (version numbers in brackets were used during development/testing):

    • numpy (1.13.1)
    • Theano (0.9.0)
    • Lasagne (0.2.dev1)

At the time of writing, pip will install Lasagne 0.1 by default, which will not work due to changes in Theano 0.9. You may need to use the 'bleeding-edge' install of Lasagne:

$ pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip

On some distributions, the C++ compiler is a separate add-on package and may not be installed by default. For example, on CentOS you will need to yum install packages gcc AND gcc-c++.

To get the Python development headers and libs you may need to install a separate package, the name of which will depend on your package manager. For example, on CentOS this is python-devel or python34-devel.

Setup and testing:

Run setup.sh.

This will compile and test a C executable, cov21stats. This executable generates covariance or pair frequency data from your input alignment. The script will also test the DeepCov prediction pipeline on a test input alignment, so make sure all dependencies listed above are in place before running it. By default, the scripts will use whatever cc and python3 point to in your shell. These can be changed in deepcov.sh and setup.sh.
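As a rough illustration of what "covariance data from an alignment" means, the following NumPy sketch computes pairwise covariances between alignment columns. It is not the cov21stats implementation: the 21-letter alphabet ordering, the mapping of unknown letters to the gap state, the lack of any sequence weighting, and the function names are all assumptions made for this example.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus a gap state (ordering is arbitrary here).
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}


def one_hot_alignment(seqs):
    """Encode N aligned sequences of length L as an (N, L, 21) one-hot array."""
    n, length = len(seqs), len(seqs[0])
    onehot = np.zeros((n, length, 21), dtype=np.float64)
    for s, seq in enumerate(seqs):
        if len(seq) != length:
            raise ValueError("all sequences must have the same length")
        for i, aa in enumerate(seq.upper()):
            # Letters outside the assumed alphabet are treated as gaps.
            onehot[s, i, AA_INDEX.get(aa, 20)] = 1.0
    return onehot


def pair_covariance(seqs):
    """Return cov[i, a, j, b] = f_ij(a, b) - f_i(a) * f_j(b)."""
    x = one_hot_alignment(seqs)
    n = x.shape[0]
    f_single = x.mean(axis=0)                      # single-column frequencies, (L, 21)
    f_pair = np.einsum("sia,sjb->iajb", x, x) / n  # pair frequencies, (L, 21, L, 21)
    return f_pair - np.einsum("ia,jb->iajb", f_single, f_single)
```

For an alignment of length L this yields 21 x 21 = 441 values per residue pair. The full array holds 441 x L x L doubles, so this sketch is only practical for short alignments.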

The testing procedure will compare a newly generated contact prediction (in test/) against the reference file found in test/example_io. Since different OS/compiler combinations can lead to very slightly different contact scores, only the ranking of the contacts is evaluated when deciding whether the test was successful. To see if there are any differences, please compare the two contact files using a program such as sdiff.
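If you would like to check the ranking yourself, a minimal Python sketch is given here. It is not the logic used by setup.sh. It assumes each contact line has the five whitespace-separated fields of the CASP format (i, j, lower distance bound, upper distance bound, score), and the function names are hypothetical.

```python
def parse_contacts(lines):
    """Collect (i, j, score) tuples from lines shaped like 'i j d_min d_max score'."""
    contacts = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip headers, blank lines, and anything else
        try:
            i, j, score = int(fields[0]), int(fields[1]), float(fields[4])
        except ValueError:
            continue  # five fields, but not a numeric contact line
        contacts.append((i, j, score))
    return contacts


def ranked_pairs(contacts):
    """Residue pairs ordered by descending score; ties keep their input order."""
    return [(i, j) for i, j, _ in sorted(contacts, key=lambda c: -c[2])]


def same_ranking(lines_a, lines_b):
    """True if both contact lists rank the residue pairs identically."""
    return ranked_pairs(parse_contacts(lines_a)) == ranked_pairs(parse_contacts(lines_b))
```

For two files, pass open file handles, e.g. same_ranking(open(new_file), open(reference_file)), where both paths are placeholders for your own files. Pairs with exactly equal scores may appear in a different order without indicating a real difference.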

Running:

$ /path/to/deepcov.sh [-h] [-m model_type] [-r receptive_field] -i input_file [-o output_contact_file]

The optional arguments -m and -r are primarily a means to reproduce results in our paper. For most 'production' purposes, you can leave these set to their defaults (covariance model + receptive field of 41 residues).

The input alignment must be in the PSICOV format. If your alignment is in a different format, we recommend using the ConKit Python module to reformat it.
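For simple cases, the conversion can also be sketched in a few lines of Python. This is an illustration rather than a replacement for ConKit. It assumes that a PSICOV-format alignment is one aligned sequence per line with no identifiers and the target sequence first, and that the FASTA input is already an alignment (all sequences the same length). The function names and the "seq" identifier prefix are hypothetical.

```python
def fasta_to_psicov(fasta_lines):
    """Return the aligned sequences from FASTA lines, one string per record."""
    seqs, current = [], []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header closes the previous record, if it had any sequence.
            if current:
                seqs.append("".join(current))
            current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current))
    if len(set(len(s) for s in seqs)) > 1:
        raise ValueError("sequences differ in length; input is not an alignment")
    return seqs


def psicov_to_fasta(aln_lines, prefix="seq"):
    """Return FASTA text with made-up identifiers (prefix + line index)."""
    rows = [line.strip() for line in aln_lines if line.strip()]
    records = [">%s%d\n%s" % (prefix, n, row) for n, row in enumerate(rows)]
    return "\n".join(records) + "\n"
```

Writing "\n".join(fasta_to_psicov(open(path))) to a file gives one sequence per line. The original sequence identifiers are lost in that direction, and psicov_to_fasta cannot recover them.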

The output is in the CASP contact format.

An example input alignment is provided at test/example_io/1guuA.aln. The corresponding DeepCov output contact file is test/example_io/1guuA.con.

Tips:

For inferring contacts for single alignments, we find that running DeepCov on a (reasonably recent) CPU is faster than running on a GPU, when considering end-to-end runtime on our benchmark sets. For this reason, DeepCov will run on your CPU by default. If you'd like to change this behaviour, edit deepcov.sh and change the value of the THEANO_FLAGS variable near the end of the script (see http://deeplearning.net/software/theano/library/config.html for more details on this and other variables). You will also need to install other prerequisites for running on the GPU; please refer to Theano's documentation.

Benchmarking scripts:

We've included some additional scripts that should reproduce results from our paper. For running the benchmarking scripts, you will need a recent install of R in addition to the dependencies listed above. The benchmark process also requires the R package beanplot. You will also need the PSICOV150 test set, which comes with its own README and can be downloaded here.

Once the dataset is in place, edit run_all_covar_rawfreq.sh to specify the location of the psicov150 set, and then run it, e.g.

./run_all_covar_rawfreq.sh covar 6 11

where 6 and 11 refer to the min and max sequence separation you want to consider, and 'covar' refers to the covariance model.
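To make the role of the two separation arguments concrete, here is a small Python sketch that keeps only predicted contacts whose sequence separation lies in a given band and scores the top L/k of them against a set of native contacts. It is not taken from the benchmarking scripts. Treating both bounds as inclusive, the function name, and the argument names are assumptions for this example.

```python
def top_contact_precision(predicted, native, seq_len,
                          min_sep=6, max_sep=11, divisor=5):
    """Precision of the top seq_len // divisor predictions within a separation band.

    predicted: iterable of (i, j, score) tuples.
    native:    set of (i, j) pairs with i < j that are true contacts.
    """
    # Keep pairs whose separation |i - j| is in [min_sep, max_sep] (inclusive,
    # which is an assumption), stored with the smaller index first.
    in_band = [(min(i, j), max(i, j), score) for i, j, score in predicted
               if min_sep <= abs(i - j) <= max_sep]
    in_band.sort(key=lambda c: c[2], reverse=True)
    top = in_band[:max(1, seq_len // divisor)]
    if not top:
        return 0.0
    hits = sum(1 for i, j, _ in top if (i, j) in native)
    return float(hits) / len(top)
```

With divisor set to 1, 2, 5 or 10 this gives the precision for the top L, L/2, L/5 or L/10 predictions in the chosen band.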

With these inputs, output will be generated in your DeepCov installation directory, in a file named all_windowsize_results_MEAN_covar_min6_max11.txt.

PLEASE NOTE: the benchmarking process does create a number of rather large files. Use with caution if you have limited storage.

Training scripts:

An example training script and a README can be found in training/, which includes a link to where training data can be found.

Citing:

If you find DeepCov useful, please cite our paper in Bioinformatics:

Jones DT and Kandathil SM (2018). High precision in protein contact prediction using fully convolutional neural networks and minimal sequence features. Bioinformatics 34(19): 3308-3315.


deepcov's Issues

No output file of Benchmarking?

Hello. Thank you very much for your great work! I really enjoyed your research.

I followed the README and was able to run setup.sh and ./deepcov.sh -i test/example_io/1guuA.aln -o test/test.out successfully. That was great. Thank you.
However, when I run the benchmarking scripts, 'all_windowsize_results_MEAN_covar_min6_max11.txt' does not contain any results. It only shows the header (window, L, L/2, L/5, L/10, L=100) and the receptive field sizes.

Could you describe the potential reasons?

Training data in fasta format?

Hello,

Do you guys have the training data in FASTA format? Or is there a way I can convert the provided PSICOV-format data to FASTA? Thank you

Best,
Jason

Covariance

Thank you very much for this work.
I would like to ask where exactly I can find the function that calculates the covariance from the original data.
I hope you can help me with that.

Thanks
Saida

features and labels

Hello, I have some questions and hope you can answer them. Here is my current understanding:

1- First, the role of the sequence alignment is to extract chunks of subsequences that represent the first sequence.

2- These alignments are then used to compute a matrix called the covariance matrix, which measures the correlations between each of these alignments.

3- As I understand it, the protein contact map describes the distance matrix as a label. For example, if the distance between the first amino acid in the first chain and the first amino acid in the second chain is 200 Å and we set a threshold of 8 Å, then the contact map entry for this distance is "not in contact", i.e. "False" or, in binary, "0". Is my understanding correct?

My questions

First
1- What is the role of the covariance matrix?
2- What is the role of the protein contact map? Are these the labels derived from the distance matrix? If so, what is the role of the covariance matrix?
3- What is the input to the neural network model?
A- What are the features? Is it the distance matrix? If yes, what is the role of the covariance matrix?
B- What are the labels for these features? Is the protein contact map the label (0s and 1s)?

Second
1- Could you kindly give me a hint or the steps, i.e. which script to use first, which second, and so on? I want to cite your paper, and I have been inspired by your great work.

Thanks in advance

GPU memory size

Hello,

I'm trying to run the training script that was given, and did not modify any code. I'm running it on GPU, and it throws this memory exception for the first training example:

Starting training...
Traceback (most recent call last):
  File "/home/jason/anaconda3/lib/python3.7/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
RuntimeError: GpuCorrMM failed to allocate working memory of 1600 x 246016


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "lasagne_cov_train.py", line 249, in <module>
    main()
  File "lasagne_cov_train.py", line 176, in main
    train_err += train_func1(inputs, targets, wtmaps)
  File "/home/jason/anaconda3/lib/python3.7/site-packages/theano/compile/function_module.py", line 917, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/jason/anaconda3/lib/python3.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/jason/anaconda3/lib/python3.7/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/home/jason/anaconda3/lib/python3.7/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else\
RuntimeError: GpuCorrMM failed to allocate working memory of 1600 x 246016

Apply node that caused the error: GpuCorrMM{half, (1, 1), (1, 1), 1, False}(GpuContiguous.0, GpuContiguous.0)
Toposort index: 235
Inputs types: [GpuArrayType<None>(float32, (True, False, False, False)), GpuArrayType<None>(float32, 4D)]
Inputs shapes: [(1, 64, 496, 496), (64, 64, 5, 5)]
Inputs strides: [(62980096, 984064, 1984, 4), (6400, 100, 20, 4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuCAReduceCuda{add}{0, 2, 3}(GpuCorrMM{half, (1, 1), (1, 1), 1, False}.0), GpuElemwise{sub,no_inplace}(GpuCorrMM{half, (1, 1), (1, 1), 1, False}.0, GpuElemwise{Composite{(((i0 / i1) / i2) / i3)}}[]<gpuarray>.0), GpuElemwise{sub,no_inplace}(GpuCorrMM{half, (1, 1), (1, 1), 1, False}.0, InplaceGpuDimShuffle{x,0,x,x}.0)]]

I'm just wondering how much GPU memory you ran this on, or whether there is any way to resolve this? I'm running on an 8GB GeForce RTX 2070 Max-Q with 32GB of RAM, which is pretty decent but clearly can't handle the size of this matrix...
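As a back-of-the-envelope reading of the error above, the two numbers in "1600 x 246016" match the reported input shapes: 64 input channels times a 5 x 5 kernel gives 1600, and a 496 x 496 output map gives 246016. The Python sketch below works out the size of such a buffer under the assumption that it is a single float32 matrix of that shape; the function name is hypothetical, and this interpretation is not confirmed by the traceback itself.

```python
def corrmm_workspace_bytes(in_channels, kernel_h, kernel_w, out_h, out_w, itemsize=4):
    """Size of an assumed (channels * kh * kw) x (out_h * out_w) work matrix."""
    rows = in_channels * kernel_h * kernel_w
    cols = out_h * out_w
    return rows, cols, rows * cols * itemsize


# Shapes taken from the traceback: input (1, 64, 496, 496), filters (64, 64, 5, 5).
rows, cols, nbytes = corrmm_workspace_bytes(64, 5, 5, 496, 496)
print("%d x %d float32 values = %.2f GiB" % (rows, cols, nbytes / 2.0 ** 30))
```

Under that assumption, this one buffer needs roughly 1.5 GiB on top of the activations and gradients already held for a 496-residue example.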
