psipred's Introduction

PSIPRED RELEASE NOTES
=====================

PSIPRED Version 4.0

By David Jones, January 2016

*** IMPORTANT *****************************************************
NCBI are now trying to move users to the new BLAST+ package. Please
see the README file in the BLAST+ subdirectory for more information
on PSIPRED's support for BLAST+. For now the preferred option is
to stick with the classic BLAST package as the default. If the tar
or rpm file you are downloading from NCBI has "+" in the filename,
then you are downloading BLAST+ rather than BLAST.
*******************************************************************

Here are some very brief notes on using the PSIPRED V4 software.

PSIPRED is supplied in source code form - it must be compiled before
it can be used. The code should compile with any ANSI C compiler, e.g.
the GNU C compiler.

Please see the LICENSE file for the license terms for the software.
Basically it's free to anyone (including commercial users) as long as
you don't want to sell the software or, for example, store the results
obtained with it in a database and then try to sell the database.
If you do wish to sell the software or use it in a commercial product,
then please contact UCL Business (http://www.uclb.com).

PSIPRED is run via a tcsh shell script called "runpsipred" - this is a
very simple script which you should be able to convert to Perl or whatever
scripting language you like.

If your sequence does not have any homologues in the current data banks,
then it is possible to run PSIPRED on a single sequence. In this case,
PSIPRED is run via a tcsh shell script called "runpsipred_single". Unfortunately,
like every other secondary structure prediction method, PSIPRED does not
perform as well on single sequences. Any secondary structure prediction based on
a single sequence should be considered as unreliable.

Before running PSIPRED, please check the runpsipred and runpsipred_single scripts
to see if the path variables are set to wherever you have installed the
program and data files. The default is to assume that the program is
installed in the current directory - this is probably NOT what you want!
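
A quick sanity check before the first run can catch paths that were never
updated. The following is only a sketch of a hypothetical POSIX shell helper;
it is not part of the PSIPRED distribution, and the directories passed to it
are placeholders that should match whatever the path variables in your copy
of runpsipred (for example ncbidir) point to.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with PSIPRED: report every directory
# argument that does not exist and return non-zero if any is missing.
check_psipred_dirs() {
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "Directory not found: $dir" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Example call (placeholder paths; substitute your own):
# check_psipred_dirs /usr/local/bin ./bin ./data
```

If the function returns a non-zero status, edit the path variables in the
scripts before going any further.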

INSTALLATION
============

Firstly compile the software:

tcsh% cd to-wherever-you-untarred-PSIPRED

tcsh% cd src

tcsh% make

tcsh% make install

The executables will be placed in the PSIPRED bin directory.

You must also install the PSI-BLAST and Impala software from the
NCBI toolkit, and also install appropriate sequence data banks.

The NCBI toolkit can be obtained from URL ftp://ftp.ncbi.nih.gov

PSI-BLAST executables can be obtained from ftp://ftp.ncbi.nih.gov/blast


EXAMPLE USAGE
=============

In this example the target sequence is called "example.fasta":

tcsh% runpsipred example.fasta

Running PSI-BLAST with sequence example.fasta ...
Predicting secondary structure...
Pass1 ...
Pass2 ...
Cleaning up ...
Final output file: example.horiz
Finished.

That's it - you can then look at the output:

tcsh% more example.horiz
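
For downstream scripts it can be handy to have the prediction as a single
string. The sketch below is a hypothetical helper, written on the assumption
that the horizontal format lays the result out in blocks of rows labelled
"Conf:", "Pred:" and "AA:"; please check this against your own output file
before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: join the "Pred:" rows of a .horiz file into one
# string. Assumes every prediction row starts with the label "Pred:"
# followed by a run of secondary structure state letters.
horiz_prediction() {
    awk '$1 == "Pred:" { printf "%s", $2 } END { print "" }' "$1"
}

# Example call:
# horiz_prediction example.horiz
```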


SPECIAL OPTIONS
===============

The psipass2 program has several special options which you can use if you wish.

For example, the default command is as follows:

psipass2 weights_p2.dat 1 1.0 1.0 output.ss2 input.ss > output.horiz

Arguments 2, 3 and 4 are as follows:

Argument 2: Number of filter iterations
This controls the amount of "smoothing" that is carried out on the final
prediction. The recommended setting is 1, but it may be worth trying
higher values to increase the level of smoothing.

Arguments 3 and 4: Helix/Strand decision constants
These options control the bias for helix (Arg3) and strand (Arg4) predictions.
The default values are equal to 1.0, but if you know your protein is, for
example, mostly comprised of beta strands then you can increase the bias
towards beta strand prediction. For example:

psipass2 weights_p2.dat 1 1.0 1.3 output.ss2 input.ss > output.horiz

increases the bias towards beta strand prediction by approximately 30%.
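
To compare several bias settings side by side, the psipass2 command above can
be wrapped in a loop. This is a minimal sketch, not part of the distribution:
the function name, the PSIPASS2 override variable and the output file naming
are all assumptions made for illustration, and the input files are taken to
be in the current directory as in the commands above.

```shell
#!/bin/sh
# Hypothetical wrapper: run psipass2 once per strand-bias value given as
# an argument, keeping the helix bias at its default of 1.0.
# PSIPASS2 may be set to the full path of the executable; otherwise
# psipass2 is assumed to be on PATH.
sweep_strand_bias() {
    psipass2_cmd=${PSIPASS2:-psipass2}
    for bias in "$@"; do
        "$psipass2_cmd" weights_p2.dat 1 1.0 "$bias" \
            "output_${bias}.ss2" input.ss > "output_${bias}.horiz" || return 1
    done
}

# Example call:
# sweep_strand_bias 1.0 1.1 1.2 1.3
```

Each run writes its own .ss2 and .horiz file, so the predictions can be
compared directly.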

 

SEQUENCE DATA BANK
==================

From PSIPRED V4.0 onwards, we no longer believe it is necessary for the sequence
data banks used with PSI-BLAST to be filtered to remove low-complexity regions,
transmembrane regions, and coiled-coil segments. The search data bank can
therefore be any large non-redundant protein sequence data bank, with
UNIREF90 (http://www.uniprot.org/help/uniref) being the recommended one.



CHANGES FROM THE ORIGINAL PSIPRED
=================================

The following is a quick summary of the main changes since the original
PSIPRED.

1. The program now makes use of PSI-BLAST binary checkpoint files (using the
Impala program makemat) to reduce loss of precision when parsing the original
ASCII position specific matrices.

2. By default the 1st pass uses an average of 3 different neural network
weight sets - this improves prediction accuracy slightly.

3. In addition to the normal horizontal summary output format, the program
now also produces a full table of results which shows the individual
coil, helix, strand network outputs.

4. A one-line header is output at the start of the output files to allow
THREADER (and other programs) to automatically recognise a PSIPRED
prediction.

5. An experimental interface to BLAST+ has been added (V3.0). This will extract
PSSM data directly from ASN.1 checkpoint files.

6. Minor formatting bugs in .horiz file output for very long sequences
have now been fixed (V3.21).

7. A minor output bug that lost singleton residue coil predictions has been
fixed (V3.3).

8. V4.0 released: new neural network architectures.

psipred's Issues

docker image and code license

I was wondering whether a public Docker image of PSIPRED exists, or whether the code license allows generating a Docker image and uploading it to a public registry such as quay.io or Docker Hub?

Thank you!
Eliza

[blastpgp] ERROR: ncbiapi [000.000] ObjMgrNextAvailEntityID failed with idx 2048

Dear professor,

When I run my data with runpsipred, it gives me the error "[blastpgp] ERROR: ncbiapi [000.000] ObjMgrNextAvailEntityID failed with idx 2048". I have tried my best to solve this problem, but nothing has worked.

Before running the real input file and reference database, I tested the example data against a small database built from several sequences, and I got the correct output file.

The real database I need contains 86007 sequences, and the input file contains 6048 sequences. Details are as follows:

#formatting the database
formatdb -i all_seq_sp6048_cluster.faa -n all_seq -t all_seq -p T

#the formatdb.log

========================[ Nov 26, 2021 8:39 AM ]========================
Version 2.2.26 [Sep-21-2011]
Started database file "all_seq_sp6048_cluster.faa"
Formatted 86007 sequences in volume 0
SUCCESS: formatted database all_seq_db

#reedited runpsipred (just changing two lines)

# The name of the BLAST data bank

#set dbname = uniref90
#set dbname = test_db
set dbname = all_seq

# Where the NCBI programs have been installed

#set ncbidir = /usr/local/bin
set ncbidir = /data/liqingmei/tools/psipred/blast-2.2.26/bin

#the error
#tail -n 5 psitmp2049390c0c660c.blast
[blastpgp] ERROR: ncbiapi [000.000] ObjMgrNextAvailEntityID failed with idx 2048
[blastpgp] ERROR: ncbiapi [000.000] ObjMgrNextAvailEntityID failed with idx 2048
[blastpgp] ERROR: ncbiapi [000.000] ObjMgrNextAvailEntityID failed with idx 2048
[blastpgp] ERROR: ncbiapi [000.000] ObjMgrNextAvailEntityID failed with idx 2048
[blastpgp] ERROR: ncbiapi [000.000] ObjMgrNextAvailEntityID failed with idx 2048

#####################

This analysis is very important to me. Could you help me solve this problem?

Thanks a lot.

How to use psipred with multiple fasta seq in a file

Hi,
I have a FASTA file containing more than 1000 sequences. I would like to predict the secondary structure of these sequences in order to see the structure of the motif they contain. Please advise whether I can use PSIPRED for this purpose.
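
One possible approach, given that the README example passes runpsipred a single sequence file, is to split the multi-FASTA file into one file per record and run each in turn. The following is only a sketch: split_fasta and the seq_NNNN.fasta naming are hypothetical and not part of PSIPRED.

```shell
#!/bin/sh
# Hypothetical helper: split a multi-FASTA file into one file per record.
# $1 = input FASTA, $2 = output directory (created if missing).
# Output names seq_0001.fasta, seq_0002.fasta, ... are an arbitrary choice.
split_fasta() {
    mkdir -p "$2" || return 1
    awk -v outdir="$2" '
        /^>/ {
            if (out != "") close(out)   # keep the number of open files low
            n++
            out = sprintf("%s/seq_%04d.fasta", outdir, n)
        }
        out != "" { print > out }       # ignore anything before the first header
    ' "$1"
}

# Example call, followed by one runpsipred job per record:
# split_fasta all.fasta split
# for f in split/*.fasta; do runpsipred "$f"; done
```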

psipred error

I ran the psipred command from the pgenthreader.sh script:
tcsh psipred /media/kakarot/ppi/genthreader/sequence.iter3.mtx /media/kakarot/ppi/psipred/data/weights.dat /media/kakarot/ppi/psipred/data/weights.dat2 /media/kakarot/ppi/psipred/data/weights.dat3 > /media/kakarot/ppi/genthreader/sequence.pgen.ss

it results in

Unmatched ''.`

Can anyone please help?

pgenthreader.sh (psipred: command not found)

I am trying to run the pgenthreader.sh script. The result is:

Finished 3 iterations PSI-BLAST
usage : psipred weight-file seq-file
usage : psipass2 weight-file seq-file DCA DCB
PSIPRED failed ... exiting early

(It creates the sequence.pgen.ss file, but the file is blank.)

So I tried to run the psipred command separately:

/media/kakarot/ppi/psipred/bin$ psipred /media/kakarot/ppi/genthreader/sequence.iter3.mtx /media/kakarot/ppi/psipred/data/weights.dat /media/kakarot/ppi/psipred/data/weights.dat2 /media/kakarot/ppi/psipred/data/weights.dat3 > /media/kakarot/ppi/genthreader/sequence.pgen.ss

But I am getting: psipred: command not found

Position Specific Scoring Matrix (PSSM)

Thank you for sharing this.
I would like to know whether you have source code for calculating a PSSM for an arbitrary set of sequences.
I have my own data set (single sequences) and need to know how to calculate a PSSM for it, and whether any source code is available for that.

Thanks

Ncbi BLAST, nr database

I am running the following BLAST command:

blastp -query /Users/shalini/Desktop/shalini/project/unmodelled_fasta/AAA18895.fasta -outfmt "7 sacc qcovs pident ppos evalue" -db=/Users/shalini/Downloads/nr -out=/Users/shalini/Desktop/shalini/project/blast_resultnr

After running for one and a half hours, it fails with:

Error memory mapping:/Users/shalini/Downloads/nr.79.phr openedFilesCount=251 threadID=0
BLAST Database error: Cannot memory map /Users/shalini/Downloads/nr.79.phr. Number of files opened: 251

I am working on macOS. I have also built the nr database, which results in 114 .phr, .pin, .psq and .pog files.

Please help me, if anyone knows.
