pftools3's Introduction

PfTools

Foreword

(C) Copyright SIB Swiss Institute of Bioinformatics. Available from https://github.com/sib-swiss/pftools3 under the GPL v2. See LICENSE.

Version 3 contains the original FORTRAN 77 pftools (release 2.3) and the new pftoolsV3 programs.

Installation

Using Docker

First, make sure Docker is installed and running.
Second, browse the available pftools biocontainers at quay.io or at Docker Hub.
Then:

# get the chosen pftools container version
docker pull quay.io/biocontainers/pftools:3.2.11--pl5321r41h4b1256a_2
#   or
docker pull sibswiss/pftools:3.2.12
# run a pftools tool, e.g. pfscan
docker run quay.io/biocontainers/pftools:3.2.11--pl5321r41h4b1256a_2 pfscan -h
#   or
docker run sibswiss/pftools:3.2.12 pfscan -h

Using Singularity

First, make sure Singularity is installed and running. Second, browse the available pftools biocontainers at quay.io or at Docker Hub.
Then:

# get the chosen pftools container version
singularity pull docker://quay.io/biocontainers/pftools:3.2.11--pl5321r41h4b1256a_2
#   or
singularity pull docker://sibswiss/pftools:3.2.12
# run the container
singularity run pftools_3.2.11--pl5321r41h4b1256a_2.sif

You are now inside the container and can run any pftools tool, e.g. pfscan:

pfscan -h

Bioconda

conda install -c bioconda pftools

EasyBuild

eb --robot --rpath pftoolsV3-3.2.11-foss-2021a.eb

Manually

See here for more information

After installation, in the share/examples/ subdirectory, the test_V3.sh shell script is a good starting point for using pfsearchV3/pfscanV3.

Generalized profile syntax

A description of the generalized profile syntax is given in file:

It was originally published in

  • Bucher P, Bairoch A. A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. Proc Int Conf Intell Syst Mol Biol. 1994;2:53-61. PubMed PMID: 7584418.

Algorithms description

Technical details about how profiles can be constructed and parametrized are summarized in file:

The very first paper describing the PFTOOLS algorithms is

  • Lüthy R, Xenarios I, Bucher P. Improving the sensitivity of the sequence profile method. Protein Sci. 1994 Jan;3(1):139-46. PubMed PMID: 7511453; PubMed Central PMCID: PMC2142471.

The generalized profile alignment method is closely related to other "classical" algorithms for aligning sequences. For example, it encompasses the Smith-Waterman algorithm and the Viterbi decoding of profile-HMMs (as implemented in HMMER2, for example). The relationships between these algorithms were investigated in

  • Bucher P, Hofmann K. A sequence similarity search algorithm based on a probabilistic interpretation of an alignment scoring system. Proc Int Conf Intell Syst Mol Biol. 1996;4:44-51. Review. PubMed PMID: 8877503.

  • Bucher P, Karplus K, Moeri N, Hofmann K. A flexible motif search technique based on generalized profiles. Comput Chem. 1996 Mar;20(1):3-23. PubMed PMID: 8867839.

A relatively detailed explanation of the profile normalized scores, and a comparison with other popular statistics for sequence alignments, can be found in

  • Pagni M, Jongeneel CV. Making sense of score statistics for sequence alignments. Brief Bioinform. 2001 Mar;2(1):51-67. PubMed PMID: 11465063.

The heuristic score is succinctly described in

  • Schuepbach T, Pagni M, Bridge A, Bougueleret L, Xenarios I, Cerutti L. pfsearchV3: a code acceleration and heuristic to search PROSITE profiles. Bioinformatics. 2013 May 1;29(9):1215-7. doi: 10.1093/bioinformatics/btt129. PubMed PMID: 23505298; PubMed Central PMCID: PMC3634184.

Applications of the Pftools

Two databases built on the PFTOOLS technology, PROSITE and HAMAP, are still actively maintained:

  1. https://prosite.expasy.org/
  2. https://hamap.expasy.org/

The PFTOOLS were designed from the start to handle DNA sequences. The latest release of pfsearchV3 supports the FASTQ and SAM formats. Examples of DNA applications are given in

  • Pagni M, Niculita-Hirzel H, Pellissier L, Dubuis A, Xenarios I, Guisan A, Sanders IR, Goudet J, Guex N. Density-based hierarchical clustering of pyro-sequences on a large scale - the case of fungal ITS1. Bioinformatics. 2013 May 15;29(10):1268-74. doi: 10.1093/bioinformatics/btt149. PubMed PMID: 23539304; PubMed Central PMCID: PMC3654712.

  • Schmid-Siegert E, Richard S, Luraschi A, Mühlethaler K, Pagni M, Hauser PM. Mechanisms of Surface Antigenic Variation in the Human Pathogenic Fungus Pneumocystis jirovecii. MBio. 2017 Nov 7;8(6). pii: e01470-17. doi: 10.1128/mBio.01470-17. PubMed PMID: 29114024; PubMed Central PMCID: PMC5676039.

Authors

Main authors:

  • Philipp Bucher developed the Fortran code
  • Thierry Schuepbach developed the C code

Other contributors:

  • Kay Hofmann
  • Volker Flegel
  • Edouard de Castro
  • Lorenzo Cerutti
  • Marco Pagni
  • Sébastien Moretti
  • Jerven Tjalling Bolleman

SIB Swiss Institute of Bioinformatics Vital-IT Group Quartier Sorge - Batiment Amphipole 1015 Lausanne Switzerland

pftools3's People

Contributors

beatrice79, c4chris, euphemizm, jervenbolleman, juke34, mpagni12, smoretti

pftools3's Issues

The new version of pfsearchV3 crashes with big databases

I thought the problem had been solved, but when I downloaded the current version (as of Aug 24), compiled it on Linux, and ran it against big databases (UniProt-sized), the process got killed without producing output. This behavior is independent of the profile and does not occur with small databases.

An older version (from 2019) works fine. If I remember correctly, I had to increase CHAIN_SEGMENT_SIZE and MAX_NUMBER_OF_CHAIN in ReadSequence.c back then, but in the new version, this does not help (and MAX_NUMBER_OF_CHAIN does not exist anymore).

Any ideas?

Build time test issues for version 3.2.6

Hi,
I'm trying to update the Debian package of pftools3. Unfortunately some of the tests are failing:

Running tests...
/usr/bin/ctest --force-new-ctest-process -j4
Test project /build/pftools-3.2.6/obj-x86_64-linux-gnu
    Start 1: execute_test_V2.sh
    Start 2: execute_test_V3.sh
    Start 3: check_output_of_test_V3.sh
    Start 4: execute_test_scan_search.pl
1/5 Test #2: execute_test_V3.sh .................***Failed    0.24 sec
#!/bin/sh -ve

#----------------------------------------------------------------------#
# PATHS declaration (set by cmake/make: do not modify them here)
#----------------------------------------------------------------------#
PFSEARCH=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/Fortran/pfsearch
PFSCAN=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/Fortran/pfscan
PFSEARCHV3=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/C/pfsearchV3
PFSCANV3=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/C/pfscanV3
PFCALIBRATEV3=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/C/pfcalibrateV3
PFINDEX=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/C/pfindex

GTOP=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/Fortran/gtop
HTOP=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/Fortran/htop
PFSCALE=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/Fortran/pfscale
PFMAKE=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/Fortran/pfmake
PSA2MSA=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/Fortran/psa2msa
PFW=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/Fortran/pfw
PTOH=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/Fortran/ptoh
PTOF=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/Fortran/ptof
P2FT=/build/pftools-3.2.6/obj-x86_64-linux-gnu/src/Fortran/2ft # NB: sh does not allow variable name starting with a digit

SORT_PSA=/build/pftools-3.2.6/src/Perl/sort_fasta.pl # FIXME: use cmake syntax
MAKE_IUPAC_CMP=/build/pftools-3.2.6/src/Perl/make_iupac_cmp.pl # FIXME: use cmake syntax
SCRAMBLE=/build/pftools-3.2.6/src/Perl/scramble_fasta.pl # FIXME: use cmake syntax

CMPDIR=/build/pftools-3.2.6/data/Matrices
TMPDIR=/tmp/test_V3
mkdir -p $TMPDIR

#----------------------------------------------------------------------#
# The PFTOOLS is a powerful software to align biological sequences.
# Owing to the 'generalized profile syntax', it allows the fine-tuning
# of an alignent scoring system, beyond what is feasible by most other
# software. Despite the PFTOOLS are crippled by a lot of legacy code,
# they are still extremely useful for precision work .
#
# Nota bene to use this script as a testsuite:
# (1) The output order of pfsearch is reproducible, as well as the one of
#     pfsearchV3 with -t 1.
# (2) Refrain using any pipe.
#----------------------------------------------------------------------#

#----------------------------------------------------------------------#
# Searching for the occurence of the SH3 domain within the VAV oncogene,
# using pfsearch V2 ...
#----------------------------------------------------------------------#
$PFSEARCH -f ./sh3.prf ./VAV_HUMAN.seq
  11.796    663 pos.      617 -     660 sp|P15498|VAV_HUMAN Proto-oncogene vav OS=Homo sapiens OX=9606 GN=VAV1 PE=1 SV=4
  17.560   1015 pos.      782 -     842 sp|P15498|VAV_HUMAN Proto-oncogene vav OS=Homo sapiens OX=9606 GN=VAV1 PE=1 SV=4

#----------------------------------------------------------------------#
# ...and using pfsearch V3:
#----------------------------------------------------------------------#
$PFSEARCHV3 -n -t 1 -f ./sh3.prf ./VAV_HUMAN.seq
  11.796     663 pos.     617 -     660 sp|P15498|VAV_HUMAN Proto-oncogene vav OS=Homo sapiens OX=9606 GN=VAV1 PE=1 SV=4
  17.560    1015 pos.     782 -     842 sp|P15498|VAV_HUMAN Proto-oncogene vav OS=Homo sapiens OX=9606 GN=VAV1 PE=1 SV=4

#----------------------------------------------------------------------#
# Create a database of sequences and a database of profiles, each one
# with two entries.
#----------------------------------------------------------------------#
cat ./VAV_HUMAN.seq ./VAV_RAT.seq > $TMPDIR/VAV.seq
cat ./sh2.prf       ./sh3.prf     > $TMPDIR/SHX.prf

#----------------------------------------------------------------------#
# The following commands must all produce the same final results
#
# (1) pfsearch (V2) expects ONE profile and MANY sequences:
# (2) pfscan (V2) expects MANY profile and ONE sequences:
# (3) pfsearch (V3) expects ONE profile and MANY sequences:
# (4) pfscan (V3) supports MANY profiles and MANY sequences:
#----------------------------------------------------------------------#
$PFSEARCH -fkxz ./sh2.prf $TMPDIR/VAV.seq > $TMPDIR/SHX.pfsearch2.hit
$PFSEARCH -fkxz ./sh3.prf $TMPDIR/VAV.seq >> $TMPDIR/SHX.pfsearch2.hit

$PFSCAN -fkxz ./VAV_HUMAN.seq $TMPDIR/SHX.prf > $TMPDIR/SHX.pfscan2.hit
$PFSCAN -fkxz ./VAV_RAT.seq $TMPDIR/SHX.prf >> $TMPDIR/SHX.pfscan2.hit

$PFSEARCHV3 -f -n -t 2 -o 6 ./sh2.prf -f $TMPDIR/VAV.seq > $TMPDIR/SHX.pfsearch3.hit
$PFSEARCHV3 -f -n -t 2 -o 6 ./sh3.prf -f $TMPDIR/VAV.seq >> $TMPDIR/SHX.pfsearch3.hit

$PFSCANV3 -f -n -o 6 $TMPDIR/SHX.prf $TMPDIR/VAV.seq > $TMPDIR/SHX.pfscan3.hit

#----------------------------------------------------------------------#
# All these commands produces exactly the same list of matched
# sequences, with the same raw scores and coordinates.
#
# However the output order is not necessarily preserved here.
#
# Let's verify that the output are comparable after fixing FASTA headers
#----------------------------------------------------------------------#
$SORT_PSA -s $TMPDIR/SHX.pfscan2.hit   > $TMPDIR/SHX.pfscan2.out
$SORT_PSA -s $TMPDIR/SHX.pfscan3.hit   > $TMPDIR/SHX.pfscan3.out
$SORT_PSA -s $TMPDIR/SHX.pfsearch2.hit > $TMPDIR/SHX.pfsearch2.out
$SORT_PSA -s $TMPDIR/SHX.pfsearch3.hit > $TMPDIR/SHX.pfsearch3.out
diff $TMPDIR/SHX.pfscan2.out $TMPDIR/SHX.pfscan3.out    # expecting no difference
12,13c12
< YVH-------------------LRLNPGDIVELTKAeAEHTWWEGRNTATNEVGWFPCNR
< VRPYVH
---
> YVH
16,18c15
< ITEKKAFRGLPELVEFYQQNSLKDCFksldtTLQFPYWYAGPMERAGAEGILTNRSD-GT
< YLVRQRVKDTAEFAISIKYNVEVKHIKIMTSE-GLYRITEKKAFRGLPELVEFYQQNSLK
< DCFksldtTLQFPY
---
> ITEKKAFRGLPELVEFYQQNSLKDCFksldtTLQFPY
21,22c18
< DYSKYFGTAKARYDFCARDRSELSLKEGDIIKILNKkGQQGWWRGEIY--GRIGWFPSNY
< VEEDYS
---
> DYS

    Start 5: execute_test_pfsearchV3_iupac.pl
2/5 Test #3: check_output_of_test_V3.sh .........***Failed    0.25 sec

3/5 Test #1: execute_test_V2.sh .................   Passed    0.69 sec
4/5 Test #4: execute_test_scan_search.pl ........   Passed   17.77 sec
5/5 Test #5: execute_test_pfsearchV3_iupac.pl ...   Passed   38.41 sec

60% tests passed, 2 tests failed out of 5

Total Test time (real) =  38.65 sec

The following tests FAILED:
          2 - execute_test_V3.sh (Failed)
          3 - check_output_of_test_V3.sh (Failed)
Errors while running CTest

Any idea what might be wrong here?

Kind regards, Andreas.

pfsearchV3 crashes on big databases

I have been using an older version of pfsearchV3 for a long time without any problems. However, the old version did not correctly identify the number of cores on my new processor, so I just downloaded and compiled the current version. Initially, everything looked fine (after getting used to the renamed command line parameters), but I notice that the program crashes whenever I run a long profile against a long database.

There is no specific error message. Watching the process in top, I see it use more and more memory; after a few seconds hovering at 95% RAM usage (64MB total), the program stops and the terminal just says "killed".

I am using version 3.2.6, built on Aug 10 2021 (compiled with the standard options), and see the problem in heuristic mode with a command line like pfsearchV3 -H 38388 -L -1 hect.prf /database/nr.seq > output

Edit: the profile is 348aa long, the database is more or less identical to uniprot (114GB fasta file). The problem occurs on two different computers. The old version has no problems, but does not use all cores on the Ryzen 5950x processor (but all cores on an older Core i7 processor)

Any idea what I might have done wrong?

pfscan errors are not caught

As reported here, ps_scan.pl doesn't check if pfscan properly executed.

This is because open2 doesn't detect errors produced by the child process. I would suggest using IPC::Run, which seems to handle this quite simply.
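The check this issue asks for can be illustrated outside Perl. The following is a minimal sketch in Python, not the ps_scan.pl code and not the IPC::Run API: it runs a child process, captures its output, and refuses to continue if the child exited with a non-zero status or was killed by a signal. The function name `run_checked` is hypothetical, and the demo launches the Python interpreter itself as a stand-in for pfscan.

```python
import subprocess
import sys

def run_checked(cmd):
    """Run cmd, return its stdout, and raise if the child failed."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal
        # (a segmentation fault would show up here as signal 11).
        raise RuntimeError(f"{cmd[0]} killed by signal {-proc.returncode}")
    if proc.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} exited with status {proc.returncode}: {proc.stderr.strip()}"
        )
    return proc.stdout

if __name__ == "__main__":
    # Stand-in child processes: one succeeds, one exits with status 3.
    print(run_checked([sys.executable, "-c", "print('ok')"]).strip())
    try:
        run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])
    except RuntimeError as err:
        print("caught:", err)
```

The point of the design is that the parent never parses output from a child whose exit status it has not inspected.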

using ps_scan.pl to scan prosite patterns

What is the best way to scan PROSITE patterns with pftools3?
Looking at how ps_scan.pl handles PROSITE patterns (and without knowing much Perl), it seems to create one file per profile and one file per input sequence, then scan the sequences against the profile and write a third file for the results. That is a lot of I/O.
We run PROSITE patterns against UniParc, so we would be happy to hear of a way to scan against PROSITE patterns with less I/O overhead.
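For patterns (as opposed to matrix profiles), one way to avoid temporary files is to translate each pattern into a regular expression and match it in memory. The sketch below is an illustration in Python, not what ps_scan.pl or pfscanV3 actually do. It assumes the basic PROSITE pattern syntax (elements separated by `-`, `x` for any residue, `[...]` for alternatives, `{...}` for exclusions, `(n)` or `(n,m)` for repeats, `<` and `>` as terminal anchors) and does not handle rarer forms such as `>` inside square brackets. The function name `prosite_to_regex` and the example sequence are made up for the demonstration.

```python
import re

def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    regex = pattern.strip().rstrip(".")       # drop the terminating period
    regex = regex.replace("-", "")            # element separators carry no meaning
    regex = regex.replace("x", ".")           # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")   # (2,4) = repeat count
    regex = regex.replace("<", "^").replace(">", "$")   # terminal anchors
    return regex

if __name__ == "__main__":
    # A C2H2-zinc-finger-style pattern, used here only as an example.
    pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
    compiled = re.compile(prosite_to_regex(pattern))
    sequence = "AACAACAAALAAAAAAAAHAAAHGG"     # synthetic sequence
    for match in compiled.finditer(sequence):
        # Report 1-based, inclusive coordinates
        print(match.start() + 1, match.end(), match.group())
```

Note that `re.finditer` reports non-overlapping matches only, so results may differ from a scanner that also reports overlapping hits.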

pfscanV3 error

Hi, I'm running a tool called conodictor and it uses pfscan as one of its dependencies. I tried running it but it stopped due to this error:

Traceback (most recent call last):
  File "/conodictor/conodictor", line 1044, in <module>
    main()
  File "/conodictor/conodictor", line 393, in main
    msg(f"Running PSSM prediction using pfscan v{pfscan_match[0]}")
IndexError: list index out of range

Error: Inconsistent alignment found

Hello,

I used ps_scan.pl to search for motifs in a protein against PROSITE release 2021_03, but the program crashed with the following error:

Error: Inconsistent alignment found in alignment 2 - no list produced.
       Alignement should be from 247 to 170!
Thread 107 : Internal error xalip reported no possible alignment for sequence 3(0) (nali=-1)!
>seq for pfscan
Segmentation fault (core dumped)
Could not execute /usr/local/bin/pfscanV3 --matrix-only -o4 -L-1 prosite.dat /tmp/ps569600-2.tmp  > /tmp/ps569600-1.tmp: Inappropriate ioctl for device at ps_scan.pl line 1751.

Here are the details of the run:

  • ps_scan.pl: Revision 1.90

  • pfscanV3: I downloaded pftools3-3.2.6 and built it following the instructions in INSTALL. According to make test, the installation was done properly. A series of commands were installed in /usr/local/bin/

  • ps_scan.pl & pfscanV3: This error was reproduced with the docker image sibswiss/pftools:3.2.6.

  • execution command:

    perl ps_scan.pl -o epff -s --pfscan /usr/local/bin/pfscanV3 test.faa > test.epff
    

    With the older version, there was no error (no hit found, either).

    perl ps_scan.pl -o epff -s --pfscan /usr/local/bin/pfscan test.faa > test.epff
    
  • The input sequence: It seems that the problem occurs when test.faa contains the following.

    >YAL049C
    MASNQPGKCCFEGVCHDGTPKGRREEIFGLDTYAAGSTSPKEKVIVILTDVYGNKFNNVL
    LTADKFASAGYMVFVPDILFGDAISSDKPIDRDAWFQRHSPEVTKKIVDGFMKLLKLEYD
    PKFIGVVGYCFGAKFAVQHISGDGGLANAAAIAHPSFVSIEEIEAIDSKKPILISAAEED
    HIFPANLRHLTEEKLKDNHATYQLDLFSGVAHGFAARGDISIPAVKYAKEKVLLDQIYWF
    NHFSNV
    

Here I have shown a minimal input that reproduces the error. In practice, however, I am scanning so many sequences that it is impractical to exclude error-causing inputs. Any suggestions on how to handle this?

Thank you.
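Until the crash itself is fixed, one possible workaround is to isolate the offending sequences so that a single failure does not abort a whole batch. The sketch below, in Python, is an assumption about how that could be organised, not part of pftools: `split_fasta` breaks FASTA text into records, and `scan_individually` calls a caller-supplied `scan_one` function per record, collecting failures separately. `scan_one` is hypothetical; in practice it would wrap a pfscanV3 invocation and raise when the process fails.

```python
def split_fasta(text):
    """Split FASTA text into a list of (header, sequence) records."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def scan_individually(records, scan_one):
    """Call scan_one(header, seq) per record; keep results and failures apart."""
    results, failed = {}, []
    for header, seq in records:
        try:
            results[header] = scan_one(header, seq)
        except Exception as exc:  # one bad sequence must not stop the batch
            failed.append((header, str(exc)))
    return results, failed
```

Launching one process per sequence is slow on large inputs; a cheaper variant would scan in batches and fall back to per-sequence runs only for a batch that fails.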

Detecting the number of available cores

Hello,

We use pftools as bundled in ebi-pf-team/interproscan.

A few weeks ago we started having issues when using it in a large computing center: pfsearchV3 stopped immediately with the message Error setting affinity!. The system team discovered that pfsearchV3 tries to use more cores than are available. This happens because pftools uses _SC_NPROCESSORS_CONF to detect the number of cores, but on this cluster the number of available cores (_SC_NPROCESSORS_ONLN) is lower than _SC_NPROCESSORS_CONF (I don't know yet if this is normal). We circumvented the problem by recompiling pftools with -DUSE_AFFINITY=OFF.

However, I was wondering if pftools should use _SC_NPROCESSORS_ONLN to detect the number of available cores at runtime.
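The gap between the two counts can be inspected on a given node without rebuilding pftools. The following is a minimal sketch in Python, not pftools code, assuming a POSIX system where `os.sysconf` exposes both names. It also prints the size of the process's affinity mask via `os.sched_getaffinity`, which is available on Linux only and reflects restrictions a batch scheduler may impose (for example through cpusets). The function name `core_counts` is made up for the example.

```python
import os

def core_counts():
    """Return (configured, online, usable) core counts for this process."""
    configured = os.sysconf("SC_NPROCESSORS_CONF")  # cores the system is configured with
    online = os.sysconf("SC_NPROCESSORS_ONLN")      # cores currently online
    if hasattr(os, "sched_getaffinity"):
        # Cores this process may actually be scheduled on (Linux only)
        usable = len(os.sched_getaffinity(0))
    else:
        usable = online
    return configured, online, usable

if __name__ == "__main__":
    configured, online, usable = core_counts()
    print(f"configured={configured} online={online} usable={usable}")
```

If `usable` is smaller than `configured`, starting one thread per configured core and pinning each to its own core is likely to fail in the way described above.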

pfscanV3 message when run with a single profile

The following message is displayed when you run pfscanV3 with a single profile:
"pfscanV3 is not meant to be used with a single profile, use pfsearchV3 to get better performance in such case."
I would suggest prefixing this message with a '#', as pfsearchV3 continues to run and generates results.

Calibrate DNA built profiles

Hello,

I am trying to build PSSM profiles for DNA sequences. The profile construction ran smoothly. Now I need to calibrate the profile with a database and I really cannot find a way to do that. Can you please show me a way or a database to use?

P.S. The profile was built with partial bacterial DNA sequences from NCBI.

Thanks in advance.

pfscanV3 cannot use motif files that contain patterns

If pfscanV3 (>=3.2.0) is run against a motif file that contains patterns, e.g. ftp://ftp.expasy.org/databases/prosite/prosite.dat, with a command such as

pfscanV3 -o3 --matrix-only -L-1 prosite.dat seq.tmp

it fails with the messages "Complementary option may not be used on pattern profile", "Error complementing alphabet" and "Error found reading profile".
(It worked fine with v3.1.)

It looks like a new check is performed on all motifs, when it should be applied only to matrices (generalized profiles). This matrix-specific test fails if the motif file contains patterns, and it is performed before --matrix-only gets a chance to skip them.

I can see that during INPUT ANALYSIS the ReadProfile method (src/C/utils/io.c) is called to load/check/count motifs; it loops over the motifs calling internalReadProfile, which systematically checks each motif via ComplementAlphabet. It should do so only for matrices (also, it looks like the first motif is tested over and over instead of the current one).

trying:

FROM io.c 1533
if (ComplementAlphabet(prf)!= 0) {
TO io.c 1533
if ( newPrf->Type == PF_MATRIX && ComplementAlphabet(newPrf)!= 0 ) {

fixes the problem ... (& tests pass).

I will check more (side effect etc...), update test suite, update release number? (3.2.2?), then submit a pull request...

p.s. used cmake options: -DC_ONLY=ON -DUSE_32BIT_INTEGER=ON -DUSE_PCRE=OFF

cpu detection failure

Attempts to run pfsearchV3 on an older AMD64 CPU (Opteron 2212HE) fail immediately with:
pfsearch requires at least a CPU capable of SSE 2.
Building on this CPU works without issue.
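To tell a hardware limitation from a detection problem, it can help to check which feature flags the kernel reports for the CPU. The sketch below, in Python, is an assumption-laden illustration and not how pftools detects SSE2: it parses the `flags` line of `/proc/cpuinfo`, which exists on Linux/x86 only. The function names `cpu_flags` and `supports`, and the sample text, are made up for the example.

```python
from pathlib import Path

def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def supports(flag, cpuinfo_path="/proc/cpuinfo"):
    """True if the kernel lists `flag` for the first CPU (Linux/x86 only)."""
    path = Path(cpuinfo_path)
    if not path.exists():
        raise OSError(f"{cpuinfo_path} is not available on this platform")
    return flag in cpu_flags(path.read_text())

if __name__ == "__main__":
    # Sample text so the parser can be tried on any platform
    sample = "processor\t: 0\nflags\t\t: fpu vme sse sse2 pni\n"
    print("sse2" in cpu_flags(sample))
```

If `supports("sse2")` returns True on the affected machine, the failure would point at the runtime detection rather than at the processor.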

-DUSE_AFFINITY=OFF -> error: field 'LastModification' has incomplete type

According to this issue, I should compile pftools3 with the cmake option -DUSE_AFFINITY=OFF

My Singularity definition

Bootstrap: docker
From: ubuntu:20.04

%post

    # Install dependencies
    ## From apt
    export DEBIAN_FRONTEND=noninteractive
    apt-get -qq update && apt-get -qq upgrade -y
    apt-get -qq install -y wget tar


    # PFTOOLS3
    # @TODO: remove when this issue is fixed: https://github.com/ebi-pf-team/interproscan/issues/215
    # According to: https://github.com/sib-swiss/pftools3/blob/master/Docker/Dockerfile
    apt-get -qq install -y --no-install-recommends \
      build-essential \
      libpcre++-dev \
      gfortran \
      libgfortran5 \
      ca-certificates \
      git \
      cmake \
      zlib1g-dev \
      libpng-dev \
      libfile-slurp-perl
    cd /opt/
    wget https://github.com/sib-swiss/pftools3/archive/refs/tags/v3.2.10.tar.gz
    tar -zxf v3.2.10.tar.gz
    mkdir pftools3-3.2.10/build
    cd pftools3-3.2.10/build
    cmake .. -DCMAKE_INSTALL_PREFIX:PATH=/opt/interproscan/prosite/ -DCMAKE_BUILD_TYPE=Release -DUSE_GRAPHICS=OFF -DUSE_PDF=OFF -DUSE_AFFINITY=OFF
    make
    make install
    make test
    cd /opt/
    rm -rf v3.2.10.tar.gz pftools3-3.2.10

The compilation output:

+ cmake .. -DCMAKE_INSTALL_PREFIX:PATH=/opt/interproscan/prosite/ -DUSE_AFFINITY=OFF
-- +--------------------------------------------------------------------+
-- |                          PfTools   v3.2.10                          |
-- +--------------------------------------------------------------------+
-- |     (C) Copyright SIB Swiss Institute of Bioinformatics            |
-- |         Thierry Schuepbach ([email protected])                  |
-- |                                                                    |
-- |     PfTools is available from                                      |
-- |         https://github.com/sib-swiss/pftools3                      |
-- |     under the GPL v2. See LICENSE.                                 |
-- |                                                                    |
-- +--------------------------------------------------------------------+
-- The Fortran compiler identification is GNU 9.3.0
-- The C compiler identification is GNU 9.3.0
-- Check for working Fortran compiler: /usr/bin/f95
-- Check for working Fortran compiler: /usr/bin/f95  -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Checking whether /usr/bin/f95 supports Fortran 90
-- Checking whether /usr/bin/f95 supports Fortran 90 -- yes
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Compilation on architecture x86_64.
-- Performing Test HANDLE_SSE
-- Performing Test HANDLE_SSE - Success
-- Performing Test HANDLE_SSE2
-- Performing Test HANDLE_SSE2 - Success
-- testing flag -msse4.1...
-- Performing Test HANDLE_SSE41
-- Performing Test HANDLE_SSE41 - Success
-- Performing Test HANDLE_C99
-- Performing Test HANDLE_C99 - Success
-- Add -std=c99 to C compiler options
-- Add SSE2 to C compiler options
-- Performing Test MS_EXTENSION
-- Performing Test MS_EXTENSION - Success
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found Perl: /usr/bin/perl (found version "5.30.0")
-- Perl found - full test suite usable
-- Found PCRE: /usr/lib/x86_64-linux-gnu/libpcre.so
-- Looking for include file emmintrin.h
-- Looking for include file emmintrin.h - found
-- Looking for include file smmintrin.h
-- Looking for include file smmintrin.h - found
-- Looking for include file mm_malloc.h
-- Looking for include file mm_malloc.h - found
-- Looking for include file alloca.h
-- Looking for include file alloca.h - found
-- Check Perl script syntax
       compare_2_profiles.pl
           - syntax OK
       fasta_to_fastq.pl
           - syntax OK
       make_iupac_cmp.pl
           - syntax OK
       ps_scan.pl
           - syntax OK
       scramble_fasta.pl
           - syntax OK
       sort_fasta.pl
           - syntax OK
       split_profile_file.pl
           - syntax OK
--
-- PfTools Software Suite configuration summary:
--
--   System name ......................: Linux
--   Build type .......................: Release
--
--   Integer format ...................: 16 bits
--
--   Install prefix .................. : /opt/interproscan/prosite
--   C compiler ...................... : /usr/bin/cc
--   Fortran compiler ................ : /usr/bin/f95
--   C compiler flags ................ :  -std=c99 -msse2 -fms-extensions -O3 -DNDEBUG
--   C compiler SSE2 flag. ........... : -msse2
--   C compiler SSE 4.1 flag.......... : -msse4.1
--   Fortran compiler flags .......... :  -O3 -DNDEBUG -O3
--
--   Use file memory mapping ..........: ON
--   Use thread affinity setting ......: OFF
--
--   Build shared libs ............... : OFF
--   Build static libs ............... : OFF
--   Build static executables......... : OFF
--
--   Link with pcre .................. : ON
--
!! CPU thread affinity is disabled, hence performance penalty may apply on many-core architecture.
-- Configuring done
-- Generating done
-- Build files have been written to: /opt/pftools3-3.2.10/build
+ make
-- +--------------------------------------------------------------------+
-- |                          PfTools   v3.2.10                          |
-- +--------------------------------------------------------------------+
-- |     (C) Copyright SIB Swiss Institute of Bioinformatics            |
-- |         Thierry Schuepbach ([email protected])                  |
-- |                                                                    |
-- |     PfTools is available from                                      |
-- |         https://github.com/sib-swiss/pftools3                      |
-- |     under the GPL v2. See LICENSE.                                 |
-- |                                                                    |
-- +--------------------------------------------------------------------+
-- Compilation on architecture x86_64.
-- testing flag -msse4.1...
-- Add -std=c99 to C compiler options
-- Add SSE2 to C compiler options
-- Perl found - full test suite usable
-- Building test suite for pfsearch in /opt/pftools3-3.2.10/Tests
-- Check Perl script syntax
       compare_2_profiles.pl
           - syntax OK
       fasta_to_fastq.pl
           - syntax OK
       make_iupac_cmp.pl
           - syntax OK
       ps_scan.pl
           - syntax OK
       scramble_fasta.pl
           - syntax OK
       sort_fasta.pl
           - syntax OK
       split_profile_file.pl
           - syntax OK
--
-- PfTools Software Suite configuration summary:
--
--   System name ......................: Linux
--   Build type .......................: Release
--
--   Integer format ...................: 16 bits
--
--   Install prefix .................. : /opt/interproscan/prosite
--   C compiler ...................... : /usr/bin/cc
--   Fortran compiler ................ : /usr/bin/f95
--   C compiler flags ................ :  -std=c99 -msse2 -fms-extensions -O3 -DNDEBUG
--   C compiler SSE2 flag. ........... : -msse2
--   C compiler SSE 4.1 flag.......... : -msse4.1
--   Fortran compiler flags .......... :  -O3 -DNDEBUG -O3
--
--   Use file memory mapping ..........: ON
--   Use thread affinity setting ......: OFF
--
--   Build shared libs ............... : OFF
--   Build static libs ............... : OFF
--   Build static executables......... : OFF
--
--   Link with pcre .................. : ON
--
!! CPU thread affinity is disabled, hence performance penalty may apply on many-core architecture.
-- Configuring done
-- Generating done
-- Build files have been written to: /opt/pftools3-3.2.10/build
Scanning dependencies of target REGEXP
[  1%] Building C object src/C/CMakeFiles/REGEXP.dir/utils/pfregexp.c.o
[  1%] Built target REGEXP
Scanning dependencies of target OUTPUT_FORMAT
[  2%] Building C object src/C/utils/CMakeFiles/OUTPUT_FORMAT.dir/output.c.o
In file included from /opt/pftools3-3.2.10/src/C/utils/output.c:14:
/opt/pftools3-3.2.10/src/C/utils/../include/pfSequence.h:50:18: error: field 'LastModification' has incomplete type
   50 |  struct timespec LastModification;
      |                  ^~~~~~~~~~~~~~~~
make[2]: *** [src/C/utils/CMakeFiles/OUTPUT_FORMAT.dir/build.make:63: src/C/utils/CMakeFiles/OUTPUT_FORMAT.dir/output.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1579: src/C/utils/CMakeFiles/OUTPUT_FORMAT.dir/all] Error 2
make: *** [Makefile:163: all] Error 2
FATAL:   While performing build: while running engine: exit status 2

It succeeds without the -DUSE_AFFINITY=OFF option.

Internal error xalip when setting cutoff level to -1

Hello,

I am trying to reproduce ScanProsite results with pfscanV3. However, for some sequences, I am encountering an error when running pfscanV3 with -L -1 (to run the scan at a low confidence cut-off).

Getting data

$ wget ftp://ftp.expasy.org/databases/prosite/prosite.dat
$ wget https://rest.uniprot.org/uniprotkb/Q840Q1.fasta

Testing with Conda/Mamba

Create a new environment:

$ mamba create -n ps -c bioconda --quiet --yes pftools
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done

Activate it:

source activate ps

Run pfscanV3:

$ pfscanV3 --matrix-only -o 4 -L -1 prosite.dat Q840Q1.fasta > /dev/null
Error: Inconsistent alignment found in alignment 1 - no list produced.
       Alignement should be from 367 to 165!
Thread 3 : Internal error xalip reported no possible alignment for sequence 3(0) (nali=-1)!
>tr|Q840Q1|Q840Q1_STRGR DAGKc domain-containing protein OS=Streptomyces griseus subsp. griseus OX=67263 PE=4 SV=1
Segmentation fault (core dumped)

Testing with Docker

$ docker run --rm --quiet \
>     -v $PWD:/data \
>     sibswiss/pftools:3.2.12 \
>     pfscanV3 --matrix-only -o 4 -L -1 /data/prosite.dat /data/Q840Q1.fasta > /dev/null
Error: Inconsistent alignment found in alignment 1 - no list produced.
       Alignement should be from 367 to 165!
Thread 3 : Internal error xalip reported no possible alignment for sequence 3(0) (nali=-1)!
>tr|Q840Q1|Q840Q1_STRGR DAGKc domain-containing protein OS=Streptomyces griseus subsp. griseus OX=67263 PE=4 SV=1

Could you please have a look?

Thank you


Edit: I just realized this issue describes the same error as reported in #22.

Error compiling on OSX

building info:

cmake -DSTANDALONE=ON ..
-- +--------------------------------------------------------------------+
-- |                          PfTools   v3.2.10                          |
-- +--------------------------------------------------------------------+
-- |     (C) Copyright SIB Swiss Institute of Bioinformatics            |
-- |         Thierry Schuepbach ([email protected])                  |
-- |                                                                    |
-- |     PfTools is available from                                      |
-- |         https://github.com/sib-swiss/pftools3                      |
-- |     under the GPL v2. See LICENSE.                                 |
-- |                                                                    |
-- +--------------------------------------------------------------------+
-- Compilation on architecture x86_64.
-- testing flag -msse4.1...
-- Add -std=c99 to C compiler options
-- Add SSE2 to C compiler options
-- Perl found - full test suite usable
-- libPCRE will be built in /Users/jacda119/git/pftools3/build/libPCRE
-- Building test suite for pfsearch in /Users/jacda119/git/pftools3/Tests
-- Check Perl script syntax
       compare_2_profiles.pl
           - syntax OK
       fasta_to_fastq.pl
           - syntax OK
       make_iupac_cmp.pl
           - syntax OK
       ps_scan.pl
           - syntax OK
       scramble_fasta.pl
           - syntax OK
       sort_fasta.pl
           - syntax OK
       split_profile_file.pl
           - syntax OK
-- 
-- PfTools Software Suite configuration summary:
-- 
--   System name ......................: Darwin
--   Build type .......................: Release
-- 
--   Integer format ...................: 16 bits
-- 
--   Install prefix .................. : /usr/local
--   C compiler ...................... : /Users/jacda119/miniconda3/envs/pftools/bin/x86_64-apple-darwin13.4.0-clang
--   Fortran compiler ................ : /Users/jacda119/miniconda3/envs/pftools/bin/x86_64-apple-darwin13.4.0-gfortran
--   C compiler flags ................ : -march=core2 -mtune=haswell -mssse3 -ftree-vectorize -fPIC -fPIE -fstack-protector-strong -O2 -pipe -isystem /Users/jacda119/miniconda3/envs/pftools/include -std=c99 -msse2 -O3 -DNDEBUG
--   C compiler SSE2 flag. ........... : -msse2
--   C compiler SSE 4.1 flag.......... : -msse4.1
--   Fortran compiler flags .......... : -march=core2 -mtune=haswell -ftree-vectorize -fPIC -fstack-protector -O2 -pipe -isystem /Users/jacda119/miniconda3/envs/pftools/include -O3 -DNDEBUG -O3
-- 
--   Use file memory mapping ..........: ON
--   Use thread affinity setting ......: OFF
-- 
--   Build shared libs ............... : OFF
--   Build static libs ............... : OFF
--   Build static executables......... : ON
-- 
--   Link with pcre .................. : Internaly built
-- 
!! CPU thread affinity is disabled, hence performance penalty may apply on many-core architecture.
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/jacda119/git/pftools3/build

Compiling with make:

[ 33%] Built target SEQUENCES_EXTRA
[ 34%] Building C object src/C/utils/CMakeFiles/SEQUENCES.dir/ReadSequence.c.o
/Users/jacda119/git/pftools3/src/C/utils/ReadSequence.c:213:36: error: no member named 'st_mtim' in 'struct stat'
        Info->LastModification = FileStat.st_mtim;
                                 ~~~~~~~~ ^
/Users/jacda119/git/pftools3/src/C/utils/ReadSequence.c:423:36: error: no member named 'st_mtim' in 'struct stat'
        Info->LastModification = FileStat.st_mtim;
                                 ~~~~~~~~ ^
/Users/jacda119/git/pftools3/src/C/utils/ReadSequence.c:571:36: error: no member named 'st_mtim' in 'struct stat'
        Info->LastModification = FileStat.st_mtim;
                                 ~~~~~~~~ ^
/Users/jacda119/git/pftools3/src/C/utils/ReadSequence.c:631:41: error: no member named 'st_mtim' in 'struct stat'
        if (st.st_size != Info->FileSize || st.st_mtim.tv_nsec != Info->LastModification.tv_nsec ||
                                            ~~ ^
/Users/jacda119/git/pftools3/src/C/utils/ReadSequence.c:632:6: error: no member named 'st_mtim' in 'struct stat'
                st.st_mtim.tv_sec != Info->LastModification.tv_sec) {
                ~~ ^
5 errors generated.
make[2]: *** [src/C/utils/CMakeFiles/SEQUENCES.dir/ReadSequence.c.o] Error 1
make[1]: *** [src/C/utils/CMakeFiles/SEQUENCES.dir/all] Error 2
make: *** [all] Error 2
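
A possible workaround, sketched here as an assumption rather than as the fix adopted by the project: Darwin's `struct stat` exposes the modification timestamp as `st_mtimespec`, whereas Linux/glibc uses the POSIX.1-2008 name `st_mtim`. The helpers `stat_mtime` and `file_mtime` below are hypothetical and do not exist in pftools; they only illustrate how the member access in `ReadSequence.c` could be wrapped.

```c
/* On non-Apple systems, request POSIX.1-2008 so that st_mtim is visible
 * even under -std=c99. On macOS the macro is deliberately left alone,
 * because st_mtimespec is only visible in the default Darwin configuration. */
#if !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper (not part of pftools): returns the modification
 * time stored in a stat result, whatever the platform calls the field. */
static struct timespec stat_mtime(const struct stat *st)
{
    assert(st != NULL);
#if defined(__APPLE__)
    return st->st_mtimespec;   /* Darwin field name */
#else
    return st->st_mtim;        /* POSIX.1-2008 field name */
#endif
}

/* Hypothetical helper: stores the modification time of `path` in *out.
 * Returns 0 on success and -1 if stat() fails. */
static int file_mtime(const char *path, struct timespec *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    *out = stat_mtime(&st);
    return 0;
}
```

With such a wrapper, an assignment like `Info->LastModification = FileStat.st_mtim;` could become `Info->LastModification = stat_mtime(&FileStat);`, keeping the platform difference in one place.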

Incomplete type compilation error on Rocky Linux

When pftools is compiled on Rocky Linux (tested on version 8.4), the compilation fails:

Scanning dependencies of target OUTPUT_FORMAT
[ 20%] Building C object src/C/utils/CMakeFiles/OUTPUT_FORMAT.dir/output.c.o
In file included from /pftools3/src/C/utils/output.c:14:
/pftools3/src/C/utils/../include/pfSequence.h:50:18: error: field 'LastModification' has incomplete type
  struct timespec LastModification;
                  ^~~~~~~~~~~~~~~~
make[2]: *** [src/C/utils/CMakeFiles/OUTPUT_FORMAT.dir/build.make:82: src/C/utils/CMakeFiles/OUTPUT_FORMAT.dir/output.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1654: src/C/utils/CMakeFiles/OUTPUT_FORMAT.dir/all] Error 2
make: *** [Makefile:182: all] Error 2

@tschuepb @schoopy, do you have any idea what could cause this?
Rocky Linux 8 is closely related to CentOS 8 (both are rebuilds of RHEL 8), so pftoolsV3 may not compile on this OS family.
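
One possible cause, which is an assumption and not confirmed in this report: the CMake configuration adds `-std=c99` to the C compiler options, and in strict ISO C mode glibc does not declare POSIX types such as `struct timespec` unless a feature-test macro requests them. The sketch below defines `_POSIX_C_SOURCE` before the first system header. The struct `file_info_sketch` and the function `same_mtime` are hypothetical stand-ins, not the actual contents of `pfSequence.h`.

```c
/* Must come before the first system header is included. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a struct holding a modification timestamp.
 * With the macro above, struct timespec is a complete type here. */
struct file_info_sketch {
    long FileSize;
    struct timespec LastModification;
};

/* Returns 1 if both timestamps are identical, 0 otherwise. */
static int same_mtime(const struct timespec *a, const struct timespec *b)
{
    assert(a != NULL && b != NULL);
    return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}
```

Compiling with `-std=gnu99` instead of `-std=c99`, or passing `-D_POSIX_C_SOURCE=200809L` on the command line, would be alternative ways to test the same hypothesis without touching the sources.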

Increase maximum filename size

As reported here, pfscan crashes with a segmentation fault when the path of an input file exceeds a certain number of characters. This seems to be related to the use of fixed-size strings (length 512) for filenames in pfscan.f (and probably in other programs).

A common (but not perfect) practice is to rely on the value of PATH_MAX from <limits.h> (see here); MAX_PATH is the Windows counterpart. I don't know how to get this value in Fortran (maybe detecting it from CMake and passing it to the compiler is the best option).
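
On the C side, the idea can be sketched as follows, under stated assumptions: `copy_path_checked` is a hypothetical helper that does not exist in pftools, and the fallback value of 4096 is an arbitrary choice for systems where `<limits.h>` does not define `PATH_MAX` (POSIX allows it to be left undefined). The point is that an over-long path is rejected with an error code instead of overflowing a fixed-size buffer.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* PATH_MAX may be undefined; fall back to an assumed value. */
#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical helper: copies `src` into `dst`, whose capacity is
 * `dst_size` bytes including the terminating NUL.
 * Returns 0 on success, -1 if `src` does not fit. */
static int copy_path_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);
    len = strlen(src);
    if (len >= dst_size)
        return -1;              /* too long: report instead of overflowing */
    memcpy(dst, src, len + 1);  /* copy including the terminating NUL */
    return 0;
}
```

A caller would declare the buffer as `char name[PATH_MAX];` and print an error message when the function returns -1. Passing the same limit to the Fortran sources would still require a build-system step, as suggested above.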
