Git Product home page Git Product logo

mitos.jl's Introduction

MIToS

Mutual Information Tools for protein Sequence analysis

Julia 1.0 or higher: MIToS v2

Linux, OSX & Windows: Status

Code Coverage: Coverage Status codecov.io

NOTE: Julia has reached a stable 1.0 version and MIToS v2 supports it. You need to do using Pkg; Pkg.add("MIToS") to install MIToS on Julia v1 or using Pkg; Pkg.update() to update your installed version.

Some breaking changes were introduced in MIToS v2.3. See the NEWS.md file and the documentation to migrate code from an old version of MIToS. If you need more help to migrate code from MIToS 1.0 in Julia 0.4 to the newer MIToS v2 in Julia v1, you can write a mail to diegozea at gmail dot com asking for assistance.

DOCUMENTATION:
Documentation for MIToS 1.0 in Julia 0.4
Documentation for MIToS v2 or greater in Julia 0.5 or greater:

MIToS is an environment for Mutual Information (MI) analysis and implements several useful tools for Multiple Sequence Alignments (MSAs) and PDB structures management in the Julia language [1]. MI allows determining covariation between positions in a MSA. MI derived scores are good predictors of residue contacts and functional sites in proteins [2,3].

MIToS starting point was an improvement of the algorithm published by Buslje et. al. [2]. A BLOSUM62-based pseudocount strategy, similar to Altschul et. al.[4], was implemented for a better performance in the range of MSAs with low number of sequences. MIToS offers all the necessary tools for using, developing and testing MI based scores, in different modules.

Modules

MIToS tools are separated on different modules, related to different tasks.

  • MSA This module defines multiple functions and types for dealing with MSAs and their annotations. It also includes facilities for sequence clustering.
  • PDB This module defines types and methods to work with protein structures from PDB.
  • SIFTS This module allows access to SIFTS residue-level mapping of UniProt, Pfam and other databases with PDB entries.
  • Information This module defines residue contingency tables and methods on them to estimate information measure from MSAs. It includes functions to estimate corrected mutual information (ZMIp, ZBLMIp) between MSA columns.
  • Pfam This module use the previous modules to work with Pfam MSAs. It also has useful parameter optimization functions to be used with Pfam alignments.
  • Utils MIToS has also an Utils module with common utils functions and types used in this package.

Scripts

MIToS implements several useful scripts for command line execution (without requiring Julia coding):

  • Buslje09.jl : Calculates the corrected MI/MIp described on Buslje et. al. 2009 [2].
  • BLMI.jl : Calculates corrected mutual information using BLOSUM62 based-pseudocounts.
  • Conservation.jl : Calculates the Shannon entropy and the Kullback-Leibler divergence of each MSA column.
  • DownloadPDB.jl : Downloads gzipped files from PDB.
  • Distances.jl : Calculates residues distances in a PDB file.
  • SplitStockholm.jl : Splits a Stockholm file with multiple alignments into one compressed file per MSA
  • AlignedColumns.jl : Creates a Stockholm file with the aligned columns from a Pfam Stockholm file (insertions are deleted) saving the mapping (residue number in UniProt) and the columns in the original MSA.
  • PercentIdentity.jl : Calculates the percentage identity between all the sequences of an MSA and saves mean, median, minimum, etc.
  • MSADescription.jl : Calculates the number of columns, sequences and clusters after Hobohm I clustering at 62% identity given a stockholm file as input [5]. It also gives the percent indentity mean and mean, standard deviation and quantiles of: sequence coverage of the MSA and gap percentage.

Citation

If you use MIToS, please cite:

Diego J. Zea, Diego Anfossi, Morten Nielsen, Cristina Marino-Buslje; MIToS.jl: mutual information tools for protein sequence analysis in the Julia language, Bioinformatics, Volume 33, Issue 4, 15 February 2017, Pages 564–565, https://doi.org/10.1093/bioinformatics/btw646

References

  1. Zea, Diego Javier, et al. "MIToS. jl: mutual information tools for protein sequence analysis in the Julia language." Bioinformatics 33, no. 4 (2016): 564-565.
  2. Buslje, Cristina Marino, et al. "Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information." Bioinformatics 25.9 (2009): 1125-1131.
  3. Buslje, Cristina Marino, et al. "Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification." PLoS Comput Biol 6.11 (2010): e1000978.
  4. Altschul, Stephen F., et al. "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic acids research 25.17 (1997): 3389-3402.
  5. Hobohm, Uwe, et al. "Selection of representative protein data sets." Protein Science 1.3 (1992): 409-417.

Structural Bioinformatics Unit

FIL

mitos.jl's People

Contributors

brnmdrt avatar cossio avatar cristinot avatar diegozea avatar elin- avatar ellisvalentiner avatar femtocleaner[bot] avatar github-actions[bot] avatar javieriserte avatar juliatagbot avatar kool7d avatar timholy avatar tkelman avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

mitos.jl's Issues

IO performance

The updated mitos-benchmarks/MIToS benchmark shows small regression in read/write of MSA for Julia 1.0.2 and MIToS 2.3.1. Maybe, it's related to the use of length(::String): https://discourse.julialang.org/t/performance-of-length-string/12672
Also, @profile shows that most of the time is spent in cache inefficient access to the big MSA matrix: setindex! in _convert_to_matrix_residues.
Maybe https://github.com/JuliaArrays/TiledIteration.jl or using 8 bits instead of 32/64 bits for Residue can help to avoid cache related problems.

using MIToS.MSA, Profile, ProfileView
@profile read("../data/PF00089.sth", Stockholm, MultipleSequenceAlignment, deletefullgaps=false)

image

322 ./task.jl:259; (::getfield(Revise, Symbol("##58#60")){REPL.REPLBackend})()
 322 /home/elin/.julia/packages/Revise/gStbk/src/Revise.jl:771; run_backend(::REPL.REPLBackend)
  322 ...rker/package_linux64/build/usr/share/julia/stdlib/v1.0/REPL/src/REPL.jl:85; eval_user_input(::Any, ::REPL.REPLBackend)
   322 ./boot.jl:319; eval(::Module, ::Any)
    322 ...ckage_linux64/build/usr/share/julia/stdlib/v1.0/Profile/src/Profile.jl:25; top-level scope
     322 ./none:0; #read
      322 /home/elin/.julia/packages/MIToS/fEKlk/src/Utils/Read.jl:109; #read#10(::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:deletefullgaps,),Tuple{Bool}}}, ::...
       322 ./none:0; (::getfield(MIToS.Utils, Symbol("#kw##_read")))(::NamedTuple{(:deletefullgaps,),Tuple{Bool}}, ::typeof(MIT...
        322 /home/elin/.julia/packages/MIToS/fEKlk/src/Utils/Read.jl:79; #_read#9(::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:deletefullgaps,),Tuple{Bool}}}, ::...
         322 ./none:0; (::getfield(Base, Symbol("#kw##parse")))(::NamedTuple{(:deletefullgaps,),Tuple{Bool}}, ::typeof(parse), :...
          322 /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Stockholm.jl:113; #parse#27
           322 ./none:0; #parse
            86  /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Stockholm.jl:101; #parse#26(::Bool, ::Function, ::IOStream, ::Type{Stockholm}, ::Type{NamedArrays.NamedArray{Residue,2,A...
             54 /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Stockholm.jl:75; _pre_readstockholm_sequences(::IOStream)
              1  /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Stockholm.jl:4; _fill_with_sequence_line!
               1 ./strings/util.jl:25; startswith(::String, ::Char)
              48 /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Stockholm.jl:5; _fill_with_sequence_line!
               48 /home/elin/.julia/packages/MIToS/fEKlk/src/Utils/GeneralUtils.jl:36; get_n_words(::String, ::Int64)
                22 ./strings/string.jl:250; getindex(::String, ::UnitRange{Int64})
                 22 ./strings/string.jl:60; _string_n
                  1 ./essentials.jl:355; cconvert
                   1 ./number.jl:7; convert
                    1 ./boot.jl:722; Type
                     1 ./boot.jl:692; toUInt64
                      1 ./boot.jl:581; check_top_bit
                       1 ./boot.jl:571; is_top_bit_set
                26 ./strings/string.jl:253; getindex(::String, ::UnitRange{Int64})
                 1  ./pointer.jl:118; unsafe_store!
                 1  ./range.jl:575; iterate
                  1 ./promotion.jl:425; ==
                 15 ./strings/string.jl:87; codeunit
                  15 ./strings/basic.jl:193; checkbounds
                   9 ./strings/basic.jl:185; checkbounds
                    8 ./int.jl:428; <=
              3  /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Stockholm.jl:7; _fill_with_sequence_line!
               3 /home/elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_set.jl:26; in
                3 .../elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_dict.jl:367; haskey
                 2 ...elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_dict.jl:232; ht_keyindex(::OrderedCollections.OrderedDict{String,Nothing}, ::String, ::Bool)
                  2 ./array.jl:731; getindex
                 1 ...elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_dict.jl:234; ht_keyindex(::OrderedCollections.OrderedDict{String,Nothing}, ::String, ::Bool)
              1  /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Stockholm.jl:14; _fill_with_sequence_line!
               1 ./array.jl:856; push!
                1 ./array.jl:814; _growend!
              1  /home/elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_set.jl:28; _fill_with_sequence_line!
               1 .../elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_dict.jl:307; setindex!(::OrderedCollections.OrderedDict{String,Nothing}, ::Nothing, ::String)
                1 .../elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_dict.jl:276; _setindex!(::OrderedCollections.OrderedDict{String,Nothing}, ::Nothing, ::String, ::Int64)
             1  /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Stockholm.jl:76; _pre_readstockholm_sequences(::IOStream)
              1 ./strings/util.jl:53; startswith
             31 /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Stockholm.jl:77; _pre_readstockholm_sequences(::IOStream)
              31 ./io.jl:881; iterate(::Base.EachLine{IOStream}, ::Nothing)
               31 ./none:0; #readline
                31 ./iostream.jl:433; #readline#296
            236 /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Stockholm.jl:102; #parse#26(::Bool, ::Function, ::IOStream, ::Type{Stockholm}, ::Type{NamedArrays.NamedArray{Residue,2,A...
             234 ...e/elin/.julia/packages/MIToS/fEKlk/src/MSA/GeneralParserMethods.jl:132; _generate_named_array(::Array{String,1}, ::OrderedCollections.OrderedSet{String})
              7   /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Residues.jl:177; _convert_to_matrix_residues(::Array{String,1}, ::Tuple{Int64,Int64})
              23  /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Residues.jl:249; _convert_to_matrix_residues(::Array{String,1}, ::Tuple{Int64,Int64})
               23 ./boot.jl:409; Type
                23 ./boot.jl:396; Type
              3   /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Residues.jl:251; _convert_to_matrix_residues(::Array{String,1}, ::Tuple{Int64,Int64})
               3 ./iterators.jl:138; iterate
                3 ./iterators.jl:139; iterate
                 3 ./strings/string.jl:176; iterate
                  3 ./strings/string.jl:176; iterate
                   3 ./operators.jl:286; >
                    3 ./int.jl:49; <
              201 /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Residues.jl:252; _convert_to_matrix_residues(::Array{String,1}, ::Tuple{Int64,Int64})
               173 ./array.jl:771; setindex!
                5 /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Residues.jl:176; convert
                 5 ./char.jl:50; Type
                  5 ./char.jl:53; codepoint
                   5 ./char.jl:114; Type
                9 /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/Residues.jl:178; convert
                 9 ./array.jl:731; getindex
               23  ./iterators.jl:139; iterate
                1  ./strings/string.jl:176; iterate
                9  ./strings/string.jl:178; iterate
                 9 ./int.jl:450; <<
                  9 ./int.jl:443; <<
                13 ./strings/string.jl:179; iterate
                 13 ./strings/string.jl:17; between
                  13 ./int.jl:429; <=
             2   ...e/elin/.julia/packages/MIToS/fEKlk/src/MSA/GeneralParserMethods.jl:133; _generate_named_array(::Array{String,1}, ::OrderedCollections.OrderedSet{String})
              2 /home/elin/.julia/packages/MIToS/fEKlk/src/MSA/GeneralParserMethods.jl:116; _ids_ordered_dict(::OrderedCollections.OrderedSet{String}, ::Int64)
               1 .../elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_dict.jl:301; setindex!(::OrderedCollections.OrderedDict{String,Int64}, ::Int64, ::String)
                1 .../elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_dict.jl:253; ht_keyindex2(::OrderedCollections.OrderedDict{String,Int64}, ::String)
                 1 .../elin/.julia/packages/OrderedCollections/Pr9Pa/src/dict_support.jl:6; hashindex
                  1 ./hashing.jl:18; hash
                   1 ./hashing2.jl:179; hash
               1 .../elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_dict.jl:307; setindex!(::OrderedCollections.OrderedDict{String,Int64}, ::Int64, ::String)
                1 .../elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_dict.jl:290; _setindex!(::OrderedCollections.OrderedDict{String,Int64}, ::Int64, ::String, ::Int64)
                 1 ...elin/.julia/packages/OrderedCollections/Pr9Pa/src/ordered_dict.jl:183; rehash!(::OrderedCollections.OrderedDict{String,Int64}, ::Int64)
                  1 .../elin/.julia/packages/OrderedCollections/Pr9Pa/src/dict_support.jl:6; hashindex
                   1 ./hashing.jl:18; hash
                    1 ./hashing2.jl:179; hash

downloadpfam fails for full alignments

downloadpfam seems to fail when attempting to download full alignments:

MIToS.Pfam.downloadpfam("PF11591"; alignment="full")

ERROR: HTTP.Exceptions.StatusError(404, "GET", "/interpro/wwwapi/entry/pfam/PF11591/?annotation=alignment:full&download", HTTP.Messages.Response:
"""
HTTP/1.1 404 Not Found
Server: gunicorn
Vary: Accept,Cookie,Origin
Content-Type: application/json
InterPro-Version: 99.0
Access-Control-Expose-Headers: Content-Length, InterPro-Version, InterPro-Version-Minor, Cached
Strict-Transport-Security: max-age=0
Date: Sun, 28 Apr 2024 10:53:34 GMT
Access-Control-Max-Age: 3600
InterPro-Version-Minor: 0
Access-Control-Allow-Origin: *
X-Content-Type-Options: nosniff
Connection: Keep-Alive
Allow: GET, HEAD
X-Frame-Options: DENY
Access-Control-Allow-Methods: GET, HEAD, OPTIONS, POST, PUT
Access-Control-Allow-Headers: x-requested-with, Content-Type, origin, authorization, accept, client-security-token
Referrer-Policy: same-origin
Content-Length: 35

""")

However, downloading seed alignments works fine:

MIToS.Pfam.downloadpfam("PF11591"; alignment="seed") # succeeds

New zlib error reading Stockholm files

I think this is a new error associated with a recent change regarding headers.

Input code:

downloadpfam("PF00062")
msa1 = MIToS.MSA.read("PF00062.stockholm.gz", Stockholm, generatemapping=true, useidcoordinates=true)

Error output:

ERROR: zlib error: incorrect header check (code: -3)
Stacktrace:
  [1] changemode!(stream::TranscodingStreams.TranscodingStream{CodecZlib.GzipDecompressor, IOStream}, newmode::Symbol)
    @ TranscodingStreams C:\Users\kool7\.julia\packages\TranscodingStreams\MQucZ\src\stream.jl:742
  [2] callprocess(stream::TranscodingStreams.TranscodingStream{CodecZlib.GzipDecompressor, IOStream}, inbuf::TranscodingStreams.Buffer, outbuf::TranscodingStreams.Buffer)
    @ TranscodingStreams C:\Users\kool7\.julia\packages\TranscodingStreams\MQucZ\src\stream.jl:668
  [3] fillbuffer(stream::TranscodingStreams.TranscodingStream{CodecZlib.GzipDecompressor, IOStream}; eager::Bool)
    @ TranscodingStreams C:\Users\kool7\.julia\packages\TranscodingStreams\MQucZ\src\stream.jl:596
  [4] fillbuffer
    @ C:\Users\kool7\.julia\packages\TranscodingStreams\MQucZ\src\stream.jl:582 [inlined]
  [5] eof(stream::TranscodingStreams.TranscodingStream{CodecZlib.GzipDecompressor, IOStream})
    @ TranscodingStreams C:\Users\kool7\.julia\packages\TranscodingStreams\MQucZ\src\stream.jl:201
  [6] eof(stream::TranscodingStreams.TranscodingStream{CodecZlib.GzipDecompressor, IOStream})
    @ TranscodingStreams C:\Users\kool7\.julia\packages\TranscodingStreams\MQucZ\src\stream.jl:199
  [7] iterate (repeats 2 times)
    @ .\io.jl:1062 [inlined]
  [8] _pre_readstockholm(io::TranscodingStreams.TranscodingStream{CodecZlib.GzipDecompressor, IOStream})
    @ MIToS.MSA C:\Users\kool7\.julia\packages\MIToS\LlUD0\src\MSA\Stockholm.jl:55
  [9] #parse#25
    @ C:\Users\kool7\.julia\packages\MIToS\LlUD0\src\MSA\Stockholm.jl:89 [inlined]
 [10] #parse#29
    @ C:\Users\kool7\.julia\packages\MIToS\LlUD0\src\MSA\Stockholm.jl:131 [inlined]
 [11] _read(::String, ::String, ::Type{Stockholm}; kargs::Base.Pairs{Symbol, Bool, Tuple{Symbol, Symbol}, NamedTuple{(:generatemapping, :useidcoordinates), Tuple{Bool, Bool}}})
    @ MIToS.Utils C:\Users\kool7\.julia\packages\MIToS\LlUD0\src\Utils\Read.jl:100
 [12] read(::String, ::Type{Stockholm}; kargs::Base.Pairs{Symbol, Bool, Tuple{Symbol, Symbol}, NamedTuple{(:generatemapping, :useidcoordinates), Tuple{Bool, Bool}}})
    @ MIToS.Utils C:\Users\kool7\.julia\packages\MIToS\LlUD0\src\Utils\Read.jl:130

ArgParse needed for MIToS' scripts

D:\>"C:\Users\Diego\AppData\Local\Julia-1.2.0\bin\julia.exe" C:\Users\Diego\.julia\packages\MIToS\Hha81\scripts\SplitStockholm.jl -h
ERROR: LoadError: ArgumentError: Package ArgParse not found in current path:
- Run `import Pkg; Pkg.add("ArgParse")` to install the ArgParse package.

PFAM is not always round-trippable?

If I load a Pfam file, perform some manipulations, and then write it back out to disk, I can't load what I write:

julia> using MIToS, MIToS.Pfam, MIToS.MSA

julia> pfamfile = "PF18883.alignment.full.gz"
"PF18883.alignment.full.gz"

julia> pfamfile = downloadpfam("PF18883"; filename=pfamfile)
"PF18883.alignment.full.gz"

julia> msa = read(pfamfile, Stockholm, generatemapping=true, useidcoordinates=true);

julia> filtersequences!(msa, coverage(msa) .>= 0.9);

julia> open("PF18883.alignment.fullcoverage", "w") do io
           print(io, msa, Stockholm)
       end

julia> msa2 =  read("PF18883.alignment.fullcoverage", Stockholm, generatemapping=true, useidcoordinates=true)
ERROR: The last residue number 685 in sequence 1 isn't the end coordinate 686
Stacktrace:
 [1] _to_msa_mapping(sequences::Vector{String}, ids::Vector{String})
   @ MIToS.MSA ~/.julia/dev/MIToS/src/MSA/GeneralParserMethods.jl:56
 [2] _generate_annotated_msa(annot::Annotations, IDS::Vector{String}, SEQS::Vector{String}, keepinserts::Bool, generatemapping::Bool, useidcoordinates::Bool, deletefullgaps::Bool)
   @ MIToS.MSA ~/.julia/dev/MIToS/src/MSA/GeneralParserMethods.jl:142
 [3] #parse#25
   @ ~/.julia/dev/MIToS/src/MSA/Stockholm.jl:93 [inlined]
 [4] #parse#29
   @ ~/.julia/dev/MIToS/src/MSA/Stockholm.jl:133 [inlined]
 [5] _read(::String, ::String, ::Type{Stockholm}; kargs::Base.Pairs{Symbol, Bool, Tuple{Symbol, Symbol}, NamedTuple{(:generatemapping, :useidcoordinates), Tuple{Bool, Bool}}})
   @ MIToS.Utils ~/.julia/dev/MIToS/src/Utils/Read.jl:105
 [6] read(::String, ::Type{Stockholm}; kargs::Base.Pairs{Symbol, Bool, Tuple{Symbol, Symbol}, NamedTuple{(:generatemapping, :useidcoordinates), Tuple{Bool, Bool}}})
   @ MIToS.Utils ~/.julia/dev/MIToS/src/Utils/Read.jl:135
 [7] top-level scope
   @ REPL[7]:1

Error reading structures generated with extract_from_pdb

Reading a pdb file created using extract_from_pdb (from T-Coffee) with MIToS throw errors, as the ATOM lines have only 66 characters. However, BioStructures' read function works fine. So, the following function uses BioStructures to save an intermediate file that MIToS can read.

"""
This function reads a PDB file created with extract_from_pdb and returns a MIToS' PDB object.
"""
function read_pdb(pdb_file::String)
    # Read the PDB file with BioStructures
    tmp = BioStructures.read(pdb_file, BioStructures.PDB)
    # Save the BioStructures' PDB object into a temporary file
    tmp_file = tempname()
    BioStructures.writepdb(tmp_file, tmp)
    # Read the temporary file with MIToS
    pdb = read(tmp_file, PDBFile)
    # Delete the temporary file
    rm(tmp_file)
    return pdb
end

From one of those PDB files:

REMARK Output of the program extract_from_pdb (Version 2.43)
REMARK Legal PDB format not Guaranteed
REMARK This format is not meant to be used in place of the PDB format
REMARK The header refers to the original entry
REMARK The sequence from the original file has been taken in the field: ATOM
REMARK extract_from_pdb, 2001, 2002, 2003, 2004, 2005 2006 (c) CNRS and Cedric Notredame

Extra whitespace when writing PDB files with specific coordinates

When reading the this PDB file, changing its coordinates and writing these modified versions, PDBs with coordinates 10.000 get an extra whitespace before the coordinates.

Grepping the output files:

grep "10\.000" *pdb
14_1svd_1M14_A.pdb:ATOM    386  CB  THR    49      37.515 -10.000  36.203  1.00  0.00           C
19_1svd_1M14_A.pdb:ATOM   2169  N   ASP   272       10.000  13.255  76.512  1.00  0.00           N
45_1svd_1M14_A.pdb:ATOM    780  CA  ARG   101      10.000  11.768  53.211  1.00  0.00           C
51_1svd_1M14_A.pdb:ATOM    659  OE1 GLN    85      30.041 -10.000  45.708  1.00  0.00           O
55_1svd_1M14_A.pdb:ATOM    853  CA  GLN   110       10.000  11.317  70.605  1.00  0.00           C
58_1svd_1M14_A.pdb:ATOM      6  CG  LEU     1      38.475 -10.000  63.420  1.00120.89           C
62_1svd_1M14_A.pdb:ATOM    909  CZ3 TRP   115      12.990   10.000  62.725  1.00  0.00           C
69_1svd_1M14_A.pdb:ATOM   1248  C   LYS   158      40.377  10.000  59.300  1.00  0.00           C
70_1svd_1M14_A.pdb:ATOM    748  CB  LEU    97      16.920  10.000  54.280  1.00  0.00           C
82_1svd_1M14_A.pdb:ATOM    819  CA  ASP   105       2.233   10.000  60.879  1.00  0.00           C

Files 19, 55, 62 and 82 are misaligned. I included these files on the public guist.

Note that the problem does not appear when writing files with coordinates 20.000, 30.000, etc...

> grep "20\.000" *pdb 
100_1svd_1M14_A.pdb:ATOM    287  N   LYS    37      18.454 -20.000  51.996  1.00  0.00           N
100_1svd_1M14_A.pdb:ATOM    700  N   MET    91      20.000  -4.533  61.817  1.00  0.00           N
10_1svd_1M14_A.pdb:ATOM    308  CB  PRO    39      20.000  -9.583  60.310  1.00  0.00           C
33_1svd_1M14_A.pdb:ATOM   1340  C   ALA   169      42.771  20.000  50.788  1.00  0.00           C
35_1svd_1M14_A.pdb:ATOM   1651  CA  THR   207      11.747  20.000  57.741  1.00  0.00           C
40_1svd_1M14_A.pdb:ATOM    245  O   PRO    31      29.420 -20.000  61.352  1.00  0.00           O
45_1svd_1M14_A.pdb:ATOM   1125  O   LEU   142      20.000   4.175  60.041  1.00  0.00           O
5_1svd_1M14_A.pdb:ATOM    694  C   LEU    90      20.000  -4.129  55.154  1.00  0.00           C
54_1svd_1M14_A.pdb:ATOM   1927  C   MET   243      20.000  26.687  68.258  1.00  0.00           C
55_1svd_1M14_A.pdb:ATOM   1698  CB  TYR   213      20.000  25.617  49.568  1.00  0.00           C
56_1svd_1M14_A.pdb:ATOM   1604  N   VAL   202      20.000  17.469  61.130  1.00  0.00           N
57_1svd_1M14_A.pdb:ATOM    281  CA  VAL    36      22.390 -20.000  61.070  1.00  0.00           C
59_1svd_1M14_A.pdb:ATOM   1094  O   ALA   138      20.000   9.640  56.193  1.00  0.00           O
63_1svd_1M14_A.pdb:ATOM   1680  C   LYS   211      16.675  20.000  50.749  1.00  0.00           C
81_1svd_1M14_A.pdb:ATOM   2080  O   GLU   261      25.065  20.000  73.731  1.00  0.00           O
8_1svd_1M14_A.pdb:ATOM    918  CA  VAL   117      20.000  10.717  69.046  1.00  0.00           C
94_1svd_1M14_A.pdb:ATOM   1932  CE  MET   243      20.000  32.105  66.505  1.00  0.00           C
96_1svd_1M14_A.pdb:ATOM   2129  CZ  PHE   266      20.000  20.517  68.683  1.00  0.00           C
98_1svd_1M14_A.pdb:ATOM    900  O   TRP   115      20.000  10.810  68.305  1.00  0.00           O

Write PDB files

A function for writing PDB can be useful for passing structural information to external programs; useful for contact determination, h-bonds prediction, visualization and others.

Roadmap to MIToS v1.0

  • Scripts
    • Write documentation
    • Use -p in scripts
    • Move general parts to Utils module
    • Use STDIN & STDOUT (pipes)
    • Use real flags
  • MSA
    • Add keepinserts option to read/parse
    • Test mean PID
  • PDB
    • distance and contact taking vectors

Optional

  • cMI
  • Pfam Doc
    • pMI

MSA shown representation

It looks like how the MSAs are shown as changed with Julia 1.8:

julia> msa
AnnotatedMultipleSequenceAlignment with 2 annotations : Residue[V L … - -; P L … - -; … ; V L … S -; P I … - -]

Make hobohmI() faster

A lot of time of hobohmI looks to be spent on percentageidentity. Maybe making percentageidentity faster can help, but:

  • Check lengths in percentageidentity is not a problem, takes the same.
  • @inbounds doesn't help in percentageidentity.

Maybe using msa.msa' can help in some way for getting the sequences faster...

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Deal with None in SIFTS

Example:

shell> zgrep "None" /home/diego/SIFTS/3cx5.sifts.xml.gz | head -3
          <crossRefDb dbSource="NCBI" dbCoordSys="UniProt" dbAccessionId="10090" dbResNum="None" dbResName="None"/>
          <crossRefDb dbSource="InterPro" dbCoordSys="UniProt" dbAccessionId="IPR013783" dbResNum="None" dbResName="None" dbEvidence="G3DSA:2.60.40.10"/>
          <crossRefDb dbSource="InterPro" dbCoordSys="UniProt" dbAccessionId="IPR007110" dbResNum="None" dbResName="None" dbEvidence="PS50835"/>

This causes:

ArgumentError("invalid base 10 digit 'N' in \"None\"")

Using a limit distance for speed contact calculation between residues

Calculate CA distance between two residues A and B and contrast against the maximum distance between CA for a pair of residues (A, B) for a limit contact distance

maximum distances ~ len(A) + len(B) + limit + 2 * van der Waals radius + error

Using FAUJ880104 from AAindex for getting len(A) and len(B). len is not taking into account post-translational modifications.

Carbon van der Waals radius is 1.7

Error should be large enough for not missing real contacts, but low enough for speed the calculation avoiding the calculation of not useful distances.

404 error from function 'getpdbdescription'

I am getting a 404 error from function 'getpdbdescription'.

`
HTTP/1.1 404 Not Found
X-Powered-By: Express
Strict-Transport-Security: max-age=31536000000; includeSubDomains
Cache-Control: private, no-cache, max-age=0, must-revalidate
Content-Type: text/html; charset=utf-8
Content-Length: 39674
ETag: W/"9afa-t3cWyFWTs7aO0Iy60YWTWEN5EGo"
Vary: Accept-Encoding
Date: Thu, 31 Dec 2020 00:07:16 GMT
X-Backend-Server: 10.100.50.12

""")
in top-level scope at Scratch1.jl:302
in getpdbdescription at MIToS\IWuWP\src\PDB\PDBMLParser.jl:269
in #getpdbdescription#36 at MIToS\IWuWP\src\PDB\PDBMLParser.jl:269
in downloadpdbheader at MIToS\IWuWP\src\PDB\PDBMLParser.jl:244
in #downloadpdbheader#35 at MIToS\IWuWP\src\PDB\PDBMLParser.jl:248
in download_file at MIToS\IWuWP\src\Utils\Read.jl:28
in #download_file#3 at MIToS\IWuWP\src\Utils\Read.jl:28
in open at HTTP\IAI92\src\HTTP.jl:348
in #open#6 at HTTP\IAI92\src\HTTP.jl:348
in request##kw at HTTP\IAI92\src\HTTP.jl:314
in #request#4 at HTTP\IAI92\src\HTTP.jl:314
in at HTTP\IAI92\src\RedirectRequest.jl:21
in #request#1 at HTTP\IAI92\src\RedirectRequest.jl:24
in at HTTP\IAI92\src\BasicAuthRequest.jl:21
in #request#1 at HTTP\IAI92\src\BasicAuthRequest.jl:28
in request##kw at HTTP\IAI92\src\MessageRequest.jl:28
in #request#1 at HTTP\IAI92\src\MessageRequest.jl:51
in at HTTP\IAI92\src\RetryRequest.jl:31
in #request#1 at HTTP\IAI92\src\RetryRequest.jl:44
in 56#58##kw at base\error.jl:284
in at base\error.jl:288
in at HTTP\IAI92\src\ExceptionRequest.jl:19
in #request#1 at HTTP\IAI92\src\ExceptionRequest.jl:22`

Make use of PkgBenchmark

We've started to port MIToS benchmarks to this repository (in the benchmark folder) to use PkgBenchmark.

Actually, old bechmarks for MIToS are in:

We should update those repositories to newer MIToS and Julia versions. Also, we need to include some of those MIToS' bechmarks into the benchmark suite of this package. That is going to allow us to use PkgBenchmark to test improvements or regressions on speed.

Benchmarks for other packages/languages must not be here. We only need to integrate small MIToS' benchmarks in this package. Using data in the test folder would be useful.

`count!` performance

count!, when GAPs aren't used in the ResidueContingencyTables, tests if each residue is a gap in order to avoid indexing outside the bonds. This is expensive and prevent us to use @simd. ResidueContingencyTables should have always 22x22 values, even if only 20 will be used.

Distances.jl: Incremental RAM usage on file loop

Since all the structure is loaded and compared against itself, the scripts use a lot of RAM. But when is used with a list of files, some data is not garbage collected in each iteration (using gc() and empty!() doesn't work). The script is killed by the system because of this (after ~1000 iterations on a PC with 60 Gb of RAM). This is not a problem for small list or for using with -f.

Increase test coverage to 95%

Actual test coverage is ~89%. That's great! However, would be great to increase that number to make this package more reliable and avoid future regressions ;)

It's easy to see what lines are missing tests using codecov and coveralls.

We need to find a way to test MIToS.Utils.Scripts and plot recipes. Also we need to test more Base.show and Base.print (example to test show).

SIFTS download error

Here is the code first:
sfile = downloadsifts("2VB1")

I am getting this error:

ERROR: ArgumentError: missing or unsupported scheme in URL (expected http(s) or ws(s)): ftp://ftp.ebi.ac.uk/pub/databases/msd/sifts/split_xml/vb/2vb1.xml.gz

This happens with basically any of the stuff that requires downloading from ftp. Also I can't find their XML files, it's weird. Only JSON.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.