
unipept-cli's Introduction

Unipept

The Unipept web application supports biodiversity and functional analysis of large and complex metaproteome samples and the analysis of peptidomes.

The 4.0 release of Unipept brings functional analysis to the tool.

An API and command line tool are available for integration in other programs.

Contributing

Found a bug or have an idea for an awesome new feature? File an issue using the GitHub issue tracker or drop us a line at [email protected].

If you're willing to get your hands dirty, you might of course also send us a pull request!

Installation

This application is deployed and fully functional at unipept.ugent.be. If for some reason you wish to run your own instance, you can do so by deploying this Rails application and setting up a database. This isn't straightforward and you'll probably want some help, so contact us at [email protected] before you attempt an installation.

Check our wiki pages for a variety of installation guides.

Who made this app?

Unipept is a research project of the computational biology group at Ghent University. If you use this application, please cite:

Current team:

  • Bart Mesuere (@bmesuere): Postdoc and lead developer
  • Pieter Verschaffelt (@pverscha): PhD student
  • Tibo Vande Moortele (@tibvdm): PhD student
  • Peter Dawyndt (@pdawyndt): Group leader and PhD supervisor

Other contributions from:

  • Felix Van der Jeugt (@ninewise): Master's student 2014 - 2016 and PhD student 2016 - 2022
  • Robbert Gurdeep Singh (@beardhatcode): Developer 2017-2018
  • Tom Naessens (@silox): Master's student 2014-2015
  • Toon Willems (@nudded): Master's student 2013-2014
  • Ewan Higgs (@ehiggs): Ghent University HPC team
  • Peter Vandamme: PhD co-supervisor of Bart
  • Bart Devreese: PhD co-supervisor of Bart

For code contributions, the contributors graph is the place to be.

unipept-cli's Issues

Add more output options to the uniprot command

The Uniprot website offers a list of export formats. These could be added as output formats to the uniprot command:

Original issue by @bmesuere on Sat Aug 16 2014 at 14:56.
Closed by an unknown user on Wed Apr 22 2015 at 15:42.

improve help output of uniprot command

The first sentence should state the purpose of the command: fetch uniprot records based on accession numbers. The current description is way too fuzzy:

NAME
    uniprot - Command line interface to Uniprot web services.

USAGE
    uniprot [options]

DESCRIPTION
    The uniprot command is a command line wrapper around the Uniprot web
    services. The command expects a list of Uniprot Accession Numbers that
    are passed

    - as separate command line arguments

    - to standard input

    The command will give priority to the first way Uniprot Accession Numbers
    are passed, in the order as listed above. The standard input should have
    one Uniprot Accession Number per line.

    The uniprot command yields just the protein sequences as a default, but
    can return several formats.

Original issue by @bmesuere on Mon Jun 29 2015 at 10:17.
Closed by @bmesuere on Tue Jul 14 2015 at 14:15.

Building the CLI in WSL2 (Ubuntu 2204.2.33.0)

I get the following error when trying to build the CLI in WSL2:

$ sudo gem install unipept
Building native extensions. This could take a while...
ERROR:  Error installing unipept:
        ERROR: Failed to build gem native extension.

    current directory: /var/lib/gems/3.0.0/gems/ffi-1.15.5/ext/ffi_c
/usr/bin/ruby3.0 -I /usr/lib/ruby/vendor_ruby -r ./siteconf20230831-9902-z89lod.rb extconf.rb
checking for ffi.h... *** extconf.rb failed ***
Could not create Makefile due to some reason, probably lack of necessary
libraries and/or headers.  Check the mkmf.log file for more details.  You may
need configuration options.

Provided configuration options:
        --with-opt-dir
        --without-opt-dir
        --with-opt-include
        --without-opt-include=${opt-dir}/include
        --with-opt-lib
        --without-opt-lib=${opt-dir}/lib
        --with-make-prog
        --without-make-prog
        --srcdir=.
        --curdir
        --ruby=/usr/bin/$(RUBY_BASE_NAME)3.0
        --with-ffi_c-dir
        --without-ffi_c-dir
        --with-ffi_c-include
        --without-ffi_c-include=${ffi_c-dir}/include
        --with-ffi_c-lib
        --without-ffi_c-lib=${ffi_c-dir}/lib
        --enable-system-libffi
        --disable-system-libffi
        --with-libffi-config
        --without-libffi-config
        --with-pkg-config
        --without-pkg-config
        --with-ffi-dir
        --without-ffi-dir
        --with-ffi-include
        --without-ffi-include=${ffi-dir}/include
        --with-ffi-lib
        --without-ffi-lib=${ffi-dir}/lib
/usr/lib/ruby/3.0.0/mkmf.rb:471:in `try_do': The compiler failed to generate an executable file. (RuntimeError)
You have to install development tools first.
        from /usr/lib/ruby/3.0.0/mkmf.rb:613:in `try_cpp'
        from /usr/lib/ruby/3.0.0/mkmf.rb:1124:in `block in have_header'
        from /usr/lib/ruby/3.0.0/mkmf.rb:971:in `block in checking_for'
        from /usr/lib/ruby/3.0.0/mkmf.rb:361:in `block (2 levels) in postpone'
        from /usr/lib/ruby/3.0.0/mkmf.rb:331:in `open'
        from /usr/lib/ruby/3.0.0/mkmf.rb:361:in `block in postpone'
        from /usr/lib/ruby/3.0.0/mkmf.rb:331:in `open'
        from /usr/lib/ruby/3.0.0/mkmf.rb:357:in `postpone'
        from /usr/lib/ruby/3.0.0/mkmf.rb:970:in `checking_for'
        from /usr/lib/ruby/3.0.0/mkmf.rb:1123:in `have_header'
        from extconf.rb:10:in `system_libffi_usable?'
        from extconf.rb:42:in `<main>'

To see why this extension failed to compile, please check the mkmf.log which can be found here:

  /var/lib/gems/3.0.0/extensions/x86_64-linux/3.0.0/ffi-1.15.5/mkmf.log

extconf failed, exit code 1

Gem files will remain installed in /var/lib/gems/3.0.0/gems/ffi-1.15.5 for inspection.
Results logged to /var/lib/gems/3.0.0/extensions/x86_64-linux/3.0.0/ffi-1.15.5/gem_make.out

where the log file contains the following:

$ cat  /var/lib/gems/3.0.0/extensions/x86_64-linux/3.0.0/ffi-1.15.5/mkmf.log
package configuration for libffi is not found
"x86_64-linux-gnu-gcc -o conftest -I/usr/include/x86_64-linux-gnu/ruby-3.0.0 -I/usr/include/ruby-3.0.0/ruby/backward -I/usr/include/ruby-3.0.0 -I. -Wdate-time -D_FORTIFY_SOURCE=2   -g -O2 -ffile-prefix-map=/build/ruby3.0-ohOwi0/ruby3.0-3.0.2=. -fstack-protector-strong -Wformat -Werror=format-security -fPIC conftest.c  -L. -L/usr/lib/x86_64-linux-gnu -L. -Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,-z,now -fstack-protector-strong -rdynamic -Wl,-export-dynamic     -lruby-3.0  -lm   -lc"
checked program was:
/* begin */
1: #include "ruby.h"
2:
3: int main(int argc, char **argv)
4: {
5:   return !!argv[argc];
6: }
/* end */

I have the following versions of ruby and gem installed:

$ ruby -v
ruby 3.0.2p107 (2021-07-07 revision 0db68f0233) [x86_64-linux-gnu]

and

$ gem -v
3.3.5

and I have the latest ruby-dev installed as well (1:3.0~exp1).

Any ideas? Has anyone installed unipept in Ubuntu in WSL2 before?

memory leak

There seems to be a memory leak:
When running the unipept pept2lca command on a 3.5GB fasta file, after 2 hours, ruby used 13GB of memory.

Original issue by @bmesuere on Fri Mar 27 2015 at 13:12.
Closed by @silox on Tue Apr 21 2015 at 13:52.

Ruby - unknown keyword: :permitted_classes (ArgumentError)

Hi,

I've tried to run the Unipept CLI on a CentOS server.
The Ruby and Unipept installation worked well, unipept --version and --help returned the expected output, and a typical
"unipept pept2lca AALTER" also worked... at first.

I then tried to pass a list of peptides from a file as input and write the results to an output file, and I received this error:
Traceback (most recent call last):
13: from /home/users/bkunath/.gem/ruby/2.7.0/bin/unipept:23:in main
12: from /home/users/bkunath/.gem/ruby/2.7.0/bin/unipept:23:in load
11: from /mnt/irisgpfs/users/bkunath/.gem/ruby/2.7.0/gems/unipept-3.0.0/bin/unipept:8:in <top (required)>
10: from /mnt/irisgpfs/users/bkunath/.gem/ruby/2.7.0/gems/unipept-3.0.0/lib/commands/unipept.rb:373:in run
9: from /mnt/irisgpfs/users/bkunath/.gem/ruby/2.7.0/gems/unipept-3.0.0/lib/commands/unipept.rb:43:in run
8: from /mnt/irisgpfs/users/bkunath/.gem/ruby/2.7.0/gems/cri-2.15.11/lib/cri/command.rb:316:in run
7: from /mnt/irisgpfs/users/bkunath/.gem/ruby/2.7.0/gems/cri-2.15.11/lib/cri/command.rb:298:in run
6: from /mnt/irisgpfs/users/bkunath/.gem/ruby/2.7.0/gems/cri-2.15.11/lib/cri/command.rb:362:in run_this
5: from /mnt/irisgpfs/users/bkunath/.gem/ruby/2.7.0/gems/cri-2.15.11/lib/cri/command_dsl.rb:294:in block in runner
4: from /mnt/irisgpfs/users/bkunath/.gem/ruby/2.7.0/gems/cri-2.15.11/lib/cri/command_dsl.rb:294:in new
3: from /mnt/irisgpfs/users/bkunath/.gem/ruby/2.7.0/gems/unipept-3.0.0/lib/commands/unipept/api_runner.rb:9:in initialize
2: from /mnt/irisgpfs/users/bkunath/.gem/ruby/2.7.0/gems/unipept-3.0.0/lib/commands/unipept/api_runner.rb:9:in new
1: from /mnt/irisgpfs/users/bkunath/.gem/ruby/2.7.0/gems/unipept-3.0.0/lib/configuration.rb:17:in initialize
/mnt/irisgpfs/apps/resif/aion/2020b/epyc/software/Ruby/2.7.2-GCCcore-10.2.0/lib/ruby/2.7.0/psych.rb:576:in load_file: unknown keyword: :permitted_classes (ArgumentError)

And from now on, even running the command "unipept pept2lca AALTER" returns the same issue.
Do you guys have any idea why I have that? I guess it's due to ruby but I don't know enough about it to try to fix it.

Thanks a lot for any support you may bring.

Ben

Export as visualization

We are currently building a new feature that allows users to construct a lineage tree from a list of NCBI taxonomy IDs. This tree can be exported to an HTML file that contains the Unipept visualizations. Some other commands provided by the CLI could also benefit from this.

(requested by @tivdnbos)

Extra whitespace after go terms

When running pept2go, extra whitespace is included after the list of go terms:

AAAINTIAHSTGAAK,46,GO:0051287 GO:0050661 GO:0006006 GO:0016620 GO:0004365   ,46 46 46 27 20

unipept command does not properly combine option letters

The Unipept CLI does not properly combine options if the last option has a required argument. The following two command lines are expected to give the same outcome.

$ unipept taxonomy -a -s phylum_name 35493
phylum_name
Streptophyta
$ unipept taxonomy -as phylum_name 35493
taxonomy: option requires an argument -- s

Original issue by @pdawyndt on Thu Apr 23 2015 at 15:32.

pept2prot -a no output

The following peptides (used together) produce no output when pept2prot is run with the -a option, whereas it works without the -a option.

>M00270:109:000000000-A8DMU:1:1101:18871
NLDAK
DLPLLLLITVIAAWQLR

Original issue by calmalak on Tue Dec 08 2015 at 13:56.
Closed by @bmesuere on Tue Dec 08 2015 at 14:41.

Real FASTA output

(moved from https://github.ugent.be/unipept/unipept-metagenomics-scripts/issues/17)

At the moment we have FASTA in and CSV out (with a FASTA header column), whereas one might expect FASTA output.

Now:

[10:17] silox`@Gallifrey` ~/Documents/Projects ✔ unipept pept2prot -i test.fst 
fasta_header,peptide,uniprot_id,taxon_id
>a|1,IITHPNFNGNTLDNDIMLIK,P00761,9823
>a|1,IITHPNFNGNTLDNDIMLIK,F1SRS2,9823
>a|1,IITHPNFNGNTLDNDIMLIK,C5IWV5,9823
>b|2,IITHPNFNGNTLDNDIMLIK,P00761,9823
>b|2,IITHPNFNGNTLDNDIMLIK,F1SRS2,9823
>b|2,IITHPNFNGNTLDNDIMLIK,C5IWV5,9823

Later:

[10:17] silox`@Gallifrey` ~/Documents/Projects ✔ unipept pept2prot -i test.fst 
fasta_header,peptide,uniprot_id,taxon_id
>a|1
IITHPNFNGNTLDNDIMLIK,P00761,9823
IITHPNFNGNTLDNDIMLIK,F1SRS2,9823
IITHPNFNGNTLDNDIMLIK,C5IWV5,9823
>b|2
IITHPNFNGNTLDNDIMLIK,P00761,9823
IITHPNFNGNTLDNDIMLIK,F1SRS2,9823
IITHPNFNGNTLDNDIMLIK,C5IWV5,9823

Original issue by @ninewise on Fri Jun 15 2018 at 17:21.
Closed by @ninewise on Sat Jun 16 2018 at 23:07.

1.0 checklist

List of things preventing a public 1.0 release:
bugs

  • bmesuere/unipept#305: show help when running without (valid) options
  • bmesuere/unipept#335: disable service message

features

  • dedicated machine for api.unipept.ugent.be
  • fix readme and license
  • bmesuere/unipept#229: installation documentation
  • bmesuere/unipept#257: rewrite CLI help (#6)
  • bmesuere/unipept#261: check if the fasta parsing is correct
  • add documentation to the unipept home page
  • bmesuere/unipept#343: add tests
  • #23: retry requests
  • #15: update batch sizes based on testing
  • #41: add the fasta headers to the xml and json output

All old remaining issues were moved from unipept/unipept. The old ones can be found under https://github.ugent.be/bmesuere/unipept/issues?milestone=18&state=closed

Some external people are already using the gem, so from now on a changelog should be kept.

Original issue by @bmesuere on Fri Aug 15 2014 at 14:51.
Closed by @bmesuere on Tue Jul 14 2015 at 14:17.

Output does not seem to be grouped correctly

When a fasta file containing peptides is run through one of the tools, the output should be grouped, but it does not seem to be. For example, if I have a file containing

>|ABO11181.1
AEFSAQK
ASDIAVFETTSLQDYYFVIDAIR
DSEFCTVPYFEFVEIIPAIEDGYVEYES
LEPIFEK
MALAR
WYDVEAFSTK
YPSVK
>|ABO11182.2
ADVFIV
AEQYDLLDQLILNWAK
AFGTDDSVSFK
AFTDLSDLVNYHPDILTNWLPEGK
DFNFIFADK
DGNDINYSATK
DGSQASNGYAALAEFDSNGDGK
DIYIFQSGHGQDIINDK
EAAALSPELAETLK
EANEVR
EFYSWFPDDWNPWK
EGIYTGLLYQTR
ENLNQFVSFEK
ESQAMLDK
FEGANFTDAQFLR
GIDQQIAQLSLISTGLGFTIEFAK
GPQIADALANALSEIQK
IHHILPILDAFSGSKPSQIYYENEADIK
ILAGFGNDATGGNGVDTWSDFNISQGDK
ILEQSGFIIGTK
INISELIIGPASK
LDDTLNGTK
LDDYLAEIELNLQATDPTELFNYSGIQEK
LEQIFIINPEK
LLILNNQTNVNSLNDLIK
LNGGAGQDTLIGGSGNAVMTGGDYEK
MGSSLIIQANELYSTVNDLNSAISQGDGK
MQHDGTFPELDPGLMAGFAGEVLEGWSK
MVSWVK
NGDGVINDGSELFGDSVTLK
NNDTLNGGWGNDK
NSDGVK
NVFNIINNAYVSLK
NYLNDNDSYSR
NYSNAETK
SGSTVTLSLDR
SPEAVAGLTGLK
SYDWTNLQYFNDVK
TANEGIALTPGQAAIVTLAAPLSK
TDPDFK
TLLIQFLNEVIEQGLWDDLASK
TLTTADVMNIVIPLNGTDGNDVQNGWK
TSTGWVGSDDGILVLDR
VGNDLVIK
VNAEDTNFEQLK
WAGVLFDHDNDGIR
YHIYDPVVLDLDGDGIETIAANK
>|ABO11183.2
AEFSAQK
ALPAWLALAR
ASDIAVFETTSLQDYYFVIDAIR
DSEFCTVPYFEFVEIIPAIEDGYVEYESSL
LEPIFEK
TYTFFIHLR
WYDVEAFSTK
YPTVK
>|ABO11183.2,AEFSAQK,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11183.2,ASDIAVFETTSLQDYYFVIDAIR,469,Acinetobacter,genus
>|ABO11181.1,DSEFCTVPYFEFVEIIPAIEDGYVEYES,470,Acinetobacter baumannii,species
>|ABO11183.2,LEPIFEK,2,Bacteria,superkingdom
>|ABO11181.1,MALAR,1,root,no rank
>|ABO11183.2,WYDVEAFSTK,469,Acinetobacter,genus
>|ABO11181.1,YPSVK,1,root,no rank
>|ABO11182.2,ADVFIV,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11182.2,AEQYDLLDQLILNWAK,470,Acinetobacter baumannii,species
>|ABO11182.2,AFGTDDSVSFK,470,Acinetobacter baumannii,species
>|ABO11182.2,AFTDLSDLVNYHPDILTNWLPEGK,470,Acinetobacter baumannii,species
>|ABO11182.2,DFNFIFADK,470,Acinetobacter baumannii,species
>|ABO11182.2,DGNDINYSATK,470,Acinetobacter baumannii,species
>|ABO11182.2,DGSQASNGYAALAEFDSNGDGK,470,Acinetobacter baumannii,species
>|ABO11182.2,DIYIFQSGHGQDIINDK,470,Acinetobacter baumannii,species
>|ABO11182.2,EAAALSPELAETLK,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11182.2,EANEVR,1,root,no rank
>|ABO11182.2,EFYSWFPDDWNPWK,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11182.2,EGIYTGLLYQTR,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11182.2,ENLNQFVSFEK,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11182.2,ESQAMLDK,470,Acinetobacter baumannii,species
>|ABO11182.2,FEGANFTDAQFLR,470,Acinetobacter baumannii,species
>|ABO11182.2,GIDQQIAQLSLISTGLGFTIEFAK,470,Acinetobacter baumannii,species
>|ABO11182.2,GPQIADALANALSEIQK,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11182.2,IHHILPILDAFSGSKPSQIYYENEADIK,470,Acinetobacter baumannii,species
>|ABO11182.2,ILAGFGNDATGGNGVDTWSDFNISQGDK,470,Acinetobacter baumannii,species
>|ABO11182.2,ILEQSGFIIGTK,470,Acinetobacter baumannii,species
>|ABO11182.2,INISELIIGPASK,470,Acinetobacter baumannii,species
>|ABO11182.2,LDDTLNGTK,470,Acinetobacter baumannii,species
>|ABO11182.2,LDDYLAEIELNLQATDPTELFNYSGIQEK,470,Acinetobacter baumannii,species
>|ABO11182.2,LEQIFIINPEK,470,Acinetobacter baumannii,species
>|ABO11182.2,LLILNNQTNVNSLNDLIK,470,Acinetobacter baumannii,species
>|ABO11182.2,LNGGAGQDTLIGGSGNAVMTGGDYEK,470,Acinetobacter baumannii,species
>|ABO11182.2,MGSSLIIQANELYSTVNDLNSAISQGDGK,470,Acinetobacter baumannii,species
>|ABO11182.2,MQHDGTFPELDPGLMAGFAGEVLEGWSK,470,Acinetobacter baumannii,species
>|ABO11182.2,MVSWVK,1,root,no rank
>|ABO11182.2,NGDGVINDGSELFGDSVTLK,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11182.2,NNDTLNGGWGNDK,470,Acinetobacter baumannii,species
>|ABO11182.2,NSDGVK,1,root,no rank
>|ABO11182.2,NVFNIINNAYVSLK,470,Acinetobacter baumannii,species
>|ABO11182.2,NYLNDNDSYSR,470,Acinetobacter baumannii,species
>|ABO11182.2,NYSNAETK,470,Acinetobacter baumannii,species
>|ABO11182.2,SGSTVTLSLDR,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11182.2,SPEAVAGLTGLK,470,Acinetobacter baumannii,species
>|ABO11182.2,SYDWTNLQYFNDVK,470,Acinetobacter baumannii,species
>|ABO11182.2,TANEGIALTPGQAAIVTLAAPLSK,470,Acinetobacter baumannii,species
>|ABO11182.2,TDPDFK,1,root,no rank
>|ABO11182.2,TLLIQFLNEVIEQGLWDDLASK,470,Acinetobacter baumannii,species
>|ABO11182.2,TLTTADVMNIVIPLNGTDGNDVQNGWK,470,Acinetobacter baumannii,species
>|ABO11182.2,TSTGWVGSDDGILVLDR,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11182.2,VGNDLVIK,469,Acinetobacter,genus
>|ABO11182.2,VNAEDTNFEQLK,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11182.2,WAGVLFDHDNDGIR,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11182.2,YHIYDPVVLDLDGDGIETIAANK,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11183.2,AEFSAQK,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11183.2,ALPAWLALAR,469,Acinetobacter,genus
>|ABO11183.2,ASDIAVFETTSLQDYYFVIDAIR,469,Acinetobacter,genus
>|ABO11183.2,DSEFCTVPYFEFVEIIPAIEDGYVEYESSL,909768,Acinetobacter calcoaceticus/baumannii complex,species group
>|ABO11183.2,LEPIFEK,2,Bacteria,superkingdom
>|ABO11183.2,TYTFFIHLR,469,Acinetobacter,genus
>|ABO11183.2,WYDVEAFSTK,469,Acinetobacter,genus
>|ABO11183.2,YPTVK,1,root,no rank

Original issue by @silox on Sun Mar 29 2015 at 21:12.
Closed by @silox on Sun Mar 29 2015 at 21:40.

improve error message for unipept taxa2lca with empty input

Provide a better error message if no input is passed to the unipept taxa2lca command.

$ unipept taxa2lca
received a non-successful http response 500, continuing anyway, but results might be incomplete
API request failed! log can be found in /users/p/pdawyndt/.unipept/unipept-2015-07-10-16:30:07.log

Original issue by @pdawyndt on Fri Jul 10 2015 at 16:32.
Closed by @bmesuere on Tue Jul 14 2015 at 14:12.

unipept pept2lca --select bug with fasta headers

When I tried running the command cat test | unipept pept2lca --select taxon_id I got fasta_header,taxon_id as output.

Contents of the test file:

>MG00HS19:771:HKNTHBCXX:2:1101:8323:3895_1_317_+
VGFLVQGTLYPDVIESASPK
GGPSVTIK
THHNVGGLPEQLPFK
LIEPLR
ELGLPEEMVGR
HPFPGPGLAIR
ILGEVTQDQLDILR
ADHIYTE
>MG00HS19:771:HKNTHBCXX:2:1101:8643:3889_1_129_-
EQDGGTSR
ALAESEGTTK
QAQQEAAR
IAFER
LLLEESQEGS

This seems to only happen when I add fasta headers to the file.

investigate the performance of prot2pept

Splitting peptides is slow. If I recall correctly, we only achieved about 2 MB/s. A simple reimplementation in sed was equally slow. It's worth experimenting with faster implementations because this is becoming a bottleneck.

One option might be to stop using regular expressions and just iterate over all characters of the string to determine the split sites. The current implementation can be used as a fallback when users supply their own pattern.
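A minimal sketch of the character-based approach, assuming the usual trypsin rule (cleave after K or R unless the next residue is P). The actual default pattern of prot2pept is not shown in this issue, and the helper name tryptic_split is made up for illustration.

```ruby
# Hypothetical helper: split a protein sequence into tryptic peptides
# without regular expressions. Assumes the common trypsin rule:
# cleave after K or R unless the next residue is P.
def tryptic_split(sequence)
  peptides = []
  start = 0
  last = sequence.length - 1
  sequence.each_char.with_index do |residue, i|
    next unless residue == 'K' || residue == 'R'
    next if i == last               # the tail is emitted after the loop
    next if sequence[i + 1] == 'P'  # proline blocks cleavage
    peptides << sequence[start..i]
    start = i + 1
  end
  peptides << sequence[start..-1] if start < sequence.length
  peptides
end

puts tryptic_split('AALTERKPGDKMVSWVK')
# AALTER
# KPGDK
# MVSWVK
```

Whether this is actually faster than the regex version would have to be measured; the sketch only shows that the split sites can be found in a single pass.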

Original issue by @bmesuere on Sun Dec 20 2015 at 15:12.

Results with more than one peptide per fasta header aren't assigned a header

For example:

>a|1
AALTER
AAALTER
>b|2
AALTER
>c|3
AENSGVDLPR

results in

fasta_header,peptide,taxon_id,taxon_name,taxon_rank
>a|1,AALTER,1,root,no rank
>a|1,AAALTER,1,root,no rank
>b|2,AALTER,1,root,no rank
>c|3,AENSGVDLPR,469,Acinetobacter,genus

for pept2lca, but for pept2prot which returns multiple results per header:

fasta_header,peptide,uniprot_id,taxon_id
>a|1,AALTER,Q9C7G1,3702
>b|2,AALTER,A6W6A1,266940
,AALTER,Q3ME15,240292
,AALTER,Q05858,9031
,AALTER,P78330,9606
,AALTER,Q8YWF0,103690
,AALTER,Q5RB83,9601
,AALTER,Z9JY27,396014
...
>a|1,AAALTER,Q5ZI11,9031
,AAALTER,X0Q443,1219028
,AAALTER,U2FTN9,1033802
,AAALTER,Q0S341,101510
,AAALTER,H2K0Z4,1133850
,AAALTER,J2J5K2,745408
,AAALTER,N0AT69,1249661
,AAALTER,W9ANG4,258533
...
,AALTER,Q9C7G1,3702
,AALTER,A6W6A1,266940
,AALTER,Q3ME15,240292
,AALTER,Q05858,9031
,AALTER,P78330,9606
,AALTER,Q8YWF0,103690
,AALTER,Q5RB83,9601
,AALTER,Z9JY27,396014
...
>c|3,AENSGVDLPR,B0VAF3,509173
,AENSGVDLPR,A3M0Q4,400667
,AENSGVDLPR,B2HZA7,405416
,AENSGVDLPR,B7GUX5,557600
,AENSGVDLPR,Q6FG21,62977
,AENSGVDLPR,B0VMK0,509170
,AENSGVDLPR,B7IBH7,480119
,AENSGVDLPR,N8VQM3,1144664
,AENSGVDLPR,N9MRP1,1217694
,AENSGVDLPR,A0A015C9Y8,1311004
...
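One way to make sure every result row carries a header is to keep a list of headers per peptide and emit one row per (header, result) pair. The sketch below assumes that approach; the helper name attach_headers and the row layout are invented for illustration and are not taken from the CLI source.

```ruby
# Hypothetical helper: attach every matching FASTA header to every result
# row. Input format and row layout are assumptions for illustration.
def attach_headers(fasta_lines, results)
  headers_for = Hash.new { |hash, peptide| hash[peptide] = [] }
  current_header = nil
  fasta_lines.each do |line|
    if line.start_with?('>')
      current_header = line
    else
      headers_for[line] << current_header
    end
  end

  # One output row per (header, result) pair, so no row is left headerless.
  results.flat_map do |row|
    headers_for[row['peptide']].map { |header| [header, *row.values] }
  end
end

fasta_lines = ['>a|1', 'AALTER', 'AAALTER', '>b|2', 'AALTER']
results = [
  { 'peptide' => 'AALTER', 'uniprot_id' => 'Q9C7G1', 'taxon_id' => 3702 },
  { 'peptide' => 'AALTER', 'uniprot_id' => 'A6W6A1', 'taxon_id' => 266940 },
  { 'peptide' => 'AAALTER', 'uniprot_id' => 'Q5ZI11', 'taxon_id' => 9031 }
]
attach_headers(fasta_lines, results).each { |row| puts row.join(',') }
```

Note that this sketch orders rows by result rather than by input position, and drops results whose peptide does not occur in the input.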

Original issue by @silox on Fri Apr 10 2015 at 13:45.
Closed by @silox on Tue Apr 21 2015 at 14:09.

Commands return too few results

For example:
unipept pept2go AALTER
returns:
peptide,total_protein_count,go_term,go_protein_count
AALTER,1425,GO:0003677 GO:0006351 GO:0005524 GO:0006355 GO:0016021 GO:0005622 GO:0004842 GO:0000160,268 189 151 148 136 110 108 88

This should return a lot more go terms. The same happens with the other functional annotations.

Should we run prot2pept automatically?

@tivdnbos suggested that a lot of potential users are confused by the Unipept CLI when they try to use it with non-tryptic peptides. They expect the input peptides to be cleaved automatically and the desired results to be returned for each of the resulting tryptic peptides. This issue serves as a discussion point for this behaviour and for whether we should update our CLI so that it automatically performs a prot2pept step when needed.

Unipept Taxonomy doesn't work with fastafiles

(Note: I ran this on an older version (before #12) than the master HEAD.)

When running a FASTA file through the unipept taxonomy command, no fasta headers are prepended.

For example:

test.fst

>a|1
1
2
3
1
>b|2
4
5
>c|3
1

Output:

fasta_header,taxon_id,taxon_name,taxon_rank
,1,root,no rank
,2,Bacteria,superkingdom

Original issue by @silox on Sat Apr 11 2015 at 10:48.
Closed by @silox on Tue Apr 21 2015 at 14:09.

Make tests independent of live database

Right now, the tests use the database that's live on the server, which means that the results produced by the CLI depend on the data in this database. Updating the database means that the tests will fail.

By filling a test database with dummy data during the tests (and running the tests on this database), we can alleviate this problem.

empty fields don't have commas in pept2go

When running pept2go (maybe others) and selecting the go term fields, the fields aren't output if there are no go terms. I would expect there to be 2 commas to indicate the empty fields. See the last line in this example:

peptide,total_protein_count,go_term,go_protein_count
AAAINTIAHSTGAAK,46,GO:0051287 GO:0050661 GO:0006006 GO:0016620 GO:0004365   ,46 46 46 27 20
AAAINTIPHSTGAAK,50,GO:0051287 GO:0050661 GO:0006006 GO:0016620 GO:0004365   ,50 50 50 28 22
AAALNIVPNSTGAAK,85,GO:0051287 GO:0016620 GO:0050661 GO:0006006 GO:0004365   ,83 83 83 83 1
AAAMSMIPTSTGAAK,460,GO:0051287 GO:0050661 GO:0006006 GO:0016620 GO:0004365 GO:0043891 GO:0047100 GO:0005737,402 402 402 372 65 16 1 1
AAANESFGYNEDEIVSSDIVGMR,56,GO:0051287 GO:0016620 GO:0050661 GO:0006006    ,56 56 56 56
AAANYLDIPLYR,123,GO:0000287 GO:0004634 GO:0000015 GO:0006096 GO:0005576 GO:0009986  ,123 123 123 123 118 118
AAAVNIVPNSTGAAK,383,GO:0016620 GO:0051287 GO:0050661 GO:0006006    ,381 375 375 375
AADAAAAIGEGLQAFCIPGSVADHR,0
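The expected behaviour could be sketched as follows: emit one column per selected field regardless of whether a value is present, and join list values with single spaces. The record layout and the csv_row helper are invented for illustration; they are not the CLI's actual formatter.

```ruby
# Hypothetical formatter: always emit one column per selected field so
# that empty values still leave their separating commas behind.
def csv_row(record, fields)
  fields.map do |field|
    value = record[field]
    # Lists are joined with single spaces; nil entries are dropped so no
    # trailing whitespace is produced.
    value = value.compact.join(' ') if value.is_a?(Array)
    value.to_s
  end.join(',')
end

fields = %w[peptide total_protein_count go_term go_protein_count]
with_terms = {
  'peptide' => 'AAAVNIVPNSTGAAK',
  'total_protein_count' => 383,
  'go_term' => ['GO:0016620', 'GO:0051287', nil, nil],
  'go_protein_count' => [381, 375, nil, nil]
}
without_terms = { 'peptide' => 'AADAAAAIGEGLQAFCIPGSVADHR', 'total_protein_count' => 0 }

puts csv_row(with_terms, fields)     # AAAVNIVPNSTGAAK,383,GO:0016620 GO:0051287,381 375
puts csv_row(without_terms, fields)  # AADAAAAIGEGLQAFCIPGSVADHR,0,,
```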

CLI sends fasta headers to the server when the fasta header is the last entry in a batch

Input gets split up into batches. While iterating over a batch from a fasta file, a hash (fasta_mapper) is computed by mapping each peptide to its fasta header (or headers in #12). Afterwards, the values of this hash are subtracted from the input, yielding only the peptides without fasta headers.

This however doesn't work in the case when the fasta header is the last entry in a batch. For example:

Batch:

>|ABO10505.2
LNFSAEDK
MLTTQNGTNYEVVGIVQIGLAYLFVR
MMEEWALAAK
NLNSIEK
QGVYDATMMSVLK
QLPQNFAMVK
SGESYK
SWQADITLIPFQDEALVDR
TNFECTLTGE
>|ABO10506.2

fasta_mapper:

{"LNFSAEDK"=>">|ABO10505.2", "MLTTQNGTNYEVVGIVQIGLAYLFVR"=>">|ABO10505.2", "MMEEWALAAK"=>">|ABO10505.2", "NLNSIEK"=>">|ABO10505.2", "QGVYDATMMSVLK"=>">|ABO10505.2", "QLPQNFAMVK"=>">|ABO10505.2", "SGESYK"=>">|ABO10505.2", "SWQADITLIPFQDEALVDR"=>">|ABO10505.2", "TNFECTLTGE"=>">|ABO10505.2"}

Batch after subtracting the values from the batch:

LNFSAEDK
MLTTQNGTNYEVVGIVQIGLAYLFVR
MMEEWALAAK
NLNSIEK
QGVYDATMMSVLK
QLPQNFAMVK
SGESYK
SWQADITLIPFQDEALVDR
TNFECTLTGE
>|ABO10506.2

The >|ABO10506.2 isn't subtracted as there is no peptide in the batch corresponding to this fasta header.
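The problem, and one possible way around it, can be sketched as follows. The split_batch helper is a hypothetical reconstruction based only on the description above, not the CLI's actual code; it classifies lines by their leading '>' instead of subtracting the hash values.

```ruby
# Hypothetical reconstruction of the batch handling described above.
# Returns the peptide-to-header mapping and the peptides to send.
def split_batch(batch)
  fasta_mapper = {}
  current_header = nil
  peptides = []
  batch.each do |line|
    if line.start_with?('>')
      current_header = line  # remember the header, never send it
    else
      fasta_mapper[line] = current_header if current_header
      peptides << line
    end
  end
  [fasta_mapper, peptides]
end

batch = ['>|ABO10505.2', 'LNFSAEDK', 'SGESYK', '>|ABO10506.2']
mapper, peptides = split_batch(batch)
naive = batch - mapper.values  # subtraction as described in the issue

p naive     # ["LNFSAEDK", "SGESYK", ">|ABO10506.2"]
p peptides  # ["LNFSAEDK", "SGESYK"]
```

A complete fix would also need to carry the trailing header over to the next batch, which this sketch leaves out.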

Original issue by @silox on Sun Apr 05 2015 at 12:14.
Closed by @silox on Wed Apr 08 2015 at 18:17.

taxonomy command shuffles the order of incoming taxa

To be compliant with the other commands in the Unipept CLI, the taxonomy command should preserve the order and frequency of the incoming taxa. It should also return an entry for those taxa that could not be retrieved from the Unipept database, to make the number of input lines equal the number of output lines.
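A minimal sketch of the requested behaviour, assuming the server returns each distinct taxon at most once and in arbitrary order. The align_with_input helper and the record layout are hypothetical.

```ruby
# Hypothetical post-processing step: re-expand server results so that the
# output has one line per input taxon id, in input order.
def align_with_input(input_ids, results)
  by_id = results.each_with_object({}) { |r, h| h[r['taxon_id']] = r }
  input_ids.map do |id|
    # Unknown ids still yield a row, with empty name and rank.
    by_id.fetch(id) { { 'taxon_id' => id, 'taxon_name' => nil, 'taxon_rank' => nil } }
  end
end

results = [
  { 'taxon_id' => 1, 'taxon_name' => 'root', 'taxon_rank' => 'no rank' },
  { 'taxon_id' => 2, 'taxon_name' => 'Bacteria', 'taxon_rank' => 'superkingdom' }
]
rows = align_with_input([2, 1, 2, 999999999], results)
rows.each { |r| puts r.values_at('taxon_id', 'taxon_name', 'taxon_rank').join(',') }
# 2,Bacteria,superkingdom
# 1,root,no rank
# 2,Bacteria,superkingdom
# 999999999,,
```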

Original issue by @pdawyndt on Fri May 08 2015 at 13:31.
Closed by @bmesuere on Tue May 19 2015 at 21:04.
