bioprov's Introduction

Hi, I'm Vini (he/him)

Microbial oceanography and research software engineering. Bridging the gap between metagenomics and oceanographic data.

I use Vinícius W. Salazar in formal documents and texts, and Vini Salazar for everything else.

If you'd like to contact me, please write to vinicius.salazar [at] unimelb.edu.au.

bioprov's Issues

Package requirements

Hello!

While reviewing your package, I have noticed that the requirements are not clearly stated anywhere (just several packages mentioned here and there). Please add such a section to your documentation and README file. You can just copy-paste the requirements from your setup.py file.

openjournals/joss-reviews#3622

genome_annotation workflow gives unexpected error

Usually when I run a shell command without arguments I expect it to show me all possible flags. Running bioprov kaiju does just that:

usage: kaiju [-h] -i INPUT [-o OUTPUT_DIRECTORY] -db KAIJU_DB -no NODES -na
             NAMES [--kaiju_params KAIJU_PARAMS]
             [--kaiju2table_params KAIJU2TABLE_PARAMS] [-t TAG] [-v]
             [-p THREADS]
kaiju: error: the following arguments are required: -i/--input, -db/--kaiju_db, -no/--nodes, -na/--names

However the same is not true when running bioprov genome_annotation:

Traceback (most recent call last):
  File "/home/jvfe/miniconda3/envs/bioprov/bin/bioprov", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "bioprov/bioprov", line 14, in <module>
    main()
  File "bioprov/bioprov.py", line 54, in main
    parser.parse_options(args)
  File "bioprov/workflows/wf_parser.py", line 66, in parse_options
    subparsers[options.subparser_name](options)
  File "bioprov/workflows/wf_parser.py", line 61, in <lambda>
    "genome_annotation": lambda _options: self._genome_annotation(_options),
  File "bioprov/workflows/wf_parser.py", line 33, in _genome_annotation
    main.run_steps(steps)
  File "bioprov/src/workflow.py", line 181, in run_steps
    self.generate_sampleset()
  File "bioprov/src/workflow.py", line 117, in generate_sampleset
    self.sampleset = _generate_sampleset[self.input_type]()
  File "bioprov/src/workflow.py", line 255, in _load_dataframe_input
    assert path.isfile(input_), Warnings()["not_exist"]
  File "/home/jvfe/miniconda3/envs/bioprov/lib/python3.7/genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Though bioprov genome_annotation --help does the job:

usage: genome_annotation [-h] [-i INPUT] [-c CPUS] [--verbose] [-t TAG]
                         [--steps STEPS]

Genome annotation with Prodigal, Prokka and the COG database.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input file, may be a tab delimited file or a
                        directory. If a file, must contain column 'sample-id'
                        for sample ID and 'assembly' for files. See program
                        help for information. (default: None)
  -c CPUS, --cpus CPUS  Default is set in BioProv config (half of the CPUs).
                        (default: 2)
  --verbose             More verbose output (default: False)
  -t TAG, --tag TAG     A tag for the dataset (default: None)
  --steps STEPS         A comma-delimited string of which steps will be run in
                        the workflow. Possible steps: ['prodigal'] (default:
                        ['prodigal'])

I don't know enough about argparse to think of a fix for this, though checking for len(sys.argv) might be a possible workaround.
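The len(sys.argv) workaround mentioned above could look roughly like the sketch below. It is only an illustration: build_parser and parse_or_show_help are hypothetical names, and the parser is a simplified stand-in that models only the -i/--input option, not BioProv's actual wf_parser code.

```python
import argparse
import sys


def build_parser():
    # Hypothetical stand-in for the genome_annotation subparser; only -i is modelled.
    parser = argparse.ArgumentParser(prog="genome_annotation")
    parser.add_argument("-i", "--input", help="Input file or directory.")
    return parser


def parse_or_show_help(argv):
    """Return parsed options, or print the help text and return None when argv is empty."""
    parser = build_parser()
    if len(argv) == 0:
        # No arguments at all: show usage instead of running with input=None.
        parser.print_help(sys.stderr)
        return None
    return parser.parse_args(argv)


if __name__ == "__main__":
    options = parse_or_show_help(sys.argv[1:])
    if options is None:
        sys.exit(1)
```

An alternative with the same effect would be to declare the input argument with required=True, which makes argparse itself report the missing option, as the kaiju subcommand already does.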

Any reason to use .format() instead of fstrings?

Hi,

I've noticed that most (if not all) string interpolation in your package uses the .format() style instead of f-strings. I don't think it's for backwards compatibility, since your package already requires Python>=3.6 and f-strings were introduced in 3.6, so is using .format() a matter of personal preference? I ask because, even though neither has a clear advantage over the other, I find f-strings more concise and easier to read.

If you plan on changing them, the output of grep -R "[\"|'].format(" . on the root directory would show all occurrences.
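For comparison, here is a minimal side-by-side of the two styles. The variable names and the message are made up for illustration and are not taken from the BioProv source.

```python
# Hypothetical values, used only to compare the two interpolation styles.
sample_id = "sample_01"
n_files = 3

# .format() style: placeholders are filled positionally.
old_style = "Sample {} has {} files.".format(sample_id, n_files)

# f-string style (Python >= 3.6): expressions are written inline.
new_style = f"Sample {sample_id} has {n_files} files."

# Both produce the same string.
assert old_style == new_style
```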

Environment.yml fails to build development environment

  • Bug:

When I try conda env create --file environment.yml I get several installations failing:

Solving environment: failed

ResolvePackageNotFound: 
  - openssl==1.1.1h=haf1e3a3_0
  - certifi==2020.6.20=py37h2987424_2
  - libcxx==11.0.0=h439d374_0
  - setuptools==49.6.0=py37h2987424_2
  - xz==5.2.5=haf1e3a3_1
  - tk==8.6.10=hb0a8c7a_1
  - prodigal==2.6.3=h01d97ff_2
  - python==3.7.8=hc9dea61_1_cpython
  - zlib==1.2.11=h7795811_1010
  - sqlite==3.33.0=h960bd1c_1
  - libffi==3.2.1=hb1e8313_1007
  - readline==8.0=h0678c8f_2
  - ncurses==6.2=hb1e8313_2
  • Specs:
    OS: Linux Debian 10 (Buster)
    conda version: 4.8.5

Apparently this is, unfortunately, a common issue with conda; here's an example.

I've found that removing the version hashes in the environment file (i.e. prodigal=2.6.3=h516909a_2 -> prodigal=2.6.3) fixes it and gets me an environment with the same software and versions. Though this is a very weird and not very practical solution, I can submit a pull request for it.
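The manual edit described above could be automated with something like the following sketch. The function name strip_build_strings and the regular expression are assumptions made for illustration; the pattern only targets dependency lines of the form "- name=version=build" or "- name==version=build" and leaves every other line untouched.

```python
import re

# Matches conda pins such as "- name=1.2.3=build" or "- name==1.2.3=build".
# Group 1 keeps everything up to (but not including) the build string.
_PIN = re.compile(r"^(\s*-\s*[^=\s]+={1,2}[^=\s]+)=[^=\s]+\s*$")


def strip_build_strings(text):
    """Return the environment file text with build strings removed from pinned packages."""
    lines = []
    for line in text.splitlines():
        match = _PIN.match(line)
        lines.append(match.group(1) if match else line)
    return "\n".join(lines)


if __name__ == "__main__":
    print(strip_build_strings("dependencies:\n  - prodigal=2.6.3=h516909a_2"))
```

Keeping the version pin while dropping the platform-specific build string is what makes the file usable on a different operating system than the one it was exported from.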

Add support for ProvStore API

ProvStore is an open provenance repository for W3C-PROV documents. BioProv should add support for CRUD operations using the ProvStore API.

Submit a PR that will:

  • Manage configuration of the ProvStore credentials [ ]
  • Allow CRUD operations with the ProvStore API [ ]

Non-code improvements that would be nice to have

Hi, great work on this package!

I think a few simple things that don't mess with the source code would greatly improve it:

  • Adding a CONTRIBUTING.md to the root

This would make this repo much more inviting for beginners who want to contribute (such as myself!). I really like biopython's contribution guide, for example.

  • Adding a link to the documentation in the README.

I find the notebooks you made really nice but I've also noticed you already have some docs on readthedocs, maybe add a badge on the README to refer to them? Some people like this more "traditional" type of documentation hahah. And you could even turn the current notebooks into .rst to have a static version of them over there.

  • Improve the project's page on PyPI

Currently, your project page only has a very short description, even though your README is much more informative! You could try using it as your long description, see one of my packages for example. It would only require a simple change on:

BioProv/setup.py

Lines 16 to 21 in 178bef7

long_description=(
"BioProv is a toolkit for capturing and extracting provenance data from"
" bioinformatics workflows."
"\n"
"To know more about BioProv, please visit the Homepage."
),
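One common way to make that change is to read the README at build time and pass its contents as the long description. The sketch below assumes the README is a Markdown file named README.md next to setup.py; the helper name read_long_description is hypothetical, and the fallback text is the short description quoted above.

```python
from pathlib import Path


def read_long_description(readme_path="README.md"):
    """Return the README contents, or a short fallback if the file is missing."""
    path = Path(readme_path)
    if path.is_file():
        return path.read_text(encoding="utf-8")
    # Fallback: the short description currently used in setup.py.
    return (
        "BioProv is a toolkit for capturing and extracting provenance data from"
        " bioinformatics workflows."
    )


# In setup.py this could then be passed along, for example:
# setup(
#     ...,
#     long_description=read_long_description(),
#     long_description_content_type="text/markdown",
# )
```

Setting long_description_content_type tells PyPI to render the text as Markdown rather than plain text.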

  • Add an environment file for development

This one is a bit tougher, since the tests require some external software, but I believe having a conda "dev_environment.yml" file would make it a lot easier for collaborators to work on this, run tests, add features and whatnot. Edit: Thinking back on it, this isn't really necessary as long as the contribution guidelines clearly describe which packages and software are needed for testing, code styling, etc.

Sorry for this huge wall of text. These are mostly just suggestions, so feel free to disregard any of them!

That being said, I'd be happy to draft a PR to tackle the first three suggestions, if you don't mind.

Add GitHub action "create-release"

To make this repository citable through Zenodo, it must have a GitHub release. However, creating GH releases manually is a bit time-consuming. Ideally, whenever a tag gets pushed, GH should create a new release.

Create a PR that will:
