vinisalazar / bioprov
A provenance library for bioinformatics workflows 🧬
Home Page: https://bioprov.readthedocs.io/
License: MIT License
When I run conda env create --file environment.yml, several packages fail to resolve:
Solving environment: failed
ResolvePackageNotFound:
- openssl==1.1.1h=haf1e3a3_0
- certifi==2020.6.20=py37h2987424_2
- libcxx==11.0.0=h439d374_0
- setuptools==49.6.0=py37h2987424_2
- xz==5.2.5=haf1e3a3_1
- tk==8.6.10=hb0a8c7a_1
- prodigal==2.6.3=h01d97ff_2
- python==3.7.8=hc9dea61_1_cpython
- zlib==1.2.11=h7795811_1010
- sqlite==3.33.0=h960bd1c_1
- libffi==3.2.1=hb1e8313_1007
- readline==8.0=h0678c8f_2
- ncurses==6.2=hb1e8313_2
Unfortunately, this appears to be a common issue with conda; here's an example.
I've found that removing the build strings from the environment file (e.g. prodigal=2.6.3=h516909a_2 -> prodigal=2.6.3) fixes it and gives me an environment with the same software and versions. Though this is an odd and not very practical solution, I can submit a pull request for it.
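A minimal sketch of that workaround, assuming the environment file pins dependencies as plain name=version=build (or name==version=build) entries like the ones in the error above. The function name strip_build_strings is hypothetical. It uses a regular expression instead of a YAML parser so the rest of the file (comments, ordering, the pip section) is left untouched. For a fresh export, conda env export --no-builds writes a file without build strings in the first place.

```python
import re

# One conda dependency entry that carries a build string, e.g.
#   "  - prodigal=2.6.3=h516909a_2"  or  "  - python==3.7.8=hc9dea61_1_cpython".
# Groups: list prefix + package name, "=" or "==", version, build string.
_PINNED_WITH_BUILD = re.compile(
    r"^(\s*-\s*[A-Za-z0-9_.\-]+)(==?)([^=\s]+)=([^=\s]+)\s*$"
)


def strip_build_strings(text: str) -> str:
    """Drop the build string from every pinned entry, keeping name and version."""
    out = []
    for line in text.splitlines():
        match = _PINNED_WITH_BUILD.match(line)
        if match:
            prefix, op, version, _build = match.groups()
            line = f"{prefix}{op}{version}"
        # Lines without a build string (names, channels, pip entries) pass through.
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

For example, reading environment.yml, passing its contents through strip_build_strings and writing the result back turns prodigal=2.6.3=h516909a_2 into prodigal=2.6.3 while leaving unpinned entries as they were.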
ProvStore is an open provenance repository for W3C PROV documents. BioProv should add support for CRUD operations using the ProvStore API.
Submit a PR that will:
v0.1.13 will include a new dependency, TinyDB, which will be the database system for BioProv.
environment.yml must be updated with the following packages:
To make this repository citable through Zenodo, it must have a GitHub release. However, creating GitHub releases manually is somewhat time-consuming. Ideally, whenever a tag is pushed, GitHub should create a new release automatically.
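A minimal sketch of the API call such a tag-triggered job could make, assuming it runs in GitHub Actions (which sets GITHUB_REF to refs/tags/<tag> on a tag push) and uses GitHub's REST endpoint POST /repos/{owner}/{repo}/releases. The helper names are hypothetical, and the request is only built here, not sent; a ready-made release action in a workflow file may well be simpler in practice.

```python
import json
import urllib.request


def tag_from_ref(ref: str) -> str:
    """Extract the tag name from a Git ref such as "refs/tags/v0.1.13"."""
    prefix = "refs/tags/"
    if not ref.startswith(prefix):
        raise ValueError(f"not a tag ref: {ref!r}")
    return ref[len(prefix):]


def build_release_request(owner: str, repo: str, tag: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates a GitHub release for `tag`."""
    payload = {
        "tag_name": tag,
        "name": tag,
        # Ask GitHub to generate release notes from the changes since the last release.
        "generate_release_notes": True,
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/releases",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Inside a tag-triggered job, assuming the workflow exports its token as GITHUB_TOKEN:
#   import os
#   tag = tag_from_ref(os.environ["GITHUB_REF"])
#   req = build_release_request("vinisalazar", "bioprov", tag, os.environ["GITHUB_TOKEN"])
#   urllib.request.urlopen(req)
```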
Create a PR that will:
Hello!
While reviewing your package, I noticed that the requirements are not clearly stated anywhere (only several packages mentioned here and there). Please add a requirements section to your documentation and README file. You can simply copy the requirements from your setup.py file.
Usually when I run a shell command without arguments, I expect it to show me all possible flags. Running bioprov kaiju does just that:
usage: kaiju [-h] -i INPUT [-o OUTPUT_DIRECTORY] -db KAIJU_DB -no NODES -na
NAMES [--kaiju_params KAIJU_PARAMS]
[--kaiju2table_params KAIJU2TABLE_PARAMS] [-t TAG] [-v]
[-p THREADS]
kaiju: error: the following arguments are required: -i/--input, -db/--kaiju_db, -no/--nodes, -na/--names
However, the same is not true when running bioprov genome_annotation:
Traceback (most recent call last):
File "/home/jvfe/miniconda3/envs/bioprov/bin/bioprov", line 7, in <module>
exec(compile(f.read(), __file__, 'exec'))
File "bioprov/bioprov", line 14, in <module>
main()
File "bioprov/bioprov.py", line 54, in main
parser.parse_options(args)
File "bioprov/workflows/wf_parser.py", line 66, in parse_options
subparsers[options.subparser_name](options)
File "bioprov/workflows/wf_parser.py", line 61, in <lambda>
"genome_annotation": lambda _options: self._genome_annotation(_options),
File "bioprov/workflows/wf_parser.py", line 33, in _genome_annotation
main.run_steps(steps)
File "bioprov/src/workflow.py", line 181, in run_steps
self.generate_sampleset()
File "bioprov/src/workflow.py", line 117, in generate_sampleset
self.sampleset = _generate_sampleset[self.input_type]()
File "bioprov/src/workflow.py", line 255, in _load_dataframe_input
assert path.isfile(input_), Warnings()["not_exist"]
File "/home/jvfe/miniconda3/envs/bioprov/lib/python3.7/genericpath.py", line 30, in isfile
st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
Though bioprov genome_annotation --help does the job:
usage: genome_annotation [-h] [-i INPUT] [-c CPUS] [--verbose] [-t TAG]
[--steps STEPS]
Genome annotation with Prodigal, Prokka and the COG database.
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Input file, may be a tab delimited file or a
directory. If a file, must contain column 'sample-id'
for sample ID and 'assembly' for files. See program
help for information. (default: None)
-c CPUS, --cpus CPUS Default is set in BioProv config (half of the CPUs).
(default: 2)
--verbose More verbose output (default: False)
-t TAG, --tag TAG A tag for the dataset (default: None)
--steps STEPS A comma-delimited string of which steps will be run in
the workflow. Possible steps: ['prodigal'] (default:
['prodigal'])
I don't know enough about argparse to think of a fix for this, though checking len(sys.argv) might be a possible workaround.
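A minimal sketch of two possible fixes with plain argparse, using a simplified, hypothetical stand-in for BioProv's parser (only -i and -c are reproduced, and build_parser/parse are illustrative names, not BioProv's API). Marking -i/--input as required=True makes genome_annotation fail the way kaiju does, with a usage message instead of a traceback. Checking for a bare subcommand, as suggested above, prints that subcommand's full help.

```python
import argparse
from typing import Dict, List, Optional, Tuple


def build_parser() -> Tuple[argparse.ArgumentParser, Dict[str, argparse.ArgumentParser]]:
    """Build a simplified, hypothetical version of the bioprov CLI."""
    parser = argparse.ArgumentParser(prog="bioprov")
    subparsers = parser.add_subparsers(dest="subparser_name")

    genome_annotation = subparsers.add_parser(
        "genome_annotation",
        help="Genome annotation with Prodigal, Prokka and the COG database.",
    )
    # required=True: argparse itself reports a missing input, as it does for kaiju.
    genome_annotation.add_argument("-i", "--input", required=True, help="Input file or directory.")
    genome_annotation.add_argument("-c", "--cpus", type=int, default=2, help="Number of CPUs.")
    return parser, {"genome_annotation": genome_annotation}


def parse(argv: List[str]) -> Optional[argparse.Namespace]:
    """Parse argv; print help and return None when no options were given."""
    parser, commands = build_parser()
    if not argv:
        # Plain "bioprov": show the top-level help.
        parser.print_help()
        return None
    if len(argv) == 1 and argv[0] in commands:
        # Bare "bioprov genome_annotation": show that subcommand's help.
        commands[argv[0]].print_help()
        return None
    return parser.parse_args(argv)
```

In the entry point this would be called as parse(sys.argv[1:]). Either change alone avoids the NoneType traceback; the required flag also catches calls like bioprov genome_annotation -c 4 that pass options but no input.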
Hi, great work on this package!
I think a few simple things that don't mess with the source code would greatly improve it:
This would make this repo much more inviting for beginners who want to contribute (such as myself!). I really like Biopython's contribution guide, for example.
I find the notebooks you made really nice, but I've also noticed you already have some docs on Read the Docs; maybe add a badge to the README linking to them? Some people prefer this more "traditional" type of documentation, hahah. You could even convert the current notebooks into .rst to have a static version of them there.
Currently, your PyPI project page only has a very short description, even though your README is much more informative! You could use the README as your long description; see one of my packages for an example. It would only require a simple change to:
Lines 16 to 21 in 178bef7
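A minimal sketch of that change, assuming the README is a Markdown file named README.md next to setup.py (the current contents of lines 16 to 21 are not reproduced here). long_description and long_description_content_type are standard setuptools arguments; the helper long_description_kwargs is hypothetical.

```python
from pathlib import Path
from typing import Dict, Union


def long_description_kwargs(readme: Union[str, Path] = "README.md") -> Dict[str, str]:
    """Return setup() keyword arguments that reuse the README as the PyPI description."""
    text = Path(readme).read_text(encoding="utf-8")
    return {
        "long_description": text,
        # Without this, PyPI assumes reStructuredText and Markdown renders poorly.
        "long_description_content_type": "text/markdown",
    }


# In setup.py, for example (Path(__file__).parent makes the path independent of the cwd):
#   from setuptools import setup
#   setup(name="bioprov", ..., **long_description_kwargs(Path(__file__).parent / "README.md"))
```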
This one is a bit tougher, since the tests require some external software, but I believe a conda "dev_environment.yml" file would make it much easier for collaborators to work on this, run tests, add features and so on. Edit: Thinking back on it, this isn't really necessary as long as the contribution guidelines describe which packages/software are needed for testing, code styling, etc.
Sorry for this huge wall of text, they are mostly just suggestions, feel free to disregard any of them!
That being said, I'd be happy to draft a PR to tackle the first three suggestions, if you don't mind.
Hi,
I've noticed most (if not all) string interpolation in your package uses the .format() style instead of f-strings. I don't think it's for backwards compatibility, since your package already requires Python>=3.6 and f-strings were introduced in 3.6, so is using .format() a matter of personal preference? I ask because, even though neither has a clear advantage over the other, I find f-strings more concise and easier to read.
If you plan on changing them, running grep -R "[\"|'].format(" . from the root directory would show all occurrences.
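As an alternative to grep, Python's own ast module can locate "...".format(...) calls while ignoring comments and catching calls split across lines. This sketch assumes Python 3.8 or newer (where string literals parse as ast.Constant) and only reports .format calls made directly on a string literal, not on variables holding templates; the function names are hypothetical.

```python
import ast
from pathlib import Path
from typing import Dict, List


def find_format_calls(source: str) -> List[int]:
    """Return the line numbers of "literal".format(...) calls in `source`."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "format"
            and isinstance(node.func.value, ast.Constant)
            and isinstance(node.func.value.value, str)
        ):
            lines.append(node.lineno)
    return sorted(lines)


def scan_tree(root: str = ".") -> Dict[str, List[int]]:
    """Map each .py file under `root` to the lines where such calls occur."""
    hits = {}
    for path in sorted(Path(root).rglob("*.py")):
        found = find_format_calls(path.read_text(encoding="utf-8"))
        if found:
            hits[str(path)] = found
    return hits


# Each hit can then be rewritten by hand, e.g.
#   "Sample {} has {} CPUs".format(sample_id, cpus)
#   -> f"Sample {sample_id} has {cpus} CPUs"
```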