nf-core / tools

Python package with helper tools for the nf-core community.
Home Page: https://nf-co.re
License: MIT License
The linting tests give lots of information on pull requests, but you have to dig into the Travis results to find them. It would be super cool if we could set up a GitHub app or something which the nf-core lint service could use to post a comment with the linting results directly to the PR.
Low priority, but it would be fun.
We should really do a release on at least PyPI soon.
xref: nf-core/rnaseq#32
It could be nice to have a subcommand to help with the installation of required software. Something like:
Each should be optional and preferably work on linux and mac. These commands can also then be used in the CI tests (eg. to install nextflow and singularity).
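As a sketch, the subcommand's interface might look something like this (the subcommand name, arguments and flags are purely illustrative, not an existing CLI):

```python
import argparse

def build_parser():
    """Sketch of a hypothetical `nf-core install` subcommand
    (names and flags are illustrative, not the real CLI)."""
    parser = argparse.ArgumentParser(prog="nf-core install")
    # Each piece of software should be optional:
    parser.add_argument("tools", nargs="*",
                        help="Software to install, e.g. nextflow singularity")
    parser.add_argument("--prefix", default="~/.local",
                        help="Where to install (should work on Linux and macOS)")
    return parser

args = build_parser().parse_args(["nextflow", "singularity"])
print(args.tools)  # ['nextflow', 'singularity']
```

In CI, the same entry point could then be invoked to set up nextflow and singularity before the pipeline tests run.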
One of the most common problems for users with singularity is that they don't have overlayFS configured, so the pipeline fails with directory mounting issues.
This subcommand would generate a Singularity image script based on the main pipeline, but dynamically adding in all base directories (detected or specified) so that mounting works properly.
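As a rough illustration, the recipe rewriting could be sketched like this (the function, recipe layout and directory names are assumptions, not existing code):

```python
def add_mount_points(recipe_text, mount_dirs):
    """Append `mkdir -p` commands to a Singularity recipe's %post section
    so that bind-mounting works without overlayFS (sketch of the idea)."""
    lines = recipe_text.splitlines()
    mkdirs = ["    mkdir -p {}".format(d) for d in mount_dirs]
    if any(l.strip() == "%post" for l in lines):
        out = []
        for line in lines:
            out.append(line)
            # Insert the mkdir statements right after the %post header
            if line.strip() == "%post":
                out.extend(mkdirs)
        return "\n".join(out)
    # No %post section yet: add one at the end
    return "\n".join(lines + ["%post"] + mkdirs)

recipe = "Bootstrap: docker\nFrom: nfcore/base\n\n%post\n    apt-get update"
print(add_mount_points(recipe, ["/scratch", "/data"]))
```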
Steps:
- nf-core tools subcommand `fix-overlay`, takes name of target pipeline
- Singularity file from the target pipeline
- `From` and `%post` statements in Singularity file
- `/tmp` and run `singularity build` to make a new image with the required mount points

From @ewels on July 12, 2018 7:05
The docs are a little sparse in a few places, and some other pipelines have better, more comprehensive stuff written already (such as the methylseq pipeline). Would be good to go through all pipelines and homogenise the docs across all of them.
Copied from original issue: nf-core/cookiecutter#38
Previously, we created a conda environment with a specific name and then manually added this directory to the `PATH`. This worked fine, but felt a little hacky. After some discussion (@apeltzer @sven1103 @ewels @pditommaso and others), we removed it. Instead, we now install the environment to the base conda environment.
This all seemed fine, until we just came across a strange bug. After some investigation, it seems that the host filesystem conda installation was taking priority over the base conda environment (due to configuration files in the home directory). Combined with conflicting path mounts we ended up with a steaming mess and non-functional software. This issue creates two problems:
Either we need a fix, or we can revert to the previous method. Manually prepending to the `PATH` skips all of these problems, as we're essentially not using conda any more.
The old syntax, e.g. `process$name`, will be replaced by the new `withName:` syntax, which currently leads to an error while linting, see here:
ICGC-featureCounts> nf-core lint . (nf-core-lint)
,--./,-.
___ __ __ __ ___ /,-._.--~\
|\ | |__ __ / ` / \ |__) |__ } {
| \| | \__, \__/ | \ |___ \`-._,-`-,
`._,._,'
Running pipeline tests [##################------------------] 50% 'check_nextflow_config'
CRITICAL: Critical error: ('`nextflow config` returned non-zero error code: %s,\n %s', 1, b"ERROR ~ Unable to parse config file: '/home/alex/IDEA/nf-core/ICGC-featureCounts/nextflow.config'\n\n Compile failed for sources FixedSetSources[name='/groovy/script/ScriptACEE592A55CA6E05E4ED54DBAB544DAD']. Cause: org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:\n /groovy/script/ScriptACEE592A55CA6E05E4ED54DBAB544DAD: 26: expecting '}', found ':' @ line 26, column 12.\n .withName:fetch_encrypted_s3_url {\n ^\n \n 1 error\n\n\n")
INFO: Stopping tests...
INFO: ===========
LINTING RESULTS
=================
16 tests passed 0 tests had warnings 0 tests failed
Should we support both or just the new syntax?
From @sven1103 on July 9, 2018 8:46
Dear all,
just started to port a pipeline into nf-core, and realized that it might be a cool addition to have `//TODO <description>` comment lines in the cookiecutter template, for example for the help message function you need to adapt, and so on.
Then you could easily display them in your favourite IDE and not forget to implement/change something. Moreover, we could also check on a release with nf-core/tools that all `//TODO` tags are removed from the code?
Just brainstorming here, happy for your feedback!
Have a tag `//TODO nf-core:`.
Where?:
- main.nf
- Dockerfile
- Singularity
- nextflow.config
Best, Sven
Copied from original issue: nf-core/cookiecutter#34
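The release-time check suggested above could be sketched along these lines (the tag format is taken from the suggestion; the scanning function itself is hypothetical):

```python
import re

# Hypothetical release-time check: fail the lint run if any nf-core
# TODO tags remain in the template files.
TODO_RE = re.compile(r"//\s*TODO nf-core:\s*(.*)")

def find_todos(file_contents):
    """Return (line_number, description) for every `//TODO nf-core:` tag."""
    todos = []
    for lineno, line in enumerate(file_contents.splitlines(), start=1):
        match = TODO_RE.search(line)
        if match:
            todos.append((lineno, match.group(1).strip()))
    return todos

example = "params.reads = ''\n//TODO nf-core: Update the help message\n"
print(find_todos(example))  # [(2, 'Update the help message')]
```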
Currently the create command makes it easy to create the `TEMPLATE` branch. I think we should take it a step further and at the very least recommend that the user create it and push it to GitHub.
Related to #88.
@pditommaso has added a new core nextflow feature for checking the required version of nextflow, see nextflow-io/nextflow#752
This means that we need to update the current linting test for our nextflow version check, and update all the pipelines.
A common problem that is only going to get worse is that we want to make a change in some code that is shared across all pipelines.
To facilitate this, we need a tool which uses the git history to find changes that can also be applied in other pipelines.
Ideally, this will be used by an automated GitHub robot of some kind which will automatically make the changes and open PRs in branches on all pipelines (as done in conda-forge and several other communities).
From @ewels on August 8, 2018 9:31
Normally, ignored tasks are a bad thing in nf-core pipelines. It would be good to add some template code that flags a big warning when pipelines finish if any tasks errored and were ignored.
Copied from original issue: nf-core/cookiecutter#58
Travis builds on pushes, PRs and also tags (which for us are the same as releases). It would be great if there could be a set of tests which are specific to releases.
Suggestions:
- A `--release` flag for nf-core lint
- Run with the `--release` flag when git activity happens on the `master` branch (check env var `TRAVIS_BRANCH`)
- On `--release`: check that the container tag is not `latest`

Bonus points for challenging ones:
- Need to update version on homebrew-bio
From @apeltzer on July 28, 2018 16:38
The files local.md and adding_your_own.md have similar (if not identical) sections for Docker and Singularity; we should remove the duplication from at least one location :-)
Copied from original issue: nf-core/cookiecutter#46
From @apeltzer on August 2, 2018 13:06
Hi!
I think we could have a generic AWSBatch configuration in cookiecutter.
I did so for ICGC-featureCounts, but we could use a similar approach (open for ideas if that's not optimal in your opinion):
AWSBatch config:
https://github.com/nf-core/ICGC-featureCounts/blob/master/conf/awsbatch.config
Main Nextflow.config / setting some defaults:
https://github.com/nf-core/ICGC-featureCounts/blob/master/nextflow.config
And then allowing users to specify the required params /also using that in the summary if the proper profile is used:
I guess this could easily be extrapolated to other pipelines too, and could therefore be in cookiecutter!
I'd be happy to contribute that but would like to have some more feedback/ideas on this before moving on...
Copied from original issue: nf-core/cookiecutter#48
If pipelines have a bioconda `environment.yml`, we should be able to check:
When running with `--release`:
From @ewels on May 24, 2018 15:19
Will soon be available in release `0.30.x`. See:
https://github.com/nextflow-io/nextflow/blob/master/docs/conda.rst
https://github.com/nextflow-io/nextflow/blob/master/docs/process.rst#conda
https://github.com/nextflow-io/nextflow/blob/master/docs/config.rst#scope-conda
Copied from original issue: nf-core/cookiecutter#27
`conda-forge` should now have the highest priority, see bioconda/bioconda-recipes#10924 (comment)
Since we switched, we should replace docker hub with docker cloud everywhere in the docs etc.
Would be quite a nice test case for the new automation :-)
The new checkIfExists option was added in nextflow-io/nextflow#666 and is now released. Would be great to use it!
See https://gitter.im/nextflow-io/nextflow?at=5ae09da61130fe3d361684a1 for reason:
star_index = Channel
.fromPath(params.star_index)
.ifEmpty { exit 1, "STAR index not found: ${params.star_index}" }
However, as far as I can see, this does not work if a wrong file path is given, since `fromPath()` doesn't check for file existence and therefore the channel is not empty.
It would be great if we could check that all options (`params`) are described in the markdown documentation. We'll need the ability to have a list of exceptions, but it would ensure that the docs are kept up to date.
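A minimal sketch of such a check, assuming the params can be scraped with a regex and that the docs mention them as `--flags` (both of those are assumptions about the file formats):

```python
import re

def undocumented_params(config_text, docs_text, exceptions=()):
    """Sketch: find `params.x` names used in a Nextflow config that are
    never mentioned as `--x` in the markdown docs. The parameter names
    below are illustrative, not from a real pipeline."""
    params = set(re.findall(r"params\.(\w+)", config_text))
    documented = set(re.findall(r"--(\w+)", docs_text))
    return sorted(params - documented - set(exceptions))

config = "params.reads = '*.fastq'\nparams.outdir = './results'\nparams.tracedir = './trace'"
docs = "Use `--reads` to set input files and `--outdir` for results."
print(undocumented_params(config, docs, exceptions=["tracedir"]))  # []
```

The `exceptions` argument covers the list of allowed undocumented options mentioned above.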
Related to #155 - we need to update all occurrences of the https://hub.docker.com URLs from the pipeline documentation and website. These should now point to https://cloud.docker.com
Just a thought:
Maybe we could have the Dockerfiles in a similar fashion to here, to keep things consistent.
I thought about having the following:
- A base image with `conda` (and all required packages)
- A list of `conda` packages to be installed

This way we could checksum the Dockerfile (or even provide it via cookiecutter) and then have people only add what kind of dependencies they want in a specific version.
This also gives us the possibility to create a tag that is identical to the pipeline's tag in Nextflow, keeping things consistently pulled as well!
Thoughts?
I will create another issue in cookiecutter to link to this here...
Here's how to reproduce the bugs referenced in #19
python --version
Python 3.6.3 :: Anaconda, Inc.
git clone https://github.com/SciLifeLab/NGI-NeutronStar.git
cd NGI-NeutronStar
git reset --hard 59bfe4e717419d1e3667422cd486071073b41bcd
nf-core lint .
Traceback (most recent call last):
File "/Users/remi-andreolsen/miniconda3/envs/py3/bin/nf-core", line 6, in <module>
exec(compile(open(__file__).read(), __file__, 'exec'))
File "/Users/remi-andreolsen/code/nf-core-tools/scripts/nf-core", line 10, in <module>
import nf_core.lint
File "/Users/remi-andreolsen/code/nf-core-tools/nf_core/lint.py", line 272
except AssertionError, KeyError:
^
SyntaxError: invalid syntax
nf-core lint .
Running pipeline tests [######------------------------------] 16%
Traceback (most recent call last):
File "/Users/remi-andreolsen/miniconda3/envs/py3/bin/nf-core", line 6, in <module>
exec(compile(open(__file__).read(), __file__, 'exec'))
File "/Users/remi-andreolsen/code/nf-core-tools/scripts/nf-core", line 50, in <module>
nf_core_cli()
File "/Users/remi-andreolsen/miniconda3/envs/py3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/Users/remi-andreolsen/miniconda3/envs/py3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/Users/remi-andreolsen/miniconda3/envs/py3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/remi-andreolsen/miniconda3/envs/py3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/remi-andreolsen/miniconda3/envs/py3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/Users/remi-andreolsen/code/nf-core-tools/scripts/nf-core", line 34, in lint
lint_obj.lint_pipeline()
File "/Users/remi-andreolsen/code/nf-core-tools/nf_core/lint.py", line 64, in lint_pipeline
getattr(self, fname)()
File "/Users/remi-andreolsen/code/nf-core-tools/nf_core/lint.py", line 107, in check_files_exist
self.files.append(f)
NameError: name 'f' is not defined
The fix here: `f` -> `files`
git reset --hard c92ce0d99baff39bfcbb36c64be03160dc0331f8
touch .travis.yml CHANGELOG.MD docs/README.md docs/output.md docs/usage.md
nf-core lint .
Running pipeline tests [########################------------] 66%
Traceback (most recent call last):
File "/Users/remi-andreolsen/miniconda3/envs/py3/bin/nf-core", line 6, in <module>
exec(compile(open(__file__).read(), __file__, 'exec'))
File "/Users/remi-andreolsen/code/nf-core-tools/scripts/nf-core", line 50, in <module>
nf_core_cli()
File "/Users/remi-andreolsen/miniconda3/envs/py3/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/Users/remi-andreolsen/miniconda3/envs/py3/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/Users/remi-andreolsen/miniconda3/envs/py3/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/remi-andreolsen/miniconda3/envs/py3/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/remi-andreolsen/miniconda3/envs/py3/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/Users/remi-andreolsen/code/nf-core-tools/scripts/nf-core", line 34, in lint
lint_obj.lint_pipeline()
File "/Users/remi-andreolsen/code/nf-core-tools/nf_core/lint.py", line 64, in lint_pipeline
getattr(self, fname)()
File "/Users/remi-andreolsen/code/nf-core-tools/nf_core/lint.py", line 213, in check_config_vars
k, v = l.split(' = ', 1)
TypeError: a bytes-like object is required, not 'str'
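The TypeError above comes from splitting the bytes output of `nextflow config` with a str separator; decoding the output first avoids it. A sketch of the fix (the helper names are made up; `-flat` is the flag used to get one `key = value` pair per line):

```python
import subprocess

def nextflow_config_lines(pipeline_dir):
    """Run `nextflow config -flat` and return decoded text lines.
    check_output() returns bytes, so decode before any str operations."""
    raw = subprocess.check_output(["nextflow", "config", "-flat", pipeline_dir])
    return raw.decode("utf-8").splitlines()

def parse_line(line):
    """Split a flattened config line into (key, value) - now str, not bytes."""
    k, v = line.split(" = ", 1)
    return k.strip(), v.strip()
```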
It's good to always have a badge on the readme saying what version of Nextflow is required. We should be able to double check this version against what we have in the config file too.
We need to get all our repositories set up at Docker Cloud instead of Docker hub.
Reason is, for repositories that were e.g. renamed (RNAseq -> rnaseq), the automated builds break.
It is impossible to add a new source repository on Docker Hub, but on Docker Cloud you can. Both are interfaces for the same backend, so we don't lose anything but gain something. I only have to set up the tags/branch builds for all repositories once again.
From @apeltzer on February 23, 2018 12:17
After the discussion in the NGI-RNASeq Gitter (credits to @rfenouil ), we might want to set some standards for naming e.g. channels/processes/variables. I opened this ticket to keep track of the ideas and we could maybe integrate these soon to make them mandatory once we agreed on defaults:
Some ideas:
- Channels should have a `ch_` prefix (to avoid confusing them with variables)

Other suggestions/ideas for variable / process / ... naming conventions would be great too!
Copied from original issue: nf-core/nf-core.github.io#9
Even with GitHub protected branches, we make mistakes with pushing to `master` fairly frequently. We should only ever have commits to `master` coming as pull requests from the `dev` branch.
We should be able to test for this quite specifically using Travis environment variables, and fail the test if the commit is coming from anywhere else (eg. a PR from a fork).
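A sketch of what such a check might look like, using the documented `TRAVIS_*` environment variables (the function and the exact policy are illustrative):

```python
import os

def master_push_allowed(env=os.environ):
    """Sketch of a Travis check: only allow master to be updated by
    pull requests coming from the `dev` branch. Policy details here
    are illustrative, not an agreed rule set."""
    if env.get("TRAVIS_BRANCH") != "master":
        return True  # not targeting master: nothing to enforce
    if env.get("TRAVIS_PULL_REQUEST", "false") == "false":
        return False  # direct push to master
    return env.get("TRAVIS_PULL_REQUEST_BRANCH") == "dev"

fake_env = {"TRAVIS_BRANCH": "master", "TRAVIS_PULL_REQUEST": "42",
            "TRAVIS_PULL_REQUEST_BRANCH": "dev"}
print(master_push_allowed(fake_env))  # True
```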
It's common for us to commit to the `master` branch with development code before the first release is made. The tests always fail here, which is kind of annoying. It would be good if, before failing, the linting tool could check whether there are any pipeline releases.
Hi!
just had a look at EAGER2, and the build fails (expectedly...) using the newest nf-core linting tool:
https://travis-ci.org/nf-core/EAGER2/jobs/400150354
What concerns me is this here as a warning:
83 tests passed 2 tests had warnings 1 tests failed
Using --release mode linting tests
WARNING: Test Warnings:
http://nf-co.re/errors#8: Conda package is not latest available: gatk4=4.0.5.1, 4.0.5.2 available
http://nf-co.re/errors#8: Conda package is not latest available: multiqc=1.5, 1.6a0 available
Checking the multiqc page and the bioconda package page, there is no version 1.6a0 for MultiQC!
https://bioconda.github.io/recipes/multiqc/README.html
So I guess, we're parsing something incorrectly :-(
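One hedged explanation: 1.6a0 looks like a PEP 440-style alpha pre-release, which the conda channel listing can include even though the MultiQC README page doesn't show it. A sketch of filtering such versions before comparing against "latest":

```python
import re

def is_prerelease(version):
    """True if a version string looks like a pre-release (alpha/beta/rc/dev),
    following PEP 440-style suffixes such as 1.6a0 or 2.0rc1. A naive
    heuristic sketch - a real check should use a version-parsing library."""
    return bool(re.search(r"(a|b|rc|alpha|beta|dev)\d*$", version))

available = ["1.4", "1.5", "1.6a0"]
stable = [v for v in available if not is_prerelease(v)]
print(max(stable))  # 1.5 (naive string max; fine for this example)
```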
From @ewels on May 24, 2018 15:25
In other pipelines we are adding the `summary` variables to MultiQC as a config block. See https://github.com/wikiselev/rnaseq/blob/eaebf588e83e2f78cea0a4451db2d4eea5789493/main.nf#L1036-L1054
Add this into the cookiecutter recipe. @pditommaso thinks it "could be replaced by a few lines of groovy" so probably room for some refactoring too.
Copied from original issue: nf-core/cookiecutter#28
The `onComplete` email code was written before nextflow had native support. It adds loads of boilerplate code to every pipeline, which can now be almost entirely removed.
Needs refactoring, and quite a bit of testing. We want to try to keep as much of the email contents the same as possible, which may require some PRs to core nextflow.
From @ewels on August 2, 2018 14:19
See example of where I started doing this on the rnaseq pipeline here: https://github.com/ewels/nf-core-rnaseq/compare/config_refactor#diff-c79fe4336e72c04860afccd21f4ae1c5R17
Copied from original issue: nf-core/cookiecutter#50
We can't (as of now) store trace files on an S3 bucket, so specifying one to store data on when running pipelines will crash in many cases where we store the trace file in the `--outdir` on S3.
From @ewels on August 2, 2018 14:08
We should be using the new `withName` syntax in the configuration files, instead of the older `process$name` syntax.
Copied from original issue: nf-core/cookiecutter#49
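A rough sketch of how a migration helper could rewrite the old selectors (the regex and the example config are illustrative; a real config would need proper parsing, not a one-line substitution):

```python
import re

def modernise_process_selectors(config_text):
    """Rewrite old `$name {` process selectors to the new `withName: name {`
    syntax. Rough sketch only - does not handle every config layout."""
    return re.sub(r"\$(\w+)\s*\{", r"withName: \1 {", config_text)

old = "process {\n    $fastqc {\n        cpus = 2\n    }\n}"
print(modernise_process_selectors(old))
```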
Expected behaviour:
The command `bump-version` (earlier: `release`) adjusts the conda environment `PATH` with the correct bumped version number.
Actual behaviour:
Both `PATH` statements, in the Dockerfile and the Singularity file, remain unchanged.
It'd be cool if the Travis config could detect whether `environment.yml` or the `Dockerfile` had changed for the pipeline test, and build locally if so. Then tests would properly use the correct software.
It may still time out, of course.
Hi everyone,
an idea which was already briefly discussed here:
We could produce a generic tool in `tools` that can produce a graphical user interface for end users to generate an appropriate configuration for the pipeline:
Definition could be like Phil proposed it here:
Happy for any feedback - I thought about this for EAGER2.0, but would be happy to make it generic for all kinds of nf-core pipelines!
@ewels @andreas-wilm @sven1103 @maxulysse (and others who are interested?)
Let me know what you think!
From @ewels on July 30, 2018 6:08
Singularity and docker container addresses are currently not handled in the best way. This is made difficult by the fact that we have `nfcore` on Docker Hub and `nf-core` on Singularity Hub.
This issue continues discussion started on PR nf-core/cookiecutter#42
Copied from original issue: nf-core/cookiecutter#47
From @ewels on August 3, 2018 16:06
Docker Hub works nicely with automated builds based on GitHub releases which are tagged with the same name.
Singularity Hub doesn't support the same method and instead has nasty filename-based tagging support. However, there is a very nice CLI that interacts with the API.
We should be able to use this CLI to automatically build and tag singularity hub containers when GitHub releases are prepared, using Travis CI.
Copied from original issue: nf-core/cookiecutter#53
It's currently kind of annoying and easy to forget to update the website docs every time a new lint test is added. It should be possible to automatically generate this somehow.
Description: Although the Python environment is loaded correctly during the CI build, it seems that `nose2` is running the tests not under the loaded Python environment, but under Python 2.7.
So we are missing syntax errors during CI testing which only occur in specific Python versions.
Suggestion: Call `nose2` as a module in the loaded Python environment with `python -m nose2 ...`
Status: Only conda package dependencies are linted at the moment; sub-lists, such as those derived from additional `pip` installations, are not supported.
Suggestion: Allow a `pip` sub-list for dependencies and use the PyPI API:
http://pypi.python.org/pypi/<package_name>/<version>/json
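A sketch of using that endpoint's response, taking already parsed JSON so the HTTP fetch stays out of the example (the `info.version` field is part of the PyPI JSON API):

```python
import json

def latest_pypi_version(pypi_json):
    """Extract the latest release version from a PyPI JSON API response
    (e.g. http://pypi.python.org/pypi/<package_name>/json). Takes the
    already parsed JSON so the network fetch stays out of this sketch."""
    return pypi_json["info"]["version"]

sample = json.loads('{"info": {"name": "multiqc", "version": "1.6"}}')
print(latest_pypi_version(sample))  # 1.6
```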
From @ewels on July 11, 2018 11:12
It would be nice to add in some checks for pipeline output files when `-profile test` is used. This will need to be specific for each pipeline, but we could add an example here.
I think the simplest would be to have a bunch of processes which are downstream of the normal ones and have `when: workflow.process.contains('test')` set so that they don't run normally.
See extensive discussion on nextflow gitter from here.
Copied from original issue: nf-core/cookiecutter#36
We currently have our Dockerfile for the nf-core/base image (that we use for all pipelines) in our central nf-core/tools repository. However, that is kind of a bad practice, since we can't tag this properly without conflicting with releases of nf-core/tools.
I therefore suggest that we generate a new repository (e.g. "nest", since this is where all of our pipelines nest), to hold the Dockerfile. That would make tagging, releasing and building with Docker Cloud quite easy, avoid these conflicts, and let us rely on properly tagged base images for our project here.
Any ideas would be appreciated - also comments on this one.
We can even fix certain versions of conda in these base images and have a changelog over here.
The code in this tool to fetch the pipeline names and releases uses the GitHub APIs. It works fine, but it was from before we had the new website. Now the website does the same calls and provides everything we need in a single JSON file which we can fetch with a single API call.
URL: http://nf-co.re/pipelines.json
Pros:
Cons:
We can keep the existing code as a backup if we want? Though to be honest, it's probably easiest to strip out the GitHub stuff and keep it simple.
When `nextflow.config` contains this:
manifest {
homePage = '{{ cookiecutter.pipeline_url }}'
description = '{{ cookiecutter.pipeline_short_description }}'
mainScript = 'main.nf'
}
Unfortunately, the linting fails. This is a problem in cookiecutter; see ticket nf-core/cookiecutter#19 for a start.