scilifelab / taca

This project forked from guillermo-carrasco/taca

Tool for the Automation of Cleanup and Analyses: tools for projects and data management at NGI Stockholm

License: MIT License

Shell 0.03% Python 98.58% HTML 1.22% Dockerfile 0.17%

taca's Introduction

Tool for the Automation of Cleanup and Analyses

This package contains several tools for projects and data management in the National Genomics Infrastructure in Stockholm, Sweden.

Install for development

You can install your own fork of taca in, for instance, a local conda environment for development. Provided you have conda installed:

# clone the repo
git clone https://github.com/<username>/TACA.git

# create an environment
conda create -n taca_dev python=2.7
conda activate taca_dev

# install TACA and dependencies for development
cd TACA
python setup.py develop
pip install -r ./requirements-dev.txt

# Check that tests pass:
cd tests && nosetests -v -s

There is also a plugin for the deliver command. To install this in the same development environment:

# Install taca delivery plugin for development
cd ../..   # back to the directory that contains the TACA clone
git clone https://github.com/<username>/taca-ngi-pipeline.git
cd taca-ngi-pipeline
python setup.py develop
pip install -r ./requirements-dev.txt

# add required config files and env for taca delivery plugin
mkdir -p ~/.ngipipeline && echo "foo:bar" >> ~/.ngipipeline/ngi_config.yaml
mkdir -p ~/.taca && cp tests/data/taca_test_cfg.yaml ~/.taca/taca.yaml
export CHARON_BASE_URL="http://tracking.database.org"
export CHARON_API_TOKEN="charonapitokengoeshere"

# Check that tests pass:
cd tests && nosetests -v -s

For more detailed documentation, please see the documentation page.

taca's People

Contributors

ewels b97pla galithil vezzi kate-v-stepanova sylvinite jfnavarro hammarn chuan-wang sofiahag aanil zhanglingfei alneberg ssjunnebo perlundmark franbonath kedhammar

taca's Issues

Implement pm production touch-finished

archive functionality not looking at the days

Even though it is in the argument list

def archive_to_swestore(days, run=None)

It is not used in this method (it is in cleanup), so basically it will archive all the runs, regardless of what you specify as old.
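A minimal sketch of how the days argument could be honored before archiving. The helper name runs_older_than is hypothetical, and using the directory modification time as the age of a run is an assumption; TACA may define "old" differently.

```python
import os
import time


def runs_older_than(run_dirs, days, now=None):
    """Return the runs whose modification time is more than `days` days ago.

    Hypothetical helper: the run's age is taken from its mtime (assumption).
    """
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    return [run for run in run_dirs if os.path.getmtime(run) < cutoff]
```

archive_to_swestore could then iterate over the filtered list instead of over every run it finds.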

Describe TACA configuration file

pm is not logging to a file

Even though it is specified in the configuration file:

# This section overrides the default logging parameters in Cement
log.logging:
    file: /home/hiseq.bioinfo/log/pm.log
    rotate: True

Implement pm report report-to-gdocs

That might not be true for the latest versions, but if you want to make the samplesheets HAS-compatible, you need a key named "Workflow" under the [Header] key, and possibly a [Settings] key before [Data].
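As a rough illustration, the "Workflow" requirement could be checked like this. The function name is hypothetical, and the parsing assumes the usual comma-separated samplesheet layout in which section names such as [Header] appear in the first column; it does not check the optional [Settings] part.

```python
def has_header_workflow(samplesheet_lines):
    """Return True if a "Workflow" key appears inside the [Header] section."""
    section = None
    for line in samplesheet_lines:
        # Only the first comma-separated field matters for this check
        first = line.strip().split(",")[0].strip()
        if first.startswith("[") and first.endswith("]"):
            section = first
        elif section == "[Header]" and first == "Workflow":
            return True
    return False
```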

Infer samplesheet run run folder name

Currently there is a bug as I do not take into account the A or B in front of the actual flowcell name
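A possible way to handle the position letter, sketched under the assumption that the flowcell field is the last underscore-separated part of the run folder name and that a leading A or B is always the position. The function name is hypothetical.

```python
def split_flowcell_field(run_folder):
    """Return (position, flowcell_id) parsed from a run folder name.

    Assumes the last "_"-separated field is the flowcell, optionally
    prefixed with the position letter A or B.
    """
    name = run_folder.rstrip("/").split("/")[-1]
    field = name.split("_")[-1]
    if field[:1] in ("A", "B"):
        return field[0], field[1:]
    return None, field
```

With the run names that appear in these issues, 150113_D00456_0058_AC6KUBANXX gives position A and flowcell C6KUBANXX, while 141120_M01548_0038_000000000-AB8D9 has no position letter and is returned unchanged.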

Implement pm qc upload-qc

Remove contributors from README

What do you think? It is implicit in the commit history. Actually, it is available in the "Contributors" tab on the repository so... one less thing to keep up to date.

run_tracker - Change imports to pm instead of ngi_pipeline

when possible

Check swestore instead of days old for cleanup processing runs

Instead of removing runs older than X days in nosync in the preprocessing servers, check that they have been archived to swestore.

Install iCommands in the processing servers
Init iCommands (credentials and so on)
Implement code

Implement project report sample-status

Move run_tracker functionalities to a new subcommand

In reference to this trello card.

The subcommand, something like:

$> taca analysis demultiplex [options]

Should act exactly as run_tracker.py is acting now.

Implement pm qc multiplex-qc

Implement pm project bpreport

Optional parameter to limit the number of runs processed simultaneously

As it stands, we can either archive all runs at once or one at a time. It would be good to have more sophisticated options.

Check if the samplesheet is present in the run directory

And do that before copying it from MFS (if it exists there), to give it priority.
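The lookup order could be sketched as follows. The function name, the file name SampleSheet.csv and the flat layout of the MFS directory are all assumptions made for illustration.

```python
import os


def find_samplesheet(run_dir, mfs_dir, name="SampleSheet.csv"):
    """Prefer the samplesheet in the run directory; fall back to MFS.

    Returns the path of the first samplesheet found, or None.
    """
    for directory in (run_dir, mfs_dir):
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None
```

If the returned path is already inside the run directory, nothing needs to be copied.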

Load YAML configuration files

Instead of the weird format that Cement accepts by default. Take a look at this

Implement pm production run

Implement pm deliver best-practice

Explore built in Cement logging functionality

Detach iput command

This command takes ages for a HiSeq/XTen run, and it only uses one core, so I think that we could detach it and continue to tarball the next run. So basically at a given point we would have just one run being compressed (using several cores), but several being sent at the same time to swestore.

If we don't do it this way, the risk of creating a queue of pm processes is high.
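A minimal sketch of the idea, not the actual pm code: compression stays sequential, while each upload is started in the background with subprocess.Popen. The names archive_runs, compress and upload_cmd are hypothetical; upload_cmd is assumed to build the argument list for the upload, for example an iput call with the tarball and its destination.

```python
import subprocess


def archive_runs(runs, compress, upload_cmd):
    """Compress runs one at a time and upload each tarball in the background.

    compress(run) -> tarball path (blocking, may use several cores).
    upload_cmd(tarball) -> argument list for the upload command (assumption).
    Returns a list of (tarball, exit_status) pairs.
    """
    uploads = []
    for run in runs:
        tarball = compress(run)
        # Popen returns immediately, so the next run can be compressed
        # while this upload is still in progress.
        uploads.append((tarball, subprocess.Popen(upload_cmd(tarball))))
    # Collect the exit status of every upload once all runs are compressed.
    return [(tarball, proc.wait()) for tarball, proc in uploads]
```

This keeps a single run being compressed at any time while allowing several transfers to run in parallel.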

PM - Check if run exists in Swestore

Now it will crash if the run already exists in Swestore:

ERROR: putUtil: put error for /ssUppnexZone/proj/a2010002/141120_M01548_0038_000000000-AB8D9.tar.bz2, status = -312000 status = -312000 OVERWRITE_WITHOUT_FORCE_FLAG
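One way to avoid the crash would be to look the tarball up before sending it. The sketch below assumes that the iCommands ils command exits with a non-zero status when the path does not exist; the function name and the injectable call parameter are hypothetical and only there to make the check easy to test.

```python
import os
import subprocess


def exists_in_swestore(remote_path, call=subprocess.call):
    """Return True if `ils` can list remote_path (exit status 0).

    Assumption: ils returns a non-zero status for a missing path.
    """
    with open(os.devnull, "w") as devnull:
        return call(["ils", remote_path], stdout=devnull, stderr=devnull) == 0
```

The archiving code could then skip the run, or ask for confirmation, instead of failing with OVERWRITE_WITHOUT_FORCE_FLAG.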

Implement pm qc update

-s --server-type flag to differentiate between nas and processing server for cleanup

-r run option for taca analysis demultiplex

If one wants to demultiplex a single run, it should be possible.

Implement pm report closed-projects

Update documentation

And document all the configuration options in the config file!!!!!

added thread options in run_tracker.yaml but looks like the command created does not care

title explains ....

Docs docs docs

Hmmm this is just a question: Do you think it is enough with the help of the package?

(master) ~/repos_and_code/TACA (master) ~> taca --help
Usage: taca [OPTIONS] COMMAND [ARGS]...

  Tool for the Automation of Storage and Analyses

Options:
  --version                   Show the version and exit.
  -c, --config-file FILENAME  Path to TACA configuration file
  --help                      Show this message and exit.

Commands:
  analysis  Analysis methods entry point
  storage   Storage management methods and utilities

etc. Or do you think we should add a page per subcommand in the documentation? Like one page for taca storage, one page per taca analysis, etc.

I don't want to over-document, that's the thing, but I also don't want subcommands or options to be forgotten. On the other hand... if a subcommand is forgotten, it is basically because it is not used, so it shouldn't be there....

What do you think? @senthil10 @vezzi @ewels @mariogiov

(master)hiseq.bioinfo@seq-nas-3:/srv/illumina/hiseq_data/nosync$ taca storage archive-to-swestore -r 150113_D00456_0058_AC6KUBANXX.tar.bz2
Traceback (most recent call last):
  File "/home/hiseq.bioinfo/.anaconda/envs/master/bin/taca", line 5, in <module>
    pkg_resources.run_script('taca==1.0', 'taca')
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/setuptools-3.6-py2.7.egg/pkg_resources.py", line 534, in run_script
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/setuptools-3.6-py2.7.egg/pkg_resources.py", line 1434, in run_script
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/taca-1.0-py2.7.egg/EGG-INFO/scripts/taca", line 38, in <module>
    app.run()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/cement/core/foundation.py", line 694, in run
    self.controller._dispatch()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/cement/core/controller.py", line 455, in _dispatch
    return func()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/cement/core/controller.py", line 461, in _dispatch
    return func()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/taca-1.0-py2.7.egg/taca/controllers/storage.py", line 56, in archive_to_swestore
    self._archive_run(self.pargs.run)
AttributeError: 'StorageController' object has no attribute 'pargs'

Dunno man...