taca's Introduction

Tool for the Automation of Cleanup and Analyses

This package contains several tools for projects and data management in the National Genomics Infrastructure in Stockholm, Sweden.

Install for development

You can install your own fork of taca for development in, for instance, a local conda environment. Provided you have conda installed:

# clone the repo
git clone https://github.com/<username>/TACA.git

# create an environment
conda create -n taca_dev python=2.7
conda activate taca_dev

# install TACA and dependencies for development
cd TACA
python setup.py develop
pip install -r ./requirements-dev.txt

# Check that tests pass:
cd tests && nosetests -v -s

There is also a plugin for the deliver command. To install this in the same development environment:

# Install taca delivery plugin for development
# (run from the TACA/tests directory used above)
cd ../..
git clone https://github.com/<username>/taca-ngi-pipeline.git
cd taca-ngi-pipeline
python setup.py develop
pip install -r ./requirements-dev.txt

# add required config files and env for taca delivery plugin
mkdir -p ~/.ngipipeline && echo "foo:bar" >> ~/.ngipipeline/ngi_config.yaml
mkdir -p ~/.taca && cp tests/data/taca_test_cfg.yaml ~/.taca/taca.yaml
export CHARON_BASE_URL="http://tracking.database.org"
export CHARON_API_TOKEN="charonapitokengoeshere"

# Check that tests pass:
cd tests && nosetests -v -s

For more detailed documentation, please see the documentation page.

taca's People

Contributors

aanil, alneberg, b97pla, chuan-wang, ewels, franbonath, galithil, guillermo-carrasco, hammarn, ingkebil, jfnavarro, kate-v-stepanova, kedhammar, parlundin, pekrau, remiolsen, robinandeer, senthil10, ssjunnebo, sylvinite, vezzi

taca's Issues

archive functionality not looking at the days

Even though days is in the argument list:

def archive_to_swestore(days, run=None)

it is not used in this method (it is used in cleanup), so the command archives all runs, regardless of the age you specify.
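One way the days argument could be honoured is to filter run directories by age before archiving. The following is a minimal sketch, not TACA's actual code: the function name runs_older_than, the use of the directory modification time as the age, and the now parameter (added only to make the function testable) are all assumptions.

```python
import os
import time


def runs_older_than(root, days, now=None):
    """Return run directories under `root` last modified more than `days` days ago.

    Hypothetical helper: TACA may define a run's age differently.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 24 * 60 * 60
    old_runs = []
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        # Only directories count as runs; compare modification time to the cutoff
        if os.path.isdir(path) and os.path.getmtime(path) < cutoff:
            old_runs.append(path)
    return old_runs
```

archive_to_swestore could then loop over runs_older_than(data_dir, days) instead of over every run in the directory.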

pm is not logging to a file

Even though it is specified in the configuration file:

# This section overrides the default login parameters in Cement
log.logging:
    file: /home/hiseq.bioinfo/log/pm.log
    rotate: True

Samplesheets for HAS

This may no longer be true for the latest versions, but to make the samplesheets HAS-compatible you need a key named "Workflow" in the [Header] section, and possibly a [Settings] section before [Data].
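A check for the "Workflow" key could look roughly like the sketch below. It assumes the usual comma-separated samplesheet layout with bracketed section names; the function name header_keys and the sample values are hypothetical and only for illustration.

```python
def header_keys(samplesheet_text):
    """Return the keys found in the [Header] section of a samplesheet."""
    keys = []
    in_header = False
    for line in samplesheet_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Section lines may carry trailing commas, e.g. "[Header],,,"
            in_header = stripped.rstrip(",").lower() == "[header]"
            continue
        if in_header and stripped:
            key = stripped.split(",")[0].strip()
            if key:
                keys.append(key)
    return keys


# Illustrative samplesheet; the values are made up
sheet = (
    "[Header],,\n"
    "Investigator Name,Someone,\n"
    "Workflow,GenerateFASTQ,\n"
    "[Data],,\n"
    "Lane,SampleID,index\n"
)
has_workflow = "Workflow" in header_keys(sheet)
```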

Remove contributors from README

What do you think? It is implicit in the commit history. Actually, it is available in the "Contributors" tab on the repository, so that is one less thing to keep up to date.

Detach iput command

This command takes ages for a HiSeq/XTen run and only uses one core, so I think we could detach it and continue tarballing the next run. At any given point we would then have just one run being compressed (using several cores), but several being sent to Swestore at the same time.

If we don't do it this way, the risk of building up a queue of pm processes is high.
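The idea can be sketched with subprocess.Popen, which starts a command without waiting for it to finish. This is only an illustration under assumptions: process_runs, compress, and upload_cmd are hypothetical names, and the real upload command line would have to be supplied by the caller.

```python
import subprocess


def process_runs(runs, compress, upload_cmd):
    """Compress runs one at a time while uploads run in the background.

    `compress(run)` returns the tarball path (blocking call).
    `upload_cmd(tarball)` returns the argument list of the upload command,
    for example an iput invocation (assumed, not taken from TACA).
    """
    uploads = []
    for run in runs:
        tarball = compress(run)  # blocks: only one compression at a time
        # Popen returns immediately, so the next run can be compressed
        uploads.append(subprocess.Popen(upload_cmd(tarball)))
    # Wait for every upload at the end and collect the exit codes
    return [proc.wait() for proc in uploads]
```

Collecting the exit codes at the end keeps failures visible even though the uploads are no longer run inline.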

PM - Check if run exists in Swestore

Currently pm crashes if the run already exists in Swestore:

ERROR: putUtil: put error for /ssUppnexZone/proj/a2010002/141120_M01548_0038_000000000-AB8D9.tar.bz2, status = -312000 status = -312000 OVERWRITE_WITHOUT_FORCE_FLAG
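A possible guard is to check for the remote file before uploading. The sketch below assumes that the iRODS ils command exits with a non-zero status when the path does not exist, and that a plain iput call is what pm uses; the function names are hypothetical, and the exists and put parameters are there so the logic can be exercised without an iRODS installation.

```python
import os
import subprocess


def exists_in_irods(remote_path):
    """Return True if `ils` can list `remote_path` (assumed exit status 0)."""
    with open(os.devnull, "w") as devnull:
        status = subprocess.call(["ils", remote_path], stdout=devnull, stderr=devnull)
    return status == 0


def put_to_irods(tarball, remote_path):
    """Upload `tarball` with iput; raises CalledProcessError on failure."""
    subprocess.check_call(["iput", tarball, remote_path])


def archive_run(tarball, remote_dir, exists=exists_in_irods, put=put_to_irods):
    """Upload `tarball` unless it is already present; return True if uploaded."""
    remote_path = remote_dir.rstrip("/") + "/" + os.path.basename(tarball)
    if exists(remote_path):
        return False  # already archived: skip instead of crashing
    put(tarball, remote_path)
    return True
```

Whether an existing run should be skipped, overwritten, or reported is a policy decision; the sketch only shows the skip variant.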

Docs docs docs

Hmmm, this is just a question: do you think the package's built-in help is enough?

(master) ~/repos_and_code/TACA (master) ~> taca --help
Usage: taca [OPTIONS] COMMAND [ARGS]...

  Tool for the Automation of Storage and Analyses

Options:
  --version                   Show the version and exit.
  -c, --config-file FILENAME  Path to TACA configuration file
  --help                      Show this message and exit.

Commands:
  analysis  Analysis methods entry point
  storage   Storage management methods and utilities

etc. Or do you think we should add a page per subcommand to the documentation? For example, one page for taca storage, one page for taca analysis, etc.

I don't want to over-document, that's the thing, but I also don't want subcommands or options to be forgotten. On the other hand, if a subcommand is forgotten it is basically because it is not used, so it shouldn't be there.

What do you think? @senthil10 @vezzi @ewels @mariogiov

Demultiplexing should be machine agnostic

Basically, taca analysis demultiplex -r <HiSeq run> should work the same way as taca analysis demultiplex -r <MiSeq run> and taca analysis demultiplex -r <XTen run>, without the user having to specify the run type.

Implement delivery routine

Delivery of analysis data, as outlined in the NGI delivery policies document, should be implemented and managed with TACA.

-r option not working properly

(master)hiseq.bioinfo@seq-nas-3:/srv/illumina/hiseq_data/nosync$ taca storage archive-to-swestore -r 150113_D00456_0058_AC6KUBANXX.tar.bz2
Traceback (most recent call last):
  File "/home/hiseq.bioinfo/.anaconda/envs/master/bin/taca", line 5, in <module>
    pkg_resources.run_script('taca==1.0', 'taca')
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/setuptools-3.6-py2.7.egg/pkg_resources.py", line 534, in run_script
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/setuptools-3.6-py2.7.egg/pkg_resources.py", line 1434, in run_script
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/taca-1.0-py2.7.egg/EGG-INFO/scripts/taca", line 38, in <module>
    app.run()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/cement/core/foundation.py", line 694, in run
    self.controller._dispatch()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/cement/core/controller.py", line 455, in _dispatch
    return func()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/cement/core/controller.py", line 461, in _dispatch
    return func()
  File "/home/hiseq.bioinfo/.anaconda/envs/master/lib/python2.7/site-packages/taca-1.0-py2.7.egg/taca/controllers/storage.py", line 56, in archive_to_swestore
    self._archive_run(self.pargs.run)
AttributeError: 'StorageController' object has no attribute 'pargs'
