
ggd's Introduction

ggd's People

Contributors

arq5x, brentp, saketkc

ggd's Issues

set exit codes on errors.

there are a lot of messages to stderr, but we should also return non-zero exit codes on failures.
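A minimal sketch of what this could look like, assuming a Python entry point; `RecipeError`, `install` and `main` are hypothetical names for illustration, not ggd's actual code. Errors still go to stderr, but each failure path also returns a distinct non-zero status.

```python
import sys


class RecipeError(Exception):
    """Hypothetical error type raised when a recipe step fails."""


def install(recipe_name):
    # Placeholder for the real install logic (hypothetical).
    if not recipe_name:
        raise RecipeError("no recipe name given")


def main(argv):
    """Return 0 on success, 2 on a usage error, 1 on a recipe failure."""
    if len(argv) != 1:
        print("usage: ggd.py <recipe>", file=sys.stderr)
        return 2
    try:
        install(argv[0])
    except RecipeError as exc:
        print("error: %s" % exc, file=sys.stderr)
        return 1
    return 0

# A script would end with: sys.exit(main(sys.argv[1:]))
```

Returning the status from `main` and passing it to `sys.exit` in one place keeps the exit codes easy to test.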

make and get sections

make: should have the entire provenance of the recipe

an optional get: section will hold just a URL pointing to the file that results from running the make commands, so data hosters can choose to host a prebuilt dataset but still share its provenance.

The file from the get: should have the same sha1 as the result of the make.

The client will default to get if it's available and otherwise to make unless a flag is specified.
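The selection rule could be sketched as follows, assuming the recipe has already been parsed into a dict with "make" and (optionally) "get" keys; the function name and the `force_make` flag are hypothetical.

```python
def choose_section(recipe, force_make=False):
    """Return "get" or "make" for a parsed recipe dict (hypothetical layout)."""
    if "make" not in recipe:
        # make: carries the provenance, so it is treated as mandatory here.
        raise ValueError("recipe has no make: section")
    if "get" in recipe and not force_make:
        return "get"
    return "make"
```

For example, `choose_section({"make": [...], "get": "http://..."})` picks "get", while passing `force_make=True` rebuilds from the make: section.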

dependencies on other recipes and on software

we will have e.g. a recipe for a reference sequence.
then, dependent recipes will use that reference, as in vt normalize -r $REFERENCE, to normalize the variants. We need a way to specify both the data and the software dependencies.
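One way this could work, as a rough sketch: each recipe lists the recipes it needs, and the client installs them depth-first before the target. The dict layout and the function names below are assumptions for illustration; `shutil.which` is the standard-library call for checking that a tool such as vt is on the PATH.

```python
import shutil


def install_order(recipes, target):
    """Order recipes so that data dependencies are installed first.

    recipes maps a recipe name to the list of recipe names it depends on
    (a hypothetical layout, not the current ggd format).
    """
    order, visiting, done = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError("dependency cycle at %s" % name)
        if name not in recipes:
            raise KeyError("unknown recipe: %s" % name)
        visiting.add(name)
        for dep in recipes[name]:
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    visit(target)
    return order


def missing_software(tools):
    """Return the required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

With `{"hg38_reference": [], "normalized_variants": ["hg38_reference"]}`, asking for `normalized_variants` installs the reference first.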

Comments on example command

python ggd.py

Install a ggd shell script and use ggd install … instead?

ucsc.human.b38.cpg

The underlying URL of the formula uses slashes to separate components. Why use dots here? Consider instead ucsc/human/b38/cpg.

human

Common names are ambiguous. Use binomial names instead?

b38

Use the full name GRCh38 instead?

source.species.genomebuild.name

Perhaps this format could optionally include the GitHub username and repo, to support installing formulas from locations other than arq5x/ggd-recipes.

user/repo/source/species/genomebuild/name
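To make the suggestion concrete, here is a small sketch of how a client might parse the slash-separated form, treating user/repo as optional and falling back to arq5x/ggd-recipes. The function name and the returned dict are hypothetical.

```python
DEFAULT_REPO = ("arq5x", "ggd-recipes")


def parse_recipe_id(recipe_id):
    """Split [user/repo/]source/species/genomebuild/name into its parts."""
    parts = recipe_id.strip("/").split("/")
    if len(parts) == 4:
        user, repo = DEFAULT_REPO
    elif len(parts) == 6:
        user, repo = parts[0], parts[1]
        parts = parts[2:]
    else:
        raise ValueError("expected 4 or 6 components, got %d" % len(parts))
    source, species, genomebuild, name = parts
    return {
        "user": user,
        "repo": repo,
        "source": source,
        "species": species,
        "genomebuild": genomebuild,
        "name": name,
    }
```

Under this scheme `ucsc/human/b38/cpg` resolves against the default repository, while a six-part identifier points at someone else's recipes.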

Make species the top level in hierarchy

Hey, this is a cool project. In the figure you show species as the top-level directory, but the recipes aren't following that scheme currently? I can see advantages for both approaches, although I think organizing by species first is worth considering because:

  • there are several species-specific repositories (e.g. WormBase, FlyBase, etc.). This would keep those folders from cluttering up the top-level directory.
  • it also helps to make folks aware of what resources are available for their species of interest.

Sorry if this was already in the works!

Also what about adding a description line in the recipes which could be queried and displayed from the CLI?

template variables

all recipe commands will actually be templates. We need this to handle data dependencies, among other things.
The template variables that will be filled by GGD are:

  • ${DATE}
  • ${version} # pulled from the attributes section in the yaml
  • ${name} # pulled from the attributes section in the yaml
  • ${GGD_DATA} # path to the data directory (usually ~/ggd_data/); this will allow recipes to specify paths to existing ggd resources that have already been installed.
  • ${sha1} # the sha1 for the current entry under commands.
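Python's `string.Template` already uses the `${name}` syntax, so the substitution could be sketched like this. The function name, its arguments and the ISO date format are assumptions; `safe_substitute` is used so that other shell variables such as `$REFERENCE` pass through untouched.

```python
import datetime
import os
from string import Template


def render_command(cmd, attributes, sha1, data_dir=None):
    """Fill the GGD template variables in one recipe command."""
    values = {
        # Date format is an assumption; the issue does not specify one.
        "DATE": datetime.date.today().isoformat(),
        "version": str(attributes["version"]),
        "name": attributes["name"],
        "GGD_DATA": data_dir or os.path.expanduser("~/ggd_data"),
        "sha1": sha1,
    }
    # safe_substitute leaves unknown placeholders (e.g. $REFERENCE) as-is.
    return Template(cmd).safe_substitute(values)
```

For instance, with `name: hg38_reference` and a data directory of `/data`, the command `vt normalize -r $REFERENCE -o ${GGD_DATA}/${name}.vcf` keeps `$REFERENCE` for the shell and expands the rest to `/data/hg38_reference.vcf`.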

recipe responsible for writing files

so a recipe should look like:

attributes:
    name: hg38_reference
    version: 1
    sha1:
        - efaaea68910ee444b2756062b2ae2b990d5cdb71
        - 8c6e9635f50256e4ecd84bee2bfb1cb27cc8bbd1

recipe:
    full:
        recipe_type: bash
        recipe_cmds:
            - wget -O hg38.decoy.hla.fa ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
            - samtools faidx hg38.decoy.hla.fa
        recipe_outfiles:
            - hg38.decoy.hla.fa
            - hg38.decoy.hla.fa.fai

and ggd knows from the recipe_outfiles section that the output files are hg38.decoy.hla.fa and hg38.decoy.hla.fa.fai.
This is instead of the current behavior where ggd captures the output of each command and assumes that it is the file.

this makes recipes more obvious, since the user can implement a full recipe in bash and then translate it into the yaml section. I believe this is also what @chapmanb has implemented.
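With explicit output files, checking a finished recipe becomes straightforward. The sketch below assumes the sha1 list under attributes pairs up positionally with recipe_outfiles, which the example suggests but does not state; the function names are hypothetical.

```python
import hashlib
import os


def sha1_of(path, chunk_size=1 << 20):
    """Compute the sha1 of a file without loading it all into memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def check_outfiles(outfiles, expected_sha1s, workdir="."):
    """Return a list of (filename, problem) pairs; empty means all good."""
    if len(outfiles) != len(expected_sha1s):
        raise ValueError("need exactly one sha1 per output file")
    problems = []
    for name, expected in zip(outfiles, expected_sha1s):
        path = os.path.join(workdir, name)
        if not os.path.exists(path):
            problems.append((name, "missing"))
        elif sha1_of(path) != expected:
            problems.append((name, "sha1 mismatch"))
    return problems
```

A non-empty result would be a natural place to report to stderr and exit with a non-zero status.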
