
ggd-cli's People

Contributors

brentp, jbelyeu, mikecormier

ggd-cli's Issues

CL interface

The executable will be ggd, written in Python. The available sub-commands are to be:

  • ggd from_bash: make a new ggd recipe from a bash script
      --species | species the recipe is for; must be present in ggd-recipes/genomes
      --genome_build | version of the genome to use; must be present in ggd-recipes/genome
      --dependency | any data or bioconda dependencies needed to build the recipe
      --extra-file | any files that the recipe creates that are not a *.gz and *.gz.tbi pair; may be used more than once
      --summary | a comment describing the recipe
      --keyword | a keyword to associate with the recipe; may be specified more than once
      --author | a recipe author
      script | bash script that contains the commands to build the recipe
  • ggd build: build and test a ggd recipe locally
  • ggd install: wrapper for conda install
  • ggd search: search by name and/or keyword
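
As a rough illustration of the interface listed above, here is a minimal argparse sketch. It is not the project's actual code: the function name `build_parser` is hypothetical, and treating --dependency as repeatable is an assumption (the list only says so for --extra-file and --keyword).

```python
import argparse


def build_parser():
    """Build a parser for the sub-commands listed above (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ggd")
    sub = parser.add_subparsers(dest="command")

    fb = sub.add_parser("from_bash", help="make a new ggd recipe from a bash script")
    fb.add_argument("--species")
    fb.add_argument("--genome_build")
    # Assumption: --dependency may be repeated, like --extra-file and --keyword.
    fb.add_argument("--dependency", action="append", default=[])
    fb.add_argument("--extra-file", action="append", default=[])
    fb.add_argument("--summary")
    fb.add_argument("--keyword", action="append", default=[])
    fb.add_argument("--author")
    fb.add_argument("script", help="bash script that contains the commands to build")

    # The remaining sub-commands are registered without options in this sketch.
    sub.add_parser("build", help="build and test a ggd recipe locally")
    sub.add_parser("install", help="wrapper for conda install")
    sub.add_parser("search", help="search by name and/or keyword")
    return parser
```

Using `action="append"` is what lets an option such as --keyword be given more than once and collected into a list.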

storing bgzip compressed genomes?

hi!
this project looks super interesting!
One issue that would personally concern me if I were to use ggd is that the existing recipes store genomes in an uncompressed form. My concern is that, with potentially many genomes that my lab would have to deal with, the library will take a lot of space; moreover, given that we use network storage, storing data uncompressed will reduce the I/O performance.

Have you considered allowing optional compression of genomes with bgzip? bgzip plays well with faidx/pyfaidx and does not have any downsides, at least as far as we're concerned.

Thank you!
Anton.

show-env testing

Just realized that show-env is missing pattern recognition support. I'm throwing a quick fix together, but it needs to be pushed and tested.

finish search

Like list-files, search uses glob.glob to list files locally available. This introduces a slight internal inconsistency, as results are verified by calling conda search, which treats patterns differently. Specifically, glob.glob supports shell-style wildcards (most usefully the * character to match anything), while conda search requires valid regex patterns (matching anywhere within the recipe name). Currently the script converts empty strings to * characters for glob and removes those * characters for conda search, as empty strings fail in glob. This is a fairly hacky solution, however, and should be improved.

Furthermore, if the user inputs a real, valid regex pattern, it will likely fail, because glob will treat the special characters it doesn't know as literals. This might be okay because they can use shell wildcards, but I think at the very least it needs to be well-documented and bulletproofed some more.
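
One possible direction, sketched here under assumptions: accept only shell-style wildcards from the user and derive a regex from the same pattern with the standard library's fnmatch.translate, so both sides start from one input. The helper names `glob_to_regex` and `matches` are hypothetical, and whether conda search accepts the translated expression is not verified here.

```python
import fnmatch
import re


def glob_to_regex(pattern):
    """Translate a shell-style pattern into an equivalent regex string.

    An empty pattern is treated as "match everything", mirroring the
    empty-string-to-* conversion described above.
    """
    if not pattern:
        pattern = "*"
    # fnmatch.translate escapes regex metacharacters and anchors the result
    # so that it must match the whole name.
    return fnmatch.translate(pattern)


def matches(pattern, name):
    """Return True if `name` matches the shell-style `pattern`."""
    return re.match(glob_to_regex(pattern), name) is not None
```

Because the translation escapes characters such as `.`, a user-supplied regex is still treated as a literal shell pattern, which is the behaviour that would need documenting.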

list-files should be case-insensitive

list-files depends on python's glob.glob to list the files present. This simplifies the code considerably, but makes a case-insensitive search more difficult.
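
A minimal sketch of one way to get case-insensitive matching, assuming it is acceptable to replace glob.glob with os.listdir plus a compiled pattern. The function name `list_files_nocase` is hypothetical, and this version only looks at a single directory (no recursion, no wildcards in directory names).

```python
import fnmatch
import os
import re


def list_files_nocase(directory, pattern):
    """Return paths in `directory` whose names match `pattern`, ignoring case."""
    # Compile the shell-style pattern as a regex with the IGNORECASE flag.
    regex = re.compile(fnmatch.translate(pattern), re.IGNORECASE)
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if regex.match(name)
    )
```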

Version coordination

Versions are supported now in recipe installation. The version number (which allows a string but not whitespace) is parsed out of the meta.yaml file by check-recipe and included in the installation directory structure. This needs to be thought out more fully, however. The environment variables created by prelink.sh don't all include the version number currently and there may be additional issues caused by inclusion of the version number that I haven't thought of yet.

Should installed recipe versions be removed when a different version is installed?

Adding the option of maintaining different versions of recipes means there could be multiple versions of a dataset installed at once. We could use a post-link script (http://conda.pydata.org/docs/building/build-scripts.html) to handle this by removing old versions. We could also maintain those other versions (updating the environment variable to the most recently installed version) and allow them to be accessed via the ggd list-files command.

env_vars.sh cleanup

When installing a recipe a new environment variable is appended to conda_root/etc/conda/activate.d/env_vars.sh and conda_root/etc/conda/deactivate.d/env_vars.sh. Since this is only appended, old copies of vars will be maintained in the file. In the interest of neatness, we should remove old var declarations from these files on adding a new one.
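
A rough sketch of the cleanup, assuming the activate script holds one `export NAME=value` declaration per line (the actual file format is not shown in this issue). The function name `set_env_var` is hypothetical; it works on a list of lines so the caller decides how to read and write the file.

```python
import re


def set_env_var(lines, name, value):
    """Return `lines` with old `export NAME=...` entries dropped and a new one appended."""
    # Match only declarations of exactly this variable name.
    decl = re.compile(r"^\s*export\s+" + re.escape(name) + r"=")
    kept = [line for line in lines if not decl.match(line)]
    kept.append("export {}={}".format(name, value))
    return kept
```

A caller could read env_vars.sh with `splitlines()`, pass the result through this function, and write the joined lines back. The deactivate.d file would need the equivalent treatment for whatever form its entries take.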

Add data version to install dir structure

Need to include the version of the data in the directory structure.

i.e. /scratch/ucgd/lustre/u1072557/a2/share/ggd/Homo_sapiens/hg19/hg19-cpg-islands/1/{files}
instead of
/scratch/ucgd/lustre/u1072557/a2/share/ggd/Homo_sapiens/hg19/hg19-cpg-islands/{files}
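
The layout in the example paths could be built along these lines. This is only a sketch: the function name `install_dir` and its parameters are hypothetical, and it assumes the part of the path before share/ggd is the conda root. The whitespace check follows the rule from the "Version coordination" issue that a version may be a string but may not contain whitespace.

```python
import os


def install_dir(conda_root, species, build, recipe, version):
    """Join the path components in the order shown in the example above."""
    version = str(version)
    if not version or any(ch.isspace() for ch in version):
        raise ValueError("version must be a non-empty string without whitespace")
    return os.path.join(conda_root, "share", "ggd", species, build, recipe, version)
```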

install does not include required packages

I see:

ggd --version
Traceback (most recent call last):
  File "/home/brentp/miniconda3/bin/ggd", line 7, in <module>
    from ggd.__main__ import main
  File "/home/brentp/miniconda3/lib/python3.7/site-packages/ggd/__main__.py", line 6, in <module>
    from . list_files import add_list_files
  File "/home/brentp/miniconda3/lib/python3.7/site-packages/ggd/list_files.py", line 14, in <module>
    from .search import load_json_from_url, search_packages
  File "/home/brentp/miniconda3/lib/python3.7/site-packages/ggd/search.py", line 11, in <module>
    from fuzzywuzzy import fuzz
ModuleNotFoundError: No module named 'fuzzywuzzy'

after doing the install as suggested in the readme.

doc issues

  • a detailed figure should not be in a "quick-start" section. In the quick start, show a few command-line examples. Save the image for a more detailed section.
  • the section indicating how to install should be titled "Installation" and made a super-heading closer to the top of the document.
  • the readmes for ggd-cli and ggd-recipes should be updated to point to or match what's in gogetdata.github.io
  • the contents list appears under "Contributing". It should be its own header and should be only 1 level deep (so we see each command, not the example and "Using ..." for each command). Or, the contents section may be removed.
  • it is not clear how the icons and the columns interact on this page: https://gogetdata.github.io/recipes.html (e.g. why do we need both a linux column and a linux icon?). A recipe is either OSX, Linux or noarch, so why not have a single "arch" column?
