gpf's Introduction

GPF: Genotypes and Phenotypes in Families

The Genotypes and Phenotypes in Families (GPF) system manages large databases of genetic variants and phenotypic measurements obtained from collections of families and individual family members.

The main application of the system has been in managing the data gathered from the Simons Simplex Collection, a collection of ~2,600 families with one child diagnosed with autism.

Information on how to use GPF can be found in the GPF documentation.

Development

We recommend using an Anaconda environment (https://www.anaconda.com/) to create the GPF development environment.

Install GPF dependencies

Create a conda environment named gpf with all of the conda package dependencies from the environment.yml and dev-environment.yml files. From the gpf root directory, run:

mamba env create --name gpf --file ./environment.yml
mamba env update --name gpf --file ./dev-environment.yml

To use this environment, you need to activate it using the following command:

conda activate gpf

The following command installs the GPF `dae` and `wdae` packages for development use. (Install the GPF packages into the gpf development conda environment.)

for d in dae wdae dae_conftests; do (cd "$d" && pip install -e .); done

Additional GPF genotype storages

Some genotype storages are not included in the default GPF installation. If you plan to use or develop features for these genotype storages, you need to install their dependencies.

Apache Impala genotype storage

To use or develop features for the GPF impala genotype storage, you need some additional dependencies installed. From the gpf root directory, update your gpf conda environment using:

mamba env update --name gpf --file ./impala_storage/impala-environment.yml

and install the gpf_impala_storage package using:

pip install -e impala_storage

GCP genotype storage

If you want support for genotype storage on Google Cloud Platform (GCP), which uses Google BigQuery for querying variants, you need to install more dependencies in your development environment:

mamba env update --name gpf --file ./gcp_storage/gcp-environment.yml

and install the gcp_genotype_storage package using:

pip install -e gcp_storage

To run the tests, you need to authenticate for the seqpipe-gcp-storage-testing project using:

gcloud auth application-default login

You can check that the active project is seqpipe-gcp-storage-testing with:

gcloud config list project
[core]
project = seqpipe-gcp-storage-testing

Your active configuration is: [default]

To run the GCP storage tests, change into the gpf/gcp_storage directory and run:

py.test -v gcp_storage/tests/

To run the integration tests, use:

py.test -v ../dae/tests/ gcp_storage/tests/gpf_storage.yaml

gpf's People

Contributors

ailieva, deepsourcebot, egotsev, ilinagergova, iordanivanov, iossifov, ivostefanov, ivotod, joankosev, kevinduringwork, lchorbadjiev, livovachkov, manifold90, marchit, migglu, nikidimi, nikolaystanishev, qweqq, svetlin-mladenov, yamrom

gpf's Issues

Standardize pedigree file loading

There are currently two different implementations for pedigree file loading. Use the implementation from "dae.variants.family" and adapt it to support all formats.

[ENH] support for `family_variants_count`, `seen_as_denovo` and `seen_in_status` attributes of summary variants in impala schema2

Is your feature request related to a problem? Please describe.

At the moment, the impala schema2 genotype storage does not support storing some of the summary variant attributes:

  • family_variants_count
  • seen_as_denovo
  • seen_in_status

Describe the solution you'd like

Importing variants into schema2 should support adding these attributes with appropriate values.

Additional context
Because these attributes are missing, the gene browser does not show the correct family variants counts and colors for displayed summary variants when working with impala schema2 genotype storage.

[ENH] Add `grr-browse` CLI tool to browse the currently configured genomic resources repository

Is your feature request related to a problem? Please describe.
We need a CLI tool, grr-browse, for browsing the resources that are available in the currently configured genomic resources repository (GRR).

Describe the solution you'd like
The grr-browse tool should locate the GRR definition following the standard rules and show a list of all genomic resources available from the configured GRR.

The user should also be able to supply a GRR definition file to the grr-browse tool.

Describe alternatives you've considered
The grr-manage command accepts a list subcommand, which is very close to the requested command. The difference is that the grr-manage command expects a genomic resources repository to manage and does not look into the default GRR configuration.

[ENH] pytest markers to specify storage type subset to use in an integration test

Is your feature request related to a problem? Please describe.
Currently, the parametrization of integration tests with a genotype storage fixture uses all storage types. We need a way to restrict storage types for a specific test and make it run for a subset of storage types.

Describe the solution you'd like
Add pytest markers for each of the registered genotype storage types:

  • filesystem
  • impala
  • impala2

and adjust the genotype_storage fixture to filter injected genotype storages accordingly.
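
A minimal sketch of how such markers could drive the fixture parametrization, written as a conftest.py fragment. The helper select_storage_types and the fallback to all storage types when a test carries no storage marker are assumptions made for illustration and are not taken from the GPF code base. The genotype_storage fixture is assumed to read the storage type from request.param.

```python
# conftest.py (illustrative sketch, not the GPF implementation)

REGISTERED_STORAGE_TYPES = ("filesystem", "impala", "impala2")


def select_storage_types(marker_names, registered=REGISTERED_STORAGE_TYPES):
    """Return the storage types a test asked for via markers.

    Assumption: a test without any storage marker runs on all types.
    """
    requested = set(marker_names)
    selected = [name for name in registered if name in requested]
    return selected or list(registered)


def pytest_configure(config):
    # Register one marker per storage type so pytest does not warn
    # about unknown markers.
    for storage_type in REGISTERED_STORAGE_TYPES:
        config.addinivalue_line(
            "markers",
            f"{storage_type}: run the test with the {storage_type} "
            "genotype storage",
        )


def pytest_generate_tests(metafunc):
    # Only parametrize tests that request the genotype_storage fixture.
    if "genotype_storage" not in metafunc.fixturenames:
        return
    marker_names = [m.name for m in metafunc.definition.iter_markers()]
    metafunc.parametrize(
        "genotype_storage",
        select_storage_types(marker_names),
        indirect=True,  # the fixture receives the value as request.param
    )
```

With something like this in place, a test decorated with @pytest.mark.impala2 would be generated only for impala2, while unmarked tests would still run against every registered storage type.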

Describe alternatives you've considered
No idea of an alternative solution at the moment.

Additional context
We want to evolve and enhance our new genotype storages without backporting features to obsolete and deprecated storages.

[ENH] simplify definition of reference genome and gene models in integration testing

Is your feature request related to a problem? Please describe.
Most integration tests require the definition of a reference genome and gene models. We should have an easy way to define these objects in a testing environment.

Describe the solution you'd like
Define functions:

  • setup_reference_genome
  • setup_gene_models

that simplify the definition of these objects in a testing environment.

Additional context
It would be good to have a way to pass these definitions when creating a testing GPF instance and import project.

[ENH] support for `family_variants_count`, `seen_as_denovo` and `seen_in_status` attributes of summary variants in filesystem/inmemory genotype storage

Is your feature request related to a problem? Please describe.

At the moment, filesystem/inmemory genotype storage does not support some of the summary variants' attributes:

  • family_variants_count
  • seen_as_denovo
  • seen_in_status

Describe the solution you'd like

RawVariants in filesystem genotype storage should support adding these attributes with appropriate values.

Additional context
Because these attributes are missing, the gene browser does not show the correct family variants counts and colors for displayed summary variants when working with filesystem genotype storage.

[ENH] log_filter from wdae.utils.logger should accept positional arguments

Is your feature request related to a problem? Please describe.
The log_filter function accepts only a single argument for the message that should be displayed, which forces string construction to happen outside of the function, before it is called. This is not optimal, because the strings are built even if there is nothing to log.

Describe the solution you'd like
Allow log_filter to accept positional arguments instead of a single message argument and then construct the string inside the function itself.
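
A rough sketch of what the requested signature could look like. The function below is a hypothetical stand-in, not the actual wdae.utils.logger.log_filter: the request parameter, the user attribute, and the output format are all assumptions chosen to keep the example self-contained. It uses the same %-style placeholders as the standard logging module.

```python
def log_filter(request, message, *args):
    """Hypothetical stand-in for wdae.utils.logger.log_filter.

    The message is formatted with its positional arguments inside the
    function, so callers no longer build the string themselves.
    """
    if args:
        # Same %-style interpolation the standard logging module uses.
        message = message % args
    # Assumption: the request object exposes a `user` attribute.
    user = getattr(request, "user", "unknown")
    return f"user: {user}; message: {message}"
```

A caller would then write log_filter(request, "loaded %s variants from %s", count, study_id) instead of formatting the message up front.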

Describe alternatives you've considered
N/A

Additional context
N/A

[ENH] Add support for locating GPF instance directory by searching for `gpf_instance.yaml` file

Is your feature request related to a problem? Please describe.
Currently, the supported way to locate the GPF instance directory is to use the DAE_DB_DIR environment variable. We want to add the option to find the instance directory by looking for the gpf_instance.yaml file in the current working directory and its parents.

Describe the solution you'd like
Add a factory method that locates the GPF instance directory using all supported options and creates the GPFInstance passing the located directory.
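
The lookup part of such a factory could be sketched as follows. The function name locate_gpf_instance_dir, its parameters, and the precedence (DAE_DB_DIR first, then the upward search) are assumptions for illustration; the issue does not specify which option should win. Creating the GPFInstance from the returned directory is left out.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_NAME = "gpf_instance.yaml"


def locate_gpf_instance_dir(
    start: Optional[Path] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Locate the GPF instance directory (hypothetical helper).

    Assumed precedence: the DAE_DB_DIR environment variable, then the
    nearest directory at or above `start` containing gpf_instance.yaml.
    """
    environ = os.environ if environ is None else environ
    env_dir = environ.get("DAE_DB_DIR")
    if env_dir:
        return Path(env_dir)

    current = (Path.cwd() if start is None else Path(start)).resolve()
    # Check the starting directory first, then each parent up to the root.
    for candidate in (current, *current.parents):
        if (candidate / CONFIG_NAME).is_file():
            return candidate

    raise FileNotFoundError(
        f"{CONFIG_NAME} not found in {current} or its parents, "
        "and DAE_DB_DIR is not set"
    )
```

A factory method could call this helper and pass the result to the GPFInstance constructor, which keeps the constructor itself free of lookup logic.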

Describe alternatives you've considered
It is possible to embed this logic into the GPFInstance constructor, but the constructor would become too complicated.
