dress's Introduction

DRESS - Deep learning-based Resource for Exploring Splicing Signatures

This software aims to provide a flexible framework for RNA splicing research guided by performant deep learning models.

At its core, dress relies on genetic programming to perform data augmentations that create diversity in the semantic space of a deep learning oracle (e.g., SpliceAI, Pangolin). The augmentations, created through perturbations on exon triplets, are constrained with grammars, which restrict perturbations based on knowledge about the splicing domain. The package is built on top of GeneticEngine, a framework for Genetic Programming in Python.

Importantly, the expressive power of these grammars enhances model interpretability. The genotype of individuals (synthetic sequences) in the genetic programming population is closely related to RNA splicing concepts.

These ideas were accepted for publication at the Genetic and Evolutionary Computation Conference (GECCO'24). The preprint is available at arXiv.

The software includes an extensive set of options to control the evolutionary search according to the desired in silico experiments and problem-specific requirements. For example, by default dress avoids perturbing splice site regions (within a given genomic window), but this can easily be changed to avoid perturbing entire exons, which forces the search to focus on the upstream and downstream introns of alternatively spliced exons.
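To make the splice site constraint concrete, here is a minimal Python sketch of the idea. It is not the dress implementation: the function names, the 0-based coordinates, and the toy splice site indices are all assumptions made for illustration.

```python
import random


def allowed_positions(seq_len, splice_sites, window):
    """Return 0-based positions farther than `window` nt from every splice site."""
    protected = set()
    for site in splice_sites:
        lo = max(0, site - window)
        hi = min(seq_len - 1, site + window)
        protected.update(range(lo, hi + 1))
    return [p for p in range(seq_len) if p not in protected]


def random_snv(seq, splice_sites, window, rng):
    """Apply one SNV at a position outside the protected windows."""
    pos = rng.choice(allowed_positions(len(seq), splice_sites, window))
    alt = rng.choice([b for b in "ACGT" if b != seq[pos]])
    return seq[:pos] + alt + seq[pos + 1:], pos


seq = "ACGTACGTACGTACGTACGT"  # toy 20 nt sequence
sites = [5, 14]               # hypothetical splice site indices
free = allowed_positions(len(seq), sites, window=2)
mutated, pos = random_snv(seq, sites, 2, random.Random(0))
print(free)  # positions that remain open to perturbation
```

Protecting entire exons instead would amount to adding every exonic position to the protected set, which leaves only intronic positions available to the search.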

The software can be used for several applications, including:

  • Generate local datasets for explainable AI (e.g., to train interpretable surrogate models, as LIME does).
  • Generate synthetic sequences for feature attribution analysis and motif discovery (e.g., TF-MoDISco).
  • Run ablation studies on alternative splicing.

For now, the package contains two modules:

  • generate is the main command. It generates synthetic data from a start sequence using a grammar and a deep learning oracle.
  • filter selects sequences based on desired levels of splice site probability, PSI or dPSI.
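The kind of selection that filter performs can be sketched in a few lines of Python. This is an illustration only, not the dress code: the record layout, the function name, and the definition of dPSI as the PSI of a synthetic sequence minus the PSI of the start sequence are assumptions.

```python
def filter_by_dpsi(records, reference_psi, min_abs_dpsi):
    """Keep records whose |PSI - reference_psi| is at least `min_abs_dpsi`."""
    kept = []
    for rec in records:
        dpsi = rec["psi"] - reference_psi
        if abs(dpsi) >= min_abs_dpsi:
            kept.append({**rec, "dpsi": round(dpsi, 3)})
    return kept


# Hypothetical oracle predictions for three synthetic sequences.
records = [
    {"id": "seq_1", "psi": 0.82},
    {"id": "seq_2", "psi": 0.45},
    {"id": "seq_3", "psi": 0.10},
]
kept = filter_by_dpsi(records, reference_psi=0.50, min_abs_dpsi=0.2)
print([r["id"] for r in kept])
```

Filtering on the absolute value keeps sequences that either increase or decrease exon inclusion; a signed threshold would select only one direction.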

Installation

Clone the repo, take care of dependencies with conda or mamba and install the package with pip:

git clone https://github.com/PedroBarbosa/dress.git
cd dress
conda env create -f conda_env.yml
conda activate dress
pip install .

Running example

dress accepts bed or tabular (see input example here) inputs, and requires a locally stored pre-computed exon cache, along with the human reference genome.

First, download the necessary files. The required transcript structure cache (from GENCODE v44) can be downloaded from here. Then, download the human genome hg38 (for example from here), uncompress it, and optionally simplify the chromosome headers with sed '/^>/s/ .*//'. Place both files in a single directory and pass that directory with --cache_dir. By default, dress expects this data to be in data/cache.
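For readers who prefer not to use sed, the header simplification can be reproduced in Python. The sketch below mirrors the sed expression by cutting each header line at its first space; the function name and the toy records are assumptions for illustration.

```python
import io


def simplify_fasta_headers(src, dst):
    """Copy FASTA lines from src to dst, truncating headers at the first space."""
    for line in src:
        if line.startswith(">"):
            line = line.rstrip("\n").split(" ", 1)[0] + "\n"
        dst.write(line)


# Toy input standing in for a genome file with verbose headers.
src = io.StringIO(">chr1 1 dna:chromosome\nACGT\n>chr2 2\nTTGA\n")
out = io.StringIO()
simplify_fasta_headers(src, out)
result = out.getvalue()
print(result)
```

With a real genome, src and dst would be file objects opened with open() in read and write mode. Sequence lines pass through unchanged, so the function can stream a multi-gigabyte file without loading it into memory.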

To run an evolutionary search with an input exon (for example, exon 6 of the FAS gene), we simply call:

dress generate data/examples/generate/raw_input/FAS_exon6/data.tsv

The full list of argument options can be inspected with dress generate --help or by looking at one of the pre-configured yaml files at dress/configs/generate*.

Full documentation and tutorials detailing the complete capabilities of this software will be available soon.

dress's People

Contributors

pedrobarbosa

dress's Issues

Motif-based grammar

Implement a grammar that works at the motif level using PWMs:

  • SNVs only performed at positions with motif hits
  • Allow for motif swap, insertion, deletion, shuffling
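As a rough illustration of the first point, the sketch below scans a sequence with a log-odds PWM and returns the positions covered by motif hits, which is where SNVs would be allowed. Everything here is hypothetical: the toy count matrix, the pseudocount, the uniform background, and the score threshold are assumptions, not part of dress.

```python
import math


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((col.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def motif_hit_positions(seq, pwm, threshold):
    """Return sorted positions covered by any window scoring >= threshold."""
    width = len(pwm)
    covered = set()
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        if score >= threshold:
            covered.update(range(start, start + width))
    return sorted(covered)


# Toy count matrix with consensus TGCATG (10 observations per position).
counts = [{base: 10} for base in "TGCATG"]
pwm = pwm_from_counts(counts)
hits = motif_hit_positions("AATGCATGCC", pwm, threshold=8.0)
print(hits)
```

A grammar built on this would draw SNV positions only from the returned list, while motif swaps, insertions, deletions and shuffles would operate on whole hit windows.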

Constrain exonic perturbations considering the resulting amino acid sequence.

Make more biologically realistic perturbations, regardless of their effect on splicing.

For example, give more probability to exon perturbations that would result in:

  • synonymous substitutions;
  • a change to an amino acid that is biochemically similar to the one in the start sequence;
  • a change to an amino acid that is biochemically more distant.

In addition, do not allow perturbations that introduce a stop codon.
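One possible way to express this preference is to classify each codon change and map the class to a sampling weight. The sketch below uses the standard genetic code; the coarse amino acid groups and the weights are hypothetical placeholders chosen for illustration, not values proposed by the authors.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical, coarse biochemical groups (an assumption of this sketch).
GROUPS = {"nonpolar": "AVLIMFWPG", "polar": "STCYNQ", "positive": "KRH", "negative": "DE"}
AA_GROUP = {aa: name for name, members in GROUPS.items() for aa in members}

# Hypothetical sampling weights; a weight of 0 forbids the perturbation.
WEIGHTS = {"synonymous": 0.6, "conservative": 0.3, "radical": 0.1, "stop_gained": 0.0}


def classify(ref_codon, alt_codon):
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if AA_GROUP.get(ref_aa) == AA_GROUP.get(alt_aa):
        return "conservative"
    return "radical"


for ref, alt in [("CTG", "CTA"), ("GAT", "GAA"), ("GAT", "GGT"), ("TAC", "TAA")]:
    label = classify(ref, alt)
    print(ref, alt, label, WEIGHTS[label])
```

This requires knowing the reading frame of the exon, so a real implementation would also need the coding phase from the transcript annotation.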
