orthoVar Package

The orthoVar package provides functions for generating genome-wide multiple sequence alignments (MSA) and for finding orthologous variants between species. The package has two functions: orthoMSA generates MSA tables, and orthoFind takes those tables as input to find orthologous variants.

The orthoMSA function takes the following arguments:

  • species1: Scientific name of the reference species whose sequences serve as the base of the alignment. Default is Homo sapiens. Other species will be supported in future releases.

  • species: A character string or character vector giving the scientific name(s) of the species whose protein sequences will be aligned. Valid inputs can be listed with the listSpecies() command.

  • humanSeqFile: Path to a FASTA file of human protein sequences. Default is NA, which downloads the file from NCBI.

  • seqFiles: A character string or character vector giving the path(s) to FASTA files of protein sequences for the other species given in the species argument. Default is NA, which downloads the files from NCBI.

  • annot: Annotation source. Either ncbi or ensembl.

  • customOrt: Gene orthology data for the given species. Default is NA, which takes the data from the Alliance of Genome Resources (alliancegenome.org). This can be either ensembl or a custom data frame, which should be in the same format as the data below, including the column names:

Gene1Symbol  Gene1SpeciesName  Gene2Symbol  Gene2SpeciesName
PSMB6        Homo sapiens      PRE3         Saccharomyces cerevisiae
RPN1         Homo sapiens      OST1         Saccharomyces cerevisiae
COX16        Homo sapiens      COX16        Saccharomyces cerevisiae
SYS1         Homo sapiens      SYS1         Saccharomyces cerevisiae
PHLPP2       Homo sapiens      CYR1         Saccharomyces cerevisiae
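A custom orthology table in the format above could be built in base R roughly as follows. This is a minimal sketch: the object names custom_ort and expected_cols are illustrative, and the check only confirms the column names shown in the table, not anything else orthoMSA may require.

```r
# Build a custom orthology table using the first rows of the example above.
# Single values such as the species names are recycled across all rows.
custom_ort <- data.frame(
  Gene1Symbol      = c("PSMB6", "RPN1", "COX16"),
  Gene1SpeciesName = "Homo sapiens",
  Gene2Symbol      = c("PRE3", "OST1", "COX16"),
  Gene2SpeciesName = "Saccharomyces cerevisiae",
  stringsAsFactors = FALSE
)

# Confirm the column names match the documented format exactly (names and order).
expected_cols <- c("Gene1Symbol", "Gene1SpeciesName",
                   "Gene2Symbol", "Gene2SpeciesName")
stopifnot(identical(names(custom_ort), expected_cols))
```

The resulting data frame would then be passed as the customOrt argument.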

The output is a data frame whose first two columns hold the protein ID and sequence for Homo sapiens. The remaining columns follow the same pattern, with each pair of columns belonging to one species.

Example usage: hum_mouse <- orthoMSA(species1 = "Homo sapiens", species = "Mus musculus", customOrt = "ensembl", annot = "ensembl")
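The two-columns-per-species layout of the orthoMSA output can be split into one small table per species by column position. The sketch below uses a toy table: its column names, protein IDs, and sequences are invented for illustration and are not taken from real orthoMSA output.

```r
# Toy table mimicking the described layout (ID column, then sequence column,
# repeated per species). All names and values here are hypothetical.
msa <- data.frame(
  Homo_sapiens_ID  = c("NP_000001", "NP_000002"),
  Homo_sapiens_seq = c("MKT-LV", "MAGQ--"),
  Mus_musculus_ID  = c("NP_100001", "NP_100002"),
  Mus_musculus_seq = c("MKTALV", "MAGQSE"),
  stringsAsFactors = FALSE
)

# Assign columns to species two at a time: columns 1-2 -> 1, 3-4 -> 2, ...
pair_index <- ceiling(seq_along(msa) / 2)

# One data frame per species, each holding its ID and sequence columns.
per_species <- lapply(
  split(seq_along(msa), pair_index),
  function(cols) msa[, cols, drop = FALSE]
)
```

Splitting by position rather than by name avoids relying on a particular column-naming scheme, which the text above does not specify.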

orthoFind function to find orthologous variants

The orthoFind function finds orthologous variants between species. It takes the MSA table generated by orthoMSA as input. The other required inputs are:

  • Variant data (df1, df2) as a data.table or data.frame, with at least the following columns:

    • Refseq_ID: NCBI reference sequence ID of the protein.

    • aapos: Amino acid position of the variant.

    • from: Reference amino acid.

    • to: Altered (variant) amino acid.

Id            aapos  Allele_frequency  from  to  Refseq_ID  Gene_name  type     Phenotype  Source
rs1185396016  240    5.50e-06          N     S   NP_003261  TSPAN6     unknown  Unknown    gnomAD
rs138104330   239    5.50e-06          N     S   NP_003261  TSPAN6     unknown  Unknown    gnomAD
rs778356735   237    5.50e-06          I     T   NP_003261  TSPAN6     unknown  Unknown    gnomAD
rs745504645   235    5.50e-06          R     L   NP_003261  TSPAN6     unknown  Unknown    gnomAD
rs745504645   235    1.47e-05          R     H   NP_003261  TSPAN6     unknown  Unknown    gnomAD
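Before calling orthoFind, it may help to confirm that a variant table carries the four required columns. The sketch below builds a table from the first rows of the example above and checks it. The helper check_variant_columns is hypothetical and is not part of orthoVar.

```r
# Variant table using a subset of the example rows and columns shown above.
variants <- data.frame(
  Id        = c("rs1185396016", "rs138104330", "rs778356735"),
  aapos     = c(240L, 239L, 237L),
  from      = c("N", "N", "I"),
  to        = c("S", "S", "T"),
  Refseq_ID = "NP_003261",
  stringsAsFactors = FALSE
)

# Hypothetical helper: stop with a clear message if a required column is
# missing, otherwise return TRUE invisibly.
check_variant_columns <- function(df) {
  required_cols <- c("Refseq_ID", "aapos", "from", "to")
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

check_variant_columns(variants)
```

Extra columns such as Allele_frequency or Gene_name are not checked here, since the text lists only four columns as the minimum.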

All arguments are listed below:

  • df1: Variant data for the first organism

  • df2: Variant data for the second organism

  • org1: Scientific name of the first organism.

  • org2: Scientific name of the second organism.

  • msa: MSA table produced by the orthoMSA function.

  • ort: Whether the data should be filtered by variant type (conserved and non-conserved). Default is TRUE. Refer to the paper for a detailed explanation: <https://doi.org/10.1101/2021.01.07.424951>

The output is a data.table in which each row represents one combination of a variant and its orthologous variant. Below is an example output:

C_elegans_ID  C_elegans_aapos  C_elegans_from  C_elegans_to  Human_ID   Human_aapos  Human_from  Human_to  msa_id
NP_510365     80               P               S             NP_000545  64           P           A         20
NP_510365     80               P               S             NP_000545  64           P           T         20
NP_510365     256              P               T             NP_000545  272          P           T         20
NP_510365     256              P               T             NP_000545  272          P           S         20
NP_510365     80               P               S             NP_000545  64           P           A         21
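Output in this shape can be post-processed with ordinary data frame operations. As one illustration, the sketch below rebuilds the example rows above and keeps only the pairs in which both species carry the same amino acid substitution. This filter is an example of downstream use, not a step performed by orthoFind, and the object names res and same_change are illustrative.

```r
# Rebuild the example output rows shown above as a plain data frame.
res <- data.frame(
  C_elegans_ID    = "NP_510365",
  C_elegans_aapos = c(80L, 80L, 256L, 256L, 80L),
  C_elegans_from  = "P",
  C_elegans_to    = c("S", "S", "T", "T", "S"),
  Human_ID        = "NP_000545",
  Human_aapos     = c(64L, 64L, 272L, 272L, 64L),
  Human_from      = "P",
  Human_to        = c("A", "T", "T", "S", "A"),
  msa_id          = c(20L, 20L, 20L, 20L, 21L),
  stringsAsFactors = FALSE
)

# Keep rows where the reference and altered residues match in both species.
same_change <- res[
  res$C_elegans_from == res$Human_from &
    res$C_elegans_to == res$Human_to,
]
```

In the example rows, only the P to T change at C. elegans position 256 and human position 272 passes this filter.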

Contributors

mustafapir