`orthoVar` Package

The orthoVar package provides functions for generating genome-wide multiple sequence alignments (MSAs) and for finding orthologous variants between species. It has two main functions: orthoMSA generates MSA tables formatted as input for the orthoFind function, and orthoFind uses these tables to find orthologous variants.

The orthoMSA function takes the following arguments:

species1: Scientific name of the species whose sequences serve as the base on which the alignment is built. Default is "Homo sapiens"; support for other species is planned for future releases.
species: A character string or character vector giving the scientific name(s) of the species whose protein sequences will be aligned. Valid inputs can be listed with the listSpecies() command.
humanSeqFile: Path to a FASTA file of human protein sequences. Default is NA, which downloads the file from NCBI.
seqFiles: A character string or character vector giving the path(s) to FASTA files of protein sequences for the other species specified in the species argument. Default is NA, which downloads the files from NCBI.
annot: Annotation source, either "ncbi" or "ensembl".
customOrt: Gene orthology data for the given species. Default is NA, which takes data from the Alliance of Genome Resources (alliancegenome.org). It can also be set to "ensembl", or to a custom data frame in the same format as the example below, including column names:

Gene1Symbol	Gene1SpeciesName	Gene2Symbol	Gene2SpeciesName
PSMB6	Homo sapiens	PRE3	Saccharomyces cerevisiae
RPN1	Homo sapiens	OST1	Saccharomyces cerevisiae
COX16	Homo sapiens	COX16	Saccharomyces cerevisiae
SYS1	Homo sapiens	SYS1	Saccharomyces cerevisiae
PHLPP2	Homo sapiens	CYR1	Saccharomyces cerevisiae
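A custom orthology table can be built directly in R. The sketch below reuses rows from the example above and checks the column names before the table is passed to customOrt; the variable names are hypothetical, and the commented-out orthoMSA call assumes orthoVar is installed and that "Saccharomyces cerevisiae" is a valid entry in listSpecies().

```r
# Hypothetical custom orthology table for orthoMSA's customOrt argument.
# Column names follow the required format; rows are copied from the example.
custom_ort <- data.frame(
  Gene1Symbol      = c("PSMB6", "RPN1", "COX16"),
  Gene1SpeciesName = "Homo sapiens",              # recycled to every row
  Gene2Symbol      = c("PRE3", "OST1", "COX16"),
  Gene2SpeciesName = "Saccharomyces cerevisiae",  # recycled to every row
  stringsAsFactors = FALSE
)

# Check that all required columns are present before passing the table on.
required_cols <- c("Gene1Symbol", "Gene1SpeciesName",
                   "Gene2Symbol", "Gene2SpeciesName")
stopifnot(all(required_cols %in% names(custom_ort)))

# Assumes orthoVar is installed; argument values are illustrative only.
# msa_yeast <- orthoMSA(species1 = "Homo sapiens",
#                       species = "Saccharomyces cerevisiae",
#                       customOrt = custom_ort, annot = "ncbi")
```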

The output is a data frame whose first two columns contain the protein ID and sequence for Homo sapiens. The remaining columns follow the same pattern, with each pair of columns belonging to one species.

Example usage: `hum_mouse <- orthoMSA(species1 = "Homo sapiens", species = "Mus musculus", customOrt = "ensembl", annot = "ensembl")`
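Because each species occupies two adjacent columns of the orthoMSA output, the table can be split into one block per species. The sketch below uses a small hypothetical stand-in (msa_toy, with made-up column names and sequences) rather than real orthoMSA output, and split_msa_pairs is an illustrative helper, not part of orthoVar.

```r
# Toy stand-in for an orthoMSA output table (hypothetical values): two columns
# per species, protein ID followed by sequence, human first.
msa_toy <- data.frame(
  human_id  = c("NP_000001", "NP_000002"),
  human_seq = c("MKT-LV", "MAPR"),
  mouse_id  = c("NP_100001", "NP_100002"),
  mouse_seq = c("MKTALV", "MAP-"),
  stringsAsFactors = FALSE
)

# Split the table into one two-column data frame per species.
split_msa_pairs <- function(msa) {
  stopifnot(ncol(msa) %% 2 == 0)
  starts <- seq(1, ncol(msa), by = 2)
  # msa[c(i, i + 1)] keeps a data.frame even though only two columns are taken.
  lapply(starts, function(i) msa[c(i, i + 1)])
}

pairs <- split_msa_pairs(msa_toy)
length(pairs)      # 2
names(pairs[[2]])  # "mouse_id" "mouse_seq"
```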

`orthoFind` function to find orthologous variants

The orthoFind function finds orthologous variants between species. It takes the MSA table generated by orthoMSA as input, along with the following data:

Variant data (df1, df2) as a data.table or data.frame, with at least the following columns:
- Refseq_ID: NCBI RefSeq ID of the protein.
- aapos: Amino acid position of the variant.
- from: Reference amino acid.
- to: Variant (substituted) amino acid.

Example variant data:

Id	aapos	Allele_frequency	from	to	Refseq_ID	Gene_name	type	Phenotype	Source
rs1185396016	240	5.50e-06	N	S	NP_003261	TSPAN6	unknown	Unknown	gnomAD
rs138104330	239	5.50e-06	N	S	NP_003261	TSPAN6	unknown	Unknown	gnomAD
rs778356735	237	5.50e-06	I	T	NP_003261	TSPAN6	unknown	Unknown	gnomAD
rs745504645	235	5.50e-06	R	L	NP_003261	TSPAN6	unknown	Unknown	gnomAD
rs745504645	235	1.47e-05	R	H	NP_003261	TSPAN6	unknown	Unknown	gnomAD
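Variant data in this shape can be assembled as a plain data.frame. The sketch below enters three rows from the example and adds a hypothetical helper, check_variant_columns (not part of orthoVar), that stops early if any of the four required columns is missing; extra columns such as Id or Gene_name are simply carried along.

```r
# Variant data for one organism, using rows from the example above.
# Only Refseq_ID, aapos, from and to are required; extra columns are allowed.
df1 <- data.frame(
  Id        = c("rs1185396016", "rs138104330", "rs778356735"),
  aapos     = c(240, 239, 237),
  from      = c("N", "N", "I"),
  to        = c("S", "S", "T"),
  Refseq_ID = "NP_003261",   # recycled to every row
  Gene_name = "TSPAN6",
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of orthoVar): fail early on missing columns.
check_variant_columns <- function(df) {
  missing <- setdiff(c("Refseq_ID", "aapos", "from", "to"), names(df))
  if (length(missing) > 0) {
    stop("Missing required columns: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

check_variant_columns(df1)
```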

orthoFind takes the following arguments:

df1: Variant data for the first organism.
df2: Variant data for the second organism.
org1: Scientific name of the first organism.
org2: Scientific name of the second organism.
msa: MSA table produced by the orthoMSA function.
ort: Whether the data should be filtered according to variant type (conserved or non-conserved). Default is TRUE. See the paper for a detailed explanation: <https://doi.org/10.1101/2021.01.07.424951>
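Since orthoFind takes an MSA as input, variant positions are presumably related across species through the alignment. The sketch below illustrates the general idea of alignment-based position mapping; it is not taken from orthoVar's source, and the function name map_position and the toy sequences are hypothetical. It assumes two gapped sequences from the same alignment, with "-" marking gaps.

```r
# Map a residue position in sequence A to the aligned position in sequence B,
# given two gapped sequences from the same alignment ("-" marks a gap).
map_position <- function(aln_a, aln_b, pos_a) {
  a <- strsplit(aln_a, "")[[1]]
  b <- strsplit(aln_b, "")[[1]]
  stopifnot(length(a) == length(b))
  # Ungapped residue index at every alignment column (NA where the column is a gap).
  idx_a <- ifelse(a == "-", NA, cumsum(a != "-"))
  idx_b <- ifelse(b == "-", NA, cumsum(b != "-"))
  col <- which(idx_a == pos_a)  # which() skips NA entries
  if (length(col) == 0) stop("pos_a is outside sequence A")
  list(pos_b = idx_b[col], residue_a = a[col], residue_b = b[col])
}

# Toy alignment: residue 5 of A (P) aligns with residue 3 of B (also P).
m <- map_position("MKTAPLV", "M--SPLV", 5)
```

A gap in sequence B at the mapped column yields pos_b = NA, meaning the residue has no aligned counterpart. Comparing residue_a and residue_b shows whether the reference residue is identical at both positions; the exact conserved/non-conserved criteria behind ort are defined in the paper linked above.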

The output is a data.table in which each row represents one pairing of a variant with an orthologous variant. An example output is shown below:

C_elegans_ID	C_elegans_aapos	C_elegans_from	C_elegans_to	Human_ID	Human_aapos	Human_from	Human_to	msa_id
NP_510365	80	P	S	NP_000545	64	P	A	20
NP_510365	80	P	S	NP_000545	64	P	T	20
NP_510365	256	P	T	NP_000545	272	P	T	20
NP_510365	256	P	T	NP_000545	272	P	S	20
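The output can be post-processed with ordinary R subsetting. The sketch below enters the five example rows as a data.frame, keeps pairs in which both organisms carry the same amino acid change, and collapses rows that differ only in msa_id. These filters are illustrative user-side steps, not orthoVar features, and the column prefixes (C_elegans_, Human_) are taken from the example and may differ for other organisms. If your result is a data.table, convert it with as.data.frame() first, since data.table indexes columns differently.

```r
# The five example rows from the output above, entered as a data.frame.
out <- data.frame(
  C_elegans_ID    = "NP_510365",
  C_elegans_aapos = c(80, 80, 256, 256, 80),
  C_elegans_from  = "P",
  C_elegans_to    = c("S", "S", "T", "T", "S"),
  Human_ID        = "NP_000545",
  Human_aapos     = c(64, 64, 272, 272, 64),
  Human_from      = "P",
  Human_to        = c("A", "T", "T", "S", "A"),
  msa_id          = c(20, 20, 20, 20, 21),
  stringsAsFactors = FALSE
)

# Keep pairs where both organisms carry the same amino acid change.
same_change <- out[out$C_elegans_to == out$Human_to, ]
nrow(same_change)  # 1 (NP_510365 P256T paired with NP_000545 P272T)

# Collapse rows that differ only in msa_id.
unique_pairs <- unique(out[, setdiff(names(out), "msa_id")])
nrow(unique_pairs)  # 4
```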
NP_510365	80	P	S	NP_000545	64	P	A	21

mustafapir / orthovar (GitHub)