AA_Comp

Evolutionary changes in amino acid composition of orthologous genes during vertebrate evolution

Project description

AA_Comp (Amino Acid Composition) is an analysis of the amino acid content of orthologous proteins from vertebrate species. The working hypothesis is that significant differences in amino acid composition among orthologs could be correlated with functional or structural changes that occurred in different groups of vertebrates after their separation. This work would provide a method for identifying new protein functions that could be relevant for the evolution of amniotes. We considered three classes of vertebrates: Actinopterygii, Sauropsida and Mammalia. The scripts can also be used with different classifications.

AA_Comp is composed of two parts. The first part involves 1) acquisition of protein sequences and related information from the database OrthoDB, 2) preparation and cleaning of the dataset, 3) statistical analysis. The second part involves visualization of results through different types of graphical representations.

Usage

I. AMINO ACID COMPOSITION ANALYSIS

Required R packages:

- rjson
- httr
- Biostrings
- dplyr

1 - Get orthologous group (orthogroup) information and FASTA sequences from OrthoDB:

$ Rscript Get_universal_singlecopy_orthogroups.R

The program retrieves from the OrthoDB server, through its API, all the orthogroups that match the following parameters: vertebrate level, single-copy gene, orthogroup present in 90% of the species. The program creates a folder named data containing three .fa files with FASTA sequences (if a directory named data already exists, it has to be renamed first).
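The query described above can be sketched as a URL assembled from the three parameters, together with a guard for an existing data directory. This is a minimal sketch, not the code in Get_universal_singlecopy_orthogroups.R: the base URL and the parameter names level, universal and singlecopy are assumptions based on OrthoDB's public API and should be checked against its current documentation, and build_orthodb_query and backup_data_dir are hypothetical helpers. The level 7742 is the NCBI taxonomy ID for Vertebrata, which matches the suffix of the orthogroup IDs used later in this README (for example 193525at7742).

```r
# Hypothetical helper: assemble an OrthoDB search URL (no request is sent).
# Endpoint and parameter names are assumptions; verify them against the
# OrthoDB API documentation before use.
build_orthodb_query <- function(base_url = "https://data.orthodb.org/current/search",
                                level = 7742,      # NCBI taxid for Vertebrata
                                universal = 0.9,   # present in >= 90% of species
                                singlecopy = 1) {  # single-copy filter (assumed value)
  params <- c(level = level, universal = universal, singlecopy = singlecopy)
  paste0(base_url, "?", paste0(names(params), "=", params, collapse = "&"))
}

# Hypothetical helper: rename an existing data directory so that a new
# download does not collide with it. Returns the backup path, or NULL
# when there was nothing to rename.
backup_data_dir <- function(path = "data") {
  if (!dir.exists(path)) return(invisible(NULL))
  backup <- paste0(path, "_", format(Sys.time(), "%Y%m%d_%H%M%S"))
  file.rename(path, backup)
  invisible(backup)
}

query_url <- build_orthodb_query()
```

With the httr and rjson packages listed above, such a URL could then be passed to httr::GET and the JSON body parsed with rjson::fromJSON; the structure of the response is not described in this README.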

2 - Obtain amino acid count for each orthogroup and organize data into dataframes:

$ Rscript AA_Comp_Analysis.R

Output: AA_Comp_nofilter, in which the downloaded data are organized, and AA_Comp, which is the filtered dataframe.
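The counting step can be illustrated in base R. This is a minimal sketch under stated assumptions, not the logic of AA_Comp_Analysis.R: count_aa is a hypothetical helper, the sequences are made up, and whether the real dataframes store raw counts or fractions is not stated here. The Biostrings package listed above provides alphabetFrequency for the same purpose on AAStringSet objects.

```r
# The 20 standard amino acids, one-letter codes.
aa_letters <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Hypothetical helper: count each standard amino acid in one sequence.
# Non-standard symbols (e.g. X) fall outside the factor levels and are dropped.
count_aa <- function(seq) {
  residues <- strsplit(toupper(seq), "")[[1]]
  counts <- table(factor(residues, levels = aa_letters))
  setNames(as.integer(counts), aa_letters)
}

# Toy sequences (made up for illustration, not OrthoDB data).
seqs <- c(fish = "MKPPLA", bird = "MKALLA")

# One row per sequence, one column per amino acid.
aa_counts <- t(sapply(seqs, count_aa))

# Fractions per sequence, so that proteins of different length are comparable.
aa_freq <- aa_counts / rowSums(aa_counts)
```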

Understanding the dataset AA_Comp

3 - Statistical analysis: T-TEST and Log2 FOLD CHANGE

The same script, AA_Comp_Analysis.R, also performs the statistical analysis. It creates a new dataframe Res with the p-value (t-test) and Log2 fold change results obtained from pairwise comparisons between the three classes.
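A single pairwise comparison can be sketched as follows. The values are made up, compare_classes is a hypothetical helper, and the definition of the fold change (ratio of the class means, first class over second) is an assumption; the script may compute it differently.

```r
# Toy proline fractions for one orthogroup in two classes (made-up values).
mammalia   <- c(0.061, 0.058, 0.064, 0.060, 0.063)
sauropsida <- c(0.041, 0.044, 0.039, 0.043, 0.040)

# Hypothetical helper: p-value and log2 fold change for one comparison.
compare_classes <- function(x, y) {
  tt <- t.test(x, y)  # Welch two-sample t-test (the default in R)
  data.frame(pvalue = tt$p.value,
             log2FC = log2(mean(x) / mean(y)))  # assumed definition
}

res_row <- compare_classes(mammalia, sauropsida)
```

A positive log2FC here means the amino acid is more frequent in the first class; swapping the arguments flips the sign and leaves the p-value unchanged.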

Understanding the dataset Res

II. ANNOTATION AND VISUALIZATION OF RESULTS

Bar plots

Required R packages:

- ggplot2
- dplyr

$ Rscript Utilities/Barplot.R csv_name

Three bar plots with vertical bars based on pairwise comparisons are created. Each bar represents the number of orthogroups (y axis) with relevant values of t-test and Log2FC for every amino acid (x axis).
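The quantity shown by each bar can be sketched as a count of orthogroups passing two cutoffs. The column names, the toy data and the thresholds (p < 0.05 and |log2FC| >= 1) are assumptions made for illustration; the cutoffs that Barplot.R treats as relevant are not stated here, and count_relevant is a hypothetical helper.

```r
# Toy results for one pairwise comparison (made-up values and column names).
res_toy <- data.frame(
  og_id  = c("og1", "og2", "og3", "og4", "og5", "og6"),
  aa     = c("P", "P", "K", "K", "G", "P"),
  pvalue = c(0.001, 0.20, 0.03, 0.04, 0.01, 0.002),
  log2FC = c(1.4, 2.0, -1.2, 0.3, -1.1, -1.5)
)

# Hypothetical helper: number of orthogroups per amino acid that pass
# both cutoffs (assumed thresholds).
count_relevant <- function(res, p_cut = 0.05, fc_cut = 1) {
  keep <- res$pvalue < p_cut & abs(res$log2FC) >= fc_cut
  table(factor(res$aa[keep], levels = sort(unique(res$aa))))
}

counts <- count_relevant(res_toy)

# Draw the bars only in an interactive session.
if (interactive()) {
  barplot(counts, xlab = "Amino acid", ylab = "Number of orthogroups")
}
```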

IMG5

Box plots

Required R packages:

- ggplot2
- dplyr
- gridExtra
- grid

$ Rscript Utilities/Boxplot.R csv_name pub_og_id aa

Example:

$ Rscript Utilities/Boxplot.R AA_Comp.csv 193525at7742 P

If pub_og_id and aa are not specified, the program analyzes all orthogroups for each amino acid.
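Optional trailing arguments of this kind are commonly handled as in the sketch below. It is an illustration, not the code in Boxplot.R, and parse_boxplot_args is a hypothetical helper; a NULL value stands for "not specified".

```r
# Hypothetical helper: interpret the positional arguments of the command line.
parse_boxplot_args <- function(args) {
  if (length(args) < 1) {
    stop("usage: Rscript Boxplot.R csv_name [pub_og_id] [aa]")
  }
  list(
    csv_name  = args[1],
    pub_og_id = if (length(args) >= 2) args[2] else NULL,  # NULL: all orthogroups
    aa        = if (length(args) >= 3) args[3] else NULL   # NULL: all amino acids
  )
}

# Inside a script run with Rscript, the arguments would come from:
#   parse_boxplot_args(commandArgs(trailingOnly = TRUE))
opts <- parse_boxplot_args(c("AA_Comp.csv", "193525at7742", "P"))
```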

IMG6

Heatmap

Volcano plots

Required R packages:

- EnhancedVolcano
- Biostrings
- dplyr

$ Rscript Utilities/VolcanoPlot.R csv_name

The program creates PDF files containing the pairwise comparison plot for a single amino acid. Each plot combines the t-test value (y axis) with the log2FC (x axis); orthogroups with relevant results fall in the side sections of the plot and are highlighted with red dots.
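The coordinates and highlighting of a volcano plot can be sketched in base R. Volcano plots conventionally show -log10(p-value) on the y axis, which is also what EnhancedVolcano draws; the cutoffs below (p < 0.05, |log2FC| >= 1) and the helper classify_volcano are assumptions for illustration, not the settings used by VolcanoPlot.R.

```r
# Hypothetical helper: plot coordinates plus a status label per orthogroup.
classify_volcano <- function(pvalue, log2FC, p_cut = 0.05, fc_cut = 1) {
  status <- rep("not relevant", length(pvalue))
  status[pvalue < p_cut & log2FC >=  fc_cut] <- "up"    # right-hand side
  status[pvalue < p_cut & log2FC <= -fc_cut] <- "down"  # left-hand side
  data.frame(x = log2FC, y = -log10(pvalue), status = status)
}

# Toy values (made up).
pts <- classify_volcano(pvalue = c(0.001, 0.2, 0.01, 0.04),
                        log2FC = c(1.5, 2.0, -1.3, 0.2))

# Points labelled "up" or "down" are the ones that would be drawn in red.
highlight <- pts$status != "not relevant"
```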

IMG8

Multiple sequence alignment

Required R packages:

- DECIPHER
- msa
- odseq
- taxizedb

Script: Align_shade.R. Before running it, edit the script to set the amino acid or amino acids to focus on in aa and the identifier of the orthogroup to align in og_id. Example:

aa="KIN"
og_id="238395at7742"

Run the script. The program takes the best sequences from each class and creates four files; the multiple alignment is the PDF file.

IMG7
