AA_Comp (Amino Acid Composition) is an analysis of the amino acid content of orthologous proteins from vertebrate species. The working hypothesis is that significant differences in amino acid composition among orthologs may be correlated with functional or structural changes that occurred in different groups of vertebrates after their divergence. This approach could provide a way to identify new protein functions relevant to the evolution of amniotes. We considered three classes of vertebrates: Actinopterygii, Sauropsida and Mammalia. The scripts can also be used with other classifications.
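To make the quantity concrete: the composition of a protein can be expressed as the percentage of each of the 20 standard amino acids among its residues. The function below is a minimal base-R sketch of that calculation. The name aa_composition and the choice to ignore non-standard symbols (gaps, X, stop codons) are assumptions for illustration, not the project's own code.

```r
# Hypothetical helper (not part of AA_Comp): percentage of each of the
# 20 standard amino acids in a single protein sequence.
aa_composition <- function(seq) {
  standard <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
  residues <- strsplit(toupper(seq), "")[[1]]
  # Assumption: non-standard symbols (gaps, X, '*') are ignored
  residues <- residues[residues %in% standard]
  # A factor with fixed levels keeps absent amino acids with count 0
  counts <- table(factor(residues, levels = standard))
  setNames(100 * as.vector(counts) / length(residues), standard)
}

comp <- aa_composition("MKKA-X")
comp[c("M", "K", "A")]  # M = 25, K = 50, A = 25
```

Since Biostrings is already a dependency, its alphabetFrequency() function can produce the same kind of counts for a whole AAStringSet at once.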
AA_Comp consists of two parts. The first part covers 1) acquisition of protein sequences and related information from the OrthoDB database, 2) preparation and cleaning of the dataset, and 3) statistical analysis. The second part covers visualization of the results through several types of plots.
Required R packages:
- rjson
- httr
- Biostrings
- dplyr
$ Rscript Get_universal_singlecopy_orthogroups.R
The script retrieves all orthogroups from the OrthoDB server through its API.
Parameters: vertebrate level, single-copy genes, orthogroups present in at least 90% of the species.
The script creates a folder named data containing three .fa files with FASTA sequences. If a directory named data already exists, it must be renamed first.
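The query step can be pictured as building a single search URL from the parameters listed above. The sketch below only assembles that URL and does not contact the server. The base URL and the parameter names level, universal and singlecopy are assumptions based on our understanding of the OrthoDB REST API, and the singlecopy value is likewise assumed; the function name orthodb_search_url is hypothetical.

```r
# Hypothetical sketch: build an OrthoDB search URL for the filters above.
# ASSUMPTIONS: the base URL and the parameter names 'level', 'universal'
# and 'singlecopy' follow the OrthoDB REST API as we understand it;
# check the current OrthoDB documentation before relying on them.
orthodb_search_url <- function(level, universal, singlecopy,
                               base = "https://data.orthodb.org/current/search") {
  query <- c(level = level, universal = universal, singlecopy = singlecopy)
  # Produces "name=value" pairs joined by "&"
  paste0(base, "?", paste(names(query), query, sep = "=", collapse = "&"))
}

# 7742 is the NCBI taxonomy ID of Vertebrata (cf. the "at7742" suffix of the
# orthogroup IDs); universal = 0.9 encodes "present in 90% of the species".
# singlecopy = 0.9 is an assumed value, not taken from the project script.
url <- orthodb_search_url(level = 7742, universal = 0.9, singlecopy = 0.9)
```

The request itself could then be sent with httr::GET(url) and the response body parsed with rjson::fromJSON(httr::content(resp, as = "text")); the structure of the returned JSON is not described here.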
$ Rscript AA_Comp_Analysis.R
Output: AA_Comp_nofilter, in which the downloaded data are organized, and AA_Comp, the filtered dataframe.
Understanding the dataset AA_Comp
The same script, AA_Comp_Analysis.R, also performs the statistical analysis. It creates a new dataframe, Res, with the p-values (t-test) and log2 fold changes (log2FC) obtained from pairwise comparisons between the three classes.
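A minimal sketch of the pairwise comparisons for one orthogroup and one amino acid follows. It assumes a long-format input with one row per species and hypothetical columns class and freq (the percentage of that amino acid). It uses R's default t.test (Welch) and defines log2FC as log2 of the ratio of class means; both choices are assumptions and may differ from AA_Comp_Analysis.R. The values in toy are made up.

```r
# Hypothetical sketch: pairwise t-test and log2 fold change between classes
# for ONE orthogroup and ONE amino acid. Assumed input: one row per species,
# with columns 'class' and 'freq' (percentage of that amino acid).
pairwise_tests <- function(df,
                           classes = c("Actinopterygii", "Sauropsida", "Mammalia")) {
  pairs <- combn(classes, 2)  # 2 x 3 matrix: one column per comparison
  rows <- lapply(seq_len(ncol(pairs)), function(i) {
    a <- pairs[1, i]
    b <- pairs[2, i]
    x <- df$freq[df$class == a]
    y <- df$freq[df$class == b]
    data.frame(
      comparison = paste(a, "vs", b),
      pvalue     = t.test(x, y)$p.value,    # Welch t-test (R's default)
      log2FC     = log2(mean(x) / mean(y))  # assumed definition: ratio of means
    )
  })
  do.call(rbind, rows)
}

# Toy values, for illustration only
toy <- data.frame(
  class = rep(c("Actinopterygii", "Sauropsida", "Mammalia"), each = 3),
  freq  = c(2.0, 2.1, 2.2,  4.0, 4.2, 4.4,  4.1, 4.2, 4.3)
)
res <- pairwise_tests(toy)
```

Repeating this over every orthogroup and amino acid would give a table with one row per orthogroup, amino acid and comparison, similar in spirit to Res.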
Required R packages:
- ggplot2
- dplyr
$ Rscript Utilities/Barplot.R csv_name
The script creates three bar plots with vertical bars, one for each pairwise comparison. For every amino acid (x axis), the bar shows the number of orthogroups (y axis) with relevant t-test and log2FC values.
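The bar heights amount to a count of orthogroups that pass both a p-value cutoff and a log2FC cutoff, grouped by comparison and amino acid. The sketch below shows that counting step in base R. The function name count_relevant, the column names and the cutoffs (p < 0.05, |log2FC| > 0.5) are placeholders, not the values used by Barplot.R, and the rows of res are toy values.

```r
# Hypothetical sketch: count, for each comparison and amino acid, the
# orthogroups that pass both cutoffs. Thresholds are placeholders.
count_relevant <- function(res, p_cut = 0.05, fc_cut = 0.5) {
  hit <- res[res$pvalue < p_cut & abs(res$log2FC) > fc_cut, ]
  # Factor levels come from the full table so that empty bars count as 0
  counts <- table(
    comparison = factor(hit$comparison, levels = unique(res$comparison)),
    aa         = factor(hit$aa, levels = unique(res$aa))
  )
  as.data.frame(counts)  # columns: comparison, aa, Freq
}

# Toy input, for illustration only
res <- data.frame(
  og_id      = c("og1", "og2", "og3", "og1"),
  comparison = c("A vs S", "A vs S", "A vs S", "S vs M"),
  aa         = c("P", "P", "K", "P"),
  pvalue     = c(0.01, 0.20, 0.001, 0.03),
  log2FC     = c(1.2, 2.0, -0.8, 0.1)
)
counts <- count_relevant(res)
```

With ggplot2, one plot per comparison could then be drawn from such a table, for example with ggplot(counts, aes(aa, Freq)) + geom_col().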
Required R packages:
- ggplot2
- dplyr
- gridExtra
- grid
$ Rscript Utilities/Boxplot.R csv_name pub_og_id aa
Example:
$ Rscript Utilities/Boxplot.R AA_Comp.csv 193525at7742 P
If pub_og_id and aa are not specified, the script analyzes all orthogroups for each amino acid.
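For a single orthogroup and amino acid, the plot compares the distribution of that amino acid's percentage across the classes. The sketch below uses base graphics rather than ggplot2 and assumes a hypothetical long-format table with columns og_id, class, aa and freq. The function name plot_og_aa, the ID "og_example" and the values are all made up for illustration.

```r
# Hypothetical sketch (base graphics instead of ggplot2): distribution of one
# amino acid's percentage across classes for one orthogroup, saved as a PDF.
plot_og_aa <- function(df, og, aa, file = tempfile(fileext = ".pdf")) {
  sub <- df[df$og_id == og & df$aa == aa, ]
  pdf(file)
  boxplot(freq ~ class, data = sub, main = og,
          xlab = "Class", ylab = paste0(aa, " (%)"))
  dev.off()
  file  # path of the written PDF
}

# Toy values, for illustration only
toy <- data.frame(
  og_id = "og_example",
  aa    = "P",
  class = rep(c("Actinopterygii", "Sauropsida", "Mammalia"), each = 4),
  freq  = c(5.1, 5.3, 4.9, 5.0,  6.2, 6.0, 6.4, 6.1,  6.8, 7.0, 6.9, 7.1)
)
out <- plot_og_aa(toy, "og_example", "P")
```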
Required R packages:
- EnhancedVolcano
- Biostrings
- dplyr
$ Rscript Utilities/VolcanoPlot.R csv_name
The script creates PDF files, each containing the pairwise-comparison plots for a single amino acid. Each plot combines the t-test p-value (y axis) and the log2FC (x axis); orthogroups with relevant results fall in the outer regions on either side of the plot and are highlighted as red dots.
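By default, EnhancedVolcano plots -log10 of the p-value and colours points into four groups, with red marking those that pass both the p-value and the log2FC cutoffs. The function below is a base-R sketch of that classification rule. The name volcano_category and the cutoffs are placeholders, not the settings of VolcanoPlot.R, and the rows of res are toy values.

```r
# Hypothetical sketch: decide the colour category of each point in a
# volcano plot. p_cut and fc_cut are placeholders.
volcano_category <- function(log2FC, pvalue, p_cut = 0.05, fc_cut = 1) {
  pass_p  <- pvalue < p_cut
  pass_fc <- abs(log2FC) > fc_cut
  ifelse(pass_p & pass_fc, "p-value and log2FC",  # shown as red dots
         ifelse(pass_p, "p-value only",
                ifelse(pass_fc, "log2FC only", "NS")))
}

# Toy input, for illustration only
res <- data.frame(log2FC = c(1.5, -2.0, 0.3, 1.8),
                  pvalue = c(0.001, 0.01, 0.001, 0.4))
res$neg_log10_p <- -log10(res$pvalue)  # the usual volcano y axis
res$category <- volcano_category(res$log2FC, res$pvalue)
```

With EnhancedVolcano itself, the equivalent call would look roughly like EnhancedVolcano(res, lab = rownames(res), x = "log2FC", y = "pvalue", pCutoff = 0.05, FCcutoff = 1).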
Required R packages:
- DECIPHER
- msa
- odseq
- taxizedb
Before sourcing Align_shade.R, edit the script: assign the amino acid(s) to focus on to aa and the identifier of the orthogroup to align to og_id. Example:
aa="KIN"
og_id="238395at7742"
Run the script. It takes the best sequences from each class and creates four files; the multiple alignment is the PDF file.