PhyloProfile

Bioconductor install with bioconda published in: Bioinformatics presented at: GCB2018 poster at: SMBE2019 BioC status

Click here for the full PDF version of the BOSC2017 poster

PhyloProfile is an R package that comes together with a Shiny-based tool for integrating, visualizing and exploring multi-layered phylogenetic profiles.

Alongside the presence/absence patterns of orthologs across large taxon collections, PhyloProfile allows the integration of any two additional information layers. These complementary data, like sequence similarity between orthologs, similarities in their domain architecture, or differences in functional annotations enable a more informed interpretation of phylogenetic profiles.

By utilizing the NCBI taxonomy, PhyloProfile can dynamically collapse taxa into higher systematic groups. This enables rapidly changing the resolution from the comparative analyses of proteins in individual species to that of entire kingdoms or even domains without changes to the input data.

PhyloProfile furthermore allows for a dynamic filtering of profiles – taking the taxonomic distribution and the additional information layers into account. This, along with functions to estimate the age of genes and core gene sets facilitates the exploration and analysis of large phylogenetic profiles.

Take a look at the functionality of PhyloProfile and explore the installation-free online version to learn more.

Installation & Usage

PhyloProfile requires the latest version of R (check for required R version here). Please install or update R on your computer before continuing.

Then start R to install and use PhyloProfile.

Using BiocManager

PhyloProfile is available from Bioconductor (requires Bioconductor version ≥ 3.14). To install PhyloProfile, start R and enter:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("PhyloProfile")

To install the development version of PhyloProfile, please use the devel version of Bioconductor:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(version='devel')
BiocManager::install("PhyloProfile")

Alternatively, install the development version of PhyloProfile from our GitHub repository using devtools.

Using devtools

The development version of PhyloProfile can be installed from this GitHub repository using devtools:

if (!requireNamespace("devtools"))
    install.packages("devtools")
devtools::install_github("BIONF/PhyloProfile", INSTALL_opts = c('--no-lock'), build_vignettes = TRUE)

Using Conda

PhyloProfile can also be installed within a conda environment. First, add bioconda to the list of your conda channels:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

then install PhyloProfile using the standard conda install command:

conda install bioconductor-phyloprofile

This installation step can take a while regardless of the method used, as all necessary dependencies are downloaded and installed automatically. (Note: depending on your system, the dependency installation sometimes fails. In that case, please check the console log for the corresponding error messages.)

Start PhyloProfile's Shiny app

From the R terminal, enter:

library(PhyloProfile)
runPhyloProfile()

Check your web browser, PhyloProfile will be displayed there ;-) On the first run, the tool will download pre-calculated taxonomy data. Please be patient until you see a message asking you to upload input files.

Please check our detailed instructions if you encounter any problems while installing and starting the program.

Input Data

PhyloProfile can read a number of different input files, including multi-FASTA files, regular tab-separated files, OMA ID lists or OrthoXML files. The additional information layers can be embedded in the OrthoXML or be provided separately as tab-separated files.

All supported input formats are described in the section Input Data of the PhyloProfile Wiki.

Walkthrough & Examples

Read the walkthrough slides to explore the functionality of the PhyloProfile GUI.

Check the vignette to learn how to use PhyloProfile's functions in some specific use cases:

browseVignettes("PhyloProfile")

Bugs

Bug reports, comments and suggestions are highly appreciated. Please open an issue on GitHub or get in touch via email.

Acknowledgements

We would like to thank

  1. Bastian for the great initial idea and his kind support,
  2. members of the Ebersberger group for many valuable suggestions and ... bug reports :)

Contributors

License

This tool is released under the MIT license.

How-To Cite

Ngoc-Vinh Tran, Bastian Greshake Tzovaras, Ingo Ebersberger, PhyloProfile: dynamic visualization and exploration of multi-layered phylogenetic profiles, Bioinformatics, Volume 34, Issue 17, 01 September 2018, Pages 3041–3043, https://doi.org/10.1093/bioinformatics/bty225

or use the citation function in R to get the reference directly in BibTeX or LaTeX format:

citation("PhyloProfile")

Contact

Vinh Tran [email protected]

phyloprofile's People

Contributors

jwokaty, nturaga, trvinh

phyloprofile's Issues

Using the GH Wiki-pages

I think that we should put the wiki pages to some use (probably after #31 is solved). This could be useful to declutter the README.md, which otherwise might become too overwhelming for people reading about PhyloProfile for the first time.

Things that could go there:

  • More detailed how-tos for the different functions
  • Explanations of the file formats
  • the FAQ that's now inside PhyloProfile

Deactivating Auto update plot is not working

Whether the check box of Auto update plot is checked or not does not seem to have an effect, because the plot is always updating when the user makes any changes.

Version

Observed in version 0.3.0/0.3.2

Type of Issue

Bug

Logfile

There is no error or other message in the R log when checking or unchecking the box

Provide some test data

Maybe create a subfolder testdata where there's a small data set that can be used to play around with the tool after downloading & installing it? Could be the same as for a potential online demo.

Reconfigure as a package?

Reconfiguring the code as an R package would simplify installation, especially the handling of dependencies, compared with requiring users to clone the repository with Git and then run the setup script in R.

Description

Basically same as above, just caught my attention from looking through the installation instructions in the README.

Type of Issue

feature-request

Related Issue(s)

This might also help to organize the tests for #60.

improve documentation

Description

  • new walk-through slide
  • new performance test

Type of Issue

Related Issue(s)

Logfile

Change theme of histogram(s)❓

The standard ggplot2 theme looks a bit ugly for the FAS score histogram. Maybe one could change it to theme_minimal or comparable?

improve input / output options

Description

  • METADATA for input (e.g. refspec, number of genes for preData, plot size, clustering...)
  • input file contains more than 5 columns (users will choose 2 for visualising)
  • additional input option for OMA (either list of protein IDs or list of OMA groups from file or copy/paste)
  • option to input oneSeq folder (*.phyloprofile, *.extended.fa, *. *.domains)
  • upload 2 domains files and add switch button to choose which will be used for plotting
  • use shinyFile for input domain and fasta folder
  • output taxonomy tree together with domains of selected taxon (like DoMosaics)

Type of Issue

Related Issue(s)

Logfile

Long format input too large for upload

Description

Hi there,
I have an ortholog table from OrthoFinder with 144 species, which makes it quite large. I have converted the OrthoFinder groups into the long format, which looks like this:

geneID  ncbiID  orthoID FAS     traceability
OG0000000       ncbi81532       ACANSP|TRINITY_DN10128_c0_g3__TRINITY_DN10128_c0_g3_i1__g.39236__m.39236        NA      NA
OG0000000       ncbi81532       ACANSP|TRINITY_DN10739_c0_g2__TRINITY_DN10739_c0_g2_i1__g.43040__m.43040        NA      NA
OG0000000       ncbi81532       ACANSP|TRINITY_DN10739_c0_g2__TRINITY_DN10739_c0_g2_i2__g.43043__m.43043        NA      NA
OG0000000       ncbi81532       ACANSP|TRINITY_DN10797_c0_g1__TRINITY_DN10797_c0_g1_i1__g.43004__m.43004        NA      NA
OG0000000       ncbi81532       ACANSP|TRINITY_DN1109_c0_g1__TRINITY_DN1109_c0_g1_i1__g.3023__m.3023    NA      NA

However, when I try to upload this into the PhyloProfile GUI, it complains that the file is too large. It is indeed ~440 MB. Can the file size limit be increased? Is there another way to get the OrthoFinder output into PhyloProfile?
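One possible workaround, sketched under assumptions: Shiny apps read their upload limit from the `shiny.maxRequestSize` option (a size in bytes). Whether `runPhyloProfile()` sets or overrides this option internally is not confirmed here, so the snippet is an experiment to try rather than a documented fix. The 500 MB value is purely illustrative.

```r
# Assumption: the upload limit is governed by Shiny's `shiny.maxRequestSize`
# option, given in bytes. The 500 MB value is an arbitrary example.
max_upload_mb <- 500
options(shiny.maxRequestSize = max_upload_mb * 1024^2)

# Then start the app as usual (not run here):
# library(PhyloProfile)
# runPhyloProfile()
```

If the app resets the option on start-up, this will have no effect and the limit would have to be changed inside the package.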

Type of Issue

Potential bug, or feature request.

Related Issue(s)

Logfile

Making a first release

With the submission of PhyloProfile we should probably start using some kind of versioning-system. This is partly inspired by the milestone that our PR to taxize got.

GitHub makes that easy with their tags/releases feature. A new release for PhyloProfile can be created here.

The suggested guideline is to use semantic versioning, with MAJOR.MINOR.PATCH being the numbering system. So the current state could be v1.0.0 or v1.0.0-beta, depending on how conservative one wants to be.

color of domain plot

Description

The same domain should have the same color in both proteins to make the comparison easier.

Type of Issue

Improvement

Related Issue(s)

Logfile

OMA Standalone OrthoXML error

Hello dear developers,

I encountered a problem loading the OrthoXML file of an OMA Standalone analysis. As far as I know, OMA doesn't generate any scores for orthologs (similar to the FAS score from hamstr/fdog), but PhyloProfile doesn't allow uploading an OrthoXML file without any variables.
Is it somehow possible to use files without scores for visualization in PhyloProfile?

Thank you!

missing taxa when downloading customized profile - is it ok?

Description

The downloaded customized profile can contain fewer taxa than the original data. This could lead to a wrong calculation of the percentage of present taxa in each superkingdom, simply because the number of included taxa has changed.

Type of Issue

Related Issue(s)

Logfile

plot features

Description

PLOT CONFIGURATION

  • plot configuration as yml file (if not, use default values)

DOMAIN PLOT

  • desc for domains (with URL)
  • show domain plot directly on main/customized profile tab
  • option to filter out common domains; or dropdown menu for selecting domains of interest
  • simplify text (seq ID, feature ID)
  • should be more appealing

MAIN PLOT

  • add lines to split taxon groups (e.g. euk, bac, archaea), user can choose the resolution (e.g. phylum/kingdom/superkingdom,..)
  • highlight a group of genes (e.g. gene cluster)
  • auto adjust plot area to number of genes and species
  • link seq ID with UNIPROT or NCBI if possible
  • set number of proteins to be displayed should be in the input page
  • option to lock clustered profile from the filtering (i.e. profile after filtering will not change its order)
  • option to show all taxa in the input independent on the current genes

CUSTOMIZED PLOT

  • [option to] expand the selectize input for gene list

DETAILED PLOT

  • Simplify seq ID by showing Species + gene ID (not the default hamstr ID)
  • Make sure to sort the Y-axis (gene IDs)
  • Display systematic string for the two taxa (?)

Type of Issue

Related Issue(s)

Logfile

Idea: Customized Profile, Filter by Rank

So far the Customized Profile only allows filtering on the taxonomic rank given for the main plot. Which means if you're on the species level and have 400 taxa with 100 fungi you can't easily filter for those 100 fungi, as you'd need to give all 100 fungal species manually in the customized profile.

It would be great if the Customized Profile allowed selecting a taxonomic rank and then sub-selecting within this rank (e.g. Rank: Kingdom, Supertaxa: Fungi).
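The requested behaviour can be sketched in base R as follows. The `taxonomy` table, its column names and the helper `select_by_rank()` are hypothetical and only illustrate the idea of resolving one supertaxon at a chosen rank into the list of species it contains; they are not PhyloProfile's data structures.

```r
# Hypothetical taxonomy table: one row per species, one column per rank.
taxonomy <- data.frame(
  species = c("Saccharomyces cerevisiae", "Neurospora crassa",
              "Homo sapiens", "Escherichia coli"),
  kingdom = c("Fungi", "Fungi", "Metazoa", NA),
  stringsAsFactors = FALSE
)

# Return all species that belong to `supertaxon` at the given `rank`.
select_by_rank <- function(taxonomy, rank, supertaxon) {
  hits <- !is.na(taxonomy[[rank]]) & taxonomy[[rank]] == supertaxon
  taxonomy$species[hits]
}

fungi <- select_by_rank(taxonomy, "kingdom", "Fungi")
```

With such a lookup, the 100 fungal species from the example above would not have to be entered by hand.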

write tests for new functions

Description

Copied from @gedankenstuecke email: We really should start to write some tests for PhyloProfile to make sure that all functions do put out what they’re supposed to. If we get started with this a tiny bit before the sprint adding more tests could make a good sprint task. :-)

Type of Issue

To make the distribution easier and this tool ... cooler :D

Related Issue(s)

Logfile

Comments on branch `v0.3.0-beta`

Feedback on branch for Mozilla OL

Description

  • I'm concerned a bit about the python system calls. Would using rPython be better in that it would handle checking for python being installed and possibly avoid other issues with using the system() function to call python from the command line?
  • Agree with issue #60 on adding tests for functions. This could expand to using the demo data files for testing (though re-iterating #63, I think this works a bit more seamlessly if the project follows an R package structure).
  • Add documentation to specify the inputs/outputs of functions, especially in functions.R.
  • Check style on code in functions.R. My linter produced several reasonable complaints about the formatting here and variable names.

Type of Issue

feature request

Related Issue(s)

#60 to add tests.
#63 to make the project into an R package.

issue with checkNewick function

Description

checkNewick function should be checked.

  • The check (number_of_commas) = (number_of_open_parentheses + 1) is not always true. For example, ((a,b),c); is also a correct Newick tree, but there (number_of_commas) = (number_of_open_parentheses).
  • missingTaxa returns an empty list, which produces the error message " not exist in main input file!"
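The first point can be illustrated with a small base R sketch. The helper `newick_counts()` is hypothetical (it is not the package's `checkNewick`); it counts commas and parentheses and tests whether the parentheses are balanced, which holds for both tree shapes, whereas the comma rule holds for only one of them.

```r
# Hypothetical helper: basic structural statistics of a Newick string.
newick_counts <- function(tree) {
  chars <- strsplit(tree, "")[[1]]
  # Running nesting depth: +1 for "(", -1 for ")".
  depth <- cumsum((chars == "(") - (chars == ")"))
  list(
    commas     = sum(chars == ","),
    open       = sum(chars == "("),
    close      = sum(chars == ")"),
    balanced   = all(depth >= 0) && depth[length(depth)] == 0,
    terminated = chars[length(chars)] == ";"
  )
}

nested <- newick_counts("((a,b),c);")  # commas == open
flat   <- newick_counts("(a,b,c);")    # commas == open + 1
```

Both trees are balanced and terminated by a semicolon, so a check based on balanced parentheses accepts them, while the rule commas = open + 1 rejects the nested one.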

Type of Issue

Bug

Related Issue(s)

Logfile

Bug: Could not find file server.R to run

Description

After all libraries were downloaded and installed, R couldn't find the file server.R which is included in the package.

Solution:

  • Check the default directory
  • Change the path so that it points to your package

Note: I tested this package on a Windows system

Type of Issue

Bug

Related Issue(s)

This could be related to #63

Logfile

1

demo files for OMA Standalone OrthoXML-converter are gone.

Whoops, these should be in some place where people can see them right away. I'd still be in favor of having a demo folder with some small example files to that end. Especially as "🐵 see, 🐵 do" is often the best way to learn how to run something 😂

rooting user input tree

Description

Currently the user input taxonomy tree has to be rooted (and sorted) before it is uploaded into the tool. Selecting (or changing) the reference species has no effect on the plot.

Type of Issue

Make it possible to use the chosen reference species to root the input taxonomy tree if it is not rooted with that species.

Related Issue(s)

Logfile

Gene age estimation plots: Wrong ordering

It appears that the gene age distribution plot sometimes draws the bars in the wrong order. The 161 genes on the left belong to the Ascomycota, and clicking on that range correctly lists the Ascomycota genes. Unfortunately, the underlying stacked bar plot shows the color for the LECA genes (blue) instead of red for the Ascomycota.

Make binary colors look nicer

Right now the dots are getting a rather ugly brown color if you don't upload any data for var1. Guess it would be nicer if this was either a nicer color (e.g. the blue used otherwise) or user selectable?

check the user-defined taxonomy file before processing

Description

An incorrect user-defined taxonomy file (newTaxa.txt) can cause many hidden problems by corrupting the taxonomy matrix file (taxonomyMatrix.txt; in some cases idList.txt and rankList.txt are affected as well).
These problems occur mostly because of duplicated IDs or taxon names between newTaxa.txt and idList.txt, rankList.txt or taxonomyMatrix.txt.

So, this newTaxa.txt file should be checked before being processed for:

  • duplicated IDs with existing taxonomy files
  • duplicated names with existing taxonomy files
  • validity of new taxon IDs (greater than the current highest ncbi taxon ID)
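A minimal sketch of these three checks in base R is given below. It assumes that both the new and the existing taxa are available as data frames with the columns `ncbiID` and `fullName`; these column names, the example data and the helper `check_new_taxa()` are illustrative assumptions, not the package's actual implementation.

```r
# Hypothetical validity check for user-defined taxa against existing taxa.
check_new_taxa <- function(new_taxa, existing) {
  list(
    # IDs that already occur in the existing taxonomy files
    duplicated_ids = intersect(new_taxa$ncbiID, existing$ncbiID),
    # names that already occur in the existing taxonomy files
    duplicated_names = intersect(new_taxa$fullName, existing$fullName),
    # new IDs that are not greater than the current highest ID
    ids_not_above_max = new_taxa$ncbiID[new_taxa$ncbiID <= max(existing$ncbiID)]
  )
}

existing <- data.frame(
  ncbiID = c(9606, 10090),
  fullName = c("Homo sapiens", "Mus musculus"),
  stringsAsFactors = FALSE
)
new_taxa <- data.frame(
  ncbiID = c(10090, 99999999, 5),
  fullName = c("Mus musculus", "New species A", "New species B"),
  stringsAsFactors = FALSE
)

report <- check_new_taxa(new_taxa, existing)
```

A file would pass only if all three elements of `report` are empty; here the first and the third entry of `new_taxa` are flagged.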

It would also be good to implement a progress indicator for this process, so that the user can estimate how long the job will take.

Type of Issue

Improve the validity check

Related Issue(s)

Logfile

Add histogram for taxon presence distribution❓

First off: well done on adding the histogram on the FAS scores, it makes it much easier to see how different cutoffs will impact how much data is "hidden" in the end. Do you think it would be useful to add the same for the % of taxa being present in a given supertaxon? As you offer to filter according to that it might be useful as well. 😄

Readme-Link

In PhyloProfile we now link out to the GitHub page that renders the README.md. While this is a nice hack to show the README to people, it has the drawback that it takes people away from the app, which is unwanted, especially if they are already actively using it.

We should fix this by either:

  • making sure the link opens in a new tab/window, or
  • rendering the whole thing inside the PhyloProfile app, without losing the main navigation and all progress/analyses made.

The first is probably easier; the second would be nicer and could be used for rendering our GH wiki as well.

X Axis labels horizontal adjustment

Description

With long X axis labels and any rotation angle, the leftmost X axis label can extend beyond the plotting space, and the plotting function then cuts off the label. If you set the angle to 90° to avoid this issue, the horizontal adjustment detaches the label from its corresponding data column.

There are two suggestions:

  1. Less favorable: Increase the plotting space to the left by 0.5 to 1 px by overriding the margin.
  2. More favorable: Set the hjust value to 0.5 or 0 when setting the angle to 90°, whatever would centralize the label. I am aware that conditional formatting like this may not be straightforward with R plotting functions.
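The second suggestion can be sketched as a small helper that picks the justification from the angle. In ggplot2, `element_text()` accepts `angle`, `hjust` and `vjust`; a commonly used combination for 90° labels is `hjust = 1, vjust = 0.5`, where `vjust = 0.5` centres the rotated label on its tick. The helper `x_label_just()` and the values for other angles are assumptions for illustration, not the package's current behaviour.

```r
# Hypothetical helper: choose text justification depending on the label angle.
x_label_just <- function(angle) {
  if (angle == 90) {
    # vertical labels: end of the text at the axis, centred on the tick
    list(angle = 90, hjust = 1, vjust = 0.5)
  } else if (angle == 0) {
    # horizontal labels: centred below the tick
    list(angle = 0, hjust = 0.5, vjust = 1)
  } else {
    # slanted labels: end of the text anchored at the tick
    list(angle = angle, hjust = 1, vjust = 1)
  }
}

settings <- x_label_just(90)

# Applying it to an existing ggplot object `p` (requires ggplot2; not run here):
# p + ggplot2::theme(axis.text.x = do.call(ggplot2::element_text, settings))
```

Keeping the conditional logic in a plain function avoids conditional formatting inside the plotting call itself.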

Type of Issue

Feature request.

Related Issue(s)

None that I am aware of.

Logfile

Observed in the Anaconda package:

Name                            Version           Build            Channel
bioconductor-phyloprofile       1.4.9             r40hdfd78af_0    bioconda

Evidence: screenshot "horizontaladjustmentrequest" (image not included)

convergence point

Description

  • Identifying the time point when an ancestral gene was duplicated?

Type of Issue

Related Issue(s)

Logfile

Profiles clustering

So far, the clustering of profiles is done before any of the filters (% present, variable 1, variable 2) are applied. Furthermore, it always clusters on the binary presence/absence of species, even if the subsequent display is done on the grouping on a higher taxonomic level.

It would be awesome if the clustering were done

  1. on the filtered data, after individual taxon presences are removed,
  2. on the grouped level, using the % of species present as the values for the distance matrix.
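The two points can be sketched with base R's `dist()` and `hclust()`. The matrix of per-supertaxon fractions, the cutoff `min_fraction` and the choice of Euclidean distance with average linkage are illustrative assumptions; the issue does not prescribe a particular distance or linkage.

```r
# Hypothetical input: fraction of species per supertaxon in which each gene is present.
frac_present <- matrix(
  c(1.0, 0.9, 0.0,
    0.9, 1.0, 0.1,
    0.0, 0.1, 1.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("Fungi", "Metazoa", "Bacteria"))
)

# 1. Apply the filter first: fractions below the cutoff count as absent.
min_fraction <- 0.2
filtered <- frac_present
filtered[filtered < min_fraction] <- 0

# 2. Cluster on the grouped, filtered values instead of binary species presence.
hc <- hclust(dist(filtered), method = "average")
clustered_genes <- hc$labels[hc$order]  # gene order to use for the plot
groups <- cutree(hc, k = 2)
```

Because the distance matrix is computed from exactly the values that are displayed, the resulting order corresponds to what the user sees.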

Otherwise the resulting clusterings look very strange if you look at filtered & grouped profiles, as the clustering does not correspond to what you actually observe.

Filter for number of co-orthologs

Description

It would be nice to have an additional filter so that the user is able to only select orthologous groups without co-orthologs. This could also be a numerical filter with slider. The user could also select a maximum number of co-orthologs to be present.
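Such a filter could look roughly like the following base R sketch. It assumes a long-format profile with the columns `geneID`, `ncbiID` and `orthoID` (one row per ortholog); the helper `filter_by_coorthologs()` and its parameter `max_per_taxon` are hypothetical names. A value of 1 keeps only groups without co-orthologs.

```r
# Hypothetical filter: keep only groups whose largest number of orthologs
# in any single taxon does not exceed `max_per_taxon`.
filter_by_coorthologs <- function(profile, max_per_taxon = 1) {
  counts <- table(profile$geneID, profile$ncbiID)  # orthologs per gene and taxon
  worst <- apply(counts, 1, max)                   # largest count per gene
  keep <- names(worst)[worst <= max_per_taxon]
  profile[profile$geneID %in% keep, , drop = FALSE]
}

profile <- data.frame(
  geneID  = c("OG1", "OG1", "OG1", "OG2", "OG2"),
  ncbiID  = c("ncbi1", "ncbi1", "ncbi2", "ncbi1", "ncbi2"),
  orthoID = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

no_coorthologs <- filter_by_coorthologs(profile, max_per_taxon = 1)
```

In the app, `max_per_taxon` would be the value of the proposed slider.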

Type of Issue

Feature request

Provide scripts for input conversion

So far the input data needs to be in a specific matrix format (see data/demo) in order to be compatible with PhyloProfile.

Most people will not have their data ready in this format. In order to make it easier it would be cool if there was a way to automate the data preprocessing, either by including the corresponding conversion inside PhyloProfile or by providing a script that takes long format data and converts them into the appropriate matrices.
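As a rough sketch of such a conversion, the following base R function turns long-format data into a gene-by-taxon presence/absence matrix. The column names `geneID` and `ncbiID` follow the long format shown in the upload issue above; the function name is hypothetical, and the exact layout expected in data/demo (for example additional score columns) is not reproduced here.

```r
# Hypothetical converter: long format -> presence/absence matrix (genes x taxa).
long_to_presence_matrix <- function(long) {
  gene_ids <- as.character(long$geneID)
  taxon_ids <- as.character(long$ncbiID)
  genes <- unique(gene_ids)
  taxa <- unique(taxon_ids)
  m <- matrix(0L, nrow = length(genes), ncol = length(taxa),
              dimnames = list(genes, taxa))
  # A two-column character matrix indexes cells by row and column name.
  m[cbind(gene_ids, taxon_ids)] <- 1L
  m
}

long <- data.frame(
  geneID  = c("OG1", "OG1", "OG2"),
  ncbiID  = c("ncbi9606", "ncbi10090", "ncbi9606"),
  orthoID = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

presence <- long_to_presence_matrix(long)
```

The same function could serve either as an external script or as a preprocessing step inside the app.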

  • decide whether to provide external conversion or do it inside PhyloProfile
  • implement the preferred solution 😉

Integrate into a 'Galaxy Interactive Environment'

Description

This is a suggestion for a Mozilla Global Sprint 2018 activity. It would be great to be able to install PhyloProfile as a Galaxy Interactive Environment. That might allow researchers to build PhyloProfile into a larger analysis workflow or automate input data preparation.

There are a few other projects that have integrated Shiny apps into Galaxy that we might be able to build from:

https://github.com/65MO/Galaxy-E
https://github.com/ValentinChCloud/shiny-GIE
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5691353/

Type of Issue

Suggestion & Feature Request

Related Issue(s)

Relates to #63 because it may involve adjusting the installation process.

data filter using uploaded list of genes

Description

Some of the functions (e.g. total number of genes, list of genes to highlight) still use the whole input data instead of only the subset defined by the uploaded list of genes of interest.

wrong information for clicked point

Description

After clustering the profile, the "point info" shows incorrect information for the clicked point.
It seems that only the plot is reordered, but the data frame used for plotting is not.
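If the cause is indeed a mismatch between plot order and data order, one way to keep them in sync is to reorder the data frame with the same gene order that the clustering produced. The sketch below uses made-up data and a positional lookup as an assumption about how the click is resolved; the actual lookup in the app may differ.

```r
# Hypothetical profile data and a gene order as returned by the clustering.
profile <- data.frame(
  geneID = c("g1", "g2", "g3"),
  info = c("info for g1", "info for g2", "info for g3"),
  stringsAsFactors = FALSE
)
clustered_order <- c("g3", "g1", "g2")

# Reorder the data frame itself, not only the plot axis.
profile$geneID <- factor(profile$geneID, levels = clustered_order)
profile_sorted <- profile[order(profile$geneID), ]

# A click on the first plotted gene now maps to the matching row.
clicked_index <- 1
clicked_info <- profile_sorted$info[clicked_index]
```

Looking up the clicked gene by its ID instead of its position would avoid the dependency on row order altogether.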

Type of Issue

a BIG bug 👎

Related Issue(s)

Logfile
