biocore / emperor



Emperor: a tool for the analysis and visualization of large microbial ecology datasets

Home Page: http://biocore.github.io/emperor/

License: Other

Python 34.74% CSS 0.20% JavaScript 63.31% HTML 1.65% Shell 0.11%

microbiome visualization emperor python bioinformatics ordination biplots


emperor's Issues

Sanitize strings in the mapping file

Strings in the mapping file should be sanitized for double and single quotes.

Also the way the mapping file is written into the index.html file should change so lines are not incredibly long when the metadata is rich.
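A minimal sketch, assuming the metadata is embedded in index.html as a JavaScript object literal: `json.dumps` escapes double quotes and backslashes (single quotes are already safe inside a JSON string), and writing one sample per line keeps the lines short. `mapping_to_js` and the `g_mappingFileData` variable name are hypothetical.

```python
import json

def mapping_to_js(rows, var_name="g_mappingFileData"):
    """Serialize mapping-file rows (SampleID first) as a JS object literal.

    One sample per line keeps the lines of index.html short even when the
    metadata is rich; json.dumps escapes double quotes, backslashes and
    control characters, and single quotes are safe inside JSON strings.
    """
    entries = ["  %s: %s" % (json.dumps(row[0]), json.dumps(row[1:])) for row in rows]
    # Guard against a literal "</script>" in the metadata closing the script tag.
    body = ",\n".join(entries).replace("</", "<\\/")
    return "var %s = {\n%s\n};" % (var_name, body)
```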

add: --vectors_path=VECTORS_PATH

Name of the file to save the first difference, or the
root mean square (RMS) of the vectors grouped by the
column used with the --add_vectors function. Note that
this option only works with --add_vectors. The file is
going to be created inside the output_dir and its name
will start with the word 'Vectors'.[default:
vectors_output.txt]

Notes

See issue #8.

Color a category in a metadata column based on another column.

For example, with two columns, project and body site, we may want to color by project but then select project1 and color only those samples by body site. Currently, the only way to do this is to add such a combined column to the mapping file, which is not the most efficient approach.
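Rather than adding a hand-made column to the mapping file, the combined labels could be derived on the fly. A minimal sketch, assuming the mapping file is already parsed into a header list and rows; `derived_color_category` and its parameters are hypothetical names.

```python
def derived_color_category(header, rows, primary, selected, secondary):
    """Return one color label per sample.

    Samples whose `primary` value equals `selected` are labeled by their
    `secondary` value; all other samples keep their `primary` value.
    """
    p = header.index(primary)
    s = header.index(secondary)
    return [
        "%s: %s" % (row[p], row[s]) if row[p] == selected else row[p]
        for row in rows
    ]
```

For example, coloring by project but splitting project1 by body site yields labels such as `project1: gut` next to plain `project2`.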

Ellipsoids by metadata category

It should be possible to draw an ellipsoid for a group of samples, centered at the centroid of all the points and with a volume given by either the standard deviation or the standard error. Some related code already exists in https://github.com/qiime/evident/blob/master/evident/pcoa.py#L48
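A minimal sketch of the centroid and per-axis size, assuming `coords` holds the group's PCoA coordinates as rows. This gives an axis-aligned ellipsoid and is not the evident implementation linked above; `group_ellipsoid` and the `'sem'` method name are hypothetical.

```python
import numpy as np

def group_ellipsoid(coords, method="sdev"):
    """Center and per-axis radii of an axis-aligned ellipsoid for one group.

    coords: (n_samples, n_axes) array of the group's PCoA coordinates.
    method: 'sdev' uses the sample standard deviation on each axis;
            'sem' uses the standard error of the mean (sdev / sqrt(n)).
    """
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    radii = coords.std(axis=0, ddof=1)
    if method == "sem":
        radii = radii / np.sqrt(coords.shape[0])
    elif method != "sdev":
        raise ValueError("method must be 'sdev' or 'sem'")
    return center, radii
```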

add a default value for OUTPUT_DIR parameter

-o OUTPUT_DIR, --output_dir=OUTPUT_DIR
path to the output directory that will contain the
PCoA plot.

./ is fine, but emperor/ might be better.

Axes are misleading

The axes extend far beyond the range of the points, which is misleading. For instance, in the attached image, PC1 explains 15% of the variation, but the points do not cover that range.
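One possible fix, assuming the axis lengths are currently set independently of the data: derive each axis's limits from the plotted coordinates plus a small margin. `axis_extents` and the 10% `padding` default are hypothetical.

```python
import numpy as np

def axis_extents(coords, padding=0.1):
    """Per-axis (min, max) limits that hug the data.

    Each axis spans the range of the plotted points, widened by `padding`
    times that range on both ends so markers are not clipped.
    """
    coords = np.asarray(coords, dtype=float)
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    pad = (hi - lo) * padding
    return lo - pad, hi + pad
```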

save the previous state

It would be good to preserve the hidden/shown state across metadata categories; for example, if you hide something while coloring by visit, you probably want it to stay hidden when you switch to body site. What do you think?

If we go for this, it would be good to have an option that (de)activates this feature.

Add controller to select between discrete vs continuous colors

Right now the colors are based on a heatmap, which works well for a few samples but not for many.
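A sketch of the two modes such a controller could switch between, assuming colors are emitted as hex strings: evenly spaced hues for discrete categories and a two-color linear gradient for numeric ones. Function names and the blue-to-red endpoints are hypothetical.

```python
import colorsys

def discrete_colors(values):
    """Give each distinct value (sorted) its own evenly spaced hue."""
    unique = sorted(set(values))
    palette = {}
    for i, value in enumerate(unique):
        r, g, b = colorsys.hsv_to_rgb(i / len(unique), 0.8, 0.9)
        palette[value] = "#%02x%02x%02x" % (round(r * 255), round(g * 255), round(b * 255))
    return [palette[v] for v in values]

def continuous_colors(values, low=(0, 0, 255), high=(255, 0, 0)):
    """Linearly interpolate between two RGB colors by numeric value."""
    nums = [float(v) for v in values]
    vmin, vmax = min(nums), max(nums)
    span = (vmax - vmin) or 1.0  # all values equal -> everything gets `low`
    out = []
    for x in nums:
        t = (x - vmin) / span
        rgb = tuple(round(a + (b - a) * t) for a, b in zip(low, high))
        out.append("#%02x%02x%02x" % rgb)
    return out
```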

-x needs better explanation

The current help is terrible; we need to add a better explanation and an example.

% explained hidden in y axis

When the file first loads, the y axis sits too high in the window, which hides the number; at least when tested with the HMP data.

add: -a CUSTOM_AXES, --custom_axes=CUSTOM_AXES

This is the category from the metadata mapping file to
use as a custom axis in the plot. For instance, if
there is a pH category and you would like to see the
samples plotted on that axis instead of PC1, PC2,
etc., one can use this option. It is also useful for
plotting time-series data. Note: if there is any non-
numeric data in the column, it will not be plotted
[default: none]
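A minimal sketch of the behavior the help text describes, assuming coordinates are a NumPy array aligned with `sample_ids` and metadata is a dict of dicts. Placing the metadata value as the first axis and shifting the PCs to the right is an assumption about the layout; `custom_axis_coords` is a hypothetical name.

```python
import numpy as np

def custom_axis_coords(sample_ids, coords, metadata, category):
    """Prepend a numeric metadata category as the first plotted axis.

    coords: (n_samples, n_axes) array aligned with sample_ids.
    metadata: dict sample_id -> dict category -> string value.
    Samples whose value is not numeric are dropped (not plotted).
    Returns the kept sample ids and the new coordinate array.
    """
    kept, rows = [], []
    for sid, pcs in zip(sample_ids, coords):
        try:
            value = float(metadata[sid][category])
        except ValueError:
            continue  # non-numeric value: skip this sample
        kept.append(sid)
        rows.append(np.concatenate(([value], pcs)))
    return kept, np.array(rows)
```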

copy_support_files will fail when there are spaces in the paths

I got a traceback when there were spaces in the paths where I was trying to copy the support files.
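If the traceback comes from building a shell command (such as a `cp -R` string) without quoting the paths, which the issue does not confirm, copying with `shutil` avoids the shell entirely. `copy_support_tree` and its `target_name` parameter are hypothetical; `dirs_exist_ok` needs Python 3.8 or later.

```python
import os
import shutil

def copy_support_tree(source_dir, output_dir, target_name="support_files"):
    """Copy the support files into output_dir/target_name and return that path.

    shutil works on the path strings directly, so no shell is involved and
    spaces or quotes in either path need no escaping.
    """
    target = os.path.join(output_dir, target_name)
    shutil.copytree(source_dir, target, dirs_exist_ok=True)
    return target
```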

Fullscreen mode

There should be an option that hides the right sidebar and enters full-screen mode. This would be particularly useful for presentations and demos.

Transparency on a per sample basis

Users should be able to select which samples have their transparency changed. This, in turn, will make the non-transparent objects stand out more.

This feature should be implemented together with issue #3.

We should be able to select different values for different groups in -x

Currently we can use -x "AGE:70" to plot all non-numeric values at point 70, but it would be great to have something like -x "AGE:EXPERIMENT:HMPV13:70" -x "AGE:EXPERIMENT:HMPV35:80", which would plot HMPV13 at 70 and HMPV35 at 80. If the user misses some groups, the error message should say which groups are missing.
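A sketch of the proposed per-group syntax, assuming every -x value uses the four-part COLUMN:GROUP_COLUMN:GROUP:VALUE form and targets the same column. Both helpers are hypothetical; the point is that all missing groups are collected and reported in one error message.

```python
def parse_x_options(values):
    """Turn ['AGE:EXPERIMENT:HMPV13:70', ...] into ('AGE', 'EXPERIMENT', {'HMPV13': 70.0})."""
    fills, target = {}, None
    for value in values:
        column, group_column, group, number = value.split(":")
        target = (column, group_column)
        fills[group] = float(number)
    return target[0], target[1], fills

def fill_non_numeric(header, rows, column, group_column, fills):
    """Replace non-numeric values in `column` with the number for the sample's group.

    Raises ValueError naming every group that has non-numeric values but no -x value.
    """
    col, grp = header.index(column), header.index(group_column)
    missing, filled = set(), []
    for row in rows:
        row = list(row)
        try:
            float(row[col])
        except ValueError:
            if row[grp] in fills:
                row[col] = str(fills[row[grp]])
            else:
                missing.add(row[grp])
        filled.append(row)
    if missing:
        raise ValueError("no -x value given for groups: " + ", ".join(sorted(missing)))
    return filled
```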

Add covariance matrix as an option in --ellipsoid_method

@rob-knight suggested in #39 adding an option to calculate the covariance matrix to determine the dimensions of the ellipsoids when jackknifing.

The covariance matrix should give a tilted ellipsoid, modeling the dataset as multivariate normal: each PC is a variate, and the matrix holds cov(i, j) off the diagonal with var(i) on the diagonal; it is used to calculate the ellipsoid.

@antgonza found an available implementation of this in numpy, see:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html

This gist includes the source code that Ryan Kennedy shared with Rob:
https://gist.github.com/ElDeveloper/31d95bf37ac5aed8070f
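A minimal sketch of the covariance approach with `numpy.cov`: the eigenvectors of the covariance matrix give the tilted directions of the ellipsoid and the square roots of the eigenvalues give the semi-axis lengths. This is not the code in the linked gist; `covariance_ellipsoid` and `n_std` are hypothetical.

```python
import numpy as np

def covariance_ellipsoid(coords, n_std=1.0):
    """Tilted ellipsoid from the covariance matrix of the points.

    coords: (n_points, n_axes) array, one PC per column (each PC a variate).
    Returns the centroid, the semi-axis lengths (n_std standard deviations
    along each principal direction) and the unit directions as columns.
    """
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    cov = np.cov(coords, rowvar=False)  # var(i) on the diagonal, cov(i, j) elsewhere
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix: real eigenvalues, ascending
    radii = n_std * np.sqrt(np.clip(eigvals, 0, None))
    return center, radii, eigvecs
```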

Add jackknifed beta diversity

--ellipsoid_smoothness=ELLIPSOID_SMOOTHNESS
Used only when plotting ellipsoids for jackknifed beta
diversity (i.e. using a directory of coord files
instead of a single coord file). Valid choices are
0-3. A value of 0 produces very coarse "ellipsoids"
but is fast to render. If you encounter a memory error
when generating or displaying the plots, try including
just one metadata column in your plot. If you still
have trouble, reduce the smoothness level to 0.
[default: 1]
--ellipsoid_opacity=ELLIPSOID_OPACITY
Used only when plotting ellipsoids for jackknifed beta
diversity (i.e. using a directory of coord files
instead of a single coord file). The valid range is
between 0-1. 0 produces completely transparent
(invisible) ellipsoids and 1 produces completely
opaque ellipsoids. [default=0.33]
--ellipsoid_method=ELLIPSOID_METHOD
Used only when plotting ellipsoids for jackknifed beta
diversity (i.e. using a directory of coord files
instead of a single coord file). Valid values are
"IQR" and "sdev". [default=IQR]

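For jackknifed input each sample has one coordinate per replicate, so the ellipsoid size per axis comes from the spread of those replicates. A sketch of the two listed methods, assuming the replicate coordinates for one sample are stacked as rows; whether Emperor uses the full IQR or half of it as the radius is an assumption here, as is the population standard deviation for "sdev". `jackknife_radii` is hypothetical.

```python
import numpy as np

def jackknife_radii(replicates, method="IQR"):
    """Per-axis ellipsoid radii for one sample across jackknife replicates.

    replicates: (n_replicates, n_axes) coordinates of the same sample, one row
    per coord file. 'IQR' uses the interquartile range; 'sdev' uses the
    (population) standard deviation.
    """
    replicates = np.asarray(replicates, dtype=float)
    if method == "IQR":
        q75, q25 = np.percentile(replicates, [75, 25], axis=0)
        return q75 - q25
    if method == "sdev":
        return replicates.std(axis=0)
    raise ValueError('method must be "IQR" or "sdev"')
```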
Allow for numeric sort on metadata categories

I'm looking at a study right now with a field that describes elevation. It would be nice if the elevation values were sorted numerically.
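A sketch of a sort key that orders numeric-looking values numerically and places the remaining values alphabetically after them, so that an elevation of "100" comes after "20". `sort_metadata_values` is a hypothetical name.

```python
def sort_metadata_values(values):
    """Sort numeric-looking values numerically, then the rest alphabetically."""
    def key(value):
        try:
            return (0, float(value), "")
        except ValueError:
            return (1, 0.0, value)
    return sorted(values, key=key)
```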

Update THREE.js

There seem to be a lot of new features that would be really cool to hook up ASAP.

Updating THREE.js will require digging into the qiime/evident repository to see which version of THREE.js we are actually using.

add: -k BACKGROUND_COLOR, --background_color=BACKGROUND_COLOR

Background color to use in the plots. [default: black]

Using -b with -a requires the -a field to also be listed in -b

This doesn't work:
make_emperor.py -i pc.txt -o emperor_age -m map.txt -b "EXPERIMENT_TITLE" -a AGE -x "AGE:70"

This works:
make_emperor.py -i pc.txt -o emperor_age -m map.txt -b "EXPERIMENT_TITLE,AGE" -a AGE -x "AGE:70"
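One way to remove the pitfall, assuming the failure comes from the mapping file being reduced to the -b columns before -a is applied (the issue does not say): merge the -a categories into the -b list automatically. `effective_color_categories` is hypothetical.

```python
def effective_color_categories(colorby, custom_axes):
    """Merge the -a categories into the -b list so both options work together.

    colorby: the raw -b string, e.g. 'EXPERIMENT_TITLE'.
    custom_axes: the raw -a string, e.g. 'AGE'.
    Order is preserved and duplicates are dropped.
    """
    categories = [c for c in colorby.split(",") if c]
    for axis in custom_axes.split(","):
        if axis and axis not in categories:
            categories.append(axis)
    return categories
```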

Clean-up emperor.js

Working the other day with @meganap, we noticed that the indentation levels in this file are inconsistent.

Also, I've noticed some other things that are really annoying:

Set the indentation for the document to use tabs.
Prefix all the global variables with a g as it is very confusing to deal with some elements and know about their scope at a first glance.
Remove all commented code (this makes sense on some projects but since this is currently being tracked with git, it doesn't make sense to have them).

When fixing this, make sure nobody else has a pull request open with changes to this file, because this will most likely cause conflicts for one of the two, and since this is a major change to the whole file it will get super messy 💣 ⚡

Tabs do not have scrollbars

The tabs (where the options are) do not have scrollbars. This is important for those metadata columns that have a lot of options, like age or time.

add an all_test.py file

For easy testing.

Add splash screen

When the data to visualize is "large" (~3000 samples) and you open the main window, a white screen appears for a while and nothing is shown. Adding a splash screen while the data loads would improve the user experience.

Easy way to display taxa names in biplots

We need a new tab or something similar that lets users easily see and find the list of taxa in the biplots.

show/hide and color taxa in biplots based on different taxa levels

It would be nice to be able to color taxa based on their taxonomy string and also show/hide them based on it.

add: -s SCALING_METHOD, --scaling_method=SCALING_METHOD

Comma-separated list of scaling methods (i.e. scaled
or unscaled) [default=unscaled]

Error in coloring due to non alphanumeric characters

Via @antgonza, originally posted here

When a category value in a mapping file column contains non-alphanumeric characters (for example '), coloring by metadata doesn't work.
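If the coloring breaks because raw category values are used as JavaScript keys or HTML ids (an assumption; the issue only reports the symptom), mapping each value to a sanitized, unique identifier avoids it. `safe_identifier` is hypothetical.

```python
import re

def safe_identifier(value, used):
    """Map an arbitrary category value to a unique id made of [A-Za-z0-9_].

    Characters outside that set (quotes, spaces, '&', ...) become '_' and a
    numeric suffix keeps distinct values distinct. `used` is a dict that
    remembers the ids already handed out (value -> id).
    """
    if value in used:
        return used[value]
    base = re.sub(r"[^A-Za-z0-9_]", "_", value) or "_"
    candidate, n = base, 1
    while candidate in used.values():
        n += 1
        candidate = "%s_%d" % (base, n)
    used[value] = candidate
    return candidate
```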

Add test data to emperor

Use something similar to qiime_test_data_dir for automatic testing and test the script parameters.

Add ChangeLog.md

This file should contain the new features and implementations that make it nicer than make_3d_plots.py.

Create setup.py script

The project should provide a standard setup.py script that at the very least checks for the QIIME python libraries to be available.
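A minimal sketch of the dependency check such a setup.py could run before calling `setuptools.setup()`, assuming QIIME is importable as `qiime`. `missing_dependencies` is a hypothetical helper.

```python
import importlib.util

def missing_dependencies(modules=("qiime",)):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]
```

A setup.py would abort with a readable message listing the returned names instead of failing later at import time.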

Allow filtering of taxa strings for biplots

The full taxa strings can be cumbersome to visualize.

An example taxonomy string:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Ruminococcus;s__

The most informative (and least abusive of real estate) could have just g__Ruminococcus.

It might be helpful to allow either the entire taxa string to be displayed or one or more of its semicolon-separated levels (e.g. phylum and genus), selectable as a column of the taxa.

It could also be handy to allow modifying the text: display only the genus level (g__Ruminococcus) and then strip the prefix so it just shows Ruminococcus.
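A sketch of the filtering described above, assuming standard `x__Name` rank prefixes separated by semicolons. `format_taxon` and its parameters are hypothetical; the same rank extraction could drive show/hide and coloring by taxonomic level.

```python
def format_taxon(taxonomy, levels=None, strip_prefix=False):
    """Shorten a semicolon-separated taxonomy string for display.

    levels: indices of the ranks to keep (0 = kingdom ... 6 = species);
            None keeps every rank.
    strip_prefix: drop rank prefixes such as 'g__'.
    Empty ranks (e.g. a bare 's__') are skipped.
    """
    ranks = [r.strip() for r in taxonomy.split(";")]
    if levels is not None:
        ranks = [ranks[i] for i in levels if i < len(ranks)]
    if strip_prefix:
        ranks = [r.split("__", 1)[1] if "__" in r else r for r in ranks]
    return ";".join(r for r in ranks if r and not r.endswith("__"))
```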

Empty rows in OTU table for biplots will produce warning

This should be fixed by changing the keep_empty_rows argument in parse_classic_otu_table, note that this option only became available in QIIME 1.6.0-dev ( biocore/qiime@b026d97 ).

For a small discussion on the topic see #54.

add: Biplots

-t TAXA_FNAME, --taxa_fname=TAXA_FNAME
Used only when generating BiPlots. Input summarized
taxa filepath (i.e., from summarize_taxa.py). Taxa
will be plotted with the samples. [default=none]
--n_taxa_keep=N_TAXA_KEEP
Used only when generating BiPlots. This is the number
of taxa to display. Use -1 to display all. [default:
10]
--biplot_output_file=BIPLOT_OUTPUT_FILE
Used only when generating BiPlots. Output coordinates
filepath when generating a biplot. [default: none]

Notes:

While adding biplots, we should also add tri- and tetra-plots (communities, species, environmental variables, and functions).
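A sketch of one common way to place taxa in a biplot: each taxon sits at the abundance-weighted mean of the sample coordinates, and only the `n_taxa_keep` most abundant taxa are kept. This is an assumed construction, not necessarily what Emperor or QIIME computes; `biplot_taxa_coords` is hypothetical.

```python
import numpy as np

def biplot_taxa_coords(taxa_table, coords, n_taxa_keep=10):
    """Place taxa in ordination space for a biplot.

    taxa_table: (n_taxa, n_samples) abundances, e.g. from summarize_taxa.py.
    coords: (n_samples, n_axes) sample coordinates in the same sample order.
    Returns the indices of the kept taxa (most abundant first) and their
    coordinates; n_taxa_keep=-1 keeps all taxa.
    """
    table = np.asarray(taxa_table, dtype=float)
    coords = np.asarray(coords, dtype=float)
    totals = table.sum(axis=1)
    order = np.argsort(totals)[::-1]
    if n_taxa_keep != -1:
        order = order[:n_taxa_keep]
    kept_totals = np.where(totals[order] > 0, totals[order], 1.0)  # avoid 0/0 for absent taxa
    weights = table[order] / kept_totals[:, None]
    return order, weights @ coords
```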

add: --add_vectors=ADD_VECTORS

Create vectors based on a column of the mapping file.
This parameter accepts up to 2 columns: (1) create the
vectors, (2) sort them. If you wanted to group by
Species and order by SampleID you will pass
--add_vectors=Species but if you wanted to group by
Species but order by DOB you will pass
--add_vectors=Species,DOB; this is useful when you use
--custom_axes param [default: none]

--vectors_algorithm=VECTORS_ALGORITHM
The algorithm used to create the vectors. The method
used can be RMS (either using 'avg' or 'trajectory');
or the first difference (using 'diff'), or 'wdiff' for
a modified first difference algorithm (see
--window_size) the aforementioned use all the
dimensions and weights them using their percentage
explained; returns the norm of the created vectors;
and their confidence using ANOVA. The Vectors are
created as follows: for 'avg' it calculates the
average at each timepoint (averaging within a group),
then calculates the norm of each point; for
'trajectory' calculates the norm for the 1st-2nd, 2nd-
3rd, etc.; for 'diff', it calculates the norm for all
the time-points and then calculates the first
difference for each resulting point; for 'wdiff'
it uses the same procedure as the previous method but
the subtraction will be between the mean of the next
number of elements specified in --window_size and the
current element, both methods ('wdiff' and 'diff')
will also include the mean and the standard deviation
of the calculations [default: none]

This, in turn, will require adding the following options.

-w, --weight_by_vector
Use -w when you want the output created in the
--vectors_path to be weighted by the space between
samples in the --add_vectors, sorting column, i. e.
days between samples [default: False]
--window_size=WINDOW_SIZE
Use --window_size, when selecting the modified first
difference ('wdiff') option for --vectors_algorithm.
This integer determines the number of elements to be
averaged per element subtraction, the resulting
vector. [default: none]
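A sketch of the algorithms as the help text above describes them, assuming the samples of one group are already sorted by the --add_vectors sorting column and that "weighting by percentage explained" means multiplying each axis by its percentage. All names are hypothetical, and the mean and standard deviation that 'diff' and 'wdiff' also report are left out.

```python
import numpy as np

def weighted(coords, pct_explained):
    """Weight every dimension by its percentage explained."""
    return np.asarray(coords, dtype=float) * np.asarray(pct_explained, dtype=float)

def vectors_avg(coords_by_time, pct_explained):
    """'avg': mean position of the group at each time point, then its norm.

    coords_by_time: list whose i-th item is an (n_samples_i, n_axes) array
    holding the group's samples at time point i.
    """
    return np.array([np.linalg.norm(weighted(c, pct_explained).mean(axis=0))
                     for c in coords_by_time])

def vectors_trajectory(coords, pct_explained):
    """'trajectory': norms of the steps 1st->2nd, 2nd->3rd, ..."""
    return np.linalg.norm(np.diff(weighted(coords, pct_explained), axis=0), axis=1)

def point_norms(coords, pct_explained):
    """Norm of every (weighted) time point."""
    return np.linalg.norm(weighted(coords, pct_explained), axis=1)

def vectors_diff(coords, pct_explained):
    """'diff': first difference of the per-time-point norms."""
    return np.diff(point_norms(coords, pct_explained))

def vectors_wdiff(coords, pct_explained, window_size):
    """'wdiff': mean of the next `window_size` norms minus the current norm.

    Near the end the window shrinks to what remains; the last point has no
    successor and is skipped (an assumption about edge handling).
    """
    norms = point_norms(coords, pct_explained)
    return np.array([norms[i + 1:i + 1 + window_size].mean() - norms[i]
                     for i in range(len(norms) - 1)])

def weight_by_spacing(steps, sort_values):
    """-w sketch: divide each step by the gap in the sorting column (e.g. days)."""
    return np.asarray(steps, dtype=float) / np.diff(np.asarray(sort_values, dtype=float))
```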

Notes:

The user should be able to specify multiple algorithms to be executed in the same command.

Default view-point should be PC1 vs PC2

The current default view-point is PC1 vs PC3 with some rotation; this should be changed so that PC1 vs PC2 is presented by default, as that's usually where most of the information is contained.

In turn when this gets done, the recenter camera button should take you back to this view-point. Similar to what KiNG does with View>Unnamed View.

Make save as svg work

I know this is already open in E-vident, but this is really an emperor issue.

add: -p PREFS_PATH, --prefs_path=PREFS_PATH

Input user-generated preferences filepath. NOTE: This
is a file with a dictionary containing preferences for
the analysis. [default: none]

Notes:

This will be a good opportunity to revise the parameters file format, which is currently too hard for users to understand.
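If the preferences file stays a dictionary as the help text describes, a sketch of a safer reader: `ast.literal_eval` accepts only Python literals, so the file cannot run code. `load_prefs` is hypothetical and the real file format may differ.

```python
import ast

def load_prefs(path):
    """Read a preferences file that holds a single Python dict literal."""
    with open(path) as handle:
        prefs = ast.literal_eval(handle.read())  # literals only, unlike eval()
    if not isinstance(prefs, dict):
        raise ValueError("preferences file must contain a dictionary")
    return prefs
```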

all examples should be updated to the latest GUI

Not all examples in tests/scripts_test_data/make_emperor/ have the same tools and menu selectors.

Add label for each axis

Currently the axes display no label other than the one showing the % explained; there should be a way to label each axis with a string, similar to KiNG, where each axis is labeled PC1, PC2, etc.

Raise warning when there is no matching sample between map and pc files

Emperor creates an empty plot when you pass a PC file and a mapping file without common samples.
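A sketch of the check, assuming both files have been parsed into lists of sample ids. `shared_samples` is hypothetical; raising an error when nothing matches, rather than only warning, is a design choice, since an empty plot is never useful.

```python
import warnings

def shared_samples(pc_ids, mapping_ids):
    """Return the sample ids present in both files.

    Warns when only part of the PC file matches and raises when nothing
    matches, instead of silently producing an empty plot.
    """
    mapping = set(mapping_ids)
    shared = [sid for sid in pc_ids if sid in mapping]
    if not shared:
        raise ValueError("none of the samples in the PC file are in the mapping file")
    missing = len(pc_ids) - len(shared)
    if missing:
        warnings.warn("%d sample(s) in the PC file are missing from the mapping file" % missing)
    return shared
```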

Add connecting lines between points (vectors)

--vectors_axes=VECTORS_AXES

The number of axes to account for while doing the vector
specific calculations. We suggest using 3 because those
are the ones being displayed in the plots but you
could use any number between 1 and number of samples-1.
To use all of them pass 0. [default: 3]
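A minimal sketch of the option's semantics, assuming coordinates are a NumPy array with one PC per column. `select_vector_axes` is hypothetical.

```python
import numpy as np

def select_vector_axes(coords, vectors_axes=3):
    """Keep the first `vectors_axes` PCs for the vector calculations; 0 keeps all."""
    coords = np.asarray(coords, dtype=float)
    n_available = coords.shape[1]
    if vectors_axes == 0:
        return coords
    if not 1 <= vectors_axes <= n_available:
        raise ValueError("vectors_axes must be 0 or between 1 and %d" % n_available)
    return coords[:, :vectors_axes]
```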

Emperor scripts directory cannot be found

After I installed Emperor using python setup.py install, I tried running the tests with:

python tests/all_tests.py --emperor_scripts_dir scripts

The unit tests and script usage tests appear to run:

Testing /home/jrideout/dev/qiime-deploy-conf/emperor-0.8-dev/foo/emperor-0.8-repository-dc9aa375/tests/test_format.py:

test_format_mapping_file_to_js (__main__.TopLevelTests)
Tests correct formatting of the metadata mapping file ... ok
test_format_pcoa_to_js (__main__.TopLevelTests)
Test correct formatting of the PCoA file ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.001s

OK

Testing /home/jrideout/dev/qiime-deploy-conf/emperor-0.8-dev/foo/emperor-0.8-repository-dc9aa375/tests/test_util.py:

test_copy_support_files (__main__.TopLevelTests)
Test the support files are correctly copied to a file path ... ok
test_fill_mapping_field_from_mapping_file (__main__.TopLevelTests)
Check the values are being correctly filled in ... ok
test_keep_columns_from_mapping_file (__main__.TopLevelTests)
Check correct selection of metadata is being done ... ok
test_preprocess_coords_file (__main__.TopLevelTests)
Check correct processing is applied to the coords ... ok
test_preprocess_mapping_file (__main__.TopLevelTests)
Check correct preprocessing of metadata is done ... ok
test_sanitize_mapping_file (__main__.TopLevelTests)
Check the mapping file strings are sanitized for it's use in JS ... ok

----------------------------------------------------------------------
Ran 6 tests in 0.015s

OK

Tests to run:
 make_emperor
Testing 5 usage examples from: scripts//make_emperor.py
 Running tests in: /tmp/script_usage_tests/make_emperor
 Tests:
  scripts//make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -o emperor_output : Failed
  scripts//make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -b 'Treatment&&DOB,Treatment' -o emperor_colored_by : Failed
  scripts//make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -a DOB -o pcoa_dob : Failed
  scripts//make_emperor.py -i unweighted_unifrac_pc -m Fasting_Map.txt -o jackknifed_pcoa -e sdev : Failed
  scripts//make_emperor.py -i unweighted_unifrac_pc -s unweighted_unifrac_pc/pcoa_unweighted_unifrac_rarefaction_110_5.txt -m Fasting_Map.txt -o jackknifed_with_master : Failed

==============
Result summary
==============

Unit test result summary
------------------------


All unit tests passed.



Could not run script usage tests.
The Emperor scripts directory could not be automatically located, try supplying  it manually using with the --emperor_scripts_dir option.

However, there is still an error message at the bottom of the output indicating that the scripts directory couldn't be found, even though I specified it in the command.

Another related problem: the script usage tests fail because all_tests.py doesn't cd into the script usage tests directory before trying to run the commands, and thus can't find the input files.
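For the second problem, a sketch of running each usage example from its test-data directory via `subprocess.run(..., cwd=...)`, so that relative inputs like Fasting_Map.txt resolve. `run_usage_example` is hypothetical; `shlex.split` keeps quoted arguments such as 'Treatment&&DOB,Treatment' intact without invoking a shell.

```python
import shlex
import subprocess

def run_usage_example(command, test_data_dir):
    """Run one script usage example from inside its test-data directory.

    cwd= makes relative input paths resolve against test_data_dir instead
    of the caller's working directory. Returns the exit status.
    """
    return subprocess.run(shlex.split(command), cwd=test_data_dir).returncode
```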

Using emperor in Chrome will randomly crash your computer

This might be because of the WebGL code.

The screen starts flickering and you pretty much need to do a hard reset.

Generate publication quality legends for the plot being displayed.

The idea is that the information that appears under the colors tab (see image) could be exported to an EPS or PDF file for use in publications.

Metadata colors not showing up

See attached screenshots

Main HTML file should be called index.html

Talking with @antgonza the other day, he brought this up: right now the file is called emperor.html, and it should be renamed to index.html.

make_3d_plots.py with continuous coloring and scientific notation (imported from qiime)

make_3d_plots.py does not work with continuous coloring if a category contains values in scientific notation. If one sample has a low value for that category (e.g. 0.01) and a second sample has a very low value (1e-10), they appear with very different colors in the plot, when they should appear similarly colored.
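Python's `float` already parses scientific notation, so once values are converted to numbers a linear scale places 0.01 and 1e-10 next to each other. A minimal sketch of that normalization step; `normalize_values` is hypothetical and the cause of the bug in make_3d_plots.py is not shown here.

```python
def normalize_values(values):
    """Map numeric strings (plain or scientific notation) linearly onto [0, 1]."""
    nums = [float(v) for v in values]  # float('1e-10') == 1e-10
    low, high = min(nums), max(nums)
    span = (high - low) or 1.0
    return [(x - low) / span for x in nums]
```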

add: -b COLORBY, --colorby=COLORBY

Comma-separated list of metadata categories
(column headers) to color by in the plots. The
categories must match the name of a column header in
the mapping file exactly. Multiple categories can be
listed by comma separating them without spaces. The user
can also combine columns in the mapping file by
separating the categories by "&&" without spaces.
[default=color by all]
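A sketch of how a -b value could be expanded, assuming the mapping file is parsed into a header and rows. Concatenating the member values with no separator for "&&" categories is an assumption; `parse_colorby` is hypothetical.

```python
def parse_colorby(colorby, header, rows):
    """Expand a -b value such as 'Treatment&&DOB,Treatment'.

    Comma-separated entries are separate categories; '&&' joins columns
    into a combined category whose value concatenates the member values.
    Returns {category_name: [value per sample]}.
    """
    result = {}
    for entry in colorby.split(","):
        # header.index raises ValueError if a name is not an exact column header
        indices = [header.index(column) for column in entry.split("&&")]
        result[entry] = ["".join(row[i] for i in indices) for row in rows]
    return result
```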
