biocore / emperor
Emperor: a tool for the analysis and visualization of large microbial ecology datasets
Home Page: http://biocore.github.io/emperor/
License: Other
Strings in the mapping file should be sanitized for double and single quotes.
Also, the way the mapping file is written into the index.html file should change so that lines do not become extremely long when the metadata is rich.
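A minimal sketch of what this could look like, assuming that escaping (rather than stripping) the quotes is acceptable. The helper names escape_quotes and format_rows_as_js and the JavaScript variable name mapping are hypothetical and are not taken from the Emperor code base.

```python
def escape_quotes(value):
    """Escape backslashes and quotes so value can sit inside a JS string literal."""
    return (value.replace("\\", "\\\\")
                 .replace("'", "\\'")
                 .replace('"', '\\"'))


def format_rows_as_js(rows):
    """Emit one short line per mapping-file row rather than a single long line."""
    lines = ["var mapping = ["]
    for row in rows:
        cells = ", ".join("'%s'" % escape_quotes(cell) for cell in row)
        lines.append("  [%s]," % cells)
    lines.append("];")
    return "\n".join(lines)


# Illustrative rows only; the values are made up for this example.
rows = [["PC.354", "Control", 'mouse "A"'], ["PC.355", "Fast", "it's"]]
print(format_rows_as_js(rows))
```

Writing one row per line keeps each line of the generated file proportional to the number of columns instead of the size of the whole mapping file.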
Name of the file to save the first difference, or the
root mean square (RMS) of the vectors grouped by the
column used with the --add_vectors function. Note that
this option only works with --add_vectors. The file is
going to be created inside the output_dir and its name
will start with the word 'Vectors'. [default:
vectors_output.txt]
Notes
See issue #8.
For example, say we have 2 columns, project and body site, and we want to color by project, then select project1 and color only those samples by body site. Currently, the only way to do this is to add a column for it to the mapping file, which is not the most efficient approach.
A group of samples should be capable of creating an ellipsoid centered at the centroid of all the points and with a volume given either by the standard deviation or the standard error. Some related code already exists in https://github.com/qiime/evident/blob/master/evident/pcoa.py#L48
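As a rough illustration of the idea, the sketch below computes a centroid and one radius per axis for a group of points. It assumes the ellipsoid is axis-aligned and that each radius is simply the sample standard deviation (or standard error) along that axis; the function name ellipsoid_for_group is hypothetical and this is not the code in evident/pcoa.py.

```python
import math


def ellipsoid_for_group(points, use_standard_error=False):
    """Return (centroid, radii) for a list of equal-length coordinate tuples."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to estimate spread")
    dims = len(points[0])
    centroid = [sum(p[d] for p in points) / float(n) for d in range(dims)]
    radii = []
    for d in range(dims):
        # Sample standard deviation along this axis (divides by n - 1).
        variance = sum((p[d] - centroid[d]) ** 2 for p in points) / (n - 1)
        spread = math.sqrt(variance)
        if use_standard_error:
            # Standard error of the mean: standard deviation over sqrt(n).
            spread /= math.sqrt(n)
        radii.append(spread)
    return centroid, radii


# Made-up coordinates for three samples in the same group.
group = [(0.0, 1.0, 2.0), (2.0, 3.0, 2.0), (4.0, 5.0, 2.0)]
centroid, radii = ellipsoid_for_group(group)
print(centroid, radii)
```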
-o OUTPUT_DIR, --output_dir=OUTPUT_DIR
path to the output directory that will contain the
PCoA plot.
./ is fine but maybe emperor/ will be better
I think it will be good to save the options hidden/shown from other metadata; for example if you hide something by visit you want to keep it hidden when you start working by body site; what do you think?
If we go for this, it will be good to have an option that (de)activates this feature.
Right now the colors are based on a heatmap, which works well for few samples but doesn't really work well for a bunch of samples.
The current help is terrible; we need to add a better explanation and an example.
When the file first loads, the y axis sits too high in the window, which hides the number; at least this happens when tested with the HMP data.
This is the category from the metadata mapping file to
use as a custom axis in the plot. For instance, if
there is a pH category and you would like to see the
samples plotted on that axis instead of PC1, PC2,
etc., one can use this option. It is also useful for
plotting time-series data. Note: if there is any non-
numeric data in the column, it will not be plotted
[default: none]
I got a traceback when there were spaces in the paths where I was trying to copy the support files.
There should be an option that lets you hide the right sidebar and enter full-screen mode. This would be especially useful for presentations and demos.
The user should be able to select which samples have their transparency changed. This in turn will make the non-transparent objects stand out more.
This feature should also be considered together with issue #3.
Currently we can use -x "AGE:70" to plot all non-numeric values at point 70, but it would be great to be able to use something like -x "AGE:EXPERIMENT:HMPV13:70" -x "AGE:EXPERIMENT:HMPV35:80", which would plot HMPV13 at 70 and HMPV35 at 80. If the user leaves out some groups, the error message should say which groups are missing.
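One way the proposed syntax could be parsed is sketched below. The four-field form is only the proposal from this issue, not something make_emperor.py is known to accept; the function names parse_custom_axis and missing_groups and the group HMPV69 are hypothetical.

```python
def parse_custom_axis(spec):
    """Split an -x value into (axis, filter_column, filter_value, fallback).

    'AGE:70' gives ('AGE', None, None, 70.0); the four-field form is the
    extension proposed in this issue.
    """
    parts = spec.split(":")
    if len(parts) == 2:
        axis, fallback = parts
        return axis, None, None, float(fallback)
    if len(parts) == 4:
        axis, column, value, fallback = parts
        return axis, column, value, float(fallback)
    raise ValueError("cannot parse custom axis %r" % spec)


def missing_groups(specs, groups_in_mapping):
    """List the groups that no four-field spec assigns a position to."""
    covered = set(parse_custom_axis(s)[2] for s in specs)
    return sorted(set(groups_in_mapping) - covered)


specs = ["AGE:EXPERIMENT:HMPV13:70", "AGE:EXPERIMENT:HMPV35:80"]
# HMPV69 is an invented group used to show the "missing groups" report.
print(missing_groups(specs, ["HMPV13", "HMPV35", "HMPV69"]))
```

The list returned by missing_groups is what an error message could include to tell the user which groups still need a position.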
@rob-knight suggested in #39 adding an option to calculate the covariance matrix to determine the dimensions of the ellipsoids when jackknifing.
The covariance matrix should give you a tilted oval, modeling the dataset as multivariate normal. Each PC is a variate; the matrix holds cov(i,j), with var(i) on the diagonal, and is used to calculate the ellipsoid.
@antgonza found an available implementation of this in numpy, see:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html
This gist includes the source code that Ryan Kennedy shared with Rob:
https://gist.github.com/ElDeveloper/31d95bf37ac5aed8070f
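For reference, the matrix described above can be written out by hand as in the sketch below. It assumes rows are jackknife replicates and columns are PCs, and it uses the sample (n - 1) normalization; numpy.cov called with rowvar=False should give the same values. The function name covariance_matrix is hypothetical and this is not the code from the gist.

```python
def covariance_matrix(points):
    """Sample covariance of points; rows are replicates, columns are PCs."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / float(n) for d in range(dims)]
    cov = [[0.0] * dims for _ in range(dims)]
    for i in range(dims):
        for j in range(dims):
            # cov(i, j); when i == j this is var(i), the diagonal entry.
            total = sum((p[i] - means[i]) * (p[j] - means[j]) for p in points)
            cov[i][j] = total / (n - 1)
    return cov


# Made-up positions of one sample across three jackknife replicates.
replicates = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
cov = covariance_matrix(replicates)
print(cov)
```

The non-zero off-diagonal entries are what tilt the resulting ellipsoid away from the PC axes.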
--ellipsoid_smoothness=ELLIPSOID_SMOOTHNESS
Used only when plotting ellipsoids for jackknifed beta
diversity (i.e. using a directory of coord files
instead of a single coord file). Valid choices are
0-3. A value of 0 produces very coarse "ellipsoids"
but is fast to render. If you encounter a memory error
when generating or displaying the plots, try including
just one metadata column in your plot. If you still
have trouble, reduce the smoothness level to 0.
[default: 1]
--ellipsoid_opacity=ELLIPSOID_OPACITY
Used only when plotting ellipsoids for jackknifed beta
diversity (i.e. using a directory of coord files
instead of a single coord file). The valid range is
between 0-1. 0 produces completely transparent
(invisible) ellipsoids and 1 produces completely
opaque ellipsoids. [default=0.33]
--ellipsoid_method=ELLIPSOID_METHOD
Used only when plotting ellipsoids for jackknifed beta
diversity (i.e. using a directory of coord files
instead of a single coord file). Valid values are
"IQR" and "sdev". [default=IQR]
I'm looking at a study right now with a field that describes elevation. It would be nice if the elevation values were sorted numerically.
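A small sketch of a sort key that would give this behavior, assuming values that cannot be parsed as numbers should simply be listed after the numeric ones. The name numeric_first_key and the sample values are made up for illustration.

```python
def numeric_first_key(value):
    """Sort key: numeric values in numeric order, then everything else as text."""
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value)


elevations = ["1200", "85", "NA", "9", "300.5"]
print(sorted(elevations, key=numeric_first_key))
```

A plain string sort would place "1200" before "85"; the key above compares the parsed numbers instead.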
There seem to be a lot of new features that would be really cool getting hooked up ASAP.
Updating THREE will require digging into the qiime/evident repository to see which version of THREE.js we are actually using.
Background color to use in the plots. [default: black]
This doesn't work:
make_emperor.py -i pc.txt -o emperor_age -m map.txt -b "EXPERIMENT_TITLE" -a AGE -x "AGE:70"
This works:
make_emperor.py -i pc.txt -o emperor_age -m map.txt -b "EXPERIMENT_TITLE,AGE" -a AGE -x "AGE:70"
Working the other day with @meganap, we noticed that the indentation levels of this file are inconsistent.
Also I've noticed some other things that get to be really annoying:
When fixing this, make sure nobody else has a pull request open with changes to this file, because this will most likely cause conflicts for one of the two, and since this is a major change to the whole file it will get super messy.
The tabs (where the options are) do not have scrollbars. This is important for those metadata columns that have a lot of options, like age or time.
For easy testing.
When the data to visualize is "large" (~3000 samples) and you open the main window, a white screen appears for a while and nothing is shown. Adding a splash screen while the data loads might be enough to improve the user experience.
We need a new tab or something similar that allows users to easily see and find the list of taxa in the biplots.
It would be nice to be able to color taxa based on their taxonomy string, and also to show/hide them based on it.
Comma-separated list of scaling methods (i.e. scaled
or unscaled) [default=unscaled]
Use something similar to qiime_test_data_dir for automatic testing and test the script parameters.
This file should contain the new features and implementations that make it nicer than make_3d_plots.py.
The project should provide a standard setup.py script that at the very least checks that the QIIME Python libraries are available.
The full taxa strings can be cumbersome to visualize.
An example taxonomy string:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Ruminococcus;s__
The most informative label (and the one least abusive of screen real estate) could be just g__Ruminococcus.
It might be helpful to allow the entire taxa string to be displayed, or one or more of the semicolon-separated levels (for example, only the phylum and genus levels), as selected by the user.
It could also be handy to allow the text to be modified, so that only the genus level (g__Ruminococcus) is displayed and the prefix is then removed, leaving just Ruminococcus.
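The options described above could look roughly like the following sketch. It assumes the taxonomy strings always use the single-letter prefix followed by two underscores, as in the example; the function name taxon_label and its parameters are hypothetical.

```python
def taxon_label(taxonomy, levels=("g",), strip_prefix=False):
    """Keep only the requested levels of a semicolon-separated taxonomy string."""
    kept = []
    for field in taxonomy.split(";"):
        field = field.strip()
        prefix, _, name = field.partition("__")
        # Skip levels that were not requested or that carry no name (e.g. "s__").
        if prefix in levels and name:
            kept.append(name if strip_prefix else field)
    # Fall back to the full string when none of the requested levels is named.
    return ";".join(kept) if kept else taxonomy


full = ("k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;"
        "f__Ruminococcaceae;g__Ruminococcus;s__")
print(taxon_label(full))
print(taxon_label(full, levels=("p", "g")))
print(taxon_label(full, strip_prefix=True))
```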
This should be fixed by changing the keep_empty_rows argument in parse_classic_otu_table; note that this option only became available in QIIME 1.6.0-dev (biocore/qiime@b026d97).
For a small discussion on the topic see #54.
-t TAXA_FNAME, --taxa_fname=TAXA_FNAME
Used only when generating BiPlots. Input summarized
taxa filepath (i.e., from summarize_taxa.py). Taxa
will be plotted with the samples. [default=none]
--n_taxa_keep=N_TAXA_KEEP
Used only when generating BiPlots. This is the number
of taxa to display. Use -1 to display all. [default:
10]
--biplot_output_file=BIPLOT_OUTPUT_FILE
Used only when generating BiPlots. Output coordinates
filepath when generating a biplot. [default: none]
Notes:
Create vectors based on a column of the mapping file.
This parameter accepts up to 2 columns: (1) create the
vectors, (2) sort them. If you wanted to group by
Species and order by SampleID you will pass
--add_vectors=Species but if you wanted to group by
Species but order by DOB you will pass
--add_vectors=Species,DOB; this is useful when you use
--custom_axes param [default: none]
--vectors_algorithm=VECTORS_ALGORITHM
The algorithm used to create the vectors. The method
used can be RMS (either using 'avg' or 'trajectory');
or the first difference (using 'diff'), or 'wdiff' for
a modified first difference algorithm (see
--window_size). All of the aforementioned use all the
dimensions and weight them using their percentage
explained, and return the norm of the created vectors
and their confidence using ANOVA. The vectors are
created as follows: for 'avg' it calculates the
average at each timepoint (averaging within a group),
then calculates the norm of each point; for
'trajectory' calculates the norm for the 1st-2nd, 2nd-
3rd, etc.; for 'diff', it calculates the norm for all
the time-points and then calculates the first
difference for each resulting point; for 'wdiff'
it uses the same procedure as the previous method but
the subtraction will be between the mean of the next
number of elements specified in --window_size and the
current element, both methods ('wdiff' and 'diff')
will also include the mean and the standard deviation
of the calculations [default: none]
This in turn will require adding the following options.
-w, --weight_by_vector
Use -w when you want the output created in the
--vectors_path to be weighted by the space between
samples in the --add_vectors sorting column, i.e.
days between samples [default: False]
--window_size=WINDOW_SIZE
Use --window_size, when selecting the modified first
difference ('wdiff') option for --vectors_algorithm.
This integer determines the number of elements to be
averaged per element subtraction, the resulting
vector. [default: none]
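The 'diff' and 'wdiff' descriptions above can be read as in the sketch below. It leaves out the weighting by percentage explained and the ANOVA step, and it assumes that positions without a full window ahead of them are skipped, which the help text does not specify. All function names are hypothetical.

```python
import math


def norms(points):
    """Euclidean norm of each time point (one coordinate tuple per point)."""
    return [math.sqrt(sum(c * c for c in p)) for p in points]


def first_difference(values):
    """'diff': difference between each value and the one that follows it."""
    return [values[i + 1] - values[i] for i in range(len(values) - 1)]


def windowed_difference(values, window_size):
    """'wdiff': mean of the next window_size values minus the current one.

    Assumption: positions without a full window ahead of them are skipped.
    """
    result = []
    for i in range(len(values) - window_size):
        window = values[i + 1:i + 1 + window_size]
        result.append(sum(window) / float(window_size) - values[i])
    return result


# Made-up coordinates for four consecutive time points of one group.
trajectory = [(3.0, 4.0), (6.0, 8.0), (0.0, 5.0), (0.0, 15.0)]
point_norms = norms(trajectory)
print(point_norms)
print(first_difference(point_norms))
print(windowed_difference(point_norms, 2))
```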
Notes:
The user should be able to specify multiple algorithms to be executed in the same command.
The current default viewpoint is PC1 vs PC3 with some rotation; this should be changed so that it presents PC1 vs PC2 by default, as that is usually where the main information is contained.
In turn, when this gets done, the recenter camera button should take you back to this viewpoint, similar to what KiNG does with View > Unnamed View.
I know this is already open in E-vident, but this is really an emperor issue.
Input user-generated preferences filepath. NOTE: This
is a file with a dictionary containing preferences for
the analysis. [default: none]
Notes:
Not all examples in tests/scripts_test_data/make_emperor/ have the same tools and menu selectors.
Currently the axes do not display any label besides the one showing the % explained; a way to add a label with a string for each axis should be added, similar to what is done in KiNG, where each axis is labeled PC1, PC2, etc.
Emperor creates an empty plot when you pass a PC file and a mapping file that have no samples in common.
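A check along these lines could catch the problem before any plot is written. This is only a sketch; the function name check_common_samples and the error text are hypothetical.

```python
def check_common_samples(coords_ids, mapping_ids):
    """Return the shared sample IDs, or raise if the two files share none."""
    common = set(coords_ids) & set(mapping_ids)
    if not common:
        raise ValueError("The coordinates and mapping files have no sample "
                         "identifiers in common.")
    return sorted(common)


print(check_common_samples(["PC.354", "PC.355"], ["PC.355", "PC.356"]))
```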
--vectors_axes=VECTORS_AXES
The number of axes to account while doing the vector
specific calculations. We suggest using 3 because those
are the ones being displayed in the plots but you
could use any number between 1 and number of samples-
After I installed Emperor using python setup.py install, I tried running the tests with:
python tests/all_tests.py --emperor_scripts_dir scripts
The unit tests and script usage tests appear to run:
Testing /home/jrideout/dev/qiime-deploy-conf/emperor-0.8-dev/foo/emperor-0.8-repository-dc9aa375/tests/test_format.py:
test_format_mapping_file_to_js (__main__.TopLevelTests)
Tests correct formatting of the metadata mapping file ... ok
test_format_pcoa_to_js (__main__.TopLevelTests)
Test correct formatting of the PCoA file ... ok
----------------------------------------------------------------------
Ran 2 tests in 0.001s
OK
Testing /home/jrideout/dev/qiime-deploy-conf/emperor-0.8-dev/foo/emperor-0.8-repository-dc9aa375/tests/test_util.py:
test_copy_support_files (__main__.TopLevelTests)
Test the support files are correctly copied to a file path ... ok
test_fill_mapping_field_from_mapping_file (__main__.TopLevelTests)
Check the values are being correctly filled in ... ok
test_keep_columns_from_mapping_file (__main__.TopLevelTests)
Check correct selection of metadata is being done ... ok
test_preprocess_coords_file (__main__.TopLevelTests)
Check correct processing is applied to the coords ... ok
test_preprocess_mapping_file (__main__.TopLevelTests)
Check correct preprocessing of metadata is done ... ok
test_sanitize_mapping_file (__main__.TopLevelTests)
Check the mapping file strings are sanitized for it's use in JS ... ok
----------------------------------------------------------------------
Ran 6 tests in 0.015s
OK
Tests to run:
make_emperor
Testing 5 usage examples from: scripts//make_emperor.py
Running tests in: /tmp/script_usage_tests/make_emperor
Tests:
scripts//make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -o emperor_output : Failed
scripts//make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -b 'Treatment&&DOB,Treatment' -o emperor_colored_by : Failed
scripts//make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -a DOB -o pcoa_dob : Failed
scripts//make_emperor.py -i unweighted_unifrac_pc -m Fasting_Map.txt -o jackknifed_pcoa -e sdev : Failed
scripts//make_emperor.py -i unweighted_unifrac_pc -s unweighted_unifrac_pc/pcoa_unweighted_unifrac_rarefaction_110_5.txt -m Fasting_Map.txt -o jackknifed_with_master : Failed
==============
Result summary
==============
Unit test result summary
------------------------
All unit tests passed.
Could not run script usage tests.
The Emperor scripts directory could not be automatically located, try supplying it manually using with the --emperor_scripts_dir option.
However, there is still an error message at the bottom of the output indicating that the scripts directory couldn't be found, even though I specified it in the command.
Another related problem: the script usage tests fail because all_tests.py doesn't cd into the script usage tests directory before trying to run the commands, and thus can't find the input files.
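One way to get the effect of the missing cd is to hand the working directory to the subprocess call, as in the sketch below. This does not reflect how all_tests.py is actually written; run_in_directory is a hypothetical helper and the command is a stand-in for a real usage example.

```python
import os
import subprocess
import sys
import tempfile


def run_in_directory(command, working_dir):
    """Run command with working_dir as its current directory; return exit code."""
    return subprocess.call(command, cwd=working_dir)


# Stand-in for a script usage test: the command only succeeds when the input
# file can be found relative to the working directory.
test_dir = tempfile.mkdtemp()
with open(os.path.join(test_dir, "Fasting_Map.txt"), "w") as handle:
    handle.write("#SampleID\n")

command = [sys.executable, "-c", "open('Fasting_Map.txt').close()"]
print(run_in_directory(command, test_dir))
```

Passing the command as a list also avoids shell quoting problems when a path contains spaces.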
This might be because of the WebGL code.
The screen starts flickering and you pretty much need to do a hard reset.
Talking with @antgonza the other day, he brought this up: right now the file is called emperor.html; this should change to index.html.
make_3d_plots.py does not work with continuous coloring if a category contains values in scientific notation. If one sample has a low value for that category (e.g. 0.01) and a second sample has a very low value (1e-10), they appear with very different colors in the plot, when they should be similarly colored.
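The cause of the bug is not stated here, but the expected behavior can be illustrated: Python's float() accepts scientific notation, so both values can be placed on the same linear scale. The function name linear_position and the value 5.0 are made up for this sketch.

```python
def linear_position(value, low, high):
    """Map a numeric string onto [0, 1] between low and high."""
    return (float(value) - low) / (high - low)


values = ["0.01", "1e-10", "5.0"]
numbers = [float(v) for v in values]
low, high = min(numbers), max(numbers)
for v in values:
    # 0.01 and 1e-10 both land very close to 0, so their colors should be close.
    print(v, round(linear_position(v, low, high), 4))
```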
Comma-separated list of metadata categories
(column headers) to color by in the plots. The
categories must match the name of a column header in
the mapping file exactly. Multiple categories can be
listed by comma separating them without spaces. The user
can also combine columns in the mapping file by
separating the categories by "&&" without spaces.
[default=color by all]
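To make the comma and "&&" syntax concrete, the sketch below splits an option value into column groups and builds a combined category value. How the combined values are actually joined is an assumption here (plain concatenation); the function names and the sample row are hypothetical.

```python
def parse_color_by(option):
    """Split a color-by option value into a list of column groups."""
    return [entry.split("&&") for entry in option.split(",")]


def combined_value(row, columns, separator=""):
    """Join the values of several columns into one category value.

    Assumption: values are concatenated; the real separator may differ.
    """
    return separator.join(row[c] for c in columns)


# Illustrative row keyed by column header.
sample = {"Treatment": "Control", "DOB": "20061218"}
for columns in parse_color_by("Treatment&&DOB,Treatment"):
    print("&&".join(columns), "->", combined_value(sample, columns))
```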