emperor's People

Contributors

ahdilmore, antgonza, eldeveloper, fedarko, gitter-badger, gregcaporaso, jacksonchen, jairideout, jayaddison, jorge-c, josenavas, meganap, mortonjt, santiagotorres, teravest, wasade, wdwvt1

emperor's Issues

Sanitize strings in the mapping file

Strings in the mapping file should be sanitized for double and single quotes.

Also the way the mapping file is written into the index.html file should change so lines are not incredibly long when the metadata is rich.
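
A minimal sketch of one way to cover both points, assuming the mapping file is available as (sample id, list of field values) pairs. The function name and the JavaScript variable name `g_mappingFileData` are hypothetical and are not taken from the Emperor code base.

```python
import json


def mapping_rows_to_js(rows, var_name="g_mappingFileData"):
    """Format mapping file rows as a JS object literal, one sample per line.

    rows is an iterable of (sample_id, list_of_field_values) pairs.
    var_name is a hypothetical name for the resulting JS variable.
    """
    entries = []
    for sample_id, fields in rows:
        # json.dumps emits double-quoted strings and escapes embedded double
        # quotes and backslashes; single quotes are harmless inside them.
        entries.append("  %s: %s" % (json.dumps(sample_id), json.dumps(fields)))
    # One entry per line keeps the lines in index.html short.
    return "var %s = {\n%s\n};" % (var_name, ",\n".join(entries))
```

Writing one sample per line means line length grows with the number of metadata columns rather than with the number of samples.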

add: --vectors_path=VECTORS_PATH

Name of the file to save the first difference, or the
root mean square (RMS) of the vectors grouped by the
column used with the --add_vectors function. Note that
this option only works with --add_vectors. The file is
going to be created inside the output_dir and its name
will start with the word 'Vectors'.[default:
vectors_output.txt]


Notes

See issue #8.

Color a category in a metadata column based on other column.

For example, given two columns, project and body site, we may want to color by project, then select project1 and color only those samples by body site. Currently, the only way to do this is to add a dedicated column to the mapping file, which is not the most efficient approach.

Axes are misleading

The axes extend well beyond the range of the points, which is misleading. For instance, in the attached screenshot, PC1 explains 15% of the variation, but the points do not cover the range of variation.
[Screenshot: Screen Shot 2013-04-29 at 12 43 00 PM]

save the previous state

I think it would be good to preserve the hidden/shown state across metadata categories; for example, if you hide something by visit, it should stay hidden when you start working by body site. What do you think?

If we go for this, it will be good to have an option that (de)activates this feature.

% explained hidden in y axis

When the file first loads, the y axis sits too high in the window, which hides the number; at least this happens when tested with the HMP data.

add: -a CUSTOM_AXES, --custom_axes=CUSTOM_AXES

This is the category from the metadata mapping file to
use as a custom axis in the plot. For instance, if
there is a pH category and you would like to see the
samples plotted on that axis instead of PC1, PC2,
etc., one can use this option. It is also useful for
plotting time-series data. Note: if there is any non-
numeric data in the column, it will not be plotted
[default: none]

Fullscreen mode

There should be an option that allows you to hide the right sidebar and enter full-screen mode. This would be especially useful for presentations and demos.

Transparency on a per sample basis

The user should be able to select which samples have their transparency changed. This in turn will make the non-transparent objects stand out more.

This feature should also take part with issue #3.

We should be able to select different values for different groups in -x

Currently we can use -x "AGE:70" to plot all non-numeric values at point 70, but it would be great to have something like: -x "AGE:EXPERIMENT:HMPV13:70" -x "AGE:EXPERIMENT:HMPV35:80", which would plot HMPV13 at 70 and HMPV35 at 80. If the user leaves out some groups, the error message should say which groups are missing.
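
A rough sketch of how the proposed syntax could be parsed, assuming the two-field form stays valid as a default for the whole axis. The function names and the returned dictionary layout are made up for illustration and are not part of make_emperor.py.

```python
def parse_missing_value_specs(specs):
    """Parse -x style strings into a per-axis dictionary.

    "AGE:70" sets a default for the AGE axis; "AGE:EXPERIMENT:HMPV13:70"
    sets the value only for samples whose EXPERIMENT column is HMPV13.
    """
    parsed = {}
    for spec in specs:
        parts = spec.split(":")
        if len(parts) == 2:
            axis, value = parts
            key = None
        elif len(parts) == 4:
            axis, column, group, value = parts
            key = (column, group)
        else:
            raise ValueError("Cannot parse %r" % spec)
        entry = parsed.setdefault(axis, {"default": None, "groups": {}})
        if key is None:
            entry["default"] = float(value)
        else:
            entry["groups"][key] = float(value)
    return parsed


def find_missing_groups(parsed_axis, column, groups_in_mapping):
    """Return the sorted groups of column that have no value assigned."""
    if parsed_axis["default"] is not None:
        return []  # the default covers every group
    covered = {g for (c, g) in parsed_axis["groups"] if c == column}
    return sorted(set(groups_in_mapping) - covered)
```

The list returned by `find_missing_groups` is what the error message could report back to the user.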

Add covariance matrix as an option in --ellipsoid_method

@rob-knight suggested in #39 to add an option to calculate the covariance matrix to determine the dimensions of the ellipsoids when jackknifing.

The covariance matrix should give you a tilted oval, modeling the dataset as multivariate normal. Each PC is a variate; the matrix holds cov(i,j), with var(i) on the diagonal, and is used to calculate the ellipsoid.

@antgonza found an available implementation of this in numpy, see:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.cov.html


This gist includes the source code that Ryan Kennedy shared with Rob:
https://gist.github.com/ElDeveloper/31d95bf37ac5aed8070f
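
For reference, a minimal sketch of how numpy.cov could feed the ellipsoid dimensions, assuming the input is the position of a single sample in each jackknifed replicate and that at least two axes are used. This is not the code from the gist above; the function name and the `n_std` scaling parameter are assumptions made for illustration.

```python
import numpy as np


def ellipsoid_from_replicates(coords, n_std=1.0):
    """Estimate a tilted ellipsoid for one sample.

    coords has shape (n_replicates, n_axes) with n_axes >= 2: the position of
    the same sample in each jackknifed replicate.
    Returns (center, radii, axes), where the columns of axes are the unit
    directions of the ellipsoid and radii are the matching semi-axis lengths.
    """
    coords = np.asarray(coords, dtype=float)
    center = coords.mean(axis=0)
    # rowvar=False: every column (PC) is a variate, every row an observation
    cov = np.cov(coords, rowvar=False)
    # the covariance matrix is symmetric, so eigh applies; eigenvalues ascend
    eigvals, eigvecs = np.linalg.eigh(cov)
    # clip tiny negative eigenvalues caused by floating point error
    radii = n_std * np.sqrt(np.clip(eigvals, 0.0, None))
    return center, radii, eigvecs
```

The eigenvectors give the tilt of the ellipsoid, which the per-axis "IQR" and "sdev" methods cannot express.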

Add jackknifed beta diversity

--ellipsoid_smoothness=ELLIPSOID_SMOOTHNESS
Used only when plotting ellipsoids for jackknifed beta
diversity (i.e. using a directory of coord files
instead of a single coord file). Valid choices are
0-3. A value of 0 produces very coarse "ellipsoids"
but is fast to render. If you encounter a memory error
when generating or displaying the plots, try including
just one metadata column in your plot. If you still
have trouble, reduce the smoothness level to 0.
[default: 1]
--ellipsoid_opacity=ELLIPSOID_OPACITY
Used only when plotting ellipsoids for jackknifed beta
diversity (i.e. using a directory of coord files
instead of a single coord file). The valid range is
between 0-1. 0 produces completely transparent
(invisible) ellipsoids and 1 produces completely
opaque ellipsoids. [default=0.33]
--ellipsoid_method=ELLIPSOID_METHOD
Used only when plotting ellipsoids for jackknifed beta
diversity (i.e. using a directory of coord files
instead of a single coord file). Valid values are
"IQR" and "sdev". [default=IQR]

Update THREE.js

There seem to be a lot of new features that would be really cool to hook up ASAP.

Updating THREE will require digging into the qiime/evident repository to find out which version of THREE.js we are actually using.

Using -b with -a needs to have the field in -a with -b

This doesn't work:
make_emperor.py -i pc.txt -o emperor_age -m map.txt -b "EXPERIMENT_TITLE" -a AGE -x "AGE:70"

This works:
make_emperor.py -i pc.txt -o emperor_age -m map.txt -b "EXPERIMENT_TITLE,AGE" -a AGE -x "AGE:70"
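
One possible fix, sketched under the assumption that the script has both options available as lists of column names: add any -a column to the -b list before the mapping file is filtered. The helper name is hypothetical.

```python
def merge_colorby_with_custom_axes(colorby, custom_axes):
    """Return the colorby columns with every custom axis column included."""
    merged = list(colorby)  # do not modify the caller's list
    for axis in custom_axes:
        if axis not in merged:
            merged.append(axis)
    return merged
```

With this in place the first command would behave like the second one, since AGE would be appended to the colorby columns automatically.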

Clean-up emperor.js

Working the other day with @meganap, we noticed that the indentation levels of this file are inconsistent.

Also I've noticed some other things that get to be really annoying:

  • Set the indentation for the document to use tabs.
  • Prefix all the global variables with a g as it is very confusing to deal with some elements and know about their scope at a first glance.
  • Remove all commented code (this makes sense on some projects but since this is currently being tracked with git, it doesn't make sense to have them).

When fixing this, make sure nobody else has an open pull request with changes to this file, because this will most likely cause conflicts for both sides, and since this is a major change to the whole file it will get super messy 💣 ⚡

Tabs do not have scrollbars

The tabs (where the options are) do not have scrollbars. This is important for those metadata columns that have a lot of options, like age or time.

Add splash screen

When the data to visualize is "large" (~3000 samples) and you open the main window, a white screen appears for a while and nothing is shown. Adding a splash screen that is displayed until the data is loaded would probably be enough to improve the user experience.

Add test data to emperor

Use something similar to qiime_test_data_dir for automatic testing and test the script parameters.

Add ChangeLog.md

This file should contain the new features and implementations that make it nicer than make_3d_plots.py.

Create setup.py script

The project should provide a standard setup.py script that at the very least checks for the QIIME python libraries to be available.
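
A minimal sketch of the dependency check such a script could run before calling its setup function. It relies on `importlib.util.find_spec`, which only exists in Python 3.4 and later, so this is an assumption about the target interpreter; the helper name is hypothetical.

```python
from importlib.util import find_spec


def missing_modules(required):
    """Return the top-level module names in required that cannot be found."""
    return [name for name in required if find_spec(name) is None]
```

A setup.py could call `missing_modules(["qiime"])` and stop with a clear message when the returned list is not empty.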

Allow filtering of taxa strings for biplots

The full taxa strings can be cumbersome to visualize.

An example taxonomy string:
k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Ruminococcus;s__

The most informative (and least abusive of real estate) label could be just g__Ruminococcus.

It might be helpful to allow displaying either the entire taxa string or one or more of its semicolon-separated levels (for example, only the phylum and genus levels), chosen by the user.

It could also be handy to allow modification of the text: display only the genus level (g__Ruminococcus) and then delete the prefix, so it just shows Ruminococcus.
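
A small sketch of the filtering described above, assuming every level follows the "x__Name" convention of the example string. The function name and its parameters are hypothetical.

```python
def format_taxon(taxonomy, levels=("g",), strip_prefix=False):
    """Keep only the requested levels of a semicolon-separated taxa string.

    levels holds the one-letter level prefixes to keep (k, p, c, o, f, g, s).
    Levels with an empty name, such as a trailing "s__", are skipped.
    """
    kept = []
    for part in taxonomy.split(";"):
        prefix, _, name = part.strip().partition("__")
        if prefix in levels and name:
            kept.append(name if strip_prefix else prefix + "__" + name)
    return ";".join(kept)
```

For the example string, the default call yields g__Ruminococcus, `levels=("p", "g")` yields p__Firmicutes;g__Ruminococcus, and `strip_prefix=True` yields Ruminococcus.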

add: Biplots

-t TAXA_FNAME, --taxa_fname=TAXA_FNAME
Used only when generating BiPlots. Input summarized
taxa filepath (i.e., from summarize_taxa.py). Taxa
will be plotted with the samples. [default=none]
--n_taxa_keep=N_TAXA_KEEP
Used only when generating BiPlots. This is the number
of taxa to display. Use -1 to display all. [default:
10]
--biplot_output_file=BIPLOT_OUTPUT_FILE
Used only when generating BiPlots. Output coordinates
filepath when generating a biplot. [default: none]

Notes:

  • While adding biplots we should add tri and tetra plots (communities, the species, the environmental variables, and the functions)

add: --add_vectors=ADD_VECTORS

Create vectors based on a column of the mapping file.
This parameter accepts up to 2 columns: (1) create the
vectors, (2) sort them. If you wanted to group by
Species and order by SampleID you will pass
--add_vectors=Species but if you wanted to group by
Species but order by DOB you will pass
--add_vectors=Species,DOB; this is useful when you use
--custom_axes param [default: none]
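
A minimal sketch of the grouping and ordering step, assuming the mapping file has been loaded as a dictionary of sample id to {column: value}. The function name is hypothetical, and sorting is done on the raw strings, which is an assumption: numeric columns would need converting first.

```python
from collections import defaultdict


def group_and_sort(samples, group_col, sort_col=None):
    """Group sample ids by group_col and order each group.

    With sort_col=None the ids are ordered by sample id, mirroring
    --add_vectors=Species; otherwise by the value in sort_col, mirroring
    --add_vectors=Species,DOB.
    """
    groups = defaultdict(list)
    for sample_id, metadata in samples.items():
        groups[metadata[group_col]].append(sample_id)
    for ids in groups.values():
        if sort_col is None:
            ids.sort()
        else:
            ids.sort(key=lambda sid: samples[sid][sort_col])
    return dict(groups)
```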

--vectors_algorithm=VECTORS_ALGORITHM
The algorithm used to create the vectors. The method
used can be RMS (either using 'avg' or 'trajectory');
or the first difference (using 'diff'), or 'wdiff' for
a modified first difference algorithm (see
--window_size); the aforementioned use all the
dimensions and weights them using their percentage
explained; returns the norm of the created vectors;
and their confidence using ANOVA. The Vectors are
created as follows: for 'avg' it calculates the
average at each timepoint (averaging within a group),
then calculates the norm of each point; for
'trajectory' calculates the norm for the 1st-2nd, 2nd-
3rd, etc.; for 'diff', it calculates the norm for all
the time-points and then calculates the first
difference for each resulting point; for 'wdiff'
it uses the same procedure as the previous method but
the subtraction will be between the mean of the next
number of elements specified in --window_size and the
current element, both methods ('wdiff' and 'diff')
will also include the mean and the standard deviation
of the calculations [default: none]


This in turn will require adding the following options.

-w, --weight_by_vector
Use -w when you want the output created in the
--vectors_path to be weighted by the space between
samples in the --add_vectors sorting column, i.e.
days between samples [default: False]
--window_size=WINDOW_SIZE
Use --window_size, when selecting the modified first
difference ('wdiff') option for --vectors_algorithm.
This integer determines the number of elements to be
averaged per element subtraction, the resulting
vector. [default: none]


Notes:

The user should be able to specify multiple algorithms to be executed in the same command.
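
To make the 'diff' and 'wdiff' descriptions above concrete, here is a rough sketch of one reading of the help text. It deliberately leaves out the weighting by percentage explained and the ANOVA step, and it assumes that near the end of the series the window simply shrinks to the elements that remain. The function names are hypothetical.

```python
import math


def norms(points):
    """Euclidean norm of every time-point (a sequence of coordinates)."""
    return [math.sqrt(sum(x * x for x in point)) for point in points]


def first_difference(points):
    """'diff': norm of each time-point, then next norm minus current norm."""
    n = norms(points)
    return [n[i + 1] - n[i] for i in range(len(n) - 1)]


def windowed_difference(points, window_size):
    """'wdiff': mean of the next window_size norms minus the current norm."""
    n = norms(points)
    result = []
    for i in range(len(n) - 1):
        # assumption: use a shorter window when fewer elements remain
        window = n[i + 1:i + 1 + window_size]
        result.append(sum(window) / len(window) - n[i])
    return result
```

With a window size of 1, `windowed_difference` reduces to `first_difference`.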

Default view-point should be PC1 vs PC2

The current default view-point is PC1 vs PC3 with some rotation; this should be changed so that it presents PC1 vs PC2 by default, as that's usually where the main information is contained.

In turn when this gets done, the recenter camera button should take you back to this view-point. Similar to what KiNG does with View>Unnamed View.

add: -p PREFS_PATH, --prefs_path=PREFS_PATH

Input user-generated preferences filepath. NOTE: This
is a file with a dictionary containing preferences for
the analysis. [default: none]

Notes:

  • This will be a good opportunity to revise the parameters file format. Currently it is too hard for users to understand.

Add label for each axis

Currently the axes do not display a label besides the one where the % explained is displayed; a way to add a label to display a string for each axis should be added. Similar to what is being done in KiNG where each axis is labeled as PC1, PC2, etc.

Add connecting lines between points (vectors)

--vectors_axes=VECTORS_AXES

The number of axes to account while doing the vector
specific calculations. We suggest using 3 because those
are the ones being displayed in the plots but you
could use any number between 1 and number of samples -
1. To use all of them pass 0. [default: 3]

Emperor scripts directory cannot be found

After I installed Emperor using python setup.py install, I tried running the tests with:

python tests/all_tests.py --emperor_scripts_dir scripts

The unit tests and script usage tests appear to run:

Testing /home/jrideout/dev/qiime-deploy-conf/emperor-0.8-dev/foo/emperor-0.8-repository-dc9aa375/tests/test_format.py:

test_format_mapping_file_to_js (__main__.TopLevelTests)
Tests correct formatting of the metadata mapping file ... ok
test_format_pcoa_to_js (__main__.TopLevelTests)
Test correct formatting of the PCoA file ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.001s

OK

Testing /home/jrideout/dev/qiime-deploy-conf/emperor-0.8-dev/foo/emperor-0.8-repository-dc9aa375/tests/test_util.py:

test_copy_support_files (__main__.TopLevelTests)
Test the support files are correctly copied to a file path ... ok
test_fill_mapping_field_from_mapping_file (__main__.TopLevelTests)
Check the values are being correctly filled in ... ok
test_keep_columns_from_mapping_file (__main__.TopLevelTests)
Check correct selection of metadata is being done ... ok
test_preprocess_coords_file (__main__.TopLevelTests)
Check correct processing is applied to the coords ... ok
test_preprocess_mapping_file (__main__.TopLevelTests)
Check correct preprocessing of metadata is done ... ok
test_sanitize_mapping_file (__main__.TopLevelTests)
Check the mapping file strings are sanitized for it's use in JS ... ok

----------------------------------------------------------------------
Ran 6 tests in 0.015s

OK

Tests to run:
 make_emperor
Testing 5 usage examples from: scripts//make_emperor.py
 Running tests in: /tmp/script_usage_tests/make_emperor
 Tests:
  scripts//make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -o emperor_output : Failed
  scripts//make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -b 'Treatment&&DOB,Treatment' -o emperor_colored_by : Failed
  scripts//make_emperor.py -i unweighted_unifrac_pc.txt -m Fasting_Map.txt -a DOB -o pcoa_dob : Failed
  scripts//make_emperor.py -i unweighted_unifrac_pc -m Fasting_Map.txt -o jackknifed_pcoa -e sdev : Failed
  scripts//make_emperor.py -i unweighted_unifrac_pc -s unweighted_unifrac_pc/pcoa_unweighted_unifrac_rarefaction_110_5.txt -m Fasting_Map.txt -o jackknifed_with_master : Failed

==============
Result summary
==============

Unit test result summary
------------------------


All unit tests passed.



Could not run script usage tests.
The Emperor scripts directory could not be automatically located, try supplying  it manually using with the --emperor_scripts_dir option.

However, there is still an error message at the bottom of the output indicating that the scripts directory couldn't be found, even though I specified it in the command.

Another related problem: the script usage tests fail because all_tests.py doesn't cd into the script usage tests directory before trying to run the commands, and thus can't find the input files.
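
For the second problem, a minimal sketch of running each usage example from inside the directory that holds its input files. It assumes Python 3.5 or later for `subprocess.run`; the function name is hypothetical and this is not the code in all_tests.py.

```python
import subprocess


def run_usage_example(command, working_dir):
    """Run command (a list of arguments) with working_dir as the current dir.

    Passing cwd lets relative input paths such as Fasting_Map.txt resolve
    against the test data directory without changing the caller's directory.
    Returns (return_code, stdout, stderr).
    """
    result = subprocess.run(command, cwd=working_dir,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode, result.stdout.decode(), result.stderr.decode()
```

Since the examples are invoked through a relative scripts path, the script path would also need to be made absolute before the command runs from another directory.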

add: -b COLORBY, --colorby=COLORBY

Comma-separated list of metadata categories
(column headers) to color by in the plots. The
categories must match the name of a column header in
the mapping file exactly. Multiple categories can be
listed by comma separating them without spaces. The user
can also combine columns in the mapping file by
separating the categories by "&&" without spaces.
[default=color by all]
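
A short sketch of parsing a --colorby value as described, assuming the mapping file header is available as a list of column names. The function name is hypothetical, and how the values of combined columns are joined is left out because the help text does not specify it.

```python
def parse_colorby(colorby, header):
    """Split a --colorby string into a list of column groups.

    Commas separate categories and "&&" joins the columns that make up a
    combined category. Every column must match a header name exactly.
    """
    categories = []
    for item in colorby.split(","):
        columns = item.split("&&")
        unknown = [c for c in columns if c not in header]
        if unknown:
            raise ValueError("Not in the mapping file header: %s"
                             % ", ".join(unknown))
        categories.append(columns)
    return categories
```

For the usage example 'Treatment&&DOB,Treatment', this returns one combined category made of Treatment and DOB, followed by Treatment on its own.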
