microscopium / microscopium-ui

Microscopium web interface.

JavaScript 81.08% HTML 8.61% CSS 2.65% Python 7.66%

microscopium-ui's Introduction

Microscopium

Unsupervised clustering and dataset exploration for high content screens.

See microscopium in action

Public dataset BBBC021 from the Broad Bioimage Benchmark Collection with t-SNE image embedding.

For developers

We encourage pull requests - please get in touch if you think you might like to contribute.

License

This project uses the 3-clause BSD license. See LICENSE.txt.

Development installation

First, clone this repository, and change into its directory:

git clone https://github.com/microscopium/microscopium.git
cd microscopium

Then, install the dependencies using one of the methods below:

conda, new environment (recommended)

conda env create -f environment.yml
conda activate mic

conda, existing environment

# conda activate <env-name>
conda env update -n <env-name> -f environment.yml

pip

pip install -r requirements.txt

Finally, install microscopium, optionally as an editable package:

pip install [-e] .

Serving the web app

Supported browsers are Chrome and Firefox, though we have observed much better performance on Chrome. Safari and Internet Explorer are not currently supported.

Your data needs to have the following format:

a collection of image files (can be in a directory, or in a nested directory structure)
a .csv file containing, at a minimum, the x/y coordinates of each image, and the path to the image in the directory. The path should be relative to the location of the .csv file.
a .yaml file containing settings. At a minimum, it should contain an embeddings field that maps each <embedding name> to the column names for x and y, as well as an image-column field naming the column that contains the path to each image. If you don't want to specify the settings file path, place settings.yaml next to the .csv file; microscopium looks there by default.

For example data, see:

tests/testdata/images/*.png
tests/testdata/images/data.csv
tests/testdata/images/settings.yaml

To run the web app locally, try:

python -m microscopium.serve tests/testdata/images/data.csv -c tests/testdata/images/settings.yaml

You should then be able to see the app in your web browser at: http://localhost:5000

You can specify a port number with -P.

python -m microscopium.serve tests/testdata/images/data.csv -P 5001

This specifies the port number as 5001, and the app will run locally at: http://localhost:5001/

For more information, run python -m microscopium.serve --help

microscopium-ui's People

Contributors

starcalibre jni don86 genevievebuckley

microscopium-ui's Issues

Replace empty string with suitable flag string when no gene info is available

The first line in the gene filter (at least for the MYORES screen) is blank. This looks like a UI glitch when it is in fact probably due to missing gene information. Some possible replacement strings:

"[None]"
""
"blank"

(It might be worth prepending a character with a high sort value so that these are pushed to the bottom of the list.)
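As an alternative to prepending a high-sorting character, the placeholder could be pushed to the end with a custom comparator. The sketch below is only an illustration: the "[None]" label is one of the candidates listed above, and the function names are hypothetical.

```javascript
// Hypothetical placeholder label; the issue has not settled on one.
const NO_GENE_LABEL = "[None]";

// Replace empty or missing gene names with the placeholder.
function labelGene(name) {
    return name && name.trim() !== "" ? name : NO_GENE_LABEL;
}

// Sort gene names alphabetically, pushing placeholders to the end.
function sortGeneNames(names) {
    return names.map(labelGene).sort(function (a, b) {
        const aMissing = a === NO_GENE_LABEL;
        const bMissing = b === NO_GENE_LABEL;
        if (aMissing !== bMissing) {
            return aMissing ? 1 : -1;
        }
        return a.localeCompare(b);
    });
}
```

With this approach the label itself stays readable, since the ordering does not depend on its first character.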

Add zoom and pan tools to the PCA plot

This might alleviate the problem of dealing with outliers.

Add mongo instructions to README

Eventually, these will need to go in a documentation website, but as an initial step having them in the README would allow people to get started with the project. From @starcalibre:

Some commands for when you import the new dataset later!

First drop the old collection:
$ mongo
use microscopium
db.samples.drop()
then in another terminal window
$ mongoimport -d microscopium -c samples microscopium-samples-genenames.json
now back to your mongo terminal
db.samples.count() // just to check that the import succeeded
db.samples.ensureIndex({'screen':1});
db.samples.ensureIndex({'row':1});
db.samples.ensureIndex({'column':1});
db.samples.ensureIndex({'gene_name':1});
As you might guess, this will build indices on the specified fields. This is done because the filter will be querying on these fields, so you want to make sure an index is there to avoid O(n) queries!

Improve histogram binning

See if it's feasible to have the same range (e.g. -4 to +4) for all the histograms, with more bins than currently (I think that the bars are too fat right now).
Put the tails in a special wide bin, e.g. anything < -3 goes in a bin from -4 to -3, while all other bins are 0.2 wide, for example.
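The binning described above could look roughly like the following sketch. The function name and the options object are hypothetical, and the sketch assumes that values beyond the plotted range (below -4 or above +4) are also counted in the tail bins.

```javascript
// Bin values into fixed-width bins over [-inner, inner), with one wide
// tail bin on each side. All numbers used with it are illustrative.
function binWithTails(values, opts) {
    const inner = opts.inner;   // e.g. 3: edge between regular and tail bins
    const width = opts.width;   // e.g. 0.2: width of a regular bin
    const nRegular = Math.round((2 * inner) / width);
    // counts[0] is the lower tail, counts[nRegular + 1] is the upper tail.
    const counts = new Array(nRegular + 2).fill(0);
    values.forEach(function (v) {
        let index;
        if (v < -inner) {
            index = 0;
        } else if (v >= inner) {
            index = nRegular + 1;
        } else {
            // Clamp to guard against floating-point rounding at the top edge.
            index = 1 + Math.min(nRegular - 1, Math.floor((v + inner) / width));
        }
        counts[index] += 1;
    });
    return counts;
}
```

Calling binWithTails(values, { inner: 3, width: 0.2 }) gives 30 regular bins plus the two tail bins, so every histogram shares the same 32 bins regardless of the feature.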

Performance of filtering is sluggish

Clicking around with no filters active is nice and smooth. Once a filter is active, everything becomes much more stuttery. We should explore ways to make the experience smooth.

Change glyphs for positive and negative controls

There should be a distinguishing visual feature for glyphs representing positive and negative control data. I suggest a (smaller) dot with a horizontal bar for negative controls and a similarly small dot with a vertical bar for positive controls.

Improve visual functionality of filtering icon

The current filter button is highlighted on hover, but offers no further visual cues. It should:

stay highlighted while the filter drawer is open, or appear a different colour.
more importantly, appear red, badged, or otherwise marked whenever there is an active filter on the data.

Tasks:

Fix colour functionality for selected gene list. (currently only working for other filters.)
Reset red colour when filters are reset
Try out badges instead of colour. It's not salient enough for my taste currently...

Mongo schema

Mongo

var Screen = {
    _id: String, // short screen name (< 8 characters)
    screen_name: String, // full screen name
    screen_desc: String, // screen description
    number_samples: Number, // number of samples in the screen
    features: Array // array of features used in the screen
};

Screen has a one-to-many relationship to Sample.

var Sample = {
    _id: String, // name-plate-well
    screen: String, // Screen _id
    gene_name: String, // or gene_nane
    control_pos: Boolean, // true if sample is positive control
    control_neg: Boolean, // true if sample is negative control
    feature_vector: Array, // unstandardised features
    feature_vector_std: Array, // standardised features
    neighbours: Array, // array of _id: nearest neighbours (references to samples)
    column: String, // '01', '02', '03', ...
    row: String, // 'A', 'B', 'C', ...
    pca_vector: Array,
    image: ObjectID, // Image ID
    cluster_member: Array // array of cluster memberships for clustering scheme k=2..20
};

var Image = {
    sample_id: String, // name-plate-well ENSURE A UNIQUE INDEX IS CREATED HERE!
    image_full: String, // BinData of image
    image_thumb: String // BinData of image (thumbnail)
};

Sample has a one-to-one relationship with Image.

Documents will be stored in the microscopium database: Screen documents in the screens collection, Sample documents in the samples collection, and Image documents in the images collection.

Add axis labels to the lineplot and histogram

Add axis labels for both the histogram and lineplot. This is very straightforward, and I think I'll tack it onto the current PR I have in the works.

Also: https://xkcd.com/833/

Add filtering for specific samples

Add ability to filter on specific plate and co-ordinate.

Dropdown box for plate and dropdown box for co-ordinate. Add these to a list à la the gene filter.

Add React.js framework

React.js is a framework written by the Facebook team that's built for pages where lots of DOM manipulation happens (i.e. our UI). Two reasons I want to use it:

React uses a 'virtual DOM' model. DOM changes are calculated in memory (as opposed to in the browser), and a diff is applied to the DOM, so you're only changing the DOM when you need to. Given the insane amounts of DOM manipulation we're doing with D3, this will be of great benefit to us.
Each part of the interface is compartmentalised into 'components' that have uniform methods to handle state changes, initialisation, etc. This will make code much more modular and thus easier to maintain and test.

Here's a great blog post detailing how D3 plots can be cast as React components. I think the code looks much nicer this way, and if we're getting significant performance gains, it's a no-brainer!

Idea: show plate shape for filter tool

Instead of having tickboxes for each row and column of the plate (which looks clunky), when the user wants to do this kind of filtering, show a near-full-screen overlay of a plate, where rows, columns, and individual wells are selectable and de-selectable.

Display gene/treatment name in image gallery

In addition to the sample ID, the gene/treatment name should appear in the image gallery. This can either be added to the current tooltip or as text near the image.

Add back and forwards functionality

This should be fairly easy to implement, it's just a matter of keeping a record of each sample id that has been visited. The back and forward arrows could be added to the top navbar using the FontAwesome left and right chevron arrows.
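A minimal sketch of such a record, modelled on a browser's back/forward stack, is shown below. The createHistory name and its methods are hypothetical and not part of the existing code.

```javascript
// Minimal visit history with a cursor, like a browser's back/forward stack.
function createHistory() {
    const visited = [];   // sample ids in visit order
    let cursor = -1;      // index of the current sample in `visited`
    return {
        visit: function (sampleId) {
            // Visiting a new sample discards any "forward" entries.
            visited.splice(cursor + 1);
            visited.push(sampleId);
            cursor = visited.length - 1;
        },
        back: function () {
            if (cursor > 0) { cursor -= 1; }
            return visited[cursor];
        },
        forward: function () {
            if (cursor < visited.length - 1) { cursor += 1; }
            return visited[cursor];
        },
        current: function () { return visited[cursor]; }
    };
}
```

The navbar arrows would call back() and forward() and then select the returned sample id; clicking a point on the plot would call visit().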

Ensure Neighbour Nebula display is appropriately resized for less-wide browsers

If the browser window is resized to be less than ~1280 pixels wide, the images get stacked under the PCA plot, which looks hideous and is not functional. Hopefully some kind of rescaling of the PCA plot is possible...

Add complete public dataset to web demo

@starcalibre had the excellent idea of finding a dataset that can be made public on microscopium.io, which now points to a VM that we own on the nectar research cloud.

@VCFG do you have or know of any datasets that would be suitable for public consumption?

Add node-webkit

Add node-webkit so the interface can be run as a self-contained desktop application.

Add loupe to image display

Now that we have thumb, large, and full image sizes, we should create a tool that lets users display the full size image on request, perhaps upon clicking on the large size.

Keyboard focus should go straight to the gene search box when filtering

When the filtering drawer is opened, the focus should go straight to the gene filtering box, rather than the user having to click on it. (Even if the user is after some different form of filtering, this would not interfere with a later click.)

Add grunt

Add grunt with (at least) the following plugins:

Karma - unit testing
JSLint - code quality checking
Watch - re-run server whenever changes made to file (very cool!)

Overlay screen single-z-score result onto the neighbour plot

Every screen we get has a z-score result, or a list of hits. It would be great to be able to see these on the neighbour plot / dimension-reduced scatterplot.

Use Django for backend

The more I think about it, the more sense this makes to me. On top of the advantage of having access to the scipy stack in the backend, it'll be much easier for users to run the UI locally. All they'll need is Python, which should be installed on pretty much any bioinformatician's or biologist's machine. You can't say the same for Node.js, which is still a JS hipster's thing (but an exciting one nonetheless IMO) :P

Coupled with an embedded database, everything would be able to run without any dependencies beyond Python! WHOAH! :D

Our backend isn't overly complicated, just a bunch of GET requests that query Mongo so it shouldn't be overly difficult. I'll just need to learn Django, which I'm happy to do because I'd like to add it to my skillset.

Add running instructions to README

e.g. how to make mongo in the correct schema, how to run and access on localhost.

Add gene/target names to image tooltip

Show the gene/treatment target when the mouse hovers over the images in the gallery. Right now we can just add this information to the alt tag, but we might want to think about making some styled tooltips à la the PCA plot.

Consider onboarding for new researchers, and how to convert exploration to discovery

We're quite familiar with the interface, but a new user might need onboarding: guidance about what the interface can do and how to use that to enable discovery. This is a tough problem but definitely something that we need to think about for release and for the paper.

Update PCA hover overlay with gene names

This would make it more intuitive to look at for the biologist exploring their dataset.

Turn on continuous integration with Travis-CI

Here's my blog post series about doing this for Python. I think it should be fairly generalisable to JavaScript.

Improve initial loading screen

It currently takes just a little bit too long to load a dataset once a screen has been selected (or on page load when there is a single screen). It only just gets to the point where you wonder if the site is broken. Possible options:

include a progress bar
save the initial plot state somewhere (eg as part of the screen info) and display that.
optionally, add each data point to the plot as it is processed.
something else I haven't thought of.

Add parser for VCFG screen request format

Use this to exclude sample(s) from the UI.

Add results from clustering

Now that we're approaching the stage that we'll be incorporating the results from clustering the images, we need to think of how these should be presented. Ideas discussed in the meeting today were:

Finding centroid samples and displaying them. Some clustering algorithms explicitly compute centroids (e.g. KMeans), while for others (e.g. DBSCAN) we'll need to find the centroid ourselves.
Average feature vectors.
Highlighting features that do interesting things across the dataset -- maybe high variance features? The nice thing about the new feature documents collection is we can associate such statistics to each feature so the UI can easily display these.
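For the first two ideas, one possible approach is to average the standardised feature vectors of a cluster and pick the sample closest to that average. The sketch below assumes samples shaped like the Sample schema (with feature_vector_std) and uses squared Euclidean distance; both function names are hypothetical.

```javascript
// Element-wise mean of a list of equal-length vectors.
function meanVector(vectors) {
    const mean = new Array(vectors[0].length).fill(0);
    vectors.forEach(function (v) {
        v.forEach(function (x, i) { mean[i] += x / vectors.length; });
    });
    return mean;
}

// Return the sample whose standardised features lie closest to the mean.
function centroidSample(samples) {
    const mean = meanVector(samples.map(function (s) {
        return s.feature_vector_std;
    }));
    let best = null;
    let bestDistance = Infinity;
    samples.forEach(function (s) {
        const d = s.feature_vector_std.reduce(function (acc, x, i) {
            return acc + (x - mean[i]) * (x - mean[i]);
        }, 0);
        if (d < bestDistance) {
            best = s;
            bestDistance = d;
        }
    });
    return best;
}
```

meanVector also covers the "average feature vectors" idea directly, and centroidSample gives a real image to display for each cluster.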

Add authentication framework to UI

As above. This is how the auth will work:

User documents will be stored in the DB in the below format. We'll set these up manually for the time being.

user = { userName: 'VCFG', password: [some hashed password], screens: ['LNC-10A-SP', 'LNC-231-SP'] }

The user logs in and a token is saved locally on their machine. The screens array is saved in the token (tokens are signed so they can't be edited locally without detection). All of the routes on the server are updated such that the user sends their token whenever a GET request is made to the server. If the user isn't logged in, or doesn't have the screen they're trying to access attached to their token, the API won't serve the request.

We can set it up such that anyone can access the public dataset, but you need to login to access the others.
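To illustrate the token idea, here is a rough sketch using an HMAC signature from Node's built-in crypto module. This is a swapped-in technique for illustration only: the secret, the token layout and the function names are all assumptions, and a real deployment would more likely use an established token library.

```javascript
const crypto = require("crypto");

// Hypothetical server-side secret; in practice load it from configuration.
const SECRET = "change-me";

function sign(payload) {
    return crypto.createHmac("sha256", SECRET).update(payload).digest("hex");
}

// Issue a token of the form <base64 JSON>.<hex signature>.
function issueToken(user) {
    const payload = Buffer.from(JSON.stringify({
        userName: user.userName,
        screens: user.screens
    })).toString("base64");
    return payload + "." + sign(payload);
}

// Return the list of screens if the token is intact, otherwise null.
function screensFromToken(token) {
    const parts = String(token).split(".");
    if (parts.length !== 2) { return null; }
    const expected = Buffer.from(sign(parts[0]), "hex");
    const given = Buffer.from(parts[1], "hex");
    if (given.length !== expected.length ||
            !crypto.timingSafeEqual(given, expected)) {
        return null;
    }
    return JSON.parse(Buffer.from(parts[0], "base64").toString("utf8")).screens;
}

// True only when the token is valid and lists the requested screen.
function canAccess(token, screen) {
    const screens = screensFromToken(token);
    return screens !== null && screens.indexOf(screen) !== -1;
}
```

Each API route would call canAccess with the token sent by the client and the requested screen before querying the database.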

With respect to securing Mongo, three things need to be done (at a minimum) once it's installed on the nectar instance.

Add user authentication. By default, Mongo DBs have full read/write access for anyone who connects. We add two users: an admin with full read/write access and a read-only account. The Web API will connect to the DB with the read-only account.
Restrict the list of allowed IPs so only the localhost can connect (i.e. the server itself).
Change the port from the default to something else!

Idea - pre-render histograms

I was chatting with a mate at MelbJS and we came up with a better way to render the histograms. In the current approach, we need to pull 6k-10k floats from the database and render the histogram client-side. This means a) we're waiting for all the floats to come back from the DB, and b) the client needs to bin all the floats and render the histogram once the data comes back.

The idea is to pre-compute all of the HTML/SVG/JS and save it as a string. When a histogram for a feature is requested, the string is pulled from the DB, parsed as markup and inserted into the DOM. This will be much faster because there's no binning to be done, there's much less data to send in the AJAX request, and it'll be nicely compressible because it'll be plain text with lots of redundancy.

The downside to this approach is you lose the data binding, and other information kept in D3 such as the scale that maps the SVG co-ordinate to a data value. We'll need this if we want to make the histogram clickable (see #50). But if we're using a uniform scale for the histogram (see #60) this shouldn't be a problem.
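A rough sketch of the pre-rendering step is given below. It bins the values once and returns an SVG string that could be stored alongside the feature; the function name, option names and the plain rect-based markup are all assumptions rather than what the UI currently produces.

```javascript
// Bin values into `nBins` equal bins over [min, max] and return an SVG string.
function renderHistogramSvg(values, opts) {
    const nBins = opts.nBins, min = opts.min, max = opts.max;
    const width = opts.width, height = opts.height;
    const counts = new Array(nBins).fill(0);
    values.forEach(function (v) {
        if (v < min || v > max) { return; }   // ignore out-of-range values
        const i = Math.min(
            nBins - 1,
            Math.floor(((v - min) / (max - min)) * nBins)
        );
        counts[i] += 1;
    });
    // Scale bars so the tallest bin fills the full height.
    const tallest = Math.max(1, Math.max.apply(null, counts));
    const barWidth = width / nBins;
    const bars = counts.map(function (count, i) {
        const barHeight = (count / tallest) * height;
        return '<rect x="' + (i * barWidth) + '" y="' + (height - barHeight) +
            '" width="' + barWidth + '" height="' + barHeight + '"/>';
    });
    return '<svg xmlns="http://www.w3.org/2000/svg" width="' + width +
        '" height="' + height + '">' + bars.join("") + "</svg>";
}
```

Because min, max and nBins are passed in explicitly, the same values can be stored with the string, which would let the client rebuild the scale if the histogram later needs to be clickable.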

Decide on image and tooltip names

I suggested in #13 to use gene names as the PCA hover. However, that would not translate to screens using e.g. drugs or other compounds. So, perhaps we need to invent a new field, e.g. "treatment name". For RNAi knockdown, we could create this field using gene_id and the string "KD".

Add screen information and/or screen diagnostics

This could include information such as the screen description and diagnostic information such as inter/intra gene distances.

Highlight target replicates as well as sample neighbours

When a point on the plot is clicked, the point itself is highlighted along with its neighbouring points. Any replicate (or in the case of controls, all other controls) samples should also be highlighted. We'll need to decide how these should be coded -- by colour? A differently shaped point?

With respect to implementing this, this can be done in one of two ways:

Send a Mongo query that finds all samples with the same target as the point clicked.
Add a new field to each sample, say replicate_samples that stores these IDs.
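The second option could be pre-computed with something like the sketch below, which groups samples by gene_name and records the other members of each group. It assumes that replicates are exactly the samples sharing a gene_name; the function name is hypothetical, and replicate_samples is the field name suggested above.

```javascript
// For each sample, list the _ids of the other samples with the same gene_name.
function indexReplicates(samples) {
    const byGene = new Map();
    samples.forEach(function (s) {
        if (!byGene.has(s.gene_name)) {
            byGene.set(s.gene_name, []);
        }
        byGene.get(s.gene_name).push(s._id);
    });
    return samples.map(function (s) {
        return {
            _id: s._id,
            replicate_samples: byGene.get(s.gene_name).filter(function (id) {
                return id !== s._id;
            })
        };
    });
}
```

This trades a little storage for avoiding an extra query on every click, which matters if the plot is already sluggish while filters are active.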

Re-write plots as objects

This makes for cleaner and more efficient code, and allows the plots to be easily transferred to other projects.

Massively simplify and clean up UI

I think there's a clear winner between the different panels — the neighbour nebula. I think, in fact, that our entire UI could be folded into it, as follows:

the "Cluster cosmologist" can be a toggle on colouring of the dots in the very same plot, and a slider that is hidden most of the time (or at the very bottom).
the "Feature explorer" panels can be massively reduced and placed under the PCA plot
the "Screen info" panel can just be a tooltip or a drawer, represented by an (?) or somesuch icon

Once this is done, the splash page can be either a screen selection splash or (if the user only has access to a single screen) the nebula UI directly!

This would really simplify and clean up the experience for new users...

Use jQuery to generate image gallery

When rendering the nearest neighbours image gallery, jQuery (or some other method) should be used to generate the containers for the images. This will prevent a bunch of repeated HTML, and allow us to display galleries of varying sizes.

Allow extraction of interaction state from URL using RESTful interface

It would be great for saving explorations for later, sharing results between researchers, etc.

e.g., after much exploring a user finds that they want to display a particular set of filtered genes, with a specific one being selected, and a specific feature being selected. They click a "bookmark view" button, which saves the URL:

http://play.microscopium.io/screen_name/treatment_id/?filtered=foo,bar,baz&feature=z

Entering this URL in a browser gets them the exact view they saved.
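A minimal sketch of the round trip between interface state and URL, using the standard URL and URLSearchParams APIs, might look like this. The state object's shape and both function names are assumptions; the path and parameter names follow the example URL above.

```javascript
// Serialise the interface state into a bookmarkable URL.
function stateToUrl(base, state) {
    const url = new URL(
        encodeURIComponent(state.screen) + "/" +
        encodeURIComponent(state.treatment) + "/",
        base
    );
    url.searchParams.set("filtered", state.filtered.join(","));
    url.searchParams.set("feature", state.feature);
    return url.toString();
}

// Recover the interface state from a previously saved URL.
function urlToState(href) {
    const url = new URL(href);
    const parts = url.pathname.split("/").filter(Boolean).map(decodeURIComponent);
    const filtered = url.searchParams.get("filtered");
    return {
        screen: parts[0],
        treatment: parts[1],
        filtered: filtered ? filtered.split(",") : [],
        feature: url.searchParams.get("feature")
    };
}
```

Note that URLSearchParams percent-encodes the commas when writing the URL, so the saved link will not look exactly like the example, but it parses back to the same state. Gene names that themselves contain commas would need a different separator.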

Awesomely, this could also enable things such as continuing an analysis on iPad on the train home. =D

Data upload feature

Currently there's not a super easy way for users to load their data for microscopium.

At some point, there probably needs to be a data upload feature built - without one it'll be pretty hard to get new users past the initial hurdle and on board. I've recently been doing some user testing for the CAVE's PREVIS data upload tool, and I'm wondering if there might be some lessons we can draw from it.

PREVIS details:

Platforms: Linux, MacOS
Built with: Node.js, python, potree, sharevol, ThreeJS, Angular5, Firebase

I don't see a repository for this on either Toan's Github or the MIVP Github page - it might be best just to email Toan directly if we want to know more about it.

Simplify the DB API

All of the DB routes have been written haphazardly as the UI has had new features added to it, but it'd be great to make the routes consistent and accept query parameters. This would be useful for letting users query the data directly for their own analysis. For example, let's say I want all of the standardised features for samples in the MYORES screen belonging to plate number 12 in the first row (A). I could send the request:

/api/MYORES/feature_vector_std?plate=12&row=A

This route could be written like so:

// Pass the handler itself; Express calls it with (req, res) on each request.
app.get('/api/:screen/:field', handleGet);

function handleGet(req, res) {
    var plate = req.query.plate;
    var row = req.query.row;
    // logic to query the database and send appropriate response
}
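The "logic to query the database" step could start from a small pure function that turns the route and query parameters into a Mongo-style filter and projection, as sketched below. The function name and the whitelist are hypothetical, and plate is an assumed field: the Sample schema in this document only encodes the plate inside _id.

```javascript
// Fields a client may filter on; anything else in the query string is ignored.
const FILTERABLE = ["plate", "row", "column", "gene_name"];

// Translate route parameters and query parameters into a Mongo-style
// filter and projection.
function buildQuery(params, query) {
    const filter = { screen: params.screen };
    FILTERABLE.forEach(function (key) {
        if (query[key] !== undefined) {
            // Coerce to string so nested objects cannot smuggle in operators.
            filter[key] = String(query[key]);
        }
    });
    const projection = {};
    projection[params.field] = 1;
    return { filter: filter, projection: projection };
}
```

Inside handleGet, the result of buildQuery(req.params, req.query) could then be handed to the samples collection's find call. Keeping it separate from Express makes the query logic easy to unit test.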

Allow filtering of dataset using a list of GeneIDs

To push #13 a bit further, some biologists may want to look at what their favourite genes are doing on the chart. I envision a text box in which a researcher can paste a list of gene names separated by spaces, tabs, commas, newlines, or some combination thereof, and those genes would be somehow highlighted on the plot. (A small output text box would display warnings for gene IDs not found.) The highlighting would persist as long as the text box was not cleared.

Both #13 and this issue relate to actual questions from the Marcelle team.
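Parsing the pasted list and collecting warnings could be sketched as follows. The function names are hypothetical, and matching here is case-sensitive, which may or may not be what biologists expect.

```javascript
// Split pasted text on any run of whitespace and/or commas.
function parseGeneList(text) {
    return text.split(/[\s,]+/).filter(function (token) {
        return token.length > 0;
    });
}

// Separate the requested genes into those present in the dataset and
// those that should trigger a "not found" warning.
function matchGenes(text, knownGenes) {
    const known = new Set(knownGenes);
    const found = [];
    const missing = [];
    parseGeneList(text).forEach(function (gene) {
        (known.has(gene) ? found : missing).push(gene);
    });
    return { found: found, missing: missing };
}
```

The found list would drive the highlighting on the plot, while the missing list would populate the small warning text box.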

Allow selections on the histogram

Two related bits of functionality suggested here. I think both will be useful:

click on a point in the histogram; selection switches to the image closest to that feature value. (i.e. if you click on the point corresponding to horizontal 2.45, it's unlikely that any sample exactly matches that, so find the sample with the closest value on that feature, e.g. 2.5782)
select a range on the histogram; multiple points are selected on the PCA plot, all of the ones in that feature range. Bonus points: the feature vector display shows multiple line plots, one per selected sample.
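Both selections reduce to simple lookups over the values of the displayed feature. The sketch below assumes each sample has been reduced to an object with its _id and its value for that feature; the function names are hypothetical.

```javascript
// samples: array of { _id, value } for the feature shown in the histogram.

// Return the sample whose value is nearest to the clicked position.
function closestSample(samples, clickedValue) {
    let best = null;
    samples.forEach(function (s) {
        if (best === null ||
                Math.abs(s.value - clickedValue) < Math.abs(best.value - clickedValue)) {
            best = s;
        }
    });
    return best;
}

// Return every sample whose value falls inside the selected range.
function samplesInRange(samples, low, high) {
    return samples.filter(function (s) {
        return s.value >= low && s.value <= high;
    });
}
```

A linear scan is probably fine for 6k-10k samples; if it turns out to be slow, the values could be sorted once per feature and searched with a binary search instead.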

Add distance threshold for nearest neighbours

Some samples are way off by themselves and it doesn't really make sense to show their far-away neighbours. We suggested adding a distance threshold beyond which neighbours are not shown. This could perhaps be user-selected via a slider.
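A minimal sketch of the threshold is shown below. It assumes distances are Euclidean and computed on pca_vector, which is only one possibility (the neighbours may well be defined in the full feature space), and the function names are hypothetical.

```javascript
// Euclidean distance between two equal-length vectors.
function euclidean(a, b) {
    let sum = 0;
    for (let i = 0; i < a.length; i += 1) {
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return Math.sqrt(sum);
}

// Keep only the neighbours within maxDistance of the selected sample.
function neighboursWithin(sample, neighbours, maxDistance) {
    return neighbours.filter(function (n) {
        return euclidean(sample.pca_vector, n.pca_vector) <= maxDistance;
    });
}
```

The slider would simply supply maxDistance; storing the distances alongside the neighbours array would avoid recomputing them on every change.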