
broadinstitute / deepprofilerexperiments


Jupyter Notebook 99.97% Python 0.03%

deepprofilerexperiments's Introduction

DeepProfilerExperiments

This is a supplementary repository for the DeepProfiler software and the related analysis preprint.

Please see our DeepProfiler handbook for documentation.

The ta-orf, bbbc022, and cdrp folders of this repository contain Jupyter notebooks for downstream analysis, plus configuration files (in the config folder) to reproduce the training experiments and then run feature extraction. They also contain configuration files to run feature extraction with the Cell Painting CNN or with EfficientNet pre-trained on ImageNet. The ground truth annotations used in the publication are in the data folders.

The profiling folder contains libraries and utility functions used by the downstream analysis notebooks.

The bbbc021 folder contains Jupyter notebooks for downstream analysis of the BBBC021 dataset. This is legacy code.

deepprofilerexperiments's People


shntnu vinhle169 mikeyecology cfredinh ektawb123

deepprofilerexperiments's Issues

pycytominer integration

In cytomining/pycytominer#78 I am working towards integrating DeepProfiler processing into pycytominer. Currently, by default, DeepProfiler outputs .npz files that store NumPy arrays of single-cell profiles. In cytomining/DeepProfiler#229 we discuss a potential update to the .npz output so that it also includes metadata.
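For readers unfamiliar with the format, a minimal sketch of reading one of these .npz files with NumPy follows. It assumes the single-cell profiles are stored under the key "features"; that key name is an assumption, so check data.files on a real DeepProfiler output before relying on it. Only that one array is read, so any metadata stored as pickled objects is left untouched.

```python
import io

import numpy as np


def load_single_cell_npz(source) -> np.ndarray:
    """Return the (n_cells, n_features) array stored in a DeepProfiler-style .npz file.

    Assumption: the profiles live under the key "features"; inspect
    data.files on a real output file to confirm the actual key names.
    """
    with np.load(source, allow_pickle=False) as data:
        return data["features"]


# Usage with an in-memory stand-in for a real output file.
buffer = io.BytesIO()
np.savez(buffer, features=np.arange(6, dtype=float).reshape(2, 3))
buffer.seek(0)
profiles = load_single_cell_npz(buffer)
print(profiles.shape)  # (2, 3)
```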

We need to make a couple of decisions to move the integration forward, and they will be driven partly by the goals of the DeepProfilerExperiments repo. In cytomining/DeepProfiler#229 (comment) I raise two points to consider: 1) how to use index.csv and 2) the feature prefix style.

I think both decisions are relatively minor: any pycytominer code will be flexible enough to handle multiple metadata options and to allow a customizable feature prefix. The feature prefix question mostly comes down to what the default prefix should be (DP and DP_ are two options).
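To make the prefix question concrete, here is a small sketch that turns a feature matrix into a table with a configurable column prefix. The naming scheme (prefix followed by a zero-based feature index) and the DP_ default are illustrative assumptions, not a decision from this thread or pycytominer's actual behavior.

```python
import numpy as np
import pandas as pd


def features_to_frame(features: np.ndarray, prefix: str = "DP_") -> pd.DataFrame:
    """Wrap a single-cell feature matrix in a DataFrame with prefixed column names.

    Hypothetical naming scheme: prefix + zero-based feature index.
    """
    columns = [f"{prefix}{i}" for i in range(features.shape[1])]
    return pd.DataFrame(features, columns=columns)


features = np.zeros((2, 3))
print(list(features_to_frame(features).columns))               # ['DP_0', 'DP_1', 'DP_2']
print(list(features_to_frame(features, prefix="DP").columns))  # ['DP0', 'DP1', 'DP2']
```

Keeping the prefix a parameter means the default can change later without touching callers that pass it explicitly.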

Additional topics

I think these topics are more pressing than the two listed above: Will the profiles for each dataset be recomputed in the new .npz format that includes metadata? Or will we proceed without recalculating? If we proceed without recalculating (which I think is the likely scenario), we need to settle on a pycytominer strategy.

Strategy

I do not think pycytominer should include code to parse plate, well, and site information from file names. File names are a very fragile way to store these variables; I believe they should come from an internal source, or be stored in an external file that maps file paths to the corresponding metadata. The latter is also fragile (file names are mutable!), but not as fragile as the metadata-in-file-name paradigm.
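The external-file option amounts to a join on file path. The sketch below assumes both tables share a hypothetical path column and that the metadata file has one row per path with Metadata_* columns; neither layout is confirmed by the thread. It uses pandas' validate and indicator options so that duplicated paths or renamed files raise an error instead of silently producing missing metadata.

```python
import pandas as pd


def attach_metadata(profiles: pd.DataFrame, index: pd.DataFrame, key: str = "path") -> pd.DataFrame:
    """Join per-file metadata onto single-cell profiles through a shared file-path column."""
    merged = profiles.merge(
        index,
        on=key,
        how="left",
        validate="many_to_one",  # each path must appear at most once in the index
        indicator=True,          # adds a _merge column recording which rows matched
    )
    unmatched = merged["_merge"] == "left_only"
    if unmatched.any():
        # A renamed or moved file silently breaks the link, so fail loudly instead.
        raise ValueError(f"{int(unmatched.sum())} profile rows have no metadata entry")
    return merged.drop(columns="_merge")


# Hypothetical tables: several cells per file, one metadata row per file.
profiles = pd.DataFrame({"path": ["p1/file_a.npz", "p1/file_a.npz", "p1/file_b.npz"], "DP_0": [0.1, 0.2, 0.3]})
index = pd.DataFrame({"path": ["p1/file_a.npz", "p1/file_b.npz"], "Metadata_Plate": ["P1", "P1"], "Metadata_Well": ["A01", "A02"]})
print(attach_metadata(profiles, index))
```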

However, since we probably won't recompute the profiles, we need a strategy to incorporate metadata from file names. I therefore propose integrating these metadata in several pycytominer steps, instead of handling all of the processing internally in pycytominer.

The proposed workflow is as follows:

Ingest the current .npz files into pycytominer
Extract plate, well, and site from each file name
Append these metadata to the output of pycytominer's load_npz()
Re-ingest this file with metadata into pycytominer and proceed with standard downstream processing
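The four steps can be sketched with plain NumPy and pandas. This does not call pycytominer's load_npz(); it stands in for the same steps. Several assumptions are hypothetical and labeled in the code: the file layout <plate>/<well>_<site>.npz, the well format (row letter A to P plus a two-digit column), the "features" array key, and the DP_ column prefix.

```python
import re
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical file layout: <plate>/<well>_<site>.npz, e.g. "SQ00015116/A01_1.npz".
# Wells are assumed to look like A01: a row letter A-P followed by a two-digit column.
WELL_SITE = re.compile(r"^(?P<well>[A-P]\d{2})_(?P<site>\d+)\.npz$")


def parse_metadata(path) -> dict:
    """Step 2: recover plate, well, and site from a file path (fragile by design)."""
    path = Path(path)
    match = WELL_SITE.match(path.name)
    if match is None:
        raise ValueError(f"file name does not match the expected pattern: {path}")
    return {
        "Metadata_Plate": path.parent.name,
        "Metadata_Well": match["well"],
        "Metadata_Site": int(match["site"]),
    }


def annotate(path, features: np.ndarray, prefix: str = "DP_") -> pd.DataFrame:
    """Step 3: prepend the parsed metadata columns to the feature columns."""
    frame = pd.DataFrame(features, columns=[f"{prefix}{i}" for i in range(features.shape[1])])
    for column, value in reversed(list(parse_metadata(path).items())):
        frame.insert(0, column, value)  # scalar value is broadcast to every cell
    return frame


def build_profiles(paths, key: str = "features") -> pd.DataFrame:
    """Steps 1-3 for many files: load each .npz (assumed key), annotate, concatenate."""
    frames = []
    for path in paths:
        with np.load(path) as data:
            frames.append(annotate(path, data[key]))
    return pd.concat(frames, ignore_index=True)


# Step 4 (sketch): write the annotated table so it can be re-ingested downstream, e.g.
# build_profiles(sorted(Path("features").glob("*/*.npz"))).to_csv("profiles.csv", index=False)
```

Keeping the file-name parsing in one small, isolated function reflects the concern above: if the metadata later come from an internal source, only parse_metadata needs to be replaced.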

I will proceed with this strategy for now, but please suggest alternatives! We can always pivot later if this ends up being clunky or doesn't reduce code.

but are now using precision-based metrics, possibly Average Precision?
https://github.com/broadinstitute/DeepProfilerExperiments/blob/master/profiling/metrics.py

h/t to @gwaygenomics, whose issue cytomining/cytominer-eval#17 sent me here.
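As a reference point for the precision-based metrics mentioned above, here is a generic sketch of average precision for a single query profile. It is not the implementation in profiling/metrics.py, which may differ, for example in the similarity measure or in how positives are defined. The sketch assumes cosine similarity for ranking and treats profiles that share the query's label as positives.

```python
import numpy as np


def average_precision(relevant) -> float:
    """AP for one ranked list: mean of precision@k over the ranks k that hold a positive."""
    relevant = np.asarray(relevant, dtype=bool)
    if not relevant.any():
        return 0.0
    hits = np.cumsum(relevant)               # positives retrieved up to each rank
    ranks = np.arange(1, relevant.size + 1)  # 1-based rank positions
    return float(np.mean(hits[relevant] / ranks[relevant]))


def query_average_precision(profiles: np.ndarray, labels: np.ndarray, query: int) -> float:
    """Rank every other profile by cosine similarity to the query and score the ranking."""
    unit = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    similarity = unit @ unit[query]
    others = np.delete(np.arange(len(profiles)), query)
    ranked = others[np.argsort(-similarity[others], kind="stable")]
    return average_precision(labels[ranked] == labels[query])


print(average_precision([1, 0, 1]))  # (1/1 + 2/3) / 2 ≈ 0.833

# Toy profiles: two treatments ("a", "b"), two cells each.
profiles = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array(["a", "a", "b", "b"])
print(query_average_precision(profiles, labels, query=0))  # 1.0
```

Averaging query_average_precision over all queries gives a mean average precision for the whole set of representations.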


Re-compute pretrained features for Cell Painting datasets

Similarity / distance metrics for representations

Evaluation metrics for representations
