DeepProfilerExperiments

This is a supplementary repository for DeepProfiler software and the related analysis pre-print.

Please see our DeepProfiler handbook for documentation.

The folders ta-orf, bbbc022 and cdrp each contain Jupyter notebooks for downstream analysis and a config folder with configuration files. These files reproduce the training experiments and the subsequent feature extraction, and also run feature extraction with the Cell Painting CNN or with an EfficientNet pre-trained on the ImageNet dataset. The ground truth annotations used in the publication are in the data folders.

The profiling folder contains libraries and utility functions used by the downstream analysis notebooks.

The bbbc021 folder contains Jupyter notebooks for downstream analysis of the BBBC021 dataset. This is legacy code.

deepprofilerexperiments's People

Contributors

arkkienkeli, jccaicedo, shntnu

deepprofilerexperiments's Issues

pycytominer integration

In cytomining/pycytominer#78 I am working towards integrating DeepProfiler processing into pycytominer. Currently and by default, DeepProfiler outputs .npz files storing numpy arrays of single cell profiles. In cytomining/DeepProfiler#229 we discuss a potential update to the .npz file output to also include metadata information.

There are a couple of decision points that we need to make to move the integration forward, which will be partially driven by the goals in the DeepProfilerExperiments repo. In cytomining/DeepProfiler#229 (comment) I bring up two different points of consideration: 1) How to use index.csv and 2) Feature prefix style.

I think both of these decision points are relatively minor, and any pycytominer code will be flexible enough to handle multiple metadata options and to allow a customizable feature prefix. The feature prefix question mostly comes down to what we think the default prefix should be (DP or DP_ are two options).
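To make the prefix question concrete, here is a minimal sketch of how a configurable prefix could be applied to feature columns and then used to tell features apart from metadata. The function names, the integer suffix, and the example column names are hypothetical and are not taken from pycytominer or DeepProfiler.

```python
def name_features(n_features, prefix="DP_"):
    """Build feature column names such as DP_0, DP_1, ... (naming is an assumption)."""
    return [f"{prefix}{index}" for index in range(n_features)]


def split_columns(columns, prefix="DP_"):
    """Separate feature columns from all other (metadata) columns by prefix."""
    features = [c for c in columns if c.startswith(prefix)]
    metadata = [c for c in columns if not c.startswith(prefix)]
    return metadata, features


# Hypothetical column set: two metadata columns plus three features
columns = ["Metadata_Plate", "Metadata_Well"] + name_features(3)
metadata_columns, feature_columns = split_columns(columns)
```

Because the prefix is an argument with a default, changing the default later would not require callers that pass their own prefix to change anything.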

Additional topics

I think that these topics are more pressing than the two listed above: Will the profiles for each dataset be recalculated so that the .npz files include metadata? Or will we proceed without recalculating? If we proceed without recalculating (which I think is the likely scenario), we need to settle on a pycytominer strategy.

Strategy

I do not think that pycytominer should include code to parse plate, well, and site information from filenames. This is a very fragile way of storing these variables - I believe that they should come from an internal source or be stored in an external file that includes file path information pointing to files with corresponding metadata. The latter is also fragile (file names are mutable!), but not as fragile as the metadata-in-file name paradigm.

However, since we probably won't recompute profiles, we require a strategy to incorporate metadata from file names. Therefore, I propose that we take multiple pycytominer steps to integrate these metadata (instead of dealing with all of the processing internally in pycytominer).

The proposed workflow is as follows:

  1. Ingest current .npz files in pycytominer
  2. Extract plate, well, and site from the file name
  3. Append these metadata to a pycytominer load_npz() output
  4. Reingest this file with metadata back into pycytominer and proceed with standard downstream processing

I will proceed with this strategy for now, but please do suggest alternatives! We can always pivot strategies later on if this ends up being clunky or doesn't reduce code.
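The first three steps of the workflow could look roughly like the sketch below. It rests on several assumptions that the discussion above does not confirm: that files are laid out as `<plate>/<well>_<site>.npz`, that the array is stored under the key `features` with one row per cell, and that `DP_` is the feature prefix. The helper names are hypothetical and this is not the pycytominer `load_npz()` implementation.

```python
import re
import tempfile
from pathlib import Path

import numpy as np

# Assumed file name form: <well>_<site>.npz inside a folder named after the plate
FILENAME_PATTERN = re.compile(r"^(?P<well>[A-Za-z]+\d+)_(?P<site>\d+)$")


def parse_metadata(npz_path):
    """Extract plate, well, and site from a path of the assumed form."""
    path = Path(npz_path)
    match = FILENAME_PATTERN.match(path.stem)
    if match is None:
        # Fail loudly: file names are a fragile metadata source
        raise ValueError(f"Unexpected file name: {path.name}")
    return {
        "Metadata_Plate": path.parent.name,
        "Metadata_Well": match.group("well"),
        "Metadata_Site": match.group("site"),
    }


def load_with_metadata(npz_path, key="features", prefix="DP_"):
    """Return one record per cell: file-name metadata plus prefixed features."""
    metadata = parse_metadata(npz_path)
    with np.load(npz_path) as archive:
        features = archive[key]  # assumed shape: (n_cells, n_features)
    records = []
    for row in features:
        record = dict(metadata)
        record.update({f"{prefix}{i}": float(v) for i, v in enumerate(row)})
        records.append(record)
    return records


# Demonstration on a synthetic file with 2 cells and 3 features
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "PlateA" / "B02_3.npz"
    demo_path.parent.mkdir()
    np.savez(demo_path, features=np.arange(6, dtype=float).reshape(2, 3))
    records = load_with_metadata(demo_path)
```

The resulting records can be written to a table and read back in for step 4. Raising on an unexpected file name keeps the fragility of file-name metadata visible instead of silently producing wrong annotations.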

Evaluation metrics for representations

@jccaicedo can you clarify the metric you are now using for evaluating representations, and how you are reporting it?

IIUC you were previously using this
https://github.com/broadinstitute/DeepProfilerExperiments/blob/master/profiling/quality.py

but are now using precision-based metrics, possibly Average Precision?
https://github.com/broadinstitute/DeepProfilerExperiments/blob/master/profiling/metrics.py

h/t to @gwaygenomics whose issue sent me here cytomining/cytominer-eval#17
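For readers unfamiliar with the term, the sketch below computes the textbook form of Average Precision over a ranked list of matches. It is only an illustration of the general metric; it is not taken from profiling/metrics.py and may differ from what that file implements. The function name and the example ranking are hypothetical.

```python
def average_precision(ranked_relevance):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    # With no relevant items the metric is defined here as 0.0
    return precision_sum / hits if hits else 0.0


# Hypothetical ranking: relevant items at ranks 1 and 3
example = average_precision([True, False, True, False])
```

In this example the precision is 1/1 at rank 1 and 2/3 at rank 3, so the average is 5/6.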
