gfinak commented on July 29, 2024

The h5 files contain your single-cell data alongside all the gate definitions, as well as the indices of all the cell events in those gates.
At the end you get a materialization of your workspace that you can do a lot more with in the long term than just visualization. For example, you can work with massive data sets without loading all the files into memory.
I suggest you just bite the bullet and do the conversion for the whole workspace and all the FCS files. Then pull out what you need, and keep the resulting converted gating set on hand for future use.
Can you say a bit more about your end goal? What are these filters and how are you using them?

from cytoml.

Close-your-eyes commented on July 29, 2024

Thank you for the response. To be honest, this is more of an optimization than a bug. There are also ways to do things by hand, but somehow that's what we want to avoid?!

By filters I meant the gate filters (ellipsoid, rectangle, …) that can be applied to a flowFrame to filter for events (rows).

Let me explain what I want to do from time to time:

I have 30 FCS files with cells from different mice (WT, mutant, …) that have been stained with the same antibody panel (say, 10 markers). Each file may contain 3-5x10^6 events. Most of the events are irrelevant, as I am only interested in a small subset, let's say 2x10^5 events per file. This subset is gated in FlowJo for every FCS file, and a corresponding .wsp exists.

I want to compute a dimension reduction of the relevant events (t-SNE/UMAP) and annotate clusters (k-means, Leiden, or Louvain). I want to find out whether any population/cluster is diminished or elevated in some mice.

To use functions from R I need a concatenated data frame of compensated fluorescence intensities from each fcs file. So, the relevant data will end up in memory anyway. I could obtain those data directly from fcs files if I only knew the respective indices.
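A minimal sketch of that last step, assuming the per-file indices are already known. The matrices and index vectors are simulated stand-ins, and the names `events_by_file` and `indices_by_file` are hypothetical; with real data each matrix could come from something like `exprs(read.FCS(path))` in flowCore, and the values would still need compensation.

```r
set.seed(1)

# Simulated stand-ins for per-file event matrices (rows = events, columns = markers).
events_by_file <- list(
  mouse_WT  = matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("CD3", "CD19"))),
  mouse_mut = matrix(rnorm(30), ncol = 2, dimnames = list(NULL, c("CD3", "CD19")))
)

# Hypothetical row indices of the gated subset in each file.
indices_by_file <- list(
  mouse_WT  = c(2L, 5L, 9L),
  mouse_mut = c(1L, 4L, 7L, 15L)
)

# Subset each file by its indices and tag the rows with the sample name.
subsets <- Map(function(events, idx, sample) {
  data.frame(sample = sample, events[idx, , drop = FALSE])
}, events_by_file, indices_by_file, names(events_by_file))

# One concatenated data frame for downstream dimension reduction and clustering.
combined <- do.call(rbind, unname(subsets))
rownames(combined) <- NULL
```

Keeping a `sample` column makes it possible to tabulate cluster membership per mouse later on.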

As I said above, with flowjo_to_gatingset I can get what I want, but I wondered if there is a way to avoid having the h5 files written to disk.

gfinak commented on July 29, 2024

Close-your-eyes commented on July 29, 2024

Okay. Thank you.

jacobpwagner commented on July 29, 2024

@Close-your-eyes, I know you might already be doing this, but once the gates are applied, you can pull a boolean mask for any subpopulation in any sample efficiently with gh_pop_get_indices or gh_pop_get_indices_mat. You could then save just those (or the numeric indices) out for later FCS filtering if all you are trying to do is avoid repeated loading of the GatingSet.

But one way or another, at least once you will need to load in the geometric gate definitions and apply them to the data to obtain the indices. And as Greg mentioned, the most efficient and scalable way to do that will be to let the data be managed as HDF5 instead of in memory.

However, after you've done that once, there's nothing stopping you from just saving out vectors/matrices of filter indices to apply to FCS files if you so choose. But again, as Greg said, for most cases the most efficient and scalable way to get those subsets and concatenate them will be using gh_pop_get_data/gs_pop_get_data on the GatingSet.

A basic sketch, just in case you haven't already been looking at this:

library(flowCore)
library(flowWorkspace)

dataDir <- system.file("extdata", package = "flowWorkspaceData")
gs_archive <- list.files(dataDir, pattern = "gs_bcell_auto", full.names = TRUE)
gs <- load_gs(gs_archive)

# Boolean mask
mask <- gh_pop_get_indices(gs[[1]], "lymph")
# Numeric indices
indices <- which(mask)

# Multiple populations (a matrix column for each)
mask_matrix <- gh_pop_get_indices_mat(gs[[1]], c("CD3", "CD19"))
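# Converted to a list of indices for each pop
# (lapply always returns a list; apply() would simplify to a matrix if all pops had equal counts)
indices_multi_pops <- lapply(as.data.frame(mask_matrix), which)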
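To illustrate the "save the indices once, filter later" idea from above, here is a minimal base R sketch. The mask and the event matrix are simulated stand-ins, and `index_file` is a hypothetical temporary path; in practice the mask would come from `gh_pop_get_indices` and the matrix from something like `exprs(read.FCS(path))`.

```r
# Simulated boolean mask standing in for the result of gh_pop_get_indices().
mask <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)
indices <- which(mask)

# Save the compact numeric indices; the path is a hypothetical temp file.
index_file <- tempfile(fileext = ".rds")
saveRDS(indices, index_file)

# Later session: reload the indices and subset a raw event matrix (simulated here).
events <- matrix(seq_len(12), ncol = 2, dimnames = list(NULL, c("FSC-A", "SSC-A")))
saved_indices <- readRDS(index_file)
subset_events <- events[saved_indices, , drop = FALSE]

# The numeric indices and the boolean mask select the same rows.
stopifnot(identical(subset_events, events[mask, , drop = FALSE]))
```

Note that saved indices are only valid for a file with the same event order as the one that was gated, and that events read straight from an FCS file are not compensated or transformed the way they are inside the GatingSet.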
