Hi 👋🏻 I'm an independent cloud architect and software alchemist.
I help build, operate, and manage software systems. I wrote my first line of code in 1989 and have over 25 years' experience shipping to production.
Want some help on a short- or long-term project? I'm available! My clients range from individuals to Fortune 50 enterprises. Check out my site for more: https://www.redwoodconsulting.io/
Some projects outside my client work:
DeepCell AI cellular segmentation for cancer research. [repo]
I'm working with the DiMi Lab to deploy DeepCell imaging tools on Google Cloud. It's a zero-to-one effort running DeepCell on the cloud in their target environment.
DeepCell was developed by the Van Valen Lab at Caltech.
I'm writing a tool for You Need A Budget to streamline my personal workflow. I want to mark shared transactions as split to a certain category, but it's rather a hassle to do by hand (especially on mobile devices). I don't intend to truly productionize this, but I might. It's a web app written in Kotlin compiled to JavaScript, because, why not?
I'm a Google Cloud Expert, and hold a 2nd degree black belt in Seido Karate.
Larger data testing is tedious because post-processing is Hella Slow™: 8 minutes or more for 1.3 GB inputs.
Note that infrastructure doesn't seem to make a big difference for post-processing time, and the GPU is not used at all during this phase (based on monitoring charts plus knowledge of the implementation).
This represents post-processing time broken down by machine type, GPU (or not), and input size. Note that post-processing time doesn't vary much across configurations.
It would be really nice if we didn't have to wait for this.
(1) Skip post-processing in benchmarks.
    - Note in the benchmark data whether post-processing was run.
    - The output is a bit meaningless in terms of correctness.
(2) Speed up post-processing. #28
Option 1 could look like this: skip post-processing by passing a no-op function as the postprocessing_fn in the constructor of the Application object (we may need to create a subclass of the Mesmer class to override the constructor).
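A minimal sketch of that idea, assuming DeepCell's Application stores its postprocessing_fn from the constructor (the subclass name is hypothetical):

```python
import numpy as np

# No-op replacement for the expensive post-processing step. The output
# won't be meaningful for correctness, but it's fine for benchmarking
# the prediction phase alone.
def noop_postprocessing_fn(model_output, **kwargs):
    """Return the raw model output unchanged."""
    return model_output

# Subclass sketch, assuming the deepcell package is installed
# (left commented so this snippet stays self-contained):
#
# from deepcell.applications import Mesmer
#
# class MesmerNoPostprocessing(Mesmer):
#     def __init__(self, model=None):
#         super().__init__(model=model)
#         self.postprocessing_fn = noop_postprocessing_fn
```

The benchmark notebook could then record "post-processing: skipped" alongside the timing data.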
Relating to #94, we need to determine whether CPU prediction is affected by first vs. subsequent runs.
Task: run the ~230 MB sample through the benchmark on n1-standard-8 (no GPU, batch size 16), then restart the kernel (NOT a new instance) and run it again.
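Within a single kernel session, the same warm-up question can be probed by timing consecutive calls (a sketch; `predict_fn` is a stand-in for the benchmark's prediction call):

```python
import time

def time_runs(predict_fn, data, runs=2):
    """Time consecutive calls in the same process. A large gap between
    run 1 and run 2 suggests warm-up effects (graph tracing, caching)."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict_fn(data)
        timings.append(time.perf_counter() - start)
    return timings
```

The kernel-restart variant above is still needed to distinguish per-process warm-up from per-instance effects (e.g. disk caches surviving the restart).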
From the commit history, it looks like the dataset may have been replaced with the tissue_net dataset. The expected hash values don't match, but this could just be the naming inside the .npz file.
Objective of this work: determine the difference between the old commit data (which is still available on s3 as of 2023-11-17 at least) and the newly available tissue net data.
When running the e2e benchmark notebook on Vertex AI, there was a kernel warning:
```
2023-12-03 07:19:17.937327: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/jupyter/.local/lib/python3.10/site-packages/cv2/../../lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-12-03 07:19:17.937385: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-12-03 07:19:17.937413: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (72a01191f8f9): /proc/driver/nvidia/version does not exist
```
I think this is because we're using a TF 2.10 kernel but have installed TF 2.8 (DeepCell's dependency).
If the kernel is relevant: how can we fix this?
If the kernel is irrelevant: can we use something different? Like a basic python kernel?
I'm not sure how much to worry about this; perhaps it means we (and/or DeepCell??) aren't using modern Vertex AI kernels optimally…
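Independent of which kernel we pick, a quick sanity check for the missing driver library the warning complains about, using only the standard library:

```python
import ctypes

def cuda_driver_loadable():
    """Check whether libcuda.so.1 (the NVIDIA driver library named in
    the TF warning) can be loaded in this environment. Returns False
    on instances without a GPU driver, which would make the warning
    expected rather than a misconfiguration."""
    try:
        ctypes.CDLL("libcuda.so.1")
        return True
    except OSError:
        return False
```

If this returns False on a supposedly GPU-enabled instance, the problem is the instance/driver setup, not the TF 2.8 vs. 2.10 kernel mismatch.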
As a user I can: go to the GitHub repo, download it locally, get the IPython notebook, get the sample data, upload the notebook to the test environment (Vertex AI), and configure it (instance types/sizes, GPUs, ...) to verify. (Part of the test is figuring out the config parameters.)
Notebook that runs prediction on parameterized input file
```
def test_zero_image_one_mask():
    """Test reconstruction with an image of all zeros and a mask that's not"""
    result = reconstruction(np.zeros((10, 10)), np.ones((10, 10)))
>   assert_array_almost_equal(result, 0)
E   AssertionError:
E   Arrays are not almost equal to 6 decimals
E
E   Mismatched elements: 100 / 100 (100%)
E   Max absolute difference: 1.
E   Max relative difference: inf
E    x: array([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
E          [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
E          [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],...
E    y: array(0)

test_reconstruction.py:113: AssertionError
```
I'm surprised to see max abs = 1 and max rel = inf (the relative difference is inf because the expected value is 0). It's also 100% of elements mismatched, so something weird is happening.
The test test_two_image_peaks asserts that out.dtype == _supported_float_type(mask.dtype). Meanwhile, the current reconstruct implementation does indeed create the result image as a float, even if the inputs were ints to begin with.
I'm not sure we need to (always?) do this. The core of the algorithm is to adjust to the neighborhood max. The max can't have more precision than any of the starting numbers, and the max can't be capped to more precision than the mask precision. So if ints are masking ints, why not have int results?
However, floats masked by ints are floats, and arguably ints masked by floats should be floats. For example, 10 mask 0.51 should be 5.1, not 5. (Right?)
The current behavior is to always return floats. This is undesirable for performance, as it precludes updating in place. I wonder if we can simply ship this as new behavior; downstream usages could be affected if they assume floats and start getting ints. Can we control it via "Yet Another Parameter"™️?
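If we do ship dtype-preserving behavior, the output dtype could simply follow NumPy's standard promotion rules, which match the reasoning above (`reconstruction_dtype` is a hypothetical helper, not the current API):

```python
import numpy as np

def reconstruction_dtype(marker_dtype, mask_dtype):
    """Pick the output dtype by standard NumPy promotion: int inputs
    stay int, and a float appears only when one of the inputs is float."""
    return np.result_type(marker_dtype, mask_dtype)
```

Int-in/int-out would then allow in-place updates, while float inputs still promote the result to float as the tests expect.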
The model file is relatively large (100 MB). Cache the download to disk to avoid refetching. Also, the notebook doesn't support downloading the model in the first place 😬
Use this gs uri: gs://davids-genomics-data-public/cellular-segmentation/deep-cell/vanvalenlab-tf-model-multiplex-downloaded-20230706/MultiplexSegmentation.tgz
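A disk-cache sketch (the cache directory is an assumption, and it shells out to `gsutil cp`, which must be on the PATH):

```python
from pathlib import Path
import subprocess

MODEL_URI = (
    "gs://davids-genomics-data-public/cellular-segmentation/deep-cell/"
    "vanvalenlab-tf-model-multiplex-downloaded-20230706/MultiplexSegmentation.tgz"
)
# Hypothetical cache location; any persistent directory works.
CACHE_DIR = Path.home() / ".cache" / "deepcell-imaging"

def cached_model_path():
    """Download the model archive once; later calls reuse the cached file."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    target = CACHE_DIR / "MultiplexSegmentation.tgz"
    if not target.exists():
        subprocess.run(["gsutil", "cp", MODEL_URI, str(target)], check=True)
    return target
```

First run pays the download cost; every kernel restart after that reads from disk.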
Some earlier samples were generated with a previous convention, following the DeepCell API (one file == a 4D array starting with num samples).
The other samples are 3D: x, y, channel. (One array == one input)
This creates problems because worksheets & people don't know which shape to expect, and therefore whether they need to add a new axis or not.
We should normalize one way or the other. My general thinking is that a thing is a single thing until it is a group of things, which we could represent as either a list of things or a numpy vector of the things. In other words, the shape of a single data example is not a list.
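Whichever convention we pick, a small shim at the API boundary can absorb the other shape (a sketch; `ensure_batched` is a hypothetical helper, assuming DeepCell's 4D convention of (batch, x, y, channel)):

```python
import numpy as np

def ensure_batched(arr):
    """Normalize a sample to 4D (batch, x, y, channel): a single 3D
    example (x, y, channel) gets a new leading axis; a 4D batch
    passes through unchanged."""
    if arr.ndim == 3:
        return arr[np.newaxis, ...]
    if arr.ndim == 4:
        return arr
    raise ValueError(f"expected a 3D or 4D array, got {arr.ndim}D")
```

Then worksheets can always hand data to the model the same way, regardless of which convention the sample file used.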
To support #10 , let's at least add DeepCell's Mesmer data (multiplex_tissue) to the repo. This gives us an easily accessible starting point for test data (albeit quite small at 512 x 512).
The persistent disk is a relatively small expenditure ($0.14 per day for a forgotten 100 GB persistent disk). We probably don't need to worry too much about this.
Still, it would be nice to know if we're vastly over-provisioned. Let's try running a benchmark with a ~10 GB persistent disk, or 50 GB. Use one of the larger files.
Also consider simply not caring for now, assuming DevOps processes & cost monitoring would catch the issue. (Really, though?) It's still just a few cents.
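The back-of-envelope math, assuming roughly $0.04/GB-month for a standard persistent disk (the exact rate varies by region and disk type):

```python
# 100 GB standard persistent disk, assumed ~$0.04 per GB-month:
disk_gb = 100
monthly_usd = disk_gb * 0.04        # about $4.00 per month
daily_usd = monthly_usd * 12 / 365  # about $0.13 per day
```

So even a forgotten disk costs on the order of cents per day, which supports the "don't worry too much" conclusion.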
The larger sample data I've been using, Xenium_FFPE_Human_Breast_Cancer_Rep1_if_image.tif, was obtained from 10x Genomics, but isn't super easy to fetch, especially not programmatically.
It would be nice if we had a comparable ~500MB sample available.
The optimization only supported the dilation method: finding local maxima. We need to support erosion (local minima) as well. It's basically a question of flipping the min/max signs.
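The sign flip in one place (a sketch; `clamp_step` is a hypothetical helper, not the fast-hybrid code itself): reconstruction by dilation propagates maxima and clamps the marker below the mask, while erosion is the dual, propagating minima and clamping above the mask.

```python
import numpy as np

def clamp_step(marker, mask, method="dilation"):
    """One pointwise clamp of the reconstruction loop: dilation keeps
    the marker <= mask; erosion (the dual) keeps the marker >= mask."""
    if method == "dilation":
        return np.minimum(marker, mask)
    if method == "erosion":
        return np.maximum(marker, mask)
    raise ValueError(f"unknown method: {method}")
```

In the fast-hybrid scan, the same flip applies to the neighborhood reduction (max for dilation, min for erosion), so one `method` parameter should cover both.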
The cython fast-hybrid implementation is a bit "raw", requiring manually running cythonize in the right directory, etc.
The file should be repackaged into a proper module in deepcell-imaging, with appropriate setup.py etc. so that pip knows to build the extension as part of installation.
This could also be accomplished by publishing the fast-hybrid implementation as its own library, and including that as a dependency to this repo.
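A packaging sketch for the in-repo option (module path and file names are assumptions about the eventual layout):

```python
# setup.py (sketch; assumes the Cython source lives at
# deepcell_imaging/fast_hybrid.pyx)
import numpy as np
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="deepcell-imaging",
    packages=["deepcell_imaging"],
    ext_modules=cythonize("deepcell_imaging/fast_hybrid.pyx"),
    include_dirs=[np.get_include()],
)
```

With modern pip, cython and numpy would also need to be declared as build requirements in pyproject.toml's `[build-system] requires` so the extension builds in an isolated environment.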