masonearles / 3dleafct
Random forest segmentation of 3D leaf microCT images
I just got this error:
Working on scan: 1 of 1
Traceback (most recent call last):
File "MLmicroCT.py", line 1328, in <module>
main()
File "MLmicroCT.py", line 1256, in main
filepath,grid_name,phase_name,label_name,Th_grid,Th_phase,gridphase_train_slices_subset,gridphase_test_slices_subset,label_train_slices_subset,label_test_slices_subset,image_process_bool,train_model_bool,full_stack_bool,post_process_bool,epid_value,bg_value,spongy_value,palisade_value,ias_value,vein_value,folder_name = openAndReadFile("../settings/"+filenames[j])
File "MLmicroCT.py", line 835, in openAndReadFile
ias_value = int(myFile.readline().rstrip('\n'))
ValueError: invalid literal for int() with base 10: 'test1'
I think it's because the input_key.txt file hasn't been updated since you started using multiple tissues. I added what was missing (the spongy and vein entries) and it worked.
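A small guard in the settings reader would make this failure easier to diagnose. This is only a hypothetical sketch (the function name and error text are mine, not the repo's):

```python
def read_int_line(f, name):
    """Read one settings line and convert it to int, with a clearer error."""
    raw = f.readline().rstrip('\n')
    try:
        return int(raw)
    except ValueError:
        raise ValueError("expected an integer for '%s' but read %r; "
                         "is input_key.txt missing a tissue entry?" % (name, raw))
```

With a stale input_key.txt, this would point directly at the field that got the wrong line instead of a bare ValueError.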
I was just looking at the new automated leaf trait measurements. I do not agree with the current definition of Sm: I think it should be only the surface of the airspace. Adding the surfaces of the two mesophyll layers and the veins duplicates some surfaces, and the point of thresholding the veins is to remove them from the surface estimation. However, measuring the airspace is prone to some errors.
Two ways to define the surface area of the mesophyll cells Sm would be (I have no idea how to program these in Python):
Also, I would like to see two Sm computed:
It would also be easy to compute Ames/Vmes. I define Vmes as the volume of the mesophyll (mesophyll + airspace), but for consistency with potential extrapolation from the literature, there should also be Vmes = (mesophyll + vein + airspace).
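As a rough illustration of the Vmes idea, the volume could be computed by counting labeled voxels. This is a hypothetical sketch, not the repo's code; the label-value parameters and function name are assumptions:

```python
import numpy as np

def mesophyll_volume(stack, voxel_size, mesophyll_value, ias_value, vein_value=None):
    """Volume (in voxel_size^3 units) of mesophyll + airspace, optionally + veins."""
    labels = [mesophyll_value, ias_value]
    if vein_value is not None:          # literature-style Vmes also includes veins
        labels.append(vein_value)
    n_vox = np.isin(stack, labels).sum()
    return n_vox * voxel_size ** 3
```

Calling it once without and once with vein_value would give the two Vmes definitions side by side.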
Keep up the great work guys!
I got this error in the "Read from file" mode. I looked up the lines but have no idea what's happening.
***LOAD AND ENCODE LABEL IMAGE VECTORS***
Traceback (most recent call last):
File "MLmicroCT.py", line 1328, in <module>
main()
File "MLmicroCT.py", line 1280, in main
rf_transverse,FL_train,FL_test,Label_train,Label_test = train_model(gridrec_stack,phaserec_stack,label_stack,localthick_stack,gridphase_train_slices_subset,gridphase_test_slices_subset,label_train_slices_subset,label_test_slices_subset)
File "MLmicroCT.py", line 709, in train_model
Label_test = LoadLabelData(ls, label_test, "transverse")
File "MLmicroCT.py", line 665, in LoadLabelData
labelimg_in_rot_sub = labelimg_in_rot[sub_slices,:,:]
IndexError: index 12 is out of bounds for axis 0 with size 12
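A bounds check before the fancy indexing would turn this into a readable message. A minimal sketch, assuming the slice subset comes in as a list of indices (the helper name is mine):

```python
import numpy as np

def take_slices(stack, sub_slices):
    """Index stack[sub_slices, :, :] after checking the indices fit the stack depth."""
    bad = [i for i in sub_slices if i >= stack.shape[0]]
    if bad:
        raise IndexError("slice indices %s exceed stack depth %d"
                         % (bad, stack.shape[0]))
    return stack[sub_slices, :, :]
```

Here an index of 12 against a 12-slice stack would name the offending indices instead of failing deep inside LoadLabelData.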
I'm finally using the command line version from the new repo on a new leaf. I'm also using a new computer, and many packages are missing. It would be nice to have a script that checks whether all the dependencies are installed. I think I had to install 4-5 packages.
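A dependency check could be as small as this sketch; the package list here is an assumption, not the repo's actual requirements:

```python
import importlib.util

def missing_packages(required):
    """Return the subset of `required` whose import spec cannot be found."""
    return [pkg for pkg in required if importlib.util.find_spec(pkg) is None]

# Hypothetical dependency list -- adjust to whatever MLmicroCT.py actually imports.
REQUIRED = ["numpy", "skimage", "sklearn", "tqdm"]
```

Printing `missing_packages(REQUIRED)` at startup would tell a new user exactly what to install before the 4-hour run begins.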
I've run into this issue in the previous version but forgot about it. It is still in the latest notebook.
The original code produced this error:
prediction_transverse_prob_imgs = class_prediction_transverse_prob.reshape((
-1,
label_stack.shape[1],
label_stack.shape[2],
4),
order="F")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-38-e940f9abb33f> in <module>()
5 label_stack.shape[2],
6 4),
----> 7 order="F")
8 prediction_transverse_imgs = class_prediction_transverse.reshape((
9 -1,
ValueError: cannot reshape array of size 8254710 into shape (387,1422,4)
I've changed the 4 to class_prediction_transverse_prob.shape[1], which in my case is 5, and it worked, but I'm not sure this is the right way to work around the error. I don't know if there are other places where values like this should be changed to a computed value in an object (I haven't run into any other errors like this).
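The workaround can be written as a small helper that derives the class count from the probability array itself, so the reshape works for any number of labels. A sketch under that assumption (the function name is mine):

```python
import numpy as np

def to_prob_images(probs, slice_shape):
    """Reshape an (n_pixels, n_classes) probability array into per-slice images."""
    n_classes = probs.shape[1]          # e.g. 5 labels -> 5 probability columns
    return probs.reshape((-1, slice_shape[0], slice_shape[1], n_classes),
                         order="F")
```

Because n_classes is read from the array, the size-8254710 case with 5 labels reshapes cleanly instead of failing on a hard-coded 4.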
I like the way the updated code dumps files into directories. I added some lines to my version to check if the directory is present and to create it if it is not. In my case, I'll dump everything into an "ML" folder in my species-specific folder.
import os
if not os.path.exists(filepath + "ML"):
    os.mkdir(filepath + "ML")
One could also define a variable for this output directory and just use that name everywhere instead.
Just found a typo in the leaf traits notebook (LeafTraits.ipynb). When converting the µm^2 values, it is written that they are converted to m^2, but that's actually mm^2. For m^2, the factor is 10^12.
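A quick arithmetic check of the two conversions (the example area is made up):

```python
# 1 µm = 1e-3 mm = 1e-6 m, so areas scale by the square of the length factor.
area_um2 = 2.5e6            # example area in µm^2
area_mm2 = area_um2 / 1e6   # µm^2 -> mm^2: divide by 10^6
area_m2  = area_um2 / 1e12  # µm^2 -> m^2:  divide by 10^12
```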
Currently, for several io.imread or io.imsave calls, there is a full path written. This should be changed to filepath + 'name of file' for consistency and ease (as in the third block of the notebook).
This is more of an open question and maybe a future to-do, but my first try with the script was on an image that I feel is hard to label. I ended up training the algorithm on 30 contiguous slices and testing it on 10 contiguous slices (this took almost 4 hours), using cells, air, veins, tape, and the outside of the leaf as labels. The training ended up being quite good, as below:
However, after running the trained algorithm on the whole stack, I still ended up with not-so-good labelling, like these:
Tape leaking into the cells and veins
Cells leaking into the tape and outside of leaf as air
Just looking at the veins, this seems to be an appropriate result given the minimal effort I had to put in to get it. So my question is: is it possible to train the algorithm in 3D (i.e. looking at all surrounding voxels) instead of training it on specific images? This would probably increase the computing time, but it would benefit the labelling of harder-to-measure stacks. If you look at the mid-vein image above, there are a few interruptions in the vein in the middle. I don't know how to do this or how difficult it would be to implement.
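As a minimal sketch of the 3D idea (pure NumPy, and entirely my own naming, not anything in the repo): each voxel could get, for example, the mean of its 3x3x3 neighbourhood as an extra feature column for the random forest, so the classifier sees context from adjacent slices too.

```python
import numpy as np

def neighbourhood_mean(stack):
    """Mean over each voxel's 3x3x3 neighbourhood (edges padded by replication)."""
    s = stack.astype(float)
    padded = np.pad(s, 1, mode="edge")
    acc = np.zeros_like(s)
    for dz in (-1, 0, 1):               # accumulate the 27 shifted copies
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                acc += padded[1 + dz:1 + dz + s.shape[0],
                              1 + dy:1 + dy + s.shape[1],
                              1 + dx:1 + dx + s.shape[2]]
    return acc / 27.0
```

Real 3D feature stacks would want gradients and multiple scales as well, and memory would become the limiting factor on a 2411-slice stack, but this shows the shape of the change.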
I don't know how it is in your final program setup, but I have added this statement after opening the raw images:
label_stack_nb = [position of the labelled stacks]
This is then called either through label_stack_nb[3:7] for a range of values, or through itemgetter(0,2,4)(label_stack_nb) for specific values (you need to import this function: from operator import itemgetter).
I used it in my version, as I find it gives better consistency between the two sets of stacks (label vs. full).
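For reference, a self-contained example of the two selection styles (the positions list here is made up):

```python
from operator import itemgetter

# Hypothetical slice positions of the labelled stacks within the full stack.
label_stack_nb = [0, 40, 80, 120, 160, 200, 240, 280]

range_sel = label_stack_nb[3:7]                     # a contiguous range of positions
specific_sel = itemgetter(0, 2, 4)(label_stack_nb)  # hand-picked positions (a tuple)
```

Note that itemgetter returns a tuple, not a list, when given several indices.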
I got this error when saving the stacks (I removed some #'s from the progress bars to shorten the lines). I don't know why the fourth progress bar is only 623 (623 is the width of the stack, and 2411 is the number of slices). I also ran it in the manual mode and still had the same error.
***SAVING PREDICTED STACK***
Post-processing...
100%|###| 2411/2411 [00:03<00:00, 693.44it/s]
100%|###| 2411/2411 [00:01<00:00, 1978.07it/s]
100%|###| 2411/2411 [00:01<00:00, 2216.60it/s]
100%|###| 623/623 [00:05<00:00, 118.02it/s]
100%|###| 2411/2411 [00:03<00:00, 704.88it/s]
Traceback (most recent call last):
File "MLmicroCT.py", line 1328, in <module>
main()
File "MLmicroCT.py", line 1315, in main
processed = final_smooth(step2,vein_value,spongy_value,palisade_value,epid_value,ias_value,bg_value)
File "MLmicroCT.py", line 228, in final_smooth
d = (tileB*c)
ValueError: operands could not be broadcast together with shapes (2411,163,623) (2411,164,623)
When trying to troubleshoot it manually from the command-line code, I got stuck, so I went to run the post-processing Jupyter notebook. I think I found the error. I have a stack with an odd value for its height (327), and integer division, when dividing that value by 2, rounds down. So, in the original code, you have:
# Define 3D array of distances from lower value of img.shape[1] to median value
rangeA = range(0,img3.shape[1]/2)
tileA = np.tile(rangeA,(img3.shape[2],img3.shape[0],1))
tileA = np.moveaxis(tileA,[0,1,2],[2,0,1])
tileB = np.flip(tileA,1)
Actually, tileB shouldn't have the same range as tileA if the height is uneven, and I think this is accounted for elsewhere in the code. I worked around it like this:
rangeB = range(img3.shape[1]/2, img3.shape[1])
tileB = np.tile(rangeB,(img3.shape[2],img3.shape[0],1))
tileB = np.moveaxis(tileB,[0,1,2],[2,0,1])
tileB = np.flip(tileB, 1)
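The same fix can be written as a small helper using floor division, so it also runs under Python 3, where / on two ints returns a float and range() would fail (the helper name is mine):

```python
def distance_ranges(height):
    """Split [0, height) into a lower and an upper range around the median row.

    For an odd height like 327, the upper range is one element longer
    (163 vs 164), which is exactly what makes tileA and tileB broadcast.
    """
    half = height // 2                  # floor division: works in Python 2 and 3
    return range(0, half), range(half, height)
```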
I think the post-processing now works, but it didn't perform so well on my leaf. There was a lot of background intrusion into the airspace, mainly because the epidermis was thin in some places and I guess it was considered dangling epidermis there. It did a good job on the veins.
Nice work on the post-processing! Maybe there could be an option for the user to choose what to correct, e.g. only correct for veins. Maybe my stack was just crappy! :D
Auto-detecting the resolution beforehand would be really useful, as most often we define it in ImageJ. I have no idea how to do this, but here are potential ways to do it. Hopefully they work well with files saved from ImageJ.
https://stackoverflow.com/questions/21697645/how-to-extract-metadata-from-a-image-using-python
https://stackoverflow.com/questions/765396/exif-manipulation-library-for-python
That would be nice!
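One possible route, sketched with Pillow, assuming the stacks are TIFFs and that the standard TIFF resolution tags were written on save (the helper name is mine; whether ImageJ populates these tags for a given file would need checking):

```python
from PIL import Image

def tiff_resolution(f):
    """Return (XResolution, ResolutionUnit) from a TIFF's metadata.

    TIFF stores pixels-per-unit in tag 282 (XResolution) and the unit in
    tag 296 (1 = none, 2 = inch, 3 = cm); either may be absent.
    """
    with Image.open(f) as im:
        return im.tag_v2.get(282), im.tag_v2.get(296)
```

From pixels-per-unit and the unit code, the µm-per-pixel value the traits notebook needs could then be derived instead of typed in by hand.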
Currently, the predicted image, RFPredictCTStack_out, is saved to 16-bit using img_as_int. This is too much since there are so few labels, so img_as_ubyte (saving to 8-bit) should be used instead.
Further, since the number of labels can change from one user to another, and since the img_as_ functions require values between 0 and 1 (i.e. they must be divided by the total number of labels), there should be a call to get the actual number of labels. I suggest the following, but the np.unique could actually be called within the img_as_ubyte call:
uniq_labs = np.unique(RFPredictCTStack_out[1])
io.imsave(filepath + 'FILENAME.tif', img_as_ubyte(RFPredictCTStack_out / len(uniq_labs)))
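An alternative sketch that scales by the maximum label value rather than the label count, so the labels span the full [0, 1] range exactly before the 8-bit conversion (the helper name is mine):

```python
import numpy as np

def labels_to_unit_range(stack):
    """Scale an integer label stack into [0, 1] by its maximum label value."""
    max_lab = stack.max()
    # Guard against an all-zero stack; use float division to avoid
    # Python 2 integer truncation.
    return stack.astype(float) / max_lab if max_lab else stack.astype(float)
```

Dividing by len(uniq_labs) also stays within [0, 1], but only reaches 1.0 when the labels happen to run from 0 to n-1 without gaps; scaling by the maximum avoids that assumption.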