thank you for releasing this super helpful repo! I think it'd be great to be able to r

read images and annotation directly from s3 bucket about pathology-whole-slide-data HOT 9 CLOSED

rj678 commented on June 3, 2024

read images and annotation directly from s3 bucket

from pathology-whole-slide-data.

Comments (9)

rj678 commented on June 3, 2024 1

thanks so much for looking into this, and for your feedback - I'll follow up on all of your points and push new commits later this week to get your feedback

from pathology-whole-slide-data.

rj678 commented on June 3, 2024 1

thanks so much, Mart - that fixed it! I'll follow up on the feedback on the PR.

from pathology-whole-slide-data.

martvanrijthoven commented on June 3, 2024

Dear Rishi

Thank you for your kind words about this package!
Adding a new image backend is always great. I think something you propose would be very nice to add to this package.

Here are just some ideas that I had after I peeked into your branch:

I noticed that the current openslide backend and the backend for tifflslide have some redundancy. It would be great if we could resolve that a bit.
And I think we can add the S3 loading of the annotation files in a subclass of WholeSlideAnnotationFile such that it is a bit more general. Not sure yet how it should be implemented best.

One other point that I already would like to make is that it would be costly and inefficient (training time) if someone would train by reading directly from S3 via deep learning machines other than AWS services like SageMaker. So I think it would be good to clarify this somehow (maybe using a warning?).

Please feel free to open a pull request, and we can discuss this addition in more detail.

Best wishes,
Mart

from pathology-whole-slide-data.

rj678 commented on June 3, 2024

Hello Mart,

sorry for the delay in getting back on this. I've cleaned up the code a bit and created a draft PR, with TODOs based on your feedback. I'll follow up on the TODOs.

I added a notebook that can be run on colab where users can read images directly from the s3 bucket. I added a note on reading and training directly from s3 being slow. I was planning on using this s3 plugin for PyTorch for training: https://github.com/aws/amazon-s3-plugin-for-pytorch , which is described as a high-performance library and has been upstreamed into the torchdata package.

Thanks for being open to this contribution - looking forward to committing more code as per your standards so the PR gets accepted :)

from pathology-whole-slide-data.

martvanrijthoven commented on June 3, 2024

Dear Rishi

Thank you for your work!
Yes, the s3 plugin looks nice. Please let me know if you want me to review something. Looking forward to this nice addition.

Best wishes,
Mart

from pathology-whole-slide-data.

rj678 commented on June 3, 2024

thanks, Mart. I have a quick clarification about the backend for the annotation parser:

I don't have asap installed on my laptop (I use a conda env on WSL2). When I try to use the asap Image backend, I get the error as expected: Registrant is not found.... , but I am able to use the asap annotation parser: wsa = WholeSlideAnnotation(path_to_xml, parser='asap').

On Colab though, when I try to use the asap annotation parser, I get the error: Registrant 'asap' is not found in the register of class 'AnnotationParser' with registrant names ('wsa', 'mask', 'virtum-asap')

is there some library that the wholeslidedata library is checks for to use the asap annotation parser - once I ID that, I will install that on Colab as well,

thank you,

from pathology-whole-slide-data.

martvanrijthoven commented on June 3, 2024

Dear Rishi,
No additional software needs to be installed for the asap annotation parser.
I can not reproduce your issue on Colab. Could you please share the code that you have tried on Colab, Then i will look into it.

from pathology-whole-slide-data.

rj678 commented on June 3, 2024

Hi Mart, this notebook can be imported into Colab and has the error in the last cell:

https://github.com/rj678/pathology-whole-slide-data/blob/test_s3_read/notebooks/colab_s3_image_annotation.ipynb

thank you for helping me resolve this error

from pathology-whole-slide-data.

martvanrijthoven commented on June 3, 2024

I think you did not install boto3, which makes the import of the AsapAnnotationParser fail and therefore it can not find the parser.

I think, running the following should solve your problem:
!pip install boto3

Please let me know if you still have problems.

Best wishes,
Mart

from pathology-whole-slide-data.

read images and annotation directly from s3 bucket about pathology-whole-slide-data HOT 9 CLOSED

Comments (9)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent