mikevoets / jama16-retina-replication
JAMA 2016; 316(22) Replication Study
Home Page: https://doi.org/10.1371/journal.pone.0217541
License: MIT License
Thank you for the repo, Mike.
I am following the README instructions to evaluate on a custom data set, but in the first step I cannot find scale_normalize in the current version of lib/preprocess.py, although I do see it in the oldest version. Should we be using _resize_and_center_fundus_all instead?
Thanks!
I'm observing the same issue reported in #6 when I train with gradable images only. When all EyePACS images are used, this does not happen.
I don't understand what needs to be done to prepare an ensemble of networks. Could you explain it in detail?
To create an ensemble of networks and evaluate the linear average of predictions, use the -lm parameter. To specify multiple models to evaluate as an ensemble, the model paths should be comma-separated or satisfy a regular expression. For example: -lm=./tmp/model-1,./tmp/model-2,./tmp/model-3 or -lm=./tmp/model-?.
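The README text above describes two steps: resolving the -lm value into several model paths, and combining their predictions by a linear average. A minimal Python sketch of both follows. resolve_model_paths and ensemble_average are hypothetical helpers, not functions from the repository, and treating './tmp/model-?' as a shell-style glob (rather than a strict regular expression) is an assumption based on how that example pattern looks.

```python
import glob

import numpy as np


def resolve_model_paths(lm_arg):
    """Expand a comma-separated -lm style value into a list of model paths.

    Assumption: entries containing a wildcard ('*', '?' or '[') are expanded
    with shell-style glob matching; all other entries are kept literally.
    """
    paths = []
    for entry in lm_arg.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if any(ch in entry for ch in "*?["):
            paths.extend(sorted(glob.glob(entry)))
        else:
            paths.append(entry)
    return paths


def ensemble_average(predictions):
    """Linear (unweighted) average of per-model probabilities.

    `predictions` holds one 1-D array per model, aligned image by image.
    """
    stacked = np.stack([np.asarray(p, dtype=float) for p in predictions])
    return stacked.mean(axis=0)
```

For example, two models that output [0.2, 0.9] and [0.4, 0.7] for the same two images give an ensemble prediction of [0.3, 0.8]. An unweighted mean treats every model as equally reliable, so a near-chance model pulls the average toward noise just as much as a good one pulls it toward the truth.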
Hi,
Since all this training and replication takes a lot of resources and time, would it be possible to make the trained models publicly available?
Thanks
I followed your README step by step and created several models, but so far only one of them has reached an AUC of 0.76 on EyePACS. It's not clear to me whether the reported AUC of 0.94 was obtained with a single model or with an ensemble... I'll try an ensemble, but most of the models I train end with an AUC of around 0.52, which means they are likely not contributing much.
Are there any undocumented reasons why the code might not reproduce the paper's results?
Maybe a different seed for distributing the images into the folders?
I used the --only_gradable flag; it's also not clear to me whether your paper used all images or only the gradable ones.
Thank you!
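For context on the numbers in this question: ROC AUC equals the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one, so 0.5 is chance level and a model ending around 0.52 is barely better than guessing. The sketch below computes AUC directly from that definition (the Mann-Whitney formulation, with ties counted as half). roc_auc is an illustrative helper, not the repository's metric code, and its P*N loop is only meant for sanity checks on small sets.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative).

    `labels` are 0/1, `scores` are model outputs; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A model that outputs the same score for every image gets exactly 0.5 under this definition, which is a useful reference point when a run reports an AUC barely above it.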
Hi,
I'm sorry to ask this, but I don't know how TFRecords work, so I have no idea how to debug this.
When I start training, the datasets appear to be empty. After checking the folder data/eyepacs/bin2/train, it seems that the TFRecord files are empty. Do you have a hint about what could have gone wrong here?
The same goes for the dimensions.txt file I mentioned in a previous issue, and also for the 0 and 1 folders, so this may be the root of the problem I had there as well.
Thank you very much for your help. Cheers,
Adrian
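A quick first check for this symptom, before digging into TFRecord internals, is to list which files under data/eyepacs/bin2 are zero bytes long. The sketch below assumes nothing about the repository's file naming (it scans every file recursively); find_empty_files is a hypothetical helper. If the TFRecord files, dimensions.txt and the class folders all turn out empty, the conversion step probably never received any input images, which would point at the preprocessing stage rather than at training.

```python
from pathlib import Path


def find_empty_files(directory):
    """Return every zero-byte file under `directory`, searched recursively.

    An empty TFRecord file contains no records, so any path listed here
    would make the corresponding split look empty during training.
    """
    root = Path(directory)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.stat().st_size == 0)
```

For example, find_empty_files("data/eyepacs/bin2") reports the empty files in the train, val and test splits in one pass.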
This is a bit more detailed, but I wanted to check whether the batch norm layer of the computation graph (1st picture below) matches the one you find when importing the model.
When zeroing in on the model graph loaded in inference mode, it seems that the batch norm doesn't use running_mean and running_var during the forward pass (see 1st graph below; I also checked by computing the expected output, and it didn't match).
But when I construct a simpler net, batch norm does use running_mean and running_var during the forward pass (see 2nd graph; I also checked by computing the expected output).
I'm wondering if this could be an error related to FusedBatchNorm vs. FusedBatchNormV3: the imported model uses FusedBatchNorm, whereas a newly constructed model uses FusedBatchNormV3. I'm not sure what causes this difference in the op type used to construct the BN layer.
Thanks.
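The "expected output" check described in this question comes down to two formulas. In inference mode batch norm normalizes with the stored running statistics; in training mode it uses the mean and variance of the current batch. If the imported graph's output matches bn_training rather than bn_inference, the layer is normalizing with batch statistics, which usually means it was built or run in training mode rather than that the running statistics are wrong. This is a hedged diagnostic sketch: the function names are hypothetical, and eps=1e-3 is assumed because it is the Keras BatchNormalization default, so compare it against the epsilon stored in the actual graph.

```python
import numpy as np


def bn_inference(x, gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Inference-mode batch norm: normalize with the stored running statistics."""
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta


def bn_training(x, gamma, beta, eps=1e-3):
    """Training-mode batch norm: normalize with statistics of the current batch.

    The channel axis is assumed to be last (NHWC); all other axes are reduced.
    """
    axes = tuple(range(x.ndim - 1))
    mean = x.mean(axis=axes)
    var = x.var(axis=axes)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```

For an NCHW tensor, move the channel axis to the end first (for example with np.moveaxis(x, 1, -1)) so that the per-channel parameters broadcast correctly.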
Hi,
I still don't understand the output after running eyepacs.sh.
I got two folders, bin2 and pool. Inside pool there are grade folders (0/1/2/3/4), but bin2 only contains 0 and 1.
Which one should I use for training and testing?
Thank you so much.
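One way to read the two folders, offered as an assumption rather than a description of eyepacs.sh: pool keeps the original five-level EyePACS grades (0 to 4), while bin2 holds the binary task that the JAMA 2016 study evaluates, referable diabetic retinopathy, which that paper defines in terms of moderate-or-worse disease. Under that reading grades 0 and 1 collapse into class 0 and grades 2 to 4 into class 1, and bin2 (with its train, val and test splits) is what the training and evaluation scripts read. Both helpers below are hypothetical.

```python
def to_binary(grade):
    """Map a 0-4 DR grade to a binary label (assumed threshold: grade >= 2)."""
    if grade not in range(5):
        raise ValueError(f"unexpected DR grade: {grade}")
    return 1 if grade >= 2 else 0


def collapse_counts(grade_counts):
    """Turn per-grade image counts (pool/0 ... pool/4) into bin2-style counts."""
    binary = {0: 0, 1: 0}
    for grade, count in grade_counts.items():
        binary[to_binary(grade)] += count
    return binary
```

Comparing collapse_counts applied to the pool folder sizes with the actual sizes of bin2/0 and bin2/1 is a cheap way to confirm or reject this reading on a real run.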
Hi
Have you uploaded a video tutorial on running this algorithm?
Please share the link, as it would make it easier to follow along and run it.
Thanks
When I run the file "evaluate.py" I get this error:
Default MaxPoolingOp only supports NHWC.
[[Node: max_pooling2d/MaxPool = MaxPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 3, 3], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
I found suggestions that it should be run with Intel MKL, but I'm not sure.
Thanks!
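The error message states the constraint itself: the default CPU MaxPool kernel only accepts NHWC (channels-last) input, while this node was built with data_format="NCHW". Elsewhere on this page, train.py logs "Found GPU! Using channels first as default image data format.", so a plausible cause is a graph configured for channels-first being run on a CPU-only machine. The sketch below illustrates the two pieces involved; pick_data_format is a hypothetical helper, and in tf.keras the global setting itself is controlled by tf.keras.backend.set_image_data_format.

```python
import numpy as np


def pick_data_format(gpu_available):
    """Channels-first only when a GPU is present (assumed policy, mirroring the
    'Found GPU! Using channels first' log line); CPU MaxPool needs channels-last."""
    return "channels_first" if gpu_available else "channels_last"


def nchw_to_nhwc(batch):
    """Reorder a batch from (N, C, H, W) to (N, H, W, C)."""
    return np.transpose(batch, (0, 2, 3, 1))
```

Note that the layout has to be consistent for the whole graph: converting only the input batch does not help if the layers themselves were created with channels-first settings.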
Hi there,
I am trying to reproduce your experiments in my university's lab and I'm observing some weird behaviour. After successfully following the preprocessing steps in the README (excluding Messidor for now), I run train.py without any parameters, as described in the README. Cross entropy fluctuates naturally during training, but at the end of every epoch the Brier score, the AUC and even the confusion matrix are all zeros :/ Because of the early-stopping rule based on AUC, training is then interrupted after epoch 10 (see attached log file). Any ideas what might have gone so wrong?
My environment:
^Note: my Python version is lower than the one specified in the requirements, but I've noticed that the only Python feature forcing the scripts to require version 3.6 is f-string formatting, so I've modified all the print statements to use the older {} formatting style.
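An all-zero confusion matrix suggests that no validation examples were counted at all, in which case the AUC never improves and any patience-based early-stopping rule fires after a fixed number of epochs regardless of the data. The generic rule is sketched below; stop_epoch and the patience value are illustrative assumptions, since the repository's exact criterion is not described here.

```python
def stop_epoch(aucs, patience):
    """Return the 1-based epoch at which training stops, or None if it never does.

    Training stops once `patience` consecutive epochs fail to beat the best
    validation AUC seen so far (hypothetical rule for illustration).
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, auc in enumerate(aucs, start=1):
        if auc > best:
            best = auc
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None
```

With an AUC stuck at 0.0, the first epoch sets the best value and training stops exactly `patience` epochs later, so a stop at a round epoch number is itself a hint that evaluation produced nothing to compare.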
Hi there,
First, thanks for this awesome resource. I am starting to reproduce your experiments on my computer (OS: Ubuntu 16.04), and I'm having some trouble with the first step of the "Preprocessing before training" section of the README.
Specifically, when I execute the command git submodule update --init, I get the following in the terminal:
Cloning into 'create_tfrecords'...
Permission denied (publickey).
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
fatal: clone of 'git@github.com:mikevoets/create_tfrecords.git' into submodule path 'create_tfrecords' failed
Could you give me a hint about why this could be happening? Unfortunately, I have limited knowledge of git.
Thanks!
Adrian
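"Permission denied (publickey)" means git tried to fetch the submodule over SSH and the server did not accept any key from this machine. Assuming the create_tfrecords repository is public, one workaround is to point the submodule at its HTTPS URL instead: edit the url entry in .gitmodules, then run git submodule sync followed by git submodule update --init (git's url.<base>.insteadOf setting achieves the same thing globally). The helpers below sketch the .gitmodules rewrite; the names are hypothetical and only the common git@host:owner/repo.git form is handled.

```python
import re

# Matches the scp-like SSH form, e.g. git@github.com:owner/repo.git
_SSH_URL = re.compile(r"^git@([^:]+):(.+)$")


def ssh_to_https(url):
    """Convert git@host:owner/repo.git to https://host/owner/repo.git; leave others as-is."""
    match = _SSH_URL.match(url)
    if not match:
        return url
    host, path = match.groups()
    return f"https://{host}/{path}"


def rewrite_gitmodules(text):
    """Rewrite every `url = ...` line of a .gitmodules file to use HTTPS."""
    lines = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "url":
            line = f"{key}= {ssh_to_https(value.strip())}"
        lines.append(line)
    return "\n".join(lines) + ("\n" if text.endswith("\n") else "")
```

The rewritten text would be written back to .gitmodules before running git submodule sync, which copies the new URL into the local git configuration.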
Hello,
Since you are training the full network, did you consider training the whole Inception-v3 together with its auxiliary classifier? I know the Keras InceptionV3 doesn't include the auxiliary classifier, and I understood from your implementation that you don't use it either.
Could that be a reason why replicating the JAMA paper is not possible so far? If not, did you manage to replicate the paper?
Thank you, and great work.
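For reference, Inception-v3's auxiliary classifier is a second softmax head attached to an intermediate layer; during training its cross-entropy is added to the main loss with a small weight, and the head is discarded at inference. The sketch shows only that loss combination in NumPy. The 0.4 weight is the value used in TF-slim's Inception training script and is an assumption here, not something this repository uses, and the helper names are hypothetical.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    logits = np.asarray(logits, dtype=float)
    # Subtract the row maximum for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def total_loss(main_logits, aux_logits, labels, aux_weight=0.4):
    """Main-head loss plus a down-weighted auxiliary-head loss (training only)."""
    main = softmax_cross_entropy(main_logits, labels)
    aux = softmax_cross_entropy(aux_logits, labels)
    return main + aux_weight * aux
```

Because the auxiliary term only adds gradient signal to the lower layers, its effect is mainly on optimization; whether it would close the gap to the JAMA results is an open question rather than something this sketch demonstrates.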
Hello, professor
I am currently trying to reproduce your code and have run into a problem. I downloaded the DR dataset from Kaggle, which contains the training set (five parts) and the test set (seven parts), and put these files in the eyepacs folder. However, running eyepacs.sh extracts only the training set: afterwards there is only a pool folder, which contains a folder called "train" and a lot of images, but the train folder itself is empty. Can you help me solve this?
Hi.
I am also researching DR and following the Google work as well. I noticed that the AUC on the Messidor dataset reached around 0.59 in your work. I trained on all images from Kaggle (88,702 in total) without excluding any, as a binary classification, and tested my model on the Messidor dataset. It reaches an AUC of about 0.94.
I use ResNet50 as the backbone of the network. Because some lesions, such as small red dots, are extremely small, I set the input image size to 448x448. I adopted some preprocessing procedures from the first-prize team of the Kaggle competition. Many papers do not use EyePACS for testing, so I suggest you might use that data for training and validation only.
However, my model performs quite poorly on another dataset, DIARETDB1. Would you mind testing your model on this dataset? Also, have you ever tried detecting lesions in fundus images?
Hi Mike!
I finally managed to run the eyepacs.sh script smoothly, but now I have encountered a different issue. When running train.py, I receive the following error:
line 50, in initialize_dataset
image_dim = [int(x) for x in image_dims[0].split('x')]
IndexError: list index out of range
I have been debugging this by stopping execution before that line, and apparently the program reads the image dimensions from a text file called dimensions.txt inside data/eyepacs/bin2/train. Unfortunately, that text file seems to be empty. I've checked, and the same thing happens in the val and test folders. Maybe I've done something wrong?
Thanks!
P.S.: This is off-topic, and more a comment than an issue: I first created a virtual environment with Python 3.5, but apparently your code needs Python 3.6 to understand the f-string formatting the way you are using it. May I suggest adding 'Python >= 3.6' to the requirements in the README.md?
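The IndexError follows directly from the line shown in the traceback: if dimensions.txt is empty, the list of lines read from it is presumably empty too, and image_dims[0] does not exist. A slightly more defensive parser makes the real problem (an empty file left by an earlier step) explicit. This is a hypothetical replacement sketch, not the repository's code, and it assumes each line has the form HEIGHTxWIDTH, as the split('x') in the traceback suggests.

```python
def parse_dimensions(text, source="dimensions.txt"):
    """Parse the first non-empty 'HxW' line of a dimensions file into ints.

    Raises a descriptive ValueError instead of an IndexError when the file
    has no usable lines (format assumed from the split('x') call).
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError(
            f"{source} is empty; the preprocessing step that writes it "
            "probably found no images to convert"
        )
    return [int(x) for x in lines[0].split("x")]
```

Usage would look like reading the file and passing its contents, e.g. parse_dimensions(open(path).read(), source=path); an empty file then fails with a message that points back at preprocessing instead of an index error deep in train.py.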
In preprocess_eyepacs.py, the code uses a testLabels.csv file, which is not available in the Kaggle dataset. Instead, there is a CSV file called sampleSubmission.csv, which does not contain the true labels for the images (all labels are class 0). In the discussion tab on Kaggle, a file called retinopathy_solution.csv has been shared that seems to contain the true labels, but some comments about these labels are negative, so I have doubts about their authenticity. Can you provide the true labels for the test images?
(Attached: a preview of sampleSubmission.csv and a preview of retinopathy_solution.csv.)
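Before trusting any replacement labels file, it is cheap to check its label distribution: a file in which every image has label 0 is a submission template, not ground truth. The sketch assumes the image,level column layout of Kaggle's labels files (worth checking against the actual headers), and the helper names are hypothetical. The check can only rule a file out; a non-trivial distribution does not by itself establish that the labels are authentic.

```python
import csv
import io
from collections import Counter


def label_distribution(csv_text, label_column="level"):
    """Count how often each label appears in a Kaggle-style labels CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(int(row[label_column]) for row in reader)


def looks_like_placeholder(counts):
    """True when every label is 0, as in a sample-submission template."""
    return sum(counts.values()) > 0 and set(counts) == {0}
```

Comparing the resulting distribution with the grade distribution of the training labels is a further plausibility check, since both sets were drawn from the same screening population.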
Hi,
I'm sorry to ask this, but I have no idea how to solve this problem when running train.py. The following is the error message:
Found GPU! Using channels first as default image data format.
Traceback (most recent call last):
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1361, in _do_call
return fn(*args)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,256,35,35] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: mixed0/concat = ConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](activation_5/Relu, activation_7/Relu, activation_10/Relu, activation_11/Relu, gradients/global_average_pooling2d/Mean_grad/Maximum_1/y)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: GradientDescent/update/_2290 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4395_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File ".\train.py", line 235, in <module>
[global_step, mean_xentropy, train_op, update_brier])
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
run_metadata_ptr)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1355, in _do_run
options, run_metadata)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,256,35,35] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: mixed0/concat = ConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](activation_5/Relu, activation_7/Relu, activation_10/Relu, activation_11/Relu, gradients/global_average_pooling2d/Mean_grad/Maximum_1/y)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: GradientDescent/update/_2290 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4395_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Caused by op 'mixed0/concat', defined at:
File ".\train.py", line 133, in <module>
include_top=False, weights='imagenet', pooling='avg', input_tensor=x)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\applications\inception_v3.py", line 216, in InceptionV3
name='mixed0')
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\layers\merge.py", line 665, in concatenate
return Concatenate(axis=axis, **kwargs)(inputs)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\engine\topology.py", line 258, in __call__
output = super(Layer, self).__call__(inputs, **kwargs)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 696, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\layers\merge.py", line 174, in call
return self._merge_function(inputs)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\layers\merge.py", line 380, in _merge_function
return K.concatenate(inputs, axis=self.axis)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\backend.py", line 2083, in concatenate
return array_ops.concat([to_dense(x) for x in tensors], axis)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1175, in concat
return gen_array_ops._concat_v2(values=values, axis=axis, name=name)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 777, in _concat_v2
"ConcatV2", values=values, axis=axis, name=name)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3271, in create_op
op_def=op_def)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,256,35,35] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: mixed0/concat = ConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](activation_5/Relu, activation_7/Relu, activation_10/Relu, activation_11/Relu, gradients/global_average_pooling2d/Mean_grad/Maximum_1/y)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[Node: GradientDescent/update/_2290 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4395_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
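The failed allocation is a single float32 activation of shape [32, 256, 35, 35]. Working out its size shows the scale involved: one such tensor is about 38 MiB, and training keeps many activations of this order alive for the backward pass, so the total for Inception-v3 is far larger. Because the leading dimension is the batch size, activation memory scales linearly with it; if the batch size is configurable, halving it (32 to 16) roughly halves this part of the footprint. tensor_bytes is an illustrative helper.

```python
import operator
from functools import reduce


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; float32 uses 4 bytes per element."""
    return reduce(operator.mul, shape, 1) * bytes_per_element


full = tensor_bytes([32, 256, 35, 35])  # the tensor from the OOM message
half = tensor_bytes([16, 256, 35, 35])  # same layer with half the batch
print(full, full / 2**20)  # 40140800 bytes, i.e. 38.28125 MiB
print(half / 2**20)        # 19.140625 MiB
```

Weights and optimizer state do not shrink with the batch size, so reducing the batch only helps with the activation part of the memory budget.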
When I was running your code, I found that the numbers in eyepacs.sh don't seem to match. Could you explain them to me?
bin2_0_cnt=48784
bin2_0_tr_cnt=40688
bin2_1_tr_cnt=16458
What is the relationship between 40688 and 48784?
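Reading the variable names (an interpretation, not confirmed by the script): bin2_0_cnt is the total number of class-0 images, and bin2_0_tr_cnt is how many of them go into the training split, so the difference is what remains for validation and testing. The arithmetic is simple enough to check directly:

```python
# Counts quoted from eyepacs.sh in the question above.
bin2_0_cnt = 48784     # assumed: all class-0 images
bin2_0_tr_cnt = 40688  # assumed: class-0 images assigned to training
bin2_1_tr_cnt = 16458  # assumed: class-1 images assigned to training

held_out_0 = bin2_0_cnt - bin2_0_tr_cnt       # class-0 images left for the other splits
train_total = bin2_0_tr_cnt + bin2_1_tr_cnt   # size of the training split
train_share_0 = bin2_0_tr_cnt / bin2_0_cnt    # fraction of class 0 used for training
class_1_share = bin2_1_tr_cnt / train_total   # class balance of the training split

print(held_out_0, train_total)                          # 8096 57146
print(round(train_share_0, 3), round(class_1_share, 3))  # 0.834 0.288
```

Under this reading, 8,096 class-0 images are held out and the training split is roughly 29% class 1; how the held-out images are divided between validation and test would have to be confirmed in eyepacs.sh itself.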