mikevoets / jama16-retina-replication

JAMA 2016; 316(22) Replication Study

Home Page: https://doi.org/10.1371/journal.pone.0217541

License: MIT License

Python 76.66% Shell 23.34%

deep-learning deep-learning-algorithms jama retinal-fundus-photographs replication-study reproducibility retinopathy detection reproduction reproduction-study

jama16-retina-replication's Issues

.scale_normalize from lib/preprocess.py

Thank you for the repo Mike.

I am following the README instructions to evaluate on a custom data set, but in the first step I do not find scale_normalize in the current version of lib/preprocess.py, but I do see it in the oldest version. Should we be using _resize_and_center_fundus_all instead?

Thanks!

Zeroed out confusion matrix and AUC = 0 during training when trained with gradable images only

I'm observing the same issue reported in #6 when I train with gradable images only. When all EyePACS images are used, this does not happen.
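
One way to narrow down this symptom is to recompute the metrics by hand on a small batch. The sketch below is not the repository's code: confusion_matrix and brier_score are hypothetical helpers, and the 0.5 decision threshold is an assumption. The four cells of a binary confusion matrix always sum to the number of evaluated examples, so an all-zero matrix suggests that no examples were counted at all, which may point at the evaluation input rather than at the model.

```python
def confusion_matrix(labels, probabilities, threshold=0.5):
    """Return (tn, fp, fn, tp) for binary labels and predicted probabilities."""
    tn = fp = fn = tp = 0
    for label, prob in zip(labels, probabilities):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def brier_score(labels, probabilities):
    """Mean squared difference between predicted probability and true label."""
    if not labels:
        raise ValueError("no examples to score")
    return sum((p - y) ** 2 for y, p in zip(labels, probabilities)) / len(labels)


# Hypothetical toy batch: one example lands in each cell of the matrix.
labels = [0, 0, 1, 1]
probabilities = [0.1, 0.7, 0.4, 0.9]
print(confusion_matrix(labels, probabilities))
# With no examples at all, every cell stays at zero.
print(confusion_matrix([], []))
```

If the hand-computed matrix on a few validation examples is non-zero while the training log still shows zeros, the validation input pipeline is a reasonable next place to look.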

about

I don't understand what needs to be done to prepare the ensemble of networks. Could you explain it in detail?

To create an ensemble of networks and evaluate the linear average of predictions, use the -lm parameter. To specify multiple models to evaluate as an ensemble, the model paths should be comma-separated or satisfy a regular expression. For example: -lm=./tmp/model-1,./tmp/model-2,./tmp/model-3 or -lm=./tmp/model-?.
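
As a rough illustration of what "linear average of predictions" means here, the sketch below splits a comma-separated model list and averages per-example probabilities across models. It is not the repository's implementation: parse_model_list and average_predictions are hypothetical names, and the probabilities are made-up values standing in for each model's output.

```python
def parse_model_list(spec):
    """Split a comma-separated -lm value into individual model paths."""
    return [path.strip() for path in spec.split(",") if path.strip()]


def average_predictions(per_model_probabilities):
    """Unweighted (linear) average across models, example by example."""
    if not per_model_probabilities:
        raise ValueError("need at least one model")
    n_models = len(per_model_probabilities)
    n_examples = len(per_model_probabilities[0])
    if any(len(p) != n_examples for p in per_model_probabilities):
        raise ValueError("all models must score the same examples")
    return [
        sum(model[i] for model in per_model_probabilities) / n_models
        for i in range(n_examples)
    ]


models = parse_model_list("./tmp/model-1,./tmp/model-2,./tmp/model-3")
# Hypothetical probabilities: one list per model, one value per image.
ensemble = average_predictions([[0.2, 0.9], [0.4, 0.7], [0.6, 0.8]])
print(models, ensemble)
```

Each model is trained separately first; the ensemble step only combines their outputs at evaluation time.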

Making trained models available

Hi,

Since all this training and replication takes a lot of time and resources, would it be possible to make the trained models publicly available?

Thanks

0.94 AUC not reproducible

I followed your README step by step and trained several models, but so far only one has reached as much as 0.76 AUC on EyePACS. It's not clear to me whether the reported AUC of 0.94 came from a single model or an ensemble. I'll try an ensemble, but most of my models end at roughly 0.52 AUC, so they are unlikely to contribute much.

Are there any undocumented reasons why the code would not reproduce the paper's results?
Maybe a different seed for the distribution of images across the folders?
I used the --only_gradable flag; it's also not clear whether your paper used all images or only the gradable ones.

Thank you!
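
For context on the numbers in this report: AUC can be read as the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one, so a value near 0.5 is what random scoring would give. The sketch below computes AUC by direct pairwise comparison; roc_auc is a hypothetical helper, not the metric code used by train.py, and the scores are invented.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# A model that separates the classes reasonably well.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# A model that outputs the same score for every image.
print(roc_auc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5]))
```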

Empty tf-records files

Hi,

I'm sorry about asking this, but I have no knowledge about how tf-records work, and therefore have no clue on how to debug this.

When starting the training, the datasets appear to be empty. After checking the folder data/eyepacs/bin2/train, it seems that tf-records files are empty. Would you have a hint on what could have gone wrong here?

The same goes for the dimensions.txt file I mentioned in a previous issue, and also for the 0 and 1 folders, so this may be the root of that earlier problem as well.

Thank you very much for your help. Cheers,

Adrian
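
A first debugging step that needs no TFRecord knowledge is to list every zero-byte file under the output directory. The sketch below uses only the standard library; find_empty_files is a hypothetical helper, and it only checks file sizes, not whether the records inside are valid.

```python
import os


def find_empty_files(root):
    """Return the sorted paths of all zero-byte files below root."""
    empty = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(directory, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

Calling find_empty_files("data/eyepacs/bin2/train") would then show which files were created but never written to. If every file is empty, the preprocessing step that feeds them probably found no images.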

BatchNorm behavior during inference

This is a bit more detailed, but I wanted to check whether the batch norm layer of the computation graph (1st picture below) matches the one you find when importing the model.

When zeroing in on the model graph loaded under inference mode, it seems that the batch norm doesn't use the running_mean and running_var during forward pass (see 1st graph below; I also checked by computing expected output and it didn't match).

But when I construct a simpler net, batch norm uses the running_mean and running_var during forward pass (see 2nd graph; I also checked by computing expected output).

I'm wondering if this could be an error related to FusedBatchNorm vs FusedBatchNormV3; upon importing the model, it uses FusedBatchNorm, yet when constructing a new model, it uses FusedBatchNormV3. I'm not sure what causes this difference in the type of class used to construct the BN layer.

Thanks.
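
The "expected output" check described above can be sketched as follows for a single channel. This is a generic illustration of batch normalization, not code from this repository: in inference mode the layer normalizes with the stored running statistics, while in training mode it uses the statistics of the current batch. The default eps value is an assumption and should be replaced with the value stored in the layer being inspected.

```python
import math


def batch_norm_inference(x, running_mean, running_var, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize with stored running statistics (inference behaviour)."""
    scale = gamma / math.sqrt(running_var + eps)
    return [scale * (v - running_mean) + beta for v in x]


def batch_norm_training(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize with the mean and variance of the current batch."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    scale = gamma / math.sqrt(var + eps)
    return [scale * (v - mean) + beta for v in x]


batch = [2.0, 4.0]
# The two modes disagree whenever the running statistics differ
# from the statistics of the batch being fed in.
print(batch_norm_inference(batch, running_mean=1.0, running_var=4.0, eps=0.0))
print(batch_norm_training(batch, eps=0.0))
```

If the imported graph's output matches the second function rather than the first, the layer is probably running in training mode.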

question about eyepacs.sh

Hi,

I still don't understand the output of eyepacs.sh.
I got two folders, bin2 and pool. The pool folder contains one folder per grade (0/1/2/3/4), but bin2 contains only 0 and 1.

Which one should I use for training and testing?
Thank you so much.
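
The relationship between the five grade folders and the two bin2 folders can be sketched as below. This rests on an assumption that should be checked against eyepacs.sh: that bin2 holds the binary referable/non-referable split, with grades 0 and 1 mapped to class 0 and grades 2 to 4 mapped to class 1, following the definition of referable diabetic retinopathy (moderate or worse) used in the JAMA paper. to_binary_label is a hypothetical helper.

```python
def to_binary_label(grade):
    """Map a 0-4 retinopathy grade to 0 (non-referable) or 1 (referable).

    Assumption: referable means grade 2 (moderate) or worse.
    """
    if grade not in (0, 1, 2, 3, 4):
        raise ValueError("grade must be an integer from 0 to 4")
    return 1 if grade >= 2 else 0


print([to_binary_label(g) for g in range(5)])
```

Under that assumption, the pool folder is an intermediate staging area and bin2 is what a binary classifier would train and evaluate on.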

Is there any video tutorial of this execution available

Hi,
Have you uploaded a video tutorial on running this code?
If so, please share the link, as it would make the steps easier to follow.
Thanks

Default MaxPoolingOp only supports NHWC.

When I run the file "evaluate.py" I get this error:
Default MaxPoolingOp only supports NHWC.
[[Node: max_pooling2d/MaxPool = MaxPoolT=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 3, 3], padding="VALID", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

I read that it should run with Intel MKL, but I'm not sure.
Thanks!
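
The message says the CPU max-pooling kernel accepts only channels-last (NHWC) input, while the node in the log was built with data_format="NCHW" and placed on the CPU. The sketch below only illustrates how per-axis attributes map between the two layouts; nchw_to_nhwc is a hypothetical helper, and applying it does not by itself fix a saved graph, which would likely have to be rebuilt with the channels-last format.

```python
def nchw_to_nhwc(values):
    """Reorder a 4-element per-axis list from (N, C, H, W) to (N, H, W, C)."""
    n, c, h, w = values
    return [n, h, w, c]


# Attributes taken from the node in the error message.
print(nchw_to_nhwc([1, 1, 3, 3]))  # ksize
print(nchw_to_nhwc([1, 1, 2, 2]))  # strides
```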

Zeroed out confusion matrix and AUC = 0 during whole training

Hi there,
I am trying to reproduce your experiments in my university's lab and I'm observing some weird behaviour. After successfully following steps in readme regarding preprocessing (excluding Messidor for now) I run train.py without any parameters (as you describe it in the aforementioned readme). Cross entropy during training fluctuates naturally, but at the end of every epoch it turns out that Brier score, AUC and, what is more, confusion matrix - they are all filled with zeros :/ Due to the early stopping rule based on AUC training is then interrupted after epoch 10 (see attached log file). Any ideas what might have gone so wrong?

My environment:

OS: CentOS 7.4 x64
Python: 3.4^
Tensorflow: 1.5.0
GPU: NVIDIA Tesla K20m

^note: Python version is lower than specified in requirements, but I've noticed that the only Python feature forcing the scripts to require version 3.6 is f-string formatting, so I've modified all the print statements to use older {} formatting style.
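
The kind of rewrite described in the note can be sketched as follows. The message text is made up for illustration and is not taken from train.py; the point is only that str.format produces the same string as the equivalent f-string, so the change should not affect behaviour.

```python
def epoch_summary_legacy(epoch, auc):
    """Works on Python versions before 3.6 (str.format style)."""
    return "Epoch {}: AUC = {:.4f}".format(epoch, auc)


def epoch_summary_fstring(epoch, auc):
    """Requires Python 3.6 or newer (f-string style)."""
    return f"Epoch {epoch}: AUC = {auc:.4f}"


print(epoch_summary_legacy(3, 0.5))
print(epoch_summary_fstring(3, 0.5))
```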

Problem creating TFRecord files

Hi there,

First, thanks for this awesome resource. I am starting to try to reproduce your experiments on my computer (OS: ubuntu 16.04), and I'm having some trouble with the first step of the "Preprocessing before training" section on the readme.

Specifically, when I execute the command git submodule update --init I get the following on the terminal:

Cloning into 'create_tfrecords'...
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
fatal: clone of '[email protected]:mikevoets/create_tfrecords.git' into submodule path 'create_tfrecords' failed

Could you give me a hint about why this could be happening? Unfortunately, I have limited knowledge of git.

Thanks!

Adrian
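
"Permission denied (publickey)" usually means the submodule is referenced through an SSH-style URL and no SSH key accepted by the host is available. Two common ways out are to register an SSH key with the host, or to point the submodule at the HTTPS form of the same URL in .gitmodules and then run git submodule sync before updating again. The sketch below shows the URL rewrite only; scp_style_to_https is a hypothetical helper and the example URL is a placeholder, not the real submodule address.

```python
import re

# Matches scp-style remotes such as user@host:owner/repo.git
_SCP_STYLE = re.compile(r"^(?P<user>[^@/]+)@(?P<host>[^:/]+):(?P<path>.+)$")


def scp_style_to_https(url):
    """Convert user@host:path to https://host/path; leave other URLs unchanged."""
    match = _SCP_STYLE.match(url)
    if match is None:
        return url
    return "https://{}/{}".format(match.group("host"), match.group("path"))


print(scp_style_to_https("git@example.com:owner/repo.git"))
```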

training Auxiliary classifier of Inception V3

Hello,
Since you are training the full network, did you consider training the whole Inception-v3 together with the auxiliary classifier? I know the Keras Inception V3 doesn't include the auxiliary classifier, and I understand from your implementation that you don't use it either.
Could that be a reason why replicating the JAMA paper has not been possible so far? If not, did you manage to replicate the paper?

thank you and great work.

About the data

Hello, professor,
I have recently been trying to reproduce your results with your code and ran into a problem. I downloaded the DR data set from Kaggle, which comes as five training-set archives and seven test-set archives, and put them in the eyepacs folder. However, running eyepacs.sh extracts only the training set. After the script finishes there is only a pool folder, which contains a folder called "train" and a lot of images, but the train folder itself is empty. Can you help me solve this?

About replication details

Hi.
I am also doing research on DR, and I am following Google's work as well. I noticed that the AUC on the Messidor dataset reached only around 0.59 in your work. I trained a binary classifier on all images from Kaggle (88,702 in total) without excluding any, and tested the model on Messidor. It reaches an AUC of about 0.94.

I use ResNet50 as the backbone of the network. Because some lesions, such as small red dots, are extremely small, I set the input image size to 448 x 448. I borrowed some preprocessing steps from the first-prize team on Kaggle. Many papers do not use EyePACS for testing, so you might try using that data for training and validation only.

However, my model performed quite poorly on another dataset, DIARETDB1. Would you mind testing your model on this dataset? Besides, have you ever tried detecting lesions in fundus images?

dimensions.txt empty

Hi Mike!

I finally managed to run the eyepacs.sh script smoothly, but now I have encountered a different issue. When running train.py, I receive the following error:

line 50, in initialize_dataset
image_dim = [int(x) for x in image_dims[0].split('x')]
IndexError: list index out of range

I have been debugging this, stopping the execution before that line, and apparently the program reads the image dimensions from a text file called dimensions.txt inside data/eyepacs/bin2/train. Unfortunately, that text file seems to be empty. I've checked, and the same thing happens in the val and test folders. Maybe I've done something wrong?

Thanks!

P.S.: Off-topic, and more a comment than an issue: I first created a virtual environment with Python 3.5, but apparently your code needs Python 3.6 to understand the string formatting it uses. May I suggest adding 'Python >= 3.6' to the requirements in the readme.md?
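
The IndexError quoted above comes from indexing the first line of a file that has no lines. A more defensive version of that parsing step might look like the sketch below. It assumes, based on the split('x') call in the traceback, that the first line has a "WIDTHxHEIGHT" form; parse_image_dim is a hypothetical helper, and the "299x299" value is only an example.

```python
def parse_image_dim(lines):
    """Parse the first non-blank line of a dimensions file, e.g. '299x299'."""
    entries = [line.strip() for line in lines if line.strip()]
    if not entries:
        raise ValueError(
            "dimensions file is empty; preprocessing may have written no images"
        )
    return [int(x) for x in entries[0].split("x")]


print(parse_image_dim(["299x299\n"]))
```

With a check like this, an empty dimensions.txt is reported as a preprocessing problem instead of surfacing later as an index error in training.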

Regarding EyePACS test image true labels

In preprocess_eyepacs.py, the code uses a testLabels.csv file that is not available in the Kaggle dataset. There is a newer CSV file called sampleSubmission.csv instead, which does not contain true labels for the images (all labels are class 0). In the discussion tab on Kaggle, however, they have provided a file called retinopathy_solution.csv, which appears to contain the true labels. Some comments there are negative about these labels, so I have doubts about their reliability. Can you provide the true labels for the test images?

(Attached: a preview of sampleSubmission.csv and a preview of retinopathy_solution.csv.)
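
A quick way to tell a placeholder file from a real label file is to count how often each label occurs. The sketch below does this with the standard csv module. The column names "image" and "level" and the sample rows are assumptions for illustration; check the header of the actual file before relying on them.

```python
import csv
import io
from collections import Counter


def label_distribution(csv_text, label_column="level"):
    """Count how many rows carry each integer label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(int(row[label_column]) for row in reader)


# Hypothetical sample rows, not real data.
sample = "image,level\n1_left,0\n1_right,2\n2_left,0\n"
print(label_distribution(sample))
```

A file in which every row has label 0, as described for sampleSubmission.csv, would produce a counter with a single key.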

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM

Hi,

I'm sorry to ask, but I have no idea how to solve this problem, which occurs when I run train.py. The error message follows:

System information

OS Platform and Distribution: Windows 10
TensorFlow version: 1.2
CUDA/cuDNN version:
Cuda compilation tools, release 9.1 and cuDNN 8.0 with the NVIDIA Quadro M500M
GPU memory: 2G
Python version: Python 3.6.2 |Anaconda 4.4.0 (64-bit)|

Error Message:

Found GPU! Using channels first as default image data format.
Traceback (most recent call last):
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1361, in _do_call
return fn(*args)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,256,35,35] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: mixed0/concat = ConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](activation_5/Relu, activation_7/Relu, activation_10/Relu, activation_11/Relu, gradients/global_average_pooling2d/Mean_grad/Maximum_1/y)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: GradientDescent/update/_2290 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4395_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File ".\train.py", line 235, in <module>
[global_step, mean_xentropy, train_op, update_brier])
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
run_metadata_ptr)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1355, in _do_run
options, run_metadata)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[32,256,35,35] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: mixed0/concat = ConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](activation_5/Relu, activation_7/Relu, activation_10/Relu, activation_11/Relu, gradients/global_average_pooling2d/Mean_grad/Maximum_1/y)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: GradientDescent/update/_2290 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4395_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'mixed0/concat', defined at:
File ".\train.py", line 133, in <module>
include_top=False, weights='imagenet', pooling='avg', input_tensor=x)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\applications\inception_v3.py", line 216, in InceptionV3
name='mixed0')
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\layers\merge.py", line 665, in concatenate
return Concatenate(axis=axis, **kwargs)(inputs)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\engine\topology.py", line 258, in __call__
output = super(Layer, self).__call__(inputs, **kwargs)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\layers\base.py", line 696, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\layers\merge.py", line 174, in call
return self._merge_function(inputs)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\layers\merge.py", line 380, in _merge_function
return K.concatenate(inputs, axis=self.axis)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\keras_impl\keras\backend.py", line 2083, in concatenate
return array_ops.concat([to_dense(x) for x in tensors], axis)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1175, in concat
return gen_array_ops._concat_v2(values=values, axis=axis, name=name)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 777, in _concat_v2
"ConcatV2", values=values, axis=axis, name=name)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 3271, in create_op
op_def=op_def)
File "D:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[32,256,35,35] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: mixed0/concat = ConcatV2[N=4, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](activation_5/Relu, activation_7/Relu, activation_10/Relu, activation_11/Relu, gradients/global_average_pooling2d/Mean_grad/Maximum_1/y)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[Node: GradientDescent/update/_2290 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4395_GradientDescent/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
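
To put the failing allocation in perspective, the size of the tensor named in the log can be worked out directly from its shape. The sketch below assumes 4 bytes per element, which matches "type float" (float32) in the message; tensor_bytes is a hypothetical helper.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed for one dense tensor of the given shape."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total


# Shape from the error message: batch of 32, 256 channels, 35 x 35 feature map.
size = tensor_bytes([32, 256, 35, 35])
print(size, "bytes, about", round(size / 2**20, 1), "MiB")
# The same activation with a batch of 8 instead of 32.
print(tensor_bytes([8, 256, 35, 35]), "bytes")
```

A single activation of about 38 MiB failing to allocate suggests that the 2 GB of GPU memory was already nearly full of other activations and gradients. Since activation memory scales linearly with the first dimension, a smaller batch size is the usual first thing to try on a card of this size.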

about data

When I was running your code, I found that some of the counts in eyepacs.sh do not seem to match. Could you explain them to me?
bin2_0_cnt=48784
bin2_0_tr_cnt=40688
bin2_1_tr_cnt=16458
What is the relationship between 40688 and 48784?
