juliandewit / kaggle_ndsb2017


621.0 621.0 293.0 73 KB

Kaggle Data Science Bowl 2017

License: MIT License

Python 100.00%

deep-learning kaggle keras machine-learning medical-imaging tensorflow


kaggle_ndsb2017's Issues

About your documentation and code for lung nodule detection

Hi Julian,

I am a little confused about the NDSB dataset. You gave two preprocessing solutions, one for the LUNA dataset and one for the NDSB dataset. Where did you get the NDSB dataset from? Also, in the following link you show that you create 3D chunks of the scans. How exactly do you create them?

http://juliandewit.github.io/kaggle-ndsb2017/
The LUNA16 challenge also provides documentation for extracting patches. Why do you use 3D chunks of the scans rather than the patches obtained from the LUNA dataset?

Thank you!!!

ValueError: bad axis2 argument to swapaxes

I am getting an error in step1_preprocess_luna16.py

Computer: DESKTOP-AVK4MS4
0 patient: 1.3.6.1.4.1.14519.5.2.1.6279.6001.105756658031515062000744821260
Img array: (121, 512, 512)
Annos: 0
Origin (x,y,z): [-198.100006 -195. -335.209991]
Spacing (x,y,z): [ 0.76171899 0.76171899 2.5 ]
Rescale: [ 0.76171899 0.76171899 2.5 ]
Direction: [ 1. 0. 0. 0. 1. 0. 0. 0. 1.]
Direction: [ 1. 0. 0. 0. 1. 0. 0. 0. 1.]
(390, 390)
1 patient: 1.3.6.1.4.1.14519.5.2.1.6279.6001.108197895896446896160048741492
Img array: (119, 512, 512)
Annos: 1
Origin (x,y,z): [-182.5 -190. -313.75]
Spacing (x,y,z): [ 0.74218798 0.74218798 2.5 ]
Rescale: [ 0.74218798 0.74218798 2.5 ]
Direction: [ 1. 0. 0. 0. 1. 0. 0. 0. 1.]
Direction: [ 1. 0. 0. 0. 1. 0. 0. 0. 1.]
(380, 380)
Node org (x,y,z,diam): (-100.57, 67.26, -231.82, 6.44)
Node tra (x,y,z,diam): (110.0, 347.0, 33.0)
Traceback (most recent call last):
  File "step1_preprocess_luna16.py", line 718, in <module>
    process_pos_annotations_patient2()
  File "step1_preprocess_luna16.py", line 642, in process_pos_annotations_patient2
    process_pos_annotations_patient(src_path, patient_id)
  File "step1_preprocess_luna16.py", line 280, in process_pos_annotations_patient
    center_float_percent = center_float_rescaled / patient_imgs.swapaxes(0,2).shape
ValueError: bad axis2 argument to swapaxes
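
NumPy raises this error when `swapaxes(0, 2)` is called on an array with fewer than three dimensions, so a likely cause is that `patient_imgs` was not loaded as a 3D volume (for example, because only one slice image was found for the patient). The sketch below shows a guard that reports the actual shape before swapping axes. The helper name `fractional_center` is hypothetical and is not part of the repository.

```python
import numpy as np


def fractional_center(center_xyz, volume_zyx):
    """Return an (x, y, z) center as a fraction of the volume size.

    Hypothetical helper: it checks the dimensionality first so that a
    missing or 2D image stack fails with a readable message.
    """
    volume_zyx = np.asarray(volume_zyx)
    if volume_zyx.ndim != 3:
        raise ValueError(
            "expected a 3D (z, y, x) volume, got shape %s" % (volume_zyx.shape,)
        )
    # swapaxes(0, 2) turns the (z, y, x) shape into (x, y, z).
    size_xyz = np.array(volume_zyx.swapaxes(0, 2).shape, dtype=float)
    return np.asarray(center_xyz, dtype=float) / size_xyz


volume = np.zeros((300, 380, 380))
print(fractional_center((110.0, 347.0, 33.0), volume))
```

If the guard fires, the place to look is the directory of extracted slice images for that patient rather than the annotation code.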

How to cite your work in paper? Thanks!

Hi, juliandewit.
Your work is great! How should I cite it in a paper? Which papers should be added to the references?
Thanks
Gu Yu

Can we get the CT viewer?

Hey @juliandewit. Thank you so much for sharing such great knowledge. I wonder if we could get the CT viewer you used while in the competition.
I know that you may have not published it with this repo for some good reason, but still, please share the code for CT viewer. I actually wanted to visualize the data and see the output in the CT viewer.
I hope you'll understand.
Thank you.

Issue about the negative data and label

Hi Julian,
I am trying to build a nodule detector based on your work. Thanks very much for sharing it.
May I ask some questions?

You use several types of training data:
labels from LIDC, v2 from LUNA16, LUNA16 false positives, NDSB, and non-lung tissue edges.
So at the training stage, are all of these positive samples except the non-lung tissue edges? And is the label YES (the cube contains a nodule) for positive samples and NO for non-lung tissue edges?
Another question: when predicting, a 64*64*64 cube is fed to the net. Is the result whether the cube contains a nodule, along with its probability?
Any information is welcome!

Why is 100 added in the function dice_coef in step2_train_mass_segmenter.py?

Hi, juliandewit
Your source code helps me a lot. I have another question for you. I found the function dice_coef at line 207 in step2_train_mass_segmenter.py. The function returns (2. * intersection + 100) / (K.sum(y_true_f) + K.sum(y_pred_f) + 100).
The definition of the Dice coefficient does not contain 100. It should be (2. * intersection) / (K.sum(y_true_f) + K.sum(y_pred_f)).
Why did you add 100 to both the numerator and the denominator? Thanks for your help.
Gu Yu
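
Adding the same constant to the numerator and the denominator is a common smoothing trick for the Dice coefficient. Whether that was the author's motivation is an assumption here. The NumPy sketch below (the name `dice_coef_np` and the `smooth` parameter are illustrative) shows the effect: with two empty masks the unsmoothed ratio would be 0/0, while the smoothed one is well defined.

```python
import numpy as np


def dice_coef_np(y_true, y_pred, smooth=100.0):
    """Dice coefficient with an additive smoothing constant."""
    y_true_f = np.asarray(y_true, dtype=float).ravel()
    y_pred_f = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (y_true_f.sum() + y_pred_f.sum() + smooth)


empty = np.zeros((4, 4))
full = np.ones((4, 4))
print(dice_coef_np(empty, empty))            # both masks empty: 100 / 100 = 1.0
print(dice_coef_np(full, full, smooth=0.0))  # perfect overlap, no smoothing: 1.0
print(dice_coef_np(full, empty))             # no overlap: 100 / 116
```

The last line also shows the cost of a large constant: for small masks the score stays close to 1 even when nothing overlaps, so the constant effectively down-weights tiny regions.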

question about the batch size in step 2 detector

The code is:
model.fit_generator(train_gen, len(train_files) / 1, 12, validation_data=holdout_gen, nb_val_samples=len(holdout_files) / 1, callbacks=[checkpoint, checkpoint_fixed_name, learnrate_scheduler])
The URL is:
https://github.com/juliandewit/kaggle_ndsb2017/blob/master/step2_train_nodule_detector.py#L387

Why divide by 1? Should it divide by the batch size instead?
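
The call above uses the Keras 1 argument names, where the second positional argument (`samples_per_epoch`) and `nb_val_samples` count samples rather than batches, so dividing by the batch size is not needed there. In Keras 2 the corresponding arguments, `steps_per_epoch` and `validation_steps`, count batches. A small sketch of that conversion for Python 3, with illustrative numbers and a hypothetical helper name:

```python
import math


def steps_per_epoch(sample_count, batch_size):
    """Number of batches needed to see every sample once (Keras 2 style)."""
    return math.ceil(sample_count / batch_size)


print(steps_per_epoch(1000, 16))  # 63
```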

ValueError: need at least one array to concatenate

I get this error when I run step1b_preprocess_make_train_cubes.py. It is thrown for some of the luna16_manual_labels files, and I don't understand what is going wrong with those specific CSV files.
I do see a few .png images in the luna16_train_cubes_manual folder.

ERROR SCREENSHOT
Computer: DESKTOP-AVK4MS4
1.3.6.1.4.1.14519.5.2.1.6279.6001.128881800399702510818644205032
0 1.3.6.1.4.1.14519.5.2.1.6279.6001.128881800399702510818644205032 2
1.3.6.1.4.1.14519.5.2.1.6279.6001.160216916075817913953530562493
1 1.3.6.1.4.1.14519.5.2.1.6279.6001.160216916075817913953530562493 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.161002239822118346732951898613
1.3.6.1.4.1.14519.5.2.1.6279.6001.167919147233131417984739058859
3 1.3.6.1.4.1.14519.5.2.1.6279.6001.167919147233131417984739058859 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.170825539570536865106681134236
4 1.3.6.1.4.1.14519.5.2.1.6279.6001.170825539570536865106681134236 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.172845185165807139298420209778
5 1.3.6.1.4.1.14519.5.2.1.6279.6001.172845185165807139298420209778 3
1.3.6.1.4.1.14519.5.2.1.6279.6001.173931884906244951746140865701
6 1.3.6.1.4.1.14519.5.2.1.6279.6001.173931884906244951746140865701 2
1.3.6.1.4.1.14519.5.2.1.6279.6001.227968442353440630355230778531
7 1.3.6.1.4.1.14519.5.2.1.6279.6001.227968442353440630355230778531 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.230491296081537726468075344411
8 1.3.6.1.4.1.14519.5.2.1.6279.6001.230491296081537726468075344411 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.241717018262666382493757419144
9 1.3.6.1.4.1.14519.5.2.1.6279.6001.241717018262666382493757419144 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.246225645401227472829175288633
Traceback (most recent call last):
  File "step1b_preprocess_make_train_cubes.py", line 271, in <module>
    make_pos_annotation_images_manual()
  File "step1b_preprocess_make_train_cubes.py", line 139, in make_pos_annotation_images_manual
    images = helpers.load_patient_images(patient_id, settings.LUNA16_EXTRACTED_IMAGE_DIR, "*" + CUBE_IMGTYPE_SRC + ".png")
  File "C:\Users\Sangryal\Downloads\sathya\helpers.py", line 78, in load_patient_images
    res = numpy.vstack(images)
  File "C:\Users\Sangryal\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 234, in vstack
    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate
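
`numpy.vstack` raises this error when it receives an empty list, which here would mean that no PNG slices matched the pattern for that patient under `settings.LUNA16_EXTRACTED_IMAGE_DIR`. The sketch below fails earlier with a message that names the pattern. The function name `find_patient_slices`, the `base_dir/patient_id/` layout, and the default wildcard are assumptions for illustration.

```python
import glob
import os


def find_patient_slices(patient_id, base_dir, wildcard="*.png"):
    """Return the sorted slice paths for one patient, or fail with the pattern."""
    pattern = os.path.join(base_dir, patient_id, wildcard)
    paths = sorted(glob.glob(pattern))
    if not paths:
        # An empty list here is what later makes numpy.vstack fail.
        raise FileNotFoundError("no slice images match %s" % pattern)
    return paths
```

Running a check like this for the patient id printed just before the traceback shows whether the extraction step produced any images for that scan.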

What is the value of LUNA_16_TRAIN_DIR2D2?

What is the value of LUNA_16_TRAIN_DIR2D2? I cannot find it in the settings.py file.

where are the data being stored??

Hi Julian,

Congratulations on doing such great work. I just have a few questions about the directories where you stored the data. In 'settings.py', I see you refer to the following locations:
BASE_DIR_SSD
BASE_DIR
EXTRA_DATA_DIR
NDSB3_RAW_SRC_DIR
LUNA16_RAW_SRC_DIR

I am confused about which folder contains what. Where am I supposed to store the NDSB data, and where the LUNA16 dataset?

Thank you so much.

Training nodule detector is slow. 16 hours for an epoch

Hi,

I'm running step2_train_nodule_detector.py on a Linux machine with a TitanX GPU.
It takes close to 16 hours to complete a single epoch, whereas Readme.MD says the total time for 12 epochs is 8 hours. I'm using an anaconda2 Python environment.

Can you please help me with this? What am I missing?

ZeroDivisionError: division by zero with submission_preds_list in step 4

all_preds is empty after vstack, even though submission_preds_list has 1000 entries.

if submission:
    all_preds = numpy.vstack(submission_preds_list)
    avg_preds = numpy.average(all_preds, axis=0)

This throws the error:
ZeroDivisionError: division by zero

regarding the error in step2_train_nodule_detector.py

I ran the script step2_train_nodule_detector.py. For model 2 (LUNA16 annotations + NDSB positive annotations) I got the following error:

File "", line 2, in
train(train_full_set=True, load_weights_path=None, ndsb3_holdout=0, manual_labels=True, model_name="luna_posnegndsb_v1", fold_count=2)

File "", line 12, in train
train_files, holdout_files = get_train_holdout_files(train_percentage=80, ndsb3_holdout=ndsb3_holdout, manual_labels=manual_labels, full_luna_set=train_full_set, fold_count=fold_count)

File "", line 113, in get_train_holdout_files
pos_sample_path = pos_samples[pos_idx]

IndexError: list index out of range
Thanks in advance!
Please reply as soon as possible.

What are the differences between the models extracted from trained_models.rar?

I downloaded trained_models.rar and decompressed it. Several files were extracted, but I don't know what each of them stands for.
Could you tell me?

Where does the data in resource.rar come from?

Hi, julian,

Your work is great. Thanks for sharing.

I downloaded resource.rar and it contains several folders with different data. As far as I know, the data in the folder 'luna16_annotations' is from LUNA16 and LIDC-IDRI. What about the other folders?

Thanks
Liu Peng

About your code in step1_preprocess_luna16.py

Hi Julian,

This script generates a lot of CSV files. Can you please tell me what exactly each of these functions generates?
process_lidc_annotations(only_patient=None, agreement_threshold=0)
1.process_pos_annotations_patient2()
2.process_excluded_annotations_patients(only_patient=None)
3.process_luna_candidates_patients(only_patient_id=None)
4.process_auto_candidates_patients()

Thank You.

Accuracy calculation

In your submission, only the probability of having cancer is calculated. How would you calculate the accuracy of your submission?
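
The Data Science Bowl 2017 leaderboard was scored with log loss on the predicted probabilities rather than with accuracy. If an accuracy figure is wanted as well, the probabilities have to be thresholded first. A minimal NumPy sketch with made-up labels and predictions (the 0.5 threshold and both function names are assumptions):

```python
import numpy as np


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    losses = y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    return float(-np.mean(losses))


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases where the thresholded probability matches the label."""
    predicted = np.asarray(y_prob, dtype=float) >= threshold
    actual = np.asarray(y_true) == 1
    return float(np.mean(predicted == actual))


labels = [1, 0, 0, 1]   # made-up ground truth
probs = [0.8, 0.3, 0.6, 0.4]  # made-up predictions
print(log_loss(labels, probs), accuracy(labels, probs))
```

Both metrics need ground-truth labels, so they can only be computed on a labelled holdout set, not on the unlabelled submission set itself.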

Extracting features from your trained network

Hi Julian,

I am more interested in your learned features than in predicting the final outcome through the network. I was wondering whether it would be possible to extract features from the intermediate layers of your 3D network. Or do you by any chance know of other trained networks (preferably trained on 3D images) that can easily be used for feature extraction?

Thanks for your help in advance,
laleh

Question about skipping the cube in step3_predict_nodules.py?

Hi, Julian,

In the function predict_cubes() of step3_predict_nodules.py, you try to predict all the 32*32*32 cubes of each patient. You have a cube-skipping condition at line 266:

if cube_mask.sum() < 2000:
    skipped_count += 1

Would you like to explain why?

Thanks
tjliupeng
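
A plausible reading of that condition, not confirmed by the author, is that cubes whose lung mask is nearly empty lie outside the lungs and are not worth running through the network. The sketch below illustrates the idea on a synthetic mask. The step size, the 0/1 mask values, and the helper name `iter_lung_cubes` are assumptions.

```python
import numpy as np


def iter_lung_cubes(mask, cube_size=32, step=16, min_mask_sum=2000):
    """Yield (z, y, x) corners of cubes that contain enough mask voxels."""
    depth, height, width = mask.shape
    for z in range(0, depth - cube_size + 1, step):
        for y in range(0, height - cube_size + 1, step):
            for x in range(0, width - cube_size + 1, step):
                cube_mask = mask[z:z + cube_size, y:y + cube_size, x:x + cube_size]
                if cube_mask.sum() < min_mask_sum:
                    continue  # mostly outside the lung: skip prediction
                yield z, y, x


mask = np.zeros((64, 64, 64), dtype=np.uint8)
mask[:32, :32, :32] = 1  # pretend the lung fills one corner of the volume
kept = list(iter_lung_cubes(mask))
print(len(kept), "of", 3 ** 3, "cubes kept")
```

In this toy volume 8 of the 27 candidate cubes overlap the masked corner enough to be kept, which is the kind of saving the skip condition would give at prediction time.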

batchnorm order in CNN

Thank you so much for sharing.
I have a question about the batchnorm layer.
In step2_train_mass_segmenter.py, from line 314, is the architecture of the 2D U-Net. In each block, the layers go like this: input -> batchnorm -> conv1 -> relu -> conv2 -> relu -> pooling -> output
In other papers, batch norm layers are traditionally placed between the conv layers and the ReLU, in order to avoid gradient explosion. So I wonder why you put batchnorm before conv. Do you have a theory that supports this order, or is it a new trick in CNNs?
Of course I know there is no "correct" position for every layer, and your work performs quite well. Congratulations on the challenge!

Getting confused about where to keep the dataset

Can anyone help me with the links to download the dataset required to run this repo?

step

How can I draw a ROC diagram?

Dear Julian,
How can I draw a ROC diagram?
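
A ROC curve needs ground-truth labels and predicted scores for the same cases, for example on a labelled holdout set. The sketch below computes the curve points and the area under the curve with NumPy only. It is not taken from the repository, the data are made up, and it assumes there are no tied scores.

```python
import numpy as np


def roc_points(y_true, y_score):
    """Return false positive rates, true positive rates, and the AUC."""
    y_true = np.asarray(y_true, dtype=bool)
    # Sort cases from the highest score to the lowest.
    order = np.argsort(-np.asarray(y_score, dtype=float), kind="stable")
    hits = y_true[order]
    tpr = np.concatenate([[0.0], np.cumsum(hits) / hits.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(~hits) / (~hits).sum()])
    # Trapezoidal area under the (fpr, tpr) curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc


fpr, tpr, auc = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

The returned `fpr` and `tpr` arrays can be passed to any plotting library, for example `matplotlib.pyplot.plot(fpr, tpr)`, to draw the diagram.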

How to do the nodule detection job with step3_predict_nodules.py?

Hello Juliandewit.
I have some DICOM files from lung cancer patients, not the Kaggle ones. I ran step1_preprocess_ndsb.py and step3.py with the DICOM files in the directory NDSB3_RAW_SRC_DIR from settings.py, extracted resources.rar into the ./resources/ directory, and extracted trained_models.rar into the ./models/ directory.

But I got 9 empty folders in the NDSB3_NODULE_DETECTION_DIR directory. Following what you said in the README.MD, I thought I would get what I wanted, but now I am a little confused and am reading the code.

Could you please tell me how to use the trained models to detect nodules in other DICOM files?
Thank you.

All nodules at (0,0,0) in the result

I ran step3.py on its own.

I downloaded the code and tried to run it on my own computer (with the trained model, and with LUNA16 training data as the test set). However, something is wrong with the result: all nodules are detected at (0,0,0), and the "diameter_mm" values are all negative.

I tried to debug step3.py and found something: at lines 60-62, "center_x", "center_y", and "center_z" equal 0.0 no matter what the input image is.

How can I fix this problem? Waiting for your reply...

step2_train_nodule_detector.py ValueError: output of generator should be a tuple

Why are the coordinates of the predicted results all zero when running step3_predict_nodules.py?

Hi, Julian,

After training, I ran step3_predict_nodules.py and the result is odd: all the x, y, z coordinates are 0.0, and the values in the diameter_mm column are all negative.

What could the problem be?

Thanks

I have some problems with the csv.

Hi Julian,
Thank you for sharing.
I am reading and running your code, and I have some problems with the CSV files in the resource folder. You have answered similar questions before. You said that the luna16_falsepos_labels folder was generated automatically. Can you tell me how?
Thank you very much.

diameter fields in resources/ndsb3_manual_label

Hi Julian,

Take this example from resources/ndsb3_manual_label:
::::::::::::::
id,x,y,z,d,mal,dmm
0,0.7380484,0.4426079,0.4596774,0.08382452,1,0
0,0.6142763,0.630854,0.3790323,0.0589391,1,0
0,0.7439424,0.6382002,0.3790323,0.07203662,1,0
0,0.6660118,0.630854,0.3360215,0.09299278,1,0

There are two fields, d and dmm. Are they (predicted) diameters, or are they malignancy scores?

AttributeError: 'NoneType' object has no attribute 'reshape' in helpers.py

I get an error when I run these functions in step1_preprocess_luna16.py:
if True:
    process_pos_annotations_patient2()
    process_excluded_annotations_patients(only_patient=None)

error:

  File "C:\Users\Sangryal\Downloads\sathya\kaggle_ndsb2017\kaggle_ndsb2017-master\helpers.py", line 77, in load_patient_images
    images = [im.reshape((1,) + im.shape) for im in images]
  File "C:\Users\Sangryal\Downloads\sathya\kaggle_ndsb2017\kaggle_ndsb2017-master\helpers.py", line 77, in <listcomp>
    images = [im.reshape((1,) + im.shape) for im in images]
AttributeError: 'NoneType' object has no attribute 'reshape'
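
The traceback shows that at least one entry in `images` is `None`. If the slices are read with OpenCV's `cv2.imread`, which is an assumption about `helpers.py`, that function returns `None` instead of raising when a file cannot be read. The sketch below checks for that before stacking. The helper name `stack_slices` is hypothetical.

```python
import numpy as np


def stack_slices(paths, images):
    """Stack 2D slices into a (n, height, width) volume, reporting unreadable files."""
    missing = [path for path, image in zip(paths, images) if image is None]
    if missing:
        raise IOError(
            "could not read %d slice(s), first: %s" % (len(missing), missing[0])
        )
    return np.stack(images, axis=0)
```

The reported path is the file to inspect: it may be missing, truncated, or located under a path that the image reader cannot open.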

Each epoch takes 4 hours in step2_train_nodule_detector.py on my computer. How can I fix that?

Hi, juliandewit:
Great work! Thank you for sharing your code.
I have a question for you. When I ran step2_train_nodule_detector.py, each epoch took 4 hours: it waited for 3 hours and then ran model.fit_generator for 1 hour. The GPU in my computer is a GTX 980 Ti G1. How can I cut down the running time?
Thanks!
Gu Yu

UnboundLocalError: local variable 'extension' referenced before assignment in function 'combine_nodule_predictions'

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.2\helpers\pydev\pydevd.py", line 1596, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.2\helpers\pydev\pydevd.py", line 974, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:/Python/code/Kaggle/Bowl2017/place2-2/step4_train_submissions.py", line 399, in <module>
    combine_nodule_predictions(None, train_set=False, nodule_th=0.7, extensions=[model_variant])
  File "D:/Python/code/Kaggle/Bowl2017/place2-2/step4_train_submissions.py", line 130, in combine_nodule_predictions
    target_path = settings.BASE_DIR + "xgboost_trainsets/" "train" + extension + ".csv" if train_set else settings.BASE_DIR + "xgboost_trainsets/" + "submission" + extension + ".csv"
UnboundLocalError: local variable 'extension' referenced before assignment
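
This error pattern appears when a name is only assigned inside a loop or branch that never runs and is then read afterwards. Whether that is what happens in `combine_nodule_predictions` is an assumption. One possibility is that the loop that binds `extension` iterates over nothing, for instance because no prediction files were found. A toy reproduction with hypothetical names:

```python
def last_extension(extensions):
    """Return the loop variable after the loop, as the failing line does."""
    for extension in extensions:
        pass
    return extension  # unbound if the loop body never ran


print(last_extension(["_fs", "_ss"]))
try:
    last_extension([])
except UnboundLocalError as err:
    print("reproduced:", err)
```

Checking that the input directories for the prediction CSV files are non-empty before calling the function is a quick way to test this hypothesis.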

run code on multiple GPUs

Hi Julian,
I just started running your step3_predict_nodules.py with your trained model.
I found that it only ran on 1 GPU even though I assigned 2 GPUs to it with
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
I also commented out config.gpu_options.per_process_gpu_memory_fraction = 0.5
because I am allowed to use both GPUs fully, but it was still slow.

Could you let me know how to run the code on multiple GPUs? Thanks.

Extracting NDSB raw data Stage12

Hi Julian,
In step1_preprocess_ndsb.py, should I extract both the Stage 1 and Stage 2 NDSB data into the same directory, or into two separate ones?
I.e.

/data/ndsb3_extracted_images/<patient_dirs>

/data/ndsb3_extracted_images/stage1/<patient_dirs>
/data/ndsb3_extracted_images/stage2/<patient_dirs>

Where does the data in resource.rar come from?

Hi, julian,

Your work is great. Thanks for sharing.

I downloaded resource.rar and it contains several folders with different data. As far as I know, the data in the folder 'luna16_annotations' is from LUNA16 and LIDC-IDRI, and the data in the folders 'luna16_manual_labels' and 'ndsb3_manual_labels' was generated manually. What about the others, such as annotations_excluded.csv and candidates_V2.csv in the folder 'luna16_annotations', the folder 'luna16_falsepos_labels', and the folder 'segmenter_traindata'?

Thanks
Cao jiehui

Why do you select "sigmoid" and "None" rather than "ReLU" or "LeakyReLU" in the last layer of the CNN?

Hi, juliandewit
Why do you select "sigmoid" and "None" rather than "ReLU" or "LeakyReLU" in the last layer of the CNN? Are ReLU or LeakyReLU better than sigmoid? Thanks.

The codes are as below:

out_class = Convolution3D(1, 1, 1, 1, activation="sigmoid", name="out_class_last")(last64)
out_class = Flatten(name="out_class")(out_class)

out_malignancy = Convolution3D(1, 1, 1, 1, activation=None, name="out_malignancy_last")(last64)
out_malignancy = Flatten(name="out_malignancy")(out_malignancy)

Best regards
Gu Yu
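
As a general point rather than a statement of the author's reasoning: a sigmoid squashes the output into (0, 1), which suits a nodule probability, while `activation=None` leaves the output linear, which suits a regression target such as a malignancy score. A ReLU in the last layer would clip every negative pre-activation to zero. A small NumPy illustration with made-up pre-activation values:

```python
import numpy as np


def sigmoid(x):
    """Map any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    """Clip negative values to zero and leave positive values unchanged."""
    return np.maximum(x, 0.0)


pre_activations = np.array([-3.0, 0.0, 3.0])
print("sigmoid:", sigmoid(pre_activations))  # always a valid probability
print("relu:   ", relu(pre_activations))     # negatives collapse to 0, no upper bound
print("linear: ", pre_activations)           # unchanged, usable for regression
```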

Hi Julian, can I get only the lung mask images and the nodule mask images? How can I save these 3D images? Thanks a lot.

creating/locating masses_predictions.csv

In step4_train_submissions.py you load a csv file as such:

mass_df = pandas.read_csv(settings.BASE_DIR + "masses_predictions.csv")

Is this file created somewhere in the previous steps? or is it one that is downloaded beforehand? I can't seem to find where it is created or located.

Thanks,
Teaghan

settings.MANUAL_ANNOTATIONS_LABELS_DIR and pos_labels_dir = settings.LUNA_NODULE_LABELS_DIR

In step3_predict_nodules.py, what are the values of
settings.MANUAL_ANNOTATIONS_LABELS_DIR
and settings.LUNA_NODULE_LABELS_DIR (used in pos_labels_dir = settings.LUNA_NODULE_LABELS_DIR)?
Thanks in advance

Training time for step_train_nodule_detector.py

Dear Julian,

I ran your code on my GPU (Tesla K10) simulator, but it seems very time-consuming: I need over 30 hours to finish one epoch. How long does one epoch take for you? Thanks.

predicting time

Implementation of code...

Has anyone run this code completely and obtained the expected outputs? Please reply.
Thank you!

How to process "slices[0].ImageOrientationPatient != [1.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.000000]"

What if there are DICOM files whose slices[0].ImageOrientationPatient != [1.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.000000]? Should I just discard them, or apply some other preprocessing?
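
`ImageOrientationPatient` holds the direction cosines of the image rows and columns. One option, sketched below, is to test whether the orientation is a pure axis flip, which can be undone by reversing the pixel array along that axis, or an oblique acquisition, which would need resampling or could be discarded. This is a sketch under those assumptions and not the repository's handling. The helper name and the tolerance are illustrative.

```python
import numpy as np


def classify_orientation(iop, tol=1e-3):
    """Classify six direction cosines as 'standard', 'flipped', or 'oblique'."""
    row = np.asarray(iop[:3], dtype=float)
    col = np.asarray(iop[3:], dtype=float)
    if np.allclose(row, [1, 0, 0], atol=tol) and np.allclose(col, [0, 1, 0], atol=tol):
        return "standard"
    # Same axes but at least one of them points the other way.
    if np.allclose(np.abs(row), [1, 0, 0], atol=tol) and np.allclose(
        np.abs(col), [0, 1, 0], atol=tol
    ):
        return "flipped"
    return "oblique"


print(classify_orientation([1, 0, 0, 0, 1, 0]))
print(classify_orientation([-1, 0, 0, 0, -1, 0]))
print(classify_orientation([0.98, 0.17, 0, -0.17, 0.98, 0]))
```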

who generated luna16_train_cubes_manual?

Hi Julian, who created the csv files in resources/luna16_train_cubes_manual? What is the reason to create this data? Sorry I have read your blog, but I still can't understand it.

Luna Data

Hi Julian,
Can I just change step3 to predict on the LUNA data and make a submission for LUNA16?

network code problem

Hi Julian, I don't know what '-> Model' means in

def get_net(input_shape=(CUBE_SIZE, CUBE_SIZE, CUBE_SIZE, 1), load_weight_path=None, features=False, mal=False) -> Model:

When I ran the code for this stage, some problems occurred. Is this related to the Python version? My version is 2.7.

Could you please tell me the answer?
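
`-> Model` is a Python 3 return annotation. It documents that `get_net` returns a Keras `Model` and has no effect at run time. Python 2.7 does not support this syntax and reports a `SyntaxError`, so the scripts need Python 3. A self-contained illustration with a stand-in class in place of the Keras one:

```python
class Model:
    """Stand-in for keras.models.Model, used only for this illustration."""


def get_net(load_weight_path=None, features=False) -> Model:
    # The annotation after "->" is metadata; it does not convert or check anything.
    return Model()


print(get_net.__annotations__)
```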

Issue on the data weight while training the nodule detector net

Hello, Julian,

Thank you for offering the code and great work.

I have a question about the weighting of the training set.

I notice that the training set has several data sources, such as labels from LIDC, v2 from LUNA16, LUNA16 false positives, NDSB, and non-lung tissue edges. Among them, LIDC and the LUNA16 nodules should be the positive samples, and the others are negative samples (their labels are 0,0).

But there are far more negative samples than positive ones, so the set is unbalanced. What ratios should be used when combining the training set? I think 1 (positive) : 1 (false positive) : 2 (non-lung tissue or edge) might make sense, because too many negative samples would hurt accuracy.

Could you please give me some suggestions on this issue?
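
A mixing ratio like the proposed 1 : 1 : 2 can be applied when the file list for an epoch is assembled. The sketch below is one way to do that and is not taken from the repository. The function name, the pool names, and the policy of subsampling the negatives are assumptions.

```python
import random


def build_epoch(positives, false_positives, edges, ratio=(1, 1, 2), seed=0):
    """Keep all positives and subsample the negative pools to the given ratio."""
    rng = random.Random(seed)
    n_pos = len(positives)
    n_fp = min(len(false_positives), n_pos * ratio[1] // ratio[0])
    n_edge = min(len(edges), n_pos * ratio[2] // ratio[0])
    epoch = (
        list(positives)
        + rng.sample(list(false_positives), n_fp)
        + rng.sample(list(edges), n_edge)
    )
    rng.shuffle(epoch)
    return epoch


pos = ["pos_%d" % i for i in range(10)]
fps = ["fp_%d" % i for i in range(100)]
edg = ["edge_%d" % i for i in range(100)]
print(len(build_epoch(pos, fps, edg)))  # 10 + 10 + 20 = 40
```

Changing the seed per epoch draws a different subset of negatives each time, so the larger negative pools are still used over the course of training.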

How to extract overlays from images for the U-net mass detector?

Hi Julian,
How do you extract the overlays from the images for the U-net mass detector?
I mean the overlays in "resources\segmenter_traindata\ *_o.png". How are those files generated?
Thank you!

question about implementation

Hey Julian, this is really impressive work. I would like to ask about the hardware you used for training and storing files. Did you use the AWS cloud (EC2 + S3), or did you have local resources?

Count back to absolute slice positions after segmenting nodules

Hi Julian
the code works like a charm for me. Thanks!
However, I wonder how to count back to the absolute slice positions of a nodule.
So currently, one receives coordinates such as 0.1834, 0.5272, 0.71179 in the CSV file for x, y and z.
How can I calculate in which slice the respective nodule is (Z axis), being positioned X voxels from left and Y voxels from top?
I tried multiplying the values with the image shape e.g. (261, 512, 512) but that gives strange looking results, e.g. 47.8674, 269.9264, 364.43648.
Do you know a way to get the correct absolute values?
Thanks a lot
Willi
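
The example values suggest that the fractions were multiplied in the wrong order: the CSV columns are x, y, z, while a NumPy image shape such as (261, 512, 512) is ordered (z, y, x). A minimal sketch of the conversion, assuming the fractions are relative to the full volume and that the scan was not flipped or cropped during preprocessing (the helper name is hypothetical):

```python
def to_voxel(frac_xyz, shape_zyx):
    """Convert fractional (x, y, z) coordinates to voxel positions (x, y, z)."""
    depth, height, width = shape_zyx
    frac_x, frac_y, frac_z = frac_xyz
    # x runs along the last array axis, z along the first.
    return frac_x * width, frac_y * height, frac_z * depth


x, y, z = to_voxel((0.1834, 0.5272, 0.71179), (261, 512, 512))
print(round(x, 1), round(y, 1), round(z, 1))
```

With the numbers from the question this gives roughly x = 93.9, y = 269.9, and z = 185.8, so the nodule would sit near slice 186 of 261.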
