
kaggle_ndsb2017's People

Contributors

juliandewit, ko


kaggle_ndsb2017's Issues

About your documentation and code for lung nodule detection

Hi Julian,

I am a little confused about the NDSB dataset. You describe two preprocessing pipelines, one for the LUNA dataset and one for the NDSB dataset. Where did you get the NDSB dataset? Also, at the following link you show that you create 3D chunks of the scans. How exactly are they created?

http://juliandewit.github.io/kaggle-ndsb2017/

In the LUNA16 challenge they also provide documentation for extracting patches, so why do you go for 3D chunks of the scans rather than the patches obtained from the LUNA dataset?

Thank you!

ValueError: bad axis2 argument to swapaxes

i am getting error in step1_preprocess_luna16.py

Computer: DESKTOP-AVK4MS4
0 patient: 1.3.6.1.4.1.14519.5.2.1.6279.6001.105756658031515062000744821260
Img array: (121, 512, 512)
Annos: 0
Origin (x,y,z): [-198.100006 -195. -335.209991]
Spacing (x,y,z): [ 0.76171899 0.76171899 2.5 ]
Rescale: [ 0.76171899 0.76171899 2.5 ]
Direction: [ 1. 0. 0. 0. 1. 0. 0. 0. 1.]
Direction: [ 1. 0. 0. 0. 1. 0. 0. 0. 1.]
(390, 390)
1 patient: 1.3.6.1.4.1.14519.5.2.1.6279.6001.108197895896446896160048741492
Img array: (119, 512, 512)
Annos: 1
Origin (x,y,z): [-182.5 -190. -313.75]
Spacing (x,y,z): [ 0.74218798 0.74218798 2.5 ]
Rescale: [ 0.74218798 0.74218798 2.5 ]
Direction: [ 1. 0. 0. 0. 1. 0. 0. 0. 1.]
Direction: [ 1. 0. 0. 0. 1. 0. 0. 0. 1.]
(380, 380)
Node org (x,y,z,diam): (-100.57, 67.26, -231.82, 6.44)
Node tra (x,y,z,diam): (110.0, 347.0, 33.0)
Traceback (most recent call last):
File "step1_preprocess_luna16.py", line 718, in
process_pos_annotations_patient2()
File "step1_preprocess_luna16.py", line 642, in process_pos_annotations_patient2
process_pos_annotations_patient(src_path, patient_id)
File "step1_preprocess_luna16.py", line 280, in process_pos_annotations_patient
center_float_percent = center_float_rescaled / patient_imgs.swapaxes(0,2).shape

ValueError: bad axis2 argument to swapaxes
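numpy only accepts swapaxes(0, 2) on an array with at least three dimensions, so this error means patient_imgs was not loaded as a (z, y, x) volume, for example when the extracted slices for that patient are missing and the loader returns a 2-D or empty array. A sketch of a guard one could add before the failing line; the variable names follow the traceback, but the check itself is my suggestion, not part of the repo:

    import numpy

    # swapaxes(0, 2) needs a 3-D (z, y, x) volume; fail with a clear message otherwise.
    if patient_imgs.ndim != 3:
        raise ValueError("Expected a 3-D volume for this patient, got shape %s - "
                         "check that the step1 extraction produced its PNG slices"
                         % (patient_imgs.shape,))
    center_float_percent = center_float_rescaled / numpy.array(patient_imgs.swapaxes(0, 2).shape)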

Can we get the CT viewer?

Hey @juliandewit. Thank you so much for sharing such great knowledge. I wonder if we could get the CT viewer you used during the competition.
I know you may not have published it with this repo for a good reason, but please consider sharing the CT viewer code anyway; I want to visualize the data and inspect the output in it.
I hope you'll understand.
Thank you.

Issue about the negative data and label

Hi Julian,
I am trying to build a nodule detector based on your work, and thanks very much for sharing it.
May I ask some questions:

  1. You use several types of training data: labels from LIDC, V2 from LUNA16, LUNA16 false positives, NDSB, and non-lung tissue edges.
    So at the training stage, apart from the non-lung tissue edges, are the others all positive samples? And is the label YES (the cube contains a nodule) for the positive samples and NO for the non-lung tissue edges?

  2. Another question: when predicting, a 64x64x64 cube is fed to the net, and the result is whether the cube contains a nodule plus the probability?

Any information is welcome!

Why is 100 added in the function dice_coef in step2_train_mass_segmenter.py?

Hi juliandewit,
Your source code helps me a lot. I have another question. I found the function dice_coef at line 207 of step2_train_mass_segmenter.py. The function returns (2. * intersection + 100) / (K.sum(y_true_f) + K.sum(y_pred_f) + 100).
The definition of the Dice coefficient does not contain the 100; it seems it should be (2. * intersection) / (K.sum(y_true_f) + K.sum(y_pred_f)).
Why did you add 100 to both the numerator and the denominator? Thanks for your help.
Gu Yu
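For reference, here is a sketch of the quoted function. The +100 acts as a smoothing constant (the parameter name smooth is mine, not the repo's): it keeps the loss defined when both masks are empty and softens the score for very small overlaps.

    import keras.backend as K

    def dice_coef(y_true, y_pred, smooth=100.):
        # Flatten both masks and measure their soft overlap.
        y_true_f = K.flatten(y_true)
        y_pred_f = K.flatten(y_pred)
        intersection = K.sum(y_true_f * y_pred_f)
        # Without 'smooth' this is the plain Dice coefficient; with it,
        # an empty prediction on an empty ground truth scores 1 instead of 0/0.
        return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)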

ValueError: need at least one array to concatenate

I am getting this error when I run step1b_preprocess_make_train_cubes.py. The error is thrown for some luna16_manual_labels files; I don't understand what is happening with those specific CSV files.
I have seen a few .png images in the luna16_train_cubes_manual folder.

Error output:
Computer: DESKTOP-AVK4MS4
1.3.6.1.4.1.14519.5.2.1.6279.6001.128881800399702510818644205032
0 1.3.6.1.4.1.14519.5.2.1.6279.6001.128881800399702510818644205032 2
1.3.6.1.4.1.14519.5.2.1.6279.6001.160216916075817913953530562493
1 1.3.6.1.4.1.14519.5.2.1.6279.6001.160216916075817913953530562493 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.161002239822118346732951898613
1.3.6.1.4.1.14519.5.2.1.6279.6001.167919147233131417984739058859
3 1.3.6.1.4.1.14519.5.2.1.6279.6001.167919147233131417984739058859 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.170825539570536865106681134236
4 1.3.6.1.4.1.14519.5.2.1.6279.6001.170825539570536865106681134236 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.172845185165807139298420209778
5 1.3.6.1.4.1.14519.5.2.1.6279.6001.172845185165807139298420209778 3
1.3.6.1.4.1.14519.5.2.1.6279.6001.173931884906244951746140865701
6 1.3.6.1.4.1.14519.5.2.1.6279.6001.173931884906244951746140865701 2
1.3.6.1.4.1.14519.5.2.1.6279.6001.227968442353440630355230778531
7 1.3.6.1.4.1.14519.5.2.1.6279.6001.227968442353440630355230778531 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.230491296081537726468075344411
8 1.3.6.1.4.1.14519.5.2.1.6279.6001.230491296081537726468075344411 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.241717018262666382493757419144
9 1.3.6.1.4.1.14519.5.2.1.6279.6001.241717018262666382493757419144 1
1.3.6.1.4.1.14519.5.2.1.6279.6001.246225645401227472829175288633
Traceback (most recent call last):
File "step1b_preprocess_make_train_cubes.py", line 271, in
make_pos_annotation_images_manual()
File "step1b_preprocess_make_train_cubes.py", line 139, in make_pos_annotation_images_manual
images = helpers.load_patient_images(patient_id, settings.LUNA16_EXTRACTED_IMAGE_DIR, "*" + CUBE_IMGTYPE_SRC + ".png")
File "C:\Users\Sangryal\Downloads\sathya\helpers.py", line 78, in load_patient_images
res = numpy.vstack(images)
File "C:\Users\Sangryal\Anaconda3\lib\site-packages\numpy\core\shape_base.py", line 234, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate
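numpy.vstack raises exactly this error when it is handed an empty list, so load_patient_images most likely found no PNGs matching the wildcard for that patient, i.e. step1 never extracted images for it. A defensive re-sketch of the helper, assuming the directory layout implied by the traceback; the guard and the exact default wildcard are my additions:

    import glob
    import cv2
    import numpy

    def load_patient_images(patient_id, base_dir, wildcard="*_i.png"):
        paths = sorted(glob.glob(base_dir + patient_id + "/" + wildcard))
        # An empty glob means the patient was never extracted in step1;
        # failing here is clearer than numpy.vstack's ValueError.
        if not paths:
            raise ValueError("No images for patient %s in %s" % (patient_id, base_dir))
        images = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in paths]
        images = [im.reshape((1,) + im.shape) for im in images]
        return numpy.vstack(images)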

where are the data being stored??

Hi Julian,

Congratulations on such great work. I just have a few questions about the directories where you store the data. In settings.py, I see you refer to the following locations:
BASE_DIR_SSD
BASE_DIR
EXTRA_DATA_DIR
NDSB3_RAW_SRC_DIR
LUNA16_RAW_SRC_DIR

I am somewhat confused about which folder contains what: where am I supposed to store the NDSB data, and where the LUNA16 dataset?

Thank you so much.

Training nodule detector is slow. 16 hours for an epoch

Hi,

I'm running step2_train_nodule_detector.py on a Linux machine with a Titan X GPU.
It's taking close to 16 hours to complete a single epoch, whereas the README.md says the total time for 12 epochs is 8 hours. I'm using an anaconda2 Python environment.

Can you please help me with this ? What am I missing ?

regarding the error in step2_train_nodule_detector.py

I ran the script step2_train_nodule_detector.py, and for model 2 on LUNA16 annotations + NDSB pos annotations I got the following error:

File "", line 2, in
train(train_full_set=True, load_weights_path=None, ndsb3_holdout=0, manual_labels=True, model_name="luna_posnegndsb_v1", fold_count=2)

File "", line 12, in train
train_files, holdout_files = get_train_holdout_files(train_percentage=80, ndsb3_holdout=ndsb3_holdout, manual_labels=manual_labels, full_luna_set=train_full_set, fold_count=fold_count)

File "", line 113, in get_train_holdout_files
pos_sample_path = pos_samples[pos_idx]

IndexError: list index out of range
Thanks in advance!
Please reply as soon as possible.

Where does the data in resource.rar come from?

Hi, julian,

Your work is great. Thanks for sharing.

I downloaded the resources.rar and it contains several folders of different data. As far as I know, the data in the folder 'luna16_annotations' comes from LUNA16 and LIDC-IDRI. What about the other folders?

Thanks
Liu Peng

About your code in step1_preprocess_luna16.py

Hi Julian,

This script generates many CSV files. Could you please tell me exactly what each of these functions generates?
1. process_lidc_annotations(only_patient=None, agreement_threshold=0)
2. process_pos_annotations_patient2()
3. process_excluded_annotations_patients(only_patient=None)
4. process_luna_candidates_patients(only_patient_id=None)
5. process_auto_candidates_patients()

Thank You.

Accuracy calculation

In your submission, only the probability of having cancer is calculated. How would you calculate the accuracy of your submission?
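For context, the Data Science Bowl 2017 leaderboard was scored with log loss on the predicted cancer probabilities rather than accuracy. A minimal numpy sketch of that metric (the helper name is mine):

    import numpy

    def log_loss(y_true, y_pred, eps=1e-15):
        # Clip so log(0) never occurs, then average the negative log-likelihood.
        y_pred = numpy.clip(numpy.asarray(y_pred, dtype=float), eps, 1 - eps)
        y_true = numpy.asarray(y_true, dtype=float)
        return -numpy.mean(y_true * numpy.log(y_pred) + (1 - y_true) * numpy.log(1 - y_pred))

    # Example: two patients with predicted cancer probabilities 0.9 and 0.2,
    # true labels 1 and 0 -> log loss of about 0.164.
    print(log_loss([1, 0], [0.9, 0.2]))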

extracting features from your trained network

Hi Julian,

I am more interested in your learned features than in predicting the final outcome through the network. I was wondering if it would be possible to extract features from the intermediate layers of your 3D network? Or do you know of any trained networks (preferably trained on 3D images) that can easily be used for feature extraction?

Thanks for your help in advance,
laleh
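In Keras this is generally possible without retraining: build a second Model that shares the trained weights but outputs an intermediate layer. A sketch, assuming the nodule net is already loaded as model; the layer name "last_64" is a placeholder (list model.layers to find the real names), and note that the get_net signature quoted in another issue below already takes a features flag, which hints at a built-in route for this:

    from keras.models import Model

    # Pick an intermediate layer of the trained network by name.
    feature_layer = model.get_layer("last_64")  # placeholder name

    # Keras 2.x keywords; Keras 1.x (as used by this repo) spells them input=/output=.
    feature_extractor = Model(inputs=model.input, outputs=feature_layer.output)

    # 'cube' must match the net's input shape, e.g. (1, 32, 32, 32, 1).
    features = feature_extractor.predict(cube)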

Question about skipping the cube in step3_predict_nodules.py?

Hi, Julian,

In the function predict_cubes() of step3_predict_nodules.py, you predict on all the 32x32x32 cubes of each patient. You have a cube-skipping condition at line 266:

 if cube_mask.sum() < 2000:
         skipped_count += 1

Could you explain why?

Thanks
tjliupeng

batchnorm order in CNN

Thank you so much for your sharing.
I have a question about the batchnorm layer.
In step2_train_mass_segmenter.py, starting at line 314, is the architecture of the 2D U-Net. In each block the layers go: input -> batchnorm -> conv1 -> relu -> conv2 -> relu -> pooling -> output.
In other papers, batch norm layers are traditionally placed between the conv layer and the ReLU, in order to avoid gradient explosion. So I wonder why you put batchnorm before the conv. Do you have a theory to support this order, or is it a new trick in CNNs?
Of course I know there is no single "correct" position for every layer, and your work performs quite well. Congratulations on the challenge!
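For concreteness, the two orderings being compared, sketched in the Keras 1-style API this repo uses (layer sizes are illustrative, not taken from the repo; adjust the input shape to your image_dim_ordering):

    from keras.layers import Input, Activation, Convolution2D
    from keras.layers.normalization import BatchNormalization

    x = Input(shape=(256, 256, 1))  # channels-last ordering assumed

    # Ordering used in the repo's U-Net blocks: batchnorm first, then conv + relu.
    a = BatchNormalization()(x)
    a = Convolution2D(32, 3, 3, border_mode="same", activation="relu")(a)

    # Conventional ordering from the batch norm paper: conv, then batchnorm, then relu.
    b = Convolution2D(32, 3, 3, border_mode="same")(x)
    b = BatchNormalization()(b)
    b = Activation("relu")(b)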

How to do nodule detect job with step3_predict_nodules.py?

Hello Juliandewit.
I have some DICOM files of lung cancer patients, not the Kaggle ones. I ran step1_preprocess_ndsb.py and step3_predict_nodules.py with the DICOM files placed in the NDSB3_RAW_SRC_DIR directory from settings.py, extracted resources.rar into the ./resources/ directory, and extracted train_models.rar into the ./models/ directory.

But I got 9 empty folders in the NDSB3_NODULE_DETECTION_DIR directory. Following what you said in the README.md, I thought I would get what I wanted, but now I am a little confused and am reading the code.

Could you please tell me how to use the trained models to do the nodule detect job on other dicom files?
Thank you.

all nodules at (0,0,0) shown in the result

I ran step3.py separately.

I downloaded the code and tried to run it on my own computer (also with the trained model and LUNA16 training data as the test set). However, something is wrong in the result: all nodules are detected at (0,0,0), and the "diameter_mm" values are all negative numbers.

I tried to debug step3.py and found something: at lines 60-62, "center_x", "center_y" and "center_z" equal 0.0 no matter what the input image is.

How can I fix this problem? Waiting for your reply...

I have some problems with the csv.

Hi, julian,
Thank you for sharing your work.
I am reading and running your code, and I have some questions about the CSV files in the resources folder. You have answered similar questions before and said that the luna16_falsepos_labels folder was generated automatically. Can you tell me how?
Thank you very much.

diameter fields in resources/ndsb3_manual_labels

Hi Julian,

Take an example from resources/ndsb3_manual_labels:
id,x,y,z,d,mal,dmm
0,0.7380484,0.4426079,0.4596774,0.08382452,1,0
0,0.6142763,0.630854,0.3790323,0.0589391,1,0
0,0.7439424,0.6382002,0.3790323,0.07203662,1,0
0,0.6660118,0.630854,0.3360215,0.09299278,1,0

There are two fields, d and dmm. Are they (predicted) diameters, or are they malignancy scores?

AttributeError: 'NoneType' object has no attribute 'reshape' in helpers.py

I am facing an error when I run these functions in step1_preprocess_luna16.py:

    if True:
        process_pos_annotations_patient2()
        process_excluded_annotations_patients(only_patient=None)

error:

File "C:\Users\Sangryal\Downloads\sathya\kaggle_ndsb2017\kaggle_ndsb2017-master\helpers.py", line 77, in load_patient_images
images = [im.reshape((1,) + im.shape) for im in images]
File "C:\Users\Sangryal\Downloads\sathya\kaggle_ndsb2017\kaggle_ndsb2017-master\helpers.py", line 77, in
images = [im.reshape((1,) + im.shape) for im in images]
AttributeError: 'NoneType' object has no attribute 'reshape'
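The reshape fails because cv2.imread returns None, rather than raising, when a file is missing or unreadable, so one of the extracted PNGs could not be opened. A defensive sketch around the failing list comprehension (paths stands in for the list of PNG paths being loaded; the check is my addition):

    import cv2

    images = [cv2.imread(p, cv2.IMREAD_GRAYSCALE) for p in paths]
    # cv2.imread silently returns None for a missing or corrupt file;
    # report which path failed instead of crashing later on .reshape.
    bad_paths = [p for p, im in zip(paths, images) if im is None]
    if bad_paths:
        raise IOError("Could not read: %s" % ", ".join(bad_paths))
    images = [im.reshape((1,) + im.shape) for im in images]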

UnboundLocalError: local variable 'extension' referenced before assignment in function 'combine_nodule_predictions'

Traceback (most recent call last):
File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.2\helpers\pydev\pydevd.py", line 1596, in
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.2\helpers\pydev\pydevd.py", line 974, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files (x86)\JetBrains\PyCharm 2016.3.2\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:/Python/code/Kaggle/Bowl2017/place2-2/step4_train_submissions.py", line 399, in
combine_nodule_predictions(None, train_set=False, nodule_th=0.7, extensions=[model_variant])
File "D:/Python/code/Kaggle/Bowl2017/place2-2/step4_train_submissions.py", line 130, in combine_nodule_predictions
target_path = settings.BASE_DIR + "xgboost_trainsets/" "train" + extension + ".csv" if train_set else settings.BASE_DIR + "xgboost_trainsets/" + "submission" + extension + ".csv"
UnboundLocalError: local variable 'extension' referenced before assignment
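The pattern in the traceback suggests that extension is only assigned inside a loop over the extensions argument, so if that loop body never runs (for example, an empty extensions list) the later string concatenation touches an unbound name. A minimal reproduction with hypothetical names:

    def combine(extensions, train_set=False):
        for extension in extensions:
            pass  # per-extension work would happen here
        # If 'extensions' was empty, 'extension' was never bound,
        # and the next line raises UnboundLocalError.
        return ("train" + extension + ".csv") if train_set else ("submission" + extension + ".csv")

    combine([])  # UnboundLocalError: local variable 'extension' referenced before assignment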

run code on multiple GPUs

Hi, Julian,
I just start to run your step3_predict_nodules.py using your trained model.
I found it only ran on one GPU even though I assigned 2 GPUs to it with
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
I also commented out config.gpu_options.per_process_gpu_memory_fraction = 0.5
because I am allowed to use both GPUs fully, but the speed was still slow.

Could you let me know how to run the code on multiple GPUs? Thanks.

Extracting NDSB raw data (Stage 1 and Stage 2)

Hi Julian,
In step1_preprocess_ndsb.py, should I extract both the Stage 1 and Stage 2 NDSB data into the same directory, or into two separate ones?
i.e.

/data/ndsb3_extracted_images/<patient_dirs>

or

/data/ndsb3_extracted_images/stage1/<patient_dirs>
/data/ndsb3_extracted_images/stage2/<patient_dirs>

?

Where does the data in resource.rar come from?

Hi, julian,

Your work is great. Thanks for sharing.

I downloaded the resources.rar and it contains several folders of different data. As far as I know, the data in the folder 'luna16_annotations' comes from LUNA16 and LIDC-IDRI, and the data in the folders 'luna16_manual_labels' and 'ndsb3_manual_labels' was generated manually. What about the other folders and files, such as annotations_excluded.csv and candidates_V2.csv in 'luna16_annotations', the folder 'luna16_falsepos_labels', and the folder 'segmenter_traindata'?

Thanks
Cao jiehui

why do you select "sigmoid" and None rather than "ReLU" or "LeakyReLU" in the last layer of the CNN?

Hi juliandewit,
Why do you select "sigmoid" and None rather than "ReLU" or "LeakyReLU" as the activation in the last layers of the CNN? Are ReLU or LeakyReLU better than sigmoid? Thanks.

The code is as follows:

out_class = Convolution3D(1, 1, 1, 1, activation="sigmoid", name="out_class_last")(last64)
out_class = Flatten(name="out_class")(out_class)

out_malignancy = Convolution3D(1, 1, 1, 1, activation=None, name="out_malignancy_last")(last64)
out_malignancy = Flatten(name="out_malignancy")(out_malignancy)

Best regards
Gu Yu

creating/locating masses_predictions.csv

In step4_train_submissions.py you load a CSV file as follows:

mass_df = pandas.read_csv(settings.BASE_DIR + "masses_predictions.csv")

Is this file created somewhere in the previous steps, or is it downloaded beforehand? I can't seem to find where it is created or located.

Thanks,
Teaghan

Training time for step2_train_nodule_detector.py

Dear julian,

I ran your code on my GPU (Tesla K10) machine, but it seems very time-consuming: I need over 30 hours to finish one epoch. How long does one epoch take for you? Thanks.

who generated luna16_train_cubes_manual?

Hi Julian, who created the CSV files in resources/luna16_train_cubes_manual? What was the reason for creating this data? Sorry, I have read your blog, but I still can't understand it.

Luna Data

Hi Julian,
Can I just change step3 to predict on the LUNA data and make a submission for LUNA16?

network code problem

Hi Julian, I don't know what '-> Model' means in

def get_net(input_shape=(CUBE_SIZE, CUBE_SIZE, CUBE_SIZE, 1), load_weight_path=None, features=False, mal=False) -> Model:

Also, when I run this stage of the code, some problems occur. Is this related to the Python version? Mine is 2.7.

Would you please tell me the answer?
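'-> Model' is a Python 3 function annotation: it documents that the function is meant to return a Keras Model and has no effect at runtime. Under Python 2.7 the arrow is a SyntaxError, which would explain the problems. A sketch of the Python 2-compatible form (just drop the annotation; CUBE_SIZE is defined in the repo, 32 here is illustrative):

    CUBE_SIZE = 32

    # Python 3 only:
    #   def get_net(...) -> Model:
    # Python 2.7-compatible, behaviour unchanged:
    def get_net(input_shape=(CUBE_SIZE, CUBE_SIZE, CUBE_SIZE, 1), load_weight_path=None,
                features=False, mal=False):
        pass  # network construction goes here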

Issue on the data weight while training the nodule detector net

Hello, Julian,

Thank you for the code and the great work.

I have a question about the data weighting of the train set.

I notice there are several data sources in the train set, such as labels from LIDC, V2 from LUNA16, LUNA16 false positives, NDSB, and non-lung tissue edges. Among them, LIDC and the LUNA16 nodules should be the positive samples; the others are negative samples (their labels are 0,0).

But the negative samples far outnumber the positive ones, so the set is unbalanced. What ratios do you use when combining the train set? I think 1 (positive) : 1 (false positive) : 2 (non-lung tissue or edge) might make sense, because too many negative samples would dilute the accuracy; see the sketch after this question.

Would you please give me some suggestions on this issue?
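A minimal sketch of assembling a train set at the proposed 1 : 1 : 2 ratio. The list names are hypothetical, not the repo's; the repo's actual sampling appears to live in get_train_holdout_files.

    import random

    def build_train_set(pos_samples, falsepos_samples, edge_samples, seed=42):
        # 1 (positive) : 1 (false positive) : 2 (non-lung tissue / edge).
        rnd = random.Random(seed)
        n = len(pos_samples)
        negatives = rnd.sample(falsepos_samples, n) + rnd.sample(edge_samples, 2 * n)
        train = [(s, 1) for s in pos_samples] + [(s, 0) for s in negatives]
        rnd.shuffle(train)
        return train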

question about implementation

Hey Julian, this is such impressive work. I would like to ask about the hardware you used for training and storing files. Did you use the AWS cloud (EC2 + S3), or do you have local resources?

Count back to absolute slice positions after segmenting nodules

Hi Julian
the code works like a charm for me. Thanks!
However, I wonder how to convert back to the absolute slice position of a nodule.
Currently one receives coordinates such as 0.1834, 0.5272, 0.71179 in the CSV file for x, y and z.
How can I calculate which slice the respective nodule is in (Z axis), and its position X voxels from the left and Y voxels from the top?
I tried multiplying the values by the image shape, e.g. (261, 512, 512), but that gives strange-looking results, e.g. 47.8674, 269.9264, 364.43648.
Do you know a way to get the correct absolute values?
Thanks a lot
Willi
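One likely cause of the strange numbers: the CSV stores the fractions in (x, y, z) order, while the numpy volume's shape is (z, y, x), so an element-wise multiply pairs the x fraction with the z dimension (0.1834 * 261 is exactly the 47.8674 above). A sketch of the axis-aware conversion, under that assumption:

    def fraction_to_voxel(coord_xyz, img_shape_zyx):
        # coord_xyz: fractional (x, y, z) from the CSV.
        # img_shape_zyx: numpy shape of the volume, (slices, height, width).
        z_dim, y_dim, x_dim = img_shape_zyx
        x_frac, y_frac, z_frac = coord_xyz
        return (int(round(x_frac * x_dim)),
                int(round(y_frac * y_dim)),
                int(round(z_frac * z_dim)))

    # Example from the question: shape (261, 512, 512), coords (0.1834, 0.5272, 0.71179)
    # -> x ~ 94, y ~ 270, slice z ~ 186 (instead of 47.9 / 269.9 / 364.4 from mixed axes).
    print(fraction_to_voxel((0.1834, 0.5272, 0.71179), (261, 512, 512)))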
