

dyollb commented on June 2, 2024

The full log looks more like this:

Audit for 16 images
-------------------
Total memory GiB:  1.776
Number of classes: 17

2D:
Real space span:   263.500
Sample dim:        304.000

3D:
Sample dim:        64
Real space span:   263.500
Box span:          54.400
--------------------------------------------------------------------------------
>>> Logged by: 'set_value' in 'hparams.py'
Setting value '263.50000739097595' (type <class 'numpy.float64'>) in subdir 'fit' with name 'real_space_span'
Setting value '304' (type <class 'numpy.int64'>) in subdir 'build' with name 'dim'
Setting value '1' (type <class 'int'>) in subdir 'build' with name 'n_channels'
Entry of name 'n_classes' already set in subdir 'build' with value '17'. Skipping (overwrite=False).

>>> Logged by: 'save_current' in 'hparams.py'
Saving current YAML configuration to file:
 /home/jovyan/work/results/drcmr_16/train_hparams.yaml
--------------------------------------------------------------------------------
>>> Logged by: '_base_loader_func' in 'data_preparation_funcs.py'
Preparing dataset ImagePairLoader(id=train, images=10, data_dir=/home/jovyan/work/data_dir/train)
X10679
--- loaded:     False
--- shape:      [310 310 310   1]
--- bg class    0
--- bg value    ['1pct']
--- scaler      RobustScaler
--- real shape: [263.5 263.5 263.5]
--- pixdim:     [0.85 0.85 0.85]
X14109
--- loaded:     False
--- shape:      [310 310 310   1]
--- bg class    0
--- bg value    ['1pct']
--- scaler      RobustScaler
--- real shape: [263.5 263.5 263.5]
--- pixdim:     [0.85 0.85 0.85]
...
>>> Logged by: '__init__' in 'eager_queue.py'
'Eager' queue created:
  Dataset:      ImagePairLoader(id=train, images=10, data_dir=/home/jovyan/work/data_dir/train)
Preloading all 10 images now... (eager)
'Eager' queue created:
  Dataset:      ImagePairLoader(id=val, images=6, data_dir=/home/jovyan/work/data_dir/val)
Preloading all 6 images now... (eager)
--------------------------------------------------------------------------------
>>> Logged by: 'sample_random_views_with_angle_restriction' in 'sample_grid.py'
Generating 6 random views...
[OBS] Weighting random views by median res: [0.85 0.85 0.85]
--------------------------------------------------------------------------------
>>> Logged by: 'load_or_create_views' in 'data_preparation_funcs.py'
View SD:     0.1
--------------------------------------------------------------------------------
>>> Logged by: 'prepare_for_multi_view_unet' in 'data_preparation_funcs.py'
Views:       N=6
             [ 0.89081436 -0.42408026  0.16311258]
             [ 0.43957442 -0.12374472  0.88964126]
             [-0.54803847  0.1605022   0.82090979]
             [ 0.03570166 -0.69000704  0.72292163]
             [-0.12735015  0.93440077  0.33268173]
             [-0.95791583 -0.14147684  0.24976303]

--------------------------------------------------------------------------------
>>> Logged by: 'get_sequencers' in 'data_preparation_funcs.py'
Preparing sequence objects...
--------------------------------------------------------------------------------
>>> Logged by: 'get_sequence' in 'utils.py'
Using on-the-fly augmenters:
Elastic2D(alpha=[0, 450], sigma=[20, 30], apply_prob=0.333)
--------------------------------------------------------------------------------
>>> Logged by: 'log' in 'isotrophic_live_view_sequence_2d.py'

Is validation:               False
Using real space span:       263.50000739097595
Using sample dim:            304
Using real space sample res: 0.8667763401018945
N fg slices:                 8
Batch size:                  16
Force all FG:                False
Noise SD:                    0.1
Augmenters:                  [Elastic2D(alpha=[0, 450], sigma=[20, 30], apply_prob=0.333)]

Is validation:               True
Using real space span:       263.50000739097595
Using sample dim:            304
Using real space sample res: 0.8667763401018945
N fg slices:                 8
Batch size:                  16
Force all FG:                False
Noise SD:                    0.0
Augmenters:                  None
Waiting for free GPU.
Found free GPU: 0
2021-08-11 14:20:16.064686: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2021-08-11 14:20:17.031107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:17:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.

...

>>> Logged by: 'init_model' in 'model_init.py'
Creating new model of type 'UNet'
--------------------------------------------------------------------------------
>>> Logged by: 'log' in 'unet.py'
UNet Model Summary
------------------
Image rows:        304
Image cols:        304
Image channels:    1
N classes:         17
CF factor:         2.000
Depth:             4
l2 reg:            False
Padding:           same
Conv activation:   relu
Out activation:    softmax
Receptive field:   [155 155]
N params:          62062642
Output:            Tensor("flatten_output/Reshape:0", shape=(None, 92416, 17), dtype=float32)
Crop:              None
--------------------------------------------------------------------------------
>>> Logged by: 'set_bias_weights' in 'utils.py'
OBS: Estimating class counts from 10 images
/opt/conda/lib/python3.7/site-packages/mpunet/utils/utils.py:237: RuntimeWarning: divide by zero encountered in log
  bias = np.log(freq * np.sum(np.exp(freq)))
/opt/conda/lib/python3.7/site-packages/mpunet/utils/utils.py:238: RuntimeWarning: invalid value encountered in true_divide
  bias /= np.linalg.norm(bias)
Setting bias weights on output layer to:
[ 0. -0. -0. -0. -0. -0. -0. -0. -0. -0.  0. -0. nan -0. -0. -0. -0.]
--------------------------------------------------------------------------------
>>> Logged by: 'compile_model' in 'trainer.py'
Optimizer:   <tensorflow.python.keras.optimizer_v2.adam.Adam object at 0x7f138c320590>
Loss funcs:  [<tensorflow.python.keras.losses.SparseCategoricalCrossentropy object at 0x7f138c326950>]
Metrics:     <function init_metrics at 0x7f138c172ef0>
--------------------------------------------------------------------------------
>>> Logged by: 'save_images' in 'plotting.py'
Saving 64 sample images in '<project_dir>/images' folder
--------------------------------------------------------------------------------
>>> Logged by: '_fit' in 'trainer.py'
Using 157 steps per train epoch (total batches=1000000000000)
Using 219 steps per val epoch (total batches=1000000000000)
--------------------------------------------------------------------------------
>>> Logged by: 'init_callback_objects' in 'funcs.py'
[1] Using callback: Validation(params=?)
[2] Using callback: MeanReduceLogArrays(params=?)
[3] Using callback: ReduceLROnPlateau(patience=2, factor=0.9, verbose=1, monitor=val_dice, mode=max)
[4] Using callback: TensorBoard(log_dir=./tensorboard, profile_batch=0)
[5] Using callback: ModelCheckPointClean(filepath=./model/@epoch_{epoch:02d}_val_dice_{val_dice:.5f}.h5, monitor=val_dice, save_best_only=True, save_weights_only=True, verbose=1, mode=max)
[6] Using callback: EarlyStopping(monitor=val_dice, min_delta=0, patience=15, verbose=1, mode=max)
[7] Using callback: TrainTimer(verbose=True, logger=Logger(base_path=/home/jovyan/work/results/drcmr_16, print_to_screen=True, overwrite_existing=False, append_existing=False))
[8] Using callback: CSVLogger(filename=logs/training.csv, separator=,, append=True)
[9] Using callback: FGBatchBalancer(params=?)
[10] Using callback: SavePredictionImages(params=?)
[11] Using callback: LearningCurve(params=?)
[12] Using callback: DividerLine(params=?)
Epoch 1/500
  1/157 [..............................] - ETA: 0s - loss: nan - sparse_categorical_accuracy: 0.7300
...


perslev commented on June 2, 2024

Hi,

Thanks for reporting this. Based on the log, the issue seems to be caused by a class that does not occur in any of the images sampled to estimate the class frequencies. This ultimately leads to a NaN bias weight being set on the output layer during initialisation. Specifically, I am referring to the following section of the log:

--------------------------------------------------------------------------------
>>> Logged by: 'set_bias_weights' in 'utils.py'
OBS: Estimating class counts from 10 images
/opt/conda/lib/python3.7/site-packages/mpunet/utils/utils.py:237: RuntimeWarning: divide by zero encountered in log
  bias = np.log(freq * np.sum(np.exp(freq)))
/opt/conda/lib/python3.7/site-packages/mpunet/utils/utils.py:238: RuntimeWarning: invalid value encountered in true_divide
  bias /= np.linalg.norm(bias)
Setting bias weights on output layer to:
[ 0. -0. -0. -0. -0. -0. -0. -0. -0. -0.  0. -0. nan -0. -0. -0. -0.]
--------------------------------------------------------------------------------
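The two lines quoted in the warnings are enough to reproduce the NaN. The minimal NumPy sketch below applies them to made-up frequencies for 17 classes. It assumes freq is a 1D array of per-class frequencies summing to 1; how mpunet actually builds freq is not shown in the log. Class 12 is given frequency 0, so np.log returns -inf for it, the L2 norm of the vector becomes inf, and the division turns every finite entry into 0. or -0. and the -inf entry into nan. This is the same pattern as the logged bias vector.

```python
import numpy as np


def bias_from_freq(freq):
    # Same two steps as the lines quoted from utils.py:237-238.
    bias = np.log(freq * np.sum(np.exp(freq)))
    bias /= np.linalg.norm(bias)
    return bias


# Made-up counts for 17 classes: one dominant class, class 12 never occurs.
counts = np.ones(17)
counts[0] = 100.0
counts[12] = 0.0
freq = counts / counts.sum()

# Silence the same two RuntimeWarnings seen in the log (log(0) and inf/inf).
with np.errstate(divide="ignore", invalid="ignore"):
    bias = bias_from_freq(freq)

# Positive log terms become 0., negative ones -0., and class 12 becomes nan.
print(bias)
```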

I will fix this issue in 1 or 2 weeks when I am back from other activities. Until then, could you please set the biased_output_layer parameter to False in the train_hparams.yaml file and re-run training to verify that this is indeed the cause? See:

https://github.com/perslev/MultiPlanarUNet/blob/master/mpunet/bin/defaults/MultiPlanar/train_hparams.yaml#L86

Cheers,
Mathias
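Besides disabling the biased output layer, one possible guard is to add a small pseudo-count to every class before computing frequencies, so that no class ends up with log(0). This is a sketch under assumptions, not the fix that went into mpunet: the helper name safe_bias_from_counts and its eps parameter are hypothetical, and only the two formula lines come from the quoted utils.py:237-238.

```python
import numpy as np


def safe_bias_from_counts(counts, eps=1.0):
    # Hypothetical helper: add a pseudo-count eps to every class so that a
    # class with zero voxels gets a small positive frequency instead of 0.
    counts = np.asarray(counts, dtype=np.float64) + eps
    freq = counts / counts.sum()
    # Same formula as utils.py:237-238, now free of log(0).
    bias = np.log(freq * np.sum(np.exp(freq)))
    norm = np.linalg.norm(bias)
    # Avoid dividing by zero in the degenerate all-zero case.
    return bias / norm if norm > 0 else bias


# Made-up voxel counts: class 12 never occurs.
counts = np.full(17, 1000.0)
counts[0] = 500000.0
counts[12] = 0.0
bias = safe_bias_from_counts(counts)
print(bias)
```

With the pseudo-count, the absent class simply receives the lowest bias instead of nan. This only hides the symptom, though: if a label is genuinely unused in the data, fixing the labels themselves is the cleaner solution.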


dyollb commented on June 2, 2024

At some point I realized that my dataset has no label 12, i.e. label 12 is never used.
After fixing this in the data, the model is now learning.

So the reason for the NaN (at position 12 of the bias weights) was this missing label.
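One way to catch this before training is to count how often each label occurs. The sketch below assumes the label maps are already loaded as integer NumPy arrays; loading the label files themselves is left out. find_missing_labels is a hypothetical helper, not part of mpunet. The log reports estimating class counts from 10 images, which matches the size of the training set, so it is presumably the training labels that need to contain every class.

```python
import numpy as np


def find_missing_labels(label_maps, n_classes):
    # Hypothetical helper: return class indices in [0, n_classes) that never
    # occur in any of the given integer label arrays.
    total = np.zeros(n_classes, dtype=np.int64)
    for labels in label_maps:
        flat = np.asarray(labels).ravel().astype(np.int64)
        if flat.size == 0:
            continue
        if flat.min() < 0 or flat.max() >= n_classes:
            raise ValueError(f"found labels outside [0, {n_classes - 1}]")
        # Per-class voxel counts for this volume, padded to n_classes entries.
        total += np.bincount(flat, minlength=n_classes)
    return [int(c) for c in np.flatnonzero(total == 0)]


# Synthetic example: 17 classes, but label 12 is never drawn.
rng = np.random.default_rng(0)
present = [c for c in range(17) if c != 12]
maps = [rng.choice(present, size=(8, 8, 8)) for _ in range(3)]
print(find_missing_labels(maps, n_classes=17))  # prints [12]
```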

