Hi! I found that the predicted COM data in com3d0.mat are all NaN. The parameters were se

COM prediction values are NaN (dannce, closed)

yuan0821 commented on August 17, 2024

COM prediction values are NaN


Comments (3)

yuan0821 commented on August 17, 2024

Below is my com-predict log.

(dannce113) F:\testdannce120\dannce\demo\new919>com-predict .\com_config_919.yaml
2022-10-15 18:14:07.426269: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
downfac not found in io.yaml file, falling back to main config
extension not found in io.yaml file, falling back to main config
io_config not found in io.yaml file, falling back to main config
crop_height not found in io.yaml file, falling back to main config
crop_width not found in io.yaml file, falling back to main config
n_channels_in not found in io.yaml file, falling back to main config
camnames not found in io.yaml file, falling back to main config
n_views not found in io.yaml file, falling back to main config
n_channels_out not found in io.yaml file, falling back to main config
batch_size not found in io.yaml file, falling back to main config
sigma not found in io.yaml file, falling back to main config
epochs not found in io.yaml file, falling back to main config
verbose not found in io.yaml file, falling back to main config
loss not found in io.yaml file, falling back to main config
lr not found in io.yaml file, falling back to main config
net not found in io.yaml file, falling back to main config
vid_dir_flag not found in io.yaml file, falling back to main config
metric not found in io.yaml file, falling back to main config
num_validation_per_exp not found in io.yaml file, falling back to main config
debug not found in io.yaml file, falling back to main config
max_num_samples not found in io.yaml file, falling back to main config
train_mode not found in io.yaml file, falling back to main config
com_finetune_weights not found in io.yaml file, falling back to main config
com_train_dir set to: .\COM\train_results\
com_predict_dir set to: .\COM\predict_results\
dannce_train_dir set to: .\DANNCE\train_results\AVG\
dannce_predict_dir set to: .\DANNCE\predict_results\
exp set to: [{'label3d_file': './20221015_173137_Label3D_dannce.mat'}]
downfac set to: 4
extension set to: .avi
io_config set to: io.yaml
crop_height set to: [0, 1152]
crop_width set to: [0, 1920]
n_channels_in set to: 1
camnames set to: ['Camera1', 'Camera2', 'Camera3']
n_views set to: 3
n_channels_out set to: 1
batch_size set to: 2
sigma set to: 18
epochs set to: 10
verbose set to: 1
loss set to: mask_nan_keep_loss
lr set to: 5e-5
net set to: unet2d_fullbn
vid_dir_flag set to: False
metric set to: mse
num_validation_per_exp set to: 10
debug set to: False
max_num_samples set to: 100
train_mode set to: finetune
com_finetune_weights set to: ..\markerless_mouse_1\COM\weights\
base_config set to: .\com_config_919.yaml
viddir set to: videos
gpu_id set to: 0
immode set to: vid
mono set to: False
mirror set to: False
start_batch set to: 0
start_sample set to: 0
dsmode set to: nn
com_predict_weights set to: None
num_train_per_exp set to: None
chunks set to: None
lockfirst set to: None
load_valid set to: None
augment_hue set to: False
augment_brightness set to: False
augment_hue_val set to: 0.05
augment_bright_val set to: 0.05
augment_rotation_val set to: 5
drop_landmark set to: None
raw_im_h set to: None
raw_im_w set to: None
n_instances set to: 1
write_npy set to: None
use_npy set to: False
data_split_seed set to: None
valid_exp set to: None
com_debug set to: None
com_exp set to: None
augment_rotation set to: False
augment_shear set to: False
augment_zoom set to: False
augment_shift set to: False
augment_shear_val set to: 5
augment_zoom_val set to: 0.05
augment_shift_val set to: 0.05
Setting vid_dir_flag to True.
Setting extension to .avi.
Setting chunks to {'Camera1': array([0]), 'Camera2': array([0]), 'Camera3': array([0])}.
Setting n_channels_in to 3.
Setting raw_im_h to 2560.
Setting raw_im_w to 2560.
Using the following *dannce.mat files: .\20221015_173137_Label3D_dannce.mat
Using camnames: ['Camera1', 'Camera2', 'Camera3']
Initializing Network...
2022-10-15 18:14:10.325801: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2022-10-15 18:14:10.349421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2022-10-15 18:14:10.349690: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2022-10-15 18:14:10.352770: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2022-10-15 18:14:10.353266: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2022-10-15 18:14:10.354040: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2022-10-15 18:14:10.355596: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2022-10-15 18:14:10.355709: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2022-10-15 18:14:10.356146: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2022-10-15 18:14:10.356725: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2022-10-15 18:14:10.357199: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-10-15 18:14:10.357985: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-15 18:14:10.361644: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:41:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.8GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2022-10-15 18:14:10.361924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2022-10-15 18:14:10.754055: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-10-15 18:14:10.754230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2022-10-15 18:14:10.755581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2022-10-15 18:14:10.756272: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7433 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:41:00.0, compute capability: 8.6)
E:\anaconda\envs\dannce113\lib\site-packages\tensorflow\python\keras\optimizer_v2\optimizer_v2.py:375: UserWarning: The `lr` argument is deprecated, use `learning_rate` instead.
  "The `lr` argument is deprecated, use `learning_rate` instead.")
Loading weights from .\COM\train_results\weights.0-0.00000.hdf5
COMPLETE

Predicting on sample 0
Loading new video: videos\Camera1\0.avi for Camera1
Loading new video: videos\Camera2\0.avi for Camera2
Loading new video: videos\Camera3\0.avi for Camera3
2022-10-15 18:14:12.697986: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2022-10-15 18:14:12.947341: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2022-10-15 18:14:13.646958: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8302
2022-10-15 18:14:14.571302: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2022-10-15 18:14:14.571485: W tensorflow/stream_executor/gpu/asm_compiler.cc:56] Couldn't invoke ptxas.exe --version
2022-10-15 18:14:14.575616: E tensorflow/core/platform/windows/subprocess.cc:287] Call to CreateProcess failed. Error code: 2
2022-10-15 18:14:14.576118: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Internal: Failed to launch ptxas
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
2022-10-15 18:14:14.605683: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2022-10-15 18:14:14.606093: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
Predicting on sample 1
Predicting on sample 2
Predicting on sample 3
Predicting on sample 4
Predicting on sample 5
Predicting on sample 6
Predicting on sample 7
Predicting on sample 8
Predicting on sample 9
Predicting on sample 10
Predicting on sample 11
Predicting on sample 12
Predicting on sample 13
Predicting on sample 14
Predicting on sample 15
Predicting on sample 16
Predicting on sample 17
Predicting on sample 18
Predicting on sample 19
Predicting on sample 20
Predicting on sample 21
Predicting on sample 22
Predicting on sample 23
Predicting on sample 24
Predicting on sample 25
Predicting on sample 26
Predicting on sample 27
Predicting on sample 28
Predicting on sample 29
Predicting on sample 30
Predicting on sample 31
Predicting on sample 32
Predicting on sample 33
Predicting on sample 34
Predicting on sample 35
Predicting on sample 36
Predicting on sample 37
Predicting on sample 38
Predicting on sample 39
Predicting on sample 40
Predicting on sample 41
Predicting on sample 42
Predicting on sample 43
Predicting on sample 44
Predicting on sample 45
Predicting on sample 46
Predicting on sample 47
Predicting on sample 48
Predicting on sample 49
Predicting on sample 50
Predicting on sample 51
Predicting on sample 52
Predicting on sample 53
Predicting on sample 54
Predicting on sample 55
Predicting on sample 56
Predicting on sample 57
Predicting on sample 58
Predicting on sample 59
Predicting on sample 60
Predicting on sample 61
Predicting on sample 62
Predicting on sample 63
Predicting on sample 64
Predicting on sample 65
Predicting on sample 66
Predicting on sample 67
Predicting on sample 68
Predicting on sample 69
Predicting on sample 70
Predicting on sample 71
Predicting on sample 72
Predicting on sample 73
Predicting on sample 74
Predicting on sample 75
Predicting on sample 76
Predicting on sample 77
Predicting on sample 78
Predicting on sample 79
Predicting on sample 80
Predicting on sample 81
Predicting on sample 82
Predicting on sample 83
Predicting on sample 84
Predicting on sample 85
Predicting on sample 86
Predicting on sample 87
Predicting on sample 88
Predicting on sample 89
Predicting on sample 90
Predicting on sample 91
Predicting on sample 92
Predicting on sample 93
Predicting on sample 94
Predicting on sample 95
Predicting on sample 96
Predicting on sample 97
Predicting on sample 98
Predicting on sample 99
using median to get 3D COM
E:\anaconda\envs\dannce113\lib\site-packages\numpy\lib\nanfunctions.py:1114: RuntimeWarning: All-NaN slice encountered
overwrite_input=overwrite_input)
Saving 3D COM to .\COM\predict_results\com3d0.mat
done!
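The last lines of the log show where the NaNs surface: with three cameras, com-predict reports "using median to get 3D COM", and NumPy then warns "All-NaN slice encountered". The sketch below reproduces that warning with `np.nanmedian`, assuming (this is not confirmed from the dannce source here) that the 3D COM for each sample is a median over several per-camera-pair estimates. The array layout and all numbers are made up for illustration.

```python
import warnings

import numpy as np


def median_com(pair_coms: np.ndarray) -> np.ndarray:
    """Combine per-camera-pair 3D COM estimates into one estimate per sample.

    pair_coms has shape (n_pairs, 3, n_samples); this layout is a
    hypothetical choice for illustration, not dannce's internal format.
    NaN entries are ignored; a sample whose pairs are all NaN stays NaN.
    """
    return np.nanmedian(pair_coms, axis=0)


# Three camera pairs (1-2, 1-3, 2-3), xyz, two samples (made-up numbers).
pairs = np.array([
    [[1.0, np.nan], [2.0, np.nan], [3.0, np.nan]],
    [[1.2, np.nan], [2.2, np.nan], [3.2, np.nan]],
    [[np.nan, np.nan], [np.nan, np.nan], [np.nan, np.nan]],
])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure repeated warnings are recorded
    com3d = median_com(pairs)

messages = [str(w.message) for w in caught]
print(com3d[:, 0])  # sample 0: median of the two finite pairs
print(com3d[:, 1])  # sample 1: every pair is NaN, so the result is NaN
print(messages)
```

Because `nanmedian` only returns NaN when every input in a slice is NaN, an all-NaN com3d0.mat suggests the per-pair estimates were already NaN before this step: the median is where the problem becomes visible, not where it starts.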


histun commented on August 17, 2024

Have you fixed this issue?

I've used both 1) the provided weights (weights.250-0.00036.hdf5) and 2) newly fine-tuned weights (trained from the provided pretrained weights.rat.COM.hdf5 and the dannce.mat); however, I always get NaN values from com-predict (99 NaNs out of 100 predictions).

I thought it might have to do with my environment setup, but 1) predicting with dannce using the provided weights (weights.12000-0.00014.hdf5) and 2) fine-tuning and predicting (weights.rat.MAX.6cam.hdf5 with the dannce.mat) both seem to work fine.

I'm puzzled by the problems I'm having with COM.
Does anyone have any ideas?
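To quantify the symptom (for example the "99 NaNs out of 100 predictions" reported above) without assuming which variable names com3d0.mat contains, one option is to load the file with `scipy.io.loadmat` and summarize NaNs in every floating-point array it holds. A minimal sketch: the function name `nan_report` is hypothetical, and treating axis 0 as the sample axis is an assumption about the file layout.

```python
import numpy as np


def nan_report(mat_dict):
    """Summarize NaNs in every floating-point array of a dict such as the
    one returned by scipy.io.loadmat.

    Assumes axis 0 indexes samples; adjust if your file is transposed.
    Keys starting with "__" (MAT-file header entries) are skipped.
    """
    report = {}
    for key, value in mat_dict.items():
        if key.startswith("__"):
            continue
        arr = np.atleast_1d(np.asarray(value))
        if not np.issubdtype(arr.dtype, np.floating) or arr.size == 0:
            continue
        rows = arr.reshape(arr.shape[0], -1)  # one row per (assumed) sample
        report[key] = {
            "shape": arr.shape,
            "nan_fraction": float(np.isnan(arr).mean()),
            "rows_with_nan": int(np.isnan(rows).any(axis=1).sum()),
        }
    return report


# Made-up stand-in for scipy.io.loadmat(r".\COM\predict_results\com3d0.mat");
# the key names "com" and "sampleID" are hypothetical.
demo = {
    "__header__": b"MATLAB 5.0 MAT-file",
    "com": np.array([[1.0, 2.0, 3.0], [np.nan, np.nan, np.nan]]),
    "sampleID": np.array([[0, 1]]),
}
print(nan_report(demo))
```

With SciPy installed, `nan_report(scipy.io.loadmat(path))` applies the same summary to the real output file. Integer arrays are skipped because they cannot hold NaN.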


histun commented on August 17, 2024

I fixed this issue by reinstalling dannce from the development branch, which uses TF 2.4.
Since I have an RTX Ada GPU, I installed CUDA 11.8 and cuDNN 8.7.0 from the NVIDIA website, following their installation guide.
After this, com-predict with the demo dataset and weights worked fine without NaN data.
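Since the fix came from a fresh CUDA/cuDNN install, and the original log also shows TensorFlow failing to launch `ptxas.exe` ("Relying on driver to perform ptx compilation"), a quick environment check may help others compare setups. The sketch below uses `shutil.which` to see whether `ptxas` is on `PATH` and, if TensorFlow is importable, reads `tf.sysconfig.get_build_info()` and `tf.config.list_physical_devices("GPU")`. Whether a missing `ptxas` is actually related to the NaN output is an assumption; this thread does not confirm it. The function name `cuda_env_report` is hypothetical.

```python
import shutil


def cuda_env_report():
    """Collect a few facts about the CUDA setup TensorFlow will see."""
    # shutil.which returns None when ptxas (ptxas.exe on Windows) is not on PATH.
    report = {"ptxas": shutil.which("ptxas")}
    try:
        import tensorflow as tf
    except ImportError:
        report["tensorflow"] = None
        return report
    build = tf.sysconfig.get_build_info()  # available in TF >= 2.3
    report["tensorflow"] = tf.__version__
    # CUDA/cuDNN versions TensorFlow was built against (absent on CPU builds).
    report["built_cuda"] = build.get("cuda_version")
    report["built_cudnn"] = build.get("cudnn_version")
    report["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    return report


for key, value in cuda_env_report().items():
    print(f"{key}: {value}")
```

Comparing `built_cuda` and `built_cudnn` with the toolkit actually installed may reveal the kind of mismatch that the reinstall above appears to have resolved.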
