
Comments (7)

diegoaldarondo commented on July 17, 2024

Haven't run into this one before, but I think the issue is most likely that the COM network you are finetuning from was initially trained with 6-camera RGB data rather than 5-camera mono data.

Regardless, while finetuning a COM network is possible and others have had success with it, I don't recommend it. I've had more success with training COM networks from scratch using 150-200 labels. The number of labels is overkill, but produces reliable results and is fairly easy to reach since COM labeling is so quick. You can find an example config in configs/dannce_rig_com_config.yaml which you would need to modify to fit mono 5 camera data.
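The mono, 5-camera overrides mentioned above would look roughly like this in a copy of that config (a sketch only; the camera names are placeholders and would need to match your rig):

```yaml
# Sketch of the fields to change in a copy of configs/dannce_rig_com_config.yaml
# for a 5-camera grayscale rig. Camera names are placeholders.
camnames: ["Camera1", "Camera2", "Camera3", "Camera4", "Camera5"]
n_views: 5
mono: True
n_channels_in: 1
```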

In contrast, when you eventually get to training a dannce model, you'll want to finetune from a 5 camera grayscale model trained on Rat7M. @spoonsso can help get you one of those models when he's available.

from dannce.

harshk95 commented on July 17, 2024

Hi,
Thanks for the response.
In the case that I have to train a new network, would you recommend I use the labels with all the points of the skeleton, or create a new label file with only the COM? Essentially, how do I use the dannce.mat labels for training the COM network and the DANNCE network: should I use two different files, one for COM and one for DANNCE?

Thanks again!


harshk95 commented on July 17, 2024

Hi,
I did run it with the labelling I already had, just to check, and ran into the following error. I have also attached the config file I used as a text file.

com-train C:\Users\realtime\dannce\configs\dannce_test_com_config.yaml
2021-07-12 16:00:40.700211: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
downfac not found in io.yaml file, falling back to main config
extension not found in io.yaml file, falling back to main config
io_config not found in io.yaml file, falling back to main config
crop_height not found in io.yaml file, falling back to main config
crop_width not found in io.yaml file, falling back to main config
n_channels_in not found in io.yaml file, falling back to main config
camnames not found in io.yaml file, falling back to main config
n_views not found in io.yaml file, falling back to main config
n_channels_out not found in io.yaml file, falling back to main config
batch_size not found in io.yaml file, falling back to main config
sigma not found in io.yaml file, falling back to main config
epochs not found in io.yaml file, falling back to main config
verbose not found in io.yaml file, falling back to main config
loss not found in io.yaml file, falling back to main config
lr not found in io.yaml file, falling back to main config
net not found in io.yaml file, falling back to main config
vid_dir_flag not found in io.yaml file, falling back to main config
metric not found in io.yaml file, falling back to main config
num_validation_per_exp not found in io.yaml file, falling back to main config
debug not found in io.yaml file, falling back to main config
max_num_samples not found in io.yaml file, falling back to main config
dsmode not found in io.yaml file, falling back to main config
medfilt_window not found in io.yaml file, falling back to main config
mono not found in io.yaml file, falling back to main config
com_train_dir set to: .\COM\train_results
com_predict_dir set to: .\COM\predict_results
dannce_train_dir set to: .\DANNCE\train_results\AVG
dannce_predict_dir set to: .\DANNCE\predict_results
dannce_predict_model set to: .\DANNCE\train_results\AVG\weights.1200-12.77642.hdf5
exp set to: [{'label3d_file': 'E:/DANNCE_test_210608/20210610_091000_Label3D_dannce.mat'}]
downfac set to: 2
extension set to: .avi
io_config set to: io.yaml
crop_height set to: [0, 960]
crop_width set to: [0, 576]
n_channels_in set to: 1
camnames set to: ['Camera1', 'Camera2', 'Camera3', 'Camera4', 'Camera5']
n_views set to: 5
n_channels_out set to: 1
batch_size set to: 2
sigma set to: 18
epochs set to: 200
verbose set to: 1
loss set to: mask_nan_keep_loss
lr set to: 5e-5
net set to: unet2d_fullbn
vid_dir_flag set to: True
metric set to: mse
num_validation_per_exp set to: 10
debug set to: False
max_num_samples set to: max
dsmode set to: dsm
medfilt_window set to: 30
mono set to: True
base_config set to: C:\Users\realtime\dannce\configs\dannce_test_com_config.yaml
viddir set to: videos
gpu_id set to: 0
immode set to: vid
mirror set to: False
num_train_per_exp set to: None
augment_hue set to: False
augment_brightness set to: False
augment_hue_val set to: 0.05
augment_bright_val set to: 0.05
augment_rotation_val set to: 5
data_split_seed set to: None
valid_exp set to: None
com_finetune_weights set to: None
augment_shift set to: False
augment_zoom set to: False
augment_shear set to: False
augment_rotation set to: False
augment_shear_val set to: 5
augment_zoom_val set to: 0.05
augment_shift_val set to: 0.05
start_batch set to: 0
chunks set to: None
lockfirst set to: None
load_valid set to: None
drop_landmark set to: None
raw_im_h set to: None
raw_im_w set to: None
n_instances set to: 1
start_sample set to: 0
write_npy set to: None
use_npy set to: False
com_predict_weights set to: None
com_debug set to: None
com_exp set to: None
Setting vid_dir_flag to True.
Setting extension to .avi.
Setting chunks to {'Camera1': array([0]), 'Camera2': array([0]), 'Camera3': array([0]), 'Camera4': array([0]), 'Camera5': array([0])}.
Setting n_channels_in to 3.
Setting raw_im_h to 600.
Setting raw_im_w to 960.
Experiment 0 using videos in E:/DANNCE_test_210608\videos
Experiment 0 using camnames: ['Camera1', 'Camera2', 'Camera3', 'Camera4', 'Camera5']
{'0_Camera1': array([], dtype=float64), '0_Camera2': array([], dtype=float64), '0_Camera3': array([], dtype=float64), '0_Camera4': array([], dtype=float64), '0_Camera5': array([], dtype=float64)}
E:/DANNCE_test_210608/20210610_091000_Label3D_dannce.mat
Using dsm downsampling
TRAIN EXPTS: [0]
Initializing Network...
2021-07-12 16:00:46.686321: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-07-12 16:00:46.715985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:65:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 24.00GiB deviceMemoryBandwidth: 625.94GiB/s
2021-07-12 16:00:46.722695: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-07-12 16:00:46.726800: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-07-12 16:00:46.730339: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-07-12 16:00:46.733860: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-07-12 16:00:46.737403: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-07-12 16:00:46.742918: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-07-12 16:00:46.745728: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-07-12 16:00:46.748409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-07-12 16:00:46.750446: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-12 16:00:46.765413: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2205f214a90 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-12 16:00:46.768026: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-07-12 16:00:46.770331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:65:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 24.00GiB deviceMemoryBandwidth: 625.94GiB/s
2021-07-12 16:00:46.775211: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-07-12 16:00:46.777214: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-07-12 16:00:46.779144: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-07-12 16:00:46.781042: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-07-12 16:00:46.783026: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-07-12 16:00:46.785127: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-07-12 16:00:46.787118: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-07-12 16:00:46.790089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-07-12 16:00:47.402547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-12 16:00:47.405424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-07-12 16:00:47.407160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-07-12 16:00:47.409633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19144 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:65:00.0, compute capability: 7.5)
2021-07-12 16:00:47.422338: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2200ebf2eb0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-12 16:00:47.426846: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): TITAN RTX, Compute Capability 7.5
COMPLETE

2021-07-12 16:00:53.041328: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session started.
2021-07-12 16:00:53.042910: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1391] Profiler found 1 GPUs
2021-07-12 16:00:53.048371: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cupti64_101.dll
2021-07-12 16:00:53.164661: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1513] CUPTI activity buffer flushed
Loading data
Traceback (most recent call last):
  File "C:\Users\realtime\anaconda3\envs\dannce\Scripts\com-train-script.py", line 33, in <module>
    sys.exit(load_entry_point('dannce', 'console_scripts', 'com-train')())
  File "c:\users\realtime\dannce\dannce\cli.py", line 42, in com_train_cli
    com_train(params)
  File "c:\users\realtime\dannce\dannce\interface.py", line 556, in com_train
    ims = train_generator.__getitem__(i)
  File "c:\users\realtime\dannce\dannce\engine\generator_aux.py", line 110, in __getitem__
    X, y = self.__data_generation(list_IDs_temp)
  File "c:\users\realtime\dannce\dannce\engine\generator_aux.py", line 184, in __data_generation
    self.extension,
  File "c:\users\realtime\dannce\dannce\engine\video.py", line 209, in load_vid_frame
    cur_video_id = np.nonzero([c <= ind for c in chunks])[0][-1]
IndexError: index -1 is out of bounds for axis 0 with size 0
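The failing lookup is easy to reproduce in isolation: `load_vid_frame` picks the last chunk whose start frame is at or before the requested frame, and when the per-camera chunks array is empty, the `[-1]` index has nothing to select (a minimal sketch, not the actual dannce code):

```python
import numpy as np

def last_chunk_index(chunks, ind):
    # Index of the last chunk starting at or before frame `ind`,
    # mirroring the lookup in dannce/engine/video.py line 209.
    return np.nonzero([c <= ind for c in chunks])[0][-1]

print(last_chunk_index(np.array([0]), 5))  # 0: frame 5 lives in the chunk starting at 0

try:
    last_chunk_index(np.array([]), 5)  # empty chunks, as when no video files matched
except IndexError as e:
    print(e)  # index -1 is out of bounds for axis 0 with size 0
```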

This seems unrelated to the number of points labeled and instead related to the videos. I would appreciate any assistance on the above issue. Also, the log says n_channels_in was set to 3 despite my specifying otherwise in the config file.

Thanks!
dannce_test_com_config.txt


diegoaldarondo commented on July 17, 2024

In the case I have to train a new network would you recommend I use the labels with all the points of the skeleton or create a new label file with only COM. Essentially, how do I use the dannce.mat for labels for training the COM network and the DANNCE network - should I use 2 different ones, one for the COM and one for DANNCE.

I like using one file for COMs and another for keypoints. I specify which files are used in the io.yaml via the com_exp and exp fields.
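A sketch of how that split might look in io.yaml, assuming com_exp takes the same list-of-entries shape as exp (both file paths here are hypothetical placeholders):

```yaml
# io.yaml sketch: COM labels and full-skeleton labels in separate files.
# Paths are hypothetical placeholders.
com_exp:
  - label3d_file: ./labels/session1_COM_Label3D_dannce.mat
exp:
  - label3d_file: ./labels/session1_Label3D_dannce.mat
```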

I did run it with the labelling I already had just to check and run into the following error.

The error is possibly the same as #53, but on a different line:

video_files = [f for f in video_files if ".mp4" in f]

Seems like a potential fix could be:

video_files = [f for f in video_files if params["extension"] in f]
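The difference between the hardcoded filter and the suggested fix can be seen on a toy file list (a sketch; `params` here is just a dict standing in for the parsed dannce config):

```python
# The hardcoded ".mp4" filter drops .avi files entirely, leaving the chunks
# dict empty downstream; filtering on the configured extension keeps them.
video_files = ["0.avi", "1.avi", "calib.yaml"]
params = {"extension": ".avi"}  # stand-in for the parsed config

hardcoded = [f for f in video_files if ".mp4" in f]
fixed = [f for f in video_files if params["extension"] in f]

print(hardcoded)  # [] -> no videos found, hence the IndexError above
print(fixed)      # ['0.avi', '1.avi']
```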


spoonsso commented on July 17, 2024

In the case I have to train a new network would you recommend I use the labels with all the points of the skeleton or create a new label file with only COM. Essentially, how do I use the dannce.mat for labels for training the COM network and the DANNCE network - should I use 2 different ones, one for the COM and one for DANNCE.

I like using one file for COMs and another for keypoints. I specify which files are used in the io.yaml via the com_exp and exp fields.

com-train and dannce-train can use exactly the same label *dannce.mat files. For the com-train case, the full set of labels will be converted to a COM-only label by taking the average across labeled points. This presumes your labelData_data_2d variables (inside *dannce.mat) are populated correctly with the 2d projections of your labeled 3d points -- if you are using Label3D, then this should all be set up correctly for you.
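The conversion described above amounts to averaging the labeled 2D points per frame while ignoring unlabeled (NaN) markers. A minimal sketch, not the actual dannce code, and assuming a flat [x1, y1, x2, y2, ...] layout for the 2D labels:

```python
import numpy as np

def com_from_labels(data_2d):
    # data_2d: (n_frames, 2 * n_markers) array of 2D labels for one camera,
    # assumed laid out [x1, y1, x2, y2, ...], with NaN for unlabeled markers.
    pts = data_2d.reshape(data_2d.shape[0], -1, 2)  # (frames, markers, 2)
    return np.nanmean(pts, axis=1)                  # (frames, 2): COM per frame

# One frame, three markers, the third unlabeled:
labels = np.array([[10.0, 20.0, 30.0, 40.0, np.nan, np.nan]])
print(com_from_labels(labels))  # [[20. 30.]]
```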

If you do label additional COM-only frames with Label3D, then these should be kept in separate files from the DANNCE labels, and you can point com-train to these files using the com_exp fields in io.yaml.


spoonsso commented on July 17, 2024

This seems unrelated to the number of points labeled and seems to be related to the videos. Would appreciate any assistance on the above issue. Also, seems to say that n_channels_in was set to 3 despite me specifying otherwise in the config file.

@harshk95 were you able to fix that error by changing that video_files line? n_channels_in getting reset to 3 can occur if the program detects that, while you are using grayscale images, the video files are actually saved as RGB (although seemingly counterintuitive, this is a common way to encode grayscale videos -- with equal pixel values across all RGB channels).
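That RGB-encoded-grayscale case is easy to check for: if every frame has identical values across its three channels, the video is effectively mono. A sketch using numpy; the frames here are synthetic rather than decoded from an actual video file:

```python
import numpy as np

def is_effectively_grayscale(frame):
    # frame: (H, W, 3) RGB image. True when all three channels are equal,
    # i.e. a grayscale image stored in an RGB container.
    return bool(
        np.array_equal(frame[..., 0], frame[..., 1])
        and np.array_equal(frame[..., 1], frame[..., 2])
    )

# Grayscale content replicated into 3 channels vs. genuinely colored pixels.
gray = np.repeat(np.random.randint(0, 256, (600, 960, 1), dtype=np.uint8), 3, axis=2)
color = np.random.randint(0, 256, (600, 960, 3), dtype=np.uint8)
print(is_effectively_grayscale(gray))   # True
print(is_effectively_grayscale(color))  # False
```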


harshk95 commented on July 17, 2024

@spoonsso
Changing the line did indeed fix that error, and the training now starts. I used the labelling from Label3D with all the keypoints and will see how the COM prediction goes.
Thanks for the help in getting me started!

