Comments (7)
Haven't run into this one before, but I think the issue is most likely that the COM network you are finetuning from was initially trained on 6-camera RGB data rather than 5-camera mono data.
Regardless, while finetuning a COM network is possible and others have had success with it, I don't recommend it. I've had more success training COM networks from scratch using 150-200 labels. That number of labels is overkill, but it produces reliable results and is fairly easy to reach since COM labeling is so quick. You can find an example config in configs/dannce_rig_com_config.yaml, which you would need to modify to fit mono 5-camera data.
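For reference, a minimal sketch of the fields that would need to change for a 5-camera mono setup. The field names and values here are taken from the config dump in the log below, not from dannce_rig_com_config.yaml itself, so check them against the example config before use:

```yaml
# Hypothetical excerpt -- adjust configs/dannce_rig_com_config.yaml roughly like this
# for 5-camera grayscale (mono) data. Verify field names against the shipped example.
n_views: 5
camnames: ['Camera1', 'Camera2', 'Camera3', 'Camera4', 'Camera5']
mono: True
n_channels_in: 1
```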
In contrast, when you eventually get to training a DANNCE model, you'll want to finetune from a 5-camera grayscale model trained on Rat7M. @spoonsso can help get you one of those models when he's available.
from dannce.
Hi,
Thanks for the response.
In the case that I have to train a new network, would you recommend I use the labels with all the points of the skeleton, or create a new label file with only the COM? Essentially, how do I use the dannce.mat labels for training the COM network and the DANNCE network: should I use two different files, one for COM and one for DANNCE?
Thanks again!
Hi,
I did run it with the labelling I already had, just to check, and ran into the following error. I have also attached the config file I used as a text file.
com-train C:\Users\realtime\dannce\configs\dannce_test_com_config.yaml
2021-07-12 16:00:40.700211: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
downfac not found in io.yaml file, falling back to main config
extension not found in io.yaml file, falling back to main config
io_config not found in io.yaml file, falling back to main config
crop_height not found in io.yaml file, falling back to main config
crop_width not found in io.yaml file, falling back to main config
n_channels_in not found in io.yaml file, falling back to main config
camnames not found in io.yaml file, falling back to main config
n_views not found in io.yaml file, falling back to main config
n_channels_out not found in io.yaml file, falling back to main config
batch_size not found in io.yaml file, falling back to main config
sigma not found in io.yaml file, falling back to main config
epochs not found in io.yaml file, falling back to main config
verbose not found in io.yaml file, falling back to main config
loss not found in io.yaml file, falling back to main config
lr not found in io.yaml file, falling back to main config
net not found in io.yaml file, falling back to main config
vid_dir_flag not found in io.yaml file, falling back to main config
metric not found in io.yaml file, falling back to main config
num_validation_per_exp not found in io.yaml file, falling back to main config
debug not found in io.yaml file, falling back to main config
max_num_samples not found in io.yaml file, falling back to main config
dsmode not found in io.yaml file, falling back to main config
medfilt_window not found in io.yaml file, falling back to main config
mono not found in io.yaml file, falling back to main config
com_train_dir set to: .\COM\train_results
com_predict_dir set to: .\COM\predict_results
dannce_train_dir set to: .\DANNCE\train_results\AVG
dannce_predict_dir set to: .\DANNCE\predict_results
dannce_predict_model set to: .\DANNCE\train_results\AVG\weights.1200-12.77642.hdf5
exp set to: [{'label3d_file': 'E:/DANNCE_test_210608/20210610_091000_Label3D_dannce.mat'}]
downfac set to: 2
extension set to: .avi
io_config set to: io.yaml
crop_height set to: [0, 960]
crop_width set to: [0, 576]
n_channels_in set to: 1
camnames set to: ['Camera1', 'Camera2', 'Camera3', 'Camera4', 'Camera5']
n_views set to: 5
n_channels_out set to: 1
batch_size set to: 2
sigma set to: 18
epochs set to: 200
verbose set to: 1
loss set to: mask_nan_keep_loss
lr set to: 5e-5
net set to: unet2d_fullbn
vid_dir_flag set to: True
metric set to: mse
num_validation_per_exp set to: 10
debug set to: False
max_num_samples set to: max
dsmode set to: dsm
medfilt_window set to: 30
mono set to: True
base_config set to: C:\Users\realtime\dannce\configs\dannce_test_com_config.yaml
viddir set to: videos
gpu_id set to: 0
immode set to: vid
mirror set to: False
num_train_per_exp set to: None
augment_hue set to: False
augment_brightness set to: False
augment_hue_val set to: 0.05
augment_bright_val set to: 0.05
augment_rotation_val set to: 5
data_split_seed set to: None
valid_exp set to: None
com_finetune_weights set to: None
augment_shift set to: False
augment_zoom set to: False
augment_shear set to: False
augment_rotation set to: False
augment_shear_val set to: 5
augment_zoom_val set to: 0.05
augment_shift_val set to: 0.05
start_batch set to: 0
chunks set to: None
lockfirst set to: None
load_valid set to: None
drop_landmark set to: None
raw_im_h set to: None
raw_im_w set to: None
n_instances set to: 1
start_sample set to: 0
write_npy set to: None
use_npy set to: False
com_predict_weights set to: None
com_debug set to: None
com_exp set to: None
Setting vid_dir_flag to True.
Setting extension to .avi.
Setting chunks to {'Camera1': array([0]), 'Camera2': array([0]), 'Camera3': array([0]), 'Camera4': array([0]), 'Camera5': array([0])}.
Setting n_channels_in to 3.
Setting raw_im_h to 600.
Setting raw_im_w to 960.
Experiment 0 using videos in E:/DANNCE_test_210608\videos
Experiment 0 using camnames: ['Camera1', 'Camera2', 'Camera3', 'Camera4', 'Camera5']
{'0_Camera1': array([], dtype=float64), '0_Camera2': array([], dtype=float64), '0_Camera3': array([], dtype=float64), '0_Camera4': array([], dtype=float64), '0_Camera5': array([], dtype=float64)}
E:/DANNCE_test_210608/20210610_091000_Label3D_dannce.mat
Using dsm downsampling
TRAIN EXPTS: [0]
Initializing Network...
2021-07-12 16:00:46.686321: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2021-07-12 16:00:46.715985: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:65:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 24.00GiB deviceMemoryBandwidth: 625.94GiB/s
2021-07-12 16:00:46.722695: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-07-12 16:00:46.726800: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-07-12 16:00:46.730339: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-07-12 16:00:46.733860: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-07-12 16:00:46.737403: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-07-12 16:00:46.742918: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-07-12 16:00:46.745728: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-07-12 16:00:46.748409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-07-12 16:00:46.750446: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-12 16:00:46.765413: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2205f214a90 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-07-12 16:00:46.768026: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-07-12 16:00:46.770331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:65:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 24.00GiB deviceMemoryBandwidth: 625.94GiB/s
2021-07-12 16:00:46.775211: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2021-07-12 16:00:46.777214: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2021-07-12 16:00:46.779144: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2021-07-12 16:00:46.781042: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2021-07-12 16:00:46.783026: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2021-07-12 16:00:46.785127: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2021-07-12 16:00:46.787118: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2021-07-12 16:00:46.790089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2021-07-12 16:00:47.402547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-12 16:00:47.405424: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2021-07-12 16:00:47.407160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N
2021-07-12 16:00:47.409633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 19144 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:65:00.0, compute capability: 7.5)
2021-07-12 16:00:47.422338: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x2200ebf2eb0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-07-12 16:00:47.426846: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): TITAN RTX, Compute Capability 7.5
COMPLETE
2021-07-12 16:00:53.041328: I tensorflow/core/profiler/lib/profiler_session.cc:164] Profiler session started.
2021-07-12 16:00:53.042910: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1391] Profiler found 1 GPUs
2021-07-12 16:00:53.048371: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cupti64_101.dll
2021-07-12 16:00:53.164661: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1513] CUPTI activity buffer flushed
Loading data
Traceback (most recent call last):
File "C:\Users\realtime\anaconda3\envs\dannce\Scripts\com-train-script.py", line 33, in
sys.exit(load_entry_point('dannce', 'console_scripts', 'com-train')())
File "c:\users\realtime\dannce\dannce\cli.py", line 42, in com_train_cli
com_train(params)
File "c:\users\realtime\dannce\dannce\interface.py", line 556, in com_train
ims = train_generator.getitem(i)
File "c:\users\realtime\dannce\dannce\engine\generator_aux.py", line 110, in getitem
X, y = self.__data_generation(list_IDs_temp)
File "c:\users\realtime\dannce\dannce\engine\generator_aux.py", line 184, in __data_generation
self.extension,
File "c:\users\realtime\dannce\dannce\engine\video.py", line 209, in load_vid_frame
cur_video_id = np.nonzero([c <= ind for c in chunks])[0][-1]
IndexError: index -1 is out of bounds for axis 0 with size 0
This seems unrelated to the number of points labeled and instead seems to be related to the videos. I would appreciate any assistance with the above issue. Also, the log says n_channels_in was set to 3 despite me specifying otherwise in the config file.
Thanks!
dannce_test_com_config.txt
> In the case I have to train a new network would you recommend I use the labels with all the points of the skeleton or create a new label file with only COM. Essentially, how do I use the dannce.mat for labels for training the COM network and the DANNCE network - should I use 2 different ones, one for the COM and one for DANNCE.
I like using one file for COMs and another for keypoints. I specify which files are used in the io.yaml via the com_exp and exp fields.
> I did run it with the labelling I already had just to check and run into the following error.
The error is possibly the same as #53, but on a different line:

dannce/dannce/engine/processing.py, line 737 (commit cefd782)

Seems like a potential fix could be:

video_files = [f for f in video_files if params["extension"] in f]
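To illustrate why that filter helps (my reading of the traceback, so treat the mechanism as an assumption): if the video directory listing contains files that are not `.avi` videos, the per-camera chunk (video start frame) list can end up empty, and the frame-to-file lookup in `load_vid_frame` then fails exactly as in the traceback above:

```python
import numpy as np

# Minimal reproduction of the failure mode: with no registered video
# start frames, the "which file holds frame `ind`" lookup indexes an
# empty array with [-1].
chunks = np.array([])   # no video start frames were parsed
ind = 100               # frame index we want to load

try:
    cur_video_id = np.nonzero([c <= ind for c in chunks])[0][-1]
except IndexError as e:
    print(e)  # index -1 is out of bounds for axis 0 with size 0

# The proposed fix: keep only files matching the configured extension
# before chunk start frames are parsed from the filenames.
video_files = ["0.avi", "Thumbs.db"]   # hypothetical directory listing
extension = ".avi"
video_files = [f for f in video_files if extension in f]
print(video_files)  # ['0.avi']
```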
> In the case I have to train a new network would you recommend I use the labels with all the points of the skeleton or create a new label file with only COM. Essentially, how do I use the dannce.mat for labels for training the COM network and the DANNCE network - should I use 2 different ones, one for the COM and one for DANNCE.

> I like using one file for COMs and another for keypoints. I specify which files are used in the io.yaml via the com_exp and exp fields.
com-train and dannce-train can use exactly the same label *dannce.mat files. In the com-train case, the full set of labels will be converted to a COM-only label by taking the average across labeled points. This presumes your labelData_data_2d variables (inside *dannce.mat) are populated correctly with the 2D projections of your labeled 3D points -- if you are using Label3D, this should all be set up correctly for you.
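Conceptually, the conversion described above amounts to averaging the labeled 2D keypoints into a single COM target. A sketch of the idea (not dannce's actual code; the NaN handling for unlabeled points is an assumption):

```python
import numpy as np

# Four 2D keypoints for one frame; NaNs mark an unlabeled point.
keypoints_2d = np.array([
    [100.0, 200.0],
    [110.0, 210.0],
    [np.nan, np.nan],   # unlabeled keypoint, ignored by nanmean
    [120.0, 190.0],
])

# COM-only target: mean across labeled points, per coordinate.
com_2d = np.nanmean(keypoints_2d, axis=0)
print(com_2d)  # [110. 200.]
```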
If you do label additional COM-only frames with Label3D, these should be kept in separate files from the DANNCE labels, and you can point com-train to them using the com_exp field in io.yaml.
> This seems unrelated to the number of points labeled and seems to be related to the videos. Would appreciate any assistance on the above issue. Also, seems to say that n_channels_in was set to 3 despite me specifying otherwise in the config file.
@harshk95 were you able to fix that error by changing that video_files line? n_channels_in getting reset to 3 can occur if the program detects that, while you are using grayscale images, the video files are actually saved as RGB (although seemingly counterintuitive, this is a common way to encode grayscale videos -- with equal pixel values across all three RGB channels).
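A quick way to verify that a decoded frame is grayscale-encoded-as-RGB, along the lines described above (my own sketch of the check, not dannce's implementation):

```python
import numpy as np

# Synthetic grayscale frame duplicated across three channels -- the
# common way grayscale video ends up stored in an RGB container.
frame = np.random.randint(0, 255, size=(600, 960), dtype=np.uint8)
rgb_encoded = np.stack([frame] * 3, axis=-1)   # shape (600, 960, 3)

# Grayscale-as-RGB means all three channels match channel 0 everywhere.
is_grayscale = np.all(rgb_encoded[..., 0:1] == rgb_encoded)
print(is_grayscale)  # True
```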
@spoonsso
Changing the line did indeed fix that error, and training now starts. I used the labelling from Label3D with all the keypoints and will see how the COM prediction goes.
Thanks for the help in getting me started!