
srgb-tir's Introduction

Accepted to the Proceedings of ICRA 2023

Overview of the edge-guided multi-domain RGB2TIR translation network

[Figure: overview_new-1]

Proposed pipeline for training vision tasks with challenging labels

  • Our target tasks are deep optical flow estimation and object detection in thermal images.

[Figure: proposed_method-1]

Results

Disclaimer

  • The same model was used for both synthetic and real RGB-to-TIR image translation.

  • The model was trained on the same datasets in both cases (sRGB: GTA, TIR: STheReO).

Results on synthetic RGB to TIR translation

[Figure: synthetic_rgb_original-1]

Results on real RGB to TIR translation

  • The model trained on synthetic RGB images was adapted to translate real RGB images to TIR images.

[Figure: real_rgb_translation_pdf-1]

Results on thermal optical flow estimation using the proposed method

[Figure: optical_flow_comparison-1]

Video demonstration


https://youtu.be/zq8Qh9ygm6w

TODO

  • Upload inference code
  • Upload style selection code
  • Upload training code for custom data training

Environment Setup

  • Download Repo

    $ git clone https://github.com/rpmsnu/sRGB-TIR.git
  • Docker support

    To make environment setup a lot easier, I have uploaded my Docker image to Docker Hub;

    please use the following command to get the image:

    $ docker pull donkeymouse/donkeymouse:icra
    

    *If any problems persist, please file an issue!
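
    For reference, a minimal sketch of how to start an interactive container from this image (the mount path is my own choice, the image is assumed to have no fixed entrypoint, and --gpus requires the NVIDIA Container Toolkit):

    $ docker run --gpus all -it -v $(pwd)/sRGB-TIR:/workspace donkeymouse/donkeymouse:icra /bin/bash

    Note that -it must come before the image name; otherwise Docker treats it as the command to run inside the container.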

How To Use: RGB to TIR translation

  • Inference

    $ python3 inference_batch.py --input_folder {input dir to your RGB images} --output_folder {output dir to store your translated images} --checkpoint {weight_file address} --config {path to config .yaml file} --a2b 0 --seed {your choice} --num_style {number of tir styles to sample} --synchronized --output_only 
    

    For example, to translate RGB images stored under a folder called "input" and sample 5 TIR styles, run the following command:

    $ python3 inference_batch.py --input_folder ./input --output_folder ./output --checkpoint ./translation_weights.pt --a2b 0 --seed 1234 --num_style 5 --synchronized --output_only --config configs/tir2rgb_folder.yaml
    
  • Network weights

Please download them from here: {link to google drive}

*If the link doesn't work, please file an issue!

Network Details

Edge-guided multi-domain RGB2TIR translation architecture

  • Network Architecture (an unofficial sketch follows this list)

    • Content Encoder: single 7x7 conv block + four 4x4 conv blocks + four residual blocks + Instance Normalization
    • Style Encoder: single 7x7 conv block + four 4x4 conv blocks + four residual blocks + GAP + FC layers
    • Decoder (Generator): 4x4 conv + residual blocks in an encoder-decoder architecture; two downsampling layers and reflection padding were used.
    • Discriminator: four 4x4 convolutions with LeakyReLU activations; LSGAN loss and reflection padding were used.
  • Model code will be released after the review process has been cleared.

  • Training details (a hedged optimizer sketch also follows this list)

    • Iterations: 60,000
    • Batch size: 1
    • Weight decay: 0.001
    • Optimizer: Adam with β1 = 0.5, β2 = 0.999
    • Initial learning rate: 0.0001
    • Step learning rate policy
    • Learning rate decay rate (gamma): 0.5
    • Input image size: 640 x 400 for both synthetic RGB and thermal images
  • Config files will be released after the review process has been cleared
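
For readers who want a concrete picture before the official model code is released, below is a minimal, unofficial PyTorch sketch of the content encoder described above (single 7x7 conv block, four 4x4 conv blocks, four residual blocks, Instance Normalization). Channel widths, strides, and activation choices are my assumptions, not the released implementation:

    import torch.nn as nn

    class ResBlock(nn.Module):
        """Residual block with InstanceNorm and reflection padding."""
        def __init__(self, dim):
            super().__init__()
            self.block = nn.Sequential(
                nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, 3),
                nn.InstanceNorm2d(dim), nn.ReLU(inplace=True),
                nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, 3),
                nn.InstanceNorm2d(dim),
            )

        def forward(self, x):
            return x + self.block(x)

    class ContentEncoder(nn.Module):
        """7x7 conv block -> four 4x4 conv blocks -> four residual blocks."""
        def __init__(self, in_ch=3, dim=64):
            super().__init__()
            layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, dim, 7),
                      nn.InstanceNorm2d(dim), nn.ReLU(inplace=True)]
            for _ in range(4):  # four 4x4 conv blocks (stride 2 is assumed)
                layers += [nn.Conv2d(dim, dim * 2, 4, stride=2, padding=1),
                           nn.InstanceNorm2d(dim * 2), nn.ReLU(inplace=True)]
                dim *= 2
            layers += [ResBlock(dim) for _ in range(4)]  # four residual blocks
            self.model = nn.Sequential(*layers)

        def forward(self, x):
            return self.model(x)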
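
Likewise, a hedged sketch of the optimizer and learning-rate schedule implied by the training details above; the module is a stand-in for the translation network, and the StepLR step interval is not stated in this README, so the value below is a guess:

    import torch

    model = torch.nn.Conv2d(3, 64, 7)  # placeholder for the translation network

    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=1e-4,             # initial learning rate
        betas=(0.5, 0.999),  # B1, B2
        weight_decay=1e-3,   # weight decay as listed here (the paper quotes 0.5)
    )
    # Step learning rate policy with decay rate (gamma) = 0.5; the step_size
    # below is an assumed value, not taken from the paper.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000, gamma=0.5)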

Citation

Please consider citing the paper as:

@inproceedings{lee-2023-edgemultiRGB2TIR,
  author={Lee, Dong-Guw and Kim, Ayoung},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  title={Edge-guided Multi-domain RGB-to-TIR image Translation for Training Vision Tasks with Challenging Labels},
  year={2023}
}

Also, a lot of the code has been built on top of MUNIT (ECCV 2018), so please cite their paper as well.

Contact

If you have any questions, please get in touch here.


srgb-tir's Issues

Gradients: inf or nan in tensor

Hi,
I used the FLIR dataset to try out the training. From time to time the error above happens and all model parameters become NaNs. Could you observe similar behaviour?
I fixed it by catching NaN values and clipping gradients.
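
For anyone hitting the same thing, a minimal sketch of the workaround described above (the model, loss, and optimizer below are placeholders, and max_norm=1.0 is an arbitrary choice):

    import torch

    # Placeholders standing in for the real model, data, and optimizer.
    model = torch.nn.Linear(4, 4)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss = model(torch.randn(2, 4)).pow(2).mean()

    loss.backward()
    # Skip the update when any gradient is NaN/inf; otherwise clip and step.
    grads_finite = all(torch.isfinite(p.grad).all()
                       for p in model.parameters() if p.grad is not None)
    if grads_finite:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    optimizer.zero_grad()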

Docker run script?

Thanks for the amazing work!
Can you provide a script to run the docker container?

unrecognized arguments: --configs

Hi,

I faced an issue when trying to run the project.

mgo@MGO-MacBook-Pro ~ % python3 sRGB-TIR/inference_batch.py --input_folder ./Users/mgo/sRGB-TIR/input --output_folder ./Users/mgo/sRGB-TIR/output --checkpoint ./Users/mgo/sRGB-TIR/translation_weights.pt --a2b 0 --seed 1234 --num_style 5 --synchronized --output_only --configs ./Users/mgo/sRGB-TIR/configs/tir2rgb_folder.yaml
usage: inference_batch.py [-h] [--config CONFIG] [--input_folder INPUT_FOLDER]
[--output_folder OUTPUT_FOLDER]
[--checkpoint CHECKPOINT] [--a2b A2B] [--seed SEED]
[--num_style NUM_STYLE] [--synchronized]
[--output_only] [--output_path OUTPUT_PATH]
[--trainer TRAINER] [--compute_IS] [--compute_CIS]
[--inception_a INCEPTION_A]
[--inception_b INCEPTION_B]
inference_batch.py: error: unrecognized arguments: --configs ./Users/mgo/sRGB-TIR/configs/tir2rgb_folder.yaml
mgo@MGO-MacBook-Pro ~ %

Thanks,

These converted images are not thermal images

Thank you for your contribution.

Based on my 15 years of experience in thermal imaging applications, unfortunately I have to say that these translated images are not thermal images. They are just inverted grayscale visible images.

Thermal image characteristics are more than converting black to white and white to black.
I am aware that this is a really difficult problem, in that learning a visible-to-thermal mapping during training has to avoid overfitting.

I will try to train this network with another dataset in the near future, and may then share the results.

Docker container Terminates

I used the command docker pull donkeymouse/donkeymouse:icra to pull the latest Docker image. After running the image, the container terminates immediately.
I tried to run the image with the -it flag for an interactive shell, and here is the resulting error:

 docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "-it": executable file not found in $PATH: unknown.
ERRO[0000] error waiting for container: 

Please provide an installation guide and some details on the operating system you used for testing,
or at least a requirements file so we can deduce which OS these dependencies can be installed on.

A confusion about the loss function

Hi, thank you for your significant and interesting work!

I have two questions about the loss function:

$$
\begin{aligned}
\mathcal{L}_{Lap} &= \mathbb{E}\left[\left| L\left(x_{TIR}\right) - L\left(x_{TIR,\text{recon}}\right) \right|_1\right] \\
L\left(x_{TIR}\right) &= \frac{1}{3}\left(L\left(x_{TIR}^1\right) + L\left(x_{TIR}^2\right) + L\left(x_{TIR}^3\right)\right)
\end{aligned}
$$

This loss is a LoG loss which constrains the edge similarity between the input RGB image and the generated TIR image. However, I don't understand why it is $L\left(x_{TIR}\right)$ in the above formula; in my view, $L\left(x_{TIR}\right)$ should only have one channel.
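
For context, here is how I currently read the formula, as a hedged PyTorch sketch (a plain 3x3 Laplacian kernel stands in for the Laplacian-of-Gaussian, and the per-channel averaging is my reading of the equation, not confirmed code):

    import torch
    import torch.nn.functional as F

    # 3x3 Laplacian kernel, applied to each channel independently.
    LAPLACIAN = torch.tensor([[0., 1., 0.],
                              [1., -4., 1.],
                              [0., 1., 0.]]).view(1, 1, 3, 3)

    def L(x):  # x: (N, 3, H, W) -> edge map averaged over the 3 channels
        n, c, h, w = x.shape
        edges = F.conv2d(x.reshape(n * c, 1, h, w), LAPLACIAN, padding=1)
        return edges.reshape(n, c, h, w).mean(dim=1, keepdim=True)

    def lap_loss(x_tir, x_tir_recon):
        return (L(x_tir) - L(x_tir_recon)).abs().mean()  # L1 expectation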

  2. The loss weighting coefficients were set to 20, 10, 10, 20, and 5 respectively. How did you determine these coefficients? Did you try other coefficients? I have a very similar experiment, and I found that different coefficients give different results, sometimes better and sometimes worse.

I am very much looking forward to your reply! Thank you again for this meaningful work.

Which weight decay?

Hi,
In your paper, under 2) Training details, you used a weight decay of 0.5, but in the README a different weight decay of 0.001 is noted. Which one did you use, or am I confusing something here?

how to process the dataset for training

Thanks for sharing this wonderful work. I want to train the model for RGB-NIR translation. Is the method suitable, and how can I process the dataset?
Looking forward to your reply, thanks.

inference problem

Hello, I ran this command on Windows:
python inference_batch.py --input_folder ./input --output_folder ./output --checkpoint ./pretrained/translation_weights.pt --a2b 0 --seed 1234 --num_style 5 --synchronized --output_only --config configs/tir2rgb_folder.yaml

but I got this error:
RuntimeError: DataLoader worker (pid(s) 21824, 27008, 3960, 11708) exited unexpectedly

What is the reason for this? How can I fix it?

Regarding custom data training

Hi,
Thanks for the amazing work.
Just wanted to know when the code for custom data training will be available?

CUDA memory issue during inference

Hi, when I run your inference code as follows, the following error happens:

python3 inference_batch.py --input_folder ../data/Train/images --output_folder output --checkpoint ./translation_weights.pt --a2b 0 --seed 123 --num_style 1 --synchronized --output_only --config configs/tir2rgb_folder_unit.yaml 
../data/Train/images/0000001_02999_d_0000005.jpg
/home/hj/Projects/etri2023/sRGB-TIR/inference_batch.py:101: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  images = Variable(images.cuda(), volatile=True)
../data/Train/images/0000001_03499_d_0000006.jpg
Traceback (most recent call last):
  File "/home/hj/Projects/etri2023/sRGB-TIR/inference_batch.py", line 102, in <module>
    content, _ = encode(images)
  File "/home/hj/Projects/etri2023/sRGB-TIR/networks.py", line 120, in encode
    content = self.enc_content(images)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/Projects/etri2023/sRGB-TIR/networks.py", line 221, in forward
    return self.model(x)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/Projects/etri2023/sRGB-TIR/networks.py", line 254, in forward
    return self.model(x)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/Projects/etri2023/sRGB-TIR/networks.py", line 284, in forward
    out = self.model(x)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/Projects/etri2023/sRGB-TIR/networks.py", line 342, in forward
    x = self.conv(self.pad(x))
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 23.70 GiB total capacity; 21.79 GiB already allocated; 12.75 MiB free; 22.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

my environment setup is
CPU: Intel i9-10900K
RAM: 128GB
GPU: RTX3090
torch: 1.12.1
python: 3.10.11

How can I run the code?

Thanks
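
As a side note on this traceback: the UserWarning shows that volatile=True no longer disables autograd, so the graph for every image accumulates in GPU memory. A minimal sketch of the usual fix, with a stand-in module in place of the real encoder:

    import torch

    # Stand-in for the content encoder built in networks.py (per the traceback).
    net = torch.nn.Conv2d(3, 64, 7)

    # torch.no_grad() stops the autograd graph from accumulating across images,
    # which is what exhausts GPU memory when volatile=True silently does nothing.
    with torch.no_grad():
        images = torch.randn(1, 3, 400, 640)  # one 640x400 RGB image
        content = net(images)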
