
srgb-tir's Introduction

Accepted to the Proceedings of ICRA 2023

Overview of the edge-guided multi-domain RGB2TIR translation network

[Figure: overview_new-1]

Proposed pipeline for training vision tasks with challenging labels

  • Our target tasks are deep optical flow estimation and object detection in thermal images.

[Figure: proposed_method-1]

Results

Disclaimer

  • The same model was used for both synthetic and real RGB-to-TIR image translation.

  • The model was trained on the same datasets in both cases (sRGB: GTA, TIR: STheReO).

Results on synthetic RGB to TIR translation

[Figure: synthetic_rgb_original-1]

Results on real RGB to TIR translation

  • The model trained on synthetic RGB images was adapted to translate real RGB images to TIR images.

[Figure: real_rgb_translation_pdf-1]

Results on thermal optical flow estimation using the proposed method

[Figure: optical_flow_comparison-1]

Video demonstration


https://youtu.be/zq8Qh9ygm6w

TODO

  • Upload inference code
  • Upload style selection code
  • Upload training code for custom data training

Environment Setup

  • Download Repo

    $ git clone https://github.com/rpmsnu/sRGB-TIR.git
  • Docker support

    To make environment setup a lot easier, I have uploaded my Docker image to Docker Hub;

    please use the following command to get the image:

    $ docker pull donkeymouse/donkeymouse:icra
    

    *If any problems persist, please file an issue!
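
    For reference, a minimal sketch of how to start an interactive container from this image (the mount path is my own choice, the image is assumed to have no fixed entrypoint, and --gpus requires the NVIDIA Container Toolkit):

    $ docker run --gpus all -it -v $(pwd)/sRGB-TIR:/workspace donkeymouse/donkeymouse:icra /bin/bash

    Note that -it must come before the image name; otherwise Docker treats it as the command to run inside the container.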

How To Use: RGB to TIR translation

  • Inference

    $ python3 inference_batch.py --input_folder {input dir to your RGB images} --output_folder {output dir to store your translated images} --checkpoint {weight_file address} --config {path to config .yaml file} --a2b 0 --seed {your choice} --num_style {number of tir styles to sample} --synchronized --output_only 
    

    For example, to translate RGB images stored under a folder called "input" and sample 5 TIR styles, run the following command:

    $ python3 inference_batch.py --input_folder ./input --output_folder ./output --checkpoint ./translation_weights.pt --a2b 0 --seed 1234 --num_style 5 --synchronized --output_only --config configs/tir2rgb_folder.yaml
    
  • Network weights

Please download them from here: {link to google drive}

*If the link doesn't work, please file an issue!

Network Details

Edge-guided multi-domain RGB2TIR translation architecture

  • Network Architecture (an unofficial sketch follows this list)

    • Content Encoder: single 7x7 conv block + four 4x4 conv blocks + four residual blocks + Instance Normalization
    • Style Encoder: single 7x7 conv block + four 4x4 conv blocks + four residual blocks + GAP + FC layers
    • Decoder (Generator): 4x4 conv + residual blocks in an encoder-decoder architecture; two downsampling layers and reflection padding were used.
    • Discriminator: four 4x4 convolutions with LeakyReLU activations; LSGAN loss and reflection padding were used.
  • Model code will be released after the review process has been cleared.

  • Training details (a hedged optimizer sketch also follows this list)

    • Iterations: 60,000
    • Batch size: 1
    • Weight decay: 0.001
    • Optimizer: Adam with β1 = 0.5, β2 = 0.999
    • Initial learning rate: 0.0001
    • Step learning rate policy
    • Learning rate decay rate (gamma): 0.5
    • Input image size: 640 x 400 for both synthetic RGB and thermal images
  • Config files will be released after the review process has been cleared
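
For readers who want a concrete picture before the official model code is released, below is a minimal, unofficial PyTorch sketch of the content encoder described above (single 7x7 conv block, four 4x4 conv blocks, four residual blocks, Instance Normalization). Channel widths, strides, and activation choices are my assumptions, not the released implementation:

    import torch.nn as nn

    class ResBlock(nn.Module):
        """Residual block with InstanceNorm and reflection padding."""
        def __init__(self, dim):
            super().__init__()
            self.block = nn.Sequential(
                nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, 3),
                nn.InstanceNorm2d(dim), nn.ReLU(inplace=True),
                nn.ReflectionPad2d(1), nn.Conv2d(dim, dim, 3),
                nn.InstanceNorm2d(dim),
            )

        def forward(self, x):
            return x + self.block(x)

    class ContentEncoder(nn.Module):
        """7x7 conv block -> four 4x4 conv blocks -> four residual blocks."""
        def __init__(self, in_ch=3, dim=64):
            super().__init__()
            layers = [nn.ReflectionPad2d(3), nn.Conv2d(in_ch, dim, 7),
                      nn.InstanceNorm2d(dim), nn.ReLU(inplace=True)]
            for _ in range(4):  # four 4x4 conv blocks (stride 2 is assumed)
                layers += [nn.Conv2d(dim, dim * 2, 4, stride=2, padding=1),
                           nn.InstanceNorm2d(dim * 2), nn.ReLU(inplace=True)]
                dim *= 2
            layers += [ResBlock(dim) for _ in range(4)]  # four residual blocks
            self.model = nn.Sequential(*layers)

        def forward(self, x):
            return self.model(x)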
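
Likewise, a hedged sketch of the optimizer and learning-rate schedule implied by the training details above; the module is a stand-in for the translation network, and the StepLR step interval is not stated in this README, so the value below is a guess:

    import torch

    model = torch.nn.Conv2d(3, 64, 7)  # placeholder for the translation network

    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=1e-4,             # initial learning rate
        betas=(0.5, 0.999),  # B1, B2
        weight_decay=1e-3,   # weight decay as listed here (the paper quotes 0.5)
    )
    # Step learning rate policy with decay rate (gamma) = 0.5; the step_size
    # below is an assumed value, not taken from the paper.
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000, gamma=0.5)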

Citation

Please consider citing the paper as:

@inproceedings{lee-2023-edgemultiRGB2TIR,
  author={Lee, Dong-Guw and Kim, Ayoung},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  title={Edge-guided Multi-domain RGB-to-TIR image Translation for Training Vision Tasks with Challenging Labels},
  year={2023}
}

Also, a lot of the code has been built on top of MUNIT (ECCV 2018), so please cite their paper as well.

Contact

If you have any questions, please get in touch here.


srgb-tir's Issues

Gradients: inf or nan in tensor

Hi,
I used the FLIR dataset to try out the training. From time to time the error above happens and all model parameters become NaNs. Could you observe similar behaviour?
I fixed it by catching NaN values and clipping gradients.
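
For anyone hitting the same thing, a minimal sketch of the workaround described above (the model, loss, and optimizer below are placeholders, and max_norm=1.0 is an arbitrary choice):

    import torch

    # Placeholders standing in for the real model, data, and optimizer.
    model = torch.nn.Linear(4, 4)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss = model(torch.randn(2, 4)).pow(2).mean()

    loss.backward()
    # Skip the update when any gradient is NaN/inf; otherwise clip and step.
    grads_finite = all(torch.isfinite(p.grad).all()
                       for p in model.parameters() if p.grad is not None)
    if grads_finite:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    optimizer.zero_grad()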

Docker run script?

Thanks for the amazing work!
Can you provide a script to run the docker container?

unrecognized arguments: --configs

Hi,

I faced an issue when trying to run the project.

mgo@MGO-MacBook-Pro ~ % python3 sRGB-TIR/inference_batch.py --input_folder ./Users/mgo/sRGB-TIR/input --output_folder ./Users/mgo/sRGB-TIR/output --checkpoint ./Users/mgo/sRGB-TIR/translation_weights.pt --a2b 0 --seed 1234 --num_style 5 --synchronized --output_only --configs ./Users/mgo/sRGB-TIR/configs/tir2rgb_folder.yaml
usage: inference_batch.py [-h] [--config CONFIG] [--input_folder INPUT_FOLDER]
[--output_folder OUTPUT_FOLDER]
[--checkpoint CHECKPOINT] [--a2b A2B] [--seed SEED]
[--num_style NUM_STYLE] [--synchronized]
[--output_only] [--output_path OUTPUT_PATH]
[--trainer TRAINER] [--compute_IS] [--compute_CIS]
[--inception_a INCEPTION_A]
[--inception_b INCEPTION_B]
inference_batch.py: error: unrecognized arguments: --configs ./Users/mgo/sRGB-TIR/configs/tir2rgb_folder.yaml
mgo@MGO-MacBook-Pro ~ %

Thanks,

These converted images are not thermal images

Thank you for your contribution.

Based on my 15 years of experience in thermal imaging applications, unfortunately I have to say that these translated images are not thermal images. They are just inverted grayscale visible images.

Thermal image characteristics are more than converting black to white and white to black.
I am aware that this is a really difficult problem, in that learning a visible-to-thermal mapping during training has to avoid overfitting.

I will try to train this network with another dataset in the near future, and may then share the results.

Docker container Terminates

I used the command docker pull donkeymouse/donkeymouse:icra to pull the latest Docker image. After running the image, the container terminates immediately.
I tried to run the image with the -it flag for an interactive shell, and here is the resulting error:

 docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "-it": executable file not found in $PATH: unknown.
ERRO[0000] error waiting for container: 

Please provide an installation guide and some details on the operating system you used for testing,
or at least a requirements file so we can deduce which OS these dependencies can be installed on.

A confusion about the loss function

Hi, thank you for your significant and interesting work!

I have two questions about the loss function:

$$
\begin{aligned}
\mathcal{L}_{Lap} &= \mathbb{E}\left[\left| L\left(x_{TIR}\right) - L\left(x_{TIR,\text{recon}}\right) \right|_1\right] \\
L\left(x_{TIR}\right) &= \frac{1}{3}\left(L\left(x_{TIR}^1\right) + L\left(x_{TIR}^2\right) + L\left(x_{TIR}^3\right)\right)
\end{aligned}
$$

This loss is a LoG loss which constrains the edge similarity between the input RGB image and the generated TIR image. However, I don't understand why it is $L\left(x_{TIR}\right)$ in the above formula; in my view, $L\left(x_{TIR}\right)$ should only have one channel.
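
For context, here is how I currently read the formula, as a hedged PyTorch sketch (a plain 3x3 Laplacian kernel stands in for the Laplacian-of-Gaussian, and the per-channel averaging is my reading of the equation, not confirmed code):

    import torch
    import torch.nn.functional as F

    # 3x3 Laplacian kernel, applied to each channel independently.
    LAPLACIAN = torch.tensor([[0., 1., 0.],
                              [1., -4., 1.],
                              [0., 1., 0.]]).view(1, 1, 3, 3)

    def L(x):  # x: (N, 3, H, W) -> edge map averaged over the 3 channels
        n, c, h, w = x.shape
        edges = F.conv2d(x.reshape(n * c, 1, h, w), LAPLACIAN, padding=1)
        return edges.reshape(n, c, h, w).mean(dim=1, keepdim=True)

    def lap_loss(x_tir, x_tir_recon):
        return (L(x_tir) - L(x_tir_recon)).abs().mean()  # L1 expectation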

  2. The loss weighting coefficients were set to 20, 10, 10, 20, and 5 respectively. How did you determine these coefficients? Did you try other coefficients? I have a very similar experiment, and I found that different coefficients give different results, sometimes better and sometimes worse.

I am very much looking forward to your reply! Thank you again for this meaningful work.

Which weight decay?

Hi,
In your paper, under 2) Training details, you used a weight decay of 0.5, but in the README a different weight decay of 0.001 is noted. Which one did you use, or am I confusing something here?

how to process the dataset for training

Thanks for sharing this wonderful work. I want to train the model for RGB-NIR translation. Is the method suitable, and how can I process the dataset?
Looking forward to your reply, thanks.

inference problem

Hello, I ran this command on Windows:
python inference_batch.py --input_folder ./input --output_folder ./output --checkpoint ./pretrained/translation_weights.pt --a2b 0 --seed 1234 --num_style 5 --synchronized --output_only --config configs/tir2rgb_folder.yaml

but I got this error:
RuntimeError: DataLoader worker (pid(s) 21824, 27008, 3960, 11708) exited unexpectedly

What is the reason for this? How can I fix it?

Regarding custom data training

Hi,
Thanks for the amazing work.
Just wanted to know when the code for custom data training will be available?

CUDA memory issue during inference

Hi, when I run your inference code as follows, the following error happens:

python3 inference_batch.py --input_folder ../data/Train/images --output_folder output --checkpoint ./translation_weights.pt --a2b 0 --seed 123 --num_style 1 --synchronized --output_only --config configs/tir2rgb_folder_unit.yaml 
../data/Train/images/0000001_02999_d_0000005.jpg
/home/hj/Projects/etri2023/sRGB-TIR/inference_batch.py:101: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  images = Variable(images.cuda(), volatile=True)
../data/Train/images/0000001_03499_d_0000006.jpg
Traceback (most recent call last):
  File "/home/hj/Projects/etri2023/sRGB-TIR/inference_batch.py", line 102, in <module>
    content, _ = encode(images)
  File "/home/hj/Projects/etri2023/sRGB-TIR/networks.py", line 120, in encode
    content = self.enc_content(images)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/Projects/etri2023/sRGB-TIR/networks.py", line 221, in forward
    return self.model(x)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/Projects/etri2023/sRGB-TIR/networks.py", line 254, in forward
    return self.model(x)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/Projects/etri2023/sRGB-TIR/networks.py", line 284, in forward
    out = self.model(x)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/Projects/etri2023/sRGB-TIR/networks.py", line 342, in forward
    x = self.conv(self.pad(x))
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/hj/anaconda3/envs/etri/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 23.70 GiB total capacity; 21.79 GiB already allocated; 12.75 MiB free; 22.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

my environment setup is
CPU: Intel i9-10900K
RAM: 128GB
GPU: RTX3090
torch: 1.12.1
python: 3.10.11

How can I run the code?

Thanks
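
As a side note on this traceback: the UserWarning shows that volatile=True no longer disables autograd, so the graph for every image accumulates in GPU memory. A minimal sketch of the usual fix, with a stand-in module in place of the real encoder:

    import torch

    # Stand-in for the content encoder built in networks.py (per the traceback).
    net = torch.nn.Conv2d(3, 64, 7)

    # torch.no_grad() stops the autograd graph from accumulating across images,
    # which is what exhausts GPU memory when volatile=True silently does nothing.
    with torch.no_grad():
        images = torch.randn(1, 3, 400, 640)  # one 640x400 RGB image
        content = net(images)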
