zouchuhang / layoutnet

Home Page: http://openaccess.thecvf.com/content_cvpr_2018/papers/Zou_LayoutNet_Reconstructing_the_CVPR_2018_paper.pdf

License: MIT License

deep-learning 3d-layout layoutnet 3d-reconstruction

LayoutNet

New: Please check our official PyTorch implementation for LayoutNet v2

Torch implementation of our CVPR 18 paper: "LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image"

See the sample video of 3D layouts reconstructed by our method.

Prerequisites

  • Linux
  • NVIDIA GPU + CUDA cuDNN
  • Torch 7
  • matio: https://github.com/tbeu/matio
  • MATLAB

Data

  • Download the preprocessed (aligned to the horizontal floor plane) training/validation/testing data to the current folder (see the loading sketch after this list).

This includes the panoramas from both the PanoContext dataset and our labeled Stanford 2D-3D dataset.

  • Download the ground-truth data to the current folder.

This includes the ground-truth 2D positions of room corners in .mat format for the two datasets. We've corrected some wrong corner labels in PanoContext to match the layout boundaries.

  • Download the preprocessed LSUN training/validation/testing data and the related .t7 files to the /data/LSUN_data/ folder. We've corrected the ~10% of corner labels that were wrong.

  • We provide training/testing .t7 samples (selected from PanoContext) for training general Manhattan layout prediction. A few of the samples have non-cuboid room shape ground truth.
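The downloaded .t7 files can be inspected directly in Torch. Below is a minimal sketch (not part of the repo); it assumes the files store plain tensors, and the shape comment is an assumption based on the 512x1024 panorama resolution used in the paper.

-- Minimal inspection sketch (not a repo script); assumes the .t7 files hold plain tensors.
require 'torch'

local img_tr = torch.load('./data/panoContext_img_train.t7')   -- training panoramas
local lne_ts = torch.load('./data/panoContext_line_test.t7')   -- Manhattan line maps (test split)
local box_tr = torch.load('./data/panoContext_box_train.t7')   -- 3D layout box parameters

-- Expected to print something like nSamples x channels x 512 x 1024 for the image tensor
-- (the exact shapes are assumptions; check them yourself).
print(img_tr:size())
print(lne_ts:size())
print(box_tr:size())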

Pretrained model

  • Download our pretrained model to the current folder. This includes:
  1. The pretrained full approach on the panoContext dataset, the joint boundary and corner prediction branch, the single boundary prediction branch, and the 3D layout box regressor;

  2. The pretrained full approach on the LSUN dataset (we've corrected the ~10% of labels that were wrong), the joint boundary and corner prediction branch, and the single boundary prediction branch.

  • The pretrained model for non-cuboid room shape prediction on the panoContext dataset, trained using our labeled non-cuboid and cuboid room shape data.

Image preprocess

We provide a sample script to extract Manhattan lines and align the panorama in ./matlab/getManhattanAndAlign.m.

To generate the ground-truth edge map, corner map and box parameters, see the sample script ./matlab/preprocessPano.m.

To convert the ground-truth data to .t7 files, see the sample code preProcess_pano.lua (a minimal sketch of the conversion follows below).
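As a rough illustration of that last step, here is a hedged sketch (not the repo's preProcess_pano.lua); the .mat path and the variable name cor are placeholders to be checked against the actual output of preprocessPano.m.

-- Hedged sketch: load a ground-truth .mat file with the Torch matio bindings
-- and save it as a .t7 file. The file path and variable name are placeholders.
require 'torch'
local matio = require 'matio'

local gt  = matio.load('./gt/label_cor/pano_example.mat')   -- hypothetical file name
local cor = gt.cor                                          -- hypothetical variable name

torch.save('./data/pano_example_cor.t7', cor)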

Train network

  • To train our full approach:
th driver_pano_full.lua

Note that this loads the pretrained joint prediction branch and the 3D layout box regressor.

  • To train the joint boundary and corner prediction branch:
th driver_pano_joint.lua

Note that this loads the pretrained boundary prediction branch.

  • To train the boundary prediction branch:
th driver_pano_edg.lua
  • To train the layout box regressor:
th driver_pano_box.lua
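Putting the notes above together, one possible from-scratch training order (my reading of the dependency notes, not an official recipe) is:

th driver_pano_edg.lua      (boundary prediction branch, trained first)
th driver_pano_box.lua      (3D layout box regressor)
th driver_pano_joint.lua    (joint branch; loads the pretrained boundary branch)
th driver_pano_full.lua     (full approach; loads the joint branch and the box regressor)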

Test network

  • To test our full approach:
th testNet_pano_full.lua

This saves the predicted boundary maps, corner maps and 3D layout parameters in the "result/" folder.
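For orientation, here is a rough sketch of a single test forward pass (this is not testNet_pano_full.lua; the model file name is hypothetical, and the input format, e.g. the panorama concatenated with its Manhattan line map along the channel dimension, is an assumption to be checked against the actual script).

-- Hedged sketch of one test forward pass (not the repo script).
require 'torch'
require 'nn'
require 'cunn'     -- you may also need 'cudnn' depending on how the model was saved

-- './model/panofull_lay_pretrained.t7' is a hypothetical name; use the file from
-- the pretrained-model download instead.
local model = torch.load('./model/panofull_lay_pretrained.t7')
model:evaluate()

local img_ts = torch.load('./data/panoContext_img_test.t7')    -- test panoramas
local lne_ts = torch.load('./data/panoContext_line_test.t7')   -- Manhattan line maps

-- Assumption: the network takes the image and line map concatenated along channels.
local input  = torch.cat(img_ts[{{1},{},{},{}}], lne_ts[{{1},{},{},{}}], 2):cuda()

local output = model:forward(input)   -- predicted boundary / corner (and 3D box) maps
torch.save('./result/pred_1.t7', output)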

Optimization

  • To add Manhattan constraints and optimize for a better layout, open MATLAB, then:
cd matlab
panoOptimization.m

This loads saved predictions from the network output and performs sampling.

Evaluation

We provide the MATLAB evaluation code for 3D IoU (compute3dOcc_eval.m) and for generating the 2D layout labels (getSegMask_eval.m) used to evaluate layout pixel accuracy.
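For reference, 3D IoU here is the standard volumetric intersection-over-union of the predicted and ground-truth layout boxes (a standard definition stated for convenience, not notation taken from the repo):

\mathrm{IoU}_{3\mathrm{D}}(B_{\mathrm{pred}}, B_{\mathrm{gt}})
  = \frac{\operatorname{vol}(B_{\mathrm{pred}} \cap B_{\mathrm{gt}})}
         {\operatorname{vol}(B_{\mathrm{pred}} \cup B_{\mathrm{gt}})}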

Extension to perspective images

  • To train our full approach:
th driver_persp_joint_lsun_type.lua

Note that this loads the pretrained joint corner and boundary prediction branch.

  • To train the joint boundary and corner prediction branch:
th driver_persp_joint_lsun.lua

Note that this loads the pretrained boundary prediction branch.

  • To train the boundary prediction branch:
th driver_persp_lsun.lua
  • To test the trained network:
th testNet_persp_full_lsun.lua

Note that this saves the predicted boundary, corner and room type in the "result/" folder. To get the exact 2D corner positions on the image, run the following in MATLAB:

cd matlab
getLSUNRes.m

You need to download the LSUN data and the toolbox to run through the experiment.

Miscellaneous

  • For a reference to the labeling tool, please check panoLabelTool.m.

Citation

Please cite our paper if you use this code or data.

@inproceedings{zou2018layoutnet,
  title={LayoutNet: Reconstructing the 3D Room Layout from a Single RGB Image},
  author={Zou, Chuhang and Colburn, Alex and Shan, Qi and Hoiem, Derek},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={2051--2059},
  year={2018}
}

Contributors

alexcolburn, zouchuhang

Issues

What is the scale of the reconstructed 3D room to the actual room?

Hi,

Thanks for sharing your work. This is very interesting and has some great use cases.

With regard to the result, is it possible to find the measurements of the actual room from the 3D reconstruction? Or in other words, how accurate is the 3D reconstruction in terms of wall width or length measurements compared to the actual room?

Could you share the expected validation loss during training?

So far I have trained only the first 2 steps (I don't really care about the box prediction, just the edge and corner detection).

First I trained using driver_pano_edg.lua and reached a validation loss of 0.12333107 after 3260 iterations, which then stopped improving for the remaining ~4700 iterations.

Then, using this model, I trained with driver_pano_joint.lua and reached a validation loss of 0.20790252 after 1480 iterations, which then stopped improving for the remaining ~6500 iterations.

It seems to produce results that are not as good as the supplied pretrained model.

What is the expected validation loss in each step?

Creating Ground Truth

Hi, thank you for your wonderful work. I would like to create a new dataset for this architecture and was wondering how you created the GT.

I am having trouble interpreting the mat files for the ground truth and how to reproduce them for a series of new images.

Thank you so much for your time in advance, and I sincerely appreciate whatever insights you are able to give.

Seems like gradient computation is not right, correct me if I am wrong

d_aob_x0 = (2*x0-1)*n_v_ao*n_v_bo + dot(v_ao, v_bo) * (x0*n_v_bo/n_v_ao + (x0-1)*n_v_ao/n_v_bo);

Lines 44 and 45 (and the later lines) should be:
d_aob_x0 = (2*x0-1)*n_v_ao*n_v_bo/(sqrt(1-cos(b_aob)*cos(b_aob))+eps) + dot(v_ao, v_bo) * (x0*n_v_bo/n_v_ao + (x0-1)*n_v_ao/n_v_bo);
d_aob_x0 = -d_aob_x0 /n_v_ao/n_v_ao/n_v_bo/n_v_bo;

It looks like only a scale difference, but the optimization steps decrease.
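For context, here is the chain-rule step that produces the extra factor, assuming the quantity being differentiated is the angle itself rather than its cosine (my reading of the issue, not a statement from the authors):

\theta_{aob} = \arccos\!\left(\frac{v_{ao}\cdot v_{bo}}{\lVert v_{ao}\rVert\,\lVert v_{bo}\rVert}\right),
\qquad
\frac{\partial\theta_{aob}}{\partial x_0}
  = -\frac{1}{\sqrt{1-\cos^2\theta_{aob}}}\;
    \frac{\partial}{\partial x_0}\!\left(\frac{v_{ao}\cdot v_{bo}}{\lVert v_{ao}\rVert\,\lVert v_{bo}\rVert}\right)

So the 1/(sqrt(1-cos(b_aob)*cos(b_aob))+eps) term corresponds to the arccos derivative, and dropping it rescales the gradient by a position-dependent factor.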

question about perspective training

I want to train the perspective model from scratch, but I can't find info_edg_stack_tr_lsun_640_d6_sig20_trname.t7. Can you tell me where I can get this dataset?

3d layout construction

Hello @zouchuhang @alexcolburn,
I already checked issue #3 but I still cannot fully understand the output.
Since I do not have any background in 3D environments, could you please explain some more details, such as which tool to use to generate the mesh and wrap the panorama, which variables to use, etc.?

Training does not converge

Hello,
Thank you for sharing the great work.
I tried the edge training 'th driver_pano_edg.lua' following your guidance, but the loss value does not converge. I checked the code and the data but cannot find the reason. Could you give me some advice?

The output is as follows:

done
414
Uploaded training
46
Uploaded validation
start training
update param, loss = 0.64732569, gradnorm = 6.7421e+00
update param, loss = 2.26082158, gradnorm = 9.6880e-01
update param, loss = 2.10377669, gradnorm = 1.5707e-03
update param, loss = 2.14185762, gradnorm = 0.0000e+00
update param, loss = 2.16074014, gradnorm = 0.0000e+00
update param, loss = 2.17253232, gradnorm = 0.0000e+00
...
update param, loss = 2.05997038, gradnorm = 0.0000e+00
update param, loss = 2.17931867, gradnorm = 0.0000e+00
update param, loss = 2.07385349, gradnorm = 0.0000e+00
iteration 8000, loss = 2.07385349, gradnorm = 0.0000e+00
validation loss = 2.14375149

Original mapping between dataset ids and panoContext names

Hi

What is the mapping between the images in the .t7 files (from data.zip) and their original names in the panoContext dataset (and Stanford 2D-3D)?

I tried to look in the file ./gt/panoContext_train.txt, but it doesn't match.
For example:
./gt/panoContext_train.txt[1] says pano_aurfmkmrmsfgau.png
but it's actually the file
./data/panoContext_img_train.t7[{{336},{},{},{}}]

and
./data/panoContext_img_train.t7[{{1},{},{},{}}]
is actually the file pano_93a57c28c5e11bb9c96f944c2a649f2b.jpg, which appears as
./gt/panoContext_train.txt[364]

Is there some way to find the original mapping between the datasets?

Also, do you have the rotation matrix that was used to align each image? Or should I just run getManhattanAndAlign.m on the images to get those exact images?

Some questions about dataset

Hello,

Thank you for open-sourcing this great work.
Recently I implemented LayoutNet in PyTorch (https://github.com/sunset1995/pytorch-layoutnet) and have some questions about the ground-truth data:

  1. The number of lines in gt/pano*.txt is 1063 while the number of gt/label_cor/**/*.mat files is 1028. Can you please supply the missing ground truth?
  2. If I'm not wrong, the labeled corners under label_cor/**/* should be scaled before visualization or evaluation. The scales for stanford2d3d and panoContext are 4.0 and 8.890625 respectively. Is that correct?

Thank you and have a nice day :)

Issues reproducing network - resulting in a different size

Hello! :)
I'm trying to implement your network with Keras, and it seems that the network I built has many more parameters than the number you reported in your paper.
You've mentioned that you were able to train the entire network with a batch size of 20 using 12 GB.
(I've even seen in #5 that you mentioned using 10.969 GB.)
It seems that my GPU has 10.57 GiB available, but when I try to use a batch size of 15, which by my calculation should fit, the GPU cannot fit the model into its memory.
I've even removed the 3D regression part and it still fails.

So I wanted to ask if you could help me see if I've made any implementation error :)
Could you for example provide the total number of parameters of your model?
And perhaps even better, provide the number of parameters per layer? :)

Here is the description of my implementation :)
I've defined the network as follows:

def layoutnet():
    # Encoder
    input = layers.Input(shape=(6, 512, 1024))  # chw format
    e1 = conv2d_relu_pool(input, 32, name='e1')  # [?, 32, 256, 512]
    e2 = conv2d_relu_pool(e1, 64, name='e2')  # [?, 64, 128, 256]
    e3 = conv2d_relu_pool(e2, 128, name='e3')  # [?, 128, 64, 128]
    e4 = conv2d_relu_pool(e3, 256, name='e4')  # [?, 256, 32, 64]
    e5 = conv2d_relu_pool(e4, 512, name='e5')  # [?, 512, 16, 32]
    e6 = conv2d_relu_pool(e5, 1024, name='e6')  # [?, 1024, 8, 16]
    e7 = conv2d_relu_pool(e6, 2048, name='e7')  # [?, 2048, 4, 8]
    encoder = Model(input, e7)

    # Top decoder branch
    td1 = up_conv2d_relu(e7, 1024, 'td1')  # [?, 8, 16, 1024]
    td1 = layers.Concatenate(axis=1, name='td1_concat')([td1, e6])  # [?, 1024 * 2, 8, 16]

    td2 = up_conv2d_relu(td1, 512, name='td2')  # [?, 16, 32, 512]
    td2 = layers.Concatenate(axis=1, name='td2_concat')([td2, e5])  # [?, 512 * 2, 16, 32]

    td3 = up_conv2d_relu(td2, 256, name='td3')  # [?, 32, 64, 256]
    td3 = layers.Concatenate(axis=1, name='td3_concat')([td3, e4])  # [?, 256 * 2, 32, 64]

    td4 = up_conv2d_relu(td3, 128, name='td4')  # [?, 64, 128, 128]
    td4 = layers.Concatenate(axis=1, name='td4_concat')([td4, e3])  # [?, 128 * 2, 64, 128]

    td5 = up_conv2d_relu(td4, 64, name='td5')  # [?, 128, 256, 64]
    td5 = layers.Concatenate(axis=1, name='td5_concat')([td5, e2])  # [?, 64 * 2, 128, 256]

    td6 = up_conv2d_relu(td5, 32, name='td6')  # [?, 256, 512, 32]
    td6 = layers.Concatenate(axis=1, name='td6_concat')([td6, e1])  # [?, 32 * 2, 256, 512]

    td7 = up_conv2d_relu(td6, 3, name='td7')  # [?, 512, 1024, 3]
    td = layers.Activation('sigmoid')(td7)
    top_decoder = Model(input, td)

    # Bottom decoder branch
    bd1 = layers.Convolution2D(1024, (3, 3), (1, 1), padding='same', activation='relu', name='bd1_conv'+'_conv')(top_decoder.get_layer('td1_upsample').output)  # [?, 1024, 8, 16]
    bd1 = layers.Concatenate(axis=1, name='bd1_concat')([bd1, td1])  # [?, 1024 * 3, 8, 16]

    bd2 = up_conv2d_relu(bd1, 512, name='bd2')  # [?, 16, 32, 512]
    bd2 = layers.Concatenate(axis=1, name='bd2_concat')([bd2, td2])  # [?, 512 * 3, 16, 32]

    bd3 = up_conv2d_relu(bd2, 256, name='bd3')  # [?, 32, 64, 256]
    bd3 = layers.Concatenate(axis=1, name='bd3_concat')([bd3, td3])  # [?, 256 * 3, 32, 64]

    bd4 = up_conv2d_relu(bd3, 128, name='bd4')  # [?, 64, 128, 128]
    bd4 = layers.Concatenate(axis=1, name='bd4_concat')([bd4, td4])  # [?, 128 * 3, 64, 128]

    bd5 = up_conv2d_relu(bd4, 64, name='bd5')  # [?, 128, 256, 64]
    bd5 = layers.Concatenate(axis=1, name='bd5_concat')([bd5, td5])  # [?, 64 * 3, 128, 256]

    bd6 = up_conv2d_relu(bd5, 32, name='bd6')  # [?, 256, 512, 32]
    bd6 = layers.Concatenate(axis=1, name='bd6_concat')([bd6, td6])  # [?, 32 * 3, 256, 512]

    bd7 = up_conv2d_relu(bd6, 1, name='bd7')  # [?, 512, 1024, 1]
    bd = layers.Activation('sigmoid')(bd7)
    bot_decoder = Model(input, bd)

    # 3D box
    # reg = layers.Concatenate(axis=1, name='reg_input')([td, bd])  # [?, 4, 512, 1024]
    # reg = conv2d_relu_pool(reg, 8, name='reg_downsample1')  # [?, 8, 256, 512]
    # reg = conv2d_relu_pool(reg, 16, name='reg_downsample2')  # [?, 16, 128, 256]
    # reg = conv2d_relu_pool(reg, 32, name='reg_downsample3')  # [?, 32, 64, 128]
    # reg = conv2d_relu_pool(reg, 64, name='reg_downsample4')  # [?, 64, 32, 64]
    # reg = conv2d_relu_pool(reg, 128, name='reg_downsample5')  # [?, 128, 16, 32]
    # reg = conv2d_relu_pool(reg, 256, name='reg_downsample6')  # [?, 256, 8, 16]
    # reg = conv2d_relu_pool(reg, 512, name='reg_downsample7')  # [?, 512, 4, 8]
    # reg = layers.Flatten(name='reg_flatten')(reg)
    # reg = layers.Dense(1024, activation='relu', name='reg_dense1')(reg)
    # reg = layers.Dense(256, activation='relu', name='reg_dense2')(reg)
    # reg = layers.Dense(64, activation='relu', name='reg_dense3')(reg)
    # reg = layers.Dense(6, name='reg_dense4')(reg)

    # model = Model(input, [top_decoder, bot_decoder, reg])
    model = Model(input, [td, bd])
    return model

And the number of parameters per layer is shown here:

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 6, 512, 1024) 0                                            
__________________________________________________________________________________________________
e1_conv (Conv2D)                (None, 32, 512, 1024 1760        input_1[0][0]                    
__________________________________________________________________________________________________
e1_pool (MaxPooling2D)          (None, 32, 256, 512) 0           e1_conv[0][0]                    
__________________________________________________________________________________________________
e2_conv (Conv2D)                (None, 64, 256, 512) 18496       e1_pool[0][0]                    
__________________________________________________________________________________________________
e2_pool (MaxPooling2D)          (None, 64, 128, 256) 0           e2_conv[0][0]                    
__________________________________________________________________________________________________
e3_conv (Conv2D)                (None, 128, 128, 256 73856       e2_pool[0][0]                    
__________________________________________________________________________________________________
e3_pool (MaxPooling2D)          (None, 128, 64, 128) 0           e3_conv[0][0]                    
__________________________________________________________________________________________________
e4_conv (Conv2D)                (None, 256, 64, 128) 295168      e3_pool[0][0]                    
__________________________________________________________________________________________________
e4_pool (MaxPooling2D)          (None, 256, 32, 64)  0           e4_conv[0][0]                    
__________________________________________________________________________________________________
e5_conv (Conv2D)                (None, 512, 32, 64)  1180160     e4_pool[0][0]                    
__________________________________________________________________________________________________
e5_pool (MaxPooling2D)          (None, 512, 16, 32)  0           e5_conv[0][0]                    
__________________________________________________________________________________________________
e6_conv (Conv2D)                (None, 1024, 16, 32) 4719616     e5_pool[0][0]                    
__________________________________________________________________________________________________
e6_pool (MaxPooling2D)          (None, 1024, 8, 16)  0           e6_conv[0][0]                    
__________________________________________________________________________________________________
e7_conv (Conv2D)                (None, 2048, 8, 16)  18876416    e6_pool[0][0]                    
__________________________________________________________________________________________________
e7_pool (MaxPooling2D)          (None, 2048, 4, 8)   0           e7_conv[0][0]                    
__________________________________________________________________________________________________
td1_upsample (UpSampling2D)     (None, 2048, 8, 16)  0           e7_pool[0][0]                    
__________________________________________________________________________________________________
td1_conv (Conv2D)               (None, 1024, 8, 16)  18875392    td1_upsample[0][0]               
__________________________________________________________________________________________________
td1_concat (Concatenate)        (None, 2048, 8, 16)  0           td1_conv[0][0]                   
                                                                 e6_pool[0][0]                    
__________________________________________________________________________________________________
bd1_conv_conv (Conv2D)          (None, 1024, 8, 16)  18875392    td1_upsample[0][0]               
__________________________________________________________________________________________________
td2_upsample (UpSampling2D)     (None, 2048, 16, 32) 0           td1_concat[0][0]                 
__________________________________________________________________________________________________
bd1_concat (Concatenate)        (None, 3072, 8, 16)  0           bd1_conv_conv[0][0]              
                                                                 td1_concat[0][0]                 
__________________________________________________________________________________________________
td2_conv (Conv2D)               (None, 512, 16, 32)  9437696     td2_upsample[0][0]               
__________________________________________________________________________________________________
bd2_upsample (UpSampling2D)     (None, 3072, 16, 32) 0           bd1_concat[0][0]                 
__________________________________________________________________________________________________
td2_concat (Concatenate)        (None, 1024, 16, 32) 0           td2_conv[0][0]                   
                                                                 e5_pool[0][0]                    
__________________________________________________________________________________________________
bd2_conv (Conv2D)               (None, 512, 16, 32)  14156288    bd2_upsample[0][0]               
__________________________________________________________________________________________________
td3_upsample (UpSampling2D)     (None, 1024, 32, 64) 0           td2_concat[0][0]                 
__________________________________________________________________________________________________
bd2_concat (Concatenate)        (None, 1536, 16, 32) 0           bd2_conv[0][0]                   
                                                                 td2_concat[0][0]                 
__________________________________________________________________________________________________
td3_conv (Conv2D)               (None, 256, 32, 64)  2359552     td3_upsample[0][0]               
__________________________________________________________________________________________________
bd3_upsample (UpSampling2D)     (None, 1536, 32, 64) 0           bd2_concat[0][0]                 
__________________________________________________________________________________________________
td3_concat (Concatenate)        (None, 512, 32, 64)  0           td3_conv[0][0]                   
                                                                 e4_pool[0][0]                    
__________________________________________________________________________________________________
bd3_conv (Conv2D)               (None, 256, 32, 64)  3539200     bd3_upsample[0][0]               
__________________________________________________________________________________________________
td4_upsample (UpSampling2D)     (None, 512, 64, 128) 0           td3_concat[0][0]                 
__________________________________________________________________________________________________
bd3_concat (Concatenate)        (None, 768, 32, 64)  0           bd3_conv[0][0]                   
                                                                 td3_concat[0][0]                 
__________________________________________________________________________________________________
td4_conv (Conv2D)               (None, 128, 64, 128) 589952      td4_upsample[0][0]               
__________________________________________________________________________________________________
bd4_upsample (UpSampling2D)     (None, 768, 64, 128) 0           bd3_concat[0][0]                 
__________________________________________________________________________________________________
td4_concat (Concatenate)        (None, 256, 64, 128) 0           td4_conv[0][0]                   
                                                                 e3_pool[0][0]                    
__________________________________________________________________________________________________
bd4_conv (Conv2D)               (None, 128, 64, 128) 884864      bd4_upsample[0][0]               
__________________________________________________________________________________________________
td5_upsample (UpSampling2D)     (None, 256, 128, 256 0           td4_concat[0][0]                 
__________________________________________________________________________________________________
bd4_concat (Concatenate)        (None, 384, 64, 128) 0           bd4_conv[0][0]                   
                                                                 td4_concat[0][0]                 
__________________________________________________________________________________________________
td5_conv (Conv2D)               (None, 64, 128, 256) 147520      td5_upsample[0][0]               
__________________________________________________________________________________________________
bd5_upsample (UpSampling2D)     (None, 384, 128, 256 0           bd4_concat[0][0]                 
__________________________________________________________________________________________________
td5_concat (Concatenate)        (None, 128, 128, 256 0           td5_conv[0][0]                   
                                                                 e2_pool[0][0]                    
__________________________________________________________________________________________________
bd5_conv (Conv2D)               (None, 64, 128, 256) 221248      bd5_upsample[0][0]               
__________________________________________________________________________________________________
td6_upsample (UpSampling2D)     (None, 128, 256, 512 0           td5_concat[0][0]                 
__________________________________________________________________________________________________
bd5_concat (Concatenate)        (None, 192, 128, 256 0           bd5_conv[0][0]                   
                                                                 td5_concat[0][0]                 
__________________________________________________________________________________________________
td6_conv (Conv2D)               (None, 32, 256, 512) 36896       td6_upsample[0][0]               
__________________________________________________________________________________________________
bd6_upsample (UpSampling2D)     (None, 192, 256, 512 0           bd5_concat[0][0]                 
__________________________________________________________________________________________________
td6_concat (Concatenate)        (None, 64, 256, 512) 0           td6_conv[0][0]                   
                                                                 e1_pool[0][0]                    
__________________________________________________________________________________________________
bd6_conv (Conv2D)               (None, 32, 256, 512) 55328       bd6_upsample[0][0]               
__________________________________________________________________________________________________
bd6_concat (Concatenate)        (None, 96, 256, 512) 0           bd6_conv[0][0]                   
                                                                 td6_concat[0][0]                 
__________________________________________________________________________________________________
td7_upsample (UpSampling2D)     (None, 64, 512, 1024 0           td6_concat[0][0]                 
__________________________________________________________________________________________________
bd7_upsample (UpSampling2D)     (None, 96, 512, 1024 0           bd6_concat[0][0]                 
__________________________________________________________________________________________________
td7_conv (Conv2D)               (None, 3, 512, 1024) 1731        td7_upsample[0][0]               
__________________________________________________________________________________________________
bd7_conv (Conv2D)               (None, 1, 512, 1024) 865         bd7_upsample[0][0]               
__________________________________________________________________________________________________
activation (Activation)         (None, 3, 512, 1024) 0           td7_conv[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 1, 512, 1024) 0           bd7_conv[0][0]                   
==================================================================================================
Total params: 94,347,396
Trainable params: 94,347,396
Non-trainable params: 0

How to convert rotEdge (a variable in getManhattanAndAlign.m) to .t7?

I have successfully run your code getManhattanAndAlign.m and tested testNet_pano_full.lua on your demo, but I want to try it on my own picture.
I see your code
lne_ts = torch.load('./data/panoContext_line_test.t7')
and
img_ts = torch.load('./data/panoContext_img_test.t7')

It needs a .t7 file, but how can I convert a MATLAB variable to a .t7 file? Should it be saved to an image first and then saved to a .t7 file? Thank you very much for your help.

preprocessPano requires pano_edge_tr_1024

I'm trying the MATLAB scripts, but preprocessPano.m requires pano_edge_tr_1024.
Could you show how to run it properly?

>> preprocessPano
pano_aadmuaxyxouqic
Error using load
Unable to read file '..\data\pano_edge_tr_1024\vp\pano_aadmuaxyxouqic.mat'. No
such file or directory.

Error in preprocessPano (line 59)
    load(['..\data\pano_edge_tr_1024\vp\' im_name '.mat']);

Experiment result is not consistent with the result reported in the paper

Hi, thanks a lot for sharing your work!
I downloaded your full approach model pretrained on the panoContext dataset and used it to predict on the test set of the panoContext dataset. However, I find that the results from your pretrained model are 73.85, 1.07 and 3.40 respectively, while the results in the paper are 74.48, 1.06 and 3.34 respectively. I am not sure whether the pretrained model can obtain the results reported in the paper or whether I did something wrong...
Can you help me? Thanks a lot!

json to mat

Hello, I have used the PanoAnnotator to label some panorama images, and I get the .json files. How can I get the .mat files in label_cor like yours? (label_cor only contains the corner coordinates.)

Max pooling is directly followed by upsampling

The network's structure contains max pooling immediately followed by upsampling.
Maybe I'm missing something, but it doesn't seem to make any sense, and just removing it should improve results.

local pool7 = nn.SpatialMaxPooling(2,2,2,2)(conv7_relu)
local unpool00 = nn.SpatialUpSamplingNearest(2)(pool7)

Is there something specific that this structure addresses?

What is the ground-truth mask used during training?

Hi

I saw in the file train_pano_joint.lua that you use some kind of mask to increase the loss at some positions.
I couldn't find any reference to that either in the paper or elsewhere in the repository.

Could you please explain what the mask is, how it is generated and why it is needed?

Thanks

    gtMsk = torch.mul(gtMsk, 4)
    gtMsk = gtMsk:cuda()
    gtMsk_w = torch.cmul(loss_d_1, gtMsk)
    loss_d_1 = torch.add(gtMsk_w, loss_d_1)
    gt2Msk = torch.mul(gt2Msk, 4)
    gt2Msk = gt2Msk:cuda()
    gt2Msk_w = torch.cmul(loss_d_2, gt2Msk)
    loss_d_2 = torch.add(gt2Msk_w, loss_d_2)

Some images fail at the optimization step

Hi, really good work!
When I evaluated some images at the optimization step, it did not work any more. I mean most of the images I used work, but some of them do not, for example the one below. Can you help me? :D

Undefined function or variable 'cor_fn_t'.
Error in samplingPanoBox (line 38)
            line_can(line_n+1:end,score_id(line_id)) =cor_fn_t(2*score_id(line_id),1);

3D ground truth interpretation

Hello,

First, thank you for sharing this great work. A quick question about the panoContext_box_train.t7 tensor:

The paper mentions 6 ground-truth 3D parameters: sw, sl, sh, tx, tz, r_theta. The first 6 elements in the box tensor above (box[{{1},{1},{1,6}}]), which I believe contain those parameters for the first example image, read:

sw = -0.5154072972870558
sl = -0.6748731674025037
sh = -1.316387492900166
tx = -0.24216556285261603
tz = -0.2114205765327388
r_theta = 0.08283438070600802

A naive interpretation would suggest that the room is almost 3x higher than it is wide. Is there a reason for the negative scale factors? Any guidance on interpretation would be much appreciated.
