
idr's Introduction

Multiview Neural Surface Reconstruction
by Disentangling Geometry and Appearance

This repository contains an implementation for the NeurIPS 2020 paper Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance.

The paper introduces the Implicit Differentiable Renderer (IDR): a neural network architecture that simultaneously learns the 3D geometry, appearance, and cameras from a set of 2D images. By disentangling geometry and appearance, IDR is able to produce high-fidelity 3D surface reconstructions learned solely from masked 2D images and rough camera estimates.

Installation Requirements

The code is compatible with python 3.7 and pytorch 1.2. In addition, the following packages are required:
numpy, pyhocon, plotly, scikit-image, trimesh, imageio, opencv, torchvision.

You can create an anaconda environment called idr with the required dependencies by running:

conda env create -f environment.yml
conda activate idr

Usage

Multiview 3D reconstruction

Data

We apply our multiview surface reconstruction model to real 2D images from the DTU MVS repository. The data for the 15 scans, including the manually annotated masks and the noisy initializations for the trainable-cameras setup, can be downloaded using:

bash data/download_data.sh 

For more information on the data convention and how to run IDR on new data, please have a look at data convention.

We used our method to generate 3D reconstructions in two different setups:

Training with fixed ground truth cameras

For training IDR run:

cd ./code
python training/exp_runner.py --conf ./confs/dtu_fixed_cameras.conf --scan_id SCAN_ID

where SCAN_ID is the id of the DTU scene to reconstruct.

Then, to produce the meshed surface, run:

cd ./code
python evaluation/eval.py  --conf ./confs/dtu_fixed_cameras.conf --scan_id SCAN_ID --checkpoint CHECKPOINT [--eval_rendering]

where CHECKPOINT is the epoch you wish to evaluate, or 'latest' to take the most recent epoch. Turning on --eval_rendering will additionally produce train-image reconstructions and evaluate their PSNR.

Training with trainable cameras with noisy initializations

For training IDR with cameras optimization run:

cd ./code
python training/exp_runner.py --train_cameras --conf ./confs/dtu_trained_cameras.conf --scan_id SCAN_ID

Then, to evaluate camera accuracy and to produce the meshed surface, run:

cd ./code
python evaluation/eval.py  --eval_cameras --conf ./confs/dtu_trained_cameras.conf --scan_id SCAN_ID --checkpoint CHECKPOINT [--eval_rendering]

Evaluation on pretrained models

We have uploaded trained IDR models, and you can run the evaluation using:

cd ./code
python evaluation/eval.py --exps_folder trained_models --conf ./confs/dtu_fixed_cameras.conf --scan_id SCAN_ID  --checkpoint 2000 [--eval_rendering]

Or, for trained cameras:

python evaluation/eval.py --exps_folder trained_models --conf ./confs/dtu_trained_cameras.conf --scan_id SCAN_ID --checkpoint 2000 --eval_cameras [--eval_rendering]

Disentanglement of geometry and appearance

For transferring the appearance learned from one scene to unseen geometry, run:

cd ./code
python evaluation/eval_disentanglement.py --geometry_id GEOMETRY_ID --appearance_id APPEARANCE_ID

This script produces novel views that combine the geometry of the model trained on scan GEOMETRY_ID with the appearance of the model trained on scan APPEARANCE_ID.

Citation

If you find our work useful in your research, please consider citing:

@article{yariv2020multiview,
title={Multiview Neural Surface Reconstruction by Disentangling Geometry and Appearance},
author={Yariv, Lior and Kasten, Yoni and Moran, Dror and Galun, Meirav and Atzmon, Matan and Basri, Ronen and Lipman, Yaron},
journal={Advances in Neural Information Processing Systems},
volume={33},
year={2020}
}

Related papers

Here are related works on implicit neural representation from our group:

idr's People

Contributors

lioryariv, ykasten



idr's Issues

Variable light sources

In the paper, the surface light field assumes, to my understanding, that the light source is fixed in the scene, and the DTU MVS dataset uses fixed lights. So, would moving light sources (like a light attached to the camera) worsen the performance of the method?

On a similar topic, what effect would including images with the same camera poses but different lighting conditions (position or intensity) have on the quality of the 3D reconstructions?

Supervise by normal maps

Hi, thanks for the great work!

I wonder whether this differentiable renderer is suitable for supervising the SDF with normal maps.
Since computing normals would require another gradient computation, would that cause any trouble?

Thanks!
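In principle the normal of an SDF is just the gradient of the network output with respect to the query points, obtained with an extra autograd pass. A minimal sketch of such a loss, where sdf_net and gt_normals are generic placeholders rather than objects from this repository:

import torch

def normal_supervision_loss(sdf_net, points, gt_normals):
    # Normals are the normalized gradient of the SDF w.r.t. the input points.
    points = points.detach().requires_grad_(True)
    sdf = sdf_net(points)                                   # (N, 1) signed distances
    grad = torch.autograd.grad(sdf.sum(), points, create_graph=True)[0]
    normals = torch.nn.functional.normalize(grad, dim=-1)
    # create_graph=True keeps this second gradient differentiable, so the
    # normal loss can backpropagate into the SDF weights.
    return (normals - gt_normals).abs().mean()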

RGB loss calculation

Hi, I'm just going through the RGB loss calculation. To my understanding, the highlighted part should be the L1 loss divided by float(object_mask.sum()), i.e. averaged over the number of valid pixels, instead of also counting the pixels outside of the object mask. Hope someone can correct me if I'm wrong, thanks.
[screenshot of the rgb_loss code]
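For context, the two normalizations being contrasted look like this in isolation (a sketch with placeholder tensors, not the repository's loss code):

import torch

def rgb_l1(rgb_pred, rgb_gt, object_mask):
    # Accumulate the L1 error only on pixels inside the object mask.
    diff = torch.abs(rgb_pred[object_mask] - rgb_gt[object_mask]).sum()
    loss_over_all_pixels = diff / float(object_mask.shape[0])   # average over all sampled pixels
    loss_over_mask_pixels = diff / float(object_mask.sum())     # average over valid (masked) pixels only
    return loss_over_all_pixels, loss_over_mask_pixels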

rgb_loss stuck at zero

Hi! Thank you for the great work. I'm trying to get the system to work on my own images, but I'm facing some difficulties. I obtain the pose information using an ArUco board on which I place my object. The algorithm moves past the ray-tracing step properly, but then it outputs an RGB loss of 0.0. Did anyone encounter something similar? I am very grateful for any type of response :)

Softplus activation

Hey there,

This is not an issue, just a question. Using the softplus activation is significantly slower than using, for instance, ReLU. Is there any technical (theoretical or practical) reason why you chose softplus over other, more efficient activations?

Thank you!
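For reference, a Softplus with a large beta behaves almost exactly like ReLU away from zero but stays smooth everywhere, which matters when gradients of the SDF themselves enter the loss (e.g. the Eikonal term). A tiny sketch of the comparison; the beta value here is illustrative, not taken from this repo's config:

import torch
import torch.nn as nn

softplus = nn.Softplus(beta=100)  # smooth approximation of ReLU
relu = nn.ReLU()

x = torch.linspace(-0.2, 0.2, 9)
print(relu(x))
print(softplus(x))  # nearly identical to ReLU, but with non-zero curvature around 0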

cameras.npz file

In order to run IDR on new data [DIR PATH], you need to supply image and mask directories, as well as a cameras.npz file containing the appropriate camera projection matrices.

Dear authors,

Do we need to convert the camera projection matrices in the DTU dataset from txt to npy files?

Thanks

The meaning of `preprocess_cameras.py`

I've got the sense that world_mat_ is the projection matrix converting 3D world coordinates to 2D pixel coordinates, and scale_mat_ is the normalization transform that moves and scales the 3D object in world space so that it is centered at the origin and inside a unit sphere.
My question is: why do we need to do that in such a sophisticated way? I checked the normalization matrices on DTU and data_BlendedMVS, and it seems that for each object the normalization matrices are the same for all views. What kind of scenario is this preprocessing for? Thank you :-)
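For illustration, the two matrices are typically composed at load time so that the projection maps directly from the normalized (unit-sphere) space to pixels; a sketch following the cameras.npz key names, not a verbatim copy of scene_dataset.py:

import numpy as np

camera_dict = np.load('cameras.npz')
i = 0  # view index
world_mat = camera_dict['world_mat_%d' % i]  # 4x4 world-to-pixel projection K[R|t]
scale_mat = camera_dict['scale_mat_%d' % i]  # 4x4 map from the normalized space back to world space
# Composing them projects points that live in the normalized space (object centered
# at the origin, inside the unit sphere) directly to pixel coordinates:
P = (world_mat @ scale_mat)[:3, :4]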

Preprocessing data problem

Hi, thanks for your awesome work.
I intended to use preprocess_cameras.py to preprocess custom data that can be run in threestudio, but it always raises the error below.
Can you provide some insight into where to modify?

My input camera is proj_mtx_new, and the mask is extracted from the alpha channel of a PNG.
Thanks in advance!

import numpy as np
import torch

c2w = transform[:3, :4]                  # transform is a synthesized camera-to-world matrix
mv = torch.linalg.inv(transform)         # model-view (world-to-camera) matrix
fovy = np.random.uniform(self.fovy_range_min, self.fovy_range_max)
self.resolution = [512, 512]
# Pinhole intrinsics with fovy used as the focal length
K = np.array([
    [fovy, 0, self.resolution[0] / 2, 0],
    [0, fovy, self.resolution[1] / 2, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1]
])
K = torch.from_numpy(K).float()
proj_mtx_new = K @ mv                    # projection matrix passed to preprocess_cameras.py

Number of points:0
/mnt/petrelfs/wangtengfei/code/NeuS2/tools/preprocess_cameras.py:220: RuntimeWarning: Mean of empty slice.
  centroid = np.array(all_Xs).mean(axis=0)
/opt/conda/lib/python3.9/site-packages/numpy/core/_methods.py:192: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
/opt/conda/lib/python3.9/site-packages/numpy/core/_methods.py:269: RuntimeWarning: Degrees of freedom <= 0 for slice
  ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/opt/conda/lib/python3.9/site-packages/numpy/core/_methods.py:226: RuntimeWarning: invalid value encountered in divide
  arrmean = um.true_divide(arrmean, div, out=arrmean,
/opt/conda/lib/python3.9/site-packages/numpy/core/_methods.py:261: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "/mnt/petrelfs/wangtengfei/code/NeuS2/tools/preprocess_cameras.py", line 312, in <module>
    get_normalization(opt.source_dir, opt.use_linear_init, dataset)
  File "/mnt/petrelfs/wangtengfei/code/NeuS2/tools/preprocess_cameras.py", line 266, in get_normalization
    normalization, all_Xs = get_normalization_function(Ps, mask_points_all, number_of_normalization_points, number_of_cameras,masks_all)
  File "/mnt/petrelfs/wangtengfei/code/NeuS2/tools/preprocess_cameras.py", line 225, in get_normalization_function
    centroid,scale,all_Xs = refine_visual_hull(masks_all, Ps, scale, centroid)
  File "/mnt/petrelfs/wangtengfei/code/NeuS2/tools/preprocess_cameras.py", line 151, in refine_visual_hull
    points = points + center[:, np.newaxis]
IndexError: invalid index to scalar variable.
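For what it's worth, "Number of points:0" means no mask pixels survived the visual-hull step, which usually indicates that world_mat_{i} is not a world-to-pixel projection of the form K[R|t]. A rough sketch of the convention the preprocessing expects, where K, R, t are placeholders for per-view intrinsics and world-to-camera extrinsics:

import numpy as np

def make_world_mat(K, R, t):
    # x_pixel ~ K @ (R @ X_world + t), stored as a 4x4 matrix.
    P = np.eye(4)
    P[:3, :4] = K @ np.concatenate([R, t.reshape(3, 1)], axis=1)
    return P

# cams = {('world_mat_%d' % i): make_world_mat(K_i, R_i, t_i) for each view i}
# np.savez('cameras.npz', **cams)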

How to render video?

Hi folks,

Thanks for this awesome work. I'm wondering how to render the "spiral" videos demonstrated in the project page?

Any reply would be appreciated!

Best,
Peihao

Confused about the implementation of equation 3 in the paper

[image: Equation 3 in the paper]

[image: the corresponding code in this repo]
Comparing the implementation to the description of Equation 3, I am confused by the line `surface_dists_theta = surface_dists - (surface_output - surface_sdf_values) / surface_points_dot`. Here surface_output is f(c + t0 v; θ) and is equal to surface_sdf_values, which has a detach operation in `surface_sdf_values = output[:N, 0:1].detach()`, so the numerator is f(c + t0 v; θ) - f(c + t0 v; θ).detach(). To be consistent with the formula, shouldn't the code be `surface_dists_theta = surface_dists - surface_output / surface_points_dot`?

Colored mesh

How can I create a .ply file with colored mesh?
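One common way to do this (an assumption, not a feature of this repo) is to evaluate an RGB value per mesh vertex and export it with trimesh, which writes per-vertex colors into the .ply; color_fn below is a placeholder for whatever produces colors at 3D points:

import numpy as np
import trimesh

def export_colored_mesh(vertices, faces, color_fn, path='mesh_colored.ply'):
    # color_fn maps (N, 3) points to (N, 3) RGB values in [0, 1]
    colors = (np.clip(color_fn(vertices), 0.0, 1.0) * 255).astype(np.uint8)
    mesh = trimesh.Trimesh(vertices=vertices, faces=faces, vertex_colors=colors)
    mesh.export(path)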

Why transpose the rotation matrix when loading K, R and t from P?

In the function load_K_Rt_from_P(), the rotation matrix is transposed after being obtained from cv2.decomposeProjectionMatrix(). As far as I know, the result of cv2.decomposeProjectionMatrix() is already the actual rotation matrix and translation and does not need to be changed, so why is the rotation matrix transposed while the other outputs are left unchanged?
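To illustrate the question: cv2.decomposeProjectionMatrix returns the world-to-camera rotation, whereas a camera pose matrix stores the camera-to-world rotation, which for a rotation matrix is simply the transpose; the translation column of the pose is the camera center, recovered from the homogeneous output. A small sketch of that conversion, not a verbatim copy of load_K_Rt_from_P:

import cv2
import numpy as np

def pose_from_projection(P):
    # P: 3x4 projection with x_pixel ~ K @ [R | t] @ X_world
    K, R, t, *_ = cv2.decomposeProjectionMatrix(P)
    K = K / K[2, 2]
    cam_center = (t[:3] / t[3]).reshape(3)  # camera center in world coordinates
    pose = np.eye(4)
    pose[:3, :3] = R.T                      # world-to-camera R -> camera-to-world
    pose[:3, 3] = cam_center
    return K, pose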

Train the model with blender synthetic data

Hi, I am trying to train the network with a synthetic dataset from Blender and trying to figure out the exact meaning of 'scale_mats, world_mats' in scene_dataset.py line 41. I also downloaded the original DTU dataset, and it seems to provide only the intrinsic and extrinsic camera parameters. I want to know how you preprocessed the data; is it possible to upload the data-processing code used in the paper?

Why not apply positional encoding to positions in the rendering network?

Hi,

I have a quick question regarding the design choice made in this work. In the rendering network, positional encoding is applied to view directions but not positions. This is quite different from NeRF, where they positional-encode the 3d positions. I just wonder if there's any special consideration behind this choice? Thank you.

Running on Colmap Data

Thanks for the great work. I was trying to run it on some images captured with my camera. I used COLMAP to get the world matrices, but it gives the following error when running preprocess_cameras.py:

preprocess_cameras.py:87: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  mask_points_all.append(np.stack((xs,ys,np.ones_like(xs))).astype(np.float))
preprocess_cameras.py:45: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here. Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  cur_l_1=Fj0 @ np.array([curx,cury,1.0]).astype(np.float)
Number of points:0
preprocess_cameras.py:169: RuntimeWarning: Mean of empty slice.
  centroid = np.array(all_Xs).mean(axis=0)
/home/kshitiz/miniconda3/envs/idr/lib/python3.7/site-packages/numpy/core/_methods.py:189: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
/home/kshitiz/miniconda3/envs/idr/lib/python3.7/site-packages/numpy/core/_methods.py:263: RuntimeWarning: Degrees of freedom <= 0 for slice
  keepdims=keepdims, where=where)
/home/kshitiz/miniconda3/envs/idr/lib/python3.7/site-packages/numpy/core/_methods.py:223: RuntimeWarning: invalid value encountered in true_divide
  subok=False)
/home/kshitiz/miniconda3/envs/idr/lib/python3.7/site-packages/numpy/core/_methods.py:254: RuntimeWarning: invalid value encountered in double_scalars
  ret = ret.dtype.type(ret / rcount)
Traceback (most recent call last):
  File "preprocess_cameras.py", line 244, in <module>
    get_normalization(opt.source_dir, opt.use_linear_init)
  File "preprocess_cameras.py", line 207, in get_normalization
    normalization,all_Xs=get_normalization_function(Ps, mask_points_all, number_of_normalization_points, number_of_cameras,masks_all)
  File "preprocess_cameras.py", line 175, in get_normalization_function
    centroid,scale,all_Xs = refine_visual_hull(masks_all, Ps, scale, centroid)
  File "preprocess_cameras.py", line 101, in refine_visual_hull
    points = points + center[:, np.newaxis]
IndexError: invalid index to scalar variable.

Cuda OOM in plots.py during mesh extraction

Hi,

Thank you for sharing the code. I trained the model, but when I run eval.py to extract the mesh using the latest checkpoint I get a CUDA OOM in "../code/utils/plots.py", line 197, in get_surface_high_res_mesh, which is the line
grid_points = torch.cat(g, dim=0)
I tried to change the following code snippet in plots.py, but it did not help. My GPU has 11GB of memory. I also tried using 2 GPUs, but I think the eval code uses only a single GPU. I also tried torch.cuda.empty_cache() to clear the CUDA cache, but I still get the OOM error. Could you please provide some guidance on how to fix the OOM problem? Thanks.

g = []
for i, pnts in enumerate(torch.split(grid_points, 100000, dim=0)):
    g.append(torch.bmm(vecs.unsqueeze(0).repeat(pnts.shape[0], 1, 1).transpose(1, 2),
                       pnts.unsqueeze(-1)).squeeze() + s_mean)
grid_points = torch.cat(g, dim=0)
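One mitigation that sometimes helps (an assumption, not an official fix) is to use smaller chunks and accumulate them on the CPU, so the full high-resolution grid never sits on the GPU at once; a variant of the snippet above:

g = []
for pnts in torch.split(grid_points, 25000, dim=0):
    chunk = torch.bmm(vecs.unsqueeze(0).repeat(pnts.shape[0], 1, 1).transpose(1, 2),
                      pnts.unsqueeze(-1)).squeeze() + s_mean
    g.append(chunk.cpu())                 # keep transformed chunks on the CPU
grid_points = torch.cat(g, dim=0)         # move back to the GPU in small batches when querying the SDF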

Questions about Equation 3 in the paper

Hi There,

Thanks for sharing this great work.

I have difficulties understanding Equation 3 in the paper relating to $\hat{x}$, the differentiable ray-surface intersection.
According to the code below in implicit_differentiable_renderer.py, I don't think it is implemented as described in the paper.

        points = (cam_loc.unsqueeze(1) + dists.reshape(batch_size, num_pixels, 1) * ray_dirs).reshape(-1, 3)
        sdf_output = self.implicit_network(points)[:, 0:1]
        ray_dirs = ray_dirs.reshape(-1, 3)
        
        if self.training:
                .....
                surface_points = points[surface_mask]
                surface_output = sdf_output[surface_mask]
                .....
                output = self.implicit_network(surface_points)
                surface_sdf_values = output[:N, 0:1].detach()

From the code above, sdf_output and surface_sdf_values are the output of implicit_network with the same input. Hence, they should always be the same. Indeed, when I print surface_output - surface_sdf_values, the value is close to zero (within floating-point precision).

So, does it make sense to update $t(\theta)$ as the following in the sample_network? :

surface_dists_theta = surface_dists - (surface_output - surface_sdf_values) / surface_points_dot

I literally spent days trying to understand this code but failed. Could you elaborate more on this?

Thank you.
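For what it's worth, the value - value.detach() pattern is a standard autograd trick: the expression is numerically zero, so surface_dists_theta equals surface_dists at the current θ, but its gradient with respect to θ is the gradient of the non-detached term, which is exactly the θ-dependence Equation 3 asks for. A tiny self-contained sketch of the pattern with toy numbers, unrelated to the networks in this repo:

import torch

theta = torch.tensor(2.0, requires_grad=True)
f = theta ** 2                 # plays the role of surface_output (depends on theta)
f0 = f.detach()                # plays the role of surface_sdf_values (same value, no grad)
t0 = torch.tensor(5.0)         # current intersection distance, a constant
grad_dot = torch.tensor(4.0)   # stand-in for (grad f . v), treated as a constant

t_theta = t0 - (f - f0) / grad_dot
print(t_theta.item())          # 5.0: the value is unchanged because f - f0 == 0
t_theta.backward()
print(theta.grad.item())       # -1.0: d t/d theta = -(df/d theta)/grad_dot = -4/4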

Training time?

Great work! Thanks for sharing the code. Do you have a rough estimate of how long it takes your method to train a single DTU scene? Thank you.

Quantitative Evaluation on DTU

Hi @lioryariv, this is great work!

I noticed that the published code doesn't contain scripts for quantitatively evaluating the reconstructed surface against the DTU ground truth. Your packaged DTU dataset doesn't contain ground-truth shapes either.

Could you please share your scripts for quantitative evaluation of DTU scenes? If you can also share the ground truth shapes (packaged with your DTU data) to use with your scripts, that would be amazing!

Thanks and Regards,
Shubham

Mask loss implementation difference

Hi, thank you for this repository.
I have a question about Mask loss (Eq. 7). According to the paper:
[image: Eq. 7 from the paper]

But in the implementation, sigmoid is not used:
[image: mask loss code]

Could you please explain why?
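One possible explanation (an assumption, since it depends on the exact loss call used): if the code passes -alpha * sdf as logits to a "with_logits"-style cross entropy, the sigmoid from Eq. 7 is applied inside the loss function rather than written out explicitly. A minimal sketch of the equivalence:

import torch
import torch.nn.functional as F

alpha = 5.0
sdf = 0.1 * torch.randn(16)
mask_gt = torch.randint(0, 2, (16,)).float()

logits = -alpha * sdf
explicit = F.binary_cross_entropy(torch.sigmoid(logits), mask_gt)   # sigmoid written out
with_logits = F.binary_cross_entropy_with_logits(logits, mask_gt)   # sigmoid folded into the loss
print(torch.allclose(explicit, with_logits, atol=1e-6))             # True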

Color mesh

Hello, is there some way to construct a colored mesh?

Will the batch size affect the result?

Hi, I noticed that the default batch size is 1, and I was thinking that increasing the batch size would speed up training. But after I set the batch size to 25, the result looks blurry compared to the result trained with batch size 1. I am wondering what might be wrong in my setting. Have you encountered such an issue before?

Understanding the "SampleNetwork" module

Hi there, really nice work!
I'm trying to understand this line of code; it looks different from Eq. 3 of the main paper. surface_output and surface_sdf_values seem to be the same value, and it would be really helpful if you could explain it more. Thank you in advance!

Abnormal transferring appearance results

Thanks for the great work. On my side, I am trying to train one appearance and transfer it to other unseen geometry objects, but the results are not correct.
[image: transfer results]
May I know if anyone else has also encountered this issue? Thanks in advance.

Geometric initialisation

I would like to ask: how should the geometric initialisation be modified for the case where the output of the positional encoding does not contain the raw XYZ and consists only of sines and cosines of XYZ at various frequencies? Thank you.

I was able to train IDR, but face normals have an incorrect shape and mesh rendering is not performed

I have run the code, which took nearly 5:30 hours with SCAN_ID=24, and I saw a message saying that the face normal shape is incorrect at epoch 2000.

Can you give a solution for how to resolve this?

face_normals incorrect shape, ignoring!

RayTracing: object = 1157/2048, secant on 722/741.

dtu_fixed_cameras_24 [2000] (0/49): loss = 0.11349464952945709, rgb_loss = 0.11191627383232117, eikonal_loss = 0.010356109589338303, mask_loss = 5.427672022051411e-06, alpha = 1600.0, lr = 2.5e-05

RayTracing: object = 1359/2048, secant on 634/653.

  1. What is CHECKPOINT? Is it any integer, e.g. epoch 2000? If so, I did run it, but there was an error saying there is no directory for eval_rendering. Can you briefly say what it is?

What I ran is: python evaluation/eval.py --conf ./confs/dtu_fixed_cameras.conf --scan_id 24 --checkpoint 2000

This results in 'face_normals incorrect shape' for 2 steps; after that it reports that the face normal is zero for the rest and stops executing.

Hoping to know what can be done further.

Pytorch CPU version

I see that the environment.yml file pins pytorch=1.2.0=py3.7_cuda10.0.130_cudnn7.6.2_0.

Can the code run with the PyTorch CPU version as well?

conda install -c pytorch pytorch-cpu

Thanks.

Question about evaluation

Thanks for sharing this great work! I have downloaded the ground-truth 3D point clouds from the DTU dataset. However, the point clouds contain the table, and some of them are not complete. How did you compute the chamfer distance between your generated meshes and the ground-truth point clouds?
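For reference, a bare-bones symmetric Chamfer distance between sampled mesh points and a ground-truth cloud can be computed with a KD-tree; note that the official DTU evaluation additionally crops the table using observation masks and downsamples the points, which this sketch omits:

import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(points_pred, points_gt):
    # Mean nearest-neighbour distance in both directions.
    d_pred_to_gt, _ = cKDTree(points_gt).query(points_pred)
    d_gt_to_pred, _ = cKDTree(points_pred).query(points_gt)
    return d_pred_to_gt.mean(), d_gt_to_pred.mean()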

Additional data in the cameras.npz files

I noticed that the cameras.npz file provided for the DTU dataset has some additional matrices called "camera_mat_{i}". Are these the 3x3 calibration matrices K from P = K[R | t]?

I tried to extract K, R and t from P (which are "world_mat_{i}") using opencv:
K, R, t, _, _, _ = cv2.decomposeProjectionMatrix(P2)

The K that I get from here does not match with the corresponding "camera_mat_{i}". Also the t values that I get are confusing to me. t has the shape (4, 1) and usually the 4th entry is always 1, but that is not happening here.

UPDATE:
The 't' values were actually okay. I forgot to divide it by the 4th entry

How to get scale matrix

Hello, I am a big fan of this project.
I wonder how you obtained the scale matrix for the DTU dataset.
If you don't mind, could you share the code you used to compute the scale matrix?

I think I solved it
Thank you

How to run on different set of images?

The given README shows how to run the code on the DTU dataset, but if we want to run it on some other set of images, how do we do it?
What data is needed, other than the images, to make the code run?

Cameras Coordinate System

Hi there,

First of all congrats on the project and on the code. It's really neat 🔥 .

I'm trying to run the code on my own scene and, even though the losses are printed with somewhat meaningful values, they do not decrease, and the learnt scene doesn't resemble what it should at all. I was wondering if the issue could be related to the coordinate system used.

My coordinate system is this way:

  • Cameras look towards z.
  • Usually, if the object is at the origin, camera positions are somewhere in -z.

Could it be that you are assuming an OpenGL-like coordinate system, and that this is the reason it doesn't work?

  • Here cameras look towards -z.
  • Cameras are usually located around +z if the object is at the origin.

Any thoughts will be very welcome.

Thanks in advance!
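In case it helps, converting an OpenGL-style camera-to-world pose (camera looks down -z, y up) to an OpenCV-style one (camera looks down +z, y down) is a per-camera axis flip; a small sketch assuming your poses are camera-to-world matrices:

import numpy as np

FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])  # flip the camera's local y and z axes

def opengl_to_opencv_pose(c2w_gl):
    # World coordinates are untouched; only the camera's viewing convention changes.
    return c2w_gl @ FLIP_YZ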

GEOMETRY_ID and APPEARANCE_ID failed to get evaluated

I tried to run the last instruction, where for GEOMETRY_ID I gave the trained surface model with surface_world_coordinates.ply, and the same for APPEARANCE_ID. I even moved the generated evals into code and then tried to run it by specifying the location path as evals/dtu_trained_cameras/surface_world_coordinates.ply.

But nothing worked for me. Can you be a bit more clear about the instructions for the execution?
Thank you

ResolvePackageNotFound Problem in Windows

I encountered a "ResolvePackageNotFound" error when attempting to create a conda virtual environment using the environment.yml file.

ResolvePackageNotFound:
  - markupsafe==1.1.1=py37h7b6447c_0
  - libarchive==3.3.3=h5d8350f_5
  - tbb==2020.2=hfd86e86_0
  - libedit==3.1.20181209=hc058e9b_0
  - zeromq==4.3.1=he6710b0_3
  - astropy==3.1.2=py37h7b6447c_0
  - wrapt==1.11.1=py37h7b6447c_0
  - glib==2.56.2=hd408876_0
  - jbig==2.1=hdba287a_0
  - unixodbc==2.3.7=h14c3975_0
  - mkl_random==1.0.2=py37hd81dba3_0
  - secretstorage==3.1.1=py37_0
  - pillow==5.4.1=py37h34e0f95_0
  - hdf5==1.10.4=hb1b8bf9_0
  - pyrsistent==0.14.11=py37h7b6447c_0
  - readline==7.0=h7b6447c_5
  - krb5==1.16.1=h173b8e3_7
  - python==3.7.3=h0371630_0
  - pandas==0.24.2=py37he6710b0_0
  - graphite2==1.3.13=h23475e2_0
  - cffi==1.12.2=py37h2e261b9_1
  - intel-openmp==2019.3=199
  - cairo==1.14.12=h8948797_3
  - pyzmq==18.0.0=py37he6710b0_0
  - get_terminal_size==1.0.0=haa9412d_0
  - libsodium==1.0.16=h1bed415_0
  - lxml==4.3.2=py37hefd8a0e_0
  - pixman==0.38.0=h7b6447c_0
  - pycurl==7.43.0.2=py37h1ba5d50_0
  - sqlite==3.27.2=h7b6447c_0
  - libxml2==2.9.9=he19cac6_0
  - numpy==1.16.2=py37h7e9f1db_0
  - pyqt==5.9.2=py37h05f1152_2
  - llvmlite==0.28.0=py37hd408876_0
  - kiwisolver==1.0.1=py37hf484d3e_0
  - mpc==1.1.0=h10f8cd9_1
  - pywavelets==1.0.2=py37hdd07704_0
  - mkl-service==1.1.2=py37he904b0f_5
  - h5py==2.9.0=py37h7918eee_0
  - lzo==2.10=h49e0be7_2
  - ptyprocess==0.6.0=py37_0
  - curl==7.64.0=hbc83047_2
  - bottleneck==1.2.1=py37h035aef0_1
  - lz4-c==1.8.1.2=h14c3975_0
  - libtiff==4.0.10=h2733197_2
  - libtool==2.4.6=h7b6447c_5
  - statsmodels==0.9.0=py37h035aef0_0
  - libstdcxx-ng==8.2.0=hdf63c60_1
  - numexpr==2.6.9=py37h9e4a6bb_0
  - liblief==0.9.0=h7725739_2
  - fontconfig==2.13.0=h9420a91_0
  - libpng==1.6.36=hbc83047_0
  - libxcb==1.13=h1bed415_1
  - pycosat==0.6.3=py37h14c3975_0
  - scipy==1.2.1=py37h7c811a0_0
  - wurlitzer==1.0.2=py37_0
  - py-lief==0.9.0=py37h7725739_2
  - gevent==1.4.0=py37h7b6447c_0
  - mkl_fft==1.0.10=py37ha843d7b_0
  - yaml==0.1.7=had09818_2
  - xz==5.2.4=h14c3975_4
  - bitarray==0.8.3=py37h14c3975_0
  - mpfr==4.0.1=hdf1c602_3
  - mistune==0.8.4=py37h7b6447c_0
  - libgcc-ng==8.2.0=hdf63c60_1
  - gmp==6.1.2=h6c8ec71_1
  - numba==0.43.1=py37h962f231_0
  - freetype==2.9.1=h8a8886c_1
  - tk==8.6.8=hbc83047_0
  - openssl==1.1.1b=h7b6447c_1
  - lazy-object-proxy==1.3.1=py37h14c3975_2
  - greenlet==0.4.15=py37h7b6447c_0
  - blosc==1.15.0=hd408876_0
  - libcurl==7.64.0=h20c2e04_2
  - libssh2==1.8.0=h1ba5d50_4
  - ruamel_yaml==0.15.46=py37h14c3975_0
  - ncurses==6.1=he6710b0_1
  - pytorch==1.2.0=py3.7_cuda10.0.130_cudnn7.6.2_0
  - msgpack-python==0.6.1=py37hfd86e86_1
  - numpy-base==1.16.2=py37hde5b4d6_0
  - patchelf==0.9=he6710b0_3
  - jpeg==9b=h024ee3a_2
  - bzip2==1.0.6=h14c3975_5
  - cytoolz==0.9.0.1=py37h14c3975_1
  - pango==1.42.4=h049681c_0
  - zstd==1.3.7=h0b5b093_0
  - pcre==8.43=he6710b0_0
  - scikit-image==0.14.2=py37he6710b0_0
  - harfbuzz==1.8.8=hffaf4a1_0
  - scikit-learn==0.20.3=py37hd81dba3_0
  - fribidi==1.0.5=h7b6447c_0
  - libffi==3.2.1=hd88cf55_4
  - mkl==2019.3=199
  - pytables==3.5.1=py37h71ec239_0
  - expat==2.2.6=he6710b0_0
  - tornado==6.0.2=py37h7b6447c_0
  - zlib==1.2.11=h7b6447c_3
  - gmpy2==2.0.8=py37h10f8cd9_2
  - torchvision==0.4.0=cuda100py37hecfc37a_0
  - ninja==1.10.1=py37hfd86e86_0
  - libxslt==1.1.33=h7d1a2b0_0
  - psutil==5.6.1=py37h7b6447c_0
  - icu==58.2=h9c2bf20_1
  - sqlalchemy==1.3.1=py37h7b6447c_0
  - dbus==1.13.6=h746ee38_0
  - gst-plugins-base==1.14.0=hbbd80ab_1
  - pyodbc==4.0.26=py37he6710b0_0
  - gstreamer==1.14.0=hb453b48_1
  - libgfortran-ng==7.3.0=hdf63c60_0
  - pycrypto==2.6.1=py37h14c3975_9
  - pyyaml==5.1=py37h7b6447c_0
  - matplotlib==3.0.3=py37h5429711_0
  - qt==5.9.7=h5867ecd_1
  - cryptography==2.6.1=py37h1ba5d50_0
  - fastcache==1.0.2=py37h14c3975_2
  - sip==4.19.8=py37hf484d3e_0
  - cython==0.29.6=py37he6710b0_0
  - snappy==1.1.7=hbae5bb6_3
  - libuuid==1.0.3=h1bed415_2

Upon further investigation, I suspect the underlying issue may be due to OS-specific dependencies causing incompatibilities. Although I modified the environment.yml based on recommendations from this link, the problem persists.

...>conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: -
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

Is it possible to run idr on Windows, or should I transition to Linux?

Any suggestions welcome!

Images preprocessing

In the data_convention.md file, the preprocess script is said to generate the needed cameras.npz file.
But the code loads an initial cameras.npz file, so how should we build this initial cameras.npz file in order to run the preprocess code on new images?

Quantitative comparison against MVSNet

Hi,

I am trying to get a sense of how this method performs compared with MVSNet and its variants, but it seems you are using different measures on the DTU dataset. So I wonder, do you have some comparisons you can share? Thank you.

The method for mask annotation.

Hello, I'm trying to annotate binary masks for other DTU objects in the same way as in your IDR work. Could you provide your annotation method?

Question about the spherical tracing

Hi, thanks for your great work.
I am reading your code and one thing confuses me.
For the sphere tracing part, according to the paper, it should first find the first (nearer) intersection point, and if it does not converge, the algorithm should start again from the other, farther intersection point.
However, when I read the code, the start_points seem to be farther from the camera center than the end_points, because the start_points take index 0 of "sphere_intersections_points" and the end_points take index 1.
[image: sphere tracing code]
According to the "get_sphere_intersection" function, index 0 stores (-under_sqrt - ray_cam_dot) and index 1 stores (under_sqrt - ray_cam_dot), which means the tensors at index 0 have a larger absolute value.
So it seems that the start_points would be farther than the end_points, which is confusing.
Did I get anything wrong?
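As a quick check of the algebra (not a statement about the code's intent): for camera center $c$, unit direction $v$ and sphere radius $r$, the two intersection distances are

$$ t_{\pm} = -(c \cdot v) \pm \sqrt{(c \cdot v)^2 - (\|c\|^2 - r^2)}. $$

When the camera is outside the sphere and looks towards it, $c \cdot v < 0$, so both roots are positive and the index-0 expression $t_-$ is the smaller of the two distances, i.e. the intersection nearer to the camera.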
