
grf's People

Contributors

alextrevithick, yang7879


grf's Issues

Out of memory (OOM)

My GPU has 24 GB of memory.
I decreased the parameters as you suggested:
--chunk [number of rays processed in parallel, decrease if running out of memory] --netchunk [number of pts sent through network in parallel, decrease if running out of memory]
But it does not help, even when both parameters are set to 1.
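For reference, this is roughly what those flags control in NeRF-style code; the sketch below follows the public nerf-pytorch implementation, not GRF's own code. If the OOM persists even with chunk sizes of 1, the per-image CNN feature extraction (which runs on full images regardless of these flags) is a likely culprit.

import torch

# Sketch of the chunking idiom used in NeRF-style code: split a big batch
# into chunks so that peak GPU memory stays bounded, then concatenate.
def batchify(fn, chunk):
    if chunk is None:
        return fn
    def ret(inputs):
        return torch.cat([fn(inputs[i:i + chunk])
                          for i in range(0, inputs.shape[0], chunk)], 0)
    return ret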

Positional Encoding

I'm using the positional encoder from the NeRF paper to encode my images, after stacking the viewpoints onto the colors as described in the paper, but I can't get the shapes to line up for input to the CNN. In NeRF's implementation, the inputs are flattened before being sent to the encoder.

To give a more concrete example of what I'm talking about, here is my code:

import numpy as np
import torch
from run_nerf_helpers import get_embedder  # from the nerf-pytorch codebase

# concatenate the camera viewpoint C onto the color channels,
# giving inputs of shape [20, 378, 504, 6]
inputs = torch.tensor(np.concatenate([images, np.broadcast_to(np.expand_dims(C, (1, 2)), images.shape)], axis=-1))
# create an embedder with 5 frequency bands, as specified in the paper
embed, input_ch = get_embedder(5, 0)
# flatten to shape [3810240, 6]; not sure this step is required
inputs_flat = torch.reshape(inputs, [-1, inputs.shape[-1]])
# apply the embedding to the flattened inputs; output shape is [3810240, 66]
embedding = embed(inputs_flat)

I'm not really sure where to go from here to feed this into the CNN.
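One possible next step, assuming the CNN expects NCHW input (my guess, not the official pipeline): reshape the embedded features back to image layout and permute the channels.

# reshape [3810240, 66] back to [20, 378, 504, 66], then permute to NCHW
embedding_img = embedding.reshape(20, 378, 504, 66).permute(0, 3, 1, 2)
# embedding_img now has shape [20, 66, 378, 504], suitable for a 2D CNN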

Confusion on section 3.3

I'm rather confused by this section: you define P as a function based on multi-view geometry and then describe two approximations. Do these approximations represent P? I'm also unsure how to implement the approximations beyond checking whether a point falls inside or outside the image; specifically, how do you "duplicate its features to the 3D point"?
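One plausible reading of that phrase (my assumption, since the code was not yet released): project the 3D point into each input view using the camera parameters, then bilinearly sample that view's CNN feature map at the projected pixel and attach ("duplicate") the sampled feature vector to the point. With zero padding, points that project outside the image simply receive zero features, which matches the inside/outside check. A sketch with assumed conventions (OpenCV-style camera; all names illustrative):

import torch
import torch.nn.functional as F

def gather_point_features(points, feat_map, K, w2c):
    # points: [N, 3] world coordinates; feat_map: [1, C, H, W] from the CNN;
    # K: [3, 3] intrinsics; w2c: [4, 4] world-to-camera extrinsics
    N = points.shape[0]
    homog = torch.cat([points, torch.ones(N, 1)], dim=-1)      # [N, 4]
    cam = (w2c @ homog.T).T[:, :3]                             # [N, 3] camera coords
    pix = (K @ cam.T).T                                        # [N, 3]
    pix = pix[:, :2] / pix[:, 2:3].clamp(min=1e-8)             # [N, 2] pixel coords
    H, W = feat_map.shape[-2:]
    # normalize to [-1, 1] for grid_sample; out-of-image points get zeros
    grid = torch.stack([2 * pix[:, 0] / (W - 1) - 1,
                        2 * pix[:, 1] / (H - 1) - 1], dim=-1).view(1, N, 1, 2)
    feats = F.grid_sample(feat_map, grid, align_corners=True)  # [1, C, N, 1]
    return feats[0, :, :, 0].T                                 # [N, C] per-point features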

How to get the unseen category/scene results of Sections 4.2/4.3, and how to train several classes together for generalization?

Hello! Thanks for sharing your great work!
I wonder how to get the rendering results for an unseen category/scene.
Section 4.3 says "We train a single model on randomly selected 4 scenes, i.e., Chair, Mic, Ship, and Hotdog, ....", and I wonder how to train several classes in each image batch (e.g., all the NeRF synthetic datasets together, to get a generalized GRF model). The provided configs each seem to contain only one class per config file.
My guess: if a batch contains 4 classes and each class has 8 views, then the batch has 32 images in total. Alternatively, a batch could contain 32 images of a single class, and the model would see all classes sequentially, one by one.
Can you describe your training configuration for generalization in more detail?

Also, if 2 or 6 views are fed in, are only those 2/6 images the input, or are the corresponding poses also required?
And if possible, could you provide all the data required for Sections 4.1-4.3? That would be the clearest way for me to understand.

Please give me some hint. ;)

Cheers!
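For what it's worth, the first variant above could be realized with a simple multi-class batch sampler; the sketch below is purely my illustration (all names are made up), not the authors' configuration.

import random

# hypothetical: `all_views` maps each class to its list of training view indices
all_views = {c: list(range(100)) for c in ["chair", "mic", "ship", "hotdog"]}

def sample_batch(views_per_class=8):
    # draw views from every class, giving 4 x 8 = 32 images per batch
    batch = []
    for cls, views in all_views.items():
        batch += [(cls, v) for v in random.sample(views, views_per_class)]
    return batch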

Question about the CNN model

Hi Alex,
I noticed that different CNN models are used for different datasets. I wonder whether there were special considerations when designing these CNNs. And if I want to design a CNN for my own dataset, what should I pay attention to?
Thanks!

About Intrinsic Matrix

May I ask how to obtain the intrinsic matrix of a photo if I want to train GRF on my own data? And without the intrinsic matrix, will GRF's performance degrade significantly?
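Not GRF-specific, but for context: intrinsics for your own photos are usually estimated with an SfM tool such as COLMAP, or, if you know the camera's field of view, built directly under a pinhole model. A minimal sketch, assuming square pixels and a centered principal point:

import numpy as np

def intrinsic_from_fov(width, height, fov_x_deg):
    # focal length in pixels from the horizontal field of view
    fx = 0.5 * width / np.tan(0.5 * np.deg2rad(fov_x_deg))
    fy = fx                                # square pixels assumed
    cx, cy = 0.5 * width, 0.5 * height     # principal point at the image center
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])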

Code release time

Thanks for sharing this very interesting work! Do you have an estimate of when the code will be released? I'm trying to decide whether to wait or start implementing it myself.

How do you organize your dataset directory?

I noticed that when training on the ShapeNet dataset, your dataset-loading module uses paths like "train" and "train_val", but this is inconsistent with the raw dataset that Vincent provides. May I ask how you organize your project's dataset folders? Thank you in advance.

Training time

Hi Alex,
Could you tell me how much training time is needed on ShapeNetv2 and the other datasets?

A question about the section 3.5

In Figure 6 in Section 3.5, the input to the MLP is the 3D point feature and the viewpoint (x, y, z) (corresponding to the 3D position in classical NeRF). I wonder whether the 2D viewing direction is also needed as input to the MLP?
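To make the question concrete, here is my own illustration of the two variants (not the paper's actual architecture; all dimensions are assumed): a head that takes only the aggregated per-point feature, versus one that also concatenates a 2D viewing direction.

import torch
import torch.nn as nn

feat_dim, dir_dim = 128, 2   # dimensions assumed for illustration

# variant A: the MLP sees only the aggregated 3D point feature
mlp_a = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, 4))

# variant B: the MLP additionally conditions on the 2D viewing direction
mlp_b = nn.Sequential(nn.Linear(feat_dim + dir_dim, 256), nn.ReLU(), nn.Linear(256, 4))

x = torch.randn(1024, feat_dim)            # per-point features
d = torch.randn(1024, dir_dim)             # per-ray 2D directions
rgba_a = mlp_a(x)                          # without view direction
rgba_b = mlp_b(torch.cat([x, d], dim=-1))  # with view direction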

Some questions about the paper

Hi, thanks for the great work!

I have some questions:

  1. How much computational overhead does the CNN feature extraction introduce? At inference it is perhaps not much, since we only need one forward pass per image and can store the features in a buffer (see the sketch after this list). But at training we must run it on the entire images at every iteration while only training on a very small portion of the rays (800-1000), so I wonder whether this is somewhat inefficient and slow, or whether you have an optimized implementation that accelerates this part.
  2. As for generalization, is it correct to understand that the model only generalizes to objects within the same class (the ShapeNetv2 experiments) with very similar visual and pose settings? For example, if we train on 7 NeRF-synthetic scenes, does it generalize to the 8th?
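Regarding the buffering mentioned in question 1, here is a rough sketch of what I have in mind (my assumption, not the authors' implementation): run the CNN once per input image and reuse the cached feature maps for every subsequent ray batch.

import torch

feat_cache = {}

def get_features(img_id, image, cnn):
    # compute the feature map once per image, then serve it from the cache
    if img_id not in feat_cache:
        with torch.no_grad():
            feat_cache[img_id] = cnn(image.unsqueeze(0))  # [1, C, H', W']
    return feat_cache[img_id]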

A question on Section 4.1 (SHAPENETV2)

Hey, thanks for this great work.
I have a question about Figure 4:

"3) To further demonstrate the advantage of GRF over SRNs, we directly evaluate the trained SRNs model on unseen objects (of the same category) without retraining. For comparison, we also directly evaluate the trained GRF model on the same novel objects. Figure 4 shows the qualitative results. It can be seen that if not retrained, SRNs completely fails to".

Since the SRNs model is not trained on the unseen objects, the latent code z for an unseen object is never optimized, so is it randomly initialized? If so, how can it generate novel views of unseen cars that resemble the GT views? My understanding is that a randomly initialized latent code z would generate unpredictable cars (though similar to the training set), which seems to conflict with the quoted passage. This has confused me for hours.
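For context on how SRNs usually handles this: it is an auto-decoder, so at test time the latent code z for a new object is typically obtained by freezing the trained network and optimizing z against the given views, not by sampling it at random. A minimal sketch of that test-time fitting loop (my illustration; render and target_views are stand-ins, not the SRNs API):

import torch

target_views = torch.rand(2, 3, 64, 64)      # stand-in for the given GT views

def render(z):
    # stand-in for the SRNs renderer, which would ray-march the scene MLP
    return z.mean() * torch.ones(2, 3, 64, 64)

z = torch.zeros(256, requires_grad=True)     # latent code for the unseen object
opt = torch.optim.Adam([z], lr=1e-2)
for step in range(1000):
    loss = ((render(z) - target_views) ** 2).mean()  # photometric loss
    opt.zero_grad()
    loss.backward()
    opt.step()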
