Comments (3)
Another thing that confuses me: the paper says the condition is "scaled by a factor of 5". Is that implemented in the code? I could not find it.
from zero123plus.
- Original parameters. LoRA barely works when the rank is as low as 4, but can work to some extent when the rank is above 64.
- ControlNet generally assumes local spatial correlation between the input condition and the output image, so you will need multi-view control images. For off-the-shelf ControlNets you may want to try https://github.com/haofanwang/ControlNet-for-Diffusers. I am not familiar with T2I-Adapter, so I cannot say for now. The ControlNet is trained after the UNet, the same as regular ControlNets.
- It is not possible. In general, if you do not change the schedule, you will need other tricks such as Gaussian blob initialization (I am referring to the Instant3D paper by Adobe) to provide more global clues. I think changing the schedule is the principled way to go, though. Epsilon models can be used to enhance local details, which is one of the future works I mentioned in the Zero123++ report and am currently working on.
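For a sense of scale on the LoRA rank point, here is a quick parameter count. It assumes a single square linear projection of width 1280 (an illustrative size only; a real UNet has many layers of different widths) and the standard LoRA form W + B @ A, with A of shape (rank, d_in) and B of shape (d_out, rank). It is a rough sketch of how small the trainable update is, not an explanation the author gave for why low ranks underperform.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in a LoRA update B @ A for one linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank)."""
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters updated when fine-tuning the original weight matrix directly."""
    return d_in * d_out


def lora_fraction(d_in: int, d_out: int, rank: int) -> float:
    """LoRA parameters as a fraction of full fine-tuning for the same layer."""
    return lora_param_count(d_in, d_out, rank) / full_param_count(d_in, d_out)


if __name__ == "__main__":
    width = 1280  # illustrative layer width, not taken from Zero123++
    for rank in (4, 16, 64, 128):
        print(f"rank={rank:>3}: {lora_param_count(width, width, rank):>7} params, "
              f"{lora_fraction(width, width, rank):.2%} of full fine-tuning")
```

Besides the parameter budget, the update B @ A has rank at most `rank`, so at rank 4 the change to each layer's weights is confined to a 4-dimensional subspace regardless of the layer width.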
By default, the SD VAE output needs to be multiplied by about 0.18 (`vae.config.scaling_factor`) before being sent into diffusion. We skipped that step for the condition branch (so, relative to the default, it is scaled by roughly 1/0.18, i.e. a factor of about 5), and we have an extra function called `scale_latents` that normalizes the residual by shifting and rescaling the latents according to statistics we computed on Objaverse renders.
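A minimal sketch of the two scalings described above, in plain Python on flat lists so it runs without a deep-learning library. It assumes the standard SD value `scaling_factor = 0.18215`. The name `scale_latents` is borrowed from the comment, but its signature here, the hypothetical helper `fit_latent_stats`, and the use of mean and standard deviation as the "statistics" are illustrative stand-ins, not the constants or code in the zero123plus repository.

```python
import statistics

# Standard Stable Diffusion VAE scaling factor (vae.config.scaling_factor).
SD_SCALING_FACTOR = 0.18215


def to_diffusion_latents(raw_latents, scaling_factor=SD_SCALING_FACTOR):
    """Default path: multiply raw VAE encoder outputs by the scaling factor."""
    return [x * scaling_factor for x in raw_latents]


def fit_latent_stats(samples):
    """Hypothetical helper: estimate one global shift and scale from a set of
    latent samples (e.g. encoded renders)."""
    flat = [x for sample in samples for x in sample]
    shift = statistics.fmean(flat)
    scale = 1.0 / statistics.pstdev(flat)
    return shift, scale


def scale_latents(latents, shift, scale):
    """Shift, then rescale, so the fitted data has ~zero mean and unit std."""
    return [(x - shift) * scale for x in latents]


def unscale_latents(latents, shift, scale):
    """Inverse of scale_latents, used before handing latents back to the VAE."""
    return [x / scale + shift for x in latents]


if __name__ == "__main__":
    raw = [1.0, -2.0, 0.5]
    default = to_diffusion_latents(raw)
    # Ratio between a condition latent that skipped scaling and the default one.
    print("condition vs. default magnitude:", raw[0] / default[0])
    shift, scale = fit_latent_stats([[1.0, 3.0], [5.0, 7.0]])
    print(scale_latents([1.0, 3.0, 5.0, 7.0], shift, scale))
```

Skipping `to_diffusion_latents` on the condition branch leaves those latents larger than the default ones by 1 / 0.18215 ≈ 5.49, which is where "roughly a factor of 5" comes from. In this sketch `unscale_latents` would be applied before decoding with the VAE.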