
muggled_dpt's People

Contributors

heyoeyo

muggled_dpt's Issues

Any consideration to support the Metric Depth models of DepthAnything?

Hi,
I'd like to first thank you for the work you did. I was struggling to get live-camera inference running at a sufficient FPS, and your work made it much faster.
I successfully tried out the VITSmall model, but when I attempt to use the Indoor_Metric_Depth model (from here) I get this error: "NotImplementedError: Bad model type: unknown, no support for this yet!"

Is there any work being done to add support for the metric depth models?
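
(For anyone hitting the same error: one way to investigate is to inspect the checkpoint's state-dict keys, since the "unknown" type means the loader did not recognize the weights. This is only a debugging sketch; the file name and the "model" wrapper key below are assumptions for illustration, not muggled_dpt's actual detection logic.)

import torch

# Hypothetical sketch: peek at the checkpoint keys to see why the model-type
# check falls through to "unknown". File name and wrapper key are assumptions.
ckpt = torch.load("indoor_metric_depth.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

# Metric-depth checkpoints may wrap the DPT weights in an extra prefix or use
# a different output head, which a key-based type check would not recognize.
for key in list(state_dict.keys())[:10]:
    print(key)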

Size of depth .npy file

Hi,

Thanks for your amazing work.

I have one question. I am running the fusion scaling demo on a 640x480 image, but the saved depth .npy file has a size of 504x364.

Is this because the DPT image_preprocessor resizes the input to fit the 14px patch size?

If so, how do we get a depth map that matches the original image size?

Thanks for your help in advance.
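
(A common workaround, assuming the .npy file holds the raw 2D prediction, is to resize the depth map back to the original resolution yourself; the file name and array shape below are placeholders:)

import cv2
import numpy as np

# Sketch: resize a saved depth prediction back to the original camera
# resolution. The model output is smaller because the preprocessor fits the
# input to multiples of the 14px patch size.
depth_small = np.load("depth_prediction.npy")   # e.g. shape (364, 504)
orig_w, orig_h = 640, 480                       # original image size

# Bilinear interpolation is usually fine for dense depth maps
depth_full = cv2.resize(depth_small, (orig_w, orig_h), interpolation=cv2.INTER_LINEAR)
print(depth_full.shape)                         # (480, 640)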

Confusion about network architecture for depth anything v2?

Thank you for your awesome project!
But I am confused by the comment in the depth_anything_v2 architecture code that says features are extracted from the last 4 blocks:

Note that unlike the original DPT models, this model does not output intermediate tokens
from the transformers, instead it outputs the last 4 (consecutive) blocks!

However, in the code implementation, the features are taken from evenly spaced stages through the middle of the transformer, not just the last blocks (Depth Anything v1 is like this). Is it the comment that needs to be fixed?

num_stages = 4
layers_per_stage = int(round(num_blocks / num_stages))
stages_list = []
for _ in range(num_stages):
    one_stage = TransformerStage(layers_per_stage, features_per_token, num_heads)
    stages_list.append(one_stage)
self.stages = nn.ModuleList(stages_list)

Also, is this feature extraction method better or worse than outputting intermediate tokens from the transformer, as the original DPT models do?
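
(To make the two schemes concrete, here is a simplified sketch, not the project's actual code, contrasting "intermediate token" extraction with the staged extraction shown in the snippet above; the class names, parameters, and block indices are made up for illustration, and `blocks` is assumed to be a list of transformer block modules:)

import torch.nn as nn

# Original-DPT style: run every block, keep the hidden states at chosen indices.
class InterleavedExtractor(nn.Module):
    def __init__(self, blocks, take_indices=(2, 5, 8, 11)):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.take_indices = set(take_indices)

    def forward(self, tokens):
        features = []
        for idx, block in enumerate(self.blocks):
            tokens = block(tokens)
            if idx in self.take_indices:
                features.append(tokens)
        return features  # tokens from the selected intermediate blocks

# Staged style (as in the snippet above): split the blocks into consecutive
# stages and return the output at the end of each stage.
class StagedExtractor(nn.Module):
    def __init__(self, blocks, num_stages=4):
        super().__init__()
        per_stage = len(blocks) // num_stages
        self.stages = nn.ModuleList(
            [nn.Sequential(*blocks[i * per_stage:(i + 1) * per_stage])
             for i in range(num_stages)]
        )

    def forward(self, tokens):
        features = []
        for stage in self.stages:
            tokens = stage(tokens)
            features.append(tokens)
        return features  # tokens at the end of each stage

Note that for a 12-block model split into 4 stages of 3 blocks, the stage boundaries fall after blocks 2, 5, 8, and 11 (0-indexed), so the two schemes can end up tapping the same layers.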
