Thanks for this great piece of work! When I change the template code from 'normal' to

size mismatch for decoder_output.0.weight and decoder_output.0.bias about midlevel-reps HOT 3 CLOSED

alexsax commented on September 21, 2024

size mismatch for decoder_output.0.weight and decoder_output.0.bias

from midlevel-reps.

Comments (3)

alexsax commented on September 21, 2024

Hi @IsaacKam--thanks for raising this issue!

Good catch here. It looks like for the segmentation decoders I accidentally set the wrong default output size. The output size should be 64 channels, not 128. I just pushed a fix that you can apply on your end by running the following:

pip uninstall visualpriors
pip install https://github.com/alexsax/midlevel-reps/archive/visualpriors-v0.3.1.zip
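
For readers who want to see how this kind of error arises, here is a minimal sketch. It assumes a toy decoder (`ToyDecoder`, a hypothetical stand-in, not the actual midlevel-reps class) whose final layer is a plain `Conv2d` with 16 input channels; the real layer type and sizes may differ. Only the 64 and 128 channel counts come from the thread.

```python
import torch.nn as nn


class ToyDecoder(nn.Module):
    """Hypothetical stand-in for a decoder; not the midlevel-reps class."""

    def __init__(self, out_channels):
        super().__init__()
        # Named so the state_dict keys are decoder_output.0.weight / .bias.
        self.decoder_output = nn.Sequential(
            nn.Conv2d(16, out_channels, kernel_size=3, padding=1)
        )


# A "checkpoint" saved from a model with 64 output channels.
checkpoint = ToyDecoder(out_channels=64).state_dict()

# A model built with the wrong default (128 channels) cannot load it.
wrong = ToyDecoder(out_channels=128)
error_message = ""
try:
    wrong.load_state_dict(checkpoint)
except RuntimeError as err:
    error_message = str(err)
    print(error_message)

# A model built with the matching size loads the same checkpoint cleanly.
right = ToyDecoder(out_channels=64)
right.load_state_dict(checkpoint)
```

The printed message names both `decoder_output.0.weight` and `decoder_output.0.bias`, because the output channel count determines the shape of each.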

Aside from the shape issue above, I'd also note that the decodings are probably most useful for debugging. Visualizing those outputs will give you confidence that everything is working correctly.

For learning, though, I've found the encodings to be generally more useful than the decodings. This is because the encodings all have a homogeneous shape (8 x 16 x 16), while the decodings can take various forms. For example, segment_unsup2d produces a 64-channel image, while class_object is a 1000-dimensional vector. And using the encodings doesn't really sacrifice anything: I've anecdotally found that downstream performance using the encodings is usually at least as good as, if not better than, using the decodings.
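
Because every encoding has the same 8 x 16 x 16 shape, a single downstream head can accept features from any task. The sketch below illustrates two possible heads on that shape, one that flattens and uses linear layers and one that applies a convolution first. It is only an illustration under stated assumptions: the random tensor stands in for real encodings, and `num_outputs`, the layer widths, and both head designs are hypothetical choices, not a recommendation from the library author.

```python
import torch
import torch.nn as nn

# Random values standing in for real encodings: a batch of 4, each 8 x 16 x 16.
encodings = torch.randn(4, 8, 16, 16)
num_outputs = 10  # hypothetical downstream target size

# Option A: flatten the 8 * 16 * 16 = 2048 values and apply linear layers.
linear_head = nn.Sequential(
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 128),
    nn.ReLU(),
    nn.Linear(128, num_outputs),
)

# Option B: convolve first, keeping the 16 x 16 spatial layout, then pool.
conv_head = nn.Sequential(
    nn.Conv2d(8, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # (N, 32, 16, 16) -> (N, 32, 1, 1)
    nn.Flatten(),             # (N, 32, 1, 1) -> (N, 32)
    nn.Linear(32, num_outputs),
)

linear_out = linear_head(encodings)
conv_out = conv_head(encodings)
print(linear_out.shape, conv_out.shape)
```

Either head works unchanged for any task's encoding, which is the practical benefit of the homogeneous shape. Which one performs better would need to be checked on the downstream task.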

I'm closing this issue for now, but if the above doesn't solve your problem then please feel free to reopen.

IsaacKam commented on September 21, 2024

This is really useful :), thank you for the prompt reply. For learning from the encodings, what would you recommend as the best way to utilise them? I.e., would you flatten them at this point and apply linear layers, or is there a benefit to applying some conv layers here?

IsaacKam commented on September 21, 2024

Hi Alex, it now seems to output a torch.Size([1, 64, 256, 256]) tensor when I use 'segment_unsup2d'. Is this correct? If so, what do the channels represent (different segments?)
