The following code generates a position embedding of shape

Confused about the shape of relative position encoding (axial-deeplab, open issue)

csrhddlam commented on August 13, 2024



Comments (4)

csrhddlam commented on August 13, 2024

> Does (C, W) mean global position encoding?
Note that our positional encoding is relative and shared across the other axis. For example, in w-axis attention, each pixel corresponds to W other pixels and W relative positions, so there are (W, W) relative positional encodings in total.
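A minimal NumPy sketch of how a (W, W) set of relative encodings per channel can come from only 2W − 1 learnable vectors. The helper name `gather_relative_embeddings` and the (query, key) layout are assumptions for illustration, not the axial-deeplab code. Contrast this with a (C, W) absolute encoding, which stores one vector per position rather than one per offset.

```python
import numpy as np

def gather_relative_embeddings(table: np.ndarray, length: int) -> np.ndarray:
    """Hypothetical helper: expand a (C, 2*length - 1) table into (C, length, length).

    Entry [:, i, j] (query i, key j) holds the embedding for offset j - i,
    stored at column (j - i) + length - 1 of the table.
    """
    _, num_offsets = table.shape
    assert num_offsets == 2 * length - 1
    idx = np.arange(length)
    # Shift offsets from [-(length-1), length-1] into valid indices [0, 2*length-2].
    relative_index = idx[None, :] - idx[:, None] + length - 1
    return table[:, relative_index]  # advanced indexing -> (C, length, length)

C, W = 8, 5
rng = np.random.default_rng(0)
table = rng.standard_normal((C, 2 * W - 1))  # randomly initialized parameters
rel = gather_relative_embeddings(table, W)
print(rel.shape)  # (8, 5, 5)
# In w-axis attention the same (C, W, W) tensor would be reused for every row h,
# which is what "shared across the other axis" refers to.
```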


Jensen-Su commented on August 13, 2024

> Does (C, W) mean global position encoding?
> Note that our positional encoding is relative and shared across the other axis. For example, in w-axis attention, each pixel corresponds to W other pixels and W relative positions, so there are (W, W) relative positional encodings in total.

Thanks for your helpful reply. I did mean global position encoding by (C, W).
I am also confused about the following line:

axial-deeplab/lib/models/axialnet.py, line 44 (commit fe1d052):

relative_index = key_index - query_index + kernel_size - 1

The confusion is this: since all the position encodings are initialized randomly, I expected that whichever order we use to index the relative encodings, the results should be similar, so perhaps we could index them in a simpler way. But your use of relative_index suggests you don't think so. What am I missing?
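To see what the quoted line computes, here is a small NumPy sketch. It assumes `query_index` and `key_index` are `arange(kernel_size)` vectors laid out along different axes; only the `relative_index` expression itself is quoted from axialnet.py. The result is a Toeplitz matrix: each diagonal holds a single value, and the `+ kernel_size - 1` shift turns offsets into valid indices into a table of 2·kernel_size − 1 entries.

```python
import numpy as np

kernel_size = 4
# Assumed layout: query positions vary along columns, key positions along rows.
query_index = np.arange(kernel_size)[None, :]  # shape (1, K)
key_index = np.arange(kernel_size)[:, None]    # shape (K, 1)
# Broadcasts to (K, K); entry [k, q] = (k - q) + K - 1, i.e. the shifted offset.
relative_index = key_index - query_index + kernel_size - 1
print(relative_index)
# [[3 2 1 0]
#  [4 3 2 1]
#  [5 4 3 2]
#  [6 5 4 3]]
```

Every (key, query) pair with the same offset reads the same table column, so the ordering is not arbitrary: it encodes which pairs must share a vector.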


phj128 commented on August 13, 2024

They are randomly initialized, but query-key pairs at different relative distances get different relative positional encodings, while pairs at the same relative distance should share the same weights.
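The sketch below contrasts offset-based indexing with a hypothetical "simpler" scheme that gives every (key, query) pair its own independent random vector. Both start random; the difference is the weight-tying constraint, not the initialization. The `shares_by_offset` check and the `free` table are illustrative assumptions, not part of axial-deeplab.

```python
import numpy as np

rng = np.random.default_rng(0)
C, K = 4, 6

# Relative scheme: 2K - 1 learnable vectors, indexed by the key - query offset.
table = rng.standard_normal((C, 2 * K - 1))
q = np.arange(K)[None, :]
k = np.arange(K)[:, None]
relative = table[:, k - q + K - 1]  # (C, K, K), rows = key, cols = query

# Hypothetical "simpler" scheme: one independent random vector per (key, query) pair.
free = rng.standard_normal((C, K, K))

def shares_by_offset(emb: np.ndarray) -> bool:
    """True if every (key, query) pair with the same offset gets the same vector."""
    _, n, _ = emb.shape
    return all(
        np.allclose(emb[:, r, c], emb[:, r + 1, c + 1])
        for r in range(n - 1)
        for c in range(n - 1)
    )

print(shares_by_offset(relative))  # True
print(shares_by_offset(free))      # False
print(2 * K - 1, K * K)            # 11 36 (distinct vectors in each scheme)
```

Because same-offset pairs read the same column, their gradients accumulate into one parameter during training, so the tie persists; the free table has no such constraint and can learn pair-specific (effectively absolute) patterns.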


mcahny commented on August 13, 2024

> Note that our positional encoding is relative and shared across the other axis. For example, in w-axis attention, each pixel corresponds to W other pixels and W relative positions, so there are (W, W) relative positional encodings in total.

If the span size K is smaller than the width W, is the relative position encoding matrix then of size (C, W, K)?
So that it is einsummed with the query as Q (H, (W, C)) * r^q ((W, C), K) -> A (H, W, K), where A is the attention matrix?
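A NumPy sketch of the shape arithmetic proposed in this question, using random arrays. The (W, C, K) layout of `r_q` follows the question's ((W, C), K) notation and is an assumption, not a statement of how axial-deeplab implements spans. The second half adds one further assumption: if the encoding depends only on a slot's offset within the span, the W axis is just a broadcast of a (C, K) table.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, K = 3, 7, 4, 5

q = rng.standard_normal((H, W, C))    # queries, (H, (W, C)) in the question's notation
r_q = rng.standard_normal((W, C, K))  # hypothetical per-position encodings, ((W, C), K)

# For each row h and query position w: dot the C query channels with each of K span slots.
a = np.einsum("hwc,wck->hwk", q, r_q)
print(a.shape)  # (3, 7, 5)

# Assumption: encoding depends only on the offset inside the span, so it is
# identical for every w and the (W, C, K) tensor collapses to a (C, K) table.
table = rng.standard_normal((C, K))
r_shared = np.broadcast_to(table[None], (W, C, K))
a_shared = np.einsum("hwc,wck->hwk", q, r_shared)
assert np.allclose(a_shared, np.einsum("hwc,ck->hwk", q, table))
```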

