Comments (3)
Hi,
Question 1
I want to know what is considered positional encoding when working with images.
Positional encoding takes an xy coordinate in [0, 1] and converts it into a vector of 256 elements. The encodings for x and y are computed in the same way, so for the sake of simplicity let's only look at the x part.
In detr/models/position_encoding.py (lines 30 to 31 in 0fb754c), we create an image tensor which is similar in spirit to meshgrid, but that supports images with different sizes (read: masks) in each batch. This way, we have a grid of xy coordinates, which we normalize afterwards so that they are between 0 and 1 (in this case, we scale by 2 * pi as well, but that's a detail); see detr/models/position_encoding.py, lines 32 to 35 in 0fb754c.
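Then, in detr/models/position_encoding.py (lines 40 to 43 in 0fb754c), we apply standard sine embedding in a vectorized fashion for x and y separately, and concatenate the two results afterwards, yielding the spatial positional embedding.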
The positional embeddings only depend on the feature map shapes and the masks (as there could be padding between different images), and not on the content of the feature maps.
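To make the steps above concrete, here is a minimal NumPy sketch for a single feature map. It is not the code from position_encoding.py: the function name `sine_position_embedding`, the 128 channels per axis (so that x and y together give 256) and the temperature of 10000 are assumptions made for illustration.

```python
import numpy as np

def sine_position_embedding(mask, num_pos_feats=128, temperature=10000.0):
    """Sketch of a 2D sine embedding (hypothetical helper, not DETR's code).

    mask: bool array of shape (H, W), True where the pixel is padding.
    Returns an array of shape (2 * num_pos_feats, H, W).
    """
    not_mask = ~mask
    # Cumulative sums over valid pixels act like a meshgrid of 1-based
    # row/column indices that stops counting inside the padded area.
    y_embed = not_mask.cumsum(axis=0).astype(np.float64)
    x_embed = not_mask.cumsum(axis=1).astype(np.float64)

    # Normalize to [0, 1], then scale by 2 * pi.
    eps = 1e-6
    y_embed = y_embed / (y_embed[-1:, :] + eps) * 2 * np.pi
    x_embed = x_embed / (x_embed[:, -1:] + eps) * 2 * np.pi

    # One frequency per pair of channels (assumes num_pos_feats is even).
    dim_t = np.arange(num_pos_feats)
    dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats)

    pos_x = x_embed[:, :, None] / dim_t  # (H, W, num_pos_feats)
    pos_y = y_embed[:, :, None] / dim_t

    h, w = mask.shape
    # Sine on even channels, cosine on odd channels, interleaved.
    pos_x = np.stack((np.sin(pos_x[:, :, 0::2]), np.cos(pos_x[:, :, 1::2])), axis=3).reshape(h, w, -1)
    pos_y = np.stack((np.sin(pos_y[:, :, 0::2]), np.cos(pos_y[:, :, 1::2])), axis=3).reshape(h, w, -1)

    # Concatenate the y and x parts and move channels first.
    return np.concatenate((pos_y, pos_x), axis=2).transpose(2, 0, 1)

# A 4x6 feature map whose last two columns are padding.
mask = np.zeros((4, 6), dtype=bool)
mask[:, 4:] = True
pos = sine_position_embedding(mask)
print(pos.shape)  # (256, 4, 6)
```

Note that the function only takes the mask as input, which matches the point above: the embedding depends on the shape and the padding, not on the feature values.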
Question 2
How do you calculate masks when using images in transformers?
Those are calculated at Line 299 in 0fb754c.
Basically, every position that corresponds to the zero padding added so that the images have the same size is filled with True in the mask.
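As a rough illustration of that idea, the following NumPy sketch pads a list of images to a common size and builds the matching mask. The helper name `pad_and_mask` and the (C, H, W) layout are assumptions for this example; it is not the code at the referenced line.

```python
import numpy as np

def pad_and_mask(images):
    """Zero-pad images to a common size (hypothetical helper).

    images: list of arrays of shape (C, H_i, W_i) with the same C.
    Returns (batch, mask): batch has shape (B, C, H, W) and mask has
    shape (B, H, W), with True on padded positions.
    """
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    channels = images[0].shape[0]

    batch = np.zeros((len(images), channels, max_h, max_w), dtype=images[0].dtype)
    # Start with everything marked as padding, then clear the valid region.
    mask = np.ones((len(images), max_h, max_w), dtype=bool)
    for i, img in enumerate(images):
        h, w = img.shape[1:]
        batch[i, :, :h, :w] = img
        mask[i, :h, :w] = False
    return batch, mask

# Two images of different sizes end up in one (2, 3, 4, 3) batch.
images = [np.ones((3, 2, 3)), np.ones((3, 4, 2))]
batch, padding_mask = pad_and_mask(images)
print(batch.shape, padding_mask.shape)  # (2, 3, 4, 3) (2, 4, 3)
```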
I believe I have answered your questions, and as such I'm closing the issue, but let us know if you have further questions.
Perfect explanation!
Hi @fmassa, I have one doubt: for the sine positional encoding, what is the input format? Is tensor_list.mask a mask of 0s and 1s, where 1 marks the bounding box area and 0 the area outside the bbox, so that the positional embedding is computed from that mask? Is that right?
I have implemented position encoding in my project to extract spatial positional features. Currently I just use a one-hot encoding obtained by dividing the image into a grid: if the bounding box overlaps a grid cell, that cell is set to one, and otherwise to zero. But I came across this sine positional encoding, so I am planning to add it. If possible, please explain the difference between one-hot encoding with a grid and this positional encoding.
Thanks