[clarify] 1.create_database , 2.sliding_window.json about view-finding-network HOT 7 CLOSED

speculaas commented on August 25, 2024

[clarify] 1.create_database , 2.sliding_window.json

from view-finding-network.

Comments (7)

speculaas commented on August 25, 2024

"how you created the database?"
Sorry, I may not have been clear in my previous comment.

What I want to ask about is your design idea, that is, a general picture of the following:
In the training script,
how are the images arranged in trn.tfrecords,
and how is score(full image) - score(crop) computed as a result of the arrangement you designed?

In particular,
when the images are written into trn.tfrecords, the full image and its crop seem to be right next to each other,
but when score(full) minus score(crop) is computed, these two lines
q = score(feature_vec)
p = tf.matmul(loss_matrix,q)

behave as if the full image and its crop were batch_size apart from each other, as the following script seems to say:
loss_matrix[k,k] = 1
loss_matrix[k,k+batch_size] = -1

Best regards,
JimmyYS


kloppjp commented on August 25, 2024

Hey,

your observation is correct: each entry in the DB has six channels, three for the crop and three for the original. However, these are split and queued separately for batching. At training time, we take one batch of crops and its corresponding batch of originals and concatenate them into a single array; please see also https://github.com/yiling-chen/view-finding-network/blob/master/vfn_train.py#L82
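To make the effect of that ordering concrete, here is a minimal pure-Python sketch of what the two loss_matrix lines quoted in the question do once the batch is laid out as [crops, fulls]. The helper names (build_loss_matrix, matvec) and the score values are hypothetical and only illustrate the indexing; the training script itself uses tf.matmul.

```python
def build_loss_matrix(batch_size):
    """Row k has +1 at column k and -1 at column k + batch_size."""
    matrix = [[0.0] * (2 * batch_size) for _ in range(batch_size)]
    for k in range(batch_size):
        matrix[k][k] = 1.0
        matrix[k][k + batch_size] = -1.0
    return matrix


def matvec(matrix, vec):
    """Plain matrix-vector product, standing in for tf.matmul."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


# Made-up scores for a batch of 3 pairs (illustrative values only).
crop_scores = [2.0, 5.0, 1.0]
full_scores = [9.0, 4.0, 7.0]

# Same ordering as the concatenated batch: all crops first, then all originals.
q = crop_scores + full_scores

# p[k] = q[k] - q[k + batch_size], i.e. the score of crop k minus the score of its original.
p = matvec(build_loss_matrix(3), q)
print(p)
```

So although a crop and its original sit next to each other inside one record, after splitting and concatenating they end up exactly batch_size positions apart, which is what the matrix indexing expects.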

For the sliding window part, @yiling-chen will help you ;)

Jan


speculaas commented on August 25, 2024

Dear Yiling,
After tracing vfn_train.py and create_dbs.py
again, I can see how the full image and the crop are arranged:

When writing to trn.tfrecords, the full image and the crop are indeed next to each other:
img_comb = (np.append(img_crop, img_full ...
and in "def read_and_decode()", image_raw is split along axis=2:
return tf.split(image, 2, 2)

They are then arranged so that "training_images" contains an array of cropped images followed by an array of full images:
crop, full = read_and_decode(
return tf.concat([crops, fulls], 0)

https://www.tensorflow.org/api_docs/python/tf/split
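The arrangement traced above can be reproduced with a small NumPy sketch. This is not the repository's code: np.concatenate and np.split stand in for the np.append and tf.split calls quoted above, and the helper names, image size, and pixel values are assumptions chosen only so the ordering is easy to see.

```python
import numpy as np


def combine(img_crop, img_full):
    """Stack crop and full image along the channel axis: H x W x 6 (assumed layout)."""
    return np.concatenate([img_crop, img_full], axis=2)


def split_record(img_comb):
    """Undo the stacking, mirroring tf.split(image, 2, 2): two H x W x 3 arrays."""
    crop, full = np.split(img_comb, 2, axis=2)
    return crop, full


height, width, batch_size = 4, 4, 3

# Dummy records: crop k is filled with k, its full image with 100 + k.
records = [
    combine(np.full((height, width, 3), k), np.full((height, width, 3), 100 + k))
    for k in range(batch_size)
]

crops, fulls = zip(*(split_record(r) for r in records))

# Mirror tf.concat([crops, fulls], 0): all crops first, then all full images.
batch = np.concatenate([np.stack(crops), np.stack(fulls)], axis=0)

print(batch.shape)  # (2 * batch_size, height, width, 3)
```

With this layout, batch[k] is crop k and batch[k + batch_size] is the full image it was cut from.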

Embarrassingly, question no. 1 turned out to be trivial, and I can now see clearly how the images are arranged in trn.tfrecords.

What remains to be clarified is question no. 2.

Best regards,
JimmyYS


speculaas commented on August 25, 2024

Dear Jan,
Thanks for your help.
I didn't see your response while I was clarifying my own embarrassingly trivial question no. 1,
but I see it now.
Thanks again for helping me so quickly.


yiling-chen commented on August 25, 2024

Hi @speculaas,

sliding_window.json, as its name suggests, is simply a set of sliding windows. :)
It was originally used in another work of mine.
https://github.com/yiling-chen/flickr-cropping-dataset
Since our goal was to provide a fair benchmark between all baseline image croppers, we used a fixed set of candidate windows to let every baseline pick the best crop and compare the accuracy (with ground truth). Note that to enhance the performance of an image cropper, you are welcome to apply more advanced methods to generate good proposal windows before feeding them into the image croppers.

You can find a sample implementation that generates the sliding windows on the fly and evaluates an image cropper with a saliency map here.
https://github.com/yiling-chen/flickr-cropping-dataset/blob/master/baselines/saliency_crop.py
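As a rough illustration of the idea (not the code in saliency_crop.py and not the contents of sliding_window.json), the sketch below generates a fixed set of candidate windows and lets a cropper pick the best one. The scales, the grid size, and the center_score function are all assumptions made up for this example; any real cropper would plug in its own scoring function.

```python
def sliding_windows(width, height, scales=(0.5, 0.75), grid=3):
    """Return (x, y, w, h) candidates: each scale slid over a grid x grid layout.

    grid must be at least 2 so the window can reach both image borders.
    """
    windows = []
    for scale in scales:
        w, h = int(width * scale), int(height * scale)
        for row in range(grid):
            for col in range(grid):
                x = col * (width - w) // (grid - 1)
                y = row * (height - h) // (grid - 1)
                windows.append((x, y, w, h))
    return windows


def center_score(window, width, height):
    """Hypothetical stand-in scorer: prefers windows centered in the image."""
    x, y, w, h = window
    dx = (x + w / 2) - width / 2
    dy = (y + h / 2) - height / 2
    return -(dx * dx + dy * dy)


image_width, image_height = 100, 80
candidates = sliding_windows(image_width, image_height)

# Every cropper ranks the same candidates, so the results stay comparable.
best = max(candidates, key=lambda win: center_score(win, image_width, image_height))
print(len(candidates), best)
```

Because the candidate set is fixed, swapping center_score for another scorer changes only which window wins, not which windows are considered.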


speculaas commented on August 25, 2024

Dear Yiling,
Thanks for your pointer!
Your response is exactly what I was looking for.

I think what I had in mind is a model to suggest a good composition for a camera user.

BR,
JimmyYS


speculaas commented on August 25, 2024

Sorry, my response seemed incomplete:

"I think what I had in mind is a model to suggest a good composition for a camera user."
As a result, when I found your paper, I was looking not only for a ranker but also for something like a crop generator. Then I saw that the crops are pre-generated.
Thanks to your clarification, I now see:

1. how the crops can be generated, and
2. that these pre-generated crops also serve as a fair benchmark.
