Comments (10)

zhuziyu-edward avatar zhuziyu-edward commented on July 18, 2024

Hi, thank you for your interest in our work. Evaluation on these 3D-VL tasks has two settings: results using ground-truth masks and results using predicted masks. On benchmarks like Sr3D and Nr3D, you should use the gt masks for evaluation. On benchmarks like ScanRefer, ScanQA, Scan2Cap, and SQA3D, you should set pc_type to pred.
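
As a rough illustration of the rule above, the choice of pc_type per benchmark could be written as a small lookup. This is only a sketch: the benchmark names and the two values "gt" and "pred" come from this thread, while the helper `pc_type_for` and the two set names are hypothetical and not part of the 3D-VISTA repo.

```python
# Benchmark names and the "gt"/"pred" values are taken from the thread.
# The helper and the set names are hypothetical, for illustration only.
GT_BENCHMARKS = {"sr3d", "nr3d"}
PRED_BENCHMARKS = {"scanrefer", "scanqa", "scan2cap", "sqa3d"}


def pc_type_for(benchmark: str) -> str:
    """Return the mask setting to use when reporting results on a benchmark."""
    name = benchmark.lower()
    if name in GT_BENCHMARKS:
        return "gt"
    if name in PRED_BENCHMARKS:
        return "pred"
    raise ValueError(f"unknown benchmark: {benchmark!r}")


if __name__ == "__main__":
    for bench in ("Sr3D", "ScanQA"):
        print(bench, pc_type_for(bench))
```

How the chosen value is actually passed to the training or evaluation scripts depends on the repo's config files, which are not shown here.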

Best,
Ziyu

from 3d-vista.

dingjiansw101 avatar dingjiansw101 commented on July 18, 2024

Hi Ziyu,

Thanks for your reply. However, I found that the pc_type is set to "gt" during training and testing in the "ScanQA" task.

Best,
Jian Ding

zhuziyu-edward avatar zhuziyu-edward commented on July 18, 2024

Yes, setting pc_type to "gt" makes testing faster (useful for checking model performance only). If you want to submit results to the benchmark, you should change it to "pred" so that they are comparable with other papers.

Best,
Ziyu

dingjiansw101 avatar dingjiansw101 commented on July 18, 2024

Is the data for pc_type "pred" read from the path "./data/scanfamily/save_mask"? How are these files generated? Are they predicted by 3D-VISTA or by another model?

Best,
Jian Ding

zhuziyu-edward avatar zhuziyu-edward commented on July 18, 2024

They are predicted by the Mask3D segmentation model. The masks can be found in issue #12.

Best,
Ziyu

dingjiansw101 avatar dingjiansw101 commented on July 18, 2024

Thanks for the reply.
Have you fine-tuned Mask3D on the ScanQA labels, or did you just use the pretrained Mask3D model?

Best,
Jian Ding

dingjiansw101 avatar dingjiansw101 commented on July 18, 2024

One more question: have you evaluated the "Object localization performance on the ScanQA dataset" task?

zhuziyu-edward avatar zhuziyu-edward commented on July 18, 2024

Hi

  1. We use the pre-trained Mask3D from their repo (ScanNet200 checkpoint).
  2. In our implementation, object localization accuracy is around 56% using the ground-truth mask in ScanQA.
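
For readers new to this metric, here is a minimal sketch of how a localization accuracy over ground-truth masks could be computed. It assumes each sample has one target object and that the model outputs one score per candidate object; the function name and input layout are hypothetical and this is not the repo's actual evaluation code.

```python
from typing import Sequence


def localization_accuracy(
    scores: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Fraction of samples whose top-scoring candidate object is the target.

    scores[i][j] is the (assumed) model score for candidate object j in
    sample i; targets[i] is the index of the annotated target object.
    """
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    if not targets:
        return 0.0
    correct = 0
    for sample_scores, target in zip(scores, targets):
        # Pick the index of the highest-scoring candidate object.
        predicted = max(range(len(sample_scores)), key=sample_scores.__getitem__)
        correct += int(predicted == target)
    return correct / len(targets)


if __name__ == "__main__":
    print(localization_accuracy([[0.1, 0.9], [0.8, 0.2]], [1, 1]))
```

Because the candidates are ground-truth object masks, a correct pick is an exact match and no box-overlap threshold is involved in this sketch.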

Best,
Ziyu

dingjiansw101 avatar dingjiansw101 commented on July 18, 2024

Hi Ziyu,
Thanks for your reply. I still have a few questions since I am new to this area.

  1. What does "using the ground-truth mask" mean? Did you use the ground-truth masks only for evaluation, or did you feed them to the model and have it predict only the localization scores? Is 56% under the metric [email protected]?
  2. Have you included the evaluation code for object localization in the repo?
  3. One more question: the ScanQA paper found that the object localization loss helps the QA task. Did you find the same?
  4. It seems that you used the 607 raw categories from ScanNetV2, whereas the ScanQA paper used 18 categories. I am confused about this. Are the 18 categories merged from the 607? And does the category difference affect QA performance?

Best,
Jian Ding

zhuziyu-edward avatar zhuziyu-edward commented on July 18, 2024

Hi

  1. It means setting pc_type to "gt", so localization accuracy is evaluated regardless of IoU.
  2. Yes, it is the "og_acc" metric in the "eval_qa" function.
  3. We did not conduct rigorous experiments on the effect of this localization loss. It is usually used in ScanQA to prevent overfitting, so we follow this setting and add the loss.
  4. The raw 607 classes are the full set of ScanNet semantics. They can be merged into 18 categories (ScanNet20 with wall and floor removed) or 200 categories (ScanNet200).
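
To make point 4 concrete, merging fine-grained labels into a coarser set is essentially a dictionary lookup followed by dropping the excluded classes. The sketch below uses a tiny made-up mapping purely for illustration; the entries in `TOY_RAW_TO_COARSE` and the function name are hypothetical, and the real 607-to-18 mapping comes from the ScanNet label files rather than from this table.

```python
from typing import Dict, List, Optional, Set

# Toy mapping for illustration only. These entries are hypothetical and
# do not reproduce the actual ScanNet raw-to-coarse label table.
TOY_RAW_TO_COARSE: Dict[str, str] = {
    "office chair": "chair",
    "armchair": "chair",
    "coffee table": "table",
    "wall": "wall",
    "floor": "floor",
}

# Classes removed after merging, as described for the 18-category setting.
DROPPED: Set[str] = {"wall", "floor"}


def merge_labels(
    raw_labels: List[str],
    mapping: Dict[str, str] = TOY_RAW_TO_COARSE,
    dropped: Set[str] = DROPPED,
) -> List[Optional[str]]:
    """Map raw labels to coarse ones; unmapped or dropped labels become None."""
    merged: List[Optional[str]] = []
    for raw in raw_labels:
        coarse = mapping.get(raw)
        merged.append(None if coarse is None or coarse in dropped else coarse)
    return merged


if __name__ == "__main__":
    print(merge_labels(["armchair", "wall", "coffee table", "unknown thing"]))
```

Returning None for unmapped or dropped labels keeps the output aligned with the input, so the caller can decide whether to filter or ignore those objects.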

Best,
Ziyu
