yashkant / sam-textvqa



Official code for the paper "Spatially Aware Multimodal Transformers for TextVQA", published at ECCV 2020.

Home Page: https://yashkant.github.io/projects/sam-textvqa

Python 97.13% Shell 0.13% C 2.74%

eccv language textvqa vision

sam-textvqa's People

Contributors

Stargazers

Watchers

Forkers

cjs0410 amy-hyunji jonpoveda junj1ehx namnaku87 originofamonia vhzy codewithflycat haolibai shwetkm b-matchlsr donglongzi yanfang-research

sam-textvqa's Issues

Question about data files from Dropbox link

Hi!
Are the features in the obj.lmdb file from the Dropbox link extracted with the ResNeXt-152-based Faster R-CNN model? And are the features in the ocr.lmdb file extracted from the Google OCR results?

A file is missing from the datasets on Dropbox

Hello, is a data.mdb file missing from data/stvqa/stvqa_test_obj.lmdb? Thanks for your attention.
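An LMDB environment stored as a directory (the default layout for `lmdb.open`) keeps its records in a file named `data.mdb`, normally next to a `lock.mdb` lock file. The minimal sketch below checks, using only the standard library, whether a downloaded `*.lmdb` folder actually contains a non-empty `data.mdb`. The helper name `check_lmdb_dir` is hypothetical, and the path in the usage line is simply the one from the question.

```python
from pathlib import Path


def check_lmdb_dir(lmdb_dir):
    """Report whether an LMDB directory contains a non-empty data.mdb.

    Hypothetical helper: assumes the default LMDB directory layout, where
    records live in <dir>/data.mdb and the lock file in <dir>/lock.mdb.
    """
    root = Path(lmdb_dir)
    data_file = root / "data.mdb"
    return {
        "dir_exists": root.is_dir(),
        "has_data_mdb": data_file.is_file(),
        "data_mdb_bytes": data_file.stat().st_size if data_file.is_file() else 0,
        "has_lock_mdb": (root / "lock.mdb").is_file(),
    }


if __name__ == "__main__":
    print(check_lmdb_dir("data/stvqa/stvqa_test_obj.lmdb"))
```

A missing or zero-byte `data.mdb` points to an incomplete download; a missing `lock.mdb` alone is usually harmless, since LMDB can create it when the environment is opened with write access.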

Error about cphoc

Hello,

I was trying to run the code, but encountered this issue:
  File "/home/qiyuan/miniconda3/envs/sam/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/qiyuan/miniconda3/envs/sam/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/qiyuan/miniconda3/envs/sam/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/qiyuan/2022spring/sam-textvqa/train.py", line 18, in <module>
    from evaluator import Evaluator
  File "/home/qiyuan/2022spring/sam-textvqa/evaluator.py", line 11, in <module>
    from sam.datasets.metrics import STVQAANLSEvaluator, TextVQAAccuracyEvaluator
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/__init__.py", line 1, in <module>
    from .stvqa_dataset import STVQADataset
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/stvqa_dataset.py", line 7, in <module>
    from sam.datasets.textvqa_dataset import ImageDatabase, TextVQADataset
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/textvqa_dataset.py", line 14, in <module>
    from .processors import *
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/processors.py", line 81, in <module>
    from ..phoc import build_phoc
  File "/home/qiyuan/2022spring/sam-textvqa/sam/phoc/__init__.py", line 1, in <module>
    from .build_phoc import build_phoc # NoQA
  File "/home/qiyuan/2022spring/sam-textvqa/sam/phoc/build_phoc.py", line 3, in <module>
    from .cphoc import build_phoc as _build_phoc_raw
ImportError: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory

Then I tried to run compile.sh, but encountered this error:
cphoc.c:1:10: fatal error: Python.h: No such file or directory
    1 | #include <Python.h>
      |          ^~~~~~~~~~
compilation terminated.
gcc: error: cphoc.o: No such file or directory
rm: cannot remove 'cphoc.o': No such file or directory

So how can I solve this PHOC-related issue?

Regards,
Qiyuan
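The ImportError says the existing cphoc binary was linked against `libpython3.6m.so.1.0`, while the traceback shows a Python 3.8 environment, so the extension most likely has to be recompiled for the interpreter actually in use. The `Python.h` error from compile.sh suggests the compiler cannot find that interpreter's C headers. The sketch below only inspects the running interpreter with the standard `sysconfig` module: its version, where its headers should be, whether `Python.h` is present, and the name of its libpython file. The helper name is hypothetical, and how compile.sh chooses its include path is not shown on this page.

```python
import sys
import sysconfig
from pathlib import Path


def python_build_info():
    """Summarize what compiling a C extension for *this* interpreter would need."""
    include_dir = Path(sysconfig.get_paths()["include"])  # where Python.h should live
    return {
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "executable": sys.executable,
        "include_dir": str(include_dir),
        "has_python_h": (include_dir / "Python.h").is_file(),
        # libpython file name on Linux builds; may be None on Windows
        "ldlibrary": sysconfig.get_config_var("LDLIBRARY"),
    }


if __name__ == "__main__":
    for key, value in python_build_info().items():
        print(f"{key}: {value}")
```

If `has_python_h` is False for a system Python on Debian/Ubuntu, the headers usually come from the matching `pythonX.Y-dev` package; the printed `include_dir` is the path the C compiler needs to see.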

Visual Grounding

Hi,
As mentioned in your paper, would it be possible to share the code for the visual grounding task using your model, or to point me in the right direction?
Thanks

Question about reproducing results

I reproduced the tvqa-c3 baseline and got a final accuracy of about 42.70% on the validation set, but the paper reports 43.9% on the validation set. Are there any details I overlooked, or what could explain the difference?

Questions about the code

Hi,
I read your code carefully, but I have questions about some parts of it.

attention_mask_quadrants: [1,2]
I don't understand what "attention_mask_quadrants" means in the .yml files. Does it mean the model stops attending to the relationship between 1 and 2?
self.matrix_type_map
self.matrix_type_map = {
    "none": "1",
    "share3": "3",
    "share5": "5",
    "share7": "7",
    "share9": "9",
}
If I set it to "none", does that mean the model focuses on only one relationship, or on all of them, like a traditional transformer?

Thanks!

Question about training the model

Hi, thanks for open-sourcing your code.
I ran the code on my server with 62 GB of memory. After running for a while, the training was interrupted.
I found a similar report in a previous issue:
#2 (comment)
How much memory is needed to train this model?
Also, should I convert the dataset into .npy files?

What's the format of the boxes?

Hello,

Thanks for your code. I'm trying to plot boxes on images. What format are the boxes in: xyxy, xywh, or something else?
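The thread does not answer which convention the repo uses, so it has to be checked against the data. The two common conventions are xyxy (x_min, y_min, x_max, y_max) and xywh (x_min, y_min, width, height); stored features are also sometimes normalized to [0, 1] by image size. The helpers below are hypothetical, a minimal sketch for converting between the conventions and for ruling xyxy out.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def xywh_to_xyxy(box: Sequence[float]) -> Box:
    """(x_min, y_min, width, height) -> (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box: Sequence[float]) -> Box:
    """(x_min, y_min, x_max, y_max) -> (x_min, y_min, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def denormalize(box: Sequence[float], img_w: float, img_h: float) -> Box:
    """Scale a box given in [0, 1] image-relative units back to pixels (either convention)."""
    a, b, c, d = box
    return (a * img_w, b * img_h, c * img_w, d * img_h)


def could_be_xyxy(boxes: List[Sequence[float]]) -> bool:
    """Necessary, not sufficient: in xyxy every x_max >= x_min and y_max >= y_min."""
    return all(b[2] >= b[0] and b[3] >= b[1] for b in boxes)
```

If `could_be_xyxy` returns False for any box in a file, that file is not in xyxy; if every value is at most 1.0, the boxes are probably normalized and need `denormalize` before plotting.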

Regards.

ModuleNotFoundError: No module named 'sam.phoc.cphoc'

After I configured the environment, running the code raised this error:
ModuleNotFoundError: No module named 'sam.phoc.cphoc'
How can I run the code on Windows? Thanks!
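`sam.phoc.cphoc` is a compiled C extension: build_phoc.py imports it with `from .cphoc import ...`, and compile.sh builds it with gcc (as the error output in the earlier cphoc issue shows). Python only imports such a module if a file named `cphoc` plus one of the running interpreter's extension suffixes sits in the package folder, so a Linux `.so` build will not load on Windows, which expects `.pyd`. The sketch below, with a hypothetical helper name and assuming the `sam/phoc` folder from the traceback, lists the file names this interpreter would accept and which of them exist.

```python
import importlib.machinery
from pathlib import Path


def find_cphoc_binaries(phoc_dir="sam/phoc"):
    """Return (expected_names, found_paths) for the compiled cphoc module.

    The suffixes come from the running interpreter, e.g.
    ".cpython-38-x86_64-linux-gnu.so" on Linux or ".cp38-win_amd64.pyd" on Windows.
    """
    suffixes = importlib.machinery.EXTENSION_SUFFIXES
    expected = [f"cphoc{suffix}" for suffix in suffixes]
    found = [Path(phoc_dir) / name for name in expected if (Path(phoc_dir) / name).is_file()]
    return expected, found


if __name__ == "__main__":
    expected, found = find_cphoc_binaries()
    print("Python would import any of:", expected)
    print("Found:", [str(p) for p in found] or "nothing, cphoc is not built for this interpreter")
```

An empty "Found" list on Windows means cphoc.c still has to be compiled there with a Windows C toolchain, since compile.sh relies on gcc and rm.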

Visualizing prediction results

Hello, great work, and thanks for open-sourcing your code! I want to visualize the question, the image, and the predicted answer. Is there a function or demo script that takes an image and outputs the prediction for it? Thanks.

Unable to download dataset

In the folder sam_textvqa/data/pretrained_models, the best_model.tar file is corrupt.

Request for the code of the improved M4C baseline in the paper

Hi sir, could you share the code for the "improved M4C baseline" mentioned in the paper? 🙏

Question regarding beam search

Hey Yash,

I noticed that you have turned off beam search in your code. Could you share what the problem is with the beam search code in the repo?

Thanks

Issue running on the TextVQA dataset on an AWS EC2 instance

Hi,
I'm trying to run the pretrained model on an AWS EC2 g4dn.4xlarge instance with 64 GB of RAM and 500 GB of disk space. I tried to run the evaluation command, but the process was killed. I was running with num_workers = 0. When I reran the command, I got an EOF error. Do you have any idea where the problem could be?

I ran this command: python train.py --config configs/train-tvqa-eval-tvqa-c3.yml --pretrained_eval data/pretrained-models/best_model.tar

It loaded this:

Made it all the way here and then the process was killed:

Then when I tried to run the same command, I got this error:

I thought it might be a memory error but I'm not sure.
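A process that is killed without a Python traceback is often the work of the Linux out-of-memory killer, which the kernel log (for example `dmesg`) usually records. One hedged way to see how close a run gets to the 64 GB limit is to log the process's peak resident memory with the standard `resource` module (Unix only). The helper name is hypothetical, and where to call it inside train.py is left open.

```python
import resource
import sys


def peak_rss_gib():
    """Peak resident set size of this process in GiB (Unix only).

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    bytes_used = peak if sys.platform == "darwin" else peak * 1024
    return bytes_used / 1024 ** 3


if __name__ == "__main__":
    print(f"peak RSS so far: {peak_rss_gib():.2f} GiB")
```

Logging this value every few hundred batches shows whether memory grows steadily (suggesting data being accumulated in RAM) or jumps at one step, such as when the feature files are loaded.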

Thank you for your consideration.

About the missing preprocessed STVQA data

How to extract FRCNN features for rotated OCR bounding boxes

Google Cloud OCR outputs rotated bounding boxes. How do you extract FRCNN features for these rotated bounding boxes of the detected OCR text?
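The thread does not say how the authors handled this. One common workaround, offered as an assumption rather than the repo's method, is to replace each rotated box with the smallest axis-aligned rectangle that encloses its corner points, because RoI pooling in a standard Faster R-CNN works on axis-aligned boxes. The sketch assumes the OCR polygon has already been parsed into (x, y) pixel pairs; the function name is hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def _clip(value: float, upper: float) -> float:
    """Clamp a coordinate into [0, upper]."""
    return min(max(value, 0.0), upper)


def enclosing_xyxy(vertices: Iterable[Point], img_w: float, img_h: float) -> Tuple[float, float, float, float]:
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) covering a rotated polygon,
    clipped to the image so it can be passed to an axis-aligned RoI pooling layer."""
    pts = list(vertices)
    if not pts:
        raise ValueError("polygon has no vertices")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (
        _clip(min(xs), img_w),
        _clip(min(ys), img_h),
        _clip(max(xs), img_w),
        _clip(max(ys), img_h),
    )
```

The trade-off is that steeply rotated text picks up extra background inside the enclosing box; rotated RoI pooling (Detectron2, for instance, provides a rotated RoIAlign layer) avoids that at the cost of a more specialized detector.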
Thanks

Unable to run your code in Colab

Hi Mr @yashkant
First of all thanks for your great code.
I decided to run your code in Colab, so I started by installing the required packages and loading the data and code into Colab, as shown in the attached image.

But when I run your project with the following command, I get the error below. Should I change the config file? I tried changing it as shown below, but that didn't solve the problem.

!python /content/samtextvqa/train.py --config /content/samtextvqa/configs/train-tvqa-eval-tvqa-c3.yml --pretrained_eval /content/samtextvqa/data/pretrained-models/best_model.tar

The error:
  File "/content/samtextvqa/sam/datasets/_image_features_reader.py", line 66, in __init__
    self.env = lmdb.open(
lmdb.Error: /samtextvqa/data/textvqa/tvqa_trainval_obj.lmdb: No such file or directory

What I changed in the config file:
/content/samtextvqa/data/textvqa/tvqa_{}_obj.lmdb
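The config value is a template whose `{}` is filled with a split name ("trainval" in the error). The path in the error starts with /samtextvqa rather than /content/samtextvqa, so it is worth confirming which path actually reaches `lmdb.open`. A minimal standard-library sketch, with a hypothetical helper name, that expands such a template for a list of splits and reports the paths that do not exist:

```python
from pathlib import Path


def missing_feature_dirs(template, splits, repo_root="."):
    """Fill a "{}"-style path template for each split and return the paths that don't exist.

    Relative templates are resolved against repo_root; absolute ones are used as-is.
    """
    missing = []
    for split in splits:
        path = Path(template.format(split))
        if not path.is_absolute():
            path = Path(repo_root) / path
        if not path.exists():
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    print(missing_feature_dirs("/content/samtextvqa/data/textvqa/tvqa_{}_obj.lmdb", ["trainval"]))
```

An empty list for the edited template, combined with the same error, would suggest the run is still reading a different config key or a different config file.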

I'd really appreciate it if you could help me solve this problem.

Regards

Where is save/debug/command.txt?

Q1. What should I do to solve the problem?

After finishing all the setup, I wanted to try the pretrained models, so I ran the command below:
python train.py \
    --config configs/train-tvqa_stvqa-eval-tvqa-c3.yml \
    --pretrained_eval data/pretrained-models/best_model.tar
and got this error message:


Q2. What should I do to solve the problem?

How can I get "wiki.en.bin"?

I ran the command below:
python train.py \
    --config configs/train-stvqa-eval-stvqa-c3.yml \
    --tag debug
and ran into trouble with fastText.

My colleague and I both have trouble with Q2!
(Warning : load_model does not return WordVectorModel or SupervisedModel any more, but a FastText object which is very similar.)
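Judging by its name, wiki.en.bin appears to be the English model from the pretrained Wikipedia word vectors published by the fastText project (an assumption; this page does not say where the repo expects it). The quoted message is only a warning from the `fasttext` Python package: `fasttext.load_model` now returns a `FastText` object, which still offers methods such as `get_word_vector`. A minimal loading sketch, assuming the official `fasttext` pip package; the helper name and file location are hypothetical.

```python
from pathlib import Path


def load_word_vectors(model_path):
    """Load a fastText .bin model, failing early with a clear message if the file is absent."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"fastText model not found: {path}")
    import fasttext  # imported lazily so the existence check works without the package

    return fasttext.load_model(str(path))


if __name__ == "__main__":
    try:
        model = load_word_vectors("wiki.en.bin")  # hypothetical location
        print(model.get_dimension(), model.get_word_vector("text")[:5])
    except FileNotFoundError as err:
        print(err)
```

If this loads the file and prints a vector, the warning can be ignored; a FileNotFoundError means the model still has to be downloaded to the path the config points to.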

How to get the processed lmdb data from raw images?

Thanks for the nice repo. I want to add new JPG images to train the model, but I can't find any preprocessing code in the repo. Could you tell me how to do it, or point me to a repo I can refer to? Thank you.

Error when running with a newer version of Python

Hi @yashkant and @junj1ehx
You ran this project with two Titan GPUs and Python 3.6.
Since the Python version is 3.6, CUDA should be 10 or lower; otherwise the GPU isn't detected.
But now I want to run this project on an RTX 3090 (24 GB), which needs CUDA 11 or newer; otherwise the GPU isn't detected in the program.
Now I'm running this project with CUDA 12, Python 3.8, and PyTorch built for CUDA 11.7. When I ran it, the GPU was detected, but I got the following error:

    from .cphoc import build_phoc as _build_phoc_raw
ImportError: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory

What should I do?
How can I move the project from Python 3.6 to 3.8 or 3.9 so that it runs correctly?