yashkant / sam-textvqa
Official code for paper "Spatially Aware Multimodal Transformers for TextVQA" published at ECCV, 2020.
Home Page: https://yashkant.github.io/projects/sam-textvqa
Hi!
Is the obj.lmdb file in the Dropbox link the set of features extracted by the ResNeXT-152 based Faster R-CNN model? And does the ocr.lmdb file contain the features extracted by Google OCR?
Hello, is a data.mdb file missing from data/stvqa/stvqa_test_obj.lmdb? Thanks for your attention.
Hello,
I was trying to run the code, but encountered this issue:
  File "/home/qiyuan/miniconda3/envs/sam/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/qiyuan/miniconda3/envs/sam/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/qiyuan/miniconda3/envs/sam/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/qiyuan/2022spring/sam-textvqa/train.py", line 18, in <module>
    from evaluator import Evaluator
  File "/home/qiyuan/2022spring/sam-textvqa/evaluator.py", line 11, in <module>
    from sam.datasets.metrics import STVQAANLSEvaluator, TextVQAAccuracyEvaluator
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/__init__.py", line 1, in <module>
    from .stvqa_dataset import STVQADataset
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/stvqa_dataset.py", line 7, in <module>
    from sam.datasets.textvqa_dataset import ImageDatabase, TextVQADataset
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/textvqa_dataset.py", line 14, in <module>
    from .processors import *
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/processors.py", line 81, in <module>
    from ..phoc import build_phoc
  File "/home/qiyuan/2022spring/sam-textvqa/sam/phoc/__init__.py", line 1, in <module>
    from .build_phoc import build_phoc # NoQA
  File "/home/qiyuan/2022spring/sam-textvqa/sam/phoc/build_phoc.py", line 3, in <module>
    from .cphoc import build_phoc as _build_phoc_raw
ImportError: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory
Then I tried to run compile.sh, but encountered this error:
cphoc.c:1:10: fatal error: Python.h: No such file or directory
    1 | #include <Python.h>
      |          ^~~~~~~~~~
compilation terminated.
gcc: error: cphoc.o: No such file or directory
rm: cannot remove 'cphoc.o': No such file or directory
So how can I solve this phoc-related issue?
Regards,
Qiyuan
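The two errors above look related: the ImportError names libpython3.6m.so.1.0 while the traceback paths show a Python 3.8 environment, and the compile error says the compiler cannot find Python.h. A minimal diagnostic sketch, assuming only the standard library (the helper name python_header_status is made up for illustration), reports which interpreter is running and whether its C headers are installed where it expects them:

```python
import os
import sys
import sysconfig


def python_header_status():
    """Report the running Python version and whether Python.h is present."""
    # Directory where this interpreter expects its C headers to live.
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return {
        "version": "{}.{}".format(*sys.version_info[:2]),
        "include_dir": include_dir,
        "has_python_h": os.path.isfile(header),
    }


if __name__ == "__main__":
    print(python_header_status())
```

If has_python_h is False, the Python development headers are probably not installed for this interpreter. If it is True, the compile step may simply not be passing include_dir to the compiler; how compile.sh sets its include path is not shown in this thread.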
Hi,
As mentioned in your paper, is it possible to share the code for the visual grounding task using your model, or to point me in the right direction?
Thanks
I reproduced the tvqa-c3 baseline and the final accuracy is about 42.70% on the validation set, but the paper reports 43.9% on the validation set. Are there any details that I missed? Or what is the reason for the gap?
Hi,
I read your code carefully, but I have questions about some of the code.
attention_mask_quadrants: [1,2]
self.matrix_type_map
self.matrix_type_map = { "none": "1", "share3": "3", "share5": "5", "share7": "7", "share9": "9", }
Thanks!
Hi, thanks for open-sourcing your code.
I ran the code on my server with 62 GB of memory. After running for a while, the training was interrupted.
I found a similar phenomenon in the previous issue:
#2 (comment)
I wonder how much memory is needed to train this model?
Also, should I convert the dataset into npy files?
Hello,
Thanks for your code. I was trying to plot boxes on images. What is the format of the boxes, e.g. xyxy, xywh, or something else?
Regards.
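This thread does not say which box format the stored features use, so the following is only a sketch of how the two conventions named in the question differ. The function names are hypothetical, and the normalized-coordinate case is an assumption added for illustration:

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xyxy_to_xywh(box):
    """Convert (x1, y1, x2, y2) to (x, y, width, height)."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def denormalize_xyxy(box, image_width, image_height):
    """Scale an xyxy box given in [0, 1] coordinates to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (x1 * image_width, y1 * image_height,
            x2 * image_width, y2 * image_height)
```

A quick way to tell the formats apart on real data: in xyxy the third and fourth values are never smaller than the first and second, and values that all lie in [0, 1] suggest normalized coordinates.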
After I configured the environment, the code failed with an error:
ModuleNotFoundError: No module named 'sam.phoc.cphoc'
How can I run the code on Windows? Thanks!
Hello, good work, and thanks for open-sourcing your code! Right now I want to visualize the question, image, and predicted answer. Is there any function or demo file that takes in an image and gives the prediction for that image? Thanks.
Under the folder sam_textvqa/data/pretrained_models, the best_model.tar file is corrupt.
Hi sir, can you share the code for "improved M4C baseline" mentioned in the paper?
Hey Yash,
I noticed that you have turned off beam search in your code. Can you share what the problem is with the beam search code in the repo?
Thanks
Hi,
I'm trying to run the pretrained model on an AWS EC2 instance. I'm running it on a g4dn.4xlarge instance with 64 GB of RAM and 500 GB of disk space. I tried to run the evaluation command, but my process got killed. I was running with num_workers = 0. When I tried to rerun the command, I got an EOF error. Do you have any idea where the problem could be?
I ran this command: python train.py --config configs/train-tvqa-eval-tvqa-c3.yml --pretrained_eval data/pretrained-models/best_model.tar
Made it all the way here and then the process was killed:
Then when I tried to run the same command, I got this error:
I thought it might be a memory error but I'm not sure.
Thank you for your consideration.
Google Cloud OCR outputs rotated bounding boxes. How do you extract FRCNN features for these rotated bounding boxes of the detected OCR text?
Thanks
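The thread does not say how the authors handled this. One common workaround, sketched here purely as an assumption and not as the method used in the paper, is to replace each rotated quadrilateral with the smallest axis-aligned box that encloses it, since region feature extractors typically expect axis-aligned boxes. The function name enclosing_box is hypothetical:

```python
def enclosing_box(vertices):
    """Return the axis-aligned (x1, y1, x2, y2) box enclosing a polygon.

    `vertices` is a sequence of (x, y) points, e.g. the four corners of a
    rotated OCR box.
    """
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))
```

The enclosing box is larger than the rotated one, so for strongly tilted text it also covers some background around the word.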
Hi Mr @yashkant
First of all, thanks for your great code.
I decided to run your code on Colab, so I started by installing the required packages and loading the data and code into Colab, as shown in the image I attached.
But when I run your project with the following command, it gives me the error below. Should I change the config file? I changed it as shown below, but I still can't solve the problem.
!python /content/samtextvqa/train.py --config /content/samtextvqa/configs/train-tvqa-eval-tvqa-c3.yml --pretrained_eval /content/samtextvqa/data/pretrained-models/best_model.tar
The error shown:
  File "/content/samtextvqa/sam/datasets/_image_features_reader.py", line 66, in __init__
    self.env = lmdb.open(
lmdb.Error: /samtextvqa/data/textvqa/tvqa_trainval_obj.lmdb: No such file or directory
The path I changed in the config file:
/content/samtextvqa/data/textvqa/tvqa_{}_obj.lmdb
I'd really appreciate it if you could help me solve this problem.
Regards
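Note that the path in the error message starts with /samtextvqa, while the command uses /content/samtextvqa, so the path the code opens may not be the one written in the config. A minimal sketch for checking a path template before training, assuming the `{}` placeholder is filled with a split name such as `trainval` (as the error message suggests) and that the database uses LMDB's default directory layout with a data.mdb file inside; the helper name check_lmdb_path is made up:

```python
import os


def check_lmdb_path(template, split):
    """Fill in the split name and report whether the LMDB directory looks usable."""
    path = os.path.abspath(template.format(split))
    return {
        "path": path,
        # By default an LMDB environment is a directory ...
        "exists": os.path.isdir(path),
        # ... that contains a data.mdb file holding the records.
        "has_data_mdb": os.path.isfile(os.path.join(path, "data.mdb")),
    }


if __name__ == "__main__":
    template = "/content/samtextvqa/data/textvqa/tvqa_{}_obj.lmdb"
    print(check_lmdb_path(template, "trainval"))
```

If exists is False for the path you expect, the download or unpack step probably put the files elsewhere; if the printed path differs from the one in the error, the config value is being overridden or rewritten somewhere in the code.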
After I finished the setup, I wanted to try the pretrained models.
So, I ran the command below:
python train.py \
  --config configs/train-tvqa_stvqa-eval-tvqa-c3.yml \
  --pretrained_eval data/pretrained-models/best_model.tar
I got an error message (see the attached photo).
So, I ran the command below:
python train.py \
  --config configs/train-stvqa-eval-stvqa-c3.yml \
  --tag debug
and I had trouble with fastText.
My colleague and I are both having trouble with Q2!
(Warning : load_model does not return WordVectorModel or SupervisedModel any more, but a FastText object which is very similar.)
Thanks for the nice repo. I want to add new JPG images to train the model, but I can't find any preprocessing code in the repo. Can you tell me how to do it, or point me to a repo I can refer to? Thank you.
Hi @yashkant and @junj1ehx
You ran this project with 2 Titan GPUs and Python 3.6.
Since the Python version is 3.6, CUDA has to be version 10 or lower, but with that setup my GPU is not detected.
Now I want to run this project on an RTX 3090 (24 GB of memory). CUDA for this card has to be version 11 or higher, because otherwise the GPU is not detected by the program.
I am now running the project with CUDA 12, Python 3.8, and PyTorch built for CUDA 11.7. The GPU is detected, but I get the following error:
from .cphoc import build_phoc as _build_phoc_raw
ImportError: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory
What should I do?
How can I move from Python 3.6 to a newer version such as 3.8 or 3.9 so that the project runs correctly?