
sam-textvqa's People

Contributors

junj1ehx, yashkant

sam-textvqa's Issues

Question about data files from Dropbox link

Hi!
Is the obj.lmdb file in the Dropbox link the set of features extracted by the ResNeXt-152-based Faster R-CNN model? And is the ocr.lmdb file the set of features extracted by Google OCR?

error about cphoc

Hello,

I was trying to run the code, but encountered this issue:
  File "/home/qiyuan/miniconda3/envs/sam/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/qiyuan/miniconda3/envs/sam/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/qiyuan/miniconda3/envs/sam/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/qiyuan/2022spring/sam-textvqa/train.py", line 18, in <module>
    from evaluator import Evaluator
  File "/home/qiyuan/2022spring/sam-textvqa/evaluator.py", line 11, in <module>
    from sam.datasets.metrics import STVQAANLSEvaluator, TextVQAAccuracyEvaluator
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/__init__.py", line 1, in <module>
    from .stvqa_dataset import STVQADataset
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/stvqa_dataset.py", line 7, in <module>
    from sam.datasets.textvqa_dataset import ImageDatabase, TextVQADataset
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/textvqa_dataset.py", line 14, in <module>
    from .processors import *
  File "/home/qiyuan/2022spring/sam-textvqa/sam/datasets/processors.py", line 81, in <module>
    from ..phoc import build_phoc
  File "/home/qiyuan/2022spring/sam-textvqa/sam/phoc/__init__.py", line 1, in <module>
    from .build_phoc import build_phoc # NoQA
  File "/home/qiyuan/2022spring/sam-textvqa/sam/phoc/build_phoc.py", line 3, in <module>
    from .cphoc import build_phoc as _build_phoc_raw
ImportError: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory

Then I tried to run the compile.sh, but encountered this error:
cphoc.c:1:10: fatal error: Python.h: No such file or directory
    1 | #include <Python.h>
      |          ^~~~~~~~~~
compilation terminated.
gcc: error: cphoc.o: No such file or directory
rm: cannot remove 'cphoc.o': No such file or directory

How can I solve this phoc-related issue?

Regards,
Qiyuan
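
The two errors above appear related: the import fails because the existing cphoc binary looks for libpython3.6m while the environment runs Python 3.8, and the rebuild fails because gcc cannot find Python.h. A minimal sketch, assuming the goal is to recompile cphoc against the active interpreter: it only reports where the running interpreter expects its headers. The function name `python_header_status` is hypothetical and not part of the repository, and since compile.sh is not shown here, how it passes include paths to gcc is unknown.

```python
import os
import sysconfig


def python_header_status():
    """Return (include_dir, has_header) for the running interpreter."""
    # sysconfig reports the directory where this interpreter's C headers live.
    include_dir = sysconfig.get_paths()["include"]
    header = os.path.join(include_dir, "Python.h")
    return include_dir, os.path.isfile(header)


if __name__ == "__main__":
    include_dir, has_header = python_header_status()
    print("include dir:", include_dir)
    print("Python.h present:", has_header)
```

If Python.h is reported as missing, the Python development headers are probably not installed for this interpreter. If it is present, the printed directory is the one the compiler needs to see (for gcc, through an `-I` option).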

Visual Grounding

Hi,
As mentioned in your paper, your model can be used for visual grounding. Would it be possible to share the code for that task, or to point me in the right direction?
Thanks

Question about reproducing results

I reproduced the tvqa-c3 baseline, and the final accuracy is about 42.70% on the validation set, but the paper reports 43.9% on the validation set. Are there any details that I missed? What could explain the gap?

Questions about the code

Hi,
I read your code carefully, but I have questions about some of the code.

  • attention_mask_quadrants: [1,2]
    I don't understand what "attention_mask_quadrants" means in the .yml files. Does it mean that attention to the relationship between 1 and 2 is turned off?
  • self.matrix_type_map
    self.matrix_type_map = { "none": "1", "share3": "3", "share5": "5", "share7": "7", "share9": "9", }
    If I set it to "none", does that mean focusing on only one relationship, or on all of them, as in a traditional transformer?

Thanks!

Question about training the model

Hi, thanks for open-sourcing your code.
I ran the code on my server with 62 GB of memory. After running for a while, training was interrupted.
I found a similar report in a previous issue:
#2 (comment)
How much memory is needed to train this model?
Also, should I convert the dataset into .npy files?

What's the format of box?

Hello,

Thanks for your code. I was trying to plot boxes on images. What format are the boxes in: xyxy, xywh, or something else?

Regards.
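
The thread does not say which format the repository uses, so the following is only a sketch of how the two conventions named in the question relate and how to rule one of them out from the data. Both helper names are hypothetical. The check is one-sided: if it returns False the boxes cannot be xyxy, but a True result does not prove that they are.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def could_be_xyxy(boxes):
    """Necessary condition for xyxy: x2 >= x1 and y2 >= y1 for every box."""
    return all(b[2] >= b[0] and b[3] >= b[1] for b in boxes)


if __name__ == "__main__":
    print(xywh_to_xyxy((10, 20, 30, 40)))
    print(could_be_xyxy([(100, 50, 20, 10)]))
```

It is also worth checking whether the coordinates are normalized to the range 0 to 1 or given in pixels before drawing them.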

Visualization results from prediction

Hello, good work, and thanks for open-sourcing your code! I would like to visualize the question, the image, and the predicted answer. Is there a function or demo file that takes in an image and outputs the prediction for that image? Thanks.

Question regarding beam search

Hey Yash,

I noticed that you have turned off beam search in your code. Can you share what the problem is with the beam search code in the repo?

Thanks

Issue running textvqa dataset on AWS EC2 instance

Hi,
I'm trying to run the pretrained model on an AWS EC2 instance. I'm using a g4dn.4xlarge instance with 64 GB of RAM and 500 GB of disk space. I tried to run the evaluation command, but my process got killed. I was running with num_workers = 0. When I tried to rerun the command, I got an EOF error. I was wondering if you had any idea where my problem could be.

I ran this command: python train.py --config configs/train-tvqa-eval-tvqa-c3.yml --pretrained_eval data/pretrained-models/best_model.tar
image

It loaded this:
image

Made it all the way here and then the process was killed:
image

Then when I tried to run the same command, I got this error:
image

I thought it might be a memory error but I'm not sure.

Thank you for your consideration.
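
One way to test the memory hypothesis is to log the process's peak memory while evaluation runs. A minimal sketch using only the standard library, assuming a Unix system such as the EC2 instance described; the helper name `peak_rss_mib` is hypothetical, and where to call it inside train.py is left open because that code is not shown here.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of the current process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    print(f"peak memory so far: {peak_rss_mib():.1f} MiB")
```

If the printed value climbs toward the instance's RAM shortly before the process dies, running out of memory is a likely cause. With num_workers = 0 the data loading happens in the main process, so measuring only the current process is a reasonable starting point.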

Unable to run your code in Colab

Hi Mr @yashkant
First of all, thanks for your great code.
I decided to run your code in Colab. I started by installing the required packages and loading the data and code into Colab, as shown in the attached image.

sam-1

But when I run your project with the following command, I get the error below. Should I change the config file? I already changed it as shown below, but that did not solve the problem.

!python /content/samtextvqa/train.py --config /content/samtextvqa/configs/train-tvqa-eval-tvqa-c3.yml --pretrained_eval /content/samtextvqa/data/pretrained-models/best_model.tar

The error I get:
  File "/content/samtextvqa/sam/datasets/_image_features_reader.py", line 66, in __init__
    self.env = lmdb.open(
lmdb.Error: /samtextvqa/data/textvqa/tvqa_trainval_obj.lmdb: No such file or directory

The line I changed in the config file:
/content/samtextvqa/data/textvqa/tvqa_{}_obj.lmdb

I would really appreciate it if you could help me solve this problem.

Regards
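
The error message refers to /samtextvqa/data/... while the project was placed under /content/samtextvqa, so the path the code opens may not be the one written in the edited config. A minimal sketch for checking which formatted paths exist on disk before launching train.py; the helper name `missing_feature_files` is hypothetical, and the split name "trainval" is taken from the error message.

```python
from pathlib import Path


def missing_feature_files(template, splits):
    """Return the formatted paths that do not exist on disk."""
    paths = [Path(template.format(split)) for split in splits]
    return [str(p) for p in paths if not p.exists()]


if __name__ == "__main__":
    # Path template as written in the edited config file.
    template = "/content/samtextvqa/data/textvqa/tvqa_{}_obj.lmdb"
    for path in missing_feature_files(template, ["trainval"]):
        print("missing:", path)
```

If nothing is printed but the error persists, the config value being read at run time is probably not the one that was edited.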

Where is save/debug/command.txt?

Q1. What should I do to solve the problem?

After finishing all the setup, I wanted to try the pretrained models, so I ran the command below:
python train.py \
    --config configs/train-tvqa_stvqa-eval-tvqa-c3.yml \
    --pretrained_eval data/pretrained-models/best_model.tar
and got this error message:
image

Q2. What should I do to solve the problem?

How can I get "wiki.en.bin"?

I ran the command below:
python train.py \
    --config configs/train-stvqa-eval-stvqa-c3.yml \
    --tag debug
and ran into trouble with fastText.
image

image

My colleague and I are both stuck on Q2!
(Warning : load_model does not return WordVectorModel or SupervisedModel any more, but a FastText object which is very similar.)

Error when running with newer version of python

Hi @yashkant and @junj1ehx,
You ran this project with 2 Titan GPUs and Python 3.6.
Because the Python version is 3.6, CUDA apparently has to be version 10 or lower, but with that setup the GPU is not detected.
I now want to run this project on an RTX 3090 (24 GB of memory), which needs CUDA 11 or newer; otherwise the GPU is not detected.
I am currently running the project with CUDA 12, Python 3.8, and PyTorch built for CUDA 11.7. The GPU is detected, but I get the following error:

    from .cphoc import build_phoc as _build_phoc_raw
ImportError: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory

What should I do?
How can I move the project from Python 3.6 to a newer version such as 3.8 or 3.9 so that it runs correctly?
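
The library named in the error, libpython3.6m, suggests that the compiled cphoc extension was built against Python 3.6, so under Python 3.8 it would likely need to be rebuilt rather than converted. A small sketch that makes the mismatch explicit by reading the version out of the error text; both function names are hypothetical and not part of the repository.

```python
import re
import sys


def built_for_version(import_error_message):
    """Extract (major, minor) from a 'libpythonX.Y...' name, or None."""
    match = re.search(r"libpython(\d+)\.(\d+)", import_error_message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def matches_running_interpreter(import_error_message):
    """True if the library named in the message matches this interpreter."""
    return built_for_version(import_error_message) == sys.version_info[:2]


if __name__ == "__main__":
    msg = "ImportError: libpython3.6m.so.1.0: cannot open shared object file"
    print("built for:", built_for_version(msg))
    print("running:", sys.version_info[:2])
    print("match:", matches_running_interpreter(msg))
```

When the two versions differ, recompiling the extension with the active interpreter is the usual remedy; the repository's compile.sh, mentioned in the earlier cphoc issue, appears to be the intended entry point for that.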
