generativeimage2text's Introduction

Introduction

This repo presents example code to reproduce some results in GIT: A Generative Image-to-text Transformer for Vision and Language.

Installation

  • Install azfuse. The tool is used to automatically download the data. The AzFuse configuration is already included in this repo.

  • Download the source code by

    git clone https://github.com/microsoft/GenerativeImage2Text.git
    cd GenerativeImage2Text
  • Install the package

    pip install -r requirements.txt
    python setup.py build develop

Inference

  • Inference on a single image or multiple frames:

    # single image, captioning
    AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_image', \
          'image_path': 'aux_data/images/1.jpg', \
          'model_name': 'GIT_BASE', \
          'prefix': '', \
    }"
    # single image, question answering
    AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_image', \
          'image_path': 'aux_data/images/1.jpg', \
          'model_name': 'GIT_BASE_VQAv2', \
          'prefix': 'what is it?', \
    }"
    # multiple images, captioning
    AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_image', \
          'image_path': ['aux_data/images/1.jpg', 'aux_data/images/1.jpg', 'aux_data/images/1.jpg', 'aux_data/images/1.jpg', 'aux_data/images/1.jpg', 'aux_data/images/1.jpg'], \
          'model_name': 'GIT_BASE_VATEX', \
          'prefix': '', \
    }"
    # multiple images, question answering
    AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_image', \
          'image_path': ['aux_data/images/1.jpg', 'aux_data/images/1.jpg', 'aux_data/images/1.jpg', 'aux_data/images/1.jpg', 'aux_data/images/1.jpg', 'aux_data/images/1.jpg'], \
          'model_name': 'GIT_BASE_MSRVTT_QA', \
          'prefix': 'what is it?', \
    }"
    • If prefix is empty, it is effectively the captioning task.

    • If prefix is a question, it is effectively the visual question answering task.

    • Use a list for image_path for video input. The example here uses 6 identical images only for demo purposes; in practice, they should be different frames sampled from a video.

    • model_name here can be one of the following. Performance details can be found in the reference paper.

      | model_name          | Information                                 | Performance             |
      |---------------------|---------------------------------------------|-------------------------|
      | GIT_BASE            | pretrained on 4M images                     |                         |
      | GIT_BASE_COCO       | fine-tuned on COCO                          | CIDEr: 131.4            |
      | GIT_BASE_TEXTCAPS   | fine-tuned on TextCaps for captioning       | val/CIDEr: 64.9         |
      | GIT_BASE_VQAv2      | fine-tuned on VQAv2                         | test-dev: 72.72         |
      | GIT_BASE_TEXTVQA    | fine-tuned on TextVQA                       | val/acc: 18.81          |
      | GIT_BASE_VATEX      | fine-tuned on VATEX for captioning          | public/test/CIDEr: 60.0 |
      | GIT_BASE_MSRVTT     | fine-tuned on MSRVTT for captioning         | test/CIDEr: 57.8        |
      | GIT_BASE_MSRVTT_QA  | fine-tuned on MSRVTT for question answering | acc: 41.0               |
      | GIT_LARGE           | pretrained on 14M images                    |                         |
      | GIT_LARGE_COCO      | fine-tuned on COCO                          | CIDEr: 138.5            |
      | GIT_LARGE_TEXTCAPS  | fine-tuned on TextCaps for captioning       | val/CIDEr: 106.3        |
      | GIT_LARGE_VQAv2     | fine-tuned on VQAv2                         | test-dev: 75.51         |
      | GIT_LARGE_TEXTVQA   | fine-tuned on TextVQA                       | val/acc: 37.47          |
      | GIT_LARGE_VATEX     | fine-tuned on VATEX for captioning          | public/test/CIDEr: 72.5 |
      | GIT_LARGE_MSRVTT    | fine-tuned on MSRVTT for captioning         | test/CIDEr: 64.1        |
      | GIT_LARGE_MSRVTT_QA | fine-tuned on MSRVTT for question answering | acc: 42.7               |
    • In the cc12m dataset, captions may contain special tags that hide person names, and the model may also predict such special tokens. To eliminate this issue, we removed these captions (around 25% of cc12m) and retrained the large-sized model. The base-sized model is not affected, as cc12m is not part of its training data.

      | model_name           | Information                                         | Performance      |
      |----------------------|-----------------------------------------------------|------------------|
      | GIT_LARGE_R          | pretrained on 14M images with special tags removed  |                  |
      | GIT_LARGE_R_COCO     | fine-tuned on COCO                                  | CIDEr: 137.6     |
      | GIT_LARGE_R_TEXTCAPS | fine-tuned on TextCaps for captioning               | val/CIDEr: 105.3 |
  • Inference on a TSV file, which is a collection of multiple images.

    • Data format (for information only)
      • Image TSV: Each row has two columns. The first is the image key; the second is the base64-encoded jpg or png byte string.
      • Caption or question TSV: Each row has two columns. The first is the image key; the second is a list of dictionaries in JSON format. For the caption TSV, each dictionary should contain at least the field 'caption'. For the question-answering TSV, it should contain at least question_id and question. A minimal construction sketch is given in the next bullet.
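      • A minimal construction sketch (illustrative only): the helper names and the companion .lineidx byte-offset index are assumptions based on the format described above, not the repo's official tooling.

        import base64
        import json

        def write_tsv_with_lineidx(rows, tsv_path):
            # rows: iterable of row tuples (each column already a string).
            # Also writes a companion .lineidx file with the byte offset of each row,
            # which the TSV tooling appears to expect (format assumed here).
            offsets = []
            with open(tsv_path, 'wb') as fp:
                for row in rows:
                    offsets.append(fp.tell())
                    fp.write('\t'.join(row).encode('utf-8') + b'\n')
            with open(tsv_path.rsplit('.', 1)[0] + '.lineidx', 'w') as fp:
                fp.write('\n'.join(map(str, offsets)) + '\n')

        def image_row(key, image_path):
            # image TSV row: image key + base64-encoded jpg/png bytes
            with open(image_path, 'rb') as fp:
                return (key, base64.b64encode(fp.read()).decode('utf-8'))

        def caption_row(key, captions):
            # caption TSV row: image key + JSON list of dicts, each with at least 'caption'
            return (key, json.dumps([{'caption': c} for c in captions]))

        def question_row(key, qa_pairs):
            # question TSV row: image key + JSON list of dicts with 'question_id' and 'question'
            return (key, json.dumps([{'question_id': i, 'question': q} for i, q in qa_pairs]))

        write_tsv_with_lineidx([image_row('img0', 'aux_data/images/1.jpg')], 'my.img.tsv')
        write_tsv_with_lineidx([caption_row('img0', ['a couple of boats in a large body of water.'])],
                               'my.caption.tsv')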
    • Inference on the COCO Karpathy test split.
      1. Inference.
        # base
        AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_tsv', \
              'image_tsv': 'data/coco_caption/test.img.tsv', \
              'model_name': 'GIT_BASE_COCO', \
              'question_tsv': null, \
              'out_tsv': 'inference/GIT_BASE_COCO/coco.tsv', \
        }"
        # GIT_LARGE_COCO. With 8 GPUs, the job can be parallelized with mpirun -n 8
        AZFUSE_TSV_USE_FUSE=1 mpirun -n 8 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_tsv', \
              'image_tsv': 'data/coco_caption/test.img.tsv', \
              'model_name': 'GIT_LARGE_COCO', \
              'question_tsv': null, \
              'out_tsv': 'inference/GIT_LARGE_COCO/coco.tsv', \
        }"
      2. Calculate the evaluation metric
        # base
        AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'evaluate_on_coco_caption', \
              'res_file': 'inference/GIT_BASE_COCO/coco.tsv', \
              'label_file': 'data/coco_caption/test.caption.tsv', \
        }"
        The CIDEr score should be 131.35 for GIT_BASE_COCO and 138.45 for GIT_LARGE_COCO. If you get a lower score (e.g., 126 for the base model), the cause could be an environment mismatch, e.g., a different PyTorch version.
      3. (optional) To exactly reproduce the number, please run the following:
        nvidia-docker run --ipc=host amsword/setup:py38pt19u20cu11 \
            bash -c "mkdir -p /tmp/code \
                    && cd /tmp/code \
                    && pip install git+https://github.com/microsoft/azfuse.git \
                    && git clone https://github.com/amsword/generativeimage2text.git \
                    && cd generativeimage2text \
                    && pip install -r requirements.txt \
                    && python setup.py build develop \
                    && AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_tsv', \
                             'image_tsv': 'data/coco_caption/test.img.tsv', \
                             'model_name': 'GIT_BASE_COCO', \
                             'question_tsv': null, \
                             'out_tsv': 'inference/GIT_BASE_COCO/coco.tsv', \
                       }" \
                    &&  AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'evaluate_on_coco_caption', \
                        'res_file': 'inference/GIT_BASE_COCO/coco.tsv', \
                        'label_file': 'data/coco_caption/test.caption.tsv', \
                        'outfile': 'inference/GIT_BASE_COCO/coco.score.json', \
                        }" \
                    && cat inference/GIT_BASE_COCO/coco.score.json \
                    "
    • Inference on the VQAv2 test split.
      1. Inference

        # base model
        AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_tsv', \
              'image_tsv': 'data/TaxVQAv2/test.tsv', \
              'model_name': 'GIT_BASE_VQAv2', \
              'question_tsv': 'data/TaxVQAv2/test.caption.tsv', \
              'out_tsv': 'inference/GIT_BASE_VQAv2/snapshot/vqav2.tsv', \
        }"
        # GIT_LARGE_VQAv2 with 8 GPUs.
        AZFUSE_TSV_USE_FUSE=1 mpirun -n 8 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_tsv', \
              'image_tsv': 'data/TaxVQAv2/test.tsv', \
              'model_name': 'GIT_LARGE_VQAv2', \
              'question_tsv': 'data/TaxVQAv2/test.caption.tsv', \
              'out_tsv': 'inference/GIT_LARGE_VQAv2/snapshot/vqav2.tsv', \
        }"
      2. Convert the output TSV to the JSON format for submission to EvalAI

        # base model
        AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'convert_tsv_to_vqa_json', \
              'predict_file': 'inference/GIT_BASE_VQAv2/snapshot/vqav2.tsv', \
              'out_json': 'inference/GIT_BASE_VQAv2/snapshot/vqav2.json', \
        }"
        # large model
        AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'convert_tsv_to_vqa_json', \
              'predict_file': 'inference/GIT_LARGE_VQAv2/snapshot/vqav2.tsv', \
              'out_json': 'inference/GIT_LARGE_VQAv2/snapshot/vqav2.json', \
        }"

        Submit the file inference/GIT_BASE_VQAv2/snapshot/vqav2.json to EvalAI, and you should get 72.72 on test-dev. For GIT_LARGE_VQAv2, the accuracy is 75.51.
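
        For reference, the submission JSON follows the standard VQA challenge format: a list of entries pairing a question id with the predicted answer string. The ids and answers below are purely illustrative.

        import json

        # each entry pairs a question_id with the predicted answer string
        predictions = [
            {'question_id': 458752000, 'answer': 'yes'},
            {'question_id': 458752001, 'answer': '2'},
        ]
        with open('vqav2.json', 'w') as fp:
            json.dump(predictions, fp)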

      3. (optional) To exactly reproduce the number, you can use the following:

        # base model
        nvidia-docker run --ipc=host amsword/setup:py38pt19u20cu11 \
            bash -c "mkdir /tmp/code \
                    && cd /tmp/code \
                    && pip install git+https://github.com/microsoft/azfuse.git \
                    && git clone https://github.com/amsword/generativeimage2text.git \
                    && cd generativeimage2text \
                    && pip install -r requirements.txt \
                    && python setup.py build develop \
                    && AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_tsv', \
                        'image_tsv': 'data/TaxVQAv2/test.tsv', \
                        'model_name': 'GIT_BASE_VQAv2', \
                        'question_tsv': 'data/TaxVQAv2/test.caption.tsv', \
                        'out_tsv': 'inference/GIT_BASE_VQAv2/snapshot/vqav2.tsv', \
                    }" \
                    &&  AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'convert_tsv_to_vqa_json', \
                        'predict_file': 'inference/GIT_BASE_VQAv2/snapshot/vqav2.tsv', \
                        'out_json': 'inference/GIT_BASE_VQAv2/snapshot/vqav2.json', \
                    }" \
        }"

        Note: please modify the docker command appropriately so that the output file is saved permanently to the host machine. It is also recommended to run the commands interactively inside the docker container:

        nvidia-docker run --ipc=host amsword/setup:py38pt19u20cu11 sleep infinity
        docker ps # get the docker container ID
        docker exec -it container_id /bin/bash # attach inside the docker container
        # all other commands to run the inference.

Training

The repo shows the key code path for constructing the network input (with transformations) and running the forward/backward pass. The code can easily be plugged into any trainer. Below are examples for the base model; a rough sketch of how this could be wrapped in a full training loop follows the examples.

  • Pretraining/captioning
    python -m generativeimage2text.train -p "{'type': 'forward_backward_example', \
                    'image_files': ['aux_data/images/1.jpg', 'aux_data/images/2.jpg'], \
                    'captions': ['a couple of boats in a large body of water.', 'a view of a mountain with a tree'], \
                }"
    
  • VQA
    python -m generativeimage2text.train -p "{'type': 'forward_backward_example', \
                    'image_files': ['aux_data/images/1.jpg', 'aux_data/images/2.jpg'], \
                    'prefixs': ['what is this?', 'how many trees?'], \
                    'captions': ['several boats in a large body of water', '1'], \
                }"
    

ImageNet

Class ID to unique readable names

  • Download LOC_synset_mapping.txt from Kaggle and save it under aux_data/imagenet/.

  • Convert the WordNet IDs to readable names as follows:

    python -m generativeimage2text.data_prepare -p "{'type': 'generate_imagenet_unique_names'}"

    The input file is hard-coded as ./aux_data/imagenet/LOC_synset_mapping.txt, and the output file is ./aux_data/imagenet/imagenet_unique_readable_names.txt.
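
    For reference, the sketch below illustrates the kind of conversion this performs, assuming the standard LOC_synset_mapping.txt layout (one line per synset, e.g. 'n01440764 tench, Tinca tinca'). The actual de-duplication logic lives in generativeimage2text.data_prepare and may differ.

    # Illustrative sketch only; see generativeimage2text.data_prepare for the real logic.
    def generate_unique_readable_names(in_file, out_file):
        seen = set()
        with open(in_file) as fin, open(out_file, 'w') as fout:
            for line in fin:
                wnid, readable = line.rstrip('\n').split(' ', 1)
                synonyms = [s.strip() for s in readable.split(',')]
                name = synonyms[0]                 # start from the first readable name
                for extra in synonyms[1:]:         # append synonyms until the name is unique
                    if name not in seen:
                        break
                    name = name + ', ' + extra
                if name in seen:                   # last resort: disambiguate with the WordNet id
                    name = name + ' (' + wnid + ')'
                seen.add(name)
                fout.write(name + '\n')

    generate_unique_readable_names('./aux_data/imagenet/LOC_synset_mapping.txt',
                                   './aux_data/imagenet/imagenet_unique_readable_names.txt')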

Citation

Please consider citing the following reference if it helps.

@article{wang2022git,
  title={GIT: A Generative Image-to-text Transformer for Vision and Language},
  author={Wang, Jianfeng and Yang, Zhengyuan and Hu, Xiaowei and Li, Linjie and Lin, Kevin and Gan, Zhe and Liu, Zicheng and Liu, Ce and Wang, Lijuan},
  journal={arXiv preprint arXiv:2205.14100},
  year={2022}
}

Misc

The model is now available in 🤗 Transformers. You can also find a fine-tuning guide on image captioning with GIT here. Thanks to Niels Rogge for contributing the model to 🤗 Transformers and Sayak Paul for the fine-tuning guide.
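
A minimal captioning example with the 🤗 Transformers implementation is shown below; the model id and generation settings are one reasonable choice rather than the only one, so consult the 🤗 documentation for the authoritative usage.

import requests
from PIL import Image
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("microsoft/git-base-coco")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-coco")

# any RGB image works; this COCO validation image is just an example
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values=pixel_values, max_length=50)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])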

Acknowledgement

Part of the code is based on transformers, clip, maskrcnn-benchmark, oscar, virtex.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

generativeimage2text's People

Contributors

amsword, joeyism, sayakpaul


generativeimage2text's Issues

azfuse: PublicAccessNotPermitted

I got this when trying to fetch GIT models. CLIP models can be downloaded without problems, but GIT doesn't work.

<?xml version="1.0" encoding="utf-8"?><Error><Code>PublicAccessNotPermitted</Code><Message>Public access is not permitted on this storage account.
RequestId:24154b22-801e-001a-802e-f0cc44000000
Time:2022-11-04T09:20:10.2370177Z</Message></Error>

[ unused0 ] in the generated captions.

Using the GIT_BASE_COCO model, I expected to generate the caption: Newt Gingrich sitting in a chair.
But the generated caption is: [ unused0 ] sitting in a chair.

The image is attached to the issue.

Are there any legal issues that prevent showing a celebrity's name in a caption? In a similar issue, you mentioned that [ unused0 ] can be replaced in GIT_BASE_COCO, but why is it used instead of that person's real name?

Also, can I train the base model on more images so that it can eventually generate a caption with the celebrity's name?

The reason I want the celebrity's real name is that I am trying to build an image recommendation system, and the name carries a very heavy weight in recommending an accurate image to the user.

Please explain this issue more. Thanks.

Question about the number of image features(embeddings) projected into the input to text decoder.

Hi, I read the paper and checked the code regarding the number of image features (embeddings) projected into the input of the text decoder.
Is it correct that only the feature of the [CLS] token from the Vision Transformer is projected into the input of the text decoder?
And then, does the text decoder decode with only that single projected image token plus the text tokens?

Thanks for your nice research!

How to generate more captions

Hello,

I would like to generate multiple candidate captions for one image when doing image captioning. How could I do this? Is there any parameter I can set?

Thanks!

Question about Fine-tuning on Video

Hi, I'm glad to see your excellent work.
When adapting a GIT-based model to the video domain using the provided code, is it necessary to ensure that the input sizes for both image and video features are the same? Specifically, the current image input size is [1,197,768] and the video input size is [1,1182,768] for the text decoder, but is it possible to generalize the image domain to the video domain without requiring identical input sizes?
Thank you!

Use of GIT model in a Jupyter notebook?

Thank you for your amazing work on the GIT model.

Is there any chance you might be willing to provide a minimal set of code to load and run the various kinds of model inference in a Jupyter notebook?

Thank you in advance!

Individual caption for individual frames

Currently, when you supply multiple images, it treats them as frames of a video. I want to know if there is any way to generate captions for multiple images individually.
Example input:

 'image_path': ['aux_data/images/1.jpg', 'aux_data/images/2.jpg', 'aux_data/images/1.jpg', 'aux_data/images/2.jpg', 'aux_data/images/1.jpg', 'aux_data/images/1.jpg'],

Current output: a single caption for all frames combined

clouds in the sky

What I want: an individual caption for each frame

a large body of water
Cloud in the sky
a large body of water
Cloud in the sky

Using a for loop in the shell could be a solution, but it would load and unload the model every time, which makes the task slower.

Question about character [unused]

I used the GIT_LARGE_COCO model to test captioning on the image https://wx4.sinaimg.cn/mw2000/632a985cly1fi3b3qohw2j210d1jktcr.jpg with the bash command
CUDA_VISIBLE_DEVICES=1 AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_image',
'image_path': './Images/Characters_and_celebrities-Angelina_Jolie4.jpg',
'model_name': 'GIT_LARGE_COCO',
'prefix': '',
}"

But I got the caption: ##0 ] as [ unused0 ] in the twilight zone'' [ unused0 ], [ unused0 ], [ unused0 ], [ unused0 ], [ unused0]
This also happened on some other images; why does this happen?

Inference for videos

Hi,

Thank you for your excellent work.

I have some videos and want to generate captions for them.
I have now generated 6 frames (images) for each video.
How can I run inference on them?

For example, for inference code
AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_tsv', \
      'image_tsv': 'data/coco_caption/test.img.tsv', \
      'model_name': 'GIT_BASE_COCO', \
      'question_tsv': null, \
      'out_tsv': 'inference/GIT_BASE_COCO/coco.tsv', \
}"

Could you please let me know what the tsv file should look like, and how to generate it?
Thanks in advance!

I got an error saying No such file or directory: 'azcopy': 'azcopy'

Thanks for your contribution. But I got an error when trying inference on a single image.

The error says that "FileNotFoundError: [Errno 2] No such file or directory: 'azcopy': 'azcopy'".

My inference command is:

AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p
"{'type': 'test_git_inference_single_image',
'image_path': '/mnt/bd/zhixinling-diffusion/mlx_workspace/3446700194249458680.png',
'model_name': 'GIT_BASE',
'prefix': '',
}"

The whole error traceback is pasted below:

2022-07-17 15:49:41,121.121 122690:inference.py:315 (): param:
{'image_path': '/mnt/bd/zhixinling-diffusion/mlx_workspace/3446700194249458680.png',
'model_name': 'GIT_BASE',
'prefix': '',
'type': 'test_git_inference_single_image'}
/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/torchvision/transforms/transforms.py:333: UserWarning: Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. Please use InterpolationMode enum.
"Argument 'interpolation' of type int is deprecated since 0.13 and will be removed in 0.15. "
2022-07-17 15:49:51,642.642 122690:cloud_storage.py:952 az_download_once(): if sync, no need to save to temp first
2022-07-17 15:49:51,642.642 122690:common.py:296 cmd_run(): start to cmd run: azcopy cp https://publicgit.blob.core.windows.net/data/output/GIT_BASE/snapshot/model.pt /tmp/publicgit/output/GIT_BASE/snapshot/model.pt
2022-07-17 15:49:51,685.685 122690:cloud_storage.py:1013 az_download_once(): ['azcopy',
'cp',
'https://publicgit.blob.core.windows.net/data/output/GIT_BASE/snapshot/model.pt',
'/tmp/publicgit/output/GIT_BASE/snapshot/model.pt']
2022-07-17 15:49:51,685.685 122690:common.py:209 limited_retry_agent(): fails with
[Errno 2] No such file or directory: 'azcopy': 'azcopy': tried 1/5-th time
Traceback (most recent call last):
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 204, in limited_retry_agent
return func(*args, **kwargs)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/cloud_storage.py", line 1011, in az_download_once
cmd_run(cmd, stdout=stdout)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 323, in cmd_run
stderr=stderr,
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 800, in init
restore_signals, start_new_session)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'azcopy': 'azcopy'
2022-07-17 15:49:53,797.797 122690:cloud_storage.py:952 az_download_once(): if sync, no need to save to temp first
2022-07-17 15:49:53,797.797 122690:common.py:296 cmd_run(): start to cmd run: azcopy cp https://publicgit.blob.core.windows.net/data/output/GIT_BASE/snapshot/model.pt /tmp/publicgit/output/GIT_BASE/snapshot/model.pt
2022-07-17 15:49:53,837.837 122690:cloud_storage.py:1013 az_download_once(): ['azcopy',
'cp',
'https://publicgit.blob.core.windows.net/data/output/GIT_BASE/snapshot/model.pt',
'/tmp/publicgit/output/GIT_BASE/snapshot/model.pt']
2022-07-17 15:49:53,837.837 122690:common.py:209 limited_retry_agent(): fails with
[Errno 2] No such file or directory: 'azcopy': 'azcopy': tried 2/5-th time
Traceback (most recent call last):
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 204, in limited_retry_agent
return func(*args, **kwargs)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/cloud_storage.py", line 1011, in az_download_once
cmd_run(cmd, stdout=stdout)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 323, in cmd_run
stderr=stderr,
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 800, in init
restore_signals, start_new_session)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'azcopy': 'azcopy'
2022-07-17 15:49:57,220.220 122690:cloud_storage.py:952 az_download_once(): if sync, no need to save to temp first
2022-07-17 15:49:57,220.220 122690:common.py:296 cmd_run(): start to cmd run: azcopy cp https://publicgit.blob.core.windows.net/data/output/GIT_BASE/snapshot/model.pt /tmp/publicgit/output/GIT_BASE/snapshot/model.pt
2022-07-17 15:49:57,259.259 122690:cloud_storage.py:1013 az_download_once(): ['azcopy',
'cp',
'https://publicgit.blob.core.windows.net/data/output/GIT_BASE/snapshot/model.pt',
'/tmp/publicgit/output/GIT_BASE/snapshot/model.pt']
2022-07-17 15:49:57,259.259 122690:common.py:209 limited_retry_agent(): fails with
[Errno 2] No such file or directory: 'azcopy': 'azcopy': tried 3/5-th time
Traceback (most recent call last):
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 204, in limited_retry_agent
return func(*args, **kwargs)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/cloud_storage.py", line 1011, in az_download_once
cmd_run(cmd, stdout=stdout)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 323, in cmd_run
stderr=stderr,
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 800, in init
restore_signals, start_new_session)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'azcopy': 'azcopy'
2022-07-17 15:50:02,207.207 122690:cloud_storage.py:952 az_download_once(): if sync, no need to save to temp first
2022-07-17 15:50:02,207.207 122690:common.py:296 cmd_run(): start to cmd run: azcopy cp https://publicgit.blob.core.windows.net/data/output/GIT_BASE/snapshot/model.pt /tmp/publicgit/output/GIT_BASE/snapshot/model.pt
2022-07-17 15:50:02,255.255 122690:cloud_storage.py:1013 az_download_once(): ['azcopy',
'cp',
'https://publicgit.blob.core.windows.net/data/output/GIT_BASE/snapshot/model.pt',
'/tmp/publicgit/output/GIT_BASE/snapshot/model.pt']
2022-07-17 15:50:02,255.255 122690:common.py:209 limited_retry_agent(): fails with
[Errno 2] No such file or directory: 'azcopy': 'azcopy': tried 4/5-th time
Traceback (most recent call last):
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 204, in limited_retry_agent
return func(*args, **kwargs)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/cloud_storage.py", line 1011, in az_download_once
cmd_run(cmd, stdout=stdout)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 323, in cmd_run
stderr=stderr,
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 800, in init
restore_signals, start_new_session)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'azcopy': 'azcopy'
2022-07-17 15:50:06,331.331 122690:cloud_storage.py:952 az_download_once(): if sync, no need to save to temp first
2022-07-17 15:50:06,331.331 122690:common.py:296 cmd_run(): start to cmd run: azcopy cp https://publicgit.blob.core.windows.net/data/output/GIT_BASE/snapshot/model.pt /tmp/publicgit/output/GIT_BASE/snapshot/model.pt
2022-07-17 15:50:06,371.371 122690:cloud_storage.py:1013 az_download_once(): ['azcopy',
'cp',
'https://publicgit.blob.core.windows.net/data/output/GIT_BASE/snapshot/model.pt',
'/tmp/publicgit/output/GIT_BASE/snapshot/model.pt']
2022-07-17 15:50:06,372.372 122690:common.py:209 limited_retry_agent(): fails with
[Errno 2] No such file or directory: 'azcopy': 'azcopy': tried 5/5-th time
Traceback (most recent call last):
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 204, in limited_retry_agent
return func(*args, **kwargs)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/cloud_storage.py", line 1011, in az_download_once
cmd_run(cmd, stdout=stdout)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 323, in cmd_run
stderr=stderr,
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 800, in init
restore_signals, start_new_session)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'azcopy': 'azcopy'
Traceback (most recent call last):
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/mnt/bd/zhixinling-diffusion/repos_imagecaption/GenerativeImage2Text/generativeimage2text/inference.py", line 318, in
locals()function_name
File "/mnt/bd/zhixinling-diffusion/repos_imagecaption/GenerativeImage2Text/generativeimage2text/inference.py", line 81, in test_git_inference_single_image
checkpoint = torch_load(pretrained)['model']
File "/mnt/bd/zhixinling-diffusion/repos_imagecaption/GenerativeImage2Text/generativeimage2text/torch_common.py", line 42, in torch_load
with File.open(filename, 'rb') as fp:
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/azfuse.py", line 50, in open
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/cloud_storage.py", line 451, in open
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/cloud_storage.py", line 492, in open_to_read
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/cloud_storage.py", line 595, in ensure_remote_to_cache
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/cloud_storage.py", line 571, in remote_to_cache
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/cloud_storage.py", line 934, in az_download
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 204, in limited_retry_agent
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/cloud_storage.py", line 1011, in az_download_once
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/site-packages/azfuse-0.1-py3.7.egg/azfuse/common.py", line 323, in cmd_run
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 800, in init
restore_signals, start_new_session)
File "/mnt/bd/zhixinling-diffusion/envs/miniconda3/envs/GIT/lib/python3.7/subprocess.py", line 1551, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'azcopy': 'azcopy'

Question about ground-truth answers during fine-tuning on VQAv2

Hello,

Thanks for your stellar work and sharing your code!

I have a quick question. When you calculated the loss during fine-tuning on VQAv2, did you use one ground-truth answer as a target? If so, how did you choose one from the ten ground-truth answers? Perhaps the one that appears most frequently?

Thank you!

fine-tuning config on COCO

Thanks for your awesome work.
Could you share the fine-tuning setup, such as the number of training epochs, learning rate, and optimizer configs? I did not find the corresponding details in your paper.

Setting for Zero-shot ImageNet classification

Hi, thanks for your great work!

Could you please provide your evaluation code for zero-shot ImageNet classification?
More specifically, I would like to know how to implement "limit the candidate tokens during each token prediction, such that the predicted result is guaranteed to be within the vocabulary."
Thanks in advance.

Sincerely,
Zhenfang

TextCaps pre-trained models

Hi,

Thank you for releasing this awesome repo.
Will you be releasing models fine-tuned with TextCaps or ST+MJ data?

Thanks!

Can you provide examples for evaluation using tsv files?

Hi, thank you for the interesting work! I would be very glad to have example TSV files such as data/coco_caption/test.img.tsv, inference/GIT_BASE_COCO/coco.tsv, and data/TaxVQAv2/test.caption.tsv, which are mentioned in the README.md.

Thanks in advance!

Joe Jang

Slow prediction on Video

Generating a caption for one video takes 3 minutes; what should I do?
I'm using the function from the inference module.

Question about reproducing the caption results in the paper

Hi,

Thanks for sharing this wonderful work.
When reproducing the results in the paper, i.e., image captioning, I found that my results do not match the ones mentioned in the paper.

For example, to reproduce Fig. 10 (19), the expected caption is "A marilyn monroe photo with a black background."
However, what I got is "a close up of a woman's face and earrings."
The test image is from https://prod-images.tcm.com/Master-Profile-Images/MarilynMonroe.jpg with the following commands:

AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_image', \
      'image_path': 'aux_data/images/MarilynMonroe.jpeg', \
      'model_name': 'GIT_LARGE_TEXTCAPS', \
      'prefix': '', \
}"

And with GIT_BASE, I got "a portrait of blues artist."
I think there might be mistakes in my steps that caused the incorrect result; could you please share the commands to reproduce Fig. 10?

GIT has been added to 🤗 Transformers

Hi folks!

Impressive work :) As I really liked this work, I decided to contribute it to 🤗 Transformers.

Documentation can be found here: https://huggingface.co/docs/transformers/main/en/model_doc/git

Users can now use GIT in a few lines of code :)

Here's a demo notebook illustrating inference with GIT for image/video captioning, and image/video QA: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/GIT/Inference_with_GIT_for_image_video_captioning_and_image_video_QA.ipynb.

How to increase sample frames number to more than 6?

Hi, thank you so much for the great work! I have a question about the number of sampled frames; the paper mentions:

During inference, we uniformly sample 6 frames with center crop.

I am keen to know whether it is possible to sample more than 6 frames during inference. I got this error when I tried to use more than 6 frames per video clip:

IndexError: index 6 is out of range

Look forward to your response. Thank you!

About Fine-tuning on Video-QA dataset

Hi, could you please share more of the settings used when fine-tuning GIT_base? The batch size and optimizer choices cannot be found in the appendix. Do you use the same settings as for visual question answering (batch size = 576)?

Chinese caption

Thank you for your work! Is it possible for you to release a Chinese caption model later? Or could you give me some suggestions on Chinese captioning? Thank you very much.

lineidx for custom TSV file

Thank you for your great work on this model and repo.

I'm trying to use this model to caption some of my own images and created an image TSV file as described in the README.md, with two columns: one with the image key and one with the encoded byte string. However, when I run the example single-TSV command with my own TSV file, entitled my_images.tsv, I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'my_images.lineidx'

What does the lineidx mean here? Evidently, there is something else I need to include for the model to run on my TSV, but I can't find any information about it in the repo. What else do I need to do?

About generation results.

Hi,

When I use GIT_LARGE_COCO to generate captions, the results show many "[ unused0 ]" tokens in the captions.
For example:

  • "[ unused0 ]'s [ unused0 ]'s [ unused0 ]".
  • "[ unused0 ] in a scene from the movie [ unused0 ]''"

So what is "[ unused0 ]"? Does it mean an unknown word?
Why does it generate so many "[ unused0 ]" tokens?
How could I avoid this?

Thanks!

struct Inputs for provisioned git-large-coco fine-tuned model on AWS Sagemaker

Hi,
I was looking for documentation on how to set the model inputs, but none of those solutions work. So I have to ask here in case anyone has done this before.

I deployed microsoft/git-large-coco successfully with the script below:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker
import boto3

sess = sagemaker.Session()

# sagemaker session bucket -> used for uploading data, models and logs
# sagemaker will automatically create this bucket if it not exists
sagemaker_session_bucket=None
if sagemaker_session_bucket is None and sess is not None:
    # set to default bucket if a bucket name is not given
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError as e:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
        'HF_MODEL_ID':'microsoft/git-large-coco',
	'HF_TASK':'image-to-text'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.26.0',
	pytorch_version='1.13.1',
	py_version='py39',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.t2.large', # ec2 instance type
        endpoint_name='git-large-coco-inference-endpoint'
)

Now I am struggling with calling the model.

import boto3
import json

runtime = boto3.Session().client('sagemaker-runtime')
endpoint = 'git-large-coco-inference-endpoint'

response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='application/json', Body=json.dumps({
        "inputs": {
            "type": "test_git_inference_single_image",
            "image": "https://images.pexels.com/photos/1198172/pexels-photo-1198172.jpeg"
            # 'prefix': 'what is it?', 
        }
    })
)

result = json.loads(response['Body'].read().decode())
print(result)

error message:

Traceback (most recent call last):
  File "/Users/leo/Downloads/python_stock_predition/sagemaker/test_sagemaker_endpoint.py", line 47, in <module>
    response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='application/json', Body=json.dumps({
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.10/site-packages/botocore/client.py", line 960, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Incorrect format used for image. Should be an url linking to an image, a local path, or a PIL image."
}
". See https://eu-central-1.console.aws.amazon.com/cloudwatch/home?region=eu-central-1#logEventViewer:group=/aws/sagemaker/Endpoints/huggingface-pytorch-git-large-coco-inference in account 711703683918 for more information.

Please advise on how to solve this issue.
Thanks in advance.

inference on single image

When running inference on a single image:

Traceback (most recent call last):
File "E:\Anaconda\lib\site-packages\azfuse\common.py", line 204, in limited_retry_agent
return func(*args, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: 'output/GIT_BASE/snapshot/model.pt'
Traceback (most recent call last):
File "E:/REC/GenerativeImage2Text/inference.py", line 317, in
locals()function_name
File "E:/REC/GenerativeImage2Text/inference.py", line 81, in test_git_inference_single_image
checkpoint = torch_load(pretrained)['model']
File "E:\REC\GenerativeImage2Text\generativeimage2text\torch_common.py", line 42, in torch_load
with File.open(filename, 'rb') as fp:
File "E:\Anaconda\lib\site-packages\azfuse\azfuse.py", line 52, in open
return exclusive_open_to_read(fname, mode)
File "E:\Anaconda\lib\site-packages\azfuse\common.py", line 113, in exclusive_open_to_read
fp = limited_retry_agent(10, open, fname, mode)
File "E:\Anaconda\lib\site-packages\azfuse\common.py", line 204, in limited_retry_agent
return func(*args, **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: 'output/GIT_BASE/snapshot/model.pt'
2022-08-24 17:10:13,540.540 12420:common.py:209 limited_retry_agent(): fails with
[Errno 2] No such file or directory: 'output/GIT_BASE/snapshot/model.pt': tried 10/10-th time

How can I get the file 'output/GIT_BASE/snapshot/model.pt'?

About pretraining

Hi,

The pretraining command in README.md is:
python -m generativeimage2text.train -p "{'type': 'forward_backward_example', \
      'image_files': ['aux_data/images/1.jpg', 'aux_data/images/2.jpg'], \
      'captions': ['a couple of boats in a large body of water.', 'a view of a mountain with a tree'], \
}"

I would like to pretrain on my own datasets. I have checked the code in train.py, and it neither splits the data into batches for training nor uses an optimizer to update the model.
Would you provide code for full pretraining?

Thanks!

error about azcopy

Hello! When I run

AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_image', \
      'image_path': 'aux_data/images/1.jpg', \
      'model_name': 'GIT_BASE', \
      'prefix': '', \
}"

The following error occurred. I'm not sure what the reason is, perhaps an error while installing azfuse. If you could provide suggestions, I would greatly appreciate it.

readme install typo

In the readme: python setup build develop is missing the .py extension (setup.py)

What is the best way to reduce the appearance of false positive text detection in captions using git-large-r-textcaps?

I have noticed that while scene detection with GIT is very good, I am getting a large amount of text detection in images that have no text. Something in the range of 6% to 18% depending on the image set used.

I have looked at some of the templates and found some settings that could be used to try to mitigate this, but I would like to hear from people who may have already found a fix for this problem.

My main idea would be to decrease the aggressiveness of the text detection, but if not possible even exclude it completely.

I am using git-large-r-textcaps to do caption detection on images. Apart from this problem, I am having great results.

Video captioning model finetuning

The README gives examples of training for image captioning, but no examples for video captioning. Can you give a code example for this task?

(Windows) ModuleNotFoundError: No module named 'fcntl'

I'm trying to run on Windows and am getting a ModuleNotFoundError for fcntl, which apparently isn't supported on Windows. Does GIT support Windows? Here's my output:

$ AZFUSE_TSV_USE_FUSE=1 python -m generativeimage2text.inference -p "{'type': 'test_git_inference_single_image', 'image_path': 'aux_data/images/1.jpg', 'model_name': 'GIT_BASE', 'prefix': ''}"
2022-09-18 19:58:30,062.062 39616:inference.py:318 (): param:
{'image_path': 'aux_data/images/1.jpg',
'model_name': 'GIT_BASE',
'prefix': '',
'type': 'test_git_inference_single_image'}
Traceback (most recent call last):
File "C:\Users\Caelen\anaconda3\envs\git\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Caelen\anaconda3\envs\git\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Caelen\Desktop\GenerativeImage2Text\generativeimage2text\inference.py", line 321, in
locals()function_name
File "C:\Users\Caelen\Desktop\GenerativeImage2Text\generativeimage2text\inference.py", line 69, in test_git_inference_single_image
if File.isfile(f'aux_data/models/{model_name}/parameter.yaml'):
File "C:\Users\Caelen\anaconda3\envs\git\lib\site-packages\azfuse\azfuse.py", line 39, in isfile
cls.ensure_initialized()
File "C:\Users\Caelen\anaconda3\envs\git\lib\site-packages\azfuse\azfuse.py", line 25, in ensure_initialized
cls.fuser = create_cloud_fuse()
File "C:\Users\Caelen\anaconda3\envs\git\lib\site-packages\azfuse\cloud_storage.py", line 155, in create_cloud_fuse
config = load_from_yaml_file(fname)
File "C:\Users\Caelen\anaconda3\envs\git\lib\site-packages\azfuse\common.py", line 259, in load_from_yaml_file
with exclusive_open_to_read(file_name, 'r') as fp:
File "C:\Users\Caelen\anaconda3\envs\git\lib\site-packages\azfuse\common.py", line 108, in exclusive_open_to_read
lock_fd = acquireLock(op.join('/tmp',
File "C:\Users\Caelen\anaconda3\envs\git\lib\site-packages\azfuse\common.py", line 181, in acquireLock
import fcntl
ModuleNotFoundError: No module named 'fcntl'

GIT and GIT2

Thank you for this code release! Will you also be able to release either of the larger models GIT and/or GIT2?

GPU util training from scratch

Hi,
Thanks for your work.
I am curious about how many GPUs you used for training.
When I run inference on an image on a single 3090,
it occupies 75% GPU utilization and inference takes about a second per image (GIT_LARGE_COCO).
So I guess training with 512 batches needs a lot of GPUs?

BTW, how can I speed up inference when I have multiple "different classes"
(e.g., multiple batches for different frames)?
Currently, the inference FPS is 1.

Blank output with question prefix on inference with `test_git_inference_single_image`

I managed to get caption completion working all right (both when leaving the prefix text blank and when passing in a partial caption for completion), but when I type a question of the form "this is a question?" (with a "?" at the end), I always get output: followed by nothing.

Similarly, I attempted using [MASK] tokens for a kind of fill-in-the-blank prompt and also get output: followed by nothing.

What am I missing?

finetuning

Am I able to fine-tune this model on a custom dataset via the GIT model on Hugging Face (video captioning task)?

GIT2

Hi,

Do you have any plans to release the weights of GIT2 or GIT trained with 0.8B?
Thanks!

fine tune git

Can you give me some advice on fine-tuning the GIT model on my own dataset, if fine-tuning makes sense (video captioning task)?
