PICa

An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA

by Zhengyuan Yang, Zhe Gan, Jianfeng Wang, Xiaowei Hu, Yumao Lu, Zicheng Liu, and Lijuan Wang

The 36th AAAI Conference on Artificial Intelligence (AAAI), 2022, Oral

Introduction

Can GPT-3 benefit multimodal tasks? We provide an empirical study of GPT-3 for knowledge-based VQA, named PICa. We show that prompting GPT-3 with image captions and only 16 in-context examples surpasses the supervised state of the art on the OK-VQA dataset by an absolute +8.6 points (from 39.4 to 48.0).
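
The core idea can be illustrated with a minimal sketch: each image is replaced by a caption, and a handful of solved (caption, question, answer) triples are concatenated in front of the test question to form a text prompt. The `Example` class, the `build_prompt` function, the instruction wording, and the `===` separator below are assumptions made for illustration; they are not taken from the repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    caption: str   # text description standing in for the image
    question: str
    answer: str


# Hypothetical instruction line; the exact wording is an assumption.
HEADER = "Please answer the question according to the above context."


def build_prompt(shots: List[Example], caption: str, question: str) -> str:
    """Concatenate solved examples, then the unanswered test question."""
    parts = [HEADER]
    for ex in shots:
        parts.append(f"Context: {ex.caption}\nQ: {ex.question}\nA: {ex.answer}")
    # The final block ends with "A:" so the language model completes the answer.
    parts.append(f"Context: {caption}\nQ: {question}\nA:")
    return "\n===\n".join(parts)


demo_shots = [
    Example("A man rides a wave on a surfboard.", "What sport is this?", "surfing"),
    Example("A red double-decker bus on a street.", "Which city is known for these?", "london"),
]
demo_prompt = build_prompt(demo_shots, "A bowl of ramen on a table.", "Which country is this dish from?")
```

With 16 shots, `shots` would simply hold 16 such examples; the model's completion after the final `A:` is taken as the predicted answer.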

Citation

@inproceedings{yang2021empirical,
  title={An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA},
  author={Yang, Zhengyuan and Gan, Zhe and Wang, Jianfeng and Hu, Xiaowei and Lu, Yumao and Liu, Zicheng and Wang, Lijuan},
  booktitle={AAAI},
  year={2022}
}

Prerequisites

Installation

  1. Clone the repository

    git clone https://github.com/microsoft/PICa.git
    
  2. Prepare the data. The cached files for the converted OK-VQA data, predicted text representations, and similarity features are in the coco_annotations, input_text, and coco_clip_new folders, respectively.

Running

  1. Run the script. We experimented with the older davinci engine rather than the current default, text-davinci-001, which has been further tuned to follow instructions; see more discussion here.
    python gpt3_api_okvqa.py --apikey xxx --output_path output
    
    ## for example
    python gpt3_api_okvqa.py --apikey xxx --output_path output --engine davinci --similarity_metric random --n_ensemble 1 --n_shot 16
    python gpt3_api_okvqa.py --apikey xxx --output_path output --engine davinci --similarity_metric imagequestion --n_ensemble 5 --n_shot 16
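
The flags in the second example suggest that in-context examples are picked by similarity to the test image and question, and that several prompts are queried per question. The following is a minimal sketch of how such selection and ensembling could work, assuming precomputed feature vectors and one (answer, log-probability) pair returned per prompt. The function names, the cosine ranking, and the highest-log-probability rule are assumptions for illustration, not the repository's code.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_shots(query_feat: Sequence[float],
                 train_feats: Dict[str, Sequence[float]],
                 n_shot: int, n_ensemble: int,
                 metric: str = "imagequestion", seed: int = 0) -> List[List[str]]:
    """Return n_ensemble groups of n_shot training keys each."""
    keys = sorted(train_feats)
    if metric == "random":
        random.Random(seed).shuffle(keys)
    else:
        # Most similar training examples first.
        keys.sort(key=lambda k: cosine(query_feat, train_feats[k]), reverse=True)
    picked = keys[: n_shot * n_ensemble]
    return [picked[i * n_shot:(i + 1) * n_shot] for i in range(n_ensemble)]


def pick_answer(candidates: List[Tuple[str, float]]) -> str:
    """Keep the answer whose prompt returned the highest log-probability."""
    return max(candidates, key=lambda c: c[1])[0]


feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [-1.0, 0.0]}
groups = select_shots([1.0, 0.0], feats, n_shot=2, n_ensemble=2)
best = pick_answer([("cat", -1.2), ("dog", -0.3)])
```

Here `groups` holds one list of example keys per prompt, so `--n_ensemble 5 --n_shot 16` would correspond to five prompts of 16 examples each under these assumptions.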
    

Results

  1. Outputs are saved to the format_answer and prompt_answer folders. format_answer follows the VQAv2 format and is used for the final evaluation. prompt_answer contains the input prompts, kept for human inspection.

  2. output_saved provides the cached predictions.
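
For reference, VQAv2-style results are a list of records with a `question_id` and an `answer`, and each prediction is scored against the human answers with a soft accuracy. The sketch below assumes the predictions have already been loaded into that shape; the function names are hypothetical, and the answer normalization done by the official evaluation script (punctuation, articles, number words) is reduced here to lowercasing and stripping whitespace.

```python
from typing import Dict, List


def vqa_accuracy(prediction: str, human_answers: List[str]) -> float:
    """Soft VQA accuracy: min(1, matches / 3), averaged over leave-one-out subsets."""
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    if not answers:
        return 0.0
    scores = []
    for i in range(len(answers)):
        # Compare against all human answers except the i-th one.
        others = answers[:i] + answers[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(1.0, matches / 3.0))
    return sum(scores) / len(scores)


def mean_accuracy(results: List[Dict], ground_truth: Dict[int, List[str]]) -> float:
    """Average soft accuracy over records shaped like {"question_id": ..., "answer": ...}."""
    if not results:
        return 0.0
    total = sum(vqa_accuracy(r["answer"], ground_truth[r["question_id"]]) for r in results)
    return total / len(results)


gt = ["a"] * 2 + ["b"] * 8
demo_results = [{"question_id": 1, "answer": "b"}, {"question_id": 2, "answer": "c"}]
demo_score = mean_accuracy(demo_results, {1: gt, 2: gt})
```

An answer given by at least four of the ten annotators scores 1.0 under this rule, while rarer answers receive partial credit.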

pica's Issues

A strange question

I ran this command: python gpt3_api_okvqa.py --apikey xxx --output_path output --engine davinci --similarity_metric random --n_ensemble 1 --n_shot 16. I don't know why it doesn't use any GPU memory, and I can't even see its process ID.
Also, is the whole project just a single .py file?

About GPU usage

Hello, thank you for the excellent work. How many GPUs did you use for it?

train.score.json.tsv file json decode error

Hi, thanks for sharing your inspiring work. I hit a bug when running your code: the caption tag file train.score.json.tsv cannot be loaded. The error message is: json.decoder.JSONDecodeError: Expecting ':' delimiter: line 1 column 466 (char 465)
The other two files, for val and test, load fine. How can I fix the problem with the training-set file?
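
One way to narrow down an error like this is to scan the file row by row and report which rows fail to parse. The sketch below assumes each row is tab-separated and that JSON payloads are the fields starting with `{` or `[`; the actual layout of train.score.json.tsv is not stated here, and `find_bad_json_rows` is a hypothetical helper, not part of the repository.

```python
import json
from typing import Iterable, List, Tuple


def find_bad_json_rows(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Return (line number, column number, error) for every field that fails json.loads."""
    bad = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        for col_no, field in enumerate(fields, start=1):
            # Assumption: only fields that look like JSON objects or arrays are parsed.
            if not field.lstrip().startswith(("{", "[")):
                continue
            try:
                json.loads(field)
            except json.JSONDecodeError as err:
                bad.append((line_no, col_no, f"{err.msg} at char {err.pos}"))
    return bad


sample = ['1\t{"a": 1}', '2\t{"a" 1}', '3\tplain text']
report = find_bad_json_rows(sample)
```

Passing an open file handle instead of `sample` would list the offending rows, which can then be compared against the same rows in a fresh download to check whether the file was truncated or corrupted.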
