
NLP1-2017-VQA

Project Description

In this project you will combine Natural Language Processing with Computer Vision for high-level scene interpretation. In particular, you will implement a system that is capable of answering questions about pictures. Given precomputed visual features and a question, you will implement at least two incrementally more sophisticated models that process the (textual) question and combine it with the visual information in order to answer the question. The first model you will implement is a Bag-of-Words (BoW) model, the second a Recurrent Neural Network (RNN).
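
To make the two model families concrete, here is a minimal PyTorch sketch of what such architectures could look like. The framework choice, layer sizes, and the 2048-dimensional image feature are illustrative assumptions, not requirements of the assignment:

```python
# Illustrative only: vocab_size, embed_dim, hidden_dim and the 2048-d
# ResNet feature size are assumptions, not values fixed by the project.
import torch
import torch.nn as nn

class BoWVQA(nn.Module):
    """Bag-of-Words baseline: average the word embeddings (order-invariant),
    concatenate with the precomputed image feature, classify over answers."""
    def __init__(self, vocab_size, num_answers, embed_dim=300, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.classifier = nn.Linear(embed_dim + img_dim, num_answers)

    def forward(self, question_ids, img_feat):
        # question_ids: (batch, seq_len); img_feat: (batch, img_dim)
        q = self.embed(question_ids).mean(dim=1)  # BoW: ignore word order
        return self.classifier(torch.cat([q, img_feat], dim=1))

class RNNVQA(nn.Module):
    """RNN variant: encode the question with an LSTM instead of averaging,
    so word order can influence the answer."""
    def __init__(self, vocab_size, num_answers, embed_dim=300,
                 hidden_dim=512, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim + img_dim, num_answers)

    def forward(self, question_ids, img_feat):
        _, (h, _) = self.lstm(self.embed(question_ids))
        return self.classifier(torch.cat([h[-1], img_feat], dim=1))
```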

What we provide

You will receive a subset of the Visual Question Answering (VQA) dataset. The provided dataset contains 60k Q/A pairs, balanced by answer type ('yes/no', 'number', 'other'). Details on how the dataset was created, and on its structure, can be found in the notebook used to create it: VQA Dataset Structure.ipynb. The provided dataset follows the exact same structure as the original VQA dataset. In addition, we will provide visual features computed with ResNet.
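
Since the data follows the official VQA structure, assembling training samples might look like the sketch below. Every file name is hypothetical, and the HDF5 feature container is an assumption; consult VQA Dataset Structure.ipynb for the actual layout:

```python
# Rough sketch of building (question, image, answer) triples. All file
# names below are hypothetical placeholders for the provided data.
import json
import h5py  # assuming the ResNet features ship in an HDF5 container

with open("questions.json") as f:        # hypothetical file name
    questions = json.load(f)["questions"]
with open("annotations.json") as f:      # hypothetical file name
    annotations = json.load(f)["annotations"]

# In the official VQA format, questions and annotations are matched via
# question_id; each annotation also carries an answer_type field.
answers = {a["question_id"]: a["multiple_choice_answer"] for a in annotations}

img_feats = h5py.File("resnet_features.h5", "r")  # hypothetical file name
samples = [(q["question"], q["image_id"], answers[q["question_id"]])
           for q in questions]
```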

Requirements

As a final product, you will be asked to write a report with your findings, which should at least contain:

  • A background section, in which you write about techniques that connect language and vision (e.g., visual question answering, text-based image retrieval, visual dialogue, etc.) and the problem that you are trying to address;
  • A description of the models that you use, and of their individual components;
  • A summary of your models’ learning behavior, including learning curves and hyper-parameter search;
  • A qualitative analysis of each model, showing and discussing (interesting) correctly and incorrectly classified examples;
  • A systematic comparison of the models you trained, including quantitative measures such as top-1 and top-5 accuracy and per-answer-type accuracy (e.g., only yes/no answers, counting answers, etc.), as well as a qualitative analysis as in the previous point, but conducted across models (a metric sketch follows this list);
  • A section where you discuss future work based on your experience and what you think could significantly improve performance (but you didn’t find the time to investigate);
  • Besides the report, please also provide a link to your GitHub repository with your implementation.
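
As a loose illustration of the comparison metrics above, the following sketch computes top-k accuracy and a per-answer-type breakdown in PyTorch. The inputs (a logits tensor, integer answer targets, and a parallel list of answer-type strings) are assumed shapes, not something the assignment prescribes:

```python
# Illustrative metric helpers; input shapes are assumptions.
import torch
from collections import defaultdict

def topk_accuracy(logits, targets, k=1):
    """Fraction of examples whose true answer is among the top-k predictions.
    logits: (batch, num_answers); targets: (batch,) integer answer indices."""
    topk = logits.topk(k, dim=1).indices                      # (batch, k)
    return (topk == targets.unsqueeze(1)).any(dim=1).float().mean().item()

def per_type_accuracy(logits, targets, answer_types):
    """Top-1 accuracy broken down by answer type ('yes/no', 'number', 'other')."""
    preds = logits.argmax(dim=1)
    hits, counts = defaultdict(int), defaultdict(int)
    for p, t, typ in zip(preds, targets, answer_types):
        counts[typ] += 1
        hits[typ] += int(p == t)
    return {typ: hits[typ] / counts[typ] for typ in counts}
```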

The general report requirements can be found here.

Further Readings and Useful Links

