
Complex Imperative Program Induction From Terminal Rewards (CIPITR)

This repository contains the implementation of the program induction model proposed in the TACL paper Complex Program Induction for Querying Knowledge Bases in the Absence of Gold Programs and links to download associated datasets.

Currently this code only handles program induction where the input variables to the program are gold, i.e., if KBQA requires entity, relation, and type linking on the query before program induction, this code feeds the oracle entity, relation, and type linker's output to CIPITR.

Datasets

Datasets for complex question answering over knowledge bases, used for evaluating CIPITR:

  1. Complex Sequential Question Answering (https://amritasaha1812.github.io/CSQA/) Dataset
  2. WebQuestionsSP (https://www.microsoft.com/en-us/download/details.aspx?id=52763) Dataset

Experiments on CQA

  • Step 7: To run the experiments on CQA (or any subset of CQA) with gold entity, relation, and type linking, we recommend using the TensorFlow version.

  • Step 8: To do so, go inside the CSQA_TACL_FINAL/code/NPI/tensorflow/gold_WikiData_CSQA folder.

  • Step 9: Each experiment is configured with a parameter file (in the parameter folder). There are seven question types (simple, logical, verify, quanti, quanti_count, comp, comp_count), and each can be run either on the smaller subset of the dataset (the CQA subset with 100 QA pairs per question type) or on the full dataset. For example, to run on the simple question type on the CQA-100 subset, use the parameter file parameters_simple_small.json; to run on the full CQA dataset, use parameters_simple_big.json (small denotes the 100-QA-pair subset of CQA, big the full CQA dataset).
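The parameter-file naming convention above can be sketched as a small helper. This function is not part of the repository; it only encodes the `parameters_<type>_<size>.json` pattern described in the step.

```python
# Hypothetical helper illustrating the CQA parameter-file naming convention
# described above; it is NOT part of the repository.
def cqa_parameter_file(question_type, full_dataset=False):
    """Map a CQA question type and dataset size to its parameter file name."""
    valid_types = {"simple", "logical", "verify", "quanti",
                   "quanti_count", "comp", "comp_count"}
    if question_type not in valid_types:
        raise ValueError(f"unknown question type: {question_type}")
    # "small" = the 100-QA-pair subset of CQA, "big" = the full dataset
    size = "big" if full_dataset else "small"
    return f"parameters/parameters_{question_type}_{size}.json"

print(cqa_parameter_file("simple"))        # parameters/parameters_simple_small.json
print(cqa_parameter_file("simple", True))  # parameters/parameters_simple_big.json
```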

  • Step 10: Create a folder named model.

  • Step 11: To run training on any of the question categories (simple/logical/verify/quanti/comp/quanti_count/comp_count), run python train.py <parameter_file> <time-stamp> (time-stamp is the ID of the current experiment run). An example script is in run.sh. This script starts training, dumps the trained model into the model folder, and also runs validation.

  • Step 12: To load the trained model and run the test, run python load.py <parameter_file> <time-stamp> (use the same ID as used during training).

  • Step 13: To download pre-trained models and log files:

  • For example, to train and test the TensorFlow code on the simple question type on the 100-QA-pair subset of CQA:

    • cd CSQA_TACL_FINAL/code/NPI/tensorflow/gold_WikiData_CSQA
    • python train.py parameters/parameters_simple_small.json small_Jan_7 #this will create a folder model/simple_small_Jan_7 to dump the trained model
    • python load.py parameters/parameters_simple_small.json small_Jan_7 #this will run the trained model on the test data, as mentioned in the parameter file

Experiments on WebQuestionsSP

  • Step 1: For experiments on the WebQuestionsSP dataset, download the preprocessed version of the dataset and the corresponding subset of Freebase, i.e. freebase_webqsp.zip (https://drive.google.com/file/d/1CuV4QJxknTqDmAaLwBfO1kyNW7IXTd1Q/view?usp=sharing)

  • Step 2: To run any of the TensorFlow scripts, go inside CSQA_TACL_FINAL/code/NPI/tensorflow and install the dependencies by running pip install -r requirements.txt

  • Step 3: Similarly, to run any of the PyTorch scripts, go inside CSQA_TACL_FINAL/code/NPI/pytorch and install the dependencies by running pip install -r requirements.txt

  • Step 4: Go inside the code/NPI/pytorch/gold_FB_webQuestionsSP folder.

  • Step 5: Each experiment is configured with a parameter file (in the parameters folder). Experiments on the gold entity, relation, type (ERT) linking data have parameters inside the parameters/gold folder, and experiments on the noisy ERT linking data have parameters inside the parameters/noisy folder. There are five categories of questions: 1infc and 1inf (questions with inference chain length 1, with and without an additional non-temporal constraint), 2infc and 2inf (questions with inference chain length 2, with and without an additional non-temporal constraint), and c_date (questions with temporal constraints, having an inference chain of any length). Accordingly, parameter files in the gold folder are named parameters_<category>.json. For example, to run an experiment on questions with a length-1 inference chain and no constraint using gold ERT linker data, use the parameter file parameters/gold/parameters_1inf.json
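The gold/noisy folder layout above can likewise be sketched as a small helper. Again, this function is purely illustrative and not part of the repository; it only encodes the `parameters/<linking>/parameters_<category>.json` pattern described in the step.

```python
# Hypothetical helper illustrating the WebQuestionsSP parameter-file layout
# described above; it is NOT part of the repository.
def webqsp_parameter_file(category, linking="gold"):
    """Map a question category and ERT-linking setting to its parameter file."""
    valid_categories = {"1inf", "1infc", "2inf", "2infc", "c_date"}
    if category not in valid_categories:
        raise ValueError(f"unknown category: {category}")
    if linking not in {"gold", "noisy"}:
        raise ValueError(f"unknown ERT linking setting: {linking}")
    return f"parameters/{linking}/parameters_{category}.json"

print(webqsp_parameter_file("1inf"))           # parameters/gold/parameters_1inf.json
print(webqsp_parameter_file("1inf", "noisy"))  # parameters/noisy/parameters_1inf.json
```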

  • Step 6: Create a folder named model.

  • Step 7: To run training on any of the question categories (1inf/1infc/2inf/2infc/c_date), run python train.py <parameter_file> <time-stamp> (time-stamp is the ID of the current experiment run). An example script is in run.sh. This script starts training, dumps the trained model into the model folder, and also runs validation.

  • Step 8: To load the trained model and run the test, run python load.py <parameter_file> <time-stamp> (use the same ID as used during training).

  • Step 9: To download pre-trained models and log files:

  • For example, to train and test the PyTorch code on the 1inf question type on WebQuestionsSP:

    • cd CSQA_TACL_FINAL/code/NPI/pytorch/gold_FB_webQuestionsSP
    • python train.py parameters/gold/parameters_1inf.json Jan_7 #this will create a folder model/1inf_Jan_7 to dump the trained model
    • python load.py parameters/gold/parameters_1inf.json Jan_7 #this will run the trained model on the test data with gold ERT linking, as mentioned in the parameter file
    • python load.py parameters/noisy/parameters_1inf.json Jan_7 #this will run the trained model on the test data with noisy ERT linking, as mentioned in the parameter file

RL Environment for CQA Dataset

We also provide a simple RL environment for question answering over the CQA dataset using the Wikidata knowledge base.

  • Step 1: The RL environment is located in the NPI/RL_ENVIRONMENT_CSQA/code/ directory.
  • Step 2: To incorporate the environment, simply import the environment.py file.
  • Step 3: To instantiate an environment, you need to pass in a parameter file. Sample parameter files are located in the parameters folder.
  • Step 4: Detailed instructions for instantiating and using an environment object are provided in the sample_env_usage.ipynb notebook.
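The steps above can be sketched as follows. The class name and constructor shown in the comments are assumptions (consult sample_env_usage.ipynb for the real API); what the runnable part shows concretely is the parameter-file plumbing, since the environment is instantiated from a JSON parameter file like those in the parameters folder.

```python
import json
import os
import tempfile

# Hypothetical usage, per Steps 2-3 above (names are assumptions, not the
# real API -- see sample_env_usage.ipynb):
#
#   from environment import Environment   # Step 2: import environment.py
#   env = Environment(param_file)         # Step 3: instantiate from a parameter file
#
# Below we only demonstrate the JSON parameter-file round trip with
# illustrative keys; the actual keys are defined by the sample files
# in the parameters folder.
params = {"kb_path": "wikidata/", "max_steps": 10}  # illustrative keys only

fd, param_file = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump(params, f)

# An environment constructor would read the file back like this:
with open(param_file) as f:
    loaded = json.load(f)
print(loaded["max_steps"])  # 10

os.remove(param_file)
```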
