Git Product home page Git Product logo

dialoguediscourseparsing's Introduction

Dialogue Discourse Parsing

Code for our paper:

Zhouxing Shi and Minlie Huang. A Deep Sequential Model for Discourse Parsing on Multi-Party Dialogues. In AAAI, 2019.

@inproceedings{shi2019deep,
  title={A Deep Sequential Model for Discourse Parsing on Multi-Party Dialogues},
  author={Shi, Zhouxing and Huang, Minlie},
  booktitle={AAAI},
  year={2019}
}

Requirements

  • Python 2.7
  • Tensorflow 1.3

Data Format

We use JSON format for data. The data file should contain an array consisting of examples represented as objects. The format for each example object looks like:

{
    // a list of EDUs in the dialogue
    "edus": [ 
        {
            "text": "text of edu 1",
            "speaker": "speaker of edu 1"
        },    
        // ...
    ],
    // a list of relations
    "relations": [
        {
            "x": 0,
            "y": 1,
            "type": "type 1"
        },
        // ...
    ]
}

STAC Corpus

We used the linguistic-only STAC corpus. The latest available verison on the website is stac-linguistic-2018-05-04.zip. It appears that test data is missing in this version. We share the test data we used from the 2018-03-21 version.

To process a raw dataset into our JSON format, run:

python data_pre.py <input_dir> <output_json_file>

There are 1086 dialogues in the training data and 111 dialogues in the test data. This version of dataset and the processing script produce data that are slightly different from those used for our publication.

On the website of STAC, there is a latest version in python pandas dataframes format, but it appears to be more different from the version mentioned above and it is yet unclear to us how this dataset should be processed.

Word Vectors

For pre-trained word verctors, we used GloVe (100d).

How to Run

python main.py {--[option1]=[value1] --[option2]=[value2] ... }

Available options can be found at the top of main.py.

For example, to train the model with default settings:

python main.py --train

Baselines

For baselines (Deep+MST, Deep+ILP, Deep+Greedy), see here.

dialoguediscourseparsing's People

Contributors

shizhouxing avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.