Git Product home page Git Product logo

dplp's Introduction

RST Parser

1. Required Package

scipy, numpy, sklearn, nltk, python-tk

The last two packages are required to draw the RST tree structure in the PostScript format. It is highly recommended to install these two packages for visualization. Otherwise, you have to read the bracketing results and to imagine what the tree looks like :-)

2. RST Parsing with Raw Documents

To do RST parsing on raw documents, you will need syntactic parses and also the discourse segmentations on the documents. Let's start with preparing data

First, we need to collect all the documents into one folder, as all the following commands will scan this folder and process the documents in the same way. Let's assume the directory is ./data

All the following commands will be run in a batch fasion, which means every command will scan will the documents in the data folder and process them once.

2.1 Data Processing

  1. Run the Stanford CoreNLP with the given bash script corenlp.sh with the command "./corenlp.sh path_to_dplp/data"

    • This is a little awkward, as I am not sure how to call the Stanford parser from any other directory.
  2. Convert the XML file into CoNLL-like format with command "python convert.py ./data"

    • Now, we can come back to the DPLP folder to run this command

2.2 Segmentation and Parsing

  1. Segment each document into EDUs with command "python segmenter ./data"

  2. Finally, run RST parser with command "python rstparser ./data"

    • The RST parser will generate the bracketing file for each document, which can be used for evaluation
    • It will also generate the PostScript file of the RST structure. Generating (and saving) these PS files will slow down the parsing procedure a little. You can turn it off with "python rstparser ./data False"

3. Training Your Own RST Parser

TODO

Reference

Please read the following paper for more technical details

Yangfeng Ji, Jacob Eisenstein. Representation Learning for Text-level Discourse Parsing. ACL 2014

dplp's People

Contributors

elisaf avatar jiyfeng avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.