dpar

Introduction

dpar is a neural network transition-based dependency parser. The original Go version can be found in the oldgo branch.

Dependencies

Build-time

Run-time

  • TensorFlow

Building dpar

To compile and install dpar, run the following in the main project directory:

cargo install --path dpar-utils

To do a debug build, run cargo build in the main project directory. To run the unit tests, run cargo test. To generate API documentation, run cargo doc.

dpar's People

Contributors

danieldk, divefish, sebpuetz

dpar's Issues

Feature name restricted to ASCII_ALPHANUMERIC

Feature names in the feature definition file are restricted to ASCII_ALPHANUMERIC. I don't know whether that is intended, but there are probably cases where people (me) have features with underscores or other characters that are not ASCII alphanumerics.

A quick fix would be to replace ASCII_ALPHANUMERIC in the grammar definition file with a newly defined char rule:

feature_name = ${ char+ }
char = _{ !(WHITESPACE | "|" | ":" ) ~ ANY }

or, more defensively, change feature_name to explicitly whitelist the additional non-alphanumeric characters:

feature_name = ${ (ASCII_ALPHANUMERIC | "_" | "-" )+ }
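
To compare the two proposals outside the grammar, here is a minimal sketch of both character rules as plain Rust predicates. The function names are hypothetical and not part of dpar, and the permissive rule assumes that WHITESPACE means Unicode whitespace, which depends on how the grammar defines that rule.

```rust
/// Permissive rule: any character except whitespace, '|' and ':'.
/// Assumption: WHITESPACE in the grammar corresponds to Unicode whitespace.
fn is_permissive_char(c: char) -> bool {
    !(c.is_whitespace() || c == '|' || c == ':')
}

/// Defensive rule: ASCII alphanumerics plus an explicit whitelist.
fn is_whitelisted_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '-'
}

/// A feature name is accepted if it is non-empty and every character
/// satisfies the given rule (mirrors the `+` repetition in the grammar).
fn is_valid_feature_name(name: &str, allowed: fn(char) -> bool) -> bool {
    !name.is_empty() && name.chars().all(allowed)
}
```

Under these assumptions, a name such as morph_case passes both rules, while a name with non-ASCII letters passes only the permissive one.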

Reduce memory use during training

We vectorize all the data before optimizing the graph. This worked fine when we were just storing indices, but now that we store embeddings for embedding layers, memory use is getting out of hand (~30GB on TüBa-D/Z).

I guess we should generate the batches on the fly instead.
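
One way to generate batches on the fly is an iterator adaptor that pulls only as many instances as one batch needs. The following is a sketch, not dpar's actual training code. The Batches type is hypothetical, and the inner iterator is assumed to vectorize instances lazily.

```rust
/// Hypothetical adaptor that groups items of an inner iterator into batches.
struct Batches<I: Iterator> {
    inner: I,
    batch_size: usize,
}

impl<I: Iterator> Batches<I> {
    fn new(inner: I, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch size must be positive");
        Batches { inner, batch_size }
    }
}

impl<I: Iterator> Iterator for Batches<I> {
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        // Pull at most `batch_size` items from the inner iterator, so only
        // the current batch is held in memory.
        let batch: Vec<I::Item> = self.inner.by_ref().take(self.batch_size).collect();
        if batch.is_empty() {
            None
        } else {
            Some(batch)
        }
    }
}
```

The last batch may be smaller than batch_size. Whether that is acceptable depends on how the graph handles variable batch sizes.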

Replace various lookups by one data structure

There is a lot of overlap between the transition-specific lookup and feature table lookup. This can be factored out to one class that replaces Numberer.

Maybe this should be a separate crate, because it is generally useful.
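
As a rough sketch of what such a shared structure could look like, the following generic table maps values to dense indices and back. The name Lookup and its methods are hypothetical and do not mirror the existing Numberer API.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical bidirectional mapping between values and dense indices.
struct Lookup<T> {
    indices: HashMap<T, usize>,
    values: Vec<T>,
}

impl<T: Clone + Eq + Hash> Lookup<T> {
    fn new() -> Self {
        Lookup {
            indices: HashMap::new(),
            values: Vec::new(),
        }
    }

    /// Return the index of `value`, assigning the next free index if the
    /// value has not been seen before.
    fn add(&mut self, value: T) -> usize {
        if let Some(&idx) = self.indices.get(&value) {
            return idx;
        }
        let idx = self.values.len();
        self.values.push(value.clone());
        self.indices.insert(value, idx);
        idx
    }

    /// Look up the index of a known value without modifying the table.
    fn index(&self, value: &T) -> Option<usize> {
        self.indices.get(value).copied()
    }

    /// Look up the value stored at an index.
    fn value(&self, idx: usize) -> Option<&T> {
        self.values.get(idx)
    }
}
```

Because the type is generic over T, the same structure could serve both transitions and feature values.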

Support pseudo-projective parsing

Currently, dpar assumes that all dependencies are projective, even though they are read from the non-projective column.

Support for pseudo-projective parsing should be added to deal with non-projective structures.
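
A first step toward this is detecting which trees actually contain non-projective arcs. The sketch below checks a tree for crossing arcs. It is not the pseudo-projective transformation itself, and the head-array representation (heads[i] is the head of token i + 1, with 0 as the artificial root) is an assumption rather than dpar's internal graph type.

```rust
/// Returns true if no two arcs of the tree cross.
/// Assumption: `heads[i]` is the head of token `i + 1`; 0 is the root.
fn is_projective(heads: &[usize]) -> bool {
    // Represent every arc as an ordered (left, right) span over positions.
    let arcs: Vec<(usize, usize)> = heads
        .iter()
        .enumerate()
        .map(|(i, &head)| {
            let dep = i + 1;
            (head.min(dep), head.max(dep))
        })
        .collect();

    // Two arcs cross if the second starts strictly inside the first and
    // ends strictly outside of it. Arcs from the root at position 0 are
    // included in the check.
    for &(l1, r1) in &arcs {
        for &(l2, r2) in &arcs {
            if l1 < l2 && l2 < r1 && r1 < r2 {
                return false;
            }
        }
    }
    true
}
```

The pairwise check is quadratic in sentence length, which is usually acceptable for a preprocessing step over a treebank.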

Make the features field addressable

The CoNLL-X features field is often used for, e.g., adding morphological analyses of tokens. Since these are effective parsing features, they should be addressable too.

We probably want to put some restrictions on the permitted formatting of the field. For example, using the vertical bar (|) to separate features seems to be more or less standardized. However, the format of the individual features is not.

For our purposes it would be best if these were attribute-value pairs (e.g. with the format a=v or a:v), which would allow addressing specific features. For instance:

STACK 0 FEATURE number
STACK 1 FEATURE number
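
To illustrate how such a restricted field could be made addressable, here is a sketch that splits the field on vertical bars and then on the first = or : of each part. The function name parse_features is hypothetical, and skipping parts without a separator is an assumption, not a decision made in this issue.

```rust
use std::collections::BTreeMap;

/// Parse a features field such as "number:sg|case=nom" into a map from
/// attribute to value.
fn parse_features(field: &str) -> BTreeMap<String, String> {
    let mut features = BTreeMap::new();

    // CoNLL-X uses "_" to mark an empty field.
    if field == "_" {
        return features;
    }

    for part in field.split('|') {
        // Accept both "a=v" and "a:v"; split at the first separator.
        if let Some((attr, value)) = part.split_once(|c: char| c == '=' || c == ':') {
            features.insert(attr.to_string(), value.to_string());
        }
        // Assumption: parts without a separator are not addressable and
        // are skipped here.
    }

    features
}
```

With this map, an address like STACK 0 FEATURE number would amount to looking up the key number in the features of the token on top of the stack.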

Rewrite addressed value parser in nom

Ragel dropped support for all languages outside C/C++/ASM. To change/regenerate the addressed value parser, one has to manually compile an old Ragel version.

It would be nicer to just switch to nom, which is Rust-native and well-maintained.
