TUW-NLP

NLP utilities developed at TUW informatics.

Install and Quick Start

Install the tuw-nlp package from pip:

pip install tuw-nlp

Or install from source:

pip install -e .

On Windows and Mac, you might also need to install Graphviz manually.

A few additional setup steps are required before you can use the library:

Download nltk stopwords:

import nltk
nltk.download('stopwords')

Download stanza models for UD parsing:

import stanza

stanza.download("en")
stanza.download("de")

And then finally download ALTO and tuw_nlp dictionaries:

import tuw_nlp

tuw_nlp.download_alto()
tuw_nlp.download_definitions()
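
The download steps above can be collected into one helper that is run once after installation. This is only a sketch: the function name `download_resources` and its `languages` parameter are hypothetical, and it uses nothing beyond the calls already shown above. The imports are deferred into the function body so that the file can be loaded even before the packages are installed.

```python
def download_resources(languages=("en", "de")):
    """Download the resources listed in the setup steps above."""
    # Deferred imports: the packages are only needed when the function runs.
    import nltk
    import stanza
    import tuw_nlp

    # nltk stopwords
    nltk.download("stopwords")

    # stanza models for UD parsing, one per language
    for lang in languages:
        stanza.download(lang)

    # ALTO and the tuw_nlp dictionaries
    tuw_nlp.download_alto()
    tuw_nlp.download_definitions()
```

Calling `download_resources()` with no arguments fetches the English and German models, matching the steps above; pass a shorter tuple such as `("en",)` if only one language is needed.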

Also make sure that Java is installed on your system, since the parser requires it.
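
A quick way to verify the external prerequisites is to look for their executables on the PATH. The sketch below assumes that the parser needs a `java` executable and that a working Graphviz install provides the `dot` executable; the helper name `find_missing_tools` is hypothetical and not part of tuw_nlp.

```python
import shutil

# Executables assumed to be needed: "java" for the parser,
# "dot" as the command-line tool shipped with Graphviz.
REQUIRED_TOOLS = {"java": "Java runtime", "dot": "Graphviz"}


def find_missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return descriptions of the tools whose executable is not on PATH."""
    return [desc for exe, desc in tools.items() if which(exe) is None]


missing = find_missing_tools()
if missing:
    print("Missing prerequisites: " + ", ".join(missing))
else:
    print("All prerequisites found.")
```

The `which` parameter exists only to make the check easy to test; in normal use the default `shutil.which` is sufficient.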

Then parsing a sentence is as simple as:

from tuw_nlp.grammar.text_to_4lang import TextTo4lang

tfl = TextTo4lang("en", "en_nlp_cache")

fl_graphs = list(tfl("brown dog", depth=1, substitute=False))

# Each element of fl_graphs is a networkx graph object
fl_graphs[0].nodes(data=True)
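
On a networkx graph, `nodes(data=True)` yields `(node_id, attribute_dict)` pairs. The sketch below shows one way to print such pairs in a readable form. The helper `format_nodes` is hypothetical, and the sample data is a hand-written stand-in: the node ids and the `name` attribute are assumptions, not output captured from the parser.

```python
def format_nodes(node_view):
    """Turn (node_id, attribute_dict) pairs into one readable line per node."""
    lines = []
    for node_id, attrs in node_view:
        # Sort the attributes so the output order is stable.
        attr_text = ", ".join(
            f"{key}={value!r}" for key, value in sorted(attrs.items())
        )
        lines.append(f"{node_id}: {attr_text}")
    return lines


# Hypothetical stand-in for fl_graphs[0].nodes(data=True);
# the real node ids and attribute names may differ.
example_nodes = [(0, {"name": "dog"}), (1, {"name": "brown"})]

for line in format_nodes(example_nodes):
    print(line)
```

With a real graph, the same helper can be called as `format_nodes(fl_graphs[0].nodes(data=True))`.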

For more examples, see the Jupyter notebook under notebooks/experiment.

Services

We also provide services built on our package. To learn more, see services.

Text_to_4lang service

To run a browser-based demo (also available online) for building graphs from raw texts, first start the graph building service:

python services/text_to_4lang/backend/service.py

Then run the frontend with this command:

streamlit run services/text_to_4lang/frontend/demo.py

In the demo you can parse English and German sentences and try out several algorithms that our graphs implement, such as expand, substitute and append_zero_paths.
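
The two commands for starting the demo can also be launched from a single Python script. The following is a minimal sketch, assuming it is run from the repository root and that `streamlit` is on the PATH. The names `start_demo` and `stop_demo` are hypothetical, and `sys.executable` is used in place of `python` so that the backend runs in the current environment. Whether the backend needs time to come up before the frontend connects is not covered here.

```python
import subprocess
import sys

# The same commands as above, expressed as argument lists.
BACKEND_CMD = [sys.executable, "services/text_to_4lang/backend/service.py"]
FRONTEND_CMD = ["streamlit", "run", "services/text_to_4lang/frontend/demo.py"]


def start_demo(popen=subprocess.Popen):
    """Start the backend first, then the frontend; return both handles."""
    backend = popen(BACKEND_CMD)
    frontend = popen(FRONTEND_CMD)
    return backend, frontend


def stop_demo(processes):
    """Ask every started process to terminate."""
    for proc in processes:
        proc.terminate()
```

A typical session would call `processes = start_demo()`, use the demo in the browser, and then call `stop_demo(processes)` to shut both processes down.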

Modules

text

General text processing utilities, contains:

  • segmentation: stanza-based processors for word and sentence level segmentation
  • patterns: various patterns for text processing tasks

graph

Tools for working with graphs, contains:

  • utils: misc utilities for working with graphs

grammar

Tools for generating and using grammars, contains:

  • alto: tools for interfacing with the alto tool
  • irtg: class for representing Interpreted Regular Tree Grammars
  • lexicon: rule lexica for building lexicalized grammars
  • ud_fl: grammar-based mapping of Universal Dependencies to 4lang semantic graphs
  • utils: misc utilities for working with grammars

Contributing

We welcome all contributions! Please fork this repository and create a branch for your modifications. We suggest getting in touch with us first, either by opening an issue or by emailing Gabor Recski or Adam Kovacs.

Citing

If you use the library, please cite our paper:

@inproceedings{Recski:2021,
  title = {Explainable Rule Extraction via Semantic Graphs},
  author = {Recski, Gabor and Lellmann, Bj{\"o}rn and Kovacs, Adam and Hanbury, Allan},
  booktitle = {{Proceedings of the Fifth Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2021)}},
  publisher = {{CEUR Workshop Proceedings}},
  address = {São Paulo, Brazil},
  pages = {24--35},
  url = {http://ceur-ws.org/Vol-2888/paper3.pdf},
  year = {2021}
}

License

MIT license

Contributors

adaamko, recski, gkinga, supermx, eszti
