`Mathematical Token Definition Extraction`

Introduction

This work describes and addresses a problem in Mathematical Named-Entity Recognition: given a mathematical object and the context in which it appears, can we extract its definition? This repo contains the Mathematical Token Definition Extraction (MTDE) dataset as well as implementations of five different neural definition extraction models. [1] is a peer-reviewed, full analysis of the MTDE problem and of how the models perform on the MTDE dataset.

The Dataset

The MTDE dataset contains around 10,000 entries, each consisting of a variable name, the context in which it is defined, its ‘short’ definition, and its ‘long’ definition. Here, a short definition is exactly one word long and a long definition is one or more words long. The data was collected from a random sample of mathematical and scientific arXiv preprint manuscripts. The manuscripts cover a wide range of mathematical and scientific disciplines, including Physics, Computer Science, and Biology. Candidate data was generated via a corpus crawler and then pruned and cleaned manually.
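To make the entry layout concrete, here is a minimal sketch of how one entry could be represented and checked against the word-count rules above. The class name `MTDEEntry`, its field names, and the example values are assumptions made for illustration; they are not taken from the dataset files, which may use a different layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MTDEEntry:
    # Hypothetical field names; the actual dataset files may label them differently.
    variable: str
    context: str
    short_definition: str
    long_definition: str

    def is_well_formed(self) -> bool:
        """Check the word-count rules: short is exactly one word, long is one or more."""
        short_ok = len(self.short_definition.split()) == 1
        long_ok = len(self.long_definition.split()) >= 1
        return short_ok and long_ok


# Made-up example for illustration only; it is not an entry from the dataset.
entry = MTDEEntry(
    variable="T",
    context="Let T denote the absolute temperature of the gas.",
    short_definition="temperature",
    long_definition="absolute temperature",
)
print(entry.is_well_formed())
```

A check like this can be useful when adding new entries by hand, since it catches a multi-word short definition before it reaches a model.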

The Models

This repo implements the following models as Jupyter notebook tutorials:

Vanilla Seq2Seq
Transformer Seq2Seq
Pointer Network
Match-LSTM
BERT (Huggingface's BertForQuestionAnswering)

These tutorials aim to thoroughly explain the mechanisms behind each mathematical definition extraction model examined in [1] and to serve as a blueprint for future experiments on this problem.
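As a rough illustration of the extractive framing, span-predicting models such as BertForQuestionAnswering score every context token as a possible start and a possible end of the answer, and the definition is read off as the best-scoring span. The sketch below shows only that decoding step in plain Python. It is not the code from the notebooks: the function `best_span`, the `max_len` cap, and the hand-written scores are all assumptions standing in for real model outputs.

```python
from typing import List, Tuple


def best_span(
    start_scores: List[float], end_scores: List[float], max_len: int = 10
) -> Tuple[int, int]:
    """Return inclusive (start, end) indices maximizing start + end score.

    Only spans with start <= end and at most max_len tokens are considered.
    """
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # Limit the end index so the span never exceeds max_len tokens.
        for e in range(s, min(s + max_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Made-up context and scores for illustration; a real model would produce the scores.
tokens = "Let T denote the absolute temperature of the gas .".split()
start_scores = [0.1, 0.0, 0.2, 0.3, 2.5, 1.0, 0.0, 0.1, 0.2, 0.0]
end_scores = [0.0, 0.1, 0.0, 0.1, 0.4, 3.0, 0.2, 0.0, 0.5, 0.1]

s, e = best_span(start_scores, end_scores)
definition = " ".join(tokens[s : e + 1])
print(definition)  # absolute temperature
```

The Seq2Seq models in the list work differently: they generate the definition token by token rather than selecting a span from the context.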

Contact

Email: [email protected]

Please message me with any feedback or errors you find! Any help is appreciated :)

Notes

There is a small error in the right-most subfigure of Figure 1 in [1]. The correct figure should be:

References

[1] Hamel, E., Zheng, H., & Kani, N. (2022). An Evaluation of NLP Methods to Extract Mathematical Token Descriptors. In International Conference on Intelligent Computer Mathematics (pp. 329-343). Springer, Cham.
