nkrnrnk / bertpunc

State-of-the-art punctuation restoration (e.g. for automatic speech recognition) deep learning model based on the pre-trained BERT model

License: Apache License 2.0

Python 6.02% Jupyter Notebook 93.98%

bertpunc's Issues

Error in train.py

Hi, I think there is a bug in the train.py file: right where the main function starts, the variable punctuation_enc is defined twice, as you can see below. The second definition needs to be commented out in order to train the model with the LREC dataset.

    punctuation_enc = {
        'O': 0,
        'COMMA': 1,
        'PERIOD': 2,
        'QUESTION': 3
    }

    punctuation_enc = {
        'O': 0,
        'PERIOD': 1,
    }
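
To illustrate why the duplicate definition matters, here is a minimal sketch that reduces the quoted snippet to its essentials. It is not the repository's actual train.py; the `labels` list and the `missing` check are added here purely for illustration. In Python, the second assignment rebinds the name, so the four-class encoding is silently discarded.

```python
# Reduced version of the snippet quoted above (not the real train.py).
punctuation_enc = {'O': 0, 'COMMA': 1, 'PERIOD': 2, 'QUESTION': 3}

# The second assignment rebinds the name; the first dict is no longer used.
punctuation_enc = {'O': 0, 'PERIOD': 1}

# Hypothetical check: which four-class labels can no longer be encoded?
labels = ['O', 'COMMA', 'PERIOD', 'QUESTION']
missing = [label for label in labels if label not in punctuation_enc]
print(missing)
```

With only the two-class encoding in effect, any data containing COMMA or QUESTION labels cannot be looked up, which is consistent with the report that the second definition has to be commented out.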

How does the function `insert_target()` in data.py work?

Question:

I found that when insert_target() in data.py is used, the input data is split into many sequences that overlap heavily with one another.

Why is the data processed this way? It seems to produce a lot of repeated data.
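
The overlap described in the question can be reproduced with a small sliding-window sketch. This assumes a scheme in which one fixed-length segment is built per word, with the word of interest at a fixed position in its segment; whether insert_target() does exactly this is an assumption, and the name `sliding_windows` and the `<pad>` token are hypothetical.

```python
def sliding_windows(tokens, segment_size, pad="<pad>"):
    """Return one fixed-length window per token (illustrative only).

    Each window holds the target token at index segment_size // 2,
    with padding added where the window runs past the text.
    """
    half = segment_size // 2
    padded = [pad] * half + list(tokens) + [pad] * (segment_size - half - 1)
    return [padded[i:i + segment_size] for i in range(len(tokens))]


words = "how are you doing today".split()
windows = sliding_windows(words, segment_size=4)
for word, window in zip(words, windows):
    print(word, window)
```

Under this assumption, neighbouring windows share all but one token, so the repetition is a side effect of giving every word its own context window rather than an accident of the splitting.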

Data Format?

@nkrnrnk: Could you please document the format of the input data?

datasets

Could you please upload an example of the datasets you load in train.py, lines 190-192?

Is the BertPunc code available in TensorFlow?

The provided training and evaluation code is in PyTorch, which is difficult for someone like me to read or convert to TensorFlow. I am asking because with PyTorch I run into an out-of-memory (OOM) error.

How to apply inference on text shorter than the segment size?

Hi,
I have trained the model with a segment size of 32. Now I want to run inference on unpunctuated text that is shorter than the segment size. I got stuck here; can anyone help me with this?

Thanks in advance,
Venkatesh
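
One common way to handle input shorter than a fixed segment size is to pad it up to that size and keep a mask of the real positions. The sketch below shows that idea only; `pad_to_segment` is a hypothetical helper, `pad_id=0` is an assumption about the tokenizer's padding id, and whether the trained model behaves well on padded input is not confirmed by the repository.

```python
def pad_to_segment(token_ids, segment_size=32, pad_id=0):
    """Pad a short list of token ids to segment_size (illustrative only).

    Returns the padded ids and a mask with 1 for real tokens and 0 for
    padding, so predictions at padded positions can be discarded.
    """
    if len(token_ids) > segment_size:
        raise ValueError("input is longer than the segment size")
    n_pad = segment_size - len(token_ids)
    padded = list(token_ids) + [pad_id] * n_pad
    mask = [1] * len(token_ids) + [0] * n_pad
    return padded, mask


# Example with made-up token ids and a small segment size for readability.
padded, mask = pad_to_segment([7, 8, 9], segment_size=8)
print(padded)
print(mask)
```

Only the predictions at positions where the mask is 1 would then be read back as punctuation labels.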

Issue running in Colab in April 2020

Just FYI for future users: I got this code running in April 2020 in a Colab notebook by reverting to earlier versions of some libraries. I'm not sure which versions were originally used, so I guessed based on the original code being from around March 2019.

    !pip install -q torch==1.0.0 torchvision==0.2.0
    !pip install pytorch_pretrained_bert==0.5.0

Warning! I don't know whether this actually reproduces the original experiment, since I don't have an exact match for the dataset.

            COMMA       PERIOD      QUESTION
0.062041    0.063562    0.001647    0.042417
0.307018    0.231150    0.171429    0.236532
0.103223    0.099707    0.003264    0.068731

(for test2011asr)

Could you please tell me why the results are so bad?

bertpunc's Issues

Error in train.py

How does the function `insert_target()` in data.py work?

Data Format?

datasets

Is the BertPunc code available in TensorFlow?

How to apply inference on text shorter than the segment size?

Issue running in Colab in April 2020

Pre-trained weights

Missing dataset

Results
