belambert / asr-evaluation
Python module for evaluating ASR hypotheses (e.g. word error rate, word recognition rate).
License: Apache License 2.0
When trying to print the table of average WER grouped by reference sentence length, I get the following error:
Traceback (most recent call last):
File "/home/david/miniconda3/bin/wer", line 11, in <module>
load_entry_point('asr-evaluation', 'console_scripts', 'wer')()
File "/home/david/asr-evaluation/asr_evaluation/__main__.py", line 59, in main
other_main(args)
File "/home/david/asr-evaluation/asr_evaluation/asr_evaluation.py", line 82, in main
print_wer_vs_length()
File "/home/david/asr-evaluation/asr_evaluation/asr_evaluation.py", line 371, in print_wer_vs_length
avg_wers = list(map(lambda x: (x[0], mean(x[1])), values))
File "/home/david/asr-evaluation/asr_evaluation/asr_evaluation.py", line 371, in <lambda>
avg_wers = list(map(lambda x: (x[0], mean(x[1])), values))
IndexError: list index out of range
Running Python 3.7.7.
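A minimal guard for that line (a sketch only; `values` and `mean` stand in for the variables used in print_wer_vs_length, and I'm assuming the crash comes from an entry that is empty or shorter than the (length, wer_list) shape the lambda expects):

# Hypothetical fix: only average buckets that actually contain WER samples.
avg_wers = [(x[0], mean(x[1])) for x in values if len(x) > 1 and x[1]]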
This is a really nice project! I noticed that WRR and SER return wrong values with the pip-installed asr-evaluation. I have a simple example for reproducing the problem. I also cloned this repository and tested it directly; the values it returns look correct. Is my usage wrong, or is this a bug?

hyp.txt:
i have dog
did you pen
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
ref.txt:
i have a dog
do you have a pen
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
hello tom
wer ref.txt hyp.txt
Sentence count: 13
WER: 14.286% ( 4 / 28)
WRR: 96.429% ( 27 / 28)
SER: 100.000% ( 13 / 13)
Running from the cloned repository:
cd asr_evaluation
python __main__.py ../../ref.txt ../../hyp.txt
Sentence count: 13
WER: 12.903% ( 4 / 31)
WRR: 87.097% ( 27 / 31)
SER: 15.385% ( 2 / 13)
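For what it's worth, a quick token count over ref.txt (a sketch assuming plain whitespace tokenization) supports the cloned repo's denominator of 31 rather than the pip-installed 28:

# Count reference tokens, i.e. the denominator of WER/WRR.
with open('ref.txt') as f:
    ref_token_count = sum(len(line.split()) for line in f)
print(ref_token_count)  # 4 + 5 + 11 * 2 = 31, matching the cloned repo's output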
Is there a Python API that can be called from within Python?
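One way to drive it from Python today (just a sketch that shells out to the wer console script used elsewhere in this thread; it is not a documented Python API):

import subprocess

# Run the installed `wer` script and capture its printed report.
result = subprocess.run(['wer', 'ref.txt', 'hyp.txt'],
                        capture_output=True, text=True, check=True)
print(result.stdout)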
In Sphinx format, the hypothesis file has the following form:
hypothesis_text (file_id score)
while the transcription file has the form:
transcription_text (file_id)
So when I run the wer command the following error occurs:
$ wer transcriptions hypothesis -id
Reference and hypothesis IDs do not match! ref="(data_005)" hyp="-7716)"
File lines in hyp file should match those in the ref file.
I think this occurs because the score field is not taken into account: the tool compares the transcription's file ID against the hypothesis's score instead of the hypothesis's file ID.
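Until that is fixed, a preprocessing sketch (a hypothetical workaround; the file names come from the command above) can strip the score so both files end in a plain (file_id):

import re

# Rewrite "hypothesis_text (file_id score)" as "hypothesis_text (file_id)".
with open('hypothesis') as src, open('hypothesis.noscore', 'w') as dst:
    for line in src:
        line = line.rstrip('\n')
        dst.write(re.sub(r'\((\S+)\s+\S+\)$', r'(\1)', line) + '\n')

Then: wer transcriptions hypothesis.noscore -id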
Unable to run on UTF-8-encoded files.
For example, using the attached file as both HYP and REF reproduces the error.
small_trans.txt
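If the failure is a decode error (an assumption; the exact traceback isn't shown here), normalizing the file to plain UTF-8 without a byte-order mark before running wer may help:

# Hypothetical workaround: re-encode the file, dropping any UTF-8 BOM.
with open('small_trans.txt', encoding='utf-8-sig') as f:
    text = f.read()
with open('small_trans.txt', 'w', encoding='utf-8') as f:
    f.write(text)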
It's probably me, but with the following two input files:
ref.txt
the crazy frog jumps over the lazy dog
the crazy frog jumps over the lazy dog
the frog jumps over the lazy dog
the crazy frog jumps over the lazy dog
the crazy frog jumps over the lazy dog extended
and hyp.txt
the crazy frog jumps over the lazy dog
the frog jumps over the lazy dog
the crazy frog jumps over the lazy dog
the craz frog jumps over the lazy dog
the crazy frog jumps over the lazy dog
Running the wer command:
wer ref.txt hyp.txt
gives me the result:
Sentence count: 5
WER: 10.000% ( 4 / 40)
WRR: 92.500% ( 37 / 40)
SER: 80.000% ( 4 / 5)
For some reason it would seem that only 4 sentences in hyp.txt are recognized?
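A quick line-by-line check (assuming wer pairs the two files line by line and splits on whitespace) suggests the report is consistent: four of the five hypothesis lines differ from their references, hence SER 4/5:

refs = open('ref.txt').read().splitlines()
hyps = open('hyp.txt').read().splitlines()
for i, (r, h) in enumerate(zip(refs, hyps), start=1):
    print(i, 'OK' if r.split() == h.split() else 'ERR')
# Lines 2-5 print ERR, matching SER: 80.000% (4 / 5).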
$ pip show asr-evaluation
Name: asr-evaluation
Version: 2.0.2
Summary: Evaluating ASR (automatic speech recognition) hypotheses, i.e. computing word error rate.
Home-page: UNKNOWN
Author: Ben Lambert
Author-email: [email protected]
License: LICENSE.txt
Location: /home/sfalk/miniconda3/envs/t2t/lib/python3.5/site-packages
Requires: termcolor, edit-distance
Required-by:
@belambert
Very useful package! However, there might be a little mistake in how WER and WRR are calculated under Python 2.7: 'error_count' and 'match_count' should be converted to float (or true division enabled), because / performs integer division there:
print('WRR: {0:f} % ({1:10d} / {2:10d})'.format(100 * match_count / ref_token_count, match_count, ref_token_count))
print('WER: {0:f} % ({1:10d} / {2:10d})'.format(100 * error_count / ref_token_count, error_count, ref_token_count))
Anyway, it's a good package!
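For reference, a minimal sketch of the fix (sample counts borrowed from earlier in this thread):

from __future__ import division  # makes / true division on Python 2.7

error_count, ref_token_count = 4, 31  # sample values from the thread above
print('WER: {0:f} % ({1:10d} / {2:10d})'.format(
    100 * error_count / ref_token_count, error_count, ref_token_count))
# Without the __future__ import (or float() casts), Python 2.7 computes
# 100 * 4 / 31 == 12 instead of 12.903226.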
Can you name a paper on which this ASR evaluation is based?
Thank you very much.
Can you give me an example of command-line usage of this?
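Based on the invocations shown earlier in this thread, basic usage is:

wer ref.txt hyp.txt

and, when each line ends with an utterance ID in parentheses:

wer ref.txt hyp.txt -id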