roxana-lafuente / mttt

Machine Translation Training Tool (MTTT): Machine translation made easy for human translators!

License: GNU General Public License v3.0

Python 63.63% Perl 0.49% Java 4.94% Makefile 0.27% CSS 23.75% Shell 0.03% HTML 6.83% Batchfile 0.05%

human-translators machine-translation moses portable python

mttt's People

paulaestrella miguelemosreverte lulzzz

mttt's Issues

Program should start on "Corpus Preparation" tab

The program should start on the "Corpus Preparation" tab; instead, it starts on the "Machine Translation" tab.

Closing the program sometimes generates the following error

Traceback (most recent call last):
  File "main.py", line 1119, in final_responsabilities
    self.PostEditing.saveChangedFromPostEditing()
  File "/home/migue/Desktop/TTT_GTK/post_editing.py", line 319, in saveChangedFromPostEditing
    self.show_the_available_stats()
  File "/home/migue/Desktop/TTT_GTK/post_editing.py", line 245, in show_the_available_stats
    insertions = self.calculate_insertions_per_segment()[0]
  File "/home/migue/Desktop/TTT_GTK/post_editing.py", line 165, in calculate_insertions_per_segment
    percentaje_spent_by_segment=self.tables["translation_table"].calculate_insertions_or_deletions_percentajes(False)
  File "/home/migue/Desktop/TTT_GTK/table.py", line 401, in calculate_insertions_or_deletions_percentajes
    modified_segments = map(str.strip, modified_segments)
TypeError: descriptor 'strip' requires a 'str' object but received a 'unicode'

The main question is this: saving unsaved changes is certainly an important "final responsibility" of the program, but why is it important, or even relevant, to calculate the statistics while closing?

Of course, this rant is directed at my own code, hence the complete lack of tact in the criticism. Will be fixed soon.
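The TypeError in the traceback comes from passing the unbound `str.strip` to `map` when the segments are `unicode` objects (Python 2). A minimal sketch of one possible fix, assuming `modified_segments` is a list of text strings; the helper name `strip_segments` is hypothetical and not part of the project:

```python
def strip_segments(modified_segments):
    """Strip surrounding whitespace from each segment.

    Calling the method on each item (instead of the unbound str.strip)
    works for any string type, so Python 2 unicode values are accepted too.
    """
    return [segment.strip() for segment in modified_segments]


print(strip_segments([u"  hola ", u"mundo\n"]))
```

Separately from this fix, the statistics call could be dropped from the closing path altogether, as the issue suggests.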

In "Evaluation" tab, problem with HTER and GTM

Neither metric is defined. I think this was working before.

Output:

HTER.....
GTM.....

In "Evaluation" tab, output directory expects a file but is asking for a directory.

Which one is correct? Does it need a directory or a file?

The source and target on the HTML statistics are always the same

The directory dialogs, when canceled, change the directory anyway

Steps to reproduce:
1. Select a directory as normal, one that is not the default.
2. Reopen the dialog and close it.
The default directory will now have replaced the one you selected in the first step.
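The expected behavior can be sketched without any GTK code: the stored directory should change only when the dialog was accepted. The function name and parameters below are hypothetical, chosen for illustration, and do not come from the project:

```python
def resolve_directory(current_directory, dialog_accepted, selected_directory):
    """Return the directory to keep after the chooser dialog closes.

    The previous choice is preserved unless the user confirmed the dialog
    and actually selected something.
    """
    if dialog_accepted and selected_directory:
        return selected_directory
    return current_directory


# Canceling the dialog keeps the directory chosen earlier.
print(resolve_directory("/data/corpus", False, "/home/user"))
```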

In "Post editing" tab, after modifying a part, color is not persistent.

When I modify the text, it turns blue, but after moving to another segment it goes back to white.

Moses

Hdhhdhshhs

When performing a MT, the output is saved in the /home dir

Instead, it should be saved in the output directory.

When a modified segment gets completely erased, it is ignored by the deletion stats

>& (bash only) can be changed to 2>&1 so that it works in dash

Why the deletions and insertions sometimes differ

Say I have a log where I saved a ton of u's in a segment:

When I come back in a different TTT session, and add another "u" in the middle of the u's, the statistics do not show any insertion made to the segment

In the evaluation script, a crash occurs when it tries to save the output

It seems to be trying to use the output directory as a valid filename, and failing in the process.
Adding a proper filename should do the trick, e.g.: evaluation_output_filename = output_directory + "/evaluation_output.txt"
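A small sketch of that suggestion, using `os.path.join` rather than string concatenation so the separator is handled per platform. The function name is hypothetical; the file name `evaluation_output.txt` is the one proposed in the issue:

```python
import os


def evaluation_output_path(output_directory, filename="evaluation_output.txt"):
    """Build a file path inside the output directory for the evaluation results."""
    return os.path.join(output_directory, filename)


print(evaluation_output_path("results"))
```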

Assertion Error in Machine Translation tab

When the Machine Translation tab is used incorrectly, that is, by selecting a file and asking for it to be translated straight away, the following error shows:


Traceback (most recent call last):
  File "main.py", line 774, in _machine_translation
    adapt_path_for_cygwin(self.is_windows, self.output_text.get_text()) + "/train/model/moses.ini",
  File "/home/migue/TTT/constants.py", line 39, in adapt_path_for_cygwin
    assert len(directory) > 0
AssertionError

Possible solutions:
Maybe a friendlier error message could be shown?
Do not allow the user to request a translation if it is going to cause an error
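Both options could be covered by validating the inputs before translating. The sketch below is only an illustration: the function name and the messages are hypothetical, and the `train/model/moses.ini` location is taken from the traceback above.

```python
import os


def check_translation_inputs(output_directory):
    """Return a message for the user, or None when translation can proceed."""
    if not output_directory.strip():
        return "Please choose an output directory that contains a trained model."
    moses_ini = os.path.join(output_directory, "train", "model", "moses.ini")
    if not os.path.isfile(moses_ini):
        return "No trained model was found at %s." % moses_ini
    return None


# An empty directory field yields a message instead of an AssertionError.
print(check_translation_inputs(""))
```

The caller could show the returned message in a dialog, or disable the translation button while the check fails.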

Issue with GIZA 64bits

When using the 64-bit GIZA, you need to add the options -mgiza -mgiza-cpus 2 to the command moses-64bit/scripts/training/train-model.perl. Otherwise it won't find the path for the translation model.

In "Evaluation" tab, problem with HTER and GTM

When you choose two identical files (source text and reference), HTER and GTM are empty. Should it show an error message or a value?

In "Evaluation" tab, start evaluation exception

When you do not choose a file but select metrics (WER, BLEU, PER, HTER, BLEU3GRAM, BLEU4GRAM, GTM) and click the "Start Evaluation" button, an exception is raised.
Info:

Traceback (most recent call last):
  File "main.py", line 941, in _evaluate
    self.evaluation_reference.get_text())
  File "/home/rlafuente/TTT/evaluation.py", line 89, in evaluate
    key = (test,creation_date(test),reference,creation_date(reference), checkbox_indexes_constants[checkbox_index])
  File "/home/rlafuente/TTT/evaluation.py", line 37, in creation_date
    stat = os.stat(path_to_file)
OSError: [Errno 2] No existe el archivo o el directorio: ''

In "Evaluation" tab, need to check if files exist

When the path to source text and/or reference text does not exist, the program breaks.

Traceback:
Traceback (most recent call last):
  File "main.py", line 944, in _evaluate
    self.evaluation_reference.get_text())
  File "/home/rlafuente/TTT/evaluation.py", line 89, in evaluate
    key = (test,creation_date(test),reference,creation_date(reference), checkbox_indexes_constants[checkbox_index])
  File "/home/rlafuente/TTT/evaluation.py", line 37, in creation_date
    stat = os.stat(path_to_file)
OSError: [Errno 2] No existe el archivo o el directorio: '5e/corpus/translate.en'

Traceback (most recent call last):
  File "main.py", line 944, in _evaluate
    self.evaluation_reference.get_text())
  File "/home/rlafuente/TTT/evaluation.py", line 89, in evaluate
    key = (test,creation_date(test),reference,creation_date(reference), checkbox_indexes_constants[checkbox_index])
  File "/home/rlafuente/TTT/evaluation.py", line 37, in creation_date
    stat = os.stat(path_to_file)
OSError: [Errno 2] No existe el archivo o el directorio: '/home/rlafuente/cor5ource.en'
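The tracebacks above come from `os.stat` being called on a path that is empty or does not exist. One way to guard against both cases is to check the paths before evaluating. This is a sketch only; `missing_files` is a hypothetical helper, not a function from the project:

```python
import os


def missing_files(*paths):
    """Return the paths that are empty or do not point to an existing file."""
    return [path for path in paths if not path or not os.path.isfile(path)]


# Both an empty field and a mistyped path are reported instead of raising OSError.
print(missing_files("", "/no/such/dir/source.en"))
```

If the returned list is not empty, the evaluation could be skipped and the offending paths shown to the user.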

When saving with no changes, the time_per_segment statistics calculation blows up

Solution 1: Never show the save button if no changes have been made. When clicked, the save button not only saves but also calculates the statistics to see which ones can be shown. Insertion and deletion stats can be calculated on empty changes, but the time_per_segment stats cannot, because calculating them requires accessing the last key of the self.source_log dictionary, in other words, the last modification. When no changes have been added to the log, the keys() array is empty and the access blows up.

Solution 2: See if self.source_log.keys() is not empty before accessing it.
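Solution 2 could look roughly like the sketch below. It assumes, purely for illustration, that the keys of `source_log` are sortable (for example timestamps) so that the "last" key is the largest one; the function name is hypothetical:

```python
def last_modification_key(source_log):
    """Return the most recent key of the log, or None when the log is empty."""
    if not source_log:
        # Nothing has been logged yet, so there is no last modification.
        return None
    return max(source_log.keys())


print(last_modification_key({}))
```

A caller would skip the time_per_segment statistics whenever `None` is returned.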

The statistics can only be seen after saving, else the button is unresponsive

The original idea was for the statistics menu to pop up only when saving. Now that it is always visible after the first save, its unresponsiveness to user input is evident: it will not show anything until something is saved once more.

To be fair to the button, its behavior follows the idea that no outdated statistics should ever reach the user's eye. With that in mind, I believe the first solution would be to unlock it for the user, and to have it save the information itself when used.

Machine translation tab error

It is not performing the machine translation.

Info:
Traceback (most recent call last):
  File "main.py", line 765, in _machine_translation
    self._has_empty_last_line(in_file):
  File "main.py", line 753, in _has_empty_last_line
    last_line_is_empty = "\n" in (f.readlines()[-1])
IndexError: list index out of range
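The IndexError is raised because `readlines()` returns an empty list for an empty input file, so indexing `[-1]` fails. A minimal sketch of a guarded version follows; it is written as a standalone function for illustration, whereas the original is a method in main.py:

```python
def has_empty_last_line(path):
    """Return True when the last line of the file ends with a newline.

    An empty file has no lines at all, so it is reported as False
    instead of raising IndexError.
    """
    with open(path) as f:
        lines = f.readlines()
    if not lines:
        return False
    return lines[-1].endswith("\n")
```

Whether an empty input file should instead be rejected with a message to the user is a separate design decision.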

After creating a model in "Machine Translation" tab, it goes back to the "Corpus Preparation" tab.

After creating a model in "Machine Translation" tab, it goes back to the "Corpus Preparation" tab. However, it should stay on the "Machine Translation" tab.

In "Post Editing" tab, problem with term search.

Term search is not working on Ubuntu. Pressing Enter does nothing.