hbuschme / textgridtools

Read, write, and manipulate Praat TextGrid files with Python

License: GNU General Public License v3.0

Python 100.00%

praat textgrid python elan annotation data-analysis linguistics

textgridtools's Issues

Minor difference in long textgrid output for point tiers

This is a minor detail, but when comparing the long output from TextGridTools to Praat, I found a difference for point tiers:

[...]
		xmax = 1.283265306122449
		intervals: size = 6
		points [1]:
			number = 0.10218212453545583
			mark = "The"
[...]

while Praat has points: size = 6.

Reading it into Praat is fine, since Praat seems to skip over the rest of the text until it finds a number: see https://github.com/praat/praat/blob/master/sys/Collection.cpp#L98 or https://github.com/praat/praat/blob/master/sys/Collection.cpp#L134.

But it might become a potentially annoying mistake if things change in the future inside Praat.

The line in TextGridTools seems to be this one, being added for both IntervalTiers and PointTiers: https://github.com/hbuschme/TextGridTools/blob/master/tgt/io3.py#L268
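One possible shape of a fix, as a rough sketch only: choose the size keyword from the tier class instead of always writing `intervals`. `size_header` is a hypothetical helper, not the function in tgt/io3.py, and the class names are the ones Praat writes in long TextGrid files.

```python
def size_header(tier_class, count):
    """Return the long-format size line for a tier.

    Hypothetical helper for illustration; the real writer in tgt/io3.py
    may be structured differently.
    """
    # Praat labels point tiers as "TextTier" in the file and counts "points".
    keyword = {"IntervalTier": "intervals", "TextTier": "points"}[tier_class]
    return "{0}: size = {1}".format(keyword, count)


print(size_header("IntervalTier", 6))
print(size_header("TextTier", 6))
```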

How do we modify an existing textgrid?

Hello,

Here is my issue. I have a text grid file where the end time is "wrong", and I want to edit it. Is there a way to do this with TGT?
I have directly tried to set the end_time attribute of the TGT object, but it does not work. I can't find a workaround that would work for me directly without creating a text grid from scratch.

Thanks for your help

Problem reading "ooTextFile short" short textgrids

There is an alternative (older?) short TextGrid format that starts with:

File type = "ooTextFile short"
"TextGrid"

This format is used by the Penn Forced Aligner and presumably some other software. read_textgrid does not process such files.

I will submit a pull request.
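For illustration, detecting the two header variants could look roughly like this. `textgrid_flavour` is a hypothetical name and the sketch only classifies the header; it is not the content of the pull request.

```python
def textgrid_flavour(lines):
    """Classify a TextGrid by its first two non-empty lines.

    Hypothetical helper; it does not parse the rest of the file.
    """
    header = [line.strip() for line in lines if line.strip()][:2]
    if len(header) < 2:
        return "unknown"
    first, second = header
    if first == 'File type = "ooTextFile short"' and second == '"TextGrid"':
        return "alternative short header"
    if first == 'File type = "ooTextFile"' and second == 'Object class = "TextGrid"':
        return "standard header"
    return "unknown"


print(textgrid_flavour(['File type = "ooTextFile short"', '"TextGrid"', "", "0"]))
```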

Feature

cannot write_to_file to the same file opened before

I opened a TextGrid file, a.TextGrid, changed something, and then wanted to write the result back to a.TextGrid. I got a permission-denied error, perhaps because a.TextGrid was still open, but I cannot find a function to close the file object. How can I overwrite the original file?

collections.Sequence is deprecated

A passed parameter tiers is checked to be a sequence here:

TextGridTools/tgt/core.py

Line 60 in 3a441f3

if isinstance(tiers, collections.Sequence):

collections.Sequence is deprecated and no longer works in Python 3.10 and up. Changing it to collections.abc.Sequence fixes the problem, but will break compatibility with Python 2, if that is still a concern?

If it is, Sequence could be imported like here: https://stackoverflow.com/a/53978543
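A compatibility import along those lines might look like the following sketch. `is_tier_sequence` is a hypothetical stand-in for the check in core.py, not the library's own function.

```python
try:
    from collections.abc import Sequence  # Python 3.3 and later
except ImportError:
    from collections import Sequence  # Python 2 fallback


def is_tier_sequence(tiers):
    """Hypothetical stand-in for the isinstance check in tgt/core.py."""
    return isinstance(tiers, Sequence)


print(is_tier_sequence(["words", "phones"]))
print(is_tier_sequence({"words", "phones"}))
```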

Sorting tiers with concatenate

When concatenating TextGrids it appears that tiers are re-sorted in alphabetical order. Is there any way to disable this?

[bug] Tier entries that have blank labels are not read

Like in timmahrt/praatIO#29

Loading fails if `DEL` present in multi-line interval annotation

The following TextGrid does not load:
DEL.TextGrid.txt

If either the DEL (ASCII 7f) character or the newline is removed from the annotation, the file loads fine, but if both are present I get

>>> tgt.io.read_textgrid('DEL.TextGrid')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/daniel/.local/lib/python3.8/site-packages/tgt/io3.py", line 51, in read_textgrid
    return read_long_textgrid(filename, stg, include_empty_intervals)
  File "/home/daniel/.local/lib/python3.8/site-packages/tgt/io3.py", line 158, in read_long_textgrid
    num_obj = int(get_attr_val(stg[index + 5]))
IndexError: list index out of range

Problem with shifting

If I run this script with the last line commented out, I don't get an error.
The error seems to be raised in tgt.write_to_file, which is confusing. I guess it should already be raised in tier.add_interval().

Overlapping annotation on a tier

Currently TextGridTools does not support overlapping annotations on a single Tier. In my opinion this behaviour is reasonable. Overlapping annotations do not make sense and cannot be represented in the TextGrid file format.

Recently, however, I came across an ELAN file (example file) with overlapping annotations. These cannot be created in ELAN, but ELAN is able to open them without a problem and preserves them when saving a file containing them. I was not even able to load the file using TextGridTools because we are very strict. My question now is whether we should add an option that relaxes this constraint when loading ELAN files (it could for example result in a warning and move the overlapping annotation boundary).
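To make the proposal concrete, here is a small sketch of the relaxed behaviour on plain `(start, end, text)` tuples rather than tgt objects. `resolve_overlaps` is a hypothetical name, and the sketch only handles the simple case where one annotation starts before the previous one ends; nested annotations or identical start times would need more care.

```python
import warnings


def resolve_overlaps(intervals):
    """Sort (start, end, text) tuples and trim each one that overlaps its successor.

    Hypothetical helper: warns and moves the earlier annotation's end
    boundary back to the start of the later annotation.
    """
    resolved = []
    for start, end, text in sorted(intervals):
        if resolved and start < resolved[-1][1]:
            prev_start, _prev_end, prev_text = resolved[-1]
            warnings.warn(
                "Overlap between %r and %r; moving boundary to %s"
                % (prev_text, text, start)
            )
            resolved[-1] = (prev_start, start, prev_text)
        resolved.append((start, end, text))
    return resolved


print(resolve_overlaps([(0.0, 1.5, "a"), (1.0, 2.0, "b")]))
```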

function misspelled in __init__.py

tgt/__init__.py: 'read_textgird', 'read_eaf', 'write_to_file',
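A check like the one below would catch this kind of typo in a test suite. It is only a sketch: `missing_exports` is a hypothetical helper, and `demo` is a throwaway module built here to imitate the misspelled entry, not the real tgt package.

```python
import types


def missing_exports(module):
    """Return names listed in __all__ that the module does not define."""
    return [name for name in getattr(module, "__all__", [])
            if not hasattr(module, name)]


# Stand-in module that reproduces the misspelling from this report.
demo = types.ModuleType("demo")
demo.read_textgrid = lambda filename: None
demo.__all__ = ["read_textgird", "read_textgrid"]

print(missing_exports(demo))
```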

Use a proper parser to read TextGrid files

tgt.write_to_file issue

As shown below, I tried to loop through a directory of TextGrid files, extract the first and second tier of each TextGrid, and save them under the designated file names:
`import tgt
import os

os.chdir('/Users/ziweizh/Documents/ISU_ALT/2017_Spring/ENGL_515/Final_Project/RSMTool/Feature_Extraction/Prosody/MFA_output')
tg_list = os.listdir('/Users/ziweizh/Documents/ISU_ALT/2017_Spring/ENGL_515/Final_Project/RSMTool/Feature_Extraction/Prosody/MFA_output')[1:-1]

for i in tg_list:
    tg = tgt.read_textgrid(i)
    word_tier = tg.get_tier_by_name('words')
    phone_tier = tg.get_tier_by_name('phones')
    os.chdir('/Users/ziweizh/Documents/ISU_ALT/2017_Spring/ENGL_515/Final_Project/RSMTool/Feature_Extraction/Prosody/TG_MFA')
    tgt.write_to_file(word_tier,i[:5]+'-'+'word'+'.'+'TextGrid',format = "short")
    tgt.write_to_file(phone_tier,i[:5]+'-'+'phone'+'.'+'TextGrid',format = "short")
`

But I get the error message shown below:
AttributeError: 'Interval' object has no attribute 'tier_type'

Is it because, unlike TextGrid object, an interval tier cannot be written to file?

tgt should provide information about the location of errors that occur when loading files

It would be helpful if TextGridTools would provide more information when it encounters an error when loading TextGrid/ELAN files, e.g., the type of error and the line number on which it occurred.
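As a sketch of what this could look like (Python 3 only), an exception that carries the file name and line number might be raised from the low-level parsing helpers. `TextGridParseError` and `parse_number` are hypothetical names, not part of tgt.

```python
class TextGridParseError(Exception):
    """Hypothetical error type that records where parsing failed."""

    def __init__(self, message, filename, line_number):
        super().__init__("{0}:{1}: {2}".format(filename, line_number, message))
        self.filename = filename
        self.line_number = line_number


def parse_number(lines, index, filename="<string>"):
    """Parse the value of a 'key = number' line, reporting 1-based line numbers."""
    try:
        return float(lines[index].split("=", 1)[1])
    except (IndexError, ValueError) as exc:
        raise TextGridParseError(
            "expected 'key = number' ({0})".format(exc), filename, index + 1
        )


print(parse_number(["xmin = 0.5"], 0))
```

A caller can then catch `TextGridParseError` and show `err.filename` and `err.line_number` to the user instead of a bare IndexError.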

Make annotations immutable objects

Annotations should be immutable objects as changing start and end times may currently result in inconsistent representations of tiers (see Issue #7).
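One way to sketch this, without claiming it is how tgt should implement it, is a namedtuple-based class whose fields cannot be reassigned. `FrozenInterval` and `shifted` are hypothetical names.

```python
from collections import namedtuple


class FrozenInterval(namedtuple("FrozenInterval", ["start_time", "end_time", "text"])):
    """Hypothetical immutable annotation; changes produce a new object."""

    __slots__ = ()

    def shifted(self, offset):
        """Return a copy moved by `offset` seconds; the original is untouched."""
        return self._replace(start_time=self.start_time + offset,
                             end_time=self.end_time + offset)


iv = FrozenInterval(0.0, 1.0, "The")
moved = iv.shifted(0.5)
print(iv, moved)
```

Because a tier would have to swap the old object for the new one explicitly, it gets a chance to re-check ordering and overlaps at that point.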

Is there any sample for writing meta data into textgrid file ?

Hello, I am working on music transcriptions. I have some data which contain "start, end, word" record like that:

23.970000	24.146741	life
24.323482	24.500223	is
24.676964	24.853704	a
25.030445	25.383927	mo
25.383927	25.737409	ment
25.737409	25.914150	in

Can I create a TextGrid object from them, write it to a .textgrid file, and visualize it in Praat? Thanks a lot.
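A rough sketch of one approach: parse the records into tuples, then build an interval tier from them. `parse_records` and `build_textgrid` are hypothetical helpers. The tgt calls are the ones used in the quote-escaping report further down this page; whether a tier created with default times accepts intervals starting at 23.97 is an assumption that has not been checked here.

```python
def parse_records(text):
    """Turn 'start<TAB>end<TAB>word' lines into (start, end, word) tuples."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        start, end, word = line.split(None, 2)
        records.append((float(start), float(end), word.strip()))
    return records


def build_textgrid(records, tier_name="words"):
    """Build a TextGrid with one interval tier (requires the tgt package)."""
    import tgt  # third-party; imported here so parse_records works without it

    tg = tgt.TextGrid()
    tier = tgt.IntervalTier(name=tier_name)
    for start, end, word in records:
        tier.add_annotation(tgt.Interval(start, end, word))
    tg.add_tier(tier)
    return tg


sample = "23.970000\t24.146741\tlife\n24.323482\t24.500223\tis\n"
print(parse_records(sample))
```

The resulting object could then be passed to `tgt.io.write_to_file(textgrid=tg, filename=..., format="long")` and the file opened in Praat.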

Extracting and creating new textgrid from existing textgrid file

Hello,

An interesting feature of the library is the possibility to extract parts from an existing textgrid file. But the feature does not seem to work. I systematically get an AttributeError: 'IntervalTier' object has no attribute 'extract_part'.

Is this normal? How can I make it work?

By recreating the feature using your written function, I get the correct behavior, but the resulting textgrid is not as nice as the original. I would like to preserve all other properties of the original textgrid file and extract only the intervals I need.

Thanks

Cannot concatenate TextGrids despite identical number and name of tiers

I have two small TextGrids tg1 and tg2. They have the same number of tiers and tier names, but the concatenate code fails and raises an error.

tg1 = tgt.read_textgrid(file1)
tg2 = tgt.read_textgrid(file2)
print(len(tg1.tiers) == len(tg2.tiers))
print(tg1.get_tier_names() == tg2.get_tier_names())
tg = tgt.util.concatenate_textgrids([tg1,tg2])

Result:

/usr/local/lib/python3.10/dist-packages/tgt/util.py in concatenate_textgrids(textgrids, ignore_nonmatching_tiers, use_absolute_time)
151 if (not ignore_nonmatching_tiers and
152 not all([len(common_tiers) == len(tg) for tg in textgrids])):
--> 153 raise TextGridToolsException(
154 'Different numbers of tiers or non-matching tier names.')
155 ccd_tiers = {}
TextGridToolsException: Different numbers of tiers or non-matching tier names.

I get the same error even when I concatenate a file with itself:

tg = tgt.util.concatenate_textgrids([tg1,tg1])

Python2 with data containing umlauts – UnicodeEncodeError: 'ascii' codec can't encode character ... ordinal not in range(128)

If you get this error with python2, switch to python3, as that defaults internally to utf-8, and seems to work.

I was trying to open utf-8 TextGrid files, which have Finnish umlauts (ä,ö, and å for Swedish). Non-umlaut files work fine.

TextGridTools' internal representation(?) of ascii range(128) excludes the umlauts somewhere on the way.

Hence, alas:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 69: ordinal not in range(128)

import tgt
tg=tgt.read_textgrid('t.tg')
tg.get_tier_names()
Out[55]: [u'utterance', u'word', u'morph', u'phone']

In [57]: tg.get_tier_by_name(u'phone')
Out[57]: ---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
/home/jh/miniconda3/envs/deep-voice-conversion/lib/python2.7/site-packages/IPython/core/formatters.pyc in __call__(self, obj)
697 type_pprinters=self.type_printers,
698 deferred_pprinters=self.deferred_printers)
--> 699 printer.pretty(obj)
700 printer.flush()
701 return stream.getvalue()

/home/jh/miniconda3/envs/deep-voice-conversion/lib/python2.7/site-packages/IPython/lib/pretty.pyc in pretty(self, obj)
401 if cls is not object
402 and callable(cls.__dict__.get('__repr__')):
--> 403 return _repr_pprint(obj, self, cycle)
404
405 return _default_pprint(obj, self, cycle)

/home/jh/miniconda3/envs/deep-voice-conversion/lib/python2.7/site-packages/IPython/lib/pretty.pyc in _repr_pprint(obj, p, cycle)
701 """A pprint that just redirects to the normal repr function."""
702 # Find newlines and replace them with p.break()
--> 703 output = repr(obj)
704 for idx,output_line in enumerate(output.splitlines()):
705 if idx:

/home/jh/miniconda3/envs/deep-voice-conversion/lib/python2.7/site-packages/tgt/core.pyc in __repr__(self)
461 def __repr__(self):
462 return '{0}(start_time={1}, end_time={2}, name="{3}", objects={4})'.format(self.__class__.__name__,
--> 463 self.start_time, self.end_time, self.name, self._objects)
464
465

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 24: ordinal not in range(128)

--- clip: t.tg ---

item[4]:
    class = "IntervalTier"
    name = "phone"
    xmin = 0.000
    xmax = 33.024
    intervals: size = 174

    intervals [47]:
        xmin = 3.928
        xmax = 4.112
        text = "ä"

   intervals [118]:
        xmin = 9.232
        xmax = 9.568
        text = "ö"

tgt.io.write_to_file does not escape double quotes

The following code creates a TextGrid file that cannot be read by praat:

import tgt

tg = tgt.TextGrid()
tier = tgt.IntervalTier(name="tiername")
iv = tgt.Interval(0, 1, '"')
tier.add_annotation(iv)
tg.add_tier(tier)
tgt.io.write_to_file(textgrid=tg, filename="invalid.TextGrid", format="long")
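For reference, Praat text files represent a double quote inside a string by doubling it. A minimal sketch of the escaping step follows; `escape_praat_text` and `format_text_line` are hypothetical helpers, not existing tgt functions.

```python
def escape_praat_text(text):
    """Double every double quote, as Praat expects inside quoted strings."""
    return text.replace('"', '""')


def format_text_line(text):
    """Format a long-format 'text = ...' line with the label escaped."""
    return 'text = "{0}"'.format(escape_praat_text(text))


print(format_text_line('"'))
print(format_text_line('say "hi"'))
```

A reader would need the inverse step, turning `""` back into `"`, for the round trip to be lossless.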

Tier.__repr__ fails on unicode names

Since my test suite is still (trying to) support Python 2 until the bitter end, I ran into the problem that str(tier) results in a UnicodeEncodeError when the tier contains annotations with unicode-only characters (similar to #15), since the format string in Tier.__repr__ is not a unicode string:

TextGridTools/tgt/core.py

Lines 461 to 463 in 52819ef

    def __repr__(self):
        return '{0}(start_time={1}, end_time={2}, name="{3}", objects={4})'.format(self.__class__.__name__,
            self.start_time, self.end_time, self.name, self._objects)

However, when I tried to patch my tests to work around it, I ran into the opposite problem, where Annotation, Interval, and Point do have unicode literals:

TextGridTools/tgt/core.py

Lines 615 to 617 in 52819ef

    def __repr__(self):
        return u'Annotation({0}, {1}, "{2}")'.format(self.start_time,
            self.end_time, self.text)

I am fully aware that supporting both Python 2 as well as 3 for these kinds of things is a mess and that there's only half a year of Python 2 support left, but I thought I'd report this minor inconsistency between Tier.__repr__ and Annotation.__repr__, just in case. No real rush to get it fixed, as far as I'm concerned, though.
