hbuschme / TextGridTools
Read, write, and manipulate Praat TextGrid files with Python
License: GNU General Public License v3.0
This is a minor detail, but when comparing the long output from TextGridTools to Praat, I found a difference for point tiers:
[...]
xmax = 1.283265306122449
intervals: size = 6
points [1]:
number = 0.10218212453545583
mark = "The"
[...]
while Praat has points: size = 6.
Reading it into Praat is fine, since Praat seems to skip over the rest of the text until it finds a number: see https://github.com/praat/praat/blob/master/sys/Collection.cpp#L98 or https://github.com/praat/praat/blob/master/sys/Collection.cpp#L134.
But it might become a potentially annoying mistake if things change in the future inside Praat.
The line in TextGridTools seems to be this one, being added for both IntervalTiers and PointTiers: https://github.com/hbuschme/TextGridTools/blob/master/tgt/io3.py#L268
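A minimal sketch of the kind of fix this suggests: choose the size keyword from the tier class instead of writing "intervals" for every tier. The function name `size_header` is hypothetical and not part of TextGridTools; `IntervalTier` and `TextTier` are the class names Praat uses in its long format.

```python
def size_header(tier_class, count):
    """Return the long-format size line for a tier of the given Praat class."""
    # Praat labels point tiers "TextTier" and counts their entries as "points".
    keyword = 'points' if tier_class == 'TextTier' else 'intervals'
    return '{}: size = {}'.format(keyword, count)


print(size_header('IntervalTier', 6))
print(size_header('TextTier', 6))
```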
Hello,
Here is my issue. I have a text grid file where the end time is "wrong", and I want to edit it. Is there a way to do this with TGT?
I have directly tried to set the end_time attribute of the TGT object, but it does not work. I can't find a workaround that would work for me directly without creating a text grid from scratch.
Thanks for your help
There is an alternative (older?) short TextGrid format that starts with:
File type = "ooTextFile short"
"TextGrid"
This format is used by Penn Forced Aligner and presumably some other software. read_textgrid does not process such files.
I will submit a pull request.
I opened a TextGrid file a.TextGrid, changed something, and then wanted to write it back to a.TextGrid. I got a "permission denied" error, perhaps because a.TextGrid was still open, but I cannot find a function to close the file object. How can I overwrite the original file?
A passed parameter tiers is checked to be a sequence here:
Line 60 in 3a441f3
collections.Sequence is deprecated and no longer works in Python 3.10 and up. Changing it to collections.abc.Sequence fixes the problem, but will break compatibility with Python 2, if that is still a concern?
If it is, Sequence could be imported like here: https://stackoverflow.com/a/53978543
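The import pattern referred to above can be sketched as follows. The helper `is_tier_sequence` is a hypothetical name used only to show the check; it is not taken from the TextGridTools source.

```python
try:
    # Available since Python 3.3; the only location that works from 3.10 on.
    from collections.abc import Sequence
except ImportError:
    # Python 2 fallback.
    from collections import Sequence


def is_tier_sequence(tiers):
    """Return True if `tiers` is a Sequence (list, tuple, ...; str also counts)."""
    return isinstance(tiers, Sequence)


print(is_tier_sequence(['words', 'phones']))
print(is_tier_sequence({'words', 'phones'}))
```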
When concatenating TextGrids it appears that tiers are re-sorted in alphabetical order. Is there any way to disable this?
Like in timmahrt/praatIO#29
The following TextGrid does not load:
DEL.TextGrid.txt
If either the DEL (ASCII 7f) character or the newline is removed from the annotation, the file loads fine, but if both are present I get
>>> tgt.io.read_textgrid('DEL.TextGrid')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/daniel/.local/lib/python3.8/site-packages/tgt/io3.py", line 51, in read_textgrid
    return read_long_textgrid(filename, stg, include_empty_intervals)
  File "/home/daniel/.local/lib/python3.8/site-packages/tgt/io3.py", line 158, in read_long_textgrid
    num_obj = int(get_attr_val(stg[index + 5]))
IndexError: list index out of range
If I run this script with the last line commented out, I don't get an error.
The error seems to be raised by tgt.write_to_file, which is confusing. I guess it should already be raised by tier.add_interval().
Currently TextGridTools does not support overlapping annotations on a single Tier. In my opinion this behaviour is reasonable. Overlapping annotations do not make sense and cannot be represented in the TextGrid file format.
Recently, however, I came across an ELAN file (example file) with overlapping annotations. These cannot be created in ELAN, but ELAN is able to open them without a problem and preserves them when saving a file containing them. I was not even able to load the file using TextGridTools because we are very strict. My question now is whether we should add an option that relaxes this constraint when loading ELAN files (it could for example result in a warning and move the overlapping annotation boundary).
tgt/__init__.py: 'read_textgird', 'read_eaf', 'write_to_file',
As shown below, I tried to loop through a directory of TextGrid files, extract the first and second tier of each TextGrid, and save them under the designated file names:
```python
import tgt
import os

os.chdir('/Users/ziweizh/Documents/ISU_ALT/2017_Spring/ENGL_515/Final_Project/RSMTool/Feature_Extraction/Prosody/MFA_output')
tg_list = os.listdir('/Users/ziweizh/Documents/ISU_ALT/2017_Spring/ENGL_515/Final_Project/RSMTool/Feature_Extraction/Prosody/MFA_output')[1:-1]
for i in tg_list:
    tg = tgt.read_textgrid(i)
    word_tier = tg.get_tier_by_name('words')
    phone_tier = tg.get_tier_by_name('phones')
    os.chdir('/Users/ziweizh/Documents/ISU_ALT/2017_Spring/ENGL_515/Final_Project/RSMTool/Feature_Extraction/Prosody/TG_MFA')
    tgt.write_to_file(word_tier, i[:5] + '-' + 'word' + '.' + 'TextGrid', format="short")
    tgt.write_to_file(phone_tier, i[:5] + '-' + 'phone' + '.' + 'TextGrid', format="short")
```
But I get the error message shown below:
AttributeError: 'Interval' object has no attribute 'tier_type'
Is it because, unlike a TextGrid object, an interval tier cannot be written to a file?
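One possible workaround, assuming write_to_file expects a whole TextGrid and iterates over its tiers (which would explain why tier_type ends up being looked up on an Interval): wrap each tier in a fresh TextGrid before writing. `tier_filename` and `write_tier` are hypothetical helper names; the tgt calls are the ones that appear elsewhere on this page.

```python
def tier_filename(source_name, label):
    """Build a name such as 'speak-word.TextGrid' from the first five characters."""
    return '{}-{}.TextGrid'.format(source_name[:5], label)


def write_tier(tier, filename):
    """Wrap a single tier in a new TextGrid and write it in short format."""
    import tgt  # imported here so tier_filename can be used without tgt installed

    tg = tgt.TextGrid()
    tg.add_tier(tier)
    tgt.write_to_file(tg, filename, format='short')
```

Inside the loop this would be called as `write_tier(word_tier, tier_filename(i, 'word'))`, and likewise for the phone tier.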
It would be helpful if TextGridTools would provide more information when it encounters an error when loading TextGrid/ELAN files, e.g., the type of error and the line number on which it occurred.
Annotations should be immutable objects as changing start and end times may currently result in inconsistent representations of tiers (see Issue #7).
Hello, I am working on music transcriptions. I have some data containing "start, end, word" records like this:
23.970000 24.146741 life
24.323482 24.500223 is
24.676964 24.853704 a
25.030445 25.383927 mo
25.383927 25.737409 ment
25.737409 25.914150 in
Can I create a TextGrid object from them, write it to a .textgrid file, and visualize it in Praat? Thanks a lot.
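A minimal sketch of how such records could be turned into a TextGrid, using only tgt calls that appear elsewhere on this page (TextGrid, IntervalTier, Interval, add_annotation, add_tier). The function names, the tier name 'words', and the sample data layout are assumptions made for illustration.

```python
def parse_records(text):
    """Parse whitespace-separated 'start end word' lines into tuples."""
    records = []
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) == 3:
            records.append((float(parts[0]), float(parts[1]), parts[2]))
    return records


def build_textgrid(records, tier_name='words'):
    """Create a TextGrid with one interval tier holding the given records."""
    import tgt  # imported here so parse_records can be used without tgt installed

    tier = tgt.IntervalTier(name=tier_name)
    for start, end, word in records:
        tier.add_annotation(tgt.Interval(start, end, word))
    tg = tgt.TextGrid()
    tg.add_tier(tier)
    return tg


DATA = """23.970000 24.146741 life
24.323482 24.500223 is
24.676964 24.853704 a"""

records = parse_records(DATA)
print(records)
```

The result of `build_textgrid(records)` could then be passed to `tgt.io.write_to_file(textgrid=..., filename=..., format="long")`. The records leave gaps between words, so it is worth opening the written file in Praat to check how those gaps are represented.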
Hello,
An interesting feature of the library is the possibility to extract parts from an existing TextGrid file. But the feature does not seem to work. I systematically get an AttributeError: 'IntervalTier' object has no attribute 'extract_part'.
Is this normal? How can I make it work?
By recreating the feature with the function you wrote, I get the correct behavior, but the resulting TextGrid is not as nice as the original. I would like to preserve all other properties of the original TextGrid file and extract only the intervals I need.
Thanks
I have two small TextGrids tg1 and tg2. They have the same number of tiers and tier names, but the concatenate code fails and raises an error.
tg1 = tgt.read_textgrid(file1)
tg2 = tgt.read_textgrid(file2)
print(len(tg1.tiers) == len(tg2.tiers))
print(tg1.get_tier_names() == tg2.get_tier_names())
tg = tgt.util.concatenate_textgrids([tg1,tg2])
Result:
/usr/local/lib/python3.10/dist-packages/tgt/util.py in concatenate_textgrids(textgrids, ignore_nonmatching_tiers, use_absolute_time)
151 if (not ignore_nonmatching_tiers and
152 not all([len(common_tiers) == len(tg) for tg in textgrids])):
--> 153 raise TextGridToolsException(
154 'Different numbers of tiers or non-matching tier names.')
155 ccd_tiers = {}
TextGridToolsException: Different numbers of tiers or non-matching tier names.
I get the same error even when I concatenate a file with itself:
9 tg = tgt.util.concatenate_textgrids([tg1,tg1])
If you get this error with python2, switch to python3, as that defaults internally to utf-8, and seems to work.
I was trying to open utf-8 TextGrid files, which have Finnish umlauts (ä,ö, and å for Swedish). Non-umlaut files work fine.
TextGridTools' internal representation(?) of ascii range(128) excludes the umlauts somewhere on the way.
Hence, alas:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 69: ordinal not in range(128)
import tgt
tg=tgt.read_textgrid('t.tg')
tg.get_tier_names()
Out[55]: [u'utterance', u'word', u'morph', u'phone']
In [57]: tg.get_tier_by_name(u'phone')
Out[57]: ---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
/home/jh/miniconda3/envs/deep-voice-conversion/lib/python2.7/site-packages/IPython/core/formatters.pyc in __call__(self, obj)
697 type_pprinters=self.type_printers,
698 deferred_pprinters=self.deferred_printers)
--> 699 printer.pretty(obj)
700 printer.flush()
701 return stream.getvalue()
/home/jh/miniconda3/envs/deep-voice-conversion/lib/python2.7/site-packages/IPython/lib/pretty.pyc in pretty(self, obj)
401 if cls is not object
402 and callable(cls.__dict__.get('__repr__')):
--> 403 return _repr_pprint(obj, self, cycle)
404
405 return _default_pprint(obj, self, cycle)
/home/jh/miniconda3/envs/deep-voice-conversion/lib/python2.7/site-packages/IPython/lib/pretty.pyc in _repr_pprint(obj, p, cycle)
701 """A pprint that just redirects to the normal repr function."""
702 # Find newlines and replace them with p.break()
--> 703 output = repr(obj)
704 for idx,output_line in enumerate(output.splitlines()):
705 if idx:
/home/jh/miniconda3/envs/deep-voice-conversion/lib/python2.7/site-packages/tgt/core.pyc in __repr__(self)
461 def __repr__(self):
462 return '{0}(start_time={1}, end_time={2}, name="{3}", objects={4})'.format(self.__class__.__name__,
--> 463 self.start_time, self.end_time, self.name, self._objects)
464
465
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 24: ordinal not in range(128)
--- clip: t.tg ---
item[4]:
class = "IntervalTier"
name = "phone"
xmin = 0.000
xmax = 33.024
intervals: size = 174
intervals [47]:
xmin = 3.928
xmax = 4.112
text = "ä"
intervals [118]:
xmin = 9.232
xmax = 9.568
text = "ö"
The following code creates a TextGrid file that cannot be read by Praat:
import tgt
tg = tgt.TextGrid()
tier = tgt.IntervalTier(name="tiername")
iv = tgt.Interval(0, 1, '"')
tier.add_annotation(iv)
tg.add_tier(tier)
tgt.io.write_to_file(textgrid=tg, filename="invalid.TextGrid", format="long")
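Praat's text format escapes a double quote inside a string by doubling it, so an annotation consisting of a single " would presumably have to be written as four quote characters (opening quote, doubled quote, closing quote). A minimal sketch of that escaping step follows; `quote_praat_string` is a hypothetical name, not a TextGridTools function.

```python
def quote_praat_string(text):
    """Return text as a Praat string literal, doubling embedded double quotes."""
    return '"' + text.replace('"', '""') + '"'


print(quote_praat_string('"'))
print(quote_praat_string('say "hi"'))
```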
Since my test suite is still (trying to) support Python 2 until the bitter end, I ran into the problem that str(tier) results in a UnicodeEncodeError when the tier contains annotations with unicode-only characters (similar to #15), since the format string in Tier.__repr__ is not a unicode string:
Lines 461 to 463 in 52819ef
However, when I tried to patch my tests to work around it, I ran into the opposite problem, where Annotation, Interval, and Point do have unicode literals:
Lines 615 to 617 in 52819ef
I am fully aware that supporting both Python 2 and 3 for these kinds of things is a mess and that there's only half a year of Python 2 support left, but I thought I'd report this minor inconsistency between Tier.__repr__ and Annotation.__repr__, just in case. No real rush to get it fixed, as far as I'm concerned, though.