textgrid's Introduction

textgRid

The software application Praat can be used to annotate waveform data (e.g., to mark intervals of interest or to label events). These annotations are stored in a Praat TextGrid object, which consists of Interval Tiers and Point Tiers. An Interval Tier consists of sequential (i.e., not overlapping) labeled intervals. A Point Tier consists of labeled events. The textgRid package provides S4 classes, generics, and methods for accessing annotations stored in Praat TextGrid objects.
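To make "sequential (i.e., not overlapping)" concrete, here is a minimal base-R sketch of that property. The helper is_sequential() is hypothetical and not part of textgRid; it works on plain numeric vectors of start and end times.

```r
# Hypothetical helper (not part of textgRid): TRUE when every interval has
# positive duration and no interval starts before the previous one ends.
is_sequential <- function(start, end) {
  stopifnot(length(start) == length(end))
  n <- length(start)
  all(start < end) && (n < 2L || all(start[-1] >= end[-n]))
}

# Boundaries taken from the first three labeled phones in the examples below.
phones_ok <- is_sequential(c(1.0, 1.5, 2.5), c(1.5, 2.5, 3.0))

# The second interval starts before the first one ends, so these overlap.
phones_bad <- is_sequential(c(1.0, 1.4), c(1.5, 2.5))
```

Here phones_ok is TRUE and phones_bad is FALSE.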

Installation

To install the current released version from CRAN:

install.packages('textgRid')

To install the current development version from GitHub:

devtools::install_github('patrickreidy/textgRid')

Examples

Read the contents of a .TextGrid file.

textgrid <- TextGrid(system.file('extdata', 'myExample.TextGrid', 
                                 package = 'textgRid'))

Find all labeled intervals or points on a given tier.

# Find all labeled intervals on the $Words IntervalTier.
findIntervals(textgrid$Words)
#   Index StartTime EndTime  Label
# 1     2         1       3 word.1
# 2     4         6       9 word.2

# Find all labeled intervals on the $Phones IntervalTier.
findIntervals(textgrid$Phones)
#   Index StartTime EndTime    Label
# 1     2      1.00    1.50 phone.1a
# 2     3      1.50    2.50 phone.1b
# 3     4      2.50    3.00 phone.1c
# 4     6      6.00    6.75 phone.2a
# 5     7      6.75    7.25 phone.2b
# 6     8      7.25    8.25 phone.2c
# 7     9      8.25    9.00 phone.2d

# Find all intervals associated with word.2 on the $Phones IntervalTier.
findIntervals(textgrid$Phones, pattern = '2')
#   Index StartTime EndTime    Label
# 1     6      6.00    6.75 phone.2a
# 2     7      6.75    7.25 phone.2b
# 3     8      7.25    8.25 phone.2c
# 4     9      8.25    9.00 phone.2d

# Alternatively...
findIntervals(
  tier = textgrid$Phones,
  from = findIntervals(textgrid$Words, pattern = 'word.2')$StartTime,
  to   = findIntervals(textgrid$Words, pattern = 'word.2')$EndTime
)
#   Index StartTime EndTime    Label
# 1     6      6.00    6.75 phone.2a
# 2     7      6.75    7.25 phone.2b
# 3     8      7.25    8.25 phone.2c
# 4     9      8.25    9.00 phone.2d

# Find all labeled points on the $Events PointTier.
findPoints(textgrid$Events)
#   Index Time      Label
# 1     1 6.75  voicingOn
# 2     2 8.25 voicingOff

Coerce a TextGrid object to a data.frame.

as.data.frame(textgrid)
#    TierNumber TierName     TierType Index StartTime EndTime      Label
# 1           1    Words IntervalTier     2      1.00    3.00     word.1
# 2           1    Words IntervalTier     4      6.00    9.00     word.2
# 3           2   Phones IntervalTier     2      1.00    1.50   phone.1a
# 4           2   Phones IntervalTier     3      1.50    2.50   phone.1b
# 5           2   Phones IntervalTier     4      2.50    3.00   phone.1c
# 6           2   Phones IntervalTier     6      6.00    6.75   phone.2a
# 7           2   Phones IntervalTier     7      6.75    7.25   phone.2b
# 8           2   Phones IntervalTier     8      7.25    8.25   phone.2c
# 9           2   Phones IntervalTier     9      8.25    9.00   phone.2d
# 10          3   Events    PointTier     1      6.75    6.75  voicingOn
# 11          3   Events    PointTier     2      8.25    8.25 voicingOff

Write a TextGrid object to a Praat-compatible .TextGrid file.

writeTextGrid(textgrid, path = 'test_out.TextGrid')

Read a TextGrid that contains non-ASCII characters.

# Guess the encoding.
nonASCII <- TextGrid(system.file('extdata', 'nonASCII.TextGrid', package = 'textgRid'),
                     encoding = NULL)

# Or, explicitly provide the (correct) encoding.
nonASCII <- TextGrid(system.file('extdata', 'nonASCII.TextGrid', package = 'textgRid'),
                     encoding = "UTF-16BE")

# An error occurs if the provided encoding is incorrect.
TextGrid(system.file('extdata', 'nonASCII.TextGrid', package = 'textgRid'),
                     encoding = "UTF-8")

# Coerce the TextGrid to a data.frame.
as.data.frame(nonASCII)[1:2, ]
#   TierNumber TierName     TierType Index StartTime EndTime                      Label
# 1          1  Bengali IntervalTier     1         0       1   চকলেট এবং চিনাবাদাম মাখন
# 2          2  Chinese IntervalTier     1         0       1             巧克力和花生醬

# Non-ASCII characters can be used as patterns in searches.
findIntervals(nonASCII$Bengali, pattern = "চকলেট")
#   Index StartTime EndTime                    Label
# 1     1         0       1 চকলেট এবং চিনাবাদাম মাখন

Details on S4 classes

The textgRid package defines four S4 classes, whose slots and accessors are described in the tables below.

Tier

Slot      Type       Accessor
@name     character  tierName()
@number   integer    tierNumber()

IntervalTier (inherits from Tier)

Slot         Type       Accessor
@startTimes  numeric    intervalStartTimes()
@endTimes    numeric    intervalEndTimes()
@labels      character  intervalLabels()

PointTier (inherits from Tier)

Slot     Type       Accessor
@times   numeric    pointTimes()
@labels  character  pointLabels()

TextGrid

Slot        Type                                    Accessor
@.Data      list (of IntervalTiers and PointTiers)
@startTime  numeric                                 textGridStartTime()
@endTime    numeric                                 textGridEndTime()
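
As a rough illustration of how such an S4 hierarchy fits together, the sketch below defines toy classes with the same slot names as the Tier and IntervalTier tables. It is not the package's source: the class names carry a "Sketch" suffix and the generic sketchLabels() is hypothetical, chosen so that nothing here can be mistaken for what textgRid exports.

```r
library(methods)

# Toy base class with the two slots listed for Tier.
setClass("TierSketch", slots = c(name = "character", number = "integer"))

# Toy subclass with the three slots listed for IntervalTier.
setClass("IntervalTierSketch",
         contains = "TierSketch",
         slots = c(startTimes = "numeric",
                   endTimes   = "numeric",
                   labels     = "character"))

# Hypothetical accessor generic, similar in spirit to intervalLabels().
setGeneric("sketchLabels", function(tier) standardGeneric("sketchLabels"))
setMethod("sketchLabels", "IntervalTierSketch", function(tier) tier@labels)

words <- new("IntervalTierSketch",
             name = "Words", number = 1L,
             startTimes = c(1, 6), endTimes = c(3, 9),
             labels = c("word.1", "word.2"))

word_labels <- sketchLabels(words)      # the two labels stored in @labels
inherits_ok <- is(words, "TierSketch")  # the subclass is also a TierSketch
```

Accessor functions keep calling code independent of the slot layout, which is presumably why the package documents an accessor next to each slot.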

textgrid's People

Contributors

patrickreidy, teebusch, tjmahr

textgrid's Issues

Error with a TextGrid having multiple tiers

When trying to open the attached TextGrid using:
tg = TextGrid("ipiapacs_speaker0006_0005_speech_E.TextGrid")

I get the following error:

Error in mapply(FUN = f, ..., SIMPLIFY = FALSE) : 
  zero-length inputs cannot be mixed with those of non-zero length
In addition: There were 50 or more warnings (use warnings() to see the first 50)

Warning messages:
1: In readLines(file) : line 1 appears to contain an embedded nul
2: In readLines(file) : line 2 appears to contain an embedded nul
[...]

This happens with both textgRid_1.0.1 and the latest development version (textgRid_1.0.2).

ipiapacs_speaker0006_0005_speech_E.TextGrid.gz
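
The "embedded nul" warnings are what readLines() commonly reports when a UTF-16 file is read as single-byte text. Whether that is the cause for this particular file is an assumption, not something the report confirms. Under that assumption, a base-R sketch for checking the byte-order mark before choosing an encoding could look like this; guess_utf16() is a hypothetical helper, not a textgRid function.

```r
# Hypothetical helper (not part of textgRid): inspect the first two bytes of a
# file and report a UTF-16 encoding name if they form a byte-order mark.
guess_utf16 <- function(path) {
  bom <- readBin(path, what = "raw", n = 2L)
  if (length(bom) < 2L) return(NA_character_)
  if (identical(bom, as.raw(c(0xFE, 0xFF)))) return("UTF-16BE")
  if (identical(bom, as.raw(c(0xFF, 0xFE)))) return("UTF-16LE")
  NA_character_  # no UTF-16 byte-order mark found
}

# Demonstration on a throwaway file that starts with a big-endian BOM.
demo_file <- tempfile(fileext = ".TextGrid")
writeBin(as.raw(c(0xFE, 0xFF, 0x00, 0x46)), demo_file)
demo_encoding <- guess_utf16(demo_file)

# With the package installed, the result could then be passed on, e.g.:
# if (!is.na(demo_encoding)) tg <- TextGrid(demo_file, encoding = demo_encoding)
```

A file without a byte-order mark yields NA, in which case this check says nothing about the encoding.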

error when opening TextGrid files

We're having trouble opening TextGrid files using textgRid. Here is the output we get with two different approaches:

> library(textgRid)
> textgrid <- TextGrid(system.file('data', 'TatarKQ', 'File001.TextGrid'))
Error in mapply(FUN = f, ..., SIMPLIFY = FALSE) : 
  zero-length inputs cannot be mixed with those of non-zero length
In addition: Warning message:
In file(con, "r") :
  file("") only supports open = "w+" and open = "w+b": using the former
> textgrid <- TextGrid('./data/TatarKQ/File001.TextGrid')
Error in mapply(FUN = f, ..., SIMPLIFY = FALSE) : 
  zero-length inputs cannot be mixed with those of non-zero length

This is using R 3.6.3. Is there something we're doing wrong?
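
One detail in the transcript above can be checked independently of textgRid: system.file() is called without a package argument, so it searches the base package and returns an empty string when nothing is found, which matches the file("") warning. The sketch below validates a path before it is handed to a reader; resolve_textgrid_path() is a hypothetical helper. It does not explain the second call, which fails on an explicit path, so the underlying problem is probably something else (for example, the file's encoding).

```r
# Hypothetical helper (not part of textgRid): stop early if the path is empty
# or does not point to an existing file; otherwise return the full path.
resolve_textgrid_path <- function(path) {
  if (!nzchar(path) || !file.exists(path)) {
    stop("TextGrid file not found: '", path, "'", call. = FALSE)
  }
  normalizePath(path)
}

# system.file() returns "" when no matching file exists.
not_found <- system.file("no_such_dir", "no_such_file.TextGrid")
lookup_failed <- !nzchar(not_found)

# The helper turns that silent "" into an explicit error.
result <- try(resolve_textgrid_path(not_found), silent = TRUE)
```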

Let findPoints() and findIntervals() return NA labels by default.

The functions drop all annotations (intervals and points) with NA labels because the default for the pattern argument (used by grep) is "*".

Instead, the functions should match all intervals and points by default. Otherwise they do not return unlabeled points, which are arguably very common: when a tier holds only one type of event (say, "conversational turns"), the label can be given in the tier name, so a separate label for each point is unnecessary. They also do not find unlabeled intervals, which may be used, for example, to code silence.

This behavior also creates problems further down the line with as.data.frame(): the NA-labeled intervals and points are absent from the data frame. This is essentially a loss of information, because the gaps (at least those at the beginning and end of the tiers) cannot be reliably reconstructed from the data frame. It also means that a TextGrid constructor for data frames will not be able to reconstruct the original TextGrid; i.e., for a TextGrid object x it will not be possible to guarantee that TextGrid(as.data.frame(x)) == x.

It is always easy for the user to filter NA elements later, but it shouldn't be a default that cannot be circumvented.

grep() never matches NA, regardless of the pattern, so the default has to be pattern = NULL or the like, with grep() applied only when a pattern is supplied (e.g., by checking is.null(pattern)).

Example of behavior:

textgrid <- TextGrid(system.file('extdata', 'myExample.TextGrid', 
                                 package = 'textgRid'))

findPoints(textgrid@.Data[[3]])
#   Index Time      Label
# 1     1 6.75  voicingOn
# 2     2 8.25 voicingOff

textgrid@.Data[[3]]@labels[1] <- NA # set 1st label to NA

findPoints(textgrid@.Data[[3]])
#   Index Time      Label
# 1     2 8.25 voicingOff
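
The change proposed in this issue (a NULL default, with grep() applied only when a pattern is given) can be sketched in base R as follows. find_points_sketch() is a hypothetical stand-in that works on plain vectors rather than on a PointTier, so it shows the idea only, not the package's code.

```r
# Hypothetical stand-in for findPoints(); takes plain vectors, not a PointTier.
find_points_sketch <- function(times, labels, pattern = NULL) {
  keep <- if (is.null(pattern)) {
    seq_along(labels)      # no pattern: keep every point, NA labels included
  } else {
    grep(pattern, labels)  # grep() does not return positions of NA elements
  }
  data.frame(Index = keep, Time = times[keep], Label = labels[keep],
             stringsAsFactors = FALSE)
}

# Same situation as in the example above: the first label has been set to NA.
times  <- c(6.75, 8.25)
labels <- c(NA, "voicingOff")

all_points <- find_points_sketch(times, labels)                   # both points
off_points <- find_points_sketch(times, labels, pattern = "Off")  # second only
```

With this default, callers who want the current behavior can still drop unlabeled rows afterwards, for example with all_points[!is.na(all_points$Label), ].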
