ropensci / phonfieldwork

R package for phonetic research and experimenting

Home Page: https://docs.ropensci.org/phonfieldwork/

License: GNU General Public License v2.0

R 4.38% TeX 1.14% HTML 94.48%

r phonetics phonology fieldwork praat textgrid elan eaf exmaralda exb

phonfieldwork's Introduction

phonfieldwork

phonfieldwork is a package for phonetic fieldwork research and experiments. This package makes it easier to:

create an html/pptx presentation from a stimuli-translation list,
rename soundfiles according to the list of stimuli,
concatenate multiple soundfiles and create a Praat TextGrid whose interval labels are the original names of the sound files,
extract sounds according to annotation,
extract annotation from multiple linguistic formats (Praat .TextGrid, ELAN .eaf, EXMARaLDA .exb, Audacity .txt and subtitles .srt),
visualise oscillograms, spectrograms and annotations,
create an html viewer like this. Ethical problems of this kind of viewer in linguistic research are covered in the vignette vignette("ethical_research_with_phonfieldwork").

For more details see tutorial.

The main goal of the phonfieldwork package is to make the full research workflow, from data collection to data extraction and data representation, easier for people who are not familiar with programming. However, most of the phonfieldwork functionality can be found in other software and packages:

stimuli presentations could be created with any programming language, and probably without one
automatic file renaming and automatic merging could be done with any programming language
Praat .TextGrid manipulation is possible with Praat, R packages rPraat and textgRid, and Python package pympi
ELAN .eaf manipulation is possible with ELAN, R package FRelan and Python package pympi
import and export between Praat .TextGrid, ELAN .eaf, and EXMARaLDA .exb is possible with R package act
cutting sounds according to annotation is possible with Praat and the R package tuneR
spectrogram visualisation is possible with multiple R packages signal, tuneR, seewave, phonTools, monitor, warbleR, soundgen and many others

Installation

Install from CRAN:

install.packages("phonfieldwork")

Get the development version from GitHub:

install.packages("remotes")
remotes::install_github("ropensci/phonfieldwork")

Load the library:

library(phonfieldwork)

In order to work with some rmarkdown functions you will need to install pandoc; see vignette("pandoc") for details.

Cite the package

You can get the latest information about how to cite the package using the citation() function:

citation("phonfieldwork")
>
> To cite package ‘phonfieldwork’ in publications use:
>   
>   Moroz G (2020). _Phonetic fieldwork and experiments with phonfieldwork package_.
> <https://CRAN.R-project.org/package=phonfieldwork>.
> 
> A BibTeX entry for LaTeX users is
> 
> @Manual{,
>   title = {Phonetic fieldwork and experiments with phonfieldwork package},
>   author = {George Moroz},
>   year = {2020},
>   url = {https://CRAN.R-project.org/package=phonfieldwork},
> }

To do:

export to ELAN and EXMARaLDA files
use ELAN and EXMARaLDA files in the whole pipeline described in the docs
use the same pipeline with video (for sign languages)
convert TECkit to a data frame and back


phonfieldwork's Issues

Cannot allocate vector of size ... Gb

Check this with a huge number of files. What is going on there?

The problem came from Margaux Dubuis.

Multi-line views

It would be interesting to know whether multi-line views like those in Raven would be useful:

https://ravensoundsoftware.com/video-tutorials/video-tutorials-english/04-multi-line-views/

read.audacity() from seewave

Have a look at this sound annotation format.

pkgdown documentation works locally, but not on github pages

Is it possible for concatenate_soundfiles() to merge files with different samp.frequency or bit.rate?

`textgrid_to_df` fails when there is \n in annotation

It actually means that the whole approach of textgrid_to_df should be rewritten so that it does not rely on line counts...
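One way to avoid counting lines, sketched here in base R and not taken from the package: join the file into a single string and pull out each quoted label with a regular expression, so a label that spans several lines is still one match. The helper name extract_labels is hypothetical, and the sketch assumes the long TextGrid format, where interval labels follow `text = ` and an embedded double quote is written as two double quotes.

```r
# Hypothetical helper, not part of phonfieldwork: extract interval labels
# from long-format TextGrid content without relying on line positions.
extract_labels <- function(lines) {
  # Join the lines into one string so multi-line labels stay intact
  content <- paste(lines, collapse = "\n")
  # A label is `text = "` ... `"`, where a doubled "" is an escaped quote
  pattern <- "text = \"((?:[^\"]|\"\")*)\""
  matches <- regmatches(content, gregexpr(pattern, content, perl = TRUE))[[1]]
  # Drop the 8-character prefix `text = "` and the closing quote
  labels <- substr(matches, 9, nchar(matches) - 1)
  # Turn escaped (doubled) quotes back into single quotes
  gsub("\"\"", "\"", labels, fixed = TRUE)
}

example <- c(
  "        intervals [1]:",
  "            xmin = 0",
  "            xmax = 0.5",
  "            text = \"first line",
  "second line\"",
  "        intervals [2]:",
  "            xmin = 0.5",
  "            xmax = 1",
  "            text = \"t\""
)
labels <- extract_labels(example)
print(labels)
```

The first label keeps its line break and the second is read normally. Point tiers, which store labels under `mark = `, would need a second pattern.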

Create a map with viewer

from and to incorrectly cut annotations in draw_sound()

`create_viewer()` with part of images or sounds

Would it be possible to create a report where some sounds have images, but others don't? (Or even one where it's just sounds and no pictures, or just pictures and no sounds?)

Thanks to Jonathan Keane

Add an ability to rewrite a tier with the `df_to_tier` function

encoding autodetection

In textgRid, the author used the readr package for encoding autodetection. It is better to use uchardet, which has fewer dependencies.

read Short text file TextGrids

Thanks to Sasha Shiryaev for reporting this problem.

It is possible to create another type of TextGrid file with the command Save as short text file in Praat. It would be nice if phonfieldwork could read them. Here is an example:

File type = "ooTextFile"
Object class = "TextGrid"

0
0.6526757369614512
<exists>
3
"IntervalTier"
"intervals"
0
0.6526757369614512
5
0
0.012465825768947203
""
0.012465825768947203
0.247819135106803
"t"
0.247819135106803
0.39552362579469874
"e"
0.39552362579469874
0.5115771541923311
"s"
0.5115771541923311
0.6526757369614512
"t"
"IntervalTier"
"empty_intervals"
0
0.6526757369614512
5
0
0.012465825768947203
""
0.012465825768947203
0.247819135106803
""
0.247819135106803
0.39552362579469874
""
0.39552362579469874
0.5115771541923311
""
0.5115771541923311
0.6526757369614512
""
"TextTier"
"points"
0
0.6526757369614512
4
0.012465825768947203
"t"
0.247819135106803
"e"
0.39552362579469874
"s"
0.5115771541923311
"t"
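A minimal base R sketch of how such a file could be read, offered as an illustration rather than as the package's approach. It follows the layout visible in the example above: two header lines, the overall start and end times, `<exists>`, the number of tiers, and then for each tier its class, name, start, end and size, followed by three lines per interval or two lines per point. The function name read_short_textgrid and the output column names are assumptions, and labels are assumed to contain no line breaks.

```r
# Hypothetical reader for Praat "short text file" TextGrids; not a
# phonfieldwork function. Assumes every value sits on its own line.
read_short_textgrid <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[lines != ""]       # drop the blank separator line
  # Remove surrounding quotes and unescape doubled quotes
  unquote <- function(x) {
    gsub("\"\"", "\"", substr(x, 2, nchar(x) - 1), fixed = TRUE)
  }
  n_tiers <- as.integer(lines[6])   # follows header, xmin, xmax, <exists>
  pos <- 7                          # first line of the first tier
  result <- vector("list", n_tiers)
  for (i in seq_len(n_tiers)) {
    tier_class <- unquote(lines[pos])
    tier_name <- unquote(lines[pos + 1])
    size <- as.integer(lines[pos + 4])
    # intervals take 3 lines (start, end, label); points take 2 (time, label)
    step <- if (tier_class == "IntervalTier") 3 else 2
    first <- pos + 5 + step * (seq_len(size) - 1)
    time_start <- as.numeric(lines[first])
    time_end <- if (step == 3) as.numeric(lines[first + 1]) else time_start
    result[[i]] <- data.frame(
      tier = rep(i, size),
      tier_name = rep(tier_name, size),
      content = unquote(lines[first + step - 1]),
      time_start = time_start,
      time_end = time_end,
      stringsAsFactors = FALSE
    )
    pos <- pos + 5 + step * size    # jump to the next tier
  }
  do.call(rbind, result)
}

example <- c(
  'File type = "ooTextFile"', 'Object class = "TextGrid"', '',
  '0', '0.5', '<exists>', '2',
  '"IntervalTier"', '"intervals"', '0', '0.5', '2',
  '0', '0.25', '"t"',
  '0.25', '0.5', '""',
  '"TextTier"', '"points"', '0', '0.5', '1',
  '0.25', '"e"'
)
tg <- read_short_textgrid(example)
print(tg)
```

Because the row positions are computed from each tier's declared size, a tier of size 0 simply contributes no rows instead of breaking the read.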

Creating presentations

Could you please allow the user to center stimuli on the presentation slide in the create_presentation() function? I like this very nice and quick way of creating a linguistic survey, but words don't look very good in the corner.

Fix pkgdown reference index

👋 @agricolamz!

The pkgdown build for your package is currently failing because some of the documentation topics are missing from the pkgdown config file. Could you please fix?

> pkgdown::check_pkgdown()
Error in `check_missing_topics()`:
! All topics must be included in reference index
• Missing topics: add_leading_symbols, create_image_look_up, create_sound_play, create_viewer, df_to_eaf, df_to_exb, remove_textgrid_tier

pkgdown build failing

👋 @agricolamz! I was looking at https://dev.ropensci.org/ and saw the docs building is failing for your package, it'd be nice if it were fixed before your blog post.

https://dev.ropensci.org/blue/organizations/jenkins/phonfieldwork/detail/phonfieldwork/26/pipeline

For context on the docs server https://ropensci.org/technotes/2019/06/07/ropensci-docs/

Have a look at pympi functionality

https://dopefishh.github.io/pympi/index.html

When mapping, clicking in the table doesn't work

Correct multiblock problem.

.flextext without word-level annotations

E.g. this file can't be parsed because there are no word-level annotations.

Thanks to Niko Partanen for spotting it.

add file-name to all annotations

Unglossed nodes in `flextext_to_df()` are not displayed (by @Neigelily)

Some Windows users need to open RStudio as administrator...

while renaming files

function for `concatenate_sound` and `annotate_sound()`

As Eduard Klishinskiy proposed, it would be nice to have a function that combines concatenate_sound and annotate_sound().

add recursive reading for `read_from_folder()`

thx @Xenomirant
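A sketch of what recursive file discovery could look like with base R's list.files(). This is an illustration under assumptions, not the package's implementation: the helper name list_annotation_files and its extension argument are hypothetical.

```r
# Hypothetical sketch: collect annotation files from a folder and all of
# its subfolders with base R.
list_annotation_files <- function(path, extension = "TextGrid") {
  list.files(
    path,
    pattern = paste0("\\.", extension, "$"),
    recursive = TRUE,   # descend into subfolders
    full.names = TRUE
  )
}

# Build a small nested folder to try it on
root <- file.path(tempdir(), "recursive_demo")
dir.create(file.path(root, "speaker_1"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(root, "a.TextGrid")))
invisible(file.create(file.path(root, "speaker_1", "b.TextGrid")))
invisible(file.create(file.path(root, "speaker_1", "notes.txt")))

found <- list_annotation_files(root)
print(basename(found))
```

Each returned path could then be handed to a reader such as textgrid_to_df() and the results combined into one table.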

Spectrogram annotation

Maybe it is worth adding the ability to draw Raven-style rectangular annotations.

https://ravensoundsoftware.com/video-tutorials/video-tutorials-english/05-using-annotations/

remove fragment of sound

"read_from_folder" cannot read files if the textgrid file includes a tier with no points

"read_from_folder" cannot read files if the TextGrid file includes a tier with no points. As a result, the whole TextGrid file cannot be read and is replaced with NA.

A tier with zero points is actually important information in analysis, so I wonder if someone could show a solution to this problem.

Extract sound by dataframe

For now it is only possible to cut a sound by a tier in the TextGrid. It would be nice to make it possible to cut it using some part of the TextGrid's annotation.

Add pitch and intensity visualisation

Correct files and sound mismatch in docs

move folder function into separate functions

whenever a function has a ..._from_folder = ... argument.

correct sounds in html viewer: they are out of order

improve the viewer

I'd like to make a viewer:

searchable and sortable table with all contents
select anything from a list and it loads
- precomputed spectrogram (or any other picture)
- sound player (play, stop, decrease rate)
some common things for the website: title, citation info block, etc.
password protection?
it is important not to load everything on page call

R tools: htmlwidgets and crosstalk
JS tools: DT, ...?

add pictures to `create_presentation()` function

It is important to support non-verbal stimuli: pictures, gifs, maybe even video...

Have a look at some sound annotation standards

transana

thanks to Oliver Ehmer and his Transformer tool.

Emu etc
https://clarin.phonetik.uni-muenchen.de/BASWebServices/interface/FormantAnalysis

Add an ability to draw just a fragment with draw_sound()

For now draw_sound() draws the whole file, which can be terrible when you have a long file. It would be nice to add a possibility to draw only a fragment of the sound and TextGrid in order to see the result from R.

`create_glossed_document()` not only for flextext

It would be nice to think about creating a document from any type of annotation.

thx Niko Partanen