Git Product home page Git Product logo

tapas's Introduction

DOI

tapas R-package

Rationale

The set of functions gathered under the hood of tapas is meant to be used for analyzing paleoecological records, when the goal is to estimate the long-term Trend and detect Peaks to reconstruct the occurrence, the return intervals, and the magnitude of distinct events.

tapas builds on CharAnalysis (https://github.com/phiguera/CharAnalysis), a software for analyzing sediment-charcoal records written in and compiled with Matlab 7.0 by Phil Higuera (Higuera et al., 2009), with significant input by (amongst others) Patrick Bartlein (U of OR), Daniel Gavin (U of OR), Jennifer Marlon, and Ryan Kelly.

Two main reasons led to the development of tapas. Firstly, as R is an open source product, modifying the program to suit individual needs may be more straightforward. Secondly, an integration and inter-operability with other existing R-packages may allow using peak-detection analysis in conjunction with other workflows and types of paleoecological records (see for instance (Cagliero et al., 2021)).


Usage

A typical workflow of the peak-detection analysis includes the following steps (Higuera et al., 2011):

  • 1.) resample the record to equally spaced sampling intervals in time (years), a procedure also called “binning”
  • 2.) decompose the resampled record into a long-term trend (background component) and peaks (peak component)
  • 3.) screen the peak component to distinguish signal from noise using
    • 3.1 a unique global 2-component Gaussian mixture model, or
    • 3.2 local 2-component Gaussian mixture models,
    • 3.3 and eventually also screen the peak component using a minimum-count test,
  • 4.) evaluate the suitability of the record for peak-detection analysis using the signal-to-noise index (Kelly et al., 2011).

tapas performs steps 1.) and 2.) for several variables of one dataset type (e.g. different estimates of charcoal abundance) in one go. Instead, steps 3.) and 4.) are performed for one user-selected variable.

To run your own data, make a new folder within an umbrella folder, and save it under a name, e.g., Data-In. Then place a file (e.g., MyData.csv) in that folder. The input data file will contain the sample depths, sample ages, sample volume, and the variable(s). The file should have the following formatting: It has headers and at least six fields. The first five columns will report the metadata for the samples, the subsequent columns contain the variable(s) to be analysed (e.g., the abundance of charcoal pieces).

CmTop CmBot AgeTop AgeBot Volume variable1 variable2 nth-variable
0.5 1 -42 -24 3 8 0.01
1 1.5 -24 5 3 18 0.005

The depths and ages should be arranged in ascending order.

Load the data into the R environment

MyData <- read.csv(“./Data-In/MyData.csv”)


Installation

Install the development version of tapas with:

# library(devtools)
devtools::install_github("wfinsinger/tapas")

The check_pretreat() function can be used to verify the input data is formatted correctly. If the samples in the input file are not contiguous, the check_pretreat() function will add rows for the missing samples. Should the dataset contain samples that were deposited in a very short amount of time (e.g., slumps, tephras), for which AgeTop = AgeBot, these samples will be flagged and removed, and a new depth scale will be created to replace the original one.

MyData <- check_pretreat(Mydata)

The functions can either be run individually and step wise, or the wrapper function peak_detection() can be used to perform an analysis including steps from 1.) to 4.) in one go.

For instance, to analyse the MyData data set for variable1:

MyData_peaks <- peak_detection(series = MyData, proxy = “variable1”)

Individual output plots can be generated with dedicated plotting functions:

Plot.Anomalies(MyData_peaks, plot.neg = F)

Plot_ReturnIntervals(MyData_peaks)

Alternatively, the analysis can be done step-by-step (see below for further details).


Example

The package comes with toy data (co_char_data) to play with. This is a macroscopic charcoal record from (Code Lake, Higuera et al., 2009).

library(tapas)
co <- co_char_data

The dataset can be analysed, for instance, with the following settings (leaving other arguments with their default values). These are the same settings used by Higuera et al. (2009).

co_loc <- peak_detection(series = co, proxy = "char",
                                first = -51, last = 7500, yrInterp = 15,
                                detr_type = "mov.median", sens = F)
#> [1] "No slump samples detected."
#> [1] "The depth scale is continuous"
#> [1] "The age scale is continuous"

With these settings, the results obtained using tapas strikingly resemble those obtained by Higuera et al. (2009) with CharAnalysis

Alternatively, the data can be analysed step-by-step:

The first step is to resample the time series with pretreatment_data():

co_i <- pretreatment_data(
  series = co, out = "accI", series.name = "co",
  first = -50, last = 7450, yrInterp = 15
)

Then, detrend the data with series_detrend():

co_detr <- SeriesDetrend(
  series = co_i, smoothing.yr = 500,
  detr.type = "rob.loess"
)

Fit gaussian mixture models to determine the noise-signal threshold

  • With a Global Threshold (default options):
char_thresh_gl <- global_thresh(series = co_detr, proxy = "charAR")

  • With minimum-count test and removing consecutive peak samples.
char_thresh_gl <- global_thresh(
  series = co_detr,
  proxy = "charAR",
  thresh.value = 0.95,
  smoothing.yr = NULL,
  keep_consecutive = FALSE,
  minCountP = 0.95,
  MinCountP_window = 150
)

Once the analysis is done, plot the result:

Plot.Anomalies(
  series = char_thresh_gl,
  plot.crosses = TRUE,
  plot.x = FALSE,
  plot.neg = FALSE
)

And plot the Return Intervals:

Plot_ReturnIntervals(
  series = char_thresh_gl,
  plot.x = TRUE,
  plot.neg = FALSE
)


Details

A typical workflow of the peak-detection analysis includes the following steps (Higuera et al., 2011):

  • 1.) resampling the record to equally spaced sampling intervals in time (years) is performed using the pretreatment_data() function, which loops the pretreatment() function for all variables in the input data frame. The user can choose among the following output data types:
    • resampled accumulation rates (out = “accI”; default),
    • resampled concentrations (out = “conI”), or
    • resampled input data (e.g., if variable1 in the input is charcoal counts, with out = “countI” one gets resampled counts).
  • 2.) decomposition of the resampled record into a long-term trend (background component) and peaks (peak component) is performed with the SeriesDetrend() function for all variables in the input data frame. The user can select the smoothing-window width (in years) as well as the type of the detrending (e.g. detr.type = "rob.loess"). Currently, the following functions are implemented to smooth the temporal series:
    • robust loess (“rob.loess”),
    • robust Lowess (“rob.lowess”), and
    • moving median (“mov.median”; aka Method #4 in CharAnalysis’ Matlab version);
  • 3.1) screen the peak component to distinguish signal from noise using one or more 2-component Gaussian mixture models that are determined using the mclust R package Scrucca et al., 2016.
    • a unique global 2-component Gaussian mixture model (Global_Thresh()), or
    • several local 2-component Gaussian mixture models (Local_Thresh());
  • 3.2) and eventually also screen the peak component using a minimum-count test. This part of the Global_Thresh() and Local_Thresh() functions was translated verbatim from CharAnalysis.
  • 4.) evaluate the suitability of the record for peak-detection analysis using the signal-to-noise index SNI; R code from Supplementary Material to Kelly et al., 2011.
  • 5.) Diagnostic plots showing the sensitivity to different smoothing-window widths are produced with the peak-detection() function if the function’s argument sens=TRUE.

Since v0.1.2, a GAM-modeled trend can be used as well for the decomposition step, though the procedure is a little bit more tedious, as it is not implemented in the peak_detection() wrapper function: - if the data was resampled, use the tapas2mgcv() function to extract the data stored in the list generated with the pretreatment_data() function. Then model the trend using the mgcv::gam() function, and translate the mgcv::gam()-generated list using the mgcv2tapas() function. Then move on to the screening step. - if the data wasn’t resampled, simply omit the tapas2mgcv() step.


Credits & Acknowledgements

The development of this set of functions would not have been possible without the Matlab-coded templates that were written and made open source by Philip Higuera (https://github.com/phiguera/CharAnalysis), whose features were also based on the programs CHAPS, by Patrick Bartlein (U of OR), and Charster, by Daniel Gavin (U of OR), and on the Gaussian mixture model by Charles Bouman (Purdue), and benefited from discussions with and testing by members of the Whitlock Paleoecology Lab at Montana State University, Dan Gavin, and Ryan Kelly.

We are thankful to Dan Gavin for suggesting tweaks to accomodate for non-standard data-input formats for tapas. His suggestions led to the check_pretreatment() function. Petr Kunes and several CharAnalysis users who ignited the conversation over the past few years about getting that program into R are greatly thanked.

If you use tapas in your publication, please cite it as follows: Finsinger W., Bonnici I. (2022) - tapas: an R package to perform trend and peaks analysis. https://doi.org/10.5281/zenodo.6344463 and any non-default settings applied.


Issues & Contributions

If you are having problems running tapas or if you have any suggestions, please use the “Issues” tab. Contributions to this work are more than welcome. tapas is meant as a flexible data-analysis environment, just as its real-world counterpart that is based on a “pick & choose sides” philosophy. Thus, feel free to fork, make changes, and then file a pull request. Alternatively, get in touch to discuss how your improvements may fit with ongoing development of add-ons. Thanks!


News

For the latest news, see https://github.com/wfinsinger/tapas/blob/main/NEWS.md.


References

Cagliero E, Morresi D, Paradis L, Curović M, Spalević V, Marchi N, Meloni F, Bentaleb I, Motta R, Garbarino M, Lingua E, Finsinger W (2022) Legacies of past human activities on one of the largest old-growth forests in south-east European mountains. Vegetation History and Archaeobotany 31: 415-430 link.

Higuera PE, LB Brubaker, PM Anderson, FS Hu, TA Brown (2009) Vegetation mediated the impacts of postglacial climatic change on fire regimes in the south-central Brooks Range. Alaska Ecological Monographs 79: 201-219 link

Higuera PE, Gavin DG, Bartlein PJ, Hallett DJ (2010) Peak detection in sediment–charcoal records: impacts of alternative data analysis methods on fire-history interpretations. International Journal of Wildland Fire 19: 996 link

Kelly RF, PE Higuera, CM Barrett, FS Hu (2011) A signal-to-noise index to quantify the potential for peak detection in sediment-charcoal records. Quaternary Research 75: 11-17 link

Scrucca L, Fop M, Murphy TB, Raftery AE (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal 8: 289-317 link

tapas's People

Contributors

iago-lito avatar wfinsinger avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Forkers

ggilromera madwoa

tapas's Issues

Error message in peak detection command

Hi,

I'm having trouble getting the peak detection function to work. I keep getting an error message that I haven't been able to figure out. I've attached the message below and the data I'm trying to use. Any help would be greatly appreciated.

Cheers,

Mark

data

cmTop cmBot AgeTop AgeBot charvol char chararea
0 0.5 2021 2019 0.115 345 31.7788462
0.5 1 2019 2018 0.253 1525 57.819431
1 1.5 2018 2016 0.178 147 40.0562852
1.5 2 2016 2015 0.366 618 98.0768329
2 2.5 2015 2015 0.154 1475 308.979571
2.5 3 2015 2015 0.317 1865 442.777971
3 3.5 2015 2007 0.483 970 249.457083
3.5 4 2007 1999 0.262 619 211.638133
4 4.5 1999 1991 0.149 758 180.78351
4.5 5 1991 1983 0.325 507 37.978172
5 5.5 1983 1975 0.287 275 24.0804663
5.5 6 1975 1967 0.183 822 98.8168647
6 6.5 1967 1959 0.252 294 51.0646869
6.5 7 1959 1951 0.217 321 2.2664685
7 7.5 1951 1943 0.157 413 26.1431116
7.5 8 1943 1935 0.26 115 8.57449858
8 8.5 1935 1927 0.313 293 21.9781205
8.5 9 1927 1919 0.279 229 151.011147
9 9.5 1919 1911 0.222 1279 155.350379
9.5 10 1911 1903 0.252 687 39.5456259
10 10.5 1903 1896 0.169 675 43.2481862
10.5 11 1896 1880 0.422 452 73.0699588
11 11.5 1880 1865 0.25 1336 72.3731696
11.5 12 1865 1850 0.317 482 107.344739

error message

FCS_loc <- peak_detection(series = FCS_Mark, proxy = "chararea",

  •                      first = -91, last = 110, yrInterp = 15,
    
  •                      detr_type = "mov.median")
    

[1] "Warning: AgeTop = AgeBot for 2 samples, totaling 1cm"
[1] "The depth scale is continuous"
[1] "The age scale is continuous"
Error in seq.default(from = round(n.smooth), to = length(v), by = round(n.smooth/2)) :
wrong sign in 'by' argument

Issue with Tapas package installation

Hello,

I am trying to use tapas for the first time but the package isn't installing using the code line on the page. I was wondering if there is a new one. Please help. See code below in including error received.

devtools::install_github("wfinsinger/tapas")
Downloading GitHub repo wfinsinger/tapas@HEAD
Error in utils::download.file(url, path, method = method, quiet = quiet, :
cannot open URL 'https://api.github.com/repos/wfinsinger/tapas/tarball/HEAD'

Error in 'check_pretreat()

When I run the 'check_pretreat()' function, I have this error message

Error in cmB[slumps_index[i]]:
! Can't subset columns past the end.
ℹ Location 6 doesn't exist.
ℹ There is only 1 column.
Run rlang::last_trace() to see where the error occurred.

MyDataRP.csv
Can you kindly advise?

Confirmation regarding yrInterp parameter

Could you please confirm if yrInterp = NULL is equivalent to yrInterp = 0 in CharAnalysis?

Section 2.2 Parameter Selection

2.2.1 Pretreatment

yrinterp Years to interpolate record to. Charcoal counts, sample volume, and sample depths
are all interpolated before calculating charcoal accumulation rates. Enter 0 here is you want to
interpolate to the median sample resolution (yr sample-1) of the selected record.

Higuera, P. (2009) CharAnalysis 0.9: Diagnostic and analytical tools for sediment-charcoal analysis. User's Guide, updated January 2009.

Thank you for your insights!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.