ternaustralia / ausplotsr

R package to interact with TERN AusPlots data

License: GNU General Public License v3.0

ausplotsr's Introduction

ausplotsR

R package for live extraction, preparation, visualisation and analysis of TERN Ecosystem Surveillance monitoring data (AusPlots data).

Through ausplotsR, users can now directly access plot-based data on vegetation and soils across Australia, with simple function calls to extract the data and merge them into species occurrence matrices for analysis or to calculate things like basal area and fractional cover.

The data are open access under Creative Commons – Attribution 4.0 International (CC BY 4.0), and have been collected by TERN’s Ecosystem Surveillance platform via field surveys and sampling across a national network of plots and transects. Follow the links for more information on the research infrastructure provided by the Terrestrial Ecosystem Research Network (TERN), an Australian Government NCRIS-enabled project, and its Ecosystem Surveillance platform.

Update to Version 2 to work with new features and settings

The latest package version features a range of bug fixes as well as updated or additional functionality and, in some cases, altered data extracts and settings, including:

New and improved standardised plant taxonomy matching and fields as well as original herbarium determinations.
Reduced size of default data extraction due to the growing size of the database.
Easier coding, ordering and visualisation of revisits.
Enhanced search options including particular site visits and partial site name matches.

Future updates to Version 2

Pending minor versions of the package will enable extraction of two additional data tables presenting species-level traits (e.g., photosynthetic pathway, invasion status) and survey-level community indices (e.g., species abundance distribution, proportional abundance by photosynthetic pathway, community temperature index).

Using ausplotsR

ausplotsR is now available on CRAN, meaning it can be installed with install.packages("ausplotsR") or from the package installation menu in an R or RStudio session.

ausplotsR requires the following packages as 'Imports': mapdata, vegan, plyr, R.utils, httr, jsonlite, ggplot2, gtools, jose, betapart, curl, r2r, stringr, progress; 'Suggests' (needed to build the package vignette if 'build_vignettes' is set to TRUE below): knitr, markdown, rmarkdown, formatR, sf.

The most current development version of ausplotsR can be installed directly from GitHub using the devtools package, which must be installed first.

To install the package, use:

library(devtools)
install_github("ternaustralia/ausplotsR", build_vignettes = TRUE, dependencies = TRUE)

To get started:

library(ausplotsR)
help(ausplotsR)
browseVignettes(package="ausplotsR")

To download AusPlots data, start with:

?get_ausplots

Or, to simply grab basic site and visit info for all available plots, use:

library(ausplotsR)
my.ausplots.data <- get_ausplots()
names(my.ausplots.data)

Citation

A suggested citation is automatically generated in the following format when you extract TERN AusPlots data via ausplotsR:

TERN ("year") AusPlots ecosystem surveillance monitoring dataset (URL: http://aekos.org.au/collection/adelaide.edu.au/ausplotsrangelands). Obtained via the ausplotsR R package (URL: https://github.com/ternaustralia/ausplotsR), accessed "day month year".

To print the citation of our package:

citation('ausplotsR')

Please include appropriate citation in published papers/reports/theses that use the data and R functions.

Authors: Greg Guerin, Tom Saleeba, Samantha Munroe, Irene Martín-Forés, Bernardo Blanco-Martin, Andrew Tokmakoff

ausplotsr's Issues

Fix warning about cover type vector in growth form

... - use only first element (looks sloppy)

Consistency of argument names in get_ausplots with their output tables

At present, the formatting of names for different Ausplots modules can be confusing because it differs between the argument in the get_ausplots function and the name of the tables in the outputs. This was done deliberately to separate input from output in preparing the code but could be frustrating to users.

e.g.
get_ausplots argument | output table name
veg.vouchers | veg.vouch
site_info | site.info
soil_character | soil.char

Perhaps the formatting change from input to output should at least be consistent.

This change may break existing code using the package

Do we tidy this up or live with it to avoid breaking code?

Create a NEWS.Rd file explaining updates

Needs refining as updates are completed, including directly linking closed issues to updates in NEWS

Update species_table to reflect name filtering options

With new taxonomy fields from the database, add user option to use herbarium_determination or new standardised name field(s) - making it easy to get a table of standardised names or even genus or family, species level or lowest available level.

changes must apply to each m_kind %in% PA, percent_cover, freq, IVI.
the current output is just the species x sites matrix, so it won't break existing up or down stream code if the updated function defaults to standardised taxon at lowest level, for example
in addition to choice of herbarium versus standardised determinations and level, there should be an option that defaults to stripping out non-vascular plant species, and the option to exclude records not fully identified etc.

plot device issues ausplots_visual

The par resetting requested by CRAN loads a blank plot window and then throws a warning about par with no device at the end. The function still works but the result looks a little messy. Check the formatting and timing of the recently added par checks.

How can I obtain the site identity selected by optim_species in the package ausplotsR?

I would like to use an optimization method for plot selection that would maximize different diversity indices. For instance, I would like to know which subset of sites I should select to maximize species richness. If I had 3 sites to choose from a total of 4, which ones would get picked 1st, 2nd and 3rd, and how many species would be accumulated at each step?

The function optim_species in the package ausplotsR within R looks promising for this as it provides number of accumulated species for each additional site selected. However, the function does not provide the identity of the selected sites for each diversity metric used.

Is there a way to obtain this information? If not, is there an alternative approach?

See the following example:

# load package
library(ausplotsR)

# matrix for test
species_matrix <- matrix(
  c(1, 0, 1, 1, 0,
    0, 0, 1, 0, 0,
    0, 1, 1, 0, 0,
    0, 1, 0, 0, 1),
  nrow = 4,
  ncol = 5,
  byrow = TRUE,
  dimnames = list(c("Site 1", "Site 2", "Site 3", "Site 4"), c("Species 1", "Species 2", "Species 3", "Species 4", "Species 5"))
)

optim_species(species_matrix, n.plt = 3, random = TRUE, iterations = 10)

At the start of the output, I get the chosen sites (a consensus of all estimators, perhaps?), but I do not get the site id of the first randomly selected site. Lower in the output, I also do not get the site id selected by each estimator (only the accumulated diversity for the number of sites selected). Any help would be greatly appreciated.

Note that I asked this question on StackOverflow and will post the answer there (and give credit, of course) if I get an answer.
See: https://stackoverflow.com/questions/76910491/how-can-i-obtain-the-site-identity-selected-by-optim-species-in-the-package-ausp
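
One possible alternative, independent of optim_species, is a simple greedy selection in base R that records which site is chosen at each step. This is only a sketch: greedy_richness is a hypothetical helper (not part of ausplotsR), it considers species richness only, and a greedy pick is not guaranteed to match what the optimisers in optim_species would return.

```r
# Hypothetical helper (not an ausplotsR function): greedy site selection that
# maximises accumulated species richness and reports the site picked at each step.
greedy_richness <- function(m, n_sites) {
  stopifnot(is.matrix(m), !is.null(rownames(m)), n_sites <= nrow(m))
  pa <- m > 0                          # presence/absence
  covered <- rep(FALSE, ncol(pa))      # species accumulated so far
  remaining <- rownames(pa)
  picked <- character(0)
  accumulated <- integer(0)
  for (i in seq_len(n_sites)) {
    # number of new species each remaining site would add
    gain <- vapply(remaining, function(s) sum(pa[s, ] & !covered), integer(1))
    best <- remaining[which.max(gain)] # ties go to the first site in row order
    covered <- covered | pa[best, ]
    picked <- c(picked, best)
    accumulated <- c(accumulated, sum(covered))
    remaining <- setdiff(remaining, best)
  }
  data.frame(step = seq_len(n_sites), site = picked,
             accumulated_species = accumulated, stringsAsFactors = FALSE)
}

# Same 4 sites x 5 species layout as the test matrix in the question
test_matrix <- matrix(
  c(1, 0, 1, 1, 0,
    0, 0, 1, 0, 0,
    0, 1, 1, 0, 0,
    0, 1, 0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste("Site", 1:4), paste("Species", 1:5))
)

selection <- greedy_richness(test_matrix, n_sites = 3)
print(selection)
```

The returned data frame has one row per step, so the order of picks and the running species total can be read off directly.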

Species representation optimiser

Add a new function to run species representation optimisers on ausplots data or any species by sites matrix.

Consistency of names in ausplots visual

Revise to ensure consistent selection of herbarium determination (HD) or standardised_name (SN) in generation of data and visuals in ausplots_visual. At present, SN is the default for some, while cumulative cover uses HD, for example.

ausplots_visual crashing

ausplots_visual crashing when growth form pie is selected and there are 1-few sites included.

Convert visit dates to R date format and rank revisits in order for easier time series

Currently dates are read in as character and visits have opaque codes that don't signify visit order. A simple rank field based on formatted time will make time series easier
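
A minimal base R sketch of the idea, assuming a table with one row per visit. The column name visit_start_date and the new fields visit_date and visit_number are assumptions made for illustration and may not match the database fields.

```r
# Illustrative visit table; column names are assumptions, not the real schema
visits <- data.frame(
  site_location_name = c("PLOT_A", "PLOT_A", "PLOT_B", "PLOT_A"),
  visit_start_date   = c("2015-06-01", "2012-05-20", "2014-03-02", "2019-09-15"),
  stringsAsFactors = FALSE
)

# Convert character dates to R Date class
visits$visit_date <- as.Date(visits$visit_start_date)

# Rank visits within each site by date (1 = earliest visit)
visits$visit_number <- ave(
  as.numeric(visits$visit_date),
  visits$site_location_name,
  FUN = function(d) rank(d, ties.method = "first")
)

print(visits[order(visits$site_location_name, visits$visit_number), ])
```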

Implement site_unique in more places

Currently we build the site_unique field on the client side at

ausplotsR/R/get_ausplots.R

Line 110 in 5b4d4d9

 basal$site_unique <- do.call(paste, c(basal[c("site_location_name", "site_location_visit_id")], sep = "-")) #add unique site/visit identifier 

and also on line 127.

This could be built on the server side and just served up on the client side. This isn't that important, but I think adding the field in more places is important.

The field could also be included in the veg_voucher, veg_pi and veg_basal responses to make it easier for consumers to join (merge?) data frames.
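
As a rough illustration of why the field helps with joins, the sketch below builds site_unique the same way as the line quoted above and merges two tables on it. The helper add_site_unique, the example tables and the visit_label and basal_area columns are hypothetical.

```r
# Hypothetical helper: build the site/visit identifier from its two components
add_site_unique <- function(df) {
  df$site_unique <- paste(df$site_location_name, df$site_location_visit_id, sep = "-")
  df
}

# Made-up example tables standing in for two package outputs
site_info <- data.frame(
  site_location_name     = c("PLOT_A", "PLOT_A", "PLOT_B"),
  site_location_visit_id = c(101, 202, 303),
  visit_label            = c("first", "revisit", "first"),
  stringsAsFactors = FALSE
)
basal <- data.frame(
  site_location_name     = c("PLOT_A", "PLOT_B"),
  site_location_visit_id = c(202, 303),
  basal_area             = c(12.5, 7.1),
  stringsAsFactors = FALSE
)

site_info <- add_site_unique(site_info)
basal     <- add_site_unique(basal)

# With a shared key, joining is a single merge on one column
joined <- merge(site_info[, c("site_unique", "visit_label")],
                basal[, c("site_unique", "basal_area")],
                by = "site_unique")
print(joined)
```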

Nice species lists

New function or new choice within species_table (i.e., m_kind = "species_list") that returns 'NICE' species lists by plot/visit.

'nice' means tidying and simplifying the raw voucher data to remove duplicate vouchers, records identified only to genus, or entries such as standing litter, grass sp., annual form etc., and return only standardised names
a stand-alone function could have more options such as whether to include common names, genus sp. entries, families etc, whereas those options could get confusing within the species_table options already there

Filter by site_unique as well as site_location_name and allow wildcard/partial match

get_ausplots has an option to search by a vector of site_location_name as well as by bounding box. Users should be able to enter site_unique to extract particular visit(s) they know about. Also, wildcard/partial matching of these fields will allow easier searching, including searching by state and IBRA since these are coded into the plot names.
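
Partial matching could be sketched with grepl() on the vector of available names. The helper match_sites and the site codes below are made up for illustration; how the filter would be wired into get_ausplots is not shown.

```r
# Hypothetical helper: keep site names that match a regular expression
match_sites <- function(site_names, pattern) {
  site_names[grepl(pattern, site_names)]
}

# Made-up site codes for illustration only
available <- c("SATFLB0004", "SATFLB0005", "NSABHC0001", "QDAGUP0006")

sa_sites  <- match_sites(available, "^SA")  # names starting with "SA"
flb_sites <- match_sites(available, "FLB")  # names containing "FLB"
no_sites  <- match_sites(available, "ZZZ")  # no match gives an empty vector

print(sa_sites)
print(flb_sites)
```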

Merge changes in v1.2.8 into v2.0.0

v1.2.8 includes an update to APC/APNI from WFO (documentation plus functions that used strip_bryophyte). These changes need to be brought into v2. Also, the change to checking the class of inputs and responses with 'inherits' needs to be merged.

Examples return errors when data resource unavailable

Recent updates implemented a graceful fail when the database resource is unavailable (timeout, no internet, etc.). A residual issue is that when the resource is unavailable, the examples and vignette still return errors because they assume there is data and subsequently process it.

Package appears to have been removed from CRAN due to errors when server was moved.

Need to record whether data were returned and, if not, bypass subsequent code with a message.
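
One way this could look in an example or vignette chunk is sketched below. fetch_data is a hypothetical stand-in for a data request that fails, and safe_fetch is an illustrative wrapper, not a function in the package.

```r
# Hypothetical stand-in for a data request that fails when the server is down
fetch_data <- function() stop("Service Unavailable (HTTP 503)")

# Illustrative wrapper: turn an error into a message and a NULL result
safe_fetch <- function(fetch) {
  tryCatch(
    fetch(),
    error = function(e) {
      message("Data could not be retrieved: ", conditionMessage(e))
      NULL
    }
  )
}

my_data <- safe_fetch(fetch_data)

# Downstream code runs only if data were actually returned
if (is.null(my_data)) {
  message("No data returned; skipping the rest of the example.")
} else {
  print(names(my_data))
}
```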

Update growth_form_table by species richness to reflect name handling

growth_form_table function has the options to populate GFs x sites by species richness in each GF category. It does this currently by counting unique entries in herbarium_determination by GF

the output does not contain species names, so defaulting to counting standardised names instead of herbarium names should not affect downstream code
could either default to standardised names or add as a choice of herbarium versus standardised.
since names aren't returned, it may make sense not to filter out incomplete IDs etc.

HTTP 503 error trying to download soil data with `get_ausplots`

Hello ausplots!

I'm interested in checking out the soil data available for ausplots sites, but every time I try to download the data I get an error message. At first I thought the servers must be down but I've tried a few times now on different computers so I think maybe there's an error in the API function or the database configuration?

Function I've run:

soil <- get_ausplots(site_info = FALSE, soil_subsites = TRUE, 
                     soil_bulk_density = TRUE, soil_character = TRUE)

Returns error message:

Error in .ausplots_api(path, query) : 
  Service Unavailable (HTTP 503). Failed to {"details":"no connection to the server\n","message":"Database client error"}.

And session info in case that's relevant:

sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.5

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GGally_2.1.2        raster_3.6-20       sp_1.6-0            Hmisc_5.0-1         sf_1.0-12           ggpubr_0.6.0        ausflora_0.0.0.9000
 [8] lubridate_1.9.2     forcats_1.0.0       stringr_1.5.0       dplyr_1.1.1         purrr_1.0.1         readr_2.1.4         tidyr_1.3.0        
[15] tibble_3.2.1        ggplot2_3.4.2       tidyverse_2.0.0     ausplotsR_1.2.8     mapdata_2.3.1       maps_3.4.1          vegan_2.6-4        
[22] lattice_0.21-8      permute_0.9-7       austraits_1.1.1     RefManageR_1.4.0   

loaded via a namespace (and not attached):
  [1] colorspace_2.1-0   ggsignif_0.6.4     class_7.3-21       htmlTable_2.4.1    base64enc_0.1-3    rstudioapi_0.14    proxy_0.4-27      
  [8] bit64_4.0.5        fansi_1.0.4        xml2_1.3.3         codetools_0.2-19   splines_4.1.3      R.methodsS3_1.8.2  knitr_1.42        
 [15] itertools_0.1-3    Formula_1.2-5      jsonlite_1.8.4     betapart_1.6       broom_1.0.4        cluster_2.1.4      R.oo_1.25.0       
 [22] compiler_4.1.3     httr_1.4.5         backports_1.4.1    Matrix_1.5-1       fastmap_1.1.1      cli_3.6.1          htmltools_0.5.5   
 [29] tools_4.1.3        gtable_0.3.3       glue_1.6.2         fastmatch_1.1-3    Rcpp_1.0.10        carData_3.0-5      vctrs_0.6.1       
 [36] ape_5.7-1          nlme_3.1-162       iterators_1.0.14   xfun_0.38          timechange_0.2.0   lifecycle_1.0.3    gtools_3.9.4      
 [43] rstatix_0.7.2      terra_1.7-23       MASS_7.3-58.3      scales_1.2.1       rcdd_1.5           vroom_1.6.1        hms_1.1.3         
 [50] doSNOW_1.0.20      parallel_4.1.3     RColorBrewer_1.1-3 curl_5.0.0         gridExtra_2.3      jose_1.2.0         rpart_4.1.19      
 [57] reshape_0.8.9      stringi_1.7.12     maptools_1.1-6     foreach_1.5.2      e1071_1.7-13       checkmate_2.1.0    bibtex_0.5.1      
 [64] geometry_0.4.7     rlang_1.1.0        pkgconfig_2.0.3    evaluate_0.20      htmlwidgets_1.6.2  bit_4.0.5          tidyselect_1.2.0  
 [71] plyr_1.8.8         magrittr_2.0.3     R6_2.5.1           snow_0.4-4         generics_0.1.3     picante_1.8.2      DBI_1.1.3         
 [78] pillar_1.9.0       foreign_0.8-84     withr_2.5.0        mgcv_1.8-42        units_0.8-1        abind_1.4-5        nnet_7.3-18       
 [85] crayon_1.5.2       car_3.1-2          KernSmooth_2.23-20 utf8_1.2.3         rmarkdown_2.21     tzdb_0.3.0         grid_4.1.3        
 [92] data.table_1.14.8  minpack.lm_1.2-3   digest_0.6.31      classInt_0.4-9     R.utils_2.12.2     openssl_2.0.6      munsell_0.5.0     
 [99] magic_1.6-1        askpass_1.1

Missing or extra rows

Some veg.PI downloads have missing rows for some sites and/or duplicate rows for some sites, leading to problems in the calculation of fractional cover, etc.

Update DESCRIPTION

Update date, version number and new authors (SEM, IMF) for v1.2.

Add new growth form cover calculations

Following recent discussion, add options in growth_form_table to calculate:

Absolute (rather than cumulative) cover by growth form, so only 1 hit of each GF can be counted at a given point, rather than adding multiple hits from different overlapping species with a given GF. This would give the absolute % of points that have each GF and is more compatible with NVIS structural descriptions. Add option to m_kind as 'absolute_cover' perhaps.
Add option to calculate this as vegetation strata, defined simply by growth forms (e.g. trees, shrubs, grasses...), but rather than sum the GF covers in these categories, the function would only count unique strata hits per point, giving absolute % cover for the three strata (not cumulative among species or growth forms). Add option to m_kind as 'strata'.

Both operations just need an additional step of duplicate removal at different levels.
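
The difference between cumulative and absolute cover can be sketched on a toy point-intercept table. The column names point, species and growth_form are assumptions for illustration and are not the veg.PI field names.

```r
# Toy point-intercept hits: 4 points, several overlapping species per point
hits <- data.frame(
  point       = c(1, 1, 2, 3, 4, 4, 4),
  species     = c("A", "B", "A", "C", "C", "D", "A"),
  growth_form = c("Tree", "Tree", "Tree", "Shrub", "Shrub", "Shrub", "Tree"),
  stringsAsFactors = FALSE
)
n_points <- 4

# Cumulative cover: every hit counts, so overlapping species add up
cumulative <- table(hits$growth_form) / n_points * 100

# Absolute cover: drop duplicate point x growth form combinations first,
# so each growth form is counted at most once per point
unique_hits <- unique(hits[, c("point", "growth_form")])
absolute <- table(unique_hits$growth_form) / n_points * 100

print(cumulative)
print(absolute)
```

The 'strata' option would follow the same pattern, with growth forms first mapped to strata and duplicates removed per point and stratum.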

Archive ausplotsR v1.1

Create an archive version of v1.1 in Zenodo with a DOI as a snapshot before updating v1.2 to master

Update name handling in examples and vignettes

All examples should continue to work, but worth updating where relevant to choose use of standardised name fields rather than default of herbarium determination.

Herbarium determination clean up

In addition to updated name standardisation, some entries in herbarium determination need to be cleaned up in the DB, for example to standardise entries for non-determined species observations. This will need some documentation updates as these entries are mentioned and sometimes named as examples regarding name standardisation in help etc.

Handle incorrect SiteName in Query more gracefully

When a user queries for a site and supplies an incorrect (or bogus) SiteName, an error is returned which is a bit cryptic.

We should trap the fact that no sites are returned and handle that gracefully.

get_ausplots(my.Plot_IDs="ILIKECOKE", site_info=TRUE, structural_summaries=FALSE,
             veg.vouchers=FALSE, veg.PI=FALSE, basal.wedge=FALSE, soil_subsites=FALSE,
             soil_bulk_density=FALSE, soil_character=FALSE, bounding_box="none",
             species_name_search=NULL)
Error in get_ausplots(my.Plot_IDs = "ILIKECOKE", site_info = TRUE, structural_summaries = FALSE,  : 
  Plot_IDs must be provided as a character vector.

Round calculated covers to a more realistic number of decimal places

Check order of sites/graphs for user-supplied data objects

Appears to sometimes ungroup the charts per survey, depending on plot inputs

New TERN ausplots references

Add recent relevant references describing TERN ecosystem surveillance and AusPlots data (especially 'A Vegetation and soil survey method ...') to help files and vignette where needed, e.g. in the main package help file.

Intermittent error specifying single plot in get_ausplots

Filtering get_ausplots by a single site_location_name sometimes returns the error: 'current_site_master' not found... even when the same plot added to a longer vector works, but some single plot searches work.

ausplots_visual with few plots

ausplots_visual calls with one or a small number of plots crash in some cases, particularly in relation to fractional cover calculations: field names can't be found, which suggests the table is dropping out in some cases.

sentryR

We want to better capture API errors, sentryR on CRAN can be used to send error details to our Sentry project.

Need to ensure it captures the error, stops gracefully and also does not trigger its own error if Sentry capture can't be sent.

Internet resource fail gracefully CRAN request

'Packages which use Internet resources should fail gracefully with an informative message
if the resource is not available or has changed (and not give a check warning nor error).'

Note that https://cran.r-project.org/web/packages/httr/vignettes/api-packages.html only mentions converting API errors to R errors, but policy is now that no error should be thrown.

Variable dictionary

The output data tables (i.e. from a get_ausplots call and relating to Ausplots modules like vouchers or soil properties) are described in the help files but the individual variables/columns are not defined anywhere (e.g. what they mean, their units etc). While some of this information is in the field manual and some of it is obvious or intuitive, ideally there would be a document (or link to one) that explains each data column/variable returned in the raw data from the package. The metadata that comes with an aekos download of TERN Ausplots can't be used as the data presentation is quite different.

This may need a wider discussion of how to handle this. Improving the metadata is pretty fundamental and we have had a user request for this information.

@smguru @tomsaleeba @Sammunroe

ausplots name standardisation updated method

ausplots will shift from standardising herbarium determinations using World Flora Online to the Australian Plant Census in the DB. Standardised and Genus species name fields will still be operational but documentation will need to be updated throughout to be clear on the source of the names

Apply species_name_search to basal wedge data

'species_name_search' argument in get_ausplots function filters by matches to species names for point intercept and vouchers but is ignored for basal wedge data (all measured species are returned).

Extend the functionality to basal wedge species filtering.

single_cover_value usage

Presently subsets point intercept hits based on a minimum height but should also be able to subset by a MAXIMUM height as a user might want ground cover at a certain height rather than tree cover over a minimum height cut-off (the original case).
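
A sketch of subsetting by a height range rather than a minimum only. filter_by_height, its argument names and the height column are hypothetical and do not describe the current single_cover_value interface.

```r
# Hypothetical helper: keep hits whose height lies within [min_height, max_height]
filter_by_height <- function(hits, min_height = -Inf, max_height = Inf) {
  keep <- !is.na(hits$height) &
    hits$height >= min_height &
    hits$height <= max_height
  hits[keep, , drop = FALSE]
}

# Toy hits table; the height column name is an assumption
hits <- data.frame(point = 1:5, height = c(0.2, 1.5, 6.0, NA, 12.3))

tall <- filter_by_height(hits, min_height = 5)  # e.g. tree cover above 5 m
low  <- filter_by_height(hits, max_height = 2)  # e.g. ground layer up to 2 m

print(tall)
print(low)
```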

Also look into simplifying the arguments in the function call by combining the 2 relating to growth form.

Standardise order of column names in raw data tables

The order of columns is a little ad hoc but those that are used for many tables could be put first and in the same order to help users merge tables by site etc.

This would particularly apply to e.g. site_location_name, site_unique etc.
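
A small sketch of how shared key columns could be moved to the front of any table. put_key_columns_first is a hypothetical helper and basal_area is a made-up column.

```r
# Hypothetical helper: move shared key columns to the front, in a fixed order
put_key_columns_first <- function(df,
                                  key_cols = c("site_location_name", "site_unique")) {
  present <- intersect(key_cols, names(df))
  df[, c(present, setdiff(names(df), present)), drop = FALSE]
}

# Toy table with key columns in an ad hoc position
raw <- data.frame(
  basal_area         = c(12.5, 7.1),
  site_unique        = c("PLOT_A-202", "PLOT_B-303"),
  site_location_name = c("PLOT_A", "PLOT_B"),
  stringsAsFactors = FALSE
)

ordered <- put_key_columns_first(raw)
print(names(ordered))
```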

Cannot install package

Hi there,

Thought I'd flag the following issue:

This package has been archived from CRAN because dependency has also been archived:
https://cran.r-project.org/web/packages/ausplotsR/index.html

I cannot install from GitHub due to same issue.

Kind regards
Stewart

Error installing when building vignettes

Steps:

start a fresh RStudio instance
install ausplotsR using the command in the README: install_github("ternaustralia/ausplotsR", build_vignettes = TRUE)

Expected:
It works.

Actual:
Building the vignettes requires the suggested dependencies to also be installed so the install fails with following error:

Error: processing vignette 'start_guide.Rmd' failed with diagnostics:
there is no package called 'goeveg'

Visualise ausplots

Add function presenting graphical options previously developed for ausplots data visualisation. The aim is for a user to be able to pull the data in one step and then call this function to see selected plots showcasing data.

-the function takes the get_ausplots object and user selection of visuals, with the example selecting a handful of plots

-option for live plotting versus save to pdf
-default somehow to prevent attempt to plot stuff out for hundreds of plots without warning to prevent locking up the system?

Some existing options:
-Fractional and growth form cover pie charts per plot
-Species cumulative percent cover as PI samples are taken within plots
-Map of Australia with plot locations marked
-species cover distribution as SAD (species abundance distribution)/Whittaker plot

e.g.,
ausplots_visual(my.ausplots.object, map = TRUE, fractional = TRUE, GF = TRUE, cumulative_cover = TRUE, whittaker = TRUE, file=paste(xx, ".pdf")...)

ausplotsR producing unneeded warnings when arguments are not supplied

Some functions in ausplotsR are producing warnings when arguments are not supplied by the user and it reverts to the default. The warnings are not causing any harm, they just aren't helpful and potentially a bit confusing.

For example, for species_table, if m_kind is not supplied, it defaults to m_kind="PA" but produces this warning

2: In if (m_kind == "PA") { :
the condition has length > 1 and only the first element will be used
3: In if (m_kind == "percent_cover" | m_kind == "IVI") { :
the condition has length > 1 and only the first element will be used
4: In if (m_kind == "freq" | m_kind == "IVI") { :
the condition has length > 1 and only the first element will be used
5: In if (m_kind == "IVI") { :
the condition has length > 1 and only the first element will be used

if cover_type is not supplied, it defaults to PFC, but gives this warning.

In if (cover_type == "PFC") { :
the condition has length > 1 and only the first element will be used

I think it is unlikely that m_kind wouldn't be specified, but cover_type might be ignored, especially because it is specified as the default in the .Rd help files. The warnings don't affect anything, but they might confuse a user. Maybe we should consider either adding a stop() call that produces an error if these arguments are not specified, or using missing() to set the default more formally and remove the warning?
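
Another option in base R is match.arg(), which reduces a vector of allowed choices to a single value (the first one when nothing is supplied) and raises an error for an invalid choice. The sketch below uses a hypothetical function, species_table_sketch, with the m_kind choices listed for species_table. It is not the package code.

```r
# Hypothetical sketch: resolve a multi-choice default with match.arg()
species_table_sketch <- function(m_kind = c("PA", "percent_cover", "freq", "IVI")) {
  # After this line m_kind has length 1, so if() conditions no longer warn
  m_kind <- match.arg(m_kind)
  if (m_kind == "PA") {
    "presence/absence"
  } else {
    paste("other:", m_kind)
  }
}

default_result <- species_table_sketch()       # falls back to "PA"
ivi_result     <- species_table_sketch("IVI")  # explicit choice
bad_result     <- try(species_table_sketch("bogus"), silent = TRUE)

print(default_result)
print(ivi_result)
```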

Update basal_area function for name handling

Update basal_area calculation from $veg.basal table by adding choice to use default herbarium_determination or new standardised name field when added to database.

current upstream and downstream code must not be broken. Since the output contains the herbarium_determination field when by.spp=TRUE, this must be returned by default

-add a warning to default data return on names with suggestion to run on standardised names

Remove veg tables from default data extraction

veg.PI especially but also veg.vouchers now becoming very large and time consuming to download by default for all sites. Needs to change to just extracting site table by default.

update data selection features in get_ausplots

Just a few thoughts...

Update "species_name_search" argument to use either herbarium_determination or standardised name
Add a "family_search" argument so that the function only extracts plots where that family is detected, essentially mimicking the current role of "species_name_search" but for families

ausplotsR depends on retiring spatial infrastructure packages

You will be aware, for example from:
https://r-spatial.org/r/2022/04/12/evolution.html,
https://r-spatial.org/r/2022/12/14/evolution2.html,
https://r-spatial.org/r/2023/04/10/evolution3.html and
https://rsbivand.github.io/csds_jan23/bivand_csds_ssg_230117.pdf and
perhaps view
https://www.youtube.com/watch?v=TlpjIqTPMCA&list=PLzREt6r1NenmWEidssmLm-VO_YmAh$
that rgdal, rgeos and maptools will be retired this
year, in October 2023.

ausplotsR uses maptools::map2SpatialPolygons, but only uses it to generate
a SpatialPolygons object (aus.pl) twice, once in R/map_attribute.R, once
in map_ausplots.R. You could avoid relying on maptools (which will be gone
in just 6 months), by saving aus.pl as a serialised R object (RDS or RDA
file), and then drop map, mapdata and maptools (untried). Alternatively,
use the approach linked to in yesterday's blog in
https://github.com/r-spatial/evolution/blob/main/pkgapi_230305_refs.csv,
of coercing from map to sf, and on again to Spatial then to
SpatialPolygons, but this would entail importing from sf.

Note also that all map geometries have "+proj=longlat +ellps=clrk66
+no_defs +type=crs" as their CRS definition, they are definitely not
"+init=epsg:4326" and do not have a datum definition (except in North
America, where NAD27 is probably applicable). This conclusion is based on
the ellipsoid parameters given in C code in mapproj/src/map.h, ECC defined
as .08227185422, which is the same as:

ellps <- sf::sf_proj_info("ellps")
(clrk66 <- unlist(ellps[ellps$name=="clrk66",]))
eval(parse(text=clrk66["major"]))
eval(parse(text=clrk66["ell"]))
print(sqrt((a^2-b^2)/a^2), digits=10)

Also sent as email, since this repo is not exposed in DESCRIPTION.