ternaustralia / ausplotsr
R package to interact with TERN AusPlots data
License: GNU General Public License v3.0
Update date, version number and new authors (SEM, IMF) for v1.2.
'species_name_search' argument in get_ausplots function filters by matches to species names for point intercept and vouchers but is ignored for basal wedge data (all measured species are returned).
Extend the functionality to basal wedge species filtering.
ausplots will shift from standardising herbarium determinations using World Flora Online to the Australian Plant Census in the DB. Standardised and Genus species name fields will still be operational, but documentation will need to be updated throughout to be clear on the source of the names.
Steps:
Expected:
It works.
Actual:
Building the vignettes requires the suggested dependencies to also be installed, so the install fails with the following error:
Error: processing vignette 'start_guide.Rmd' failed with diagnostics:
there is no package called 'goeveg'
Currently dates are read in as character and visits have opaque codes that don't signify visit order. A simple rank field based on formatted time would make time series analysis easier.
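A minimal sketch of such a rank field in base R. The column names (site_location_name, site_unique, visit_start_date), the date format and the toy values are assumptions for illustration, not confirmed package output.

```r
# Toy site table; column names and values are assumed for illustration only.
site_info <- data.frame(
  site_location_name = c("SITE_A", "SITE_A", "SITE_B", "SITE_A"),
  site_unique        = c("SITE_A-111", "SITE_A-057", "SITE_B-203", "SITE_A-340"),
  visit_start_date   = c("2016-05-02", "2012-09-14", "2014-03-30", "2020-11-08"),
  stringsAsFactors   = FALSE
)

# Parse the character dates so that they sort chronologically.
site_info$visit_date <- as.Date(site_info$visit_start_date, format = "%Y-%m-%d")

# Rank visits within each site: 1 = earliest visit to that site.
site_info$visit_rank <- ave(
  as.numeric(site_info$visit_date),
  site_info$site_location_name,
  FUN = function(d) rank(d, ties.method = "first")
)

site_info[order(site_info$site_location_name, site_info$visit_rank), ]
```

The rank is computed per site, so the opaque visit codes in site_unique no longer need to carry any ordering information.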
get_ausplots has an option to search by a vector of site_location_name as well as by bounding box. Users should also be able to enter site_unique to extract particular visit(s) they know about. In addition, wildcard/partial matching of these fields would allow easier searching, including searching by state and IBRA region, since these are coded into the plot names.
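One possible shape for the wildcard matching, sketched on a plain character vector. The helper find_sites and the example plot names are hypothetical, and the assumption that the first two characters give the state and characters four to six give the IBRA code is mine, not taken from the package documentation.

```r
# Made-up plot names following the assumed state + IBRA coding pattern.
site_names <- c("SATFLB0004", "SATFLB0005", "NTAGFU0021", "QDAMGD0022", "SATKAN0001")

# Hypothetical helper: return the names matching a wildcard pattern.
find_sites <- function(pattern, names) {
  # glob2rx() converts a wildcard such as "SA*" into a regular expression.
  names[grepl(utils::glob2rx(pattern), names)]
}

find_sites("SA*", site_names)      # plots whose name starts with the state code "SA"
find_sites("???FLB*", site_names)  # plots with "FLB" in the assumed IBRA position
```

The matched names could then be passed on to the existing exact-match search.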
Revise to ensure consistent selection of herbarium determination or standardised_name in generation of data and visuals in ausplots_visual. At present, SN is the default for some outputs, while cumulative cover, for example, uses HD.
Update basal_area calculation from $veg.basal table by adding choice to use default herbarium_determination or new standardised name field when added to database.
-add a warning to default data return on names with suggestion to run on standardised names
New function or new choice within species_table (i.e., m_kind = "species_list") that returns 'NICE' species lists by plot/visit.
'nice' means tidying and simplifying the raw voucher data to remove duplicate vouchers, records identified only to genus, or entries such as standing litter, grass sp., annual form etc., and return only standardised names
a stand-alone function could have more options such as whether to include common names, genus sp. entries, families etc, whereas those options could get confusing within the species_table options already there
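A rough sketch of the tidying step, assuming a voucher table with site_unique and standardised_name columns. The function name nice_species_list, the filtering rules and the toy records are illustrative assumptions only.

```r
# Toy voucher records; column names are assumed for illustration.
vouchers <- data.frame(
  site_unique       = c("P1-01", "P1-01", "P1-01", "P1-01", "P1-01", "P2-01", "P2-01"),
  standardised_name = c("Acacia aneura", "Acacia aneura", "Eucalyptus", "Poaceae sp.",
                        NA, "Triodia basedowii", "Acacia aneura"),
  stringsAsFactors  = FALSE
)

# Hypothetical helper returning one tidy species list per plot visit.
nice_species_list <- function(v) {
  nm <- v$standardised_name
  # Keep names with at least genus + epithet, and drop "sp." placeholders.
  is_species <- !is.na(nm) &
    grepl("^[A-Z][a-z]+ [a-z-]+", nm) &
    !grepl(" sp\\.", nm)
  # Remove duplicate vouchers of the same species within a visit.
  v <- unique(v[is_species, c("site_unique", "standardised_name")])
  split(v$standardised_name, v$site_unique)
}

nice_species_list(vouchers)
```

Options such as including genus-level entries or families would become extra arguments on a stand-alone function of this kind.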
Needs refining as updates are completed, including directly linking closed issues to updates in NEWS
All examples should continue to work, but worth updating where relevant to choose use of standardised name fields rather than default of herbarium determination.
With new taxonomy fields from the database, add user option to use herbarium_determination or new standardised name field(s) - making it easy to get a table of standardised names or even genus or family, species level or lowest available level.
changes must apply to each m_kind %in% PA, percent_cover, freq, IVI.
the current output is just the species x sites matrix, so it won't break existing up or down stream code if the updated function defaults to standardised taxon at lowest level, for example
in addition to choice of herbarium versus standardised determinations and level, there should be an option that defaults to stripping out non-vascular plant species, and the option to exclude records not fully identified etc.
... - use only first element (looks sloppy)
Following recent discussion, add options in growth_form_table to calculate:
Absolute (rather than cumulative) cover by growth form, so only 1 hit of each GF can be counted at a given point, rather than adding multiple hits from different overlapping species with a given GF. This would give the absolute % of points that have each GF and is more compatible with NVIS structural descriptions. Add option to m_kind as 'absolute_cover' perhaps.
Add an option to calculate this as vegetation strata, defined simply by growth forms (e.g. trees, shrubs, grasses...), but rather than sum the GF covers in these categories, the function would only count unique strata hits per point, giving absolute % cover for the three strata (not cumulative among species or growth forms). Add option to m_kind as 'strata'.
Both operations just need an additional step of duplicate removal at different levels.
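The duplicate-removal step can be sketched as follows for a single plot. The column names hits_unique and growth_form, the helper cover_by_gf and the four-point toy transect are assumptions for illustration.

```r
# Toy point-intercept hits for one plot; a row with NA growth form is a non-vegetation hit.
hits <- data.frame(
  hits_unique = c("N1_0", "N1_0", "N1_1", "N1_2", "N1_2", "N1_3"),
  growth_form = c("Tree", "Tree", "Shrub", "Tree", "Shrub", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: percent cover by growth form, cumulative or absolute.
cover_by_gf <- function(hits, absolute = TRUE) {
  n_points <- length(unique(hits$hits_unique))
  veg <- hits[!is.na(hits$growth_form), ]
  if (absolute) {
    # Count each growth form at most once per point.
    veg <- unique(veg[, c("hits_unique", "growth_form")])
  }
  counts <- table(veg$growth_form)
  setNames(100 * as.numeric(counts) / n_points, names(counts))
}

cover_by_gf(hits, absolute = FALSE)  # cumulative: two tree hits at N1_0 both count
cover_by_gf(hits, absolute = TRUE)   # absolute: each point counts once per growth form
```

A 'strata' variant would first map each growth form to its stratum and then deduplicate on point and stratum instead of point and growth form.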
Some functions in ausplotsR are producing warnings when arguments are not supplied by the user and it reverts to the default. The warnings are not causing any harm, they just aren't helpful and potentially a bit confusing.
For example, for species_table, if m_kind is not supplied, it defaults to m_kind="PA" but produces this warning
2: In if (m_kind == "PA") { :
the condition has length > 1 and only the first element will be used
3: In if (m_kind == "percent_cover" | m_kind == "IVI") { :
the condition has length > 1 and only the first element will be used
4: In if (m_kind == "freq" | m_kind == "IVI") { :
the condition has length > 1 and only the first element will be used
5: In if (m_kind == "IVI") { :
the condition has length > 1 and only the first element will be used
if cover_type is not supplied, it defaults to PFC, but gives this warning.
In if (cover_type == "PFC") { :
the condition has length > 1 and only the first element will be used
I think it is unlikely that m_kind wouldn't be specified, but cover_type might be ignored, especially because it is specified as the default in the .Rd help files. The warnings don't affect anything, but they might confuse a user. Maybe we should consider either adding a stop function so that an error is produced if these things are not specified, or adding a missing() check to set the default more officially and remove the warning?
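Another common way to resolve a vector-valued default is base R's match.arg(), shown here as a sketch. The function species_table_sketch is a stand-in, not the package's code, and "OCC" as a second cover_type choice is an assumption.

```r
# Stand-in function: the default for each argument is the full vector of choices.
species_table_sketch <- function(m_kind = c("PA", "percent_cover", "freq", "IVI"),
                                 cover_type = c("PFC", "OCC")) {
  # match.arg() picks the first choice when the argument is not supplied
  # and raises an informative error for an invalid value.
  m_kind <- match.arg(m_kind)
  cover_type <- match.arg(cover_type)

  # m_kind now has length 1, so this condition no longer triggers a warning.
  if (m_kind == "PA") {
    message("Building a presence/absence table")
  }
  list(m_kind = m_kind, cover_type = cover_type)
}

species_table_sketch()                    # falls back to "PA" and "PFC" silently
species_table_sketch(m_kind = "freq")
```

This keeps the documented defaults visible in the function signature while making the comparisons inside if() safe.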
We want to better capture API errors, sentryR on CRAN can be used to send error details to our Sentry project.
Need to ensure it captures the error, stops gracefully and also does not trigger its own error if Sentry capture can't be sent.
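A sketch of the control flow only, with no real Sentry call. Both safe_api_call and the report_fun argument are hypothetical; report_fun is where an error-reporting call would be plugged in.

```r
# Hypothetical wrapper: run a request, report any error, and fail gracefully.
safe_api_call <- function(request_fun, report_fun = function(err) invisible(NULL)) {
  tryCatch(
    request_fun(),
    error = function(err) {
      # Reporting is wrapped in try() so a failed report cannot raise its own error.
      try(report_fun(err), silent = TRUE)
      message("AusPlots data could not be retrieved: ", conditionMessage(err))
      invisible(NULL)
    }
  )
}

# Simulate an unavailable API and an unreachable error-reporting service.
result <- safe_api_call(
  request_fun = function() stop("Service Unavailable (HTTP 503)"),
  report_fun  = function(err) stop("reporting service unreachable")
)
is.null(result)
```

The caller receives NULL plus a message in every failure case, whether or not the report itself could be sent.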
Presently subsets point intercept hits based on a minimum height but should also be able to subset by a MAXIMUM height as a user might want ground cover at a certain height rather than tree cover over a minimum height cut-off (the original case).
Also look into simplifying the arguments in the function call by combining the 2 relating to growth form.
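A minimal sketch of subsetting by both a minimum and a maximum height, assuming a hits table with a numeric height column. The helper name, argument names and toy data are illustrative.

```r
# Toy point-intercept hits; heights in metres, NA where no height was recorded.
hits <- data.frame(
  hits_unique = c("N1_0", "N1_1", "N1_2", "N1_3", "N1_4"),
  height      = c(0.2, 1.5, 8, NA, 0.05)
)

# Hypothetical helper: keep hits within an optional height window.
subset_by_height <- function(hits, min_height = NULL, max_height = NULL) {
  keep <- !is.na(hits$height)  # hits without a recorded height are dropped
  if (!is.null(min_height)) keep <- keep & hits$height >= min_height
  if (!is.null(max_height)) keep <- keep & hits$height <= max_height
  hits[keep, , drop = FALSE]
}

subset_by_height(hits, max_height = 0.5)  # ground layer only
subset_by_height(hits, min_height = 5)    # tree cover above a cut-off (the original case)
```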
Hi there,
Thought I'd flag the following issue:
This package has been archived from CRAN because a dependency has also been archived:
https://cran.r-project.org/web/packages/ausplotsR/index.html
I cannot install from GitHub due to same issue.
Kind regards
Stewart
Hello ausplots!
I'm interested in checking out the soil data available for ausplots sites, but every time I try to download the data I get an error message. At first I thought the servers must be down but I've tried a few times now on different computers so I think maybe there's an error in the API function or the database configuration?
Function I've run:
soil <- get_ausplots(site_info = FALSE, soil_subsites = TRUE,
soil_bulk_density = TRUE, soil_character = TRUE)
Returns error message:
Error in .ausplots_api(path, query) :
Service Unavailable (HTTP 503). Failed to {"details":"no connection to the server\n","message":"Database client error"}.
And session info in case that's relevant:
sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Monterey 12.6.5
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GGally_2.1.2 raster_3.6-20 sp_1.6-0 Hmisc_5.0-1 sf_1.0-12 ggpubr_0.6.0 ausflora_0.0.0.9000
[8] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.1 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[15] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0 ausplotsR_1.2.8 mapdata_2.3.1 maps_3.4.1 vegan_2.6-4
[22] lattice_0.21-8 permute_0.9-7 austraits_1.1.1 RefManageR_1.4.0
loaded via a namespace (and not attached):
[1] colorspace_2.1-0 ggsignif_0.6.4 class_7.3-21 htmlTable_2.4.1 base64enc_0.1-3 rstudioapi_0.14 proxy_0.4-27
[8] bit64_4.0.5 fansi_1.0.4 xml2_1.3.3 codetools_0.2-19 splines_4.1.3 R.methodsS3_1.8.2 knitr_1.42
[15] itertools_0.1-3 Formula_1.2-5 jsonlite_1.8.4 betapart_1.6 broom_1.0.4 cluster_2.1.4 R.oo_1.25.0
[22] compiler_4.1.3 httr_1.4.5 backports_1.4.1 Matrix_1.5-1 fastmap_1.1.1 cli_3.6.1 htmltools_0.5.5
[29] tools_4.1.3 gtable_0.3.3 glue_1.6.2 fastmatch_1.1-3 Rcpp_1.0.10 carData_3.0-5 vctrs_0.6.1
[36] ape_5.7-1 nlme_3.1-162 iterators_1.0.14 xfun_0.38 timechange_0.2.0 lifecycle_1.0.3 gtools_3.9.4
[43] rstatix_0.7.2 terra_1.7-23 MASS_7.3-58.3 scales_1.2.1 rcdd_1.5 vroom_1.6.1 hms_1.1.3
[50] doSNOW_1.0.20 parallel_4.1.3 RColorBrewer_1.1-3 curl_5.0.0 gridExtra_2.3 jose_1.2.0 rpart_4.1.19
[57] reshape_0.8.9 stringi_1.7.12 maptools_1.1-6 foreach_1.5.2 e1071_1.7-13 checkmate_2.1.0 bibtex_0.5.1
[64] geometry_0.4.7 rlang_1.1.0 pkgconfig_2.0.3 evaluate_0.20 htmlwidgets_1.6.2 bit_4.0.5 tidyselect_1.2.0
[71] plyr_1.8.8 magrittr_2.0.3 R6_2.5.1 snow_0.4-4 generics_0.1.3 picante_1.8.2 DBI_1.1.3
[78] pillar_1.9.0 foreign_0.8-84 withr_2.5.0 mgcv_1.8-42 units_0.8-1 abind_1.4-5 nnet_7.3-18
[85] crayon_1.5.2 car_3.1-2 KernSmooth_2.23-20 utf8_1.2.3 rmarkdown_2.21 tzdb_0.3.0 grid_4.1.3
[92] data.table_1.14.8 minpack.lm_1.2-3 digest_0.6.31 classInt_0.4-9 R.utils_2.12.2 openssl_2.0.6 munsell_0.5.0
[99] magic_1.6-1 askpass_1.1
Filtering get_ausplots by a single site_location_name sometimes returns the error: 'current_site_master' not found... even when the same plot added to a longer vector works, but some single plot searches work.
Add a new function to run species representation optimisers on ausplots data or any species by sites matrix.
At present, the formatting of names for different Ausplots modules can be confusing because it differs between the argument in the get_ausplots function and the name of the tables in the outputs. This was done deliberately to separate input from output in preparing the code but could be frustrating to users.
e.g.
veg.vouchers | veg.vouch
site_info | site.info
soil_character | soil.char
Perhaps the formatting change from input to output should at least be consistent.
This change may break existing code using the package
Do we tidy this up or live with it to avoid breaking code?
The par resetting requested by CRAN is loading a blank plot window and then throwing a warning about par with no device at the end. It still works but looks a little messy. Check formatting and timing of the recently added par checks.
veg.PI especially but also veg.vouchers now becoming very large and time consuming to download by default for all sites. Needs to change to just extracting site table by default.
Add recent relevant references describing TERN ecosystem surveillance and AusPlots data (especially 'A Vegetation and soil survey method ...') to help files and vignette where needed, e.g. in the main package help file.
Appears to sometimes ungroup the charts per survey, depending on plot inputs
ausplots_visual crashing when growth form pie is selected and there are 1-few sites included.
Currently we build the site_unique field on the client side at Line 110 in 5b4d4d9.
This could be built on the server side and just served up on the client side. This isn't that important, but I think adding the field in more places is important.
The field could also be included in the veg_voucher, veg_pi and veg_basal responses to make it easier for consumers to join (merge?) data frames.
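With site_unique present in each table, a join reduces to a single merge() call. The tables and their other columns below are toy stand-ins, not real responses.

```r
# Toy tables that both carry the site_unique key.
site_info <- data.frame(
  site_unique = c("P1-01", "P2-01"),
  latitude    = c(-30.1, -25.4)
)
veg_basal <- data.frame(
  site_unique = c("P1-01", "P1-01", "P2-01"),
  basal_area  = c(2.1, 0.4, 5.0)
)

# Attach site attributes to every basal wedge record for that visit.
joined <- merge(veg_basal, site_info, by = "site_unique", all.x = TRUE)
joined
```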
In addition to updated name standardisation, some entries in herbarium determination need to be cleaned up in the DB, for example to standardise entries for non-determined species observations. This will need some documentation updates as these entries are mentioned and sometimes named as examples regarding name standardisation in help etc.
The order of columns is a little ad hoc but those that are used for many tables could be put first and in the same order to help users merge tables by site etc.
This would particularly apply to e.g. site_location_name, site_unique etc.
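One simple way to apply a consistent ordering is to move the shared key columns to the front of each table. The helper key_first is hypothetical.

```r
# Hypothetical helper: put shared key columns first, in a fixed order.
key_first <- function(df, keys = c("site_location_name", "site_unique")) {
  present <- intersect(keys, names(df))  # only reorder keys that actually exist
  df[, c(present, setdiff(names(df), present)), drop = FALSE]
}

example <- data.frame(
  height = 1.2, site_unique = "P1-01", species = "Acacia aneura",
  site_location_name = "P1", stringsAsFactors = FALSE
)
names(key_first(example))
```

Applying the same helper to every returned table would give the same leading columns throughout without changing the remaining column order.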
'Packages which use Internet resources should fail gracefully with an informative message
if the resource is not available or has changed (and not give a check warning nor error).'
Note that https://cran.r-project.org/web/packages/httr/vignettes/api-packages.html only mentions converting API errors to R errors, but policy is now that no error should be thrown.
ausplots_visual calls with 1 or a small number of plots crash in some cases, particularly in relation to fractional cover calculations: field names can't be found, which suggests the table is dropping out in some cases.
I would like to use an optimization method for plot selection that would maximize different diversity indices. For instance, I would like to know which subset of sites I should select to maximize species richness. If I had 3 sites to choose from a total of 4, which ones would get picked 1st, 2nd and 3rd, and how many species would be accumulated at each step?
The function optim_species in the package ausplotsR within R looks promising for this as it provides number of accumulated species for each additional site selected. However, the function does not provide the identity of the selected sites for each diversity metric used.
Is there a way to obtain this information? If not, is there an alternative approach?
See the following example:
library(ausplotsR)
species_matrix <- matrix(
c(1, 0, 1, 1, 0,
0, 0, 1, 0, 0,
0, 1, 1, 0, 0,
0, 1, 0, 0, 1),
nrow = 4,
ncol = 5,
byrow = TRUE,
dimnames = list(c("Site 1", "Site 2", "Site 3", "Site 4"), c("Species 1", "Species 2", "Species 3", "Species 4", "Species 5"))
)
optim_species(species_matrix, n.plt = 3, random = TRUE, iterations = 10)
At the start of the output, I get the chosen sites (a consensus of all estimators perhaps?), but I do not get the site id of the first randomly selected site. Lower in the output, I also do not get the site id selected by each estimator (only the accumulated diversity for the number of sites selected). Any help would be greatly appreciated.
Note that I asked this question on StackOverflow and will post the answer there (and give credit of course) if I get an answer.
See: https://stackoverflow.com/questions/76910491/how-can-i-obtain-the-site-identity-selected-by-optim-species-in-the-package-ausp
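As one possible alternative approach for the richness case only, a simple greedy selection in base R reports the identity of the site picked at each step. This is a sketch of a generic greedy heuristic, not what optim_species does internally, and the function name greedy_richness is made up.

```r
species_matrix <- matrix(
  c(1, 0, 1, 1, 0,
    0, 0, 1, 0, 0,
    0, 1, 1, 0, 0,
    0, 1, 0, 0, 1),
  nrow = 4, ncol = 5, byrow = TRUE,
  dimnames = list(paste("Site", 1:4), paste("Species", 1:5))
)

# At each step, pick the site that adds the most species not yet accumulated.
greedy_richness <- function(m, n_sites) {
  stopifnot(!is.null(rownames(m)), n_sites <= nrow(m))
  pa <- m > 0                           # presence/absence
  chosen <- character(0)
  covered <- rep(FALSE, ncol(pa))       # species accumulated so far
  accumulated <- integer(0)
  for (i in seq_len(n_sites)) {
    remaining <- setdiff(rownames(pa), chosen)
    gain <- vapply(remaining, function(s) sum(pa[s, ] & !covered), integer(1))
    best <- remaining[which.max(gain)]  # ties go to the first site in row order
    chosen <- c(chosen, best)
    covered <- covered | pa[best, ]
    accumulated <- c(accumulated, sum(covered))
  }
  data.frame(step = seq_len(n_sites), site = chosen,
             accumulated_species = accumulated, stringsAsFactors = FALSE)
}

greedy_richness(species_matrix, n_sites = 3)
```

For this matrix the sketch picks Site 1, then Site 4, then Site 2, accumulating 3, 5 and 5 species. Other diversity metrics would need their own gain function.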
Add function presenting graphical options previously developed for ausplots data visualisation. The aim is for a user to be able to pull the data in one step and then call this function to see selected plots showcasing data.
-the function takes the get_ausplots object and user selection of visuals, with the example selecting a handful of plots
-option for live plotting versus save to pdf
-default somehow to prevent attempt to plot stuff out for hundreds of plots without warning to prevent locking up the system?
Some existing options:
-Fractional and growth form cover pie charts per plot
-Species cumulative percent cover as PI samples are taken within plots
-Map of Australia with plot locations marked
-species cover distribution as SAD (species abundance distribution)/Whittaker plot
e.g.,
ausplots_visual(my.ausplots.object, map = TRUE, fractional = TRUE, GF = TRUE, cumulative_cover = TRUE, whittaker = TRUE, file=paste(xx, ".pdf")...)
You will be aware, for example from:
https://r-spatial.org/r/2022/04/12/evolution.html,
https://r-spatial.org/r/2022/12/14/evolution2.html,
https://r-spatial.org/r/2023/04/10/evolution3.html and
https://rsbivand.github.io/csds_jan23/bivand_csds_ssg_230117.pdf and
perhaps view
https://www.youtube.com/watch?v=TlpjIqTPMCA&list=PLzREt6r1NenmWEidssmLm-VO_YmAh$
that rgdal, rgeos and maptools will be retired this year, in October 2023.
ausplotsR uses maptools::map2SpatialPolygons, but only uses it to generate
a SpatialPolygons object (aus.pl) twice, once in R/map_attribute.R, once
in map_ausplots.R. You could avoid relying on maptools (which will be gone
in just 6 months), by saving aus.pl as a serialised R object (RDS or RDA
file), and then drop maps, mapdata and maptools (untried). Alternatively,
use the approach linked to in yesterday's blog in
https://github.com/r-spatial/evolution/blob/main/pkgapi_230305_refs.csv,
of coercing from map to sf, and on again to Spatial then to
SpatialPolygons, but this would entail importing from sf.
Note also that all map geometries have "+proj=longlat +ellps=clrk66
+no_defs +type=crs" as their CRS definition, they are definitely not
"+init=epsg:4326" and do not have a datum definition (except in North
America, where NAD27 is probably applicable). This conclusion is based on
the ellipsoid parameters given in C code in mapproj/src/map.h, ECC defined
as .08227185422, which is the same as:
ellps <- sf::sf_proj_info("ellps")
(clrk66 <- unlist(ellps[ellps$name=="clrk66",]))
eval(parse(text=clrk66["major"]))
eval(parse(text=clrk66["ell"]))
print(sqrt((a^2-b^2)/a^2), digits=10)
Also sent as email, since this repo is not exposed in DESCRIPTION.
The output data tables (i.e. from a get_ausplots call and relating to Ausplots modules like vouchers or soil properties) are described in the help files, but the individual variables/columns are not defined anywhere (e.g. what they mean, their units etc). While some of this information is in the field manual and some of it is obvious or intuitive, ideally there would be a document (or link to one) that explains each data column/variable returned in the raw data from the package. The metadata that comes with an aekos download of TERN Ausplots can't be used as the data presentation is quite different.
This may need a wider discussion of how to handle this. Improving the metadata is pretty fundamental and we have had a user request for this information.
When a user queries for a site and supplies an incorrect (or bogus) SiteName, an error is returned which is a bit cryptic.
We should trap the fact that no sites are returned and handle that gracefully.
get_ausplots(my.Plot_IDs="ILIKECOKE", site_info=TRUE, structural_summaries=FALSE,
veg.vouchers=FALSE, veg.PI=FALSE, basal.wedge=FALSE, soil_subsites=FALSE,
soil_bulk_density=FALSE, soil_character=FALSE, bounding_box="none",
species_name_search=NULL)
Error in get_ausplots(my.Plot_IDs = "ILIKECOKE", site_info = TRUE, structural_summaries = FALSE, :
Plot_IDs must be provided as a character vector.
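A sketch of how the no-match case might be trapped before any further processing. The helper check_plot_ids and the idea of comparing against a vector of known plot names are assumptions, not the package's current behaviour.

```r
# Hypothetical helper: validate requested plot IDs against the known plot names.
check_plot_ids <- function(requested, available) {
  if (!is.character(requested)) {
    stop("Plot_IDs must be provided as a character vector.")
  }
  unknown <- setdiff(requested, available)
  if (length(unknown) > 0) {
    # Say exactly which names matched nothing instead of failing cryptically.
    message("No matching sites for: ", paste(unknown, collapse = ", "))
  }
  intersect(requested, available)
}

known <- c("SATFLB0004", "NTAGFU0021")  # made-up plot names
check_plot_ids(c("SATFLB0004", "ILIKECOKE"), known)
check_plot_ids("ILIKECOKE", known)      # returns character(0) with a message
```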
Recent updates implemented graceful fail when the database resource is unavailable (timeout, no internet etc). A residual issue is that when the resource is unavailable, the examples and vignette still return errors because they assume there is data and subsequently process it.
Package appears to have been removed from CRAN due to errors when server was moved.
Need to record whether data were returned and, if not, bypass subsequent code with a message.
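The bypass could look roughly like this in examples and vignette chunks. The helper run_if_data is hypothetical, and NULL is assumed to be what a gracefully failed download returns.

```r
# Hypothetical guard: only run a processing step when data were actually returned.
run_if_data <- function(x, step_fun) {
  if (is.null(x) || length(x) == 0) {
    message("No data were returned; skipping this step.")
    return(invisible(NULL))
  }
  step_fun(x)
}

my_data <- NULL                        # stands in for a failed download
run_if_data(my_data, nrow)             # prints a message and returns NULL

my_data <- data.frame(site_unique = c("P1-01", "P2-01"))
run_if_data(my_data, nrow)             # the step runs as normal
```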
growth_form_table function has the options to populate GFs x sites by species richness in each GF category. It does this currently by counting unique entries in herbarium_determination by GF
the output does not contain species names, so defaulting to counting standardised names instead of herbarium names should not affect downstream code
could either default to standardised names or add as a choice of herbarium versus standardised.
since names aren't returned, it may make sense not to filter out incomplete IDs etc.
Some veg.PI downloads have missing rows for some sites and/or duplicate rows for some sites, leading to problems in calculation of fractional cover, etc.
Create an archive version of v1.1. in Zenodo with DOI as a snapshot before updating v1.2 to master
Just a few thoughts...
Update the "species_name_search" argument to use either herbarium_determination or standardised name
Add a "family_search" argument so that the function only extracts plots where that family is detected, essentially mimicking the current role of "species_name_search" but for families
v1.2.8 includes update to APC/APNI from WFO (documentation plus functions that used strip_bryophyte). These changes need to be brought into v2. Also, changes to checking class of inputs and responses to 'inherits' needs to be merged.