botanyenmworkshops's Introduction

BotanyENMWorkshop2023

This repository contains material from the Botany 2023 workshop titled 'Using Digitized Collections-Based Data in Research: Applications for Ecology, Phylogenetics, and Biogeography' which was held in July 2023.

An HTML version of our R Markdown crash course can also be viewed here: https://soltislab.github.io/BotanyENMWorkshops/Demos/Rbased/CrashCourse_2023.html

Organizers

Pam Soltis, Doug Soltis, Shelly Gaynor, Maria Cortez, Makenzie Mabry, Lauren Gillett, Elizabeth White, JT Miller, and Malu Ore Rengifo.

Many people contributed to this material, including Anthony Melton, Johanna Jantzen, Andre Naranjo, and others.

Additional Resources

iDigBio API Working Group
QGIS Introduction - RhettRautsaw/GIS_Tutorial
SDM Best Practices
mbelitz/Odo_SDM_Rproj

Other related material from past and current members of our lab

mgaynor1/CURE-FL-Plants
mgaynor1/R4NaturalHistoryCollections-BCEENET2021
mgaynor1/long-winded-scripts
mgaynor1/BLUE-Intro2RwithBiodiversityData
mgaynor1/BCEENET-DataCleaning
aemelton/EA_ENA_ENM
ryanafolk/pno_calc
ryanafolk/ambitus
ryanafolk/eco-discretizer
richiehodel/Amborella_ENM
jjantzen/CommPhylogeneticsOSBS

Papers to read.

Introduction to Natural History Collections

  • Soltis. 2017. Digitization of herbaria enables novel research. American Journal of Botany.
  • Heberling et al. 2019. The changing uses of herbarium data in an era of global change: An overview using automated content analysis. BioScience.
  • Nelson and Ellis. 2018. The history and impact of digitization and digital data mobilization on biodiversity research. Phil. Trans. R. Soc. B.

Occurrence Data

  • Daru et al. 2017. Widespread sampling biases in herbaria revealed from large-scale digitization. New Phytologist.
  • Zizka et al. 2019. CoordinateCleaner: Standardized cleaning of occurrence records from biological collection databases. Methods in Ecology and Evolution.
  • Aiello-Lammens et al. 2015. spThin: an R package for spatial thinning of species occurrence records for use in ecological niche models. Ecography.
  • Proosdij et al. 2016. Minimum required number of specimen records to develop accurate species distribution models. Ecography.

Climatic layers

  • Barve et al. 2011. The crucial role of the accessible area in ecological niche modeling and species distribution modeling. Ecological Modelling.
  • Cobos et al. 2019. An exhaustive analysis of heuristic methods for variable selection in ecological niche modeling and species distribution modeling. Ecological Informatics.
  • Folk and Gaynor et al. 2023. Identifying climatic drivers of hybridization with a new ancestral niche reconstruction method. Systematic Biology.

ENM methods

  • Peterson. 2001. Predicting species' geographic distributions based on ecological niche modeling. The Condor.
  • Muscarella et al. 2014. ENMeval: An R package for conducting spatially independent evaluations and estimating optimal model complexity for MaxEnt ecological niche models. Methods in Ecology and Evolution.
  • Sillero and Barbosa. 2020. Common mistakes in ecological niche models. International Journal of Geographical Information Science.
  • Jiménez and Soberón. 2020. Leaving the area under the receiving operating characteristic curve behind: An evaluation method for species distribution modelling applications based on presence-only data. Methods in Ecology and Evolution.
  • Cobos et al. 2019. kuenm: an R package for detailed development of ecological niche models using Maxent. PeerJ.
  • Warren et al. 2010. ENMTools: a toolbox for comparative studies of environmental niche models. Ecography.
  • Brown and Carnaval. 2019. A tale of two niches: methods, concepts, and evolution. Frontiers of Biogeography.
  • Warren et al. 2021. The effects of climate change on Australia’s only endemic Pokémon: Measuring bias in species distribution models. Methods in Ecology and Evolution.

Applications of ENMs

  • Allen et al. 2019. Spatial Phylogenetics of Florida Vascular Plants: The Effects of Calibration and Uncertainty on Diversity Estimates. iScience.
  • Marchant et al. 2016. Patterns of abiotic niche shifts in allopolyploids relative to their progenitors. New Phytologist.
  • Gaynor et al. 2018. Climatic niche comparison among ploidal levels in the classic autopolyploid system, Galax urceolata. American Journal of Botany.
  • Visger et al. 2016. Niche divergence between diploid and autotetraploid Tolmiea. American Journal of Botany.
  • Wang et al. 2021. Potential distributional shifts in North America of allelopathic invasive plant species under climate change models. Plant Diversity.
  • Gaynor et al. 2021. Biogeography and ecological niche evolution in Diapensiaceae inferred from phylogenetic analysis. Journal of Systematics and Evolution.
  • Fitzpatrick and Turelli. 2006. The geography of mammalian speciation: Mixed signals from phylogenies and range maps. Evolution.
  • Cardillo and Warren. 2016. Analysing patterns of spatial and niche overlap among species at multiple resolutions. Global Ecology and Biogeography.
  • Jantzen et al. 2019. Effects of taxon sampling and tree reconstruction methods on phylodiversity metrics. Ecology and Evolution.

Advanced Topics

  • Webb et al. 2002. Phylogenies and Community Ecology. Annual Review of Ecology and Systematics.
  • Lu et al. 2018. Evolutionary history of the angiosperm flora of China. Nature.
  • Pollock et al. 2017. Large conservation gains possible for global biodiversity facets. Nature.
  • Marx et al. 2017. Riders in the sky (islands): Using a mega‐phylogenetic approach to understand plant species distribution and coexistence at the altitudinal limits of angiosperm plant life. Journal of Biogeography.
  • Li et al. 2019. For common community phylogenetic analyses, go ahead and use synthesis phylogenies. Ecology.
  • Tucker et al. 2016. A guide to phylogenetic metrics for conservation, community ecology and macroecology. Biological Reviews.
  • Mishler et al. 2014. Phylogenetic measures of biodiversity and neo- and paleo-endemism in Australian Acacia. Nature Communications.
  • Miller et al. 2016. Phylogenetic community structure metrics and null models: a review with new methods and software. Ecography.
  • Rahbek et al. 2019. Building mountain biodiversity: Geological and evolutionary processes. Science.
  • Körner. 2003. Alpine plant life: Functional plant ecology of high mountain ecosystems. Springer. (Book)
  • Figueroa et al. 2021. Alpine, but not montane, seed plants constitute a biogeographically and climatically distinct species pool across the Americas. Alpine Botany.
  • Swenson. 2011. The role of evolutionary processes in producing biodiversity patterns, and the interrelationships between taxonomic, functional and phylogenetic biodiversity. American Journal of Botany.
  • Cortez et al. 2021. Is the age of plant communities predicted by the age, stability and soil composition of the underlying landscapes? An investigation of OCBILs. Biological Journal of the Linnean Society.
  • Hopper, Silveira and Fiedler. 2016. Biodiversity hotspots and Ocbil theory. Plant and Soil.
  • Schut et al. 2014. Rapid characterisation of vegetation structure to predict refugia and climate change impacts across a global biodiversity hotspot. PLOS ONE.
  • Zappi et al. 2017. Plant biodiversity drivers in Brazilian Campos Rupestres: Insights from phylogenetic structure. Frontiers in Plant Science.
  • Linder. 2001. Plant diversity and endemism in sub-Saharan tropical Africa. Journal of Biogeography.
  • Bomblies. 2020. When everything changes at once: finding a new normal after genome duplication. Proceedings of the Royal Society B.
  • Coate and Doyle. 2013. Polyploid and hybrid genomics. (Book)
  • Buggs et al. 2011. Transcriptomic shock generates evolutionary novelty in a newly formed, natural allopolyploid plant. Current Biology.
  • McCarthy et al. 2017. Related allopolyploids display distinct floral pigment profiles and transgressive pigments. American Journal of Botany.
  • Van de Peer et al. 2021. Polyploidy: an evolutionary and ecological force in stressful times. The Plant Cell.
  • Schmickl and Yant. 2021. Adaptive introgression: how polyploidy reshapes gene flow landscapes. New Phytologist.
  • Glick and Mayrose. 2014. ChromEvol: Assessing the pattern of chromosome number evolution and the inference of polyploidy along a phylogeny. Molecular Biology and Evolution.

ENM vs SDM

Are Ecological Niche Models (ENMs) and Species Distribution Models (SDMs) the same?

How to cite

Link the repository in your methods section as github.com/soltislab/BotanyENMWorkshops, for example: "Models were developed for each species following available scripts (github.com/soltislab/BotanyENMWorkshops)." Note that this workshop is related to another repository and to an R package, both of which are updated independently of this workshop. This workshop is often run in conjunction with BiotaPhy, so our workshop repository contains additional presentations, demos, etc. that are not included in the other.

If you use functions or scripts from the first half of this workshop, please cite this in-prep publication: Patten NN, Gaynor ML, Soltis DE, and Soltis PS. Geographic and Taxonomic Occurrence R-Based Scrubbing (gatoRs): An R package and reproducible workflow for processing biodiversity data. In prep.

botanyenmworkshops's People

Contributors

mgaynor1

botanyenmworkshops's Issues

ecospat requires older versions

On R 4.2, ecospat fails to install because its dependency biomod2 does not download correctly.

Current fix: use R 4.1, then run
install.packages("biomod2")
install.packages("ecospat")
packageVersion("ecospat")

The version should be '3.5'.
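To confirm that a setup matches this workaround, something like the following base-R sketch could be run. The helper `version_matches()` is hypothetical (it is not part of the workshop scripts or of ecospat), and treating "3.5" as matching any 3.5.x release is an assumption about how strict the check should be.

```r
# Hypothetical helper (not part of the workshop scripts): TRUE when the installed
# version of `pkg` begins with `expected`, e.g. expected = "3.5" accepts 3.5 and 3.5.1.
version_matches <- function(pkg, expected) {
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)  # not installed or fails to load
  installed <- unclass(utils::packageVersion(pkg))[[1]]       # e.g. c(3L, 5L, 1L)
  wanted <- unclass(package_version(expected))[[1]]           # e.g. c(3L, 5L)
  length(installed) >= length(wanted) &&
    all(installed[seq_along(wanted)] == wanted)
}

# Checks for the workaround above (R 4.1 with ecospat 3.5):
getRversion() < "4.2"
version_matches("ecospat", "3.5")
```

The same helper works for the rangeBuilder pin discussed further down, e.g. `version_matches("rangeBuilder", "1.6")`.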

Correct PROJ.4 string in Lesson 4

Lesson 4 assigns WGS84 via its EPSG code, which appears to be defunct at the moment; use a PROJ.4 string (e.g. "+proj=longlat +datum=WGS84") instead.

RStoolbox is not available for standard installation on R version 4.3.0

install.packages("RStoolbox")
Installing package into ‘/home/jt-miller/R/x86_64-pc-linux-gnu-library/4.3’
(as ‘lib’ is unspecified)
Warning in install.packages :
  package ‘RStoolbox’ is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

It appears the maintainers have not kept up with CRAN updates, so the package cannot be installed from CRAN for this R version: https://cran.r-project.org/web/packages/RStoolbox/index.html
We suggest installing it from GitHub, as the setup already does for rmaxent, kuenm, and gatoRs:
library(devtools)
install_github("bleutner/RStoolbox")
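If several packages need the GitHub route, a small guard avoids reinstalling ones that are already present. This is a hypothetical sketch. `install_github_if_missing()` is not part of the workshop scripts. It uses `remotes::install_github()`, which `devtools::install_github()` wraps. Only the two repository paths that appear in this document are filled in.

```r
# Hypothetical setup helper: install each package from GitHub only if it is not
# already available. Names are package names, values are "owner/repo" paths.
install_github_if_missing <- function(repos) {
  have <- vapply(names(repos), requireNamespace, logical(1), quietly = TRUE)
  missing <- names(repos)[!have]
  if (length(missing) > 0) {
    if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
    for (pkg in missing) remotes::install_github(repos[[pkg]])
  }
  invisible(missing)  # which packages were (attempted to be) installed
}

# Example call (needs network access); paths for rmaxent, kuenm and gatoRs
# would be added to the same vector:
# install_github_if_missing(c(RStoolbox = "bleutner/RStoolbox",
#                             rangeBuilder = "ptitle/rangeBuilder"))
```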

Rversion:
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 4
minor 3.0
year 2023
month 04
day 21
svn rev 84292
language R
version.string R version 4.3.0 (2023-04-21)
nickname Already Tomorrow

Bug in spatial correction for small datasets

Hi,
I noticed an issue with the spatial correction script (lines 123-128 in the data cleaning section) when it is applied to small datasets with relatively uniform spacing among points. I initially ran the script on a 34-observation dataset with a 2.5'' resolution GeoTIFF and got back 3 points after correction. I then integrated data from other sources to raise my initial sample size to 48, but this time I got back only 2 points. I realized that by filling in gaps in the distribution, the additional data made the observations more uniformly distributed. To illustrate, imagine 100 points in a line spaced 1 m apart. Trying to satisfy a 2 m resolution by sequentially eliminating the point with the smallest distance to its nearest neighbor will delete all but one point, because every point is tied for the smallest distance. I have a workaround that is inelegant and computationally inefficient but seems to reasonably mitigate the glitch:

Original:

# Remove a point whose nearest-neighbor distance is smaller than the resolution size,
# i.e. remove one point of any pair that occurs within one pixel.
# nndist() is from spatstat; df[, 6:7] hold the point coordinates.
while (min(nndist(df[, 6:7])) < rasterResolution) {
  nnD <- nndist(df[, 6:7])
  df <- df[-(which(min(nnD) == nnD)[1]), ]  # always drops the first of the tied closest points
}

My solution: 1) apply the fix only to datasets with fewer than 300 observations (where the issue is most likely to be observed), 2) in each iteration of the while loop, randomly choose which minimum-distance point to delete from the pool of equally spaced candidates, and 3) repeat the while loop 50 times and keep the replicate that retained the most observations:

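if (nrow(df) < 300) {
  # Small datasets: run the thinning loop 50 times, breaking ties at random
  testdf <- df
  result_holder <- list()
  for (i in 1:50) {
    result_holder[[i]] <- testdf
    while (min(nndist(result_holder[[i]][, 6:7])) < rasterResolution) {
      nnD <- nndist(result_holder[[i]][, 6:7])
      tied <- which(min(nnD) == nnD)
      # Pick one tied point at random; indexing tied[] avoids sample()'s
      # 1:n behaviour when it is given a single number
      result_holder[[i]] <- result_holder[[i]][-tied[sample.int(length(tied), 1)], ]
    }
  }
  # Keep the replicate that retained the most observations
  df <- result_holder[[which.max(sapply(result_holder, nrow))]]
} else {
  # Larger datasets: original deterministic loop
  while (min(nndist(df[, 6:7])) < rasterResolution) {
    nnD <- nndist(df[, 6:7])
    df <- df[-(which(min(nnD) == nnD)[1]), ]
  }
}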

Best,
Tito
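The approach in this issue can be sketched in base R so it can be tried on toy data. This is an illustration under stated assumptions, not the workshop's script. `nn_dist()` swaps in a base-R `dist()` computation for spatstat's `nndist()`. The input is a plain two-column coordinate matrix rather than columns 6:7 of the cleaned data frame. All function names are hypothetical. The toy data reproduce the 100-points-on-a-line example from the issue.

```r
# Nearest-neighbour distance for every row of a two-column coordinate matrix.
# Base-R stand-in for spatstat's nndist(); returns Inf when only one point is left.
nn_dist <- function(coords) {
  if (nrow(coords) < 2) return(rep(Inf, nrow(coords)))
  d <- as.matrix(dist(coords))
  diag(d) <- Inf                      # ignore each point's distance to itself
  apply(d, 1, min)
}

# One pass of the thinning loop: drop one point of the closest pair until no
# nearest-neighbour distance is below min_dist.
# random = FALSE mimics the original (drop the first tied point);
# random = TRUE picks uniformly among the tied points, as in the fix above.
thin_once <- function(coords, min_dist, random = FALSE) {
  while (nrow(coords) > 1) {
    nnD <- nn_dist(coords)
    if (min(nnD) >= min_dist) break
    tied <- which(nnD == min(nnD))
    drop <- if (random) tied[sample.int(length(tied), 1)] else tied[1]
    coords <- coords[-drop, , drop = FALSE]
  }
  coords
}

# Repeat the randomised pass n_reps times and keep the replicate retaining most points.
thin_best_of <- function(coords, min_dist, n_reps = 50) {
  runs <- lapply(seq_len(n_reps), function(i) thin_once(coords, min_dist, random = TRUE))
  runs[[which.max(vapply(runs, nrow, integer(1)))]]
}

# Toy example from the issue: 100 points on a line, 1 m apart, 2 m threshold.
set.seed(1)
line_pts <- cbind(x = 0:99, y = 0)
det <- thin_once(line_pts, 2)          # deterministic loop keeps a single point
best <- thin_best_of(line_pts, 2)      # randomised best-of-50 keeps more
nrow(det)
nrow(best)
```

Because `dist()` builds a full distance matrix, this sketch is only practical for small datasets, which fits the issue's choice to apply the randomized version below 300 records.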

rangeBuilder and rnaturalearth fail to call the API

In Lesson 4 (ClimateProcessing), an erroneous API call causes getDynamicAlphaHull() to fail:

hull <- rangeBuilder::getDynamicAlphaHull(x = alldfsp@coords,
                                          fraction = 1,        # min. fraction of records we want included
                                          partCount = 1,       # number of polygons
                                          initialAlpha = 20,   # initial alpha size, 20m
                                          clipToCoast = "terrestrial")
                                          # proj = "+proj=longlat +datum=WGS84")  # This currently appears to be dysfunctional.

This yields:

Downloading a world basemap for clipping. This only needs to happen once.


trying URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/physical/ne_50m_land.zip'
Error in utils::download.file(file.path(address), zip_file <- tempfile()) : 
  cannot open URL 'http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/50m/physical/ne_50m_land.zip'
In addition: Warning message:
In utils::download.file(file.path(address), zip_file <- tempfile()) :
  cannot open URL 'https://www.naturalearthdata.com/http/www.naturalearthdata.com/download/50m/physical/ne_50m_land.zip': HTTP status was '404 Not Found'

As Makenzie pointed out, the failed call appears to be caused by the missing '/' after 'http/' in 'https://www.naturalearthdata.com/http/www.naturalearthdata.com/download/50m/physical/ne_50m_land.zip'.

The current fix is to run
install.packages("rnaturalearth")
since rnaturalearth seems to be the source of the erroneous call. However, this fix does not seem to work on every system: it did not resolve my errors until I tried again later (possibly because the API had been called too many times?).
Other possible causes raised earlier were university network firewalls (which seems unlikely), the R version, or the operating system.

The fix has been tested on the cluster with R 4.0 and 4.2 without errors. I will try it on Windows over non-university Wi-Fi at a later date.

Installation of rangeBuilder *must* be through GitHub

Otherwise, errors occur that leave you with multipolygon (sf) objects. To install it correctly, use:
remotes::install_github('ptitle/rangeBuilder')
The version should be 1.6 as of 02/20/2023; 2.0 is the CRAN version and does not work with our current scripts.

06_Ecological_niche_modeling rJava memory allocation

The current memory allocation, options(java.parameters = "- Xmx16g"), raises the following error when running dismo::maxent(...):
Loading required namespace: rJava
Unrecognized option: - Xmx16g
Error in .jinit(parameters = parameters) :
Unable to create a Java class loader.
In addition: Warning message:
In .local(x, p, ...) :
1 (0.06%) of the presence points have NA predictor values

The solution seems to be removing the space between "-" and "Xmx", e.g. options(java.parameters = "-Xmx8g") (note that this example also lowers the maximum Java heap from 16 GB to 8 GB). This could be Linux-specific, as I haven't tested it on other operating systems.

A comment could also be added at the beginning of the script: when using options(java.parameters = ...), it is necessary to remove existing variables and restart the RStudio session to avoid namespace issues from previously loaded packages (see https://stackoverflow.com/questions/34624002/r-error-java-lang-outofmemoryerror-java-heap-space).
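A quick guard at the top of the script could catch the malformed option before the JVM starts. This is a hypothetical sketch. `bad_java_params()` is not part of the workshop scripts. It relies only on the fact that rJava reads `getOption("java.parameters")` when the JVM is first initialized, which is why the option must be set before dismo::maxent() (or anything else) loads rJava.

```r
# Hypothetical check (not part of the workshop scripts): return any JVM options
# with whitespace right after the leading "-", the pattern behind
# "Unrecognized option: - Xmx16g".
bad_java_params <- function(params) params[grepl("^-\\s", params)]

bad_java_params(c("- Xmx16g", "-Xmx8g"))   # returns "- Xmx16g"

# Set the corrected option before rJava is loaded; in a session where rJava is
# already running, restart R first, as noted above.
params <- "-Xmx8g"
stopifnot(length(bad_java_params(params)) == 0)
options(java.parameters = params)
```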

06_Ecological_Niche_Modeling_jtmod.txt

sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so; LAPACK version 3.8.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

time zone: America/Havana
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] kuenm_1.1.10 viridis_0.6.3 viridisLite_0.4.2 ggplot2_3.4.2 ENMeval_2.0.4 magrittr_2.0.3 dismo_1.3-14 dplyr_1.1.2
[9] gtools_3.9.4 raster_3.6-20 sp_1.6-0

loaded via a namespace (and not attached):
[1] gtable_0.3.3 compiler_4.3.0 tidyselect_1.2.0 Rcpp_1.0.10 gridExtra_2.3 scales_1.2.1 fastmap_1.1.1 lattice_0.21-8 R6_2.5.1
[10] generics_0.1.3 knitr_1.42 iterators_1.0.14 tibble_3.2.1 munsell_0.5.0 pillar_1.9.0 rlang_1.1.1 rgdal_1.6-6 utf8_1.2.3
[19] terra_1.7-29 xfun_0.38 cli_3.6.1 withr_2.5.0 digest_0.6.30 foreach_1.5.2 grid_4.3.0 rJava_1.0-6 rstudioapi_0.14
[28] lifecycle_1.0.3 vctrs_0.6.1 evaluate_0.20 glue_1.6.2 codetools_0.2-19 fansi_1.0.4 colorspace_2.1-0 rmarkdown_2.21 purrr_1.0.1
[37] htmltools_0.5.5 tools_4.3.0 pkgconfig_2.0.3

05_PointBased: ecospat.grid.clim.dyn suggests there are not enough relocations available.

The code chunk:

# Kernel density estimates
# create occurrence density grids based on the ordination data
z1 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p1.score, R = 100)
z2 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p2.score, R = 100)
z3 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p3.score, R = 100)
z4 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p4.score, R = 100)
zlist <- list(z1, z2, z3, z4)

This currently errors out:
Error in adehabitatHR::kernelUD(sp::SpatialPoints(xr[, 1:2]), h = "href",  :
  At least 5 relocations are required to fit an home range

scores.clim is built from the per-species PCAs, and there seems to be an adequate sample size for relocation. I'm unsure what is actually happening here.

Note: My ecospat version is 3.5.1
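One way to narrow this down is to count how many occurrence scores each species actually passes in, since the error message states that kernelUD needs at least 5 relocations. This is a hypothetical diagnostic, not part of the workshop scripts or of ecospat. It only counts rows and cannot show whether ecospat filters points further before calling kernelUD.

```r
# Hypothetical diagnostic (not from the workshop scripts): count the occurrence
# scores each species passes to ecospat.grid.clim.dyn() and flag any below min_n.
count_relocations <- function(score_list, min_n = 5) {
  n_total <- vapply(score_list, NROW, integer(1))
  data.frame(species = names(score_list), n_total = n_total,
             enough = n_total >= min_n, row.names = NULL)
}

# Usage with the objects from the chunk above:
# count_relocations(list(p1 = p1.score, p2 = p2.score, p3 = p3.score, p4 = p4.score))
```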

00_Setup has out-of-date packages

When packages are freshly installed, some will be newer than the versions listed in setup.csv. Additionally, some packages are part of base R and will therefore fail any standard installation.

Here is a list of packages that are more up to date than setup.csv currently suggests.

R session info:
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 4
minor 3.0
year 2023
month 04
day 21
svn rev 84292
language R
version.string R version 4.3.0 (2023-04-21)
nickname Already Tomorrow
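Both setup.csv problems could be handled with a couple of base-R helpers. This is a hypothetical sketch. The actual column layout of setup.csv is not shown here, so the functions take plain vectors of package names and listed versions, and their names are illustrative.

```r
# Packages that ship with R itself (priority "base", e.g. "grid", "parallel").
base_pkgs <- rownames(utils::installed.packages(priority = "base"))

# Drop base packages so install.packages() is only asked for CRAN packages.
drop_base_packages <- function(pkgs) setdiff(pkgs, base_pkgs)

# Report installed packages whose version is newer than the listed one.
newer_than_listed <- function(pkgs, listed_versions) {
  keep <- pkgs %in% rownames(utils::installed.packages())  # skip missing packages
  pkgs <- pkgs[keep]
  listed_versions <- listed_versions[keep]
  if (length(pkgs) == 0) return(character(0))
  installed <- vapply(pkgs, function(p) as.character(utils::packageVersion(p)),
                      character(1))
  # compareVersion() returns 1 when the first version is newer
  pkgs[mapply(utils::compareVersion, installed, listed_versions) > 0]
}
```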
