enmSdmX

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

Tools for modeling niches and distributions of species

enmSdmX is a set of tools in R for implementing species distribution models (SDMs) and ecological niche models (ENMs), including: bias correction, spatial cross-validation, model evaluation, raster interpolation, biotic velocity (speed and direction of movement of a "mass" represented by a raster), and tools for using spatially imprecise records. The heart of the package is a set of "training" functions which automatically optimize model complexity based on the number of available occurrences. These algorithms include MaxEnt, MaxNet, boosted regression trees/gradient boosting machines (BRT), generalized additive models (GAM), generalized linear models (GLM), natural splines (NS), and random forests (RF). To enhance interoperability with other packages, the package does not create any new classes. The package works with PROJ6 geodetic objects and coordinate reference systems.

Installation

You can install this package from CRAN using:

install.packages('enmSdmX', dependencies = TRUE)

Alternatively, you can install the development version of this package using:

remotes::install_github('adamlilith/enmSdmX', dependencies = TRUE)

You may need to install the remotes package first.
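
A minimal sketch of that check, using only base R. The helper name `isInstalled` is hypothetical and is not part of enmSdmX or remotes. The install calls are commented out so that nothing is downloaded unless you opt in.

```r
# Hypothetical helper (not part of enmSdmX): TRUE if a package can be loaded.
isInstalled <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# 'stats' ships with R, so this reports TRUE.
isInstalled('stats')

# Uncomment to install remotes only when it is missing,
# then install the development version of enmSdmX:
# if (!isInstalled('remotes')) install.packages('remotes')
# remotes::install_github('adamlilith/enmSdmX', dependencies = TRUE)
```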

Functions

Using spatially imprecise records

  • coordImprecision: Coordinate imprecision
  • nearestGeogPoints: Minimum convex polygon from a set of spatial polygons and/or points ("nearest geographic point" method)
  • nearestEnvPoints: Extract "most conservative" environments from points and/or polygons ("nearest environmental point" method)

Data preparation

  • elimCellDuplicates: Eliminate duplicate points in each cell of a raster
  • geoFold: Assign geographically-distinct k-folds
  • geoFoldContrast: Assign geographically-distinct k-folds to background or contrast sites

Bias correction

  • geoThin: Thin geographic points deterministically or randomly
  • weightByDist: Proximity-based weighting for occurrences for correcting spatial bias

Model training

  • trainByCrossValid and summaryByCrossValid: Calibrate a distribution/niche model using cross-validation
  • trainBRT: Boosted regression trees (BRTs)
  • trainESM: Ensembles of small models (ESMs)
  • trainGAM: Generalized additive models (GAMs)
  • trainGLM: Generalized linear models (GLMs)
  • trainMaxEnt: MaxEnt models
  • trainMaxNet: MaxNet models
  • trainNS: Natural splines (NSs)
  • trainRF: Random forests (RFs)

Model prediction

  • predictEnmSdm: Predict most model types using default settings; parallelized
  • predictMaxEnt: Predict MaxEnt model
  • predictMaxNet: Predict MaxNet model

Model evaluation

  • evalAUC: AUC (with/out site weights)
  • evalMultiAUC: Multivariate version of AUC (with/out site weights)
  • evalContBoyce: Continuous Boyce Index (with/out site weights)
  • evalThreshold: Thresholds to convert continuous predictions to binary predictions (with/out site weights)
  • evalThresholdStats: Model accuracy based on thresholded predictions (with/out site weights)
  • evalTjursR2: Tjur's R2 (with/out site weights)
  • evalTSS: True Skill Statistic (TSS) (with/out site weights)
  • modelSize: Number of response values in a model object
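
For orientation, one of the metrics listed above, the True Skill Statistic, is sensitivity plus specificity minus 1 at a given threshold. The sketch below is base R only and is not the package's evalTSS() implementation; the function name `tssSketch`, the fixed threshold, and the example values are assumptions made for illustration.

```r
# Illustrative only: a base-R sketch of the True Skill Statistic,
# not enmSdmX's evalTSS().
# pres, contrast: numeric predictions at presence and contrast
#   (background/absence) sites.
# threshold: value at or above which a site is predicted "present"
#   (a fixed value is assumed here).
tssSketch <- function(pres, contrast, threshold) {
  sensitivity <- mean(pres >= threshold)     # presences predicted present
  specificity <- mean(contrast < threshold)  # contrast sites predicted absent
  sensitivity + specificity - 1
}

# Made-up predictions for illustration.
pres <- c(0.9, 0.8, 0.7, 0.4)
contrast <- c(0.1, 0.2, 0.6, 0.3)
tssSketch(pres, contrast, threshold = 0.5)  # 0.5
```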

Niche overlap and comparison

  • compareResponse: Compare different niche model responses along an environmental variable
  • nicheOverlapMetrics: Niche overlap metrics

Functions for rasters

  • bioticVelocity: Velocity of a "mass" across a time series of rasters
  • getValueByCell and setValueByCell: Retrieve or set raster value(s) by cell number
  • globalx: "Friendly" wrapper for terra::global() for calculating raster statistics
  • interpolateRasts: Interpolate a stack of rasters
  • longLatRasts: Generate rasters with values of longitude/latitude for cell values
  • sampleRast: Sample raster with/out replacement
  • squareCellRast: Create a raster with square cells from an object with an extent

Coordinate reference systems

  • crss: Coordinate reference systems and their nicknames
  • customAlbers: Create a custom Albers conic equal-area projection
  • customLambert: Create a custom Lambert azimuthal equal-area projection
  • customVNS: Create a custom vertical near-side projection
  • getCRS: Return a WKT2 (well-known text) string using a nickname

Geographic utility functions

  • countPoints: Number of points in a "spatial points" object
  • decimalToDms: Convert decimal coordinate to degrees-minutes-seconds
  • dmsToDecimal: Convert degrees-minutes-seconds coordinate to decimal
  • extentToVect: Convert extent to a spatial polygon
  • plotExtent: Create a spatial polygon the same size as a plot region
  • spatVectorToSpatial: Convert SpatVector object to a Spatial* object
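
The arithmetic behind a degrees-minutes-seconds to decimal conversion is short enough to sketch. This is base R only and is not the package's dmsToDecimal(); the function name `dmsToDecimalSketch` and its argument names are hypothetical.

```r
# Illustrative base-R arithmetic only; not enmSdmX's dmsToDecimal().
# hemisphere: 'N', 'S', 'E', or 'W' (south and west give negative values).
dmsToDecimalSketch <- function(degrees, minutes, seconds, hemisphere) {
  decimal <- degrees + minutes / 60 + seconds / 3600
  if (hemisphere %in% c('S', 'W')) -decimal else decimal
}

dmsToDecimalSketch(38, 30, 0, 'W')  # -38.5
```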

Data

  • lemurs: Lemur occurrences
  • mad0: Madagascar spatial object
  • mad1: Madagascar spatial object
  • madClim: Madagascar climate rasters for the present
  • madClim2030: Madagascar climate rasters for the 2030s
  • madClim2050: Madagascar climate rasters for the 2050s
  • madClim2070: Madagascar climate rasters for the 2070s
  • madClim2090: Madagascar climate rasters for the 2090s

Citation

Smith, A.B., Murphy, S.J., Henderson, D., and Erickson, K.D. 2023. Including imprecisely georeferenced specimens improves accuracy of species distribution models and estimates of niche breadth. Global Ecology and Biogeography In press. [open access pre-print | published article]

Abstract

Aim Museum and herbarium specimen records are frequently used to assess the conservation status of species and their responses to climate change. Typically, occurrences with imprecise geolocality information are discarded because they cannot be matched confidently to environmental conditions and are thus expected to increase uncertainty in downstream analyses. However, using only precisely georeferenced records risks undersampling of the environmental and geographical distributions of species. We present two related methods to allow the use of imprecisely georeferenced occurrences in biogeographical analysis.

Innovation Our two procedures assign imprecise records to the (1) locations or (2) climates that are closest to the geographical or environmental centroid of the precise records of a species. For virtual species, including imprecise records alongside precise records improved the accuracy of ecological niche models projected to the present and the future, especially for species with c. 20 or fewer precise occurrences. Using only precise records underestimated loss of suitable habitat and overestimated the amount of suitable habitat in both the present and the future. Including imprecise records also improves estimates of niche breadth and extent of occurrence. An analysis of 44 species of North American Asclepias (Apocynaceae) yielded similar results.

Main conclusions Existing studies examining the effects of spatial imprecision typically compare outcomes based on precise records against the same records with spatial error added to them. However, in real-world cases, analysts possess a mix of precise and imprecise records and must decide whether to retain or discard the latter. Discarding imprecise records can undersample the geographical and environmental distributions of species and lead to mis-estimation of responses to past and future climate change. Our method, for which we provide a software implementation in the enmSdmX package for R, is simple to use and can help leverage the large number of specimen records that are typically deemed "unusable" because of spatial imprecision in their geolocation.

enmsdmx's Issues

use terra interface

Hi Adam,

In two instances the enmSdmX package uses

ext <- terra::ext(x)@ptr$vector

That breaks with the development version of terra. The @ptr$ methods are not safe for use in other packages. Can you use the "official" interface instead? Something like this:

ext <- as.vector( terra::ext(x) )

Thanks!

BRT, GLM and GAM showing some error

Dear @adamlilith,

I'm a new user and really appreciate this wonderful package. I am getting some errors. However, it ran smoothly when I used lemurs data.

I get the errors in the following steps:

First error:

cur_glm <- trainGLM(data = bioCUR, resp = 'bg_pCUR',
                    preds = cur, verbose = TRUE, cores = 8)

Term-by-term evaluation:

################################ 

                           formula     AICc
1  bio_15 + bio_18 + bio_15:bio_18 55.34421
2    bio_19 + bio_4 + bio_19:bio_4 57.00321
3    bio_4 + bio_19 + bio_4:bio_19 57.00321
4               bio_4 + I(bio_4^2) 58.16015
5  bio_15 + bio_19 + bio_15:bio_19 59.00328
6                            bio_4 59.15838
.
.
35              bio_9 + I(bio_9^2) 68.31131
36                           bio_9 69.13251
37     bio_3 + bio_9 + bio_3:bio_9 70.21528
Error: cannot allocate vector of size 512.0 Gb

Second Error:

cur_gam <- trainGAM(data = bioCUR, resp = 'bg_pCUR', preds = cur, verbose = TRUE, cores = 4)
Error in { :
task 12 failed - "Repeated variables as arguments of a smooth are not permitted"
In addition: Warning messages:
1: In for (i in seq_along(new)) assign(names[i], new[[i]], envir = options) :
closing unused connection 6 (<-DESKTOP-7IET5NO:11809)
2: In for (i in seq_along(new)) assign(names[i], new[[i]], envir = options) :
closing unused connection 5 (<-DESKTOP-7IET5NO:11809)
3: In for (i in seq_along(new)) assign(names[i], new[[i]], envir = options) :
closing unused connection 4 (<-DESKTOP-7IET5NO:11809)
4: In for (i in seq_along(new)) assign(names[i], new[[i]], envir = options) :
closing unused connection 3 (<-DESKTOP-7IET5NO:11809)

Third Error:

cur_brt <- trainBRT(data = envSub, resp = 'bg_pCUR', preds = cur,
                    learningRate = 0.001, treeComplexity = 3, minTrees = 1200,
                    maxTrees = 1200, tryBy = 'treeComplexity', anyway = TRUE,
                    verbose = TRUE, cores = 4)

  learningRate treeComplexity bagFraction maxTrees stepSize nTrees converged deviance
1        0.001              3         0.6     1200       50     NA     FALSE       NA
2        0.001              2         0.6     1200       50     NA     FALSE       NA
3        0.001              3         0.6     1200       50     NA     FALSE       NA
4        0.001              2         0.6     1200       50     NA     FALSE       NA
5        0.001              1         0.6     1200       50     NA     FALSE       NA
6        0.001              2         0.6     1200       50     NA     FALSE       NA

Warning message:
In trainBRT(data = envSub, resp = "bg_pCUR", preds = cur, learningRate = 0.001, :
No models converged and/or had sufficient trees.

Please help me in this regard.

Thanks
with regards
Ratnesh
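
The thread does not record a diagnosis, so the following is only a guess at the second error. The message "Repeated variables as arguments of a smooth are not permitted" is raised by the GAM fitting code when one smooth term is given the same variable twice, which could happen if the vector passed as `preds` names a predictor more than once. A quick base-R check, with a made-up stand-in for the poster's `cur` vector:

```r
# Hypothetical stand-in for the poster's `cur` vector of predictor names.
cur <- c('bio_3', 'bio_4', 'bio_9', 'bio_4')

# Names that occur more than once.
repeated <- unique(cur[duplicated(cur)])
repeated

# Drop the repeats before passing the vector as `preds`.
curUnique <- unique(cur)
curUnique
```

If `repeated` is empty for the real data, this is not the cause and the problem lies elsewhere.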

Error in evalContBoyce

Hello,

I used to use enmSdm, but since all packages are migrating to the terra package, I changed my script to work with this package. My R is version 4.2.2, and I'm using the latest version of the SDMtune package.

However, I always get the same error when calculating the CBI:

Here is my script:

#CBI

#Create spatial Points object from species presence dataframe
species <- vect(species_presence_data, geom = c("Longitude", "Latitude"), crs="WGS84")

#Extract values of prediction raster at species presence points
pres <- extract(prediction, species)

#Create spatial Points object from bg dataframe
bg_CBI <- vect(bg, geom = c("Longitude", "Latitude"), crs="WGS84")

#Extract values of prediction raster at background points
contrast <- extract(prediction, bg_CBI)

#Calculate Boyce Index using evalContBoyce function
cbiMax <- evalContBoyce(pres, contrast, na.rm = TRUE)

Error in min(c(pres, contrast), na.rm = na.rm) : invalid 'type' (list) of argument

How can i solve this? All my packages are updated.
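
A likely explanation, offered as a sketch rather than a confirmed fix: for point geometries, terra::extract() returns a data.frame (an ID column plus one column per raster layer), not a numeric vector. Combining two data frames with c() produces a list, which min() rejects with exactly this message. The example below uses plain data frames as stand-ins for the extract() output, so it runs without terra; the column name `prediction` and the values are made up.

```r
# Stand-ins for what terra::extract() returns for points: a data.frame with
# an ID column plus one column per raster layer. The layer name 'prediction'
# and the values are invented for illustration.
pres <- data.frame(ID = 1:3, prediction = c(0.8, 0.6, NA))
contrast <- data.frame(ID = 1:4, prediction = c(0.1, 0.3, 0.5, 0.2))

# Combining two data frames with c() gives a list, which min() rejects.
is.list(c(pres, contrast))

# Pull out the prediction column so each object is a plain numeric vector.
presVals <- pres[['prediction']]
contrastVals <- contrast[['prediction']]
min(c(presVals, contrastVals), na.rm = TRUE)

# With enmSdmX loaded, the original call would then become:
# cbiMax <- evalContBoyce(presVals, contrastVals, na.rm = TRUE)
```

In the real script the column to extract is whichever one holds the raster values; names(pres) shows the available columns.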
