SimPoly

The goal of SimPoly is to simulate EU-PoMS-like datasets with known parameters and appropriate built-in stochasticity. Its only purpose is to explore how different types of variability affect model performance under strongly simplified assumptions, and its output should be interpreted as such.

DISCLAIMER: This is work in progress and the assumptions used to build the simulations can be enhanced.

Installation

# install.packages("devtools")
devtools::install_github("ibartomeus/SimPoly")

Example

library(SimPoly)
set.seed(32468)

This is a basic example to show how we build one dataset:

First we define the number of species, sites and years to simulate. We do this using the define_sites_years_rich() function. The pool argument is a call to sp_pool(), which defines the number of species in the pool (here pool = 1000) and their mean occupancy (here mean_occ = 0.25).

site_years <- define_sites_years_rich(pool = sp_pool(pool = 1000, 
                                                mean_occ = 0.25),
                                 n_years = 7, n_sites = 100)

Let’s take a closer look at the data.

str(site_years)
head(site_years)
length(unique(site_years$siteID))
length(unique(site_years$species))

Note that the number of species in the dataset is lower than the species pool. This reflects stochasticity in how the species are assigned to sites.

Second, we specify species attributes such as phenology, abundance and detectability:

pars <- sp_responses(site_years = site_years,
                     pheno_peak_mean = 120, pheno_peak_sd = 50,
                     pheno_range_mean = 25, pheno_range_sd = 5,
                     trend_max = 1, trend_min = 0.8)
str(pars)
hist(pars$h)
head(pars)

There is one row per species, with the following columns:
opt = day of the year (counted from 1 January) of maximum abundance.
tol = spread of the phenology around opt (its range).
h = expected species abundance, following a dominant-to-rare log-normal distribution.
slope = trend in abundance per year, on the abundance scale.
detect = probability that a species is detected in a transect (independent of its abundance).
detect_pan = probability that a specimen falls in a pan trap.
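
To make the roles of opt, tol and h concrete, here is a minimal sketch that assumes a Gaussian phenology curve. That shape is an assumption made for illustration (the README does not state the exact response function), and gaussian_pheno() is a hypothetical helper, not a SimPoly function.

```r
# Hypothetical helper (not part of SimPoly): Gaussian phenology curve.
# opt = day of year of peak abundance, tol = spread, h = abundance at the peak.
gaussian_pheno <- function(day, opt, tol, h) {
  h * exp(-((day - opt)^2) / (2 * tol^2))
}

days <- 1:365
# Illustrative values only: peak on day 120, spread of 25 days, peak abundance 10
expected <- gaussian_pheno(days, opt = 120, tol = 25, h = 10)

days[which.max(expected)]  # day of maximum expected abundance
range(expected)            # expected abundance never drops below zero
```

Under this assumption, a larger tol gives a longer flight period and a larger h gives a more abundant species at its peak.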

Third, we sample the true abundance values expected at each sampling point.

dat <- true_abundance(n_round = 8,
                      site_years = site_years,
                      sp_responses = pars, white_noiseCV = 0.5)
str(dat)
head(dat)

There are many observations per species per site, with the following columns:
year: an integer from 1 to 7 (as defined by n_years, above).
siteID: the site identifier.
round: the month of the year of this survey.
species: the species identifier.
abundance: abundance during that visit.

From this table we can calculate species richness per site

library(reshape2)
# species richness per site
(rich <- dcast(dat, siteID ~ "richness", value.var = 'species', function(x) length(unique(x))))
# persistence (proportion of visits where the species is present);
# note this first version averages over rounds, not over years
persistence <- dcast(dat, siteID + species ~ "persistence", value.var = 'abundance', function(x) sum(x > 0)/length(x))
hist(persistence$persistence, las = 1)
# aggregate to yearly abundance, then compute persistence over years
yearly <- dcast(dat, year + siteID + species ~ "abundance", value.var = 'abundance', sum)
hist(yearly$abundance) # up to 400 individuals per year seems realistic
persistence <- dcast(yearly, siteID + species ~ "persistence", value.var = 'abundance', function(x) sum(x > 0)/length(x))
hist(persistence$persistence, las = 1)

Finally, we sample with detection error from the true values.

dat_obs <- obs_abundance(true_abundance = dat, sp_responses = pars, fraction_observed = 0.1)
head(dat_obs)
plot(dat_obs$obs, dat_obs$abundance) #nice expected correlation, but more noisy.
plot(dat_obs$obs, dat_obs$presences_pan) #nice expected correlation, but more noisy.

We can simulate a second transect if desired:

dat_obs$obs2 <- obs_abundance(true_abundance = dat, sp_responses = pars, fraction_observed = 0.1)$obs #note order is preserved
head(dat_obs)
plot(dat_obs$obs, dat_obs$obs2) #nice expected correlation.

This is the final output:

head(dat_obs)
plot(dat_obs$abundance ~ dat_obs$obs, las = 1, xlim = c(0,max(dat_obs$abundance)), 
     ylim = c(0,max(dat_obs$abundance))) 
#you can summarize observed variables per species, site and year
s_dat <- summary_poms(dat_obs, var_name = "obs")
head(s_dat)
scatter.smooth(s_dat$obs_abund ~ s_dat$year, las = 1)
#This can easily be looped to obtain several simulations

Citation

If using this package, please cite it:

citation("SimPoly")

To cite SimPoly in publications use:

  Bartomeus I, Isaac N and Schweiger O. 2023. SimPoly: Generating
  EU-PoMS like datasets https://github.com/ibartomeus/SimPoly

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {SimPoly: Generating EU-PoMS like datasets},
    author = {Ignasi Bartomeus and Nick Isaac and Oliver Schweiger},
    year = {2023},
    url = {https://github.com/ibartomeus/SimPoly},
  }

Acknowledgements

STING, JRC and especially Nick Isaac and Oliver Schweiger, and the coenocliner package.

simpoly's Issues

pan trap model

It looks like the pan trap model is identical to the transect model. I think this should be changed to a binomial process with n trials, where n is the number of pan trap locations at each site. The outcome of this process is then the number of traps that contain the species of interest.
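
A minimal sketch of the binomial process suggested above, in base R. The number of traps, the number of visits and the per-trap capture probabilities are hypothetical values chosen for illustration, and none of this is the package's current code.

```r
set.seed(42)

n_traps  <- 10  # hypothetical number of pan trap locations at each site
n_visits <- 5   # hypothetical number of sampling visits

# Hypothetical per-trap probability of capturing each species
detect_pan <- c(sp_a = 0.05, sp_b = 0.30, sp_c = 0.80)

# One binomial draw per visit and species: the outcome is the number of
# traps (out of n_traps) that contain the species of interest.
traps_with_species <- sapply(detect_pan, function(p) {
  rbinom(n_visits, size = n_traps, prob = p)
})

traps_with_species  # rows = visits, columns = species
```

Each value is bounded between 0 and n_traps, which separates the pan trap outcome from an abundance count on a transect.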

white noise

line 41 of true_abundances ... right now we use a small sd... Olli suggests using CV to add larger noise to abundant species... think about it.
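
One way to read the CV suggestion, sketched in base R: the noise standard deviation is set to a fixed fraction of the expected abundance, so abundant species receive larger absolute noise. add_noise_cv() and the abundance values are hypothetical and only illustrate the idea; they are not the code in true_abundance.

```r
set.seed(42)

# Hypothetical helper: Gaussian white noise whose standard deviation is a
# fixed fraction (cv) of the expected abundance.
add_noise_cv <- function(mu, cv) {
  noisy <- rnorm(length(mu), mean = mu, sd = cv * mu)
  pmax(0, round(noisy))  # keep abundances as non-negative whole numbers
}

expected <- c(rare = 1, medium = 10, common = 100)  # illustrative values
add_noise_cv(expected, cv = 0.5)

# Raw (untruncated) noise: its spread scales with the expected abundance.
draws <- replicate(2000, rnorm(length(expected), mean = expected, sd = 0.5 * expected))
noise_sd <- apply(draws, 1, sd)
round(noise_sd / expected, 2)  # each ratio should be close to cv = 0.5
```

With a fixed sd, by contrast, the same absolute noise would swamp rare species while barely affecting common ones.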

Distribution of occupancy across species

@ibartomeus asked us to consider alternatives to this code on line 21 of sp_pool.r:
distrib <- runif(pool, rarest, commonest)

This will tend to produce more widespread species than typical communities. One way to change this would be to replace it with a beta distribution. The architecture would need to change a bit, but the distribution is quite flexible and moderately easy to use.
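
A minimal sketch of the beta alternative, assuming occupancy is parameterised by a mean and a precision (shape1 + shape2). The precision value and the uniform bounds are hypothetical and chosen only to show the contrast.

```r
set.seed(42)

pool      <- 1000
mean_occ  <- 0.25
precision <- 4  # hypothetical; larger values concentrate species around mean_occ

# Convert mean and precision into the two beta shape parameters
shape1 <- mean_occ * precision
shape2 <- (1 - mean_occ) * precision

occ_beta <- rbeta(pool, shape1, shape2)
occ_unif <- runif(pool, min = 0.05, max = 0.95)  # hypothetical rarest/commonest

# Mean occupancy and share of widespread species (occupancy > 0.5)
c(beta = mean(occ_beta), uniform = mean(occ_unif))
c(beta = mean(occ_beta > 0.5), uniform = mean(occ_unif > 0.5))
```

With these settings the beta draw yields many rare species and few widespread ones, while the uniform draw spreads species evenly between the two bounds.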

combined comments

Sp_pool
L21 runif seems suboptimal. Agree to let it be informed by data. If possible identify a relationship between the shape of the distribution and biogeographic factors (e.g. climate). I can give it a try, e.g. with eBMS data.

Define_sites_years
L3 rnorm: Agree to let it be informed by data. Assuming a normal distribution seems quite plausible, but why not use a Poisson distribution? Rich_mean and richness_sd (or lambda for rpois) might be more important to inform with data (and with biogeography).
L39 assumption of no immigration or emigration: but this will not allow for simulating changes in species richness, or will this be addressed by the species trends?
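
To compare the two options raised for L3, here is a small sketch that draws per-site richness from a rounded normal and from a Poisson. The values of n_sites, rich_mean and rich_sd are hypothetical and this is not the code in Define_sites_years.

```r
set.seed(42)

n_sites   <- 100
rich_mean <- 20  # hypothetical mean richness per site
rich_sd   <- 5   # hypothetical standard deviation (normal option only)

# Normal option: needs rounding and truncation at zero to give valid counts
rich_norm <- pmax(0, round(rnorm(n_sites, mean = rich_mean, sd = rich_sd)))

# Poisson option: non-negative integers by construction, with a single parameter
rich_pois <- rpois(n_sites, lambda = rich_mean)

rbind(normal  = c(mean = mean(rich_norm), var = var(rich_norm)),
      poisson = c(mean = mean(rich_pois), var = var(rich_pois)))
```

The trade-off is that a Poisson fixes the variance to equal the mean, whereas the normal option lets the spread be set separately through rich_sd.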

Sp_responses
L27 Why hard-code the days of the year?
L32 Rather fair, but bimodal might be more realistic. Or just use a more left-skewed distribution?
L33 Should be OK
L38 Might be nice to have the option to include different distributions. Max (or average) abundances could also be informed by data across biogeographic gradients.
L42 Agreed, we may go for a normal distribution (or some level of skewness) with the option to specify mean (or median) and sd. Might also be linked to the level of occupancy and/or abundance. Rare species might be expected to respond more strongly (for better or worse) than common ones. But perhaps also keep the option of random variation.
L48 Yes, function of h, but with a certain random component?

True_abundance
L31 Don’t know. But too few years might be a problem?
L46 Is it also possible to gain species (e.g. starting with zero abundance in the first years)?

Obs_abundance
L22 Why hard-code fraction_observed?
L50 adding noise might be good.
