retoschmucki / rbms Goto Github PK

A home for the latest rbms R package

Home Page: https://retoschmucki.github.io/rbms/

License: Other

R 87.70% HTML 1.86% CSS 10.45%

rbms's Introduction

rbms

With rbms, we aim to facilitate the implementation of statistical and mathematical methods developed for computing relative abundance indices from yearly time-series of butterfly counts. These data are characterised by temporal patterns (phenology) that must be accounted for when deriving abundance from a time-series of counts. As a toolbox, we plan to implement more methods to compute and visualise metrics as they develop. The rbms package will provide the option of being coupled and work in line with other tools available and developed by the community (e.g., rtrim, BRCindicators). With the development of the 'rbms' R package, we also provide a tutorial to facilitate its usage and understanding.

Although rbms implements methods that were developed independently and for which the source should be acknowledged, users should also cite the rbms package and its version to ensure good referencing and improve the reproducibility of the work.

Suggested citation for the rbms package

Schmucki R., Harrower C.A., Dennis E.B. (2022) rbms: Computing generalised abundance indices for butterfly monitoring count data. R package version 1.1.3. https://github.com/RetoSchmucki/rbms

The rbms package implements methods from:

Dennis, E.B., Freeman, S.N., Brereton, T., Roy, D.B., 2013. Indexing butterfly abundance whilst accounting for missing counts and variability in seasonal pattern. Methods Ecol Evol 4, 637–645. https://doi.org/10.1111/2041-210X.12053
Dennis, E.B., Morgan, B.J.T., Freeman, S.N., Brereton, T.M., Roy, D.B., 2016. A generalized abundance index for seasonal invertebrates. Biom 72, 1305–1314. https://doi.org/10.1111/biom.12506
Schmucki, R., Pe’er, G., Roy, D.B., Stefanescu, C., Van Swaay, C.A.M., Oliver, T.H., Kuussaari, M., Van Strien, A.J., Ries, L., Settele, J., Musche, M., Carnicer, J., Schweiger, O., Brereton, T.M., Harpke, A., Heliölä, J., Kühn, E., Julliard, R., 2016. A regionally informed abundance index for supporting integrative analyses across butterfly monitoring schemes. J Appl Ecol 53, 501–510. https://doi.org/10.1111/1365-2664.12561

Installation

Once the R programming system is successfully installed, you can install the rbms package from GitHub. For this you will need to install the package devtools or remotes, both available on CRAN. Once installed, use the function devtools::install_github() to install the rbms package on your system.

Note that rbms is build with R > 3.6.0, so you might need to update your R system before installation. To install devtools on Windows system, you will also need to install Rtools - note Rtools40 for R 4.x.x New to R or want to refresh your R coding skills, try the excellent tutorials available at ourcodingclub 👍

if(!requireNamespace("devtools")) install.packages("devtools")
devtools::install_github("RetoSchmucki/rbms")

Get Started

Consult the tutorial vignettes / Articles to learn about rbms data format and the workflow how to compute flight curves, abundance indices, and collated index with a bootstrap confidence interval

Further documentation is also available through the help function in R or from the rbms online references

Development

Since version v.1.1.0, rbms is likely to perform better with species having sparse data, potentially resulting in more flight curves and indices being computed. This version implements a basic plot method for "pheno_curve", using the object produced by the flight_curve() function as an argument.

# usage or plot method
library(rbms)
data(m_visit)
data(m_count)

ts_date <- rbms::ts_dwmy_table(InitYear = 2000, LastYear = 2003, WeekDay1 = 'monday')

ts_season <- rbms::ts_monit_season(ts_date,
                       StartMonth = 4,
                       EndMonth = 9, 
                       StartDay = 1,
                       EndDay = NULL,
                       CompltSeason = TRUE,
                       Anchor = TRUE,
                       AnchorLength = 2,
                       AnchorLag = 2,
                       TimeUnit = 'w')

ts_season_visit <- rbms::ts_monit_site(ts_season, m_visit)

ts_season_count <- rbms::ts_monit_count_site(ts_season_visit, m_count, sp = 2)

ts_flight_curve <- rbms::flight_curve(ts_season_count, 
                       NbrSample = 300,
                       MinVisit = 5,
                       MinOccur = 3,
                       MinNbrSite = 5,
                       MaxTrial = 4,
                       GamFamily = 'nb',
                       SpeedGam = FALSE,
                       CompltSeason = TRUE,
                       SelectYear = NULL,
                       TimeUnit = 'w')

## Flight curves are in the "pheno" data.table located within the ts_flight_curve result that is a list
str(ts_flight_curve$pheno)

## Basic plot method to plot flight curve
plot(ts_flight_curve)
points(ts_flight_curve, col = 'magenta', pch = 19)

# for a single or multiple years
plot(ts_flight_curve, year = c(2001, 2002))
points(ts_flight_curve, year = 2001, col = 'magenta', pch = 19)

# for multiple year curves overlapping on a base year, use argument BaseYear
plot(ts_flight_curve, year = 2000, SiteID = 1, col = 'dodgerblue4')
points(ts_flight_curve, year = 2002, SiteID = 1, BaseYear = 2000, type = 'l', col = 'cyan4')
points(ts_flight_curve, year = 2001, SiteID = 1, BaseYear = 2000, type = 'l', col = 'orange')
legend("topright", legend = c("2000", "2001", "2002"), 
   col = c("dodgerblue4", "cyan4", "orange"), lty = 1,
   box.lty=0)

# multiple year curves on a time-series with different portion per year, use xlim
plot(ts_flight_curve, year = c(2000), xlim = as.Date(c("2000-01-01", "2002-12-30"), format = "%Y-%m-%d"), SiteID = 1, col = 'dodgerblue4') 
points(ts_flight_curve, year = 2001, SiteID = 1, type = 'l', col = 'cyan4') 
points(ts_flight_curve, year = 2002, SiteID = 1, type = 'l', col = 'orange') 
legend("topright", legend = c("2000", "2001", "2002"), 
   col = c("dodgerblue4", "cyan4", "orange"), lty = 1, 
   box.lty=0)

Reporting Issues

For reporting issues related to this package or workflow, please visit the issue and before opening a new, see if your problem is not already in the list reported issues here

rbms's People

Contributors

Stargazers

Watchers

Forkers

ealarsen colinharrower wkmor1 dylancarbone

rbms's Issues

fix <<- in impute_count

avoid assign within function and insure output of the flight_curve is properly used

How to automate the script across all species

Dear Reto @RetoSchmucki,

I am trying to automate your script from https://retoschmucki.github.io/rbms/articles/Get_Started_1.html and https://retoschmucki.github.io/rbms/articles/Get_Started_2.html to run it across all species at once.

For this I am creating a "for" loop at each step of the script, e.g (with LUBMS_count = the count table):
species.names <- unique(LUBMS_count[order(SPECIES), SPECIES])
for(i in 1:length(species.names)){
ts_season_count <- rbms::ts_monit_count_site(ts_season_visit, LUBMS_count, sp = species.names[i])
saveRDS(ts_season_count, file.path("K:/BIODIV/BIOLYS/3_Working area/WP2 - butterfly trends/Working directory/1-ALL transects/ts_season_count", paste(gsub(" ", "_", species.names[i]), "ts_season_count.rds", sep = "_")))
}
This command works fine but not for all steps. For example, when it comes to plot the flight curve, the NAs from the field NM are causing an error message ending the loop, and I don't know how to handle this.
for(i in 1:length(species.names)){
ts_flight_curve <- readRDS(file.path("K:/BIODIV/BIOLYS/3_Working area/WP2 - butterfly trends/Working directory/1-ALL transects/flight curve", paste(gsub(" ", "_", species.names[i]), "pheno.rds", sep = "_")))
pheno <- ts_flight_curve$pheno
yr <- unique(pheno[order(M_YEAR), as.numeric(as.character(M_YEAR))])
if("trimWEEKNO" %in% names(pheno)){
plot(pheno[M_YEAR == yr[1], trimWEEKNO], pheno[M_YEAR == yr[1], NM], type = 'l', ylim = c(0, max(pheno[, NM])), xlab = 'Monitoring Week', ylab = 'Relative Abundance', lwd=3, main=paste(species.names[i], "flight curve"))
} else {
plot(pheno[M_YEAR == yr[1], trimDAYNO], pheno[M_YEAR == yr[1], NM], type = 'l', ylim = c(0, max(pheno[, NM])), xlab = 'Monitoring Day', ylab = 'Relative Abundance', lwd=3, main=paste(species.names[i], "flight curve"))
}
if(length(yr) > 1) {
j <- 2
for(y in yr[-1]){
if("trimWEEKNO" %in% names(pheno)){
points(pheno[M_YEAR == y , trimWEEKNO], pheno[M_YEAR == y, NM], type = 'l', col = i, lwd=3)
} else {
points(pheno[M_YEAR == y, trimDAYNO], pheno[M_YEAR == y, NM], type = 'l', col = i, lwd=3)
}
j <- j + 1
}}
legend('topright', legend = c(yr), col = c(seq_along(c(yr))), lty = 1, bty = 'o', lwd = 3, title="Year")
}
=> Error in plot.window(...) : need finite 'ylim' values

There are two cases:

For species with only some years with no flight curve, then the script should ignore these years and plot only the years with a flight curve available. There would be a simple (but not optimal) solution, just by creating a new "pheno" where rows with NAs are removed (see issue #18 ).
For species where not a single year has a flight curve, then the script should just ignore this species and go to the next species. => this is where I am stuck.

I have this issue at three steps in the script:

To plot the annual flight curves
To calculate the species trend (from "workshop_functions.R", https://butterfly-monitoring.github.io/bms_workshop/species_trends.html)
To plot the annual collated indices with trend line

Would you know how to adapt the script to resolve this issue? I guess the statement "if next" would be helpful here but I have no clue on how to correctly apply it as my loop is built on species names and not on NM values, I cannot find examples on the web similar to the case here.

Thank you very much in advance for your help !

Sarah

Weird collated index and lower IC

Hi @RetoSchmucki, that's me again ^^"
For several species in some years, I obtain a very low collated index and/or a very low CI.

The low collated indices always correspond to a year with 0 specimen recorded, so maybe this very low value is normal?
The low CIs always correspond to years with very few occurrences, not enough to build a flight curve, but I am wondering why the lower CI is so low whereas the upper CI looks normal?

Here is the example of A. agestis:

Aricia agestis_collated index.csv
Here are the counts per year:

Here is another example, P. c-album:
Polygonia c-album_collated index.csv

In 2012, the collated index looks OK but the lower CI is very low.
In 2017, the collated index is very low although it is the year with the highest counts:

But there is clearly an issue with the computed flight curve for 2017 (see issue #21).

Thank you very much for your help!

Error in one line of the script

Dear Reto

I downloaded the rbms package and it worked fine untill I tried to run this part of the code

==============
co_index <- list()

for progression bar, uncomment the following

pb <- txtProgressBar(min = 0, max = dim(bootsample$boot_ind)[1], initial = 0, char = "*", style = 3)

for(i in c(0,seq_len(dim(bootsample$boot_ind)[1]))){

co_index[[i+1]] <- rbmsD::collated_index(data = sindex, s_sp = 2, sindex_value = "SINDEX", bootID=i, boot_ind= bootsample, glm_weights=TRUE, rm_zero=TRUE)

for progression bar, uncomment the following

setTxtProgressBar(pb, i)

}

collate and append all the result in a data.table format

co_index <- rbindlist(lapply(co_index, FUN = "[[","col_index"))

resulting in an error message. I removed the D from 'rbmsD' in this part of the code "co_index[[i+1]] <- rbmsD::collated_index(data = sindex, s_sp = 2, sindex_value = "SINDEX", bootID=i, boot_ind= bootsample, glm_weights=TRUE, rm_zero=TRUE)" and it worked well ... so I think/hope this is the correct solution for it?

Many thanks for making this analysis possible!

Best regards
Dirk

Error at line 59: lenghts of variables differ

Hello,
I was running the Get_Started_2 markdown file, and I encountered an error at line 59 while calculating the collated index:

co_index <- collated_index(data = sindex, s_sp = 2, sindex_value = "SINDEX", glm_weights = TRUE, rm_zero = TRUE)

Error in model.frame.default(weights = data_boot$weights, drop.unused.levels = TRUE, :
variable lenghts differ (found for '(weights)')

Full traceback attached

What could be the reason? I get this error both with my data (not shown) and with the provided files.

Thank you for your help,

Stefano Masier

month names instead of week numbers

Would it be possible to add the option of plotting month names rather than week numbers on x axis?

https://retoschmucki.github.io/rbms/articles/Get_Started_1.html

Updated code to use terra pakage and avoid raster and rasterVis

I did update the code from your workshop (https://butterfly-monitoring.github.io/bms_workshop/index.html) to use teh new terra package and discard the use of raster and rasterVis. I did also correct some small issues in your code taht were causing errors. Hope this is usefull

regards

if (!require("data.table")){
install.packages("data.table")
library(data.table)}
if (!require("speedglm")){
install.packages("speedglm")
library(speedglm)}
if (!require("devtools")){
install.packages("devtools")
library(devtools)}
if (!require("ggplot2")){
install.packages("ggplot2")
library(ggplot2)}
if (!require("reshape2")){
install.packages("reshape2")
library(reshape2)}
if (!require("plyr")){
install.packages("plyr")
library(plyr)}
if (!require("sf")){
install.packages("sf")
library(sf)}
if (!require("terra")) {
install.packages("terra")
library(terra)}
if (!require("tmap")){
remotes::install_github('r-tmap/tmap')
##install.packages("tmap")
library(tmap)}
if (!require("DT")){
install.packages("DT")
library(DT)}

rbms from GitHub

if(!requireNamespace("devtools")) {
install.packages("devtools")
devtools::install_github("RetoSchmucki/rbms")
library(rbms)}

library(data.table)

load .rds data, we use the function readRDS()

b_count_sub <- readRDS("bms_workshop_data/work_count.rds") # species data with counts
m_visit_sub <- readRDS("bms_workshop_data/work_visit.rds") # date and time of transects/stations
m_site_sub <- readRDS("bms_workshop_data/work_site.rds") # km coordinates of sites

to read csv into a data.table we use fread()

bcz_class <- data.table::fread("bms_workshop_data/GEnS_v3_classification.csv") # Bioclimate classes
b_count_sub

Extract columns, to store only 'bms_id', the reference to samples, teh year of the record and the full date

my_new_dt <- b_count_sub[ , .(bms_id, transect_id_serial, year, visit_date)]
my_new_dt

Extract unique

unique(my_new_dt$bms_id) # store the unique 'bms_id'
unique(my_new_dt[, .(bms_id, year)]) # store 'bms_id' and 'year'

Subset

my_new_dt[bms_id == "NLBMS", ]

Work with date

my_new_dt[ , month := month(visit_date)][ , c("day", "year") := .(mday(visit_date), year(visit_date))]
my_new_dt

Count object

my_new_dt[ , .N, by = bms_id]

rename column

change names

setnames(b_count_sub, "transect_id_serial", "SITE_ID")
setnames(b_count_sub, c("species_name", "bms_id"), c("SPECIES", "BMS_ID"))
names(b_count_sub) <- toupper(names(b_count_sub))
b_count_sub

merge data set

merge species count data with site coordinate data based on a common identifier (SITE_ID),

demonstrating both inner and left joins.

setnames(m_site_sub, "transect_id_serial", "SITE_ID")

setkey(b_count_sub, SITE_ID)
setkey(m_site_sub, SITE_ID)

merge(b_count_sub, m_site_sub[bms_id == "FRBMS" , .(SITE_ID, transect_lon_1km, transect_lat_1km)])
merge(b_count_sub, m_site_sub[bms_id == "FRBMS" , .(SITE_ID, transect_lon_1km, transect_lat_1km)], all.x = TRUE)
data_m <- merge(b_count_sub, m_site_sub[ , .(SITE_ID, transect_lon_1km, transect_lat_1km)], all.x = TRUE)
data_m

Mapping BMS sites

library(data.table)
library(sf)
library(tmap)

b_count_sub <- readRDS("bms_workshop_data/work_count.rds")
m_visit_sub <- readRDS("bms_workshop_data/work_visit.rds")
m_site_sub <- readRDS("bms_workshop_data/work_site.rds")

Mapping

bcz <- terra::rast("bms_workshop_data/metzger_bcz_genz3.grd")
bb <- sf::st_read("bms_workshop_data/bbox.shp")
bcz_class <- data.table::fread("bms_workshop_data/GEnS_v3_classification.csv")

sf transect location

site_sf <- st_as_sf(m_site_sub, coords = c("transect_lon_1km", "transect_lat_1km"), crs = 3035)
mapview::mapview(site_sf)

To produce a map with the Bioclimate Region, we use the function levelplot() in the rasterVis package

plot point on raster

site_sf_4326 <- st_transform(site_sf, crs = 4326)
bb_4326 <- st_transform(bb, crs = 4326)

Begin the tmap visualization

tm_layout(frame = FALSE) + # Optional: Adjust layout settings as needed
tm_shape(bcz) +
tm_raster(style = "cont", palette = "viridis", title = "Bioclimate") +
tm_shape(site_sf_4326) +
tm_dots(col = "cyan4", size = 0.1) + # Adjust size as needed
tm_shape(bb_4326) +
tm_borders(col = "magenta", lwd = 2, lty = 2)

plot point on raster using tmap

tm_shape(bcz) +
tm_raster(palette = "viridis", alpha = 1) + # Example using a viridis palette
tm_shape(st_geometry(st_transform(site_sf, 4326))) +
tm_symbols(col = "dodgerblue4", size = 0.1, shape = 20) +
tm_shape(st_geometry(st_transform(bb, 4326))) +
tm_borders(lty = 2, col= 'magenta', lwd=2)

Extracting spatial information

site_dt <- data.table(site_sf_4326)

Convert 'site_sf' to 'data.table', ensuring to include all attributes

site_dt <- data.table::setDT(sf::st_drop_geometry(site_sf_4326))
print(names(site_sf_4326))

Assuming 'bcz' is a SpatRaster object loaded with terra

Transform 'site_sf' to match the CRS of 'bcz'

site_sf_4326 <- st_transform(site_sf_4326, crs = crs(bcz))

Assuming 'site_sf_4326' is correctly transformed

Extract values

extracted_values <- terra::extract(bcz, site_sf_4326, exact=TRUE)

Ensure extracted_values is correctly structured for assignment

If bcz has a single layer, extracted_values should be a vector or a single-column data.frame

if (is.data.frame(extracted_values)) {
# Assuming we're interested in the first layer or there's only one
site_dt$GEnZ_seq <- extracted_values[[1]]
} else {
# If it's not a data.frame, it should be directly assignable, but this condition is just for illustration
site_dt$GEnZ_seq <- extracted_values
}

----

setkey(site_dt, GEnZ_seq)
setkey(bcz_class, GEnZ_seq)
site_dt <- merge(site_dt, unique(bcz_class[, .(GEnZ_seq, GEnZname)]), all.x=TRUE)

After extraction and before merge

print(names(site_dt))

Exploring results

library(DT)
datatable(site_dt[ ,.(bms_id, transect_id_serial, GEnZ_seq, GEnZname)], rownames = FALSE, filter = "top", options = list(pageLength = 5, scrollX = T))

---

mapview::mapview(bcz) + mapview::mapview(st_transform(site_sf, 4326))

Phenology in BMS data - flight curve

if(!requireNamespace("devtools")) install.packages("devtools")
devtools::install_github("RetoSchmucki/rbms")
library(data.table)
library(rbms)

b_count_sub <- readRDS("bms_workshop_data/work_count.rds")
m_visit_sub <- readRDS("bms_workshop_data/work_visit.rds")

Explore the data

library(DT)
datatable(b_count_sub[1:50, ], rownames = FALSE, filter = "top", options = list(pageLength = 5, scrollX = T))
datatable(m_visit_sub[1:50, ], rownames = FALSE, filter = "top", options = list(pageLength = 5, scrollX = T))
unique(b_count_sub[ , .(bms_id, species_name)])
unique(b_count_sub[order(year), year])

1. Select a single species

s_sp <- "Polyommatus icarus"
region_bms <- c("NLBMS", "FRBMS")
data(m_visit)
data(m_count)

2. Arrange data set name to fit the functions requirements

setnames(m_visit_sub, c("transect_id_serial", "visit_date"), c("SITE_ID", "DATE"))
names(m_visit_sub) <- toupper(names(m_visit_sub))

setnames(b_count_sub, c("transect_id_serial", "visit_date", "species_name"), c("SITE_ID", "DATE", "SPECIES"))
names(b_count_sub) <- toupper(names(b_count_sub))

3. Prepare a full time-series to align count, visit, monitoring and flight curve estimate,

using the function ts_dwmy_table(), with 'InitYear', 'LastYear' and 'WeekDay1'

ts_date <- rbms::ts_dwmy_table(InitYear = 2008,
LastYear = 2017,
WeekDay1 = "monday")

4. Define the monitoring season with the function ts_monit_season()(), using ts_date, StartMonth, EndMonth, StartDay,

EndDay, CompltSeason, Anchor, AnchorLength, AnchorLag, TimeUnit (“d” or “w”) as arguments

ts_season <- rbms::ts_monit_season(ts_date,
StartMonth = 4,
EndMonth = 9,
StartDay = 1,
EndDay = NULL,
CompltSeason = TRUE,
Anchor = TRUE,
AnchorLength = 2,
AnchorLag = 2,
TimeUnit = "w")

5. Align the site visit in the region of interest with your monitoring season

MY_visit_region <- m_visit_sub[BMS_ID %in% region_bms, ]
ts_season_visit <- rbms::ts_monit_site(ts_season, MY_visit_region)

6. Align the butterfly counts recorded for the species and region of interest with your visit and monitoring season

MY_count_region <- b_count_sub[BMS_ID %in% region_bms, ]
ts_season_count <- rbms::ts_monit_count_site(ts_season_visit, MY_count_region, sp = s_sp)

7. Estimate the flight curve for a species and region of interest,

using cubic spline smoother available in the mgcv package and considering

ts_flight_curve <- rbms::flight_curve(ts_season_count,
NbrSample = 500,
MinVisit = 3,
MinOccur = 2,
MinNbrSite = 5,
MaxTrial = 4,
GamFamily = "nb",
SpeedGam = FALSE,
CompltSeason = TRUE,
SelectYear = NULL,
TimeUnit = "w"
)
saveRDS(ts_flight_curve, file.path("bms_workshop_data", paste(gsub(" ", "", s_sp),
paste(region_bms, collapse=""),
"pheno.rds", sep = "_")))

8. Extract and plot the estimated phenology

ts_flight_curve <- readRDS(file.path("bms_workshop_data", paste(gsub(" ", "", s_sp), paste(region_bms, collapse=""), "pheno.rds", sep = "_")))
datatable(ts_flight_curve$pheno[1:370, .(SPECIES, M_YEAR, MONTH, trimWEEKNO, WEEK_SINCE, ANCHOR, NM)], rownames = FALSE, filter = "top", options = list(pageLength = 5, scrollX = T))

Plot using base R

Extract phenology part

pheno <- ts_flight_curve$pheno

add the line of the first year

yr <- unique(pheno[order(M_YEAR), as.numeric(as.character(M_YEAR))])

if("trimWEEKNO" %in% names(pheno)){
plot(pheno[M_YEAR == yr[1], trimWEEKNO], pheno[M_YEAR == yr[1], NM], type = 'l', ylim = c(0, max(pheno[, NM])), xlab = 'Monitoring Week', ylab = 'Relative Abundance')
} else {
plot(pheno[M_YEAR == yr[1], trimDAYNO], pheno[M_YEAR == yr[1], NM], type = 'l', ylim = c(0, max(pheno[, NM])), xlab = 'Monitoring Day', ylab = 'Relative Abundance')

}

add individual curves for additional years

if(length(yr) > 1) {
i <- 2
for(y in yr[-1]){
if("trimWEEKNO" %in% names(pheno)){
points(pheno[M_YEAR == y , trimWEEKNO], pheno[M_YEAR == y, NM], type = 'l', col = i)
} else {
points(pheno[M_YEAR == y, trimDAYNO], pheno[M_YEAR == y, NM], type = 'l', col = i)
}
i <- i + 1
}
}

add legend

legend('topright', legend = c(yr), col = c(seq_along(c(yr))), lty = 1, bty = 'n')

Computing site and collated indices

Generalized Abundance Index

library(data.table)
library(rbms)
library(ggplot2)

s_sp <- "Maniola jurtina"

s_sp <- "Polyommatus icarus"
region_bms <- c("NLBMS", "FRBMS")

We will use the count and visit data

b_count_sub <- readRDS("bms_workshop_data/work_count.rds")
m_visit_sub <- readRDS("bms_workshop_data/work_visit.rds")

setnames(m_visit_sub, c("transect_id_serial", "visit_date"), c("SITE_ID", "DATE"))
names(m_visit_sub) <- toupper(names(m_visit_sub))

setnames(b_count_sub, c("transect_id_serial", "visit_date", "species_name"), c("SITE_ID", "DATE", "SPECIES"))
names(b_count_sub) <- toupper(names(b_count_sub))

ts_date <- rbms::ts_dwmy_table(InitYear = 2008,
LastYear = 2017,
WeekDay1 = "monday")

ts_season <- rbms::ts_monit_season(ts_date,
StartMonth = 4,
EndMonth = 9,
StartDay = 1,
EndDay = NULL,
CompltSeason = TRUE,
Anchor = TRUE,
AnchorLength = 2,
AnchorLag = 2,
TimeUnit = "w")

MY_visit_region <- m_visit_sub[BMS_ID %in% region_bms, ]
ts_season_visit <- rbms::ts_monit_site(ts_season, MY_visit_region)

MY_count_region <- b_count_sub[BMS_ID %in% region_bms, ]
ts_season_count <- rbms::ts_monit_count_site(ts_season_visit, MY_count_region, sp = s_sp)

Impute & Site index

impute values for missing week counts

and adequately compute sites abundance index over the entire monitoring season

impt_counts <- rbms::impute_count(ts_season_count = ts_season_count,
ts_flight_curve = pheno,
YearLimit = NULL,
TimeUnit = "w")

sindex <- rbms::site_index(butterfly_count = impt_counts, MinFC = 0.10)

saveRDS(sindex, file.path("bms_workshop_data", paste( gsub(" ", "", s_sp),
paste(region_bms, collapse=""),
"sindex.rds", sep="_")))

Collated index, using the collated_index function in rbms to estimate annual abundance indices for a wider region

we will focus on the sites monitored in the Netherlands (NLBMS) and compute a collated index for Polyommatus icarus

accross the site monitored in that region over the 10-year period.

bms_id <- "NLBMS"
sindex <- readRDS(file.path("bms_workshop_data", paste( gsub(" ", "", s_sp),
paste(region_bms, collapse=""),
"sindex.rds", sep="_")))
sindex[substr(SITE_ID, 1, 5) == bms_id, ]

co_index <- collated_index(data = sindex[substr(SITE_ID, 1, 5) == bms_id, ],
s_sp = s_sp,
sindex_value = "SINDEX",
glm_weights = TRUE,
rm_zero = TRUE)
co_index$col_index

Bootstrap CI

bms_id <- "NLBMS"
set.seed(125361)

bootsample <- rbms::boot_sample(sindex[substr(SITE_ID, 1, 5) == bms_id, ], boot_n = 500)
co_index <- list()

for progression bar, uncomment the following

pb <- txtProgressBar(min = 0, max = dim(bootsample$boot_ind)[1], initial = 0, char = "*", style = 3)

for (i in c(0, seq_len(dim(bootsample$boot_ind)[1]))) {
co_index[[i + 1]] <- rbms::collated_index(data = sindex[substr(SITE_ID, 1, 5) == bms_id, ],
s_sp = s_sp,
sindex_value = "SINDEX",
bootID = i,
boot_ind = bootsample,
glm_weights = TRUE,
rm_zero = TRUE)

    ## for progression bar, uncomment the following
    setTxtProgressBar(pb, i)

}

collate and append all the result in a data.table format

co_index <- rbindlist(lapply(co_index, FUN = "[[", "col_index"))
co_index$BMS_ID <- bms_id
co_index$SPECIES <- s_sp

co_index_boot <- co_index
saveRDS(co_index_boot, file.path("bms_workshop_data", paste( gsub(" ", "", s_sp), bms_id, "co_index_boot.rds", sep="")))

Figure with CI

bms_id <- "NLBMS"

co_index_boot <- readRDS(file.path("bms_workshop_data", paste( gsub(" ", "", s_sp), bms_id, "co_index_boot.rds", sep="")))

boot0 <- co_index_boot[ , mean(COL_INDEX, na.rm=TRUE), by = M_YEAR]
boot0[, LOGDENSITY0 := log(V1) / log(10)]
boot0[, meanlog0 := mean(LOGDENSITY0)]

Add mean of BOOTi == 0 (real data) to all data

boot_ests <- merge(co_index_boot, boot0[, .(M_YEAR, meanlog0)],
by = c("M_YEAR")
)

boot_ests[, LOGDENSITY := log(COL_INDEX) / log(10)]
boot_ests[, TRMOBSCI := LOGDENSITY - meanlog0 + 2]

boot_ests[, TRMOBSCILOW := quantile(TRMOBSCI, 0.025), by = M_YEAR]
boot_ests[, TRMOBSCIUPP := quantile(TRMOBSCI, 0.975), by = M_YEAR]

ylim_ <- c(min(floor(range(boot_ests[TRMOBSCI <= TRMOBSCIUPP & TRMOBSCI >= TRMOBSCILOW, TRMOBSCI]))),
max(ceiling(range(boot_ests[TRMOBSCI <= TRMOBSCIUPP & TRMOBSCI >= TRMOBSCILOW, TRMOBSCI]))))

plot_col <- ggplot(data = boot_ests[TRMOBSCI <= TRMOBSCIUPP & TRMOBSCI >= TRMOBSCILOW, ], aes(x = M_YEAR, y = TRMOBSCI)) +
geom_point(colour = "orange", alpha = 0.2) + ylim(ylim_) +
geom_line(data = boot_ests[TRMOBSCI <= TRMOBSCIUPP & TRMOBSCI >= TRMOBSCILOW, median(TRMOBSCI, na.rm=TRUE), by = M_YEAR], aes(x = M_YEAR, y = V1), linewidth = 1, colour = "dodgerblue4") +
geom_hline(yintercept = 2, linetype = "dashed", color = "grey50") +
scale_x_continuous(minor_breaks = waiver(), limits = c(2005, 2020), breaks = seq(2005, 2020, by = 5)) +
xlab("Year") + ylab("Abundance Incices (Log/Log(10))") +
labs(subtitle = paste(s_sp, bms_id, sep = " "))

plot_col

Calculating species trends

source("workshop_functions.R")

Read in collated indices for species and BMS of interest

Specify the species - note the underscore

spp <- "Maniola_jurtina"

Specify the BMS

bms <- "UKBMS"

Read in the bootstrapped collated indices

co_index <- readRDS(paste0("./bms_workshop_data/", spp, "_co_index_boot.rds"))

Filter to a single BMS to focus upon

co_index <- co_index[BMS_ID == bms]
co_index

Convert to log 10 collated indices (TRMOBS)

Only use COL_INDEX larger then 0 for calculation or logdensity and trmobs

co_index[COL_INDEX > 0, LOGDENSITY:= log(COL_INDEX)/log(10)][, TRMOBS := LOGDENSITY - mean(LOGDENSITY) + 2, by = .(BOOTi)]
co_index

Estimate and classify species trends with a confidence interval based on the bootstraps

sp_trend <- estimate_boot_trends(co_index)
sp_trend

Plot the species log collated index with bootstrapped 95% confidence interval and linear trend line (in red)

Calculate mean log index for original data

co_index0_mean <- mean(co_index[BOOTi == 0]$LOGDENSITY, na.rm = TRUE)

Derive interval from quantiles

co_index_ci <- merge(co_index[BOOTi == 0, .(M_YEAR, TRMOBS, BMS_ID)],
co_index[BOOTi != 0, .(
LOWER = quantile(LOGDENSITY - co_index0_mean + 2, 0.025, na.rm = TRUE),
UPPER = quantile(LOGDENSITY - co_index0_mean + 2, 0.975, na.rm = TRUE)),
by = .(BMS_ID, M_YEAR)],
by=c("BMS_ID","M_YEAR"))

ggplot(co_index_ci, aes(M_YEAR, TRMOBS))+
theme(text = element_text(size = 14))+
geom_line()+
geom_point()+
geom_ribbon(aes(ymin = LOWER, ymax = UPPER), alpha = .3)+
geom_smooth(method="lm", se=FALSE, color="red")+
xlab("Year")+ylab(expression('log '['(10)']*' Collated Index'))+
ggtitle(paste("Collated index for", gsub("_", " ", spp), "in", bms))

Multi-species indicators

Read in collated indices for two species and filter to one BMS

bms <- "UKBMS"
co_index <- rbind(readRDS("./bms_workshop_data/Maniola_jurtina_co_index_boot.rds"),
readRDS("./bms_workshop_data/Polyommatus_icarus_co_index_boot.rds"))
co_index <- co_index[BMS_ID == bms]
co_index

Convert to log 10 collated indices (TRMOBS)

co_index[COL_INDEX > 0, LOGDENSITY:= log(COL_INDEX)/log(10)][,TRMOBS := LOGDENSITY - mean(LOGDENSITY) + 2,
by = .(SPECIES, BOOTi)]
co_index

calculate and plot the indicator, where the confidence interval is based on the bootstraps

First we generate the indicator for real data

msi <- produce_indicator0(co_index[BOOTi == 0])
msi

Then generate an indicator for each bootstrap

msi_boot <- produce_indicators_boot(co_index[BOOTi > 0])
str(msi_boot)
msi_boot[,1:5]

add a confidence interval to the indicator based on quantiles from the bootstrapped indicators

msi <- add_indicator_CI(msi, msi_boot)
msi

Plot the indicator

ggplot(msi, aes(year, indicator))+
theme(text = element_text(size = 14))+
geom_ribbon(aes(ymin = LOWsmooth1, ymax = UPPsmooth1), alpha=.2, fill = "blue")+
geom_point(color = "blue")+
geom_line(aes(y = SMOOTH), color = "blue")+
ylim(c(0, max(200,max(msi$UPPsmooth1))))+
ggtitle(paste("Example indicator for", bms, "based on two species"))

Calculate the indicator trend including confidence interval from the bootstraps

msi_trend <- estimate_ind_trends(msi, msi_boot)
msi_trend

Issue with rbms::flight_curve()

Hi @RetoSchmucki,
For some species the fitted counts computed by the function rbms::flight_curve() are not at all in line with the "real" counts.
With the code:

species.names <- unique(LUBMS_count[order(SPECIES), SPECIES])
for(i in 1:length(species.names)){ 
ts_season_count <- rbms::ts_monit_count_site(ts_season_visit, LUBMS_count, sp = species.names[i])
ts_flight_curve <- rbms::flight_curve(ts_season_count,
                                      NbrSample = 500,  
                                      MinVisit = 3,    
                                      MinOccur = 2,   
                                      MinNbrSite = 5,  
                                      MaxTrial = 4,    
                                      GamFamily = "nb", 
                                      SpeedGam = FALSE, 
                                      CompltSeason = TRUE, 
                                      SelectYear = NULL,   
                                      TimeUnit = "w",     
                                      MultiVisit = 'mean', 
                                      weekday = 3,        
                                      KeepModel = TRUE)}

Here is what I get for P. c-album:
Polygonia c album_ts_season_count.csv
As you can see, count = NA in week 16 for 2017.
Polygonia c album_pheno.csv
Here in the fitted counts, I have a sum of 6814 counts in week 16 for 2017.
And it gives me this curve:

I don't understand how I can get this huge peak in week 16 without having any real count? Maybe I am doing something wrong... Do you know where this issue could come from?
Thank you very much in advance for your help!
Sarah

How to plot(flight curve) ignoring the years with no flight curve available?

Dear Reto, @RetoSchmucki

In https://retoschmucki.github.io/rbms/articles/Get_Started_1.html, I am trying to run the very last part of the script to plot the annual flight curves in one graph. It works well for species with available flight curve in each year. However, for species with at least one year without an available flight curve, it returns the error message:
"Error in plot.window(...) : need finite 'ylim' values"
I suppose it is because NM = NA for the years with no flight curve. How could I adapt the script so that it can still plot the curves, just "ignoring" the years with no flight curve available ? The result would be a plot showing only the available flight curves.

I did a few tries with the "if"/"next" or "else" commands but I'm a beginner for that.

Thank you very much for your help!

Best wishes,

Sarah

Add functionality for regional collation

Compute collation for regional subsets as requested by @Dafneram

document data

add documentation for the data m_visit, m_count, metzger_v3_europe

cannot install rbms

Hello! I have R version 4.04 and get a message "package ‘rbms’ is not available for this version of R".

How can I get rbms?

single species flight curves

Hello again,

I've been able to generate a flight curve thanks to your excellent vignette. However, I'm not sure the analysis is really suitable for my data. I have single species phenology data (flower abundances on discrete visit dates throughout the season). I simply coded all of the data as belonging to one species so that I could use your package, but now I'm struggling to understand what the relative abundance means considering there is just 1 species.

Is it still appropriate, and if so what does relative abundance really mean for my example?

Is it possible to plot flight curves for separate groups of sites (for example groups of sites belonging to different regions), where each line on the graph is a region, just as you do for different years?

code after update

Hello,

Thank you so much for implementing my suggestion of month labels on the X axis!

I am trying to plot my curves after the most recent update to your package and having a bit of trouble; hoping you can advise.

Here is my code

#now set up time series
ts_date <- rbms::ts_dwmy_table(InitYear = 2020, LastYear = 2020, WeekDay1 = 'monday')

ts_season_all <- rbms::ts_monit_season(ts_date, StartMonth = 6, EndMonth = 10, StartDay = 02, EndDay = 21, CompltSeason = TRUE, Anchor = TRUE, AnchorLength = 2, AnchorLag = 2, TimeUnit = 'w')

#  We have changed the order of the input arguments, with ts_seaon now in first place, before m_visit.
ts_season_visit_all_fruits_CB <- rbms::ts_monit_site(ts_season_all,m_visit_region_all_fruits_CB)
ts_season_count_all_fruits_CB <- rbms::ts_monit_count_site(ts_season_visit_all_fruits_CB, m_count_region_all_fruits_CB, sp = 2)
ts_flight_curve_all_fruits_CB <- rbms::flight_curve(ts_season_count_all_fruits_CB, NbrSample = 100, MinVisit = 2, MinOccur = 0, MinNbrSite = 5, MaxTrial = 3, GamFamily = 'quasipoisson', SpeedGam = FALSE, CompltSeason = TRUE, SelectYear = NULL, TimeUnit = 'w')
pheno_CB_all_fruits <- ts_flight_curve_all_fruits_CB$pheno
plot(pheno_CB_all[M_YEAR == yr[1] , trimWEEKNO], pheno_CB_all[M_YEAR == yr[1], NM], type = 'l', col="gray0", xlab="Week",ylab="Relative Abundance",ylim = c(0, max(pheno_HR_all[, NM])))

The last line throws an error, says yr not found.

This works:

plot(ts_flight_curve_all_fruits_CB, year = 2020, col = 'dodgerblue4')

So is extracting the phenology part using the second-to-last line of my script now unnecessary?

pheno_CB_all_fruits <- ts_flight_curve_all_fruits_CB$pheno

Thank you!

abundance index function missing

need to code an abundance function with proportion of flight curve sampled.

potentially using different approaches, but including the one used in RegionaGAM package

Faulty -not enough data to comput flight curve

When computing the flight curve, in some case, rbms message that "You have not enough sites with observations for estimating the flight curve for species ...." when enough sites, visits or occurrence are present.

ts_flight_curve <- rbms::flight_curve(ts_season_count,
                                      NbrSample = 500,
                                      MinVisit = 5,
                                      MinOccur = 2,
                                      MinNbrSite = 1,
                                      MaxTrial = 4,
                                      GamFamily = "nb",
                                      SpeedGam = FALSE,
                                      CompltSeason = TRUE,
                                      SelectYear = NULL,
                                      TimeUnit = "w")

This was observed in some species from Catalonia BMS but is likely to have occurred elsewhere.

Reported by A Ubach

Select Base year for Collated index

Here is a solution for selecting the reference year to report trends, instead of using the mean of the series.

After bootstrap sampling in Vignette 2....

of transect monitored

co_index_b <- co_index[COL_INDEX > 0.0001 & COL_INDEX < 100000, ]
co_index_logInd <- co_index_b[BOOTi == 0, .(M_YEAR, COL_INDEX)][, .(logInd = log(COL_INDEX) / log(10)), by = M_YEAR][, mean_logInd := mean(logInd)]

set base year

base_year <- 2012
co_index_logInd$base_year_logInd <- co_index_logInd[M_YEAR == base_year, logInd]

merge the mean log index with the full bootstrap dataset

setkey(co_index_logInd, M_YEAR); setkey(co_index_b, M_YEAR)
co_index_b <- merge(co_index_b, co_index_logInd, all.x = TRUE)

compute the log collated index for each bootstap samples

setkey(co_index_b, BOOTi, M_YEAR)
co_index_b[ , boot_logInd := log(COL_INDEX)/log(10)]

compute the metric used for the graph of the Collated Log-Index with base year (observed, bootstap sampel, credible confidence interval, linear trend)

b1 <- data.table(M_YEAR = co_index_b$M_YEAR, LCI = 2 + co_index_b$boot_logInd - co_index_b$base_year_logInd)
b2 <- data.table(M_YEAR = co_index_b[BOOTi == 0, M_YEAR], LCI = 2 + co_index_b[BOOTi == 0, logInd] - co_index_b[BOOTi == 0, base_year_logInd])
b5 <- b1[co_index_b$BOOTi != 0, quantile(LCI, 0.025, na.rm = TRUE), by = M_YEAR]
b6 <- b1[co_index_b$BOOTi != 0, quantile(LCI, 0.975, na.rm = TRUE), by = M_YEAR]

output data

LCI_1 <- merge(b2, b5,  by = "M_YEAR")
LCI_out <- merge(LCI_1, b6, by = "M_YEAR")
LCI_out$SPECIES <- p_Speciess
colnames(LCI_out) <- c("YEAR", "LCI", "Q.025", "Q.975", "SPECIES")