forestgeo / fgeo.biomass

Calculate biomass with allometric equations from the allodb package and ForestGEO data

Home Page: https://forestgeo.github.io/fgeo.biomass

License: GNU General Public License v3.0

R 100.00%

fgeo.biomass's People

gonzalezeb helixcn valentineherr fdbesanto2

fgeo.biomass's Issues

Remove or hide `allo_order()`

allo_order() now seems useless. It may be removed and restored later if it ends up being useful.

Simplify README.Rmd

Keep only the essentials to reduce noise and make it easier for readers to understand what's going on, try the code, and give feedback.

Enhancements

This section shows pseudo-code: code that doesn't run yet but shows what it would look like if it did.

New function to automatically add biomass to a census dataframe.

# Proposed: add species data, then biomass
census %>% 
  add_species(species) %>% 
  add_biomass()

# Or, in one step
census %>% 
  add_biomass(sp = species)

New function to automatically add equations to a census dataframe.

census %>% 
  add_species(species) %>% 
  add_equations()

New function to replace specific equations.

# Proposed: replace the equations with IDs "abcd" and "efgh"
census %>% 
  add_species(species) %>% 
  add_equations() %>% 
  replace_equations(
    eqn_id = c("abcd", "efgh"),
    eqn = c("2.0394 * (dbh^2.5715)", "2.0394 * (dbh^2.5715)")
  )

Add the ability to propagate errors, as in the BIOMASS package
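A minimal sketch of Monte Carlo error propagation, assuming a power-law equation `a * dbh^b` and normally distributed dbh measurement error. `propagate_dbh_error()` and the parameter values are hypothetical; the BIOMASS package offers a fuller version of this idea in `AGBmonteCarlo()`, which also handles errors in other inputs.

```r
# Hypothetical helper: propagate dbh measurement error through a power-law
# allometry (biomass = a * dbh^b) by Monte Carlo simulation.
propagate_dbh_error <- function(dbh, sd_dbh, a, b, n = 10000) {
  dbh_sim <- rnorm(n, mean = dbh, sd = sd_dbh)  # simulated dbh measurements
  agb_sim <- a * dbh_sim^b                      # biomass for each simulated dbh
  c(mean = mean(agb_sim), sd = sd(agb_sim))     # summary of the distribution
}

set.seed(1)
propagate_dbh_error(dbh = 30, sd_dbh = 0.5, a = 0.1, b = 2.4)
```

A complete implementation would also propagate uncertainty in the equation's own parameters, not only in dbh.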

`as_eqn()` should create an `equation_id` on the fly, rather than requiring the user to provide one.
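A minimal sketch, assuming IDs follow the six-character hexadecimal pattern seen in allodb (e.g. `b45a32`). `new_equation_id()` is a hypothetical helper that draws random IDs until it finds one not already in use.

```r
# Hypothetical helper: create a random six-character hex ID that does not
# collide with any ID in `existing`.
new_equation_id <- function(existing = character(0), n_chars = 6) {
  hex <- c(0:9, letters[1:6])  # coerced to character: "0".."9", "a".."f"
  repeat {
    id <- paste(sample(hex, n_chars, replace = TRUE), collapse = "")
    if (!id %in% existing) return(id)
  }
}

new_equation_id(existing = c("b45a32", "c8c6f1"))
```

Random IDs differ between sessions; if the same equation should always get the same ID, hashing the equation text would be a deterministic alternative.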

Basic support for dbh-specific equations (other more specific issues follow this up)

@gonzalezeb,

To support dbh-specific equations the code needs to compare dbh values in the data with the values of dbh_min_cm and dbh_max_cm.

What should the code do with missing values of `dbh_min_cm` and `dbh_max_cm`?

library(tidyverse)
#> Warning: package 'purrr' was built under R version 3.5.3
library(allodb)

all <- equations %>% 
  allodb::set_type() %>% 
  select(equation_id, equation_form, matches("dbh.*_cm$"))

setdiff(all, na.omit(all))
#> # A tibble: 25 x 4
#>    equation_id equation_form                    dbh_min_cm dbh_max_cm
#>    <chr>       <chr>                                 <dbl>      <dbl>
#>  1 b45a32      exp(a+b*log(dbh))                       1.5         NA
#>  2 c8c6f1      exp(a+b*log(dbh))                       1.5         NA
#>  3 785080      exp(a+(b*log(dbh)))*645.704*1.05        1.5         NA
#>  4 c94845      a+b*BA                                 NA           NA
#>  5 870336      exp(a+b*log(dbh))                       1.5         NA
#>  6 748d94      exp(a+b*log(dbh))                       1.5         NA
#>  7 5a774d      exp(a+(b*log(dbh)))*419.814*1.22        1.5         NA
#>  8 fc521f      exp(a+(b*(log(pi*dbh))))               NA           NA
#>  9 e44bb9      exp(a+b*(dbh/(dbh+c)))                 NA           40
#> 10 7b7468      exp(a+b*(dbh/(dbh+c)))                 NA           50
#> # ... with 15 more rows

I suggest replacing NA with 0 in `dbh_min_cm` and with +Inf in `dbh_max_cm`. I can do it in code; there is no need to fix the .csv database unless you want to.

# I'd replace missing values of `dbh_min_cm` with 0

# Bad
dbh <- 1
dbh_min_cm <- NA
dbh >= dbh_min_cm
#> [1] NA

# Good
dbh <- 1
dbh_min_cm <- 0
dbh >= dbh_min_cm
#> [1] TRUE



# I'd replace missing values of `dbh_max_cm` with +Inf

# Bad
dbh <- 1
dbh_max_cm <- NA
dbh <= dbh_max_cm
#> [1] NA

# Good
dbh <- 1
dbh_max_cm <- +Inf
dbh <= dbh_max_cm
#> [1] TRUE

Created on 2019-03-19 by the reprex package (v0.2.1)
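The same replacements can be applied to whole columns at once. A minimal base-R sketch; `dbh_in_range()` is a hypothetical helper, not part of fgeo.biomass or allodb.

```r
# Hypothetical helper: TRUE where dbh falls within an equation's valid range,
# treating a missing lower bound as 0 and a missing upper bound as +Inf.
dbh_in_range <- function(dbh, dbh_min_cm, dbh_max_cm) {
  dbh_min_cm[is.na(dbh_min_cm)] <- 0
  dbh_max_cm[is.na(dbh_max_cm)] <- Inf
  dbh >= dbh_min_cm & dbh <= dbh_max_cm
}

dbh_in_range(
  dbh        = c(1, 20, 60),
  dbh_min_cm = c(NA, 1.5, NA),
  dbh_max_cm = c(NA, NA, 50)
)
# expected: TRUE TRUE FALSE
```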

Include failing equations and evaluate them with `safely()`

Instead of excluding failing equations, evaluate them with `purrr::safely()` and a missing (`NA`) default value. The errors could be printed as warnings.

This will make it easier to understand why some biomass values can't be evaluated (#23).
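A minimal sketch of the idea, written with base R's `tryCatch()` so it runs without extra packages (`purrr::safely()` would instead return each result paired with its error). `evaluate_safely()` is hypothetical and assumes each equation is an R expression in terms of `dbh`.

```r
# Hypothetical helper: evaluate equation strings at one dbh value. An equation
# that fails returns NA, and its error is re-signalled as a warning.
evaluate_safely <- function(equations, dbh) {
  vapply(equations, function(eqn) {
    tryCatch(
      eval(parse(text = eqn), envir = list(dbh = dbh), enclos = baseenv()),
      error = function(e) {
        warning("Can't evaluate `", eqn, "`: ", conditionMessage(e), call. = FALSE)
        NA_real_
      }
    )
  }, numeric(1))
}

evaluate_safely(c("2.0394 * (dbh^2.5715)", "exp(a + b * log(dbh))"), dbh = 10)
# the second equation returns NA with a warning because `a` and `b` are undefined
```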

Move `convert_units()` out of `allo_find()`.

This responds to #23 (comment) and follows a suggestion by Krista.

Evaluate allodb equations at dbh 1, 50, and 100

Relates to ropensci/allodb#16

source

Right now, to start (and maybe this is all we'll ever want), it may be worth writing a very simple script that evaluates each equation at ~3 dbh values (e.g., 1 cm, 50 cm, and 100 cm, counting only those within the range of the equation's dbh limits) and flags equations whose results don't fall within a range that we consider reasonable for that size (@gonzalezeb could provide guidance on those thresholds).
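A minimal sketch of such a script, assuming an equations table with `equation_form`, `dbh_min_cm`, and `dbh_max_cm` columns as in allodb. The toy table, `check_equations()`, and the thresholds are hypothetical; a single size-independent range stands in for the size-specific thresholds that still need expert guidance.

```r
# Hypothetical equations table: column names follow allodb, values are made up.
toy_equations <- data.frame(
  equation_id   = c("eq1", "eq2"),
  equation_form = c("0.1 * dbh^2.4", "1e6 * dbh"),
  dbh_min_cm    = c(NA, 1),
  dbh_max_cm    = c(60, NA),
  stringsAsFactors = FALSE
)

# Evaluate each equation at a few dbh values within its range and flag results
# outside [min_kg, max_kg] (placeholder thresholds, to be set by an expert).
check_equations <- function(equations, dbh_values = c(1, 50, 100),
                            min_kg = 0, max_kg = 1e5) {
  grid <- merge(equations, data.frame(dbh = dbh_values))  # every equation x dbh
  lower <- ifelse(is.na(grid$dbh_min_cm), 0, grid$dbh_min_cm)
  upper <- ifelse(is.na(grid$dbh_max_cm), Inf, grid$dbh_max_cm)
  grid <- grid[grid$dbh >= lower & grid$dbh <= upper, ]
  grid$biomass <- mapply(
    function(eqn, d) eval(parse(text = eqn), list(dbh = d), baseenv()),
    grid$equation_form, grid$dbh
  )
  grid$flag <- grid$biomass < min_kg | grid$biomass > max_kg
  grid[, c("equation_id", "dbh", "biomass", "flag")]
}

check_equations(toy_equations)
```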

Support for custom_eqn

I disabled `custom_eqn` because of weird errors. This is not crucial right now and it's taking me too long to debug. I will come back to this later.

Relates to ropensci/allodb#60

Warn (once) if dead trees are detected in the census data

@gonzalezeb and @teixeirak,

Should we warn when calculating the biomass of dead trees?

Some dead trees in SCBI have non-missing dbh values. Am I right in thinking that some other ForestGEO plots do not record dbh of dead trees?

library(tidyverse)

fgeo.biomass::scbi_tree1 %>% 
  filter(status != "A") %>%
  select(matches("status"), dbh) %>% 
  as_tibble()
#> # A tibble: 10,228 x 3
#>    DFstatus status   dbh
#>    <chr>    <chr>  <dbl>
#>  1 dead     D       42.6
#>  2 dead     D       37  
#>  3 dead     D      318  
#>  4 dead     D      179. 
#>  5 dead     D      203. 
#>  6 dead     D      137. 
#>  7 dead     D       55  
#>  8 dead     D      164. 
#>  9 dead     D      242. 
#> 10 dead     D      238. 
#> # ... with 10,218 more rows

fgeo.biomass::scbi_tree1 %>% 
  mutate(missing_dbh = is.na(dbh)) %>% 
  select(matches("status"), missing_dbh) %>%
  arrange(status, DFstatus) %>% 
  unique()
#> # A tibble: 7 x 3
#>   DFstatus  status missing_dbh
#>   <chr>     <chr>  <lgl>      
#> 1 alive     A      FALSE      
#> 2 alive     A      TRUE       
#> 3 lost_stem A      FALSE      
#> 4 dead      D      FALSE      
#> 5 dead      D      TRUE       
#> 6 missing   M      TRUE       
#> 7 prior     P      TRUE

Created on 2019-03-14 by the reprex package (v0.2.1)
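A minimal sketch of warning only once, assuming a census with `status` and `dbh` columns as in `scbi_tree1` above. `make_dead_tree_check()` and `warn_if_dead()` are hypothetical; the closure keeps a flag so later calls stay silent, and the flag is set before `warning()` so it still takes effect if a caller catches the warning with `tryCatch()`.

```r
# Hypothetical factory: returns a checker that warns at most once if dead
# trees ("D") have a non-missing dbh.
make_dead_tree_check <- function() {
  warned <- FALSE  # state shared by every call of the returned function
  function(census) {
    dead_with_dbh <- census$status == "D" & !is.na(census$dbh)
    if (!warned && any(dead_with_dbh, na.rm = TRUE)) {
      warned <<- TRUE  # set first, so the flag survives a caught warning
      warning(
        "Found ", sum(dead_with_dbh, na.rm = TRUE),
        " dead tree(s) with non-missing `dbh`; their biomass will be calculated.",
        call. = FALSE
      )
    }
    invisible(census)
  }
}

warn_if_dead <- make_dead_tree_check()
census <- data.frame(status = c("A", "D", "D"), dbh = c(10, 42.6, NA))
warn_if_dead(census)  # warns
warn_if_dead(census)  # silent
```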

Integrate allodb and bmss

@gonzalezeb,

Where in the table should the code look for the parameters used in the `equation` column?

It is clear that DBH is a measurement that the user must provide for each stem. But it is not clear where the other parameters come from. Should we give them in the equations table? Should the user get them from somewhere else and feed them into our code?

For example, where should the code get `a` from? Or `b`, or `d`? Also, is there a lookup table that explains what each of those parameters means?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(allodb)

head(equations$equation)
#> [1] "a*(DBH^2)^b"       "a*(DBH^2)^b"       "a+b*DBH+c*(DBH^d)"
#> [4] "a+b*DBH+c*(DBH^d)" "a+b*DBH+c*(DBH^d)" "a+b*DBH+c*(DBH^d)"
glimpse(equations)
#> Observations: 421
#> Variables: 23
#> $ equation_id                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ biomass_component            <chr> "Stem and branches (live)", "Stem...
#> $ equation                     <chr> "a*(DBH^2)^b", "a*(DBH^2)^b", "a+...
#> $ allometry_specificity        <chr> "Species", "Species", "Species", ...
#> $ development_species          <chr> NA, NA, NA, "Ulmus americana", NA...
#> $ geographic_area              <chr> "North Carolina, Georgia", "North...
#> $ dbh_min_cm                   <chr> "14.22", "29.46", "2.5", "2.5", "...
#> $ dbh_max_cm                   <chr> "25.91", "41.66", "40", "40", "55...
#> $ n_trees                      <int> 9, 9, NA, NA, NA, NA, NA, NA, NA,...
#> $ dbh_units_original           <chr> "in", "in", "cm", "cm", "mm", "mm...
#> $ biomass_units_original       <chr> "lb", "lb", "kg", "kg", "kg", "kg...
#> $ allometry_development_method <chr> "harvest", "harvest", "harvest", ...
#> $ model_parameters             <chr> "DBH", "DBH", "DBH", "DBH", "DBH"...
#> $ regression_model             <chr> "linear_multiple", "linear_multip...
#> $ other_equations_tested       <chr> "yes", "yes", NA, NA, NA, NA, NA,...
#> $ log_biomass                  <chr> "10", "10", NA, NA, NA, NA, NA, N...
#> $ bias_corrected               <chr> "yes", "yes", "no", "no", "no", "...
#> $ bias_correction_factor       <chr> "included in model", "included in...
#> $ notes_fitting_model          <chr> "Regression equations were develo...
#> $ original_data_availability   <chr> "1", "1", NA, NA, NA, NA, NA, NA,...
#> $ notes_to_consider            <chr> NA, NA, NA, NA, NA, NA, "DBA = ba...
#> $ warning                      <chr> NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ ref_id                       <chr> NA, NA, NA, NA, NA, NA, NA, NA, N...
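One way to see which parameters each equation needs is to list its free variables with base R's `all.vars()`. A minimal sketch; `equation_parameters()` is hypothetical, treats `pi` as a constant, and assumes the measured variable is called `DBH` as in the `equation` column above. The parameter values in the last line are made up for illustration.

```r
# Hypothetical helper: names an equation needs, other than the measured
# variable(s) and known constants.
equation_parameters <- function(equation, measured = "DBH", constants = "pi") {
  setdiff(all.vars(parse(text = equation)), c(measured, constants))
}

equation_parameters("a*(DBH^2)^b")        # "a" "b"
equation_parameters("a+b*DBH+c*(DBH^d)")  # "a" "b" "c" "d"

# Once parameter values are known, supply them together with DBH
eval(parse(text = "a*(DBH^2)^b"), list(a = 2, b = 1, DBH = 3), baseenv())  # 18
```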

Flowchart overview

add_biomass()

add_species() and add_equations()

Understand why some species have missing biomass

Match species names given as Genus sp. (as in Homo sp. instead of Homo sapiens)
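A minimal sketch of detecting names written as Genus sp. (optionally numbered, as in `Quercus sp.2`) and extracting the genus, so they could then be matched to genus-level equations. `genus_if_sp()` is hypothetical. Codes such as `Unidentified sp.1` (see the BCI example further down) would also match, so unknown-species codes need separate handling.

```r
# Hypothetical helper: return the genus for names written as "Genus sp."
# (optionally numbered); return NA for fully resolved binomials.
genus_if_sp <- function(latin) {
  is_sp <- grepl("^[A-Z][a-z]+ sp\\.?[0-9]*$", latin)
  ifelse(is_sp, sub(" .*$", "", latin), NA_character_)
}

genus_if_sp(c("Homo sp.", "Homo sapiens", "Quercus sp.2"))
# expected: "Homo", NA, "Quercus"
```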

Cache data

Just recording a thought. It may be useful to cache queries to the database, and use an object oriented style of programming. On the functional end, maybe via closures; on the object-oriented end, maybe R6.
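On the functional end, a closure over a private environment is enough for a basic cache. A minimal sketch; `make_cached()` and `query_equations()` are hypothetical stand-ins, and the memoise package (already among the loaded namespaces in the install log below) provides a more complete `memoise::memoise()`.

```r
# Hypothetical helper: wrap a function so results are cached by a string key.
make_cached <- function(f) {
  cache <- new.env(parent = emptyenv())  # private store, one per cached function
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

# Hypothetical stand-in for an expensive database query
n_queries <- 0
query_equations <- function(site) {
  n_queries <<- n_queries + 1
  paste("equations for", site)
}

cached_query <- make_cached(query_equations)
cached_query("scbi")
cached_query("scbi")  # served from the cache; n_queries is still 1
```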

Ensure that allo_find() matches equations by site (not only species)

Guess dbh units

If min dbh is ~ 10, then units are [mm].
If min dbh is ~ 1, then units are [cm].
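This heuristic relies on ForestGEO censuses including stems down to 1 cm (10 mm) dbh. A minimal sketch; `guess_dbh_units()` and its tolerance windows are hypothetical choices.

```r
# Hypothetical helper: guess dbh units from the smallest recorded dbh, assuming
# the census minimum is 1 cm (= 10 mm). Returns NA when the guess is unclear.
guess_dbh_units <- function(dbh) {
  min_dbh <- min(dbh, na.rm = TRUE)
  if (min_dbh >= 5 && min_dbh <= 15) return("mm")
  if (min_dbh >= 0.5 && min_dbh <= 1.5) return("cm")
  NA_character_
}

guess_dbh_units(c(10, 25, 300))  # "mm"
guess_dbh_units(c(1, 2.5, 30))   # "cm"
```

A plot whose smallest stem is, for example, 3 units can't be classified this way, so the helper returns NA rather than guessing.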

Inform join messages

Give more feedback about how tables are being joined. These messages may have been suppressed to avoid overloading users. But at this stage those messages may help us discuss the logic of the code. For example, we recently thought that the final equations table resulted from joining only by species and not by site. This was clear in the source code, but there was no feedback, so it took some time to figure out and confused all of us in the discussion.
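A minimal sketch of a join that reports its keys and row counts; `join_informatively()` is hypothetical and uses base `merge()` (an inner join). dplyr's joins already print a message naming the key columns when `by` is not supplied, so another option is simply not to suppress that message.

```r
# Hypothetical helper: inner join that reports the key columns and row counts.
join_informatively <- function(x, y, by = intersect(names(x), names(y))) {
  message("Joining by: ", paste(by, collapse = ", "))
  out <- merge(x, y, by = by)
  message(sprintf("Rows: %d (x), %d (y), %d (joined)", nrow(x), nrow(y), nrow(out)))
  out
}

census <- data.frame(sp = c("acru", "quru", "zzzz"), dbh = c(10, 20, 30))
species <- data.frame(sp = c("acru", "quru"), site = "scbi")
joined <- join_informatively(census, species)
```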

Temporarily exclude generic equations until we can support them

This should improve the accuracy of the biomass estimates. Right now we don't support generic equations, but those equations are still being summed with expert equations.

Installing fgeo.biomass fails: https://forestgeo.github.io/drat/ not found

@maurolepore, in the DESCRIPTION file, there is:

Additional_repositories: https://forestgeo.github.io/drat/

But that website does not exist (any more?)

Can you explain why that line is there and whether it is safe to delete it?
Currently, the command

devtools::install_github("forestgeo/fgeo.biomass")

gives an error:

> devtools::install_github("forestgeo/fgeo.biomass")
Downloading GitHub repo forestgeo/fgeo.biomass@master
These packages have more recent versions available.
Which would you like to update?

 1: All                              
 2: CRAN packages only               
 3: None                             
 4: usethis (1.5.0  -> 1.5.1 ) [CRAN]
 5: raster  (2.9-5  -> 3.0-2 ) [CRAN]
 6: Rcpp    (1.0.1  -> 1.0.2 ) [CRAN]
 7: digest  (0.6.19 -> 0.6.20) [CRAN]
 8: hms     (0.5.0  -> 0.5.1 ) [CRAN]
 9: clipr   (0.6.0  -> 0.7.0 ) [CRAN]
10: curl    (3.3    -> 4.1   ) [CRAN]
11: whisker (0.3-2  -> 0.4   ) [CRAN]
12: httr    (1.4.0  -> 1.4.1 ) [CRAN]
13: openssl (1.4    -> 1.4.1 ) [CRAN]
14: sys     (3.2    -> 3.3   ) [CRAN]

Enter one or more numbers, or an empty line to skip updates:

Installing 6 packages: BIOMASS, fgeo.tool, formatR, measurements, minpack.lm, rappdirs
Installing packages into ‘C:/Users/HerrmannV/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
Error: Failed to install 'fgeo.biomass' from GitHub:
  (converted from warning) unable to access index for repository https://forestgeo.github.io/drat/src/contrib:
  cannot open URL 'https://forestgeo.github.io/drat/src/contrib/PACKAGES'
In addition: Warning messages:
1: In untar2(tarfile, files, list, exdir) :
  skipping pax global extended headers
2: In untar2(tarfile, files, list, exdir) :
  skipping pax global extended headers
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        ps_1.3.0          prettyunits_1.0.2 rprojroot_1.3-2   withr_2.1.2       digest_0.6.19    
 [7] crayon_1.3.4      assertthat_0.2.1  R6_2.4.0          backports_1.1.4   magrittr_1.5      rlang_0.4.0      
[13] cli_1.1.0         curl_3.3          fs_1.3.1          remotes_2.1.0     rstudioapi_0.10   callr_3.2.0      
[19] desc_1.2.0        devtools_2.0.2    tools_3.6.0       glue_1.3.1        pkgload_1.0.2     compiler_3.6.0   
[25] processx_3.3.1    pkgbuild_1.0.3    sessioninfo_1.1.1 memoise_1.1.0     usethis_1.5.0

Add support for dbh-specific equations

Let users define an unknown species to be matched with a generic equation

(Relates to ropensci/allodb#72)

Different sites use different ways to encode unknown species, so users should be able to specify how this is done in their own datasets.

library(tidyverse)

fgeo.biomass::scbi_species %>% 
  filter(tolower(Latin) == "unidentified unk") %>% 
  select(1:5)

bciex::bci_species %>% 
  filter(grepl("unident", tolower(Latin))) %>% 
  select(1:5)
#> # A tibble: 48 x 5
#>    sp     Latin                   Genus        Species    Family 
#>    <chr>  <chr>                   <chr>        <chr>      <chr>  
#>  1 lau12  Unidentified pubescente Unidentified pubescente Unknown
#>  2 un2_sh Unidentified (sherman)  Unidentified (sherman)  Unknown
#>  3 cel1   Unidentified sp.1       Unidentified sp.1       Unknown
#>  4 mal1   Unidentified sp.1       Unidentified sp.1       Unknown
#>  5 myt13  Unidentified sp.13      Unidentified sp.13      Unknown
#>  6 ann2   Unidentified sp.2       Unidentified sp.2       Unknown
#>  7 nyc2   Unidentified sp.2       Unidentified sp.2       Unknown
#>  8 rub7   Unidentified sp.7       Unidentified sp.7       Unknown
#>  9 rub8   Unidentified sp.8       Unidentified sp.8       Unknown
#> 10 uniden Unidentified species    Unidentified species    Unknown
#> # ... with 38 more rows

Created on 2019-03-27 by the reprex package (v0.2.1)
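A minimal sketch of letting users declare, per site, how unknown species are encoded. The patterns are hypothetical and based only on the SCBI and BCI examples above; `is_unknown_species()` is not an existing function.

```r
# Hypothetical per-site patterns; users could supply their own for other sites.
unknown_patterns <- c(
  scbi = "^unidentified unk$",
  bci  = "^unidentified"
)

is_unknown_species <- function(latin, site, patterns = unknown_patterns) {
  grepl(patterns[[site]], latin, ignore.case = TRUE)
}

is_unknown_species(c("Unidentified sp.1", "Quercus alba"), site = "bci")  # TRUE FALSE
is_unknown_species("Unidentified unk", site = "scbi")                      # TRUE
```

Rows flagged this way could then be routed to a generic equation instead of being dropped.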

Convert `site = any temperate NA` to `<current site>`

Relates to ropensci/allodb#72

Add full URLs to the website location of Information files

See forestgeo/fgeo.plot#57

Explore the biomass functions of the "CTFS R package"

Add functions for loss and production in table 1 of Kohyama et al. 2019

From: Helene Muller-Landau
Date: Wed, Dec 19, 2018 at 9:54 AM
Subject: improved estimation of biomass fluxes when census intervals vary
To: Mauro Lepore ...

... this new pub by Kohyama et al. ...provides formulas for unbiased estimates of biomass fluxes. The ForestGEO biomass package should incorporate the various functions for loss and production included in table 1.
2019-Kohyama-estimating.AGB.prod.pdf

Warn if dropping potentially useful rows

In the interest of building a conservative, minimum viable product, we currently drop potentially useful rows. This is because we still lack some features planned for later. But this is not obvious and alpha users (e.g. Maxime) may be surprised to get meaningless biomass results.

The code should warn when dropping potentially useful rows. That should immediately add value by providing useful feedback to alpha users. In the long run it also adds value, because later we can write a unit test to check that this warning is no longer thrown once the code includes the features that we still lack.
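A minimal sketch; `drop_rows()` and its `reason` argument are hypothetical. Once the missing features exist, testthat's `expect_warning(code, NA)`, which asserts that no warning is thrown, could confirm that the rows are no longer dropped.

```r
# Hypothetical helper: keep rows where `keep` is TRUE and warn about the rest.
drop_rows <- function(data, keep, reason) {
  keep <- !is.na(keep) & keep  # treat NA as "drop"
  n_dropped <- sum(!keep)
  if (n_dropped > 0) {
    warning(
      sprintf("Dropping %d of %d rows: %s.", n_dropped, nrow(data), reason),
      call. = FALSE
    )
  }
  data[keep, , drop = FALSE]
}

census <- data.frame(sp = c("acru", "unk"), dbh = c(10, 20))
kept <- drop_rows(census, keep = census$sp != "unk", reason = "no matching equation")
```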

Unnest the output of allo_find()

Relates to #17. Considering that I seem to have been wrong in how to use the different equation types, the output as a nested dataframe may be more of a problem than a solution.