fgeo.biomass's People

Contributors

gonzalezeb, maurolepore

fgeo.biomass's Issues

Simplify README.Rmd

Keep only the essentials to remove noise and make it easier for readers to understand what's going on, try the code, and give feedback.

Interface enhancements

Enhancements

This section shows pseudo-code: code that doesn't actually run but shows what the interface would look like if it worked.

  • New function to automatically add biomass to a census dataframe.
census %>% 
  add_species(species) %>% 
  add_biomass()

# Or
census_table %>% 
  add_biomass(sp = species)
  • New function to automatically add equations to a census dataframe.
census %>% 
  add_species(species) %>% 
  add_equations()
  • New function to replace specific equations.
census %>% 
  add_species(species) %>% 
  add_equations() %>% 
  replace_equations(
    eqn_id = c("abcd", "efgh"),
    eqn = c("2.0394 * (dbh^2.5715)", "2.0394 * (dbh^2.5715)")
  )

Basic support for dbh-specific equations (other more specific issues follow this up)

@gonzalezeb,

To support dbh-specific equations the code needs to compare dbh values in the data with the values of dbh_min_cm and dbh_max_cm.

What should the code do with missing values of dbh_min_cm and dbh_max_cm?

library(tidyverse)
#> Warning: package 'purrr' was built under R version 3.5.3
library(allodb)

all <- equations %>% 
  allodb::set_type() %>% 
  select(equation_id, equation_form, matches("dbh.*_cm$"))

setdiff(all, na.omit(all))
#> # A tibble: 25 x 4
#>    equation_id equation_form                    dbh_min_cm dbh_max_cm
#>    <chr>       <chr>                                 <dbl>      <dbl>
#>  1 b45a32      exp(a+b*log(dbh))                       1.5         NA
#>  2 c8c6f1      exp(a+b*log(dbh))                       1.5         NA
#>  3 785080      exp(a+(b*log(dbh)))*645.704*1.05        1.5         NA
#>  4 c94845      a+b*BA                                 NA           NA
#>  5 870336      exp(a+b*log(dbh))                       1.5         NA
#>  6 748d94      exp(a+b*log(dbh))                       1.5         NA
#>  7 5a774d      exp(a+(b*log(dbh)))*419.814*1.22        1.5         NA
#>  8 fc521f      exp(a+(b*(log(pi*dbh))))               NA           NA
#>  9 e44bb9      exp(a+b*(dbh/(dbh+c)))                 NA           40
#> 10 7b7468      exp(a+b*(dbh/(dbh+c)))                 NA           50
#> # ... with 15 more rows

I suggest replacing NA with 0 in dbh_min_cm and with +Inf in dbh_max_cm. I can do it with code -- no need to fix the .csv database unless you want.

# Missing values of `dbh_min_cm` should be replaced with 0

# Bad
dbh <- 1
dbh_min_cm <- NA
dbh >= dbh_min_cm
#> [1] NA

# Good
dbh <- 1
dbh_min_cm <- 0
dbh >= dbh_min_cm
#> [1] TRUE



# Missing values of `dbh_max_cm` should be replaced with +Inf

# Bad
dbh <- 1
dbh_max_cm <- NA
dbh >= dbh_max_cm
#> [1] NA

# Good
dbh <- 1
dbh_max_cm <- +Inf
dbh >= dbh_max_cm
#> [1] FALSE

Created on 2019-03-19 by the reprex package (v0.2.1)
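
As a rough sketch of the suggestion above (not the package's actual code), the replacement and the range check could be combined in one helper. The function name dbh_in_range() and the toy values are made up for illustration; only the column names dbh_min_cm and dbh_max_cm come from the equations table.

```r
# Hypothetical helper: treat a missing lower limit as 0 and a missing upper
# limit as +Inf, then test whether each dbh falls inside its equation's range.
dbh_in_range <- function(dbh, dbh_min_cm, dbh_max_cm) {
  dbh_min_cm[is.na(dbh_min_cm)] <- 0
  dbh_max_cm[is.na(dbh_max_cm)] <- Inf
  dbh >= dbh_min_cm & dbh <= dbh_max_cm
}

# Toy values mimicking the NA patterns shown in the table above
in_range <- dbh_in_range(
  dbh        = c(1, 1, 60, 1),
  dbh_min_cm = c(1.5, NA, NA, NA),
  dbh_max_cm = c(NA, NA, 50, 40)
)
in_range
```

With the limits filled in this way the comparison never returns NA because of a missing limit; it can still return NA when dbh itself is missing.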

Include failing equations and evaluate them with safely()

Instead of excluding failing equations, evaluate them with purrr::safely() and use a missing value as the default result. The errors could be printed as warnings.

This will make it easier to understand why some biomass values can't be evaluated (#23).
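
A minimal sketch of that idea, assuming equations are stored as strings that use dbh as their only variable. The helper eval_equation() and the two example equations are hypothetical; purrr::safely() and its otherwise argument are the real API.

```r
library(purrr)

# Hypothetical evaluator: parse an equation string and evaluate it at `dbh`.
# `enclos = baseenv()` keeps stray global variables from being picked up.
eval_equation <- function(eqn, dbh) {
  eval(parse(text = eqn), list(dbh = dbh), enclos = baseenv())
}

# safely() captures errors instead of throwing them; `otherwise` is the
# value returned in place of a failed result.
safe_eval <- safely(eval_equation, otherwise = NA_real_)

# The second equation fails because `a` and `b` are not defined.
eqns <- c("2.0394 * (dbh^2.5715)", "exp(a + b * log(dbh))")
out <- map(eqns, safe_eval, dbh = 10)

biomass <- map_dbl(out, "result")
failed <- !map_lgl(out, ~ is.null(.x$error))

# Report each captured error as a warning rather than dropping the equation
for (i in which(failed)) {
  warning(
    "Equation '", eqns[i], "' failed: ", conditionMessage(out[[i]]$error),
    call. = FALSE
  )
}
```

The failing equation stays in the output with a missing biomass, so the user can see which equation failed and why.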

Evaluate allodb equations at dbh 1, 50, and 100

Relates to ropensci/allodb#16

source

Right now/to start (and maybe this is all we'll ever want), it may be worth writing a very simple script where each equation is evaluated at ~3 dbh values (e.g., 1 cm, 50 cm, 100 cm; counting only those within range of the equation's DBH limits) and equations are flagged if the results don't fall within a range that we consider reasonable for that size (@gonzalezeb could provide guidance on those thresholds).
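
A sketch of what such a script might look like, under stated assumptions: the toy equations, the helper check_equation(), and above all the "reasonable" upper bounds in max_ok are invented for illustration and are not real thresholds.

```r
# Toy equations table; column names mirror allodb's, values are made up.
eqns <- data.frame(
  equation_id   = c("eq1", "eq2"),
  equation_form = c("0.1 * dbh^2.5", "1000 * dbh^3"),
  dbh_min_cm    = c(0, 10),
  dbh_max_cm    = c(Inf, 60),
  stringsAsFactors = FALSE
)

test_dbh <- c(1, 50, 100)
# Hypothetical upper bounds on biomass at each test dbh (placeholders only)
max_ok <- c(10, 5000, 30000)

# Evaluate one equation at the test dbh values inside its range and report
# whether any result is negative or above the assumed bound for that size.
check_equation <- function(form, dbh_min, dbh_max) {
  in_range <- test_dbh >= dbh_min & test_dbh <= dbh_max
  biomass <- vapply(
    test_dbh[in_range],
    function(dbh) eval(parse(text = form), list(dbh = dbh), baseenv()),
    numeric(1)
  )
  any(biomass < 0 | biomass > max_ok[in_range])
}

eqns$flagged <- mapply(
  check_equation,
  eqns$equation_form, eqns$dbh_min_cm, eqns$dbh_max_cm,
  USE.NAMES = FALSE
)
eqns[, c("equation_id", "flagged")]
```

An equation with no test dbh inside its limits is never flagged in this sketch; whether that case deserves its own flag is an open question.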

Warn (once) if dead trees are detected in the census data

@gonzalezeb and @teixeirak,

  • Should we warn if calculating the biomass for dead trees?

Some dead trees in SCBI have non-missing dbh values. Am I right in thinking that some other ForestGEO plots do not record dbh of dead trees?

library(tidyverse)

fgeo.biomass::scbi_tree1 %>% 
  filter(status != "A") %>%
  select(matches("status"), dbh) %>% 
  as_tibble()
#> # A tibble: 10,228 x 3
#>    DFstatus status   dbh
#>    <chr>    <chr>  <dbl>
#>  1 dead     D       42.6
#>  2 dead     D       37  
#>  3 dead     D      318  
#>  4 dead     D      179. 
#>  5 dead     D      203. 
#>  6 dead     D      137. 
#>  7 dead     D       55  
#>  8 dead     D      164. 
#>  9 dead     D      242. 
#> 10 dead     D      238. 
#> # ... with 10,218 more rows

fgeo.biomass::scbi_tree1 %>% 
  mutate(missing_dbh = is.na(dbh)) %>% 
  select(matches("status"), missing_dbh) %>%
  arrange(status, DFstatus) %>% 
  unique()
#> # A tibble: 7 x 3
#>   DFstatus  status missing_dbh
#>   <chr>     <chr>  <lgl>      
#> 1 alive     A      FALSE      
#> 2 alive     A      TRUE       
#> 3 lost_stem A      FALSE      
#> 4 dead      D      FALSE      
#> 5 dead      D      TRUE       
#> 6 missing   M      TRUE       
#> 7 prior     P      TRUE

Created on 2019-03-14 by the reprex package (v0.2.1)
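
If the answer is yes, one possible shape for the check is sketched below. It reads "once" as one warning per call rather than one per row. The function name warn_if_dead() is hypothetical, and the status code "D" is taken from the SCBI output above; other plots may encode dead trees differently.

```r
# Hypothetical check: emit a single warning summarising all dead stems
# instead of one warning per row.
warn_if_dead <- function(census) {
  n_dead <- sum(census$status == "D", na.rm = TRUE)
  if (n_dead > 0) {
    warning(
      "Detected ", n_dead, " dead stems (status 'D'); ",
      "their biomass may not be meaningful.",
      call. = FALSE
    )
  }
  invisible(census)
}

census <- data.frame(
  status = c("A", "D", "D"),
  dbh = c(10, 42.6, NA),
  stringsAsFactors = FALSE
)

# Capture the warnings to show that exactly one is raised
msgs <- character()
result <- withCallingHandlers(
  warn_if_dead(census),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
msgs
```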

Integrate allodb and bmss

@gonzalezeb,

Where in the table should the code look for the parameters in the column equation?

It is clear that DBH is a measurement that the user must provide for each stem. But it is not clear where the other parameters come from. Should we give them in the equations table? Should the user get them from somewhere else and feed them into our code?

For example, where should the code get a from? Or b, or d? Also, is there a lookup table to know what each of those parameters means?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(allodb)

head(equations$equation)
#> [1] "a*(DBH^2)^b"       "a*(DBH^2)^b"       "a+b*DBH+c*(DBH^d)"
#> [4] "a+b*DBH+c*(DBH^d)" "a+b*DBH+c*(DBH^d)" "a+b*DBH+c*(DBH^d)"
glimpse(equations)
#> Observations: 421
#> Variables: 23
#> $ equation_id                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ biomass_component            <chr> "Stem and branches (live)", "Stem...
#> $ equation                     <chr> "a*(DBH^2)^b", "a*(DBH^2)^b", "a+...
#> $ allometry_specificity        <chr> "Species", "Species", "Species", ...
#> $ development_species          <chr> NA, NA, NA, "Ulmus americana", NA...
#> $ geographic_area              <chr> "North Carolina, Georgia", "North...
#> $ dbh_min_cm                   <chr> "14.22", "29.46", "2.5", "2.5", "...
#> $ dbh_max_cm                   <chr> "25.91", "41.66", "40", "40", "55...
#> $ n_trees                      <int> 9, 9, NA, NA, NA, NA, NA, NA, NA,...
#> $ dbh_units_original           <chr> "in", "in", "cm", "cm", "mm", "mm...
#> $ biomass_units_original       <chr> "lb", "lb", "kg", "kg", "kg", "kg...
#> $ allometry_development_method <chr> "harvest", "harvest", "harvest", ...
#> $ model_parameters             <chr> "DBH", "DBH", "DBH", "DBH", "DBH"...
#> $ regression_model             <chr> "linear_multiple", "linear_multip...
#> $ other_equations_tested       <chr> "yes", "yes", NA, NA, NA, NA, NA,...
#> $ log_biomass                  <chr> "10", "10", NA, NA, NA, NA, NA, N...
#> $ bias_corrected               <chr> "yes", "yes", "no", "no", "no", "...
#> $ bias_correction_factor       <chr> "included in model", "included in...
#> $ notes_fitting_model          <chr> "Regression equations were develo...
#> $ original_data_availability   <chr> "1", "1", NA, NA, NA, NA, NA, NA,...
#> $ notes_to_consider            <chr> NA, NA, NA, NA, NA, NA, "DBA = ba...
#> $ warning                      <chr> NA, NA, NA, NA, NA, NA, NA, NA, N...
#> $ ref_id                       <chr> NA, NA, NA, NA, NA, NA, NA, NA, N...

Cache data

Just recording a thought. It may be useful to cache queries to the database, and to use an object-oriented style of programming. On the functional end, maybe via closures; on the object-oriented end, maybe R6.
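
To make the closure option concrete, here is a minimal sketch. Both make_cached() and slow_query() are hypothetical; slow_query() stands in for whatever function actually queries the database.

```r
# Hypothetical cache built with a closure: results are stored in `cache`,
# which lives in the environment enclosing the returned function.
make_cached <- function(query_fun) {
  cache <- list()
  function(key) {
    if (is.null(cache[[key]])) {
      cache[[key]] <<- query_fun(key)
    }
    cache[[key]]
  }
}

# Stand-in for an expensive database query; counts how often it really runs
n_calls <- 0
slow_query <- function(site) {
  n_calls <<- n_calls + 1
  paste("equations for", site)
}

get_equations <- make_cached(slow_query)
first <- get_equations("scbi")
second <- get_equations("scbi")  # served from the cache
n_calls
```

This sketch assumes the key is a single string and that the query never legitimately returns NULL.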

Guess dbh units

If min dbh is ~ 10, then units are [mm].
If min dbh is ~ 1, then units are [cm].
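
A sketch of that heuristic follows. The function name guess_dbh_units() and the cutoff of 5 between "about 1" and "about 10" are assumptions, not agreed values.

```r
# Hypothetical heuristic: guess dbh units from the smallest recorded dbh.
guess_dbh_units <- function(dbh) {
  min_dbh <- min(dbh, na.rm = TRUE)
  # Assumed cutoff: a minimum dbh of 5 or more suggests millimetres
  if (min_dbh >= 5) "mm" else "cm"
}

units_mm <- guess_dbh_units(c(10, 42.6, 318, NA))
units_cm <- guess_dbh_units(c(1, 4.3, 31.8))
c(units_mm, units_cm)
```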

Inform join messages

Give more feedback about how tables are being joined. These messages may have been suppressed to avoid overloading users with messages. But at this stage those messages may help us discuss the logic of the code. For example, recently we thought that the final equations table resulted from joining only by species and not site. This was clear in the source code, but there was no feedback, so it took some time to figure out and confused all of us in the discussion.
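
As an illustration of the kind of feedback meant here, the sketch below wraps base R's merge() and reports the join columns and row counts. The wrapper join_with_message() and the toy tables are hypothetical, not the package's current join code.

```r
# Hypothetical wrapper: report the join columns and the row counts so the
# join logic is visible to the user.
join_with_message <- function(x, y, by) {
  message("Joining by: ", paste(by, collapse = ", "))
  out <- merge(x, y, by = by, all.x = TRUE)
  message("Rows before join: ", nrow(x), "; rows after join: ", nrow(out))
  out
}

census <- data.frame(
  sp = c("acru", "quru"),
  site = c("scbi", "scbi"),
  stringsAsFactors = FALSE
)
eqns <- data.frame(
  sp = c("acru", "acru", "quru"),
  site = c("scbi", "serc", "scbi"),
  equation_id = c("e1", "e2", "e3"),
  stringsAsFactors = FALSE
)

by_sp <- join_with_message(census, eqns, by = "sp")
by_both <- join_with_message(census, eqns, by = c("sp", "site"))
```

Joining by species alone adds a row here because one species has equations for two sites. The row-count message makes that visible.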

installing fgeo.biomass error: https://forestgeo.github.io/drat/ not found

@maurolepore, in the DESCRIPTION file, there is:

Additional_repositories: https://forestgeo.github.io/drat/

But that website does not exist (any more?)

Can you explain why that line is there and if it is safe to delete it?
Currently the command

devtools::install_github("forestgeo/fgeo.biomass")

gives an error:

> devtools::install_github("forestgeo/fgeo.biomass")
Downloading GitHub repo forestgeo/fgeo.biomass@master
These packages have more recent versions available.
Which would you like to update?

 1: All                              
 2: CRAN packages only               
 3: None                             
 4: usethis (1.5.0  -> 1.5.1 ) [CRAN]
 5: raster  (2.9-5  -> 3.0-2 ) [CRAN]
 6: Rcpp    (1.0.1  -> 1.0.2 ) [CRAN]
 7: digest  (0.6.19 -> 0.6.20) [CRAN]
 8: hms     (0.5.0  -> 0.5.1 ) [CRAN]
 9: clipr   (0.6.0  -> 0.7.0 ) [CRAN]
10: curl    (3.3    -> 4.1   ) [CRAN]
11: whisker (0.3-2  -> 0.4   ) [CRAN]
12: httr    (1.4.0  -> 1.4.1 ) [CRAN]
13: openssl (1.4    -> 1.4.1 ) [CRAN]
14: sys     (3.2    -> 3.3   ) [CRAN]

Enter one or more numbers, or an empty line to skip updates:

Installing 6 packages: BIOMASS, fgeo.tool, formatR, measurements, minpack.lm, rappdirs
Installing packages into ‘C:/Users/HerrmannV/Documents/R/win-library/3.6’
(as ‘lib’ is unspecified)
Error: Failed to install 'fgeo.biomass' from GitHub:
  (converted from warning) unable to access index for repository https://forestgeo.github.io/drat/src/contrib:
  cannot open URL 'https://forestgeo.github.io/drat/src/contrib/PACKAGES'
In addition: Warning messages:
1: In untar2(tarfile, files, list, exdir) :
  skipping pax global extended headers
2: In untar2(tarfile, files, list, exdir) :
  skipping pax global extended headers
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 14393)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        ps_1.3.0          prettyunits_1.0.2 rprojroot_1.3-2   withr_2.1.2       digest_0.6.19    
 [7] crayon_1.3.4      assertthat_0.2.1  R6_2.4.0          backports_1.1.4   magrittr_1.5      rlang_0.4.0      
[13] cli_1.1.0         curl_3.3          fs_1.3.1          remotes_2.1.0     rstudioapi_0.10   callr_3.2.0      
[19] desc_1.2.0        devtools_2.0.2    tools_3.6.0       glue_1.3.1        pkgload_1.0.2     compiler_3.6.0   
[25] processx_3.3.1    pkgbuild_1.0.3    sessioninfo_1.1.1 memoise_1.1.0     usethis_1.5.0 

Let users define an unknown species to be matched with a generic equation

(Relates to ropensci/allodb#72)

Different sites use different ways to encode unknown species. Thus users should be able to specify how this is done for their specific datasets.

library(tidyverse)

scbi_species %>% 
  filter(tolower(Latin) == "unidentified unk") %>% 
  select(1:5)
#> Error in eval(lhs, parent, parent): object 'scbi_species' not found

bciex::bci_species %>% 
  filter(grepl("unident", tolower(Latin))) %>% 
  select(1:5)
#> # A tibble: 48 x 5
#>    sp     Latin                   Genus        Species    Family 
#>    <chr>  <chr>                   <chr>        <chr>      <chr>  
#>  1 lau12  Unidentified pubescente Unidentified pubescente Unknown
#>  2 un2_sh Unidentified (sherman)  Unidentified (sherman)  Unknown
#>  3 cel1   Unidentified sp.1       Unidentified sp.1       Unknown
#>  4 mal1   Unidentified sp.1       Unidentified sp.1       Unknown
#>  5 myt13  Unidentified sp.13      Unidentified sp.13      Unknown
#>  6 ann2   Unidentified sp.2       Unidentified sp.2       Unknown
#>  7 nyc2   Unidentified sp.2       Unidentified sp.2       Unknown
#>  8 rub7   Unidentified sp.7       Unidentified sp.7       Unknown
#>  9 rub8   Unidentified sp.8       Unidentified sp.8       Unknown
#> 10 uniden Unidentified species    Unidentified species    Unknown
#> # ... with 38 more rows

Created on 2019-03-27 by the reprex package (v0.2.1)
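
One way to let users specify this, sketched under assumptions: a helper that takes a user-supplied regular expression. The name is_unknown_species() and its default pattern are hypothetical; the example strings follow the encodings shown above.

```r
# Hypothetical helper: flag species whose Latin name matches a
# user-supplied pattern for "unknown" (case-insensitive).
is_unknown_species <- function(latin, pattern = "unident") {
  grepl(pattern, latin, ignore.case = TRUE)
}

latin <- c("Acer rubrum", "Unidentified sp.1", "unidentified unk")

# Default pattern catches both encodings
loose <- is_unknown_species(latin)

# A site can pass a stricter pattern that matches only its own encoding
strict <- is_unknown_species(latin, pattern = "^unidentified unk$")
```

The flagged rows could then be matched with a generic equation, which is the part that depends on ropensci/allodb#72.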

Warn if dropping potentially useful rows

In the interest of building a conservative, minimum viable product, we currently drop potentially useful rows. This is because we still lack some features planned for later. But this is not obvious and alpha users (e.g. Maxime) may be surprised to get meaningless biomass results.

The code should warn when dropping potentially useful rows. That should immediately add value by providing useful feedback to alpha users. And in the long run it also adds value because later we can write a unit test to check that this warning is no longer thrown once the code includes the features that we still lack.
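
A minimal sketch of such a warning. It assumes, purely for illustration, that the dropped rows are those with no matching equation (a missing equation_id); the function name drop_unmatched() and the toy data are hypothetical.

```r
# Hypothetical step: drop rows without a matching equation, but say so.
drop_unmatched <- function(census) {
  keep <- !is.na(census$equation_id)
  n_dropped <- sum(!keep)
  if (n_dropped > 0) {
    warning(
      "Dropping ", n_dropped, " of ", nrow(census),
      " rows with no matching equation.",
      call. = FALSE
    )
  }
  census[keep, , drop = FALSE]
}

census <- data.frame(
  sp = c("acru", "quru", "unk"),
  equation_id = c("e1", NA, NA),
  stringsAsFactors = FALSE
)

# Capture the warning so the example runs quietly
dropped_msg <- NULL
kept <- withCallingHandlers(
  drop_unmatched(census),
  warning = function(w) {
    dropped_msg <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
dropped_msg
```

A later unit test could run the same call and assert that no warning is captured.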

Unnest the output of allo_find()

Relates to #17. Considering that I seem to have been wrong in how to use the different equation types, the output as a nested dataframe may be more of a problem than a solution.
