forestgeo / fgeo.tool Goto Github PK

View Code? Open in Web Editor NEW

2.0 2.0 6.0 3.79 MB

[R-package on CRAN] General purpose tools for ForestGEO Packages

Home Page: https://forestgeo.github.io/fgeo.tool

License: Other

R 99.81% Shell 0.19%

fgeo forestgeo tree ecology dynamics tools utils miscelaneas

fgeo.tool's People

Contributors

Stargazers

Watchers

Forkers

ayushranjan1 overstreeth sainirock61 helixcn fdbesanto2 jimhester

fgeo.tool's Issues

Add to Family: filters of `tree_status`: Rename rm_dead_twice

To Make it consistent with

not_dead_twice(x)
tree_not_dead_twice(x)

tree_status(x, status = "not_dead_twice")

Fix byyr_abundance() to count only one individual per Tag not per StemID

Here the output should be 1 -- ont 2 (@).

library(fgeo)
#> -- Attaching packages -------------------------------------------- fgeo 0.0.0.9000 --
#> v bciex           0.0.0.9000     v fgeo.demography 0.0.0.9000
#> v fgeo.abundance  0.0.0.9004     v fgeo.habitat    0.0.0.9006
#> v fgeo.base       0.0.0.9001     v fgeo.map        0.0.0.9204
#> v fgeo.data       0.0.0.9002     v fgeo.tool       0.0.0.9003
#> 

vft <- data.frame(
  StemID = c("1", "2"),
  Tag = c("0001", "0001"),
  PlotName = "p",
  Status = c("alive", "alive"),
  DBH = c(10, 100),
  ExactDate = c("2000-01-01", "2000-01-01"),
  PlotCensusNumber = c(1, 1),
  CensusID = c(1, 1),
  Genus = c("A", "A"),
  SpeciesName = c("a", "a"),
  Family = "f",
  stringsAsFactors = FALSE
)
vft
#>   StemID  Tag PlotName Status DBH  ExactDate PlotCensusNumber CensusID
#> 1      1 0001        p  alive  10 2000-01-01                1        1
#> 2      2 0001        p  alive 100 2000-01-01                1        1
#>   Genus SpeciesName Family
#> 1     A           a      f
#> 2     A           a      f

# First pick the data you want
pick1 <- pick_plotname(vft, "p")
#> Using: p.

pick2 <- drop_dead_trees_by_cns(pick1)
#> Calculating tree-status (from stem `Status`) by `PlotCensusNumber`.
#> Warning: No observation has .status = dead
#>   * Detected values: alive
#> Dropping rows where `Status = dead`.
#> Warning: No observation has .status = dead
#>   * Detected values: alive
pick3 <- pick_dbh_min(pick2, 10)
pick3
#> # A tibble: 2 x 12
#>   StemID Tag   PlotName Status   DBH ExactDate  PlotCensusNumber CensusID
#>   <chr>  <chr> <chr>    <chr>  <dbl> <chr>                 <dbl>    <dbl>
#> 1 1      0001  p        alive     10 2000-01-01                1        1
#> 2 2      0001  p        alive    100 2000-01-01                1        1
#> # ... with 4 more variables: Genus <chr>, SpeciesName <chr>, Family <chr>,
#> #   status_tree <chr>

byyr_abundance(pick3)
#> # A tibble: 1 x 3
#>   species Family `2000`
#>   <chr>   <chr>   <dbl>
#> 1 A a     f           2

Created on 2018-06-20 by the reprex package (v0.2.0).

thank @rick_pack2

replace_all_na <- function(x, filler = 0) {
  replace(x, http://is.na (x), filler)
}
replace_all_na(df, 0)
#>   x y
#> 1 0 2
#> 2 1 0
replace_all_na(df, "missing")
#>         x       y
#> 1 missing       2
#> 2       1 missing

FastField Forms spreadsheet export - missing required sheets

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/, https://community.rstudio.com/, https://github.com/forestgeo/forum/ or email Mauro Lepore at [email protected].

Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.

Brief description of the problem
When using FastField Forms for a re-census we've discovered or had submissions missing one or multiple of the required sheets: 'new_secondary_stems', 'recruits', and 'original_stems'. If no recruits are recorded in a quadrat, no sheet is exported. Likewise, post-submission if a recruit is found, then added to a 'new' file by itself, the 'new_secondary_stems', and 'original_stems' sheets are missing. This of course breaks the code and the files cannot be compiled. Any solution to this, or should I reach out to FastField Forms/add the sheets ourselves?

> ######## Compile the separate worksheets from the quadrat workbooks into new excel files

> files.directory <- "C:/Users/shuej/Dropbox (Smithsonian)/Field_Form_Data_Entry/HF/uploads"

> files.export <- "C:/Users/shuej/Dropbox (Smithsonian)/Field_Form_Data_Entry/HF/exports/test"
> xl_sheets_to_xl(files.directory, files.export, first_census = FALSE)
Error: Data should contain these sheets:
original_stems, new_secondary_stems, recruits, root
* Missing sheets: recruits
In addition: Warning messages:
1: `new_secondary_stems` has cero rows. 
2: Filling every cero-row dataframe with NAs (new_secondary_stems). 
3: `new_secondary_stems` has cero rows. 
4: Filling every cero-row dataframe with NAs (new_secondary_stems).

In xl_sheets_to_csv() allow first census.

map_tag() fails with numeric tags

Write helpers to categorize data

From @maurolepore on August 31, 2017 16:19

Gabriel proposed to develop a friendly way to categorize (cut) numeric variables. Important ones include:

from dbh to size category.
from gx and gy to quadrat: see add_quad().

suppressPackageStartupMessages(library(fgeo))
x <- tibble::tibble(gx = c(0, 50, 999.9, 1000), gy = gx/2)
add_quad(x)
#> Gessing: plotdim = c(1000, 500)
#>   * If guess is wrong, provide the correct argument `plotdim`
#> # A tibble: 4 x 3
#>      gx    gy quad 
#>   <dbl> <dbl> <chr>
#> 1    0     0  0101 
#> 2   50    25  0302 
#> 3 1000.  500. 5025 
#> 4 1000   500  NANA
add_quad(x, start = 0)
#> Gessing: plotdim = c(1000, 500)
#>   * If guess is wrong, provide the correct argument `plotdim`
#> # A tibble: 4 x 3
#>      gx    gy quad 
#>   <dbl> <dbl> <chr>
#> 1    0     0  0000 
#> 2   50    25  0201 
#> 3 1000.  500. 4924 
#> 4 1000   500  NANA

Created on 2018-07-02 by the reprex package (v0.2.0).

Copied from original issue: forestgeo/fgeo.abundance#46

document abundance_tree() not count_distinc() or both separately

I want to index abundance.

to_wide() to_long() may be a useful generic

It would spread or gather whatever makes sense for each fgeo class.

In csv_to_df() check that dataframes are `rowbind()`able.

This may be removed via #61

Add to Family: filters of tree status; and Family of stem status

tree_status(x, status = "not_dead")
tree_not_dead(x)
tree_status(x, status = "*")
tree_*(x)

stem_status(x, status = "not_dead")
stem_not_dead(x)
stem_status(x, status = "*")
stem_*(x)

as_fgeo(): wrap rename() to privide API with the format as_fgeo(expect_nm1 = NULL, expect_nm2 = NULL, ...)

It could output a message of a table showing what renames were made.

  old_name      new_name
-----------------------------
"old name"    "new name"

In add_var() don't change original names.

reexport the most important deplyr verbs

Fix warnings when reading ViewFullTable

The input is a comma separated file. I get no warning if the file is read with delim = ", ".

https://bookdown.org/fgeocomm/tool36/

Rename topic filter_status to drop_dead for discovery with fgeo_help()

Write function to read database output

Wrap as read_vft()

(Similar to read_taxa()).

read_tsv(
  file, 
  col_types = fgeo.tool::type_taxa(), 
  na = c("", "NA", "NULL")
)

In restructure elev acknowledge David

Add tools for screening

Helena apparently has code that may be useful as a guide.

Rename restructure_elev to reflect uncertainty

This relates to #16

Consider:
restructure_if_need()
restructure_if_should()
restructure_if_needed()

may_restructure()
restructure_maybe()
restructure_possibly()

Add add_qxqy()

New data structure has QX and QY variables

Update fgeo_pkgs in fgeo.tool::fgeo_package_deps()

generalize warn_duplicated_treeid() to flag_duplicated_var()

generalize warn_duplicated_treeid to flag_duplicated_var(flag) (function factory):
- flag can be message(), warning(), stop()
- var could be any .data$var, maybe use tidyeval
- var is used to group by and to summarize

Add nms_*() functions to handle names.

Consider that mpala must be padded with 5 digits -- not 4

mpala is the only dataset that has so many quadrats that it needs to be padded with 5 instead of 4 "0".

Write an example showing that if the data is subsetted before it is passed to plot_tag_status_by_subquadrat() the order of maps may be different than expected. So show how to arrange the data before subsetting it.

Update the variables dictionaly Create a google worksheet table names of different fgeo tables

Get the names of all tables
Import the available variable's definitions
Merge the available definitions with the names of each table
Push all data into a Google worksheet data_dictionary
Ask database team to fill the gaps
Create an app to quickly inspect variable names? Or it could just be the google worksheet

Clean flag_multiple() and multiple_var().

DRY. Some funcitons seem duplicated with fgeo.tool.

flat_multiple() is being used but flag_if() and multiple_var() are not. See if and which is a better alternative to flag_multiple() and similar functions in fgeo.tool. Then clean the rest.

Handle lowering and restoring names via attributes

Attach old names to the object in question, via attributes.

Add collapse_censusid()

Consider removing keep_alive_stem() et al

Can I replace it with base:::keep_*()?

Pull out to fgeo.dvlp the family of functions for developers

Write a function that stores attribute names_old and works with dplyr

Create tree table from stem table

Pull the code out of rtbl?

Rewrite? E.g.

pick_stem_max_dbh <- function(.data) {
  # Allow input as ViewFullTable, stem and tree table
  old <- names(.data)
  names(.data) <- tolower(old)
  fgeo.base::check_crucial_names(.data, c("treeid", "stemid", "dbh"))
  
  # TODO: Check for unique censusid or group by census id 
  
  # TODO:
  # if (tree_already) {return(stats::setNames(.data, old))}
  
  .data <- tibble::rowid_to_column(.data)
  .data <- dplyr::group_by(.data, .data$treeid)
  .data <- dplyr::arrange(.data, dplyr::desc(.data$dbh))
  .data <- dplyr::ungroup(.data)
  
  out <- dplyr::distinct(.data, .data$treeid, .keep_all = TRUE)
  # Recover original order
  out <- dplyr::select(dplyr::arrange(out, .data$rowid), -.data$rowid)
  
  stats::setNames(out, old)
}

df <- tibble::tibble(
  treeID = c(1, 1, 1, 2, 2, 2),
  stemID = letters[c(1, 2, 3, 1, 2, 3)],
  dbh = c(1, 2, NA, 4, 5, NA)
)
df
#> # A tibble: 6 x 3
#>   treeID stemID   dbh
#>    <dbl> <chr>  <dbl>
#> 1      1 a          1
#> 2      1 b          2
#> 3      1 c         NA
#> 4      2 a          4
#> 5      2 b          5
#> 6      2 c         NA

pick_stem_max_dbh(df)
#> # A tibble: 2 x 3
#>   treeID stemID   dbh
#>    <dbl> <chr>  <dbl>
#> 1      1 b          2
#> 2      2 b          5

Created on 2018-06-28 by the reprex package (v0.2.0).

Remove keep_alive_stem() and keep_alive_tree()

See fgeo.base::keep_status() and friends.

Fix add_vars() to work with PX, PY. Don't use old data as a model.

reexport the main dplyr verbs

Consider reexporting the main verbs and common tools.

pro:

everything is available with (fgeo)
con:
conflicts will show up when loading tidyverse
makes it harder to discover the tidyverse

main verbs

filter
select
arrange
summarize
mutate

also

group_by
tibble
tribble
as_tibble
count
add_count

select2()

select2 <- function(.data, ...) {
  dots <- rlang::list2(...)
  select(.data, !!!text_exprs(dots))
}

text_exprs <- function(...) {
  rlang::parse_exprs(semicolon(...))
}

semicolon <- function(...) {
  paste0(..., collapse = "; ")
}

library(tidyverse)

mtcars <- as.tibble(mtcars)

text_vars <- c("mpg", "am")
select2(mtcars, text_vars)
#> # A tibble: 32 x 2
#>      mpg    am
#>  * <dbl> <dbl>
#>  1  21       1
#>  2  21       1
#>  3  22.8     1
#>  4  21.4     0
#>  5  18.7     0
#>  6  18.1     0
#>  7  14.3     0
#>  8  24.4     0
#>  9  22.8     0
#> 10  19.2     0
#> # ... with 22 more rows

# If multiple strings, separate vars with comma
text_vars <- c("mpg", "am", "carb")
select2(mtcars, text_vars)
#> # A tibble: 32 x 3
#>      mpg    am  carb
#>  * <dbl> <dbl> <dbl>
#>  1  21       1     4
#>  2  21       1     4
#>  3  22.8     1     1
#>  4  21.4     0     1
#>  5  18.7     0     2
#>  6  18.1     0     1
#>  7  14.3     0     4
#>  8  24.4     0     2
#>  9  22.8     0     2
#> 10  19.2     0     4
#> # ... with 22 more rows

# If single string, separate with semicolon
text_vars <- c("mpg; am; carb")
select2(mtcars, text_vars)
#> # A tibble: 32 x 3
#>      mpg    am  carb
#>  * <dbl> <dbl> <dbl>
#>  1  21       1     4
#>  2  21       1     4
#>  3  22.8     1     1
#>  4  21.4     0     1
#>  5  18.7     0     2
#>  6  18.1     0     1
#>  7  14.3     0     4
#>  8  24.4     0     2
#>  9  22.8     0     2
#> 10  19.2     0     4
#> # ... with 22 more rows

# With a single string you can do all you can do with bare
text <- 'mpg:cyl; matches("df"); everything()'
select2(mtcars, text)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ... with 22 more rows
# Same
library(rlang)
#> 
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#> 
#>     %@%, %||%, as_function, flatten, flatten_chr, flatten_dbl,
#>     flatten_int, flatten_lgl, invoke, list_along, modify, prepend,
#>     rep_along, splice
bare <- exprs(mpg:cyl, matches("df"), everything())
select(mtcars, !!!bare)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ... with 22 more rows

Created on 2018-04-29 by the reprex package (v0.2.0).

Improve by_group() (or groupwise())

# I want to make any funciton work with grouped data. I propose 
# by_group(.data, .f, ...), similar to map(), by applies .f() not to each column
# of .data but to each group -- and to all of it if .data is ungrouped.
# alias: grouply(), groupwise().

# History of this feature including upcomming dplyr::nest_by()
# https://community.rstudio.com/t/is-nest-mutate-map-unnest-really-the-best-alternative-to-dplyr-do/11009/7?u=mauro_lepore

library(tidyverse)

# E.g.
# nest_by_groups(mtcars)
# nest_by_groups(group_by(mtcars, cyl))
nest_by_groups <- function(.data) {
  g <- group_vars(.data)
  .data %>% 
    tibble::as.tibble() %>% 
    tibble::add_column(.nest_id = dplyr::group_indices(.)) %>% 
    dplyr::ungroup() %>%
    tidyr::nest(-.nest_id)
}

by_group <- function(.data, .f, ...) {
  .data %>% 
    nest_by_groups() %>% 
    mutate(data = map(.data$data, .f, ...)) %>% 
    tidyr::unnest() %>% 
    dplyr::select(-.nest_id)
}

# E.g. 
first_row <- function(.x, to_chr = FALSE) {
  first <- .x[1, ]
  if (to_chr) {
    first[] <- lapply(first, as.character)
  }
  
  tibble::as.tibble(first)
}
mtcars %>% 
  by_group(first_row, to_chr = TRUE)
#> # A tibble: 1 x 11
#>   mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb 
#>   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 21    6     160   110   3.9   2.62  16.46 0     1     4     4

mtcars %>% 
  group_by(cyl) %>% 
  by_group(first_row, to_chr = T)
#> # A tibble: 3 x 11
#>   mpg   cyl   disp  hp    drat  wt    qsec  vs    am    gear  carb 
#>   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 21    6     160   110   3.9   2.62  16.46 0     1     4     4    
#> 2 22.8  4     108   93    3.85  2.32  18.61 1     1     4     1    
#> 3 18.7  8     360   175   3.15  3.44  17.02 0     0     3     2

Created on 2018-07-16 by the reprex package (v0.2.0.9000).

In fgeo.tool::add_status_tree() check for TreeID instead of tag

First understand why there may be more than one tag per TreeID (forestgeo/fgeo.data#24).

Suzanne said it was an error.

Wrap vegan

Diversity

E.g. from qo00 in maurolepore/hrv

diversity_by_sample <- wrangle() %>%
  dplyr::group_by(sample_name) %>%
  dplyr::mutate(
    shannon = vegan::diversity(abun_pcnt, "shannon", MARGIN = 2),
    invsimpson = vegan::diversity(abun_pcnt, "invsimpson", MARGIN = 2),
    simpson = vegan::diversity(abun_pcnt, "simpson", MARGIN = 2),
    richness = length(sample_name)
  ) %>%
  dplyr::ungroup() %>%
  dplyr::select(
    time_interval, collected_from, sample_name, species, shannon:richness
  ) %>%
  # Keep only one row per sample -- not one per sample per species
  dplyr::select(-species) %>%
  unique() %>%
  # Transform to long format to allow facetting by metric(key)
  tidyr::gather(key = metric, value = value, shannon:richness)

In fill_na() add see also dplyr::coalesce()

Write a wrapper to recode species or anything.

Fix rm_dead_twice()

Bring collapse_censusid() from fgeo.map.

Combine multiple .rdata via rdata_to_df() and rdata_to_dfs()

See Read section here https://forestgeo.github.io/fgeo.tool/reference/index.html

fgeo classes and methods to classify tables

The job is to homogenize the different datasets so that they can be used with the greatest number of functions.

Use data from Yosemite for example. Suzanne said it's up to date

Identify each table by its names. Check the difference between names to understand what identifies each table uniquely
Check what names intersect
What variables are the same, regardless of their names. See definition in CFTS pakge
What variables are not the same
What variables matter -- the crucial names of all functions I have developed (search for crucial_names())

...
Fgeo_clasify x
If (names x match vft) as.vft
If ... as.stem

vft <- f (x, ...) structure(x, class = c(vft, tibble?, data.frame))

as.fgeo
as_fgeo?

homo_nms.vft
homo_nms.stem
...

error with xlff_to_dfs - repeating sheet name

After running the FastField Forms files through the xlff_to_xl function we then attempted to run the new xlff_to_dfs function. We were unsuccessful and received the following error:

> first_week_hf <- xlff_to_dfs("C:/Users/shuej/Dropbox (Smithsonian)/Field_Form_Data_Entry/HF/exports/test")
Error: Column `sheet` must have a unique name
In addition: Warning message:
Adding missing sheets: original_stems, new_secondary_stems, recruits, [root.]

The name of the sheet for each stem repeats multiple times depending upon which sheet the stem originated from.
excel file

fill_na()

Rename replace _all_na() to fill_na() with is more specific. Replace is a general term on the implementation domain -- because it is inspired in replace() -- while fill_na() is on the problem domain.

Consider adding a method for lists.

fill_na <- function(x, filler) {
  x[is.na(x)] <- filler
  x
}

Write a function to identify then homogenize fgeo tables

homogenize may have a method for objects of class elevation.