forestgeo / fgeo.tool Goto Github PK
View Code? Open in Web Editor NEW[R-package on CRAN] General purpose tools for ForestGEO Packages
Home Page: https://forestgeo.github.io/fgeo.tool
License: Other
[R-package on CRAN] General purpose tools for ForestGEO Packages
Home Page: https://forestgeo.github.io/fgeo.tool
License: Other
New data structure has QX and QY variables
(Stuart's suggestion)
To Make it consistent with
not_dead_twice(x)
tree_not_dead_twice(x)
tree_status(x, status = "not_dead_twice")
See Read section here https://forestgeo.github.io/fgeo.tool/reference/index.html
Please briefly describe your problem and what output you expect. If you have a question, please don't use this form. Instead, ask on https://stackoverflow.com/, https://community.rstudio.com/, https://github.com/forestgeo/forum/ or email Mauro Lepore at [email protected].
Please include a minimal reproducible example (AKA a reprex). If you've never heard of a reprex before, start by reading https://www.tidyverse.org/help/#reprex.
Brief description of the problem
When using FastField Forms for a re-census we've discovered or had submissions missing one or multiple of the required sheets: 'new_secondary_stems', 'recruits', and 'original_stems'. If no recruits are recorded in a quadrat, no sheet is exported. Likewise, post-submission if a recruit is found, then added to a 'new' file by itself, the 'new_secondary_stems', and 'original_stems' sheets are missing. This of course breaks the code and the files cannot be compiled. Any solution to this, or should I reach out to FastField Forms/add the sheets ourselves?
> ######## Compile the separate worksheets from the quadrat workbooks into new excel files
> files.directory <- "C:/Users/shuej/Dropbox (Smithsonian)/Field_Form_Data_Entry/HF/uploads"
> files.export <- "C:/Users/shuej/Dropbox (Smithsonian)/Field_Form_Data_Entry/HF/exports/test"
> xl_sheets_to_xl(files.directory, files.export, first_census = FALSE)
Error: Data should contain these sheets:
original_stems, new_secondary_stems, recruits, root
* Missing sheets: recruits
In addition: Warning messages:
1: `new_secondary_stems` has cero rows.
2: Filling every cero-row dataframe with NAs (new_secondary_stems).
3: `new_secondary_stems` has cero rows.
4: Filling every cero-row dataframe with NAs (new_secondary_stems).
The job is to homogenize the different datasets so that they can be used with the greatest number of functions.
Use data from Yosemite for example. Suzanne said it's up to date
Identify each table by its names. Check the difference between names to understand what identifies each table uniquely
Check what names intersect
What variables are the same, regardless of their names. See definition in CFTS pakge
What variables are not the same
What variables matter -- the crucial names of all functions I have developed (search for crucial_names()
)
...
Fgeo_clasify x
If (names x match vft) as.vft
If ... as.stem
vft <- f (x, ...) structure(x, class = c(vft, tibble?, data.frame))
as.fgeo
as_fgeo?
homo_nms.vft
homo_nms.stem
...
Attach old names to the object in question, via attributes.
Helena apparently has code that may be useful as a guide.
This may be removed via #61
After running the FastField Forms files through the xlff_to_xl function we then attempted to run the new xlff_to_dfs function. We were unsuccessful and received the following error:
> first_week_hf <- xlff_to_dfs("C:/Users/shuej/Dropbox (Smithsonian)/Field_Form_Data_Entry/HF/exports/test")
Error: Column `sheet` must have a unique name
In addition: Warning message:
Adding missing sheets: original_stems, new_secondary_stems, recruits, [root.]
The name of the sheet for each stem repeats multiple times depending upon which sheet the stem originated from.
excel file
tree_status(x, status = "not_dead")
tree_not_dead(x)
tree_status(x, status = "*")
tree_*(x)
stem_status(x, status = "not_dead")
stem_not_dead(x)
stem_status(x, status = "*")
stem_*(x)
It would spread or gather whatever makes sense for each fgeo class.
It could output a message of a table showing what renames were made.
old_name new_name
-----------------------------
"old name" "new name"
Consider reexporting the main verbs and common tools.
pro:
everything is available with (fgeo)
con:
conflicts will show up when loading tidyverse
makes it harder to discover the tidyverse
filter
select
arrange
summarize
mutate
group_by
tibble
tribble
as_tibble
count
add_count
# I want to make any funciton work with grouped data. I propose
# by_group(.data, .f, ...), similar to map(), by applies .f() not to each column
# of .data but to each group -- and to all of it if .data is ungrouped.
# alias: grouply(), groupwise().
# History of this feature including upcomming dplyr::nest_by()
# https://community.rstudio.com/t/is-nest-mutate-map-unnest-really-the-best-alternative-to-dplyr-do/11009/7?u=mauro_lepore
library(tidyverse)
# E.g.
# nest_by_groups(mtcars)
# nest_by_groups(group_by(mtcars, cyl))
nest_by_groups <- function(.data) {
g <- group_vars(.data)
.data %>%
tibble::as.tibble() %>%
tibble::add_column(.nest_id = dplyr::group_indices(.)) %>%
dplyr::ungroup() %>%
tidyr::nest(-.nest_id)
}
by_group <- function(.data, .f, ...) {
.data %>%
nest_by_groups() %>%
mutate(data = map(.data$data, .f, ...)) %>%
tidyr::unnest() %>%
dplyr::select(-.nest_id)
}
# E.g.
first_row <- function(.x, to_chr = FALSE) {
first <- .x[1, ]
if (to_chr) {
first[] <- lapply(first, as.character)
}
tibble::as.tibble(first)
}
mtcars %>%
by_group(first_row, to_chr = TRUE)
#> # A tibble: 1 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 21 6 160 110 3.9 2.62 16.46 0 1 4 4
mtcars %>%
group_by(cyl) %>%
by_group(first_row, to_chr = T)
#> # A tibble: 3 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 21 6 160 110 3.9 2.62 16.46 0 1 4 4
#> 2 22.8 4 108 93 3.85 2.32 18.61 1 1 4 1
#> 3 18.7 8 360 175 3.15 3.44 17.02 0 0 3 2
Created on 2018-07-16 by the reprex package (v0.2.0.9000).
Wrap as read_vft()
(Similar to read_taxa()
).
read_tsv(
file,
col_types = fgeo.tool::type_taxa(),
na = c("", "NA", "NULL")
)
data_dictionary
First understand why there may be more than one tag
per TreeID
(forestgeo/fgeo.data#24).
Suzanne said it was an error.
Pull the code out of rtbl?
Rewrite? E.g.
pick_stem_max_dbh <- function(.data) {
# Allow input as ViewFullTable, stem and tree table
old <- names(.data)
names(.data) <- tolower(old)
fgeo.base::check_crucial_names(.data, c("treeid", "stemid", "dbh"))
# TODO: Check for unique censusid or group by census id
# TODO:
# if (tree_already) {return(stats::setNames(.data, old))}
.data <- tibble::rowid_to_column(.data)
.data <- dplyr::group_by(.data, .data$treeid)
.data <- dplyr::arrange(.data, dplyr::desc(.data$dbh))
.data <- dplyr::ungroup(.data)
out <- dplyr::distinct(.data, .data$treeid, .keep_all = TRUE)
# Recover original order
out <- dplyr::select(dplyr::arrange(out, .data$rowid), -.data$rowid)
stats::setNames(out, old)
}
df <- tibble::tibble(
treeID = c(1, 1, 1, 2, 2, 2),
stemID = letters[c(1, 2, 3, 1, 2, 3)],
dbh = c(1, 2, NA, 4, 5, NA)
)
df
#> # A tibble: 6 x 3
#> treeID stemID dbh
#> <dbl> <chr> <dbl>
#> 1 1 a 1
#> 2 1 b 2
#> 3 1 c NA
#> 4 2 a 4
#> 5 2 b 5
#> 6 2 c NA
pick_stem_max_dbh(df)
#> # A tibble: 2 x 3
#> treeID stemID dbh
#> <dbl> <chr> <dbl>
#> 1 1 b 2
#> 2 2 b 5
Created on 2018-06-28 by the reprex package (v0.2.0).
From @maurolepore on August 31, 2017 16:19
Gabriel proposed to develop a friendly way to categorize (cut) numeric variables. Important ones include:
add_quad()
.suppressPackageStartupMessages(library(fgeo))
x <- tibble::tibble(gx = c(0, 50, 999.9, 1000), gy = gx/2)
add_quad(x)
#> Gessing: plotdim = c(1000, 500)
#> * If guess is wrong, provide the correct argument `plotdim`
#> # A tibble: 4 x 3
#> gx gy quad
#> <dbl> <dbl> <chr>
#> 1 0 0 0101
#> 2 50 25 0302
#> 3 1000. 500. 5025
#> 4 1000 500 NANA
add_quad(x, start = 0)
#> Gessing: plotdim = c(1000, 500)
#> * If guess is wrong, provide the correct argument `plotdim`
#> # A tibble: 4 x 3
#> gx gy quad
#> <dbl> <dbl> <chr>
#> 1 0 0 0000
#> 2 50 25 0201
#> 3 1000. 500. 4924
#> 4 1000 500 NANA
Created on 2018-07-02 by the reprex package (v0.2.0).
Copied from original issue: forestgeo/fgeo.abundance#46
The input is a comma separated file. I get no warning if the file is read with delim = ", "
.
homogenize may have a method for objects of class elevation.
See fgeo.base::keep_status()
and friends.
mpala is the only dataset that has so many quadrats that it needs to be padded with 5 instead of 4 "0".
Write an example showing that if the data is subsetted before it is passed to plot_tag_status_by_subquadrat()
the order of maps may be different than expected. So show how to arrange the data before subsetting it.
Bring collapse_censusid()
from fgeo.map.
Can I replace it with base:::keep_*()?
replace_all_na <- function(x, filler = 0) {
replace(x, http://is.na (x), filler)
}
replace_all_na(df, 0)
#> x y
#> 1 0 2
#> 2 1 0
replace_all_na(df, "missing")
#> x y
#> 1 missing 2
#> 2 1 missing
Here the output should be 1 -- ont 2 (@).
library(fgeo)
#> -- Attaching packages -------------------------------------------- fgeo 0.0.0.9000 --
#> v bciex 0.0.0.9000 v fgeo.demography 0.0.0.9000
#> v fgeo.abundance 0.0.0.9004 v fgeo.habitat 0.0.0.9006
#> v fgeo.base 0.0.0.9001 v fgeo.map 0.0.0.9204
#> v fgeo.data 0.0.0.9002 v fgeo.tool 0.0.0.9003
#>
vft <- data.frame(
StemID = c("1", "2"),
Tag = c("0001", "0001"),
PlotName = "p",
Status = c("alive", "alive"),
DBH = c(10, 100),
ExactDate = c("2000-01-01", "2000-01-01"),
PlotCensusNumber = c(1, 1),
CensusID = c(1, 1),
Genus = c("A", "A"),
SpeciesName = c("a", "a"),
Family = "f",
stringsAsFactors = FALSE
)
vft
#> StemID Tag PlotName Status DBH ExactDate PlotCensusNumber CensusID
#> 1 1 0001 p alive 10 2000-01-01 1 1
#> 2 2 0001 p alive 100 2000-01-01 1 1
#> Genus SpeciesName Family
#> 1 A a f
#> 2 A a f
# First pick the data you want
pick1 <- pick_plotname(vft, "p")
#> Using: p.
pick2 <- drop_dead_trees_by_cns(pick1)
#> Calculating tree-status (from stem `Status`) by `PlotCensusNumber`.
#> Warning: No observation has .status = dead
#> * Detected values: alive
#> Dropping rows where `Status = dead`.
#> Warning: No observation has .status = dead
#> * Detected values: alive
pick3 <- pick_dbh_min(pick2, 10)
pick3
#> # A tibble: 2 x 12
#> StemID Tag PlotName Status DBH ExactDate PlotCensusNumber CensusID
#> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <dbl>
#> 1 1 0001 p alive 10 2000-01-01 1 1
#> 2 2 0001 p alive 100 2000-01-01 1 1
#> # ... with 4 more variables: Genus <chr>, SpeciesName <chr>, Family <chr>,
#> # status_tree <chr>
byyr_abundance(pick3)
#> # A tibble: 1 x 3
#> species Family `2000`
#> <chr> <chr> <dbl>
#> 1 A a f 2
Created on 2018-06-20 by the reprex package (v0.2.0).
Rename replace _all_na() to fill_na() with is more specific. Replace is a general term on the implementation domain -- because it is inspired in replace() -- while fill_na() is on the problem domain.
Consider adding a method for lists.
fill_na <- function(x, filler) {
x[is.na(x)] <- filler
x
}
DRY. Some funcitons seem duplicated with fgeo.tool.
flat_multiple()
is being used but flag_if()
and multiple_var()
are not. See if and which is a better alternative to flag_multiple()
and similar functions in fgeo.tool. Then clean the rest.
I want to index abundance.
E.g. from qo00 in maurolepore/hrv
diversity_by_sample <- wrangle() %>%
dplyr::group_by(sample_name) %>%
dplyr::mutate(
shannon = vegan::diversity(abun_pcnt, "shannon", MARGIN = 2),
invsimpson = vegan::diversity(abun_pcnt, "invsimpson", MARGIN = 2),
simpson = vegan::diversity(abun_pcnt, "simpson", MARGIN = 2),
richness = length(sample_name)
) %>%
dplyr::ungroup() %>%
dplyr::select(
time_interval, collected_from, sample_name, species, shannon:richness
) %>%
# Keep only one row per sample -- not one per sample per species
dplyr::select(-species) %>%
unique() %>%
# Transform to long format to allow facetting by metric(key)
tidyr::gather(key = metric, value = value, shannon:richness)
This relates to #16
Consider:
restructure_if_need()
restructure_if_should()
restructure_if_needed()
may_restructure()
restructure_maybe()
restructure_possibly()
select2 <- function(.data, ...) {
dots <- rlang::list2(...)
select(.data, !!!text_exprs(dots))
}
text_exprs <- function(...) {
rlang::parse_exprs(semicolon(...))
}
semicolon <- function(...) {
paste0(..., collapse = "; ")
}
library(tidyverse)
mtcars <- as.tibble(mtcars)
text_vars <- c("mpg", "am")
select2(mtcars, text_vars)
#> # A tibble: 32 x 2
#> mpg am
#> * <dbl> <dbl>
#> 1 21 1
#> 2 21 1
#> 3 22.8 1
#> 4 21.4 0
#> 5 18.7 0
#> 6 18.1 0
#> 7 14.3 0
#> 8 24.4 0
#> 9 22.8 0
#> 10 19.2 0
#> # ... with 22 more rows
# If multiple strings, separate vars with comma
text_vars <- c("mpg", "am", "carb")
select2(mtcars, text_vars)
#> # A tibble: 32 x 3
#> mpg am carb
#> * <dbl> <dbl> <dbl>
#> 1 21 1 4
#> 2 21 1 4
#> 3 22.8 1 1
#> 4 21.4 0 1
#> 5 18.7 0 2
#> 6 18.1 0 1
#> 7 14.3 0 4
#> 8 24.4 0 2
#> 9 22.8 0 2
#> 10 19.2 0 4
#> # ... with 22 more rows
# If single string, separate with semicolon
text_vars <- c("mpg; am; carb")
select2(mtcars, text_vars)
#> # A tibble: 32 x 3
#> mpg am carb
#> * <dbl> <dbl> <dbl>
#> 1 21 1 4
#> 2 21 1 4
#> 3 22.8 1 1
#> 4 21.4 0 1
#> 5 18.7 0 2
#> 6 18.1 0 1
#> 7 14.3 0 4
#> 8 24.4 0 2
#> 9 22.8 0 2
#> 10 19.2 0 4
#> # ... with 22 more rows
# With a single string you can do all you can do with bare
text <- 'mpg:cyl; matches("df"); everything()'
select2(mtcars, text)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
# Same
library(rlang)
#>
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#>
#> %@%, %||%, as_function, flatten, flatten_chr, flatten_dbl,
#> flatten_int, flatten_lgl, invoke, list_along, modify, prepend,
#> rep_along, splice
bare <- exprs(mpg:cyl, matches("df"), everything())
select(mtcars, !!!bare)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
Created on 2018-04-29 by the reprex package (v0.2.0).
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.