Comments (25)
Also worth noting - when you originally asked this question a couple years ago there were some oddities to how tidytable worked with quosures. Those are now fixed. So you should be able to use tidy evaluation just like you do with dplyr.
from tidytable.
Another tip - in a "column selection" context (like in select() or using .by) you can pass vectors of strings with the all_of() helper.
pacman::p_load(tidytable)
dynamic_mean <- function(dt, num_col, group_cols){
dt %>%
summarize.(new_val = mean(.data[[num_col]], na.rm=TRUE),
.by = all_of(group_cols))
}
df <- tidytable(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)
df %>% dynamic_mean("z", c("x", "y"))
#> # A tidytable: 2 × 3
#> x y new_val
#> <chr> <chr> <dbl>
#> 1 a a 1.5
#> 2 b b 3
from tidytable.
To be a bit more specific, it would help to know why this works:
iris$Species %>% unique %>% as.list %>% lapply(make_subsets, dt=as.data.table(iris))
...and this doesn't:
c('setosa', 'versicolor', 'virginica') %>% as.list %>% lapply(make_subsets, dt=as.data.table(iris))
...with the make_subsets() function shown above, and how to deal with the latter.
from tidytable.
Ah yep, thanks for opening this issue. I actually just discovered this the other day.
This is a side effect of how tidytable uses eval(expr(dt_expression)) at the end of all code to build the data.table expression.
Even though "to_filter" is a normal variable input, it still needs to be quoted/unquoted.
make_subsets = function(to_filter, dt){
to_filter <- enexpr(to_filter)
dt %>% dt_filter(Species == !!to_filter)
}
I've been debating how to add this to the documentation, as this behavior differs from the tidyverse.
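To make the quote/unquote step concrete, here is a minimal sketch using only rlang and base R. It is not tidytable's internal code: build_filter_call() is a hypothetical helper that captures its arguments with enexpr(), splices them into an expression with !!, and leaves the evaluation to a separate eval() call against iris.

```r
library(rlang)

# Hypothetical helper (not part of tidytable): capture both arguments as
# expressions, then splice them into a comparison with !!
build_filter_call <- function(to_filter, filter_col) {
  to_filter <- enexpr(to_filter)
  filter_col <- enexpr(filter_col)
  expr(!!filter_col == !!to_filter)
}

filter_call <- build_filter_call("setosa", Species)
print(filter_call)
#> Species == "setosa"

# Evaluating the built expression with the data frame as the lookup context
keep_rows <- eval(filter_call, iris)
sum(keep_rows)
#> [1] 50
```

The string "setosa" is inlined as a value and Species stays a symbol, so the finished expression no longer depends on any variable that exists only inside the function.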
from tidytable.
It would also be nice to have an example of dynamically adding a new column with dt_mutate or dynamically filtering. For example:
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))
add_one <- function(.data, add_col, new_col) {
add_col <- enexpr(add_col)
new_col <- enexpr(add_col)
.data %>%
mutate.(!!new_col = !!add_col + 1)
}
df %>%
add_one(x)
...and for filter:
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))
dynamic_filter <- function(.data, to_filter, filter_column) {
to_filter <- enexpr(to_filter)
filter_column <- enexpr(filter_column)
.data %>%
filter.(!!filter_column == !!to_filter)
}
df %>%
dynamic_filter(x, a, z)
Neither of these examples works.
from tidytable.
Even though "to_filter" is a normal variable input, it still needs to be quoted/unquoted.
Is there a way to allow the function to work regardless of whether the user quotes or does not quote the variable? For example:
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))
add_one <- function(.data, add_col) {
add_col <- enexpr(add_col)
.data %>%
mutate.(new_col = !!add_col + 1)
}
df %>% add_one(x) # works
df %>% add_one('x') # fails with code shown above
from tidytable.
I think I'll just build a vignette for these, but for your mutate example this works just like it does in the tidyverse when naming a new column:
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))
add_one <- function(.data, add_col, new_col) {
add_col <- enexpr(add_col)
new_col <- enexpr(new_col)
.data %>%
mutate.(!!new_col := !!add_col + 1)
}
df %>%
add_one(x, stuff)
And your filter example works like so - I think you just had some arguments mixed up:
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))
dynamic_filter <- function(.data, to_filter, filter_column) {
to_filter <- enexpr(to_filter)
filter_column <- enexpr(filter_column)
.data %>%
filter.(!!filter_column == !!to_filter)
}
df %>%
dynamic_filter("a", z)
from tidytable.
Is there a way to allow the function to work regardless of whether the user quotes or does not quote the variable?
This behavior unfortunately can't be changed. This is pretty core to how data.table runs
from tidytable.
Thanks for all of the quick replies and explanations! I'm still trying to figure out the following:
make_subsets = function(to_filter, filter_col, dt){
to_filter = enexpr(to_filter)
filter_col = enexpr(filter_col)
dt_filter(dt, !!filter_col == !!to_filter)
}
iris$Species %>% unique %>% as.list %>%
lapply(make_subsets, filter_col=Species, dt=as.data.table(iris)) # works
c('setosa', 'versicolor', 'virginica') %>% as.list %>%
lapply(make_subsets, filter_col=Species, dt=as.data.table(iris)) # doesn't work
c('setosa', 'versicolor', 'virginica') %>% as.factor %>% as.list %>%
lapply(make_subsets, filter_col=Species, dt=as.data.table(iris)) # works
...but the as.factor %>% as.list %>% step doesn't work for everything:
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))
add_one <- function(add_col, .data) {
add_col <- enexpr(add_col)
.data %>%
mutate.(new_col := !!add_col + 1)
}
c('x') %>% as.factor %>% as.list %>%
lapply(add_one, .data=df) # generates NA's
Never mind, I'm forgetting that it's a factor.
from tidytable.
Thanks for all of the quick replies and explanations!
Yep, no problem. I was actually just working on some R stuff so the timing worked out.
I'm still trying to figure out the following:
So there's a couple tips here:
- lapply() can iterate over a vector, so you can skip using as.list(). lapply() is named that because it returns a list, not because it needs a list input.
- If you're ever doing filtering of a data.frame/data.table, I would recommend working on character columns instead of factors. So I added a conversion step to the example.
make_subsets = function(to_filter, filter_col, dt){
to_filter = enexpr(to_filter)
filter_col = enexpr(filter_col)
dt_filter(dt, !!filter_col == !!to_filter)
}
# Convert factor cols to character cols first
iris_df <- iris %>% mutate_across.(is.factor, as.character)
c('setosa', 'versicolor', 'virginica') %>%
lapply(make_subsets, filter_col=Species, dt = iris_df)
from tidytable.
I would recommend working on character columns instead of factors
Is that due to a speedup when using characters instead of factors, or can the user get unintended results when using factors?
Is there a way to pass an unquoted arg through multiple layers of functions, such as this trivial (but non-functioning) example:
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))
.add_one <- function(.data, add_col) {
add_col <- enexpr(add_col)
.data %>%
mutate.(new_col = !!add_col + 1)
}
add_one <- function(.data, add_col) {
.add_one(.data, add_col)
}
df %>%
add_one(x) # doesn't work
df %>%
.add_one(x) # does work
from tidytable.
Is that due to a speedup when using characters instead of factors, or can the user get unintended results when using factors?
It's because the user can get unintended results. I've never seen the issue, but supposedly it can occur.
Is there a way to pass an unquoted arg through multiple layers of functions, such as this trivial (but non-functioning) example:
Actually this is a great question. This is possible using rlang, but can't be recreated using base R. This example is one of the biggest reasons rlang is such a great package, and is the reason I used it to build tidytable.
The basic idea is you keep quoting/unquoting for the entire chain of functions:
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))
.add_one <- function(.data, add_col) {
add_col <- enexpr(add_col)
.data %>%
mutate.(new_col = !!add_col + 1)
}
add_one <- function(.data, add_col) {
add_col <- enexpr(add_col)
.add_one(.data, !!add_col)
}
df %>%
add_one(x) # now works
Note that your final add_one() function can have a different name for "add_col" and it will still work:
.add_one <- function(.data, add_col) {
add_col <- enexpr(add_col)
.data %>%
mutate.(new_col = !!add_col + 1)
}
add_one <- function(.data, col_to_add) {
col_to_add <- enexpr(col_to_add)
.add_one(.data, !!col_to_add)
}
from tidytable.
Thanks @markfairbanks for all of the help today! I feel that I now understand much more about how to write functions that use tidytable!
from tidytable.
Yep, no problem! If you run into any more questions feel free to reach out.
As for this issue, I'll leave it open until I create a vignette to help with building functions.
from tidytable.
I am having a problem with unquoted args with furrr:
make_subsets = function(to_filter, filter_col, dt){
to_filter = enexpr(to_filter)
filter_col = enexpr(filter_col)
dt_filter(dt, !!filter_col == !!to_filter)
}
# Convert factor cols to character cols first
iris_df <- iris %>% mutate_across.(is.factor, as.character)
#-- works --#
c('setosa', 'versicolor', 'virginica') %>%
purrr::map(make_subsets, filter_col=Species, dt=iris)
#-- doesn't work --#
c('setosa', 'versicolor', 'virginica') %>%
furrr::future_map(make_subsets, filter_col=Species, dt=iris)
The same error occurs when using future.apply::future_lapply.
Any ideas? Maybe I should post something on the furrr or future repo?
from tidytable.
plyr + doParallel also seems to have problems:
make_subsets = function(to_filter, filter_col, dt){
to_filter = enexpr(to_filter)
filter_col = enexpr(filter_col)
dt_filter(dt, !!filter_col == !!to_filter)
}
# Convert factor cols to character cols first
iris_df <- iris %>% mutate_across.(is.factor, as.character)
# -- works -- #
c('setosa', 'versicolor', 'virginica') %>%
plyr::llply(make_subsets, filter_col=Species, dt=iris, .parallel=FALSE)
# -- doesn't work -- #
doParallel::registerDoParallel(2)
c('setosa', 'versicolor', 'virginica') %>%
plyr::llply(make_subsets, filter_col=Species, dt=iris, .parallel=TRUE)
from tidytable.
Also %in% doesn't work for enexpr'd variables, but == does:
make_subsets = function(to_filter, filter_col, dt){
to_filter = enexpr(to_filter)
filter_col = enexpr(filter_col)
print(filter_col == 'Species') # prints TRUE
print(filter_col %in% c('Species', 'Genus')) # throws an error
}
# Convert factor cols to character cols first
iris_df <- iris %>% mutate_across.(is.factor, as.character)
c('setosa', 'versicolor', 'virginica') %>%
lapply(make_subsets, filter_col=Species, dt=iris)
from tidytable.
I am having a problem with unquoted args with furrr
This appears to be a furrr issue. Might be worth opening an issue on their page using the tidyverse (I wouldn't use tidytable because it's less common).
This is the example I would use:
library(tidyverse)
make_subsets = function(to_filter, filter_col, dt){
filter_col <- enexpr(filter_col)
filter(dt, !!filter_col == to_filter)
}
# Convert factor cols to character cols first
iris_df <- iris %>% mutate_if(is.factor, as.character)
#-- works --#
c('setosa', 'versicolor', 'virginica') %>%
purrr::map(make_subsets, filter_col=Species, dt=iris_df)
#-- doesn't work --#
c('setosa', 'versicolor', 'virginica') %>%
furrr::future_map(make_subsets, filter_col=Species, dt=iris_df)
from tidytable.
Also %in% doesn't work for enexpr'd variables, but == does:
The second one fails because you're comparing a symbol Species to a character 'Species'. == works (even though it probably shouldn't), but it's not designed for that comparison.
This will work as intended:
make_subsets = function(to_filter, filter_col, dt){
to_filter = enexpr(to_filter)
filter_col = enexpr(filter_col)
filter_col_char = as.character(filter_col)
print(filter_col_char == 'Species')
print(filter_col_char %in% c('Species', 'Genus'))
}
# Convert factor cols to character cols first
iris_df <- iris %>% mutate_across.(is.factor, as.character)
c('setosa', 'versicolor', 'virginica') %>%
lapply(make_subsets, filter_col=Species, dt=iris)
Note this works as intended:
df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))
dynamic_filter <- function(.data, to_filter, filter_column) {
to_filter <- enexpr(to_filter)
filter_column <- enexpr(filter_column)
.data %>%
filter.(!!filter_column %in% !!to_filter)
}
df %>%
dynamic_filter("a", z)
Since you're executing this expression inside of data.table/tidytable
from tidytable.
In regards to the parallelization issue, I posted it on the dplyr repo, and Hadley Wickham shut it down: tidyverse/dplyr#5135
It seems that tidytable is limited in its ability to be used in functions that are run in parallel.
from tidytable.
Interesting. Well, it is what it is. Normally there are ways to set up your processes so this isn't an issue. For example, this replicates the process you were doing before with make_subsets()/map():
iris %>%
group_split.(Species)
Then you can put each data frame in a parallel operation from there.
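A minimal sketch of what that could look like, assuming a recent tidytable release where the undotted verbs (group_split() instead of group_split.()) are available. The base parallel package, the two-worker cluster, and the use of nrow() as a stand-in for the real per-group work are all illustrative choices, not something from this thread.

```r
library(tidytable)
library(parallel)

# Split once, up front: one data frame per species
species_dfs <- iris %>% group_split(Species)

# Hand each piece to a worker; nrow() is a placeholder for the real work
cl <- makeCluster(2)
row_counts <- parLapply(cl, species_dfs, nrow)
stopCluster(cl)
```

Because the splitting happens before the parallel step, the workers only ever receive plain data and an ordinary function, so no unquoted column names have to cross the process boundary.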
from tidytable.
Your original example now works in the development version. You no longer need to quote/unquote function arguments that are not column names.
library(data.table, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)
make_subsets = function(to_filter, dt) {
dt %>% filter.(Species == to_filter)
}
c('setosa', 'versicolor', 'virginica') %>%
lapply(make_subsets, dt=as.data.table(iris))
from tidytable.
All custom functions can be created using the same syntax that you use in the tidyverse. This blog post also talks about the embracing pattern {{}} that can now be used in tidytable.
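For reference, a short sketch of the embracing pattern applied to the add_one() example from earlier in this thread. It assumes a recent tidytable release where the undotted mutate() is exported; on older versions mutate.() would take its place.

```r
library(tidytable)

# `col` is passed as a bare column name and forwarded with {{ }}
add_one <- function(.df, col) {
  .df %>%
    mutate(new_col = {{ col }} + 1)
}

df <- tidytable(x = c(1, 1, 1), z = c("a", "a", "b"))
result <- add_one(df, x)
```

With {{ }} there is no separate enexpr()/!! step: the capture and the unquote happen in one place.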
from tidytable.
I think I'll just build a vignette for these
Any update on creating a vignette for this? I keep running into issues, especially when providing quoted variables to the function. For instance, neither of these works:
dynamic_mean <- function(dt, num_col, group_col){
dt %>%
summarize.(new_val = mean({{num_col}}, na.rm=TRUE),
.by = c({{group_col}}))
}
iris %>% dynamic_mean('Sepal.Length', 'Species')
dynamic_mean <- function(dt, num_col, group_col){
num_col <- enexpr(num_col)
group_col <- enexpr(group_col)
dt %>%
summarize.(new_val = mean(!!num_col, na.rm=TRUE),
.by = c(!!group_col))
}
iris %>% dynamic_mean('Sepal.Length', 'Species')
...but then I finally realized that I needed:
dynamic_mean <- function(dt, num_col, group_col){
dt %>%
summarize.(new_val = mean(.data[[num_col]], na.rm=TRUE),
.by = c(.data[[group_col]]))
}
iris %>% dynamic_mean('Sepal.Length', 'Species')
I'm using tidytable 0.9.0
from tidytable.
I don't think I'll be adding a vignette for this. Mostly because the tidyverse team has lots of documentation regarding this issue. There are also lots of resources online you can find by googling tidy evaluation.
This is probably a good place to start.
As far as your functions - I think you sort of stumbled upon the rule of thumb for this. There are strings (like "x") and there are symbols (like x). When programming with dplyr you almost always use symbols (e.g. mean(x) or x + y). To pass those in a function you use {{ symbol }} or use the enquo() & !! workflow. But if you want to use a string you can use .data[[string]].
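Side by side, the two cases might look like the sketch below. The helper names mean_by_symbol() and mean_by_string() are made up for illustration, rlang is loaded explicitly for enquo(), and the undotted summarize() assumes a recent tidytable release.

```r
library(rlang)
library(tidytable)

# Symbol input: capture with enquo(), unquote with !!
mean_by_symbol <- function(.df, num_col) {
  num_col <- enquo(num_col)
  .df %>%
    summarize(new_val = mean(!!num_col, na.rm = TRUE))
}

# String input: look the column up through the .data pronoun
mean_by_string <- function(.df, num_col) {
  .df %>%
    summarize(new_val = mean(.data[[num_col]], na.rm = TRUE))
}

df <- tidytable(x = c("a", "a", "b"), z = 1:3)
by_symbol <- mean_by_symbol(df, z)    # bare column name
by_string <- mean_by_string(df, "z")  # column name as a string
```

Both calls should return the same one-row result; the only difference is whether the caller writes z or "z".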
Also if you ever have questions feel free to open a discussion here and I can help out.
from tidytable.