Comments (25)

markfairbanks commented on July 19, 2024

Also worth noting - when you originally asked this question a couple years ago there were some oddities to how tidytable worked with quosures. Those are now fixed. So you should be able to use tidy evaluation just like you do with dplyr.

from tidytable.

markfairbanks commented on July 19, 2024

Another tip - in a "column selection" context (like in select() or using .by) you can pass vectors of strings with the all_of() helper.

pacman::p_load(tidytable)

dynamic_mean <- function(dt, num_col, group_cols){
  dt %>%
    summarize.(new_val = mean(.data[[num_col]], na.rm=TRUE),
               .by = all_of(group_cols))
}

df <- tidytable(x = c("a", "a", "b"), y = c("a", "a", "b"), z = 1:3)

df %>% dynamic_mean("z", c("x", "y"))
#> # A tidytable: 2 × 3
#>   x     y     new_val
#>   <chr> <chr>   <dbl>
#> 1 a     a         1.5
#> 2 b     b         3

nick-youngblut commented on July 19, 2024

To be a bit more specific, it would help to know why this works:

iris$Species %>% unique %>% as.list %>% lapply(make_subsets, dt=as.data.table(iris))

...and this doesn't:

c('setosa', 'versicolor', 'virginica') %>% as.list %>% lapply(make_subsets, dt=as.data.table(iris))

...with the make_subsets() function shown above, and how to deal with the latter.

markfairbanks commented on July 19, 2024

Ah yep, thanks for opening this issue. I actually just discovered this the other day.

This is a side effect of how tidytable uses eval(expr(dt_expression)) at the end of every function to build the data.table expression.

Even though "to_filter" is a normal variable input, it still needs to be quoted/unquoted.

make_subsets = function(to_filter, dt){
  to_filter <- enexpr(to_filter)
  
  dt %>% dt_filter(Species == !!to_filter)
}

I've been debating how to add this to the documentation, since this behavior differs from the tidyverse.
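
To make the quote/unquote step concrete, here is a minimal sketch using only rlang. It does not reproduce tidytable's internals; build_filter_expr() is a hypothetical helper that just shows what enexpr() captures and what !! splices into the expression before it would be evaluated.

```r
library(rlang)

# Hypothetical helper: builds (but does not evaluate) a filter expression
build_filter_expr <- function(to_filter) {
  to_filter <- enexpr(to_filter)   # capture the argument as the caller supplied it
  expr(Species == !!to_filter)     # splice the captured value into a new expression
}

filter_expr <- build_filter_expr("setosa")
print(filter_expr)
```

The resulting expression contains the literal value rather than the name to_filter, which is why it can be evaluated later without needing access to the function's local variables.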

nick-youngblut commented on July 19, 2024

It would also be nice to have an example of dynamically adding a new column with dt_mutate or dynamically filtering. For example:

df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

add_one <- function(.data, add_col, new_col) {
  add_col <- enexpr(add_col)
  new_col <- enexpr(add_col)

  .data %>%
    mutate.(!!new_col = !!add_col + 1)
}

df %>%
  add_one(x)

...and for filter:

df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

dynamic_filter <- function(.data, to_filter, filter_column) {
  to_filter <- enexpr(to_filter)
  filter_column <- enexpr(filter_column)

  .data %>%
    filter.(!!filter_column == !!to_filter)
}

df %>%
  dynamic_filter(x, a, z)

Neither of these examples works.

nick-youngblut commented on July 19, 2024

Even though "to_filter" is a normal variable input, it still needs to be quoted/unquoted.

Is there a way to allow the function to work regardless of whether the user quotes or does not quote the variable? For example:

df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

add_one <- function(.data, add_col) {
  add_col <- enexpr(add_col)
  .data %>%
    mutate.(new_col = !!add_col + 1)
}

df %>%  add_one(x)   # works
df %>%  add_one('x')   # fails with code shown above

markfairbanks commented on July 19, 2024

I think I'll just build a vignette for these, but for your mutate example this works just like it does in the tidyverse when naming a new column:

df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

add_one <- function(.data, add_col, new_col) {
  add_col <- enexpr(add_col)
  new_col <- enexpr(new_col)

  .data %>%
    mutate.(!!new_col := !!add_col + 1)
}

df %>%
  add_one(x, stuff)

And your filter example works like so - I think you just had some arguments mixed up:

df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

dynamic_filter <- function(.data, to_filter, filter_column) {
  to_filter <- enexpr(to_filter)
  filter_column <- enexpr(filter_column)

  .data %>%
    filter.(!!filter_column == !!to_filter)
}

df %>%
  dynamic_filter("a", z)

markfairbanks commented on July 19, 2024

Is there a way to allow the function to work regardless of whether the user quotes or does not quote the variable?

This behavior unfortunately can't be changed. This is pretty core to how data.table runs.

nick-youngblut commented on July 19, 2024

Thanks for all of the quick replies and explanations! I'm still trying to figure out the following:

make_subsets = function(to_filter, filter_col, dt){
    to_filter = enexpr(to_filter)
    filter_col = enexpr(filter_col)
    dt_filter(dt, !!filter_col == !!to_filter)
}
iris$Species %>% unique %>% as.list %>% 
    lapply(make_subsets, filter_col=Species, dt=as.data.table(iris))   # works

c('setosa', 'versicolor', 'virginica') %>% as.list %>% 
    lapply(make_subsets, filter_col=Species, dt=as.data.table(iris))   # doesn't work

c('setosa', 'versicolor', 'virginica') %>% as.factor %>% as.list %>% 
    lapply(make_subsets, filter_col=Species, dt=as.data.table(iris))   # works

...but the as.factor %>% as.list %>% doesn't work for everything:

df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

add_one <- function(add_col, .data) {
  add_col <- enexpr(add_col)

  .data %>%
    mutate.(new_col := !!add_col + 1)
}

c('x') %>% as.factor %>% as.list %>%
    lapply(add_one, .data=df)  # generates NA's

Never mind, I forgot that it's a factor.

markfairbanks commented on July 19, 2024

Thanks for all of the quick replies and explanations!

Yep, no problem. I was actually just working on some R stuff so the timing worked out.

I'm still trying to figure out the following:

There are a couple of tips here:

  1. lapply() can iterate over a vector, so you can skip as.list(). The "l" in lapply() refers to the list it returns, not to a required list input.
  2. If you're ever filtering a data.frame/data.table, I would recommend working on character columns instead of factors, so I added a conversion step to the example.

make_subsets = function(to_filter, filter_col, dt){
    to_filter = enexpr(to_filter)
    filter_col = enexpr(filter_col)
    dt_filter(dt, !!filter_col == !!to_filter)
}

# Convert factor cols to character cols first
iris_df <- iris %>% mutate_across.(is.factor, as.character)

c('setosa', 'versicolor', 'virginica') %>%
    lapply(make_subsets, filter_col=Species, dt = iris_df)
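
As a quick check of the first tip, this base R sketch compares lapply() over a character vector with lapply() over the same values wrapped in as.list(). The variable names are only for illustration.

```r
species <- c("setosa", "versicolor", "virginica")

# lapply() accepts an atomic vector directly and still returns a list
from_vector <- lapply(species, toupper)

# Wrapping the input in as.list() first gives the same result
from_list <- lapply(as.list(species), toupper)

print(identical(from_vector, from_list))
```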

nick-youngblut commented on July 19, 2024

I would recommend working on character columns instead of factors

Is that due to a speedup when using characters instead of factors, or can the user get unintended results when using factors?

Is there a way to pass an unquoted arg through multiple layers of functions, such as this trivial (but non-functioning) example:

df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

.add_one <- function(.data, add_col) {
  add_col <- enexpr(add_col)
  .data %>%
    mutate.(new_col = !!add_col + 1)
}

add_one <- function(.data, add_col) {
  .add_one(.data, add_col)
}

df %>%
  add_one(x)  # doesn't work

df %>%
  .add_one(x)  # does work

markfairbanks commented on July 19, 2024

Is that due to a speedup when using characters instead of factors, or can the user get unintended results when using factors?

It's because the user can get unintended results. I've never seen the issue, but supposedly it can occur.

Is there a way to pass an unquoted arg through multiple layers of functions, such as this trivial (but non-functioning) example:

Actually this is a great question. This is possible using rlang, but can't be recreated using base R. This example is one of the biggest reasons rlang is such a great package, and is the reason I used it to build tidytable.

The basic idea is you keep quoting/unquoting for the entire chain of functions:

df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

.add_one <- function(.data, add_col) {
  add_col <- enexpr(add_col)
  .data %>%
    mutate.(new_col = !!add_col + 1)
}

add_one <- function(.data, add_col) {
  add_col <- enexpr(add_col)
  
  .add_one(.data, !!add_col)
}

df %>%
  add_one(x)  # now works

Note that your final add_one() function can have a different name for "add_col" and it will still work:

.add_one <- function(.data, add_col) {
  add_col <- enexpr(add_col)
  .data %>%
    mutate.(new_col = !!add_col + 1)
}

add_one <- function(.data, col_to_add) {
  col_to_add <- enexpr(col_to_add)
  
  .add_one(.data, !!col_to_add)
}
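
The same quote/unquote chain can be sketched without tidytable, using rlang::eval_tidy() on a plain data frame as a stand-in for mutate.(). The function names add_one_inner() and add_one_outer() are hypothetical and exist only for this illustration.

```r
library(rlang)

# Inner layer: captures the column and evaluates `col + 1` against the data
add_one_inner <- function(.df, add_col) {
  add_col <- enexpr(add_col)
  eval_tidy(expr(!!add_col + 1), data = .df)
}

# Outer layer: captures its own argument, then unquotes it when calling inward
add_one_outer <- function(.df, col_to_add) {
  col_to_add <- enexpr(col_to_add)
  add_one_inner(.df, !!col_to_add)
}

df <- data.frame(x = c(1, 2, 3))
result <- add_one_outer(df, x)
print(result)
```

Without the enexpr()/!! pair in the outer function, the inner function would capture the name col_to_add itself rather than the column the caller typed.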

nick-youngblut commented on July 19, 2024

Thanks @markfairbanks for all of the help today! I feel that I now understand much more about how to write functions that use tidytable!

markfairbanks commented on July 19, 2024

Yep, no problem! If you run into any more questions feel free to reach out.

As for this issue, I'll leave it open until I create a vignette to help with building functions.

nick-youngblut commented on July 19, 2024

I am having a problem with unquoted args with furrr:

make_subsets = function(to_filter, filter_col, dt){
    to_filter = enexpr(to_filter)
    filter_col = enexpr(filter_col)
    dt_filter(dt, !!filter_col == !!to_filter)
}

# Convert factor cols to character cols first
iris_df <- iris %>% mutate_across.(is.factor, as.character)

#-- works --#
c('setosa', 'versicolor', 'virginica') %>%
    purrr::map(make_subsets, filter_col=Species, dt=iris)

#-- doesn't work --#
c('setosa', 'versicolor', 'virginica') %>%
    furrr::future_map(make_subsets, filter_col=Species, dt=iris)

The same error occurs when using future.apply::future_lapply

Any ideas? Maybe I should post something on the furrr or future repo?

nick-youngblut commented on July 19, 2024

plyr + doParallel also seems to have problems:

make_subsets = function(to_filter, filter_col, dt){
    to_filter = enexpr(to_filter)
    filter_col = enexpr(filter_col)
    dt_filter(dt, !!filter_col == !!to_filter)
}

# Convert factor cols to character cols first
iris_df <- iris %>% mutate_across.(is.factor, as.character)

# -- works -- #
c('setosa', 'versicolor', 'virginica') %>%
   plyr::llply(make_subsets, filter_col=Species, dt=iris, .parallel=FALSE)

# -- doesn't work -- #
doParallel::registerDoParallel(2)
c('setosa', 'versicolor', 'virginica') %>%
   plyr::llply(make_subsets, filter_col=Species, dt=iris, .parallel=TRUE)

nick-youngblut commented on July 19, 2024

Also %in% doesn't work for enexpr'd variables, but == does:

make_subsets = function(to_filter, filter_col, dt){
    to_filter = enexpr(to_filter)
    filter_col = enexpr(filter_col)
    print(filter_col == 'Species')                           # prints TRUE
    print(filter_col %in% c('Species', 'Genus'))    # throws an error
}

# Convert factor cols to character cols first
iris_df <- iris %>% mutate_across.(is.factor, as.character)

c('setosa', 'versicolor', 'virginica') %>%
   lapply(make_subsets, filter_col=Species, dt=iris)

markfairbanks commented on July 19, 2024

I am having a problem with unquoted args with furrr

This appears to be a furrr issue. Might be worth opening an issue on their page using the tidyverse (I wouldn't use tidytable because it's less common).

This is the example I would use:

library(tidyverse)

make_subsets = function(to_filter, filter_col, dt){
    filter_col <- enexpr(filter_col)
    
    filter(dt, !!filter_col == to_filter)
}

# Convert factor cols to character cols first
iris_df <- iris %>% mutate_if(is.factor, as.character)

#-- works --#
c('setosa', 'versicolor', 'virginica') %>%
    purrr::map(make_subsets, filter_col=Species, dt=iris_df)

#-- doesn't work --#
c('setosa', 'versicolor', 'virginica') %>%
    furrr::future_map(make_subsets, filter_col=Species, dt=iris_df)

markfairbanks commented on July 19, 2024

Also %in% doesn't work for enexpr'd variables, but == does:

The second one fails because you're comparing a symbol Species to a character 'Species'. == works (even though it probably shouldn't), but it's not designed for that comparison.

This will work as intended:

make_subsets = function(to_filter, filter_col, dt){
    to_filter = enexpr(to_filter)
    filter_col = enexpr(filter_col)
    
    filter_col_char = as.character(filter_col)

    print(filter_col_char == 'Species')
    print(filter_col_char %in% c('Species', 'Genus'))
}

# Convert factor cols to character cols first
iris_df <- iris %>% mutate_across.(is.factor, as.character)

c('setosa', 'versicolor', 'virginica') %>%
   lapply(make_subsets, filter_col=Species, dt=iris_df)
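
The symbol-versus-string difference can also be seen in isolation with base R. In this sketch, quote(Species) stands in for what enexpr() returns when a bare column name is passed, and tryCatch() is used only so the failing %in% call does not stop the script.

```r
col_sym <- quote(Species)          # a symbol, not a character vector
col_chr <- as.character(col_sym)   # the string "Species"

# Character-to-character comparison behaves as expected
print(col_chr %in% c("Species", "Genus"))

# %in% on the bare symbol errors, because match() requires vector arguments
in_result <- tryCatch(
  col_sym %in% c("Species", "Genus"),
  error = function(e) "error"
)
print(in_result)
```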

Note this works as intended:

df <- data.table(x = c(1,1,1), y = c(1,1,1), z = c("a","a","b"))

dynamic_filter <- function(.data, to_filter, filter_column) {
  to_filter <- enexpr(to_filter)
  filter_column <- enexpr(filter_column)

  .data %>%
    filter.(!!filter_column %in% !!to_filter)
}

df %>%
  dynamic_filter("a", z)

That works because the expression is executed inside of data.table/tidytable, where filter_column is evaluated as the column itself rather than as a bare symbol.

nick-youngblut commented on July 19, 2024

Regarding the parallelization issue, I posted it on the dplyr repo, and Hadley Wickham shut it down: tidyverse/dplyr#5135

It seems that tidytable is limited in its ability to be used in functions that are run in parallel.

markfairbanks commented on July 19, 2024

Interesting. Well, it is what it is. Normally there are ways to set up your processes so this isn't an issue. For example, this replicates what you were doing before with make_subsets()/map():

iris %>%
  group_split.(Species)

Then you can put each data frame in a parallel operation from there.
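
A minimal sketch of that split-then-parallelize pattern, assuming base R's split() as a stand-in for group_split.() and parallel::mclapply() as the parallel step. The per-group mean is just a placeholder computation.

```r
library(parallel)

# Base R stand-in for group_split.(): one data frame per species
pieces <- split(iris, iris$Species)

# mc.cores = 1L runs sequentially on every platform;
# raise it on Unix-like systems to fork worker processes
means <- mclapply(
  pieces,
  function(d) mean(d$Sepal.Length),
  mc.cores = 1L
)
print(names(means))
```

Because each worker receives a complete data frame and a plain function, no unquoted column names have to cross the process boundary.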

markfairbanks commented on July 19, 2024

Your original example now works in the development version. You no longer need to quote/unquote function arguments that are not column names.

library(data.table, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)

make_subsets = function(to_filter, dt) {
    dt %>% filter.(Species == to_filter)
}

c('setosa', 'versicolor', 'virginica') %>%
  lapply(make_subsets, dt=as.data.table(iris))

markfairbanks commented on July 19, 2024

All custom functions can be created using the same syntax that you use in the tidyverse. This blog post also talks about the embracing pattern {{}} that can now be used in tidytable.

nick-youngblut commented on July 19, 2024

I think I'll just build a vignette for these

Any update on creating a vignette for this? I keep running into issues, especially when providing quoted variables to the function. For instance, neither of these works:

dynamic_mean <- function(dt, num_col, group_col){
  dt %>%
    summarize.(new_val = mean({{num_col}}, na.rm=TRUE),
               .by = c({{group_col}}))
}

iris %>% dynamic_mean('Sepal.Length', 'Species')

dynamic_mean <- function(dt, num_col, group_col){
  num_col <- enexpr(num_col)
  group_col  <- enexpr(group_col)
  dt %>%
    summarize.(new_val = mean(!!num_col, na.rm=TRUE),
               .by = c(!!group_col))
}

iris %>% dynamic_mean('Sepal.Length', 'Species')

...but then I finally realized that I needed:

dynamic_mean <- function(dt, num_col, group_col){
  dt %>%
    summarize.(new_val = mean(.data[[num_col]], na.rm=TRUE),
               .by = c(.data[[group_col]]))
}

iris %>% dynamic_mean('Sepal.Length', 'Species')

I'm using tidytable 0.9.0

markfairbanks commented on July 19, 2024

I don't think I'll be adding a vignette for this, mostly because the tidyverse team has lots of documentation on this topic. There are also lots of resources online that you can find by googling tidy evaluation.

This is probably a good place to start.

As far as your functions - I think you sort of stumbled upon the rule of thumb for this. There are strings (like "x") and there are symbols (like x). When programming with dplyr you almost always use symbols (e.g. mean(x) or x + y). To pass those in a function you use {{ symbol }} or use the enquo() & !! workflow. But if you want to use a string you can use .data[[string]].
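
The rule of thumb can be sketched with rlang alone, using eval_tidy() on a plain data frame in place of a tidytable or dplyr verb. mean_by_symbol() and mean_by_string() are hypothetical helpers written for this illustration.

```r
library(rlang)

# Symbol input: embrace the argument with {{ }}
mean_by_symbol <- function(df, col) {
  eval_tidy(quo(mean({{ col }})), data = df)
}

# String input: look the column up through the .data pronoun
mean_by_string <- function(df, col) {
  eval_tidy(quo(mean(.data[[col]])), data = df)
}

df <- data.frame(x = c(1, 2, 3))
print(mean_by_symbol(df, x))
print(mean_by_string(df, "x"))
```

Both calls compute the mean of the same column; the only difference is whether the caller supplies a bare name or a string.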

Also if you ever have questions feel free to open a discussion here and I can help out.
