
Comments (8)

markfairbanks commented on July 19, 2024

Hmm interesting. I'll take a look and see what I can figure out.

from tidytable.

markfairbanks commented on July 19, 2024

Unfortunately, I think the example you gave might end up being a limitation of tidytable. In isolation, using quosures fixes the issue, but quosures don't work with data.table's "by" parameter.

I created a basic shell of how the function would operate. You'll see that while it fixes your example, it causes group aggregation to fail:

library(tidyverse, warn.conflicts = FALSE)
library(rlang, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)

new_mutate. <- function(.df, ..., by = NULL) {

  dots <- enquos(...)
  by <- enquo(by)
  by <- tidytable:::vec_selector_by(.df, !!by)
  .df <- copy(.df)
  
  dt_quo <- quo(.df[ , ':='(!!!dots), by = !!by][])

  eval_tidy(dt_quo, .df)
}

data <- tibble(
    id = seq(1, 3),
    val_1 = seq(1, 3, 1),
    val_2 = seq(4, 6, 1)
)

# Now works
data %>%
  nest_by.(id) %>%
  new_mutate.(example_1 = map2.(data, id, ~ .x %>%
                              new_mutate.(id = .y))) %>%
  unnest.(example_1)
#>    id val_1 val_2 id1
#> 1:  1     1     4   1
#> 2:  2     2     5   2
#> 3:  3     3     6   3

# Fails when using "by" parameter
data.table(x = 1:3, y = c("a", "a", "b")) %>%
  new_mutate.(double_x = x * 2,
              avg_x = mean(x),
              by = y)
#> Error: Base operators are not defined for quosures.
#> Do you need to unquote the quosure?
#> 
#>   # Bad:
#>   myquosure == rhs
#> 
#>   # Good:
#>   !!myquosure == rhs

There's an addition to data.table coming that looks like it might help fix the issue, but we'll see.

Rdatatable/data.table#4304

In the meantime I will try to make a workaround, but unfortunately it won't work as smoothly as the tidyverse implementation.
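
To illustrate the quosure limitation described above, here is a minimal sketch using only rlang and base R, not tidytable's internals. The column name x and the list standing in for a data frame are assumptions made for the example. It shows that eval_tidy() can evaluate a quosure, that base operators applied to a quosure object raise an error, and that stripping the quosure down to a bare expression gives something base evaluation can handle.

```r
library(rlang)

# A quosure bundles an expression with the environment it was created in
x_quo <- quo(x * 2)

# eval_tidy() understands quosures and evaluates them against a data mask
tidy_result <- eval_tidy(x_quo, data = list(x = 1:3))

# Base operators are not defined for quosure objects, so this errors;
# tryCatch() captures the message instead of stopping the script
ops_error <- tryCatch(x_quo * 2, error = function(e) conditionMessage(e))

# Dropping the environment leaves a plain call that base eval() accepts
x_expr <- quo_get_expr(x_quo)
base_result <- eval(x_expr, list(x = 1:3))
```

The trade-off is that the bare expression no longer carries its environment, which is presumably why the quosure version helps with the nested map2.() example while the plain-expression version does not.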


markfairbanks commented on July 19, 2024

Quick question: have you run into any other situations where this limitation has come up?

One solution for your case would be for me to implement a .keep argument in nest_by.(). That way the columns used for nesting would be "kept" in the list column.

library(rlang, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)

new_nest_by. <- function(.data, ..., .key = "data", .keep = FALSE) {

  .data <- as_tidytable(.data)

  dots <- enexprs(...)

  if (length(dots) == 0) {

    .data <- tidytable:::eval_expr(.data[, list(data = list(.SD))])

  } else {
    
    split_vars <- tidytable:::dots_selector(.data, ...)
    
    split_list <- group_split.(.data, !!!split_vars, .keep = .keep)

    .data <- distinct.(.data, !!!split_vars)
    
    .data <- mutate.(.data, data = !!split_list)
  }

  if (.key != "data") .data <- rename.(.data, !!.key := data)

  .data
  
}

test_df <- data.table(
  id = LETTERS[seq(1, 3)],
  val_1 = seq(1, 3, 1),
  val_2 = seq(4, 6, 1)
)

test_df %>%
  new_nest_by.(id, .keep = TRUE) %>%
  unnest.(data)
#>    id id1 val_1 val_2
#> 1:  A   A     1     4
#> 2:  B   B     2     5
#> 3:  C   C     3     6


Ari04T commented on July 19, 2024

I haven't had this issue in other situations, but I basically have the same workflow for many of the things that I do with tidytable.

Regarding the solution, the code that I posted is a bit of a toy example.

mutate.(example_1 = map2.(data, id, ~.x %>% 
                                  mutate.(id = .y)))

In reality, I do a couple (or sometimes more) operations inside the map function. I kept the id value inside the map2 function because, if I remember correctly, I did a left_join inside, and the id was necessary to perform that operation correctly.
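
As a rough sketch of that kind of workflow, here is a version written with plain data.table rather than tidytable, so that it does not depend on a particular tidytable version. The lookup table and its label column are hypothetical and only stand in for whatever was joined in the real code. The point is that each piece can only be joined on id if the id column is still present inside it.

```r
library(data.table)

dt <- data.table(
  id = 1:3,
  val_1 = c(1, 2, 3),
  val_2 = c(4, 5, 6)
)

# Hypothetical lookup table, not from the original thread
lookup <- data.table(id = 1:3, label = c("a", "b", "c"))

# keep.by = TRUE leaves the id column inside every piece
pieces <- split(dt, by = "id", keep.by = TRUE)

# Each piece can be left-joined on id because the column is still there
joined <- lapply(pieces, function(piece) {
  merge(piece, lookup, by = "id", all.x = TRUE)
})

# Stack the joined pieces back into one table
result <- rbindlist(joined)
```

With keep.by = FALSE the pieces would lose the id column and the merge on id would fail, which is the same gap a .keep argument in nest_by.() is meant to close.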


markfairbanks commented on July 19, 2024

Ah gotcha, makes sense.

Well for this specific use case I did add a .keep arg to the dev version of nest_by.(). That's part of the nest_by() function coming in dplyr 1.0, so it was needed functionality.

As far as the issue overall, thanks for pointing it out. I'll keep working on it and see what I can come up with.


markfairbanks commented on July 19, 2024

Your examples now work in the GitHub version of tidytable. I should have it on CRAN in a week or so.

Thanks again for finding this!

library(dplyr, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)
    
data <- tibble(
  id = LETTERS[seq(1, 3)],
  val_1 = seq(1, 3, 1),
  val_2 = seq(4, 6, 1)
)

data %>% 
  nest_by.(id) %>% 
  mutate.(example_1 = map2.(data, id, ~.x %>% 
                              mutate.(id = .y))) %>% 
  unnest.(example_1)
#>    id val_1 val_2 id1
#> 1:  A     1     4   A
#> 2:  B     2     5   B
#> 3:  C     3     6   C


markfairbanks commented on July 19, 2024

Unfortunately, I'm going to have to reopen this one. I found pretty massive performance issues in how I was getting quosures to work with data.table. Somehow the speed tests I ran completely missed these issues. The dev version of tidytable (v0.5.2) had to revert these changes.

I'll keep working on this! Apologies ahead of time for the inconvenience.


markfairbanks commented on July 19, 2024

This is all fixed in v0.5.3 of tidytable, now up on CRAN. It also passes a much wider variety of speed tests 🎉

library(tidytable, warn.conflicts = FALSE)

test_df <- data.table(
  id = seq(1, 3),
  val_1 = seq(1, 3, 1),
  val_2 = seq(4, 6, 1)
)

test_df %>%
  nest_by.(id) %>%
  mutate.(example_1 = map2.(data, id, ~ .x %>%
                              mutate.(id = .y))) %>%
  unnest.(example_1)
#>    id val_1 val_2 id1
#> 1:  1     1     4   1
#> 2:  2     2     5   2
#> 3:  3     3     6   3

