Complicated title, but fairly simple explanation. When I try to access the .y value …

Can't access the .y value from map2. when using mutate. inside map2. (tidytable issue, closed)

markfairbanks commented on July 19, 2024

Can't access the .y value from map2. when using mutate. inside map2.

from tidytable.

Comments (8)

markfairbanks commented on July 19, 2024

Hmm interesting. I'll take a look and see what I can figure out.

markfairbanks commented on July 19, 2024

Unfortunately, I think the example you gave might just end up being a limitation of tidytable. In a vacuum, using quosures fixes the issue, but quosures don't work with data.table when using its "by" parameter.

I created a basic shell of how the function would operate. You'll see that while it fixes your example, it causes group aggregation to fail:

library(tidyverse, warn.conflicts = FALSE)
library(rlang, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)

new_mutate. <- function(.df, ..., by = NULL) {

  dots <- enquos(...)
  by <- enquo(by)
  by <- tidytable:::vec_selector_by(.df, !!by)
  .df <- copy(.df)
  
  dt_quo <- quo(.df[ , ':='(!!!dots), by = !!by][])

  eval_tidy(dt_quo, .df)
}

data <- tibble(
    id = seq(1, 3),
    val_1 = seq(1, 3, 1),
    val_2 = seq(4, 6, 1)
)

# Now works
data %>%
  nest_by.(id) %>%
  new_mutate.(example_1 = map2.(data, id, ~ .x %>%
                              new_mutate.(id = .y))) %>%
  unnest.(example_1)
#>    id val_1 val_2 id1
#> 1:  1     1     4   1
#> 2:  2     2     5   2
#> 3:  3     3     6   3

# Fails when using "by" parameter
data.table(x = 1:3, y = c("a", "a", "b")) %>%
  new_mutate.(double_x = x * 2,
              avg_x = mean(x),
              by = y)
#> Error: Base operators are not defined for quosures.
#> Do you need to unquote the quosure?
#> 
#>   # Bad:
#>   myquosure == rhs
#> 
#>   # Good:
#>   !!myquosure == rhs

There's an addition to data.table coming that looks like it might help fix the issue, but we'll see.

Rdatatable/data.table#4304

In the meantime I will try to make a workaround, but unfortunately it won't work as smoothly as the tidyverse implementation.

markfairbanks commented on July 19, 2024

Quick question: are there any other situations where you have run into this limitation?

One solution for your case would be for me to implement a .keep argument in nest_by.(). That way the columns used for nesting would be "kept" in the list column.

library(rlang, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)

new_nest_by. <- function(.data, ..., .key = "data", .keep = FALSE) {

  .data <- as_tidytable(.data)
  .keep <- .keep

  dots <- enexprs(...)

  if (length(dots) == 0) {

    .data <- tidytable:::eval_expr(.data[, list(data = list(.SD))])

  } else {
    
    split_vars <- tidytable:::dots_selector(.data, ...)
    
    split_list <- group_split.(.data, !!!split_vars, .keep = .keep)

    .data <- distinct.(.data, !!!split_vars)
    
    .data <- mutate.(.data, data = !!split_list)
  }

  if (.key != "data") .data <- rename.(.data, !!.key := data)

  .data
  
}

test_df <- data.table(
  id = LETTERS[seq(1, 3)],
  val_1 = seq(1, 3, 1),
  val_2 = seq(4, 6, 1)
)

test_df %>%
  new_nest_by.(id, .keep = TRUE) %>%
  unnest.(data)
#>    id id1 val_1 val_2
#> 1:  A   A     1     4
#> 2:  B   B     2     5
#> 3:  C   C     3     6
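
For comparison, plain data.table already exposes the same idea through the `keep.by` argument of its `split()` method, which controls whether the grouping columns stay inside each piece. This only illustrates the concept with data.table directly; it is not how tidytable implements `.keep`.

```r
library(data.table)

test_df <- data.table(
  id = LETTERS[1:3],
  val_1 = c(1, 2, 3),
  val_2 = c(4, 5, 6)
)

# keep.by = TRUE: each piece still carries the grouping column `id`
pieces_keep <- split(test_df, by = "id", keep.by = TRUE)

# keep.by = FALSE: the grouping column is dropped from each piece
pieces_drop <- split(test_df, by = "id", keep.by = FALSE)

names(pieces_keep)
#> [1] "A" "B" "C"
names(pieces_keep[["A"]])
#> [1] "id"    "val_1" "val_2"
names(pieces_drop[["A"]])
#> [1] "val_1" "val_2"
```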

Ari04T commented on July 19, 2024

I haven't had this issue in other situations, but I use basically the same workflow for many of the things I do with tidytable.

Regarding the solution, the code that I posted is a bit of a toy example.

mutate.(example_1 = map2.(data, id, ~.x %>% 
                                  mutate.(id = .y)))

In reality, I do a couple of operations (sometimes more) inside the map function. I kept the id value inside the map2 function because, if I remember correctly, I did a left_join inside it, and the id was needed to perform that join correctly.
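
A hypothetical version of that workflow shows why `.y` has to be reachable: the id restored from `.y` is the key for a `left_join()` inside the lambda. The sketch is written with dplyr, tidyr and purrr instead of tidytable, and the `lookup` table with its `label` column is made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)
library(purrr)

data <- tibble(
  id = 1:3,
  val_1 = c(1, 2, 3),
  val_2 = c(4, 5, 6)
)

# Hypothetical lookup table keyed by `id`
lookup <- tibble(id = 1:3, label = c("a", "b", "c"))

result <- data %>%
  nest(data = -id) %>%                    # `id` stays outside the nested tibbles
  mutate(example_1 = map2(data, id, ~ .x %>%
    mutate(id = .y) %>%                   # restore the key from .y
    left_join(lookup, by = "id"))) %>%    # the join needs that key
  select(example_1) %>%                   # drop outer columns to avoid a duplicate `id`
  unnest(example_1)

result
```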

markfairbanks commented on July 19, 2024

Ah gotcha, makes sense.

Well for this specific use case I did add a .keep arg to the dev version of nest_by.(). That's part of the nest_by() function coming in dplyr 1.0, so it was needed functionality.

As far as the issue overall, thanks for pointing it out. I'll keep working on it and see what I can come up with.

markfairbanks commented on July 19, 2024

Your examples now work in the GitHub version of tidytable. I should have it on CRAN in a week or so.

Thanks again for finding this!

library(dplyr, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)
    
data <- tibble(
  id = LETTERS[seq(1, 3)],
  val_1 = seq(1, 3, 1),
  val_2 = seq(4, 6, 1)
)

data %>% 
  nest_by.(id) %>% 
  mutate.(example_1 = map2.(data, id, ~.x %>% 
                              mutate.(id = .y))) %>% 
  unnest.(example_1)
#>    id val_1 val_2 id1
#> 1:  A     1     4   A
#> 2:  B     2     5   B
#> 3:  C     3     6   C

markfairbanks commented on July 19, 2024

Unfortunately, I'm going to have to reopen this one. I found pretty massive performance issues caused by how I was getting quosures to work with data.table. Somehow the speed tests I ran completely missed these issues. The dev version of tidytable (v0.5.2) had to revert these changes.

I'll keep working on this! Apologies ahead of time for the inconvenience.

markfairbanks commented on July 19, 2024

This is all fixed in v0.5.3 of tidytable now up on CRAN. Also passes a much wider variety of speed tests 🎉

library(tidytable, warn.conflicts = FALSE)

test_df <- data.table(
  id = seq(1, 3),
  val_1 = seq(1, 3, 1),
  val_2 = seq(4, 6, 1)
)

test_df %>%
  nest_by.(id) %>%
  mutate.(example_1 = map2.(data, id, ~ .x %>%
                              mutate.(id = .y))) %>%
  unnest.(example_1)
#>    id val_1 val_2 id1
#> 1:  1     1     4   1
#> 2:  2     2     5   2
#> 3:  3     3     6   3
