Comments (8)
Hmm interesting. I'll take a look and see what I can figure out.
from tidytable.
Unfortunately I think the example you gave might just end up being a limitation of tidytable. In isolation, using quosures fixes the issue, but quosures don't work with data.table when using its "by" parameter.
I created a basic shell of how the function would operate. You'll see that while it fixes your example, it causes group aggregation to fail:
library(tidyverse, warn.conflicts = FALSE)
library(rlang, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)
new_mutate. <- function(.df, ..., by = NULL) {
  dots <- enquos(...)
  by <- enquo(by)
  by <- tidytable:::vec_selector_by(.df, !!by)
  .df <- copy(.df)
  dt_quo <- quo(.df[ , ':='(!!!dots), by = !!by][])
  eval_tidy(dt_quo, .df)
}
data <- tibble(
  id = seq(1, 3),
  val_1 = seq(1, 3, 1),
  val_2 = seq(4, 6, 1)
)
# Now works
data %>%
  nest_by.(id) %>%
  new_mutate.(example_1 = map2.(data, id, ~ .x %>%
                                  new_mutate.(id = .y))) %>%
  unnest.(example_1)
#>    id val_1 val_2 id1
#> 1:  1     1     4   1
#> 2:  2     2     5   2
#> 3:  3     3     6   3
# Fails when using "by" parameter
data.table(x = 1:3, y = c("a", "a", "b")) %>%
  new_mutate.(double_x = x * 2,
              avg_x = mean(x),
              by = y)
#> Error: Base operators are not defined for quosures.
#> Do you need to unquote the quosure?
#>
#> # Bad:
#> myquosure == rhs
#>
#> # Good:
#> !!myquosure == rhs
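The error above is raised by rlang rather than by data.table itself: a quosure bundles an expression with its environment, and code that evaluates it without tidy evaluation (presumably what happens to the "by" argument here) cannot unwrap it. The following is only a minimal sketch of that behaviour using rlang alone. The names df, q, doubled, msg and plain are made up for illustration, and this is not how tidytable ended up fixing the issue.

```r
library(rlang)

df <- data.frame(x = 1:3)
q <- quo(x * 2)

# eval_tidy() understands quosures and evaluates them against the data
doubled <- eval_tidy(q, df)
doubled
#> [1] 2 4 6

# Base operators are deliberately not defined for quosures, so using one
# directly on a quosure raises the same error shown above
msg <- tryCatch(q == 1, error = function(e) conditionMessage(e))

# Dropping the environment leaves a plain expression that ordinary,
# non-tidy evaluation can handle
plain <- eval(quo_get_expr(q), df)
```

Dropping the environment is the trade-off: the plain expression evaluates fine, but it no longer remembers where it was written, which is exactly what the nested map2.() example relies on.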
There's an addition to data.table coming that looks like it might help fix the issue, but we'll see.
In the meantime I will try to make a workaround, but unfortunately it won't work as smoothly as the tidyverse implementation.
from tidytable.
Quick question - have you run into any other situations where this limitation has come up?
One solution for your case would be for me to implement a .keep argument in nest_by.(). That way the columns used for nesting would be "kept" in the list column.
library(rlang, warn.conflicts = FALSE)
library(data.table, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)
new_nest_by. <- function(.data, ..., .key = "data", .keep = FALSE) {
  .data <- as_tidytable(.data)
  .keep <- .keep
  dots <- enexprs(...)
  if (length(dots) == 0) {
    .data <- tidytable:::eval_expr(.data[, list(data = list(.SD))])
  } else {
    split_vars <- tidytable:::dots_selector(.data, ...)
    split_list <- group_split.(.data, !!!split_vars, .keep = .keep)
    .data <- distinct.(.data, !!!split_vars)
    .data <- mutate.(.data, data = !!split_list)
  }
  if (.key != "data") .data <- rename.(.data, !!.key := data)
  .data
}
test_df <- data.table(
  id = LETTERS[seq(1, 3)],
  val_1 = seq(1, 3, 1),
  val_2 = seq(4, 6, 1)
)
test_df %>%
  new_nest_by.(id, .keep = TRUE) %>%
  unnest.(data)
#>    id id1 val_1 val_2
#> 1:  A   A     1     4
#> 2:  B   B     2     5
#> 3:  C   C     3     6
from tidytable.
I haven't had this issue in other situations, but I basically have the same workflow for many of the things that I do with tidytable.
Regarding the solution, the code that I posted is a bit of a toy example.
mutate.(example_1 = map2.(data, id, ~ .x %>%
                            mutate.(id = .y)))
In reality, I do a couple (or sometimes more) operations inside the map function. I kept the id value inside the map2 function because, if I remember correctly, I did a left_join inside, and the id was necessary to perform that join correctly.
from tidytable.
Ah gotcha, makes sense.
Well for this specific use case I did add a .keep arg to the dev version of nest_by.(). That's part of the nest_by() function coming in dplyr 1.0, so it was needed functionality.
As far as the issue overall, thanks for pointing it out. I'll keep working on it and see what I can come up with.
from tidytable.
Your examples now work in the GitHub version of tidytable. I should have it on CRAN in a week or so.
Thanks again for finding this!
library(dplyr, warn.conflicts = FALSE)
library(tidytable, warn.conflicts = FALSE)
data <- tibble(
  id = LETTERS[seq(1, 3)],
  val_1 = seq(1, 3, 1),
  val_2 = seq(4, 6, 1)
)
data %>%
  nest_by.(id) %>%
  mutate.(example_1 = map2.(data, id, ~ .x %>%
                              mutate.(id = .y))) %>%
  unnest.(example_1)
#>    id val_1 val_2 id1
#> 1:  A     1     4   A
#> 2:  B     2     5   B
#> 3:  C     3     6   C
from tidytable.
Unfortunately I'm going to have to reopen this one. I found pretty massive performance issues in how I was getting quosures to work with data.table. Somehow the speed tests I ran completely missed these issues, so the dev version of tidytable (v0.5.2) had to revert these changes.
I'll keep working on this! Apologies in advance for the inconvenience.
from tidytable.
This is all fixed in v0.5.3 of tidytable, now up on CRAN. It also passes a much wider variety of speed tests 🎉
library(tidytable, warn.conflicts = FALSE)
test_df <- data.table(
  id = seq(1, 3),
  val_1 = seq(1, 3, 1),
  val_2 = seq(4, 6, 1)
)
test_df %>%
  nest_by.(id) %>%
  mutate.(example_1 = map2.(data, id, ~ .x %>%
                              mutate.(id = .y))) %>%
  unnest.(example_1)
#>    id val_1 val_2 id1
#> 1:  1     1     4   1
#> 2:  2     2     5   2
#> 3:  3     3     6   3
from tidytable.