Comments (8)
The functionality of dt_unnest_legacy() will probably stay as is. See the comparison between tidydt::dt_unnest_legacy() and tidyr::unnest_legacy() below. The two functions operate quite similarly. The main difference is that the keep argument has to be specified in dt_unnest_legacy(), since data.table doesn't have a concept of grouped data.tables.
Suggestion: let keep default to all columns (excluding those passed into col).
That being said, I'm open to there being a separate dt_unnest() that operates the way you suggested.
pacman::p_load(tidydt)
model_iris <- iris %>%
  dt_group_nest(Species) %>%
  dt_mutate(test = dt_map(data, ~ dt_filter(.x, Sepal.Width > 3))) %>%
  dt_mutate(
    data_rows = dt_map(data, nrow),
    test_rows = dt_map(test, nrow)
  ) %>%
  dt_mutate(model = dt_map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x))) %>%
  dt_mutate(fit = dt_map(model, ~ broom::tidy(.x)))
model_iris %>%
  dt_unnest_legacy(col = fit, keep = Species)
#> Species term estimate std.error statistic p.value
#> 1: setosa (Intercept) 2.6390012 0.31001431 8.512514 3.742438e-11
#> 2: setosa Sepal.Width 0.6904897 0.08989888 7.680738 6.709843e-10
#> 3: versicolor (Intercept) 3.5397347 0.56287357 6.288685 9.069049e-08
#> 4: versicolor Sepal.Width 0.8650777 0.20193757 4.283887 8.771860e-05
#> 5: virginica (Intercept) 3.9068365 0.75706053 5.160534 4.656345e-06
#> 6: virginica Sepal.Width 0.9015345 0.25310551 3.561892 8.434625e-04
model_iris %>%
  tidyr::unnest_legacy(fit)
#> Species term estimate std.error statistic p.value
#> 1: setosa (Intercept) 2.6390012 0.31001431 8.512514 3.742438e-11
#> 2: setosa Sepal.Width 0.6904897 0.08989888 7.680738 6.709843e-10
#> 3: versicolor (Intercept) 3.5397347 0.56287357 6.288685 9.069049e-08
#> 4: versicolor Sepal.Width 0.8650777 0.20193757 4.283887 8.771860e-05
#> 5: virginica (Intercept) 3.9068365 0.75706053 5.160534 4.656345e-06
#> 6: virginica Sepal.Width 0.9015345 0.25310551 3.561892 8.434625e-04
from tidytable.
Noted; thanks for the explanation.
I didn't realize you named it dt_unnest_legacy() precisely to match tidyr::unnest_legacy(); I thought it was a side effect of edits to the original dt_unnest() 😅
@leungi FYI - tidytable CRAN release 3.1 has a breaking change to the dt_unnest_legacy()
syntax. It automatically "keeps" all unnested columns, and is 5-10x faster than before.
The API will be stable from here on out, so this will be the last breaking change.
Here's an example to run to show the new behavior:
library(tidytable)
nested_df <- data.table::data.table(
  a = 1:10,
  b = 11:20,
  c = c(rep("a", 6), rep("b", 4)),
  d = c(rep("a", 4), rep("b", 6))
) %>%
  dt_group_nest(c, d)
nested_df %>%
  dt_mutate(new_data = dt_map(data, ~ .x %>% dt_mutate(double_a = a * 2))) %>%
  dt_unnest_legacy(new_data)
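For anyone curious how a "keep everything by default" unnest can work, here is a rough sketch in plain data.table. It is only an illustration and not tidytable's actual implementation: unnest_keep_all() is a hypothetical helper name, and it assumes every element of the list column is a data.table and that only non-list columns are carried along.

```r
library(data.table)

# Hypothetical helper (not part of tidytable): unnest one list column
# while keeping every non-list column of the input.
unnest_keep_all <- function(dt, col) {
  # Columns to keep: everything that is not itself a list column
  keep <- names(dt)[!vapply(dt, is.list, logical(1))]
  # Number of rows in each nested table
  n <- vapply(dt[[col]], nrow, integer(1))
  # Repeat each outer row once per nested row, keeping only the `keep` columns
  kept <- dt[rep(seq_len(nrow(dt)), n), keep, with = FALSE]
  # Stack the nested tables and bind them next to the repeated columns
  cbind(kept, rbindlist(dt[[col]]))
}

nested <- data.table(
  g = c("a", "b"),
  data = list(data.table(x = 1:2), data.table(x = 3L))
)
out <- unnest_keep_all(nested, "data")
print(out)
```

The idea is that the caller no longer has to list the identifying columns by hand: anything that can be repeated row-wise is kept automatically.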
keep is very handy indeed!
Thank you for your consideration.
While I was testing commit 451ed20, I realized dt_unnest_legacy() behaves differently than tidyr::unnest() because the keep argument defaults to NULL.
Suggestion: let keep default to all columns (excluding those passed into col).
library(tidydt)
#>
#> Attaching package: 'tidydt'
#> The following object is masked from 'package:stats':
#>
#> dt
iris %>%
  dt_group_nest(Species) %>%
  dt_mutate(test = dt_map(data, ~ dt_filter(.x, Sepal.Width > 3.5))) %>%
  dt_mutate(
    data_rows = dt_map(data, nrow),
    test_rows = dt_map(test, nrow)
  )
#> Species data test data_rows test_rows
#> 1: setosa <data.table> <data.table> 50 16
#> 2: versicolor <data.table> <data.table> 50 0
#> 3: virginica <data.table> <data.table> 50 3
model_iris <- iris %>%
  dt_group_nest(Species) %>%
  dt_mutate(test = dt_map(data, ~ dt_filter(.x, Sepal.Width > 3))) %>%
  dt_mutate(
    data_rows = dt_map(data, nrow),
    test_rows = dt_map(test, nrow)
  ) %>%
  dt_mutate(model = dt_map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x))) %>%
  dt_mutate(fit = dt_map(model, ~ broom::tidy(.x)))
model_iris %>%
  dt_unnest_legacy(col = fit)
#> term estimate std.error statistic p.value
#> 1: (Intercept) 2.6390012 0.31001431 8.512514 3.742438e-11
#> 2: Sepal.Width 0.6904897 0.08989888 7.680738 6.709843e-10
#> 3: (Intercept) 3.5397347 0.56287357 6.288685 9.069049e-08
#> 4: Sepal.Width 0.8650777 0.20193757 4.283887 8.771860e-05
#> 5: (Intercept) 3.9068365 0.75706053 5.160534 4.656345e-06
#> 6: Sepal.Width 0.9015345 0.25310551 3.561892 8.434625e-04
model_iris %>%
  tidyr::unnest(fit)
#> # A tibble: 6 x 11
#> Species data test data_rows test_rows model term estimate std.error
#> <fct> <lis> <lis> <list> <list> <lis> <chr> <dbl> <dbl>
#> 1 setosa <df[~ <df[~ <int [1]> <int [1]> <lm> (Int~ 2.64 0.310
#> 2 setosa <df[~ <df[~ <int [1]> <int [1]> <lm> Sepa~ 0.690 0.0899
#> 3 versic~ <df[~ <df[~ <int [1]> <int [1]> <lm> (Int~ 3.54 0.563
#> 4 versic~ <df[~ <df[~ <int [1]> <int [1]> <lm> Sepa~ 0.865 0.202
#> 5 virgin~ <df[~ <df[~ <int [1]> <int [1]> <lm> (Int~ 3.91 0.757
#> 6 virgin~ <df[~ <df[~ <int [1]> <int [1]> <lm> Sepa~ 0.902 0.253
#> # ... with 2 more variables: statistic <dbl>, p.value <dbl>
Haha no worries. I always personally used tidyr::unnest_legacy(). The original version of tidyr::unnest() worked like unnest_legacy(). When tidyr 1.0.0 was released last year, they changed unnest() to work the way it does now and added unnest_legacy() so that people could still use the old functionality.
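To make the difference concrete, here is a small sketch using tidyr itself. The toy tibble and its column names (g, vals, extra) are made up for illustration. Going by the outputs earlier in this thread, unnest_legacy() should drop the other list column when rows get duplicated, while unnest() should carry it along.

```r
library(tidyr)
library(tibble)

# Toy data: one list column to unnest (vals) and one unrelated list column (extra)
df <- tibble(
  g = c("a", "b"),
  vals = list(tibble(x = 1:2), tibble(x = 3L)),
  extra = list(1:3, 4:6)
)

# Old behavior: other list columns are dropped when unnesting duplicates rows
legacy <- unnest_legacy(df, vals)

# tidyr >= 1.0.0 behavior: other list columns are kept and repeated
current <- unnest(df, cols = vals)

names(legacy)
names(current)
```

Comparing the two names() results is the quickest way to see which columns each function keeps.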
Indeed, I recall Hadley noting that change.
Thanks for the clarification.
Appreciate your accommodation!
Awesome progress for tidytable 👏
I'm slowly getting used to tidyr::unnest(., keep_empty = TRUE) 😆