dt_unnest_legacy() API change about tidytable HOT 8 CLOSED

from tidytable.

Comments (8)

markfairbanks commented on July 19, 2024

The functionality of dt_unnest_legacy() will probably stay as is. See the comparison between tidydt::dt_unnest_legacy() and tidyr::unnest_legacy() below. The two functions operate quite similarly. The main difference is that the keep argument has to be specified in dt_unnest_legacy(), since data.table has no concept of grouped data.tables.

"Suggestion: let keep default to all columns (excluding those passed into col)."

That being said, I'm open to there being a separate dt_unnest() that operates the way you suggested.

pacman::p_load(tidydt)

model_iris <- iris %>%
  dt_group_nest(Species) %>%
  dt_mutate(test = dt_map(data, ~ dt_filter(.x, Sepal.Width > 3))) %>%
  dt_mutate(
    data_rows = dt_map(data, nrow),
    test_rows = dt_map(test, nrow)
  ) %>%
  dt_mutate(model = dt_map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x))) %>%
  dt_mutate(fit = dt_map(model, ~ broom::tidy(.x)))

model_iris %>%
  dt_unnest_legacy(col = fit, keep = Species)
#>       Species        term  estimate  std.error statistic      p.value
#> 1:     setosa (Intercept) 2.6390012 0.31001431  8.512514 3.742438e-11
#> 2:     setosa Sepal.Width 0.6904897 0.08989888  7.680738 6.709843e-10
#> 3: versicolor (Intercept) 3.5397347 0.56287357  6.288685 9.069049e-08
#> 4: versicolor Sepal.Width 0.8650777 0.20193757  4.283887 8.771860e-05
#> 5:  virginica (Intercept) 3.9068365 0.75706053  5.160534 4.656345e-06
#> 6:  virginica Sepal.Width 0.9015345 0.25310551  3.561892 8.434625e-04

model_iris %>%
  tidyr::unnest_legacy(fit)
#>       Species        term  estimate  std.error statistic      p.value
#> 1:     setosa (Intercept) 2.6390012 0.31001431  8.512514 3.742438e-11
#> 2:     setosa Sepal.Width 0.6904897 0.08989888  7.680738 6.709843e-10
#> 3: versicolor (Intercept) 3.5397347 0.56287357  6.288685 9.069049e-08
#> 4: versicolor Sepal.Width 0.8650777 0.20193757  4.283887 8.771860e-05
#> 5:  virginica (Intercept) 3.9068365 0.75706053  5.160534 4.656345e-06
#> 6:  virginica Sepal.Width 0.9015345 0.25310551  3.561892 8.434625e-04

leungi commented on July 19, 2024

Noted; thanks for the explanation.

I didn't realize you named it dt_unnest_legacy() precisely to match tidyr::unnest_legacy(); I thought it was a side effect of edits to the original dt_unnest() 😅

markfairbanks commented on July 19, 2024

@leungi FYI - tidytable CRAN release 3.1 has a breaking change to the dt_unnest_legacy() syntax. It automatically "keeps" all unnested columns, and is 5-10x faster than before.

See news

The API will be stable from here on out, so this will be the last breaking change.

Here's an example to run to show the new behavior:

library(tidytable)

nested_df <- data.table::data.table(
  a = 1:10,
  b = 11:20,
  c = c(rep("a", 6), rep("b", 4)),
  d = c(rep("a", 4), rep("b", 6))) %>%
  dt_group_nest(c, d)

nested_df %>%
  dt_mutate(new_data = dt_map(data, ~ .x %>% dt_mutate(double_a = a * 2))) %>%
  dt_unnest_legacy(new_data)
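
For anyone reading without the package installed, here is a rough sketch in plain data.table of what "keeping" the other columns amounts to. The unnest_keep_all() helper is made up for illustration; it is an assumption about the idea, not how tidytable implements it.

```r
library(data.table)

# Small nested table shaped like the example above: two grouping
# columns plus a list column holding one data.table per group.
nested <- data.table(
  c = c("a", "a", "b"),
  d = c("a", "b", "b"),
  data = list(
    data.table(a = 1:4),
    data.table(a = 5:6),
    data.table(a = 7:10)
  )
)

# Hypothetical helper (not part of tidytable or data.table):
# keep every non-list column and unnest the list column `col`.
unnest_keep_all <- function(dt, col) {
  is_list_col <- vapply(dt, is.list, logical(1))
  keep <- setdiff(names(dt)[!is_list_col], col)
  # Grouping by the kept columns repeats their values once per
  # row of the nested table that rbindlist() returns.
  dt[, rbindlist(.SD[[1L]]), by = keep, .SDcols = col]
}

result <- unnest_keep_all(nested, "data")
print(result)
```

The kept columns come first and the unnested columns follow, so no keep argument is needed for the common case. Note that grouping on the kept columns assumes their combinations are unique per row, as they are after a group-nest.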

leungi commented on July 19, 2024

keep is very handy indeed!

Thank you for your consideration.

leungi commented on July 19, 2024

While I was testing commit 451ed20, I realized dt_unnest_legacy() behaves differently from tidyr::unnest() because keep defaults to NULL.

Suggestion: let keep default to all columns (excluding those passed into col).

library(tidydt)
#> 
#> Attaching package: 'tidydt'
#> The following object is masked from 'package:stats':
#> 
#>     dt

iris %>%
  dt_group_nest(Species) %>%
  dt_mutate(test = dt_map(data, ~ dt_filter(.x, Sepal.Width > 3.5))) %>%
  dt_mutate(
    data_rows = dt_map(data, nrow),
    test_rows = dt_map(test, nrow)
  )
#>       Species         data         test data_rows test_rows
#> 1:     setosa <data.table> <data.table>        50        16
#> 2: versicolor <data.table> <data.table>        50         0
#> 3:  virginica <data.table> <data.table>        50         3

model_iris <- iris %>%
  dt_group_nest(Species) %>%
  dt_mutate(test = dt_map(data, ~ dt_filter(.x, Sepal.Width > 3))) %>%
  dt_mutate(
    data_rows = dt_map(data, nrow),
    test_rows = dt_map(test, nrow)
  ) %>%
  dt_mutate(model = dt_map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x))) %>%
  dt_mutate(fit = dt_map(model, ~ broom::tidy(.x)))

model_iris %>%
  dt_unnest_legacy(col = fit)
#>           term  estimate  std.error statistic      p.value
#> 1: (Intercept) 2.6390012 0.31001431  8.512514 3.742438e-11
#> 2: Sepal.Width 0.6904897 0.08989888  7.680738 6.709843e-10
#> 3: (Intercept) 3.5397347 0.56287357  6.288685 9.069049e-08
#> 4: Sepal.Width 0.8650777 0.20193757  4.283887 8.771860e-05
#> 5: (Intercept) 3.9068365 0.75706053  5.160534 4.656345e-06
#> 6: Sepal.Width 0.9015345 0.25310551  3.561892 8.434625e-04

model_iris %>%
  tidyr::unnest(fit)
#> # A tibble: 6 x 11
#>   Species data  test  data_rows test_rows model term  estimate std.error
#>   <fct>   <lis> <lis> <list>    <list>    <lis> <chr>    <dbl>     <dbl>
#> 1 setosa  <df[~ <df[~ <int [1]> <int [1]> <lm>  (Int~    2.64     0.310 
#> 2 setosa  <df[~ <df[~ <int [1]> <int [1]> <lm>  Sepa~    0.690    0.0899
#> 3 versic~ <df[~ <df[~ <int [1]> <int [1]> <lm>  (Int~    3.54     0.563 
#> 4 versic~ <df[~ <df[~ <int [1]> <int [1]> <lm>  Sepa~    0.865    0.202 
#> 5 virgin~ <df[~ <df[~ <int [1]> <int [1]> <lm>  (Int~    3.91     0.757 
#> 6 virgin~ <df[~ <df[~ <int [1]> <int [1]> <lm>  Sepa~    0.902    0.253 
#> # ... with 2 more variables: statistic <dbl>, p.value <dbl>

markfairbanks commented on July 19, 2024

Haha, no worries. I always personally used tidyr::unnest_legacy(). The original version of tidyr::unnest() worked like unnest_legacy(). When tidyr 1.0.0 was released last year, they changed unnest() to work the way it does now and added unnest_legacy() so that people could still use the old functionality.
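
A small sketch of that difference, assuming tidyr 1.0.0 or later and tibble are installed. The column names (grp, extra, fit) are invented for illustration.

```r
library(tidyr)
library(tibble)

nested <- tibble(
  grp   = c("a", "b"),
  extra = list(1:3, letters),                      # unrelated list column
  fit   = list(tibble(x = 1:2), tibble(x = 3:5))   # column to unnest
)

# Current unnest(): every other column is kept, list columns included.
new_style <- unnest(nested, cols = fit)

# unnest_legacy(): by default, other list columns are dropped when
# unnesting duplicates rows, which is what happens here.
old_style <- unnest_legacy(nested, fit)

names(new_style)
names(old_style)
```

This mirrors the outputs earlier in the thread, where tidyr::unnest() still carries the data, test, and model list columns while unnest_legacy() returns only Species and the unnested fit columns.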

leungi commented on July 19, 2024

Indeed, I recall Hadley noting that change.

Thanks for the clarification.

leungi commented on July 19, 2024

Appreciate your accommodation!

Awesome progress for tidytable 👏

I'm slowly getting used to tidyr::unnest(., keep_empty = TRUE) 😆
