
Comments (8)

markfairbanks commented on July 19, 2024

The functionality of dt_unnest_legacy() will probably stay as is. See the comparison between tidydt::dt_unnest_legacy() and tidyr::unnest_legacy() below. The two functions operate quite similarly. The main difference is that the keep argument has to be specified in dt_unnest_legacy(), since data.table doesn't have a concept of grouped data.tables.

Suggestion: let keep default to all columns (excluding those passed into col).

That being said, I'm open to there being a separate dt_unnest() that operates the way you suggested.
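For illustration, here is a rough sketch in plain data.table of how a keep-everything default could behave. unnest_keep_all() is a hypothetical helper, not a tidydt or tidytable function, and it assumes the nested column holds data.tables or data.frames with matching columns.

```r
library(data.table)

# Hypothetical helper, not part of tidydt or tidytable.
# Unnests one list column of data.tables/data.frames. When `keep` is NULL,
# every other column is kept (the default suggested above).
unnest_keep_all <- function(dt, col, keep = NULL) {
  stopifnot(is.data.table(dt), is.character(col), length(col) == 1L)
  if (is.null(keep)) keep <- setdiff(names(dt), col)

  # Repeat each outer row once per row of its nested table.
  n_rows <- vapply(dt[[col]], nrow, integer(1))
  kept <- dt[rep(seq_len(nrow(dt)), times = n_rows), keep, with = FALSE]

  # Stack the nested tables and bind them next to the repeated outer columns.
  cbind(kept, rbindlist(dt[[col]]))
}

nested <- data.table(
  g = c("a", "b"),
  data = list(data.table(x = 1:2), data.table(x = 3:5))
)

unnest_keep_all(nested, "data")
```

Passing keep = "g" explicitly gives the same result here; the default only matters when there are more outer columns than you want to type out.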

pacman::p_load(tidydt)

model_iris <- iris %>%
  dt_group_nest(Species) %>%
  dt_mutate(test = dt_map(data, ~ dt_filter(.x, Sepal.Width > 3))) %>%
  dt_mutate(
    data_rows = dt_map(data, nrow),
    test_rows = dt_map(test, nrow)
  ) %>%
  dt_mutate(model = dt_map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x))) %>%
  dt_mutate(fit = dt_map(model, ~ broom::tidy(.x)))

model_iris %>%
  dt_unnest_legacy(col = fit, keep = Species)
#>       Species        term  estimate  std.error statistic      p.value
#> 1:     setosa (Intercept) 2.6390012 0.31001431  8.512514 3.742438e-11
#> 2:     setosa Sepal.Width 0.6904897 0.08989888  7.680738 6.709843e-10
#> 3: versicolor (Intercept) 3.5397347 0.56287357  6.288685 9.069049e-08
#> 4: versicolor Sepal.Width 0.8650777 0.20193757  4.283887 8.771860e-05
#> 5:  virginica (Intercept) 3.9068365 0.75706053  5.160534 4.656345e-06
#> 6:  virginica Sepal.Width 0.9015345 0.25310551  3.561892 8.434625e-04
model_iris %>%
  tidyr::unnest_legacy(fit)
#>       Species        term  estimate  std.error statistic      p.value
#> 1:     setosa (Intercept) 2.6390012 0.31001431  8.512514 3.742438e-11
#> 2:     setosa Sepal.Width 0.6904897 0.08989888  7.680738 6.709843e-10
#> 3: versicolor (Intercept) 3.5397347 0.56287357  6.288685 9.069049e-08
#> 4: versicolor Sepal.Width 0.8650777 0.20193757  4.283887 8.771860e-05
#> 5:  virginica (Intercept) 3.9068365 0.75706053  5.160534 4.656345e-06
#> 6:  virginica Sepal.Width 0.9015345 0.25310551  3.561892 8.434625e-04

from tidytable.

leungi commented on July 19, 2024

Noted; thanks for the explanation.

I didn't realize you named it dt_unnest_legacy() precisely to match tidyr::unnest_legacy(); I thought it was a side effect of edits to the original dt_unnest() 😅


markfairbanks commented on July 19, 2024

@leungi FYI - tidytable CRAN release 3.1 has a breaking change to the dt_unnest_legacy() syntax. It automatically "keeps" all unnested columns, and is 5-10x faster than before.

See news

The API will be stable from here on out, so this will be the last breaking change.

Here's an example to run to show the new behavior:

library(tidytable)

nested_df <- data.table::data.table(
  a = 1:10,
  b = 11:20,
  c = c(rep("a", 6), rep("b", 4)),
  d = c(rep("a", 4), rep("b", 6))) %>%
  dt_group_nest(c, d)

nested_df %>%
  dt_mutate(new_data = dt_map(data, ~ .x %>% dt_mutate(double_a = a * 2))) %>%
  dt_unnest_legacy(new_data)


leungi commented on July 19, 2024

keep is very handy indeed!

Thank you for your consideration.


leungi commented on July 19, 2024

While testing commit 451ed20, I realized that dt_unnest_legacy() behaves differently from tidyr::unnest() because keep defaults to NULL.

Suggestion: let keep default to all columns (excluding those passed into col).

library(tidydt)
#> 
#> Attaching package: 'tidydt'
#> The following object is masked from 'package:stats':
#> 
#>     dt

iris %>%
  dt_group_nest(Species) %>%
  dt_mutate(test = dt_map(data, ~ dt_filter(.x, Sepal.Width > 3.5))) %>%
  dt_mutate(
    data_rows = dt_map(data, nrow),
    test_rows = dt_map(test, nrow)
  )
#>       Species         data         test data_rows test_rows
#> 1:     setosa <data.table> <data.table>        50        16
#> 2: versicolor <data.table> <data.table>        50         0
#> 3:  virginica <data.table> <data.table>        50         3

model_iris <- iris %>%
  dt_group_nest(Species) %>%
  dt_mutate(test = dt_map(data, ~ dt_filter(.x, Sepal.Width > 3))) %>%
  dt_mutate(
    data_rows = dt_map(data, nrow),
    test_rows = dt_map(test, nrow)
  ) %>%
  dt_mutate(model = dt_map(data, ~ lm(Sepal.Length ~ Sepal.Width, data = .x))) %>%
  dt_mutate(fit = dt_map(model, ~ broom::tidy(.x)))

model_iris %>%
  dt_unnest_legacy(col = fit)
#>           term  estimate  std.error statistic      p.value
#> 1: (Intercept) 2.6390012 0.31001431  8.512514 3.742438e-11
#> 2: Sepal.Width 0.6904897 0.08989888  7.680738 6.709843e-10
#> 3: (Intercept) 3.5397347 0.56287357  6.288685 9.069049e-08
#> 4: Sepal.Width 0.8650777 0.20193757  4.283887 8.771860e-05
#> 5: (Intercept) 3.9068365 0.75706053  5.160534 4.656345e-06
#> 6: Sepal.Width 0.9015345 0.25310551  3.561892 8.434625e-04

model_iris %>%
  tidyr::unnest(fit)
#> # A tibble: 6 x 11
#>   Species data  test  data_rows test_rows model term  estimate std.error
#>   <fct>   <lis> <lis> <list>    <list>    <lis> <chr>    <dbl>     <dbl>
#> 1 setosa  <df[~ <df[~ <int [1]> <int [1]> <lm>  (Int~    2.64     0.310 
#> 2 setosa  <df[~ <df[~ <int [1]> <int [1]> <lm>  Sepa~    0.690    0.0899
#> 3 versic~ <df[~ <df[~ <int [1]> <int [1]> <lm>  (Int~    3.54     0.563 
#> 4 versic~ <df[~ <df[~ <int [1]> <int [1]> <lm>  Sepa~    0.865    0.202 
#> 5 virgin~ <df[~ <df[~ <int [1]> <int [1]> <lm>  (Int~    3.91     0.757 
#> 6 virgin~ <df[~ <df[~ <int [1]> <int [1]> <lm>  Sepa~    0.902    0.253 
#> # ... with 2 more variables: statistic <dbl>, p.value <dbl>


markfairbanks commented on July 19, 2024

Haha no worries. I always personally used tidyr::unnest_legacy(). The original version of tidyr::unnest() worked like unnest_legacy(). When tidyr 1.0.0 was released last year, they changed unnest() to work the way it does now and added unnest_legacy() so that people could still use the old functionality.
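A small side-by-side sketch of the two tidyr functions, in case it helps. The toy tibble and its column names are made up for illustration; only unnest() and unnest_legacy() come from tidyr.

```r
library(tidyr)
library(tibble)

# Toy data (illustrative only): one nested column plus one extra list column.
df <- tibble(
  g = c("a", "b"),
  data = list(tibble(x = 1:2), tibble(x = 3:5)),
  other = list(1, 2)
)

# tidyr >= 1.0.0: columns to unnest go in `cols`; other list columns are kept.
new_style <- unnest(df, cols = data)

# Pre-1.0.0 interface: columns to unnest go in `...`; by default other list
# columns may be dropped when unnesting duplicates rows (see `.drop`).
old_style <- unnest_legacy(df, data)

names(new_style)
names(old_style)
```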


leungi commented on July 19, 2024

Indeed, I recall Hadley noting that change.

Thanks for the clarification.


leungi commented on July 19, 2024

Appreciate your accommodation!

Awesome progress for tidytable 👏

I'm slowly getting used to tidyr::unnest(., keep_empty = TRUE) 😆
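For anyone else landing here, a minimal sketch of what keep_empty changes. The toy tibble is made up for illustration; the second group has an empty nested table, much like a filter that matches no rows for one group.

```r
library(tidyr)
library(tibble)

# Toy data (illustrative only): group "b" has a zero-row nested tibble.
df <- tibble(
  g = c("a", "b"),
  data = list(tibble(x = 1:2), tibble(x = integer()))
)

# Default: rows whose nested element is empty are dropped from the output.
dropped <- unnest(df, cols = data)

# keep_empty = TRUE: empty elements become a single row of missing values.
kept <- unnest(df, cols = data, keep_empty = TRUE)

nrow(dropped)
nrow(kept)
```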
