milesmcbain / friendlyeval Goto Github PK

View Code? Open in Web Editor NEW

107.0 10.0 6.0 132 KB

A friendly interface to tidyeval/rlang that will excuse itself when you're done.

License: Other

R 100.00%

r dplyr rlang friendly-interface

friendlyeval's Introduction

friendlyeval

A friendly interface to the tidy eval framework and the rlang package for casual dplyr users.

This package provides an alternative, auto-complete friendly interface to rlang that is more closely aligned with the task domain of a user 'programming with dplyr'. It implements most of the cases in the 'programming with dplyr' vignette.

The interface can also convert itself to standard rlang with the help of an RStudio addin that replaces friendlyeval functions with their rlang equivalents. This allows you to prototype in friendly, then subsequently automagically transform to rlang. Your friends won't know the difference.

Installation

devtools::install_github("milesmcbain/friendlyeval")

Overview

Arguments passed to dplyr can be treated as:

a single literal column name (e.g. mpg in select(mtcars, mpg))
a single expression (e.g. cyl <= 6 in filter(mtcars, cyl <= 6))
a list of literal column names (e.g. mpg, cyl in select(mtcars, mpg, cyl))
a list of expressions (e.g. hp >= mean(hp), wt > 3 in filter(mtcars, hp >= mean(hp), wt > 3))

dplyr uses special argument handling to interpret and treat those arguments as one or more column names or expressions. User functions don't perform that same argument handling, so we need some way to tell dplyr how to treat these arguments we pass from our enclosing functions. rlang provides the functions we need to do just that, but knowing which rlang function maps to each use case requires a fairly nuanced understanding of metaprogramming concepts.

friendlyeval helps bridge that gap by providing a descriptive (and auto-complete friendly) set of eight complimentary functions that instruct dplyr to resolve arguments we pass using either:

the literal input provided as the arguments to our function (e.g. the text lat and lon in my_select(dat, lat, lon))
the string values of those arguments (e.g. "lat" and "lon" in my_select(dat, arg1 = "lat", "lon")):

function	usage
`treat_input_as_col`	Treat the literal text input provided as a `dplyr` column name.
`treat_input_as_expr`	Treat the literal text input provided as an expression involving a `dplyr` column name (e.g. in `filter(dat, col1 == 0)`, `col1 == 0` is an expression involving the value of col1).
`treat_inputs_as_cols`	Treat a comma separated list of arguments as a list of `dplyr` column names.
`treat_inputs_as_exprs`	Treat a comma separated list of arguments as a list of expressions.
`treat_string_as_col`	Treat the character value of your function argument as a `dplyr` column name.
`treat_string_as_expr`	Treat the character value of your function argument as an expression involving a `dplyr` column name.
`treat_strings_as_cols`	Treat a list of character values as a list of `dplyr` column names.
`treat_strings_as_exprs`	Treat a vector of strings as a list of expressions involving `dplyr` column names.

These eight functions are used in conjunction with three tidy eval operators:

!!
!!!
:=

!! and !!! are signposts that tell dplyr:

"Stop! This needs to be evaluated to resolve column names or expressions".

!! tells dplyr to expect a single column name or expression, whereas !!! says to expect a list of column names or expressions.

:= is used in place of = in the special case where we need dplyr to resolve a column name on the left hand side of an = like in mutate(!!treat_input_as_col(colname) = rownumber). Evaluating on the left hand side in this example is not legal R syntax, so instead we must write: mutate(!!treat_input_as_col(colname) := rownumber).

Writing functions that call `dplyr`

dplyr functions try to be user-friendly by saving you typing. This allows you to write code like mutate(data, col1 = abs(col2), col3 = col4*100) instead of the more cumbersome base R style: data$col <- abs(data$col2); data$col3 <- data$col4*100.

The cost of this convenience is more work when we want to write functions that call dplyr, because dplyr needs to be instructed how to treat the arguments we pass to it. For example, this function does not work as we might expect:

double_col <- function(dat, arg){
  mutate(dat, result = arg*2)
}

double_col(mtcars, cyl)

# Error in mutate_impl(.data, dots) : 
#   Evaluation error: object 'cyl' not found.

This is because our double_col function doesn't perform the same special argument handling as dplyr functions. What if we pass our column name as a string value instead?

double_col(mtcars, arg = 'cyl')

# Error in mutate_impl(.data, dots) : 
#  Evaluation error: non-numeric argument to binary operator.

That doesn't work either, even though those were our only options under normal evaluation rules! Fortunately, there are two ways to make double_col work. We can either:

Instruct dplyr to treat the literal input provided for the arg argument as a column name. So double_col(mtcars, cyl) would work.
Instruct dplyr to treat the string value bound to arg - "cyl" - as a column name, rather than as a normal character vector. So double_col(mtcars, arg = "cyl") would work.

Usage examples

Making `double_col` work

Using what was input, dplyr style:

double_col <- function(dat, arg){
  mutate(dat, result = !!treat_input_as_col(arg) * 2)
}

## working call form:
double_col(mtcars, cyl)

Using the supplied value:

double_col <- function(dat, arg){
  mutate(dat, result = !!treat_string_as_col(arg) * 2)
}

## working call form:
double_col(mtcars, arg = 'cyl')

Supplying column names to assign results to (lhs variant)

A more useful version of double_col would be to allow the name of the resulting column to be set via the function. Again, this can be done using the literal input, dplyr style:

double_col <- function(dat, arg, result){
  ## note usage of ':=' for lhs eval. 
  mutate(dat, !!treat_input_as_col(result) := !!treat_input_as_col(arg) * 2)
}

## working call form:
double_col(mtcars, cyl, cylx2)

Or using supplied values:

double_col <- function(dat, arg, result){
  ## note usage of ':=' for lhs eval. 
  mutate(dat, !!treat_string_as_col(result) := !!treat_string_as_col(arg) * 2)
}

## working call form:
double_col(mtcars, arg = 'cyl',  result = 'cylx2')

Working with argument lists containing column names

When wrapping group_by, you will likely want to pass a list of column names. Here's how to do that using what was input, dplyr style:

reverse_group_by <- function(dat, ...){
  ## this expression is split out for readability, but it can be nested into below.
  groups <- treat_inputs_as_cols(...)

  group_by(dat, !!!rev(groups))
}

## working call form
reverse_group_by(mtcars, gear, am)

Here's how to do it using a list of values:

reverse_group_by <- function(dat, columns){
  groups <- treat_strings_as_cols(columns)

  group_by(dat, !!!rev(groups))
}

## working call form:
reverse_group_by(mtcars, c('gear', 'am'))

And here's how to do it using the values of ...:

reverse_group_by <- function(dat, ...){
  ## note the list() around ... to collect the values of the arguments into a list.
  groups <- treat_strings_as_cols(list(...)) 

  group_by(dat, !!!rev(groups))
}

## working call form:
reverse_group_by(mtcars, 'gear', 'am')

Passing expressions involving columns

Using the _expr functions, you can pass expressions involving column names to dplyr functions like filter, mutate and summarise. An example of a function involving an expression is a more general version of the double_col function from above, called double_anything, that can take expressions involving columns, and double those:

double_anything <- function(dat, arg){
  mutate(dat, result = (!!treat_input_as_expr(arg)) * 2)
}

## working call form:
double_anything(mtcars, cyl * am)

##     mpg cyl  disp  hp drat    wt  qsec vs am gear carb result
## 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4     12
## 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4     12
## 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1      8
## 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1      0
## 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2      0

A common usage pattern is to take a list of expressions. Consider the filter_loudly function that reports the number of rows filtered:

filter_loudly(mtcars, cyl >= 6, am == 1) 

## Filtered out 27 rows.
##   mpg cyl disp  hp drat    wt  qsec vs am gear carb
## 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## 3 15.8   8  351 264 4.22 3.170 14.50  0  1    5    4
## 4 19.7   6  145 175 3.62 2.770 15.50  0  1    5    6
## 5 15.0   8  301 335 3.54 3.570 14.60  0  1    5    8

You can implement this function using filtering expressions exactly as input, dplyr style:

filter_loudly <- function(x, ...){
  in_rows <- nrow(x)
  out <- filter(x, !!!treat_inputs_as_exprs(...))
  out_rows <- nrow(out)
  message("Filtered out ",in_rows-out_rows," rows.")
  return(out)
}

## working call form:
## filter_loudly(mtcars, cyl >= 6, am == 1)

Or using a list/vector of values, parsed as expressions:

filter_loudly <- function(x, filter_expressions){
  ## if accepting list arguments, should check all are character
  stopifnot(purrr::every(filter_expressions, is.character))
  
  in_rows <- nrow(x)
  out <- filter(x, !!!treat_strings_as_exprs(filter_expressions))
  out_rows <- nrow(out)
  message("Filtered out ",in_rows-out_rows," rows.")
  return(out)
}

## working call form:
## filter_loudly(mtcars, list('cyl >= 6', 1))

Or capturing the values of ..., parsed as expressions:

filter_loudly <- function(x, ...){
  dots <- list(...)
  ## if accepting list arguments, should check all are character
  stopifnot(purrr::every(dots, is.character))
  
  in_rows <- nrow(x)
  out <- filter(x, !!!treat_strings_as_exprs(dots))
  out_rows <- nrow(out)
  message("Filtered out ",in_rows-out_rows," rows.")
  return(out) 
}

## working call form:
## filter_loudly(mtcars, 'cyl >= 6', 'am == 1')

A note on expressions vs columns

It may have occurred to you that there are cases where a column name is a valid expression and vice versa. This is true, and it means that in some situations, you could switch the _col and the _expr versions of functions (e.g. use treat_input_as_expr in place of treat_input_as_col) and things would continue to work. However, using the col version where appropriate invokes checks that assert that what was passed to it can be interpreted as a simple column name. This is useful in situations where expressions are not permitted, like in select or on the left hand side of the internal assignment in mutate: e.g. mutate(lhs_col = some_expr).

friendlyeval's People

Contributors

Stargazers

Watchers

Forkers

pkq anthonyebert arun-ramamurthy ransinghsatyajitray latlio johnmackintosh

friendlyeval's Issues

Regex need to be more robust.

Once the second iteration of the functions is nailed down, the regular expressions in the transform need to use word boundaries, so they can't accidently transform other similar text. The main benefit is it means the transforms can be applied in any order, which is a bug time bomb at the moment.

Add another RStudio add-in to convert rlang back to friendlyeval?

Thanks for the package. I found it useful for beginners to learn rlang, so why not include another add-in to replace rlang functions with their friendlyeval counterparts?

Unexpected behaviour when passing string to be evaluated as expression

Below is a reprex.

The goal of function is to allow custom variables and functions for a group_by() + summarise().

In many tidyeval examples I see, authors used ... as function arguments to to do this; however, I'd like to have an explicit definition of function to be passed to summarise().

If there's a better way to do this, please enlighten me!

library(dplyr)
library(friendlyeval)

## this works
GroupNSummarise <- function(.data, .var, .group, .fun) {
  group <- treat_strings_as_cols(.group)
  func <- treat_string_as_expr(.fun)
  var <- treat_strings_as_cols(.var)
  
  .data %>% 
    group_by(!!!group) %>% 
    summarise_at(vars(.var), ~ eval(func))
}

mtcars %>% 
  GroupNSummarise(.var = c("mpg", "wt"),
                  .group = c("cyl"),
                  .fun = 'mean(., na.rm = TRUE)')
#> Warning: package 'bindrcpp' was built under R version 3.4.4
#> # A tibble: 3 x 3
#>     cyl   mpg    wt
#>   <dbl> <dbl> <dbl>
#> 1     4  26.7  2.29
#> 2     6  19.7  3.12
#> 3     8  15.1  4.00

## this fails
GroupNSummarise2 <- function(.data, .var, .group, .fun) {
  group <- treat_strings_as_cols(.group)
  func <- treat_string_as_expr(.fun)
  var <- treat_strings_as_cols(.var)
  
  .data %>% 
    group_by(!!!group) %>% 
    summarise_at(vars(.var), ~ !!(func))
}

mtcars %>% 
  GroupNSummarise2(.var = c("mpg", "wt"),
                  .group = c("cyl"),
                  .fun = 'mean(., na.rm = TRUE)')
#> Error in summarise_impl(.data, dots): Evaluation error: invalid argument type.

Add non-Rstudio extension transform option.

It would be nice to offer a transform at the file level for non-rstudio users. There is probably code that can be stolen from deplearning, packup, and mmmisc to piece this together.

Better example!

Since it seems like the example I had suggested (filter_loudly), while maybe a fun & useful function, was not the best for this particular use case maybe we can think up a better one for the README? (And perhaps use filter_loudly as an example function for when tidyeval not needed?). I'll try to find or think up a good, simple example to contribute, but thought I'd first poll if you or anyone else might already have one in mind...

function names

This package is a great idea! One thought -- I found the function names a bit confusing and unwieldy. Here are some ideas:

bare_input
bare_input_lhs
bare_input_list
string_input
string_input_list

It seems like typed is intending to mean a user typed it in... I think there is a clash with the "type" concept of having objects with a specified type. I'm not sure if "bare" is super clear either, but at least I don't think it has as much of a clash with another programming concept.

For value, my sense is that this means string in which case I think string is more clear but maybe I am not thinking about other possibilities...

The as_name bit I think is a bit confusing because I think it leads to the question of "name of what?" even though it seems to be technically appropriate. It makes the function names super long too...

These suggestions surely have flaws too, but wanted to get a convo going with some ideas 😄

installing friendlyeval broke dplyr rename

Hi @MilesMcBain ,

I installed your friendlyeval package today to get around my frustration at functionalizing dplyr to work on arbitrary named columns. The function I wrote using friendlyeval::treat_string_as_col() works beautifully... BUT in the course of installing friendlyeval from github (and its dependencies, including rlang), I seem to have screwed up dplyr's rename.

Very simple reproducible example from a site that I was reading about this bug:

iris %>%
as_tibble() %>%
rename(my_species = Species)
Error: my_species = Species must be a symbol or a string, not a formula
Call rlang::last_error() to see a backtrace

This is true even after I uninstall rlang and try to install it (version 0.3.1) from CRAN rather than Github as advised in researching this bug on the web.

Have you seen this before with your package? Can you guide me (a newish R programmer) on how to get dplyr rename (and perhaps other aspects) working correctly again?

Copying functions from rlang now fixes version on installation

If you do the following:

typed_as_name <- rlang::enquo

the function code is copied from rlang into your own package at time of installation. That also means that when an updated version of rlang is installed by the user, your function typed_as_name still uses the old rlang::enquo code. The prefered (and documented) way to get around this issue, is to wrap it in a function (see eg explanation in Writing R extensions:

typed_as_name <- function(arg) rlang::enquo(arg)

But in that case typed_as_name won't work any longer due to dplyr::mytate_impl() not being able to process arg correctly. Same happens when you try with

typed_as_name <- function(arg) eval.parent(rlang::enquo(arg))

Given the nature of the function rlang::enquo, this issue is pretty hard to get around. Then again, these rlang functions are unlikely to change so this shouldn't really be a problem in everyday use.

error when using rlang 0.3.3 and dplyr 0.8.0.1

I'm not sure which of the packages is causing the error, but here is the error I receive when running the filter_loudly example in the readme. This is running on R 3.5.2 with dplyr 0.8.0.1, rlang 0.3.3, and friendlyeval 0.1.1:

filter_loudly <- function(x, ...){
  dots <- list(...)
  ## if accepting list arguments, should check all are character
  stopifnot(purrr::every(dots, is.character))
  
  in_rows <- nrow(x)
  out <- filter(x, !!!treat_strings_as_exprs(dots))
  out_rows <- nrow(out)
  message("Filtered out ",in_rows-out_rows," rows.")
  return(out) 
}

data(mtcars)
filter_loudly(mtcars, 'cyl >= 6', 'am == 1')

Error:

Error: `x` must be a character vector or an R connection
Call `rlang::last_error()` to see a backtrace 

> rlang::last_error()
<error>
message: `x` must be a character vector or an R connection
class:   `rlang_error`
backtrace:
 1. global::filter_loudly(mtcars, "cyl >= 6", "am == 1")
 8. friendlyeval::treat_strings_as_exprs(dots)
 9. rlang::parse_exprs(arg)
Call `rlang::last_trace()` to see the full backtrace

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.