Git Product home page Git Product logo

tidyups's Introduction

tidyups

This repo is a work-in-progress exploration of a process for formalising bigger changes to the tidyverse. It is used by the tidyverse team to propose and discuss changes that have significant cross-package scope. The process itself is still very much a work in progress, but you can read the current outline in 001-tidyup-process.md

tidyups's People

Contributors

davisvaughan avatar hadley avatar tracykteal avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

tidyups's Issues

What's the process for changing/updating tidyups?

  • Do we need to version them somehow? (e.g. just do a version number in the doc metadata)
  • Should they be immutable? (i.e. you propose a new tidyup that supersedes a previous)
  • Feels like substantial changes should be proposed through a tidyup

Write up options options

# Problems to solve:
# 1. Getting values:
#    1. Consistent naming scheme that pairs well with auto-complete
#    2. Defines a default value in one place
#    3. Provides fallback to deprecated options (possibly in other packages)
#    4. Can vary defaults (e.g. quiet in tests)
#    5. Messages user about important defaults
#    6. Can use env vars if needed
# 2. Setting values:
#    1. Consistent naming scheme that pairs well with auto-complete
#    2. Should be able to generate deprecation warnings
#    3. Helpers to reduce need for custom `local_`/`with_` wrappers
# 3. Documentation
#    1. Document options where they're used
#    2. Provide a central list of all options for a package

# Get only ----------------------------------------------------------------
# This is the current state of the art

option_foo <- function() {
  getOption("package_foo", 10)
}

option_foo()
options(package_foo = 10)


# Complex getters ---------------------------------------------------------
# Some examples of where our option getters do more work

# Fallback to deprecated option in same package
option_foo <- function() {
  getOption("package_foo") %||% getOption("package_old") %||% 10
}
# Fallback to deprecated option in different package
option_foo <- function() {
  getOption("package_foo") %||% getOption("other_foo") %||% 10
}
# Deprecate this option
option_foo <- function() {
  lifecycle::deprecate_warn("1.0.0", "option_foo()", "option_bar()")
  getOption("package_foo", 10)
}
# Also look in an env var
option_foo <- function() {
  getOption("package_foo") %||% get_env("PACKAGE_FOO") %||% 10
}
# Vary option in tests
option_foo <- function() {
  if (is_testing()) {
    getOption("package_foo", 0)
  } else {
    getOption("package_foo", 10)
  }
}
# Inform user that we're using a default
option_foo <- function() {
  default <- 10
  if (!has_option("package_foo")) {
    inform("Using `package_foo` option with default of {default}")
  }
  getOption("package_foo", default)
}

# Get + set ----------------------------------------------------------------
# I don't know if we've done this anywhere but any obvious next step
# would be to also provide a function for setting the option.

option_foo_get <- function() {
  getOption("package_foo", 10)
}
option_foo_set <- function(val) {
  invisible(options(package_foo = val)$package_foo)
}

option_foo_get()
option_foo_set(10)

# Since _get() called more often could consider breaking symmetry
# and calling it option_foo().

# Can't do this because there has to be an argument
option_foo() <- 20

# One function to get and set ---------------------------------------------

option_foo <- function(val) {
  if (missing(val)) {
    getOption("package_foo", 10)
  } else {
    options(package_foo = val)
  }
}
option_foo()
option_foo(20)

# Could organise multiple options into a list - list name MUST include
# package name to avoid too frequent clashes. Could also consider
# options_package since that would autocomplete from options.

package_options <- list(
  foo = function(val) {
    if (missing(val)) {
      getOption("package_foo", 10)
    } else {
      options(package_foo = val)
    }
  }
)
package_options$foo()
package_options$foo(20)

# Active bindings ---------------------------------------------------------

makeActiveBinding("option_foo", function(val) {
  if (missing(val)) {
    getOption("package_foo", 10)
  } else {
    options(package_foo = val)
    # Has to return value so x <- y <- z isn't surprising
    invisible(val)
  }
}, globalenv())

option_foo
option_foo <- 20

# https://github.com/r-lib/withr/issues/87
local_bindings(option_foo = 20)
with_bindings(option_foo = 20, option_foo)

# I'm assuming it works like
old <- option_foo
on.exit(option_foo <- old)
body
# Bit if of misuse of local_bindings

local_options(options_foo = 10, {
  print(option_foo)
})


local_options(pkg::options$foo = 10, {
  print(option_foo)
})


# Could arrange active bindings into an environment

package_options <- new_environment()
env_bind_active(package_options,
  foo = function(val) {
    if (missing(val)) {
      getOption("package_foo", 10)
    } else {
      options(package_foo = val)
      # Has to return value so x <- y <- z isn't surprising
      invisible(val)
    }
  }
)
package_options$foo
package_options$foo <- 20

with_option("rlang", foo = 20, bar = 20, {
  rlang_options$foo + rlang_options$bar
})

Write more about standard processess

Brain dump below


These are defaults; they're not required, but since they're the convention, if you depart from them you'll need to document how.
This documentation should go in CONTRIBUTING.md.

Smaller conventions on the more technical side can be found in r-lib/usethis#1416 (comment).
Many of the smaller components are not explicitly mentioned in this doc.

Authorship

See 004-governance for details.
In brief we use the following terms:

  • Contributors are thanked in blog posts, but not recorded in the package.

  • Authors are listed in Authors@R.

  • The (one) maintainer is listed listed in Authors@R with role "cre", e.g. person(given = "Hadley", family = "Wickham", role = c("aut", "cre"), email = "[email protected]").

  • Additionally, for packages written primarily by RStudio employees, we list RStudio as copyright holder and funder: person(given = "RStudio", role = c("cph", "fnd")).

Code contribution

Code is usually contributed via PR, even for authors who could push directly.
Particularly high-stakes project may want to protect the main branch and require "request reviews before merging".

All pull requests should be reviewed by at least one other author.
Helps both people learn, and ensures that every project has at least one person apart from the maintainer with some idea of what's going on.
The primary exception to this rule is when a package is still very young, and the interface and internals are changing rapidly.
Even in this situation it's a good idea to still make a PR, let it rest overnight, and then re-read the next morning.

PRs are usually squashed-merged so that individual contributors don't need to worry about maintaining a clean history.
Rewrite commit message if that happens: https://style.tidyverse.org/gitgithub.html#commit-messages.
If a PR is larger and the author has carefully created a clean history, you can

Licensing

Where possible, we use the MIT license.
This is a simple, well understood license that makes our open source code as easy for others to use a possible.
Because it's very permissive, there's no need for a CLA.

There are some exceptions: if bundling open source code with other licenses will need to ensure that the package license is compatible.
This most commonly arises with GPL C code; in that case, just match the license of the package to the license of the bundled code.

Whenever you bundled any code, the primary authors must be listed in Authors@R, and the licenses listed in LICENSE.note.
See details in https://r-pkgs.org/license.html#code-you-bundle for more details and best practices.

The easiest way to add an MIT license is to use usethis::use_mit_license().
This was revamped at {DATE}, so if you have an older package check that the LICENSE uses the updated author wording ("all authors of the package").
There is no need to regularly update the copyright year, but you will need to check it prior to initial CRAN submission.

Backward compatibility

Any backward incompatible changes (i.e. changes that cause reverse dependencies to fail R CMD check or are likely to cause problems in user code) must be approved by the maintainer.
Significant backward incompatible changes need to be accompanied with a plan for how they will be communicated to the community.

Release process

Package releases are made on an as-needed basis, and increment either the major, minor, or patch version depending on the scope of the release.
The process itself is defined by usethis::use_release_issue().

Style

Follow the advice in http://style.tidyverse.org/

Documentation

It's important!
Put time into it.
We use roxygen2.
Conventions at https://style.tidyverse.org/documentation.html.

Every site should have a pkgdown website build automatically by GitHub actions and hosted on GitHub pages.
Url should be either {pkgname}.{site}.org or pkgs.rstudio.com/{pkgname}.
Listed in URL field of DESCRIPTION.

Readme should follow template in usethis::use_readme_rmd().
Start with a brief overview of the package, then get into a meaningful example as quickly as possible.

Readme should include code of conduct at bottom.

News file following https://style.tidyverse.org/news.html.

Testing

We use testthat 3e, along with covr for test coverage.
Don't strive for 100% test coverage, but use it tactically to double check that your tests cover what you think they cover.
Test style in https://style.tidyverse.org/tests.html.

Issues

Batched processing.
Regular triage.

Welcoming and inclusive, but efficient.
Invest time into carefully crafted responses that we re-use a lot.

Feedback on tidyup 006: Ordering of `dplyr::group_by()`

We’d love to get your thoughts on https://github.com/tidyverse/tidyups/blob/main/006-dplyr-group-by-ordering.md, a proposal to swap out the internal algorithm used by group_by() with one that is often more performant.

The main potential issue with this adjustment is that we would be switching to using the C locale when ordering character grouping columns, where previously we used the system locale (through order()). As a reminder, group_by() internally orders the group keys, so that the next summarize() returns sorted group keys along with the computed summary columns. A result of the change proposed in this tidyup is that the order of those group keys would be different than before, since they would no longer respect the system locale.

Please feel free to contribute however you feel comfortable — you're welcome to make small fixes via PR, comment here, or open bigger discussion topics in an new issue. If there are things you’d prefer to discuss in private, please feel free to email me. I’ll plan to close the discussion on September 15, so we can review and make adjustments as needed.

@markfairbanks, @mgirlich, @eutwt, @dgrtwo, @mine-cetinkaya-rundel

Feedback on tidyup 004: Governance model

We’d love to get your thoughts on https://github.com/tidyverse/tidyups/blob/main/004-governance.md, a proposal to define a standard governance model across the tidyverse. Hopefully there are no major surprises here; this is the model we've implicitly been operating under for a while.

Please feel free to contribute however you feel comfortable — you're welcome to make small fixes via PR, comment here, or open bigger discussion topics in an new issue. If there are things you’d prefer to discuss in private, please feel free to email me. I’ll plan to close the discussion on August 18, so we can review and make adjustments as needed.

@tidyverse/authors

Feedback on the "tidyup" process for making big changes to the tidyverse

@clauswilke, @dgrtwo, @dpseidel, @karawoo, @krlmlr, @LucyMcGowan, @paleolimbot, @smbache, @vspinu, @yutannihilation, @apreshill, @jjallaire, @colearendt, @dcossyleon, @edgararuiz-zz, @evanmiller, @kohske, @markfairbanks, @mgirlich

We’d love to get your thoughts on https://github.com/tidyverse/tidyups/blob/main/001-tidyup-process.md, a proposal for a process for making bigger changes to the tidyverse. Please feel free to contribute however you feel comfortable — you're welcome to make small fixes via PR, comment here, or open bigger discussion topics in an issue. If there are things you’d prefer to discuss in private, please feel free to email me. We’ll plan to close the discussion on August 13, so we can review and make adjustments as needed.

Feedback on tidyup 3: Radix Ordering in `dplyr::arrange()`

@krlmlr, @markfairbanks, @mgirlich, @earowang, @yutannihilation

We'd love to get your thoughts on tidyup 3, Radix Ordering in dplyr::arrange() https://github.com/tidyverse/tidyups/blob/main/003-dplyr-radix-ordering.md.

If you aren't familiar with tidyups, you can read about them here:
https://github.com/tidyverse/tidyups/blob/main/001-tidyup-process.md

At a high level, we are planning on re-working arrange() on top of a new version of vctrs::vec_order(), which is backed by a radix ordering algorithm that is inspired by the one used by data.table. This should improve performance, in particular with character vectors, but to properly support ordering in various locales we are proposing adding a .locale argument to arrange() that defaults to American English, but would accept any locale identifier returned by stringi::stri_locale_list().

Please feel free to contribute however you feel comfortable — you're welcome to make small fixes via PR or comment here for broader discussions. In particular, @yutannihilation if you have any thoughts on how impactful this might be for Japanese R users, we'd love to hear them!

We'll plan to close the discussion on August 20th.

Feedback on tidyup 2: stringr <-> tidyr realignment

@tidyverse/authors, @AmeliaMN, @rpruim, @hardin47, @rundel, @beanumber, @gagolews, @rjpat, @jminnier:

We’d love to get your thoughts on https://github.com/tidyverse/tidyups/blob/main/002-tidyr-stringr.md, a proposal to align the functions that break apart strings into variables/observations in stringr and tidyr. The goal is to come up with a cohesive family of string manipulation functions that tackle common problems that arise during data tidying. You'd first encounter the high level tidyr functions that work with variables in data frames, and later, when you learn more about programming and string manipulation, you'd learn about the lower level stringr functions that work with character vectors.

Please feel free to contribute however you feel comfortable — you're welcome to make small fixes via PR, comment here, or open bigger discussion topics in an new issue. If there are things you’d prefer to discuss in private, please feel free to email me. I’ll plan to close the discussion on Jan 17, so we can start on implementation early next year.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.