
Comments (8)

topepo avatar topepo commented on June 9, 2024

Those seem pretty reasonable but I think that it would be good to have the generic for estimate only involve the second argument to be less restrictive.

We also have the convention to make the first argument x whenever possible. That might seem like we are being uptight, but these generics are meant to be used in a broad context. For example:

estimate <- function(.data, .model){
  UseMethod("estimate")
}

I would want to use this on an existing model object (where the data have already been consumed). For example, I wrote an S package for Shewhart charts a long time ago and wanted a generic that would return estimates of the process mean and variance (mean and var weren't generic back then). This signature would preclude something like that.

Are you doing double dispatch on these two objects? If not, you wouldn't lose anything by using a single argument.
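As a rough sketch of the single-argument form, the following dispatches only on `x` and works on an object that already holds its data. The `shewhart_chart` class and its constructor are hypothetical stand-ins for illustration, not code from any existing package.

```r
# Generic that dispatches on a single argument named `x`.
estimate <- function(x, ...) {
  UseMethod("estimate")
}

# Hypothetical class: a control chart object that already contains its subgroups.
new_shewhart_chart <- function(subgroups) {
  structure(list(subgroups = subgroups), class = "shewhart_chart")
}

# Method for the pre-built object; no separate data argument is required.
estimate.shewhart_chart <- function(x, ...) {
  subgroup_means <- vapply(x$subgroups, mean, numeric(1))
  subgroup_vars <- vapply(x$subgroups, var, numeric(1))
  # Grand mean and average within-subgroup variance.
  list(mean = mean(subgroup_means), variance = mean(subgroup_vars))
}

chart <- new_shewhart_chart(list(c(1, 2, 3), c(2, 3, 4)))
est <- estimate(chart)
```

A signature of `estimate(.data, .model)` would force this object into a slot named for data, even though the data were consumed when the chart was built.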

from generics.

mitchelloharawild avatar mitchelloharawild commented on June 9, 2024

Completely agree with the usage of estimate(). Actually I changed this for the same reason last week in tidyverts/fabletools@d34f1c7.

The only reason it was so restrictive was that I had originally planned to use this functionality only internally.

I don't feel too strongly about the name of the first argument. However, I think using .data will restrict the context of the generic in a beneficial way. In what scenarios do you anticipate estimate() to be used without data as the first argument? Consistent usage of this generic may reduce the cognitive load for users.


topepo avatar topepo commented on June 9, 2024

> In what scenarios do you anticipate estimate() to be used without data as the first argument?

One thing that I'd use it for is unsupervised methods. So if I have an object with PCA loadings, I'd use estimate(object, new_data = df) to get projections for new data points.
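A minimal sketch of that PCA use case, assuming a hypothetical `estimate()` method for `prcomp` objects (no such method is confirmed to exist in any package). It delegates to `predict()` for `prcomp` objects from the stats package, which applies the stored centering, scaling, and rotation to the new rows.

```r
estimate <- function(x, ...) {
  UseMethod("estimate")
}

# Hypothetical method: project new observations onto existing PCA loadings.
estimate.prcomp <- function(x, new_data, ...) {
  # predict() reuses the centering/scaling stored in `x` before rotating.
  stats::predict(x, newdata = new_data)
}

# Fit PCA on the first 100 rows of iris, then project the remaining 50.
pca <- prcomp(iris[1:100, 1:4], center = TRUE, scale. = TRUE)
df <- iris[101:150, 1:4]
scores <- estimate(pca, new_data = df)
```

Here the first argument is the fitted object and the data arrive second, which is the ordering that a `.data`-first signature would rule out.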


mitchelloharawild avatar mitchelloharawild commented on June 9, 2024

Sure, x or object is fine. So generics for model(x, ...) and estimate(x, ...)?

How general do you think the documentation of their functionality should be? Should it distinguish functionality between these generics, or should the methods have flexibility to use them inconsistently?

For example, in fable these generics would be used as follows:
  • model.tbl_ts(x, ...) fits multiple model definitions to data, where x is a tsibble and ... are the model definitions.
  • estimate.tbl_ts(x, .model, ...) fits a single model definition to data, where x is a tsibble and .model is the model definition. ... is unused.

Having some recommended usage of these verbs would make it easier for users to learn their functionality, but it would also make them less flexible.


topepo avatar topepo commented on June 9, 2024

I'd suggest x for both. I wouldn't really get too specific about how we think that these should be used. I think that the doc files can give examples of what existing methods do.


mitchelloharawild avatar mitchelloharawild commented on June 9, 2024

Sounds reasonable. I'll work on this a bit and make a PR.


hadley avatar hadley commented on June 9, 2024

I think it's most important that you give some thought to the type signature of the generic — i.e. what does it return? Does it return a data frame? A tibble? An object of the same type as x?


mitchelloharawild avatar mitchelloharawild commented on June 9, 2024

The implementation in the fable series of packages is:

model(.data, ...)

  • Returns a mable object (a tibble with model attributes).
  • Rows of models are identified by groups of the input .data (keys+groups in a tsibble).
  • Columns of models are specified in the `...` argument.
  • Cells are the result from a call to estimate() with the appropriate data split and model definition.
  • summarise-esque semantics. Respects groups and reduces data into summary statistics (model fit parameters).
  • Input is a tsibble (tibble), output is a mable (tibble)

estimate(.data, .model, ...)

  • Returns a model object (a list containing the model specification, response, transformation, and the fit object, i.e. the result from the model training method).
  • Input is a tsibble (tibble), output is a model (list)
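The relationship between the two verbs described above can be illustrated with a toy base-R sketch. This is not the fabletools implementation: it uses a plain data frame with an assumed `key` column in place of a tsibble, a data frame with list columns in place of a mable, and a hypothetical `model_def()` helper for model definitions.

```r
estimate <- function(x, ...) UseMethod("estimate")
model <- function(x, ...) UseMethod("model")

# Hypothetical model definition: a fitting function wrapped in a classed list.
model_def <- function(fit_fn) {
  structure(list(fit_fn = fit_fn), class = "model_def")
}

# One data split plus one definition gives one fitted model (a list).
estimate.data.frame <- function(x, .model, ...) {
  list(definition = .model, fit = .model$fit_fn(x))
}

# One row per key, one list column per model definition passed in `...`.
model.data.frame <- function(x, ...) {
  defs <- list(...)
  splits <- split(x, x[["key"]])  # assumes a column named `key`
  out <- data.frame(key = names(splits), stringsAsFactors = FALSE)
  for (nm in names(defs)) {
    # Each cell is the result of estimate() on the matching split.
    out[[nm]] <- lapply(splits, function(d) estimate(d, defs[[nm]]))
  }
  out
}

df <- data.frame(key = c("a", "a", "b", "b"), y = c(1, 3, 10, 20))
fits <- model(
  df,
  avg = model_def(function(d) mean(d$y)),
  top = model_def(function(d) max(d$y))
)
```

In this sketch `fits` has one row per key and the columns `avg` and `top`, and each cell holds the list returned by `estimate()`.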

edit: Using estimate() is discouraged, but is exported to allow users access to the lower level objects if they're particularly inquisitive. It also makes the nest-map-unnest workflow better if they're uncomfortable with using model().


These functions dispatch on a data object, so if a similar approach is supported for cross-sectional modelling, not many more methods would be required. I therefore think the purpose of this generic may be less about consistent functionality and more about avoiding namespace conflicts.

You could also argue that estimate() should dispatch on .model rather than .data, which could make it easier to define model training methods. Currently fable keeps the model's training method in the R6 class for the model definition.
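A minimal sketch of dispatching on `.model` instead, using the second argument of `UseMethod()` to choose the dispatch object. The `mean_model` and `lm_model` classes are hypothetical S3 definitions made up for illustration; fable itself stores the training method in an R6 class, as noted above.

```r
# Dispatch on the model definition rather than on the data.
estimate <- function(.data, .model, ...) {
  UseMethod("estimate", .model)
}

# Hypothetical model definition constructors.
mean_model <- function() {
  structure(list(), class = "mean_model")
}
lm_model <- function(formula) {
  structure(list(formula = formula), class = "lm_model")
}

# Each definition class supplies its own training method.
estimate.mean_model <- function(.data, .model, ...) {
  list(definition = .model, fit = mean(.data$y))
}
estimate.lm_model <- function(.data, .model, ...) {
  list(definition = .model, fit = stats::lm(.model$formula, data = .data))
}

df <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10))
fit_mean <- estimate(df, mean_model())
fit_lm <- estimate(df, lm_model(y ~ x))
```

With this layout, adding a new kind of model means adding one method for its definition class, independent of the class of the data.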

