regressinator's People

Contributors

capnrefsmmat

regressinator's Issues

Detect use of `$` in model formulas instead of `data =`

Suppose you do

pop <- population(...)
d <- sample_x(pop, 100) |> sample_y()

fit <- lm(d$y ~ d$x)

Now sampling_distribution(), model_lineup(), and the rest will not work correctly: they use update(fit, data = ...) to update the data, but fit doesn't use the data argument at all.

We should alert the user to this possibility. Perhaps, for instance, we can look at the call of fit and check whether there's a data argument provided, and give an informative warning otherwise.
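A minimal sketch of such a check, assuming the fitted model stores its call (as lm() and glm() fits do). The helper name check_data_arg() is hypothetical and not part of regressinator:

```r
# Hypothetical helper (not part of regressinator): warn when a fitted model's
# call has no `data` argument, since update(fit, data = ...) then has nothing
# meaningful to replace.
check_data_arg <- function(fit) {
  fit_call <- stats::getCall(fit)
  has_data <- !is.null(fit_call) && "data" %in% names(fit_call)
  if (!has_data) {
    warning(
      "Model was fit without a `data` argument (e.g. lm(d$y ~ d$x)); ",
      "refit it as lm(y ~ x, data = d) so the data can be updated.",
      call. = FALSE
    )
  }
  invisible(has_data)
}

d <- data.frame(x = 1:10,
                y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9))
check_data_arg(lm(y ~ x, data = d))  # silent
check_data_arg(lm(d$y ~ d$x))        # emits the warning
```

Because lm() records its call with match.call(), a data frame passed positionally still shows up under the name data, so this check only flags calls where no data frame was supplied at all.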

Provide a way to get the population mean function

It may be useful (when plotting examples, for instance) to get the mean function for a particular population. For instance:

foo <- population(
  x = predictor("runif", min = 0, max = 10),
  y = response(1 + 2 * x, error_scale = 1)
)

f <- mean_function(foo)
f(x = 2) #=> 5

I suppose this API only works if the population has 1 response variable. An alternative would be to take a data frame of predictors and return a data frame with the mean responses, like sample_y() but without the randomness.
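The data-frame alternative could look roughly like the sketch below. It is written in base R and does not touch regressinator's internals; make_mean_function() and the mean_y column name are hypothetical, chosen only for illustration:

```r
# Sketch only: build a mean function from a response expression and an
# inverse link. `make_mean_function()` is a hypothetical name, not the
# regressinator API.
make_mean_function <- function(mean_expr, linkinv = identity) {
  expr <- substitute(mean_expr)  # capture the unevaluated expression
  enclos <- parent.frame()       # where non-predictor names are looked up
  function(newdata) {
    # Evaluate the expression with the predictor columns in scope,
    # then map through the inverse link; no random error is added.
    newdata$mean_y <- linkinv(eval(expr, envir = newdata, enclos = enclos))
    newdata
  }
}

f <- make_mean_function(1 + 2 * x)
f(data.frame(x = c(0, 2)))$mean_y  # 1 5
```

Returning a data frame rather than a bare function of x sidesteps the single-response limitation, since each response could get its own column.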

Support binomial responses with n > 1

Currently, sample_y() sets size = 1 when sampling binomial responses. It should be possible to simulate binomial outcome data with more than one trial, by somehow linking the number of trials to a column in the data.

Maybe response() can take some kind of weights argument, like glm() does, specifying the counts? Are there any other GLMs this mechanism would be useful for?
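For reference, here is what the desired data would look like when simulated by hand in base R, with the number of trials taken from a column. This is only a sketch of the target behaviour, not a proposed implementation; the column names trials and successes and the coefficients are made up for illustration:

```r
set.seed(42)
n <- 20

# Each row has its own number of trials.
d <- data.frame(
  x = runif(n, min = 0, max = 10),
  trials = sample(1:10, n, replace = TRUE)
)

# Success probability on the logit scale; rbinom() is vectorised over `size`,
# so each row is drawn with its own trial count.
d$successes <- rbinom(n, size = d$trials, prob = plogis(-1 + 0.3 * d$x))

# glm() accepts a proportion as the response with the trial counts as weights.
fit <- glm(successes / trials ~ x, family = binomial(),
           weights = trials, data = d)
coef(fit)
```

The weights form shown here is the glm() convention the question above refers to; glm() also accepts a two-column cbind(successes, failures) response for the same model.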

Support experimental designs

Instead of sampling X from the population, fix it according to a design, i.e. a data frame specifying the combinations of predictors to set. Something like this:

design <- some_design_function(...)

pop <- population(
  y = response(x1 + 2 * x2, error_scale = 1)
)

pop |>
  design_x(design) |>
  sample_y()
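Until something like the hypothetical design_x() above exists, the same idea can be sketched by hand in base R. The factor levels, replicate count, and error scale below are arbitrary choices for illustration:

```r
set.seed(1)

# A full factorial design: every combination of x1 and x2 levels.
design <- expand.grid(x1 = c(0, 1), x2 = c(-1, 0, 1))

# Replicate each design point 3 times.
design <- design[rep(seq_len(nrow(design)), each = 3), ]
rownames(design) <- NULL

# X is fixed by the design; only Y is random.
design$y <- with(design, x1 + 2 * x2) + rnorm(nrow(design), mean = 0, sd = 1)

head(design)
```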

`partial_residuals()` misfires when at least one predictor is a factor

If one predictor is a factor, predictor_value will be a factor, even for those predictors that are not factors. This will make for confusing plots, since ggplot will treat the axes as factors even for numeric predictors, and try to label every unique value.

Partial residuals for factors do not seem useful, so it may be best to filter out factors when generating the data frame.
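A sketch of the filtering step in base R, assuming the predictors are available as columns of a data frame. The helper name drop_factor_predictors() is hypothetical:

```r
# Hypothetical helper: keep only the predictors stored as numeric columns,
# so factor (and character) predictors never reach the plotting data frame.
drop_factor_predictors <- function(data, predictors) {
  is_num <- vapply(data[predictors], is.numeric, logical(1))
  predictors[is_num]
}

d <- data.frame(
  x1 = c(1.5, 2, 3),
  group = factor(c("a", "b", "a")),
  x2 = c(10L, 20L, 30L)
)
drop_factor_predictors(d, c("x1", "group", "x2"))  # "x1" "x2"
```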

Robustly handle invalid response values

If evaluating the response expression in sample_y() fails, we throw a nice error giving the expression and indicating which predictors are available in the environment, since most failures would be from someone typo-ing a variable name.

But if the response expression evaluates successfully and its value is then not valid for the inverse link function (e.g. it's a string or a list), we just barf an error:

Error in response$family$linkinv(as.numeric(.eval_verbosely(response$response_expr,  : 
  'list' object cannot be coerced to type 'double'

We should have an extra tryCatch() in sample_y() to detect this and show the response expression and its output type.
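The extra tryCatch() might look something like this sketch. The helper apply_linkinv() and its arguments are hypothetical stand-ins; the real code in sample_y() is structured differently:

```r
# Hypothetical wrapper: apply the inverse link, and on failure re-throw an
# error that names the response expression and the type it evaluated to.
apply_linkinv <- function(value, linkinv, expr_text) {
  tryCatch(
    linkinv(as.numeric(value)),
    error = function(e) {
      stop(
        "Response expression `", expr_text, "` evaluated to an object of ",
        "type '", typeof(value), "', which could not be used as a numeric ",
        "response: ", conditionMessage(e),
        call. = FALSE
      )
    }
  )
}

# Works for ordinary numeric input:
apply_linkinv(c(0, 1), stats::binomial()$linkinv, "0.5 * x")

# A list that cannot be coerced now produces an error mentioning the expression:
try(apply_linkinv(list(1:2, 3), identity, "foo(x)"))
```

Note that as.numeric() on a character vector only warns and returns NA rather than failing, so a string-valued response would need a separate explicit check.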

No error for missing `error_scale`

Reproducer:

library(regressinator)
pop <- population(
x = predictor("rnorm", mean = 5, sd = 4), y = response(
4 + 2 * x, # relationship between X and Y family = gaussian(), # errors are Normal error_scale = 10.0 # errors are N(0, 10^2)
) )

simulated_x <- sample_x(pop, n = 50)
sim_data <- sample_y(simulated_x)

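Because of a copy/paste error, everything after the first # on the response() line is a comment, so family and error_scale are never passed. This produces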

Error in rnorm(n, mean = y_resp, sd = error_scale) : invalid arguments

This isn't a helpful error. response() should complain properly about the missing error_scale.
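A sketch of the kind of check response() could run up front. It assumes a missing error_scale shows up as NULL and that only the gaussian family needs one; both are assumptions, and check_error_scale() is a hypothetical name:

```r
# Hypothetical validation: fail early, with a readable message, when a
# gaussian response has no error_scale. Assumes a missing value is NULL.
check_error_scale <- function(family, error_scale = NULL) {
  if (family$family == "gaussian" && is.null(error_scale)) {
    stop(
      "response() with family = gaussian() requires `error_scale` ",
      "(the standard deviation of the errors), but none was provided.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

check_error_scale(gaussian(), error_scale = 10)  # passes silently
check_error_scale(binomial())                    # passes silently
try(check_error_scale(gaussian()))               # informative error
```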
