
tidydensity's Introduction

TidyDensity

CRAN_Status_Badge Lifecycle: stable PRs Welcome

The goal of {TidyDensity} is to make working with random numbers from different distributions easy. All tidy_ distribution functions provide the following components:

  • [r_]
  • [d_]
  • [q_]
  • [p_]
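The four prefixes follow base R's naming for distribution functions. As a minimal sketch using only the stats package (this illustrates the naming convention, not TidyDensity's internals), the four functions for the normal distribution look like this:

```r
# Base R (stats) counterparts of the r_/d_/p_/q_ components,
# shown for the standard normal distribution.
set.seed(123)                        # make the random draws reproducible

draws <- rnorm(5, mean = 0, sd = 1)  # r: random draws
dens  <- dnorm(0, mean = 0, sd = 1)  # d: density at a point
prob  <- pnorm(1.96)                 # p: cumulative probability P(X <= 1.96)
quant <- qnorm(0.975)                # q: quantile, the inverse of pnorm()

# qnorm() and pnorm() undo each other
all.equal(pnorm(qnorm(0.975)), 0.975)
```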

Installation

You can install the released version of {TidyDensity} from CRAN with:

install.packages("TidyDensity")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("spsanderson/TidyDensity")

Example

This basic example calls tidy_normal() with its defaults, which returns a tibble of 50 random draws along with density, probability, and quantile columns:

library(TidyDensity)
library(dplyr)
library(ggplot2)

tidy_normal()
#> # A tibble: 50 × 7
#>    sim_number     x      y    dx       dy     p      q
#>    <fct>      <int>  <dbl> <dbl>    <dbl> <dbl>  <dbl>
#>  1 1              1  0.227 -2.97 0.000238 0.590  0.227
#>  2 1              2  1.12  -2.84 0.000640 0.869  1.12 
#>  3 1              3  1.26  -2.71 0.00153  0.897  1.26 
#>  4 1              4  0.204 -2.58 0.00326  0.581  0.204
#>  5 1              5  1.04  -2.44 0.00620  0.852  1.04 
#>  6 1              6 -0.180 -2.31 0.0106   0.429 -0.180
#>  7 1              7  0.299 -2.18 0.0167   0.618  0.299
#>  8 1              8  1.73  -2.04 0.0243   0.959  1.73 
#>  9 1              9 -0.770 -1.91 0.0338   0.221 -0.770
#> 10 1             10  0.385 -1.78 0.0463   0.650  0.385
#> # ℹ 40 more rows

An example plot of the tidy_normal data.

tn <- tidy_normal(.n = 100, .num_sims = 6)

tidy_autoplot(tn, .plot_type = "density")

tidy_autoplot(tn, .plot_type = "quantile")

tidy_autoplot(tn, .plot_type = "probability")

tidy_autoplot(tn, .plot_type = "qq")

We can also take a look at the plots when the number of simulations is greater than nine. This will automatically turn off the legend as it will become too noisy.

tn <- tidy_normal(.n = 100, .num_sims = 20)

tidy_autoplot(tn, .plot_type = "density")

tidy_autoplot(tn, .plot_type = "quantile")

tidy_autoplot(tn, .plot_type = "probability")

tidy_autoplot(tn, .plot_type = "qq")
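The legend rule described above can be sketched as a small helper. This is only an illustration of the idea: the function name and its `max_sims` argument are hypothetical, and the package's actual implementation may differ.

```r
# Hypothetical helper: pick a legend position from the simulation count.
# The threshold of 9 comes from the text above; "bottom" is an assumed default.
legend_position_for <- function(num_sims, max_sims = 9) {
  if (num_sims > max_sims) "none" else "bottom"
}

legend_position_for(6)
legend_position_for(20)

# With ggplot2, the result could be passed to theme(), for example:
# p + ggplot2::theme(legend.position = legend_position_for(20))
```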

tidydensity's People

Contributors

kokbent, olivroy, spsanderson

Forkers

kokbent

tidydensity's Issues

Add `d_norm`, `q_norm`, `p_norm` column to `tidy_rnorm()`

tidy_rnorm <- function(.n = 50, .mean = 0, .sd = 1, .num_sims = 1){
  
  # Tidyeval ----
  n         <- as.integer(.n)
  num_sims <- as.integer(.num_sims)
  mu  <- as.numeric(.mean)
  std <- as.numeric(.sd)
  
  # Checks ----
  if(!is.integer(n) | n <= 0){
    rlang::abort(
      "The parameter '.n' must be of class integer. Please pass a whole
            number like 50 or 100. It must be greater than 0."
    )
  }
  
  if(!is.integer(num_sims) | num_sims <= 0){
    rlang::abort(
      "The parameter '.num_sims' must be of class integer. Please pass a
            whole number like 50 or 100. It must be greater than 0."
    )
  }
  
  if(!is.numeric(mu)){
    rlang::abort(
      "The parameters '.mean' and '.sd' must be of class numeric.
            Please pass a number like 1 or 1.1 etc."
    )
  }
  
  if(!is.numeric(std)){
    rlang::abort(
      "The parameters '.mean' and '.sd' must be of class numeric.
            Please pass a number like 1 or 1.1 etc."
    )
  }
  }
  
  x <- seq(1, num_sims, 1)
  
  ps <- seq(-n, n-1, 2)
  qs <- seq(0, 1, (1/(n-1)))
  
  df <- dplyr::tibble(sim_number = as.factor(x)) %>%
    dplyr::group_by(sim_number) %>%
    dplyr::mutate(x = list(1:n)) %>%
    dplyr::mutate(y = list(stats::rnorm(n, mu, std))) %>%
    dplyr::mutate(d_norm = list(stats::dnorm(unlist(y), mu, std))) %>%
    dplyr::mutate(p_norm = list(stats::pnorm(ps, mu, std))) %>%
    dplyr::mutate(q_norm = list(stats::qnorm(qs, mu, std))) %>%
    tidyr::unnest(cols = c(x, y, d_norm, p_norm, q_norm)) %>%
    dplyr::ungroup()
  
  
  # Attach descriptive attributes to tibble
  attr(df, ".mean") <- .mean
  attr(df, ".sd") <- .sd
  attr(df, ".n") <- .n
  attr(df, ".num_sims") <- .num_sims
  attr(df, "tibble_type") <- "tidy_gaussian"
  attr(df, "ps") <- ps
  attr(df, "qs") <- qs
  
  # Return final result as function output
  return(df)
  
}
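The unnest() call in this function only works if every list column holds n elements. The following base R check of the two grids is a sketch; the `length.out` variant is an illustrative alternative, not what the function above uses.

```r
n <- 50L

# Grids as defined in the function above
ps <- seq(-n, n - 1, 2)          # -50, -48, ..., 48
qs <- seq(0, 1, 1 / (n - 1))     # 0, 1/49, ..., 1

length(ps)                       # n evenly spaced values, step 2

# Illustrative alternative: length.out fixes the number of points
# directly instead of relying on the step size.
qs_alt <- seq(0, 1, length.out = n)
length(qs_alt)

# The endpoints 0 and 1 map to -Inf and Inf under qnorm()
range(qnorm(qs_alt))
```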

Update `tidy_autoplot()`

Update tidy_autoplot() to work with all tidy_ distribution functions.

  • Add other tidy_ distribution functions
  • Make sure the probability plot uses x = ps
  • Make sure the quantile plot uses x = qs
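Since the tidy_ functions store `ps` and `qs` as attributes, a plotting function can read the x-axis grid back from the object. A minimal sketch, assuming a single simulation and using a plain data.frame as a stand-in for the tibble:

```r
n  <- 10L
ps <- seq(-n, n - 1, 2)

# Stand-in for a single-simulation tidy_ tibble (a plain data.frame here)
df <- data.frame(sim_number = factor(1), p = pnorm(ps))
attr(df, "ps") <- ps

# A plotting function can recover the grid from the attribute
plot_df <- data.frame(x = attr(df, "ps"), y = df$p)
head(plot_df, 3)
```

With more than one simulation the grid would need to be repeated once per simulation before it is paired with the `p` column.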

crayon dependency

Describe the bug

The dev version of pillar no longer imports the crayon package; coloring is now provided via the cli package. Please change to conditional use of crayon or switch to cli.

To Reproduce

Install the development versions of pillar and TidyDensity into a clean library, then try to load the package.
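One way to make the dependency conditional is to check for each package at run time. The helper below is a hypothetical sketch (the name `color_blue` is not from TidyDensity); it relies only on requireNamespace(), cli::col_blue(), and crayon::blue().

```r
# Hypothetical helper (not part of TidyDensity): color text with cli when
# available, fall back to crayon, and otherwise return the text unchanged.
color_blue <- function(text) {
  if (requireNamespace("cli", quietly = TRUE)) {
    cli::col_blue(text)
  } else if (requireNamespace("crayon", quietly = TRUE)) {
    crayon::blue(text)
  } else {
    text
  }
}

cat(color_blue("Welcome to TidyDensity"), "\n")
```

Either package would then belong under Suggests rather than Imports, which is the usual arrangement for optional dependencies.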

Update `tidy_autoplot()`

Update tidy_autoplot() to plot out the tidy_multi_ distribution functions.

  ggplot2::theme_minimal() +
  ggplot2::geom_smooth(ggplot2::aes(group = dist_type), color = "black", linetype = "dashed",
                      se = FALSE, size = 0.5, alpha = .382)

Release TidyDensity 0.0.1

First release:

Prepare for release:

  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()

Add function plotting functions

Add a function to plot the density of the tidy_distribution_tbl

Must find attributes(tn)$tibble_type == "tidy_gaussian"
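A guard for this requirement could look like the sketch below. The function name and its arguments are hypothetical; the only thing taken from the issue is the `tibble_type` attribute that tidy_rnorm() attaches.

```r
# Hypothetical guard: stop early unless the input carries the expected
# tibble_type attribute set by tidy_rnorm().
check_tibble_type <- function(.data, .expected = "tidy_gaussian") {
  tibble_type <- attr(.data, "tibble_type")
  if (is.null(tibble_type) || !identical(tibble_type, .expected)) {
    stop("`.data` must have tibble_type '", .expected, "'.", call. = FALSE)
  }
  invisible(.data)
}

# Usage with a stand-in data.frame
df <- data.frame(y = rnorm(5))
attr(df, "tibble_type") <- "tidy_gaussian"
check_tibble_type(df)
```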

plot_density()

tn <- tidy_rnorm(.num_sims = 3)
tn_grp_mean <- tn %>%
  group_by(sim_number) %>%
  summarise(grp_mean = mean(y, na.rm = TRUE)) %>%
  ungroup()

p1 <- tn %>%
  ggplot(aes(x = y, group = sim_number, color = sim_number)) +
  geom_density() + 
  geom_vline(data = tn_grp_mean, aes(xintercept = grp_mean, color = sim_number), 
                                     linetype ="dashed") +
  theme_minimal() + 
  theme(legend.position = "bottom") + 
  labs(
    title = "Density Distribution", 
    subtitle = paste0(
      "Simulations: ", attributes(tn)$.num_sims, 
      " - mu: ", attributes(tn)$.mean, 
      " - sd: ", attributes(tn)$.sd, 
      " - Random Points: ", attributes(tn)$.n
    ),
    color = "Simulation",
    x = "x",
    y = "Density"
  )

plot_probability()

# Needs ps from tidy_rnorm()
x <- attributes(tn)$ps
tn %>%
  ggplot(aes(
    x = x,
    y = p_norm,
    group = sim_number,
    color = sim_number
  )) +
  geom_point() + 
  geom_line() + 
  theme_minimal() + 
  theme(legend.position = "bottom") + 
  labs(
    title = "Probability Distribution", 
    subtitle = paste0(
      "Simulations: ", attributes(tn)$.num_sims, 
      " - mu: ", attributes(tn)$.mean, 
      " - sd: ", attributes(tn)$.sd, 
      " - Random Points: ", attributes(tn)$.n
    ),
    color = "Simulation",
    x = "x",
    y = "Probability"
  )

plot_quantile()

tn %>%
  ggplot(aes(
    x = x,
    y = q_norm,
    group = sim_number,
    color = sim_number
  )) +
  geom_point() + 
  geom_line() + 
  theme_minimal() + 
  theme(legend.position = "bottom") + 
  labs(
    title = "Quantile Distribution", 
    subtitle = paste0(
      "Simulations: ", attributes(tn)$.num_sims, 
      " - mu: ", attributes(tn)$.mean, 
      " - sd: ", attributes(tn)$.sd, 
      " - Random Points: ", attributes(tn)$.n
    ),
    color = "Simulation",
    x = "x",
    y = "Quantile"
  )

plot_qq()

tn %>%
  ggplot(aes(sample = y, group = sim_number, color = sim_number)) +
  stat_qq() + 
  stat_qq_line() +
  theme_minimal() + 
  theme(legend.position = "bottom") + 
  labs(
    title = "QQ-Plot", 
    subtitle = paste0(
      "Simulations: ", attributes(tn)$.num_sims, 
      " - mu: ", attributes(tn)$.mean, 
      " - sd: ", attributes(tn)$.sd, 
      " - Random Points: ", attributes(tn)$.n
    ),
    color = "Simulation",
    x = "Test-X",
    y = "Test-Y"
  )

plot_dot_density()

tn %>%
  ggplot(aes(x = y, y = d_norm, group = sim_number, color = sim_number)) +
  geom_point() + 
  geom_line() +
  theme_minimal() + 
  theme(legend.position = "bottom") + 
  labs(
    title = "Density Distribution", 
    subtitle = paste0(
      "Simulations: ", attributes(tn)$.num_sims, 
      " - mu: ", attributes(tn)$.mean, 
      " - sd: ", attributes(tn)$.sd, 
      " - Random Points: ", attributes(tn)$.n
    ),
    color = "Simulation",
    x = "x",
    y = "Density"
  )

Add function `tidy_rnorm()`

Is your feature request related to a problem? Please describe.
Make a function that generates random Gaussian data. Add a parameter .n for how many points should be generated and .num_walks for how many simulations should be run.

Describe the solution you'd like
A function that returns the randomly generated Gaussian data described above.

Add function `tidy_rgamma()`

tidy_rgamma <- function(.n = 50, .shape = 1, .rate = 1, .num_sims = 1){
  
  # Tidyeval ----
  n        <- as.integer(.n)
  num_sims <- as.integer(.num_sims)
  shp <- .shape
  rte <- .rate
  
  # Checks ----
  if(!is.integer(n) | n <= 0){
    rlang::abort(
      "The parameter '.n' must be of class integer. Please pass a whole
            number like 50 or 100. It must be greater than 0."
    )
  }
  
  if(!is.integer(num_sims) | num_sims <= 0){
    rlang::abort(
      "The parameter '.num_sims' must be of class integer. Please pass a
            whole number like 50 or 100. It must be greater than 0."
    )
  }
  
  if(!is.numeric(shp) | shp <= 0){
    rlang::abort(
      "The parameters '.shape' and '.rate' must be of class numeric.
            Please pass a number like 1 or 1.1 etc. and must be greater than 0."
    )
  }
  
  if(!is.numeric(rte)){
    rlang::abort(
      "The parameters '.shape' and '.rate' must be of class numeric.
            Please pass a number like 1 or 1.1 etc."
    )
  }
  
  x <- seq(1, num_sims, 1)
  
  ps <- seq(-n, n-1, 2)
  qs <- seq(0, 1, (1/(n-1)))
  
  df <- dplyr::tibble(sim_number = as.factor(x)) %>%
    dplyr::group_by(sim_number) %>%
    dplyr::mutate(x = list(1:n)) %>%
    dplyr::mutate(y = list(stats::rgamma(n = n, shape = shp, rate = rte))) %>%
    dplyr::mutate(d_gamma = list(stats::dgamma(unlist(y), shape = shp, rate = rte))) %>%
    dplyr::mutate(p_gamma = list(stats::pgamma(ps, shape = shp, rate = rte))) %>%
    dplyr::mutate(q_gamma = list(stats::qgamma(qs, shape = shp, rate = rte))) %>%
    tidyr::unnest(cols = c(x, y, d_gamma, p_gamma, q_gamma)) %>%
    dplyr::ungroup()
  
  
  # Attach descriptive attributes to tibble
  attr(df, ".shape") <- .shape
  attr(df, ".rate") <- .rate
  attr(df, ".n") <- .n
  attr(df, ".num_sims") <- .num_sims
  attr(df, "tibble_type") <- "tidy_gamma"
  attr(df, "ps") <- ps
  attr(df, "qs") <- qs
  
  # Return final result as function output
  return(df)
  
}

tidy_rgamma()
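One detail worth checking in this draft: the `ps` grid is carried over from the normal case and is half negative, while the gamma distribution is supported only on non-negative values. A quick base R check, offered as an observation rather than a required change:

```r
shp <- 1
rte <- 1

# pgamma() returns 0 for every point at or below zero
pgamma(c(-2, -1, 0), shape = shp, rate = rte)

# On the positive side, shape = 1 reduces to the exponential CDF
q <- c(0.5, 1, 2)
all.equal(pgamma(q, shape = shp, rate = rte), 1 - exp(-rte * q))
```

So with the grid as written, roughly the first half of the `p_gamma` column would be zero.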

Add function set for `tidy_multi_` distribution functions

library(tidyverse)

tidy_multi_normal <- function(.n = 50, .mean = c(-1,0,1), .sd = c(1), .num_sims = 2){
  
  # Tidyeval ----
  n <- as.integer(.n)
  mu <- as.numeric(.mean)
  std <- as.numeric(.sd)
  num_sims <- as.integer(.num_sims)
  
  # Checks ----
  if(!is.integer(n) | n <= 0){
    rlang::abort(
      "The .n parameter must be an integer and greater than 0"
    )
  }
  
  if(!is.integer(num_sims) | num_sims <= 1){
    rlang::abort(
      "The .num_sims parameter must be an integer greater than 1."
    )
  }
  
  if(!is.numeric(mu)){
    rlang::abort(
      "The .mean parameter must be numeric."
    )
  }
  
  if(!is.numeric(std)){
    rlang::abort(
      "The .sd parameter must be numeric"
    )
  }
  
  x <- seq(1, num_sims, 1)
  
  ps <- seq(-n, n-1, 2)
  qs <- seq(0, 1, (1/(n-1)))
  
  df <- expand_grid(
    sim_number = x,
    mu = mu,
    std = std
  )
  
  df <- df %>%
    mutate(sim_number = as.factor(sim_number)) %>%
    dplyr::group_by(sim_number, mu, std) %>%
    dplyr::mutate(x = list(1:n)) %>%
    dplyr::mutate(y = list(stats::rnorm(n, mu, std))) %>%
    dplyr::mutate(d = list(density(unlist(y), n = n)[c("x","y")] %>%
                             purrr::set_names("dx","dy") %>%
                             dplyr::as_tibble())) %>%
    dplyr::mutate(p = list(stats::pnorm(ps, mu, std))) %>%
    dplyr::mutate(q = list(stats::qnorm(qs, mu, std))) %>%
    tidyr::unnest(cols = c(x, y, d, p, q)) %>%
    dplyr::ungroup() %>%
    dplyr::mutate(dist_type = paste0("Gaussian: c(", mu, ", ", std, ")")) %>%
    dplyr::mutate(dist_type = as.factor(dist_type)) %>%
    dplyr::select(
      sim_number, dist_type, dplyr::everything()
    ) %>%
    dplyr::arrange(sim_number, dist_type)
  
  # Attach attributes ----
  attr(df, ".n") <- .n
  attr(df, ".mean") <- .mean
  attr(df, ".sd") <- .sd
  attr(df, ".num_sims") <- .num_sims
  attr(df, "tibble_type") <- "tidy_multi_gaussian"
  attr(df, "ps") <- ps
  attr(df, "qs") <- qs
  
  # Return ----
  return(df)
  
}

tn <- tidy_multi_normal(.n = 500,.num_sims = 5)

atb <- attributes(tn)
n <- atb$.n
sims <- atb$.num_sims
dist_type = stringr::str_remove(atb$tibble_type, "tidy_multi_") %>%
  stringr::str_to_title()
sub_title = paste0(
  "Grouped Gaussian - Data Points: ", n, " - ",
  "Simulations: ", sims, "\n",
  "Distribution Family: ", dist_type, "\n",
  "Parameters: ", if(atb$tibble_type == "tidy_multi_gaussian"){
    paste0("Mean: c(", paste0(toString(atb$.mean), ")", " - SD: c(", toString(atb$.sd), ")"))
  }
)

tn %>%
  ggplot2::ggplot(ggplot2::aes(
    x = dx, 
    y = dy, 
    group = interaction(dist_type, sim_number), 
    color = dist_type)) +
  ggplot2::geom_line() +
  ggplot2::theme_minimal() +
  ggplot2::theme(legend.position = "bottom") +
  ggplot2::labs(
    title = "Density Plot",
    subtitle = sub_title,
    x = "",
    y = "Density",
    col

> tn %>%
+   group_by(dist_type, sim_number) %>%
+   summarise(mean_mu = mean(y))
`summarise()` has grouped output by 'dist_type'. You can override using the `.groups` argument.
# A tibble: 15 x 3
# Groups:   dist_type [3]
   dist_type          sim_number  mean_mu
   <fct>              <fct>         <dbl>
 1 Gaussian: c(-1, 1) 1          -0.947  
 2 Gaussian: c(-1, 1) 2          -0.971  
 3 Gaussian: c(-1, 1) 3          -1.03   
 4 Gaussian: c(-1, 1) 4          -0.959  
 5 Gaussian: c(-1, 1) 5          -0.981  
 6 Gaussian: c(0, 1)  1           0.00196
 7 Gaussian: c(0, 1)  2           0.00435
 8 Gaussian: c(0, 1)  3           0.0225 
 9 Gaussian: c(0, 1)  4          -0.0188 
10 Gaussian: c(0, 1)  5          -0.0652 
11 Gaussian: c(1, 1)  1           0.936  
12 Gaussian: c(1, 1)  2           0.971  
13 Gaussian: c(1, 1)  3           1.01   
14 Gaussian: c(1, 1)  4           0.916  
15 Gaussian: c(1, 1)  5           0.913  
> tn %>%
+   group_by(dist_type) %>%
+   summarise(mean_mu = mean(y))
# A tibble: 3 x 2
  dist_type          mean_mu
  <fct>                <dbl>
1 Gaussian: c(-1, 1) -0.978 
2 Gaussian: c(0, 1)  -0.0110
3 Gaussian: c(1, 1)   0.950 
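The 15 rows in the first summary follow from how expand_grid() crosses the inputs: 5 simulations times 3 means times 1 standard deviation. A minimal sketch that reproduces only the grid step, assuming the same inputs as the call above:

```r
library(tidyr)

# Same inputs as tidy_multi_normal(.n = 500, .num_sims = 5) with its defaults
grid <- expand_grid(
  sim_number = 1:5,
  mu         = c(-1, 0, 1),
  std        = 1
)

nrow(grid)  # one row per (sim_number, mu, std) combination
```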
