r-packages

R Packages for Data Science

Table of Contents

  1. Development
    1. Testing and Exceptions
    2. Utilities
  2. Collection
  3. Storage
  4. Data Structures
    1. Data Frames
    2. Matrices
    3. Time Series
  5. Databases
  6. Data Transformation
  7. Data Validation and Cleaning
  8. Data Inspection and Summary
  9. Visualization
  10. Feature Transformation
    1. Missing Data
    2. Optimal Transforms
    3. Factors
    4. Representation Learning
  11. Feature Filtering and Dimension Reduction
  12. Feature Selection and Importance
  13. Preprocessing
  14. Models and Statistics
  15. Resampling
  16. Hyperparameter Optimization
  17. Performance
  18. Interpretation
  19. Report and Deploy

Development

  • devtools: tools for package development
  • usethis: automated package tasks
  • pacman: package installation/loading utilities
  • import: import single functions from a package
  • packrat: reproducible development with local package installation
  • jetpack: lightweight dependency management for projects and local installs
  • checkpoint: install packages from a specific time and date
  • config: configuration file management

Testing and Exceptions

  • testthat: unit testing framework

  • assertthat: readable assertions for checking function inputs
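
A minimal sketch of how the two packages above can work together, assuming both are installed. The function `safe_mean` is a hypothetical helper written for this illustration and is not part of either package.

```r
library(testthat)
library(assertthat)

# Hypothetical helper: validate the input, then compute the mean.
safe_mean <- function(x) {
  # assert_that() raises an error if any condition is not TRUE
  assert_that(is.numeric(x), length(x) > 0)
  mean(x)
}

# A testthat unit test covering the normal path and the failure path.
test_that("safe_mean averages numeric input and rejects other input", {
  expect_equal(safe_mean(c(1, 2, 3)), 2)
  expect_error(safe_mean("a"))
})
```

In a package, a test like this would normally live under `tests/testthat/` and be run with `devtools::test()`.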

Utilities

  • butcher: remove unneeded data from a model
  • insight: extract model parameters; see also other packages by the same author
  • wrapr: operators and environments
  • zeallot: destructuring assignment
  • fastpipe: efficient pipe operators

Collection

  • rio: universal import and export
  • opendata: Task View
  • tabulizer: pdf table scraping

Storage

  • RData: native dataframe storage
  • RDS: native R objects
  • fst: a fast read/write format, like feather
  • archivist: collections of objects for reproducibility
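
The two native formats above can be sketched with base R alone. The data frame and the temporary file paths here are made up for illustration.

```r
# Illustrative data; any R object would work for RDS.
df <- data.frame(id = 1:3, score = c(0.5, 0.7, 0.9))

# RDS: stores a single object, which is assigned to a name on read.
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
restored <- readRDS(rds_path)

# RData: stores one or more named objects, restored under their original names.
rdata_path <- tempfile(fileext = ".RData")
save(df, file = rdata_path)
env <- new.env()
load(rdata_path, envir = env)  # load into a separate environment to avoid overwriting df
restored_from_rdata <- env$df
```

RDS is usually the more convenient choice for a single object, because the caller picks the name on read.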

Data Structures

Data Frames

  • base::data.frame: tabular data
  • data.table: efficient data frames
  • tibble: user-friendly data frames
  • disk.frame: for mid-sized data sets too large to fit in RAM

Matrices

  • Matrix:

Time Series

  • base::ts:

  • zoo:

  • xts: comprehensive library for timeseries built on zoo

  • tidyverts: tidy tools for time series

  • tsbox: universal time series converter and utilities
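
For orientation, here is a small sketch of the base `ts` class that the other packages build on or convert from. The monthly values are invented for the example.

```r
# Twelve made-up monthly observations starting in January 2020.
sales <- ts(c(12, 15, 14, 18, 21, 19, 25, 27, 24, 30, 33, 31),
            start = c(2020, 1), frequency = 12)

# window() subsets by time rather than by position.
summer <- window(sales, start = c(2020, 6), end = c(2020, 8))

# Basic metadata accessors.
obs_per_year <- frequency(sales)
first_period <- start(sales)
```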

Databases

  • DBI: for database connections
  • dbplyr: dplyr db interface
  • rquery: query generator

Data Transformation

  • dplyr: piped data transformations
  • dtplyr: `dplyr` interface to `data.table`
  • seplyr: evaluation extensions for dplyr: quoted evaluation and evaluation partitioning
  • cdata: generalized pivots using data description
  • forcats: categorical transformations
  • stringr: string manipulation
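
As a quick illustration of "piped data transformations", here is a short dplyr sketch on the built-in `mtcars` data. The threshold of 100 horsepower is arbitrary.

```r
library(dplyr)

# Filter rows, group, then summarise each group.
by_cyl <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n())
```

With dtplyr, the same verbs can be applied to a `lazy_dt()` wrapper so that the work is done by data.table.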

Data Validation and Cleaning

  • validate: powerful validation framework with error localization, imputation, and reporting
  • janitor: common cleaning operations
  • dataMaid: common checks and report generation
  • OpenRefine: Java app for cleaning and formatting
  • rrefine: R interface to OpenRefine
  • assertr: similar to validate but more lightweight
  • OfficialStatistics: CRAN view

Data Inspection and Summary

  • inspectdf: text summary and plots

  • visdat: data visualization

  • UpSetR: intersecting set visualization

  • corrplot:

  • base::head:

  • utils::str:

  • dplyr::glimpse:

  • base::summary: summary statistics

  • skimr: text summary

  • Hmisc::describe: text summary

  • finalfit::ff_glimpse: text summary

  • DescTools: Abstract and Describe, text summary and plots

  • DataExplorer: text summary and plots, reporting
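
The base functions in this list need no installation, so they make a reasonable first pass before reaching for the richer packages. A short sketch on the built-in `iris` data:

```r
# First few rows, as a data frame.
top_rows <- head(iris, 3)

# Compact structure listing: column types and leading values (printed to the console).
str(iris)

# Summary statistics for one numeric column.
sepal_summary <- summary(iris$Sepal.Length)
```

`dplyr::glimpse(iris)` prints a view similar to `str()`, and `skimr::skim(iris)` adds per-type summaries, assuming those packages are installed.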

Visualization

  • base:

  • lattice:

  • vcd: categorical plots in a lattice framework

  • ggplot2: see here for a list of extensions

  • cowplot: various extensions to ggplot, themes, formatting, gridplot of all classes

  • GGally: various extensions to ggplot, model diagnostics, formatting, and more

  • ggExtra: marginal plots for ggplot

  • ggpmisc: various extensions, drawing on plots, time series with peaks and valleys, stat plots with embedded statistics

  • WVPlots: explanatory and comparison plots in ggplot; "prescribed presentations"

  • ggfortify: `autoplot` for a large number of packages

  • ggRandomForests:

Feature Transformation

Missing Data

  • naniar: (check out the vignettes, too)
  • VIM: imputation of NA data and visualization of its distribution
  • mice: multiple imputation
  • MissMech: tests for the nature of missingness
  • finalfit: various plots and reports

Optimal Transforms

  • acepack: regression transform selection
  • homals: optimal scaling transforms
  • aspect: optimal scaling transforms
  • vtreat: "y-aware" scaling

Factors

  • factorMerger: factor response clustering

Representation Learning

Feature Filtering and Dimension Reduction

Feature Selection and Importance

  • vip: variable importance plots
  • varImp: variable importance for random forests

Preprocessing

  • vtreat:

  • recipes:

  • mlrCPO:

Models and Statistics

  • NNS: nonlinear nonparametric statistics with partial moments; classification, clustering and regression, correlation and dependence, forecasting

  • forecast:

  • smooth:

Resampling

  • vtreat: has CV index generating functions
  • sampler:

Hyperparameter Optimization

  • mlrMBO:

  • irace:

  • mlr:

  • dials:

Performance

  • performance:

  • plotmo: residuals, response, partial dependence

  • visreg: regression plots

  • car: regression plots

  • gamlss::wp: wormplots

Interpretation

  • DrWhy: comprehensive collection of tools for exploration and interpretation (homepage, github); a great resource with lots of interesting and useful packages. See also Predictive Models: Explore, Explain, Debug, a book by the authors of DrWhy
  • DALEX: local and global interpretation of variables (tutorial)
  • iml: local and global interpretation of variables with interaction support (tutorial)
  • lime: local interpretation of variables (tutorial)

Report and Deploy

  • finalfit: creates formatted tables of many types
  • dashR: R interface to plotly's Dash framework
  • shiny:

Contributors

  • ryanholbrook
