catcher - Drop In Caching for R Functions

Author: Chris Hua

There are only two hard things in Computer Science: cache invalidation and naming things. -- Phil Karlton

Easily speed up repeated expensive operations by saving the results on disk and loading from disk on subsequent runs, with many quality-of-life enhancements for data scientists.

Installation

Install from this repository:

if(!require("devtools")) install.packages("devtools")
devtools::install_github("stillmatic/catcher")

Why catcher

This package is very light but includes a number of important helpers for managing the cache and for controlling when the cache is used. It does not require databases or any other dependencies. Its core functions are easy to understand and can replace existing code without breaking old functionality. The package also takes care of all the hashing needed to store files.

Existing R caching packages are either too complicated, e.g. R.cache, or too barebones, e.g. simpleRCache. This package attempts to create a sensible middle ground, with a number of tools essential for data scientists built in. For example, we offer first-class handling of stale cached data and report when the cached data was created.

In terms of design, this package is essentially a hashtable held solely on disk and 'instantiated' by examining the files in your cache directory.
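
To make the "hashtable on disk" idea concrete, here is a minimal sketch using only base R. It is not catcher's actual implementation: the helper names (key_hash, cache_set, cache_get, cache_has, cache_index), the .rds file format, and the MD5-of-serialized-key scheme are all assumptions made for illustration.

```r
# Illustrative sketch only; names and hashing scheme are assumptions, not catcher's API.
cache_dir <- tempfile("demo_cache_")   # throwaway directory for the demo
dir.create(cache_dir)

# Turn any R object (e.g. a function name plus its arguments) into a file-safe key
# by writing its serialized bytes to a temporary file and taking that file's MD5.
key_hash <- function(key) {
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(key, connection = NULL), tmp)
  unname(tools::md5sum(tmp))
}

# Each entry in the "table" is one file named after the hashed key.
cache_path <- function(key) file.path(cache_dir, paste0(key_hash(key), ".rds"))
cache_has  <- function(key) file.exists(cache_path(key))
cache_set  <- function(key, value) saveRDS(value, cache_path(key))
cache_get  <- function(key) readRDS(cache_path(key))

# "Instantiating" the table is nothing more than listing the directory.
cache_index <- function() {
  files <- list.files(cache_dir, pattern = "\\.rds$", full.names = TRUE)
  data.frame(hash = sub("\\.rds$", "", basename(files)),
             created = file.info(files)$mtime)
}

key <- list("sin", pi)
cache_set(key, sin(pi))
cache_has(key)     # TRUE
cache_get(key)     # the stored value of sin(pi)
cache_index()      # one row: the hashed key and the file's modification time
```

Because the files themselves are the only state, a new R session pointed at the same directory sees the same table without any setup step.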

This package was inspired by some of the work I did on Rbnb while at Airbnb Data Science this summer. Particular thanks are due to Ricardo Bion and Jason Goodman for their earlier work on this problem.

Usage

The main function is cache_op():

cache_op("read.csv",
  "https://cdn.rawgit.com/Keno/8573181/raw/7e97f56f521d1f49b966e04457687e87da1b062b/gistfile1.txt",
  header = TRUE)

This is equivalent to:

read.csv("https://cdn.rawgit.com/Keno/8573181/raw/7e97f56f521d1f49b966e04457687e87da1b062b/gistfile1.txt",
  header = TRUE)

You can also use this library to create wrapped versions of your own functions. Take for example a hypothetical function sql that connects to your SQL database:

sql <- function(query, user = Sys.getenv("db_auth_user"), pass = Sys.getenv("db_auth_pass")) {
  # ...
}

sql("SELECT * FROM dim_users LIMIT 100")

You can wrap this function easily with the following:

sql_c <- function(query, ...) {
  catcher::cache_op("sql", query, ...)
}

sql_c("SELECT * FROM dim_users LIMIT 100")

Easily reproducible proof of concept:

sin_c <- function(query, ...) { 
  catcher::cache_op("sin", query, ...)
}

sin_c(pi)
# reading from cache created at 2016-09-03 14:03:32; 0.05 days old.
# [1] 1.224606e-16
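
For readers curious about what happens behind a call like this, the following is a rough sketch of a cache_op-style wrapper written in base R. It is a hypothetical stand-in, not the package's code: cache_call and slow_square are made-up names, and keying the cache file on an MD5 of the function name plus arguments is an assumption.

```r
# Hypothetical stand-in for the idea behind cache_op(); not the package's code.
cache_dir <- tempfile("demo_cache_")
dir.create(cache_dir)

cache_call <- function(fun_name, ...) {
  args <- list(...)

  # Key the cache file on the function name and its arguments.
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeBin(serialize(list(fun_name, args), connection = NULL), tmp)
  path <- file.path(cache_dir, paste0(unname(tools::md5sum(tmp)), ".rds"))

  if (file.exists(path)) {
    created <- file.info(path)$mtime
    age <- as.numeric(difftime(Sys.time(), created, units = "days"))
    message("reading from cache created at ", format(created), "; ",
            round(age, 2), " days old.")
    return(readRDS(path))
  }

  result <- do.call(fun_name, args)   # do.call() accepts a function name as a string
  saveRDS(result, path)
  result
}

slow_square <- function(x) { Sys.sleep(0.2); x^2 }   # toy "expensive" function

first  <- cache_call("slow_square", 4)   # computes the result and writes the cache file
second <- cache_call("slow_square", 4)   # reads the cached result and reports its age
```

Passing the function by name, as cache_op() does, keeps the cache key a plain string rather than a function object, which is simpler to hash consistently.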

Options

The function cache_op() takes a few additional arguments which customize the behavior of the caching.

  • use_cache - whether to use the cache at all. Set it to FALSE to bypass the cache entirely: the result is neither loaded from the cache nor saved to it. Default is TRUE, i.e. caching is enabled.
  • overwrite - useful when the result has been cached before (i.e. the file already exists in the cache) but you want to rerun the function and replace the cached value with the newest version.
  • max_lifetime - the maximum age, in days, of a cached file. Default is 30. If the cached version is older than this, the function is rerun and the cache is overwritten with the new data.
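
One way to picture how these three options interact is as a single "should the function be rerun?" decision. The helper below is a hypothetical sketch, not part of catcher: the name needs_rerun, the use of the file's modification time as its creation time, and the FALSE default for overwrite are assumptions. It only decides whether to rerun; whether the fresh result is then saved (it is not when use_cache is FALSE) is left out.

```r
# Hypothetical helper; argument names mirror cache_op()'s options, the logic is an assumption.
needs_rerun <- function(path, use_cache = TRUE, overwrite = FALSE, max_lifetime = 30) {
  if (!use_cache) return(TRUE)            # bypass the cache entirely
  if (overwrite) return(TRUE)             # recompute and replace the cached file
  if (!file.exists(path)) return(TRUE)    # nothing cached yet
  age_days <- as.numeric(difftime(Sys.time(), file.info(path)$mtime, units = "days"))
  age_days > max_lifetime                 # stale entries are recomputed
}

path <- tempfile(fileext = ".rds")
saveRDS(mtcars, path)
needs_rerun(path)                         # FALSE: the cache file is fresh
needs_rerun(path, overwrite = TRUE)       # TRUE
needs_rerun(path, use_cache = FALSE)      # TRUE

# Backdate the file by 31 days to simulate a stale entry.
Sys.setFileTime(path, Sys.time() - 31 * 24 * 60 * 60)
needs_rerun(path)                         # TRUE: older than the 30-day default
needs_rerun(path, max_lifetime = 60)      # FALSE: still within a 60-day lifetime
```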

When wrapping your own functions with caching, you can set these parameters however you want. Say you have a function get_ga() that gathers Google Analytics data for the last week, so you want to redownload the data if more than 7 days have passed since the last update. You would then define the wrapper as:

get_ga_c <- function(query, ...) { 
  catcher::cache_op("get_ga", max_lifetime = 7, query, ...)
}

# or, if you want to expose catcher's options:

get_ga_c <- function(query, max_lifetime = 7, ...) {
  catcher::cache_op("get_ga", max_lifetime = max_lifetime, query, ...)
}

catcher's Issues

configR options

Currently we hardcode a few options, which are defaults that should work for most people. However, some users may wish to enforce different defaults for their setups (in particular, where the ds_cache is located).

TODO:

  • write configR
  • ds_cache name/location
  • defaults of use_cache / overwrite / max_lifetime
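
Until something like this exists, one possible shape for user-configurable defaults is base R's options() and getOption(), sketched below. This swaps in a different mechanism from the configR approach named in the issue, and the option names (catcher.cache_dir, catcher.max_lifetime, catcher.use_cache) and the catcher_default helper are hypothetical; the package does not necessarily read any of them.

```r
# Hypothetical option names and helper; shown only to illustrate the idea.
catcher_default <- function(name, fallback) {
  getOption(paste0("catcher.", name), default = fallback)
}

# A user could set site-wide defaults once, e.g. in their .Rprofile:
options(catcher.cache_dir = file.path(tempdir(), "ds_cache"),
        catcher.max_lifetime = 7)

catcher_default("cache_dir", "ds_cache")    # the user-supplied location
catcher_default("max_lifetime", 30)         # 7, taken from the option set above
catcher_default("use_cache", TRUE)          # TRUE: no option set, so the fallback applies
```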

new name

There was an older package named cacher by @rdpeng, but it was archived. Even so, we should probably think of a different name. Candidates:

  • gottacachethemall
  • cachethemall
