bread

bread offers simple wrapper functions around data.table::fread() that make it easier to use its "cmd" argument with Unix shell commands (and sometimes PowerShell, if available) such as grep, wc and sed. The functions auto-generate those commands from the arguments you provide. The main use is to let computers with little memory analyze big files (the "b" in bread stands for "big files"): count rows, look up column names, subset rows by index number or value, and select columns without hitting the memory limit (and the "cannot allocate vector of size" error). With bread functions, a computer with 8 GB of memory can analyze a 50 GB file and:

  • split it into several smaller files by number of rows or by values in one or more columns
  • count the number of rows
  • subset it by row number or column values (string pattern or numerical value)
  • select only the relevant variables/columns
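
The mechanism behind these operations can be sketched with data.table::fread() alone. The example below is an illustration under stated assumptions, not bread's own API: it writes a tiny demo CSV (a stand-in for a big file), then passes hand-written wc, head, grep, cut and sed commands to the "cmd" argument. The file, the column names and the exact commands are made up for the example (head is not mentioned in this README), and the single-quoted sed script assumes a Unix-like shell.

```r
library(data.table)

# Build a small demo CSV in a temporary file (stand-in for a big file)
csv_path <- tempfile(fileext = ".csv")
demo <- data.table(
  id    = 1:6,
  year  = c(2001, 2002, 2002, 2003, 2002, 2001),
  color = c("red", "blue", "red", "green", "blue", "red")
)
fwrite(demo, csv_path)

# Quote the path so that spaces in it do not break the shell command
quoted <- shQuote(csv_path)

# Count rows: wc -l counts lines, so subtract one for the header
n_lines <- fread(cmd = paste("wc -l <", quoted), header = FALSE)[[1]]
n_rows <- n_lines - 1

# Look up column names by reading only the first two lines of the file
col_names <- names(fread(cmd = paste("head -n 2", quoted), header = TRUE))

# Subset rows by value: grep keeps only the lines that contain "blue".
# The header line is filtered out too, so the names are supplied again.
blue_rows <- fread(
  cmd = paste("grep blue", quoted),
  header = FALSE,
  col.names = col_names
)

# Select columns: cut keeps fields 1 and 3 of every line, header included
two_cols <- fread(cmd = paste("cut -d, -f1,3", quoted), header = TRUE)

# Subset rows by index: sed prints line 1 (the header) and lines 3 to 4
slice <- fread(cmd = paste("sed -n '1p;3,4p'", quoted), header = TRUE)
```

In each call only the output of the shell command reaches R, which is why the approach works on files that would not fit in memory if read whole.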

Best practices

There are other (better) ways to do that, for example loading a large file into a SQLite database, or not working on huge CSV files in the first place. But I happened to use those commands often to explore data. If you have to do the same, you hopefully won't have to delve right away into the fascinating grammar of Unix commands.

Pre-requisites

bread makes heavy use of Unix commands like grep, sed, wc and cut. They are available by default in all Unix environments. On Windows, you need to install those commands separately to simulate a Unix environment, and make sure that the executables are on the Windows PATH. To my knowledge, the simplest ways are to install Rtools, Git or Cygwin. If they have been installed correctly (with the expected registry entries), they are detected when the package is loaded and the correct directories are added to the PATH automatically.
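
To check by hand whether those commands are reachable from R, a minimal sketch with base R's Sys.which() could look like the following. It is a manual check for illustration only and is not how bread itself detects the tools; the variable names are made up for the example.

```r
# Commands named in this README
needed <- c("grep", "sed", "wc", "cut")

# Sys.which() returns the full path of each executable,
# or an empty string when the executable is not found on the PATH
found <- Sys.which(needed)
missing_cmds <- names(found)[found == ""]

if (length(missing_cmds) == 0) {
  message("All required commands were found on the PATH.")
} else {
  message("Not found on the PATH: ", paste(missing_cmds, collapse = ", "))
}
```

If a command is reported as not found on Windows, adding the directory that holds the executables from Rtools, Git or Cygwin to the PATH should make it visible.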

Installation

# Install bread from CRAN
install.packages("bread")
# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("MagicHead99/bread")

Contributors

webstat-bdf
