unix_utils

Like FileUtils, but provides zip, unzip, bzip2, bunzip2, tar, untar, sed, du, md5sum, shasum, cut, head, tail, wc, unix2dos, dos2unix, iconv, curl, perl, etc.

You must have these binaries in your PATH. Not a pure-ruby implementation of all these UNIX greats!

Works in MRI 1.8.7+, MRI 1.9.2+, and JRuby 1.6.7+. No gem dependencies; uses stdlib.

Real-world usage

We use unix_utils for data science at Brighter Planet and in production at

Originally extracted from remote_table

Philosophy

Use a subprocess to perform a big task and then get out of memory.

From the Unix Philosophy book: Tenet 2: Make Each Program Do One Thing Well. The best program, like Cousteau's lake fly, does but one task in its life and does it well. The program is loaded into memory, accomplishes its function, and then gets out of the way to allow the next single-minded program to begin. This sounds simple, yet it may surprise you how many software developers have difficulty sticking to this singular goal.

Rules (what you can expect)

For commands like zip, untar, sed, head, cut, dos2unix, etc.:

  1. Just returns a path to the output, randomly named, located in the system tmp dir (UnixUtils.unzip('kittens.zip') → '/tmp/unix_utils-129392301-kittens')
  2. Never touches the input
  3. Sticks a useful file extension on the output, if applicable (UnixUtils.tar('puppies/') → '/tmp/unix_utils-99293192-puppies.tar')
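
As a rough illustration of rules 1 and 3, the snippet below builds a randomly named output path in the system tmp dir and optionally appends an extension. The helper name `sketch_tmp_path` and its exact naming scheme are assumptions made for this example, not necessarily how unix_utils does it internally.

```ruby
require 'tmpdir'

# Hypothetical helper (an assumption for illustration, not unix_utils' API):
# derive a randomly named output path in the system tmp dir from an input path.
def sketch_tmp_path(input, extname = nil)
  # Drop the directory part and the input's own extension,
  # e.g. 'kittens.zip' -> 'kittens' and 'puppies/' -> 'puppies'.
  base = File.basename(input.to_s, '.*')
  name = "unix_utils-#{rand(10**9)}-#{base}"
  name += extname if extname
  File.join(Dir.tmpdir, name)
end

puts sketch_tmp_path('kittens.zip')       # e.g. <tmpdir>/unix_utils-<random>-kittens
puts sketch_tmp_path('puppies/', '.tar')  # e.g. <tmpdir>/unix_utils-<random>-puppies.tar
```

The input path is only read for its name; nothing is written to it, which is in the spirit of rule 2.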

For commands like du, md5sum, shasum, etc.:

  1. Just returns the good stuff (the checksum, for example, not the filename that is listed after it in the standard command output)
  2. Never touches the input
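
Returning "just the good stuff" amounts to trimming the raw command output. Here is a minimal sketch, assuming the usual `<checksum>  <filename>` layout printed by md5sum and shasum and the usual `<size><tab><path>` layout printed by du. The helper names are hypothetical and this is not the library's actual parsing code.

```ruby
# Hypothetical helpers, for illustration only.

# Keep the checksum and drop the filename that follows it.
def checksum_only(raw_output)
  raw_output.to_s.strip.split(/\s+/, 2).first
end

# Keep the size reported by du (as an Integer) and drop the path.
def du_size_only(raw_output)
  raw_output.to_s.strip.split(/\s+/, 2).first.to_i
end

puts checksum_only("d41d8cd98f00b204e9800998ecf8427e  kittens.zip\n")
puts du_size_only("2048\tpuppies/\n")
```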

But I can just spawn these myself

This lib was created to ease the pain of remembering command options for Gentoo, deciding which spawning method to use, possibly handling pipes...

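require 'tmpdir'
require 'open3'

# Pick a random destination directory under the system tmp dir
destdir = File.join(Dir.tmpdir, "kittens_#{Kernel.rand(1e11)}")
# -q: quiet, -n: never overwrite existing files, -d: extract into destdir
Open3.popen3('unzip', '-q', '-n', 'kittens.zip', '-d', destdir) do |stdin, stdout, stderr|
  stdin.close
  @error_message = stderr.read
end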

is replaced safely with

destdir = UnixUtils.unzip 'kittens.zip'

But I can just use Digest::SHA256

(Note: Balazs Kutil pointed out this is a bad example... I will replace it soon)

This will load an entire file into memory before it can be processed...

require 'digest'
str = Digest::SHA256.hexdigest File.read('kittens.zip')

... so you're really replacing this ...

sha256 = Digest::SHA256.new
File.open('kittens.zip', 'r') do |f|
  while chunk = f.read(4_194_304)
    sha256 << chunk
  end
end
str = sha256.hexdigest

You get the same low memory footprint with

str = UnixUtils.shasum 'kittens.zip', 256
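
For comparison, Ruby's stdlib also offers `Digest::SHA256.file`, which feeds the file through the digest without a `File.read` of the whole thing. The sketch below checks that it agrees with a manual chunked loop like the one above. It writes its own throwaway input file, so the file name and contents are placeholders rather than anything from the original example.

```ruby
require 'digest'
require 'tempfile'

# Manual chunked digest, same approach as the loop shown above.
def chunked_sha256(path, chunk_size = 4_194_304)
  sha256 = Digest::SHA256.new
  File.open(path, 'rb') do |f|
    while (chunk = f.read(chunk_size))
      sha256 << chunk
    end
  end
  sha256.hexdigest
end

# Stdlib shortcut: streams the file through the digest.
def streamed_sha256(path)
  Digest::SHA256.file(path).hexdigest
end

# Placeholder input so the example runs anywhere.
Tempfile.create('kittens') do |f|
  f.write('meow' * 1000)
  f.flush
  puts chunked_sha256(f.path) == streamed_sha256(f.path)
end
```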

Compatibility

Uses open3 because it's in the Ruby stdlib and is consistent across MRI and JRuby.
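
A minimal sketch of the kind of Open3-based spawning this refers to: run a command from an argument array, close stdin, and collect stdout and stderr. The helper name `spawn_and_capture` and its error handling are assumptions for illustration, not unix_utils' actual internals. It relies on the `wait_thr` block argument, which is available in Ruby 1.9 and later, and it spawns the current Ruby interpreter so the example does not depend on any particular UNIX binary being installed.

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper: run a command and return its stdout, raising if it
# exits with a non-zero status. When more than one argument is given,
# the command is executed directly, without going through a shell.
def spawn_and_capture(*argv)
  Open3.popen3(*argv) do |stdin, stdout, stderr, wait_thr|
    stdin.close
    # Read stderr in a thread so a chatty command cannot fill one pipe
    # and block while we are still reading the other.
    err_reader = Thread.new { stderr.read }
    out = stdout.read
    err = err_reader.value
    status = wait_thr.value
    raise "#{argv.first} failed: #{err}" unless status.success?
    out
  end
end

puts spawn_and_capture(RbConfig.ruby, '-e', 'print "kittens"')
```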

Wishlist

  • cheat sheet based on GNU Coreutils cheat sheet
  • yarddocs
  • properly use Dir.tmpdir(name), etc.
  • smarter tmp file name generation - don't include url params for curl, etc.
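
The last wishlist item could look something like the following sketch, which derives a tmp file base name from a URL while ignoring the query string. The helper name `name_from_url` and the `'download'` fallback are hypothetical, and this is not behavior the library is stated to provide.

```ruby
require 'uri'

# Hypothetical helper: base name for a tmp file derived from a URL,
# ignoring query parameters and fragments.
def name_from_url(url)
  path = URI.parse(url).path.to_s
  base = File.basename(path)
  # Fall back to a generic name when the URL has no usable path segment.
  (base.empty? || base == '/') ? 'download' : base
end

puts name_from_url('http://example.com/files/kittens.zip?token=abc&x=1')
```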

Authors

Copyright

Copyright (c) 2012 Seamus Abshere

Contributors

  • etehtsea
  • seamusabshere
