node-stream-mmmagic

Node module that non-destructively sniffs the start of a stream to detect its file type and encoding, for cases where you don't have the luxury of restarting the stream.

It does so by using buffer-peek-stream to read the first 16KB of the stream and then sending those bytes to mmmagic (which uses libmagic). Before it finishes, the peek stream unshifts the bytes it has received back onto the origin stream, so the origin stream appears untouched to whatever consumes it next.
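The peek-then-unshift idea can be sketched with Node's built-in stream API alone. This is only an illustration of the technique, not the buffer-peek-stream implementation: `peekStart` is a hypothetical helper name, and the in-memory `Readable` stands in for a real file or socket.

```javascript
const { Readable } = require('stream');

// Hypothetical helper (not the buffer-peek-stream API): peek at the first
// `peekBytes` bytes of `stream`, then put them back with unshift().
function peekStart(stream, peekBytes, callback) {
  function cleanup() {
    stream.removeListener('readable', onReadable);
    stream.removeListener('end', onEnd);
    stream.removeListener('error', onError);
  }

  function onReadable() {
    // read(n) returns null until n bytes are buffered, or returns whatever
    // is left once the stream has ended.
    const head = stream.read(peekBytes);
    if (head === null) return;
    cleanup();
    stream.unshift(head); // downstream consumers still see these bytes
    callback(null, head, stream);
  }

  function onEnd() {
    // Empty stream: nothing to peek at and nothing to put back.
    cleanup();
    callback(null, Buffer.alloc(0), stream);
  }

  function onError(err) {
    cleanup();
    callback(err);
  }

  stream.on('readable', onReadable);
  stream.on('end', onEnd);
  stream.on('error', onError);
}

// Usage with an in-memory stream standing in for a file or socket.
const source = new Readable({ read() {} });
source.push('hello ');
source.push('world');
source.push(null);

peekStart(source, 4, (err, head, output) => {
  if (err) throw err;
  console.log('peeked:', head.toString());
  output.pipe(process.stdout); // still receives the stream from its first byte
});
```

In stream-mmmagic the peeked bytes would be handed to mmmagic at the point where this sketch logs them.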

npm install stream-mmmagic

Use

var fs = require('fs');
var magic = require('stream-mmmagic');

var input = fs.createReadStream('somefile.csv');

magic(input, function (err, mime, output) {
  if (err) throw err;

  console.log('TYPE:', mime.type);
  console.log('ENCODING:', mime.encoding);

  // will print the *whole* file
  output.pipe(process.stdout);
});

//- TYPE: text/plain
//- ENCODING: us-ascii
//- <the file content>
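Because the callback receives two results (mime and output), `util.promisify` would keep only the first. If you prefer Promises, a small wrapper along these lines may help. `detect` is a hypothetical helper name, not part of the module, and it assumes only the `magic(input, options, callback)` call shape shown in this README.

```javascript
// Hypothetical helper: wrap a stream-mmmagic style function in a Promise
// that resolves with both callback results.
function detect(magic, input, options = {}) {
  return new Promise((resolve, reject) => {
    magic(input, options, (err, mime, output) => {
      if (err) return reject(err);
      resolve({ mime, output });
    });
  });
}

// Intended use (requires the real module, so it is left commented out):
//   const magic = require('stream-mmmagic');
//   const { mime, output } = await detect(magic, fs.createReadStream('somefile.csv'));
//   output.pipe(process.stdout);
```

Passing `magic` in as an argument keeps the wrapper easy to test with a stub in place of the real module.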

options.magicFile Custom Magic File

A magic file is bundled with the mmmagic npm module. To use your own instead, set the magicFile option to the path of that file.

const magicFile = '/usr/share/magic';
magic(input, {magicFile}, callback);

options.splitMime Original Mime String

Use the {splitMime: false} option to get back the original mime string instead of a split object.

magic(input, {splitMime: false}, (err, mime, output) => {
  console.log(mime);
});
//- text/plain; charset=us-ascii
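To show how the raw string relates to the split object, here is a rough sketch of splitting it by hand. `splitMimeString` is a hypothetical function, and the module's own parsing may differ. The `type` and `encoding` field names are taken from the usage example above.

```javascript
// Hypothetical helper: split "type; charset=encoding" into its two parts.
function splitMimeString(mimeString) {
  const [type, ...params] = mimeString.split(';').map((part) => part.trim());
  const prefix = 'charset=';
  const charset = params.find((param) => param.toLowerCase().startsWith(prefix));
  return {
    type,
    // undefined when the string carries no charset parameter
    encoding: charset ? charset.slice(prefix.length) : undefined,
  };
}

console.log(splitMimeString('text/plain; charset=us-ascii'));
// { type: 'text/plain', encoding: 'us-ascii' }
```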

options.peekBytes Control Bytes Used for Analysis

As the input stream starts to receive data, the first 16KB is buffered and sent to libmagic, which determines the file type and encoding. 1KB is more than enough to detect the file type with a standard magicFile, but encoding detection becomes more reliable as more bytes are buffered. The tradeoff is performance and memory use.

Set peekBytes to the number of bytes you want buffered and sent to libmagic. For best results, do not set it below 256 bytes.

// somefile.txt is a UTF-8 file whose first multibyte character appears after the first 1KB
// splitMime: false makes mime the raw string shown in the output comments
var input = fs.createReadStream('somefile.txt');

magic(input, {peekBytes: 1024, splitMime: false}, (err, mime, output) => {
  console.log(mime);
});
// not detected as UTF-8 because the first multibyte character is beyond the peeked 1KB
//- text/plain; charset=us-ascii

// use a fresh stream for the second detection
var input2 = fs.createReadStream('somefile.txt');

magic(input2, {peekBytes: 16384, splitMime: false}, (err, mime, output) => {
  console.log(mime);
});
// peeking 16KB into the file, libmagic sees that first multibyte character and reports UTF-8
//- text/plain; charset=utf-8
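The effect of peekBytes on encoding detection can be imitated without libmagic. The sketch below is a deliberate simplification (libmagic's real heuristics are more involved): it only checks whether the peeked window contains a byte outside the ASCII range. `isAsciiOnly` is a hypothetical helper.

```javascript
// Hypothetical helper: true when the first `peekBytes` bytes are all ASCII.
function isAsciiOnly(buffer, peekBytes) {
  const head = buffer.subarray(0, peekBytes);
  return head.every((byte) => byte < 0x80);
}

// 2000 ASCII bytes followed by one two-byte UTF-8 character.
const data = Buffer.from('a'.repeat(2000) + 'é', 'utf8');

console.log(isAsciiOnly(data, 1024));  // true: the window ends before the 'é'
console.log(isAsciiOnly(data, 16384)); // false: the window includes the 'é'
```

A detector that sees only the first window has no evidence that the content is anything other than ASCII, which is why a larger peekBytes makes encoding results more reliable.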

LICENSE

MIT
