
node-dbf's Introduction

IMPORTANT NOTICE

This library is no longer being actively maintained.

If you're interested in taking ownership, please leave a comment on: #46

DBF Parser

This is an event-based dBase file parser for very efficiently reading data from *.dbf files.


The codebase is written in ES6 JavaScript but is compiled to plain JavaScript for the npm module.

To get started, simply install the module using npm:

$ npm install node-dbf

and then import it:

import Parser from 'node-dbf';

Classes

There are two classes: the Parser and the Header. The Parser is the more interesting of the two.

Parser

This class is the main interface for reading data from dBase files. It extends EventEmitter and its output is via events.

new Parser(path, options)

  • path String The full path to the DBF file to parse
  • options Object An object containing options for the parser.

The supported options are:

  • encoding String The character encoding to use (default = utf-8)

Creates a new Parser and attaches it to the specified filename.

import Parser from 'node-dbf';

let parser = new Parser('/path/to/my/dbase/file.dbf');
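
For example, to parse a file that is not UTF-8 encoded, pass the encoding option described above (a minimal sketch; the value must be an encoding Node's Buffer understands, such as latin1):

import Parser from 'node-dbf';

// Override the default utf-8 encoding for this file
let parser = new Parser('/path/to/my/dbase/file.dbf', { encoding: 'latin1' });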

parser.on(event, listener)

  • event String The event name to listen for (see below for details)
  • listener Function The callback to bind to the event

This method is inherited from the EventEmitter class.

parser.parse()

Call this method once you have bound to the events you are interested in. Although it returns the parser object (for chaining), all the dBase data is output via events.

parser.parse();
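
Since on() returns the emitter and parse() returns the parser, calls can be chained, for example:

// Bind a listener and start parsing in one expression
parser
    .on('record', (record) => console.log(record))
    .parse();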

Event: 'start'

  • parser Parser The parser object

This event is emitted as soon as the parser.parse() method has been invoked.

Event: 'header'

  • header Header The header object as parsed from the dBase file

This event is emitted once the header has been parsed from the dBase file.

Event: 'record'

  • record Object An object representing the record that has been found

The record object will have a key for each field within the record, named after the field. Each value is trimmed of leading and trailing blank characters (dBase files use \x20 for padding).

In addition to the fields, the object contains two special keys:

  • @sequenceNumber Number The order in which the record was extracted
  • @deleted Boolean Whether this record has been flagged as deleted

This object may look like:

{
    "@sequenceNumber": 123,
    "@deleted": false,
    "firstName": "John",
    "lastName": "Smith"
}
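
Note that the special keys are not valid identifiers for dot notation, so read them with bracket notation (a short sketch using the example fields above):

parser.on('record', (record) => {
    // Keys beginning with '@' require bracket notation
    if (!record['@deleted']) {
        console.log(record['@sequenceNumber'], record.firstName);
    }
});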

Event: 'end'

  • parser Parser The parser object

This event is fired once the dBase parsing is complete and there are no more records remaining.
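
Since all output arrives via events, a common pattern is to wrap the parser in a Promise that resolves with the collected records. This is not part of the library's API, just a sketch:

import Parser from 'node-dbf';

// Collect every record into an array and resolve when parsing finishes
const parseDbfFile = (path) => new Promise((resolve) => {
    const records = [];
    const parser = new Parser(path);
    parser.on('record', (record) => records.push(record));
    parser.on('end', () => resolve(records));
    parser.parse();
});

parseDbfFile('/path/to/file.dbf').then((records) => console.log(records.length));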

Usage

The following code example illustrates a very simple usage for this module:

import Parser from 'node-dbf';

let parser = new Parser('/path/to/my/dbase/file.dbf');

parser.on('start', (p) => {
    console.log('dBase file parsing has started');
});

parser.on('header', (h) => {
    console.log('dBase file header has been parsed');
});

parser.on('record', (record) => {
    console.log('Name: ' + record.firstName + ' ' + record.lastName); // Name: John Smith
});

parser.on('end', (p) => {
    console.log('Finished parsing the dBase file');
});

parser.parse();

Command-Line Interface (CLI)

The parser also supports a command-line interface (CLI) for converting DBF files to CSV. You can invoke it as follows:

$ node-dbf convert /path/to/file.dbf

This will write the converted rows to stdout and metadata about the process (e.g. the number of rows) to stderr. This allows you to write stdout directly to an output file, for example:

$ node-dbf convert file.dbf > file.csv
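
Because the metadata goes to stderr, you can also capture it separately from the CSV data, for example:

$ node-dbf convert file.dbf > file.csv 2> convert.log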

For more information on the command-line options, use the integrated help:

$ node-dbf help

Tests

Tests are written in Mocha using Chai's BDD-style expectations. Data on San Francisco zip codes is used as a reference test file; it was downloaded from SF OpenData and is included in the repository as ./test/fixtures/bayarea_zipcodes.dbf.

To Do

  • Add more tests
  • Add support for field types other than Character and Numeric
  • Use fs.createReadStream instead of fs.readFile for increased performance
  • Add a CLI interface for converting to CSV, etc
  • Improve error handling to emit an error event

node-dbf's People

Contributors

abstractvector, cutofffrequency, davidbruant, eschwartz, irandom, moklick


node-dbf's Issues

dbf writer

Do you plan to make a DBF writer? For my task I need to export data into DBF format. It would be great if you added such functionality.

Thanks!

@deleted condition is incomplete

I noticed that the '@deleted' property in the record objects was not always accurate (possibly related to this issue).

I found in the .DBF specification (and confirmed) that the deleted flag could have one of four different values: 20h, 2Ah, '*', and " " (blank).

The assignment for record['@deleted'] in lib/parser.js was only accounting for one. I fixed this locally by changing the assignment to:

'@deleted': buffer[0] === 0x2A, // 0x2A (decimal 42) is the character code for '*'

I hope this helps

Read register by key

Hi.

Is it possible to read only one record, instead of the whole DBF file? I want to look up a specific value, a CARD ID, from a person ID. I have the ID and want to find the CARD ID without reading the whole file (SQL style).

Thanks in advance.

Field Length Problem

This code is really helpful and performs well.

But I had problems with DBF columns having lengths >128 characters, which are parsed incorrectly (i.e. the column lengths appear as <0). IMHO, there is a bug in the function convertBinaryToInteger() in header.js: the code uses buffer.readIntLE(0, buffer.length) but should use the unsigned version instead, buffer.readUIntLE(0, buffer.length).

Thanks,
Ken

Got some unknown characters

Desktop.zip

I have a DBF file whose column names are partly alphabetic characters and partly unknown characters. How can I get the value of a specific field in this case?

I have attached the DBF file as well as a screenshot of the output I get when reading it.

converting to CSV always throws -> Error: ENOENT: no such file or directory, open ...

I honestly don't understand why the file path passed to the CLI as

$ node-dbf convert file.dbf

gets joined to the node-dbf folder inside node_modules.

I ended up replacing this line:

.action(function(f) { file = path.join(__dirname, '../', f); })

with:

.action(function(f) { file = f; })

This solved the issue and allowed me to convert to CSV, but it feels like I'm missing something.

License?

What license is this released under?

Error encoding

The "node-dbf" module has the following problems. First, the encoding breaks with large files:

buffer = overflow + buffer // returns a utf-8 string

which should be replaced with:

buffer = Buffer.concat([overflow,buffer], buffer.length + overflow.length); //return buffer

Second, "N" fields may not be integers (e.g. "N 15 3"), and a field of type "F" equal to 0 is returned as NaN:

value = +value; // for field.type 'F' and 'N' (the standard has no leading zeros)

I would also like to ask for the ability to specify a custom string-decoding function, so that the line

value = (buffer.toString @encoding).trim()

could be replaced with:

if (this.encodingFunction) {
    value = (this.encodingFunction(buffer)).trim();
} else {
    value = (buffer.toString(this.encoding)).trim();
}

and to add support for the date type:

if (field.type === 'D') {
    // JS months are 0-based, so subtract 1 from the month
    value = new Date(+value.slice(0, 4), +value.slice(4, 6) - 1, +value.slice(6, 8));
}

Encoding

The encoding is missing in parser.js:

_fs2.default.createReadStream(_this2.filename);

It could be:

_fs2.default.createReadStream(_this2.filename, 'latin1');

or

var stream = _fs2.default.createReadStream(_this2.filename, _this2.header.encoding);

Parser handles Logical fields incorrectly

Logical fields are parsed to null because there is an error in parser.js on line 166:

switch (value) {
case ['Y', 'y', 'T', 't'].includes(value):
   value = true;
   break;
case ['N', 'n', 'F', 'f'].includes(value):
   value = false;
   break;
default:
   value = null;
}

It should switch on true instead: switch (value) compares value against the boolean result of each case expression, so no case ever matches and the default always runs. The fix is:

switch (true) {
case ['Y', 'y', 'T', 't'].includes(value):
   value = true;
   break;
case ['N', 'n', 'F', 'f'].includes(value):
   value = false;
   break;
default:
   value = null;
}

support for CP1252 encoding

According to what I read on the internet, DBF doesn't use UTF-8 encoding (for characters like è, é, ü, and so on) but the Windows CP1252 encoding.

Could you fix that so it supports CP1252 decoding?

Thx!
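
Since the parser already accepts an encoding option (see above), one possible workaround, untested and only approximate, is Node's built-in latin1 encoding, which matches CP1252 everywhere except the 0x80-0x9F range:

// Untested workaround: 'latin1' decodes bytes 0xA0-0xFF the same way as
// CP1252, which covers characters like è, é and ü (but not 0x80-0x9F)
let parser = new Parser('/path/to/file.dbf', { encoding: 'latin1' });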

Read big file issue with overflow

Node v4.3.1. This code doesn't work:

if (overflow !== null) {
  buffer = overflow + buffer;
}

It actually decreases the buffer length (string concatenation decodes the raw bytes as UTF-8, so multi-byte sequences are mangled). This is my log from reading 2 chunks:

read 65536
read + overflow 65536
record 93
overflow 86
read 65536
read + overflow 64298

The first record in the 2nd chunk was parsed incorrectly.


Buffer.concat does the trick:

if (overflow !== null) {
 buffer = Buffer.concat([overflow, buffer]);
}

3-chunk log:
read 65536
read + overflow 65536
record 93
overflow 86
read 65536
read + overflow 65622
record 93
overflow 57
read 65536
read + overflow 65593

Shouldn't 'end' be emitted on the stream's 'close' event instead?

I think this line should be the event 'close' from the stream:

return stream.on('end', () => {

Because I have some nested promises creating multiple parsers, it seems that when I call parser.parse(), the stream gets "mixed" somehow and the parsing is totally wrong. For example:

const parseDbfFile = (file) => new Promise((resolve) => {
  const results = [];
  // instantiate the parser etc...
  parser.on('record', record => results.push(record));
  parser.on('end', () => resolve(results));
});

parseDbfFile(fileA).then(resA => {
  // resA is fine
  return parseDbfFile(fileB).then(resB => {
    // resB is completely wrong <======
  });
});

I don't understand much about streams in Node.js, but changing the event from 'end' to 'close' in the source code fixes my issue. If someone knows what is happening behind the scenes, I would love to know more about it.

parseInt in parseField() - Floating Point Number Issue

I'm parsing some GPS coordinates and it appears that all numbers are parsed as integers. I tracked it down to parseInt() in parseField() in parser.js.

I changed that to parseFloat() and now I'm getting the correct decimal numbers.

Library no longer maintained

As you may have noticed from the lack of activity, this library is no longer being maintained. I no longer have any use cases for DBF files, and many of the issues, bugs and feature requests refer to capabilities that the test files I was using don't cover.

If you're interested in taking over this library and updating it, please let me know by leaving a comment on this issue and leaving a preferred way for me to contact you.

Unidentified 15 in logs

I've tried the following code:

myFunction: function (req, res) {
    var uploadedFile = req.files['fileId'];
    var parser = new Parser(uploadedFile.path);
    parser.parse();
}

And I get a mysterious 15 appearing in my console. If I comment out the parser.parse() line it no longer displays the 15, so this is really coming from the parser.

parses float as integer

I am using node-dbf version 0.2.1. I have float values (including negative values) in my DBF file, and they are getting parsed as integers.

Float read as NaN

Hi, I have a problem reading a DBF file with this field definition:

PRECIO1,N,14,2

That is a float (I opened the file in LibreOffice Calc).

Any ideas on how I can open it?

'Record' events block any other operation

Love your library, very easy to use and works pretty well.

I'm having an issue where, if I call another function from the 'record' event that performs some sort of operation, like a MySQL query, it only performs the operation for the first record. It won't run for subsequent records until after the 'end' event.

For example:

parser.on('start', function() {
  console.log('start parsing');
});
parser.on('record', function(record) {
  var sql = 'insert into table(field1,field2) values(?,?);';
  var inserts = [];
  inserts.push(record.field1);
  inserts.push(record.field2);
  console.log('inserting');
  var cmd = connection.format(sql, inserts);
  connection.query(cmd, function(err, res) {
    if (err) {
      console.log('MYSQL: ' + sql + ' \n' + err);
      return;
    }
    console.log('inserted successfully');
  });
});
parser.on('end', function() {
  console.log('end parsing');
});

If I run the parser on a dataset of 100 records, I'll see 'start parsing', 'inserting' x 100, 'end parsing', then 'inserted successfully' x 100. In my real-world application I'm reading datasets with several hundred thousand rows, so this issue is much more apparent.

I'm not sure what could be causing the issue, perhaps reading the file as a stream would allow the db operation to flow through. Let me know if you would like any more specific details or troubleshooting, I'd be more than happy to help. Again, thanks for your work on this project, it's proven very useful to me.
