leonidas-from-xiv / node-xml2js

XML to JavaScript object converter.

License: MIT License

CoffeeScript 100.00%

coffeescript parsing xml-parser xml xml2json xml2js javascript nodejs node-js node

node-xml2js

Ever had the urge to parse XML? And wanted to access the data in some sane, easy way? Don't want to compile a C parser, for whatever reason? Then xml2js is what you're looking for!

Description

Simple XML to JavaScript object converter. It supports bi-directional conversion. Uses sax-js and xmlbuilder-js.

Note: If you're looking for a full DOM parser, you probably want JSDom.

Installation

The simplest way to install xml2js is with npm: run npm install xml2js, which downloads xml2js and all of its dependencies.

xml2js is also available via Bower: run bower install xml2js, which downloads xml2js and all of its dependencies.

Usage

No extensive tutorials required because you are a smart developer! The task of parsing XML should be an easy one, so let's make it so! Here are some examples.

Shoot-and-forget usage

You want to parse XML as simply and easily as possible? It's dangerous to go alone, take this:

var parseString = require('xml2js').parseString;
var xml = "<root>Hello xml2js!</root>"
parseString(xml, function (err, result) {
    console.dir(result);
});

Can't get easier than this, right? This works starting with xml2js 0.2.3. With CoffeeScript it looks like this:

{parseString} = require 'xml2js'
xml = "<root>Hello xml2js!</root>"
parseString xml, (err, result) ->
    console.dir result

If you need some special options, fear not: xml2js supports a number of options (see below), which you can specify as the second argument:

parseString(xml, {trim: true}, function (err, result) {
});

Simple as pie usage

That's right, if you have been using xml-simple or a home-grown wrapper, this was added in 0.1.11 just for you:

var fs = require('fs'),
    xml2js = require('xml2js');

var parser = new xml2js.Parser();
fs.readFile(__dirname + '/foo.xml', function(err, data) {
    parser.parseString(data, function (err, result) {
        console.dir(result);
        console.log('Done');
    });
});

Look ma, no event listeners!

You can also use xml2js from CoffeeScript, further reducing the clutter:

fs = require 'fs'
xml2js = require 'xml2js'

parser = new xml2js.Parser()
fs.readFile __dirname + '/foo.xml', (err, data) ->
  parser.parseString data, (err, result) ->
    console.dir result
    console.log 'Done.'

But what happens if you forget the new keyword to create a new Parser? In the middle of a nightly coding session, it might get lost, after all. Worry not, we got you covered! Starting with 0.2.8 you can also leave it out, in which case xml2js will helpfully add it for you, no bad surprises and inexplicable bugs!

Promise usage

var xml2js = require('xml2js');
var xml = '<foo></foo>';

// With parser
var parser = new xml2js.Parser(/* options */);
parser.parseStringPromise(xml).then(function (result) {
  console.dir(result);
  console.log('Done');
})
.catch(function (err) {
  // Failed
});

// Without parser
xml2js.parseStringPromise(xml /*, options */).then(function (result) {
  console.dir(result);
  console.log('Done');
})
.catch(function (err) {
  // Failed
});

Parsing multiple files

If you want to parse multiple files, you have multiple possibilities:

1. You can create one xml2js.Parser per file. That's the recommended option and is promised to always just work.
2. You can call reset() on your parser object.
3. You can hope everything goes well anyway. This behaviour is not guaranteed to work always, if ever. Use option #1 if possible. Thanks!

So you want some JSON?

Just wrap the result object in a call to JSON.stringify like this JSON.stringify(result). You get a string containing the JSON representation of the parsed object that you can feed to JSON-hungry consumers.
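
As a minimal sketch of that call: the `result` object below is a hand-written stand-in for what the parser returns for `<root>Hello xml2js!</root>` with default options (an assumption, so that the snippet runs without the library installed).

```javascript
// Hand-written stand-in for a parse result (assumption, not real parser output).
const result = { root: 'Hello xml2js!' };

// Compact JSON, suitable for sending to another program.
const json = JSON.stringify(result);

// Pretty-printed JSON with a 2-space indent, easier for humans to read.
const pretty = JSON.stringify(result, null, 2);

console.log(json);
console.log(pretty);
```

The third argument of JSON.stringify only affects whitespace, so both strings parse back to the same object.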

Displaying results

You might wonder why, when using console.dir or console.log, the output at some level is only [Object]. Don't worry, this is not because xml2js got lazy. It's because Node uses util.inspect to convert the object into a string, and that function stops after depth=2, which is a bit low for most XML.

To display the whole deal, you can use console.log(util.inspect(result, false, null)), which displays the whole result.
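
The depth limit can be seen without parsing anything. The sketch below uses a hand-written nested object as a stand-in for a parse result (an assumption) and the options-object form of util.inspect, which is equivalent to the positional call above.

```javascript
const util = require('util');

// Hand-written nested object standing in for a parse result (assumption).
const result = { root: { a: [{ b: [{ c: [{ d: ['deep'] }] }] }] } };

// Default depth is 2, so deeper levels are collapsed to [Object].
const shallow = util.inspect(result);

// depth: null removes the limit and prints the whole structure.
const full = util.inspect(result, { depth: null });

console.log(shallow);
console.log(full);
```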

So much for that, but what if you use eyes for nice colored output and it truncates the output with …? Don't fear, there's also a solution for that, you just need to increase the maxLength limit by creating a custom inspector var inspect = require('eyes').inspector({maxLength: false}) and then you can easily inspect(result).

XML builder usage

Since 0.4.0, objects can also be used to build XML:

var xml2js = require('xml2js');

var obj = {name: "Super", Surname: "Man", age: 23};

var builder = new xml2js.Builder();
var xml = builder.buildObject(obj);

will result in:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
  <name>Super</name>
  <Surname>Man</Surname>
  <age>23</age>
</root>

At the moment, a one-to-one bi-directional conversion is guaranteed only for the default configuration, with the exception of the attrkey, charkey and explicitArray options, which you can redefine to your taste. Writing CDATA is supported by setting the cdata option to true.

To specify attributes:

var xml2js = require('xml2js');

var obj = {root: {$: {id: "my id"}, _: "my inner text"}};

var builder = new xml2js.Builder();
var xml = builder.buildObject(obj);

will result in:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root id="my id">my inner text</root>

Adding xmlns attributes

You can generate XML that declares XML namespace prefix / URI pairs with xmlns attributes.

Example declaring a default namespace on the root element:

let obj = { 
  Foo: {
    $: {
      "xmlns": "http://foo.com"
    }   
  }
};

Result of buildObject(obj):

<Foo xmlns="http://foo.com"/>

Example declaring non-default namespaces on non-root elements:

let obj = {
  'foo:Foo': {
    $: {
      'xmlns:foo': 'http://foo.com'
    },
    'bar:Bar': {
      $: {
        'xmlns:bar': 'http://bar.com'
      }
    }
  }
}

Result of buildObject(obj):

<foo:Foo xmlns:foo="http://foo.com">
  <bar:Bar xmlns:bar="http://bar.com"/>
</foo:Foo>

Processing attribute, tag names and values

Since 0.4.1 you can optionally provide the parser with attribute name and tag name processors as well as element value processors (Since 0.4.14, you can also optionally provide the parser with attribute value processors):

function nameToUpperCase(name){
    return name.toUpperCase();
}

//transform all attribute and tag names and values to uppercase
parseString(xml, {
  tagNameProcessors: [nameToUpperCase],
  attrNameProcessors: [nameToUpperCase],
  valueProcessors: [nameToUpperCase],
  attrValueProcessors: [nameToUpperCase]},
  function (err, result) {
    // processed data
});

The tagNameProcessors and attrNameProcessors options accept an Array of functions with the following signature:

function (name){
  //do something with `name`
  return name
}

The attrValueProcessors and valueProcessors options accept an Array of functions with the following signature:

function (value, name) {
  //`name` will be the node name or attribute name
  //do something with `value`, (optionally) dependent on the node/attr name
  return value
}

Some processors are provided out-of-the-box and can be found in lib/processors.js:

normalize: transforms the name to lowercase. (Automatically used when options.normalizeTags is set to true)
firstCharLowerCase: transforms the first character to lower case. E.g. 'MyTagName' becomes 'myTagName'
stripPrefix: strips the XML namespace prefix. E.g. <foo:Bar/> will become 'Bar'. (N.B.: the xmlns prefix is NOT stripped.)
parseNumbers: parses integer-like strings as integers and float-like strings as floats. E.g. "0" becomes 0 and "15.56" becomes 15.56
parseBooleans: parses boolean-like strings to booleans. E.g. "true" becomes true and "False" becomes false
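
A custom processor is just a function with one of the signatures above. The sketch below defines two hypothetical processors, `stripNamespace` and `priceToNumber` (names chosen for illustration, not part of xml2js), and calls them directly to show what they return.

```javascript
// Hypothetical name processor: drops everything up to the first colon.
function stripNamespace(name) {
  const idx = name.indexOf(':');
  return idx === -1 ? name : name.slice(idx + 1);
}

// Hypothetical value processor: converts only the value of `price` nodes,
// using the second argument (the node or attribute name) to decide.
function priceToNumber(value, name) {
  return name === 'price' ? Number(value) : value;
}

// Called directly here; with the parser they would be passed as
// {tagNameProcessors: [stripNamespace], valueProcessors: [priceToNumber]}.
console.log(stripNamespace('foo:Bar'));
console.log(priceToNumber('15.56', 'price'));
console.log(priceToNumber('15.56', 'label'));
```

Keeping processors as small pure functions like these makes them easy to test in isolation before handing them to the parser.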

Options

Apart from the default settings, there are a number of options that can be specified for the parser. Options are specified by new Parser({optionName: value}). Possible options are:

attrkey (default: $): Prefix that is used to access the attributes. Version 0.1 default was @.
charkey (default: _): Prefix that is used to access the character content. Version 0.1 default was #.
explicitCharkey (default: false): Determines whether or not to use a charkey prefix for elements with no attributes.
trim (default: false): Trim the whitespace at the beginning and end of text nodes.
normalizeTags (default: false): Normalize all tag names to lowercase.
normalize (default: false): Trim whitespaces inside text nodes.
explicitRoot (default: true): Set this if you want to get the root node in the resulting object.
emptyTag (default: ''): The value assigned to empty nodes. If you want an empty object as the default value, provide a factory function () => ({}) instead of a plain object; otherwise the same object is shared by reference across all occurrences, which leads to unwanted behavior.
explicitArray (default: true): Always put child nodes in an array if true; otherwise an array is created only if there is more than one.
ignoreAttrs (default: false): Ignore all XML attributes and only create text nodes.
mergeAttrs (default: false): Merge attributes and child elements as properties of the parent, instead of keying attributes off a child attribute object. This option is ignored if ignoreAttrs is true.
validator (default null): You can specify a callable that validates the resulting structure somehow, however you want. See unit tests for an example.
xmlns (default false): Give each element a field usually called '$ns' (the first character is the same as attrkey) that contains its local name and namespace URI.
explicitChildren (default false): Put child elements into a separate property. Doesn't work with mergeAttrs = true. If an element has no children, then "children" won't be created. Added in 0.2.5.
childkey (default $$): Prefix that is used to access child elements if explicitChildren is set to true. Added in 0.2.5.
preserveChildrenOrder (default false): Modifies the behavior of explicitChildren so that the value of the "children" property becomes an ordered array. When this is true, every node will also get a #name field whose value will correspond to the XML nodeName, so that you may iterate the "children" array and still be able to determine node names. The named (and potentially unordered) properties are also retained in this configuration at the same level as the ordered "children" array. Added in 0.4.9.
charsAsChildren (default false): Determines whether chars should be considered children if explicitChildren is on. Added in 0.2.5.
includeWhiteChars (default false): Determines whether whitespace-only text nodes should be included. Added in 0.4.17.
async (default false): Should the callbacks be async? This might be an incompatible change if your code depends on sync execution of callbacks. Future versions of xml2js might change this default, so the recommendation is to not depend on sync execution anyway. Added in 0.2.6.
strict (default true): Set sax-js to strict or non-strict parsing mode. Defaults to true which is highly recommended, since parsing HTML which is not well-formed XML might yield just about anything. Added in 0.2.7.
attrNameProcessors (default: null): Allows the addition of attribute name processing functions. Accepts an Array of functions with following signature:
```
function (name){
    //do something with `name`
    return name
}
```
Added in 0.4.1
attrValueProcessors (default: null): Allows the addition of attribute value processing functions. Accepts an Array of functions with following signature:
```
function (value, name){
  //do something with `value`, (optionally) dependent on the attr name
  return value
}
```
Added in 0.4.14
tagNameProcessors (default: null): Allows the addition of tag name processing functions. Accepts an Array of functions with following signature:
```
function (name){
  //do something with `name`
  return name
}
```
Added in 0.4.1
valueProcessors (default: null): Allows the addition of element value processing functions. Accepts an Array of functions with following signature:
```
function (value, name){
  //do something with `value`, (optionally) dependent on the node name
  return value
}
```
Added in 0.4.6
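
The shared-reference warning on emptyTag can be illustrated in plain JavaScript. In this sketch, `fillShared` and `fillWithFactory` are hypothetical helpers that mimic assigning a default value to several empty nodes; they are not xml2js code.

```javascript
// Hypothetical helper: assigns the SAME default value to every key.
function fillShared(keys, defaultValue) {
  const out = {};
  for (const key of keys) out[key] = defaultValue;
  return out;
}

// Hypothetical helper: calls the factory once per key, so each gets its own value.
function fillWithFactory(keys, factory) {
  const out = {};
  for (const key of keys) out[key] = factory();
  return out;
}

const shared = fillShared(['a', 'b'], {});
shared.a.touched = true;           // mutates the one object both keys point to
console.log(shared.b.touched);     // the change leaks into `b`

const fresh = fillWithFactory(['a', 'b'], () => ({}));
fresh.a.touched = true;            // only `a` is affected
console.log(fresh.b.touched);
```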

Options for the `Builder` class

These options are specified by new Builder({optionName: value}). Possible options are:

attrkey (default: $): Prefix that is used to access the attributes. Version 0.1 default was @.
charkey (default: _): Prefix that is used to access the character content. Version 0.1 default was #.
rootName (default root or the root key name): root element name to be used in case explicitRoot is false or to override the root element name.
renderOpts (default { 'pretty': true, 'indent': ' ', 'newline': '\n' }): Rendering options for xmlbuilder-js.
- pretty: prettify generated XML
- indent: whitespace for indentation (only when pretty)
- newline: newline char (only when pretty)
xmldec (default { 'version': '1.0', 'encoding': 'UTF-8', 'standalone': true }): XML declaration attributes.
- xmldec.version A version number string, e.g. 1.0
- xmldec.encoding Encoding declaration, e.g. UTF-8
- xmldec.standalone standalone document declaration: true or false
doctype (default null): optional DTD. Eg. {'ext': 'hello.dtd'}
headless (default: false): omit the XML header. Added in 0.4.3.
allowSurrogateChars (default: false): allows using characters from the Unicode surrogate blocks.
cdata (default: false): wrap text nodes in <![CDATA[ ... ]]> instead of escaping when necessary. Does not add <![CDATA[ ... ]]> if it is not required. Added in 0.4.5.

renderOpts, xmldec, doctype and headless pass through to xmlbuilder-js.

Updating to new version

Version 0.2 changed the default parsing settings, but version 0.1.14 introduced the default settings for version 0.2, so these settings can be tried before the migration.

var xml2js = require('xml2js');
var parser = new xml2js.Parser(xml2js.defaults["0.2"]);

To get the 0.1 defaults in version 0.2 you can just use xml2js.defaults["0.1"] in the same place. This provides you with enough time to migrate to the saner way of parsing in xml2js 0.2. We try to make the migration as simple and gentle as possible, but some breakage cannot be avoided.

So, what exactly changed and why? In 0.2 we changed some defaults to parse the XML in a more universal and sane way. We disabled normalize and trim so xml2js does not cut out any text content. You can re-enable these at will, of course. A more important change is that we return the root tag in the resulting JavaScript structure via the explicitRoot setting, so you need to access the first element. This is useful for anybody who wants to know what the root node is, and it preserves more information. The last major change was to enable explicitArray, so every time it is possible that one might embed more than one sub-tag into a tag, xml2js >= 0.2 returns an array even if the array includes just one element. This is useful when dealing with APIs that return variable amounts of subtags.
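
To make the difference concrete, here is a sketch of how access code changes. Both objects are hand-written stand-ins (an assumption) for the result of parsing `<root><item>a</item></root>`: first with explicitRoot and explicitArray disabled, as in the 0.1 defaults, then with both enabled, as in the 0.2 defaults.

```javascript
// Stand-in for 0.1-style output: no root key, single child is a plain value.
const oldStyle = { item: 'a' };

// Stand-in for 0.2-style output: root key kept, children always in arrays.
const newStyle = { root: { item: ['a'] } };

// Access code has to go through the root key and index into the array.
const oldValue = oldStyle.item;
const newValue = newStyle.root.item[0];

console.log(oldValue, newValue);
```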

Running tests, development

The development requirements are handled by npm, you just need to install them. We also have a number of unit tests, they can be run using npm test directly from the project root. This runs zap to discover all the tests and execute them.

If you would like to contribute, keep in mind that xml2js is written in CoffeeScript, so don't develop on the JavaScript files that are checked into the repository for convenience reasons. Also, please write some unit tests to check your behaviour, and if it is a user-facing thing, add some documentation to this README so people will know it exists. Thanks in advance!

Getting support

Please, if you have a problem with the library, first make sure you read this README. If you read this far, thanks, you're good. Then, please make sure your problem really is with xml2js. It is? Okay, then I'll look at it. Send me a mail and we can talk. Please don't open issues, as I don't think that is the proper forum for support problems. Some problems might as well really be bugs in xml2js, if so I'll let you know to open an issue instead :)

But if you know you really found a bug, feel free to open an issue instead.

node-xml2js's Issues

Tag names in all caps in results from parser

Hi.

I am using xml2js to parse some xml that I get from a web service. Everything was working great until a little while ago when randomly a lot of things were coming back undefined.

It turns out all of the xml tag names are all caps?

Did something change? I am not entirely sure I updated anything, but like most people I am working on many things at once so it's quite possible I did grab something that was updated.

Otherwise, I have no clue what is going on.

console.log(sys.inspect(result)) -> console.dir(result) in example

simpler ;)

Syntax feels a bit awkward

What is the purpose of the '#' and '@' syntax? Would it not be better to use a dot syntax like .text and .attr and especially .name? Does it serve some other purpose?

Cannot throw custom errors inside parseString

CODE: test.js

  var parseXML = require('xml2js').parseString;

  var XML = '<?xml version="1.0" encoding="utf-8"?><test>test</test>';

  parseXML(XML, function(err, xml){
    throw new Error('This is an error message');
  });

EXPECTED RESULT

$ node test.js 

/path/to/test.js:6
    throw new Error('This is an error message');
          ^
Error: This is an error message
    at /Users/slajax/repos/appla.bz/test.js:6:11
    at Object.<anonymous> (/path/to/test.js:7:5)
    at Module._compile (module.js:449:26)
    at Object.Module._extensions..js (module.js:467:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.runMain (module.js:492:10)
    at process.startup.processNextTick.process._tickCallback (node.js:244:9)

ACTUAL RESULT

$ node test.js 

events.js:73
        throw new Error("Uncaught, unspecified 'error' event.");
              ^
Error: Uncaught, unspecified 'error' event.
    at Parser.EventEmitter.emit (events.js:73:15)
    at Parser.exports.Parser.Parser.parseString (/path/to/node_modules/xml2js/lib/xml2js.js:223:21)
    at Parser.__bind [as parseString] (/path/to/node_modules/xml2js/lib/xml2js.js:6:61)
    at exports.parseString (/path/to/node_modules/xml2js/lib/xml2js.js:247:19)
    at Object.<anonymous> (/path/to/test.js:5:3)
    at Module._compile (module.js:449:26)
    at Object.Module._extensions..js (module.js:467:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.runMain (module.js:492:10)

Convert to JSON?

Is there something in this library to convert XML to JSON? If not could you recommend one?

Can't use a singleton parser object

My web application goes like this:
var xml2js = require('./xml2js/lib/xml2js');
var parser = new xml2js.Parser({ trim:false, normalize:false });
And in express router:
app.get('/fd/:api', function(req, myResponse, next) {
    http.get({path..host...port}, function(res) {
        var data = '';
        res.setEncoding('utf8');
        res.on('data', function(chunk) { data += chunk; });
        res.on('end', function() {
            if (res.statusCode == 200) {
                parser.parseString(data, function (err, result) {
                    myResponse.contentType('application/json');
                    myResponse.send(err ? { success:false } : result);
                    myResponse.end();
                });
            }
        });
    });
})
For the second request, it throws an error. But if I create a new xml2js.Parser() when statusCode == 200, everything works fine.

feature req: optional spaces cutting and trimming

sax (https://github.com/isaacs/sax-js/) has arguments for this (your code at xml2js.js, line 42)
// turn 2 or more spaces into one space
obj['#'] = obj['#'].replace(/\s{2,}/g, " ").trim();
So you can use
this.saxParser = sax.parser(true, {normalize: true}); // make the sax parser

or even so
this.saxParser = sax.parser(true, {normalize: true, trim: true}); // make the sax parser

at line 9

please make this optional

Use first argument of callback instead of throw on error

Hi Leonidas,

While working with xml2js, I found that the first argument of the callback is not used when an error is encountered (like parsing incorrect XML), instead throw is used. (See line 183 of xml2js.coffee.)

Why is an error thrown? This requires the usage of a try/catch block, instead of checking if err is set.

Let's say I have the following code where I check the first argument of the callback:

parseString xml, (err, result) ->
    console.log 'encountered error' if err

This does not work at the moment, causing the need for a try/catch block:

try
    parseString xml, (err, result) ->
        # process result
catch e
    console.log 'encountered error', e

However, when the # process result becomes very long, the code for which the try is needed and the catch become distanced from each other, making it difficult to read.

To summarize my question: can the first argument of the callback be used to communicate an error instead of throwing one?
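
What is being asked for is Node's error-first callback convention. A sketch of the idea, where `throwingParse` is a hypothetical stand-in for a parser that throws and `toErrorFirst` is a hypothetical wrapper (neither is part of xml2js):

```javascript
// Hypothetical stand-in for a parser that throws on bad input.
function throwingParse(text) {
  if (text.indexOf('<') !== 0) {
    throw new Error('not XML: ' + text);
  }
  return { ok: true };
}

// Hypothetical wrapper: turns a thrown error into the callback's first argument.
function toErrorFirst(fn) {
  return function (input, callback) {
    let result;
    try {
      result = fn(input);
    } catch (err) {
      return callback(err);
    }
    // Called outside the try block so errors thrown by the callback
    // itself are not swallowed or misreported as parse errors.
    callback(null, result);
  };
}

const parse = toErrorFirst(throwingParse);
const seen = [];
parse('<root/>', function (err, result) { seen.push(err ? 'error' : 'ok'); });
parse('oops', function (err, result) { seen.push(err ? 'error' : 'ok'); });
console.log(seen);
```

With this shape the caller checks err in one place, and the result-handling code no longer needs to sit inside a try block.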

Synchronous Operation?

Is there a synchronous version of the parser? Seems silly to have to pass callback and deal with function nesting, etc, all for a parser...

callback never gets called

request({uri : "http://www.novedadesmovistar.com/feed/"}, function(err, response, body) {
var parser = new xml2js.Parser();
parser.addListener('end', function(result) {
    console.log(sys.inspect(result,false,null))
});
parser.parseString(body);
});

The parser 'end' event never fires, even though the request callback is called and the response completes successfully!

How to get around the JSON object

Trying to figure out how to get around this JSON object that comes back.

I used the simple example, and this is what comes back to the console:
{ '@':
{ messageId: 'a66f0e98-67f6-482d-8a56-5c366127f161',
receiptDate: '2012-07-09 16:24:18-0500' },
source: { '@': { address: '2084033200', carrier: '104', type: 'MDN' } },
destination: { '@': { address: '72826', type: 'SC' } },
message: 'Testtest hiya' }

I kind of figured I could do something like: [email protected][email protected] to get at that element, but it is throwing errors.

Interestingly, to get to the message I can simply do: result.message

But result.source.address is not defined.

The results don't pass jsonlint validation, so I am wondering how to get around the object to get the data I need?
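
In the object printed above, attributes sit under the key '@', which is not a valid identifier, so bracket notation is needed to reach them. A minimal sketch using a hand-copied version of that object:

```javascript
// Hand-copied from the console output quoted above.
const result = {
  '@': {
    messageId: 'a66f0e98-67f6-482d-8a56-5c366127f161',
    receiptDate: '2012-07-09 16:24:18-0500'
  },
  source: { '@': { address: '2084033200', carrier: '104', type: 'MDN' } },
  destination: { '@': { address: '72826', type: 'SC' } },
  message: 'Testtest hiya'
};

// '@' must be quoted and accessed with brackets; plain keys can use dots.
const sourceAddress = result.source['@'].address;
const messageId = result['@'].messageId;

console.log(sourceAddress, messageId);
```

The console output is Node's object inspection format rather than JSON, which is why it fails jsonlint; JSON.stringify(result) produces valid JSON.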

Export a function directly

As discussed in #65, since reusing parsers is not a good idea, it would be nice to have a require-and-execute API.

Proposal for more compact structure

The current data structure feels pretty cumbersome to me. In particular, having attributes in a sub-object seems unnecessary and makes them harder to get at. I'm wondering if there's a specific reason for this, or would a more compact datastructure be feasible.

For example, I've put together this method to merge the node and attribute objects, by simply prefixing attribute names with a '$'. (I also move '#' to '$', for consistency.)

function flattenXML(o, k) {
  var v = k != null ? o[k] : o;

  if (k == '@') { // ['@'] -> $(attribute name)
    for (var att in v) o['$' + att] = v[att];
    delete o['@'];
  } else if (k == '#') { // ['#'] -> $
    o.$ = v;
    delete o['#'];
  } else if (v && typeof(v) == 'object') {
    if (v.concat && v.length) {
      for (var i = 0; i < v.length; i++) flattenXML(v,  i);
    } else {
      for (var i in v) flattenXML(v,  i);
    }
  }

  return o;
}

For example, if the object that xml2js returns looks like this ...

{
  "InputEncoding": "UTF-8",
  "@": {
    "xmlns": "http://www.mozilla.org/2006/browser/search/"
  },
  "Url": [
    {
      "@": {
        "method": "GET",
        "type": "application/x-suggestions+json"
      }
    },
    {
      "@": {
        "method": "GET",
        "template": "http://www.google.com/search",
        "type": "text/html"
      },
      "Param": [
        {
          "@": {
            "name": "q",
            "value": "{searchTerms}"
          }
        }
      ]
    }
  ],
  "Image": {
    "@": {
      "height": "16",
      "width": "16"
    },
    "#": "data:..."
  },
  "ShortName": "Google"
}

flattenXML(obj) will convert it to this:

{
  "InputEncoding": "UTF-8",
  "$xmlns": "http://www.mozilla.org/2006/browser/search/",
  "Url": [
    {
      "$type": "application/x-suggestions+json",
      "$method": "GET"
    },
    {
      "$type": "text/html",
      "Param": [
        {
          "$value": "{searchTerms}",
          "$name": "q"
        }
      ],
      "$method": "GET",
      "$template": "http://www.google.com/search"
    }
  ],
  "Image": {
    "$height": "16",
    "$": "data...",
    "$width": "16"
  },
  "ShortName": "Google"
}

I'm curious what you think about this sort of thing. Thanks!

remove the need for proto.js

it's quite bad to change the native object prototypes like done in proto.js

to avoid it (you seem to need only forEach)

you could transform the code like that:

    var keys = Object.keys(node.attributes);
    var length = keys.length;

    for (var i = 0; i < length; i++) {
        if(typeof obj['@'] === 'undefined') {
            obj['@'] = {};
        }
        obj['@'][keys[i]] = node.attributes[keys[i]];
    }

cheers

Newline characters are not preserved within nodes

I don't know if this is a bug or simply proper handling of the XML spec, but it seems that newlines within a node are not preserved during parsing. Submitting just in case it falls into the former category.

For instance if a node was:

this
is
content

The resulting object contains "this is content" without breaks.

How do i see values in s:Body?

How can I loop over each object in GetProvidersResult and GetProvidersResponse?

My Return:

{ '@': { 'xmlns:s': 'http://schemas.xmlsoap.org/soap/envelope/' },
's:Body': { GetProvidersResponse: { '@': [Object], GetProvidersResult: [Object] } } }

Convert js2xml (bidirectional conversion)

It would be great if there were bidirectional conversion.

I started working on this, but if there's already something that works for the particular format that xml2js uses, then I'd go with that.

Cheers,
David

explicitArray should default to false

I find it very confusing and a lot of the time unnecessary that every child node is treated as an array. Why is explicitArray set to true by default?

"Simple as pie usage" first example, events don't work

In the README.md, in the section "Simple as pie usage", the first example (the one without events) doesn't work. The callback is never called.

npm publish this bad boy!

We ran into a bug where extending the Object.prototype was causing some weird interaction errors in some other third-party modules. It was awesome to see that you've already removed this behavior and even bumped the version number. Would you mind npm publishing now?

Thanks in advance and great work!

Tabs not preserved

I'm parsing an XML file (a RETS document) which has some tab delimited content in a node. After parsing, some tabs are not preserved. Something like this:

25763619 Residential No M 10 245,000 5114900003

DATA:
[ '25763619\tResidential\tNo M\t10\t245,000\t5114900003' ]

Notice that a tab between No and M is gone. This is just a snippet of the doc, but it's happening in multiple places. Any idea as to what is happening?

Edit: I think I see the problem. When there are two tabs in a row (No\t\tM) it doesn't recognize it.
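
This matches what the whitespace normalization quoted in the feature request above would do. A sketch, assuming that `replace(/\s{2,}/g, " ").trim()` step is what runs on the text (`normalizeText` is a hypothetical wrapper name):

```javascript
// Hypothetical wrapper around the normalization expression quoted earlier.
function normalizeText(text) {
  // Any run of two or more whitespace characters becomes a single space.
  return text.replace(/\s{2,}/g, ' ').trim();
}

// Single tabs survive; the double tab between "No" and "M" is collapsed.
const input = '25763619\tResidential\tNo\t\tM\t10';
const output = normalizeText(input);

console.log(JSON.stringify(output));
```

Because tabs count as whitespace for `\s`, two tabs in a row are treated like two spaces and replaced with one space.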

Not pass error object, but emit error event

<xml><ToUserName><![CDATA[gh_d3e07d51b513]]></ToUserName>
<FromUserName><![CDATA[oPKu7jgOibOA-De4u8J2RuNKpZRw]]></FromUserName>
<CreateTime>1361374891</CreateTime>
<MsgType><![CDATA[text]]></MsgType>
<Content><![CDATA[/:8-)]]></Content>
<MsgId>5847060634540564918</MsgId>
</xml>

events.js:73
        throw new Error("Uncaught, unspecified 'error' event.");
              ^
Error: Uncaught, unspecified 'error' event.
    at Parser.EventEmitter.emit (events.js:73:15)
    at Parser.exports.Parser.Parser.parseString (/home/pi/wwwroot/api-doc-service/node_modules/wechat/node_modules/xml2js/lib/xml2js.js:223:21)
    at Parser.__bind [as parseString] (/home/pi/wwwroot/api-doc-service/node_modules/wechat/node_modules/xml2js/lib/xml2js.js:6:61)
    at Object.exports.parseString (/home/pi/wwwroot/api-doc-service/node_modules/wechat/node_modules/xml2js/lib/xml2js.js:247:19)
    at getMessage (/home/pi/wwwroot/api-doc-service/node_modules/wechat/lib/wechat.js:70:12)
    at IncomingMessage.<anonymous> (/home/pi/wwwroot/api-doc-service/node_modules/wechat/node_modules/bufferhelper/lib/bufferhelper.js:29:5)
    at IncomingMessage.EventEmitter.emit (events.js:93:17)
    at IncomingMessage._emitEnd (http.js:366:10)
    at HTTPParser.parserOnMessageComplete [as onMessageComplete] (http.js:149:23)
    at Socket.socket.ondata (http.js:1769:22)

version: 0.2.4

NPM package is out-dated

I installed from npm, and got version 0.0.1, which does not work. Downloading this package from github works perfectly.

xml root not parsed

<root></root> results in an empty object. But I would prefer { root: {} } instead so I could verify the xml root node.

xml2js does not preserve element order within a level

Here's the input XML

<nj:entitySection>
    <nj:entity entityName="Student">
        <nj:subEntity subEntityName="Contact" >
            <nj:fieldGroup displayName="group1">
                <nj:inModelField displayName="field1"  />
                <nj:foobar>x</nj:foobar>
                <nj:inModelField  displayName="field2"  />
            </nj:fieldGroup>
            <nj:fieldGroup displayName="group2">
                <nj:inModelField displayName="field3"/>
            </nj:fieldGroup>
        </nj:subEntity>
    </nj:entity>
</nj:entitySection>
</nj:ReportingStandard>

And here's what xml2js reads in:

{ 'nj:entitySection': 
  [ { 'nj:entity': 
       [ { '@': { entityName: 'Student' },
           'nj:subEntity': 
            [ { 'nj:fieldGroup': 
                 [ { 'nj:foobar': [ 'x', [length]: 1 ],
                     '@': { displayName: 'group1' },
                     'nj:inModelField': [ [Object], [Object], [length]: 2 ] },
                   { '@': { displayName: 'group2' },
                     'nj:inModelField': [ [Object], [length]: 1 ] },
                   [length]: 2 ],
                '@': { subEntityName: 'Contact' } },
              [length]: 1 ] },
         [length]: 1 ] },
    [length]: 1 ],

The key thing to notice is that in the XML the order was originally:

inModelField
foo
inModelField

but in the result, it's more like:

foo
inModelField
inModelField

It basically combines all tags with the same element name at a particular level, and in so doing, BLOWS away the (XML prescribed) preservation of tag order. And there's no way to re-establish the original order.

So if you tried to represent your XHTML with this, all your p, h, a and such tags would be scrunched together with identical tags

H P P A H A P P H   would become
A A H H H P P P P   !!!!!

Note: I'm not trying to read in XHTML - i just used those tags as an example of why it might be a problem for some schemas.

xml2js does have this sexy sounding option:

explicitArray (default: false): Always put child nodes in an array if true; otherwise an array is created only if there is more than one.

While it does indeed "put child nodes in an array", it doesn't preserve the document order.

Callback Delay

I've been experiencing delays of over 2 minutes in the callback, regardless of the XML file being parsed. The delay occurs at random but very frequently, with the same or with different XML files.

So the same file that is parsed in less than 30 seconds can take over 2 minutes.

This is the process I'm using:

A list of xml file names.
Click the file name to read content from the server using fs.
Parse content with xml2js:

Sample test code:

var fs = require('fs'),
    xml2js = require('xml2js');

var parser = new xml2js.Parser();
parser.on('end', function (result) {
    console.dir(result);
});

// fileName is the path of the file that was clicked
fs.readFile(fileName, function (err, data) {
    console.dir(data);
    if (parser.parseString(data)) {
        console.log('xml2js: successfully parsed file.');
    } else {
        console.error('xml2js: parse error: "%s"', err);
        // or
        console.error('xml2js: parse error: "%s"', parser.getError());
    }
});

Troubleshooting the issue identified the callback to be the culprit. No errors were captured, only a delay in the callback.

Your feedback is appreciated -Thanks.

Empty XML tag results in empty Object

When an empty XML tag is converted, it is converted to have a value of an empty Object. I looked at the test file and there is no test case for an empty tag, so I'm not sure if this was intentional or not. I would expect an empty tag to have the value null.

<root>
  <empty/>
</root>

Results in

{ "empty": {} }

But I think this represents it better:

{ "empty": null }

inconsistent format of results

If xml to be parsed is as follows:

<root> <item attr1="foo" attr2="bar"/> <item attr1="rab" attr2="oof"/> </root>

you get something like this (with explicitRoot):

{ root:
   { item:
      [ { '@': { attr1: 'foo', attr2: 'bar' } },
        { '@': { attr1: 'rab', attr2: 'oof' } } ] } }

However, if there is only one "item", that array doesn't exist. It's just:

{ root:
   { item:
      { '@': { attr1: 'foo', attr2: 'bar' } } } }

Thus, unless you are guaranteed more than one item in the XML that you are parsing, you have to check whether the value is an array and add a special case for when it is not. Wouldn't it make more sense to always create the array, so that a single item simply ends up in an array with one element?

EDIT: Apparently I don't know how to correctly post code so this will have to do for now
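Until the parser always produces arrays, one way to avoid the special case is to normalize the value before iterating. This is only a sketch: `toArray` is a hypothetical helper, not part of xml2js, and the two sample objects are hand-written to match the shapes shown above.

```javascript
// Hypothetical helper (not part of xml2js): always returns an array,
// whether the parser produced a single object, an array, or nothing.
function toArray(value) {
  if (value === undefined || value === null) {
    return [];                                   // element was absent
  }
  return Array.isArray(value) ? value : [value];
}

// Hand-written stand-ins for the two result shapes described above.
var single = { item: { '@': { attr1: 'foo', attr2: 'bar' } } };
var many = { item: [ { '@': { attr1: 'foo' } }, { '@': { attr1: 'rab' } } ] };

// The same loop now works for both shapes.
[single, many].forEach(function (root) {
  toArray(root.item).forEach(function (item) {
    console.log(item['@'].attr1);
  });
});
```

The calling code stays the same no matter how many "item" elements the document contains.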

Lost span?

I have an element that has content looking like...

Looks like the content inside the span is lost during parsing so that the result looks like

'#' blah blah blah blah blah

Is there anything I can do in this case? I need a way to ignore the 'span' so that it is considered text.

please pin the sax dependency

https://github.com/Leonidas-from-XIV/node-xml2js/blob/master/package.json#L36

If sax ever updates in a breaking way, future installs of this module will break for your users. The best approach is to pin the exact version which you have tested and published with. If you later want to investigate issues with a particular version, not knowing which version of sax was in use will make that very difficult.

Elements Array Problem

The following array of elements

<role name="anonymous" hash="..."/>
<role name="user" hash="..."/>
<role name="cheater" hash="..."/>
<role name="admin" hash="..."/>

is parsed like this:

role: 
   [ { '@': [Object] },
     { '@': [Object] },
     { '@': [Object] },
     { '@': [Object] } ],

But I want it to be parsed like this:

role:  [ 
{ name: 'anonymous', hash: '...' },
{ name: 'user', hash: '...' },
{ name: 'cheater', hash: '...' },
{ name: 'admin',  hash: '...' } 
],

Is it possible?
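One possible workaround is to post-process the result and copy the attributes onto the element itself. The sketch below assumes the attributes live under the '@' key as in the output above. `liftAttributes` is a hypothetical helper, not an xml2js function, and the sample data is hand-written.

```javascript
// Hypothetical helper (not part of xml2js): returns a copy of `element`
// in which the attributes stored under `attrKey` become plain properties.
// Note: an attribute overwrites a child element that has the same name.
function liftAttributes(element, attrKey) {
  var result = {};
  Object.keys(element).forEach(function (key) {
    if (key !== attrKey) {
      result[key] = element[key];                // keep child elements and text
    }
  });
  var attrs = element[attrKey] || {};
  Object.keys(attrs).forEach(function (name) {
    result[name] = attrs[name];                  // merge attributes
  });
  return result;
}

// Hand-written stand-in for the parsed <role> elements.
var parsed = {
  role: [
    { '@': { name: 'anonymous', hash: '...' } },
    { '@': { name: 'user', hash: '...' } }
  ]
};

var roles = parsed.role.map(function (role) {
  return liftAttributes(role, '@');
});
console.dir(roles);
```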

parseString no longer blocking

In the latest release (0.2.5), Parser.parseString is no longer blocking. This makes it impossible to synchronously parse an XML string that is already loaded into memory. Was this change intentional?

Empty CDATA will be parsed as an object({})

  'test empty CDATA': (test) ->
    xml = '<xml><Label><![CDATA[]]></Label><MsgId>5850440872586764820</MsgId></xml>'
    xml2js.parseString xml, (err, parsed) ->
      equ parsed.xml.Label[0], ''
      test.finish()

An empty string is expected, but the actual value is an empty object ({}).

Please put the change of attrkey and charkey in the readme

I don't know the rationale behind changing the attrkey from "@" to "$" in version 0.2 but
mongodb throws an error if I try to save the document as it is.
[Error: key $ must not start with '$']

I found out that it could be overridden by

xml2js.defaults["0.2"].attrkey='@';

But it would be nice not to have to read the source code to figure it out.
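If changing the library defaults is not an option, the keys can also be renamed after parsing and before the document is saved. The sketch below is a generic recursive rename. `renameKey` is a hypothetical helper rather than anything xml2js provides, and the sample object is hand-written.

```javascript
// Hypothetical helper (not part of xml2js): returns a deep copy of `node`
// in which every property called `from` is renamed to `to`.
function renameKey(node, from, to) {
  if (Array.isArray(node)) {
    return node.map(function (child) {
      return renameKey(child, from, to);
    });
  }
  if (node !== null && typeof node === 'object') {
    var out = {};
    Object.keys(node).forEach(function (key) {
      out[key === from ? to : key] = renameKey(node[key], from, to);
    });
    return out;
  }
  return node;                                   // strings, numbers, etc.
}

// Hand-written stand-in for a parse result that uses '$' for attributes.
var doc = { item: [ { $: { id: '1' }, name: ['first'] } ] };
var safeDoc = renameKey(doc, '$', '@');
console.dir(safeDoc);
```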

XML not well formed: parseString callback is called twice

var xml2js = require('xml2js');
var fs = require('fs');

fs.readFile('testFile', 'utf-8', function(err, fileContent) {
  var parser = new xml2js.Parser();
  console.log("before");
  parser.parseString(fileContent, function(err, results) {
    console.log('after');   
    console.log(results);
  });
});

testFile Content: ( test2 is not well formed )

<test>
    <test2 type="3">
    /test2>
</test>

Output:

malletjo$ node test
before
after
undefined
after
{ test2: { '#': '/test2>', '@': { type: '3' } } }

As you can see, "after" is printed twice, which means the callback is called twice. This results in really weird behavior.
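As a stopgap on the caller's side, the callback can be wrapped so that only its first invocation has any effect. This is a sketch of that general guard and not a fix for the parser itself. `callOnce` and `flakyParse` are hypothetical names, and `flakyParse` only imitates the double call described above.

```javascript
// Hypothetical guard: the returned function forwards the first call to
// `callback` and silently ignores every later call.
function callOnce(callback) {
  var called = false;
  return function () {
    if (called) {
      return;
    }
    called = true;
    return callback.apply(this, arguments);
  };
}

// Stand-in that imitates the reported behavior: one call with an error,
// followed by a second call with a partial result.
function flakyParse(input, callback) {
  callback(new Error('not well formed'));
  callback(null, { partial: input });
}

var calls = 0;
flakyParse('<test>', callOnce(function (err, result) {
  calls += 1;
  console.log('after', err ? err.message : result);
}));
```

With the guard in place only the first call (here, the error) reaches the application code.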

================== Version ==================

malletjo$ node -v
v0.4.11

malletjo$ npm list
...
└─┬ [email protected]
└── [email protected]

parseString callback never called

I am using xml2js to parse XML-data from an external source I do not control.

I recently encountered a piece of invalid XML that caused parseString to not call the callback, ever. Not with a result, not even with an error.

Here's a sample node app and 2 XML-files that cause the described behavior:

test.js:

var fs = require( "fs" ), xml2js = require( "xml2js" );

var parser = new xml2js.Parser();
fs.readFile( __dirname + "/foo.xml", function( err, data ) {
    if( err ) {
        console.log( "file io error: " + err );
        return;
    }
    parser.parseString( data, function( err, res ) {
        // never reached
        if( err ) {
            console.log( "parse error: " + err );
            return;
        }
        console.dir( res );
        console.log( "done" );
    } );
} );

foo.xml:

<?xml version="1.0"?>

foo2.xml

<?xml version="1.0"?>
This is a string.
This is another string.

No longer works with latest npm

npm changed how it finds the script. I think you need to provide the exact path to the js file instead of just the directory.

Incomplete conversion

XML: (I know it's probably not 'valid', but it's what a payment gateway in India sends back, and I have to parse it. I am obfuscating the domain name.)

<result>ENROLLED</result>
<url>https://netsafeuat.somedomain.com/ACSWeb/com.enstage.entransact.servers.AccessControlServerSSL?ty=V</url>
<PAReq>eJxVUttS2zAQ/RWP37HkO82sxaQEijsNUDBtedTIwhFJZFuSg/P3lZyEi/SgPburs1e4GLcbb8eVFq0s/DDAvscla2shm8J/qq7Pzv0LAtVKcb545GxQnMCSa00b7om68C9fWZl1yfPVXY56ZYY5povxbuzXoix8AvfzB94TOAYglj+IAJ2gZVJsRaUhQFn/vbwlSRxHeQLoCGHLVbkgL1p7n87ZN4xxHOaW6WAHSbec/BK734NYexVnK9lu2mbvlbIW1AM02YG1gzRqT+I0A3QCMKgNWRnT6RlCeqqwawzXJrBRJTcBawMhUde80T1qqOHupXU9pR686g6QYwD0Ucv94CRtI46iJo9ZrfW/v9Huabkcf/a0v2ZXcv9j+edlXQByHlBbWhLhMLI39nA6S/AszQFNeqBblyp50FGUBRjb5hwU0Lk48wOwNmf6rAJbi7LDPBV8QsDHrpXcfQL0LgP6SPvyxg2EGdva5K3Cu5uwq3D7HIdxmiYYZ2l2nqduSJOL4xO2k2GCD4QOAHIk6Dh/dFwZK31Zpf9vtNKc</PAReq>
<paymentid>6756271401723481</paymentid>
<trackid>QVT-1355400371710</trackid>
<udf1></udf1>
<udf2></udf2>
<udf3></udf3>
<udf4></udf4>
<udf5></udf5>

I only get the first element, i.e. 'result'.

What I get:

{ result: 'ENROLLED' }

I need to get all the elements like

{result: 'ENROLLED', url: 'http://....', PAReq : '.....' .... etc}

Thanks in advance.
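A well-formed XML document has exactly one root element, and the response above has several top-level elements, which would explain why parsing stops after the first one. One workaround is to wrap the fragment in a synthetic root before handing it to the parser. In the sketch below, `wrapInRoot` and the root name `response` are made up for illustration.

```javascript
// Hypothetical helper (not part of xml2js): wraps an XML fragment that has
// several top-level elements in a single root element.
function wrapInRoot(fragment, rootName) {
  // Drop a leading XML declaration, because it would no longer be at the
  // very start of the document once the fragment is wrapped.
  var body = fragment.replace(/^\s*<\?xml\s[^>]*\?>/, '');
  return '<' + rootName + '>' + body + '</' + rootName + '>';
}

var fragment = '<result>ENROLLED</result><paymentid>6756271401723481</paymentid>';
var wrapped = wrapInRoot(fragment, 'response');
console.log(wrapped);
```

The wrapped string can then be passed to parseString, and the original elements show up as children of the synthetic root.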

Call back not called when input string is empty or only blank spaces

Hello

In the code below, the listener on 'end' never gets called.

It seems that if the input string is empty or only blank spaces, the listener is not called.

var sys = require('sys'),
    fs = require('fs'),
    xml2js = require('xml2js');

var parser = new xml2js.Parser();
parser.addListener('end', function(result) {
    console.log(sys.inspect(result));
    console.log('Done.');
});
parser.parseString("  ");

Thanks!
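Until the parser reports this case itself, the caller can check for blank input before parsing. The following is a sketch under that assumption. `parseNonEmpty` is a hypothetical wrapper and `fakeParse` is a stand-in. In real code the `parse` argument would be a function with the signature `parse(xml, callback)`, such as `xml2js.parseString`.

```javascript
// Hypothetical wrapper (not part of xml2js): reports blank input as an
// error instead of leaving the caller waiting for a callback.
function parseNonEmpty(xml, parse, callback) {
  var text = xml == null ? '' : String(xml);     // also accepts Buffers
  if (text.trim() === '') {
    callback(new Error('XML input is empty'));
    return;
  }
  parse(text, callback);
}

// Stand-in parser used only to show the control flow.
function fakeParse(text, callback) {
  callback(null, { length: text.length });
}

parseNonEmpty('  ', fakeParse, function (err, result) {
  console.log(err ? 'error: ' + err.message : result);
});
```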

Code not working with deep XML

test with sample.xml provided in test/fixtures.
Code:
var fs = require('fs'),
    xml2js = require('xml2js');

var parser = new xml2js.Parser();
fs.readFile('./sample.xml', function (err, data) {
    parser.parseString(data, function (err, result) {
        console.dir(result);
        console.log('Done');
    });
});

Out:

{ sample:
{ chartest: [ [Object] ],
cdatatest: [ [Object] ],
nochartest: [ [Object] ],
whitespacetest: [ [Object] ],
listtest: [ [Object] ],
arraytest: [ [Object] ],
emptytest: [ {} ],
tagcasetest: [ [Object] ],
ordertest: [ [Object] ],
validatortest: [ [Object] ],
'pfx:top': [ [Object] ] } }
Done
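The `[Object]` placeholders in this output may simply come from the way Node prints objects: by default, nesting is only expanded to a limited depth. A quick way to check is to print the result with `util.inspect` and an unlimited depth. The `result` object below is hand-written sample data, not real parser output.

```javascript
var util = require('util');

// Hand-written stand-in for a deeply nested parse result.
var result = {
  sample: {
    chartest: [ { text: 'Character data here!', attrs: { desc: 'Test for CHARs' } } ]
  }
};

// The default depth is 2, so objects nested deeper print as [Object].
var shallow = util.inspect(result);

// depth: null removes the limit and prints the whole structure.
var full = util.inspect(result, { depth: null });

console.log(shallow);
console.log(full);
```

If the full output contains the expected data, the parser did its job and only the printing was truncated.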

doesn't work for me

Hi, I followed the instructions of the Readme but something seems broken :

jan@jan:~/pro/test$ node -v
v0.5.6-pre
jan@jan:~/pro/test$ npm -v
v                        1.0.27
jan@jan:~/pro/test$ npm install xml2js
[email protected] ./node_modules/xml2js 
└── [email protected]
jan@jan:~/pro/test$ ls node_modules/
xml2js
jan@jan:~/pro/test$ node
> var xml2js = require('xml2js');
> var parser = xml2js.Parser();
> parser.addListener('end', function(result){return result;});
TypeError: Cannot call method 'addListener' of undefined
    at repl:1:8
    at Interface.<anonymous> (repl.js:168:22)
    at Interface.emit (events.js:67:17)
    at Interface._onLine (readline.js:153:10)
    at Interface._line (readline.js:408:8)
    at Interface._ttyWrite (readline.js:585:14)
    at ReadStream.<anonymous> (readline.js:73:12)
    at ReadStream.emit (events.js:88:20)
    at ReadStream._emitKey (tty_posix.js:306:10)
    at ReadStream.onData (tty_posix.js:69:12)
>

Can you help me please? :)
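The error message says that `parser` is undefined, which is what typically happens when a constructor function is called without `new`. The README examples create the parser with `new xml2js.Parser()`. The sketch below shows the difference with a stand-in constructor. `Parser` here is a hypothetical class that inherits from EventEmitter and is not the real xml2js implementation.

```javascript
var EventEmitter = require('events').EventEmitter;
var util = require('util');

// Stand-in constructor (not the real xml2js.Parser).
function Parser() {}
util.inherits(Parser, EventEmitter);

// Called as a plain function, the constructor returns undefined ...
var withoutNew = Parser();

// ... while `new` returns an instance that has addListener.
var withNew = new Parser();
withNew.addListener('end', function (result) {
  console.log('got', result);
});
withNew.emit('end', 'a result');

console.log(typeof withoutNew);
console.log(typeof withNew.addListener);
```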

broken ?!

It only parses the root, and then stops.

Env

node v0.10.4
CoffeeScript version 1.6.2
[email protected] (tried both the published version and the master branch)

xml2js = require 'xml2js'
util = require 'util'

body = '<sample><chartest desc="Test for CHARs">Character data here!</chartest></sample>'
console.log util.inspect xml2js.parseString(body), false, null

###

{ comment: '',
  sgmlDecl: '',
  textNode: '',
  tagName: '',
  doctype: '',
  procInstName: '',
  procInstBody: '',
  entity: '',
  attribName: '',
  attribValue: '',
  cdata: '',
  script: '',
  c: '',
  q: '',
  bufferCheckPosition: 65536,
  opt:
   { trim: false,
     normalize: false,
     xmlns: false,
     lowercase: undefined },
  looseCase: 'toUpperCase',
  tags: [],
  sawRoot: true,
  closedRoot: true,
  closed: false,
  error: null,
  tag: { name: 'sample', attributes: {}, isSelfClosing: false },
  strict: true,
  noscript: true,
  state: 1,
  ENTITIES: {},
  attribList: [],
  trackPosition: true,
  column: 80,
  line: 0,
  position: 80,
  onerror: [Function],
  onopentag: [Function],
  onclosetag: [Function],
  oncdata: [Function],
  ontext: [Function],
  startTagPosition: 72 }

###

Changed owner causes issues

I see that maqr removed the original directory, so now it displays a 404 error when I try to access it. It's no big deal to start using the one at this path (Leonidas'), but a lot of sites out there are linking to the maqr source

errors parsing

I'm trying to parse this xml http://api.chartlyrics.com/apiv1.asmx/SearchLyricDirect?artist=beck&song=loser

but get this error message

{ stack: [Getter/Setter],
arguments: undefined,
type: undefined,
message: 'Non-whitespace before first tag.\nLine: 0\nColumn: 1\nChar: c' }

is there anything I can do to get a proper result ?

just return object from parser.parseString

Is there a need to have a callback here at all? There is no IO happening and the callback just adds closures and indentation for no reason.

I would like to use this without Node.js on Windows

How can I use this library on Windows without using Node? If that is not possible, can you recommend another library?

Thanks

Failing on an apostrophe

Using xml2js to parse RSS feeds, I came across this issue:

var xml2js = require('xml2js');
var xml = '<?xml version="1.0" encoding="UTF-8"?>' +
  '<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" ' +
  'xmlns:wfw="http://wellformedweb.org/CommentAPI/" ' +
  'xmlns:dc="http://purl.org/dc/elements/1.1/" ' +
  'xmlns:atom="http://www.w3.org/2005/Atom" ' +
  'xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" ' +
  'xmlns:slash="http://purl.org/rss/1.0/modules/slash/" version="2.0">' +
    '<test>Apostrophe: &#039;</test>' +
  '</rss>';

var parser = new xml2js.Parser();
parser.addListener('error', function(err) {
  console.error('Parsing error: ' + err);
});
parser.addListener('end', function(result) {
  console.log('Done: ' + require('util').inspect(result));
});
parser.parseString(xml);

Outputs:

Parsing error: Error: Invalid character entity
Line: 0
Column: 381
Char: ;
Done: { '@': 
   { 'xmlns:content': 'http://purl.org/rss/1.0/modules/content/',
     'xmlns:wfw': 'http://wellformedweb.org/CommentAPI/',
     'xmlns:dc': 'http://purl.org/dc/elements/1.1/',
     'xmlns:atom': 'http://www.w3.org/2005/Atom',
     'xmlns:sy': 'http://purl.org/rss/1.0/modules/syndication/',
     'xmlns:slash': 'http://purl.org/rss/1.0/modules/slash/',
     version: '2.0' },
  test: 'Apostrophe: Apostrophe: &#039;' }

Evented processing support..

It would be nice if you could specify the name of a repeated element that you are looking for...

Given the following...

<root>
  <subroot>
    <target>...</target>
    <target>...</target>
    ...
  </subroot>
</root>

You pass the xml2js object a stream of the input... and you get a 'data' event for each node... so that you can process larger files, without taking up a lot of memory.

List items with properties

I noticed xml2js misses properties from list items. I modified the test xml so it looks like this:

<sample>
    <chartest desc="Test for CHARs">Character data here!</chartest>    
    <cdatatest desc="Test for CDATA" misc="true"><![CDATA[CDATA here!]]></cdatatest>
    <nochartest desc="No data" misc="false" />
    <listtest>
        <item foo="boom">
            This <subitem>Foo(1)</subitem> is
            <subitem>Foo(2)</subitem>
            character
            <subitem>Foo(3)</subitem>
            data!
            <subitem>Foo(4)</subitem>
        </item>
        <item foo="bar">Qux.</item>
        <item foo="baz">Quux.</item>
    </listtest>
</sample>

Running that through xml2js, it returns this:

{
    "chartest": {
        "#": "Character data here!",
        "@": {
            "desc": "Test for CHARs"
        }
    },
    "cdatatest": {
        "#": "CDATA here!",
        "@": {
            "desc": "Test for CDATA",
            "misc": "true"
        }
    },
    "nochartest": {
        "@": {
            "desc": "No data",
            "misc": "false"
        }
    },
    "listtest": {
        "item": [{
            "#": "This is character data!",
            "subitem": ["Foo(1)", "Foo(2)", "Foo(3)", "Foo(4)"]
        },
        "Qux.", "Quux."]
    }
}

Notice how the "foo" properties for each "item" element are missed.

Can a parser instance be reused?

It's unclear from the readme whether a parser instance can be used multiple times. The fact that there is a constructor in the first place seems to indicate that you need to create one from scratch each time you call parse.

Would be nice to have that clarified in the documentation.