renjin-xml2

This package is intended as a drop-in replacement for the xml2 package in Renjin. It is not yet fully functional: check the NAMESPACE file to see which functions are available. In the remainder of this README, the term xml2 refers to the original R package authored by Wickham et al.

Some technical details

S3 classes

The xml2 package uses two S3 classes to represent an XML document and its nodes: xml_document and xml_node. A third class, xml_nodeset, represents a list of nodes.

Objects of class xml_document inherit from xml_node, so objects of these two classes have almost the same structure: both are a list with two named elements, node and doc. An XML document is essentially represented by its root node. The node element is a pointer to the node and the doc element is a pointer to the document. The doc element keeps track of the document which 'owns' the node, although Java also allows you to obtain a reference to this document from the node itself.
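The remark about obtaining the owning document from a node can be illustrated with the standard W3C DOM API that ships with the JDK. The sketch below is not taken from the renjin-xml2 sources; the class and method names (OwnerDocumentSketch, parse, ownedBy) are hypothetical and only serve the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration, not part of renjin-xml2.
class OwnerDocumentSketch {

    // Parse an XML string with the JDK's built-in DOM parser.
    // Checked exceptions are wrapped to keep the callers short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // True when the document reachable from the node is the given document.
    static boolean ownedBy(Node node, Document doc) {
        return node.getOwnerDocument() == doc;
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>text</b></a>");
        Node root = doc.getDocumentElement();
        Node child = root.getFirstChild();

        System.out.println("root owned by doc:  " + ownedBy(root, doc));
        System.out.println("child owned by doc: " + ownedBy(child, doc));
        // A Document node itself has no owner document.
        System.out.println("doc has no owner:   " + (doc.getOwnerDocument() == null));
    }
}
```

Because every node can reach its document this way, the doc element of the R-level list is a convenience that mirrors the structure used by xml2 rather than something the Java side strictly needs.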

Blank nodes

Before version 1.0.0 of the xml2 package, the read_xml() function passed the XML_PARSE_NOBLANKS option to libxml2 by default. Since version 1.0.0, the function has an options argument to control the parser options, and options="NOBLANKS" is the default value. The effect of this option is to remove blank nodes. However, the exact definition that libxml2 uses for a blank node is not clear; see, e.g., this discussion.

Java has an option to remove certain 'ignorable whitespace': calling setIgnoringElementContentWhitespace on the DocumentBuilderFactory before the parser is created. However, the documentation clearly states that this option requires the parser to be in validating mode, because it relies on the content model of each element. At a minimum, this requires a DTD to be present in the XML document.
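The following sketch shows the effect of this option with the JDK's built-in parser. It is an illustration only and is not taken from the renjin-xml2 sources: the class name, the rootChildCount helper and the two sample documents are hypothetical. It compares a document with an internal DTD parsed in validating mode against the same content without a DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration, not part of renjin-xml2.
class IgnorableWhitespaceSketch {

    // The internal DTD declares element-only content for <a>, so the
    // whitespace between its children is ignorable per the XML specification.
    static final String WITH_DTD =
          "<!DOCTYPE a [\n"
        + "  <!ELEMENT a (b)>\n"
        + "  <!ELEMENT b (#PCDATA)>\n"
        + "]>\n"
        + "<a>\n  <b>text</b>\n</a>";

    // Same content, but the parser has no content model to consult.
    static final String WITHOUT_DTD = "<a>\n  <b>text</b>\n</a>";

    // Number of child nodes of the root element after parsing with
    // setIgnoringElementContentWhitespace(true).
    static int rootChildCount(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            factory.setIgnoringElementContentWhitespace(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors
            // and rethrows fatal errors.
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with DTD, validating:        "
                + rootChildCount(WITH_DTD, true));
        System.out.println("without DTD, not validating: "
                + rootChildCount(WITHOUT_DTD, false));
    }
}
```

With the DTD and validating mode, the root element should report a single child, the b element. Without a DTD, the whitespace text nodes around b remain in the tree even though the option is switched on, which is the situation most XML documents passed to read_xml() will be in.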

As a result, xml2 and renjin-xml2 may behave differently when dealing with blank nodes.

HTML

The xml2 package includes a function read_html() to parse HTML files. HTML looks like XML, but browsers accept HTML documents which are invalid or malformed as XML documents. The xml2 package uses the HTMLparser module from the libxml2 C library to parse (and fix) HTML documents. Java's built-in XML processors do not include such an HTML parser, so the renjin-xml2 package uses the jsoup Java library instead. In particular, we use jsoup to parse the document and convert it to a well-formed XML document using the outerHtml() method.
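To see why a lenient HTML parser is needed, the sketch below feeds a small piece of ordinary HTML to the JDK's built-in XML parser. It only illustrates the problem and does not use jsoup itself; the class and method names are hypothetical and not taken from the renjin-xml2 sources.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration, not part of renjin-xml2.
class HtmlIsNotXmlSketch {

    // True when the markup is accepted by the JDK's XML parser as
    // well-formed XML, false when the parser rejects it.
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed XML
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Browsers accept an unclosed <br>; an XML parser does not.
        System.out.println(parsesAsXml("<p>Hello<br>world</p>"));
        // The XML-style self-closing form is well-formed.
        System.out.println(parsesAsXml("<p>Hello<br/>world</p>"));
    }
}
```

The first input is valid HTML but is rejected as XML because the br element is never closed. Repairing input like this before it reaches the XML parser is the role jsoup plays in renjin-xml2.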

License

The xml2 package is licensed under the GPL, version 2 or later, and this replacement package uses the same license. See the LICENSE file for the full text of the license.
