Comments (3)

renggli commented on August 25, 2024

Thank you for the GIST allowing me to quickly reproduce the observed issue.

In your example, re-encoding creates a different output because ">" is valid XML text and doesn't normally need to be encoded.

The same problem exists in various other places where the XML nodes cannot retain the exact source of the parsed input, e.g. the quote style used to delimit attribute values, whitespace within element tags, and whitespace at the document level.

While it is technically possible to solve the reported problem and the various related ones, I am not sure it is worth the trouble. Most XML libraries I know don't do it.

Would it help to be able to specify a custom encoder function?
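
To make the mismatch and the proposed workaround concrete, here is a minimal, library-neutral sketch in Python. It is not the dart-xml API: `encode_minimal`, `encode_verbose`, and `serialize_text` are hypothetical names chosen for illustration, and the assumption is simply that a serializer escapes only "&" and "<" in text unless the caller supplies a different encoder.

```python
from xml.sax.saxutils import unescape

def encode_minimal(text: str) -> str:
    """Escape only what XML text strictly requires: '&' and '<'."""
    return text.replace("&", "&amp;").replace("<", "&lt;")

def encode_verbose(text: str) -> str:
    """Hypothetical custom encoder that additionally escapes '>' and '"'."""
    return encode_minimal(text).replace(">", "&gt;").replace('"', "&quot;")

def serialize_text(text: str, encoder=encode_minimal) -> str:
    """Hypothetical serializer hook: the caller may swap in its own encoder."""
    return encoder(text)

# Text exactly as it appears in the source document.
source = "&lt;g id=&quot;_0&quot;&gt;"

# What a parser stores after decoding: the original escaping style is gone.
decoded = unescape(source, {"&quot;": '"', "&apos;": "'"})

# The default encoder leaves '>' and '"' alone, so the round trip differs.
round_trip_default = serialize_text(decoded)

# A caller-supplied encoder can reproduce this particular source.
round_trip_custom = serialize_text(decoded, encoder=encode_verbose)

print(round_trip_default == source)
print(round_trip_custom == source)
```

Note that a custom encoder only reproduces the input when the source used one escaping style consistently; it does not record what each individual text node originally looked like.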

from dart-xml.

daniel-v commented on August 25, 2024

Yes, allowing me to specify a custom encoding/decoding function would probably solve this specific issue; however, I am not entirely certain we are on the right track.

Question: is it a valid requirement that I should be able to retrieve the contents of an XmlElement as XML, without any modification, if

  1. the XML is valid, and
  2. the text content contains character sequences that can be decoded?

I saw in a previous issue that you brought up PHP's implementation as reference. I created yet another gist (my PHP skills are terrible).

If the answer to the question is yes, the default behavior of this library should reflect it. Until then, I would be happy with the custom function you mentioned.


My use-case

The following wall of text is only loosely related to the issue at hand; I wanted to offer you some context as to how I'd like to use this library.
I am using it to parse an XML-based format called XLIFF.
A small snippet from an XLIFF file:

<trans-unit id="some_trans_unit_id" datatype="xml">
    <source><bpt ctype="x-g" equiv-text="[" id="_0">&lt;g id=&quot;_0&quot;&gt;</bpt>3. helyezett: </source>
</trans-unit>

From my application's point of view, it is crucial that the contents of source can be retrieved as-is, without any kind of encoding or decoding.
If I retrieve the text content of the source element (sourceElement.text), I get <g id="_0">3. helyezett: with the entities already decoded. XLIFF was devised so that such fragments are wrapped in bpt tags and encoded to produce valid XML.
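
The behaviour described above is not specific to one library. As a rough illustration, the sketch below runs the same snippet through Python's standard xml.etree.ElementTree (a different library from dart-xml, used here only for comparison). It shows that the text content comes back decoded and that re-serializing the element does not restore the `&quot;` escapes from the input.

```python
import xml.etree.ElementTree as ET

XLIFF = (
    '<trans-unit id="some_trans_unit_id" datatype="xml">\n'
    '    <source><bpt ctype="x-g" equiv-text="[" id="_0">'
    '&lt;g id=&quot;_0&quot;&gt;</bpt>3. helyezett: </source>\n'
    '</trans-unit>'
)

root = ET.fromstring(XLIFF)
source = root.find("source")

# Decoded text content: entities are resolved and the bpt boundary is lost.
text = "".join(source.itertext())

# Re-serialized markup: the writer chooses its own escaping for text nodes,
# so the &quot; sequences from the input are not reproduced.
markup = ET.tostring(source, encoding="unicode")

print(repr(text))
print(markup)
```

Retrieving the contents "as-is" would therefore require either keeping the original source span alongside each node or serializing with an encoder that matches the escaping style of the input.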

renggli commented on August 25, 2024

With c2fb10d, single and double quotes in attributes are preserved.

With b2482cd, whitespace, processing instructions, and comments at the document level are preserved.
