sxd-document's Introduction

SXD-Document

An XML library in Rust.

Overview

The project is currently broken into two crates:

  1. document - Basic DOM manipulation and reading/writing XML from strings.
  2. xpath - Implementation of XPath 1.0 expressions.

There are also scattered utilities for playing around at the command line.

In the future, I hope to add support for XSLT 1.0.

Goals

This project has two goals, one more achievable than the other:

  1. Help me learn Rust.
  2. Replace libxml and libxslt.

Contributing

  1. Fork it ( https://github.com/shepmaster/sxd-document/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Add a failing test.
  4. Add code to pass the test.
  5. Commit your changes (git commit -am 'Add some feature')
  6. Ensure tests pass.
  7. Push to the branch (git push origin my-new-feature)
  8. Create a new Pull Request

sxd-document's People

Contributors

carols10cents, cryze, danieldulaney, draivin, flying-sheep, leoschwarz, ljedrz, onelson, shepmaster, vky, wimh

sxd-document's Issues

Parsing files with `xml:space`

I am getting UnknownNamespacePrefix when trying to parse a file which contains an xml:space attribute. It seems that the xml namespace prefix should always be bound. Is that the case? Is there a way to achieve it (i.e. to tell the parser that certain namespaces are defined elsewhere)?

(Moved from @hgrecco's issue shepmaster/sxd-xpath#112)

The root is not the root element?

While working with a parsed document I found something strange and rather counter-intuitive: the root does not seem to be the root element of the parsed document. When I iterate over the children of the root, I see all top-level nodes of the document, including the root element itself.

let package = parser::load_xml(file);
let doc = package.as_document();
println!("{:?}", doc.root()); // Yields "Root"

for elem in doc.root().children() {
    println!("Got {:?}", elem);  // Prints 2 elements: A comment and the root element
}

Permit unsized Write in `format_document`?

Right now sxd_document::writer::format_document requires a Sized writer.
Using it with &mut Write produces an error:

src/crates/by_ad/../../by_ad.rs:901:3: 901:40 error: the trait `core::marker::Sized` is not implemented for the type `std::io::Write`
src/crates/by_ad/../../by_ad.rs:901   sxd_document::writer::format_document (&doc, out);
                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/crates/by_ad/../../by_ad.rs:901:3: 901:40 `std::io::Write` does not have a constant size known at compile-time
src/crates/by_ad/../../by_ad.rs:901   sxd_document::writer::format_document (&doc, out);

(The code is along these lines:)

pub fn foo (out: &mut Write) {
    // ...
    sxd_document::writer::format_document (&doc, out);
}

Could you make the W unsized (?Sized)?
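
For illustration, here is a minimal, std-only sketch of the requested bound. The function write_greeting is a hypothetical stand-in, not the crate's format_document; it only shows that adding ?Sized to the writer parameter lets a caller pass a trait object:

```rust
use std::io::{self, Write};

// Hypothetical stand-in for a formatting function; NOT the crate's code.
// `?Sized` relaxes the implicit `Sized` requirement on `W`, so `W` may be
// the trait object type `dyn Write`.
pub fn write_greeting<W: ?Sized + Write>(writer: &mut W, name: &str) -> io::Result<()> {
    write!(writer, "<hello name='{}'/>", name)
}

// A caller that only holds a `&mut dyn Write`, as in the report above.
pub fn foo(out: &mut dyn Write) -> io::Result<()> {
    write_greeting(out, "world")
}
```

Without the ?Sized bound, the call inside foo is rejected with the same kind of Sized error as quoted above; with it, concrete writers such as Vec<u8> and trait objects both work.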

How to (code-)efficiently traverse a DOM?

I do have a rather simple DOM which I'd like to traverse but the regular DOM implementation makes it rather tedious to actually navigate around in the parsed tree. To get to the root element of the document I'm using this unsightly code at the moment (there may or may not be a comment before the root element so I have to filter that):

    let root = doc.root()
        .children()
        .into_iter()
        .find(|&x|
            if let dom::ChildOfRoot::Element(_) = x {
                true
            } else {
                false
            }
        )
        .unwrap()
        .element()
        .unwrap();

The next level (of interest) is a <model> which I'm getting at like:

    let model = root.children()
        .into_iter()
        .find(|&x| {
            if let Some(name) = x.element() {
                name.name().local_part() == "model"
            } else {
                false
            }
        })
        .unwrap()
        .element()
        .unwrap();

and so on and so on.

It seems tinydom would provide more convenient access to the DOM but that looks unfinished and under-documented at the moment.

Is there a more elegant way to traverse the DOM, like a direct iterator over all children as elements directly so I can skip all the naughty element-ification and unwrapping?

sxd-document cannot parse document containing a UTF-8 BOM

Disclaimer: I've been working with XML and UTF-8 for a long time and this is the first time I ran into such a problem so I had to do a bit of research to figure out what's going on...

So what I'm trying to do is a sort of naive approach to writing an application that reads an XML document. The problem is also reproducible using sxd-xpath/evaluate, so I'll use that for the sake of easier access, and to demonstrate the problem I'll use https://www.broadband-forum.org/cwmp/tr-069-biblio.xml.

This file uses a UTF-8 BOM, which read_to_string() gladly integrates into the resulting String. That makes the parser fail, because it expects the document to literally begin with <?xml:

# cargo run -- --xpath / tr-069-biblio.xml
    Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
     Running `target/debug/evaluate --xpath / tr-069-biblio.xml`
Unable to parse input XML
 -> Expected("<?xml")
 -> ExpectedElement
 -> ExpectedWhitespace
 -> ExpectedComment
 -> ExpectedProcessingInstruction
thread 'main' panicked at 'At:
<?xml version=', src/main.rs:52

I'm not sure what the expected behaviour is supposed to be, but I see a few possible approaches to address this particular problem:

  1. Have std automatically strip irrelevant magic from file content turned into Strings
  2. Have std provide a normalising read function
  3. Have each application (including sxd-document) specifically deal with this variant
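
The third approach can be sketched on the application side with std alone. This is only an illustration; strip_bom is a hypothetical helper name, not part of std or sxd-document:

```rust
/// Returns `input` without a leading UTF-8 byte order mark, if one is present.
pub fn strip_bom(input: &str) -> &str {
    // Once decoded, the UTF-8 BOM bytes EF BB BF are the single char U+FEFF.
    input.strip_prefix('\u{feff}').unwrap_or(input)
}
```

The string returned by read_to_string would be passed through strip_bom before being handed to the parser. The helper borrows from its input, so no copy is made.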

output whitespace is unexpected

Some whitespace from the input is preserved and other whitespace isn't.

Example input:

<?xml version="1.0"?>
<foo>
  <bar baz="blah" />
</foo>

Current output:

$ ./target/release/open foo.xml
<?xml version='1.0'?><foo>
  <bar baz='blah'/>
</foo> $

Feature request: functions for deleting elements and attributes

When using sxd-document to manipulate an XML document, I am missing functions for
deleting/removing elements and attributes.

Is there any direct way to achieve this?
(I am a Rust beginner; so far I have used Perl and XML::Twig.)

It seems one can remove an element by appending it to another element which is not attached to the main tree:

let waste = document.create_element("waste");

and later:

waste.append_child(element_to_delete.unwrap());

but it seems rather obscure (and will not work for attributes).

With regards
Josef

format_document - optional <xml> tag?

Would you consider making generation of the <?xml ...?> declaration in format_document optional?

When working with XML fragments the tag is not needed.

Right now one has to stream into a temporary buffer first (or something like that) to cut it out.
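
The temporary-buffer workaround could look roughly like the following std-only sketch. It assumes the formatted output has already been collected into a string and that the declaration, when present, is the very first thing in it; strip_xml_declaration is a hypothetical helper name:

```rust
/// Returns `xml` without a leading `<?xml ...?>` declaration, if present.
pub fn strip_xml_declaration(xml: &str) -> &str {
    let rest = match xml.strip_prefix("<?xml") {
        // Require whitespace after the target so that processing
        // instructions such as `<?xml-stylesheet ...?>` are left alone.
        Some(rest) if rest.starts_with(|c: char| c.is_ascii_whitespace()) => rest,
        _ => return xml,
    };
    match rest.find("?>") {
        // Skip past the two bytes of the closing `?>`.
        Some(end) => &rest[end + 2..],
        // No terminator found: leave the input untouched.
        None => xml,
    }
}
```

This is a post-processing step rather than a fix. An option on the writer itself would avoid the intermediate buffer entirely.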

Add an option to bubble up all namespace declarations to the root

Loading and saving a Flat LibreOffice Text document makes it unreadable:

fn main() {
    let xml = std::fs::read_to_string("template.fodt").expect("Failed to open");
    let doc = sxd_document::parser::parse(&xml).expect("Failed to parse");
    
    let mut output = Vec::new();
    sxd_document::writer::format_document(&doc.as_document(), &mut output).expect("unable to output XML");
    std::fs::write("output.fodt", &output).expect("Failed to write");
}

The very minimal template file is attached (but zipped because of Github).

template.fodt.zip

Use types to maintain invariants about valid strings

The Char grammar production restricts which characters are valid in XML; arbitrary UTF-8 is not allowed. Beyond that, components have other restrictions:

  • element and attribute names must match NCName
  • comments may not contain --
  • processing-instruction targets may not be xml and must be a Name
  • processing-instruction values may not contain ?>
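
One way to picture the idea is a validating newtype, sketched here for comment text only. The type name CommentText and its methods are hypothetical and not part of the crate. The character ranges follow the Char production of XML 1.0, and the trailing-hyphen check comes from the XML 1.0 Comment grammar rather than from the list above:

```rust
/// True if `c` matches the `Char` production of XML 1.0.
pub fn is_xml_char(c: char) -> bool {
    matches!(
        c,
        '\u{9}' | '\u{A}' | '\u{D}'
            | '\u{20}'..='\u{D7FF}'
            | '\u{E000}'..='\u{FFFD}'
            | '\u{10000}'..='\u{10FFFF}'
    )
}

/// Hypothetical newtype: comment text that has already been validated.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CommentText(String);

impl CommentText {
    /// Returns `None` if `text` could not appear inside `<!-- ... -->`.
    pub fn new(text: &str) -> Option<CommentText> {
        let valid = text.chars().all(is_xml_char)
            && !text.contains("--")
            // A trailing hyphen would produce `--->`, which the grammar forbids.
            && !text.ends_with('-');
        if valid {
            Some(CommentText(text.to_owned()))
        } else {
            None
        }
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Because the only way to obtain a CommentText is through new, any function that accepts one can rely on the invariant without re-checking it.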

Add at least rudimentary DTD handling

sxd-document does not work if the document has an internal DTD.

Trying to parse the example document (taken from here):

<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>

I get the following error: (37, [Expected("SYSTEM")]). 37 is the position of the first [.
Adding full parsing of the internal DTD might be quite a big deal, but for my purposes (parsing JMdict) just ignoring the DTD content would be fine.

No way to write subsection of document?

sxd_document::writer::format_element is private. Once one has selected an element (a chunk of a document) via XPath, there doesn't appear to be a way to write it on its own. Am I missing something, or is this an accurate description of a current limitation?

(Loving your work)

With kind regards,

Giles

Namespace prefix inheritance produces inefficient output

It looks like prefix handling is slightly off when children use a prefix mapping inherited from an ancestor. For example, take the XML document:

<?xml version='1.0'?>
<root xmlns:x='urn:very-long-urn'>
  <x:child/>
  <x:child/>
  <x:child/>
</root>

If you parse and output the document with default writer settings, you end up with something like this:

<?xml version='1.0'?>
<root>
  <x:child xmlns:x='urn:very-long-urn'/>
  <x:child xmlns:x='urn:very-long-urn'/>
  <x:child xmlns:x='urn:very-long-urn'/>
</root>

This output seems sub-optimal. First, it's longer than the input. Second, it is more difficult to edit later because the prefixes all refer to different URN mappings. It would be better to maintain the single URN mapping wherever it is defined.

Additionally, it is difficult to see what is happening because Element doesn't have any way to see existing prefix-to-namespace mappings.

I propose two separate changes:

  • Add a method on Element to see which prefixes have already been mapped there
  • Always put the prefix definition on the element where it was originally defined, even if it is only used in descendant nodes

If there's buy-in, I'm happy to work on PRs that address both issues.

Example code (fails assertion):

let xml = "<?xml version='1.0'?><root xmlns:x='urn:very-long-urn'><x:child/><x:child/><x:child/></root>";

let package = sxd_document::parser::parse(xml).unwrap();
let doc = package.as_document();

let mut output = Vec::new();
sxd_document::writer::Writer::new().format_document(&doc, &mut output).unwrap();
let output_str = String::from_utf8(output).unwrap();

assert_eq!(output_str, xml);

XML declaration with encoding attribute fails to parse

Parsing XML declarations with an encoding attribute produces an unexpected InvalidProcessingInstructionTarget error.

Test Case

let xml_parser = sxd_document::parser::Parser::new();
let toc_package = match xml_parser.parse("<?xml version=\"1.0\" encoding=\"UTF-8\" ?><root/>") {
  Ok(_) => println!("OK"),
  Err((offset, errs)) => {
      println!("failed to parse TOC. error at location {}: {:?}", offset, errs);
  }
};

Expected output: "OK"
Actual output: "failed to parse TOC. error at location 2: [InvalidProcessingInstructionTarget]"

Give line number and column number of error instead of single location value on parse error

This:-

fn main() {
    let xml = r#"<?xml version="1.0"?>
<bookshop>
  <books>
    <book>
      <title>The Illuminatus Trilogy</title>
    </book>
  </books>
</bookshop1>"#;

    let result = parser::parse(xml);

    match result {
        Ok(_) => println!("parsed ok"),
        Err((loc, errors)) => println!("parse failed; location = {}, errors = {:?}",
            loc, errors),
    }
}

will print:-

parse failed; location = 124, errors = [MismatchedElementEndName]

It would be nicer to give the line number and column number of the error, as all other XML parsers I've played with do.
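
Until the parser reports this itself, the conversion can be done on the caller's side. The sketch below assumes the reported location is a byte offset into the input string (an assumption; the issue does not state it); line_and_column is a hypothetical helper name:

```rust
/// Converts a byte offset into a 1-based (line, column) pair.
/// Columns count characters, not bytes. Returns `None` if `offset` is out
/// of range or does not fall on a character boundary.
pub fn line_and_column(input: &str, offset: usize) -> Option<(usize, usize)> {
    let before = input.get(..offset)?;
    // Every newline before the offset starts a new line.
    let line = before.matches('\n').count() + 1;
    // The current line starts right after the last newline, or at 0.
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let column = before[line_start..].chars().count() + 1;
    Some((line, column))
}
```

In the Err arm of the example above, line_and_column(xml, loc) could be called and its result printed alongside the error list.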

Use after Free when parsing this XML Document

Found by cargo-fuzz:

crash-52cdb28f04f0c80d84609394d18ed2c0b8fedb7f.zip

Caused at:

...
<sxd_document::string_pool::InternedString as core::cmp::PartialEq>::eq
...
std::collections::hash::map::search_hashed
...
sxd_document::string_pool::StringPool::intern
sxd_document::raw::Storage::intern
sxd_document::raw::Storage::create_attribute
sxd_document::dom::Element::set_attribute_value
sxd_document::parser::DomBuilder::finish_opening_tag
sxd_document::parser::DomBuilder::consume
sxd_document::parser::parse

Freed at:

...
alloc::raw_vec::RawVec<...>::dealloc_buffer
RawVec<...>::drop
core::ptr::drop_in_place
core::ptr::drop_in_place
core::ptr::drop_in_place
core::ptr::drop_in_place
sxd_document::parser::DomBuilder::finish_opening_tag
sxd_document::parser::DomBuilder::consume
sxd_document::parser::parse

Allocated at:

...
<alloc::vec::Vec<T>>::extend_from_slice
alloc::string::String::push_str
sxd_document::parser::AttributeValueBuilder::ingest
sxd_document::parser::DomBuilder::finish_opening_tag
sxd_document::parser::DomBuilder::consume
sxd_document::parser::parse

Appending elements across documents allows for memory unsafety in safe client code

extern crate sxd_document;

use sxd_document::Package;

fn main() {
    let p1 = Package::new();

    {
        let p2 = Package::new();

        {
            let d1 = p1.as_document();
            let d2 = p2.as_document();

            let e1 = d1.create_element("hi");
            d2.root().append_child(e1);

            let e2 = d2.create_element("bye");
            d1.root().append_child(e2);
        }
    }

    {
        let d1 = p1.as_document();
        for c in d1.root().children() {
            println!("{:?}", c);
        }
    }
}

When p2 goes out of scope, it takes its children with it, even though p1 retains a reference. Related to #8, but a critical safety issue.

Missing close tag causes parser to hang indefinitely

The closing tag for the root element being missing will do it:-

fn main() {
    let xml = r#"<?xml version="1.0"?>
<bookshop>
  <books>
    <book>
      <title>The Illuminatus Trilogy</title>
    </book>
  </books>"#;

    let _result = parser::parse(xml);
}

There may be other cases that do it too. The above is the only case I've found so far.

Concurrent thin documents allowed, which can break memory safety

fn double_thin_document() {
    let p1 = Package::new();

    let (s1, mut c1) = p1.as_thin_document();
    let (s2, mut c2) = p1.as_thin_document();

    for _ in 0..10 {
        let c = s1.create_comment("hi");
        c1.append_root_child(c);
    }

    for c in c1.root_children() {
        let e = s2.create_comment("bye");
        c2.append_root_child(e);

        println!("{:?}", c);
    }
}

serde integration

Are you planning on adding serde integration to this crate? Would you be interested in a PR for it?

Miri reports undefined behaviour triggered by test suite

For any crate that doesn't use #[forbid(unsafe_code)], the very first thing I do before considering it for use is to git clone --depth=1 it and run cargo +nightly miri test on it... this crate didn't pass.

% git clone --depth=1 https://github.com/shepmaster/sxd-document.git
Cloning into 'sxd-document'...
remote: Enumerating objects: 24, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 24 (delta 1), reused 6 (delta 0), pack-reused 0
Unpacking objects: 100% (24/24), done.
Checking connectivity... done.
% cd sxd-document
% cargo +nightly miri test
    Updating crates.io index
  Downloaded peresil v0.3.0
  Downloaded typed-arena v1.7.0
  Downloaded 2 crates (23.5 KB) in 0.63s
   Compiling typed-arena v1.7.0
   Compiling peresil v0.3.0
   Compiling sxd-document v0.3.2 (/home/ssokolow/src/sxd-document)
    Finished test [unoptimized + debuginfo] target(s) in 20.41s
     Running target/x86_64-unknown-linux-gnu/debug/deps/sxd_document-2ca61747ef9df327

running 194 tests
test dom::test::attributes_belong_to_a_document ... ok
test dom::test::attributes_can_be_iterated ... ok
test dom::test::attributes_can_be_removed ... ok
test dom::test::attributes_can_be_removed_from_parent ... ok
test dom::test::attributes_can_be_reset ... ok
test dom::test::attributes_know_their_element ... ok
test dom::test::can_return_a_populated_package ... error: Undefined Behavior: trying to reborrow for SharedReadWrite at alloc260646, but parent tag <695071> does not have an appropriate item in the borrow stack
    --> src/raw.rs:521:9
     |
521  |         parent_r.children.push(child);
     |         ^^^^^^^^^^^^^^^^^ trying to reborrow for SharedReadWrite at alloc260646, but parent tag <695071> does not have an appropriate item in the borrow stack
     |
     = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
     = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
             
     = note: inside `raw::Connections::append_root_child::<raw::ChildOfRoot>` at src/raw.rs:521:9
note: inside `dom::Root::append_child::<dom::Element>` at src/dom.rs:174:9
    --> src/dom.rs:174:9
     |
174  |         self.document.connections.append_root_child(child.as_raw());
     |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `dom::test::can_return_a_populated_package::populate` at src/dom.rs:1608:17
    --> src/dom.rs:1608:17
     |
1608 |                 doc.root().append_child(element);
     |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
note: inside `dom::test::can_return_a_populated_package` at src/dom.rs:1614:23
    --> src/dom.rs:1614:23
     |
1614 |         let package = populate();
     |                       ^^^^^^^^^^
note: inside closure at src/dom.rs:1601:5
    --> src/dom.rs:1601:5
     |
1601 | /     fn can_return_a_populated_package() {
1602 | |         fn populate() -> Package {
1603 | |             let package = Package::new();
1604 | |             {
...    |
1617 | |         assert_qname_eq!(element.name(), "hello");
1618 | |     }
     | |_____^
     = note: inside `<[closure@src/dom.rs:1601:5: 1618:6] as std::ops::FnOnce<()>>::call_once - shim` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `test::__rust_begin_short_backtrace::<fn()>` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:516:5
     = note: inside closure at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:507:30
     = note: inside `<[closure@test::run_test::{closure#2}] as std::ops::FnOnce<()>>::call_once - shim(vtable)` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send> as std::ops::FnOnce<()>>::call_once` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1328:9
     = note: inside `<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>> as std::ops::FnOnce<()>>::call_once` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:322:9
     = note: inside `std::panicking::r#try::do_call::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40
     = note: inside `std::panicking::r#try::<(), std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>>` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19
     = note: inside `std::panic::catch_unwind::<std::panic::AssertUnwindSafe<std::boxed::Box<dyn std::ops::FnOnce() + std::marker::Send>>, ()>` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:396:14
     = note: inside `test::run_test_in_process` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:538:18
     = note: inside closure at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:449:39
     = note: inside `test::run_test::run_test_inner` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:474:13
     = note: inside `test::run_test` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:504:28
     = note: inside `test::run_tests::<[closure@test::run_tests_console::{closure#2}]>` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:283:13
     = note: inside `test::run_tests_console` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/console.rs:289:5
     = note: inside `test::test_main` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:121:15
     = note: inside `test::test_main_static` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/test/src/lib.rs:140:5
     = note: inside `main`
     = note: inside `<fn() as std::ops::FnOnce<()>>::call_once - shim(fn())` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
     = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<fn(), ()>` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125:18
     = note: inside closure at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:66:18
     = note: inside `std::ops::function::impls::<impl std::ops::FnOnce<()> for &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>::call_once` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:259:13
     = note: inside `std::panicking::r#try::do_call::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:379:40
     = note: inside `std::panicking::r#try::<i32, &dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe>` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:343:19
     = note: inside `std::panic::catch_unwind::<&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe, i32>` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:396:14
     = note: inside `std::rt::lang_start_internal` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:51:25
     = note: inside `std::rt::lang_start::<()>` at /home/ssokolow/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/rt.rs:65:5
     = note: this error originates in an attribute macro (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to previous error

error: test failed, to rerun pass '--lib'

Miri isn't exhaustive, but I consider passing it to be the bare minimum for the tests of a crate which uses unsafe or has it in its transitive dependencies. Once you've fixed this, I'd strongly recommend adding it to your CI runs, similar to how I'd want a C or C++ codebase to run its tests under LLVM's various sanitizers (e.g. ASan, UBSan).

Rename Root::append_child to Root::set_child?

    {
        let doc = package.as_document();
        let hello = doc.create_element("hello");
        let world = doc.create_element("world");
        doc.root().append_child(hello);
        doc.root().append_child(world);

        let stdout = &mut io::stdout();
        format_document(&doc, stdout).ok().expect("unable to output XML");
        stdout.flush();
    }

expected output: <?xml version='1.0'?><hello/><world/>
actual output as of v0.2.6: <?xml version='1.0'?><world/>

Is this a bug in the code or in my expectations?

parsing fails with &amp;

Fantastic crate! Thank you so much!

I am parsing XML 1.0 documents and found that it fails to parse anything containing &amp;, even though it seems to be the recommended way to escape & in the XML 1.0 spec.

Are raw pointers and unsafe actually needed?

Hello,

This is probably more of an open question than an actual issue.

There seems to be quite a lot of unsafe code in this crate (mostly pointer handling) and there are some issues that mention memory unsafety. So I was wondering: Is this unsafe code actually needed?

Let me know if I am wrong, but I guess the use of pointers was preferred over Rc/Weak references to get better performance. On the other hand, this crate offers a DOM interface, which is expected to be a bit slower and heavier than SAX, and we usually expect memory safety from Rust crates.
So, do you think replacing the unsafe code would have so much of an impact on performance that it is worth the potential safety/security issues of dealing with complex pointer relationships?

I am in no way an expert in Rust (or XML parsing) by the way so do not hesitate to let me know if I am totally wrong.

As a completely unrelated side note, you seem to have worked quite a lot on this crate (thank you for sharing it with us!), so I was wondering if you consider the DOM part of the project to be production ready or if you think it still needs more work?

Thanks

Can we avoid re-interning strings?

In the following code, we know that name is already interned by this document. It would be nice to avoid re-hashing it when we go back to create a new element.

let name = element.name();
document.create_element(name);

Should investigate if there's any performance to be gained here though.

How to dump out CDATA element

Is there some way to prevent < > substitution for a CDATA text element?

Adding a text node like <![CDATA[Data]]> gives:

<?xml version=\'1.0\'?><FILE><CONTENT_TYPE>&lt;![CDATA[Data]]&gt;</CONTENT_TYPE></FILE>

In this case I get back <![CDATA[Data]]> as the CONTENT_TYPE content instead of simply Data.

Can I work around this behaviour, or where should I look to implement it?

Thanks
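
A writer that treats text node content as plain character data would be expected to escape it on output, which would explain the result above. The following is a minimal sketch of such escaping, not the crate's actual writer code; escape_text is a hypothetical name:

```rust
/// Hypothetical helper: escapes the characters that are significant as
/// markup inside element content.
pub fn escape_text(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            other => out.push(other),
        }
    }
    out
}
```

Under this model, the text node holds the literal characters of the CDATA markup, so they are escaped like any other text. Emitting a real CDATA section would need a distinct node type or writer option.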

Move away src/bin/open.rs

Hello,

I think that should be moved to the examples directory, because otherwise people can run cargo install sxd-document and it will overwrite the open binary which they most likely already have installed.

Discriminator methods for enums

There are some enums like ChildOfRoot or ChildOfElement that basically just differentiate if this particular child is an Element, a Text, a Comment or a ProcessingInstruction.
What do you think about adding discriminator functions, à la:

fn is_element(&self) -> bool
fn is_comment(&self) -> bool
...

?

I'd volunteer to implement this ;)
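
As a rough sketch of what is being proposed, here is a std-only example on a simplified stand-in enum. Child is hypothetical; the crate's real enums wrap DOM node handles rather than strings:

```rust
/// Hypothetical, simplified stand-in for an enum such as `ChildOfElement`.
#[derive(Debug, Clone, PartialEq)]
pub enum Child {
    Element(String),
    Text(String),
    Comment(String),
    ProcessingInstruction(String),
}

impl Child {
    pub fn is_element(&self) -> bool {
        matches!(self, Child::Element(_))
    }

    pub fn is_text(&self) -> bool {
        matches!(self, Child::Text(_))
    }

    pub fn is_comment(&self) -> bool {
        matches!(self, Child::Comment(_))
    }

    pub fn is_processing_instruction(&self) -> bool {
        matches!(self, Child::ProcessingInstruction(_))
    }
}
```

With such methods, a filter like children.iter().filter(|c| c.is_element()) replaces the if-let/true/else/false pattern.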

InvalidProcessingInstructionTarget for standalone="no"

parsing this:

extern crate sxd_document;

use sxd_document::parser::Parser;

fn main() {
    let s = "<?xml version='1.0' encoding='UTF-8' standalone='no'?><doc/>";

    let parser = Parser::new();
    let doc = parser.parse(s);
    println!("{:?}", doc)
}

yields

Err((2, [InvalidProcessingInstructionTarget]))

yup.

Parsing an external DTD fails

Sorry for the lengthy blob of xml, but I'm sort of at a loss for why this might be failing:

#[cfg(test)]
mod tests {
    use self::sxd_document::parser;

    #[test]
    fn test_parse_sample() {
        let xml = r##"<?xml version="1.0"?>
<!DOCTYPE cXML SYSTEM "http://xml.cxml.org/schemas/cXML/1.2.014/cXML.dtd">
<cXML xml:lang="en-US"
      payloadID="933695160894"
      timestamp="2002-08-15T08:47:00-07:00">
    <Header>
        <From>
            <Credential domain="DUNS">
                <Identity>83528721</Identity>
            </Credential>
        </From>
        <To>
            <Credential domain="DUNS">
                <Identity>65652314</Identity>
            </Credential>
        </To>
        <Sender>
            <Credential domain="workchairs.com">
                <Identity>website 1</Identity>
            </Credential>
            <UserAgent>Workchairs cXML Application</UserAgent>
        </Sender>
    </Header>
    <Message>
        <PunchOutOrderMessage>
            <BuyerCookie>1CX3L4843PPZO</BuyerCookie>
            <PunchOutOrderMessageHeader operationAllowed="edit">
                <Total>
                    <Money currency="USD">763.20</Money>
                </Total>
            </PunchOutOrderMessageHeader>
            <ItemIn quantity="3">
                <ItemID>
                    <SupplierPartID>5555</SupplierPartID>
                    <SupplierPartAuxiliaryID>E000028901</SupplierPartAuxiliaryID>
                </ItemID>
                <ItemDetail>
                    <UnitPrice>
                        <Money currency="USD">763.20</Money>
                    </UnitPrice>
                    <Description xml:lang="en">
                        <ShortName>Excelsior Desk Chair</ShortName>
                        Leather Reclining Desk Chair with Padded Arms
                    </Description>
                    <UnitOfMeasure>EA</UnitOfMeasure>
                    <Classification domain="UNSPSC">5136030000</Classification>
                    <LeadTime>12</LeadTime>
                </ItemDetail>
            </ItemIn>
            <ItemIn quantity="1">
                <ItemID>
                    <SupplierPartID>AM2692</SupplierPartID>
                    <SupplierPartAuxiliaryID>A_B:5008937A_B:</SupplierPartAuxiliaryID>
                </ItemID>
                <ItemDetail>
                    <UnitPrice>
                        <Money currency="USD">250.00</Money>
                    </UnitPrice>
                    <Description xml:lang="en-US">ANTI-RNase (15-30 U/ul)</Description>
                    <UnitOfMeasure>EA</UnitOfMeasure>
                    <Classification domain="UNSPSC">41106104</Classification>
                    <ManufacturerName/>
                    <LeadTime>0</LeadTime>
                </ItemDetail>
            </ItemIn>
        </PunchOutOrderMessage>
    </Message>
</cXML>
"##;
        parser::parse(xml).unwrap();

    }
}

The parse call fails with

panicked at 'called `Result::unwrap()` on an `Err` value: (50, [ExpectedClosingQuote("\"")])', /checkout/src/libcore/result.rs:916:5

which by my math is somewhere inside the dtd uri. It parses successfully if I remove the doctype tag entirely, but I can't imagine where the quotes should be tripping this up.

Derive Clone for Package

Consider a case where a template XML is used to produce several other XMLs with different modifications. At the moment this seems to require parsing the template once per desired target XML, because Package doesn't implement Clone, and cloning a Document merely copies the references to the parent Package, which doesn't provide a fresh copy of the parsed structure. Having to parse the same file several times is pretty inefficient; cloning Packages would be much faster.

I tried to #[derive(Clone)] for Package myself, but I encountered an issue: typed_arena::Arena isn't Cloneable either. Is there any way around this?
