wokket / rust-hl7 Goto Github PK
View Code? Open in Web Editor NEWLearning rust by playing with a HL7 parser. Use for real at your own risk!
Learning rust by playing with a HL7 parser. Use for real at your own risk!
(just a minor nit ;-):
Line 214 in e1b8456
let segments: Result<Vec<Segment>, E> = possible.into_iter().collect()?;
:-)
While on a call about things that might eventually lead to the use of this to actually help people, i figured i might collect some performance data via cargo flamegraph --bench simple_parse --features string_index -- --bench
- results are interesting.
We're eating up less time than the threading libraries and benchmark suite by a good margin (expected, simple operations). Somehow you're beating me out pretty consistently on the query
vs index
bits :-p.
Unfortunately, GH won't lemme attach the flamegraph SVG by itself, but here's a screenshot of where we are today on an i9HK (gen9):
The pink (? sorry, kinda colorblind) parts are results for the rusthl7
search within the SVG, about 1/3 of the runtime measured.
Mostly filing this issue to plant a flag for baseline - now its here for posterity in case those numbers change for the worse.
HL7 escapes special chars in various ways, eg \Cxxyy\
for arbitary unicode bytes, \F\
for the field separator char etc.
I'd like to add support for decoding these chars, but keep it separate from the current structs initially:
Querying into the MSH doesn't work well due to constraints around running iterators and ownership required to index into the fields - i tried a match pattern, its pretty gross.
What if we keep the MSH data structure as a GenericSegment
but implement functional accessors to the well-known fields? Separators would move to the Message
struct to be referenced or copied (its bytes) by its segments, and then every data element is consistent in both structure and impl. Thoughts?
Thanks for getting this to build on stable releases - big win.
As i'm digging through using the library, i'm running into some odd visibility issue that i can't quite explain. When modifying the Message struct to expose fields,
#[derive(Debug, PartialEq)]
pub struct Message<'a> {
pub source: &'a str,
pub segments: Vec<Segment<'a>>,
}
and using the code from that file's test section in evcxr
(with one minor change of ?
to unwrap()
), i'm getting private field errors on access attempts to the struct:
>> let hl7 = "MSH|^~\\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01|CNTRL-3456|P|2.4\rOBR|segment";
>> let msg = Message::from_str(hl7).unwrap();
>> msg.segments
^^^^^^^^ private field
field `segments` of struct `Message` is private
>> msg.source
^^^^^^ private field
field `source` of struct `Message` is private
Debug prints show that the structure is populated correctly, but i'm unable to actually interact with the segments.
The segment structs have public members already, so i figure its not some implication derived by the compiler that a potential element may be private, unless its deeper in the stack than a GenericSegment
.
@wokket: Any chance you might have some insights to what i'm doing wrong here?
Using the Index<str>
trait to implement the awesome path based querying support doesn't make a lot of sense to me.
I'm proposing that we:
Edit, I'm going to make pushing this into a separate module a future problem for now,
2. possibly move it into it's own module within this crate.
~~JsonPath and others keep the selector/query logic separate from the data they're working on, and I like the fact that would keep the logic and tests separate, as I think there's going to have to be a lot of tests/docs to go with this. ~~
I think I like the naming of creating a Selector
, which can be used to query()
or query_path
a Message/segment etc but am option to better options.
Currently Message
is constructed ufing fn from_str(blah)
which triggers the following warning in clippy:
warning: method `from_str` can be confused for the standard trait method `std::str::FromStr::from_str`
--> src\message.rs:19:5
|
19 | / pub fn from_str(source: &'a str) -> Result<Self, Hl7ParseError> {
20 | | //TODO: Try and get this as a std::str::FromStr impl (lifetimes)
21 | |
22 | | let delimiters = str::parse::<Separators>(source)?;
... |
34 | | Ok(msg)
35 | | }
| |_____^
|
= note: `#[warn(clippy::should_implement_trait)]` on by default
= help: consider implementing the trait `std::str::FromStr` or choosing a less ambiguous method name
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#should_implement_trait
Update Message to implement TryFrom<'a str> instead. Note that this will be a breaking API change but simple to resolve by ensuring std::convert::TryFrom
is in scope, and replacing the from_str
call with try_from
.
As part of moving to a 'toolbox' approach rather than all-in-one framework move the querying into it's own area (like EscapeSequence
).
JsonPath uses a Selector
struct with new()
and find()
funcs which I quite like.
The current deep module structure made sense when I was considering having many variants of each struct to handle the different scenarios (repeating/non-repeating, componentised etc etc) but is really just annoying when trying to work with library due to the large number of use
statements required.
It would be a breaking change, but I'm tempted to move the core structs (Message, Segment, Field, Separators) into either the top level module, or a core
module for simplicity.
HAving recently learnt about Into
and read about writing a function that accepts both String
and &str
this seems like a good thing to look into to simplify the API surface area.
A quick spike makes it look doable :)
As of rust-lang/rust#85482, Rust nightly does not include the original try_trait bit including NoneError. Is there any chance you might have time to convert the implementation to current try_trait v2 mechanisms or possibly even stable builds? Thanks
See https://matklad.github.io/2021/09/04/fast-rust-builds.html for inspiration
No support for handling multiple repeats (~
delim by default)
A number of functions break when Clone isn't implemented for a Type, including things like to_owned() due to:
the method `to_owned` exists for struct `rusthl7::message::Message<'_>`, but its trait bounds were not satisfied
method cannot be called on `rusthl7::message::Message<'_>` due to unsatisfied trait bounds
note: the following trait bounds were not satisfied:
`rusthl7::message::Message<'_>: std::clone::Clone`
which is required by `rusthl7::message::Message<'_>: std::borrow::ToOwned`rustc(E0599)
message.rs(9, 1): doesn't satisfy `_: std::borrow::ToOwned`
message.rs(9, 1): doesn't satisfy `rusthl7::message::Message<'_>: std::clone::Clone`
main.rs(53, 80): method cannot be called on `rusthl7::message::Message<'_>` due to unsatisfied trait bounds
Now that we can extract pieces of messages as owned elements, i think we can leverage that to clone new messages from existing ones by extracting to a new object.
Thoughts?
The original implementation using String references is exceptionally good at read-oriented processing - data isn't copied, all segments and sub-segments are accessible, and we have good API to do so.
However, since we carry a reference, a lot of Rust's idiomatic standards dont play nice - FromStr
is not a thing here.
What do you think about a follow-up MessageMut and descendants for owned String with all those conveniences. The string ref bit is super useful for large quantities of messages being read and parsed out, so i think we want to keep it working that way. However, if we mirror a mutable implementation, it should give less discerning consumers a very standard API, mutability of messages (nice for an HL7 gen - my internal one is gross: string subs and message::from), and fit the standard trait paradigms from the ecosystem.
Thoughts?
If it's going to be used in prod the docco needs to be stepped up as well.
value
vs as_str
etc)The MSH impl is largely stolen from the other library, is complicating seg handling, triggering clippy warning about wildly varying enum sizes etc.
Move the MSH Segment out to an exmaple of how to use the library to build strongly-typed segments as a consuming app, and simplify the core library.
I've implemented the basic structure for parsing out component
and subcomponent
&str
s into vectors:
#[derive(Debug, PartialEq, Clone)]
pub enum Field<'a> {
Generic(&'a str),
Component(Vec<&'a str>),
Subcomponent(Vec<Vec<&'a str>>),
}
impl<'a> Field<'a> {
/// Convert the given line of text into a field.
pub fn parse(input: &'a str, delims: &Separators) -> Result<Field<'a>, Hl7ParseError> {
match input.find(delims.component) {
Some(_x) => {
match input.find(delims.subcomponent) {
Some(_x) => {
let components = input.split(delims.component).collect::<Vec<&str>>();
Ok(Field::Subcomponent(
components
.iter()
.map(|c| c.split(delims.subcomponent).collect::<Vec<&str>>())
.collect::<Vec<Vec<&str>>>()
))
},
None => Ok(Field::Component(input.split(delims.component).collect())),
}
}
None => Ok(Field::Generic(input))
}
}
...
However, getting data out of this construct is... non-trivial?
pub fn value(&self) -> &'a str {
let component = String::from(Separators::default().component);
let subcomponent = String::from(Separators::default().subcomponent);
match &self {
Field::Generic(g) => g,
Field::Component(c) => &c.join(&component),
Field::Subcomponent(s) => {
&s.into_iter().map(|sc|
&sc.join(&subcomponent)
).collect::<Vec<&str>>()
.join(&component).as_str()
}
}
}
because making an &str out of vectors of them or vectors of vectors of them requires some sort of borrowing function that i'm not quite getting.
@wokket - happen to have any insight to how i can implement that without
error[E0515]: cannot return value referencing temporary value
--> src/fields/mod.rs:64:9
|
64 | / match &self {
65 | | Field::Generic(g) => g,
66 | | Field::Component(c) => &c.join(&component),
| | ------------------ temporary value created here
67 | | _ => ""
... |
73 | | // }
74 | | }
| |_________^ returns a value referencing data owned by the current function
or
error[E0277]: a value of type `Vec<&str>` cannot be built from an iterator over elements of type `&String`
--> src/fields/mod.rs:70:19
|
70 | ).collect::<Vec<&str>>()
| ^^^^^^^ value of type `Vec<&str>` cannot be built from `std::iter::Iterator<Item=&String>`
|
= help: the trait `FromIterator<&String>` is not implemented for `Vec<&str>`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0277`.
error: could not compile `rust-hl7`
biting me in the arse?
The alternative approach would be to keep the field data as a string and perform components()
and subcomponents()
calls as functions against that, i think.
Thoughts?
I'm in the process of writing some convenience methods to access data and format it somewhat, but ran into this little gem along the way:
error[E0599]: the method `map` exists for struct `Vec<Field<'_>>`, but its trait bounds were not satisfied
--> src/main.rs:45:76
|
45 | let pid: &GenericSegment = generics.iter().find_map(|s| match s.fields.map(|f| f.value).collect().first() {
| ^^^ method cannot be called on `Vec<Field<'_>>` due to unsatisfied trait bounds
|
::: /home/username/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:374:1
|
374 | pub struct Vec<T, #[unstable(feature = "allocator_api", issue = "32838")] A: Allocator = Global> {
| ------------------------------------------------------------------------------------------------ doesn't satisfy `Vec<Field<'_>>: Iterator`
|
= note: the following trait bounds were not satisfied:
`Vec<Field<'_>>: Iterator`
which is required by `&mut Vec<Field<'_>>: Iterator`
`[Field<'_>]: Iterator`
which is required by `&mut [Field<'_>]: Iterator`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0599`.
error: could not compile `hl7test`
Having a bunch of lifetime related fun too - writing functions outside the library for subsequent import into it is a bit harder than one would think because everything holds lifetime references to pieces of the overall struct. Good learning experience though :).
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.