chomp's Issues

Provide alternatives preventing overflow and unbounded parsing

Problem

The integer types are limited in what values they can represent, and the decimal parser has no upper bound on the number of digits to parse or on the maximum allowed value, which can cause an overflow. It is also sometimes desirable to parse numbers in a certain range or with a specific number of digits.

Combinators like many, many1 and similar also pose an issue, e.g. as a DoS vector, since some patterns might cause the parser using many (or similar) to allocate a lot of memory for the T: FromIterator instance.

Solution

To keep the parsers and combinators easy to use, the unbounded versions should remain and stay the default, but bounded versions need to be provided. Most of the issues (except for the overflow in ascii::decimal) can be avoided or limited by using a buffer::FixedSizeBuffer or by setting a limit on a buffer::GrowingBuffer. Additional measures can be added on top of the buffer::Source or a buffer::Stream instance.

  • ascii::decimal: can result in overflow since there is no limit to the number of digits it parses
  • combinators::many: bounded::many exists
  • combinators::many1: bounded::many exists
  • combinators::many_till: bounded::many_till exists
  • combinators::sep_by: essentially bounded::many but is lacking a bounded version
  • combinators::sep_by1: essentially bounded::many but is lacking a bounded version
  • buffer::GrowingBuffer: has a limit option
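
For the ascii::decimal overflow case above, a minimal sketch in plain Rust (not the chomp API; the function name and signature are illustrative) of how a digit limit plus checked arithmetic rules out overflow:

// Illustrative only: parse an unsigned decimal number from a byte slice,
// rejecting inputs with more than `max_digits` digits and using checked
// arithmetic so the accumulator can never silently overflow.
fn bounded_decimal(input: &[u8], max_digits: usize) -> Option<(u64, &[u8])> {
    let digits = input.iter().take_while(|c| c.is_ascii_digit()).count();
    if digits == 0 || digits > max_digits {
        return None;
    }
    let mut n: u64 = 0;
    for &c in &input[..digits] {
        n = n.checked_mul(10)?.checked_add(u64::from(c - b'0'))?;
    }
    Some((n, &input[digits..]))
}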

run_scanner state can't depend on last token

I'm trying to parse one UTF-8 character. I tried run_scanner and std::char::from_u32, but it doesn't work: when I have read a whole character, the only way to signal it is to return None, which throws away the state.

Adapter for Nom

Feature which enables an adapter-function to call out to a Nom parser.

impl<'a, I: 'a> Input<'a, I> {
    // ...
    fn nom_parser<T, E, F>(self, f: F) -> ParseResult<'a, I, T, E>
      where F: FnOnce(&'a [I]) -> nom::IResult<&'a [I], T, E>;
}

Use a closure if extra parameters need to be passed to the Nom parser.

Also needs to support an error type allowing for both Nom's Err type as well as Chomp's parsers::Error.

HTML extract link parser.

Do you think it is possible to extract link attributes (<a> tag elements) from an HTML document?
If yes, can you write/explain an example parser?

Parse utilities

Problem

Currently users have to either use the provided buffer::IntoStream trait or Input::new(&[I]). The first one might not be so useful, and the second one needs a replacement.

Proposed solution

fn parse_only<'i, I, T, E, D, P>(data: D, parser: P) -> Result<T, ParseError<E>>
  where I: 'i,
        D: Into<&'i [I]>,
        T: 'i,
        E: 'i,
        P: FnOnce(Input<'i, I>) -> ParseResult<'i, I, T, E>;

Should work like Attoparsec's parseOnly.

Accessing numbering::InputPosition::position via map_err

I have a use case where I'd like to somehow pass numbering::InputPosition::position to an Error type as a way of reporting parsing errors at a location (e.g. line/column location).

The issue is that I'm unable to access numbering::InputPosition::position from within chomp::types::ParseResult::map_err function.

I adapted map_err into map_err2 as follows: dashed@3f1998b

This enables me to do this:

type ESParseResult<I, T> = ParseResult<I, T, ParseError>;

fn some_parser<I: U8Input>(i: InputPosition<I, CurrentPosition>)
    -> ESParseResult<InputPosition<I, CurrentPosition>, ()> {
    parse!{i;

        let _var = (i -> {
            string(i, b"var")
                .map_err2(|_, i| {
                    let loc = i.position();
                    ParseError::Expected(loc, "Expected var here.")
                })
        });

        // ...

        ret {()}
    }
}

I'd love to hear any feedback on this, especially any better alternative approaches. 👍


Appendix

CurrentPosition type for reference:

#[derive(Debug, Copy, Clone, PartialEq, Eq, Ord, PartialOrd, Hash)]
pub struct CurrentPosition(
    // The current line, zero-indexed.
    u64,
    // The current col, zero-indexed.
    u64
);

impl CurrentPosition {
    // Creates a new (line, col) counter with zero.
    pub fn new() -> Self {
        CurrentPosition(0, 0)
    }
}

impl Numbering for CurrentPosition {
    type Token  = u8;

    fn update<'a, B>(&mut self, b: &'a B)
        where B: Buffer<Token=Self::Token> {
            b.iterate(|c| if c == b'\n' {
                self.0 += 1; // line num
                self.1 = 0;  // col num
            } else {
                self.1 += 1; // col num
            });
    }

    fn add(&mut self, t: Self::Token) {
        if t == b'\n' {
            self.0 += 1; // line num
            self.1 = 0;  // col num
        } else {
            self.1 += 1; // col num
        }
    }
}
impl<I, T, E> ParseResult<I, T, E> {

    // ...

    #[inline]
    pub fn map_err2<V, F>(self, f: F) -> ParseResult<I, T, V>
      where F: FnOnce(E, &I) -> V {
        match self {
            ParseResult(i, Ok(t))  => ParseResult(i, Ok(t)),
            ParseResult(i, Err(e)) => {
                let err = f(e, &i);
                ParseResult(i, Err(err))
            },
        }
    }

    // ...
}

EOF and non-finite parsers

Problem

Currently end of file (EOF) poses an issue for many parsers which allow for an undetermined amount of input data (many, many1, take_while, take_while1, take_till). There is also the issue of matching end of file conditionally in some cases.

  • many, many1:

    If a segment stops at the end of the current input slice many will always report incomplete, which is the correct behaviour if the current input slice can be extended with additional input. The issue arises whenever the user tries to parse something which ends with a many parser:

    // pretty naive parser which splits everything into chunks of 2 and
    // stores it in a Vec:
    many(parser!{take(2)})
    

    The parser above will always report incomplete, even when end of input has been reached. The correct behaviour would be to end the repetition provided by many exactly at EOF and report success instead of incomplete. If take encounters EOF before having read the 2 requested tokens, it should report incomplete, and that incomplete should be propagated through many.

  • take_while, take_while1:

    Current behaviour will only end parsing when the predicate becomes false, but the parameter to the predicate cannot represent EOF, and therefore take_while reports incomplete whenever the end of the slice is encountered.

    Correct behaviour when EOF is encountered would be to succeed with what has been matched so far and then let any following parser report incomplete if they require a minimum number of tokens to parse.

  • eof:

    The current way of determining EOF is to split the parser into several parts and then let a separate piece of code control the invocation of the parsers as well as check for EOF.

    A specific eof parser would be desirable:

    parse!{i;
        let r = record();
        or(parser!{eof(); ret r},
           parser!{second_part()})}
    

Proposed solution

Store a flag in Input which lets the parsers know if the current input slice is finite or if there might be more data "after the end". This flag will enable parsers like eof (matches end of input) and many to work properly when encountering an end of a slice which is end of input.
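
As a plain-Rust sketch of the proposed flag (toy types, not chomp's actual Input), it would let an eof check distinguish "no data right now" from "no data ever":

// Toy types for illustration: an input carries its current slice plus a flag
// saying whether more data may still arrive from the source.
struct SliceInput<'a> {
    data: &'a [u8],
    is_last: bool, // set once the data source has reported end of input
}

// Succeeds only on an empty slice that is also the last slice; otherwise an
// empty slice means "incomplete", because a refill could still provide data.
fn at_eof(i: &SliceInput<'_>) -> Result<(), &'static str> {
    match (i.data.is_empty(), i.is_last) {
        (true, true)  => Ok(()),
        (true, false) => Err("incomplete: more data may arrive"),
        (false, _)    => Err("not at end of input"),
    }
}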

`parse!` does not support inline `map`, `map_err` or `bind` for named actions

Problem

The parse! macro does not allow anything but a standard function call as a named action in its grammar:

NAMED_ACTION  = $ident '(' ($expr ',')* ','? ')'

It is desirable to be able to write code like this:

fn make_it_cool(n: u32) -> u32 { n + 2 }
fn validate_number<I>(i: Input<I>, n: u32) -> ParseResult<I, u32, NumberError> {
    if n == 10 { i.ret(n) } else { i.err(NumberError) }
}

parse!{i;
    let x = decimal().map(make_it_cool);
    let y = decimal().bind(validate_number);
    ret x + y
}

Possible solutions

Explicitly add bind, map and map_err to the grammar

This will enable the syntax above, but only for bind, map and map_err; it will not enable arbitrary methods to be called on the return value of the called function. This comes at the price of a much more complex macro for the actions part, because of the explicit parts of the grammar.

Do nothing and instruct people to use the inline-action form

// using the same functions from above
parse!{i;
    let x = i -> decimal(i).map(make_it_cool);
    let y = i -> decimal(i).bind(validate_number);
    ret x + y
}

Drawbacks:

  • The Input<I> type gets exposed needlessly inside of the macro
  • More characters to write for the user

TODO: Any more possible solutions?

`impl Trait` and making Chomp require nightly

Using conservative impl Trait we can change Chomp from being a monad-like parser-combinator into a full-fledged monad. Current development can be found in the https://github.com/m4rw3r/chomp/tree/experiment/impl_trait branch.

The monad has several benefits: the macro is simpler, parsers are easier to compose since you are not forced to pass around the input state, type signatures for more complex parsers are simpler (associated types), and so on. The downside is that it currently only works on nightly Rust and is incompatible with the current (0.2.6) version of Chomp.

The question here is what to do with the monad-like version of Chomp? I am committed to finishing up version 0.3.0 of Chomp using the monad-like parser using an Input trait which is compatible with the Input trait of the monad version (if you change a few method names). The contents of the buffer module are also largely unchanged between the two.

I see two choices here:

  • Let 0.x.y versions of Chomp be monad-like, and release a monad-like 1.0.0 when it is equal to the functionality of Attoparsec, leaving 2.0.0 and up for the monad version which will require nightly until impl Trait has been stabilized.
  • Release the monad version under a new crate-name (eg. chomp2)

Essentially this boils down to the question: How important is support for stable rust for users of Chomp?

Reorganize types

Currently Chomp's module tree is a bit deep for what it does. It would be more ergonomic with a flatter tree.

The most common types should be in the crate root:

  • chomp::Input

  • chomp::ParseResult (Parser in the case of 2.0)

  • chomp::Buffer <- Rename

    Should be renamed: it conflicts with chomp::buffer as well as chomp::buffer::Buffer, and it is more like a window into the token stream since it is not guaranteed to be a buffer or slice. Maybe Window?

Be able to unwrap Input from InputPosition.

Hi!

I have a use case where I'd need to adapt chomp::buffer::InputBuf to chomp::types::numbering::InputPosition (via wrapping) and back again (via unwrapping).

However, I'm unable to unwrap Input from InputPosition.

I simply added an impl of the IntoInner trait to InputPosition: dashed@e5dfe7e


My use case is as follows; I'd love any feedback or better alternative approaches:

struct SourceFile {
    input: chomp::buffer::Source<
        chomp::buffer::data_source::ReadDataSource<std::fs::File>,
        chomp::buffer::FixedSizeBuffer<u8>>,
    position: CurrentPosition
}

impl SourceFile {
    fn new(path_to_file: &Path) -> Self {

        let reason = format!("Failed to open file: {:?}", path_to_file);
        let file: File  = File::open(path_to_file).expect(&reason);

        let input = chomp::buffer::Source::new(file);

        SourceFile {
            input: input,
            position: CurrentPosition::new()
        }
    }

    fn __parse_token<'a>(&'a mut self) -> Result<Token, StreamError<&'a [u8], ParseError>>  {

        let mut this_position = self.position;

        let result = {
            let m = |old_input: chomp::buffer::InputBuf<'a, u8>| {

                let input_override = InputPosition::new(old_input, this_position);
                let (input_override, parse_result_raw) = token_parser(input_override).into_inner();
                let (old_input, new_position) = input_override.into_inner();

                this_position = new_position;

                old_input.from_result(parse_result_raw)
            };

            self.input.parse(m)
        };

        self.position = this_position.clone();

        result
    }

    fn parse(&mut self) {
        loop {
            match self.__parse_token() {
                Ok(t) => {
                    print!("{:?}", t);
                    continue;
                },
                Err(StreamError::Retry) => {
                    // Needed to refill buffer when necessary
                    continue;
                },
                Err(StreamError::EndOfInput) => {
                    break;
                },
                Err(StreamError::ParseError(_buf, err)) => {
                    panic!("{}", err);
                },
                Err(StreamError::Incomplete) => {
                    panic!("Parser failed to complete with the available data.");
                },
                Err(StreamError::IoError(err)) => {
                    panic!("{:?}", err);
                }
            }

            // NOTE: should continue from above
            unreachable!();
        }
    }
}

`string` can cause parsing to be slow with `verbose_error`

Problem

In verbose_error mode, string allocates a copy of the string it was trying to match whenever it fails. This can cause really bad performance when e.g. trying to match multiple different tags using nested or-combinators (used in the mp4 benchmark from nom_benchmarks).

`or` with many branches

I need a parser with multiple branches. My first attempt was:

or(i, parser! { string(b"ERROR"); ret LogLevel::Error }, |i| {
    or(i, parser! { string(b"WARN"); ret LogLevel::Warn }, |i| {
        or(i,
           parser! { string(b"INFO"); ret LogLevel::Info },
           parser! { string(b"DEBUG"); ret LogLevel::Debug })
    })
})

It works, but it feels too verbose, so I tried to write this macro:

macro_rules! alt {
    ($i:expr, $a:expr) => { $a };
    ($i:expr, $a:expr, $b:expr) => { or($i, $a, $b) };
    ($i:expr, $a:expr, $($b:expr),*) => { or($i, $a, |i| alt!(i, $($b),*)) };
}

Now, the parser looks much better:

alt!(i,
    { parser! { string(b"ERROR"); ret LogLevel::Error } },
    { parser! { string(b"WARN");  ret LogLevel::Warn } },
    { parser! { string(b"INFO");  ret LogLevel::Info } },
    { parser! { string(b"DEBUG"); ret LogLevel::Debug } }
)

I have some questions:

  • Is this a good solution? Is there any alternative?
  • Can you include something like this in the crate? I think that more people will need a multi-or combinator.

Thanks!

ForEach macro/method/function to make it easier to use Stream

Problem

A normal for loop cannot be used with Source or Stream, since the Iterator interface is not compatible with either.

Preferably there should be a for_each method on the Stream trait:

pub trait Stream<'a, 'i> {
    type Item: 'i;

    fn for_each<P, F, T, E>(&'a mut self, mut parser: P, mut body: F)
      where P: FnMut(Input<'i, Self::Item>) -> ParseResult<'i, Self::Item, T, E>,
            F: FnMut(T),
            T: 'i,
            E: 'i;
}

NOTE: Does not work as written above.

Design considerations

The current stream implementation necessitates a mutable borrow for each iteration, since Source might need to fill/reallocate/move data in the internal buffer as well as modify the state of the reader. This makes each return value from Stream::parse prevent any further attempts to parse data until that return value has gone out of scope.

Copy types or cloning circumvents this, but then we are not doing zero-copy parsing, and the signature cannot be very general either.

let a = iterator.next();
let b = iterator.next();
println!("a: {:?}, b: {:?}", a, b);

vs:

{
    let a = stream.parse(my_parser);

    print!("a: {:?},", a);
}
{
    let b = stream.parse(my_parser);

    println!("b: {:?},", b);
}

Function calls are also an issue: any call passing stream data to a generic closure will borrow the return value of Stream::parse until the variable storing the function goes out of scope, which also borrows the stream itself. This prevents any attempt to write a function or method managing this.

Macro

Writing it as a macro has a few merits: it avoids most of the issues with generics since all types are known.

The issue with macros is that some function calls still cause issues with borrowck, and the panic! macro is among them. So there is no way to exit the loop using panic! with a dump of the error (ParseError carries a reference to the input buffer, and panic! seems to borrow it for the rest of the function, making it impossible to loop).

The macro also needs to be aware of the autofill feature of any possible Stream, if it needs to invoke a method causing it to prepare more data for the parser (of course the user has to want it to automatically fill itself while looping).

Proper parsers tests and examples for using `&str` as a parse source

Problem

Currently all examples and tests use byte slices, since that is most likely the most common source input of a parser. But there are situations where an already valid UTF-8 string needs parsing; this requires some specialized parsers (the combinators should work fine), associated tests, as well as examples of how to parse a &str.

Request: readable str error messages

My errors print out as:

FAIL: Error([10, 97, 99, 108, ...], Unexpected)

So I have to manually decode each number using the ascii table. This makes for a long/painful edit/compile/test cycle.

I need the ParseErrors to be printed as human-readable strings. If I can get access to the underlying &[u8] I'm happy to convert to a &str myself (ParseError does not provide access?).
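
(Not a chomp API, just std: once the raw bytes can be obtained from the error, String::from_utf8_lossy gives a readable rendering without the manual ASCII-table lookup.)

fn show_bytes(b: &[u8]) -> String {
    // Replaces any invalid UTF-8 with U+FFFD instead of failing.
    String::from_utf8_lossy(b).into_owned()
}

// e.g. show_bytes(&[10, 97, 99, 108]) == "\nacl"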

Example code:

match parse_only(test, TEST_SIMPLE.as_bytes()) {
    Ok(p) => assert_eq!(p, r),
    Err(e) => panic!("FAIL: {:?}", e)
};

I enabled verbose_error, but I still only see encoded bytes.

[dependencies.chomp]
features = ["verbose_error"]
version = "*"

Any thoughts / suggestions on how I can use chomp better would be much appreciated.
Thanks!

Attoparsec parsers

Attoparsec has a lot of good parsers and combinators, would be a good idea to implement most if not all of them.

Data.Attoparsec.ByteString

Individual bytes

  • word8 -> token

  • anyWord8 -> any

  • notWord8 -> not_token

  • satisfy

  • satisfyWith -> satisfy_with

  • skip

    satisfy optimizes into skip.

Lookahead

Byte classes

  • inClass
  • notInClass

Efficient string handling

  • string

  • skipWhile -> skip_while

    takeWhile optimizes into skipWhile for simple Input types like slices.

  • take

  • scan

  • runScanner -> run_scanner

  • takeWhile -> take_while

  • takeWhile1 -> take_while1

  • takeTill -> take_till

Consume all remaining input

  • takeByteString -> take_remainder

Combinators

  • try

    Redundant since Chomp backtracks automatically on combinators requiring backtracking.

  • <?>

    Redundant since map_err exists.

  • choice

  • count

  • option

  • many

  • many1

  • manyTill -> many_till

  • sepBy -> sep_by

  • sepBy1 -> sep_by1

  • skipMany -> skip_many

  • skipMany1 -> skip_many1

  • eitherP -> either

  • match -> matched_by

State observation

  • endOfInput -> eof

Data.Attoparsec.ByteString.Char8

Special character parsers

Fast predicates

  • isDigit -> ascii::is_digit
  • isAlpha_iso8859_15
  • isAlpha_ascii -> ascii::is_alpha
  • isSpace -> ascii::is_whitespace
  • isHorizontalSpace -> ascii::is_horizontal_space
  • isEndOfLine -> ascii::is_end_of_line

Efficient string handling

Numeric parsers

Data.Attoparsec.Combinator

Data.Attoparsec.Text

Non `Copy` `Token`s?

As the title says, any way we can remove the Copy requirement on the Token associated type?

The main drawback of removing it is that most filter-functions will need to operate on borrows (satisfy, take_while and so on), which will introduce some extra typing (one or two extra & or * per filter).

The benefit of removing Copy would be that we can easily do multi-stage parsing (commonly lexer + parser in separate parts) solely in Chomp, relying on the previous parser to construct a new token stream, and this could then refer to the initially parsed data all the way through (depending of course on lifetime requirements of the source itself).
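
For illustration (a hypothetical token type, not anything in Chomp), this is the kind of token a lexer stage could produce if Copy were no longer required:

// A lexer-produced token that borrows from the original source; the owned,
// normalized string literal makes it Clone but not Copy.
#[derive(Debug, Clone, PartialEq)]
enum Token<'src> {
    Ident(&'src str),
    Number(f64),
    StringLit(String),
}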

This would be a breaking change, probably 1.0.0 and/or 0.4.0.

Split InputModify and ParseResultModify into more specialized traits

Problem

internal::InputModify and internal::ParseResultModify contain too many methods, making it hard to see what a part is actually doing with the Input and ParseResult types.

Proposed solution

Remove ParseResultModify and InputModify and replace with the following traits:

Note: ParseResultModify::modify does not need any corresponding trait, it is unused.

  • Replace all uses of InputModify::modify with buffer() + replace(buffer).

IntoInner trait

pub trait IntoInner {
    type Inner;

    fn into_inner(self) -> Self::Inner;
}
  • Input -> (InputState, &[I])
  • ParseResult -> State

ParseResultChain for ParseResult

Is this really useful? It is used in two spots in combinators.rs.

pub trait ParseResultChain {
    fn chain<F, T, E>(self, F) -> ParseResult<'a, Self::Input, T, E>
      where F: FnOnce(State<'a, Self::Input, Self::Data, Self::Error>)
               ...;
}

InputClone

pub trait InputClone {
    fn clone_input(&self) -> Self;
}

InputBuffer

pub trait InputBuffer<'a> {
    type Type;

    fn buffer(&self) -> &'a [Self::Type]
      where Self::Type: 'a;
    fn replace(self, b: &'a [Self::Type]) -> Self
      where Self::Type: 'a;
    fn is_last_slice(&self) -> bool;
}

replace is never used without buffer (except for in one case combined with parse), and same goes for is_last_slice.

InputIncomplete

pub trait InputIncomplete {
    fn incomplete<T, E>(self, usize) -> ParseResult<'a, Self::Type, T, E>;
}

Applicative parsing

Problem

Some somewhat common constructions using monadic composition are more complex than they have to be. Applicative solves that by being less powerful and lacking context sensitivity (commonly):

do
    a <- digitP
    b <- digitP
    c <- digitP
    d <- digit
    return $ IP a b c d
  where digitP = digit >>= \x -> skip '.' >>= \_ -> return x

-- vs

pure IP <$> digitP <*> digitP <*> digitP <*> digitP <*> digit
  where digitP = digit <* (skip '.')

The default implementation of Applicative for a Monad also works for Chomp:

instance Applicative X where
    pure    = return
    d <*> e = do
      b <- d
      a <- e
      return (b a)
    (*>)    = (>>)
    x <* y  = x >>= \a -> y >> return a

But the issue here is that to be able to write things like

i.ret(IP).apply(digitP).apply(digitP).apply(digitP).apply(digit)

IP must be a function which is partially applied and invoked like IP(192)(168)(0)(23), which is both very strange to use in Rust (which does not support partial application natively) and very slow, since it forces boxed closures to be used to store the intermediate state.

Implementation

  • pure

    Same implementation as Input::ret, use Input::ret instead of pure.

  • <*>

    Problematic due to evaluation order and the lack of partial application, see the Variants section below.

  • *>

    Does not need any implementation, just use then.

  • <*

    Name skip, simple and no special considerations: self.bind(|i, t| rhs(i).map(|_| t))

Input::ret is kept as is to provide a way to lift values into the Applicative. then is also left as is as a method on ParseResult. skip is added to impl ParseResult since it is very useful even in monadic chaining.

apply is either implemented on a trait in a separate module or directly on the ParseResult.

Variants

There are multiple ways of implementing apply

Following Haskell's default Applicative

i.ret(capitalize).apply(any)
impl ParseResult<T, E> {
    pub fn apply<F, U, V, W>(self, F) -> ParseResult<W, V>
      where T: FnOnce(U) -> W,
            F: FnOnce(Input) -> ParseResult<U, V>,
            V: From<E>;
}

Pros

  • Simple
  • Follows a direct translation of Applicative laws
  • Does not consider values and functions differently

Cons

  • Only allows for 1-arity functions
  • Requires partial application to compose well
  • Requires utility functions or closures adapting the results if functions of n-arity are to be used
  • Unstable features are used if boxing of functions are needed

Applicative laws

// Identity:
lhs_m.ret(|x| x).apply(f) === f(rhs_m)
// Composition:
lhs_m.ret(compose).apply(u).apply(v).apply(w) === u(rhs_m).apply(|i| v(i).apply(w))
// Homomorphism:
lhs_m.ret(f).apply(|i| i.ret(x)) === rhs_m.ret(f(x))
// Interchange:
u(lhs_m).apply(|i| i.ret(y)) === rhs_m.ret(apply_arg(y)).apply(u)

Tuple as an argument

fn digitP(i: Input<u8>) -> U8Result<u8> { digit(i).skip(|i| token(i, b'.')) }
i.ret(IP).apply((digitP, digitP, digitP, digit))
trait ApplyArgs<F> {
    type Output;
    type Error;
    fn apply(self, Input, F) -> ParseResult<Self::Output, Self::Error>;
}

impl<A, AT, AE, T, F> ApplyArgs<F> for A
  where F: FnOnce(AT) -> T,
          A: FnOnce(Input) -> ParseResult<AT, AE> {
    type Output = T;
    type Error = AE;

    fn apply(self, i: Input, f: F) -> ParseResult<Self::Output, Self::Error> {
        self(i).map(f)
    }
}

// Impls for tuples, eg. (A, B) where F: FnOnce(A, B) -> Out, A: FnOnce(Input) -> ParseResult<A, AE>, ...

impl ParseResult<T, E> {
    pub fn apply<A>(self, rhs: A) -> ParseResult<A::Output, A::Error>
      where A: ApplyArgs<T>,
             A::Error: From<E> {
        self.bind(|i, f| rhs.apply(i, f))
    }
}

Pros

  • Allows for arbitrary number of parameters
  • Compact syntax in usage
  • Does not treat functions and values differently
  • Applicative laws still apply if translated to tuples for multiple arguments

Cons

  • Fails to properly infer for closures in tuples like (|i| i.ret("foobar"), ...) (It cannot determine that i is Input) since it is using several traits, struct ParseResult -> trait ApplyArgs -> trait FnOnce -> concrete closure
  • Not very ergonomic at times since types have to be declared for all parameters.
  • Not possible to stack any values conditionally, the specific expression has to be known at compile time
  • Not as composable as the default since the tuples do not lend themselves well to partial application (which is kind of necessary to be able to express Applicative laws in a simple manner).
  • Horrible type impls for the tuples

Applicative laws

// Identity:
lhs_m.ret(|x| x).apply(f) === f(rhs_m)
// Composition:
lhs_m.ret(compose).apply((u, v)).apply(w) === u(rhs_m).apply(|i| v(i).apply(w))
// Homomorphism:
lhs_m.ret(f).apply(|i| i.ret(x)) === rhs_m.ret(f(x))
// Interchange:
u(lhs_m).apply(|i| i.ret(y)) === rhs_m.ret(apply_arg(y)).apply(u)

Stacking tuples

The idea here is to stack values into a tuple and then invoke a function with the given values:

i.ap(digitP).ap(digitP).ap(digitP).ap(digit).apply(IP)

Pros

  • Allows for somewhat conditional stacking of values
  • Follows Applicative laws if rewritten in Reverse Polish Notation with the stack operator `ap`

Cons

  • Treats values and functions differently
  • Treats tuples and non-tuples differently (cannot make a blanket impl for any T since it will conflict with all the impls for tuples).
  • Does not allow a smooth integration between monadic composition and applicative composition

Applicative laws

This version reverses the order of the apply operator arguments while still keeping the left to right ordering of the arguments supplied to ap:

// Identity:
lhs_m.ap(f).apply(|x| x) === f(rhs_m)
// Composition:
lhs_m.ap(w).apply(|i| i.ap(u).ap(v).apply(compose)) === rhs_m.ap(w).apply(v).apply(u)
// Homomorphism:
i.ap(|i| i.ret(x)).apply(f) === rhs_m.ret(f(x))
// Interchange:
i.ap(|i| i.ret(y)).apply(u) === i.ap(|i| i.ret(u)).apply(apply_arg(y))

Note that it does not play well with existing values in the Monad/Applicative, since they are of type T which is not a tuple, while the Applicative functions (ap and apply) only work on tuples. I.e. Input + ParseResult<T, E> is always a Monad, but only an Applicative if T is a tuple of some kind.

Is there a way to get current position?

Hi! I'm wondering if it would be possible to add a function that could provide the current position in the file (or stream)?

In my case, I'm parsing from a file and would like to capture the line number in particular.

I haven't had a chance to dig through the code much yet, but if I were to take a stab at adding it, I'd definitely appreciate a few pointers! I'm guessing it would have to bubble up from the buffer...

More comprehensive examples

I've looked at a bunch of libraries for parsing in Rust and chomp's API feels most intuitive based on small examples. However, I've found myself struggling to make even a simple parser work in practice. It's pretty easy to mess up a macro and have the compiler complain, for example, and I'm still not sure how to parse something like "1.0" into an f64 efficiently and correctly.

Having a more complete example would be super helpful to people approaching the library. Any format with a reasonably wide variety of data types (strings, ints, floats) would be great - maybe JSON?

I'll submit a PR for something like this if/when I make enough progress.

Remove `unwrap*` and `expect` from `ParseResult`, and `Input::new()`

Problem

ParseResult is an approximation of a linear type, the other half to Input. Allowing unwrap, unwrap_err and expect directly on the ParseResult violates the basic rules of the linear type, allowing it to be destroyed prematurely.

ParseResult has one opt-in trait which lets the user expose the internal state to build primitives, but unwrap, unwrap_err and expect are always present on the type and encourage users to treat it like a Result, calling unwrap whenever they see fit. If a user uses unwrap and friends in a parser it may not be composable with other parsers and combinators.

Arbitrary construction of an Input is also an issue; preferably the parsing context should be isolated to encourage users to create composable parsers (i.e. Input -> ParseResult). It will also prevent accidental loss of input state (e.g. deconstructing Input through one of the traits provided and then using Input::new).

Proposed solution

Deprecate ParseResult::unwrap, ParseResult::unwrap_err, ParseResult::expect and Input::new once adapters to use any parser with an arbitrary input type become good enough. Then remove them in the next major version (hopefully this can be done before 1.0.0).

unwrap, unwrap_err and expect will be provided by Rust's standard library Result, which will be the primary return value of these adapters.

For the case where an Input actually needs to be constructed (e.g. in a data source feeding parsers), a separate new function is provided which is not exposed in the same module as Input, carries a warning about when to use it, and enables the user to set the input state.

Streaming utilities and generalized input management

  • Datasources
    • io::Read
    • Iterator
  • Buffers
    • Fixed size
    • Growing
      • Capped
  • Parse sources
    • Slice source
    • DataSource source
      • Manually managed variant
  • Trait for sources
  • ParseError

Since Chomp is a slice-based parser it cannot use an Iterator- or Read-based input directly; it has to work on slices, since that is the most efficient method of allowing zero-copy parsing[1].

Buffers

Buffers are generic over std::io::Read. These should automatically fill the buffer when necessary, if configured to do so. If they fail to parse with the data acquired from the Read source but managed to read more than zero bytes, a Retry error should be returned to indicate that another attempt to read and parse the data will be necessary.

Growable buffer

Useful for parsing trusted data, or where the amount of memory allocated for the parser buffer does not matter. Should have an optional maximum buffer size, and should error in the same way as the fixed-size buffer if this limit is hit and it still fails to parse.

TODO: More spec

Fixed size buffer

Useful when parsing data from an unknown source; should only allocate a single slab on construction and then attempt to parse with at most the full size of the fixed-size buffer. If a parser still wants more data than the fixed-size buffer can provide, it should return an error indicating that the operation could not complete, including the total amount of data the parser requested (i.e. including the currently used part of the fixed buffer).

Manually managed buffer

A buffer where the user has control over when to attempt to fill the buffer. Useful for eg. cooperative multitasking, where an input could completely saturate a parser preventing any other operation from running.

Should probably be a configuration option on the existing buffers, will cause the buffer to skip filling automatically and only return a Retry until the user has called a method asking the buffer to fill itself.

Source trait

Trait which enables different buffer implementations to be treated the same.

The internal storage might not necessarily be tied to the source itself (e.g. a &[u8] supplied by the user), so it has to have one lifetime for the struct implementing Source and one for the data itself (which the created T and E depend on).

pub trait Source<'a, 'i, I> {
    fn parse<F, T, E>(&'a mut self, f: F) -> Result<T, ParseError<'i, I, E>>
      where F: FnOnce(Input<'i, I>) -> ParseResult<'i, I, T, E>,
            T: 'i,
            E: 'i;
}

This trait is not compatible with for-loops since self is mutable and the resulting T and E prevent at least the internal storage from being modified (which prevents another mutable borrow of self). loop+match, while or macros will have to be used.

A matching Into-style trait is probably needed.

ParseError type

The generic parse error needs to be able to report the user-defined parse errors, parse failures due to not enough data, an indication to retry, any IO error and finally that there is nothing more to parse (i.e. a successful end).
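
One possible shape for such an error type (names are illustrative; the variants mirror the StreamError values used in the examples earlier in this document):

enum StreamError<'a, I: 'a, E> {
    // The parser failed with a user-defined error; includes the buffer it failed on.
    ParseError(&'a [I], E),
    // The parser needed more data than the buffer could provide.
    Incomplete(usize),
    // The buffer was (or will be) refilled; try the parse again.
    Retry,
    // The underlying reader failed.
    IoError(std::io::Error),
    // Nothing more to parse; the source ended cleanly.
    EndOfInput,
}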

[1]: A rope data structure where pieces are yielded as they become available can enable us to write a pretty efficient "zero-copy" parser with the possibility to resume. But it is "zero-copy" as in not copying the input to the parser; the data in the rope data structure still needs to be allocated on some arena/heap/buffer before being passed to the parser.

Improve parse! macro documentation

Currently the parse! macro documentation does not detail exactly what operators like <* expand to in terms of normal code. Having access to this is useful to debug certain issues which can arise in macro usage.

Many_till combinator seems to end up in an endless loop

I wrote a naivish S-expression parser. I had a problem with the many_till combinator.

Here's the code: https://gist.github.com/anonymous/8a6fe19ab4e418eefad383455ae87a33 -- notice lines 56-60 where I would have wanted instead to do

many_till(parse_svalue, eof)

, but that just ended up in an endless loop with the simple "(a)" input. It's as if many_till did not notice that none of the three <|> -separated parsers matched. Is this a bug in chomp or am I misusing / missing something?

How do I examine success/fail?

let parse_result = parse!{i;
..
};

// I now have to execute some Rust code to see what parser I should call next.
let input2: Input<'i, u8> = match parse_result.into_inner() {
    // stuck here.
    // Ok(o) => o,
    // Err(e) => return parse_result
};

I'm a bit lost walking through the types. I simply want to continue with an Input, or return the parse_result.
Any help would be appreciated.
Thanks!

what type of argument to supply for i.err()?

What do I give as the argument?

I can't figure out what to use in place of 0:

fn expr(i: Input<u8>) -> U8Result<ExprRef> {
    or(i,
       literal(i),
       // TODO: all the other sorts of Expr
       i.err(0))  // <- here
}

I can't understand the diagnostics:

cargo test
   Compiling monte-rs v0.1.0 (file:///home/connolly/projects/monte-rs)
src/mast.rs:24:5: 24:7 error: the trait `core::ops::FnOnce<(chomp::input::Input<'_, u8>,)>` is not implemented for the type `chomp::parse_result::ParseResult<'_, u8, Box<alloc::rc::Rc<kernel::Expr>>, chomp::parsers::Error<u8>>` [E0277]
src/mast.rs:24     or(i,
                   ^~
src/mast.rs:24:5: 24:7 help: run `rustc --explain E0277` to see a detailed explanation
src/mast.rs:24:5: 24:7 note: required by `chomp::combinators::or`
src/mast.rs:24:5: 24:7 error: the trait `core::ops::FnOnce<(chomp::input::Input<'_, u8>,)>` is not implemented for the type `chomp::parse_result::ParseResult<'_, u8, _, _>` [E0277]
src/mast.rs:24     or(i,
                   ^~
src/mast.rs:24:5: 24:7 help: run `rustc --explain E0277` to see a detailed explanation
src/mast.rs:24:5: 24:7 note: required by `chomp::combinators::or`
src/mast.rs:24:5: 24:7 error: the trait `core::ops::FnOnce<(chomp::input::Input<'_, u8>,)>` is not implemented for the type `chomp::parse_result::ParseResult<'_, u8, Box<alloc::rc::Rc<kernel::Expr>>, chomp::parsers::Error<u8>>` [E0277]
src/mast.rs:24     or(i,
                   ^~
src/mast.rs:24:5: 24:7 help: run `rustc --explain E0277` to see a detailed explanation
src/mast.rs:24:5: 24:7 note: required by `chomp::combinators::or`
src/mast.rs:24:5: 24:7 error: the trait `core::ops::FnOnce<(chomp::input::Input<'_, u8>,)>` is not implemented for the type `chomp::parse_result::ParseResult<'_, u8, _, _>` [E0277]
src/mast.rs:24     or(i,
                   ^~
src/mast.rs:24:5: 24:7 help: run `rustc --explain E0277` to see a detailed explanation
src/mast.rs:24:5: 24:7 note: required by `chomp::combinators::or`
error: aborting due to 2 previous errors

Size hint for internal iterator used by `many*` and `count`

Implementing Iterator::size_hint will enable more efficient allocation for the containers. According to profiling, a lot of time is spent allocating and reallocating in some situations; a better size_hint would improve performance.

  • count should yield (n, Some(n)) since it will always result in n elements on success.
  • many1 and sep_by1 should yield (1, None) since they have no upper bound
  • many, many_till and sep_by can use the default value of (0, None)
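
As a plain-Rust sketch of the count case from the list above (a toy wrapper, not chomp's internal iterator), an iterator that knows how many items it will yield can report an exact size_hint so collect can allocate once:

// Wraps an iterator and yields at most `remaining` items, reporting an exact
// size hint of (remaining, Some(remaining)) as proposed for `count`. This
// assumes the inner iterator can supply at least `remaining` items, which is
// what `count` requires for the parse to succeed at all.
struct Exactly<I> {
    inner: I,
    remaining: usize,
}

impl<I: Iterator> Iterator for Exactly<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        if self.remaining == 0 {
            return None;
        }
        self.remaining -= 1;
        self.inner.next()
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        (self.remaining, Some(self.remaining))
    }
}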

It might also be feasible to implement a combinator which is a hybrid of count and many, where specifying both a lower and upper bound is possible. This would make it a lot more efficient to allocate some parts (and by using monadic composition it can even reserve space for a certain known number of elements which was specified earlier in the message).

Spec for bounded many

fn many<I, T, E, U, R, Parser>(i: Input<I>, r: R, p: Parser) -> ParseResult<I, T, E>
  where R:      BoundedRange,
        Parser: FnMut(Input<I>) -> ParseResult<I, U, E>,
        T:      FromIterator<U>

trait BoundedRange { ... }

impl BoundedRange for Range<usize>     { ... }
impl BoundedRange for RangeFull        { ... }
impl BoundedRange for RangeFrom<usize> { ... }
impl BoundedRange for RangeTo<usize>   { ... }

  • Iteration should stop once the max value is reached (if it is specified by the range), no more than n items should be emitted unless the range is lacking an upper bound
  • A size_hint based on the range should be provided
  • If an error or incomplete is encountered outside of the range (ie. if fewer items than the lower bound have been emitted), the error should be propagated
  • If an error is encountered inside of the range the parser should be considered complete and return the resulting FromIterator value
  • If an incomplete is encountered inside of the range of the parser it should be considered complete if the input is END_OF_INPUT and input.len is 0.

TODO

  • bounded::many spec
  • BoundedRange trait
  • bounded::many impls
    • Replace uses of iter::Iter
      • count
      • many
      • many1
      • sep_by
      • sep_by1
  • bounded::many_till
    • Replace uses of iter::IterTill (many_till)
  • bounded::skip_many

Maybe move the core of Chomp into an inner crate?

To separate the core from the provided combinators, parsers and buffers, it might be a good idea to move the core of Chomp into an inner crate. It would still be the same repository and the crate people use would still be chomp, but for the cases where people want a bare-bones parser-combinator, chomp-core could be used directly from the repository.

Items considered chomp-core

struct Input;
struct ParseResult;

macro_rules parse!;
macro_rules parser!;

mod primitives {
    trait InputBuffer;
    trait InputClone;
    trait IntoInner;

    struct State;

    mod input {
        const DEFAULT;
        const END_OF_INPUT;

        fn new;
    }

    mod parse_result {
        fn new;
    }
}

Items not considered core

(Despite looking like they belong)

  • SimpleResult, U8Result and mod parsers

    These are just provided parsers, users might want to provide their own. The error type involved in SimpleResult and U8Result is only specific to the parsers in the parsers module.

  • ascii module

    Just utilities

  • buffer module

    Same, but for reading from Read and Iterator sources.

  • combinators module

    Pretty generic, since they do not have a fixed error type of any kind. The user might want to provide their own though.

string parser (and possibly others internally using consume_while) force unnecessary stream reads

problem

the chomp::parsers::string parser (and possibly others internally using consume_while) might force unnecessary stream reads. example code:

#[macro_use]
extern crate chomp;

use chomp::prelude::*;
use chomp::buffer::{Source, Stream};

use std::net::TcpStream;


fn main() {
    let tcp = TcpStream::connect("faumail.fau.de:143").unwrap();
    let mut src = Source::new(tcp);

    // IMAP lines end in b"\r\n", so the real text is everything up to b'\r',
    // but we have to read the line ending nonetheless before reading any future stuff
    let p = src.parse(parser!{take_till(|c| c == b'\r') <* string(b"\r\n")});
    println!("{:?}", p);
}

expected output: Ok(<some bytes from the imap server welcome line>)

actual output: Err(Retry)

cause

the string parser (src/parsers.rs:378) uses consume_while(f), which first reads the next token from the input stream, and only after that inspects it (using f) for whether to consume it or not. note this is not a bug in consume_while, but its perfectly fine expected behaviour. the problem with using it the way string(s) currently does is that after len(s) tokens have been consumed we could return successfully, but consume_while waits for the next token to call its decider function on (which then determines that it has read len(s) tokens already and tells consume_while to quit), which in some cases forces a read on the underlying stream when the answer is actually already clear.

solution

i wrote a (very hackish) fix for the string parser at https://github.com/dario23/chomp/tree/fix_string but (without having checked in depth) i'm expecting more parsers to be affected. probably a more exhaustive fix would include adding consume_while_max_n(f, usize).

i'd be happy to propose changes and submit a PR, but only after hearing your opinion on the matter :-)

Infinite loop?

skip_many() and many() do not seem to be propagating the incomplete state.
Or maybe the or combinator is always resetting the stream position and not propagating the error?

I expect the flow to be:

  • skip_many(all)
  • all OR tests b() and c() - both fail
  • all returns fail
  • skip_many returns fail <-- this does not happen ... infinite loop ...

Ideas?

Thanks!

i == "fffff".as_bytes(); // will never match any token...
parse!{i;
    skip_many(all);
    ...

pub fn c<'a>(i: Input<'a, u8>) -> U8Result<()> {
    parse!{i;
        take_while(is_whitespace);
        ret () } }

pub fn b<'i, 'a>(i: Input<'i, u8>, s: &'a str) -> U8Result<'i, ()> {
    parse!{i;
        take_while(is_whitespace);
        ret () } }

pub fn all<'a>(i: Input<'a, u8>) -> U8Result<()> {
    let s = String::new();
    parse!{i;
            b(&s) <|>
            c();
        ret () } }

FAQ and help for debugging `parse!` macro

This is an issue collecting problems related to the parse! macro. Since the error messages are not very helpful from rustc when dealing with complex macros I will try my best to help here.

  • Q: What does error: unexpected token:@__parse_internal ! { @ ACTION_NONTERM ( $ i , $ v : $ v_ty ) ; $ ( $ t ) * } } mean?

    A: An error has been encountered in a let-statement. Possible causes:

    • Syntax error on that line
    • Last line of a parse! macro-invocation. let-statements are not allowed as the final row of a parse! invocation since that prevents any return. Use an action without let (returns the value of the action itself) or use ret or err to return a success or error value respectively.
  • Q: What does error: this function takes 1 parameter but 2 parameters were supplied [E0061] @ CONCAT ( $ f ( $ i , $ ( $ p ) , * ) , $ ( $ v ) * ) ; $ ( $ t ) * } } ; ( mean?

    A: An invocation of an action has too many or too few parameters listed in its argument list. parse! always prepends the input-context of the parser to every function invocation within parse!.

  • Q: Are type-hints in function calls supported?

    A: No, but there is an issue

    Declare the type of the resulting variable instead:

    let var: MyDataType<Something> = my_action();
    

Implement alternative `parse!` macro using compiler plugin

The purpose of this would be to avoid the macro recursion limit (even though library users can increase the limit if need be), but mainly to provide good error messages for syntax errors. The intention is NOT to replace the existing parse! macro at this point.

Possible ways of implementing this

libmacro

A crate in the standard distribution. It is intended to be stabilized and become part of stable Rust. Will require nightly until stable enough.

syntex

Separate library working as a macro preprocessor. Can work on stable. The downside here is that any errors still present in the generated code will not point back to the original code, which might make debugging errors not directly related to parse! syntax annoying.

Improving the `buffer` module

Rename module to stream, since it provides tools to deal with streaming data, not just buffering.

Renames

  • stream::Source <- data_source::DataSource
  • stream::IteratorSource <- data_source::IteratorDataSource
  • stream::ReadSource <- data_source::ReadDataSource
  • stream::FixedSizeBuffer
  • stream::GrowingBuffer
  • stream::BufferedInput <- InputBuf
  • stream::Slice <- SliceStream
  • stream::BufferedSource <- Source
  • stream::Error <- StreamError
  • stream::Buffer
  • stream::Stream

Fixes/Updates

  • Source::read should either be unsafe or should always be provided with zeroed memory.
  • Error should only use Retry where the stream itself will attempt to refill before trying the parser again; Incomplete should be used otherwise, when the parser requires more data.
  • Look at how Tokio manages buffers and see if we can get that to mesh with Chomp

Adapter for working with Result and Option

Problem

Currently the user has to manually match a Result to exit using either ret or err depending on the branch:

bind(|i, s| match some_op(s) {
    Ok(o)  => i.ret(o),
    Err(e) => i.err(e),
})

Proposed solution

Additional methods on Input (ret_result and ret_option) as well as additions for those to the parse! macro.

Status

  • Input::from_result<T, E>(self, Result<T, E>) -> ParseResult<_, T, E>

  • Input::from_option<T>(self, Option<T>) -> ParseResult<_, T, ???>

    How to handle the error value? () could be annoying since it requires From<()> implementation on every error coming in contact with from_option.
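
A self-contained sketch of the Input::from_result adapter listed in the status above (toy types standing in for chomp's Input and ParseResult, so the shape is illustrative only):

struct Input; // stand-in for chomp's Input

enum ParseResult<T, E> {
    Ok(Input, T),
    Err(Input, E),
}

impl Input {
    // Lifts a std Result into a parse result, replacing the manual match on
    // ret/err at every call site.
    fn from_result<T, E>(self, r: Result<T, E>) -> ParseResult<T, E> {
        match r {
            Ok(t)  => ParseResult::Ok(self, t),
            Err(e) => ParseResult::Err(self, e),
        }
    }
}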

Returning the entire slice matched by a chain of parsers

Is there a clean way to use the parse! macro and return the entire slice that was matched? Currently, I do something like this:

// An identifier is an alphanumeric string that doesn't start with a digit.
fn identifier<I: U8Input>(i: I) -> SimpleResult<I, ()> {
    parse!{i;
        satisfy(is_alpha);
        take_while(is_alphanumeric);

        ret ()
    }
}

// An alias definition is two identifiers separated by an equals sign, e.g. "foo=bar".
fn alias<I: U8Input>(i: I) -> SimpleResult<I, (I::Buffer, I::Buffer)> {
    parse!{i;
        let (left, _)  = matched_by(identifier);
                        token(b'=');
        let (right, _) = matched_by(identifier);

        ret (left, right)
    }
}

It would be nicer if alias didn't have to use matched_by and could just say let left = identifier(). Does chomp provide a good way of doing this?

`or` fails with incomplete even though second parser could have succeeded on the input

combinators::or fails with Incomplete if the first parser reports incomplete, even if the second parser would have succeeded. This is not an issue when the input is not yet finite, since the first parser could be expecting a large piece of data and in that case asking for more input and then retrying is the correct action. But when the input is finite and parsing failed with incomplete on the first parser, the second parser should be attempted before giving up.

This will be a backwards incompatible change.

Make `Input` a trait

Problem

Currently the input type only allows for slices, and is special cased for situations where it may not be the whole of the input. I cannot provide any line/row/offset counting either since it is a concrete type and an extension with that functionality would impact all code.

This would provide a way to slot in position-aware wrappers to solve #38 neatly.

Proposed solution

Convert Input<I> into a trait with ret and err as provided methods; the input-token type would be the associated type Token. All the primitive methods (currently provided by InputClone and InputBuffer) are also present, but require an instance of the zero-sized type Guard which cannot be instantiated outside of the primitives module (note the private field). The primitives would be reachable through methods on a Primitives trait which has to be used separately (the blanket implementation for all Input makes it possible to easily use it once it is in scope).

use primitives::Guard;
pub use primitives::Primitives;

pub trait Input: Sized {
    type Token;
    type Marker;

    fn ret<T>(self, t: T) -> ParseResult<Self, T> {
        ParseResult(self, t)
    }

    fn _consume(self, usize, Guard)        -> Self;
    fn _buffer(&self, Guard)               -> &[Self::Token];
    fn _is_end(&self, Guard)               -> bool;
    fn _mark(&self, Guard)                 -> Self::Marker;
    fn _restore(self, Self::Marker, Guard) -> Self;
}

pub mod primitives {
    use Input;

    pub struct Guard(());

    pub trait Primitives: Input {
        fn consume(self, n: usize) -> Self {
            self._consume(n, Guard(()))
        }
        fn buffer(&self) -> &[Self::Token] {
            self._buffer(Guard(()))
        }
        fn is_end(&self) -> bool {
            self._is_end(Guard(()))
        }
        fn mark(&self) -> Self::Marker {
            self._mark(Guard(()))
        }
        fn restore(self, m: Self::Marker) -> Self {
            self._restore(m, Guard(()))
        }
    }

    impl<I: Input> Primitives for I {}
}

The mark method is the replacement for InputClone, it should be used with the restore method to restore the state of the Input to the old one.

Pros

  • Input can be implemented directly for slices, eliminating certain branches from parsers and combinators like many, take_while, eof and so on.
  • An Input implementation can be provided for line-counting which could be slotted in to provide line-counting in any existing parsers
  • The mark and restore methods would provide mechanisms allowing types which do not wholly consist of slices to work, though the buffer method is probably not the right choice for that, it will need a change to eg. support ropes.
  • All parsers need to be generic; before, we could get away with only concrete types since Input<u8> is a concrete type, while Input<Token=u8> will not be a concrete type.

Cons

  • Parser function signature change, very backwards incompatible:

    // old
    fn my_parser<'a, I>(i: Input<'a, I>, ...) -> ParseResult<'a, I, T, E>
    // old, lifetime elision:
    fn my_parser<I>(i: Input<I>, ...) -> ParseResult<I, T, E>
    // new
    fn my_parser<I: Input>(i: I, ...) -> ParseResult<I, T, E>
  • The type I: Input can no longer be guaranteed to be linear since the #[must_use] annotation cannot be put on the concrete type.

    This is probably not an issue in practice since the I type is required by value to create a ParseResult and the ParseResult in turn is ultimately required by the functions which start the parsing.

Debug mode (feature) including backtraces in errors

Problem

Currently it can be a bit hard to debug parsers to see why they do not match the provided input. The fact that Chomp makes it easy to split parsers into functions, as well as ParseResult::inspect, helps a bit, but being forced to add println! statements all over to find the exact part where the parser fails is annoying.

Proposed solution

Add another feature flag for errors which will use an error type that retrieves a backtrace whenever it is instantiated. This will provide valuable information to the user of the parser, as they can immediately see which part of the code the error comes from.
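
A minimal sketch of such an error wrapper (plain std, assuming a toolchain where std::backtrace is stable; not tied to Chomp's actual error types):

use std::backtrace::Backtrace;

// Wraps any error value and captures a backtrace at construction time, which
// is what the proposed feature flag would do for Chomp's error type.
#[derive(Debug)]
struct TracedError<E> {
    error: E,
    trace: Backtrace,
}

impl<E> TracedError<E> {
    fn new(error: E) -> Self {
        TracedError { error, trace: Backtrace::capture() }
    }
}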
