Git Product home page Git Product logo

earley's People

Contributors

amirlb avatar dagit avatar gabriella439 avatar kenkku avatar ollef avatar phadej avatar qrilka avatar sboosali avatar strake avatar tomjaguarpaw avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

earley's Issues

Behavior of Semigroup/Monoid instances for Prod?

I noticed that the Monoid and Semigroup instances for Prod duplicate the Alternative instance. I found that unexpected. I was expecting them to lift the element instances over the Applicative instance, a la (<>) = liftA2 (<>). That's the behavior of those instances in many other parsing libraries.

Changing the instances now would be awkward, of course. People might not see the change and think that just because their grammars still compile, the code still is correct. I'm not really asking for a change. Just wondering why this design was chosen, and suggesting that a change is something you might consider if you ever make large-scale changes that would break things in other ways.

Parsing hangs even for simple ambiguous grammars

Consider the following toy example:

import Control.Applicative
import Text.Earley

data S = S [Char] Char [T] deriving Show
data T = T Char S deriving Show

a, b, c :: Prod r String Char Char
a = satisfy (=='a')
b = satisfy (=='b')
c = satisfy (=='c')

-- "abcb" -> S [a] b [T c (S [] b [])]
-- "abc" -> last 'c' should be unused
-- "abcc" -> parsing all of it without leaving unconsumed pieces should not be possible
-- "abcbcab" -> S [a] b [T c (S [] b [T c (S [a] b)])] OR S [a] b [T c (S [] b []), T c (S [a] b [])]
--s = S <$> many a <*> b <*> many t -- TODO
s = S <$> many a <*> b <*> pure []
t = T <$> c <*> s

main = print (report p "abc") >> print (fullParses p "abc")
    where p = parser $ rule s

As far as I can tell, this grammar should be parsable. Every string only has a finite number of possible parse trees.
Yet parsing fails if I enable the commented line (marked with a TODO).
Am I doing something wrong? Or is this a bug/limitation in Earley?

Generate all members of a grammar?

Intuitively, it feels like it should be possible for your library to generate all strings (as a potentially infinite list) that a grammar would parse, as well as all parsed values that those strings would generate. Is that something you've considered or would consider adding?

out-of-order results

below, edit has nullable rules, and insertion has a wildcard. the parse results are all present, but some alternatives that appear before other alternatives ("lexically" in the grammar), appear later in the results.

e.g. given (full script is below):

          Editing <$> edit
    <|> Insertion <$> insertion

we get:

["whole","line"]
Insertion ["whole","line"]
Editing (Edit Copy Whole Line)

rather than:

["whole","line"]
Editing (Edit Copy Whole Line)
Insertion ["whole","line"]

I don't know if early parsers specify the order of the results for an ambiguous grammar, but it's how other parser libraries behave. and either way, would be nice to have: my hack is to postprocess the set of results, and rank them. this is messier (as ranking can be "nonlocal"), and maybe slower, than the first result being the expected one.

to reproduce:

import Control.Applicative
import Text.Earley

data Edit   = Edit Action Slice Region     deriving (Show)
data Action = Copy | Delete | Cut          deriving (Show)
data Slice  = Whole | Backwards | Forwards deriving (Show)
data Region = Char | Word | Line           deriving (Show)

type Text = String

edit :: Grammar r (Prod r String Text Edit) 
edit = do

 action <- rule $ "action"
   <=> Copy        <$ symbol "copy"
   <|> Delete      <$ symbol "delete"
   <|> Cut         <$ symbol "cut"

 slice <- rule $ "slice"
   <=> Whole     <$ symbol "whole"
   <|> Backwards <$ symbol "backwards"
   <|> Forwards  <$ symbol "forwards"

 region <- rule $ "region"
   <=> Char <$ symbol "char"
   <|> Word <$ symbol "word"
   <|> Line <$ symbol "line"

 edit' <- rule $ "edit"
   <=> Edit <$> action            <*> (slice -?- Whole) <*> (region -?- Line)
   <|> Edit <$> (action -?- Copy) <*> slice             <*> region
   -- <=> Edit <$> action <*> slice <*> region

 return edit'

data Command
 = Editing Edit
 | Insertion [Text]
 deriving (Show)

command :: Grammar r (Prod r String Text Command) 
command = do

  edit' <- edit

  insertion' <- rule $ some anyWord

  command' <- rule $ "command"
    <=> Editing <$> edit'
    <|> Insertion <$> insertion'

  return command'

anyWord :: Prod z String Text Text
anyWord = satisfy (const True) 

parseCommand :: [Text] -> ([Command], Report String [Text])
parseCommand s = fullParses (parser command) s 

(<=>) :: e -> Prod r e t a -> Prod r e t a 
(<=>) = flip (<?>)
infix 2 <=> 

(-?-) :: Prod r e t a -> a -> Prod r e t a 
(-?-) p x = maybe x id <$> optional p 

printParses :: (Show a, Show e, Show ts) => (ts -> ([a], Report e ts)) -> ts -> IO ()
printParses theParser theSentence = do 
 let (theResults, theReport) = theParser theSentence 
 putStrLn"" 
 print theSentence 
 _ <- traverse print theResults 
 print theReport 

main :: IO () 
main = do

 putStrLn"" 
 printParses parseCommand (words "line")
 printParses parseCommand (words "copy")
 printParses parseCommand (words "copy whole")
 printParses parseCommand (words "whole line")
 printParses parseCommand (words "copy whole line")
 printParses parseCommand (words "not an edit")

{- outputs: 

["line"]
Insertion ["line"]
Report {position = 1, expected = [], unconsumed = []}

["copy"]
Editing (Edit Copy Whole Line)
Insertion ["copy"]
Report {position = 1, expected = ["region","slice"], unconsumed = []}

["copy","whole"]
Insertion ["copy","whole"]
Editing (Edit Copy Whole Line)
Report {position = 2, expected = ["region"], unconsumed = []}

["whole","line"]
Insertion ["whole","line"]
Editing (Edit Copy Whole Line)
Report {position = 2, expected = [], unconsumed = []}

["copy","whole","line"]
Editing (Edit Copy Whole Line)
Editing (Edit Copy Whole Line)
Insertion ["copy","whole","line"]
Report {position = 3, expected = [], unconsumed = []}

["not","an","edit"]
Insertion ["not","an","edit"]
Report {position = 3, expected = [], unconsumed = []}

-} 

Capture result as well as matched tokens

It would be nice to be able to capture the result of a production as well as the text matched by it, for example:

match :: Prod r e t a -> Prod r e t ([t], a)

Unusable {-# UNPACK #-} pragmas.

[2 of 4] Compiling Text.Earley.Parser ( Text/Earley/Parser.hs, dist/build/Text/Earley/Parser.o )

Text/Earley/Parser.hs:83:3: Warning:
    Ignoring unusable UNPACK pragma on the third argument of ‘State’
    In the definition of data constructor ‘State’
    In the data declaration for ‘State’

Text/Earley/Parser.hs:92:3: Warning:
    Ignoring unusable UNPACK pragma on the second argument of ‘Cont’
    In the definition of data constructor ‘Cont’
    In the data declaration for ‘Cont’

Text/Earley/Parser.hs:92:3: Warning:
    Ignoring unusable UNPACK pragma on the fourth argument of ‘Cont’
    In the definition of data constructor ‘Cont’
    In the data declaration for ‘Cont’

In all cases, those arguments are Arg values, and Arg is a type alias for a function type. Function types can't be unpacked, since they aren't thin wrappers around primitives.

Crashes if grammar is infinitely ambiguous on given input

Consider this code:

{-# LANGUAGE Haskell2010 #-}
{-# LANGUAGE RecursiveDo #-}

import Text.Earley
import Control.Applicative

data L = A | B | C deriving (Eq, Show)

lang :: Grammar r (Prod r () L ())
lang = mdo {
  r1 <- rule $ (pure () <* token A) <|> r1;
  return r1;
}

main :: IO ()
main = do {
  putStrLn $ show $ fullParses (parser lang) [A];
}

This program crashes with stack overflow if I run it in ghci. And freezes if I compile and run it. I cannot even test whether list of parses is null, i. e. null $ fst $ fullParses (parser lang) [A] crashes in ghci.

I want some function, which can test whether given token list has multiple parses. I. e. I don't want list of parses or even count of parses. I simply want one of 3 results: 0 parses, 1 parse, more than 1 parse. And this function should terminate in case of infinitely ambiguous grammar.

I use Earley 0.13.0.1

Kind error trying examples/expr.hs at GHCi (7.10.2 with Stack)

Trying out the mentioned example, I encountered this error in stack lts-3.15 GHC 7.10.2 using GHCi:

Expecting one more argument to ‘Grammar r (Prod r String String Expr)’
Expected a type,
  but ‘Grammar r (Prod r String String Expr)’ has kind ‘* -> *’
In the type signature for ‘expr’:
  expr :: Grammar r (Prod r String String Expr)

Sorry to ask here, I'm assuming it's something I've done wrong, but I'm not sure what it is.

I made sure to copy the example verbatim from the github raw representation. Any help would be greatly appreciated!

Wrong(?) result on some grammars with infinite results

grammar = mdo
  r <- rule $ (pure 1 <|> ((+1) <$> r))
  pure r

produces just 1, not [1,2,3...]

I don't think it's possible to have this work and

grammar = mdo
  r <- rule $ (pure 1 <|> r)
  pure r

not produce an infinite list of results

I think this is a sensible choice because it currently can't deal with infinite results, but it should be documented somewhere. Theoretically it should be possible to return an infinite lazy list.

INDENT and DEDENT

Is there a way to match INDENT/DEDENT or make parsers take an indentation level?

func foo(a: A, b: B): C =
  theresSomeIndentRequiredHere(a, b)

func noMoreIndentMeansNoMoreFoo: D = ???

Add a `eof` terminal

Is it possible to have

eof :: Prod r e t ()

that matches the end of input?

I have a grammar that accepts expressions in parentheses:

    exprInBrackets <- rule $
      ExprInBrackets
        <$> leftBrack
        <*> ntExpr
        <*> rightBrack
        <?> "parenthesized expression"

And, given ([2]) it will produce the following value:

let inner = ExprInBrackets "[" (ExprNum 2) "]"
in ExprInBrackets "(" inner ")"

I'd like to modify it like this:

    exprInBrackets <- rule $
      ExprInBrackets
        <$> leftParen
        <*> ntExpr
        <*> (rightParen <|> ("" <$ eof))
        <?> "parenthesized expression"

so that expressions with unbalanced parentheses can be parsed. Given ([2 it would produce

let inner = ExprInBrackets "[" (ExprNum 2) ""
in ExprInBrackets "(" inner ""

cc @artemohanjanyan

Control.Applicative.optional not working

Hei Olle!

It drove me crazy to remove left recursion in Parsec. Therefore a big thank you for writing Earley.
Maybe it will become a thing in the future. ;)

I am trying to parse a toy programming language but I had some problems with "optional" from Control.Applicative.

As an example, I want to parse an optional "a" before the letter "b".

{-# LANGUAGE RecursiveDo #-}
import Text.Earley
import Control.Applicative

grammar = mdo
    test <- rule $ (,) <$> (optional $ namedSymbol 'a') <*> namedSymbol 'b'
    return test

parse xs = fullParses $ parser grammar xs

Now parse "b" results in a Report with expected "a", unconsumed "b". But my desired result would be (Nothing, 'b').

test-suite compilation failure when building from tarball

From building stackage nightly. I'll add this test-suite to expected failures for now.

Some modules seem to be missing from other-modules:

reprocessing library Earley-0.11.0.0...
[1 of 6] Compiling Text.Earley.Grammar ( Text/Earley/Grammar.hs, dist/build/Text/Earley/Grammar.o )
[2 of 6] Compiling Text.Earley.Derived ( Text/Earley/Derived.hs, dist/build/Text/Earley/Derived.o )
[3 of 6] Compiling Text.Earley.Internal ( Text/Earley/Internal.hs, dist/build/Text/Earley/Internal.o )
[4 of 6] Compiling Text.Earley.Parser ( Text/Earley/Parser.hs, dist/build/Text/Earley/Parser.o )
[5 of 6] Compiling Text.Earley      ( Text/Earley.hs, dist/build/Text/Earley.o )
[6 of 6] Compiling Text.Earley.Mixfix ( Text/Earley/Mixfix.hs, dist/build/Text/Earley/Mixfix.o )
In-place registering Earley-0.11.0.0...
Preprocessing test suite 'tests' for Earley-0.11.0.0...

tests/Main.hs:4:18:
    Could not find module ‘Empty’
    Use -v to see a list of the files searched for.

tests/Main.hs:5:18:
    Could not find module ‘Expr’
    Use -v to see a list of the files searched for.

tests/Main.hs:6:18:
    Could not find module ‘InlineAlts’
    Use -v to see a list of the files searched for.

tests/Main.hs:7:18:
    Could not find module ‘Issue11’
    Use -v to see a list of the files searched for.

tests/Main.hs:8:18:
    Could not find module ‘Issue14’
    Use -v to see a list of the files searched for.

tests/Main.hs:9:18:
    Could not find module ‘Mixfix’
    Use -v to see a list of the files searched for.

tests/Main.hs:10:18:
    Could not find module ‘Optional’
    Perhaps you meant
      Options (needs flag -package-key options-1.2.1.1@optio_CtVRCDNbaSzCLFqvgui815)
    Use -v to see a list of the files searched for.

tests/Main.hs:11:18:
    Could not find module ‘ReversedWords’
    Use -v to see a list of the files searched for.

tests/Main.hs:12:18:
    Could not find module ‘VeryAmbiguous’
    Use -v to see a list of the files searched for.

Please add tags "test ambiguity" etc

So far I was able to find one library on Hackage for testing ambiguity of arbitrary CFGs using brute force: "Earley". If you know any other library, please, let me know.

Unfortunately I discovered this library too late, so I had to roll out my own ugly solution for ambiguity checking. So, please, make this library discoverable. I want this library to show up if I type "test ambiguity" and similar queries in http://hackage.haskell.org/packages/search . As well as I understand you need to add tags to do this. So, please, add following tags: "test ambiguity of CFG", "test ambiguity of context-free grammar", "check ambiguity of CFG" etc. And mention this feature in README

Generalize `word`?

Should word be generalized to

word :: (Eq t, ListLike i t) => i -> Prod r e t i

instead of

word :: Eq t => [t] -> Prod r e t [t]

?
This would make it easier to use values of type Text in particular.

Change type of `satisfy`

Would it be possible to change the type of satisfy to something like

satisfy :: (t -> Maybe a) -> Prod r e t a

This would make it easier to have a type like

module Token (Token, mkToken, unToken) where

newtype Token = Token { unToken :: Char }

mkToken :: Char -> Maybe Token
mkToken x
  | x >= 'a' || x <= 'z' = Just (Token x)
  | otherwise = Nothing

and have satisfy work as expected. (You can do this now, but with partial function nastiness.) Currently this is what Parsec's tokenPrim looks like.

add IsString instance

like:

instance (IsString t, Eq t, a ~ t) => IsString (Prod r e t a) where
 fromString s = Terminal (== t) (pure id)

and/or even:

instance (IsString t) => IsList (Prod r e t a) where
 type Item (Prod r e t a) = Prod r e t a
 fromList ps = Alts ps (pure id)
 toList = \case
  Alts ps f -> fmap (f <*>) ps
  p -> [p]

(away from my computer, haven't type checked)

The IsString instance increases the readability of the DSL, has a single simple implemention (and thus avoids forcing orphans on users), and doesn't confuse user when they don't enable OverloadStrings.

Is it okey to use "many" and heavily use Applicative parsing techniques?

Is it normal to use Alternative's many? For example, x1 <- rule $ many x2? Is it normal to heavily use techniques known from Applicative parsing world, for example, to write something like the following code?

{-# LANGUAGE RecursiveDo #-}
import Control.Applicative
import Data.Char
import System.Environment
import Text.Earley

data Expr
  = Add Expr Expr
  | Mul Expr Expr
  | Var String
  deriving (Eq, Ord, Show)

sepBy :: Alternative f => f a -> f b -> f [b]
sepBy op x = (:) <$> x <*> many (op *> x)

expr :: Grammar r (Prod r String String Expr)
expr = mdo
  x1 <- rule $ foldl1 Add <$> sepBy (namedToken "+") (foldl1 Mul <$> sepBy (namedToken "*") ((Var <$> satisfy ident) <|> namedToken "(" *> x1 <* namedToken ")"))
  return x1
  where
    ident (x:_) = isAlpha x
    ident _     = False

main :: IO ()
main = do
  x:_ <- getArgs
  print $ fullParses (parser expr) $ words x

The code above is modified version of https://github.com/ollef/Earley/blob/e7de8b77b63e1ee889d98cb6dfc2994178f23e6d/examples/Expr.hs and I hope they are semantically equivalent. I tested my version and it works.

My question is not about performance. It is about semantics. I. e. is code above correct or it works simply by chance?

Choice with preference

It would be nice to be able to say a otherwise b such that only if a fails, b appears in the output.

Early seems to do an ad-hoc version of this for (Just <$> a) <|> pure Nothing, which returns [Just a] if a succeeds, not the expected [Just a, Nothing].

Is there any possible optimization for getting non-exhaustive outputs for ambiguous grammars

I'm parsing an ambiguous grammar and I'd like to distinguish the following cases (more quickly than dealing with a list of all results):

  • Failure to parse
  • Exactly one parse with result
  • Multiple parses (potentially with some small number of valid parses returned)

The use case is being able to tell the user: "Hey, you gave me ambiguous input, here are some places to check for ambiguity". Giving them 20000 results isn't helpful, especially as any large number is bound to be the result of some combinatorial explosion, and solving one ambiguity makes good progress.

I had a quick go at modifying fullParses in a few ways to prompt an earlier return but there wasn't a quick "no-understanding" fix.

rebuild the documentation with --hyperlinked-source

the feature is only available in haddock's master branch. for example, you can run:

cabal haddock --with-haddock="$HOME/haddock/.cabal-sandbox/bin/haddock" --haddock-options="--hyperlinked-source"

after building haddock from source, in a sandbox (https://github.com/haskell/haddock):

git clone https://github.com/haskell/haddock.git
cd haddock                                       # e.g. ~/haddock
cabal sandbox init
cabal sandbox add-source haddock-library
cabal sandbox add-source haddock-api
cabal install haddock-library haddock-api
cabal install -j4 --dependencies-only --enable-tests
cabal configure --enable-tests
cabal build -j4

or for now, you could just reupload the package with the dist/doc/ I built. I can send it to you if you send me an e-mail at redacted

(you don't really have to, but I've been using it to explore the code locally, and others might find it helpful too)

Something like Ruby Slippers parsing

Terminology from the Marpa parser.

I have no idea what Marpa's implementation is, but perhaps this would work for this package:

In the same way that names are specified for tokens at specific positions, we could maintain a set of slippers, which are utilized when no terminals match in order to rescue the parse.

Report doesn't include enough information

I was working on AoC problem (https://adventofcode.com/2021/day/10) recently and saw that recognizing whether an input matches a grammar was part of it. (I ended up going a different direction to solve it, but...)

Since I thought I only needed to identify whether something parsed or not, I wanted to use report to test acceptance without bothering with parse results. But.. that actually isn't sufficient. At least, not unless you add names to the productions, and in sufficient granularity. If you have skipped that portion for your grammar, the Report value doesn't give sufficient information to determine whether something parsed in its entirety - only whether it ran into an error or was able to consume the entire input without errors. The only way to determine if any parse ran to completion is checking whether fullParses returned any parses.

It would be really nice if report returned some way to determine whether there was a full parse without requiring sufficiently granular naming of productions.

Odd mixfix behavior

Just curious what you think about this behavior:

When defining an operator if _ _ with right associativity, then

if if x y z 

will be successfully parsed as

if (if x y) z

but

if x if y z

will fail to parse, requiring explicit parentheses.

Doesn't this seem backwards? I would expect the above behavior if the operator had left associativity.

So, does associativity even make sense for operators that are not of the form _a_b_c_d_? Are two Nothings in a row when defining a mixfix operator supposed to be supported?

document implementation

I want to contribute a few ideas I have for new features. but after reading the source, I still can't understand how parse works.

in http://www.nltk.org/book/ch08-extras.html (2 Chart Parsing and 2.7 The Earley Algorithm), the concepts seem to be:

  1. edges between tokens, complete or non-complete
  2. rules, which add new edges given the current edges
  3. a chart, a structure that holds edges
  4. strategies, this set of rules that differentiate chart parsers from each other

but I don't see these in the types (State/Cont/Conts/Result). if they're not really there, how should I think about the types? is there a blog post or paper I could read that more closely reflects the core ideas behind this package?

also, it helps to know about:

  1. any patterns this package uses, or tricks. e.g. why the nested reference in Conts:
     contsArgs :: !(STRef s (Maybe (STRef s (Results s a))))

is it for performance, for correctness? Everything is there for a reason, but it's hard to read for a noob like me.

thanks :-)

feature: constraint validation of production parse result

Hello.

I did not find how to implement some sort of constraints on production. For example if I wrote a rule to parse a series of digits and wanted to validate that if falls into a specific range, what would I need to do? Of course I can check the values after the parser ends, but the context is lost in such case like position where error occurs. I see no functions in current interface I can use to do that and also I have not find something similar in the examples.
It probably could be something function like:

validate :: (a -> Bool) -> Prod r e t a -> Prod r e t a

or

validate :: (a -> Maybe b) -> Prod r e t a -> Prod r e t b

so when the predicate fails or returns Nothing, the parser fails for that production. And we can use it like:

num <- rule $ validate (<= 1234) $ read <$> some digit

Or may be some sort of extended rule function with validation:

ruleValidate :: (a -> Bool) -> Prod r e t a -> Grammar r (Prod r e t a)
...
num <- ruleValidate (<= 1234) $ read <$> some digit

Use Data.Loc rather than Int

Would it be possible to tag parsed items with a Data.Loc.L rather than an Int? Having both the column and row number is very useful.

More specifically,

allParses :: (forall s. ST s (Result s e i a)) -> ([(a, Int)], Report e i)

would become

allParses :: (forall s. ST s (Result s e i a)) -> ([L a], Report e i)

Currently, I'm lexing with https://hackage.haskell.org/package/lexer-applicative, which returns a stream of L-tagged lexemes anyway, and then I'm awkwardly unpacking and re-packing the locations into the parsed items. Here's a small example:

-- Token
data Token 
    = TkInt Int
    | ...

-- Lexer
lexer :: String -> [L Token]

-- Syntax
data Expr 
    = EInt Int
    ...

-- Parser
expr :: Prod r String (L Token) (L Expr)
expr = (\L loc n -> L loc (EInt n)) <$> intLit
   <|> ...

intLit :: Prod r String (L Token) (L Int)
intLit = (\L loc (TkInt n) -> L loc n) <$> satisfy isIntLit <?> "int"
  where
    isIntLit (L _ (TkInt _)) = True
    isIntLit _ = False

(apologies if the above doesn't typecheck, I just wrote it by hand)

If Earley annotated parsed items with L then I could simply forget all the work that the lexer did by mapping unLoc over the stream of tokens, so I'd have productions like Prod r String Token Expr rather than Prod r String (L Token) (L Expr). I wonder if there's a better way to integrate these two libraries.

Parsing into a list delimited by a token

I'm a very novice Haskell programmer and I wanted to write something with parsers. My current solution for parsing abc.def.ghi uses the split package like so:

dparse :: (String -> Bool) -> Prod r String String [String]
dparse f = splitOn "." <$> satisfy f

I feel like, from my Scala+cats knowledge, that this is a very poor way to do this. How should I properly implement it?

Extensible languages?

Hi,

I know this might be impossible with this library directly, but I was wondering how to allow the user to declare their own mixfix identifiers in the document being parsed. In parsec, or something, because the parsers are monadic I can simply keep updating my table as I parse more information, but as this library is strictly only for context-free languages, it makes this approach not feasible.

I guess I'd have to partially parse the file and grab all the declarations first, and then parse the document properly with earley? Is there an easier way?

Adding another choice with <|> causes parse to fail

I'm writing a parser to parse either 0) a sequence of digits with no grouping characters, or 1) a sequence of digits with grouping characters like commas.

I'm getting some behavior I wouldn't expect, but I'm too ignorant to know if this is a bug. I have tried to boil this down to a simple example, as follows:

{-# LANGUAGE RecursiveDo, RankNTypes #-}
module Main where

import Control.Applicative
import Text.Earley
import System.Environment

data Zero = Zero deriving (Eq, Show)
data Comma = Comma deriving (Eq, Show)

list :: Prod r e t a -> Grammar r (Prod r e t ([a]))
list p = mdo
  r <- rule $ pure [] <|> (:) <$> p <*> r
  return r

comma :: Prod r String Char Comma
comma = Comma <$ symbol ',' <?> ","

zero :: Prod r String Char Zero
zero = Zero <$ symbol '0' <?> "0"

data Grouped = Grouped Zero [Zero] Comma Zero [(Comma, Zero, [Zero])]
  deriving (Eq, Show)

data Ungrouped = Ungrouped Zero [Zero] deriving (Eq, Show)

groupedOrUngrouped :: Grammar r (Prod r String Char (Either Grouped Ungrouped))
groupedOrUngrouped = mdo
  seqZero <- list zero
  seqGroups <- list $ (,,) <$> comma <*> zero <*> seqZero
  let l = Grouped <$> zero <*> seqZero <*> comma <*> zero <*> seqGroups
      r = Ungrouped <$> zero <*> seqZero
  rule $ Left <$> l <|> Right <$> r

grouped :: Grammar r (Prod r String Char Grouped)
grouped = mdo
  seqZero <- list zero
  seqGroups <- list $ (,,) <$> comma <*> zero <*> seqZero
  rule $ Grouped <$> zero <*> seqZero <*> comma <*> zero <*> seqGroups

main :: IO ()
main = do
  arg:[] <- getArgs
  print (allParses (parser groupedOrUngrouped) arg)
  putStrLn ""
  print (allParses (parser grouped) arg)

Here is some test output:

$ ./earleytest.hs 0,0
([(Right (Ungrouped Zero []),1)],Report {position = 1, expected = ["0"], unconsumed = ",0"})

([(Grouped Zero [] Comma Zero [],3)],Report {position = 3, expected = [","], unconsumed = ""})

I would expect that a Grouped value would appear when using the groupedOrUngrouped parser. I don't get a Grouped value there, but I do get one when using the grouped parser alone. Why? Any help is much appreciated. stack enabled source is also at

https://gist.github.com/massysett/b132c660c34955533795

Disambiguate/Filter productions

Once the parser finishes parsing some ambiguity, it would be really nice to expose this ambiguity to the user, rather than just multiplying the output by this ambiguity size.

For example disambiguate :: ([a] -> b) -> Prod a -> Prod b or more generally ([a] -> [b]) -> Prod a -> Prod b.

I had a look at naively changing some things in parse and following the types but was unable to get this to work without better understanding the algorithm.

The discussion here makes it seem as though this is actually non-trivial! #18

Perhaps some important reading: https://dinhe.net/~aredridel/.notmine/PDFs/Parsing/SCOTT%2C%20Elizabeth%20-%20SPPF-Style%20Parsing%20From%20Earley%20Recognizers.pdf

Suggestion: Make it so that `<$` rules return at most one result.

In my grammars, sometimes I'll having something along the lines of many (whitespace) *> optional (stuff) <* many (whitespace). The issue then is that if I am parsing, say, 4 spaces, many identical Nothing results are returned, based on the different ways the whitespace on the left and right could be matched. One solution is to rewrite it as many (whitespace) *> optional (stuff <* many (whitespace)), but when you have complicated grammar its hard to coordinate this.

I had an idea for how to modify the library to accommodate this problem. Basically, a new kind of production rule that returns at most one result. Which result? Well, the idea is that it would only work on rules such that all their results are identical. That is, they would only work on rules of the form a <$ r. That way, it does not matter what result is returned. Here are some examples of equations they would follow:

  • pure a = a <$ pure ()
  • fmap f (a <$ r) = (f a) <$ r
  • (f <$ r1) <*> (a <$ r2) = f a <$ (r1 *> r2)
  • r1 <* r2 = r1 <* (() <$ r2)
  • etc...

Then, in our many (whitespace) *> optional (stuff) <* many (whitespace) example, many (whitespace) *> pure Nothing <* many (whitespace) would reduce to Nothing <$ (many (whitespace) *> pure () <* many (whitespace)). Since this is what the 4 spaces are matching against, at most one Nothing result would be returned.

I was thinking about digging into the code and creating a pull request, but before that I wanted to see if it was seen as worthwhile, or if I was missing something obvious.

Diverges with sepEndBy from parser-combinator

Hello. Thank you for this great library.

I think I found a bug. The grammar below diverges if I use default sepEndBy from parser-combinators:

-- Does not work, but sepEndBy only requires an Applicative instance and we have it
grammar' = rule $ sepEndBy (satisfy (== OpenBracket)) (satisfy (== Dot))
-- but with this one everything is ok
sepEndBy :: Prod r e t a -> Prod r e t b -> Prod r e t [a]
-- btw sepBy can come from parser-combinators, and still no problem here
sepEndBy p sep = sepBy p sep <* optional sep

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.