ollef / earley Goto Github PK
View Code? Open in Web Editor NEWParsing all context-free grammars using Earley's algorithm in Haskell.
License: BSD 3-Clause "New" or "Revised" License
Parsing all context-free grammars using Earley's algorithm in Haskell.
License: BSD 3-Clause "New" or "Revised" License
I noticed that the Monoid and Semigroup instances for Prod duplicate the Alternative instance. I found that unexpected. I was expecting them to lift the element instances over the Applicative instance, a la (<>) = liftA2 (<>)
. That's the behavior of those instances in many other parsing libraries.
Changing the instances now would be awkward, of course. People might not see the change and think that just because their grammars still compile, the code still is correct. I'm not really asking for a change. Just wondering why this design was chosen, and suggesting that a change is something you might consider if you ever make large-scale changes that would break things in other ways.
Are there any recommendations on how I could do error recovery to report multiple errors?
Consider the following toy example:
import Control.Applicative
import Text.Earley
data S = S [Char] Char [T] deriving Show
data T = T Char S deriving Show
a, b, c :: Prod r String Char Char
a = satisfy (=='a')
b = satisfy (=='b')
c = satisfy (=='c')
-- "abcb" -> S [a] b [T c (S [] b [])]
-- "abc" -> last 'c' should be unused
-- "abcc" -> parsing all of it without leaving unconsumed pieces should not be possible
-- "abcbcab" -> S [a] b [T c (S [] b [T c (S [a] b)])] OR S [a] b [T c (S [] b []), T c (S [a] b [])]
--s = S <$> many a <*> b <*> many t -- TODO
s = S <$> many a <*> b <*> pure []
t = T <$> c <*> s
main = print (report p "abc") >> print (fullParses p "abc")
where p = parser $ rule s
As far as I can tell, this grammar should be parsable. Every string only has a finite number of possible parse trees.
Yet parsing fails if I enable the commented line (marked with a TODO).
Am I doing something wrong? Or is this a bug/limitation in Earley?
Intuitively, it feels like it should be possible for your library to generate all strings (as a potentially infinite list) that a grammar would parse, as well as all parsed values that those strings would generate. Is that something you've considered or would consider adding?
below, edit
has nullable rules, and insertion
has a wildcard. the parse results are all present, but some alternatives that appear before other alternatives ("lexically" in the grammar), appear later in the results.
e.g. given (full script is below):
Editing <$> edit
<|> Insertion <$> insertion
we get:
["whole","line"]
Insertion ["whole","line"]
Editing (Edit Copy Whole Line)
rather than:
["whole","line"]
Editing (Edit Copy Whole Line)
Insertion ["whole","line"]
I don't know if early parsers specify the order of the results for an ambiguous grammar, but it's how other parser libraries behave. and either way, would be nice to have: my hack is to postprocess the set of results, and rank them. this is messier (as ranking can be "nonlocal"), and maybe slower, than the first result being the expected one.
to reproduce:
import Control.Applicative
import Text.Earley
data Edit = Edit Action Slice Region deriving (Show)
data Action = Copy | Delete | Cut deriving (Show)
data Slice = Whole | Backwards | Forwards deriving (Show)
data Region = Char | Word | Line deriving (Show)
type Text = String
edit :: Grammar r (Prod r String Text Edit)
edit = do
action <- rule $ "action"
<=> Copy <$ symbol "copy"
<|> Delete <$ symbol "delete"
<|> Cut <$ symbol "cut"
slice <- rule $ "slice"
<=> Whole <$ symbol "whole"
<|> Backwards <$ symbol "backwards"
<|> Forwards <$ symbol "forwards"
region <- rule $ "region"
<=> Char <$ symbol "char"
<|> Word <$ symbol "word"
<|> Line <$ symbol "line"
edit' <- rule $ "edit"
<=> Edit <$> action <*> (slice -?- Whole) <*> (region -?- Line)
<|> Edit <$> (action -?- Copy) <*> slice <*> region
-- <=> Edit <$> action <*> slice <*> region
return edit'
data Command
= Editing Edit
| Insertion [Text]
deriving (Show)
command :: Grammar r (Prod r String Text Command)
command = do
edit' <- edit
insertion' <- rule $ some anyWord
command' <- rule $ "command"
<=> Editing <$> edit'
<|> Insertion <$> insertion'
return command'
anyWord :: Prod z String Text Text
anyWord = satisfy (const True)
parseCommand :: [Text] -> ([Command], Report String [Text])
parseCommand s = fullParses (parser command) s
(<=>) :: e -> Prod r e t a -> Prod r e t a
(<=>) = flip (<?>)
infix 2 <=>
(-?-) :: Prod r e t a -> a -> Prod r e t a
(-?-) p x = maybe x id <$> optional p
printParses :: (Show a, Show e, Show ts) => (ts -> ([a], Report e ts)) -> ts -> IO ()
printParses theParser theSentence = do
let (theResults, theReport) = theParser theSentence
putStrLn""
print theSentence
_ <- traverse print theResults
print theReport
main :: IO ()
main = do
putStrLn""
printParses parseCommand (words "line")
printParses parseCommand (words "copy")
printParses parseCommand (words "copy whole")
printParses parseCommand (words "whole line")
printParses parseCommand (words "copy whole line")
printParses parseCommand (words "not an edit")
{- outputs:
["line"]
Insertion ["line"]
Report {position = 1, expected = [], unconsumed = []}
["copy"]
Editing (Edit Copy Whole Line)
Insertion ["copy"]
Report {position = 1, expected = ["region","slice"], unconsumed = []}
["copy","whole"]
Insertion ["copy","whole"]
Editing (Edit Copy Whole Line)
Report {position = 2, expected = ["region"], unconsumed = []}
["whole","line"]
Insertion ["whole","line"]
Editing (Edit Copy Whole Line)
Report {position = 2, expected = [], unconsumed = []}
["copy","whole","line"]
Editing (Edit Copy Whole Line)
Editing (Edit Copy Whole Line)
Insertion ["copy","whole","line"]
Report {position = 3, expected = [], unconsumed = []}
["not","an","edit"]
Insertion ["not","an","edit"]
Report {position = 3, expected = [], unconsumed = []}
-}
It would be nice to be able to capture the result of a production as well as the text matched by it, for example:
match :: Prod r e t a -> Prod r e t ([t], a)
Similar to Parsec's makeTokenParser.
There might also be other utilities in existing parsing libraries that could be added to Earley.
[2 of 4] Compiling Text.Earley.Parser ( Text/Earley/Parser.hs, dist/build/Text/Earley/Parser.o )
Text/Earley/Parser.hs:83:3: Warning:
Ignoring unusable UNPACK pragma on the third argument of ‘State’
In the definition of data constructor ‘State’
In the data declaration for ‘State’
Text/Earley/Parser.hs:92:3: Warning:
Ignoring unusable UNPACK pragma on the second argument of ‘Cont’
In the definition of data constructor ‘Cont’
In the data declaration for ‘Cont’
Text/Earley/Parser.hs:92:3: Warning:
Ignoring unusable UNPACK pragma on the fourth argument of ‘Cont’
In the definition of data constructor ‘Cont’
In the data declaration for ‘Cont’
In all cases, those arguments are Arg
values, and Arg
is a type alias for a function type. Function types can't be unpacked, since they aren't thin wrappers around primitives.
Consider this code:
{-# LANGUAGE Haskell2010 #-}
{-# LANGUAGE RecursiveDo #-}
import Text.Earley
import Control.Applicative
data L = A | B | C deriving (Eq, Show)
lang :: Grammar r (Prod r () L ())
lang = mdo {
r1 <- rule $ (pure () <* token A) <|> r1;
return r1;
}
main :: IO ()
main = do {
putStrLn $ show $ fullParses (parser lang) [A];
}
This program crashes with stack overflow if I run it in ghci. And freezes if I compile and run it. I cannot even test whether list of parses is null, i. e. null $ fst $ fullParses (parser lang) [A]
crashes in ghci.
I want some function, which can test whether given token list has multiple parses. I. e. I don't want list of parses or even count of parses. I simply want one of 3 results: 0 parses, 1 parse, more than 1 parse. And this function should terminate in case of infinitely ambiguous grammar.
I use Earley 0.13.0.1
Trying out the mentioned example, I encountered this error in stack lts-3.15 GHC 7.10.2 using GHCi:
Expecting one more argument to ‘Grammar r (Prod r String String Expr)’
Expected a type,
but ‘Grammar r (Prod r String String Expr)’ has kind ‘* -> *’
In the type signature for ‘expr’:
expr :: Grammar r (Prod r String String Expr)
Sorry to ask here, I'm assuming it's something I've done wrong, but I'm not sure what it is.
I made sure to copy the example verbatim from the github raw representation. Any help would be greatly appreciated!
grammar = mdo
r <- rule $ (pure 1 <|> ((+1) <$> r))
pure r
produces just 1
, not [1,2,3...]
I don't think it's possible to have this work and
grammar = mdo
r <- rule $ (pure 1 <|> r)
pure r
not produce an infinite list of results
I think this is a sensible choice because it currently can't deal with infinite results, but it should be documented somewhere. Theoretically it should be possible to return an infinite lazy list.
Is there a way to match INDENT/DEDENT or make parsers take an indentation level?
func foo(a: A, b: B): C =
theresSomeIndentRequiredHere(a, b)
func noMoreIndentMeansNoMoreFoo: D = ???
Is it possible to have
eof :: Prod r e t ()
that matches the end of input?
I have a grammar that accepts expressions in parentheses:
exprInBrackets <- rule $
ExprInBrackets
<$> leftBrack
<*> ntExpr
<*> rightBrack
<?> "parenthesized expression"
And, given ([2])
it will produce the following value:
let inner = ExprInBrackets "[" (ExprNum 2) "]"
in ExprInBrackets "(" inner ")"
I'd like to modify it like this:
exprInBrackets <- rule $
ExprInBrackets
<$> leftParen
<*> ntExpr
<*> (rightParen <|> ("" <$ eof))
<?> "parenthesized expression"
so that expressions with unbalanced parentheses can be parsed. Given ([2
it would produce
let inner = ExprInBrackets "[" (ExprNum 2) ""
in ExprInBrackets "(" inner ""
Hei Olle!
It drove me crazy to remove left recursion in Parsec. Therefore a big thank you for writing Earley.
Maybe it will become a thing in the future. ;)
I am trying to parse a toy programming language but I had some problems with "optional" from Control.Applicative.
As an example, I want to parse an optional "a" before the letter "b".
{-# LANGUAGE RecursiveDo #-}
import Text.Earley
import Control.Applicative
grammar = mdo
test <- rule $ (,) <$> (optional $ namedSymbol 'a') <*> namedSymbol 'b'
return test
parse xs = fullParses $ parser grammar xs
Now parse "b"
results in a Report with expected "a", unconsumed "b". But my desired result would be (Nothing, 'b').
From building stackage nightly. I'll add this test-suite to expected failures for now.
Some modules seem to be missing from other-modules
:
reprocessing library Earley-0.11.0.0...
[1 of 6] Compiling Text.Earley.Grammar ( Text/Earley/Grammar.hs, dist/build/Text/Earley/Grammar.o )
[2 of 6] Compiling Text.Earley.Derived ( Text/Earley/Derived.hs, dist/build/Text/Earley/Derived.o )
[3 of 6] Compiling Text.Earley.Internal ( Text/Earley/Internal.hs, dist/build/Text/Earley/Internal.o )
[4 of 6] Compiling Text.Earley.Parser ( Text/Earley/Parser.hs, dist/build/Text/Earley/Parser.o )
[5 of 6] Compiling Text.Earley ( Text/Earley.hs, dist/build/Text/Earley.o )
[6 of 6] Compiling Text.Earley.Mixfix ( Text/Earley/Mixfix.hs, dist/build/Text/Earley/Mixfix.o )
In-place registering Earley-0.11.0.0...
Preprocessing test suite 'tests' for Earley-0.11.0.0...
tests/Main.hs:4:18:
Could not find module ‘Empty’
Use -v to see a list of the files searched for.
tests/Main.hs:5:18:
Could not find module ‘Expr’
Use -v to see a list of the files searched for.
tests/Main.hs:6:18:
Could not find module ‘InlineAlts’
Use -v to see a list of the files searched for.
tests/Main.hs:7:18:
Could not find module ‘Issue11’
Use -v to see a list of the files searched for.
tests/Main.hs:8:18:
Could not find module ‘Issue14’
Use -v to see a list of the files searched for.
tests/Main.hs:9:18:
Could not find module ‘Mixfix’
Use -v to see a list of the files searched for.
tests/Main.hs:10:18:
Could not find module ‘Optional’
Perhaps you meant
Options (needs flag -package-key options-1.2.1.1@optio_CtVRCDNbaSzCLFqvgui815)
Use -v to see a list of the files searched for.
tests/Main.hs:11:18:
Could not find module ‘ReversedWords’
Use -v to see a list of the files searched for.
tests/Main.hs:12:18:
Could not find module ‘VeryAmbiguous’
Use -v to see a list of the files searched for.
So far I was able to find one library on Hackage for testing ambiguity of arbitrary CFGs using brute force: "Earley". If you know any other library, please, let me know.
Unfortunately I discovered this library too late, so I had to roll out my own ugly solution for ambiguity checking. So, please, make this library discoverable. I want this library to show up if I type "test ambiguity" and similar queries in http://hackage.haskell.org/packages/search . As well as I understand you need to add tags to do this. So, please, add following tags: "test ambiguity of CFG", "test ambiguity of context-free grammar", "check ambiguity of CFG" etc. And mention this feature in README
Should word
be generalized to
word :: (Eq t, ListLike i t) => i -> Prod r e t i
instead of
word :: Eq t => [t] -> Prod r e t [t]
?
This would make it easier to use values of type Text
in particular.
Would it be possible to change the type of satisfy
to something like
satisfy :: (t -> Maybe a) -> Prod r e t a
This would make it easier to have a type like
module Token (Token, mkToken, unToken) where
newtype Token = Token { unToken :: Char }
mkToken :: Char -> Maybe Token
mkToken x
| x >= 'a' || x <= 'z' = Just (Token x)
| otherwise = Nothing
and have satisfy
work as expected. (You can do this now, but with partial function nastiness.) Currently this is what Parsec's tokenPrim
looks like.
like:
instance (IsString t, Eq t, a ~ t) => IsString (Prod r e t a) where
fromString s = Terminal (== t) (pure id)
and/or even:
instance (IsString t) => IsList (Prod r e t a) where
type Item (Prod r e t a) = Prod r e t a
fromList ps = Alts ps (pure id)
toList = \case
Alts ps f -> fmap (f <*>) ps
p -> [p]
(away from my computer, haven't type checked)
The IsString instance increases the readability of the DSL, has a single simple implemention (and thus avoids forcing orphans on users), and doesn't confuse user when they don't enable OverloadStrings.
Is it normal to use Alternative's many
? For example, x1 <- rule $ many x2
? Is it normal to heavily use techniques known from Applicative parsing world, for example, to write something like the following code?
{-# LANGUAGE RecursiveDo #-}
import Control.Applicative
import Data.Char
import System.Environment
import Text.Earley
data Expr
= Add Expr Expr
| Mul Expr Expr
| Var String
deriving (Eq, Ord, Show)
sepBy :: Alternative f => f a -> f b -> f [b]
sepBy op x = (:) <$> x <*> many (op *> x)
expr :: Grammar r (Prod r String String Expr)
expr = mdo
x1 <- rule $ foldl1 Add <$> sepBy (namedToken "+") (foldl1 Mul <$> sepBy (namedToken "*") ((Var <$> satisfy ident) <|> namedToken "(" *> x1 <* namedToken ")"))
return x1
where
ident (x:_) = isAlpha x
ident _ = False
main :: IO ()
main = do
x:_ <- getArgs
print $ fullParses (parser expr) $ words x
The code above is modified version of https://github.com/ollef/Earley/blob/e7de8b77b63e1ee889d98cb6dfc2994178f23e6d/examples/Expr.hs and I hope they are semantically equivalent. I tested my version and it works.
My question is not about performance. It is about semantics. I. e. is code above correct or it works simply by chance?
It would be nice to be able to say a otherwise b
such that only if a
fails, b
appears in the output.
Early seems to do an ad-hoc version of this for (Just <$> a) <|> pure Nothing
, which returns [Just a]
if a succeeds, not the expected [Just a, Nothing]
.
I'm parsing an ambiguous grammar and I'd like to distinguish the following cases (more quickly than dealing with a list of all results):
The use case is being able to tell the user: "Hey, you gave me ambiguous input, here are some places to check for ambiguity". Giving them 20000 results isn't helpful, especially as any large number is bound to be the result of some combinatorial explosion, and solving one ambiguity makes good progress.
I had a quick go at modifying fullParses
in a few ways to prompt an earlier return but there wasn't a quick "no-understanding" fix.
the feature is only available in haddock's master branch. for example, you can run:
cabal haddock --with-haddock="$HOME/haddock/.cabal-sandbox/bin/haddock" --haddock-options="--hyperlinked-source"
after building haddock from source, in a sandbox (https://github.com/haskell/haddock):
git clone https://github.com/haskell/haddock.git
cd haddock # e.g. ~/haddock
cabal sandbox init
cabal sandbox add-source haddock-library
cabal sandbox add-source haddock-api
cabal install haddock-library haddock-api
cabal install -j4 --dependencies-only --enable-tests
cabal configure --enable-tests
cabal build -j4
or for now, you could just reupload the package with the dist/doc/
I built. I can send it to you if you send me an e-mail at redacted
(you don't really have to, but I've been using it to explore the code locally, and others might find it helpful too)
Terminology from the Marpa parser.
I have no idea what Marpa's implementation is, but perhaps this would work for this package:
In the same way that names are specified for tokens at specific positions, we could maintain a set of slippers, which are utilized when no terminals match in order to rescue the parse.
I was working on AoC problem (https://adventofcode.com/2021/day/10) recently and saw that recognizing whether an input matches a grammar was part of it. (I ended up going a different direction to solve it, but...)
Since I thought I only needed to identify whether something parsed or not, I wanted to use report
to test acceptance without bothering with parse results. But.. that actually isn't sufficient. At least, not unless you add names to the productions, and in sufficient granularity. If you have skipped that portion for your grammar, the Report value doesn't give sufficient information to determine whether something parsed in its entirety - only whether it ran into an error or was able to consume the entire input without errors. The only way to determine if any parse ran to completion is checking whether fullParses
returned any parses.
It would be really nice if report
returned some way to determine whether there was a full parse without requiring sufficiently granular naming of productions.
Just curious what you think about this behavior:
When defining an operator if _ _
with right associativity, then
if if x y z
will be successfully parsed as
if (if x y) z
but
if x if y z
will fail to parse, requiring explicit parentheses.
Doesn't this seem backwards? I would expect the above behavior if the operator had left associativity.
So, does associativity even make sense for operators that are not of the form _a_b_c_d_
? Are two Nothing
s in a row when defining a mixfix operator supposed to be supported?
I want to contribute a few ideas I have for new features. but after reading the source, I still can't understand how parse
works.
in http://www.nltk.org/book/ch08-extras.html (2 Chart Parsing and 2.7 The Earley Algorithm), the concepts seem to be:
but I don't see these in the types (State/Cont/Conts/Result). if they're not really there, how should I think about the types? is there a blog post or paper I could read that more closely reflects the core ideas behind this package?
also, it helps to know about:
Conts
: contsArgs :: !(STRef s (Maybe (STRef s (Results s a))))
is it for performance, for correctness? Everything is there for a reason, but it's hard to read for a noob like me.
thanks :-)
Wouldn't it be nicer to have list = listLike
?
https://github.com/ollef/Earley/blob/master/Text/Earley/Derived.hs#L29
Hello.
I did not find how to implement some sort of constraints on production. For example if I wrote a rule to parse a series of digits and wanted to validate that if falls into a specific range, what would I need to do? Of course I can check the values after the parser ends, but the context is lost in such case like position where error occurs. I see no functions in current interface I can use to do that and also I have not find something similar in the examples.
It probably could be something function like:
validate :: (a -> Bool) -> Prod r e t a -> Prod r e t a
or
validate :: (a -> Maybe b) -> Prod r e t a -> Prod r e t b
so when the predicate fails or returns Nothing, the parser fails for that production. And we can use it like:
num <- rule $ validate (<= 1234) $ read <$> some digit
Or may be some sort of extended rule function with validation:
ruleValidate :: (a -> Bool) -> Prod r e t a -> Grammar r (Prod r e t a)
...
num <- ruleValidate (<= 1234) $ read <$> some digit
Would it be possible to tag parsed items with a Data.Loc.L
rather than an Int
? Having both the column and row number is very useful.
More specifically,
allParses :: (forall s. ST s (Result s e i a)) -> ([(a, Int)], Report e i)
would become
allParses :: (forall s. ST s (Result s e i a)) -> ([L a], Report e i)
Currently, I'm lexing with https://hackage.haskell.org/package/lexer-applicative, which returns a stream of L
-tagged lexemes anyway, and then I'm awkwardly unpacking and re-packing the locations into the parsed items. Here's a small example:
-- Token
data Token
= TkInt Int
| ...
-- Lexer
lexer :: String -> [L Token]
-- Syntax
data Expr
= EInt Int
...
-- Parser
expr :: Prod r String (L Token) (L Expr)
expr = (\L loc n -> L loc (EInt n)) <$> intLit
<|> ...
intLit :: Prod r String (L Token) (L Int)
intLit = (\L loc (TkInt n) -> L loc n) <$> satisfy isIntLit <?> "int"
where
isIntLit (L _ (TkInt _)) = True
isIntLit _ = False
(apologies if the above doesn't typecheck, I just wrote it by hand)
If Earley
annotated parsed items with L
then I could simply forget all the work that the lexer did by mapping unLoc
over the stream of tokens, so I'd have productions like Prod r String Token Expr
rather than Prod r String (L Token) (L Expr)
. I wonder if there's a better way to integrate these two libraries.
Any idea how hard this would be to do? If Earley could generate EBNF then users could pass it to tools like GrammKit to make nice diagrams for their grammar.
I'm a very novice Haskell programmer and I wanted to write something with parsers. My current solution for parsing abc.def.ghi
uses the split
package like so:
dparse :: (String -> Bool) -> Prod r String String [String]
dparse f = splitOn "." <$> satisfy f
I feel like, from my Scala+cats knowledge, that this is a very poor way to do this. How should I properly implement it?
Hi,
I know this might be impossible with this library directly, but I was wondering how to allow the user to declare their own mixfix identifiers in the document being parsed. In parsec, or something, because the parsers are monadic I can simply keep updating my table as I parse more information, but as this library is strictly only for context-free languages, it makes this approach not feasible.
I guess I'd have to partially parse the file and grab all the declarations first, and then parse the document properly with earley? Is there an easier way?
I'm writing a parser to parse either 0) a sequence of digits with no grouping characters, or 1) a sequence of digits with grouping characters like commas.
I'm getting some behavior I wouldn't expect, but I'm too ignorant to know if this is a bug. I have tried to boil this down to a simple example, as follows:
{-# LANGUAGE RecursiveDo, RankNTypes #-}
module Main where
import Control.Applicative
import Text.Earley
import System.Environment
data Zero = Zero deriving (Eq, Show)
data Comma = Comma deriving (Eq, Show)
list :: Prod r e t a -> Grammar r (Prod r e t ([a]))
list p = mdo
r <- rule $ pure [] <|> (:) <$> p <*> r
return r
comma :: Prod r String Char Comma
comma = Comma <$ symbol ',' <?> ","
zero :: Prod r String Char Zero
zero = Zero <$ symbol '0' <?> "0"
data Grouped = Grouped Zero [Zero] Comma Zero [(Comma, Zero, [Zero])]
deriving (Eq, Show)
data Ungrouped = Ungrouped Zero [Zero] deriving (Eq, Show)
groupedOrUngrouped :: Grammar r (Prod r String Char (Either Grouped Ungrouped))
groupedOrUngrouped = mdo
seqZero <- list zero
seqGroups <- list $ (,,) <$> comma <*> zero <*> seqZero
let l = Grouped <$> zero <*> seqZero <*> comma <*> zero <*> seqGroups
r = Ungrouped <$> zero <*> seqZero
rule $ Left <$> l <|> Right <$> r
grouped :: Grammar r (Prod r String Char Grouped)
grouped = mdo
seqZero <- list zero
seqGroups <- list $ (,,) <$> comma <*> zero <*> seqZero
rule $ Grouped <$> zero <*> seqZero <*> comma <*> zero <*> seqGroups
main :: IO ()
main = do
arg:[] <- getArgs
print (allParses (parser groupedOrUngrouped) arg)
putStrLn ""
print (allParses (parser grouped) arg)
Here is some test output:
$ ./earleytest.hs 0,0
([(Right (Ungrouped Zero []),1)],Report {position = 1, expected = ["0"], unconsumed = ",0"})
([(Grouped Zero [] Comma Zero [],3)],Report {position = 3, expected = [","], unconsumed = ""})
I would expect that a Grouped
value would appear when using the groupedOrUngrouped
parser. I don't get a Grouped
value there, but I do get one when using the grouped
parser alone. Why? Any help is much appreciated. stack
enabled source is also at
a user might want to observe sharing themselves (e.g. manual annotations, http://hackage.haskell.org/package/data-reify, their own monad, etc.), and pass it down to the parser.
We can put it in a .Internal
module.
Once the parser finishes parsing some ambiguity, it would be really nice to expose this ambiguity to the user, rather than just multiplying the output by this ambiguity size.
For example disambiguate :: ([a] -> b) -> Prod a -> Prod b
or more generally ([a] -> [b]) -> Prod a -> Prod b
.
I had a look at naively changing some things in parse
and following the types but was unable to get this to work without better understanding the algorithm.
The discussion here makes it seem as though this is actually non-trivial! #18
Perhaps some important reading: https://dinhe.net/~aredridel/.notmine/PDFs/Parsing/SCOTT%2C%20Elizabeth%20-%20SPPF-Style%20Parsing%20From%20Earley%20Recognizers.pdf
In my grammars, sometimes I'll having something along the lines of many (whitespace) *> optional (stuff) <* many (whitespace)
. The issue then is that if I am parsing, say, 4 spaces, many identical Nothing
results are returned, based on the different ways the whitespace on the left and right could be matched. One solution is to rewrite it as many (whitespace) *> optional (stuff <* many (whitespace))
, but when you have complicated grammar its hard to coordinate this.
I had an idea for how to modify the library to accommodate this problem. Basically, a new kind of production rule that returns at most one result. Which result? Well, the idea is that it would only work on rules such that all their results are identical. That is, they would only work on rules of the form a <$ r
. That way, it does not matter what result is returned. Here are some examples of equations they would follow:
pure a = a <$ pure ()
fmap f (a <$ r) = (f a) <$ r
(f <$ r1) <*> (a <$ r2) = f a <$ (r1 *> r2)
r1 <* r2 = r1 <* (() <$ r2)
Then, in our many (whitespace) *> optional (stuff) <* many (whitespace)
example, many (whitespace) *> pure Nothing <* many (whitespace)
would reduce to Nothing <$ (many (whitespace) *> pure () <* many (whitespace))
. Since this is what the 4 spaces are matching against, at most one Nothing
result would be returned.
I was thinking about digging into the code and creating a pull request, but before that I wanted to see if it was seen as worthwhile, or if I was missing something obvious.
Hello. Thank you for this great library.
I think I found a bug. The grammar below diverges if I use default sepEndBy
from parser-combinators:
-- Does not work, but sepEndBy only requires an Applicative instance and we have it
grammar' = rule $ sepEndBy (satisfy (== OpenBracket)) (satisfy (== Dot))
-- but with this one everything is ok
sepEndBy :: Prod r e t a -> Prod r e t b -> Prod r e t [a]
-- btw sepBy can come from parser-combinators, and still no problem here
sepEndBy p sep = sepBy p sep <* optional sep
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.