Consider this code: <div class="highlight highlight-source-haskell notranslate pos

Crashes if grammar is infinitely ambiguous on given input about earley HOT 11 OPEN

safinaskar commented on July 4, 2024

Crashes if grammar is infinitely ambiguous on given input

from earley.

Comments (11)

ollef commented on July 4, 2024

I think the reason for this is that when Earley builds its result lists, it will add later results to the start of the list.

Intuitively, we start with

r1_results = []

and then we parse token A, which sets

r1_results = [()]

Next, we process the other alternative, which yields

r1_results = r1_results ++ [()]

which is our final result. The stack overflow you see is from trying to force this value. I would expect the report function to work for this example because it doesn't force the result. (But it won't be useful to you since it doesn't say anything about ambiguity.)

We could write results to the end instead (but this has performance implications, discussed in #15), which would yield

r1_results = [()] ++ r1_results

This would work for your example since it's productive. But note that it would stop being productive again if you flipped the arguments to <|>.

So I can't see an easy fix that would always work. But if what I've written here is correct, your example would work if you flipped the arguments to <|>.

from earley.

safinaskar commented on July 4, 2024

But if what I've written here is correct, your example would work if you flipped the arguments to <|>

I just checked. No, null $ fst $ fullParses (parser lang) [A] still causes stack overflow in ghci

I would expect the report function to work for this example because it doesn't force the result

Yes, report (parser lang) [A] works. With both (pure () <* token A) <|> r1 and r1 <|> (pure () <* token A)

from earley.

ollef commented on July 4, 2024

Hmm, okay, it's been a while so I may be getting the details wrong.

from earley.

safinaskar commented on July 4, 2024

upTo 1 (generator lang [A]) gives stack overflow, too

from earley.

safinaskar commented on July 4, 2024

This motivated me to publish alternative library for checking ambiguity: https://hackage.haskell.org/package/check-cfg-ambiguity . My library does not freeze on this example. I hope you are not offended

from earley.

ollef commented on July 4, 2024

Cool! I wonder if you could implement the same on a grammar from this library. :)

from earley.

safinaskar commented on July 4, 2024

Cool! I wonder if you could implement the same on a grammar from this library. :)

What you mean?

from earley.

safinaskar commented on July 4, 2024

You mean implementing my algorithm on your Grammar type?

from earley.

ollef commented on July 4, 2024

Yes.

from earley.

safinaskar commented on July 4, 2024

I don't know. My type of grammar is very simple (Data.Map.Map n [[TerminalOrNonterminal t n]]). t and n are any kind of IDs for terminals and nonterminals (for example, strings with names). TerminalOrNonterminal is essentially Either. I will repeat explanation of my algorithm from haskell-cafe:

I generate all strings of symbols, which can be reached from start symbol by replacing nonterminals with their productions no more than "count" times. If I get duplicate, then the grammar is ambiguous. This strings are strings of any symbols, not necessary terminals. And "count" means count of replacements, i. e. count of productions applications.
This is simple brute force algorithm. It does not really checks grammar for ambiguity (this is impossible on Turing machine)

My algorithm is trivial, and you can easily get idea by looking at code of lowLevelTestAmbiguity: https://hackage.haskell.org/package/check-cfg-ambiguity-0.0.0.1/docs/src/CheckCFGAmbiguity.html#lowLevelTestAmbiguity .

My algorithm requires that nonterminals can be checked for equality. Because I generate all strings which I could get using count expansions, and then try to find duplicates. And strings contain both terminals and nonterminals.

Also, I am currently writing library, which gets grammar in some big bloated form, and then produces both Data.Map.Map n [[TerminalOrNonterminal t n]] and Earley's Grammar. Then this grammar checked for ambiguity using check-cfg-ambiguity and also used to actually parse something using Earley. Also I produce pretty-printer

from earley.

safinaskar commented on July 4, 2024

I wrote parsing library I talked about in #54 (comment) : https://mail.haskell.org/pipermail/haskell-cafe/2021-July/134217.html , it is based on Earley. Also, I recently published another parsing library, which is based on Earley, too: https://mail.haskell.org/pipermail/haskell-cafe/2021-July/134205.html

from earley.

Crashes if grammar is infinitely ambiguous on given input about earley HOT 11 OPEN

Comments (11)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent