
Comments (12)

kach commented on August 17, 2024

Yeah, see Robin's comment about keyword/name separation.

from nearley.

rwindelz commented on August 17, 2024

as i was commenting on a closed PR i figured i would leave a ptr to it here: #41 and continue the discussion here

i spent some time with PEGs in general and OMeta a couple of years back - one of the ideas in OMeta is that strings are not the only thing it will parse - think of an AST for which you would like to codify source-to-source translation rules - but i digress...

OMeta will parse arbitrary objects (strings being one particular type of object) and one of the ideas that supports this is the use of generalized predicates.
so, instead of predicates restricted to 'match this character', 'match this regex', 'match this nonterminal'; you can say 'match this object such that it satisfies this condition' . . . negation drops out of this as a special case as in 'match Var where Var not_a_member_of: keywords' . . . et voila

i believe fundamentally, boolean grammars have stronger theoretical underpinnings than 'let's add semantic predicates' . . . and yet having said that, Alessandro's pragmatic approach of generalized matching/predicates seems to work well in practice
(where's Two Face when you need him :-) )

so, you might consider a rule of the form:
A → αBρβ
where α, β are sequences, possibly empty;
B is a symbol (terminal or nonterminal);
ρ is a predicate that takes the sequence of parse results for A up to and including B - this is already available as the partially constructed post-process array (or the complete one, if B is the final symbol in the rule) holding the results of the parse so far

let's say ρ is written as {? ... ?}; the Lua grammar for Name might then look like:
Name -> _name {? function (d) { return !isKeyword(d[0]); } ?} {% function(d) {return {'name': d[0]}; } %}

predicates don't consume any input so they are invoked when 'B' completes and the completion code advances the parse only if the predicate succeeds
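
A minimal sketch of the predicate idea above, in plain JavaScript rather than nearley's real internals. The state shape ({symbols, dot, data}), the stepPredicates helper, and the shortened RESERVED list are all assumptions made for illustration; only isKeyword and the Name rule come from the example in this comment.

```javascript
// Hypothetical, shortened keyword list; a real Lua grammar would have more.
const RESERVED = new Set(["false", "true", "nil"]);

function isKeyword(word) {
  return RESERVED.has(word);
}

// Hypothetical parse state: `symbols` is the right-hand side of the rule,
// `dot` is the position reached so far, `data` holds the results so far.
// Predicates are modelled as plain functions sitting in `symbols`.
function stepPredicates(state) {
  let dot = state.dot;
  while (dot < state.symbols.length && typeof state.symbols[dot] === "function") {
    // A predicate consumes no input: it only inspects the results so far.
    if (!state.symbols[dot](state.data)) {
      return null; // predicate failed, so this parse state is dropped
    }
    dot += 1; // predicate passed, so the parse may advance
  }
  return { symbols: state.symbols, dot: dot, data: state.data };
}

// Name -> _name {? not a keyword ?}
const nameRule = ["_name", (d) => !isKeyword(d[0])];

console.log(stepPredicates({ symbols: nameRule, dot: 1, data: ["v"] }).dot); // 2
console.log(stepPredicates({ symbols: nameRule, dot: 1, data: ["false"] })); // null
```

The dot sits just after _name (B in the rule above), so the predicate runs exactly when B completes and never touches the input stream.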

thoughts?


YafahEdelman commented on August 17, 2024

Actually, for almost all cases we don't need this stuff. JS has negative lookahead built into regexes, so we could just allow something like :!? to be added to strings to turn them into negative lookahead assertions. Alternatively, we could switch to a more complicated regex parser (I'm sure there is one out there, or we could write one ourselves... in nearley, of course). As for the OMeta-like idea, as far as I can tell it amounts to adding a function as an additional constraint on the grammar. My concern is that it might be heavy-handed, and it may be better to implement more advanced features instead. Possibly we could add arbitrary EBNF-like tags (prefixed with :) and make it easy to create functions and assign symbols to them, so that adding something like a not-keyword EBNF tag would be easy? Just a thought.
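
For reference, a small sketch of what negative lookahead looks like in a plain JavaScript regex. It assumes the whole candidate name is available as one string, which is not how a character-by-character grammar sees its input; the KEYWORDS list, the NAME pattern, and isName are illustrative names, not part of nearley.

```javascript
// Illustrative list of Lua reserved words; check it against the Lua version you target.
const KEYWORDS = [
  "and", "break", "do", "else", "elseif", "end", "false", "for", "function",
  "if", "in", "local", "nil", "not", "or", "repeat", "return", "then",
  "true", "until", "while"
];

// (?!...) is JavaScript's negative lookahead: the match is rejected if the
// whole string is one of the keywords, and no input is consumed by the check.
const NAME = new RegExp(
  "^(?!(?:" + KEYWORDS.join("|") + ")$)[A-Za-z_][A-Za-z0-9_]*$"
);

function isName(text) {
  return NAME.test(text);
}

console.log(isName("v"));      // true
console.log(isName("false"));  // false
console.log(isName("falsey")); // true
```

The trailing $ inside the lookahead matters: without it, any name that merely starts with a keyword (such as falsey) would be rejected too.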


kach commented on August 17, 2024

Ah, Robin, that's really cool stuff.

As it happens, nearley already supports parsing a list of arbitrary objects! In fact, we're cheating a bit by using the subscript syntax (something[5]) to get the nth character of a string. :-) Furthermore, if a symbol in a rule is an object with a .test field, then instead of checking for equality, nearley runs .test with the token as input. That's essentially how regex/charset tokens work: the JavaScript RegExp object has a built-in .test function.
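
A rough sketch of the matching rule described above. This is not nearley's actual source; tokenMatches, digit, and evenNumber are hypothetical names, and the only real API used is RegExp.prototype.test.

```javascript
// Hypothetical helper: decides whether one token satisfies one symbol of a rule.
function tokenMatches(symbol, token) {
  // Anything with a .test function (a RegExp, or a hand-written object)
  // is asked directly whether it accepts the token.
  if (symbol !== null && typeof symbol === "object" && typeof symbol.test === "function") {
    return symbol.test(token);
  }
  // Otherwise fall back to plain equality.
  return symbol === token;
}

const digit = /[0-9]/; // a charset token: RegExp already has .test
const evenNumber = {   // a custom matcher over non-string tokens
  test: (tok) => typeof tok === "number" && tok % 2 === 0
};

console.log(tokenMatches("a", "a"));      // true
console.log(tokenMatches(digit, "7"));    // true
console.log(tokenMatches(evenNumber, 4)); // true
console.log(tokenMatches(evenNumber, 5)); // false
```

Because the token is just passed through to .test, the input does not have to be a string; any array of objects works as long as the matchers know what to do with them.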

The problem with using these features in compiled parsers is that you'll have to run a tokenizer first, which is sort of painful. It's the reason I shy away from (J | B)ison.

Anyhow.

Your proposal at the end is pretty exciting—I suggested something similar myself to Jacob on IRC. I'm going to look into implementing it this weekend. I'm not convinced of the {? ?} syntax, because for the sake of uniformity I want all included JS to be enclosed in {% %}. Perhaps

a -> word &{% isNotKeyword %} {% … %}

Here, & would be a pseudo-mnemonic for "it's a word and it follows this rule!". Thoughts?

(What interests me is that if I get negation working right, I'll have a parser that gracefully handles CFG intersection, because by DeMorgan we have !(!a || !b) -> a && b.)
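
The De Morgan step can be checked on a toy model. The sketch below treats recognizers as predicates over whole strings, which is an assumption made for illustration only (a real parser works on grammars, not closures); not, or, and, and the two sample recognizers are hypothetical names.

```javascript
// Recognizers modelled as functions from a string to true/false.
const not = (p) => (s) => !p(s);
const or = (p, q) => (s) => p(s) || q(s);

// Intersection built only from negation and union: a && b == !(!a || !b).
const and = (p, q) => not(or(not(p), not(q)));

const startsWithA = (s) => s.startsWith("a");
const endsWithB = (s) => s.endsWith("b");
const both = and(startsWithA, endsWithB);

console.log(both("ab"));  // true
console.log(both("abc")); // false
console.log(both("cb"));  // false
```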


rwindelz commented on August 17, 2024

i've been experimenting
in my fork of nearley https://github.com/rwindelz/nearley i've got two branches: https://github.com/rwindelz/nearley/tree/post-process-as-predicate and https://github.com/rwindelz/nearley/tree/predicates-in-parse

in post-process-as-predicate, if the post-process function returns null, the parser treats that as a failure and does not generate the subsequent parse state
this is evaluated once the rule is complete
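
A sketch of the post-process-as-predicate behaviour, written as a standalone toy rather than the code in the branch. completeRule, nameOrNull, and the shortened LUA_LITERALS list are hypothetical names chosen for the example.

```javascript
// Hypothetical, shortened keyword list for the example.
const LUA_LITERALS = new Set(["false", "true", "nil"]);

// Post-processor that doubles as a predicate: returning null rejects the parse.
function nameOrNull(d) {
  return LUA_LITERALS.has(d[0]) ? null : { name: d[0] };
}

// Hypothetical completion step: runs once the rule is complete and returns
// the list of parse results that survive (empty if the post-processor failed).
function completeRule(postprocess, data) {
  const result = postprocess(data);
  return result === null ? [] : [result];
}

console.log(completeRule(nameOrNull, ["v"]));     // [ { name: 'v' } ]
console.log(completeRule(nameOrNull, ["false"])); // []
```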

in predicates-in-parse, i've added a predicate type of symbol - this is different from the .test idea in that it inspects the post-processed result of the preceding token
changing it to your suggested &{% p %} is a simple matter of changing one line in the grammar (i may fix that in the next couple of minutes anyways)
this is evaluated immediately after the preceding symbol is completed

cheers


rwindelz commented on August 17, 2024

k - predicates-in-parse now uses the syntax &{% js %}


rwindelz commented on August 17, 2024

application by way of example,

Before:
>node bin/nearleythere.js examples/js/lua.js --input "v = false"
Table length: 10
Number of parses: 2
Parse results:
[ { Block:
     [ { statement: 'assignment',
         body:
          [ [ { name: 'v' } ],
            [ { boolean: false } ] ] } ],
    Return: [] },
  { Block:
     [ { statement: 'assignment',
         body: [ [ { name: 'v' } ], [ { name: 'false' } ] ] } ],
    Return: [] } ]
After:
>node bin/nearleythere.js examples/js/lua.js --input "v = false"
Table length: 10
Number of parses: 1
Parse results:
[ { Block:
     [ { statement: 'assignment',
         body:
          [ [ { name: 'v' } ],
            [ { boolean: false } ] ] } ],
    Return: [] } ]


YafahEdelman commented on August 17, 2024

Once we get the JS parser working well, we can get rid of {% and %} and just allow native JS. We'll be able to tell where the statements begin and end. This should work at least for what it returns. It might be easier to implement by just replacing {% and %} with { and }.


kach commented on August 17, 2024

I'm for post-process-as-predicates. My only concern is that somewhere, in either existing or soon-to-be-written grammar, null will inadvertently be returned and that bug will be pretty hard to track down. Is there a way to return a unique or at least sufficiently obscure value?


rwindelz commented on August 17, 2024

for the purpose of this experiment i was using null as bottom - to indicate that the parse cannot return anything, ie the parse fails . . . which is different from returning the empty set/empty array as the result of a successful parse - i'm pretty sure i'm abusing the math notion of bottom but it's convenient

i agree, folks may very well decide to use null in spite of what theory says
so, perhaps the thing to do is to have a special value in the Parser object - eg.
Parser.fail = function () { return "fail"; }
nb. you never apply Parser.fail, just check for returnValue === Parser.fail

did you want to think it over some more before i send a PR?


kach commented on August 17, 2024

Can't you just use an empty object? Parser.fail = {};
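
A sketch of how an empty-object sentinel could be checked, assuming the comparison is done by identity (===). The Parser object here is a hypothetical stand-in, and finishRule, postprocessName, postprocessWhitespace, and the shortened word list are illustrative names only.

```javascript
// Hypothetical stand-in for the Parser object; only the `fail` field matters here.
const Parser = { fail: {} };

const WORDS_TO_REJECT = new Set(["false", "true", "nil"]); // shortened, illustrative

function postprocessName(d) {
  return WORDS_TO_REJECT.has(d[0]) ? Parser.fail : { name: d[0] };
}

// A post-processor that legitimately returns null (e.g. for discarded whitespace).
function postprocessWhitespace(d) {
  return null;
}

// Identity comparison: only the one Parser.fail object signals failure.
function finishRule(postprocess, data) {
  const value = postprocess(data);
  return value === Parser.fail ? { ok: false } : { ok: true, value: value };
}

console.log(finishRule(postprocessName, ["false"]).ok); // false
console.log(finishRule(postprocessWhitespace, [" "]));  // { ok: true, value: null }
console.log(finishRule(() => ({}), []).ok);             // true
```

The last call shows why an object works as a sentinel: a freshly created {} is a different object, so neither it nor null can be mistaken for Parser.fail.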

Feel free to file a PR for post-process-as-predicates, we can discuss further on there.


kach commented on August 17, 2024

Recent pushes rectify this issue—now it's just a matter of carefully patching up javascript.ne. Closing.

