
retsac's Issues

Add decorator support for SelectedAction.

Currently, if we want to set the data or a callback for a selected action, we have to write:

Action.from(/123/).data(...).then(...).kinds(...).map(...)

And in data/then we can't access the kinds defined by kinds.

Proposed:

Action.from(/123/).kinds(...).map(...).data(...).then(...)

It would be nice if we could set a different data type for different kinds:

Action.from(...).kinds(...).map(...).data('k1', ()=>...).data('k2', ()=>...)

Expectational lexing should be applied at the grammar rule level, instead of globally.

When we have many grammar rules, expectational lexing is slow: every grammar rule triggers its own expectational lexing, and these lexings are unrelated to each other. Without expectational lexing, all grammar rules could share a single lexing result.

The value of expectational lexing is not to improve performance, but to handle lexical errors, e.g. lexing regex literals in JavaScript.

To avoid the overhead of expectational lexing, we should add a property like expect to grammar rules when defining them. The field should indicate which grammar in the rule should be lexed with an expectation.

When there is no expectation, grammar rules should just use the lexing result without expectation (this can also be cached with the current caching mechanism). When there is an expectation, the grammar rule (actually the candidate) will use expectational lexing.

The DFA state should also maintain a map to record when the expectational lexing is needed.

Lexer's ActionExec should take `buffer` & `start` as the params to optimize the performance.

Currently, Lexer's ActionExec takes buffer as the param, which causes frequent String.slice calls that create temporary strings, which is slow.

By passing buffer & start as the parameters, it's as if we are creating a StringView, which is faster.

Many string methods support the start parameter, e.g. startsWith, indexOf.

For regex, we can set lastIndex to let the match start from a specified position if we enable the 'g' flag for the regex.

As the tradeoff, if an action needs a substring, it has to call slice by itself. If many actions create temporary strings like this, there will still be many temporary strings, and users need to manage them themselves to optimize performance.
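The standard-library techniques mentioned above can be demonstrated directly; this snippet only illustrates the (buffer, start) idea, not retsac's actual ActionExec signature:

```typescript
// Sketch: emulating a "StringView" with (buffer, start) instead of slicing.
const buffer = "let x = /abc/;";

// many string methods accept a start position directly, no temp string:
const isX = buffer.startsWith("x", 4); // true
const eqPos = buffer.indexOf("=", 4);  // 6

// with the 'g' flag, a regex continues matching from lastIndex:
const re = /\w+/g;
re.lastIndex = 4;
const match = re.exec(buffer); // matches "x" at index 4
```

For lexing, the sticky 'y' flag is often an even better fit than 'g', since it anchors the match exactly at lastIndex instead of searching forward.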

Complex conflict handling when we need to peek more than 1 token.

E.g. when parsing JavaScript:

f(({a, b}) => a + b);

with the following grammar rules:

exp := '(' '{' identifier (',' identifier)* '}' ')' '=>' exp
exp := '(' exp ')'
exp := object
object := '{' (object_entry (',' object_entry)*)? '}'
object_entry := identifier (':' exp)?

When we have digested f(({ a we don't know whether the a is an object entry or an arrow function parameter. We may have to peek many tokens to decide.

Solution for this issue:

  • Re-parse, see #19. Not recommended.
  • Optimize the grammar rules to prevent this from happening, by introducing more intermediate NTs. Bad user experience.
  • Allow grammar rules to do more than reduce AST nodes, e.g. override the parser buffer.

Re-parse for unresolved conflict?

In LR(1) we only check the next grammar to resolve conflicts, but sometimes that's not enough.

Maybe we can add something like re-parse to rollback the parser, just like re-lex?

This may impact performance; consider adding a new build option reParse: boolean.

Drop tokens.

Currently we store tokens in ASTNode.token and in Lexer.errors. But in most cases these tokens are never used, which wastes a lot of memory.

For the ASTNode, a better way is to define a transformer that converts a token into a terminator ASTNode, so we can drop the token afterwards.

For Lexer.errors, we can define a callback to receive those error tokens.

More lexer utils.

E.g. blank chars and comments are common requirements when building a compiler.

const lexer = new Lexer.Builder()
  .ignore(/^\s/) // ignore blank chars
  .ignore(Lexer.from_to("//", "\n", true)) // single-line comment
  .ignore(Lexer.from_to("/*", "*/", true)) // multi-line comment
  .build();

Specify token kind in Lexer.action's output?

Currently, every action can only map to a single token kind.

If 2 kinds have a similar lexing rule, we have to run the rule twice to yield the token.

To solve this, maybe Definition.kind should be a list of all possible kinds instead of a single string.
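A hypothetical sketch of what such an action could look like (ActionOutput and lexNumber are illustrative names, not retsac API): one rule matches, then names the kind from what it matched.

```typescript
// Sketch: one action that yields tokens of several kinds,
// deciding the kind from the matched text.
type ActionOutput = { kind: string; digested: number } | null;

// one rule lexes both integer and float literals:
function lexNumber(buffer: string, start: number): ActionOutput {
  const re = /\d+(\.\d+)?/y; // sticky flag: match exactly at lastIndex
  re.lastIndex = start;
  const m = re.exec(buffer);
  if (m === null) return null;
  // the fractional group decides which kind this token is
  return { kind: m[1] === undefined ? "int" : "float", digested: m[0].length };
}
```

With Definition.kind as a list, the builder could still validate that the kind returned at runtime is one of the declared possibilities.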

Clone lexer's context?

Usually a lexer should be stateless. But sometimes lexer actions depend on external state via closures. When we use a parser, the inner lexer may be cloned many times, and the external state gets messed up.

So we should add a state/context for the lexer (maybe also for the parser), which should implement a LexerContext interface and be accessible in actions.

interface LexerContext {
  clone(): this
}

// inject context in builder.
// the context type should be a part of the lexer's generic type.
// the builder should clone the context value and store as the default context value.
builder.context(...).define(...).build()

// access the context in actions
Lexer.Action.simple((input) => {
  console.log(input.context);
  return 0;
});

Usually the context is an object; maybe we should also provide a default clone method.
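A minimal sketch of such a default clone, using structuredClone for a deep copy (available in Node >= 17); DefaultContext is an illustrative name, not retsac API:

```typescript
// Sketch of the proposed interface plus a default deep-cloning context.
interface LexerContext {
  clone(): this;
}

class DefaultContext<State> implements LexerContext {
  constructor(public state: State) {}

  clone(): this {
    // deep copy so cloned lexers can't mutate each other's state
    const Ctor = this.constructor as new (state: State) => this;
    return new Ctor(structuredClone(this.state));
  }
}
```

structuredClone handles plain objects, arrays, Maps, Sets, etc., but not functions, so contexts that close over functions would still need a custom clone.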

Add dist build

Then we can use a CDN like jsDelivr to use this lib in the browser.

Panic mode in ELR parser

If no token can be lexed (with expectation) when parsing, the parser should call lexer.take(1) to eat one char, then try to continue.

Beware: when working with the parser, you shouldn't use lexerBuilder.ignore(/^./) to implement lexing error handling, since it will also be applied in trimStart.

Make all actions RegExp?

RegExp is commonly used in downstream components like tmLanguage.

Maybe we should make all actions RegExp? Then maybe it would be easy to transform the lexer into tmLanguage.json?

Make re-lex optional?

Make re-lex optional to improve the runtime performance?
Is there a way to check whether the re-lex is required?

Serialize/Deserialize DFA

It's unnecessary to build the DFA each time.

Ideas:

  • Assign a unique id/index to each DFA candidate/state, store as JSON. Re-connect objects on load.
  • Also serialize follow sets, first sets, etc.

Issues:

  • What about user-defined functions?
  • Cache grammar rules to check if they are changed?
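The id-based round trip can be sketched generically (State here is a stand-in for a DFA state, not retsac's actual type): flatten object references into ids for JSON, then re-connect them on load. This also handles cycles, which JSON.stringify cannot serialize directly.

```typescript
// Sketch: serialize a linked state graph via ids, reconnect on load.
type State = { id: number; next: State[] };
type SerializedState = { id: number; next: number[] };

function serialize(states: State[]): string {
  // replace object references with ids so the graph is JSON-safe
  return JSON.stringify(
    states.map((s) => ({ id: s.id, next: s.next.map((n) => n.id) }))
  );
}

function deserialize(json: string): State[] {
  const raw: SerializedState[] = JSON.parse(json);
  // first pass: create all objects; second pass: re-connect references
  const states: State[] = raw.map((r) => ({ id: r.id, next: [] }));
  const byId = new Map<number, State>();
  states.forEach((s) => byId.set(s.id, s));
  raw.forEach((r, i) => {
    states[i].next = r.next.map((id) => byId.get(id)!);
  });
  return states;
}
```

User-defined functions (rollback callbacks, reducers, etc.) can't survive this round trip, which is exactly the first open issue above; they would have to be re-attached by name or index after loading.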

Placeholder conflict resolvers generated by the advanced parser builder may be invalid.

Consider the following grammar rule:

{ exps: `exp (',' exp)* ','?` }

Expanded & generated grammar rule:

{ exps: `exp | exp ',' | exp __0 | exp __0 ','` }
{ __0: `',' exp | ',' exp __0` }

One of the generated conflict resolver:

{ __0: `',' exp` } vs { __0: `',' exp __0` }, next: *, accept: false

When parsing exp ',' exp ',', after we have digested exp ',' exp and try to reduce it toward exp __0, the reduction is rejected by the conflict resolver (since we want a greedy match), so the grammar rule can never be accepted.

In conclusion, a grammar snippet decorated with +*? shouldn't be followed by another grammar snippet that the decorated snippet itself starts with.

Can we check this and auto generate correct resolvers?

Maybe #19 is a correct solution?
