pfnet / rflex

Fast lexer code generator for Rust

License: MIT License

Rust 100.00%

rflex's Issues

Introduce failure

Currently the process function returns only std::io::Error.
So if I want to use rflex as a library, I can't get internal error information such as TranslateError.
Additionally, the caller has no way to suppress the calls to eprintln! and std::process::exit inside the library.

Because rflex is provided as a library, it should expose a unified error type and drop the calls to eprintln! and std::process::exit.

I recommend using the failure crate to define the error type.
The following documents may be helpful.

https://boats.gitlab.io/failure/error-errorkind.html
https://qiita.com/legokichi/items/d76b6aa5dac2ad781bda
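
As a rough illustration of the request, here is a minimal sketch of a unified error type. It uses only std::error::Error instead of the failure crate so that it stays self-contained. The names RflexError and check_spec and the shape of TranslateError are assumptions made for this example, not rflex's actual API.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Hypothetical stand-in for rflex's internal translation error.
#[derive(Debug)]
pub struct TranslateError {
    pub message: String,
}

impl fmt::Display for TranslateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "translate error: {}", self.message)
    }
}

impl Error for TranslateError {}

/// Hypothetical unified error type that library entry points could return.
#[derive(Debug)]
pub enum RflexError {
    Io(io::Error),
    Translate(TranslateError),
}

impl fmt::Display for RflexError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RflexError::Io(e) => write!(f, "I/O error: {}", e),
            RflexError::Translate(e) => write!(f, "{}", e),
        }
    }
}

impl Error for RflexError {
    // Expose the wrapped error so callers can walk the cause chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            RflexError::Io(e) => Some(e),
            RflexError::Translate(e) => Some(e),
        }
    }
}

impl From<io::Error> for RflexError {
    fn from(e: io::Error) -> Self {
        RflexError::Io(e)
    }
}

impl From<TranslateError> for RflexError {
    fn from(e: TranslateError) -> Self {
        RflexError::Translate(e)
    }
}

/// Hypothetical entry point: reports failures to the caller
/// instead of printing to stderr and exiting the process.
pub fn check_spec(spec: &str) -> Result<usize, RflexError> {
    if spec.trim().is_empty() {
        return Err(TranslateError {
            message: "empty specification".to_string(),
        }
        .into());
    }
    Ok(spec.lines().count())
}
```

With the From impls in place, library code can use the ? operator on both I/O and translation failures, and the caller decides whether to print, retry, or exit.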

Is it possible to consider a dual license model MIT/Apache 2.0 for rflex?

In the Rust community, dual licensing under MIT/Apache 2.0 is common, and Rust itself is licensed this way.
A typical example is Bevy, which was initially MIT-only and later went through the work of relicensing.

The reason why the dual-license is reasonable is discussed in bevyengine/bevy#2373, and it says:

The MIT license (arguably) requires binaries to reproduce countless copies of the same license boilerplate for every MIT library in use. MIT-only engines like Godot have complicated license compliance rules as a result

The Apache-2.0 license has protections from patent trolls and an explicit contribution licensing clause.

The Rust ecosystem is largely Apache-2.0. Being available under that license is good for interoperation and opens the doors to upstreaming Bevy code into other projects (Rust, the async ecosystem, etc).

The Apache license is incompatible with GPLv2, but MIT is compatible.

Large allocation slowdown

I recently traced some slowness to this allocation:

        let mut cmap: Vec<usize> = Vec::with_capacity(0x110000);
        cmap.resize(0x110000, 0);

I understand that this is necessary to support unicode (max 0x10ffff), but I suspect that high code points (>0xff) are pretty rare when lexing e.g. programming languages. I sped up my lexing by roughly 100x by reducing allocation to the largest whitespace (0x202a), and then using

if zz_input < 0xff { self.cmap[zz_input as usize] } else { 0usize }

later for advancing the state. I wonder whether it might make sense to be smarter about this? E.g.

use something like lazy_static! to allocate the table
replace the lookup table with a large match and let LLVM sort it out
use a HashMap for the sparse region > 0xff
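
The third option could look roughly like the sketch below: a small dense array for low code points and a HashMap for everything above it. CharClassMap, DENSE_LIMIT, and the choice of 0x100 as the cutoff are assumptions made for this example and are not taken from rflex's generated code.

```rust
use std::collections::HashMap;

/// Code points below this limit use the dense table (assumed cutoff).
const DENSE_LIMIT: usize = 0x100;

/// Hypothetical two-tier character-class map: a dense array for the
/// common low range, a sparse map for the rare high code points.
pub struct CharClassMap {
    dense: [usize; DENSE_LIMIT],
    sparse: HashMap<u32, usize>,
}

impl CharClassMap {
    pub fn new() -> Self {
        CharClassMap {
            dense: [0; DENSE_LIMIT],
            sparse: HashMap::new(),
        }
    }

    /// Record the class for `ch`. Class 0 is the default, so it is
    /// never stored in the sparse map.
    pub fn set(&mut self, ch: char, class: usize) {
        let cp = ch as u32;
        if (cp as usize) < DENSE_LIMIT {
            self.dense[cp as usize] = class;
        } else if class == 0 {
            self.sparse.remove(&cp);
        } else {
            self.sparse.insert(cp, class);
        }
    }

    /// Look up the class for `ch`, falling back to 0 when unmapped.
    pub fn get(&self, ch: char) -> usize {
        let cp = ch as u32;
        if (cp as usize) < DENSE_LIMIT {
            self.dense[cp as usize]
        } else {
            self.sparse.get(&cp).copied().unwrap_or(0)
        }
    }
}
```

This keeps the hot path for ASCII and Latin-1 input as a plain array index, while memory use grows only with the number of high code points the grammar actually mentions.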

Apply rustfmt

This is only a recommendation,
but I think formatting the code with rustfmt is helpful when many contributors submit PRs.

Suggest providing byte position spans

I noticed that the generated yytext is a freshly allocated String copied from the original &'a str passed to the lexer, and that the internal token positions zz_start_read and zz_marked_pos are char indices. I suggest an additional interface that also stores byte offsets, using https://doc.rust-lang.org/std/primitive.str.html#method.char_indices to keep track of them. It would then be possible to provide a method that returns a Range https://doc.rust-lang.org/std/ops/struct.Range.html instead of yytext. The client could use that Range with the original &'a str to read a substring if desired, without any new allocation (apart from the Range pair of byte offsets).
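
To make the idea concrete, here is a minimal sketch of tracking byte offsets with char_indices and returning a Range<usize>. The function next_word_span and its whitespace-delimited notion of a token are assumptions made for this example; a real rflex lexer would derive the boundaries from its DFA.

```rust
use std::ops::Range;

/// Hypothetical helper: returns the byte span of the next run of
/// non-whitespace characters at or after byte offset `from`.
pub fn next_word_span(input: &str, from: usize) -> Option<Range<usize>> {
    // `get` returns None if `from` is out of range or not on a char boundary.
    let rest = input.get(from..)?;
    let mut start: Option<usize> = None;
    for (i, ch) in rest.char_indices() {
        match (start, ch.is_whitespace()) {
            // First non-whitespace char: the token starts here.
            (None, false) => start = Some(from + i),
            // First whitespace after the token: the token ends here.
            (Some(s), true) => return Some(s..from + i),
            _ => {}
        }
    }
    // Token runs to the end of the input, if one was started.
    start.map(|s| s..input.len())
}
```

The caller slices the original string with the returned range, for example &input[span.clone()], so the token text is borrowed instead of copied.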

Output path of `process`

process writes its output to the same location as the input.
This conflicts with the policy of cargo publish.

cargo publish now verifies that the src directory was not modified by build.rs.
rust-lang/cargo#5584
So if src/test.rs in example1 is generated by build.rs, example1 can't be published.

An example of code generation from build.rs is here:
https://doc.rust-lang.org/cargo/reference/build-scripts.html#case-study-code-generation

To follow that pattern, process should accept an explicit output path.
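
As a sketch of what a build script would need, the helper below maps an input specification to an output file under the directory Cargo provides in OUT_DIR. The name generated_path and the .l input extension are assumptions made for this example; how rflex would accept the resulting path is exactly the open question of this issue.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper for build.rs: place the generated lexer for
/// `input` (e.g. src/test.l) inside `out_dir` as <stem>.rs.
pub fn generated_path(out_dir: &Path, input: &Path) -> Option<PathBuf> {
    // `file_stem` is None for paths without a file name, such as "..".
    let stem = input.file_stem()?;
    // Append ".rs" to the stem rather than using `set_extension`,
    // which would truncate stems that themselves contain a dot.
    let mut name = stem.to_os_string();
    name.push(".rs");
    Some(out_dir.join(name))
}
```

In build.rs, out_dir would come from std::env::var_os("OUT_DIR"), and the crate would pull in the generated file with include!(concat!(env!("OUT_DIR"), "/test.rs")), as in the Cargo case study linked above.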
