pfnet / rflex
Fast lexer code generator for Rust
License: MIT License
Currently the `process` function returns only `std::io::Error`, so when I use rflex as a library I can't get internal error information such as `TranslateError`.
Additionally, the library calls `eprintf` and `std::process::exit` itself, and these calls should be stopped.
Because rflex is provided as a library, it should expose a unified error type and remove the calls to `eprintf` and `std::process::exit`.
I recommend using the failure crate to define the error type.
The following documents may be helpful.
https://boats.gitlab.io/failure/error-errorkind.html
https://qiita.com/legokichi/items/d76b6aa5dac2ad781bda
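As a rough illustration of the request, here is a minimal sketch of a unified error type. It uses only the standard library rather than the failure crate, so that it stays self-contained. The `Error` enum, the fields of `TranslateError`, and the `process_source` function are hypothetical names made up for this sketch, not rflex's actual API.

```rust
use std::fmt;
use std::io;

/// Hypothetical stand-in for rflex's internal translation error.
#[derive(Debug)]
pub struct TranslateError {
    pub message: String,
}

/// A single error type that a library entry point could return.
#[derive(Debug)]
pub enum Error {
    Io(io::Error),
    Translate(TranslateError),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Io(e) => write!(f, "I/O error: {}", e),
            Error::Translate(e) => write!(f, "translate error: {}", e.message),
        }
    }
}

impl std::error::Error for Error {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Error::Io(e) => Some(e),
            Error::Translate(_) => None,
        }
    }
}

impl From<io::Error> for Error {
    fn from(e: io::Error) -> Self {
        Error::Io(e)
    }
}

impl From<TranslateError> for Error {
    fn from(e: TranslateError) -> Self {
        Error::Translate(e)
    }
}

/// Hypothetical entry point: it reports failures through `Result`
/// instead of printing to stderr and exiting the process.
pub fn process_source(source: &str) -> Result<String, Error> {
    if source.trim().is_empty() {
        let err = TranslateError { message: "empty lexer definition".to_string() };
        return Err(err.into());
    }
    Ok(format!("// generated from {} bytes of input\n", source.len()))
}
```

With a type like this, the caller decides whether to print the error, exit, or recover, and can match on the variant to reach the internal error details.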
In the Rust community, dual licensing under MIT/Apache 2.0 is common, and Rust itself is licensed this way.
A typical example is Bevy, which was initially MIT-licensed and later went through the effort of relicensing.
The rationale for dual licensing is discussed in bevyengine/bevy#2373, which says:
- The MIT license (arguably) requires binaries to reproduce countless copies of the same license boilerplate for every MIT library in use. MIT-only engines like Godot have complicated license compliance rules as a result
- The Apache-2.0 license has protections from patent trolls and an explicit contribution licensing clause.
- The Rust ecosystem is largely Apache-2.0. Being available under that license is good for interoperation and opens the doors to upstreaming Bevy code into other projects (Rust, the async ecosystem, etc).
- The Apache license is incompatible with GPLv2, but MIT is compatible.
I recently traced some slowness to this allocation:
let mut cmap: Vec<usize> = Vec::with_capacity(0x110000);
cmap.resize(0x110000, 0);
I understand that this is necessary to support Unicode (max 0x10ffff), but I suspect that high code points (> 0xff) are pretty rare when lexing, for example, programming languages. I sped up my lexing by roughly 100x by reducing the allocation to the largest whitespace (0x202a), and then using
if zz_input < 0xff { self.cmap[zz_input as usize] } else { 0usize }
later for advancing the state. I wonder whether it might make sense to be smarter about this? E.g.
- `lazy_static!` to allocate the table
- `match` and let llvm sort it out
- `HashMap` for the sparse region > 0xff

This is only my recommendation, but I think formatting the code with rustfmt would be helpful when many contributors submit PRs.
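Returning to the `cmap` allocation discussed above, the `HashMap` option could look roughly like the sketch below: a small dense table for low code points plus a sparse map for everything else. The `CharClassMap` type, its methods, and the `DENSE_LIMIT` cutoff are hypothetical and not part of rflex. It assumes class 0 is the default, as in the `else { 0usize }` branch quoted earlier.

```rust
use std::collections::HashMap;

/// Code points below this limit use the dense table (hypothetical cutoff).
const DENSE_LIMIT: usize = 0x100;

/// Two-tier character class map: dense for low code points, sparse above.
struct CharClassMap {
    dense: [usize; DENSE_LIMIT],
    sparse: HashMap<u32, usize>,
}

impl CharClassMap {
    fn new() -> Self {
        CharClassMap {
            dense: [0; DENSE_LIMIT],
            sparse: HashMap::new(),
        }
    }

    /// Record the class of a code point. Class 0 is the default,
    /// so it is never stored in the sparse map.
    fn set(&mut self, code_point: u32, class: usize) {
        let index = code_point as usize;
        if index < DENSE_LIMIT {
            self.dense[index] = class;
        } else if class != 0 {
            self.sparse.insert(code_point, class);
        } else {
            self.sparse.remove(&code_point);
        }
    }

    /// Look up the class of a code point, falling back to 0.
    fn get(&self, code_point: u32) -> usize {
        let index = code_point as usize;
        if index < DENSE_LIMIT {
            self.dense[index]
        } else {
            self.sparse.get(&code_point).copied().unwrap_or(0)
        }
    }
}
```

This keeps the per-lexer allocation at 256 entries plus whatever high code points the grammar actually mentions, at the cost of a hash lookup for characters above the cutoff. Whether that trade-off beats the full table would need measuring.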
I noticed that the generated `yytext` is a freshly allocated `String` copied from the original `&'a str` passed to the lexer, and that the internal token positions `zz_start_read` and `zz_marked_pos` are `char` indices. I suggest an additional interface in which byte offsets are also stored, using https://doc.rust-lang.org/std/primitive.str.html#method.char_indices to keep track of them. It would then be possible to provide a method that returns a `Range` (https://doc.rust-lang.org/std/ops/struct.Range.html) instead of `yytext`, and the client could use it along with the original `&'a str` to read a substring if desired, all without any new allocation (except for a `Range` pair of byte offsets).
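The idea can be sketched as follows. This is a minimal illustration, not rflex's generated code: the `Cursor` type and the names `begin_token`, `advance`, `yyrange`, and `yytext_ref` are hypothetical. It tracks byte offsets through `char_indices` and hands out a `Range<usize>` or a borrowed slice instead of a new `String`.

```rust
use std::ops::Range;
use std::str::CharIndices;

/// Hypothetical cursor that tracks token boundaries as byte offsets.
struct Cursor<'a> {
    input: &'a str,
    chars: CharIndices<'a>,
    start_byte: usize,  // byte offset where the current token starts
    marked_byte: usize, // byte offset just past the last consumed char
}

impl<'a> Cursor<'a> {
    fn new(input: &'a str) -> Self {
        Cursor {
            input,
            chars: input.char_indices(),
            start_byte: 0,
            marked_byte: 0,
        }
    }

    /// Start a new token at the current position.
    fn begin_token(&mut self) {
        self.start_byte = self.marked_byte;
    }

    /// Consume one char and extend the current token over it.
    fn advance(&mut self) -> Option<char> {
        let (offset, ch) = self.chars.next()?;
        self.marked_byte = offset + ch.len_utf8();
        Some(ch)
    }

    /// Byte range of the current token within the original input.
    fn yyrange(&self) -> Range<usize> {
        self.start_byte..self.marked_byte
    }

    /// Borrowed token text, with no allocation.
    fn yytext_ref(&self) -> &'a str {
        &self.input[self.yyrange()]
    }
}
```

Because the offsets always fall on char boundaries reported by `char_indices`, slicing the original `&'a str` with the returned range cannot panic on a split code point.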
The output of `process` is written to the same location as the input.
But this does not match the policy of `cargo publish`.
`cargo publish` now verifies that the src directory wasn't modified by build.rs (rust-lang/cargo#5584).
So if `src/test.rs` in example1 is generated by `build.rs`, example1 can't be published.
An example of code generation by `build.rs` is here:
https://doc.rust-lang.org/cargo/reference/build-scripts.html#case-study-code-generation
To follow it, it should be possible to specify the output path of `process`.
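A minimal sketch of what a build script could do if the output path were configurable is shown below. The helpers `generated_path` and `generate_to` are hypothetical; `generate_to` is only a placeholder for a generator that accepts an explicit output path and does not reflect rflex's real signature.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Map an input such as `src/test.l` to `<out_dir>/test.rs`.
fn generated_path(out_dir: &Path, input: &Path) -> PathBuf {
    let mut name = input.file_stem().unwrap_or_default().to_os_string();
    name.push(".rs");
    out_dir.join(name)
}

/// Hypothetical generator that takes an explicit output path.
/// Here it only writes a placeholder comment.
fn generate_to(input: &Path, output: &Path) -> io::Result<()> {
    let spec = fs::read_to_string(input)?;
    fs::write(output, format!("// generated from {} bytes\n", spec.len()))
}

// In a real build.rs, `main` would read the OUT_DIR environment variable
// that Cargo sets for build scripts, for example:
//
//     let out_dir = PathBuf::from(std::env::var_os("OUT_DIR").unwrap());
//     let input = Path::new("src/test.l");
//     generate_to(input, &generated_path(&out_dir, input)).unwrap();
```

The crate would then pull in the generated file with `include!(concat!(env!("OUT_DIR"), "/test.rs"));`, as in the Cargo case study linked above, so nothing under src is modified during the build.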