peggyjs / peggy

Peggy: Parser generator for JavaScript

Home Page: https://peggyjs.org/

License: MIT License

JavaScript 84.35% CSS 5.70% HTML 0.10% Shell 0.22% TypeScript 6.47% PEG.js 3.17%

parser peg pegjs

peggy's Issues

Add prettier

Prettier makes it easier to maintain consistency in the codebase, especially when there are several different contributors. I'd like to enable it, but it's not as easy as it would normally be because several tests use formatting to convey intent and to make things more readable. We should either refactor those tests to convey similar intent in a way that prettier won't ruin, or we should use some prettier ignore comments.

AUTHORS file

Peg is no longer a single-author project.

I recommend we create an AUTHORS file, leave David as the first name, and include anyone who's ever committed a non-trivial PR. There used to be a multi-author in package format, but nobody supports it, so it was taken out.

It would help this community to begin to grow again for people to feel some piece of ownership, both for their past and hopefully their future work.

I'd be happy to do the basics, but I'll want the community's help in adding their names.

Update docs TODOs

depends on #26

Version floor conflict

Two libraries we've added - babel and nve - require core-js.

Core-js has a node floor of 12, but we offer a node floor of 10. As a result, our node 10 builds are breaking.

We are already getting rid of nve, and I think we're considering switching from babel to typescript for downcompiling

May I temporarily remove the 10.x claim from the build, with the goal of re-adding it once babel is out?

Can we somehow avoid eval?

compileCondition(eval(ast.consts[bc[ip + 1]]).length > 1

eval is no bueno. Can we remove it?
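One possible direction, as a rough sketch only: if the constants table kept each raw value next to its rendered JavaScript source, the generator could read the length directly instead of calling eval on the source text. `ConstTable`, `add`, `value`, and `source` are hypothetical names here, not Peggy's internals.

```javascript
// Hypothetical constants table: keeps each raw value together with the
// JavaScript source text that would be emitted into the generated parser.
class ConstTable {
  constructor() {
    this.entries = [];
  }

  // Returns the index of the constant, reusing an existing entry if present.
  add(value) {
    const source = JSON.stringify(value);
    const found = this.entries.findIndex(e => e.source === source);
    if (found !== -1) {
      return found;
    }
    this.entries.push({ value, source });
    return this.entries.length - 1;
  }
}

const consts = new ConstTable();
const index = consts.add("foo");

// No eval needed: the raw string is still available next to its source form.
const isMultiChar = consts.entries[index].value.length > 1;
console.log(isMultiChar, consts.entries[index].source);
```

The trade-off is a slightly larger in-memory table during generation; the emitted parser would not need to change.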

Expose es6 build in npm module; use module field in package

Most of this is already done in #107

Allows bundlers to select whether they want the umd or the es6 version (using the es6 version leads to smaller binaries from less machinery and better tree shaking)

Old benchmarks hit the public facing raw github URLs

(wrt https://github.com/peggyjs/peggy/pull/70/files#diff-16d6bdbe74a0f3ca14d8aaec0b1b68291b5e5486f3e3769e0d913a7ffbd4ebcaR76)

Benchmarks should not be dependent on network access to external files from a third party, especially not one with an outage history like GitHub's. This also prevents people on corporate networks from participating.

If the goal is to enforce that the public-facing examples in the repo run, or something, that should just be on the local copies. This means that the local test set enforces a different version of the source than you're actually running it against

Higher unicode

There's a long argument to be had here. It's been had several times.

One of the big problems with Javascript is that it actually mandates 16-bit max character storage. (lol what?) It does not mandate representation, but pretty much everybody is utf16le as a result.

This won't last forever. There's no good backcompat reason not to switch; the space outside the BMP is becoming increasingly important; et cetera.

There exists a patch for astral codepoints and a patch for uppercase supers.

There exist arguments against; some people feel that differing from contemporary Javascript is bad. However, this is almost certainly going into future Javascript, and this is an important topic now for many of us (dealing with surrogate pairs in PEG is nigh impossible.)
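To make the surrogate-pair problem concrete, here is a small plain-JavaScript illustration (no Peggy involved). It shows how one astral code point appears as two UTF-16 code units, which is what a matcher working one code unit at a time would see.

```javascript
// U+1F600 lies outside the BMP, so UTF-16 stores it as two code units.
const s = "\u{1F600}";

console.log(s.length);                      // number of UTF-16 code units
console.log(Array.from(s).length);          // number of code points
console.log(s.charCodeAt(0).toString(16));  // high surrogate
console.log(s.charCodeAt(1).toString(16));  // low surrogate
console.log(s.codePointAt(0).toString(16)); // the full code point

// A regular expression treats the pair as one character only with the `u` flag.
console.log(/^.$/.test(s), /^.$/u.test(s));
```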

I believe that one or both (ideally at least the first) should go in.

Update vendor and unpkg code in online.html

Version updates for everything. Perhaps switch jsDump to node-inspect-extracted

host website on github pages

hey phpnode / codemix - thanks for fork and rollback : )

could we host the website on github pages?
https://pegjs.org/online looks like a static html-javascript page, so gh-pages should work

.. either by creating a branch gh-pages, or a separate repository, like pegjs-pages

hope the license says yeah

Untangle webpack bundles in docs

These files are unnecessarily-optimized, and make automating builds require extra tooling:

Figure out what's in each of them and do something less complicated.

VERSION not in peg.d.ts

Won't block release for this. Next time someone is in the file, please fix.

Browserify doesn't build under Windows

When commands are prefaced with a runner to resolve #94, we get here:

Not actually going to fix it; instead going to replace it with #83

When peg generators are in modules, use shorter names

Right now, peg generates methods peg$foo for methods properly named foo as a faked namespace

This doesn't minify well and wastes tremendous space when the parser is already isolated by other means, such as an iife, an es6 module, et cetera

This should be made dynamic
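A minimal sketch of what "dynamic" could mean, assuming the prefix becomes an option rather than a hard-coded string. `makeNamer` and its `prefix` parameter are made up for illustration and are not part of the generator.

```javascript
// Hypothetical naming helper: the "peg$" prefix becomes a parameter.
function makeNamer(prefix = "peg$") {
  return name => prefix + name;
}

const namespaced = makeNamer();  // today's behaviour: peg$foo
const short = makeNamer("");     // parser already isolated by a module or iife

console.log(namespaced("foo"), short("foo"));
```

With an empty prefix, generated names could collide with identifiers used inside grammar actions, so a real implementation would probably need a collision check.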

Opcodes should be constants, not class members

https://github.com/peggyjs/peggy/blob/main/lib/compiler/opcodes.js

These should just be dumb constants. Easier for the VM to optimize those away

Fixing this will be tedious but is almost certainly worthwhile

This should wait until after rollup is in, obviously

[Feature] Switch to a monorepo

Several packages in a pkg directory (it conflicts less often with package.json when doing completions on the command line than packages)

peggy (always/only published as a minified UMD which is tested for commonjs, es, ts, browser-script-src, and browser-import)
peggy-cli (which depends on peggy and commander, and is node12+ starting in 2 weeks)
sample programs (ts, etc), each marked private in their package.json

Superficial downstream Typescript support

You can fake support of Typescript for parsers without actually meaningfully supporting it, but still provide some value to downstream users

All that it takes in practice is adding a string for the output type

This could be supported as an argument in the options tuple, with basically no other changes

The end result of such a thing is that output from the parser would be easy to integrate into a typescript project, and would receive non-trivial (though still incomplete) typechecking
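Roughly what I have in mind, as a sketch only: the generator takes a type name as a plain string and pastes it into the emitted signature. The option name `returnType` and the helper `renderParseSignature` are assumptions for illustration.

```javascript
// Hypothetical generator helper: emits a TypeScript signature for parse(),
// using a caller-supplied type name and falling back to `any`.
function renderParseSignature(options = {}) {
  const returnType = options.returnType || "any";
  return `export function parse(input: string, options?: object): ${returnType};`;
}

console.log(renderParseSignature({ returnType: "MyAst" }));
console.log(renderParseSignature());
```

Nothing checks that the grammar really produces `MyAst`; the string is trusted as-is, which is why the resulting typechecking is incomplete.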

If I provided a PR that did this, and it was short and not awful, would it be considered?

Post-monorepo commentary: CI builds should be matrix builds

After the monorepo drops, the CI yaml should generate a matrix build at least across [operating systems, node versions].

I am happy to do the labor. This should not block the current PR.

Document `tracer`

It looks like tracer needs to be an instance of a class that has a trace(event) function. event is:

type: "rule.enter" | "rule.match" | "rule.fail"
rule: string // rule name
result?: number // result code
location
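Based on that description, a tracer could be sketched like this. The class name `IndentingTracer` is made up, and the events below are hand-written stand-ins rather than real parser output.

```javascript
// Minimal tracer: any object with a trace(event) method.
class IndentingTracer {
  constructor() {
    this.depth = 0;
    this.lines = [];
  }

  trace(event) {
    // "rule.match" and "rule.fail" close the rule opened by "rule.enter".
    if (event.type !== "rule.enter") {
      this.depth--;
    }
    const pad = "  ".repeat(this.depth);
    this.lines.push(`${pad}${event.type} ${event.rule}`);
    if (event.type === "rule.enter") {
      this.depth++;
    }
  }
}

// Hand-written events standing in for what a parser would emit.
const tracer = new IndentingTracer();
[
  { type: "rule.enter", rule: "start" },
  { type: "rule.enter", rule: "digit" },
  { type: "rule.match", rule: "digit" },
  { type: "rule.match", rule: "start" },
].forEach(event => tracer.trace(event));

console.log(tracer.lines.join("\n"));
```

In real use the object would presumably be handed to the parser through its `tracer` option, with the parser generated with tracing turned on; that wiring is exactly what the docs should spell out.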

Post-monorepo commentary: Please consider not-mocha

mocha is mostly dead, and its coverage analysis is badly broken, as it's based on a heavily outdated version of istanbul which is patched to sorta-es6.

The istanbul people gave up and started over, creating nyc. Using a non-nyc based test engine on es6 is dangerous if you think coverage is important. (I think coverage is important.)

I recommend that we consider moving to jest. I am willing to do the labor, and I have done this on much larger test sets for my own projects recently, without codemods (ew), so I'm not going in blind about what I'm offering.

Make scripts work on Windows

https://github.com/peggyjs/peggy/blob/main/package.json#L25

"test:node": "nve 10,12,14,15 npm run test:bare",

I would like to advocate that we consider testing node versions in CI, rather than locally, as many of us won't be able to install node version managers, and as that way one doesn't have to wait for four versions of node to go by to test. Also, CI can run them in parallel, and run the entire matrix, and we can exclude versions if we need to (which we ought to)

I can't currently run tests

Release script didn't push tag correctly

switch to git push upstream v${VERSION}

Generate API documentation

Some combination of jsdoc, tsc, tsdoc, and/or dts-gen.

Separate, related goal: generate .d.ts file, rather than hand-coding

There's a lot of wasted space. Could I remove it?

pegjs has enormous boatloads of whitespace emitted that doesn't need to be, in the "for speed" version

the "for speed" version should be renamed to "for human readability," and a new version should be emitted "for speed" which is the current one with all the unnecessary whitespace removed, and be the new default

the speed impacts of unrolling are sometimes lost by the download slowdown

on larger parsers it's sometimes 80% of the file

Better docs for parser options

There at least needs to be an example

See: pegjs/pegjs#666

Drop the "size" optimize mode

As I understand it, David studied compiler construction, which is why he created a library with two possible implementations:

statically generated functions ("speed" optimizing mode)
interpreted bytecode ("size" optimizing mode)

Frankly speaking, the size of parsers generated in "size" mode doesn't differ much from that of parsers generated in "speed" mode, but their parsing speed is worse.

Generating in "size" mode would be valuable if the generator emitted only bytecode instructions, but it also emits the interpreter, which for small grammars doesn't differ much in size from the parsers generated in "speed" mode.

Given that, there is no reason to keep the "size" mode, so I suggest dropping it. I think nobody uses that mode.

Move to rollup

switch out browserify to rollup.

Bytecode tests should use constants, rather than magic numbers

https://github.com/hildjj/peggy/blob/4d0767ff7895e61d54fa04f6fdcd94fb5f94c8b9/pkg/test/lib/unit/compiler/passes/generate-bytecode.spec.js#L5

I assume that these are probably they

you asked me to comment on tools/release

you had asked me to comment on tools/release

https://github.com/peggyjs/peggy/blob/main/tools/release#L90

there's a lot of value in doing things in the standard way. i worry that i'm becoming a wet blanket about this

this has no attachment to pipes. everything is run through a large monolithic custom script.

there's no way to multiplex processes here. you can't see what actually went wrong in an action.

because this is all buried in a custom script, we lose all piping benefits of the unix way, and gh actions' reporting stuff is no longer available

we can't re-use any standard actions because there's no way to put anything in the positioning chain

users can't use individual steps at their discretion; there can only be one path through this, ever

we're trying to manually implement publishing through npm through a wrapper script in javascript. this is extremely dangerous and is likely to get hacked.

we're writing our own child process spawning and queueing system

i don't understand why we keep inventing unnecessary infrastructure

there's something really good here for this already. it's called npm scripts. it has the unix interface, which carefully correctly manages process parallelism (this is hard) and has standard logging expectations.

everybody already knows how to use it and work with it.

the replacement for this code is seven lines, is in a standard place where every node programmer will immediately look for it, is robust to several classes of operating system problem that are not handled here, and has been extensively tested

we don't want to write our own auto-publish. that's dangerous. if we're going to auto-publish, it should be the standard one that's been gone over extensively for security topics.

it would be difficult to use that here because it's an action and this doesn't participate in the pipeline lifecycle

this also appears to be doing git diffing. why? is the idea that this would be automated on the developer's local machine? that's not reproducible. this needs to be in ci/cd, and if it's in ci/cd, that's already done

i'm currently somewhat confused what this file is for. it was represented as equivalent to a two line script i put up that replaces a file with a number exported, and allows the source code compile process to handle everything else

these are actually quite night and day different

Could I put up a github action?

I have a fix for the #94 windows problem, but I don't have a linux box to test it against

If I put up a github action, I could test this better first

Eventually we should just use some or another CLI lib

wrt https://github.com/peggyjs/peggy/pull/70/files#diff-373018ba4534661dac9c14da96561406479fbdfadef98eed966f9697372eb469R104

... ew

Everybody argues about what CLI to use. I kind of like commander but I'm open to most things

Whatever it is should build cleanly under rollup on all three major platforms, bare minimum

New Website, Logo, Playground etc

The website is based on the original pegjs.org site and it's pretty dated; it would be nice to have a new site with some better branding. It's also an opportunity to introduce a nicer playground experience, which is really important for people new to the project (and also for people not so new to it). TypeScript, ASTExplorer, and Babel all have pretty nice playgrounds that we can draw inspiration from.

Move as much of the parser to library methods as possible, for testing's sake

This is really just moving things across an ABI, but it's a little defect prone initially

This shouldn't be done until rollup is in place, as cross-boundary calls in webpack are both space and time expensive

Add support for picking fields with `@`

as discussed at pegjs/pegjs#11,

foo = @bar _ @baz
bar = $"bar"i
baz = $"baz"i
_ = " "*

parse('barbaz') // returns [ 'bar', 'baz' ]

I use this syntax frequently when it's available.
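For anyone unfamiliar with the operator, here is a plain-JavaScript emulation of the result for the multi-`@` sequence above. `sequenceResult` and the `{ value, plucked }` shape are invented for this sketch, and the single-`@` case is deliberately left out.

```javascript
// Emulates the value of a sequence: with `@` markers only the marked
// elements are kept; without any, every element's value is returned.
function sequenceResult(elements) {
  const picked = elements.filter(e => e.plucked).map(e => e.value);
  return picked.length > 0 ? picked : elements.map(e => e.value);
}

// foo = bar _ baz  on 'barbaz' (the `_` rule matched zero spaces)
const plain = sequenceResult([
  { value: "bar", plucked: false },
  { value: [], plucked: false },
  { value: "baz", plucked: false },
]);

// foo = @bar _ @baz  on 'barbaz'
const plucked = sequenceResult([
  { value: "bar", plucked: true },
  { value: [], plucked: false },
  { value: "baz", plucked: true },
]);

console.log(JSON.stringify(plain), JSON.stringify(plucked));
```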

Autogenerate version file artifact

Offer to autogenerate was accepted

Plan is:

npm script which removes the old one
npm script which creates the new one
npm script which updates the README.md
all to be lodged in the standard build process before the build proper begins
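The "creates the new one" step could be as small as the sketch below. `renderVersionModule` is a hypothetical helper and `pkg` is a stand-in for the parsed package.json; the actual file name and location are not decided here.

```javascript
// Hypothetical helper: renders the contents of a generated version module.
function renderVersionModule(version) {
  if (!/^\d+\.\d+\.\d+/.test(version)) {
    throw new Error(`Unexpected version string: ${version}`);
  }
  return [
    "// Generated file, do not edit by hand.",
    `module.exports = ${JSON.stringify(version)};`,
    "",
  ].join("\n");
}

// Stand-in for the parsed package.json.
const pkg = { version: "1.0.0" };
const versionModule = renderVersionModule(pkg.version);
console.log(versionModule);
```

An npm script run before the build proper would write this string to the version file, so the rest of the source only ever imports one generated value.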

Grammar action errors need to be more informative (inherited from PEGjs)

Issue type

Bug Report: no
Feature Request: yes
Question: no
Not an issue: no

Prerequisites

Can you reproduce the issue?: yes
Did you search the repository issues?: yes (kind of)
Did you check the forums?: no (what forums?)
Did you perform a web search (google, yahoo, etc)?: yes

Description

So, as it stands, grammar action errors are about as vague as C++ segfaults.
For example, take this small grammar:

{
	const def1 = 1
	const def2 = 2
	const a = [def1 def2]
}

line = i / l

i = [ \t\n\r+]
l = [a-z]i

The error output is: missing ] after element list
Now, while this is fine in such a small example (it's easy to spot), errors like this in a large grammar file are a headache.

I'd like to request the action errors to be reported the same as grammar errors, for example:

line = i / l

i = [ \t\n\r+
l = [a-z]i

outputs Line 3, column 5: Expected "!", "$", "&", "(", ".", character class, comment, end of line, identifier, literal, or whitespace but "[" found., which is magnitudes more helpful.

I think a good solution to this would be to at least include the line number for the action errors
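As a sketch of the idea (not how the generator works today): each block of action code could be syntax-checked on its own, and a failure could be reported with the grammar line where that block starts. `checkActionCode` and `startLine` are hypothetical names, and the exact message text depends on the JavaScript engine.

```javascript
// Hypothetical helper: syntax-checks one block of action code and, on
// failure, prefixes the engine's message with the block's starting line.
function checkActionCode(code, startLine) {
  try {
    // Compiles the code as a function body without running it.
    new Function(code);
    return null;
  } catch (e) {
    if (e instanceof SyntaxError) {
      return `Line ${startLine}: ${e.message} (in action code)`;
    }
    throw e;
  }
}

// The initializer from the grammar above, which starts on line 1.
const initializer = "const def1 = 1\nconst def2 = 2\nconst a = [def1 def2]";
const broken = checkActionCode(initializer, 1);
const fine = checkActionCode("const a = [1, 2]", 1);

console.log(broken);
console.log(fine);
```

This only narrows the error down to the block, not to a line inside it, but in a large grammar that would already be a big improvement.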

Steps to Reproduce

Make a grammar file
Make a small typo (or other mistake) in javascript action
Observe cryptographically secure error message

Example code:
Oops, provided above in description

Expected behavior:

I'd expect errors in action contexts to at least be descriptive enough to find the error

Actual behavior:

Errors in action context are vague and not helpful

Software

PEG.js: ^0.10.0
Node.js: v14.16.0
NPM or Yarn: yarn (primary) 1.22.10, npm 6.14.11
Browser: N/A
OS: Windows 8.1 32 bit os, 64 bit processor
Editor: Sublime 3.2.2 all day

Remove bower support

Bower is deprecated; we should remove references to it from the website, README, etc.

Remove custom benchmarking util in favor of standard one

wrt https://github.com/peggyjs/peggy/pull/70/files#diff-373018ba4534661dac9c14da96561406479fbdfadef98eed966f9697372eb469R40

We shouldn't have our own benchmarking utility. It's nonsense for us to maintain this code. Standard benchmarking libraries didn't exist when David wrote this, but now they do, and the common ones these days have a better understanding of statistical significance and so on.

Generated code has wrong version number

I have noticed that the Peggy version number in the generated output is incorrect. It says:

// Generated by Peggy 0.11.0.
//
// https://github.com/peggyjs/peggy

Whereas I presume it should be:

// Generated by Peggy 1.0.0
//
// https://github.com/peggyjs/peggy

--format — format of the generated parser: amd, commonjs, globals, umd (default: commonjs)

But Peggy includes an es option as well.

It's worth noting that the CLI documentation does have the es type listed:

Format of the generated parser: amd, commonjs, globals, umd, es (default: commonjs).

But, the dependencies section for the JS API says:

valid only when format is set to "amd", "commonjs", or "umd"

Where it should include "es" in the list.

I could try and get a PR in to fix these things, but I'm not sure when I'll get the time to do so.

Add `types` to package.json

See https://www.typescriptlang.org/docs/handbook/declaration-files/publishing.html

Replace download link with new one in doc

https://github.com/peggyjs/peggy/blob/main/docs/index.html#L53

            <span id="download">
                <a
                    title="Download a minified version of Peggy for the browser"
                    href="https://github.com/pegjs/pegjs/releases/download/v0.10.0/peg-0.10.0.min.js"
                >minified</a>
                |
                <a
                    title="Download Peggy for the browser"
                    href="https://github.com/pegjs/pegjs/releases/download/v0.10.0/peg-0.10.0.js"
                >development</a>
            </span>

The old release had the JS files attached to the release.

So where should the new Peggy download links point?

Turn on eslint prefer-const

Post-1.0

there's a comment that says:

    // Disabled because using `const` for anything else than for immutable
    // variables of permanent character (generally spelled in `ALL_CAPS`) feels
    // confusing.

which i find... not-mainstream.

peggy/lib/peg.d.ts

Lines 17 to 26 in 2a37f1f

class SyntaxError {
  line: number;
  column: number;
  offset: number;
  location: LocationRange;
  expected: any[];
  found: any;
  name: string;
  message: string;
}

Actual code:

peggy/lib/compiler/passes/generate-js.js

Lines 754 to 764 in 2a37f1f

"function peg$SyntaxError(message, expected, found, location) {",
" this.message = message;",
" this.expected = expected;",
" this.found = found;",
" this.location = location;",
" this.name = \"SyntaxError\";",
"",
" if (typeof Error.captureStackTrace === \"function\") {",
" Error.captureStackTrace(this, peg$SyntaxError);",
" }",
"}",

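Reading the two snippets side by side, the declaration appears to list fields the constructor never assigns. A quick way to check, as a sketch: the constructor body is copied from the template quoted above, the field list is copied from the peg.d.ts class, and the constructor arguments are made-up values.

```javascript
// Constructor body copied from the generated-code template quoted above.
function peg$SyntaxError(message, expected, found, location) {
  this.message = message;
  this.expected = expected;
  this.found = found;
  this.location = location;
  this.name = "SyntaxError";

  if (typeof Error.captureStackTrace === "function") {
    Error.captureStackTrace(this, peg$SyntaxError);
  }
}

// Field names copied from the SyntaxError class in peg.d.ts.
const declared = [
  "line", "column", "offset", "location",
  "expected", "found", "name", "message",
];

// Illustrative arguments only.
const err = new peg$SyntaxError("example", [], null, {
  start: { offset: 0, line: 1, column: 1 },
  end: { offset: 0, line: 1, column: 1 },
});

// Declared fields that do not exist on the constructed object.
const neverAssigned = declared.filter(field => !(field in err));
console.log(neverAssigned);
```

Under these assumptions the unassigned fields are the positional ones, which are reachable only through `location`.
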
Switch to @peggyjs/eslint-config

I extracted .eslintrc.js to a separate package, so we can keep code-peggy-language and any other repos in sync, lint-wise. See https://github.com/hildjj/code-peggy-language/blob/main/.eslintrc.cjs for an example of the extends syntax.
