juliasyntax.jl's People

Contributors

aviatesk, bicycle1885, c42f, davidanthoff, filtron, github-actions[bot], goretkin, hyrodium, iamed2, janebert, keno, kristofferc, lilithhafner, liozou, m-j-w, michaelhatherly, ndinsmore, o314, pfitzseb, savq, simeonschaub, staticfloat, stevengj, sunxd3, timholy, waldyrious, zacln


juliasyntax.jl's Issues

Failure to parse some hex floats

julia> -0x1.428a2f98d728bp+341
-5.643803094122362e102

julia> JuliaSyntax.parse(Expr, "-0x1.428a2f98d728bp+341")
ERROR: ArgumentError: cannot parse "-0x1.428a2e98d728bp+341" as Float32
Stacktrace:
  [1] _parse_failure(T::Type, s::String, startpos::Int64, endpos::Int64) (repeats 2 times)
    @ Base ./parse.jl:373
  [2] #tryparse_internal#477
    @ ./parse.jl:369 [inlined]
  [3] tryparse_internal
    @ ./parse.jl:366 [inlined]
  [4] #parse#478
    @ ./parse.jl:379 [inlined]
  [5] parse
    @ ./parse.jl:379 [inlined]
  [6] julia_string_to_number(str::SubString{String}, kind::JuliaSyntax.Kind)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/value_parsing.jl:26
  [7] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:33
  [8] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:89
  [9] build_tree(::Type{JuliaSyntax.SyntaxNode}, stream::JuliaSyntax.ParseStream; filename::Nothing, kws::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:189
 [10] build_tree
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:186 [inlined]
 [11] #build_tree#90
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:208 [inlined]
 [12] build_tree
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:207 [inlined]
 [13] #parse#83
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:124 [inlined]
 [14] parse(::Type{Expr}, input::String)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:120
 [15] top-level scope
    @ REPL[39]:1

I think we're being too eager in interpreting the `f` as meaning `Float32` here.
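For reference, `Base.parse` handles the same literal fine: in Julia source, `f` acts as a `Float32` exponent marker only in decimal literals like `1.5f0`, while in a hex float it is just a digit and the `p` exponent always yields a `Float64`:

```julia
# `f` here is only a hex digit; hex floats use a `p` exponent and are always Float64.
x = parse(Float64, "0x1.428a2f98d728bp+341")
println(x)  # -> 5.643803094122362e102

# The Float32 marker is the decimal-literal suffix form:
y = 1.5f0
println(typeof(y))  # -> Float32
```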

surprising error for two commas

julia> a,,b
ERROR: ParseError:
Error: unexpected closing token

I don't see this as a closing token. The standard parser says

julia> a,,b
ERROR: syntax: unexpected ","

which seems to make more sense.

`.<:` and `.>:` should not be parsed as unary operators

From #31 (comment)

  • Unary versions .<: and .>: parse but they shouldn't.
  • Also, them not being translated to the dotted versions in the Expr conversion is weird.
julia> JuliaSyntax.parseall(Expr, ".<: b", rule=:statement)
:(<:b)

julia> Meta.parse(".<: b")
ERROR: Base.Meta.ParseError("\".<:\" is not a unary operator")
Stacktrace:
...

julia> JuliaSyntax.parseall(Expr, ".>: b", rule=:statement)
:(>:b)

julia> Meta.parse(".>: b")
ERROR: Base.Meta.ParseError("\".>:\" is not a unary operator")
...

:incomplete expression generation

# on https://github.com/JuliaLang/julia/pull/46372
julia> Base.parse_input_line("code_typed((Float64,)) do x")
:($(Expr(:error, JuliaSyntax.ParseError(JuliaSyntax.SourceFile("code_typed((Float64,)) do x", "none", [1, 28]), JuliaSyntax.Diagnostic[JuliaSyntax.Diagnostic(28, 27, :error, "premature end of input"), JuliaSyntax.Diagnostic(28, 27, :error, "Expected `end`")]))))

# on master
julia> Base.parse_input_line("code_typed((Float64,)) do x")
:($(Expr(:incomplete, "incomplete: premature end of input")))

This difference prevents the REPL from accepting multi-line inputs on JuliaLang/julia#46372, e.g.

julia> code_typed((Float64,)) do x<<<RET>>>
ERROR: ParseError:
Error: premature end of input
@ REPL[16]:2:1
code_typed((Float64,)) do x
Error: Expected `end`
@ REPL[16]:2:1
code_typed((Float64,)) do x
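For context, the REPL's multi-line behaviour hinges on this tag: `Base.parse_input_line` returns an `Expr(:incomplete, ...)` and the REPL keeps reading lines until the input parses completely. A minimal check against the stock flisp parser:

```julia
# With the flisp parser, an unterminated `do` block is flagged as :incomplete,
# which tells the REPL to wait for more input rather than report an error.
ex = Base.parse_input_line("code_typed((Float64,)) do x")
Base.Meta.isexpr(ex, :incomplete)  # true on master
```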

Diagnostics for string->value conversion

With #77, #80, and #81, the only remaining parsing errors are escape sequence errors like #67.

Imho unescape_julia_string should never throw an exception. Instead, we should already check escape sequence validity when parsing (so that we can emit the correct diagnostics) and then emit ErrorVals during SyntaxNode construction.
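A rough sketch of the non-throwing shape this could take (the name `try_unescape` and the tiny escape table are illustrative, not the JuliaSyntax API): validity is checked up front and failure is reported as `nothing`, leaving the diagnostic and ErrorVal emission to the caller:

```julia
# Hypothetical non-throwing unescape: returns the unescaped string, or `nothing`
# on an invalid escape so the caller can emit a diagnostic / ErrorVal instead.
function try_unescape(str::AbstractString)
    io = IOBuffer()
    i = firstindex(str)
    while i <= lastindex(str)
        c = str[i]
        if c == '\\'
            i = nextind(str, i)
            i > lastindex(str) && return nothing  # dangling backslash
            e = str[i]
            if e == 'n'
                write(io, '\n')
            elseif e == 't'
                write(io, '\t')
            elseif e == '\\' || e == '"'
                write(io, e)
            else
                return nothing  # unrecognized escape: caller emits a diagnostic
            end
        else
            write(io, c)
        end
        i = nextind(str, i)
    end
    return String(take!(io))
end
```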

Error recovery for unexpected continuation keywords

Consider

julia> JuliaSyntax.parse(JuliaSyntax.GreenNode, "if true; x ? true : elseif true end")[1]
     1:35     │[toplevel]
     1:35     │  [if]
     1:2      │    if
     3:3      │    Whitespace
     4:7      │    true                 ✔
     8:26     │    [block]
     8:8      │      ;
     9:26     │      [if]
     9:9      │        Whitespace
    10:10     │        Identifier       ✔
    11:11     │        Whitespace
    12:12     │        ?
    13:13     │        Whitespace
    14:17     │        true             ✔
    18:18     │        Whitespace
    19:19     │        :
    20:20     │        Whitespace
    21:26     │        [error]           ✘
    21:26     │          elseif         ✔
    27:27     │    Whitespace
    28:31     │    [error]               ✘
    28:31     │      true               ✔
    32:32     │    Whitespace
    33:35     │    end

This special case is fixed by #77 by punting the elseif into the containing block instead:

julia> JuliaSyntax.parse(JuliaSyntax.GreenNode, "if true; x ? true : elseif true end")[1]
     1:35     │[toplevel]
     1:35     │  [if]
     1:2      │    if
     3:3      │    Whitespace
     4:7      │    true                 ✔
     8:19     │    [block]
     8:8      │      ;
     9:19     │      [if]
     9:9      │        Whitespace
    10:10     │        Identifier       ✔
    11:11     │        Whitespace
    12:12     │        ?
    13:13     │        Whitespace
    14:17     │        true             ✔
    18:18     │        Whitespace
    19:19     │        :
    20:19     │        error             ✘
    20:20     │    Whitespace
    21:32     │    [elseif]
    21:26     │      elseif
    27:27     │      Whitespace
    28:31     │      true               ✔
    32:32     │      [block]
    32:32     │        Whitespace
    33:35     │    end

but of course that naive solution only works if there is only one missing or extraneous token, so "if true; x ? true : foo ))))) elseif true end" will break it again.

Generally, this should be solvable by an arbitrarily long look-ahead for continuation keywords, but I really don't like that solution (and it might not even work in all cases).

Error message in plain text discards visual location information

When I get a syntax error, the offending characters are highlighted:

[screenshot: REPL syntax error with the offending character highlighted]

In fact, they're... "lowlighted", so that the offending character is dimmer than other code (I can barely see the 8 there), but it's still (de)emphasized.

When I copy this error message and paste it as plain text, all formatting is lost, which is expected:

julia> (2+5+8
ERROR: ParseError:
Error: Expected `)` but got unexpected tokens
@ REPL[17]:2:1
(2+5+8

However, nothing visually points at the location of the error anymore: the 8 is no different from other code. Sure, the location is indicated by REPL[17]:2:1 (Does it mean "second line, first character"? Looks more like "first line, sixth character" to me...), but the visual is lost. Implementations of other programming languages rely on actual characters (not formatting) to indicate the position of the error.


Python draws a "pointer" to the error location, so a plain-text error message still tells me where the error is:

>>> (2,4,;)
  File "<stdin>", line 1
    (2,4,;)
         ^
SyntaxError: invalid syntax

Clang does this too:

$ cat syntax_error.c 
int main() {
    0
}
$ clang syntax_error.c 
syntax_error.c:2:6: error: expected ';' after expression
    0
     ^
     ;
syntax_error.c:2:5: warning: expression result unused [-Wunused-value]
    0
    ^
1 warning and 1 error generated.

Rust's error messages draw around the offending code all the time, which is extremely helpful.


It would be nice to have error messages one could copy & paste without loss of information, especially visual indication of where the error is.

  • Julia 1.8.0
  • JuliaSyntax.jl 0.1.0
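A caret pointer of the kind Python and Clang emit is cheap to produce once the error's line and column are known; a minimal sketch (a hypothetical helper, not the current JuliaSyntax rendering):

```julia
# Print the offending source line with a `^` marker under the error column.
# Assumes `col` is a 1-based character column on `line`.
function show_error_pointer(io::IO, line::AbstractString, col::Int)
    println(io, line)
    println(io, " "^(col - 1) * "^")
end

show_error_pointer(stdout, "(2+5+8", 7)
# (2+5+8
#       ^
```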

extra newline after error output in REPL

julia> "a$2"
ERROR: ParseError:
Error: identifier or parenthesized expression expected after $ in string
@ REPL[14]:1:4
"a$2"


julia> 

The error is correct but there is an extra blank line before the next prompt.

Where should the lexer live?

In #31 (comment), @pfitzseb said

Btw, we really should think about upstreaming the Tokenize changes in this repo... Pretty sure the opsuffix changes for &&/|| are implemented there.

We've chatted about this in various places and I mention it in the README. I'd like to resolve the double maintenance problem in some way, for sure :-)

But having modified Tokenize fairly extensively, I'm unsure whether the lexer should be versioned separately from the parser. Currently I see the lexer as serving the needs of parsing rather than something which is independent. Particularly because

  • Lexing Julia correctly is impossible without keeping state. Worse, that state needs to be recursive for nested string interpolations. Other cases which need state or lookahead are prime (#25) and various contextual keywords like outer. It's possible to add state to the lexer itself, but that's annoyingly redundant. And the redundancy of state becomes much worse when you consider recovery from malformed string interpolations.
  • JuliaSyntax can give you the disambiguated token stream in a flat format out of ParseStream. It's fairly lightweight, no need to opt into Expr (or other) tree building!
  • Parsing+lexing is currently only about half as fast as pure lexing.

So with those in mind, I feel like we could just recommend the full parser for purposes where people previously used Tokenize.jl? And that more tightly integrating the tokenizer source into JuliaSyntax might be best.

(Somewhat of a side note — I've also wondered whether we could do an Automa.jl-based lexer if we wanted to delve more deeply into performance optimization. I suspect a generated lexer would be a lot faster if Unicode decoding were folded into the state machine.)

For now, I'm content to port fixes back and forth as required.

What do people think? @pfitzseb @KristofferC ?

no error for over-long character literal

I get

julia> 'abc'
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

Should be

julia> 'abc'
ERROR: syntax: character literal contains multiple characters

Prime parsing issues

Bunch of nasty edge cases related to symbols followed by primes:

julia> Meta.parse(":+'")
:(:+')

julia> JuliaSyntax.parseall(Expr, ":+'")
ERROR: ParseError:
Error: extra tokens after end of expression
@ line 1:3
:+'

julia> Meta.parse(":+'l'")
:(:+' * l')

julia> JuliaSyntax.parseall(Expr, ":+'l'")
ERROR: ParseError:
Error: extra tokens after end of expression
@ line 1:3
:+'l'

julia> Meta.parse(":?'")
:(:?')

julia> JuliaSyntax.parseall(Expr, ":?'")
ERROR: ParseError:
Error: extra tokens after end of expression
@ line 1:3
:?'

Diagnostics as pattern matching

I've been thinking about what we'd need for a diagnostics system which can really solve a couple of core problems I'm worried about:

Accessibility: end users should easily be able to contribute new helpful and friendly diagnostics without understanding the compiler frontend's code. Friendly, comprehensible errors are most helpful to beginners, and beginners should be able to help write these. But beginners will rarely be able to dive into JuliaSyntax.jl and make changes.

Cleanliness and separation of concerns: If possible I don't want to clutter the parser itself with large amounts of heuristic code and error/warning message formatting.

With these in mind, I want to claim that:

For a parser system where a syntax tree is always produced, compiler diagnostics (warnings, errors) are not really different from linter messages based on symbolic pattern matching

Therefore, we should take inspiration from linters like semgrep and use pattern matching techniques to match warnings and errors against the (partially broken) AST that the compiler produces. Ideally, errors and warnings could be expressed declaratively as a piece of malformed Julia code with placeholders which capture parts of that code, plus an error message template.
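To make the claim concrete, here is a toy version of the idea (all names hypothetical, and matching via plain `Expr` predicates rather than a real pattern language): diagnostics are data, separate from the parser, and are run over whatever tree comes out:

```julia
# Toy diagnostics-as-data: each rule pairs a predicate over AST nodes with a
# message, and `run_rules` walks the (possibly partially broken) tree.
struct DiagRule
    matches::Function   # node -> Bool
    message::String
end

function run_rules(rules, node, found = String[])
    for r in rules
        r.matches(node) && push!(found, r.message)
    end
    if node isa Expr
        foreach(c -> run_rules(rules, c, found), node.args)
    end
    return found
end

# Example rule in the spirit of a linter check: flag `x == nothing`.
rules = [DiagRule(
    n -> n isa Expr && n.head == :call && length(n.args) == 3 &&
         n.args[1] == :(==) && n.args[3] == :nothing,
    "use `isnothing(x)` instead of `x == nothing`",
)]

run_rules(rules, Meta.parse("f(x) = x == nothing"))
```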

Discuss :-)

Some tokenize issues

Running the tests from the system image, I found some broken tests due to sequences which should be lexed as errors, but which the JuliaSyntax lexer tokenizes as operators:

julia> broken_ops = [
        "a .-> b",
        "a .>: b",
        "a .<: b",
        "a ||₁ b",
        "a ||̄ b",
        "a .||₁ b",
        "a &&₁ b",
        "a &&̄ b",
        "a .&&₁ b",
       ]
9-element Vector{String}:
 "a .-> b"
 "a .>: b"
 "a .<: b"
 "a ||₁ b"
 "a ||̄ b"
 "a .||₁ b"
 "a &&₁ b"
 "a &&̄ b"
 "a .&&₁ b"

julia> [[JuliaSyntax.Tokenize.untokenize(t, s) for t in JuliaSyntax.Tokenize.tokenize(s)] for s in broken_ops]
9-element Vector{Vector{String}}:
 ["a", " ", ".->", " ", "b", ""]
 ["a", " ", ".>:", " ", "b", ""]
 ["a", " ", ".<:", " ", "b", ""]
 ["a", " ", "||₁", " ", "b", ""]
 ["a", " ", "||̄", " ", "b", ""]
 ["a", " ", ".||₁", " ", "b", ""]
 ["a", " ", "&&₁", " ", "b", ""]
 ["a", " ", "&&̄", " ", "b", ""]
 ["a", " ", ".&&₁", " ", "b", ""]

Failure to parse blocks in comprehension if condition

julia> JuliaSyntax.parse(Expr, """
       Any[foo(i) 
           for i in x if begin
               true
           end
       ]
       """)
(:($(Expr(:toplevel, :(#= line 1 =#), :($(Expr(:typed_comprehension, :Any, :((foo(i) for i = x if begin)), :($(Expr(:error, true, :($(Expr(:end)))))))))))), JuliaSyntax.Diagnostic[JuliaSyntax.Diagnostic(45, 56, :error, "Expected `]`")], 60)

julia> Meta.parse("""
       Any[foo(i) 
           for i in x if begin
               true
           end
       ]
       """)
:(Any[foo(i) for i = x if begin
          #= none:3 =#
          true
      end])

julia> Meta.parse("""
       Any[foo(i) for i in x if begin
               true
           end
       ]
       """)
ERROR: Base.Meta.ParseError("expected \"]\"")

Note that Meta.parse is sensitive to the newline before for, so it's possible we should treat this as a bug in the reference parser.

Original code can be found here.

Better testing of `SyntaxNode -> Expr` conversion

As exposed by #113, we had some unnecessary regressions due to the Expr conversion code being under-tested.

We need to make sure each branch in src/expr.jl is covered with the tests in test/expr.jl

[update] Some tests are now hosted in test/parser.jl, but it'd be better to move them into test/expr.jl and decouple them from the other tests, I think.

error when encountering format characters

Julia's parser seems to accept this:

julia> JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, "\ufeffusing Test")
ERROR: ParseError:
Error: invalid syntax atom
@ line 1:1
using Test
Error: extra tokens after end of expression
@ line 1:2
using Test

There might be an argument to be made to just disallow this in Julia base.

Ref julia-vscode/CSTParser.jl#333

Incorrect parsing of array literal with newline before comma

On Julia master,

[ []
, [] ]

parses correctly as a Vector{Vector{Any}}, but with JuliaSyntax, we get the ParseError

Error: Expected `]`
@ REPL[11]:2:3
[ []
  ,[] ]

This is a fairly major bug since this prevents loading CpuId which depends on this syntax.

Peek behind failure in `parse_function`

This shouldn't happen:

julia> JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, "function ()(x) 23 end")
ERROR: Internal error: Can't peek behind at start of stream
Stacktrace:
  [1] error(::String, ::String)
    @ Base ./error.jl:42
  [2] internal_error(strs::String)
    @ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parser.jl:220
  [3] peek_behind(stream::JuliaSyntax.ParseStream; skip_trivia::Bool)
    @ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parse_stream.jl:521
  [4] peek_behind
    @ ~/.julia/dev/JuliaSyntax/src/parse_stream.jl:503 [inlined]
  [5] #peek_behind#54
    @ ~/.julia/dev/JuliaSyntax/src/parser.jl:80 [inlined]
  [6] peek_behind
    @ ~/.julia/dev/JuliaSyntax/src/parser.jl:80 [inlined]
  [7] parse_function(ps::JuliaSyntax.ParseState)
    @ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parser.jl:2032
  [8] parse_resword(ps::JuliaSyntax.ParseState)
    @ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parser.jl:1744

Spotted by @BenChung

Trivia interface

I was thinking a bit about the "right" interface to trivia.

The rust-analyzer people are discussing it over at rust-lang/rust-analyzer#6584 so they've got some good background reading there. It seems generally awkward with no obviously right answer.

IIUC there are two common interfaces:

  • Roslyn, Swift libsyntax — attach multiple trivia tokens to each side of nontrivia nodes (or just nontrivia leaf nodes?). The nontrivia nodes themselves become "fatter", and the depth of the tree is increased by 1.
  • rust-analyzer, IntelliJ(??) — attach trivia tokens as arbitrary children of any interior nodes. So they're generally siblings of nontrivia nodes.

The rust-analyzer model is appealing because it leads to simpler data structures with less internal structure. Also it's more general because the trivia might be naturally interspersed with nontrivia children but without a natural attachment to any of the children. But we could go for either approach, or something else entirely.

Whitespace trivia

A useful observation: we can't attach whitespace in a way that satisfies both of the following:

  • Every node represents a contiguous span of bytes in the source file - a fundamental property of green trees
  • We respect the visual tree structure — in the sense that nested nodes should only contain whitespace relevant to their own internal tree structure (rather than where they are placed in the larger tree.)

In general, I guess whitespace will become inconsistent during a refactoring pass and will need to be regenerated. This is obviously true for moving blocks, but it's even true for refactorings as simple as renaming identifiers. For example, renaming elements of expressions which span multiple lines:

func(arg1, arg2, ...
     argN, argN1)
^^^^
# problematic whitespace if length of func symbol changes

So I'm kind of convinced that there's no natural representation of whitespace within the green tree, so we may as well do whatever is efficient and simple to implement.

Symbols

Consider a simple thing like (b + c) + (b + c)^2 and a pass which identifies common subexpressions to get

x = b + c
(x) + (x)^2

Here we can and should remove the parentheses (which are trivia after parsing, due to being used for grouping only). What do we even do here? Like whitespace, it seems refactorings will regularly break this kind of trivia and require that it's regenerated from a model of the precedence rules.
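Indeed this is already how `Expr` behaves: parentheses used purely for grouping leave no trace in the tree, which is what makes them trivia-like:

```julia
# Grouping parentheses are not represented in the Expr tree at all:
@assert Meta.parse("(x) + (x)^2") == Meta.parse("x + x^2")
```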

Comments

What about comments? This is much more relevant and I think we should aim for "comments are likely to survive symbolic refactoring and remain attached in the right places".

It seems likely there are cases where one or the other model wins here, depending on the situation. Some prototyping with simple example refactoring passes might be necessary to get a feel for the pros and cons.

Impact on the parser

One big benefit we have in the ParseStream interface is that trivia is mostly invisible to the parser. So in theory we can adjust trivia attachment heuristics (within whichever model is chosen) independently of the parser code. Julia is sensitive to whitespace and newlines in selected situations, but after parsing is done this information is no longer needed and it may be consistent to split and recombine trivia however we like by floating the boundaries of nodes across the trivia tokens.

Meta.parse on invalid syntax with enable_in_core!

I noticed that the errors are different:

julia> Meta.parse("[(1,2]")
ERROR: Base.Meta.ParseError("unexpected \"]\" in argument list")
Stacktrace:
 [1] #parse#3
   @ ./meta.jl:237 [inlined]
 [2] parse(str::String; raise::Bool, depwarn::Bool)
   @ Base.Meta ./meta.jl:268
 [3] parse(str::String)
   @ Base.Meta ./meta.jl:268
 [4] top-level scope
   @ REPL[1]:1

julia> using JuliaSyntax

julia> JuliaSyntax.enable_in_core!()

julia> Meta.parse("[(1,2]")
ERROR: MethodError: Cannot `convert` an object of type JuliaSyntax.ParseError to an object of type String
Closest candidates are:
  convert(::Type{String}, ::String) at /Applications/Julia-1.7 x86.app/Contents/Resources/julia/share/julia/base/essentials.jl:223
  convert(::Type{T}, ::T) where T<:AbstractString at /Applications/Julia-1.7 x86.app/Contents/Resources/julia/share/julia/base/strings/basic.jl:231
  convert(::Type{T}, ::AbstractString) where T<:AbstractString at /Applications/Julia-1.7 x86.app/Contents/Resources/julia/share/julia/base/strings/basic.jl:232
  ...
Stacktrace:
 [1] Base.Meta.ParseError(msg::JuliaSyntax.ParseError)
   @ Base.Meta ./meta.jl:190
 [2] #parse#3
   @ ./meta.jl:237 [inlined]
 [3] parse(str::String; raise::Bool, depwarn::Bool)
   @ Base.Meta ./meta.jl:268
 [4] parse(str::String)
   @ Base.Meta ./meta.jl:268
 [5] top-level scope
   @ none:1

Not sure if this matters or not, just thought I would report it.
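The immediate cause is that `Base.Meta.ParseError` stores a `String` message, so rendering the exception to text first would sidestep the `MethodError` (a sketch using a stand-in exception type, since it doesn't depend on JuliaSyntax itself):

```julia
# `Meta.ParseError` expects a String, so render the inner error to text first.
# (ErrorException stands in for JuliaSyntax.ParseError here.)
inner = ErrorException("unexpected `]` in argument list")
wrapped = Base.Meta.ParseError(sprint(showerror, inner))
println(wrapped.msg)  # -> unexpected `]` in argument list
```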

Bad error message for `1.+1`

JuliaSyntax gives
[screenshot: JuliaSyntax's error message for `1.+1`]
While flisp gives

ERROR: syntax: invalid syntax "1.+"; add space(s) to clarify
Stacktrace:
 [1] top-level scope
   @ none:1

Bug parsing exception spec in `catch`

Spotted by @BenChung

julia> itest_parse(JuliaSyntax.parse_expr, "try 3 catch e+3 end")
# Code:
try 3 catch e+3 end

# Green tree:
     1:19     │[try]
     1:3try                        "try"
     4:5      │  [block]
     4:4      │    Whitespace               " "
     5:5      │    Integer              ✔   "3"
     6:6      │  Whitespace                 " "
     7:11catch                      "catch"
    12:12     │  Whitespace                 " "
    13:13     │  Identifier             ✔   "e"
    14:15     │  [block]
    14:15     │    Integer              ✔   "+3"
    16:16     │  Whitespace                 " "
    17:16false""
    17:16false""
    17:19end                        "end"

Whereas the reference parser provides the error invalid syntax "catch (e + 3)"

Rework handling of `.` tokenization and dotted operator calls

The handling of . in the tokenizer / parser is pretty wonky / inconsistent because the tokenization of . is context-dependent.

bump_split(), in particular, is quite ugly and shouldn't exist:

  • ... is tokenized into K"..." which is usually correct. But incorrect for import/using statements as in import ...A ==> (import (. . . . A)), necessitating the splitting of tokens with bump_split - this is ugly!
  • Dotted operators are tokenized with the dot as part of the operator. But sometimes we have to split them, as in standalone dotted operators .+ ==> (. +)

In addition, Expr is quite inconsistent about dotted infix calls vs dotted prefix calls. We should really fix this? Then we can remove the . from the operator names and treat it as separate syntax as it should be! (See also #88)

julia> dump(Meta.parse("a .+ b"))
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol .+
    2: Symbol a
    3: Symbol b

julia> dump(Meta.parse("f.(a, b)"))
Expr
  head: Symbol .
  args: Array{Any}((2,))
    1: Symbol f
    2: Expr
      head: Symbol tuple
      args: Array{Any}((2,))
        1: Symbol a
        2: Symbol b

for loop parsing error

julia> for i in 1:3, j in i:3, k in j:3
ERROR: ParseError:
Error: premature end of input
@ REPL[122]:2:1
for i in 1:3, j in i:3, k in j:3
Error: Expected `end` but got unexpected tokens
@ REPL[122]:2:1
for i in 1:3, j in i:3, k in j:3

Getting this error in the REPL
Julia v1.8.0
JuliaSyntax.jl v0.1.0

Better diagnostic for block-like `begin ... end` in `typed_hcat`

Over at JuliaLang/julia#46364 @eschnett observed

The array expression [begin 1 end] creates a 1-element array. The begin...end block is not really necessary here, but can be convenient if the expression is much more complicated, e.g. a comprehension.

The typed array expression Int[begin 1 end] leads to the parsing error ERROR: syntax: unexpected "end".

This is due to the ambiguity of whether begin or end in the first slot inside a[] should be treated as block keywords or as the first/last indices of the array:

julia> dump(:(a[end 1]))
Expr
  head: Symbol typed_hcat
  args: Array{Any}((3,))
    1: Symbol a
    2: Symbol end
    3: Int64 1

julia> dump(:(a[1 end]))
ERROR: syntax: unexpected "end"
Stacktrace:
 [1] top-level scope
   @ none:1

Over in that issue, it was suggested that the parser could explain the issue and suggest using let instead of begin, which seems like a good option.

See also: #81
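For comparison, a `let` block in the same position is unambiguous, since `let` (unlike `begin`/`end`) has no meaning as an index; this is the shape of the suggested workaround:

```julia
# `begin` and `end` double as first/last index inside brackets, but `let` does not,
# so a block inside a typed array literal can be written as:
v = Int[let; 1 end]
@assert v == [1]
```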

Failure to parse some octal escape sequences

julia> Meta.parse(raw""" "\777" """)
"\xff"

julia> JuliaSyntax.parse(Expr, raw""" "\777" """)
ERROR: ArgumentError: octal escape sequence out of range
Stacktrace:
  [1] unescape_julia_string(io::IOBuffer, str::SubString{String})
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/value_parsing.jl:172
  [2] unescape_julia_string(str::SubString{String}, is_cmd::Bool, is_raw::Bool)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/value_parsing.jl:203
  [3] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:53
  [4] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:89
  [5] build_tree(::Type{JuliaSyntax.SyntaxNode}, stream::JuliaSyntax.ParseStream; filename::Nothing, kws::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:189
  [6] build_tree
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:186 [inlined]
  [7] #build_tree#90
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:208 [inlined]
  [8] build_tree
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:207 [inlined]
  [9] #parse#83
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:124 [inlined]
 [10] parse(::Type{Expr}, input::String)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:120
 [11] top-level scope
    @ REPL[81]:1

Better error for trailing commas in `using`

Leaving a trailing comma after the last element of a using statement is a mistake I've made a few times. The parser only catches the error later, resulting in a fairly poor message. For example, it can gobble up the function keyword in the next part of the file:

julia> using A: b,

       function foo()
       end

ERROR: ParseError:
Error: extra tokens after end of expression
@ REPL[4]:3:9
using A: b,

function foo()
--------^^^^^^---
end

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Incorrect parsing of `function` with parens.

master:

julia> dump(:(function (f(::T) where {T}) end))
Expr
  head: Symbol function
  args: Array{Any}((2,))
    1: Expr
      head: Symbol where
      args: Array{Any}((2,))
        1: Expr
          head: Symbol call
          args: Array{Any}((2,))
            1: Symbol f
            2: Expr
              head: Symbol ::
              args: Array{Any}((1,))
                1: Symbol T
        2: Symbol T
    2: Expr
      head: Symbol block
      args: Array{Any}((2,))
        1: LineNumberNode
          line: Int64 1
          file: Symbol REPL[11]
        2: LineNumberNode
          line: Int64 1
          file: Symbol REPL[11]

JuliaSyntax

julia> dump(:(function (f(::T) where {T}) end))
Expr
  head: Symbol function
  args: Array{Any}((2,))
    1: Expr
      head: Symbol tuple
      args: Array{Any}((1,))
        1: Expr
          head: Symbol where
          args: Array{Any}((2,))
            1: Expr
              head: Symbol call
              args: Array{Any}((2,))
                1: Symbol f
                2: Expr
            2: Symbol T
    2: Expr
      head: Symbol block
      args: Array{Any}((2,))
        1: LineNumberNode
          line: Int64 1
          file: Symbol REPL[21]
        2: LineNumberNode
          line: Int64 1
          file: Symbol REPL[21]

When using in REPL, `JuliaSyntax.jl` does not seem to work well with multiline parsing?

When I want to type

map(1:10) do x
    2x
end

after I type map(1:10) do x and hit return, I got

julia> map(1:10) do
ERROR: ParseError:
Error: premature end of input
@ REPL[11]:2:1
map(1:10) do
Error: premature end of input
@ REPL[11]:2:1
map(1:10) do
Error: Expected `end` but got unexpected tokens
@ REPL[11]:2:1
map(1:10) do

Or if I want to type

 w = u +
     v +
     1

I got

julia> w = u +
ERROR: ParseError:
Error: premature end of input
@ REPL[14]:2:1
w = u +

But I can do them with copy-paste:

[screenshot: the same snippets pasted into the REPL parse successfully]

Can we make the REPL mode tolerate multiline code a little more?

Reworking some Julia AST forms

With JuliaSyntax, we've got our own green tree (GreenNode) and AST (SyntaxNode) which often differ from Expr, due to the requirement that children are strictly in source order. Some current differences are described in https://github.com/JuliaLang/JuliaSyntax.jl#tree-differences-between-greennode-and-expr. Given that we've been forced to diverge we might as well make the most of this and reconsider some aspects of Expr for two reasons:

  • To give extra textual information to users of the green tree (eg, presence of parentheses), without the need to recompute this information by inspecting the trivia. (These nodes will probably be elided during AST conversion?)
  • To give extra information to AST users like macro writers (eg, conditional ternary is the same as :if for Expr users)

List of possible changes

  • Insertion of extra blocks is inconsistent in various places, sometimes due to Expr needing somewhere to hold a LineNumberNode. Avoid this!
    • let argument lists (#126)
    • short form functions have a block for the body line number
    • elseif conditional has a block for the line number
    • anon function -> has a block only for the body line number
  • Containers for string-like constructs - these are very useful for tooling which works with the tokens of the source text. Also separating the delimiters from the text they delimit removes a whole class of tokenization errors and lets the parser deal with them.
    • string (always use K"string" to wrap strings, even when they only contain a single K"String" token. See #94)
    • char (add a new K"char" for containing characters along with their delimiters?). See #121
    • backticks (use K""or addK"backticks"`?)
    • var"" syntax (use K"var"? as the head?) (#127)
  • Faithful representation of the source text / avoid premature lowering
    • K"macrocall" - allow users to easily distinguish macrocalls with parentheses from those without them (#218)
    • parentheses (K"parens")
    • Ternary conditional (#85)
    • global const is normalized to const global in the parser. Should be done in Expr conversion (#130)
    • The AST for do seems partially lowered. Should it be flatter like f(x) do y ; body end being (do (call f) (tuple y) (block body))? (#98)
    • Lower @. to @__dot__ later, not inside parser
    • Lower docstrings to Core.@doc later, not inside parser (#217)
  • Improvements for AST inconsistencies
    • The AST for dotted infix calls like a .+ b is inconsistent with prefix calls like f.(a, b) (see #90)
    • "Standalone" dotted operators are parsed inconsistently. For example .*(x,y) ==> (call .* x y) vs (.*)(x,y) ==> (call (. *) x y) ie, there's a dotted symbol vs Expr(:.) in these cases (see discussion in #90)
    • Inconsistent use of = vs kw (#99)
    • Emit unadorned postfix adjoint with a call head rather than as a syntactic operator for consistency with suffixed versions like x'ᵀ (#124 )
  • Improvements to awkward AST forms
    • Frakentuples with multiple parameter blocks like (a=1, b=2; c=3; d=4) are a weird nested structure. Let's flatten this and bring the multiple sets of `K"parameters" into the parent tuple. (#133)
    • try-catch-else-finally is very awkward with its optional children. Is there something we can/should do about this? How about lowering try body1 catch exc body2 else body3 finally body4 end to (try body1 (catch exc body2) (else body3) (finally body4)) (#234)
    • Import paths import A.b.c are different from normal A.b.c. Use a different head for these.
    • Some constructs have a first child which serves as a flag, for example whether mutable is present in a struct and whether a module is actually baremodule. These seem more like part of the expression head - would they be better as flags?

Considered and rejected for now

  • :toplevel expressions occur both at file scope and as ;-delimited expressions at file scope. This seems kind of weird?
  • Is it possible to improve the representation of global x,y vs global x vs global (x,y) = (1,2) vs global (x,y)? (In particular, the variables might or might not be enclosed in a tuple.)
  • Lower cmd strings to Core.@cmd later, not inside parser
  • Lower custom string and cmd macros to macrocall later, not inside parser
  • Infix vs prefix calls - should this be managed in flags or a kind like K"infix_call"? (considered and decided against in #99 and #124)

Dotted unary op with number literal fails to parse to SyntaxNode

julia> JuliaSyntax.parse(Expr, ".-.1")
ERROR: ArgumentError: cannot parse ".-.1" as Float64

julia> JuliaSyntax.parse(Expr, ".-0.1")
ERROR: ArgumentError: cannot parse ".-0.1" as Float64

julia> JuliaSyntax.parse(Expr, ".-1")
ERROR: ArgumentError: invalid BigInt: ".-1"

julia> Meta.parse(".-.1")
:((.-)(0.1))

missing error for **

julia> 2**3
ERROR: ParseError:
Error: invalid syntax atom
@ REPL[3]:1:2
2**3
Error: extra tokens after end of expression
@ REPL[3]:1:4
2**3

instead of

julia> 2**3
ERROR: syntax: use "x^y" instead of "x**y" for exponentiation, and "x..." instead of "**x" for splatting.

Parser error with `peek_behind`

julia> JuliaSyntax.parseall(JuliaSyntax.GreenNode, "function")
ERROR: Internal error: Can't peek behind at start of stream
Stacktrace:
  [1] error(::String, ::String)
    @ Base ./error.jl:42
  [2] internal_error(strs::String)
    @ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parser.jl:220
  [3] peek_behind(stream::JuliaSyntax.ParseStream; skip_trivia::Bool)
    @ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parse_stream.jl:521
  [4] peek_behind
    @ ~/.julia/dev/JuliaSyntax/src/parse_stream.jl:503 [inlined]
  [5] #peek_behind#54
    @ ~/.julia/dev/JuliaSyntax/src/parser.jl:80 [inlined]
  [6] peek_behind
    @ ~/.julia/dev/JuliaSyntax/src/parser.jl:80 [inlined]
  [7] parse_function(ps::JuliaSyntax.ParseState)
    @ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parser.jl:2032

Incorrect precedence for indexing vs macros.

On Julia master:

julia> dump(:(@a[1]))
Expr
  head: Symbol macrocall
  args: Array{Any}((3,))
    1: Symbol @a
    2: LineNumberNode
      line: Int64 1
      file: Symbol REPL[3]
    3: Expr
      head: Symbol vect
      args: Array{Any}((1,))
        1: Int64 1

with JuliaSyntax

julia> dump(:(@a[1]))
Expr
  head: Symbol macrocall
  args: Array{Any}((2,))
    1: Expr
      head: Symbol ref
      args: Array{Any}((2,))
        1: Symbol @a
        2: Int64 1
    2: LineNumberNode
      line: Int64 1
      file: Symbol REPL[26]

AST inconsistency between parsing of `=` vs `kw`

There's various cases where = is parsed into a kw head, but this is inconsistent, especially when named tuples come into play. This requires various gymnastics in the parser as discussed in https://github.com/JuliaLang/JuliaSyntax.jl#kw-and--inconsistencies

For the named tuple inconsistency, consider the difference between the following:

julia> dump(Meta.parse("(a=1, b=2)"))
Expr
  head: Symbol tuple
  args: Array{Any}((2,))
    1: Expr
      head: Symbol =
      args: Array{Any}((2,))
        1: Symbol a
        2: Int64 1
    2: Expr
      head: Symbol =
      args: Array{Any}((2,))
        1: Symbol b
        2: Int64 2

julia> dump(Meta.parse("f(a=1, b=2)"))
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol f
    2: Expr
      head: Symbol kw
      args: Array{Any}((2,))
        1: Symbol a
        2: Int64 1
    3: Expr
      head: Symbol kw
      args: Array{Any}((2,))
        1: Symbol b
        2: Int64 2

Lowering seems to paper over this difference, but it's not very satisfying.

The difference is also confusing when implementing macros as one cannot interpolate expressions like :(a=1) into a call and expect them to become keywords. For example:

julia> :(f($(:(a=1))))
:(f($(Expr(:(=), :a, 1))))

julia> :(f($(Expr(:kw,:x,1))))
:(f(x = 1))

Is there a way to resolve this inconsistency? For example, can we remove the kw head entirely and just use =? Or can we emit :kw in the named tuple case?

One objection to removing kw and always using = is that interpolating things like :(a=1) into a call is that this would change the meaning of the = from an assignment into a keyword argument. However we already have this problem with named tuples. Alternatively, can we always parse named tuples with kw as the expression head?

`dim` can overflow in `parse_array`

E.g. when parsing https://github.com/OpenMendel/MendelImpute.jl/blob/v1.2.3/test/run.jl. That file's obviously invalid, but we should presumably handle this case a bit more nicely. Meta.parse doesn't error here.

┌ Error: parsing failed for /home/pfitzseb/juliasyntaxtest/pkgs/MendelImpute_1.2.3/test/run.jl
│   ex =
│    Numeric flags unable to hold large integer -9223372036854775808
│    Stacktrace:
│      [1] error(s::String)
│        @ Base ./error.jl:35
│      [2] set_numeric_flags
│        @ ~/.julia/packages/JuliaSyntax/OawBx/src/parse_stream.jl:34 [inlined]
│      [3] parse_array(ps::JuliaSyntax.ParseState, mark::JuliaSyntax.ParseStreamPosition, closer::JuliaSyntax.Kind, end_is_symbol::Bool)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:2634
│      [4] parse_cat(ps::JuliaSyntax.ParseState, closer::JuliaSyntax.Kind, end_is_symbol::Bool)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:2814
│      [5] parse_call_chain(ps::JuliaSyntax.ParseState, mark::JuliaSyntax.ParseStreamPosition, is_macrocall::Bool)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1469
│      [6] parse_call_chain
│        @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1383 [inlined]
│      [7] parse_call(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1311
│      [8] parse_factor(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1258
│      [9] parse_unary(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1101
│     [10] parse_juxtapose(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1058
│     [11] parse_where(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_juxtapose))
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1013
│     [12] parse_unary_subtype(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:974
│     [13] parse_LtoR(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_unary_subtype), is_op::typeof(JuliaSyntax.is_prec_bitshift))
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:347
│     [14] parse_shift(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:943
│     [15] parse_LtoR(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_shift), is_op::typeof(JuliaSyntax.is_prec_rational))
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:347
│     [16] parse_rational(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:938
│     [17] parse_with_chains(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_rational), is_op::typeof(JuliaSyntax.is_prec_times), chain_ops::Tuple{JuliaSyntax.Kind})
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:893
│     [18] parse_term(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:885
│     [19] parse_with_chains(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_term), is_op::typeof(JuliaSyntax.is_prec_plus), chain_ops::Tuple{JuliaSyntax.Kind, JuliaSyntax.Kind})
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:893
│     [20] parse_expr(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:878
│     [21] parse_range(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:794
│     [22] parse_LtoR(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_range), is_op::typeof(JuliaSyntax.is_prec_pipe_gt))
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:347
│     [23] parse_pipe_gt(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:781
│     [24] parse_RtoL(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_pipe_gt), is_op::typeof(JuliaSyntax.is_prec_pipe_lt), self::typeof(JuliaSyntax.parse_pipe_lt))
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:361
│     [25] parse_pipe_lt(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:775
│     [26] parse_comparison(ps::JuliaSyntax.ParseState, subtype_comparison::Bool)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:749
│     [27] parse_comparison(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:733
│     [28] parse_lazy_cond(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_comparison), is_op::typeof(JuliaSyntax.is_prec_lazy_and), self::typeof(JuliaSyntax.parse_and))
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:698
│     [29] parse_and(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:726
│     [30] parse_lazy_cond(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_and), is_op::typeof(JuliaSyntax.is_prec_lazy_or), self::typeof(JuliaSyntax.parse_or))
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:698
│     [31] parse_or(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:717
│     [32] parse_arrow(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:674
│     [33] parse_cond(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:629
│     [34] parse_RtoL(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_cond), is_op::typeof(JuliaSyntax.is_prec_pair), self::typeof(JuliaSyntax.parse_pair))
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:361
│     [35] parse_pair(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:620
│     [36] parse_comma(ps::JuliaSyntax.ParseState, do_emit::Bool)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:598
│     [37] parse_comma
│        @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:596 [inlined]
│     [38] parse_assignment
│        @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:558 [inlined]
│     [39] parse_eq
│        @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:531 [inlined]
│     [40] parse_docstring(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_eq))
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:489
│     [41] parse_docstring
│        @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:487 [inlined]
│     [42] parse_Nary(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_docstring), delimiters::Tuple{JuliaSyntax.Kind}, closing_tokens::Tuple{JuliaSyntax.Kind})
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:391
│     [43] parse_stmts(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:465
│     [44] parse_toplevel(ps::JuliaSyntax.ParseState)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:429
│     [45] parse(stream::JuliaSyntax.ParseStream; rule::Symbol)
│        @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:98
│     [46] #parse#83
│        @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:123 [inlined]
│     [47] parse
│        @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:120 [inlined]
│     [48] (::var"#21#23")()
│        @ Main ~/juliasyntaxtest/run.jl:40
│     [49] with_logstate(f::Function, logstate::Any)
│        @ Base.CoreLogging ./logging.jl:511
│     [50] with_logger(f::Function, logger::ConsoleLogger)
│        @ Base.CoreLogging ./logging.jl:623
│     [51] top-level scope
│        @ ~/juliasyntaxtest/run.jl:12
│     [52] include(fname::String)
│        @ Base.MainInclude ./client.jl:476
│     [53] top-level scope
│        @ REPL[2]:1
│     [54] eval
│        @ ./boot.jl:368 [inlined]
│     [55] eval_user_input(ast::Any, backend::REPL.REPLBackend)
│        @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
│     [56] repl_backend_loop(backend::REPL.REPLBackend)
│        @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
│     [57] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
│        @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
│     [58] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
│        @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
│     [59] run_repl(repl::REPL.AbstractREPL, consumer::Any)
│        @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
│     [60] (::Base.var"#966#968"{Bool, Bool, Bool})(REPL::Module)
│        @ Base ./client.jl:419
│     [61] #invokelatest#2
│        @ ./essentials.jl:729 [inlined]
│     [62] invokelatest
│        @ ./essentials.jl:726 [inlined]
│     [63] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
│        @ Base ./client.jl:404
│     [64] exec_options(opts::Base.JLOptions)
│        @ Base ./client.jl:318
└ @ Main ~/juliasyntaxtest/run.jl:45

Incremental reparsing

@davidanthoff asked on Zulip about incremental reparsing.

is there support for partial reparses, i.e. some sort of incremental parsing? Basic idea is that user presses one key in the editor, and we don't want to reparse the whole document on every key press, but only a subset, based on the precise range of the doc that was edited

To capture my thoughts on this somewhere more permanent, I think this should work fine but there's a couple of tricky things to work out:

First, how are the changed bytes supplied to the parser system? I haven't looked into LanguageServer yet. But presumably it's "insert this byte here" or "change line 10 to 'such-and-such' string". Those might require a representation of the source which isn't a String (or Vector{UInt8} buffer). It might be a rope data structure or something? Should we extend the SourceFile abstraction to allow different AbstractString types? Or perhaps this state should be managed outside the parser completely? Internally, I feel the lexer and parser should always operate on Vector{UInt8} as a concrete efficient datastructure for UTF-8 encoded text, so the subrange of text which is being parsed should probably be copied into one of these for use by the tokenizer.

Second, the new source text intersects with the existing parse tree node(s) which cover some range of bytes. There can be several such nodes nested together; which one do we choose? Equivalently, which production (JuliaSyntax.parse_* function) do we start reparsing from? Starting deeper in the tree is good because it implies a smaller span, but the parser may have nontrivial state which isn't explicit in the parse tree. For example, space sensitive parsing within [] or macro calls. Or the special parsing of in as = within iterator specification of a for loop. So we'd need a list of rules to specify which productions we can restart parsing from, and correctly reconstruct the ParseState for those cases. To start with, toplevel/module scope is probably fine and we could throw something together quickly for that, I think.

Very deep call stacks for nested expressions

While profiling allocations of JuliaSyntax parsing itself, I've noticed that the call stack sometimes gets extremely deep.

For example, the large expression in src/tokenize_utils.jl is_operator_start_char() has some 573 parentheses (!) Setting aside that this is potentially questionable code in its own right ... it seems the call stack repeats with a period of 38 in this case, resulting in a call stack depth of ~32*573 = 21774. This being due to the way the expression is arranged as a completely unbalanced tree.

This is kind of inherent to using recursive descent the way we do, and presumably we can live with it. But it does seem a bit non-ideal for parser performance. Presumably if we parsed expressions with a Pratt parser we could avoid such extreme stack depths as the factor of 38 might be reduced to 2 or so?

Incorrect parsing of `-(a=2)`

On Julia master:

julia> @Meta.lower -(a = 2)
:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1 = Core.get_binding_type(Main, :a)
│   %2 = Base.convert(%1, 2)
│   %3 = Core.typeassert(%2, %1)
│        a = %3
│   %5 = -2
└──      return %5
))))

With JuliaSyntax:

:($(Expr(:thunk, CodeInfo(
    @ none within `top-level scope`
1 ─ %1 = Core.tuple(:a)
│   %2 = Core.apply_type(Core.NamedTuple, %1)
│   %3 = Core.tuple(2)
│   %4 = (%2)(%3)
│   %5 = Core.kwfunc(-)
│   %6 = (%5)(%4, -)
└──      return %6
))))

Internal error parsing `+=`

julia> Meta.parse("+=")
┌ Error: JuliaSyntax parser failed — falling back to flisp!
│   exception =
│    MethodError: no method matching head(::Nothing)
│    
│    Closest candidates are:
│      head(::JuliaSyntax.GreenNode)
│       @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/green_tree.jl:69
│      head(::JuliaSyntax.SyntaxToken)
│       @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/parse_stream.jl:127
│      head(::JuliaSyntax.TaggedRange)
│       @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/parse_stream.jl:146
│      ...
│    
│    Stacktrace:
│      [1] kind(x::Nothing)
│        @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/parse_stream.jl:91
│      [2] _incomplete_tag(n::JuliaSyntax.SyntaxNode)
│        @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/hooks.jl:55
│      [3] _core_parser_hook(code::String, filename::String, lineno::Int64, offset::Int64, options::Symbol)
│        @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/hooks.jl:172

`i++` could use a better error message

I'm opening this issue based on a question on discourse.

Consider the following code:

i = 0
for n = 1:5
    i++
end

This will result in (at least on Julia 1.7.0)

ERROR: syntax: unexpected "end"

This error message could benefit from JuliaLang/julia#45791. However, I'm doing a separate issue for this special case because a new user might write i++ not knowing it is not equivalent to i += 1 (or even that ++ is parsed as an infix operator). Would it be possible to have a more specific error message? Something like

ERROR: syntax: attempted to call an infix operator with just one argument - perhaps you meant "i += 1"?

Internal error in parser when failing to parse a Float64

julia> 0x1p
┌ Error: JuliaSyntax parser failed — falling back to flisp!
│   exception =
│    ArgumentError: cannot parse "0x1p" as Float64
│    Stacktrace:
│      [1] _parse_failure(T::Type, s::String, startpos::Int64, endpos::Int64) (repeats 2 times)
│        @ Base ./parse.jl:373
│      [2] #tryparse_internal#494
│        @ ./parse.jl:369 [inlined]
│      [3] tryparse_internal
│        @ ./parse.jl:366 [inlined]
│      [4] #parse#495
│        @ ./parse.jl:379 [inlined]
│      [5] parse
│        @ ./parse.jl:379 [inlined]
│      [6] julia_string_to_number(str::SubString{String}, kind::JuliaSyntax.Kind)
│        @ JuliaSyntax ~/julia/usr/share/julia/stdlib/v1.9/JuliaSyntax/src/value_parsing.jl:28
│      [7] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
│        @ JuliaSyntax ~/julia/usr/share/julia/stdlib/v1.9/JuliaSyntax/src/syntax_tree.jl:33
│      [8] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
│        @ JuliaSyntax ~/julia/usr/share/julia/stdlib/v1.9/JuliaSyntax/src/syntax_tree.jl:87
│      [9] build_tree(::Type{JuliaSyntax.SyntaxNode}, stream::JuliaSyntax.ParseStream; filename::String, kws::Base.Pairs{Symbol, JuliaSyntax.Kind, Tuple{Symbol}, NamedTuple{(:wrap_toplevel_as_kind,), Tuple{JuliaSyntax.Kind}}})
│        @ JuliaSyntax ~/julia/usr/share/julia/stdlib/v1.9/JuliaSyntax/src/syntax_tree.jl:187

I believe this should emit an error instead of throwing inside the parser.

Replac[ing] flisp

Hi,

"Once mature, replace Julia's flisp-based reference frontend in Core" Can it be done already, despite imperfect parsing? I mean (and I could check) make a sysimage without it, and with JuliaSyntax.jl instead. Do you have such a sysimage?

"Differences from the flisp parser" (and your nice JuliaCon video, as I recall) implies that flisp is already fully bypassed, at runtime, i.e. flisp is just (still) sitting there, just unused.

I'm ok with a replacement not capable of parsing all code (just Base, so most code), and that's already good enough for most users.

I've been thinking about making a minimal sysimage (resurrecting JuliaLite), to help with julia startup, e.g. sacrificing LinearAlgebra, to help for (small) Julia (non-math) scripts.

I'm not sure where flisp resides, don't locate a lib for it. Do you know if it's part of libjulia.so.1.8 and/or sys.so? I know it's rather small, so opening it might not be speed-critical, and nor for small script it might not be too much an overhead to actual parse-time.

I'm a bit confused by your 4 sec. parse time claim in your video, is that for a huge script or some pathological? My understanding was that the parser isn't speed-critical (relative to other, e.g. optimization), but every bit helps. And it's relaive importance when you run with -O0 (often useful for scripts) or --compile=min

Macro expansion: Provide information about lexical environment

https://github.com/jolin-io/WhereTraits.jl cites the following limitation with attendant explanation:

Top Level Only
Currently only top-level functions are supported, as the syntax stores and needs information about previous function definitions, which it stores globally. If macros would get informed about whether they are defined within another function, WhereTraits could also support innerfunctions.

I know flisp is involved with macro expansion. Is the above limitation something that can be addressed with this package?

Failures parsing underflow and overflow of Float64 literals

julia> JuliaSyntax.parse(Expr, "1.9824062450251952342e-2660052")
ERROR: ArgumentError: cannot parse "1.9824062450251952342e-2660052" as Float64
Stacktrace:
  [1] _parse_failure(T::Type, s::String, startpos::Int64, endpos::Int64) (repeats 2 times)
    @ Base ./parse.jl:373
  [2] #tryparse_internal#477
    @ ./parse.jl:369 [inlined]
  [3] tryparse_internal
    @ ./parse.jl:366 [inlined]
  [4] #parse#478
    @ ./parse.jl:379 [inlined]
  [5] parse
    @ ./parse.jl:379 [inlined]
  [6] julia_string_to_number(str::SubString{String}, kind::JuliaSyntax.Kind)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/value_parsing.jl:28
  [7] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:33
  [8] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:89
  [9] build_tree(::Type{JuliaSyntax.SyntaxNode}, stream::JuliaSyntax.ParseStream; filename::Nothing, kws::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:189
 [10] build_tree
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:186 [inlined]
 [11] #build_tree#90
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:208 [inlined]
 [12] build_tree
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:207 [inlined]
 [13] #parse#83
    @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:124 [inlined]
 [14] parse(::Type{Expr}, input::String)
    @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:120
 [15] top-level scope
    @ REPL[62]:1

julia> Meta.parse("1.9824062450251952342e-2660052")
0.0

Any need for a contributor?

Hi, I wonder whether this project needs a student contributors. I am an undergraduate student from Peking University (in China). I am mainly interested in the rewriting the lowering pipeline (defined in julia-syntax.scm) of Julia's frontend. Previously I was working on static compilation of Julia (using LLVM JITLink) and I found that it would be helpful to rewrite Julia's frontend (to get rid of some gensym and record necessary information) if a more static approach is needed. But people working on this stuff told me that Julia is unlikely to work towards this direction and will stick to solution like system image + parallel compilation, which renders my though unnecessary. Besides, rewriting the frontend is time-consuming and hard to review.

The developers point me to this project. This project is really great and looks really promising! I would like to devote my free time to this project, mainly the lowering part (it seems that currently you are focusing on the parser part). I am looking forward to hearing your thoughts on it.

some errors when parsing `test/syntax.jl`

I thought that file might be a good stress test of this package. It seems to do quite well. The first case is probably up to interpretation, since although the flisp parser parses it, it's definitely not valid syntax. There does seem to be a bug in parsing empty multidimensional array literals though:

julia> JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, read("test/syntax.jl"); filename=abspath("test/syntax.jl"))
ERROR: Error: Expected identifier:
@test !@isdefined(y)

@test_throws ErrorException eval(:(import .Mod.x as (a.b)))

import .Mod.maybe_undef as mu

Error: unexpected closing token:
@testset "empty nd arrays" begin
    @test :([])    == Expr(:vect)
    @test :([;])   == Expr(:ncat, 1)
    @test :([;;])  == Expr(:ncat, 2)
    @test :([;;;]) == Expr(:ncat, 3)

Error: unexpected closing token:
    @test :([])    == Expr(:vect)
    @test :([;])   == Expr(:ncat, 1)
    @test :([;;])  == Expr(:ncat, 2)
    @test :([;;;]) == Expr(:ncat, 3)


Error: unexpected closing token:
    @test :([;])   == Expr(:ncat, 1)
    @test :([;;])  == Expr(:ncat, 2)
    @test :([;;;]) == Expr(:ncat, 3)

    @test []    == Array{Any}(undef, 0)

Error: unexpected closing token:

    @test []    == Array{Any}(undef, 0)
    @test [;]   == Array{Any}(undef, 0)
    @test [;;]  == Array{Any}(undef, 0, 0)
    @test [;;;] == Array{Any}(undef, 0, 0, 0)

Error: unexpected closing token:
    @test []    == Array{Any}(undef, 0)
    @test [;]   == Array{Any}(undef, 0)
    @test [;;]  == Array{Any}(undef, 0, 0)
    @test [;;;] == Array{Any}(undef, 0, 0, 0)


Error: unexpected closing token:
    @test [;]   == Array{Any}(undef, 0)
    @test [;;]  == Array{Any}(undef, 0, 0)
    @test [;;;] == Array{Any}(undef, 0, 0, 0)

    @test :(T[])    == Expr(:ref, :T)

Error: unexpected closing token:

    @test :(T[])    == Expr(:ref, :T)
    @test :(T[;])   == Expr(:typed_ncat, :T, 1)
    @test :(T[;;])  == Expr(:typed_ncat, :T, 2)
    @test :(T[;;;]) == Expr(:typed_ncat, :T, 3)

Error: unexpected closing token:
    @test :(T[])    == Expr(:ref, :T)
    @test :(T[;])   == Expr(:typed_ncat, :T, 1)
    @test :(T[;;])  == Expr(:typed_ncat, :T, 2)
    @test :(T[;;;]) == Expr(:typed_ncat, :T, 3)


Error: unexpected closing token:
    @test :(T[;])   == Expr(:typed_ncat, :T, 1)
    @test :(T[;;])  == Expr(:typed_ncat, :T, 2)
    @test :(T[;;;]) == Expr(:typed_ncat, :T, 3)

    @test Int[]    == Array{Int}(undef, 0)

Error: unexpected closing token:

    @test Int[]    == Array{Int}(undef, 0)
    @test Int[;]   == Array{Int}(undef, 0)
    @test Int[;;]  == Array{Int}(undef, 0, 0)
    @test Int[;;;] == Array{Int}(undef, 0, 0, 0)

Error: unexpected closing token:
    @test Int[]    == Array{Int}(undef, 0)
    @test Int[;]   == Array{Int}(undef, 0)
    @test Int[;;]  == Array{Int}(undef, 0, 0)
    @test Int[;;;] == Array{Int}(undef, 0, 0, 0)


Error: unexpected closing token:
    @test Int[;]   == Array{Int}(undef, 0)
    @test Int[;;]  == Array{Int}(undef, 0, 0)
    @test Int[;;;] == Array{Int}(undef, 0, 0, 0)

    @test :([  ]) == Expr(:vect)

Error: unexpected closing token:
    @test :([
            ]) == Expr(:vect)
    @test :([ ;; ]) == Expr(:ncat, 2)
    @test :([
             ;;

Error: unexpected closing token:
    @test :([ ;; ]) == Expr(:ncat, 2)
    @test :([
             ;;
            ]) == Expr(:ncat, 2)

Once those are fixed, I think that would be a good test case for CI.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.