JuliaLang / JuliaSyntax.jl
The Julia compiler frontend
License: Other
julia> [1
,2]
ERROR: ParseError:
Error: Expected `]`
@ REPL[5]:2:1
[1
,2]
julia> rand(2,2)[1
,2]
ERROR: ParseError:
Error: Expected `]`
@ REPL[6]:2:1
rand(2,2)[1
,2]
julia> (1,
2)
(1, 2)
julia> -0x1.428a2f98d728bp+341
-5.643803094122362e102
julia> JuliaSyntax.parse(Expr, "-0x1.428a2f98d728bp+341")
ERROR: ArgumentError: cannot parse "-0x1.428a2f98d728bp+341" as Float32
Stacktrace:
[1] _parse_failure(T::Type, s::String, startpos::Int64, endpos::Int64) (repeats 2 times)
@ Base ./parse.jl:373
[2] #tryparse_internal#477
@ ./parse.jl:369 [inlined]
[3] tryparse_internal
@ ./parse.jl:366 [inlined]
[4] #parse#478
@ ./parse.jl:379 [inlined]
[5] parse
@ ./parse.jl:379 [inlined]
[6] julia_string_to_number(str::SubString{String}, kind::JuliaSyntax.Kind)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/value_parsing.jl:26
[7] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:33
[8] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:89
[9] build_tree(::Type{JuliaSyntax.SyntaxNode}, stream::JuliaSyntax.ParseStream; filename::Nothing, kws::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:189
[10] build_tree
@ ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:186 [inlined]
[11] #build_tree#90
@ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:208 [inlined]
[12] build_tree
@ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:207 [inlined]
[13] #parse#83
@ ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:124 [inlined]
[14] parse(::Type{Expr}, input::String)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:120
[15] top-level scope
@ REPL[39]:1
I think we're being too eager in interpreting the f as meaning Float32 here.
julia> a,,b
ERROR: ParseError:
Error: unexpected closing token
I don't see this as a closing token. The standard parser says
julia> a,,b
ERROR: syntax: unexpected ","
which seems to make more sense.
From #31 (comment): .<: and .>: parse, but they shouldn't. The Expr conversion is also weird:
julia> JuliaSyntax.parseall(Expr, ".<: b", rule=:statement)
:(<:b)
julia> Meta.parse(".<: b")
ERROR: Base.Meta.ParseError("\".<:\" is not a unary operator")
Stacktrace:
...
julia> JuliaSyntax.parseall(Expr, ".>: b", rule=:statement)
:(>:b)
julia> Meta.parse(".>: b")
ERROR: Base.Meta.ParseError("\".>:\" is not a unary operator")
...
# on https://github.com/JuliaLang/julia/pull/46372
julia> Base.parse_input_line("code_typed((Float64,)) do x")
:($(Expr(:error, JuliaSyntax.ParseError(JuliaSyntax.SourceFile("code_typed((Float64,)) do x", "none", [1, 28]), JuliaSyntax.Diagnostic[JuliaSyntax.Diagnostic(28, 27, :error, "premature end of input"), JuliaSyntax.Diagnostic(28, 27, :error, "Expected `end`")]))))
# on master
julia> Base.parse_input_line("code_typed((Float64,)) do x")
:($(Expr(:incomplete, "incomplete: premature end of input")))
This difference prevents the REPL from accepting multi-line inputs on JuliaLang/julia#46372, e.g.
julia> code_typed((Float64,)) do x<<<RET>>>
ERROR: ParseError:
Error: premature end of input
@ REPL[16]:2:1
code_typed((Float64,)) do x
Error: Expected `end`
@ REPL[16]:2:1
code_typed((Float64,)) do x
With #77, #80, and #81, the only remaining parsing errors are escape sequence errors like #67.
IMHO unescape_julia_string should never throw an exception. Instead, we should check escape sequence validity during parsing (so that we can emit the correct diagnostics) and then emit ErrorVal values during SyntaxNode construction.
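As a rough illustration of the "collect diagnostics instead of throwing" idea, here is a minimal standalone sketch. The name check_escapes and the diagnostic shape are hypothetical, not JuliaSyntax API; it only validates octal escapes and the set of single-character escapes, accepting hex/unicode escapes without digit validation.

```julia
# Hypothetical sketch: collect escape-sequence diagnostics instead of
# throwing, so the caller can emit error nodes later. Not the JuliaSyntax
# implementation, just an illustration of the approach.
function check_escapes(str::AbstractString)
    diagnostics = Tuple{Int,String}[]   # (byte index of backslash, message)
    i = firstindex(str)
    while i <= lastindex(str)
        if str[i] == '\\' && i < lastindex(str)
            j = nextind(str, i)
            e = str[j]
            if e in '0':'7'
                # octal escape \nnn: up to three digits, must be <= \377
                digits = string(e)
                k = j
                while length(digits) < 3 && k < lastindex(str)
                    kn = nextind(str, k)
                    str[kn] in '0':'7' || break
                    digits *= str[kn]
                    k = kn
                end
                if parse(UInt16, digits; base=8) > 0o377
                    push!(diagnostics, (i, "octal escape sequence out of range"))
                end
                i = nextind(str, k)
            elseif !(e in "\\'\"\$`nrtfvbaexuU")
                # (hex/unicode escapes x, u, U are accepted above without
                # digit validation in this sketch)
                push!(diagnostics, (i, "invalid escape sequence"))
                i = nextind(str, j)
            else
                i = nextind(str, j)
            end
        else
            i = nextind(str, i)
        end
    end
    return diagnostics
end
```

With this shape, the parser can attach each diagnostic to the string token's range and still build a tree.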
Consider
julia> JuliaSyntax.parse(JuliaSyntax.GreenNode, "if true; x ? true : elseif true end")[1]
1:35 │[toplevel]
1:35 │ [if]
1:2 │ if
3:3 │ Whitespace
4:7 │ true ✔
8:26 │ [block]
8:8 │ ;
9:26 │ [if]
9:9 │ Whitespace
10:10 │ Identifier ✔
11:11 │ Whitespace
12:12 │ ?
13:13 │ Whitespace
14:17 │ true ✔
18:18 │ Whitespace
19:19 │ :
20:20 │ Whitespace
21:26 │ [error] ✘
21:26 │ elseif ✔
27:27 │ Whitespace
28:31 │ [error] ✘
28:31 │ true ✔
32:32 │ Whitespace
33:35 │ end
This special case is fixed by #77 by punting the elseif
into the containing block instead:
julia> JuliaSyntax.parse(JuliaSyntax.GreenNode, "if true; x ? true : elseif true end")[1]
1:35 │[toplevel]
1:35 │ [if]
1:2 │ if
3:3 │ Whitespace
4:7 │ true ✔
8:19 │ [block]
8:8 │ ;
9:19 │ [if]
9:9 │ Whitespace
10:10 │ Identifier ✔
11:11 │ Whitespace
12:12 │ ?
13:13 │ Whitespace
14:17 │ true ✔
18:18 │ Whitespace
19:19 │ :
20:19 │ error ✘
20:20 │ Whitespace
21:32 │ [elseif]
21:26 │ elseif
27:27 │ Whitespace
28:31 │ true ✔
32:32 │ [block]
32:32 │ Whitespace
33:35 │ end
but of course that naive solution only works if there is only one missing or extraneous token, so "if true; x ? true : foo ))))) elseif true end"
will break it again.
Generally, this should be solvable by an arbitrarily long look-ahead for continuation keywords, but I really don't like that solution (and it might not even work in all cases).
When I get a syntax error, the offending characters are highlighted:
In fact, they're... "lowlighted", so that the offending character is dimmer than other code (I can barely see the 8
there), but it's still (de)emphasized.
When I copy this error message and paste it as plain text, all formatting is lost, which is expected:
julia> (2+5+8
ERROR: ParseError:
Error: Expected `)` but got unexpected tokens
@ REPL[17]:2:1
(2+5+8
However, nothing visually points at the location of the error anymore: the 8
is no different from other code. Sure, the location is indicated by REPL[17]:2:1
(Does it mean "second line, first character"? Looks more like a "1st line, 6th character" to me...), but the visual is lost. Implementations of other programming languages rely on actual characters (not formatting) to indicate the position of the error.
Python draws a "pointer" to the error location, so a plain-text error message still tells me where the error is:
>>> (2,4,;)
File "<stdin>", line 1
(2,4,;)
^
SyntaxError: invalid syntax
Clang does this too:
$ cat syntax_error.c
int main() {
0
}
$ clang syntax_error.c
syntax_error.c:2:6: error: expected ';' after expression
0
^
;
syntax_error.c:2:5: warning: expression result unused [-Wunused-value]
0
^
1 warning and 1 error generated.
Rust's error messages draw around the offending code all the time, which is extremely helpful.
It would be nice to have error messages one could copy & paste without loss of information, especially visual indication of where the error is.
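A plain-text caret rendering like Python's is cheap to produce from the line and column info already present in the diagnostics. A minimal sketch, with hypothetical names (this is not JuliaSyntax's renderer); columns are 1-based character columns and wide characters are not handled:

```julia
# Hypothetical sketch: render a diagnostic with a ^^^ underline so the
# error location survives copy & paste as plain text.
function render_error(line::AbstractString, firstcol::Int, lastcol::Int, msg::AbstractString)
    underline = " "^(firstcol - 1) * "^"^(lastcol - firstcol + 1)
    return string(line, "\n", underline, "\n", msg)
end

print(render_error("(2+5+8", 6, 6, "Error: Expected `)` but got unexpected tokens"))
```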
If I enter
julia> a b<enter>
it waits for more input. This should be an immediate "extra token" error.
julia> "a$2"
ERROR: ParseError:
Error: identifier or parenthesized expression expected after $ in string
@ REPL[14]:1:4
"a$2"
julia>
The error is correct but there is an extra blank line before the next prompt.
In #31 (comment), @pfitzseb said
Btw, we really should think about upstreaming the Tokenize changes in this repo... Pretty sure the opsuffix changes for &&/|| are implemented there.
We've chatted about this in various places and I mention it in the README. I'd like to resolve the double maintenance problem in some way, for sure :-)
But having modified Tokenize fairly extensively, I'm unsure whether the lexer should be versioned separately from the parser. Currently I see the lexer as serving the needs of parsing rather than something which is independent. Particularly because:
- the lexer needs parser context in some places (for example, around the contextual keyword outer). It's possible to add state to the lexer itself, but that's annoyingly redundant. And the redundancy of state becomes much worse when you consider recovery from malformed string interpolations.
- tokenization is already available through ParseStream. It's fairly lightweight, no need to opt into Expr (or other) tree building!
So with those in mind, I feel like we could just recommend people use the full parser for purposes we previously used Tokenize.jl for, and that more tightly integrating the tokenizer source into JuliaSyntax might be best.
(Somewhat of a side note: I've also wondered whether we could do an Automa.jl-based lexer if we wanted to delve more deeply into performance optimization. I suspect a generated lexer would be a lot faster if Unicode decoding were folded into the state machine.)
For now, I'm content to port fixes back and forth as required.
What do people think? @pfitzseb @KristofferC ?
I get
julia> 'abc'
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
Should be
julia> 'abc'
ERROR: syntax: character literal contains multiple characters
Bunch of nasty edge cases related to symbols followed by primes:
julia> Meta.parse(":+'")
:(:+')
julia> JuliaSyntax.parseall(Expr, ":+'")
ERROR: ParseError:
Error: extra tokens after end of expression
@ line 1:3
:+'
julia> Meta.parse(":+'l'")
:(:+' * l')
julia> JuliaSyntax.parseall(Expr, ":+'l'")
ERROR: ParseError:
Error: extra tokens after end of expression
@ line 1:3
:+'l'
julia> Meta.parse(":?'")
:(:?')
julia> JuliaSyntax.parseall(Expr, ":?'")
ERROR: ParseError:
Error: extra tokens after end of expression
@ line 1:3
:?'
I've been thinking about what we'd need for a diagnostics system which can really solve a couple of core problems I'm worried about:
Accessibility: end users should easily be able to contribute new helpful and friendly diagnostics without understanding the code of the compiler frontend. Friendly, comprehensible errors are most helpful to beginners, and beginners should be able to help write these. But beginners will rarely be able to dive into JuliaSyntax.jl and make changes.
Cleanliness and separation of concerns: If possible I don't want to clutter the parser itself with large amounts of heuristic code and error/warning message formatting.
With these in mind, I want to claim that:
For a parser system where a syntax tree is always produced, compiler diagnostics (warnings, errors) are not really different from linter messages based on symbolic pattern matching.
Therefore, we should take inspiration from linters like semgrep and use pattern matching techniques to match warnings and errors against the (partially broken) AST that the compiler produces. Ideally, errors and warnings could be expressed declaratively as a piece of malformed Julia code with placeholders which capture parts of that code, plus an error message template.
Discuss :-)
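To make the declarative-diagnostics idea concrete, here is a toy sketch of matching an Expr against a pattern whose symbols starting with an underscore act as capture slots, then filling a message template. All names here are hypothetical, not a proposed JuliaSyntax API:

```julia
# Toy sketch of declarative diagnostics: a pattern is ordinary Julia
# syntax where symbols starting with '_' are capture slots.
function match_pattern!(captures::Dict{Symbol,Any}, pat, ex)
    if pat isa Symbol && startswith(String(pat), "_")
        captures[pat] = ex          # placeholder: capture whatever is here
        return true
    elseif pat isa Expr && ex isa Expr
        pat.head == ex.head || return false
        length(pat.args) == length(ex.args) || return false
        return all(match_pattern!(captures, p, e)
                   for (p, e) in zip(pat.args, ex.args))
    else
        return pat == ex            # literal match
    end
end

# Try each (pattern, message-template) rule in turn; {_name} slots in the
# template are filled from the captures.
function diagnose(rules, ex)
    for (pat, msg) in rules
        captures = Dict{Symbol,Any}()
        if match_pattern!(captures, pat, ex)
            for (k, v) in captures
                msg = replace(msg, "{$k}" => string(v))
            end
            return msg
        end
    end
    return nothing
end
```

For example, diagnose([(:(_f(_x)), "call to {_f}")], :(g(2))) returns "call to g". A real system would match against the broken tree nodes rather than Expr, but the rule shape could look like this.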
Running the tests from the system image, I found some broken tests due to operators which should be errors, but the JuliaSyntax lexer tokenizes them as operators:
julia> broken_ops = [
"a .-> b",
"a .>: b",
"a .<: b",
"a ||₁ b",
"a ||̄ b",
"a .||₁ b",
"a &&₁ b",
"a &&̄ b",
"a .&&₁ b",
]
9-element Vector{String}:
"a .-> b"
"a .>: b"
"a .<: b"
"a ||₁ b"
"a ||̄ b"
"a .||₁ b"
"a &&₁ b"
"a &&̄ b"
"a .&&₁ b"
julia> [[JuliaSyntax.Tokenize.untokenize(t, s) for t in JuliaSyntax.Tokenize.tokenize(s)] for s in broken_ops]
9-element Vector{Vector{String}}:
["a", " ", ".->", " ", "b", ""]
["a", " ", ".>:", " ", "b", ""]
["a", " ", ".<:", " ", "b", ""]
["a", " ", "||₁", " ", "b", ""]
["a", " ", "||̄", " ", "b", ""]
["a", " ", ".||₁", " ", "b", ""]
["a", " ", "&&₁", " ", "b", ""]
["a", " ", "&&̄", " ", "b", ""]
["a", " ", ".&&₁", " ", "b", ""]
julia> JuliaSyntax.parse(Expr, """
Any[foo(i)
for i in x if begin
true
end
]
""")
(:($(Expr(:toplevel, :(#= line 1 =#), :($(Expr(:typed_comprehension, :Any, :((foo(i) for i = x if begin)), :($(Expr(:error, true, :($(Expr(:end)))))))))))), JuliaSyntax.Diagnostic[JuliaSyntax.Diagnostic(45, 56, :error, "Expected `]`")], 60)
julia> Meta.parse("""
Any[foo(i)
for i in x if begin
true
end
]
""")
:(Any[foo(i) for i = x if begin
#= none:3 =#
true
end])
julia> Meta.parse("""
Any[foo(i) for i in x if begin
true
end
]
""")
ERROR: Base.Meta.ParseError("expected \"]\"")
Note that Meta.parse
is sensitive to the newline before for
, so it's possible we should treat this as a bug in the reference parser.
Original code can be found here.
As exposed by #113, we had some unnecessary regressions due to the Expr
conversion code being under-tested.
We need to make sure each branch in src/expr.jl is covered with the tests in test/expr.jl
[update] Some tests are now hosted in test/parser.jl, but it'd be better to move them into test/expr.jl and decouple them from the other tests, I think.
Julia's parser seems to accept this:
julia> JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, "\ufeffusing Test")
ERROR: ParseError:
Error: invalid syntax atom
@ line 1:1
using Test
Error: extra tokens after end of expression
@ line 1:2
using Test
There might be an argument to be made to just disallow this in Julia base.
On Julia master,
[ []
, [] ]
parses correctly as a Vector{Vector{Any}}, but with JuliaSyntax we get the ParseError
Error: Expected `]`
@ REPL[11]:2:3
[ []
,[] ]
This is a fairly major bug since this prevents loading CpuId which depends on this syntax.
This shouldn't happen:
julia> JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, "function ()(x) 23 end")
ERROR: Internal error: Can't peek behind at start of stream
Stacktrace:
[1] error(::String, ::String)
@ Base ./error.jl:42
[2] internal_error(strs::String)
@ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parser.jl:220
[3] peek_behind(stream::JuliaSyntax.ParseStream; skip_trivia::Bool)
@ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parse_stream.jl:521
[4] peek_behind
@ ~/.julia/dev/JuliaSyntax/src/parse_stream.jl:503 [inlined]
[5] #peek_behind#54
@ ~/.julia/dev/JuliaSyntax/src/parser.jl:80 [inlined]
[6] peek_behind
@ ~/.julia/dev/JuliaSyntax/src/parser.jl:80 [inlined]
[7] parse_function(ps::JuliaSyntax.ParseState)
@ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parser.jl:2032
[8] parse_resword(ps::JuliaSyntax.ParseState)
@ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parser.jl:1744
Spotted by @BenChung
I was thinking a bit about the "right" interface to trivia.
The rust-analyzer people are discussing it over at rust-lang/rust-analyzer#6584 so they've got some good background reading there. It seems generally awkward with no obviously right answer.
IIUC there are two common interfaces: attach trivia to neighbouring nontrivia nodes, or keep trivia tokens as ordinary children in the tree (the rust-analyzer model).
The rust-analyzer model is appealing because it leads to simpler data structures with less internal structure. It's also more general, because the trivia might be naturally interspersed with nontrivia children but without a natural attachment to any of the children. But we could go for either approach, or something else entirely.
A useful observation: there is no way of attaching whitespace that survives arbitrary refactoring.
In general for a refactoring pass, I guess whitespace will become inconsistent during refactoring and will need to be regenerated. This is obviously true for moving blocks but it's even true for refactoring as simple as renaming identifiers. For example, renaming elements of expressions which span multiple lines:
func(arg1, arg2, ...
argN, argN1)
^^^^
# problematic whitespace if length of func symbol changes
So I'm kind of convinced that there's no natural representation of whitespace within the green tree, so we may as well do whatever is efficient and simple to implement.
Consider a simple thing like (b + c) + (b + c)^2
and a pass which identifies common subexpressions to get
x = b + c
(x) + (x)^2
Here we can and should remove the parentheses (which are trivia after parsing, due to being used for grouping only). What do we even do here? Like whitespace, it seems refactorings will regularly break this kind of trivia and require that it's regenerated from a model of the precedence rules.
What about comments? This is much more relevant and I think we should aim for "comments are likely to survive symbolic refactoring and remain attached in the right places".
It seems likely there's cases where one or other model wins here, depending on the situation. Some prototyping with simple example refactoring passes might be necessary to get a feel for the pros and cons.
One big benefit we have in the ParseStream
interface is that trivia is mostly invisible to the parser. So in theory we can adjust trivia attachment heuristics (within whichever model is chosen) independently of the parser code. Julia is sensitive to whitespace and newlines in selected situations, but after parsing is done this information is no longer needed and it may be consistent to split and recombine trivia however we like by floating the boundaries of nodes across the trivia tokens.
I noticed that the errors are different:
julia> Meta.parse("[(1,2]")
ERROR: Base.Meta.ParseError("unexpected \"]\" in argument list")
Stacktrace:
[1] #parse#3
@ ./meta.jl:237 [inlined]
[2] parse(str::String; raise::Bool, depwarn::Bool)
@ Base.Meta ./meta.jl:268
[3] parse(str::String)
@ Base.Meta ./meta.jl:268
[4] top-level scope
@ REPL[1]:1
julia> using JuliaSyntax
julia> JuliaSyntax.enable_in_core!()
julia> Meta.parse("[(1,2]")
ERROR: MethodError: Cannot `convert` an object of type JuliaSyntax.ParseError to an object of type String
Closest candidates are:
convert(::Type{String}, ::String) at /Applications/Julia-1.7 x86.app/Contents/Resources/julia/share/julia/base/essentials.jl:223
convert(::Type{T}, ::T) where T<:AbstractString at /Applications/Julia-1.7 x86.app/Contents/Resources/julia/share/julia/base/strings/basic.jl:231
convert(::Type{T}, ::AbstractString) where T<:AbstractString at /Applications/Julia-1.7 x86.app/Contents/Resources/julia/share/julia/base/strings/basic.jl:232
...
Stacktrace:
[1] Base.Meta.ParseError(msg::JuliaSyntax.ParseError)
@ Base.Meta ./meta.jl:190
[2] #parse#3
@ ./meta.jl:237 [inlined]
[3] parse(str::String; raise::Bool, depwarn::Bool)
@ Base.Meta ./meta.jl:268
[4] parse(str::String)
@ Base.Meta ./meta.jl:268
[5] top-level scope
@ none:1
Not sure if this matters or not; just thought I would report it.
Spotted by @BenChung
julia> itest_parse(JuliaSyntax.parse_expr, "try 3 catch e+3 end")
# Code:
try 3 catch e+3 end
# Green tree:
1:19 │[try]
1:3 │ try "try"
4:5 │ [block]
4:4 │ Whitespace " "
5:5 │ Integer ✔ "3"
6:6 │ Whitespace " "
7:11 │ catch "catch"
12:12 │ Whitespace " "
13:13 │ Identifier ✔ "e"
14:15 │ [block]
14:15 │ Integer ✔ "+3"
16:16 │ Whitespace " "
17:16 │ false ✔ ""
17:16 │ false ✔ ""
17:19 │ end "end"
Whereas the reference parser reports the error: invalid syntax "catch (e + 3)"
The handling of . in the tokenizer / parser is pretty wonky / inconsistent because the tokenization of . is context-dependent. bump_split(), in particular, is quite ugly and shouldn't exist: ... is tokenized into K"...", which is usually correct, but incorrect for import/using statements as in import ...A ==> (import (. . . . A)), necessitating the splitting of tokens with bump_split - this is ugly! (Likewise for .+ ==> (. +).)
In addition, Expr is quite inconsistent about dotted infix calls vs dotted prefix calls. We should really fix this. Then we can remove the . from the operator names and treat it as separate syntax, as it should be! (See also #88)
julia> dump(Meta.parse("a .+ b"))
Expr
head: Symbol call
args: Array{Any}((3,))
1: Symbol .+
2: Symbol a
3: Symbol b
julia> dump(Meta.parse("f.(a, b)"))
Expr
head: Symbol .
args: Array{Any}((2,))
1: Symbol f
2: Expr
head: Symbol tuple
args: Array{Any}((2,))
1: Symbol a
2: Symbol b
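One direction (a purely illustrative sketch, not a JuliaSyntax decision) would be to normalize dotted infix calls into the same Expr(:., op, tuple) shape that prefix broadcast calls like f.(a, b) already use:

```julia
# Hypothetical sketch: rewrite dotted infix calls like :(a .+ b), which
# Meta.parse represents as head :call with operator Symbol(".+"), into
# the prefix broadcast form Expr(:., :+, Expr(:tuple, a, b)).
function undot(ex)
    ex isa Expr || return ex
    args = map(undot, ex.args)
    if ex.head == :call && !isempty(args) && args[1] isa Symbol
        s = String(args[1])
        if startswith(s, ".") && length(s) > 1
            op = Symbol(s[nextind(s, 1):end])   # strip the leading dot
            return Expr(:., op, Expr(:tuple, args[2:end]...))
        end
    end
    return Expr(ex.head, args...)
end
```

With this pass, a .+ b and f.(a, b) end up with the same head, so downstream consumers only need to handle one broadcast form.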
julia> for i in 1:3, j in i:3, k in j:3
ERROR: ParseError:
Error: premature end of input
@ REPL[122]:2:1
for i in 1:3, j in i:3, k in j:3
Error: Expected `end` but got unexpected tokens
@ REPL[122]:2:1
for i in 1:3, j in i:3, k in j:3
Getting this error in the REPL
Julia v1.8.0
JuliaSyntax.jl v0.1.0
Over at JuliaLang/julia#46364, @eschnett observed:
The array expression [begin 1 end] creates a 1-element array. The begin ... end block is not really necessary here, but can be convenient if the expression is much more complicated, e.g. a comprehension. The typed array expression Int[begin 1 end] leads to the parsing error ERROR: syntax: unexpected "end".
This is due to the ambiguity of whether begin or end in the first slot inside [] should be treated as block keywords or as the first/last indices of the array:
julia> dump(:(a[end 1]))
Expr
head: Symbol typed_hcat
args: Array{Any}((3,))
1: Symbol a
2: Symbol end
3: Int64 1
julia> dump(:(a[1 end]))
ERROR: syntax: unexpected "end"
Stacktrace:
[1] top-level scope
@ none:1
Over in that issue, it was suggested that the parser could explain the issue and suggest using let
instead of begin
, which seems like a good option.
See also: #81
julia> Meta.parse(raw""" "\777" """)
"\xff"
julia> JuliaSyntax.parse(Expr, raw""" "\777" """)
ERROR: ArgumentError: octal escape sequence out of range
Stacktrace:
[1] unescape_julia_string(io::IOBuffer, str::SubString{String})
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/value_parsing.jl:172
[2] unescape_julia_string(str::SubString{String}, is_cmd::Bool, is_raw::Bool)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/value_parsing.jl:203
[3] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:53
[4] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:89
[5] build_tree(::Type{JuliaSyntax.SyntaxNode}, stream::JuliaSyntax.ParseStream; filename::Nothing, kws::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:189
[6] build_tree
@ ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:186 [inlined]
[7] #build_tree#90
@ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:208 [inlined]
[8] build_tree
@ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:207 [inlined]
[9] #parse#83
@ ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:124 [inlined]
[10] parse(::Type{Expr}, input::String)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:120
[11] top-level scope
@ REPL[81]:1
Leaving a trailing comma after the last element of using
is a mistake I've made a few times. The parser only catches this error later, resulting in a fairly poor error message. For example, it can gobble up the function
keyword in the next part of the file:
julia> using A: b,
function foo()
end
ERROR: ParseError:
Error: extra tokens after end of expression
@ REPL[4]:3:9
using A: b,
function foo()
--------^^^^^^---
end
master:
julia> dump(:(function (f(::T) where {T}) end))
Expr
head: Symbol function
args: Array{Any}((2,))
1: Expr
head: Symbol where
args: Array{Any}((2,))
1: Expr
head: Symbol call
args: Array{Any}((2,))
1: Symbol f
2: Expr
head: Symbol ::
args: Array{Any}((1,))
1: Symbol T
2: Symbol T
2: Expr
head: Symbol block
args: Array{Any}((2,))
1: LineNumberNode
line: Int64 1
file: Symbol REPL[11]
2: LineNumberNode
line: Int64 1
file: Symbol REPL[11]
JuliaSyntax:
julia> dump(:(function (f(::T) where {T}) end))
Expr
head: Symbol function
args: Array{Any}((2,))
1: Expr
head: Symbol tuple
args: Array{Any}((1,))
1: Expr
head: Symbol where
args: Array{Any}((2,))
1: Expr
head: Symbol call
args: Array{Any}((2,))
1: Symbol f
2: Expr
2: Symbol T
2: Expr
head: Symbol block
args: Array{Any}((2,))
1: LineNumberNode
line: Int64 1
file: Symbol REPL[21]
2: LineNumberNode
line: Int64 1
file: Symbol REPL[21]
When I want to type
map(1:10) do x
2x
end
after I type map(1:10) do x
and hit return, I got
julia> map(1:10) do
ERROR: ParseError:
Error: premature end of input
@ REPL[11]:2:1
map(1:10) do
Error: premature end of input
@ REPL[11]:2:1
map(1:10) do
Error: Expected `end` but got unexpected tokens
@ REPL[11]:2:1
map(1:10) do
Or if I want to type
w = u +
v +
1
I got
julia> w = u +
ERROR: ParseError:
Error: premature end of input
@ REPL[14]:2:1
w = u +
But I can do them with copy-paste:
Can we make the REPL-mode tolerate multiline code a little bit more?
With JuliaSyntax, we've got our own green tree (GreenNode) and AST (SyntaxNode) which often differ from Expr, due to the requirement that children are strictly in source order. Some current differences are described in https://github.com/JuliaLang/JuliaSyntax.jl#tree-differences-between-greennode-and-expr. Given that we've been forced to diverge, we might as well make the most of this and reconsider some aspects of Expr (compatible forms such as :if can still be provided for Expr users).
List of possible changes:
- Various forms use a block in Expr only to hold a LineNumberNode. Avoid this!
  - let argument lists (#126)
  - an elseif conditional has a block for the line number
  - -> has a block only for the body line number
- K"macrocall" - allow users to easily distinguish macrocalls with parentheses from those without them (#218) (K"parens")
- global const is normalized to const global in the parser. Should be done in Expr conversion (#130)
- do seems partially lowered. Should it be flatter, like f(x) do y ; body end being (do (call f) (tuple y) (block body))? (#98)
- Expand @. to @__dot__ later, not inside the parser
- Expand Core.@doc later, not inside the parser (#217)
- a .+ b is inconsistent with prefix calls like f.(a, b) (see #90)
- .*(x,y) ==> (call .* x y) vs (.*)(x,y) ==> (call (. *) x y), ie there's a dotted symbol vs Expr(:.) in these cases (see discussion in #90)
- = vs kw (#99)
- Parse ' with a call head rather than as a syntactic operator, for consistency with suffixed versions like x'ᵀ (#124)
- Tuples like (a=1, b=2; c=3; d=4) are a weird nested structure. Let's flatten this and bring the multiple sets of K"parameters" into the parent tuple. (#133)
- try body1 catch exc body2 else body3 finally body4 end to (try body1 (catch exc body2) (else body3) (finally body4)) (#234)
- Paths in import A.b.c are different from normal A.b.c. Use a different head for these.
- Whether mutable is present in a struct, and whether a module is actually a baremodule: these seem more like part of the expression head - would they be better as flags?
- :toplevel expressions occur both at file scope and as ;-delimited expressions at file scope. This seems kind of weird?
- global x,y vs global x vs global (x,y) = (1,2) vs global (x,y)? (In particular, the variables might or might not be enclosed in a tuple.)
- Expand Core.@cmd later, not inside the parser
- … to macrocall later, not inside the parser
- K"infix_call"? (considered and decided against in #99 and #124)
julia> JuliaSyntax.parse(Expr, ".-.1")
ERROR: ArgumentError: cannot parse ".-.1" as Float64
julia> JuliaSyntax.parse(Expr, ".-0.1")
ERROR: ArgumentError: cannot parse ".-0.1" as Float64
julia> JuliaSyntax.parse(Expr, ".-1")
ERROR: ArgumentError: invalid BigInt: ".-1"
julia> Meta.parse(".-.1")
:((.-)(0.1))
julia> 2**3
ERROR: ParseError:
Error: invalid syntax atom
@ REPL[3]:1:2
2**3
Error: extra tokens after end of expression
@ REPL[3]:1:4
2**3
instead of
julia> 2**3
ERROR: syntax: use "x^y" instead of "x**y" for exponentiation, and "x..." instead of "**x" for splatting.
julia> JuliaSyntax.parseall(JuliaSyntax.GreenNode, "function")
ERROR: Internal error: Can't peek behind at start of stream
Stacktrace:
[1] error(::String, ::String)
@ Base ./error.jl:42
[2] internal_error(strs::String)
@ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parser.jl:220
[3] peek_behind(stream::JuliaSyntax.ParseStream; skip_trivia::Bool)
@ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parse_stream.jl:521
[4] peek_behind
@ ~/.julia/dev/JuliaSyntax/src/parse_stream.jl:503 [inlined]
[5] #peek_behind#54
@ ~/.julia/dev/JuliaSyntax/src/parser.jl:80 [inlined]
[6] peek_behind
@ ~/.julia/dev/JuliaSyntax/src/parser.jl:80 [inlined]
[7] parse_function(ps::JuliaSyntax.ParseState)
@ JuliaSyntax ~/.julia/dev/JuliaSyntax/src/parser.jl:2032
On Julia master:
julia> dump(:(@a[1]))
Expr
head: Symbol macrocall
args: Array{Any}((3,))
1: Symbol @a
2: LineNumberNode
line: Int64 1
file: Symbol REPL[3]
3: Expr
head: Symbol vect
args: Array{Any}((1,))
1: Int64 1
With JuliaSyntax:
julia> dump(:(@a[1]))
Expr
head: Symbol macrocall
args: Array{Any}((2,))
1: Expr
head: Symbol ref
args: Array{Any}((2,))
1: Symbol @a
2: Int64 1
2: LineNumberNode
line: Int64 1
file: Symbol REPL[26]
There's various cases where =
is parsed into a kw
head, but this is inconsistent, especially when named tuples come into play. This requires various gymnastics in the parser as discussed in https://github.com/JuliaLang/JuliaSyntax.jl#kw-and--inconsistencies
For the named tuple inconsistency, consider the difference between the following:
julia> dump(Meta.parse("(a=1, b=2)"))
Expr
head: Symbol tuple
args: Array{Any}((2,))
1: Expr
head: Symbol =
args: Array{Any}((2,))
1: Symbol a
2: Int64 1
2: Expr
head: Symbol =
args: Array{Any}((2,))
1: Symbol b
2: Int64 2
julia> dump(Meta.parse("f(a=1, b=2)"))
Expr
head: Symbol call
args: Array{Any}((3,))
1: Symbol f
2: Expr
head: Symbol kw
args: Array{Any}((2,))
1: Symbol a
2: Int64 1
3: Expr
head: Symbol kw
args: Array{Any}((2,))
1: Symbol b
2: Int64 2
Lowering seems to paper over this difference, but it's not very satisfying.
The difference is also confusing when implementing macros as one cannot interpolate expressions like :(a=1)
into a call and expect them to become keywords. For example:
julia> :(f($(:(a=1))))
:(f($(Expr(:(=), :a, 1))))
julia> :(f($(Expr(:kw,:x,1))))
:(f(x = 1))
Is there a way to resolve this inconsistency? For example, can we remove the kw
head entirely and just use =
? Or can we emit :kw
in the named tuple case?
One objection to removing kw and always using = is that interpolating things like :(a=1) into a call would change the meaning of the = from an assignment into a keyword argument. However, we already have this problem with named tuples. Alternatively, can we always parse named tuples with kw as the expression head?
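As a concrete sketch of the "always emit kw in keyword position" direction, an Expr post-pass could normalize = appearing directly in a call's argument list. This is purely illustrative (normalize_kws is a hypothetical name, and a real pass would also handle :parameters after the semicolon):

```julia
# Hypothetical sketch: normalize Expr(:(=), ...) appearing directly in a
# call's argument list into Expr(:kw, ...), so an interpolated :(a=1)
# behaves as a keyword argument. Not proposed JuliaSyntax behaviour.
function normalize_kws(ex)
    ex isa Expr || return ex
    args = map(normalize_kws, ex.args)
    if ex.head == :call
        args = map(args) do a
            a isa Expr && a.head == :(=) ? Expr(:kw, a.args...) : a
        end
    end
    return Expr(ex.head, args...)
end
```

With this pass, normalize_kws(:(f($(:(a=1))))) yields the same tree as :(f(a=1)), which resolves the macro-interpolation surprise described above.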
E.g. when parsing https://github.com/OpenMendel/MendelImpute.jl/blob/v1.2.3/test/run.jl. That file's obviously invalid, but we should presumably handle this case a bit more nicely. Meta.parse
doesn't error here.
┌ Error: parsing failed for /home/pfitzseb/juliasyntaxtest/pkgs/MendelImpute_1.2.3/test/run.jl
│ ex =
│ Numeric flags unable to hold large integer -9223372036854775808
│ Stacktrace:
│ [1] error(s::String)
│ @ Base ./error.jl:35
│ [2] set_numeric_flags
│ @ ~/.julia/packages/JuliaSyntax/OawBx/src/parse_stream.jl:34 [inlined]
│ [3] parse_array(ps::JuliaSyntax.ParseState, mark::JuliaSyntax.ParseStreamPosition, closer::JuliaSyntax.Kind, end_is_symbol::Bool)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:2634
│ [4] parse_cat(ps::JuliaSyntax.ParseState, closer::JuliaSyntax.Kind, end_is_symbol::Bool)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:2814
│ [5] parse_call_chain(ps::JuliaSyntax.ParseState, mark::JuliaSyntax.ParseStreamPosition, is_macrocall::Bool)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1469
│ [6] parse_call_chain
│ @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1383 [inlined]
│ [7] parse_call(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1311
│ [8] parse_factor(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1258
│ [9] parse_unary(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1101
│ [10] parse_juxtapose(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1058
│ [11] parse_where(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_juxtapose))
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:1013
│ [12] parse_unary_subtype(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:974
│ [13] parse_LtoR(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_unary_subtype), is_op::typeof(JuliaSyntax.is_prec_bitshift))
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:347
│ [14] parse_shift(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:943
│ [15] parse_LtoR(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_shift), is_op::typeof(JuliaSyntax.is_prec_rational))
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:347
│ [16] parse_rational(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:938
│ [17] parse_with_chains(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_rational), is_op::typeof(JuliaSyntax.is_prec_times), chain_ops::Tuple{JuliaSyntax.Kind})
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:893
│ [18] parse_term(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:885
│ [19] parse_with_chains(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_term), is_op::typeof(JuliaSyntax.is_prec_plus), chain_ops::Tuple{JuliaSyntax.Kind, JuliaSyntax.Kind})
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:893
│ [20] parse_expr(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:878
│ [21] parse_range(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:794
│ [22] parse_LtoR(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_range), is_op::typeof(JuliaSyntax.is_prec_pipe_gt))
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:347
│ [23] parse_pipe_gt(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:781
│ [24] parse_RtoL(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_pipe_gt), is_op::typeof(JuliaSyntax.is_prec_pipe_lt), self::typeof(JuliaSyntax.parse_pipe_lt))
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:361
│ [25] parse_pipe_lt(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:775
│ [26] parse_comparison(ps::JuliaSyntax.ParseState, subtype_comparison::Bool)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:749
│ [27] parse_comparison(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:733
│ [28] parse_lazy_cond(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_comparison), is_op::typeof(JuliaSyntax.is_prec_lazy_and), self::typeof(JuliaSyntax.parse_and))
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:698
│ [29] parse_and(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:726
│ [30] parse_lazy_cond(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_and), is_op::typeof(JuliaSyntax.is_prec_lazy_or), self::typeof(JuliaSyntax.parse_or))
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:698
│ [31] parse_or(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:717
│ [32] parse_arrow(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:674
│ [33] parse_cond(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:629
│ [34] parse_RtoL(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_cond), is_op::typeof(JuliaSyntax.is_prec_pair), self::typeof(JuliaSyntax.parse_pair))
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:361
│ [35] parse_pair(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:620
│ [36] parse_comma(ps::JuliaSyntax.ParseState, do_emit::Bool)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:598
│ [37] parse_comma
│ @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:596 [inlined]
│ [38] parse_assignment
│ @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:558 [inlined]
│ [39] parse_eq
│ @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:531 [inlined]
│ [40] parse_docstring(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_eq))
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:489
│ [41] parse_docstring
│ @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:487 [inlined]
│ [42] parse_Nary(ps::JuliaSyntax.ParseState, down::typeof(JuliaSyntax.parse_docstring), delimiters::Tuple{JuliaSyntax.Kind}, closing_tokens::Tuple{JuliaSyntax.Kind})
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:391
│ [43] parse_stmts(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:465
│ [44] parse_toplevel(ps::JuliaSyntax.ParseState)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser.jl:429
│ [45] parse(stream::JuliaSyntax.ParseStream; rule::Symbol)
│ @ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:98
│ [46] #parse#83
│ @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:123 [inlined]
│ [47] parse
│ @ ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:120 [inlined]
│ [48] (::var"#21#23")()
│ @ Main ~/juliasyntaxtest/run.jl:40
│ [49] with_logstate(f::Function, logstate::Any)
│ @ Base.CoreLogging ./logging.jl:511
│ [50] with_logger(f::Function, logger::ConsoleLogger)
│ @ Base.CoreLogging ./logging.jl:623
│ [51] top-level scope
│ @ ~/juliasyntaxtest/run.jl:12
│ [52] include(fname::String)
│ @ Base.MainInclude ./client.jl:476
│ [53] top-level scope
│ @ REPL[2]:1
│ [54] eval
│ @ ./boot.jl:368 [inlined]
│ [55] eval_user_input(ast::Any, backend::REPL.REPLBackend)
│ @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
│ [56] repl_backend_loop(backend::REPL.REPLBackend)
│ @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
│ [57] start_repl_backend(backend::REPL.REPLBackend, consumer::Any)
│ @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
│ [58] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool)
│ @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
│ [59] run_repl(repl::REPL.AbstractREPL, consumer::Any)
│ @ REPL ~/julia-1.8.0/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
│ [60] (::Base.var"#966#968"{Bool, Bool, Bool})(REPL::Module)
│ @ Base ./client.jl:419
│ [61] #invokelatest#2
│ @ ./essentials.jl:729 [inlined]
│ [62] invokelatest
│ @ ./essentials.jl:726 [inlined]
│ [63] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
│ @ Base ./client.jl:404
│ [64] exec_options(opts::Base.JLOptions)
│ @ Base ./client.jl:318
└ @ Main ~/juliasyntaxtest/run.jl:45
@davidanthoff asked on Zulip about incremental reparsing:
Is there support for partial reparses, i.e. some sort of incremental parsing? The basic idea is that the user presses one key in the editor, and we don't want to reparse the whole document on every key press, but only a subset, based on the precise range of the document that was edited.
To capture my thoughts on this somewhere more permanent, I think this should work fine, but there are a couple of tricky things to work out:
First, how are the changed bytes supplied to the parser system? I haven't looked into LanguageServer yet, but presumably it's "insert this byte here" or "change line 10 to 'such-and-such' string". Those might require a representation of the source which isn't a String (or Vector{UInt8} buffer). It might be a rope data structure or something? Should we extend the SourceFile abstraction to allow different AbstractString types? Or perhaps this state should be managed outside the parser completely? Internally, I feel the lexer and parser should always operate on Vector{UInt8} as a concrete, efficient data structure for UTF-8 encoded text, so the subrange of text being parsed should probably be copied into one of these for use by the tokenizer.
Second, the new source text intersects with the existing parse tree node(s) which cover some range of bytes. There can be several such nodes nested together; which one do we choose? Equivalently, which production (JuliaSyntax.parse_* function) do we start reparsing from? Starting deeper in the tree is good because it implies a smaller span, but the parser may have nontrivial state which isn't explicit in the parse tree. For example, space-sensitive parsing within [] or macro calls, or the special parsing of in as = within the iterator specification of a for loop. So we'd need a list of rules specifying which productions we can restart parsing from, and to correctly reconstruct the ParseState for those cases. To start with, toplevel/module scope is probably fine and we could throw something together quickly for that, I think.
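To make the node-selection problem concrete, here's a rough sketch of choosing the innermost node to restart reparsing from. `TreeNode` and its absolute byte ranges are hypothetical stand-ins, not the actual JuliaSyntax green tree (whose nodes store only relative spans that would need accumulating during the walk):

```julia
# Hypothetical stand-in for a parse tree node; not the JuliaSyntax API.
struct TreeNode
    range::UnitRange{Int}       # absolute byte range covered by this node
    children::Vector{TreeNode}
end

# Return the deepest node whose range fully contains the edited bytes,
# or `nothing` if even `node` doesn't contain them.
function innermost_covering(node::TreeNode, edit::UnitRange{Int})
    issubset(edit, node.range) || return nothing
    for child in node.children
        inner = innermost_covering(child, edit)
        inner === nothing || return inner
    end
    return node
end
```

A real implementation would additionally have to stop descending at the outermost production whose ParseState can be reconstructed, per the restrictions above.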
While profiling allocations of JuliaSyntax parsing itself, I've noticed that the call stack sometimes gets extremely deep.
For example, the large expression in is_operator_start_char() in src/tokenize_utils.jl has some 573 parentheses (!). Setting aside that this is potentially questionable code in its own right, the call stack seems to repeat with a period of 38 in this case, resulting in a call stack depth of ~38*573 = 21774. This is due to the way the expression is arranged as a completely unbalanced tree.
This is kind of inherent to using recursive descent the way we do, and presumably we can live with it. But it does seem a bit non-ideal for parser performance. Presumably if we parsed expressions with a Pratt parser we could avoid such extreme stack depths, as the factor of 38 might be reduced to 2 or so?
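To get a feel for how nesting depth tracks input size here, one can generate an unbalanced `||` chain of the same shape; this is an illustrative sketch, not the actual expression from tokenize_utils.jl:

```julia
# Build a long right-nested `||` chain, the same unbalanced shape that
# drives recursion depth in a recursive-descent parser.
src = join(["c == $i" for i in 1:500], " || ")
ex = Meta.parse(src)

# `||` is right-associative, so the resulting Expr nests ~500 levels deep;
# parsing it requires a correspondingly deep chain of parse_* calls.
depth(e) = e isa Expr ? 1 + maximum(depth.(e.args); init=0) : 0
```

A Pratt-style loop would consume such an operator chain iteratively, keeping the machine stack shallow even though the resulting tree is just as deep.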
On Julia master:
julia> @Meta.lower -(a = 2)
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Core.get_binding_type(Main, :a)
│ %2 = Base.convert(%1, 2)
│ %3 = Core.typeassert(%2, %1)
│ a = %3
│ %5 = -2
└── return %5
))))
With JuliaSyntax (which apparently parses a = 2 here as a keyword argument to -):
:($(Expr(:thunk, CodeInfo(
@ none within `top-level scope`
1 ─ %1 = Core.tuple(:a)
│ %2 = Core.apply_type(Core.NamedTuple, %1)
│ %3 = Core.tuple(2)
│ %4 = (%2)(%3)
│ %5 = Core.kwfunc(-)
│ %6 = (%5)(%4, -)
└── return %6
))))
julia> Meta.parse("+=")
┌ Error: JuliaSyntax parser failed — falling back to flisp!
│ exception =
│ MethodError: no method matching head(::Nothing)
│
│ Closest candidates are:
│ head(::JuliaSyntax.GreenNode)
│ @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/green_tree.jl:69
│ head(::JuliaSyntax.SyntaxToken)
│ @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/parse_stream.jl:127
│ head(::JuliaSyntax.TaggedRange)
│ @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/parse_stream.jl:146
│ ...
│
│ Stacktrace:
│ [1] kind(x::Nothing)
│ @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/parse_stream.jl:91
│ [2] _incomplete_tag(n::JuliaSyntax.SyntaxNode)
│ @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/hooks.jl:55
│ [3] _core_parser_hook(code::String, filename::String, lineno::Int64, offset::Int64, options::Symbol)
│ @ JuliaSyntax ~/.julia/dev/JuliaSyntax/sysimage/JuliaSyntax/src/hooks.jl:172
I'm opening this issue based on a question on discourse.
Consider the following code:
i = 0
for n = 1:5
i++
end
This will result in (at least on Julia 1.7.0)
ERROR: syntax: unexpected "end"
This error message could benefit from JuliaLang/julia#45791. However, I'm filing a separate issue for this special case because a new user might write i++ not knowing that it is not equivalent to i += 1 (or even that ++ is parsed as an infix operator). Would it be possible to have a more specific error message? Something like
ERROR: syntax: attempted to call an infix operator with just one argument - perhaps you meant "i += 1"?
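For context (my own illustration, not part of the original report): `++` really is treated as an ordinary user-definable infix operator by the parser, which is why `i++` on its own line is just an unfinished binary expression rather than an increment:

```julia
# `++` parses as a normal infix call, just like `+` would:
ex = Meta.parse("a ++ b")          # Expr(:call, :++, :a, :b)

# So `i++` alone is incomplete input, still waiting for a right operand:
inc = Meta.parse("i++", raise=false)
inc.head                            # :incomplete
```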
julia> 0x1p
┌ Error: JuliaSyntax parser failed — falling back to flisp!
│ exception =
│ ArgumentError: cannot parse "0x1p" as Float64
│ Stacktrace:
│ [1] _parse_failure(T::Type, s::String, startpos::Int64, endpos::Int64) (repeats 2 times)
│ @ Base ./parse.jl:373
│ [2] #tryparse_internal#494
│ @ ./parse.jl:369 [inlined]
│ [3] tryparse_internal
│ @ ./parse.jl:366 [inlined]
│ [4] #parse#495
│ @ ./parse.jl:379 [inlined]
│ [5] parse
│ @ ./parse.jl:379 [inlined]
│ [6] julia_string_to_number(str::SubString{String}, kind::JuliaSyntax.Kind)
│ @ JuliaSyntax ~/julia/usr/share/julia/stdlib/v1.9/JuliaSyntax/src/value_parsing.jl:28
│ [7] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
│ @ JuliaSyntax ~/julia/usr/share/julia/stdlib/v1.9/JuliaSyntax/src/syntax_tree.jl:33
│ [8] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
│ @ JuliaSyntax ~/julia/usr/share/julia/stdlib/v1.9/JuliaSyntax/src/syntax_tree.jl:87
│ [9] build_tree(::Type{JuliaSyntax.SyntaxNode}, stream::JuliaSyntax.ParseStream; filename::String, kws::Base.Pairs{Symbol, JuliaSyntax.Kind, Tuple{Symbol}, NamedTuple{(:wrap_toplevel_as_kind,), Tuple{JuliaSyntax.Kind}}})
│ @ JuliaSyntax ~/julia/usr/share/julia/stdlib/v1.9/JuliaSyntax/src/syntax_tree.jl:187
I believe this should emit an error instead of throwing inside the parser.
Hi,
"Once mature, replace Julia's flisp-based reference frontend in Core": can this be done already, despite imperfect parsing? I mean making a sysimage without flisp and with JuliaSyntax.jl instead (and I could check). Do you have such a sysimage?
"Differences from the flisp parser" (and your nice JuliaCon video, as I recall) implies that flisp is already fully bypassed at runtime, i.e. flisp is just (still) sitting there, unused.
I'm OK with a replacement not capable of parsing all code (just Base, so most code); that's already good enough for most users.
I've been thinking about making a minimal sysimage (resurrecting JuliaLite) to help with Julia startup, e.g. sacrificing LinearAlgebra, for (small) non-math Julia scripts.
I'm not sure where flisp resides; I can't locate a library for it. Do you know if it's part of libjulia.so.1.8 and/or sys.so? I know it's rather small, so loading it might not be speed-critical, and for a small script it might not be too much of an overhead relative to actual parse time.
I'm a bit confused by the 4 sec. parse time claim in your video: is that for a huge script, or something pathological? My understanding was that the parser isn't speed-critical (relative to other stages, e.g. optimization), but every bit helps. And its relative importance grows when you run with -O0 (often useful for scripts) or --compile=min.
https://github.com/jolin-io/WhereTraits.jl cites the following limitation with attendant explanation:
Top Level Only
Currently only top-level functions are supported, as the syntax stores and needs information about previous function definitions, which it stores globally. If macros would get informed about whether they are defined within another function, WhereTraits could also support innerfunctions.
I know flisp is involved with macro expansion. Is the above limitation something that can be addressed with this package?
julia> JuliaSyntax.parse(Expr, "1.9824062450251952342e-2660052")
ERROR: ArgumentError: cannot parse "1.9824062450251952342e-2660052" as Float64
Stacktrace:
[1] _parse_failure(T::Type, s::String, startpos::Int64, endpos::Int64) (repeats 2 times)
@ Base ./parse.jl:373
[2] #tryparse_internal#477
@ ./parse.jl:369 [inlined]
[3] tryparse_internal
@ ./parse.jl:366 [inlined]
[4] #parse#478
@ ./parse.jl:379 [inlined]
[5] parse
@ ./parse.jl:379 [inlined]
[6] julia_string_to_number(str::SubString{String}, kind::JuliaSyntax.Kind)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/value_parsing.jl:28
[7] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:33
[8] JuliaSyntax.SyntaxNode(source::JuliaSyntax.SourceFile, raw::JuliaSyntax.GreenNode{JuliaSyntax.SyntaxHead}, position::UInt32)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:89
[9] build_tree(::Type{JuliaSyntax.SyntaxNode}, stream::JuliaSyntax.ParseStream; filename::Nothing, kws::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:189
[10] build_tree
@ ~/.julia/packages/JuliaSyntax/OawBx/src/syntax_tree.jl:186 [inlined]
[11] #build_tree#90
@ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:208 [inlined]
[12] build_tree
@ ~/.julia/packages/JuliaSyntax/OawBx/src/expr.jl:207 [inlined]
[13] #parse#83
@ ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:124 [inlined]
[14] parse(::Type{Expr}, input::String)
@ JuliaSyntax ~/.julia/packages/JuliaSyntax/OawBx/src/parser_api.jl:120
[15] top-level scope
@ REPL[62]:1
julia> Meta.parse("1.9824062450251952342e-2660052")
0.0
Hi, I wonder whether this project needs student contributors. I am an undergraduate student from Peking University (in China), mainly interested in rewriting the lowering pipeline (defined in julia-syntax.scm) of Julia's frontend. Previously I was working on static compilation of Julia (using LLVM JITLink), and I found that it would be helpful to rewrite Julia's frontend (to get rid of some gensyms and record necessary information) if a more static approach is needed. But people working on this told me that Julia is unlikely to move in that direction and will stick to solutions like system images plus parallel compilation, which renders my thought unnecessary. Besides, rewriting the frontend is time-consuming and hard to review.
The developers pointed me to this project. It is really great and looks really promising! I would like to devote my free time to it, mainly the lowering part (it seems you are currently focusing on the parser part). I am looking forward to hearing your thoughts.
I thought that file might be a good stress test of this package. It seems to do quite well. The first case is probably up to interpretation, since although the flisp parser parses it, it's definitely not valid syntax. There does seem to be a bug in parsing empty multidimensional array literals though:
julia> JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, read("test/syntax.jl"); filename=abspath("test/syntax.jl"))
ERROR: Error: Expected identifier:
@test !@isdefined(y)
@test_throws ErrorException eval(:(import .Mod.x as (a.b)))
import .Mod.maybe_undef as mu
Error: unexpected closing token:
@testset "empty nd arrays" begin
@test :([]) == Expr(:vect)
@test :([;]) == Expr(:ncat, 1)
@test :([;;]) == Expr(:ncat, 2)
@test :([;;;]) == Expr(:ncat, 3)
Error: unexpected closing token:
@test :([]) == Expr(:vect)
@test :([;]) == Expr(:ncat, 1)
@test :([;;]) == Expr(:ncat, 2)
@test :([;;;]) == Expr(:ncat, 3)
Error: unexpected closing token:
@test :([;]) == Expr(:ncat, 1)
@test :([;;]) == Expr(:ncat, 2)
@test :([;;;]) == Expr(:ncat, 3)
@test [] == Array{Any}(undef, 0)
Error: unexpected closing token:
@test [] == Array{Any}(undef, 0)
@test [;] == Array{Any}(undef, 0)
@test [;;] == Array{Any}(undef, 0, 0)
@test [;;;] == Array{Any}(undef, 0, 0, 0)
Error: unexpected closing token:
@test [] == Array{Any}(undef, 0)
@test [;] == Array{Any}(undef, 0)
@test [;;] == Array{Any}(undef, 0, 0)
@test [;;;] == Array{Any}(undef, 0, 0, 0)
Error: unexpected closing token:
@test [;] == Array{Any}(undef, 0)
@test [;;] == Array{Any}(undef, 0, 0)
@test [;;;] == Array{Any}(undef, 0, 0, 0)
@test :(T[]) == Expr(:ref, :T)
Error: unexpected closing token:
@test :(T[]) == Expr(:ref, :T)
@test :(T[;]) == Expr(:typed_ncat, :T, 1)
@test :(T[;;]) == Expr(:typed_ncat, :T, 2)
@test :(T[;;;]) == Expr(:typed_ncat, :T, 3)
Error: unexpected closing token:
@test :(T[]) == Expr(:ref, :T)
@test :(T[;]) == Expr(:typed_ncat, :T, 1)
@test :(T[;;]) == Expr(:typed_ncat, :T, 2)
@test :(T[;;;]) == Expr(:typed_ncat, :T, 3)
Error: unexpected closing token:
@test :(T[;]) == Expr(:typed_ncat, :T, 1)
@test :(T[;;]) == Expr(:typed_ncat, :T, 2)
@test :(T[;;;]) == Expr(:typed_ncat, :T, 3)
@test Int[] == Array{Int}(undef, 0)
Error: unexpected closing token:
@test Int[] == Array{Int}(undef, 0)
@test Int[;] == Array{Int}(undef, 0)
@test Int[;;] == Array{Int}(undef, 0, 0)
@test Int[;;;] == Array{Int}(undef, 0, 0, 0)
Error: unexpected closing token:
@test Int[] == Array{Int}(undef, 0)
@test Int[;] == Array{Int}(undef, 0)
@test Int[;;] == Array{Int}(undef, 0, 0)
@test Int[;;;] == Array{Int}(undef, 0, 0, 0)
Error: unexpected closing token:
@test Int[;] == Array{Int}(undef, 0)
@test Int[;;] == Array{Int}(undef, 0, 0)
@test Int[;;;] == Array{Int}(undef, 0, 0, 0)
@test :([ ]) == Expr(:vect)
Error: unexpected closing token:
@test :([
]) == Expr(:vect)
@test :([ ;; ]) == Expr(:ncat, 2)
@test :([
;;
Error: unexpected closing token:
@test :([ ;; ]) == Expr(:ncat, 2)
@test :([
;;
]) == Expr(:ncat, 2)
Once those are fixed, I think that would be a good test case for CI.
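For reference, a minimal reproducer distilled from the failures above (on Julia 1.8+, where empty n-dimensional array literals are valid syntax):

```julia
# The reference parser accepts empty n-dimensional array literals:
Meta.parse("[;;]")                 # Expr(:ncat, 2)

# ...while JuliaSyntax errored on the same input at the time of this report:
# JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, "[;;]")
#   -> "Error: unexpected closing token"
```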