Comments (11)
Seems like this can be in 2.0, unless it is easy enough to do. Stefan?
from julia.
Yeah, it turns out to be a pain in the ass to do from the C code (can't find the email thread, but it is).
Another possibly simpler way to handle this is to change the parser to disallow \x80 through \xff in bare string literals but to allow them in prefixed string literals, passing them through to the implementing macro to decide whether to allow them or not. Throwing a syntax error from a macro is no problem; it's throwing a syntax error from C code called directly by the parser that's problematic. @JeffBezanson, can you do that?
from julia.
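The proposal above, where a prefixed literal hands its raw text to a macro that decides whether high-byte escapes are acceptable, can be sketched roughly as follows. This is a minimal illustration in present-day Julia syntax, not the code discussed in this thread: the names `reject_high_escapes` and `ascii_only` are hypothetical, only `\x` escapes are checked (octal escapes are ignored), and the regular expression does not account for an escaped backslash directly before `x`.

```julia
# Hypothetical helper: refuse \x80 through \xff in the raw text of a literal.
# The macro below receives the text before unescaping, so an escape such as
# \x80 arrives as the four characters '\', 'x', '8', '0'.
function reject_high_escapes(raw::AbstractString)
    if occursin(r"\\x[89a-fA-F][0-9a-fA-F]", raw)
        throw(ArgumentError("escapes above \\x7f are not allowed in this literal"))
    end
    return raw
end

# Hypothetical prefix `ascii_only`: validate first, then unescape.
# The error is raised by the macro itself, not by C code called from the parser.
macro ascii_only_str(raw)
    return unescape_string(reject_high_escapes(raw))
end

println(ascii_only"caf\x65")   # \x65 is below \x80, so this is accepted
```

Writing `ascii_only"\x80"` would raise the `ArgumentError` while the macro is being expanded, which is the behaviour the comment asks for.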
Actually it makes sense anyway for the parser to give a syntax error on invalid utf-8 sequences in a string or anywhere else. The parser can also check in cases where it does unescaping, and the julia string macros can check when they do unescaping. In case of an error the macro should return an error expression, Expr(:error,{"msg"}), instead of throwing, to allow history to work properly.
Also this seems strange:
julia> "\\\""
"\\\""
julia> S"\\\""
"\""
Am I doing something wrong in the parser?
from julia.
Also I see utf32.j is not in use. Can we get it in shape?
from julia.
I could work on utf32.j and latin1.j, or we could mothball them for the release. The main issue is that currently we don't actually have any support for reading files with different encodings. These aren't really useful without that.
There's definitely something funny going on with escape handling that's different for bare vs. prefixed or otherwise macro-handled strings. See issue #100. Pretty sure it's the same problem. I'm looking into it, but I'm not quite sure what's going wrong yet.
from julia.
Ah, I was wrong, a macro can throw an error and it is automatically handled.
from julia.
How does 6614948 not address all of this issue? Seems fully addressed to me.
from julia.
Oh, I guess that's true since print_unescaped never generates invalid sequences. But we have this:
julia> "\x80"
syntax error: invalid utf-8 sequence
julia> "\x80$1"
"\u00801"
We have to do something else with \x and \000. In byte arrays, b"\x80", we clearly want \x to insert bytes. So to be consistent it should always insert bytes, and for strings this process is followed by a check to make sure all those bytes add up to valid utf-8.
from julia.
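The rule described above (`\x` always inserts a byte, and string literals then check that the resulting bytes form valid UTF-8) might look roughly like this. It is a sketch in present-day Julia under stated assumptions, not the parser's actual implementation: the helpers `unescape_hex_bytes`, `byte_literal`, and `string_literal` are hypothetical names, and only two-digit `\xHH` escapes are handled.

```julia
# Hypothetical helper: copy the raw text byte by byte, replacing each
# two-digit \xHH escape with the single byte it names.
function unescape_hex_bytes(raw::AbstractString)
    src = collect(codeunits(raw))
    out = UInt8[]
    i = 1
    while i <= length(src)
        if src[i] == UInt8('\\') && i + 3 <= length(src) && src[i + 1] == UInt8('x')
            # parse throws an ArgumentError if the two characters are not hex digits
            push!(out, parse(UInt8, String(src[i + 2:i + 3]); base = 16))
            i += 4
        else
            push!(out, src[i])
            i += 1
        end
    end
    return out
end

# Byte array literal: the bytes are used as they are, with no validation.
byte_literal(raw) = unescape_hex_bytes(raw)

# String literal: same unescaping, followed by a UTF-8 validity check.
function string_literal(raw)
    s = String(unescape_hex_bytes(raw))
    isvalid(s) || throw(ArgumentError("invalid utf-8 sequence"))
    return s
end

println(byte_literal(raw"\x80"))         # a one-element byte vector
println(string_literal(raw"\xc3\xa9"))   # two bytes that together encode "é"
```

With this split, `string_literal(raw"\x80")` raises an error because a lone continuation byte is not valid UTF-8, while `byte_literal` accepts the same input.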
Ah, yeah. That is still an issue. I'll add a check after constructing a new string. If we just disallowed the escapes above \x7f altogether, checking UTF-8 validity would be unnecessary since there'd be no way to even express an invalid string. That would have to match between the parser and the str_S form though: we shouldn't allow it in one and not the other.
from julia.
Yes, that would also be a sensible approach. It's kind of a toss-up. I prefer to err on the side of allowing as much as possible. You can enter anything, but we call a validation routine. It's "trust but verify" :)
The validation is needed anyway since the source file itself could contain invalid utf-8 even with no escape sequences present.
Plus, after fixing this we get byte array literals for free.
from julia.
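Since source text can contain invalid UTF-8 even when no escape sequences are present, a validation pass over the text itself is useful. The following is a small sketch in present-day Julia of such a check that reports where the first problem occurs; `first_invalid_index` is a hypothetical name, and the check relies on Julia's own `isvalid` for characters rather than on the routine added for this issue.

```julia
# Hypothetical helper: return the byte index of the first character that is
# not valid UTF-8, or `nothing` if the whole string is valid.
function first_invalid_index(src::String)
    for i in eachindex(src)
        # Indexing an invalid byte yields a malformed Char, for which
        # isvalid returns false.
        isvalid(src[i]) || return i
    end
    return nothing
end

good = "a = \"é\""
bad = String(UInt8[0x61, 0x80, 0x62])   # 'a', a stray continuation byte, 'b'

println(first_invalid_index(good))
println(first_invalid_index(bad))
```

A parser could run a check like this once over the input and turn a non-`nothing` result into a syntax error that names the offending position.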
Closed by ad06687.
from julia.