Comments (8)

dsnet commented on August 18, 2024

The proposed future "encoding/json/jsontext" package produces a better error message:

jsontext: invalid surrogate pair `\udca0'none'` within string

as it preserves the context that this is occurring within a surrogate pair.

from protobuf.

neild commented on August 18, 2024

U+DCA0 is a Unicode surrogate, and should not appear in valid UTF-8 text. A conformant UTF-8 decoder is required to reject input containing an unpaired surrogate; see "How do I convert an unpaired UTF-16 surrogate to UTF-8?" here: https://unicode.org/faq/utf_bom.html

I'm not sure about the history that led to encoding/json accepting this invalid input and protojson rejecting it, but I don't think protojson is doing the wrong thing here.

puellanivis commented on August 18, 2024

I think this is still generating an improper error message. It seems to be complaining that the invalid escape code is 'none' (which is not at all being used as an escape code), when it should probably be reporting the invalid escape code to be dca0, right?

neild commented on August 18, 2024

Yes, you're right; I wasn't looking at the error message. The error is definitely wrong, or at least confusing.

Also, the protojson parser expects a surrogate to be followed by \uXXXX. I'm not certain if we should be expecting the potential for input like \udca0 (the surrogate) followed by a single unescaped character with no \u prefix.

puellanivis commented on August 18, 2024

Yeah… not sure what the proper behavior should be in error conditions. But also, do we check at all that a high surrogate is followed by a low surrogate, and vice versa, or do we just check that surrogates occur in pairs?

I smell the familiar scent of a rabbit hole leading into a pedantic reading of standards… 😩 At least the minimal solution here seems to be that if a surrogate isn’t followed by a surrogate, we report the escape itself as invalid? Or some sort of message that surrogates must be paired?
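
A minimal sketch of the fix suggested above, assuming a decoder that sees the raw text starting at a `\u` escape. When the first escape is a surrogate half, it requires a second `\uXXXX` escape that completes a valid pair, and otherwise reports the surrogate escape itself rather than whatever text follows. `decodeEscape` and `parseU4` are hypothetical names; this is not protojson's actual decoder or error format.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
	"unicode/utf16"
)

// parseU4 parses a `\uXXXX` escape at the start of s. It reports ok=false
// unless s begins with a backslash, 'u', and exactly four hex digits.
func parseU4(s string) (r rune, ok bool) {
	if len(s) < 6 || s[0] != '\\' || s[1] != 'u' {
		return 0, false
	}
	v, err := strconv.ParseUint(s[2:6], 16, 16)
	if err != nil {
		return 0, false
	}
	return rune(v), true
}

// decodeEscape (hypothetical) decodes the escape at the start of s and
// returns the rune plus the number of bytes consumed.
func decodeEscape(s string) (rune, int, error) {
	r1, ok := parseU4(s)
	if !ok {
		end := len(s)
		if end > 6 {
			end = 6
		}
		return 0, 0, fmt.Errorf("invalid escape code %q", s[:end])
	}
	if !utf16.IsSurrogate(r1) {
		return r1, 6, nil // ordinary BMP code point
	}
	// A surrogate half must be followed by a second \uXXXX escape, and the
	// two must form a high+low pair; utf16.DecodeRune enforces the order.
	if r2, ok := parseU4(s[6:]); ok {
		if r := utf16.DecodeRune(r1, r2); r != unicode.ReplacementChar {
			return r, 12, nil
		}
	}
	// Report the surrogate escape itself, not the text that follows it.
	return 0, 0, fmt.Errorf(`invalid surrogate pair: %s is not followed by a matching \uXXXX escape`, s[:6])
}

func main() {
	for _, in := range []string{`\u0041`, `\ud83d\udca9`, `\udca0'none'`} {
		r, n, err := decodeEscape(in)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s consumed %d bytes -> %U\n", in, n, r)
	}
}
```

With the input from this issue, `\udca0'none'`, the sketch reports `invalid surrogate pair: \udca0 is not followed by a matching \uXXXX escape` instead of pointing at `'none'`.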

dsnet commented on August 18, 2024

The JSON mapping for protobuf follows RFC 7493, which requires strict checking of surrogate halves (see section 2.1 of RFC 7493). The "encoding/json" package only adheres to RFC 8259, which leaves split surrogate halves as undefined behavior (see section 8.2 of RFC 8259). As a side note, the "encoding/json/v2" draft proposal will target compliance with RFC 7493 by default.

> I'm not certain if we should be expecting the potential for input like \udca0 (the surrogate) followed by a single unescaped character with no \u prefix.

A standalone surrogate half is invalid and should be rejected. The validity of a surrogate pair is determined using utf16.DecodeRune.

I don't see anything wrong with what protojson is doing. It correctly identified a surrogate half and was now expecting another escaped surrogate half. The error message says:

proto: syntax error (line 1:9): invalid escape code "'none'" in string

Strictly speaking, this error message is correct as 'none' is literally an invalid escape code.

puellanivis commented on August 18, 2024

Interesting, I see what you mean about the “'none' is an invalid escape code” message. 'none' (with its quotes) also just coincidentally has the same length as a \uXXXX escape sequence.

I do like the error message from jsontext which makes it clear exactly what is actually going wrong.

lfolger commented on August 18, 2024

Improving the error message seems reasonable to us, and we are happy to accept a contribution.
