Comments (3)

tysonzero commented on August 28, 2024

The current implementation already seems to do basically the right thing on a UTF-8 encoded ByteString: it leaves the URL alone except where it is truly invalid (e.g. a raw π halfway through) and then percent-encodes those invalid characters.

It should just be more transparent about the fact that it works on Unicode/UTF-8 by working on Text directly, which would also help avoid a variety of errors.
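To make the behaviour described above concrete, here is a minimal sketch that works on Text directly. The helper name `escapeNonAscii` is hypothetical and is not part of req. As a simplification, it treats every ASCII character as already valid and only escapes non-ASCII characters, so it is an illustration of the idea rather than a full URL validator.

```haskell
module Main where

import qualified Data.ByteString as B
import Data.Char (isAscii)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Text.Printf (printf)

-- | Hypothetical helper (not from req): leave ASCII characters untouched and
-- replace every non-ASCII character with its percent-encoded UTF-8 bytes.
escapeNonAscii :: Text -> Text
escapeNonAscii = T.concatMap escapeChar
  where
    escapeChar :: Char -> Text
    escapeChar c
      | isAscii c = T.singleton c
      | otherwise = T.pack (concatMap percentByte (utf8Bytes c))

    -- UTF-8 encode a single character and return its bytes as Ints.
    utf8Bytes :: Char -> [Int]
    utf8Bytes = map fromIntegral . B.unpack . TE.encodeUtf8 . T.singleton

    -- Render one byte as "%XX" with upper-case hex digits.
    percentByte :: Int -> String
    percentByte = printf "%%%02X"

main :: IO ()
main =
  -- π is U+03C0, which is the two bytes 0xCF 0x80 in UTF-8.
  putStrLn (T.unpack (escapeNonAscii (T.pack "https://example.com/a/π/b")))
  -- prints: https://example.com/a/%CF%80/b
```

Because the input is Text, there is no question of which byte encoding the caller used; the only encoding step is the explicit UTF-8 one inside the helper.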

from req.

mrkkrp commented on August 28, 2024

> ByteString semantically represents a sequence of 8-bit octets, not a series of ASCII characters, and URLs are semantically sequences of [ASCII] characters. You can see this distinction in how Char8 is too big to store only ASCII, in how often ByteString is used to store non-ASCII data, and in the non-ASCII encoding typically used when outputting to stdout.

Even so, Text is composed of elements which are even "bigger", allowing all sorts of Unicode in it. Thus it's hardly a better option?

The type we take as the argument of the parseUrlHttp(s) functions is dictated by the http-types package, because we build on functions like decodePathSegments and parseQueryText, both of which take ByteStrings and make Text pieces out of them. If you want to change this, then perhaps the right place to start is the http-types issue tracker.
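For reference, a small sketch of the two http-types functions mentioned above, showing the ByteString-in, Text-out shape that the argument type is built around. The sample path and query are made up for illustration, and the program assumes the http-types, text and bytestring packages are available.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import Data.ByteString (ByteString)
import Data.Text (Text)
import Network.HTTP.Types.URI (QueryText, decodePathSegments, parseQueryText)

-- Illustrative inputs (not taken from req): percent-encoded UTF-8 in a
-- path and in a query string, both held as raw bytes.
rawPath :: ByteString
rawPath = "/caf%C3%A9/menu"

rawQuery :: ByteString
rawQuery = "?q=%CF%80&flag"

-- decodePathSegments :: ByteString -> [Text]
segments :: [Text]
segments = decodePathSegments rawPath

-- parseQueryText :: ByteString -> QueryText
-- where QueryText is a list of (Text, Maybe Text) pairs.
params :: QueryText
params = parseQueryText rawQuery

main :: IO ()
main = do
  print segments
  print params
```

Both functions percent-decode the bytes and then interpret them as UTF-8 to produce Text, which is why the caller is expected to hand over a ByteString.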

Anyway, AFAIU, both Text and ByteString are not perfect:

  • With ByteString: 1) its elements are too big for plain ASCII; 2) encoding issues may introduce bugs.
  • With Text: 1) its elements are even bigger, admitting Unicode symbols which cannot appear unescaped in URLs (correct me if I'm wrong, I may have forgotten the subtleties).

Since neither option is ideal and http-types uses ByteString, I think we should stick with it.

tysonzero commented on August 28, 2024

The flaws in the types are more or less what you said, but I would adjust the phrasing slightly to show why Text really is much less problematic than ByteString:

The two main issues are:

  1. Encoding issues: It's relatively easy to have bugs that slip through the type checker if you just use the wrong encoding/decoding function. This issue is unique to ByteString.

  2. Too many codepoints: Suffered by both ByteString and Text, but I would actually argue that ByteString is even worse in this regard. At least with Text, these overly large codepoints have a pretty sane default behavior, namely UTF-8 encode + percent-encode (which is what your browser does when you type arbitrary Unicode into the URL bar).
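The first issue can be illustrated with a short sketch using only the text and bytestring packages. The top-level names here are made up for the example. Both decoders have the type ByteString -> Text, so picking the wrong one compiles without complaint and silently produces different text.

```haskell
module Main where

import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- The character π (U+03C0) encoded as UTF-8: the two bytes 0xCF 0x80.
rawPi :: B.ByteString
rawPi = TE.encodeUtf8 (T.pack "π")

-- Decoding with the matching encoding recovers the single character.
decodedRight :: Text
decodedRight = TE.decodeUtf8 rawPi

-- Decoding the same bytes as Latin-1 type-checks just as well, but yields
-- two unrelated characters, one per byte.
decodedWrong :: Text
decodedWrong = TE.decodeLatin1 rawPi

main :: IO ()
main = do
  print (B.unpack rawPi)                -- [207,128]
  print (T.length decodedRight)         -- 1
  print (T.length decodedWrong)         -- 2
  print (decodedRight == decodedWrong)  -- False
```

With Text as the input type this class of mistake cannot occur at the call site, since the caller never chooses a decoder.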

The http-types aspect is interesting, and my follow-up is that http-types is also doing the wrong thing: it currently worries about encoding and assumes it to be UTF-8, when it should really only be dealing with codepoints. Occasionally it makes sense to do both, such as in XML, where you need to guess the encoding based on the encoding parameter and change the parser accordingly, but usually it does not.
