Comments (8)

sldblog commented on August 23, 2024

I hesitated over which project to open this issue against, and ended up opening it where I believe the cause is. Please feel free to close it if it should be opened against HalBuilder/halbuilder-json instead.

from halbuilder-core.

sztupy commented on August 23, 2024

Just to clarify some points: according to the JSON spec the result has to be in Unicode (not necessarily UTF-8), but the above code makes it possible to render results in a non-conformant encoding, like ISO 8859-1.

talios commented on August 23, 2024

Any recommendations for a solution? Normally, I'd expect myself to be calling .getBytes("UTF-8") on the resulting String, or calling toString(contentType, writer).

As far as the JVM is concerned, all strings are Unicode, so in the above example it's only in the call to System.out.println() that ISO 8859-1 creeps in (unless I'm overlooking something).
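
To illustrate that point, here is a minimal sketch showing that the String itself carries no encoding and that bytes only appear at the output boundary. It sends the same string through two PrintStream instances with explicit charsets, standing in for System.out under different file.encoding values. The names PrintEncodingDemo, printWith and hex are made up for this illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class PrintEncodingDemo {
    // Prints `text` through a PrintStream that uses the named charset and
    // returns the raw bytes that reached the underlying stream.
    public static byte[] printWith(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    // Formats bytes as space-separated lowercase hex pairs.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Mot\u00f6rhead"; // same in-memory String in both cases
        System.out.println(hex(printWith(text, "UTF-8")));      // 4d 6f 74 c3 b6 72 68 65 61 64
        System.out.println(hex(printWith(text, "ISO-8859-1"))); // 4d 6f 74 f6 72 68 65 61 64
    }
}
```

The only thing that differs between the two calls is the charset attached to the stream, which is the role file.encoding plays for System.out.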

Maybe we add a toBytes(encoding) method to the Representation interface?
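
A rough sketch of what such a method could look like, assuming a simplified stand-in interface rather than the real Representation interface. TextRepresentation and the exact toBytes signature are hypothetical; only the idea of letting the caller name the encoding comes from the suggestion above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ToBytesSketch {
    /** Hypothetical stand-in for a representation; NOT the HalBuilder interface. */
    public interface TextRepresentation {
        String toString(String contentType);

        // Proposed addition (assumed shape): the caller states the encoding,
        // so the platform default charset never gets involved.
        default byte[] toBytes(String contentType, Charset encoding) {
            return toString(contentType).getBytes(encoding);
        }
    }

    public static void main(String[] args) {
        TextRepresentation rep = contentType -> "{\"value\":\"Mot\u00f6rhead\"}";
        byte[] utf8 = rep.toBytes("application/hal+json", StandardCharsets.UTF_8);
        byte[] latin1 = rep.toBytes("application/hal+json", StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length);   // 22: the o-umlaut takes two bytes
        System.out.println(latin1.length); // 21: the o-umlaut takes one byte
    }
}
```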

sztupy commented on August 23, 2024

I think what you are missing is that halbuilder is usually used with JAX-RS's Response class. Here is another minimal working example with JAX-RS that shows the difference between, for example, fasterxml and HalBuilder:

public class TestResource {
    private final RepresentationFactory representationFactory;

    public TestResource(RepresentationFactory representationFactory) {
        this.representationFactory = representationFactory;
    }

    @GET
    @Path("/jsondata")
    @Produces("application/json")
    public Response jsonData() {
        return Response.ok(new Test()).build();
    }

    @GET
    @Path("/haldata")
    @Produces("application/hal+json")
    public Response halData() {
        Representation representation = representationFactory.newRepresentation();
        representation.withBean(new Test());
        return Response.ok(representation).build();
    }

    public class Test {
        private String value;

        public Test() {
            this.value = "Motörhead";
        }

        public String getValue() {
            return value;
        }
    }
}

If you compile this and call the /jsondata endpoint, it will return UTF-8, even if the file.encoding is set to iso-8859-1:

$ curl "http://localhost:8080/jsondata" | hexdump -C
00000000  7b 22 76 61 6c 75 65 22  3a 22 4d 6f 74 c3 b6 72  |{"value":"Mot..r|
00000010  68 65 61 64 22 7d                                 |head"}|

However, the /haldata endpoint returns the response in the encoding given by file.encoding, which, when not set to UTF-8, generates invalid JSON responses:

$ curl "http://localhost:8080/haldata" | hexdump -C
00000000  7b 0a 20 20 22 76 61 6c  75 65 22 20 3a 20 22 4d  |{.  "value" : "M|
00000010  6f 74 f6 72 68 65 61 64  22 0a 7d                 |ot.rhead".}|

I don't know what magic the default JSON renderer does to ensure the result is always fine, but this magic seems to be missing from HalBuilder.
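
One way to check the two responses above from the client side is to run their bytes through a strict UTF-8 decoder. The sketch below rebuilds a compact form of the payload in both encodings rather than calling the endpoints. Utf8Check and isValidUtf8 are illustrative names, not part of any library discussed here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true only if `bytes` is a well-formed UTF-8 sequence.
    public static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String json = "{\"value\":\"Mot\u00f6rhead\"}";
        // Bytes like the /jsondata response: the o-umlaut is c3 b6.
        System.out.println(isValidUtf8(json.getBytes(StandardCharsets.UTF_8)));      // true
        // Bytes like the /haldata response: the o-umlaut is a lone f6.
        System.out.println(isValidUtf8(json.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```

A client that assumes UTF-8 would therefore reject or garble the second payload.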

sldblog commented on August 23, 2024

Changing the sample code to

import java.io.*;
public class Test {
  public static void main(String... args) throws Exception {
    StringWriter w = new StringWriter();
    w.append("\u65E5\u672C\u8A9E");
    // Encode as UTF-8, then decode those bytes with the platform default charset.
    System.out.println(new String(w.toString().getBytes("UTF-8")));
  }
}

does the trick:

➜  LC_ALL=en_US java Test
日本語
➜  LC_ALL=en_US java -Dfile.encoding=utf-8 Test
日本語
➜  LC_ALL=en_US.utf-8 java Test
日本語
➜  LC_ALL=en_US.utf-8 java -Dfile.encoding=iso-8859-1 Test
日本語
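
One detail worth keeping in mind with this sample is that new String(byte[]) decodes with the platform default charset, so the intermediate String depends on file.encoding even when the terminal output looks right. The sketch below makes both charsets explicit to show the effect. RoundTripDemo and roundTrip are names invented for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Encodes `text` with one charset and decodes the bytes with another.
    public static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u65E5\u672C\u8A9E";

        // Matching charsets: the original String comes back unchanged.
        String same = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(same.equals(text)); // true

        // Mismatched charsets: each UTF-8 byte becomes its own Latin-1 char.
        String mixed = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(mixed.equals(text)); // false
        System.out.println(mixed.length());     // 9
    }
}
```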

sldblog commented on August 23, 2024

The problem is whether all representations return Unicode. I imagine representation writers should know about the encoding of the output they will create?

talios commented on August 23, 2024

What I'm thinking of here, rather than the new String() approach, is something along the lines of:

import java.io.*;
public class Test {
  public static void main(String... args) throws Exception {
    ByteArrayOutputStream boas = new ByteArrayOutputStream();
    // The writer encodes characters to UTF-8 bytes as they are appended.
    OutputStreamWriter osw = new OutputStreamWriter(boas, "UTF-8");
    osw.append("\u65E5\u672C\u8A9E");
    osw.flush();
    // boas.toString() decodes the bytes with the platform default charset.
    System.out.println(boas.toString());
  }
}

That is, using a ByteArrayOutputStream rather than a StringWriter directly. This seems to work on my OS X machine:

➜ ~ java -Dfile.encoding=ISO-8859-1 Test
日本語

@sztupy makes the assumption that everyone is using JAX-RS (we're actually using a, sadly older, version of Restlet at work). However, even there, the API usage is very similar, rather than using the variant that takes a Writer.

@sldblog does raise an interesting point about whether each representation should know about the encoding it desires. However, if we continue to pass in the Writer to them, the ultimate encoding used is beyond our control (see the example, where the encoding is set on the writer).
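
The point about the Writer can be sketched as follows, assuming a hypothetical renderer that only ever sees a Writer. WriterRenderer, renderTo and render are illustrative names, not HalBuilder API. The same renderer produces different bytes depending solely on how the caller constructs the writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterEncodingSketch {
    /** Hypothetical renderer: it writes characters and never picks an encoding. */
    public interface WriterRenderer {
        void renderTo(Writer writer) throws IOException;
    }

    // The caller decides the encoding by choosing the writer's charset.
    public static byte[] render(WriterRenderer renderer, Charset encoding) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, encoding)) {
            renderer.renderTo(writer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and flushed) before the bytes are read.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        WriterRenderer renderer = w -> w.write("\u65E5\u672C\u8A9E");
        System.out.println(render(renderer, StandardCharsets.UTF_8).length);    // 9
        System.out.println(render(renderer, StandardCharsets.UTF_16BE).length); // 6
    }
}
```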

Maybe, for the direct toString(contentType), we just default to UTF-8 but introduce a new variant, toString(contentType, encoding)? That does start to get messy with the various other overloaded variants like toString(contentType, flags); named/optional parameters would be lovely here.

Thoughts?

sldblog commented on August 23, 2024

OutputStreamWriter is definitely much better than the String#getBytes stuff, if the result must remain String.
