
jackson-dataformats-binary's Introduction

Overview

This is a multi-module umbrella project for Jackson standard binary dataformat backends.

Dataformat backends are used to support format alternatives to JSON, using the general-purpose Jackson API. All included formats allow access using all 3 API styles (streaming, databinding, tree model).

For Jackson 2.x this is done by sub-classing Jackson core abstractions:

  • All backends sub-class JsonFactory, which is the factory for:
    • JsonParser for reading data (decoding data encoded in the supported format)
    • JsonGenerator for writing data (encoding data using the supported format)
  • Starting with 2.10 there is also a sub-class of ObjectMapper (like CBORMapper, SmileMapper) for each format, mostly for convenience
  • Jackson 2.10 also added "Builder"-style construction for the above-mentioned factories and mappers; see the sketch below.
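For example, constructing and using a format-specific mapper with the 2.10+ builder style looks roughly like this (a minimal sketch using the CBOR backend; other formats work the same way):

import java.util.Collections;
import java.util.Map;
import com.fasterxml.jackson.dataformat.cbor.databind.CBORMapper;

public class CborRoundTrip {
    public static void main(String[] args) throws Exception {
        // Builder-style construction (2.10+); CBORMapper is a convenience
        // sub-class of ObjectMapper pre-configured with a CBORFactory
        CBORMapper mapper = CBORMapper.builder().build();

        byte[] encoded = mapper.writeValueAsBytes(Collections.singletonMap("key", "value"));
        Map<?, ?> decoded = mapper.readValue(encoded, Map.class);
        System.out.println(decoded); // {key=value}
    }
}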

Status

Build status (GitHub Actions) and fuzzing status badges.

Binary formats supported

Currently included backends are:

  • Avro (jackson-dataformat-avro)
  • CBOR (jackson-dataformat-cbor)
  • Ion (jackson-dataformat-ion)
  • Protobuf (jackson-dataformat-protobuf)
  • Smile (jackson-dataformat-smile)

License

All modules are licensed under Apache License 2.0.

Maven dependencies

To use these format backends in Maven-based projects, use the following dependency:

<dependency>
  <groupId>com.fasterxml.jackson.dataformat</groupId>
  <artifactId>jackson-dataformat-[FORMAT]</artifactId>
  <version>2.13.0</version>
</dependency>

where [FORMAT] is one of the supported modules (avro, cbor, ion, protobuf, smile)

Development

Maintainers

  • Author: Tatu Saloranta (@cowtowncoder)
  • Active Maintainers:
    • Michael Liedtke (@mcliedtke) (Ion backend)

You may at-reference them as necessary, but please keep in mind that all maintenance work is strictly voluntary (no one gets paid to work on this or any other Jackson component), so there is no guarantee of timely responses.

Branches

The master branch is for developing the next major Jackson version -- 3.0 -- but there are active maintenance branches in which much of the development happens:

  • 2.14 is for developing the next 2.x version
  • 2.13 and 2.12 are for backported fixes for 2.13/2.12 versions (respectively)

Older branches are usually not changed but are available for historical reasons. All released versions have matching git tags (e.g. jackson-dataformats-binary-2.10.3).

Note that since individual format modules used to live in their own repositories, older branches (before 2.8) and tags do not exist in this repository.

Other Jackson binary backends

In addition to the binary format backends hosted by FasterXML in this repo, there are other known Jackson backends for binary data formats. For example:

  • MessagePack (jackson-dataformat-msgpack, from the msgpack-java project)
  • BSON (bson4jackson)

More

See the Wiki for more information (including javadocs).

jackson-dataformats-binary's People

Contributors

apupier, arthurscchan, atokuzamzn, baharclerode, brharrington, cowtowncoder, dependabot[bot], gedmarc, here-abarany, holub, jaceklach, jessbrya-amzn, jhhladky, jobarr-amzn, knoguchi, marcospassos, marsqing, mcliedtke, michalfoksa, nicodvt, pjfanning, tatu-at-datastax, tgregg, thetric, thomasdelange5, tlopespt, willsoto, yeikel, yvrng, zeyucai


jackson-dataformats-binary's Issues

Serialization of multiple objects (`SequenceWriter`)

A common use case with Avro is to serialize multiple objects to a single file, and to deserialize multiple objects from a single file. This is done using DataFileWriter and DataFileReader. The file contains a single schema for all objects.

Would it be possible to do this with jackson-dataformat-avro?

Regards,
Tom
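For reference, Jackson's closest equivalent is the general SequenceWriter mechanism, which writes root-value sequences. A minimal sketch (assuming a simple Point POJO; note this produces a raw sequence of Avro-encoded values, not an Avro container file with embedded schema the way DataFileWriter does):

import java.io.ByteArrayOutputStream;
import com.fasterxml.jackson.databind.SequenceWriter;
import com.fasterxml.jackson.dataformat.avro.AvroMapper;
import com.fasterxml.jackson.dataformat.avro.AvroSchema;

public class AvroSequenceWrite {
    public static class Point {
        public int x, y;
        public Point() { }
        public Point(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) throws Exception {
        AvroMapper mapper = new AvroMapper();
        AvroSchema schema = mapper.schemaFor(Point.class);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // SequenceWriter appends one root value after another
        try (SequenceWriter writer = mapper.writer(schema).writeValues(out)) {
            writer.write(new Point(1, 2));
            writer.write(new Point(3, 4));
        }
        System.out.println(out.size() + " bytes written");
    }
}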

[protobuf] Some fields are left null

Depending on field names and ordering, some fields are at times left out, probably when reading the proto message (this can also be reproduced by writing the same content with a C++-compiled proto interface and reading it in with Jackson).

Attached is a JUnit test case with two ways to work around the problem (a @JsonPropertyOrder annotation, or leaving out a nested object value).

Tested with version 2.8.7 (the same problems occur in previous versions too, but I haven't run this exact test with any other version).
TestProto.java.txt
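For illustration, the @JsonPropertyOrder workaround looks roughly like this (the POJO and its field names here are hypothetical placeholders; the attached test case has the real reproduction):

import com.fasterxml.jackson.annotation.JsonPropertyOrder;

// Forcing an explicit, fixed property order avoided the dropped fields in
// testing; the type and field names below are invented for illustration
@JsonPropertyOrder({ "first", "second", "nested" })
public class Message {
    public int first;
    public String second;
    public Nested nested;

    public static class Nested {
        public long value;
    }
}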

(cbor) Buffer size dependency in `UTF8JsonGenerator writeRaw(...)`

(from FasterXML/jackson-core#354 -- note that this is not really a CBOR issue, but it is included here because reproduction from jackson-core itself is difficult; so, in a way, we use the CBOR module for regression testing)

UTF8JsonGenerator improperly depends on the relative sizes of _outputBuffer and _charBuffer; if the former is more than three times the latter, writeRaw will attempt to write too many characters to the character buffer and produce an ArrayIndexOutOfBoundsException. Normally this does not occur, as the default sizes for the byte buffer and character buffer are 8000 and 4000 respectively. But the CBOR module asks for a 16k buffer, which can then be recycled to the UTF8JsonGenerator, triggering the error. The test below reproduces it.

package org.fizmo;

import com.fasterxml.jackson.annotation.JsonRawValue;
import com.fasterxml.jackson.annotation.JsonValue;
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.cbor.CBORFactory;
import com.fasterxml.jackson.dataformat.cbor.CBORGenerator;
import org.junit.Assert;
import org.junit.Test;

public class JacksonCborTest {

    public static class RawBean {
        public String value;

        public RawBean(String value) {
            this.value = value;
        }

        @JsonValue
        @JsonRawValue
        public String getValue() {
            return value;
        }
    }

    @Test
    public void testCase() {
        String data = "{\"x\":\"" + generate(5000) + "\"}";
        RawBean bean = new RawBean(data);
        ObjectMapper mapper = new ObjectMapper();
        mapper.findAndRegisterModules();
        JsonFactory factory = new JsonFactory(mapper);
        CBORFactory cborFactory = new CBORFactory(mapper);

        try {
            try (CBORGenerator generator = cborFactory.createGenerator(System.out)) {
                //generator.writeObject(1);
            }
            JsonGenerator generator = factory.createGenerator(System.out);
            generator.writeObject(bean);
            generator.close(); // flush output; the error surfaces during writeRaw
        } catch (Exception e) {
            Assert.fail(e.getMessage());
        }

    }

    private static final String ALPHA_NUMERIC_STRING = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    private String generate(int count) {
        StringBuilder builder = new StringBuilder();
        while (count-- != 0) {
            int character = (int)(Math.random()*ALPHA_NUMERIC_STRING.length());
            builder.append(ALPHA_NUMERIC_STRING.charAt(character));
        }
        return builder.toString();
    }
}

"Can not create generator" in _nonByteSource()

I'm studying the binary backends in preparation for possibly writing one of my own. Each of them has a _nonByteSource() method in the factory class that throws an UnsupportedOperationException with the message, "Can not create generator for non-byte-based source". Do I understand correctly that it should say "parser", rather than "generator"?

[avro] Add support for reading schema from Avro-encoded file

(moved from FasterXML/jackson-dataformat-avro#10)

Avro streams may include an embedded schema, and since it should be relatively safe to either auto-detect it, or simply make this the default behavior when no schema is specified, we should support this mode.

As to sample data, maybe this project:

https://github.com/miguno/avro-cli-examples

has data we could use for confirming proper usage.

A follow-up feature should probably be that of producing & embedding a schema when writing; but that'd be a separate RFE.

Java float deserialized as `DoubleNode` instance

When parsing JSON documents with a schema described by suitably annotated Java POJO classes, Jackson represents float values with DoubleNode instances, not FloatNode instances as might be expected. This causes problems with downstream libraries that attempt to interpret the node tree: they may call isFloat() on the respective nodes, expecting it to return true for nodes derived from float properties. In these circumstances DoubleNode.isFloat() is invoked instead, returning false and causing said libraries to raise an error.

I believe I have traced this behaviour back to: com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer._fromFloat(...)

The last call to the nodeFactory returns a DoubleNode not a FloatNode. I wonder if this should instead be implemented in a similar manner to _fromInt(...) like so:

    protected final JsonNode _fromFloat(JsonParser p, DeserializationContext ctxt,
            final JsonNodeFactory nodeFactory) throws IOException
    {
        JsonParser.NumberType nt = p.getNumberType();
        if (nt == JsonParser.NumberType.BIG_DECIMAL
            || ctxt.isEnabled(DeserializationFeature.USE_BIG_DECIMAL_FOR_FLOATS)) {
            return nodeFactory.numberNode(p.getDecimalValue());
        }
        if (nt == JsonParser.NumberType.DOUBLE) {              // PROPOSED CHANGE
            return nodeFactory.numberNode(p.getDoubleValue()); // PROPOSED CHANGE
        }                                                      // PROPOSED CHANGE
        return nodeFactory.numberNode(p.getFloatValue());      // PROPOSED CHANGE
    }

I am currently working around this by patching the downstream library to do something like this:

if ((node.isDouble() && node.doubleValue() >= -Float.MAX_VALUE
    && node.doubleValue() <= Float.MAX_VALUE) || node.isFloat()) {
    return datum.floatValue();
}

Reference: problem discussion on jackson-user

SMILE format specification - bug in "safe binary" encoding

SMILE specification contains the following statements:

  • "Big" decimal/integer values use "safe" binary encoding
  • "Safe" binary encoding simply uses 7 LSB: data is left aligned (i.e. any padding of the last byte is in its rightmost, least-significant, bits).

Let's consider a simple array: Array(0x01). After encoding it to 7LSB it will use 2 bytes. According to the specification "any padding of the last byte is in its rightmost, least-significant, bits", so it should look like this:

_0000000 _1pppppp   (hex: 0x00 0x40)
where:
_ - unused bit (always 0)
p - padding (0)
However, the jackson library behaves differently. The following code (Scala) encodes a BigInteger (1).

import java.io.ByteArrayOutputStream
import java.math.BigInteger
import javax.xml.bind.DatatypeConverter

import com.fasterxml.jackson.dataformat.smile.SmileFactory

object SmileTestDataGenerator {
  def main(args: Array[String]): Unit = {
    val sf = new SmileFactory()
    val os = new ByteArrayOutputStream()
    val gen = sf.createGenerator(os)
    gen.writeNumber(BigInteger.valueOf(1))
    gen.close()
    println(DatatypeConverter.printHexBinary(os.toByteArray))
  }
}

Output:

3A290A0126810001

After removing header/type token/content length we are left with 0001!
I've discovered that the padding is indeed located in the last byte, but in its LEFT-MOST, most-significant bits. So Array(0x01) is encoded to:
_0000000 _pppppp1   (hex: 0x00 0x01)

Am I right? Is the specification wrong, or is it a bug in the implementation?
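To make the disagreement concrete, here is a small standalone sketch computing the last byte both ways for the single-byte case above:

public class SevenBitPaddingDemo {
    public static void main(String[] args) {
        int input = 0x01; // single content byte: bits 00000001

        // Spec reading: data bits left-aligned across 7-bit groups, padding
        // in the least-significant bits of the LAST byte
        int first = (input >> 1) & 0x7F;      // top 7 bits           -> 0x00
        int lastSpec = (input & 0x01) << 6;   // data bit, 6 pad bits -> 0x40
        System.out.printf("spec:     %02X %02X%n", first, lastSpec);

        // Observed output: trailing data bits right-aligned, i.e. padding in
        // the MOST-significant bits of the last byte
        int lastImpl = input & 0x01;          //                      -> 0x01
        System.out.printf("observed: %02X %02X%n", first, lastImpl);
    }
}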

Replace use of `BinaryDecoder` with direct access

Due to other rewrites, use of the Avro-lib BinaryDecoder has shrunk to a small subset. Removing the rest would allow some efficiency gains: not because the decoder is slow, but because using it requires additional buffers that could be eliminated with direct access.

Reading Avro with specified reader and writer schemas

Hi

In the original avro-tools, when reading data I can specify both reader and writer schemas. This is related to backward/forward compatibility:

SpecificDatumReader r = new SpecificDatumReader(writer, reader);
BinaryDecoder decoder = DecoderFactory.get().directBinaryDecoder(inputStream, null);
return r.read(null, decoder);

Is it possible to specify writer and reader schema in AvroMapper?
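Later 2.x versions of the Avro module added AvroSchema.withReaderSchema(...) for exactly this. A minimal sketch of the intended usage (assuming a module version that has the method; schema strings inlined for self-containment):

import org.apache.avro.Schema;
import com.fasterxml.jackson.dataformat.avro.AvroMapper;
import com.fasterxml.jackson.dataformat.avro.AvroSchema;

public class SchemaEvolutionSketch {
    public static class PointV1 { public int x = 1; }
    public static class PointV2 { public int x; public int y; }

    static final String WRITER_SCHEMA = "{\"type\":\"record\",\"name\":\"Point\",\"fields\":["
            + "{\"name\":\"x\",\"type\":\"int\"}]}";
    static final String READER_SCHEMA = "{\"type\":\"record\",\"name\":\"Point\",\"fields\":["
            + "{\"name\":\"x\",\"type\":\"int\"},"
            + "{\"name\":\"y\",\"type\":\"int\",\"default\":0}]}";

    public static void main(String[] args) throws Exception {
        AvroMapper mapper = new AvroMapper();
        AvroSchema writerSchema = new AvroSchema(new Schema.Parser().parse(WRITER_SCHEMA));
        AvroSchema readerSchema = new AvroSchema(new Schema.Parser().parse(READER_SCHEMA));

        byte[] encoded = mapper.writer(writerSchema).writeValueAsBytes(new PointV1());

        // resolve the writer schema against the (newer) reader schema
        AvroSchema resolved = writerSchema.withReaderSchema(readerSchema);
        PointV2 decoded = mapper.readerFor(PointV2.class).with(resolved).readValue(encoded);
        System.out.println(decoded.x + " " + decoded.y); // y comes from the default
    }
}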

[protobuf] Parser can't resolve inner types

When messages are defined as below, the parser fails to resolve the t1.i1 type.

package mypackage;

message t1 {
        message i1 {
                optional uint32 x = 1;
                optional uint32 y = 2;
        }
}

message t2 {
        optional t1.i1 z = 1;
}

Error

Exception in thread "main" java.lang.IllegalArgumentException: Unknown protobuf field type 't1.i1' for field 'z' of MessageType 't2' (known enum types: ; known message types: t1, t2)
	at com.fasterxml.jackson.dataformat.protobuf.schema.TypeResolver._resolve(TypeResolver.java:141)
	at com.fasterxml.jackson.dataformat.protobuf.schema.TypeResolver.resolve(TypeResolver.java:93)
	at com.fasterxml.jackson.dataformat.protobuf.schema.NativeProtobufSchema.forType(NativeProtobufSchema.java:67)
	at com.fasterxml.jackson.dataformat.protobuf.schema.ProtobufSchemaLoader.load(ProtobufSchemaLoader.java:51)

Serialization of multiple nesting levels has issues

With the latest snapshot version 2.9.0.pr3-SNAPSHOT, a field with a bigger tag is serialized between fields with smaller tags. Below is the test case.
The serialization result is (in hex):
08 01 12 0a 12 02 08 03 08 02 12 02 08 04
while the correct result is (in hex):
08 01 12 0a 08 02 12 02 08 03 12 02 08 04

'08 02' is inserted between the list elements '12 02 08 03' and '12 02 08 04'.

import static org.junit.Assert.assertEquals;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.protobuf.ProtobufMapper;
import com.fasterxml.jackson.dataformat.protobuf.schema.ProtobufSchema;
import com.fasterxml.jackson.dataformat.protobuf.schemagen.ProtobufSchemaGenerator;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;
import org.junit.Test;

/**
 * Created by marsqing on 23/03/2017.
 */
public class BugReport {

  @Test
  public void test() throws Exception {
    ProtobufMapper mapper = new ProtobufMapper();
    ProtobufSchema schema = getSchema(mapper, Level1.class);

    System.out.println(schema.getSource());

    Level1 level1 = new Level1();
    Level2 level2 = new Level2();
    Level3 level3a = new Level3();
    Level3 level3b = new Level3();

    level1.setValue(1);
    level2.setValue(2);
    level3a.setValue(3);
    level3b.setValue(4);
    List<Level3> level3s = Arrays.asList(level3a, level3b);

    level1.setLevel2(level2);
    level2.setLevel3s(level3s);

    ByteArrayOutputStream bout = new ByteArrayOutputStream();
    mapper.writer(schema).writeValue(bout, level1);

    showBytes(bout.toByteArray());

    Level1 gotLevel1 = mapper.readerFor(Level1.class).with(schema).readValue(new ByteArrayInputStream(bout.toByteArray()));

//    byte[] correct = new byte[]{0x08, 0x01, 0x12, 0x0a, 0x08, 0x02, 0x12, 0x02, 0x08, 0x03, 0x12, 0x02, 0x08, 0x04};
//    Level1 gotLevel1 = mapper.readerFor(Level1.class).with(schema).readValue(new ByteArrayInputStream(correct));

    assertEquals(level1.getValue(), gotLevel1.getValue());
    assertEquals(level2.getValue(), gotLevel1.getLevel2().getValue());
    assertEquals(level3s.size(), gotLevel1.getLevel2().getLevel3s().size());
    assertEquals(level3a.getValue(), gotLevel1.getLevel2().getLevel3s().get(0).getValue());
    assertEquals(level3b.getValue(), gotLevel1.getLevel2().getLevel3s().get(1).getValue());
  }

  private ProtobufSchema getSchema(ObjectMapper mapper, Class<?> clazz) throws Exception {
    ProtobufSchemaGenerator gen = new ProtobufSchemaGenerator();
    mapper.acceptJsonFormatVisitor(clazz, gen);
    return gen.getGeneratedSchema();
  }

  private void showBytes(byte[] bytes) {
    for (byte b : bytes) {
      System.out.print(String.format("%8s", Integer.toHexString(b)).substring(6, 8).replaceAll(" ", "0") + " ");
    }
    System.out.println();
  }

  public static class Level1 {

    private int value;

    private Level2 level2;

    public int getValue() {
      return value;
    }

    public void setValue(int value) {
      this.value = value;
    }

    public Level2 getLevel2() {
      return level2;
    }

    public void setLevel2(Level2 level2) {
      this.level2 = level2;
    }
  }

  public static class Level2 {

    private int value;
    private List<Level3> level3s;

    public int getValue() {
      return value;
    }

    public void setValue(int value) {
      this.value = value;
    }

    public List<Level3> getLevel3s() {
      return level3s;
    }

    public void setLevel3s(List<Level3> level3s) {
      this.level3s = level3s;
    }
  }

  public static class Level3 {

    private int value;

    public int getValue() {
      return value;
    }

    public void setValue(int value) {
      this.value = value;
    }

  }

}

CBORParser seems to promote float to double

I want to use Jackson CBOR because the CBOR RFC defines self-describing types and distinguishes between different sizes of floating point and integral numbers. I want to use this feature of the format specification to infer a schema from collections of CBOR messages.

By default, the CBORGenerator will write longs < Integer.MAX_VALUE as ints, but this can be disabled.

CBORGenerator writes floats and doubles properly. CBORParser correctly identifies floats as floats (nextToken case 26) but constructs a double nevertheless. This seems to be because JsonToken in Jackson core does not differentiate between floats and doubles.

Here's a simple test to demonstrate the issue:

import java.io.IOException;
import java.util.Map;

import org.junit.Assert;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.cbor.CBORFactory;
import com.fasterxml.jackson.dataformat.cbor.CBORGenerator;
import com.google.common.collect.ImmutableMap;

// imports and class wrapper added for completeness
public class CborTypeInvarianceTest {

  public void testPrimitiveTypeInvariance() throws IOException {
    ObjectMapper mapper = new ObjectMapper(
            new CBORFactory().disable(CBORGenerator.Feature.WRITE_MINIMAL_INTS)
    );
    Map<String, Object> map = ImmutableMap.of("longField", 1L, "intField", 1, "doubleField", 1.0, "floatField", 1.0f);
    byte[] cbor = mapper.writeValueAsBytes(map);
    Map<String, Object> fromCbor = mapper.readerFor(Map.class).readValue(cbor);
    test(fromCbor);
  }

  private void test(Map<String, Object> map) {
    Assert.assertTrue("long not preserved: " + className(map.get("longField")), map.get("longField") instanceof Long);
    Assert.assertTrue("int not preserved: " + className(map.get("intField")), map.get("intField") instanceof Integer);
    Assert.assertTrue("double not preserved: " + className(map.get("doubleField")), map.get("doubleField") instanceof Double);
    Assert.assertTrue("float not preserved: " + className(map.get("floatField")), map.get("floatField") instanceof Float);
  }

  private String className(Object o) {
    if (null == o) return "null";
    return o.getClass().getCanonicalName();
  }
}

The version of jackson-dataformat-cbor is 2.8.3.

Support @JsonSubTypes in schema generation and serialization

(moved from FasterXML/jackson-dataformat-avro#28 authored by @osi)

I'd like to get the JsonSubTypes annotation working for schema generation and serialization.

For schema generation, I think it should be a union of all the possible sub-types.

For serialization, if the configuration is such that the type name would be included as a property, ignore that, since it will be included as the name of the record type.

I have written some failing tests, but it is unclear to me how to proceed, since the TypeIdResolver doesn't provide a way to interrogate all possible types.

package com.fasterxml.jackson.dataformat.avro;

import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import org.apache.avro.Schema;
import org.junit.Test;

import java.util.Arrays;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertNotNull;

public class JsonTypeInfoTest {

    @Test
    public void testGenerateSchemaForSubTypes() throws Exception {
        AvroMapper mapper = new AvroMapper();
        AvroSchema schema = mapper.schemaFor(Thing.class);

        Schema stringOrNull = Schema.createUnion(Arrays.asList(Schema.create(Schema.Type.NULL), Schema.create(Schema.Type.STRING)));
        Schema thingOne = Schema.createRecord(
                ThingOne.class.getSimpleName(),
                "Schema for " + ThingOne.class.getName(),
                ThingOne.class.getPackage().getName(),
                false);
        thingOne.setFields(Arrays.asList(
                new Schema.Field("favoriteColor", stringOrNull, null, null),
                new Schema.Field("name", stringOrNull, null, null)
        ));

        Schema thingTwo = Schema.createRecord(
                ThingTwo.class.getSimpleName(),
                "Schema for " + ThingTwo.class.getName(),
                ThingTwo.class.getPackage().getName(),
                false);
        thingTwo.setFields(Arrays.asList(
                new Schema.Field("favoriteFood", stringOrNull, null, null),
                new Schema.Field("name", stringOrNull, null, null)
        ));

        Schema expected = Schema.createUnion(Arrays.asList(thingOne, thingTwo));
        assertEquals(expected, schema.getAvroSchema());
    }

    @Test
    public void testSerializeSubType() throws Exception {
        AvroMapper mapper = new AvroMapper();
        AvroSchema schema = mapper.schemaFor(Thing.class);

        assertNotNull(mapper.writer(schema).writeValueAsBytes(new ThingOne("hello", "blue")));
    }

    @JsonTypeInfo(use = JsonTypeInfo.Id.NAME,
            include = JsonTypeInfo.As.PROPERTY,
            property = "@type")
    @JsonSubTypes({
            @JsonSubTypes.Type(value = ThingOne.class, name = "one"),
            @JsonSubTypes.Type(value = ThingTwo.class, name = "two")
    })
    public interface Thing {
        @JsonProperty
        String name();
    }

    public static class ThingOne implements Thing {
        public final String favoriteColor;
        private final String name;

        public ThingOne(String name, String favoriteColor) {
            this.name = name;
            this.favoriteColor = favoriteColor;
        }

        @Override
        public String name() {
            return name;
        }
    }

    public static class ThingTwo implements Thing {
        public final String favoriteFood;
        private final String name;

        public ThingTwo(String name, String favoriteFood) {
            this.name = name;
            this.favoriteFood = favoriteFood;
        }

        @Override
        public String name() {
            return name;
        }
    }
}

java.lang.ArrayIndexOutOfBoundsException at CBORGenerator.java:548

I am encountering index out of bound exceptions at CBORGenerator.java:548 when performing serialization with a new ObjectMapper(new CBORFactory()) instance. I haven't done a full investigation of the cause since calls to CBORGenerator::writeStartObject are very stateful. Here is the offending function.

@Override
// since 2.8
public final void writeStartObject(Object forValue) throws IOException {
    _verifyValueWrite("start an object");
    JsonWriteContext ctxt = _writeContext.createChildObjectContext();
    _writeContext = ctxt;
    if (forValue != null) {
        ctxt.setCurrentValue(forValue);
    }
    if (_elementCountsPtr > 0) {
        _elementCounts[_elementCountsPtr++] = _currentRemainingElements;
    }
    _currentRemainingElements = INDEFINITE_LENGTH;
    _writeByte(BYTE_OBJECT_INDEFINITE);
}

An example error-producing call has _elementCountsPtr = 11 and _elementCounts.length = 10. My guess at a solution, and my current workaround, is to add a safety check like the one in CBORGenerator::writeStartArray. Here is that function.

@Override
public void writeStartArray(int elementsToWrite) throws IOException {
    _verifyValueWrite("start an array");
    _writeContext = _writeContext.createChildArrayContext();
    if (_elementCounts.length == _elementCountsPtr) { // initially, as well as if full
        _elementCounts = Arrays.copyOf(_elementCounts, _elementCounts.length+10);
    }
    _elementCounts[_elementCountsPtr++] = _currentRemainingElements;
    _currentRemainingElements = elementsToWrite;
    _writeLengthMarker(PREFIX_TYPE_ARRAY, elementsToWrite);
}

As you can see, the function resizes the _elementCounts array when required. I don't know what kind of coding styles you want to employ, but I would change the offending section in every writeStartArray and writeStartObject overload to read:

    if (_elementCounts.length <= _elementCountsPtr) { // less than or equal to catch more bad cases
        //use _elementCountsPtr for new size to guarantee the array is big enough for this call
        _elementCounts = Arrays.copyOf(_elementCounts, _elementCountsPtr+10); 
    }
    if (_elementCountsPtr > 0) { //guard for negative indexes
        _elementCounts[_elementCountsPtr++] = _currentRemainingElements;
    }
    

Sorry I couldn't be more helpful with regard to why it is happening, but I'd be happy to make a pull request with the described solution. Let me know what you think.

SmileParser doesn't honor USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING

I'm seeing some odd issues with the wrong strings getting used for some field names. The probability is small, but on a large cluster with many messages it happens frequently enough. I suspect it is some issue with the ThreadLocal buffers, because the field names I'm seeing seem to come from different messages that are only really connected by the use of Jackson, and the workers happen to get processed using the same thread pool.

I wanted to disable the use of thread locals to test that theory, but it doesn't look like that is possible for the Smile parser. Is it expected that all factories that support thread-local recycling honor the current USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING flag? (A sketch of how disabling it is expected to work follows.)
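For reference, this is the factory-level feature in question; disabling it is expected to look like the sketch below (the report above is that SmileParser does not actually honor the setting):

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

public class DisableBufferRecycling {
    public static void main(String[] args) {
        SmileFactory factory = new SmileFactory();
        // JsonFactory.Feature.USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING controls
        // reuse of per-thread BufferRecycler instances; disabling it should
        // force fresh buffers for each parser/generator
        factory.disable(JsonFactory.Feature.USE_THREAD_LOCAL_FOR_BUFFER_RECYCLING);
    }
}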

Add support for serializing Avro's `GenericRecord` etc. types

While maybe not the biggest problem in the world, there may be use cases where it'd be nice to let users essentially convert data bound in Avro-specific containers (like GenericContainer) into more sensible types (basic Maps and Collections, say, or just POJOs).
One way to achieve this would be to add custom serializers for said types; a rough sketch follows below.

Another related feature would be the reverse (constructing these types from streaming input, i.e. adding JsonDeserializers); that can be a follow-up step if need be.
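As a rough illustration of the custom-serializer idea (a sketch only, not part of the module; nested Avro types like Utf8 or sub-records would need similar special handling):

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;

import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.databind.JsonSerializer;
import com.fasterxml.jackson.databind.SerializerProvider;

// Sketch: expose a GenericRecord as a plain object so that regular databind
// (and thus any backend) can re-encode it
public class GenericRecordSerializer extends JsonSerializer<GenericRecord> {
    @Override
    public void serialize(GenericRecord record, JsonGenerator gen, SerializerProvider provider)
            throws IOException {
        gen.writeStartObject();
        for (Schema.Field field : record.getSchema().getFields()) {
            gen.writeFieldName(field.name());
            Object value = record.get(field.name());
            if (value == null) {
                gen.writeNull();
            } else {
                provider.defaultSerializeValue(value, gen);
            }
        }
        gen.writeEndObject();
    }
}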

How to use this library?

Hi Friends,
I was trying to use this library in one of my projects, but I don't know how to use it.
I want to send a Java object from a client (written in Java) to a server (written in C) and print the values of the object on the server. I tried plain Java serialization, but that did not work either. Someone suggested using CBOR, and I have been googling but have not been able to find a proper example of how to implement it.

I found this library and am trying to use it, but I cannot find a main method or any other entry point showing how to use it.

My final aim is to send a Java object from the Java client to the C server, and print the values of the object on the server.

Is this possible using this library?
If anybody has implemented this kind of thing, I would kindly request your suggestions.

Thanks in advance.

[avro] Ignoring built-in annotations

(moved from FasterXML/jackson-dataformat-avro#25)

In using this, it seems the mapper is ignoring the standard annotations that come with Avro. Fields marked with @AvroIgnore are not ignored, and it seems every field gets unioned with null, making the @Nullable annotation useless in schema generation. Those are the two big ones I've witnessed, but I'm sure the others do not work either.

Decimal support in Avro Schemas

Previously Avro didn't have direct support for decimal types like Java's BigDecimal. Now it does: the decimal logical type. That means the current implementation, which uses "double" as a replacement for decimals, can be improved.
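For context, the decimal logical type annotates an underlying bytes (or fixed) schema; with the Avro library itself this is declared roughly like so (a sketch using Avro's own API, not Jackson's):

import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;

public class DecimalSchemaSketch {
    public static void main(String[] args) {
        // decimal(precision, scale) annotating a "bytes" schema
        Schema decimalSchema = LogicalTypes.decimal(10, 2)
                .addToSchema(Schema.create(Schema.Type.BYTES));
        System.out.println(decimalSchema.toString(true));
    }
}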

[avro] Add support for Avro default values

(moved from FasterXML/jackson-dataformat-avro#34 by @turbospaces)

Right now RecordVisitor creates Fields without a default VALUE, which causes problems when the schema evolves and a client operates on OLD data: http://ben-tech.blogspot.com/2013/05/avro-schema-evolution.html

I suggest changing the Record visitor to:

@Override
public void optionalProperty(BeanProperty writer) throws JsonMappingException {
    Schema schema = schemaForWriter(writer);
    if (!writer.getType().isPrimitive()) {
        schema = AvroSchemaHelper.unionWithNull(schema);
    }
    _fields.add(new Schema.Field(writer.getName(), schema, null,
            writer.isRequired() ? null : Schema.parseJson("null")));
}

Exception serializing double[][]

To reproduce it (v. 2.8.3):

        ObjectMapper mapper = new ObjectMapper(new CBORFactory());
        byte[] cborBytes = mapper.writeValueAsBytes(new double[][]{ {1.2323132131} });

I suspect that the problem is in CBORGenerator at line 678: the call to _verifyValueWrite(...) has already been performed in the parent method, line 601. But I'm not familiar with the code; maybe there is another cause.

    private final void _writeNumberNoCheck(double d) throws IOException {
        _verifyValueWrite("write number");
        _ensureRoomForOutput(11);

The exception log trace:

Exception in thread "main" com.fasterxml.jackson.core.JsonGenerationException: Array size mismatch: number of element encoded is not equal to reported array/map size.
    at com.fasterxml.jackson.core.JsonGenerator._reportError(JsonGenerator.java:1897)
    at com.fasterxml.jackson.dataformat.cbor.CBORGenerator._failSizedArrayOrObject(CBORGenerator.java:1089)
    at com.fasterxml.jackson.dataformat.cbor.CBORGenerator._verifyValueWrite(CBORGenerator.java:1080)
    at com.fasterxml.jackson.dataformat.cbor.CBORGenerator._writeNumberNoCheck(CBORGenerator.java:678)
    at com.fasterxml.jackson.dataformat.cbor.CBORGenerator.writeArray(CBORGenerator.java:604)
    at com.fasterxml.jackson.databind.ser.std.StdArraySerializers$DoubleArraySerializer.serialize(StdArraySerializers.java:686)
    at com.fasterxml.jackson.databind.ser.std.StdArraySerializers$DoubleArraySerializer.serialize(StdArraySerializers.java:620)
    at com.fasterxml.jackson.databind.ser.std.ObjectArraySerializer.serializeContents(ObjectArraySerializer.java:256)
    at com.fasterxml.jackson.databind.ser.std.ObjectArraySerializer.serialize(ObjectArraySerializer.java:216)
    at com.fasterxml.jackson.databind.ser.std.ObjectArraySerializer.serialize(ObjectArraySerializer.java:26)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:292)
    at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:3672)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsBytes(ObjectMapper.java:3072)

Deserialization of multiple (root) values from Avro

(note: counterpart of #35)

Since it is possible (and legal, from what I have read) to create root-value sequences in Avro (not unlike with JSON and similar formats), this codec should allow doing the same via Jackson.
Earlier versions only allow reading of the first such value, but the underlying Avro codec does support incremental decoding without problems; see the sketch below.
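On the reading side, Jackson's standard mechanism for root-value sequences is MappingIterator; a minimal sketch of what this should enable (writing side included for self-containment):

import java.io.ByteArrayOutputStream;

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.databind.SequenceWriter;
import com.fasterxml.jackson.dataformat.avro.AvroMapper;
import com.fasterxml.jackson.dataformat.avro.AvroSchema;

public class RootValueSequenceSketch {
    public static class Point {
        public int x;
        public Point() { }
        public Point(int x) { this.x = x; }
    }

    public static void main(String[] args) throws Exception {
        AvroMapper mapper = new AvroMapper();
        AvroSchema schema = mapper.schemaFor(Point.class);

        // write a root-value sequence (see #35 for the writing side)
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (SequenceWriter w = mapper.writer(schema).writeValues(out)) {
            w.write(new Point(1));
            w.write(new Point(2));
        }

        // read it back incrementally; per this issue, older versions stop
        // after the first value
        try (MappingIterator<Point> it = mapper.readerFor(Point.class)
                .with(schema).readValues(out.toByteArray())) {
            while (it.hasNextValue()) {
                System.out.println(it.nextValue().x);
            }
        }
    }
}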

Getting "type not supported as root type by protobuf" for serialization of short and UUID types

Hi,

I'm trying to serialize the following class to protobuf using jackson-dataformats:

public class HeaderTest
{
  @JsonProperty("version")
  private short version;
  @JsonProperty("messageId")
  private UUID messageId;


  public HeaderTest(short version, UUID messageId)
  {
    this.messageId = messageId;
    this.version = version;
  }
  public short getVersion() {
    return version;
  }

  public void setVersion(short version) {
    this.version = version;
  }

  public UUID getMessageId() {
    return messageId;
  }

  public void setMessageId(UUID messageId) {
    this.messageId = messageId;
  }
}

With the following code:

ProtobufMapper mapper = new ProtobufMapper();
ProtobufSchema schemaWrapper = mapper.generateSchemaFor(HeaderTest.class); // <- throws
NativeProtobufSchema nativeProtobufSchema = schemaWrapper.getSource();
String asProtofile = nativeProtobufSchema.toString();

The marked line throws this exception:

java.lang.UnsupportedOperationException: 'Integer' type not supported as root type by protobuf

	at com.fasterxml.jackson.dataformat.protobuf.schemagen.ProtoBufSchemaVisitor._throwUnsupported(ProtoBufSchemaVisitor.java:141)
	at com.fasterxml.jackson.dataformat.protobuf.schemagen.ProtoBufSchemaVisitor.expectIntegerFormat(ProtoBufSchemaVisitor.java:112)
	at com.fasterxml.jackson.databind.ser.std.StdSerializer.visitIntFormat(StdSerializer.java:215)
	at com.fasterxml.jackson.databind.ser.std.NumberSerializers$Base.acceptJsonFormatVisitor(NumberSerializers.java:75)
	at com.fasterxml.jackson.databind.ser.std.NumberSerializers$ShortSerializer.acceptJsonFormatVisitor(NumberSerializers.java:103)
	at com.fasterxml.jackson.dataformat.protobuf.schemagen.ProtobuffSchemaHelper.acceptTypeElement(ProtobuffSchemaHelper.java:49)
	at com.fasterxml.jackson.dataformat.protobuf.schemagen.MessageElementVisitor.getDataType(MessageElementVisitor.java:122)
	at com.fasterxml.jackson.dataformat.protobuf.schemagen.MessageElementVisitor.buildFieldElement(MessageElementVisitor.java:86)
	at com.fasterxml.jackson.dataformat.protobuf.schemagen.MessageElementVisitor.optionalProperty(MessageElementVisitor.java:65)
	at com.fasterxml.jackson.databind.ser.BeanPropertyWriter.depositSchemaProperty(BeanPropertyWriter.java:805)
	at com.fasterxml.jackson.databind.ser.std.BeanSerializerBase.acceptJsonFormatVisitor(BeanSerializerBase.java:833)
	at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.acceptJsonFormatVisitor(DefaultSerializerProvider.java:580)
	at com.fasterxml.jackson.databind.ObjectMapper.acceptJsonFormatVisitor(ObjectMapper.java:3641)
	at com.fasterxml.jackson.databind.ObjectMapper.acceptJsonFormatVisitor(ObjectMapper.java:3620)
	at com.fasterxml.jackson.dataformat.protobuf.ProtobufMapper.generateSchemaFor(ProtobufMapper.java:95)

Any ideas why I can't serialize those types? I couldn't find any online resources about this issue.

Thanks

Problem decoding Maps with union values

(reported by a user)

Looks like the Avro codec (version 2.7.8) writes nested Maps correctly, but has some issues with decoding, such that entries (of the outermost Map) after the first one are dropped. That is, only the first key/value pair seems to be exposed.

[protobuf] Add a way to programmatically construct `ProtobufSchema` instances without proto file

(moved from FasterXML/jackson-dataformat-protobuf#7)

Currently the only way to construct a schema needed for decoding and encoding is to use a proto file (or String).
While this is the common way of defining and using schemas, it would be useful to alternatively be able to construct schemas programmatically; this would allow the use of alternate schema sources. (For reference, the current String-based route is sketched below.)

Another possibility (for which a different issue may be filed) is to build schemas out of general Jackson introspection functionality.
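For reference, the String-parsing route that exists today looks like this; the request above is for an equivalent that skips the .proto syntax entirely:

import com.fasterxml.jackson.dataformat.protobuf.schema.ProtobufSchema;
import com.fasterxml.jackson.dataformat.protobuf.schema.ProtobufSchemaLoader;

public class SchemaFromString {
    public static void main(String[] args) throws Exception {
        String proto = "message Point {\n"
                + "  required int32 x = 1;\n"
                + "  required int32 y = 2;\n"
                + "}\n";
        // today, schemas can only be built by parsing .proto syntax
        ProtobufSchema schema = ProtobufSchemaLoader.std.parse(proto);
        System.out.println(schema.getSource());
    }
}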

Add support for upcoming packed arrays tags

(originally by neothemachine)


There is a draft RFC, to be published soon, which adds support for packed/typed arrays in CBOR (see the current version). Since one of the original CBOR authors is also an author of this proposal, I'm sure it will gain widespread adoption in encoders and decoders, especially because it is such a useful feature: it will help greatly, in both space usage and speed, when big packed arrays need to be efficiently encoded and decoded.

I thought it would be good to put it on the radar for implementation. I think it will be rather straightforward to implement. I currently don't have time for it myself in the next 3-4 weeks, but I will certainly help where I can, reviewing code etc.

[smile] jackson-dataformat-smile 2.8.7 throwing NoSuchMethodError when trying to write POJO as Bytes

2017-03-31 17:35:30,470 (857753712@qtp-1838151277-0) [ERROR: Slf4jLog] Error for /
java.lang.NoSuchMethodError: com.fasterxml.jackson.dataformat.smile.SmileFactory._decorate(Ljava/io/OutputStream;Lcom/fasterxml/jackson/core/io/IOContext;)Ljava/io/OutputStream;
	at com.fasterxml.jackson.dataformat.smile.SmileFactory.createGenerator(SmileFactory.java:351)
	at com.fasterxml.jackson.dataformat.smile.SmileFactory.createGenerator(SmileFactory.java:27)
	at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsBytes(ObjectMapper.java:2299)

I have made sure to standardize the versions of jackson-annotations, jackson-core, jackson-databind and jackson-dataformat-smile in my project.

Protobuf serialization error for 3 nested level classes

When serializing an object whose class has three nesting levels, the result is not correct and cannot be deserialized back to the original value. Below is a test case to reproduce.

import static org.junit.Assert.assertEquals;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.protobuf.ProtobufMapper;
import com.fasterxml.jackson.dataformat.protobuf.schema.ProtobufSchema;
import com.fasterxml.jackson.dataformat.protobuf.schemagen.ProtobufSchemaGenerator;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import org.junit.Test;

public class BugReport {

  @Test
  public void test() throws Exception {
    ProtobufMapper mapper = new ProtobufMapper();
    ProtobufSchema schema = getSchema(mapper, Level1.class);

    System.out.println(schema.getSource());

    Level1 level1 = new Level1();
    Level2 level2 = new Level2();
    Level3 level3 = new Level3();

    level1.setValue(1);
    level2.setValue(2);
    level3.setValue(3);
    level1.setLevel2(level2);
    level2.setLevel3(level3);

    ByteArrayOutputStream bout = new ByteArrayOutputStream();
    mapper.writer(schema).writeValue(bout, level1);

    showBytes(bout.toByteArray());
    // 08 01 12 04 12 02 08 03
    // the correct result is
    // 08 01 12 06 08 02 12 02 08 03 

    Level1 gotLevel1 = mapper.readerFor(Level1.class).with(schema).readValue(new ByteArrayInputStream(bout.toByteArray()));

//    byte[] bytes = new byte[]{0x08, 0x01, 0x12, 0x06, 0x08, 0x02, 0x12, 0x02, 0x08, 0x03};
//    Level1 gotLevel1 = JacksonProtobuf2Serializer.INSTANCE.deserialize(new ByteArrayInputStream(bytes), Level1.class);

    assertEquals(level1.getValue(), gotLevel1.getValue());
    assertEquals(level2.getValue(), gotLevel1.getLevel2().getValue());
    assertEquals(level3.getValue(), gotLevel1.getLevel2().getLevel3().getValue());
  }

  private ProtobufSchema getSchema(ObjectMapper mapper, Class<?> clazz) throws Exception {
    ProtobufSchemaGenerator gen = new ProtobufSchemaGenerator();
    mapper.acceptJsonFormatVisitor(clazz, gen);
    return gen.getGeneratedSchema();
  }

  private void showBytes(byte[] bytes) {
    for (byte b : bytes) {
      System.out.print(String.format("%8s", Integer.toHexString(b)).substring(6, 8).replaceAll(" ", "0") + " ");
    }
    System.out.println();
  }

  public static class Level1 {

    private int value;

    private Level2 level2;

    public int getValue() {
      return value;
    }

    public void setValue(int value) {
      this.value = value;
    }

    public Level2 getLevel2() {
      return level2;
    }

    public void setLevel2(Level2 level2) {
      this.level2 = level2;
    }
  }

  public static class Level2 {

    private int value;
    private Level3 level3;

    public int getValue() {
      return value;
    }

    public void setValue(int value) {
      this.value = value;
    }

    public Level3 getLevel3() {
      return level3;
    }

    public void setLevel3(Level3 level3) {
      this.level3 = level3;
    }
  }

  public static class Level3 {

    private int value;

    public int getValue() {
      return value;
    }

    public void setValue(int value) {
      this.value = value;
    }
  }

}

[protobuf] Parser fails with /* comment */ [v2.8]

The Proto parser fails to parse /* */ comments. I modified the SchemaParsingTest to include a comment like this:

    final protected static String PROTOC_STRINGS_PACKED =
            "message Strings {\n"
            +" repeated string values = 2 [packed=true]; /* comment */\n"
            +"}\n"
    ;

Then run the test

Tests run: 6, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.007 sec <<< FAILURE! - in com.fasterxml.jackson.dataformat.protobuf.SchemaParsingTest
testPacked(com.fasterxml.jackson.dataformat.protobuf.SchemaParsingTest)  Time elapsed: 0.006 sec  <<< ERROR!
java.lang.IllegalStateException: Syntax error in Unnamed-protobuf-schema at 2:45: expected '/'
	at com.squareup.protoparser.ProtoParser.unexpected(ProtoParser.java:903)
	at com.squareup.protoparser.ProtoParser.tryAppendTrailingDocumentation(ProtoParser.java:845)
	at com.squareup.protoparser.ProtoParser.readField(ProtoParser.java:347)
	at com.squareup.protoparser.ProtoParser.readDeclaration(ProtoParser.java:165)
	at com.squareup.protoparser.ProtoParser.readMessage(ProtoParser.java:218)
	at com.squareup.protoparser.ProtoParser.readDeclaration(ProtoParser.java:152)
	at com.squareup.protoparser.ProtoParser.readProtoFile(ProtoParser.java:92)
	at com.squareup.protoparser.ProtoParser.parse(ProtoParser.java:61)
	at com.fasterxml.jackson.dataformat.protobuf.schema.ProtobufSchemaLoader._loadNative(ProtobufSchemaLoader.java:157)
	at com.fasterxml.jackson.dataformat.protobuf.schema.ProtobufSchemaLoader.parseNative(ProtobufSchemaLoader.java:131)
	at com.fasterxml.jackson.dataformat.protobuf.schema.ProtobufSchemaLoader.parse(ProtobufSchemaLoader.java:105)
	at com.fasterxml.jackson.dataformat.protobuf.SchemaParsingTest.testPacked(SchemaParsingTest.java:118)

I know the problem is in ProtoParser, and it's deprecated. Are you going to migrate to Wire Protocol Buffers? As far as I can see, the Wire SyntaxReader class knows the comment syntax.
