
mtth commented on August 15, 2024

Hi @acromarco. Reading data across compatible schemas requires a resolver. See #383 (comment) for more information and an example.

acromarco commented on August 15, 2024

@mtth

Reading data across compatible schemas requires a resolver.

Thank you for your answer! Somehow I thought/expected that "basic schema evolution" would work out of the box.
The documentation at https://github.com/mtth/avsc/wiki/Advanced-usage#schema-evolution gave me the impression that a resolver is only needed in special cases, to improve performance.
Would it be technically possible with Avro to support schema evolution without creating resolvers?

After creating a resolver, I can now read data written with the old schema, but no longer data written with the new schema:

const resolver = typeVersion2.createResolver(typeVersion1);

// works fine now, cool :-) !
const deSerialized2 = typeVersion2.fromBuffer(buf, resolver);
expect(deSerialized2).toEqual({ ...dummyObjectToSerialize, newField: 'myDefault' });

const dummyObjectToSerialize2 = { name: 'Albert', newField: 'myValue' };
const buf2 = typeVersion2.toBuffer(dummyObjectToSerialize2);

// works fine
const deSerialized3 = typeVersion2.fromBuffer(buf2);
expect(deSerialized3).toEqual(dummyObjectToSerialize2);

// throws "trailing data" error :-(
const deSerialized4 = typeVersion2.fromBuffer(buf2, resolver);
expect(deSerialized4).toEqual(dummyObjectToSerialize2);

Reading a buffer written with the new schema through the resolver results in a "trailing data" error :-(.

So, what is the recommended way to decode a buffer that may have been written with any of several schema versions?
Is it necessary to add some kind of "schema-version" field in order to use a resolver, or not?
This could get messy after a few iterations of schema evolution.

Also, what should I do when the reader doesn't know about the new schema? Just imagine an old client that tries to read data written with a new, extended schema.

// throws "trailing data" error :-(
const deSerialized5 = typeVersion1.fromBuffer(buf2); // buf2 was written with the new, extended schema
expect(deSerialized5).toEqual(dummyObjectToSerialize);

This also fails with the "trailing data" error. I would expect this to work automatically, because all required fields are present in the data. How is an old client supposed to know about a new schema version in order to create resolvers?

Sorry for all the "dumb" questions. I'm new to Avro and maybe my expectations are wrong.

acromarco commented on August 15, 2024

@mtth

Is the following explanation correct?

Decoding Avro-encoded data requires knowing the exact schema that was used to encode it.
This follows from Avro's compact binary encoding.
The encoded data doesn't contain enough structural metadata (such as field names or tags) to allow decoding it into a slightly different but compatible schema, for example one with additional optional fields.
Therefore, a decoding client must create a resolver from the encoding (writer) schema and its own compatible (reader) schema.
This means that in practice, to support reading data written with multiple, possibly unknown, compatible schemas, the Avro-encoded data needs to be accompanied either directly by the encoding schema, or by a schema version plus a way to look up the corresponding encoding schema (e.g. a schema registry). Such a schema version must be provided outside the actual Avro-encoded data, because otherwise there would be no way to read it.

mtth commented on August 15, 2024

Yes, that's right.

acromarco commented on August 15, 2024

Thank you!
I will close this issue now, as it was never a bug but just my misunderstanding of how Avro works.
However, maybe the documentation could be made more foolproof in the future by:
