Git Product home page Git Product logo

marshal-2's Introduction

Marshal data serialization format and library.

A Marshal is a tuple that knows how to serialize and deserialize itself. Marshals can contain nested marshals and byte arrays as will as basic primitive types. In addition, type information is stored for each object, allowing for typed reconstruction of known types, pretty printing, and stable sort orders without mixing of types in the sorting.

Lexicographical Format

This is the "native" mode for the marshal library, the reason for its existence. In this format, the serialized byte arrays can be lexicographically ordered, and the ordering will match the sorted order of the tuples.

This is useful in systems such as HBase that deal only with byte arrays.

The format looks a bit like this:

[type] [data] [SEPARATOR] [type] [data] [SEPARATOR] ... [SEPARATOR]

Primitive Types

Marshal m1 = Marshal.builder()
    .addByte((byte)1)
    .addDouble(3.14)
    .addString("Hello, world!")
    .addInteger(42)
    .addLong(1337l)
    .build();

byte[] s1 = m1.toBytes();
Marshal m2 = Marshal.fromBytes(s1);

byte b = m2.getByteAt(0);
double d = m2.getDoubleAt(1);
String s = m2.getStringAt(2);
int i = m2.getIntegerAt(3);
long l = m2.getLongAt(4);

Nested Types

byte[] bytes = { ... };
ByteArray array1 = new ByteArray(bytes);
Marshal m3 = Marshal.builder()
    .addByteArray(array1)
    .addMarshal(m2)
    .build();

byte[] serialized2 = m3.toBytes();
Marshal m4 = Marshal.fromBytes(serialized2);

ByteArray array2 = m4.getByteArrayAt(0);
Marshal m5 = m4.getMarshalAt(1);
// m5 is the same as m1/m2, so we can access nested objects
double d2 = m5.getDoubleAt(1); // 3.14

Prefixes

When serialized, the byte arrays can be sorted. Prefixes can be used for queries into databases such as HBase. Prefixes come in two forms: terminated, and unterminated.

Marshal m6 = Marshal.builder()
    .addByte((byte)1)
    .addDouble(3.14)
    .addString("Hello")
    .build();

// takes 3 elements from the marshal
// terminates the prefix with a separator byte
// p1 is not a prefix of s1, since strings are not equal
byte[] p1 = m6.prefixTerminatedBytes(3);

// p2 is a prefix of s1, since the last string is a prefix of the last
// string in m1
byte[] p2 = m6.prefixUnterminatedBytes(3);

Writable Format

The lexicographical format is great when you need prefixes, but it can be inefficient to encode nested structures when sorting and prefixes are not needed. Thus, the writable format provides an alternative, faster serialization mechanism that does not have the same properties, but can still decode and encode marshals using an alternative serialization mechanism.

Writing:

DataOutput output = ...;
m1.write(output);

Reading:

DataInput input = ...;
Marshal m7 = Marshal.read(input);

Hadoop

For convenience, since the writable format is normally used with Hadoop, a MarshalWritable is provided as a separate subpackage. It uses the writable serialization mechanism behind the scenes.

Exceptions

All marshal encoding and decoding operations will throw a MarshalException upon failure. For convenience, MarshalException extends IOException.

Reading and writing in the writable format will throw an IOException if the underlying data source or sink throws an exception.

Build Process

./gradlew clean build

Please use --no-ff when merging feature branches.

Pushing the master branch will run the CI process, which will upload new JARs to artifactory.

Benchmarks

Run JMH benchmarks with:

./gradlew clean jmh

Results will be under marshal/build/reports/jmh

marshal-2's People

Contributors

bvargo avatar xorlev avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.