It would be an enhancement if the client of hazelcast could plug in a
custom serializer. Sometimes the client knows best how to serialize objects
as fast as possible, and some libraries out there claim to be faster than
the default java framework.
It could be implemented by adding a serializer interface to the API, and
clients could set a serializer object into the configuration (preferred),
or the configuration could take a classname and then create one lazily (xml
configuration).
For example
<code>
interface HzSerializer{
public void serialize(Object object, OutputStream stream);
public Object deserialize(InputStream stream);
}
</code>
or something like that.
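A minimal sketch of how such a pluggable serializer might look. It assumes the HzSerializer interface proposed above, with IOException added to the signatures (an assumption). JavaSerializer and fromClassName are hypothetical names used only for illustration and are not part of the Hazelcast API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

class HzSerializerSketch {

    // Hypothetical interface from the proposal above; IOException is an added assumption.
    public interface HzSerializer {
        void serialize(Object object, OutputStream stream) throws IOException;
        Object deserialize(InputStream stream) throws IOException;
    }

    // Hypothetical default implementation that delegates to standard Java serialization.
    public static final class JavaSerializer implements HzSerializer {
        public JavaSerializer() { }

        @Override
        public void serialize(Object object, OutputStream stream) throws IOException {
            ObjectOutputStream out = new ObjectOutputStream(stream);
            out.writeObject(object);
            out.flush();
        }

        @Override
        public Object deserialize(InputStream stream) throws IOException {
            try {
                return new ObjectInputStream(stream).readObject();
            } catch (ClassNotFoundException e) {
                throw new IOException("Unknown class in stream", e);
            }
        }
    }

    // Creates a serializer from a class name, as an XML-based configuration might do lazily.
    public static HzSerializer fromClassName(String className) throws ReflectiveOperationException {
        return (HzSerializer) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    // Serializes a value to a byte array and reads it back with the same serializer.
    public static Object roundTrip(HzSerializer serializer, Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        serializer.serialize(value, bytes);
        return serializer.deserialize(new ByteArrayInputStream(bytes.toByteArray()));
    }

    public static void main(String[] args) throws Exception {
        HzSerializer serializer = fromClassName(JavaSerializer.class.getName());
        System.out.println(roundTrip(serializer, "hello"));
    }
}
```

A faster third-party library could be plugged in by writing another implementation of the same interface and naming it in the configuration.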
Migrated from http://code.google.com/p/hazelcast/issues/detail?id=153
earlier comments
constantin.rack said, at 2010-02-20T15:27:37.000Z:
Issue 219 has been merged into this issue.
constantin.rack said, at 2010-02-23T08:14:21.000Z:
Suggestion from mailing list to use Kryo, may be worth a look:
http://code.google.com/p/kryo/
dtravin said, at 2010-04-04T12:59:27.000Z:
I am looking forward to implementing this myself.
Extracting an interface from a class com.hazelcast.nio.Serializer is not a complex
thing to do.
But, what is the way to inject that interface into ThreadContext class ?
ian.phillips said, at 2010-04-04T16:47:28.000Z:
Well, there is an interface already: Serializer$TypeSerializer, it just needs to be made public (and probably
moved to its own top level file rather than being a nested interface).
The question is how to handle registration of new TypeSerializers and how to tag the data when serialised. As I
see it there are 2 options: require the user to handle this manually (e.g. a registerSerializer(int tag,
TypeSerializer serializer) method) or to try to handle this automatically (e.g. a registerSerializer(TypeSerializer
serializer) method). The latter option could be accomplished either by using a distributed map or even just a
simple counter (which would need to be protected with a distributed lock).
I think that I prefer having the user handle tagging manually as it is much simpler, and the other options
could always be implemented on top of this in user code so they're something that could be added at a later
date if there was enough interest.
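A minimal sketch of the manual tagging option described above, assuming a registerSerializer(int tag, TypeSerializer serializer) method as suggested in the comment. The TypeSerializer shape shown here uses plain java.io data streams in place of Hazelcast's own stream classes, and TagRegistrySketch and StringSerializer are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class TagRegistrySketch {

    // Hypothetical stand-in for the TypeSerializer mentioned above.
    public interface TypeSerializer<T> {
        boolean isSerializable(Object object);
        void write(DataOutputStream out, T obj) throws IOException;
        T read(DataInputStream in) throws IOException;
    }

    private final Map<Integer, TypeSerializer<?>> byTag = new HashMap<>();

    // Manual registration: the user picks the tag and must use the same tag on every member.
    public void registerSerializer(int tag, TypeSerializer<?> serializer) {
        if (byTag.putIfAbsent(tag, serializer) != null) {
            throw new IllegalArgumentException("Tag already registered: " + tag);
        }
    }

    // Writes the tag first, then the payload produced by the matching serializer.
    @SuppressWarnings("unchecked")
    public byte[] toBytes(Object obj) throws IOException {
        for (Map.Entry<Integer, TypeSerializer<?>> entry : byTag.entrySet()) {
            if (entry.getValue().isSerializable(obj)) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                DataOutputStream out = new DataOutputStream(bytes);
                out.writeInt(entry.getKey());
                ((TypeSerializer<Object>) entry.getValue()).write(out, obj);
                out.flush();
                return bytes.toByteArray();
            }
        }
        throw new IOException("No serializer registered for " + obj);
    }

    // Reads the tag, looks up the serializer, and lets it decode the rest.
    public Object fromBytes(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int tag = in.readInt();
        TypeSerializer<?> serializer = byTag.get(tag);
        if (serializer == null) {
            throw new IOException("Unknown tag: " + tag);
        }
        return serializer.read(in);
    }

    // Example serializer for strings.
    public static final class StringSerializer implements TypeSerializer<String> {
        public boolean isSerializable(Object o) { return o instanceof String; }
        public void write(DataOutputStream out, String s) throws IOException { out.writeUTF(s); }
        public String read(DataInputStream in) throws IOException { return in.readUTF(); }
    }

    public static void main(String[] args) throws IOException {
        TagRegistrySketch registry = new TagRegistrySketch();
        registry.registerSerializer(100, new StringSerializer());
        System.out.println(registry.fromBytes(registry.toBytes("hello")));
    }
}
```

Automatic tag assignment (a distributed map or a lock-protected counter) could be layered on top of this by user code that chooses the integer before calling registerSerializer.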
dtravin said, at 2010-04-05T19:55:21.000Z:
I do not get your point and my question was how to inject a serializer interface into
ThreadContext.
At this moment serializer is instantiated once as a final field of ThreadContext in a
static method get() and there are 56 usages of that method in code.
Config c = new Config();
HazelcastSerializer myCustomWhatEverSerializer = ......
c.getSerializerConfig().setSerializer(myCustomWhatEverSerializer);
HazelcastInstance hz = HazelcastInstance.newInstance(c);
That is how I see the usage.
Please explain your vision in a bit more detail.
ian.phillips said, at 2010-04-08T15:07:19.000Z:
First off some background thinking; as I see it there are a number of ways of allowing custom serializers and 2
main things that are affected:
(a) how the data will be tagged on the wire; and
(b) how the serializers will be incorporated into the current API.
Let's look at (b) first as it's by far the thornier issue (I know, I've had a go at implementing this and it is
tricky!). First off: I don't think that Serializer should be part of the ThreadContext class. If we assume that a
given JVM can have multiple HazelcastInstance objects (which it can) and that these can be connected to
different clusters then holding serializers in thread local storage isn't going to work.
Another problem lies with the fact that currently much of the Serializer code is static and this also won't work
in the presence of multiple Hazelcast instances.
So, some design decisions need to be made:
- all of the Serializer code needs to be made non-static;
- the Serializer instances need to be stored in the HazelcastInstance (FactoryImpl or HazelcastClient),
presumably adding a getSerializer() method to the HazelcastInstance interface, this will probably still need to
be handled in a thread local manner;
- the factory needs to create & inject the Serializer instance based on the configuration provided, I don't think
that it is an unreasonable expectation that all members of a cluster use the same serializer configuration, but
there are probably also ways around this if need be.
The serializer could be handed down to where it is needed probably by attaching it to the Node instance that
the factory creates, although I haven't checked this in detail.
One huge glaring open question: how will clients learn about custom serializers given that they do not
currently have access to the config data? One option could be to make the SerializationConfig class
serializable and then load it from whichever cluster member the client connects to.
OK, I've gone on for quite a while now, so I'll give you a chance to air your thoughts on the matter ;-)
ian.phillips said, at 2010-04-08T15:10:24.000Z:
By the way: if oztalip or fuad have any comments on my proposed approach I'd be interested to hear them:
does this sound reasonable to you guys?
does my primitive approach to client handling sound suitable for a first draft?
can you think of any issues that I'm missing?
and, I guess, given that this is sounding like a fairly intrusive change would you still be interested in receiving
patches for this? one big patch or break it up into smaller ones?
dtravin said, at 2010-04-13T07:54:52.000Z:
Hi, Ian
I have started a refactoring just to move serializer to FactoryImpl and met some
obstacles.
- The use of factory is almost everywhere done by accessing public field in Node.
To my mind node.factory.getSerializer() is a bit ugly.
- import static IOUtils.toData and toObject is used in many places
I had to change the signatures of those methods to accept a serializer and adapt all
the places where they are used.
I can make it work, but this looks weird.
Talip and Fuad, do you have any comments?
ian.phillips said, at 2010-04-13T14:42:01.000Z:
I took a slightly different approach - I added a Serializer field on the Node class, and added corresponding toData
and toObject methods. As you spotted, it turns out that in most of the places that serialization is used there is a node
instance handy, and node.toData/node.toObject looks just fine.
I deleted the static methods on IOUtils, and also the Serializer from ThreadContext.
I've got a couple of outstanding issues to resolve then I'll post a patch here, hopefully this evening sometime.
ian.phillips said, at 2010-04-15T13:03:45.000Z:
Hi dtravin,
Hmm, this is getting complicated!
I'm not sure how to go about implementing client serializers right now. The issue is that the client code does
not reference the core hazelcast module, as I see it there are 2 options, neither trivial:
a) separate out some of the I/O code into a new module which would be used by both the hazelcast and
hazelcast-client modules; or
b) force users to write 2 versions of each custom TypeSerializer (this could be simplified if the user was
prepared to depend on the hazelcast module from their custom serializer).
I'd be reasonably happy with (a), the new module could also hold the test support classes, then hazelcast-
client would not need to depend on hazelcast at all, which seems like a nice bonus.
I'm going to attach a partial patch here to illustrate where I'm going with my changes - I haven't included all of
the files here but rather that subset which I think illustrates the relevant changes.
As well as this there are a number of changes to other files to use the HazelcastClient/Node serializer rather
than the static one which no longer exists, and some updates to the test suite - these aren't included here as it
makes the patch a bit too big to scan easily.
Talip and Fuad, still interested in hearing your thoughts on the matter.
Cheers,
Ian.
oztalip said, at 2010-04-15T13:06:42.000Z:
Sorry for not being responsive. I started looking at it. I will get back to you with details very soon.
-talip
oztalip said, at 2010-04-15T13:38:45.000Z:
Ian,
Quick note: In your implementation (patch) each hazelcast instance has its own Serializer and all user threads are
actually using the same Serializer instance but the default Serializer is not thread-safe, it is using the same
non-thread-safe FastByteArrayOutputStream instance, for example.
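A minimal sketch of one way to address the thread-safety concern above: keep one serializer per instance, but give each calling thread its own buffer through a ThreadLocal. InstanceSerializer is a hypothetical name, and java.io.ByteArrayOutputStream stands in for FastByteArrayOutputStream, so this is an illustration under those assumptions and not the code from the patch.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class PerThreadBufferSketch {

    // Hypothetical per-instance serializer: one object is shared by all user threads,
    // but every thread works on its own reusable buffer.
    public static final class InstanceSerializer {
        private final ThreadLocal<ByteArrayOutputStream> buffers =
                ThreadLocal.withInitial(() -> new ByteArrayOutputStream(1024));

        public byte[] toData(String value) throws IOException {
            ByteArrayOutputStream buffer = buffers.get();
            buffer.reset(); // reuse this thread's buffer instead of allocating a new one
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(value);
            out.flush();
            return buffer.toByteArray(); // copy, so the buffer can be reused safely
        }
    }

    public static void main(String[] args) throws Exception {
        InstanceSerializer serializer = new InstanceSerializer();
        Runnable task = () -> {
            try {
                for (int i = 0; i < 1000; i++) {
                    String value = Thread.currentThread().getName() + "-" + i;
                    byte[] data = serializer.toData(value);
                    // writeUTF emits a 2-byte length followed by one byte per ASCII character.
                    if (data.length != value.length() + 2) {
                        throw new IllegalStateException("corrupted buffer");
                    }
                }
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        };
        Thread a = new Thread(task, "a");
        Thread b = new Thread(task, "b");
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println("done");
    }
}
```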
ian.phillips said, at 2010-04-15T13:40:34.000Z:
Hi Talip,
No problem at all - I just find myself with some unexpected free time today due to a flight being cancelled.
Going back to my shared module approach (option a from my last comment) the attached file is a first stab at
what would need to be separated out into a common module, it may be possible to reduce this list of files
with some closer analysis - this is just a naive approach based on moving files until the module builds.
If you do want to take that approach it should be possible to unify the client and cluster
Serializer/TypeSerializer implementations, possibly changing the interface from this:
public interface TypeSerializer<T> {
boolean isSerializable(Object object);
void write(FastByteArrayOutputStream bbos, T obj) throws Exception;
T read(FastByteArrayInputStream bbis) throws Exception;
}
to this:
public interface TypeSerializer<T> {
boolean isSerializable(Object object);
void write(FastByteArrayOutputStream bbos, T obj, boolean client) throws Exception;
T read(FastByteArrayInputStream bbis, boolean client) throws Exception;
}
Anyway, just jotting down some thoughts for you at this stage.
Cheers,
Ian.
ian.phillips said, at 2010-04-15T14:30:20.000Z:
Fixed, it's ThreadLocal on Node now. I'm also making some more changes to my version and will upload a new
patch later today.
Cheers,
Ian.
dtravin said, at 2010-04-24T18:58:25.000Z:
Hey, Ian
Where is your final patch?
I want to see it in action
Daniel
drew.botwinick said, at 2010-04-27T20:12:29.000Z:
I'm new to this project (and just recently wrote this to the group thread discussing
this issue), but based on reading the comments in this issue, this is really becoming
messy. I know java serialization is unpopular, but by using readResolve() and
writeReplace(), you can substitute a different "container" object that itself can be
Externalizable and make a much simpler interface that works on top of java
serialization. It's really easy.
public interface SerializableData extends Serializable {
public Object writeReplace() throws ObjectStreamException;
}
public interface SerializedDataContainer extends Externalizable {
public Object readResolve() throws ObjectStreamException;
@Override
public void writeExternal(ObjectOutput out) throws IOException;
@Override
public void readExternal(ObjectInput in) throws IOException,
ClassNotFoundException;
}
You can use this approach with java serialization backed by any serialization
"engine" the user chooses. You get 99% the performance of the back-end serialization
engine and barely more overhead than the back-end serialization engine. More
importantly, it's ridiculously simple and integrates with anything that uses java
serialization. That means it's portable AND doesn't require any special
considerations on the part of the library (i.e. hazelcast, in this case).
-Drew Botwinick
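A minimal sketch of the writeReplace()/readResolve() substitution described above, using only standard java.io serialization. Point and PointContainer are hypothetical example classes. The container here writes two ints directly, where a real setup would hand the object to whichever back-end serialization engine the user chose.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;

class WriteReplaceSketch {

    // Domain object: Java serialization calls writeReplace() and serializes the container instead.
    public static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int x;
        public final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        private Object writeReplace() throws ObjectStreamException {
            return new PointContainer(this);
        }
    }

    // Container: writes a compact custom format and resolves back to a Point on read.
    public static final class PointContainer implements Externalizable {
        private static final long serialVersionUID = 1L;
        private Point point;

        public PointContainer() { } // Externalizable requires a public no-arg constructor

        PointContainer(Point point) {
            this.point = point;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(point.x);
            out.writeInt(point.y);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            point = new Point(in.readInt(), in.readInt());
        }

        private Object readResolve() throws ObjectStreamException {
            return point; // callers receive the Point, never the container
        }
    }

    // Serializes and deserializes a value through plain Java object streams.
    public static Object roundTrip(Object value) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = (Point) roundTrip(new Point(3, 4));
        System.out.println(copy.x + "," + copy.y);
    }
}
```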
ian.phillips said, at 2010-04-28T12:51:15.000Z:
Hi Drew,
Sure, the writeReplace/readResolve mechanism is really useful, but… one of the issues with Java serialization
that I've been thinking about (and Daniel, this is the main reason I've not uploaded the full patch) is that it's
Java specific, and one of the stated goals for Hazelcast is to support non-Java clients. Using Java serialization
as a portable object format (POF) strikes me as a little clunky.
I'm currently thinking of this as the interface into the POF system: we have a POFService (which fulfils basically
the same rôle as the current Serializer class) and a POFContext which holds all of the thread local buffers and
has methods to read and write data, like so
interface POFService {
Data toData(Object object); // resets buffers, writes tag
Object toObject(Data data);
}
interface POFContext {
// avoids broken UTF implementation in DataInput/DataOutput
// good for the .NET and other clients
void write(Object object);
<T> T read(Class<T> type);
// ... methods for primitive types ...
void writeAll(Iterable iterable);
void writeAll(Map map);
void writeAll(Object[] array);
<T, C extends Collection> C readAll(Class<T> type, C into, boolean includeTags);
<T, M extends Map> M readAll(Class<T> type, M into, boolean includeTags);
<T> T[] readAll(Class<T> type, T[] into, boolean includeTags);
}
and an implementation of these, similar to the Serializer impl in my previous patch
class POFServiceImpl implements POFService, POFContext {
ThreadLocal contexts = … ;
Map<POFSerializer, Integer> typeToIdMap = … ;
Map<Integer, POFSerializer> idToTypeMap = … ;
}
and we still need a TypeSerializer (renamed for consistency):
interface POFSerializer<T> {
void write(T object, POFContext context);
T read(POFContext context);
}
then a class which uses this would be defined like so
public class Employee {
private final int employeeNumber;
private String name;
private int age;
private double salary;
private Address address;
// no need for a default constructor
public Employee(int employeeNumber) {
this.employeeNumber = employeeNumber;
}
// getters, setters …
public static class Serializer implements POFSerializer<Employee> {
public void write(Employee e, POFContext context) {
context.write(e.employeeNumber);
context.write(e.name);
context.write(e.age);
context.write(e.salary);
context.write(e.address);
}
public Employee read(POFContext context) {
int employeeNumber = context.read(Integer.class);
Employee e = new Employee(employeeNumber);
e.name = context.read(String.class);
e.age = context.read(Integer.class);
e.salary = context.read(Double.class);
e.address = context.read(Address.class);
}
}
}
and could be configured like so
com.example.Foo.Serializer
com.example.io.FooSerializer
com.example.Bar.Serializer
com.example.io.BarSerializer
I'll probably have a crack at coding this up over the coming long weekend (when I'll be away with spotty
internet cover, so it'll give me something to do :-)
I'm interested to hear what people's thoughts are w.r.t. the cross-platform/language possibilities.
/Ian.
drew.botwinick said, at 2010-04-28T18:14:44.000Z:
Hi Ian,
I forgot to consider cross-language compatibility... It is certainly true that it'd
be better to avoid some of java serialization's quirks for a "POF". (It would be
messy and ridiculous, although I suppose somewhat useful, to have a "java
serialization interpreter" for .net.) With that in mind, I like your proposal.
I also like the idea of using an integer to tag the class/serializer (but unlike
Kryo, defining the integer in config so that it is more portable). This essentially
amounts to your own serialization mechanism, but I think that might be necessary for
a solid cross-language mechanism.
This approach would require implementing the POF system on every target language, but
that'd probably be the most consistent solution. I'm sold.
Good work! :-)
-Drew
P.S.>> You forgot to return the new Employee in Employee.Serializer.read(...) :-P
j.gonon said, at 2011-02-15T15:26:17.000Z:
Hi,
I'm new here and would like to help with serialization.
What I'm currently using is an interface looking like "POFContext" but "InputStream" and "OutputStream" objects are passed as parameters.
I don't understand the use of "POFService".
I agree on the fact that "id <-> type" should be found in a configuration file.
noctariushtc said, at 2011-05-29T14:55:30.000Z:
Maybe that patch could be an idea how to do it (sorry just missed that issue when I initially opened the new one):
http://code.google.com/p/hazelcast/issues/detail?id=571
paul.woodward said, at 2011-10-26T08:19:45.000Z:
This issue has been open for 2 years now, what (if any) are the plans for support for custom serializers?
fuadmalik said, at 2011-10-26T17:02:40.000Z:
We have made some changes to support custom serialization. With some tweaks I was even able to plug in Protobuf. But we still need to work on and shape the API.
mehmetdoghan said, at 2011-11-22T08:14:37.000Z:
Issue 571 has been merged into this issue.
noctariushtc said, at 2011-12-07T15:33:04.000Z:
Just want to point at the possible patch posted in issue 571.