
Comments (4)

robacarp commented on May 18, 2024

@straight-shoota yep, and I agree. The serialization system is both redundant and complicated. You probably shouldn't be using the mosquito custom serializers. Ideally you shouldn't have to.

Most importantly, mosquito should not be used as a canonical data store. You shouldn't be storing full objects as parameters in your jobs -- whatever your real data store is, pass your objects' keys into a job instead of trying to serialize the entire object. The first task in your job should be to read any objects from the data store. Doing it this way accomplishes three things: it keeps the mosquito data footprint low, prevents stale objects (and thus decreases the probability of a distributed race condition), and maintains the single responsibility principle (let your ORM do the ORM-ing, and mosquito can handle the job scheduling). The current, unrefined method is left as such to be a kind of syntactic salt: the feature is there if you really want it, but you probably shouldn't be using it. It's not a hard rule, it's just the choice I made. There is certainly some iteration to be done even within that paradigm.
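In mosquito terms, the recommended shape looks roughly like this (a sketch, not canonical mosquito documentation; `User.find` stands in for whatever lookup your ORM provides):

require "mosquito"

class SendWelcomeEmail < Mosquito::QueuedJob
  # Store only the key, never the serialized User object.
  param user_id : Int64

  def perform
    # First task: read the object fresh from the canonical data store.
    user = User.find(user_id) # stand-in for your ORM's lookup
    # ... send the email using `user` ...
  end
end

# Enqueue with just the id:
SendWelcomeEmail.new(user_id: 42_i64).enqueue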

Secondly, as you mention, mosquito architecturally gets in the way here. The metadata and job parameters are currently stored in the same redis hash. If it's used with a simple set of parameters as recommended above, this isn't a big problem, but even the recently implemented throttling code exposes some of the ways this won't be a scalable paradigm. Storing json in redis instead of using redis's built-in hashes feels like reinventing an altogether different wheel, but I digress.

An early iteration of mosquito just serialized and stored all ivars, and it was fine, but it wasn't flexible. There are plenty of uses for ivars which are explicitly not serialized, and blacklisting is less intuitive than whitelisting.

Would you mind posting or linking to some job code? I'd love to see some of the ways it's being used and document specific situations where the current framework is getting in the way.


straight-shoota commented on May 18, 2024

Yes, of course, job arguments shouldn't store any state. But job configuration can include custom datatypes (such as enums or value objects) which are built from simple values but have no default serialization handlers. Setting up JSON serialization for such types is trivial and reusable.
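For example, a sketch in Crystal (hypothetical names; stdlib's JSON support does all the work):

require "json"

enum Priority
  Low
  High
end

struct MailConfig
  include JSON::Serializable

  getter priority : Priority # enums serialize out of the box
  getter address : String

  def initialize(@priority, @address)
  end
end

config = MailConfig.new(Priority::High, "user@example.com")
json = config.to_json
MailConfig.from_json(json) # round-trips the value object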

> Storing json in redis instead of using redis's built-in hashes feels like reinventing an altogether different wheel

That's actually a very common pattern and considered good practice, because JSON is much more flexible than redis hashes. Similar applications use this method; Sidekiq, for example, stores jobs as JSON.

Redis hashes are better when you need to access single hash keys, but that's not an issue for a job queue because it always reads and writes the entire job config. In fact, it might even be faster to use JSON serialization.

As an example, a serialized mosquito task in JSON could look like this:

{
  "enqueue_time": "...",
  "type": "...",
  "retry_count": "...",
  "job_data": {
    // serialized Job instance
  }
}

You wouldn't have to worry about how jobs are serialized. The Job type just needs to be able to serialize to JSON, which is usually trivial to accomplish with stdlib's tools.

Some examples are the jobs for shardsdb. They have a slim layer of indirection because I was experimenting with different job processors and didn't want to adapt the individual jobs to each API, so I put taskmaster in between. But for the job definition this is completely irrelevant. Taskmaster::Job only includes JSON::Serializable, and that makes it really dead simple. You don't need to specify anything in a job implementation -- no param declarations. It's just a plain Crystal type with an initializer receiving some ivars and automatic JSON serialization. ImportShard uses an ivar with a custom data type, but doesn't require any special serialization. It just works.
It's also very comfortable to write specs for these jobs: I simply use a mock adapter which collects all created jobs and can easily validate the job data (example). I don't need a redis server for this, or even to deserialize the jobs; JSON is just universally usable.
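A sketch of that pattern (illustrative names only, not the actual shardsdb or taskmaster code):

require "json"

# A small value object; JSON support is a single include away.
record Repo, resolver : String, url : String do
  include JSON::Serializable
end

# A job in the described style: no param declarations, just ivars,
# an initializer, and automatic JSON serialization.
struct ImportShard
  include JSON::Serializable

  getter repo : Repo

  def initialize(@repo)
  end

  def perform
    # fetch and import the repository described by `repo`
  end
end

job = ImportShard.new(Repo.new("github", "user/repo"))
json = job.to_json          # {"repo":{"resolver":"github","url":"user/repo"}}
ImportShard.from_json(json) # round-trips without a custom serializer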


robacarp commented on May 18, 2024

I think storing the job metadata as a json hash is a nonstarter... functions like hincrby are atomic when you use redis hashes, but the equivalent update on a serialized blob -- read, increment, and write the whole object back -- is wildly prone to distributed race conditions. Mosquito's viability as a distributed megaprocessor is still far from realization, but this would make it difficult to accomplish. For example, imagine sending 10k emails from a dozen servers while being forced to use a 3rd party API with a hard rate limit. Rate limiting is on my mind because it's the most recent feature addition, but the ability to ask redis to atomically alter a member of a hash is not worth compromising.
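To make the race concrete, a sketch (assuming the crystal-redis client; key and field names are illustrative):

require "redis"
require "json"

redis = Redis.new

# Hash storage: the increment happens inside redis and is atomic,
# even with a dozen workers racing on the same task.
redis.hincrby("mosquito:task:123", "retry_count", 1)

# JSON blob storage: every worker must read, mutate, and write back
# the whole object. Two racing workers can read the same retry_count,
# and one increment is silently lost.
blob = redis.get("mosquito:task:123:json").not_nil!
task = JSON.parse(blob).as_h
task["retry_count"] = JSON::Any.new(task["retry_count"].as_i64 + 1)
redis.set("mosquito:task:123:json", task.to_json)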

Regardless, for any user of the toolchain the job metadata storage mechanism should be irrelevant. Dividing metadata and parameters into separate places in redis is a prerequisite for that.

I agree that the task parameters should be more flexible and that code for serializing is outside the bounds of mosquito. I believe I wandered a bit too far during the implementation. Honestly, my immediate temptation is to just punt on serialization altogether and only accept string parameters, but some experimentation with the newer JSON stuff is worth a shot.


straight-shoota commented on May 18, 2024

Sure, job config and task metadata are independent of each other. Task metadata can be stored as a redis hash, and one such metadata value could simply be a JSON string containing the job config.
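A sketch of that hybrid layout (assuming the crystal-redis client; names are illustrative):

require "redis"
require "json"

struct SendEmail
  include JSON::Serializable

  getter user_id : Int64

  def initialize(@user_id)
  end
end

redis = Redis.new
key = "mosquito:task:123"

# Metadata stays a redis hash, so HINCRBY remains atomic...
redis.hset(key, "retry_count", "0")
redis.hincrby(key, "retry_count", 1)

# ...while the job config is a single JSON-encoded field within it.
redis.hset(key, "job_data", SendEmail.new(42_i64).to_json)
job = SendEmail.from_json(redis.hget(key, "job_data").not_nil!)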

As a side note, I don't really follow your argument for using redis hashes; maybe it's because I don't know your future plans. But Sidekiq uses JSON serialization, and that seems to work pretty well with thousands of job executions per second.

But whatever, metadata storage is an implementation detail (and easy to change in either direction, should the need arise). I care mostly about making job serialization easy for the user.

