The job is popped directly from the queue, so if the worker fails, the job is totally


worker failure causes job to be lost... about resque HOT 10 CLOSED

skippy commented on May 14, 2024

worker failure causes job to be lost...

from resque.

Comments (10)

defunkt commented on May 14, 2024

How does the worker die? Please provide an example.

Workers already place jobs into Redis keyed by their name. That is how you know which job a worker is processing.


defunkt commented on May 14, 2024

Keep in mind that Resque is explicitly designed to never re-try jobs. Ever, under any circumstance.

There is no way to know how much of a job has been processed when a systemic error occurs. Yes, you may know you started processing a job, but how far did you get? Perhaps the error occurred after you completed the job but before you could release it. Perhaps the error occurred immediately after you reserved the job. Perhaps the error occurred halfway through processing.

If you need jobs to never fail and never slip through the cracks due to failure you may want Kestrel, SQS, beanstalkd, or Delayed Job, all of which either reserve or retry jobs.


skippy commented on May 14, 2024

Thanks for the response. I agree that a job shouldn't be retried. That is fair.

As for use cases:

examples/god/stale.god will cause jobs to be lost.
I had Mongo lock up and drop a connection, which caused the worker to stall. The child was then cleaned up by god, and the result was a lost job.

Would you be open to accepting a patch that puts jobs whose workers have died into the error queue?

Thanks defunkt,
Adam


dibyajyoti commented on May 14, 2024

Hello skippy

I'm facing exactly the scenario you've described.

In my case, the retry logic is part of the Rails application that uses Resque. It would be very helpful to have a dedicated queue where partially processed work chunks can be placed, similar to the DLQ (dead letter queue) in Apache ActiveMQ.

Another dedicated worker can actually look into this error queue and reschedule the chunks.
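The error-queue idea above can be sketched in a few lines. This is only an illustration, assuming a hash that maps each worker name to the job it is holding (similar in spirit to what defunkt describes) and plain Ruby arrays standing in for queues. The names `sweep_dead_workers` and `reschedule` are hypothetical and are not part of Resque.

```ruby
# Hypothetical sketch, not Resque's API.
# `working` maps worker name => job payload (assumed layout).
# `alive` is the list of worker names known to still be running.
def sweep_dead_workers(working, alive, error_queue)
  dead = working.keys - alive
  dead.each do |name|
    # Record which worker held the job so a human or another worker can inspect it.
    error_queue << { "worker" => name, "payload" => working.delete(name) }
  end
  dead
end

# A dedicated worker could later drain the error queue back onto the main queue.
# Returns the number of jobs rescheduled.
def reschedule(error_queue, main_queue)
  count = 0
  until error_queue.empty?
    main_queue << error_queue.shift["payload"]
    count += 1
  end
  count
end
```

Note that blind rescheduling runs into the objection raised earlier in the thread: there is no way to know how far a job got before the worker died, so this is only safe for jobs that can tolerate being run twice.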

@defunkt: would you consider implementing this error queue feature in Resque?

Another feature that would be really useful is some fault tolerance built into the workers, since they exit the moment the Redis server stops. In a networked setup, ensuring connectivity 100% of the time is difficult if not impossible, and I face network outages very frequently.

Thanks All.
dg


trungpham commented on May 14, 2024

So what was the decision with this issue?

I really want fault-tolerant behavior. It should behave somewhat like Amazon SQS:

First, temporarily pop the job out of the queue.
Process the work.
Then permanently remove the job from the queue.

Never depend on the worker to requeue the job because anything can go wrong with it.

I believe we can do everything on the Redis server.

Instead of popping the job, we should atomically move the job from one main job queue to a temporary job queue. So after the worker is done with the job, it can remove the job from the temporary queue.
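A minimal sketch of that move-then-remove flow follows. To keep it runnable without a server, it uses a small in-memory stand-in (`FakeRedis`, hypothetical) that mimics three Redis list commands: LPUSH, RPOPLPUSH, and a simplified LREM that only handles a positive count. The `process_next` helper and the queue names are also assumptions made for illustration, not Resque code.

```ruby
# In-memory stand-in for a few Redis list commands (hypothetical, single-process only).
class FakeRedis
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  # LPUSH: add a value at the head of the list, return the new length.
  def lpush(key, value)
    @lists[key].unshift(value)
    @lists[key].length
  end

  # RPOPLPUSH: pop the tail of `source` and push it onto the head of `destination`.
  # In real Redis this happens as one atomic step.
  def rpoplpush(source, destination)
    item = @lists[source].pop
    @lists[destination].unshift(item) unless item.nil?
    item
  end

  # LREM (simplified): remove up to `count` occurrences of `value`, scanning from the head.
  def lrem(key, count, value)
    removed = 0
    @lists[key].reject! do |candidate|
      hit = candidate == value && removed < count
      removed += 1 if hit
      hit
    end
    removed
  end

  # Helper for inspection (not a Redis command).
  def items(key)
    @lists[key].dup
  end
end

# Move one job to the in-progress list, run the block, then remove the job.
# If the block raises (or the process dies), the job stays in the in-progress list.
def process_next(redis, queue, in_progress)
  job = redis.rpoplpush(queue, in_progress)
  return nil if job.nil?

  yield job
  redis.lrem(in_progress, 1, job)
  job
end
```

With this shape, a job whose worker fails mid-run is never lost: it remains in the in-progress list, where something else has to decide what to do with it.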

What do you think of the suggested implementation? Or do you think fault tolerance is not that important?


defunkt commented on May 14, 2024

I'd love for the ability to handle Redis server failures.

If Redis supported the ability to atomically move an item from a list to a single key, that would be useful as it matches Resque's current design.


trungpham commented on May 14, 2024

here you go.

for list:
http://code.google.com/p/redis/wiki/RpoplpushCommand

for set:
http://code.google.com/p/redis/wiki/SmoveCommand


trungpham commented on May 14, 2024

@defunkt
I don't think we can do anything about Redis server failure.
This only protects against worker failure. In my opinion, workers will fail far more often than the Redis server, simply because there are many more of them.

If we move the in-progress job to a temporary list, we gain another UI feature: showing the user a list of in-progress jobs. :)


trungpham commented on May 14, 2024

Then there's another problem you have to solve.

If the Redis server goes down and the worker then completes the job, the worker has no way to remove the job from the temporary list. What do we do in this case?

Blindly assume all the jobs completed successfully and wipe out the temporary list when redis comes back online? Or tell the worker to enter a retry loop until it can connect to redis again?
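The second option, a retry loop, might look roughly like the sketch below. It is an assumption-laden illustration: `with_retries` is a hypothetical helper, the delay doubles after each failed attempt, and the `sleeper` parameter exists only so the waiting can be swapped out. The block stands for whatever call removes the finished job from the temporary list.

```ruby
# Hypothetical helper: run the block until it succeeds or attempts run out.
# Waits base_delay, then 2x, then 4x, ... between attempts.
def with_retries(max_attempts:, base_delay:, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue StandardError
    # Give up and re-raise the last error once the attempt budget is spent.
    raise if attempt >= max_attempts

    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end
```

A bounded loop like this still leaves the original question open: if the budget runs out, the job is finished but remains in the temporary list, so something else has to reconcile it.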


hackhowtofaq commented on May 14, 2024

Why is this closed?

Did you find a solution to the problem?

Is there a best practice for this?
