lantins / resque-retry
A resque plugin; provides retry, delay and exponential backoff support for resque jobs.
License: MIT License
Hello,
I am using resque-retry to manage retries for some Tasks. When a Task raises an exception, it is retried as expected. But then I tried killing the worker in the middle of the work, and the Task was not retried.
I was wondering if there is any way to retry a Job in that case. If a Worker dies while working on a Task, or there is a connectivity problem between the Worker and the DB, the Task should be retried.
Does that sound feasible?
Thanks
I included resque-retry, started resque-scheduler, and extended ExponentialBackoff:
class PersistMessageBodyJob
  extend Resque::Plugins::ExponentialBackoff
  @queue = :persist_message_bodies

  def self.perform(mail_hash)
    # my code
  end
end
But none of my failures seem to retry. How would I know if they did?
Also, I do not see the new tabs within resque-web for resque-scheduler. Am I missing something?
thx.
The following is an excerpt from the README:
require 'resque-retry'
require 'resque-retry/server'
# require your jobs & application code.
run Resque::Server.new
I'm using this within a Rails 3 app, and when I go to the Retry tab it's empty, even though there are failed jobs listed.
The commented line above, # require your jobs & application code, doesn't make any sense to me.
How and where do I place that in a Rails app?
And what might be the reason that I never see any jobs in the Retry tab?
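In a Rails app, that comment usually means the Rails environment must be booted before mounting the server, so the job classes are loaded. A minimal sketch of a config.ru, assuming a standard Rails layout (the path is illustrative):

```ruby
# config.ru -- sketch, assuming a standard Rails app layout.
# Booting config/environment loads Rails and therefore your job classes.
require ::File.expand_path('../config/environment', __FILE__)

require 'resque-retry'
require 'resque-retry/server'

run Resque::Server.new
```

As far as I can tell, the Retry tab only lists jobs the plugin has actually scheduled for a retry, so it also needs resque-scheduler running and the retry-aware failure backend configured; a plain failed job never appears there.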
Trying out resque-retry. Bundled the gem (0.2.2) and triggered a test job from the console. Even though the job is supposed to retry three times, it seems to retry infinitely.
Job:
class NewTestJob
  extend Resque::Plugins::Retry
  @queue = "low"
  @retry_limit = 3
  @retry_delay = 3

  def self.perform(text)
    raise "#{Time.now} - #{text}"
  end
end
Triggered once like so:
Resque.enqueue NewTestJob, "banana"
And it fails and fails and fails forever: http://cl.ly/1g3v1f0q3y0T3w3q2j0g
That is without the resque-retry failure backend. Using the backend makes no difference.
We use edge resque-scheduler, resque-lock and resque-cleaner, if that has any bearing.
Any insight?
It would be pretty nice if, when an exception finally does get raised, the message that appears in redis mentioned something about the retry behaviour of the job. Somewhere in the message/exception, include something like:
Retried X times since Y
-or-
Retried at A, B, C.
It'd be nice if a job hierarchy could be established with a default resque-retry configuration specified on the base class. Currently resque-retry configuration does not cascade down to subclasses, requiring each class to specify all configuration in full.
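A minimal sketch of one possible workaround in plain Ruby (no resque required): copy the class-level configuration variables down in an `inherited` hook. The variable name @retry_limit matches the plugin's convention, but the hook itself is my assumption, not a plugin feature:

```ruby
# Class-level instance variables like @retry_limit live on the class object
# itself, so subclasses do not see them. One workaround: copy them when a
# subclass is defined.
class BaseJob
  @retry_limit = 3

  def self.inherited(subclass)
    super
    subclass.instance_variable_set(:@retry_limit, instance_variable_get(:@retry_limit))
  end
end

class ChildJob < BaseJob; end

ChildJob.instance_variable_get(:@retry_limit) # => 3 (nil without the hook)
```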
I noticed a few exponential backoff jobs with a retry count of 0 scheduled to retry in a few minutes. My understanding was that exponential backoff jobs retry right away on the first failure, so I should only see jobs in the delayed resque-scheduler tab with a retry count of >= 1.
thx.
-karl
Hi,
I updated my environment to resque 1.21.0, resque-scheduler 2.0 and resque-retry 1.0.0.a.
Now the Exception/Backtrace columns show this message: "n/a - not using resque-retry failure backend"
But checking redis, the failure* keys are present there.
In this code, if there is a SystemCallError, it will only retry once. That is because a retry_limit of 1 is still respected.
class DeliverSMS
  extend Resque::Plugins::Retry
  @queue = :mt_messages
  @retry_exceptions = { NetworkError => 30, SystemCallError => [120, 240] }

  def self.perform(mt_id, mobile_number, message)
    heavy_lifting
  end
end
We had many failed jobs; after clearing them, all the retry keys were still in redis. After a certain number of keys (10,000 or more) the web interface becomes unusable. The keys should be cleared automatically, or with a button if possible.
I really cannot be sure which project this bug relates to, but I'll start here.
We have a class which uses resque-retry with the ExponentialBackoff plugin and the MultipleWithRetrySuppression failure backend in the following way:
require 'resque-retry'
require 'resque/failure/redis'

class WorkerClass
  extend Resque::Plugins::ExponentialBackoff

  Resque::Failure::MultipleWithRetrySuppression.classes = [Resque::Failure::Redis]
  Resque::Failure.backend = Resque::Failure::MultipleWithRetrySuppression

  @queue = :foobar

  # Retry strategy: 2m 8m 24m ~1h ~2h ~6h ~14h ~1.5d ~3d ~7d
  @backoff_strategy = [120, 480, 1440, 3840, 9600, 23040, 53760, 122880, 276480, 614400]

  def self.perform
    # fast pinging here
  end
end
We have a modified Resque where we don't fork at all and we run the workers in the same process using Ruby fibers for concurrency. For Redis we use em-redis library, set in Resque.redis initialization.
The code processes a huge number of jobs every day, and if a job fails, we want to retry it. Sometimes the retrying works as it should (with the correct timespans), but randomly it just doesn't retry at all or stops retrying too early. There are also cases where we have 20-30 retries for a single job.
The worker's job is to ping 3rd party servers storing the response. If the response is not successful, our ensure block raises an exception and stores the result, so in a perfect world the retry plugin would catch this and delay a new job.
I have a feeling it might also relate to the weird exception handling in the fibers and EventMachine...
Is this supposed to happen: http://cl.ly/3adf0748c51b1e6f3038 -- "Showing 0 to 20 of 4 jobs" on the Failed Jobs tab? Should resque-retry try to suppress incrementing the failed job count until the job actually fails?
I had forgotten to put require 'resque_scheduler' in my app, and I didn't realize why I was getting undefined method 'enqueue_in' for the Resque client... I think that there should probably be an explicit require 'resque_scheduler' in lib/resque-retry.rb, as I've done in this commit:
http://github.com/agibralter/resque-retry/commit/27d8186b25ed4674e990fc5ca790186bb69427eb
I'm trying to use
Resque::Failure::MultipleWithRetrySuppression.classes = [Resque::Failure::Redis]
Resque::Failure.backend = Resque::Failure::MultipleWithRetrySuppression
to suppress retries showing up as failures in resque web. However, I'm still seeing failures in resque web.
My configuration is:
class DelayError < StandardError; end

@retry_limit = 1
@retry_delay = 30
@retry_exceptions = [DelayError]
When I raise DelayError, I get a failed job entry in resque web with Exception of type DelayError. Should this be getting suppressed with the above configuration ?
Hi,
I set @retry_limit = 3 and @retry_delay = 5 and wrote failing code in the perform method, so I expect it to retry the job 3 times. But it is not retrying the job on failure at all.
Once I remove the @retry_delay = 5 line from the code, it works fine.
require 'resque-retry'

class WordAnalyzer
  extend Resque::Plugins::Retry
  @queue = "word_analysis"
  @retry_limit = 3
  @retry_delay = 5

  def self.perform(word)
    puts "About to do heavy duty analysis on #{word}"
    jobProcessing # added to fail the job.
    # this would be something impressive
    puts "Finished with analysis on #{word}"
  end
end
I am using resque 1.17.1 and resque-retry 0.2.2.
resque-retry stores a copy of the stacktrace on all failures it's retrying. In hindsight, this makes a lot of sense. But I've also watched it take down two different clusters. In both cases, something went awry, exceptions piled up, redis filled up with keys from resque-retry, redis ran out of memory, redis restarted loading an old DB, those same jobs started processing again, and the whole process was repeated.
For a small hit in CPU, the stacktraces can be compressed pretty well using zlib and conserve RAM for redis.
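A rough sketch of the idea in plain Ruby, using only the standard library (the backtrace content is illustrative):

```ruby
require 'zlib'

# A repetitive backtrace, like the ones piling up in redis.
backtrace = (["app/jobs/worker.rb:42:in `perform'"] * 50).join("\n")

compressed = Zlib::Deflate.deflate(backtrace)   # binary string, storable in redis
restored   = Zlib::Inflate.inflate(compressed)  # round-trip for resque-web display

restored == backtrace                    # => true
compressed.bytesize < backtrace.bytesize # => true for repetitive traces
```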
Fill in any missing yardoc comments.
The main offending file is multiple_with_retry_suppression.rb.
Also need to document the new @retry_job_delegate setting.
Hello @lantins,
When we retry a job and it succeeds why is it still under Failed jobs?
Is there a way to make it go away from the Failed jobs list?
Do we need to use any plugins for that?
Thanks,
Caglar
I want to take a specific action in my code depending on the current retry_count.
Are there any examples of how to read the current retry_count and do A instead of B based on where it's at?
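Recent resque-retry versions keep the current attempt number in a class-level counter (exposed as retry_attempt; verify against the version you run). A plain-Ruby stand-in sketching the branch, with no resque dependency:

```ruby
class AnalysisJob
  class << self
    attr_accessor :retry_attempt # stand-in for the plugin's counter
  end
  @retry_attempt = 0

  def self.perform(payload)
    if retry_attempt.to_i.zero?
      "first attempt: do A for #{payload}"
    else
      "retry ##{retry_attempt}: fall back to B for #{payload}"
    end
  end
end

AnalysisJob.perform("x")      # => "first attempt: do A for x"
AnalysisJob.retry_attempt = 2
AnalysisJob.perform("x")      # => "retry #2: fall back to B for x"
```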
Hi,
When using a backoff_strategy like @backoff_strategy = [0, 10, 20], when it reaches 10 it sets the retry timestamp to 4 hours from now... I'm using rails 3.1.3.
Is there a way to get the current attempt it's on (3rd try, etc) in the .perform method?
Thanks!
I ran into this problem when deploying code while a worker running old code was still up. It tried to run my new job, whose class was (e.g.) MyNewJobClass, but MyNewJobClass didn't exist in the old code, so it threw a NameError. Normally, with Failure::Multiple (using Failure::Redis and Failure::Hoptoad), this would be logged to hoptoad and sent to the 'failed' queue. I verified this happens with Failure::Multiple. With MultipleWithRetrySuppression (using Redis and Hoptoad), however, the error is swallowed and the job is apparently dropped on the floor.
The reason for this is evident in the MultipleWithRetrySuppression code:
def klass
  constantize(payload['class'])
end

def retryable?
  klass.respond_to?(:redis_retry_key)
end
retryable? is run when trying to handle the error. However, retryable? calls klass, which in this case throws an error, since constantize(payload['class']) will fail (this NameError was the original NameError that caused the job to fail!). This error is not handled, and so will probably crash the worker thread... but evidently at this point the job has already been counted in redis as complete (I'm not sure how this works).
An easy way to reproduce this is, in a project which uses MultipleRetryWithSuppression, to enqueue a job for your workers to consume, but with a class that doesn't exist:
class BogusJobClass
  def self.queue
    :normal # some queue that your workers will consume
  end
end
Resque.enqueue(BogusJobClass, 1, 2, 3)
(I originally made a typo in this example, that is fixed now)
Your job will simply disappear! Try it again with your workers using Resque::Failure::Multiple and it won't swallow the error.
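One possible fix (a sketch, not the project's actual patch): treat an unresolvable class as non-retryable by rescuing the NameError inside retryable?. The stand-in below uses Object.const_get in place of constantize so it runs without resque:

```ruby
class FailureSketch
  def initialize(payload)
    @payload = payload
  end

  def klass
    Object.const_get(@payload['class']) # stand-in for constantize
  end

  def retryable?
    klass.respond_to?(:redis_retry_key)
  rescue NameError
    false # class no longer exists (old worker, new job) -> not retryable
  end
end

# A class that looks like a resque-retry job:
class RetryableJob
  def self.redis_retry_key(*); end
end

FailureSketch.new('class' => 'RetryableJob').retryable?   # => true
FailureSketch.new('class' => 'NoSuchJobClass').retryable? # => false, no crash
```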
I can fix this when I come back from vacation in a week.
Try the advice from @bleything... try using each feature of resque-retry 'from fresh'.
Rewrite the readme so it's easier to follow and more complete (covering exactly what you need to get running!).
As per our discussion in a separate issue, one job that is retried several times looks like several different jobs failing instead of just one. Coming from the delayed job world, when a job is retried, a retry count is incremented instead of a new job being created. Would it be possible to emulate this behavior with resque-retry? Perhaps a count could be included within the resque-web view? In addition, when the job is retried, would it be possible to remove it from the failed set within redis?
thx.
-karl
(weshop)
I found this behaviour in my production app and was able to replicate it in the example/demo app. I've cloned it to https://github.com/hughkelsey/resque-retry-demo.git and have added a gemfile, I'm on ruby 1.9.3.
The app works as expected, but when I comment out retry_limit and retry_delay and add backoff_strategy, the FailingWithRetryJob just fails and does not retry. Could I ask you to give it a try?
Thanks for the gem, great work.
We're using #inherited to pull through instance variables, like the retry criteria checks.
http://github.com/lantins/resque-retry/blob/testing/lib/resque/plugins/retry.rb#L38
Need to make sure the call to #super works as expected.
Hi,
I recently set up resque-retry on one job. After enqueuing it once and having it fail, it retries after the time period; then, if it fails again, it retries again. It keeps going recursively, not respecting the retry_limit.
I tried with version 0.2.1 of the gem. I noticed this after two jobs that were set to always fail created 120 'schedules'.
When clicking the remove button on a job in the Retry tab, the corresponding job in the Delayed tab hangs around in dummy form until the scheduled time when it eventually disappears. The job count shows 0, the class is a link saying "see details" and there are no args. It's not a problem per se but it makes the interface a little confusing. I am using current master (commit e6e57b8) with resque 1.20.0 and resque-scheduler 2.0.0.
Are extend Resque::Plugins::Retry and extend Resque::Plugins::ExponentialBackoff mutually exclusive extends? i.e., if I include one, should I not include the other? Or does it matter?
Hi,
Is there a function that gets called on complete failure (after all the retry attempts), or a variable to access the number of retries inside perform?
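Resque core calls every class method named on_failure_<suffix> when perform raises, so a give-up log line can be built on that. The counter names @retry_attempt and @retry_limit below mirror the plugin's conventions, but this is a plain-Ruby stand-in; verify the exact names against your resque-retry version:

```ruby
class ReportJob
  @retry_limit   = 3
  @retry_attempt = 3 # pretend the last permitted attempt just failed

  # Resque would call this hook automatically on failure; here we call it by hand.
  def self.on_failure_report_give_up(exception, *_args)
    if @retry_attempt.to_i >= @retry_limit
      "giving up after #{@retry_limit} attempts: #{exception.message}"
    end
  end
end

ReportJob.on_failure_report_give_up(RuntimeError.new("boom"))
# => "giving up after 3 attempts: boom"
```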
Using resque-retry and the web interface, when attempting to view the "Failed" jobs tab, I get the following error:
ArgumentError at /resque/failed
wrong number of arguments (4 for 0..2)
file: base.rb location: each line: 50
If I disable resque-retry, the Failed jobs tab works fine.
I feel I'm talking out of turn, but should we add gem 'resque-retry' to the Gemfile?
To clarify, should a step be added to the installation instructions which tells the user to add gem 'resque-retry' to their project's Gemfile?
I am using resque to send emails to a list of users. The failed jobs go to the failed queue, as expected, and resque-retry sends them again after a while. The problem happens when a job fails again: now I have two of each message in the failed queue. Is there a way to keep only unique jobs in the failed queue?
If I use extend Resque::Plugins::ExponentialBackoff, it won't show any backtrace, just:
n/a - not using resque-retry failure backend
I had to resort to the regular Resque::Failure::Multiple backend.
I noticed in your examples, you don't define @queue on the job. Isn't this needed?
thx
I think this is missing information; please update the README.
Add this in your Rakefile:
require 'resque_scheduler/tasks'
Then:
rake resque:scheduler
Now the jobs with delay_time > 0 will work.
I've run into an issue where resque-restriction and resque-retry both want to use the identifier method name. It looks like they actually pass different arguments, which is problematic to start with. But the bigger issue is that I want my restrictions to be applied at a different level than the retry logic. I.e., they should have different identifiers, and short of looking at the callstack, that's currently not possible. If the method names were namespaced (e.g., restriction_identifier, retry_identifier, throttle_identifier), it'd make multi-plugin interaction considerably better.
I'm getting the following Exception when I try to set the MultipleWithRetrySuppression backend:
/Library/Ruby/Gems/1.8/gems/resque-1.10.0/lib/resque.rb:22: Resque is not a module (TypeError)
from /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
from /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `require'
from /Library/Ruby/Gems/1.8/gems/activesupport-2.3.8/lib/active_support/dependencies.rb:158:in `require'
from /Library/Ruby/Gems/1.8/gems/resque-retry-0.1.0/lib/resque-retry.rb:1
from /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
from /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `require'
from /Library/Ruby/Gems/1.8/gems/activesupport-2.3.8/lib/active_support/dependencies.rb:158:in `require'
from /Users/me/apps/my_app/config/initializers/resque.rb:1
My initializer looks like this:
# config/initializers/resque.rb
require 'resque-retry'
require 'resque/failure/redis'
Resque::Failure::MultipleWithRetrySuppression.classes = [Resque::Failure::Redis]
Resque::Failure.backend = Resque::Failure::MultipleWithRetrySuppression
Running Resque 1.10.0 and Resque-Retry 0.1.0
Hey,
I'm not sure this is possible, but it sure would be nice.
Is there some way to support jobs that get enqueued via enqueue_to?
e.g.
require 'resque-retry'

class ExampleRetryJobThatHasNoQueue
  extend Resque::Plugins::Retry
  # @queue = :example_queue
  @retry_limit = 3
  @retry_delay = 60

  def self.perform(*args)
    # your magic/heavy lifting goes here.
  end
end

Resque.enqueue_to(:example_queue, ExampleRetryJobThatHasNoQueue)
What happened in my case was that the job was considered successful even though I had explicitly raised an exception in perform.
Thanks for the plugin, really useful!
Hi,
I'm using the newest versions of Resque, resque-scheduler and resque-retry. When my jobs fail they go to the failed queue in resque on every failed attempt, rendering my failed queue useless since the number of failed jobs is exploding.
Shouldn't the jobs being retried by resque-retry sit in a 'retry' queue, and only be added to the failed queue once, when a job has failed for the last time within the retry_limit?
Hello,
I just encountered problem with this gem:
undefined method `format_time' for #<Resque::Server:...>
My Settings:
resque (1.20.0)
resque-cleaner (0.2.9)
resque-retry (0.2.2)
resque-scheduler (2.0.0)
I think they stripped out format_time in the new resque.
I had to fix it by:
module Resque
  class Server
    def format_time(time)
      time.strftime("%H:%M:%S %d.%m.%Y")
    end
  end
end
Please look at it.
BTW, when I switched to the master version of resque-retry I got this error:
resque-1.20.0/lib/resque/tasks.rb:4:in `<top (required)>': undefined method `namespace' for main:Object (NoMethodError)
I am having an "uninitialized constant" error when trying to load the /retry page.
It looks like this is due to resque-retry trying to constantize the class names of my jobs in server.rb:10. However, my application models are running in a completely separate codebase, so I'm not sure what the intended logic is here.
Did I miss something in the resque-retry setup? Should it really need to constantize my application's model names?
I don't see why it wouldn't... can you relax the dependencies in the gemspec?
Hi there,
we've been using resque-retry for about two weeks now and it seems to work fine in production. But while developing we've encountered erratic behaviour: the workers won't work. We start a scheduler task, then a worker. In our jobs @retry_limit and @retry_schedule are set, but if we look in the resque-web backend the worker seems to have nothing to do. We're using rails 3.0.4 with resque 1.17.1, resque-retry 0.2.1 and resque-scheduler 1.9.9. Is there any known issue with this version combination?
And another question: is it possible to start workers without having the scheduler running? I mean, will the jobs get processed and put into the failure queue after one attempt, or will the worker do nothing?
I'm looking forward to hearing any ideas.
Thanks, Axel
Recently my bundle update has started to hang. Adding --verbose to bundler didn't give any useful information.
I modified my Gemfile, and found the line causing the problem: gem 'resque-retry', '~> 1.0.0.a'. The strange thing is it used to work.
When I use gem 'resque-retry', :git => "git://github.com/lantins/resque-retry.git" instead, bundle update works.
When I use gem install resque-retry --pre instead, it also correctly installs the gem.
Hey @lantins, awesome plugin. I have a question: does resque-retry provide some type of hook that is called to determine whether the Job failed after completing its retry strategy?
For example: I've configured 4 retries in my job. If the job is tried 4 times and fails every time, do I have a way to determine that the job never got executed? Let's say, for example, writing to the logs: "Hey, I've tried as many times as you requested, but none of the attempts succeeded."
I'm using extend Resque::Plugins::ExponentialBackoff with the default retry schedule and it works great normally, but it seems like Timeout::Error is not being retried. In other words, I'm quite sure the service was up during one of the 6 attempts over the 6 hour time window. And when I manually retry the job from the failed job queue, it works right away.
I remember there were some issues back in the day with ruby where certain exceptions would subclass Exception directly instead of StandardError, or something along those lines. But it seems like that is not a problem here:
irb(main):003:0> Timeout::Error.ancestors
=> [Timeout::Error, RuntimeError, StandardError, Exception, ActiveSupport::Dependencies::Blamable, Object, AttrEncrypted::InstanceMethods, ActiveSupport::Dependencies::Loadable, PP::ObjectMixin, Mongoid::Extensions::Object, Moped::BSON::Extensions::Object, Origin::Extensions::Object, JSON::Ext::Generator::GeneratorMethods::Object, Kernel, BasicObject]
irb(main):004:0>
Can you think of any reason Timeout::Error wouldn't be getting retried? Thanks!
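One hedged workaround to try: list the exception explicitly in @retry_exceptions, so the retry check cannot miss it (per the README, when @retry_exceptions is set only the listed exceptions are retried). The ancestry check below runs without resque:

```ruby
require 'timeout'

class PingJob
  # extend Resque::Plugins::ExponentialBackoff   # in the real job
  # Only these exceptions (and their subclasses) would be retried:
  @retry_exceptions = [Timeout::Error, StandardError]
end

# On modern Rubies, Timeout::Error is an ordinary StandardError descendant,
# so the old "subclasses Exception directly" pitfall does not apply:
Timeout::Error.ancestors.include?(StandardError) # => true
```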
It seems that MultipleWithRetrySuppression doesn't work well with jobs that have overridden #identifier to ignore some attributes.
class MyJob
  extend Resque::Plugins::Retry

  def self.identifier(key, *args)
    super(key)
  end
end
I didn't see tests for this use case, so I suppose the issue is not on my side.