lantins / resque-retry
A resque plugin; provides retry, delay and exponential backoff support for resque jobs.
License: MIT License
Hello,
I am using resque-retry to manage retries for some Tasks. When a Task raises an exception, it is retried as expected. But then I tried killing the worker in the middle of the work, and the Task was not retried.
I was wondering if there is any way to retry a Job in that case. If a Worker dies while working on a Task, or there is a connectivity problem between the Worker and the DB, the Task should be retried.
Does that sound feasible?
Thanks
I included resque-retry, started resque-scheduler, and extended ExponentialBackoff:
class PersistMessageBodyJob
  extend Resque::Plugins::ExponentialBackoff
  @queue = :persist_message_bodies

  def self.perform(mail_hash)
    # my code
  end
end
But none of my failures seem to retry. How would I know if they did?
Also, I do not see the new tabs within resque-web for resque-scheduler. Am I missing something?
thx.
The following is an excerpt from the README:
require 'resque-retry'
require 'resque-retry/server'
# require your jobs & application code.
run Resque::Server.new
I'm using this within a Rails 3 app, and when I go to the Retry tab it's empty, even though there are failed jobs listed.
The commented line above, # require your jobs & application code, doesn't make any sense to me.
How and where do I place that in a Rails app?
And what might be the reason that I never see any jobs in the Retry tab?
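In a Rails app, that comment usually means the Rails environment must be booted before mounting the server, so the job classes are loaded. A minimal sketch of a config.ru, assuming a standard Rails layout (the path is illustrative):

```ruby
# config.ru -- sketch, assuming a standard Rails app layout.
# Booting config/environment loads Rails and therefore your job classes.
require ::File.expand_path('../config/environment', __FILE__)

require 'resque-retry'
require 'resque-retry/server'

run Resque::Server.new
```

As far as I can tell, the Retry tab only lists jobs the plugin has actually scheduled for a retry, so it also needs resque-scheduler running and the retry-aware failure backend configured; a plain failed job never appears there.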
Trying out resque-retry. Bundled the gem (0.2.2) and triggered a test job from the console. Even though the job is supposed to retry three times, it seems to retry infinitely.
Job:
class NewTestJob
  extend Resque::Plugins::Retry
  @queue = "low"
  @retry_limit = 3
  @retry_delay = 3

  def self.perform(text)
    raise "#{Time.now} - #{text}"
  end
end
Triggered once like so:
Resque.enqueue NewTestJob, "banana"
And it fails and fails and fails forever: http://cl.ly/1g3v1f0q3y0T3w3q2j0g
That is without the resque-retry failure backend. Using the backend makes no difference.
We use edge resque-scheduler, resque-lock and resque-cleaner, if that has any bearing.
Any insight?
It would be pretty nice if, when an exception finally does get raised, the message that appears in redis mentioned something about the retry behaviour of the job. Somewhere in the message/exception, include something like:
Retried X times since Y
-or-
Retried at A, B, C.
It'd be nice if a job hierarchy could be established with a default resque-retry configuration specified on the base class. Currently resque-retry configuration does not cascade down to subclasses, requiring each class to specify all configuration in full.
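A minimal sketch of one possible workaround in plain Ruby (no resque required): copy the class-level configuration variables down in an `inherited` hook. The variable name @retry_limit matches the plugin's convention, but the hook itself is my assumption, not a plugin feature:

```ruby
# Class-level instance variables like @retry_limit live on the class object
# itself, so subclasses do not see them. One workaround: copy them when a
# subclass is defined.
class BaseJob
  @retry_limit = 3

  def self.inherited(subclass)
    super
    subclass.instance_variable_set(:@retry_limit, instance_variable_get(:@retry_limit))
  end
end

class ChildJob < BaseJob; end

ChildJob.instance_variable_get(:@retry_limit) # => 3 (nil without the hook)
```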
I noticed a few exponential backoff jobs with a retry count of 0 scheduled to retry in a few minutes. My understanding was that exponential backoff jobs retry right away on the first failure, so I should only see jobs in the delayed resque-scheduler tab with a retry count of >= 1.
thx.
-karl
Hi,
I updated my environment to resque 1.21.0, resque-scheduler 2.0 and resque-retry 1.0.0.a.
Now the Exception/Backtrace columns show this message: "n/a - not using resque-retry failure backend"
But checking redis, the failure* keys are present there.
In this code, if there is a SystemCallError, it will only retry once. That is because a retry_limit of 1 is still respected.
class DeliverSMS
  extend Resque::Plugins::Retry
  @queue = :mt_messages
  @retry_exceptions = { NetworkError => 30, SystemCallError => [120, 240] }

  def self.perform(mt_id, mobile_number, message)
    heavy_lifting
  end
end
We had many failed jobs; after clearing them, all the retry keys were still in redis. After a certain number of keys (10,000 or more) the web interface becomes unusable. The keys should be cleared automatically, or with a button if possible.
I really cannot be sure which project this bug relates to, but I'll start here.
We have a class which uses resque-retry with the ExponentialBackoff plugin and the MultipleWithRetrySuppression failure backend in the following way:
require 'resque-retry'
require 'resque/failure/redis'

class WorkerClass
  extend Resque::Plugins::ExponentialBackoff

  Resque::Failure::MultipleWithRetrySuppression.classes = [Resque::Failure::Redis]
  Resque::Failure.backend = Resque::Failure::MultipleWithRetrySuppression

  @queue = :foobar

  # Retry strategy: 2m 8m 24m ~1h ~2h ~6h ~14h ~1.5d ~3d ~7d
  @backoff_strategy = [120, 480, 1440, 3840, 9600, 23040, 53760, 122880, 276480, 614400]

  def self.perform
    # fast pinging here
  end
end
We have a modified Resque where we don't fork at all and we run the workers in the same process using Ruby fibers for concurrency. For Redis we use em-redis library, set in Resque.redis initialization.
The code processes a huge number of jobs every day, and if a job fails, we want to retry it. Sometimes the retrying works as it should (with the correct timespans), but randomly it just doesn't retry at all or stops retrying too early. There are also cases where we have 20-30 retries for a single job.
The worker's job is to ping 3rd party servers storing the response. If the response is not successful, our ensure block raises an exception and stores the result, so in a perfect world the retry plugin would catch this and delay a new job.
I have a feeling it might also relate to the weird exception handling in the fibers and EventMachine...
Is this supposed to happen: http://cl.ly/3adf0748c51b1e6f3038 -- "Showing 0 to 20 of 4 jobs" on the Failed Jobs tab? Should resque-retry try to suppress incrementing the failed job count until the job actually fails?
I had forgotten to put require 'resque_scheduler' in my app, and I didn't realize why I was getting undefined method 'enqueue_in' for the Resque client... I think that there should probably be an explicit require 'resque_scheduler' in lib/resque-retry.rb, as I've done in this commit:
http://github.com/agibralter/resque-retry/commit/27d8186b25ed4674e990fc5ca790186bb69427eb
I'm trying to use
Resque::Failure::MultipleWithRetrySuppression.classes = [Resque::Failure::Redis]
Resque::Failure.backend = Resque::Failure::MultipleWithRetrySuppression
to suppress retries showing up as failures in resque web. However, I'm still seeing failures in resque web.
My configuration is:
class DelayError < StandardError; end

@retry_limit = 1
@retry_delay = 30
@retry_exceptions = [DelayError]
When I raise DelayError, I get a failed job entry in resque web with Exception of type DelayError. Should this be getting suppressed with the above configuration ?
Hi,
I set @retry_limit = 3 and @retry_delay = 5 and wrote failing code in the perform method, so I expect it to retry the job 3 times. But it is not retrying the job on failure at all.
Once I remove the @retry_delay = 5 line from the code, it works fine.
require 'resque-retry'

class WordAnalyzer
  extend Resque::Plugins::Retry
  @queue = "word_analysis"
  @retry_limit = 3
  @retry_delay = 5

  def self.perform(word)
    puts "About to do heavy duty analysis on #{word}"
    jobProcessing # added to fail the job.
    # this would be something impressive
    puts "Finished with analysis on #{word}"
  end
end
I am using resque 1.17.1 and resque-retry 0.2.2.
resque-retry stores a copy of the stacktrace on all failures it's retrying. In hindsight, this makes a lot of sense. But I've also watched it take down two different clusters. In both cases, something went awry, exceptions piled up, redis filled up with keys from resque-retry, redis ran out of memory, redis restarted loading an old DB, those same jobs started processing again, and the whole process was repeated.
For a small hit in CPU, the stacktraces can be compressed pretty well using zlib and conserve RAM for redis.
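A rough sketch of the idea in plain Ruby, using only the standard library (the backtrace content is illustrative):

```ruby
require 'zlib'

# A repetitive backtrace, like the ones piling up in redis.
backtrace = (["app/jobs/worker.rb:42:in `perform'"] * 50).join("\n")

compressed = Zlib::Deflate.deflate(backtrace)   # binary string, storable in redis
restored   = Zlib::Inflate.inflate(compressed)  # round-trip for resque-web display

restored == backtrace                    # => true
compressed.bytesize < backtrace.bytesize # => true for repetitive traces
```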
Fill in any missing yardoc comments.
The main offending file is multiple_with_retry_suppression.rb.
Also need to document the new @retry_job_delegate setting.
Hello @lantins,
When we retry a job and it succeeds why is it still under Failed jobs?
Is there a way to make it go away from the Failed jobs list?
Do we need to use any plugins for that?
Thanks,
Caglar
I want to take a specific action in my code depending on the current retry_count.
Are there any examples of how to read the current retry_count and do A instead of B based on where it's at?
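Recent resque-retry versions keep the current attempt number in a class-level counter (exposed as retry_attempt; verify against the version you run). A plain-Ruby stand-in sketching the branch, with no resque dependency:

```ruby
class AnalysisJob
  class << self
    attr_accessor :retry_attempt # stand-in for the plugin's counter
  end
  @retry_attempt = 0

  def self.perform(payload)
    if retry_attempt.to_i.zero?
      "first attempt: do A for #{payload}"
    else
      "retry ##{retry_attempt}: fall back to B for #{payload}"
    end
  end
end

AnalysisJob.perform("x")      # => "first attempt: do A for x"
AnalysisJob.retry_attempt = 2
AnalysisJob.perform("x")      # => "retry #2: fall back to B for x"
```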
Hi,
When using a backoff_strategy like @backoff_strategy = [0, 10, 20], when it reaches 10 it sets the retry timestamp to 4 hours from now... I'm using rails 3.1.3.
Is there a way to get the current attempt it's on (3rd try, etc) in the .perform method?
Thanks!
I ran into this problem when deploying code while a worker running old code was still up. It tried to run my new job, whose class was (e.g.) MyNewJobClass, but MyNewJobClass didn't exist in the old code, so it threw a NameError. Normally, with Failure::Multiple (using Failure::Redis and Failure::Hoptoad), this would be logged to hoptoad and sent to the 'failed' queue. I verified this happens with Failure::Multiple. With MultipleWithRetrySuppression (using Redis and Hoptoad), however, the error is swallowed and the job is apparently dropped on the floor.
The reason for this is evident in the MultipleWithRetrySuppression code:
def klass
  constantize(payload['class'])
end

def retryable?
  klass.respond_to?(:redis_retry_key)
end
retryable? is run when trying to handle the error. However, retryable? calls klass, which in this case throws an error, since constantize(payload['class']) will fail (this NameError was the original NameError that caused the job to fail!). This error is not handled, and so will probably crash the worker thread... but evidently at this point the job has already been counted in redis as complete (I'm not sure how this works).
An easy way to reproduce this is, in a project which uses MultipleRetryWithSuppression, to enqueue a job for your workers to consume, but with a class that doesn't exist:
class BogusJobClass
  def self.queue
    :normal # some queue that your workers will consume
  end
end
Resque.enqueue(BogusJobClass, 1, 2, 3)
(I originally made a typo in this example, that is fixed now)
Your job will simply disappear! Try it again with your workers using Resque::Failure::Multiple and it won't swallow the error.
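One possible fix (a sketch, not the project's actual patch): treat an unresolvable class as non-retryable by rescuing the NameError inside retryable?. The stand-in below uses Object.const_get in place of constantize so it runs without resque:

```ruby
class FailureSketch
  def initialize(payload)
    @payload = payload
  end

  def klass
    Object.const_get(@payload['class']) # stand-in for constantize
  end

  def retryable?
    klass.respond_to?(:redis_retry_key)
  rescue NameError
    false # class no longer exists (old worker, new job) -> not retryable
  end
end

# A class that looks like a resque-retry job:
class RetryableJob
  def self.redis_retry_key(*); end
end

FailureSketch.new('class' => 'RetryableJob').retryable?   # => true
FailureSketch.new('class' => 'NoSuchJobClass').retryable? # => false, no crash
```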
I can fix this when I come back from vacation in a week.
Try the advice from @bleything... try using each feature of resque-retry 'from fresh'.
Rewrite the readme so it's easier to follow and more complete (covering exactly what you need to get running!).
As per our discussion in a separate issue, one job that is retried several times looks like several different jobs failing instead of just one. Coming from the delayed job world, when a job is retried, a retry count is incremented instead of a new job being created. Would it be possible to emulate this behavior with resque-retry? Perhaps a count could be included within the resque-web view? In addition, when the job is retried, would it be possible to remove it from the failed set within redis?
thx.
-karl
(weshop)
I found this behaviour in my production app and was able to replicate it in the example/demo app. I've cloned it to https://github.com/hughkelsey/resque-retry-demo.git and have added a gemfile, I'm on ruby 1.9.3.
The app works as expected, but when I comment out retry_limit and retry_delay and add backoff_strategy, the FailingWithRetryJob just fails and does not retry. Could I ask you to give it a try?
Thanks for the gem, great work.
We're using #inherited to pull through instance variables, like the retry criteria checks.
http://github.com/lantins/resque-retry/blob/testing/lib/resque/plugins/retry.rb#L38
Need to make sure the call to #super works as expected.
Hi,
I recently set up resque-retry on one job. After enqueuing it once and having it fail, it retries after the time period; then, if it fails again, it retries again. It keeps going recursively, not respecting the retry_limit.
I tried with version 0.2.1 of the gem. I noticed this after two jobs that were set to always fail created 120 'schedules'.
When clicking the remove button on a job in the Retry tab, the corresponding job in the Delayed tab hangs around in dummy form until the scheduled time when it eventually disappears. The job count shows 0, the class is a link saying "see details" and there are no args. It's not a problem per se but it makes the interface a little confusing. I am using current master (commit e6e57b8) with resque 1.20.0 and resque-scheduler 2.0.0.
Are extend Resque::Plugins::Retry and extend Resque::Plugins::ExponentialBackoff mutually exclusive extends? i.e., if I include one, should I not include the other? Or does it matter?
Hi,
Is there a function that gets called on complete failure (after all the retry attempts), or a variable to access the number of retries inside perform?
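Resque core calls every class method named on_failure_<suffix> when perform raises, so a give-up log line can be built on that. The counter names @retry_attempt and @retry_limit below mirror the plugin's conventions, but this is a plain-Ruby stand-in; verify the exact names against your resque-retry version:

```ruby
class ReportJob
  @retry_limit   = 3
  @retry_attempt = 3 # pretend the last permitted attempt just failed

  # Resque would call this hook automatically on failure; here we call it by hand.
  def self.on_failure_report_give_up(exception, *_args)
    if @retry_attempt.to_i >= @retry_limit
      "giving up after #{@retry_limit} attempts: #{exception.message}"
    end
  end
end

ReportJob.on_failure_report_give_up(RuntimeError.new("boom"))
# => "giving up after 3 attempts: boom"
```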
Using resque-retry and the web interface, when attempting to view the "Failed" jobs tab, I get the following error:
ArgumentError at /resque/failed
wrong number of arguments (4 for 0..2)
file: base.rb location: each line: 50
If I disable resque-retry, the Failed jobs tab works fine.
I feel I'm talking out of turn, but should we add gem 'resque-retry' to the Gemfile?
To clarify, should a step be added to the installation instructions which tells the user to add gem 'resque-retry' to their project's Gemfile?
I am using resque to send emails to a list of users. The failed jobs go to the failed queue, as expected, and resque-retry sends them again after a while. The problem happens when a job fails again: now I have two of each message in the failed queue. Is there a way to keep only unique jobs in the failed queue?
If I use extend Resque::Plugins::ExponentialBackoff, it won't show any backtrace, just:
n/a - not using resque-retry failure backend
I had to resort to the regular Resque::Failure::Multiple backend.
I noticed in your examples, you don't define @queue on the job. Isn't this needed?
thx
I think this is missing information; please update the README.
Add this in your Rakefile:
require 'resque_scheduler/tasks'
Then:
rake resque:scheduler
Now the jobs with delay_time > 0 will work.
I've run into an issue where resque-restriction and resque-retry both want to use the identifier method name. It looks like they actually pass different arguments, which is problematic to start with. But the bigger issue is that I want my restrictions to be applied at a different level than the retry logic. I.e., they should have different identifiers, and short of looking at the callstack, that's currently not possible. If the method names were namespaced (e.g., restriction_identifier, retry_identifier, throttle_identifier), it'd make multi-plugin interaction considerably better.
I'm getting the following Exception when I try to set the MultipleWithRetrySuppression backend:
/Library/Ruby/Gems/1.8/gems/resque-1.10.0/lib/resque.rb:22: Resque is not a module (TypeError)
from /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
from /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `require'
from /Library/Ruby/Gems/1.8/gems/activesupport-2.3.8/lib/active_support/dependencies.rb:158:in `require'
from /Library/Ruby/Gems/1.8/gems/resque-retry-0.1.0/lib/resque-retry.rb:1
from /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
from /Library/Ruby/Site/1.8/rubygems/custom_require.rb:31:in `require'
from /Library/Ruby/Gems/1.8/gems/activesupport-2.3.8/lib/active_support/dependencies.rb:158:in `require'
from /Users/me/apps/my_app/config/initializers/resque.rb:1
My initializer looks like this:
# config/initializers/resque.rb
require 'resque-retry'
require 'resque/failure/redis'
Resque::Failure::MultipleWithRetrySuppression.classes = [Resque::Failure::Redis]
Resque::Failure.backend = Resque::Failure::MultipleWithRetrySuppression
Running Resque 1.10.0 and Resque-Retry 0.1.0
Hey,
I'm not sure this is possible, but it sure would be nice.
Is there some way to support jobs that get enqueued via enqueue_to?
e.g.
require 'resque-retry'

class ExampleRetryJobThatHasNoQueue
  extend Resque::Plugins::Retry
  # @queue = :example_queue
  @retry_limit = 3
  @retry_delay = 60

  def self.perform(*args)
    # your magic/heavy lifting goes here.
  end
end

Resque.enqueue_to(:example_queue, ExampleRetryJobThatHasNoQueue)
What happened in my case was that the job was considered successful even though I had explicitly raised an exception in perform.
Thanks for the plugin, really useful!
Hi,
I'm using the newest versions of Resque, resque-scheduler and resque-retry. When my jobs fail they go to the failed queue in resque on every failed attempt, rendering my failed queue useless since the number of failed jobs is exploding.
Shouldn't the jobs being retried by resque-retry sit in a 'retry' queue, and only be added to the failed queue once, when a job has failed for the last time within the retry_limit?
Hello,
I just encountered problem with this gem:
undefined method `format_time' for #<Resque::Server:...>
My Settings:
resque (1.20.0)
resque-cleaner (0.2.9)
resque-retry (0.2.2)
resque-scheduler (2.0.0)
I think they stripped out format_time in the new resque.
I had to fix it by:
module Resque
  class Server
    def format_time(time)
      time.strftime("%H:%M:%S %d.%m.%Y")
    end
  end
end
Please look at it.
BTW, when I switched to the master version of resque-retry I got this error:
resque-1.20.0/lib/resque/tasks.rb:4:in `<top (required)>': undefined method `namespace' for main:Object (NoMethodError)
I am having an "uninitialized constant" error when trying to load the /retry page.
It looks like this is due to resque-retry trying to constantize the class names of my jobs in server.rb:10. However, my application models are running in a completely separate codebase, so I'm not sure what the intended logic is here.
Did I miss something in the resque-retry setup? Should it really need to constantize my application's model names?
I don't see why it wouldn't... can you relax the dependencies in the gemspec?
Hi there,
we've been using resque-retry for about two weeks now and it seems to work fine in production. But while developing we've encountered erratic behaviour: the workers won't work. We start a scheduler task, then a worker. In our jobs @retry_limit and @retry_schedule are set, but if we look in the resque-web backend the worker seems to have nothing to do. We're using rails 3.0.4 with resque 1.17.1, resque-retry 0.2.1 and resque-scheduler 1.9.9. Is there any known issue with this version combination?
And another question: is it possible to start workers without having the scheduler running? I mean, will the jobs get processed and put into the failure queue after one attempt, or will the worker do nothing?
I'm looking forward to hearing any ideas.
Thanks, Axel
Recently my bundle update has started to hang. Adding --verbose to bundler didn't give any useful information.
I modified my Gemfile, and found the line causing the problem: gem 'resque-retry', '~> 1.0.0.a'. The strange thing is it used to work.
When I use gem 'resque-retry', :git => "git://github.com/lantins/resque-retry.git" instead, bundle update works.
When I use gem install resque-retry --pre instead, it also correctly installs the gem.
Hey @lantins, awesome plugin. I have a question: does resque-retry provide some type of hook that is called to determine whether the Job failed after completing its retry strategy?
For example: I've configured 4 retries in my job. If the job is tried 4 times and fails every time, do I have a way to determine that the job never got executed? Let's say, for example, writing to the logs: "Hey, I've tried as many times as you requested, but none of the attempts succeeded."
I'm using extend Resque::Plugins::ExponentialBackoff with the default retry schedule and it works great normally, but it seems like Timeout::Error is not being retried. In other words, I'm quite sure the service was up during one of the 6 attempts over the 6 hour time window. And when I manually retry the job from the failed job queue, it works right away.
I remember there were some issues back in the day with ruby where certain exceptions would subclass Exception directly instead of StandardError, or something along those lines. But it seems like that is not a problem here:
irb(main):003:0> Timeout::Error.ancestors
=> [Timeout::Error, RuntimeError, StandardError, Exception, ActiveSupport::Dependencies::Blamable, Object, AttrEncrypted::InstanceMethods, ActiveSupport::Dependencies::Loadable, PP::ObjectMixin, Mongoid::Extensions::Object, Moped::BSON::Extensions::Object, Origin::Extensions::Object, JSON::Ext::Generator::GeneratorMethods::Object, Kernel, BasicObject]
irb(main):004:0>
Can you think of any reason Timeout::Error wouldn't be getting retried? Thanks!
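One hedged workaround to try: list the exception explicitly in @retry_exceptions, so the retry check cannot miss it (per the README, when @retry_exceptions is set only the listed exceptions are retried). The ancestry check below runs without resque:

```ruby
require 'timeout'

class PingJob
  # extend Resque::Plugins::ExponentialBackoff   # in the real job
  # Only these exceptions (and their subclasses) would be retried:
  @retry_exceptions = [Timeout::Error, StandardError]
end

# On modern Rubies, Timeout::Error is an ordinary StandardError descendant,
# so the old "subclasses Exception directly" pitfall does not apply:
Timeout::Error.ancestors.include?(StandardError) # => true
```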
It seems that MultipleWithRetrySuppression doesn't work well with jobs that have overridden #identifier to ignore some attributes.
class MyJob
  extend Resque::Plugins::Retry

  def self.identifier(key, *args)
    super(key)
  end
end
I didn't see tests for this use case, so I suppose the issue is not on my side.