Comments (18)

defunkt avatar defunkt commented on May 14, 2024

When are you running into this problem? When your Redis server crashes / shuts down?

from resque.

dibyajyoti avatar dibyajyoti commented on May 14, 2024

I've seen the Redis server (1.0.2) crash during async disk writes a couple of times. Strangely enough, after each write the Redis server dropped all active client connections. I'm now using Redis 1.2.1 and haven't hit this issue since.

My point in this post was that we could have something similar to Apache ActiveMQ clients, which don't exit even if the messaging server is unreachable from the client(s) for extended periods (i.e. ActiveMQ has crashed for some reason, or there is a temporary network disconnection).

If we can have that kind of feature, Resque workers will be much more robust.

defunkt avatar defunkt commented on May 14, 2024

I completely agree.

I'm only asking where you specifically saw an issue so I can start duplicating the issue locally and working on the crash resistance stuff. Resque should definitely live when Redis dies.

dibyajyoti avatar dibyajyoti commented on May 14, 2024

Most of the time this issue surfaced during temporary network outages on my local LAN. The worst part is that the workers exit without any trace.

You can recreate a similar event just by stopping the Redis server midway. The idle workers will exit almost immediately. Only workers that are currently processing a job will stay alive until the job is complete. The moment a worker's child process exits at the end of the job, the worker becomes idle and exits too.

Given below is a test scenario:

ps -ef|grep resque
root 5284 1 0 Jan18 ? 00:18:36 /usr/bin/ruby1.8 /usr/bin/rackup -e require "resque";load ENV["CONFIG"] if ENV["CONFIG"] /root/Documents/defunkt-resque-9313556/config.ru
root 6975 1 0 Jan19 ? 00:24:08 resque: Waiting for request_queue
root 6983 1 0 Jan19 ? 00:41:07 resque: Forked 10358 at 1265626164

ps -ef|grep redis
root 5283 1 0 Jan18 ? 00:06:46 ./redis-server

kill -9 5283

ps -ef|grep resque
root 5284 1 0 Jan18 ? 00:18:36 /usr/bin/ruby1.8 /usr/bin/rackup -e require "resque";load ENV["CONFIG"] if ENV["CONFIG"] /root/Documents/defunkt-resque-9313556/config.ru
root 6983 1 0 Jan19 ? 00:41:07 resque: Forked 10358 at 1265626164
root 11409 11318 0 10:57 pts/2 00:00:00 grep resque

We can no longer see the waiting workers as they have exited.

Now the working worker returns at the end of job processing.

ps -ef|grep resque
root 5284 1 0 Jan18 ? 00:18:36 /usr/bin/ruby1.8 /usr/bin/rackup -e require "resque";load ENV["CONFIG"] if ENV["CONFIG"] /root/Documents/defunkt-resque-9313556/config.ru
root 11442 11318 0 11:01 pts/2 00:00:00 grep resque

The remaining worker has exited.

defunkt avatar defunkt commented on May 14, 2024

Okay, did some poking. This is slightly tricky because there are so many places a Resque worker touches Redis.

  1. During startup
  2. Grabbing jobs
  3. During shutdown

So we might need a layer in between Redis and Resque (à la Redis::Namespace) that, when a command fails, waits N seconds and then retries (setting the procline appropriately).
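
A minimal sketch of the "wait N seconds, then retry" idea, assuming nothing about Resque's internals. `RedisRetry`, `with_retry`, the `sleeper` parameter, and the procline text are all hypothetical names chosen for illustration; the list of errors is an assumption about what a dropped connection raises.

```ruby
# Hypothetical helper, not part of Resque or redis-rb.
module RedisRetry
  # Errors treated as "Redis is unreachable" (an assumption; adjust for
  # the client library actually in use).
  CONNECTION_ERRORS = [Errno::ECONNREFUSED, Errno::ECONNRESET, Errno::EPIPE].freeze

  # Runs the block. On a connection error it updates the procline, waits
  # `wait` seconds and tries again. `max_attempts: nil` retries forever.
  # `sleeper` is injectable so the wait can be stubbed out.
  def self.with_retry(wait: 5, max_attempts: nil, sleeper: Kernel.method(:sleep))
    attempts = 0
    begin
      attempts += 1
      yield
    rescue *CONNECTION_ERRORS => e
      # Give up and re-raise once the attempt budget is spent.
      raise if max_attempts && attempts >= max_attempts

      # Make the waiting state visible in `ps` output.
      $0 = "resque: Waiting for Redis (attempt #{attempts}, #{e.class})"
      sleeper.call(wait)
      retry
    end
  end
end
```

Each of the three touch points listed above (startup, grabbing jobs, shutdown) could in principle wrap its Redis calls as `RedisRetry.with_retry(wait: 5) { ... }`.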

mrduncan avatar mrduncan commented on May 14, 2024

I threw together a new gem mrduncan/redis-retry (not happy with the name, anyone have suggestions?) inspired by redis-namespace.

It simply catches Errno::ECONNREFUSED and keeps retrying until either:

  1. It runs out of retries, in which case it simply raises Errno::ECONNREFUSED
  2. The command succeeds.

The idea is that it can wrap around a Redis::Namespace object (or, the Redis::Namespace could wrap around it) and it'll handle retrying failed commands if a connection goes down momentarily.
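
For illustration only, here is a rough sketch of how such a wrapper could look. It is not the redis-retry source; `RetryProxy` and its `tries` and `wait` options are hypothetical names, and it only assumes the wrapped object raises Errno::ECONNREFUSED when the connection is down.

```ruby
# Hypothetical proxy, loosely modelled on the description above.
class RetryProxy
  def initialize(client, tries: 3, wait: 1)
    @client = client # e.g. a Redis or Redis::Namespace instance
    @tries = tries   # total attempts before giving up
    @wait = wait     # seconds to sleep between attempts
  end

  def respond_to_missing?(name, include_private = false)
    @client.respond_to?(name, include_private) || super
  end

  # Forward every command to the wrapped client, retrying on refusal.
  def method_missing(name, *args, &block)
    return super unless @client.respond_to?(name)

    attempts = 0
    begin
      attempts += 1
      @client.public_send(name, *args, &block)
    rescue Errno::ECONNREFUSED
      # Out of retries: let the original error propagate.
      raise if attempts >= @tries

      sleep @wait
      retry
    end
  end
end
```

Because the proxy forwards any method the wrapped object responds to, it can sit on either side of a namespacing wrapper, as described above.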

bitboxer avatar bitboxer commented on May 14, 2024

I have a few workers that are using a normal DSL connection. That connection is reset every 24 hours, but the workers don't reconnect to Redis and simply sit there silently without processing new jobs.

bmarini avatar bmarini commented on May 14, 2024

I have this problem also. The workers lose connection to redis and die, also failing to unregister themselves (no conn to redis), so the web interface falsely reports the workers as still running.

bitboxer avatar bitboxer commented on May 14, 2024

At the moment I have a bash script that detects the DSL reconnect and restarts the workers. But that's far from ideal :)

bmarini avatar bmarini commented on May 14, 2024

Here's my backtrace:

/home/deploy/.bundle/ruby/1.8/gems/SystemTimer-1.2/lib/system_timer/concurrent_timer_pool.rb:63:in `trigger_next_expired_timer_at': time's up! (Timeout::Error)
        from /home/deploy/.bundle/ruby/1.8/gems/SystemTimer-1.2/lib/system_timer/concurrent_timer_pool.rb:68:in `trigger_next_expired_timer'
        from /home/deploy/.bundle/ruby/1.8/gems/SystemTimer-1.2/lib/system_timer.rb:81:in `install_ruby_sigalrm_handler'
        from /opt/ruby-enterprise-1.8.7-2010.02/lib/ruby/1.8/monitor.rb:242:in `synchronize'
        from /home/deploy/.bundle/ruby/1.8/gems/SystemTimer-1.2/lib/system_timer.rb:79:in `install_ruby_sigalrm_handler'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:213:in `call'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:213:in `initialize'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:213:in `new'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:213:in `connect_to'
        from /home/deploy/.bundle/ruby/1.8/gems/SystemTimer-1.2/lib/system_timer.rb:56:in `timeout_after'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:281:in `with_timeout'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:212:in `connect_to'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:23:in `connect'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:240:in `ensure_connected'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:266:in `ensure_connected'
        from /opt/ruby-enterprise-1.8.7-2010.02/lib/ruby/1.8/monitor.rb:242:in `synchronize'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:262:in `synchronize'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:266:in `ensure_connected'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:59:in `process'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:197:in `logging'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:58:in `process'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:34:in `call'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis.rb:79:in `get'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-namespace-0.8.0/lib/redis/namespace.rb:188:in `send'
        from /home/deploy/.bundle/ruby/1.8/gems/redis-namespace-0.8.0/lib/redis/namespace.rb:188:in `method_missing'
        from /home/deploy/.bundle/ruby/1.8/gems/resque-1.9.9/lib/resque/worker.rb:406:in `processing'
        from /home/deploy/.bundle/ruby/1.8/gems/resque-1.9.9/lib/resque/worker.rb:338:in `unregister_worker'
        from /home/deploy/.bundle/ruby/1.8/gems/resque-1.9.9/lib/resque/worker.rb:139:in `work'

wiemann avatar wiemann commented on May 14, 2024

I have the same issue as bmarini. My server uses stunnel to connect to redis. Any ideas how to solve this?

andrewajames avatar andrewajames commented on May 14, 2024

A related problem: a worker can pull a payload from the Redis server and then raise an exception when calling working_on. The exception propagates and the payload itself ends up being lost.
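
One possible mitigation, sketched under loud assumptions: `reserve_safely`, `queue`, and `tracker` are hypothetical stand-ins for the Redis-backed queue and the worker bookkeeping, not real Resque APIs. The idea is simply to put the payload back if the step after the pop fails.

```ruby
# Hypothetical sketch; `queue` needs #pop and #push, `tracker` needs
# #working_on. Neither is a real Resque object.
def reserve_safely(queue, tracker)
  payload = queue.pop
  return nil if payload.nil?

  begin
    # Bookkeeping that may fail if the connection drops after the pop.
    tracker.working_on(payload)
  rescue StandardError
    # Best effort: return the payload to the queue, then re-raise so the
    # caller still sees the original failure.
    queue.push(payload)
    raise
  end

  payload
end
```

This is only best effort: if Redis itself is unreachable, the push back will fail as well. Redis documents a reliable-queue pattern built on RPOPLPUSH, which atomically moves an item to a processing list, and that would likely be the sturdier approach.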

corroded avatar corroded commented on May 14, 2024

What's the status of this issue? Should redis-retry be merged into Resque core, or should we just leave it as a Resque plugin? Is there any other alternative, or is this already available in the current Resque version?

steveklabnik avatar steveklabnik commented on May 14, 2024

Hey there!

I'm trying to triage all of Resque's issues. Lots of them have been open for quite a while, and that sucks. I'm gonna be working towards taking care of all of them, and new ones from now forward.

I think merging in something like redis-retry would be good, for sure. I'd love to see a PR that addresses this somehow.

wpeterson avatar wpeterson commented on May 14, 2024

I'd love to help you guys churn through the backlog. I'd love to start by working on fixing this issue. Sound good?

steveklabnik avatar steveklabnik commented on May 14, 2024

just do it

steveklabnik avatar steveklabnik commented on May 14, 2024

This was fixed in d39046f

trevorturk avatar trevorturk commented on May 14, 2024

Note also that connection errors will be passed to the backend for easier debugging in production apps with 515887a.
