Comments (18)
When are you running into this problem? When your Redis server crashes / shuts down?
from resque.
I've seen the Redis server (1.0.2) crash during asynchronous disk writes a couple of times. Strangely enough, after each write the Redis server dropped all active client connections. I'm now using Redis 1.2.1 and haven't hit this issue since.
My point in this post was that we could have something similar to Apache ActiveMQ clients, which don't exit even if the messaging server is unreachable from the client(s) for extended periods (i.e. ActiveMQ has crashed for some reason, or there is a temporary network disconnection).
With that kind of feature, Resque workers would be much more robust.
from resque.
I completely agree.
I'm only asking where you specifically saw an issue so I can start duplicating the issue locally and working on the crash resistance stuff. Resque should definitely live when Redis dies.
from resque.
Most of the time this issue surfaced during temporary network outages on my local LAN. Worst of all, the workers exit without leaving any trace.
You can reproduce a similar event just by stopping the Redis server midway. Idle workers exit almost immediately. Only a worker that is currently processing a job stays alive until the job completes. As soon as its child process exits at the end of the job, that worker becomes idle and exits too.
Given below is a test scenario:
ps -ef|grep resque
root 5284 1 0 Jan18 ? 00:18:36 /usr/bin/ruby1.8 /usr/bin/rackup -e require "resque";load ENV["CONFIG"] if ENV["CONFIG"] /root/Documents/defunkt-resque-9313556/config.ru
root 6975 1 0 Jan19 ? 00:24:08 resque: Waiting for request_queue
root 6983 1 0 Jan19 ? 00:41:07 resque: Forked 10358 at 1265626164
ps -ef|grep redis
root 5283 1 0 Jan18 ? 00:06:46 ./redis-server
kill -9 5283
ps -ef|grep resque
root 5284 1 0 Jan18 ? 00:18:36 /usr/bin/ruby1.8 /usr/bin/rackup -e require "resque";load ENV["CONFIG"] if ENV["CONFIG"] /root/Documents/defunkt-resque-9313556/config.ru
root 6983 1 0 Jan19 ? 00:41:07 resque: Forked 10358 at 1265626164
root 11409 11318 0 10:57 pts/2 00:00:00 grep resque
We can no longer see the waiting workers as they have exited.
Now the working worker returns at the end of job processing.
ps -ef|grep resque
root 5284 1 0 Jan18 ? 00:18:36 /usr/bin/ruby1.8 /usr/bin/rackup -e require "resque";load ENV["CONFIG"] if ENV["CONFIG"] /root/Documents/defunkt-resque-9313556/config.ru
root 11442 11318 0 11:01 pts/2 00:00:00 grep resque
The remaining worker has exited.
from resque.
Okay, did some poking. This is slightly tricky because there are so many places a Resque worker touches Redis.
- During startup
- Grabbing jobs
- During shutdown
So we might need a layer between Redis and Resque (à la Redis::Namespace) that, when a command fails, waits N seconds and then retries (setting the procline appropriately).
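A rough sketch of what such a layer could look like, assuming a hypothetical helper called `with_redis_retry` (this name and its `wait`/`sleeper` parameters are made up for illustration and are not part of Resque or Redis::Namespace). It rescues `Errno::ECONNREFUSED`, updates the process title through `$0` so `ps` shows the worker is waiting, sleeps, and tries again:

```ruby
# Hypothetical helper, for illustration only; not Resque's API.
# Runs the block; on a refused connection it waits `wait` seconds and retries.
# `sleeper` is injectable so the wait can be stubbed out in tests.
def with_redis_retry(wait: 5, sleeper: Kernel.method(:sleep))
  original_procline = $0
  loop do
    begin
      return yield
    rescue Errno::ECONNREFUSED
      # Make the state visible in `ps` output while Redis is unreachable.
      $0 = "resque: Waiting #{wait}s for Redis"
      sleeper.call(wait)
    end
  end
ensure
  # Restore the previous procline once the command has gone through.
  $0 = original_procline
end
```

This version retries forever, which matches the "worker should live when Redis dies" goal but would need a cap or a shutdown check in practice.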
from resque.
I threw together a new gem, mrduncan/redis-retry (not happy with the name, anyone have suggestions?), inspired by redis-namespace.
It simply catches Errno::ECONNREFUSED and keeps retrying until either:
- It runs out of retries, in which case it simply raises Errno::ECONNREFUSED
- The command succeeds.
The idea is that it can wrap around a Redis::Namespace object (or, the Redis::Namespace could wrap around it) and it'll handle retrying failed commands if a connection goes down momentarily.
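To make the wrapping idea concrete, here is a minimal sketch of a delegating wrapper with a bounded number of retries. This is not the redis-retry gem's actual code: `RetryingClient`, its `tries`/`wait`/`sleeper` options, and the `FlakyClient` stand-in are all hypothetical names used only for this example.

```ruby
# Hypothetical wrapper for illustration; NOT the redis-retry gem's code.
class RetryingClient
  def initialize(client, tries: 3, wait: 1, sleeper: Kernel.method(:sleep))
    @client  = client
    @tries   = tries
    @wait    = wait
    @sleeper = sleeper
  end

  # Forward every command to the wrapped client, retrying refused connections.
  def method_missing(name, *args, &block)
    return super unless @client.respond_to?(name)

    attempts = 0
    begin
      attempts += 1
      @client.public_send(name, *args, &block)
    rescue Errno::ECONNREFUSED
      raise if attempts >= @tries # out of retries: re-raise the original error
      @sleeper.call(@wait)
      retry
    end
  end

  def respond_to_missing?(name, include_private = false)
    @client.respond_to?(name, include_private) || super
  end
end

# Stand-in for a Redis client: refuses the first `failures` calls.
class FlakyClient
  def initialize(failures)
    @failures = failures
  end

  def get(key)
    if @failures > 0
      @failures -= 1
      raise Errno::ECONNREFUSED
    end
    "value-for-#{key}"
  end
end

client = RetryingClient.new(FlakyClient.new(2), tries: 3, wait: 0)
puts client.get("queue") # succeeds on the third attempt
```

Because the wrapper only relies on `respond_to?` and `public_send`, it can sit on either side of another wrapper object, which is the property described above.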
from resque.
I have a few workers that use a normal DSL connection. That connection is reset every 24 hours, but the workers don't reconnect to Redis and simply sit there silently without processing new jobs.
from resque.
I have this problem also. The workers lose connection to redis and die, also failing to unregister themselves (no conn to redis), so the web interface falsely reports the workers as still running.
from resque.
At the moment I have a bash script that detects the DSL reconnect and restarts the workers. But that's far from ideal :)
from resque.
Here's my backtrace:
/home/deploy/.bundle/ruby/1.8/gems/SystemTimer-1.2/lib/system_timer/concurrent_timer_pool.rb:63:in `trigger_next_expired_timer_at': time's up! (Timeout::Error)
from /home/deploy/.bundle/ruby/1.8/gems/SystemTimer-1.2/lib/system_timer/concurrent_timer_pool.rb:68:in `trigger_next_expired_timer'
from /home/deploy/.bundle/ruby/1.8/gems/SystemTimer-1.2/lib/system_timer.rb:81:in `install_ruby_sigalrm_handler'
from /opt/ruby-enterprise-1.8.7-2010.02/lib/ruby/1.8/monitor.rb:242:in `synchronize'
from /home/deploy/.bundle/ruby/1.8/gems/SystemTimer-1.2/lib/system_timer.rb:79:in `install_ruby_sigalrm_handler'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:213:in `call'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:213:in `initialize'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:213:in `new'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:213:in `connect_to'
from /home/deploy/.bundle/ruby/1.8/gems/SystemTimer-1.2/lib/system_timer.rb:56:in `timeout_after'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:281:in `with_timeout'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:212:in `connect_to'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:23:in `connect'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:240:in `ensure_connected'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:266:in `ensure_connected'
from /opt/ruby-enterprise-1.8.7-2010.02/lib/ruby/1.8/monitor.rb:242:in `synchronize'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:262:in `synchronize'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:266:in `ensure_connected'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:59:in `process'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:197:in `logging'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:58:in `process'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis/client.rb:34:in `call'
from /home/deploy/.bundle/ruby/1.8/gems/redis-2.0.4/lib/redis.rb:79:in `get'
from /home/deploy/.bundle/ruby/1.8/gems/redis-namespace-0.8.0/lib/redis/namespace.rb:188:in `send'
from /home/deploy/.bundle/ruby/1.8/gems/redis-namespace-0.8.0/lib/redis/namespace.rb:188:in `method_missing'
from /home/deploy/.bundle/ruby/1.8/gems/resque-1.9.9/lib/resque/worker.rb:406:in `processing'
from /home/deploy/.bundle/ruby/1.8/gems/resque-1.9.9/lib/resque/worker.rb:338:in `unregister_worker'
from /home/deploy/.bundle/ruby/1.8/gems/resque-1.9.9/lib/resque/worker.rb:139:in `work'
from resque.
I have the same issue as bmarini. My server uses stunnel to connect to redis. Any ideas how to solve this?
from resque.
Of a similar nature is the fact that a job can pull a payload from a Redis server and then later generate an exception when calling working_on. This exception propagates and the payload itself ends up being lost.
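To illustrate the failure mode, here is a small sketch that uses an in-memory Array in place of the Redis list. `reserve_safely` and the `mark_working` callable are hypothetical names, not Resque's API. The point is only that once the payload has been popped, a failure in the bookkeeping step has to push it back or the job is gone:

```ruby
# Illustration only: an Array stands in for the Redis list behind a queue.
# `mark_working` plays the role of the bookkeeping call that may raise.
def reserve_safely(queue, mark_working)
  payload = queue.shift # the payload now exists only in this process
  return nil if payload.nil?

  begin
    mark_working.call(payload)
    payload
  rescue StandardError
    queue.unshift(payload) # put it back so the job is not lost
    raise
  end
end

queue = ["job-1", "job-2"]
failing = ->(_payload) { raise Errno::ECONNREFUSED }

begin
  reserve_safely(queue, failing)
rescue Errno::ECONNREFUSED
  # The error still propagates, but the payload is back at the head of the queue.
end

p queue # prints ["job-1", "job-2"]
```

In a real worker the push back would itself need Redis to be reachable, so this narrows the window for losing a payload rather than closing it.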
from resque.
What's the status of this issue? Should redis-retry be merged into Resque core, or should we just leave it as a Resque plugin? Is there any other alternative, or is this available for the current Resque version?
from resque.
Hey there!
I'm trying to triage all of Resque's issues. Lots of them have been open for
quite a while, and that sucks. I'm gonna be working towards taking care of all
of them, and new ones from now forward.
I think merging in something like redis-retry would be good, for sure. I'd love to see a PR that addresses this somehow.
from resque.
I'd love to help you guys churn through the backlog. I'd love to start by working on fixing this issue. Sound good?
from resque.
This was fixed in d39046f
from resque.
Note also that connection errors will be passed to the backend for easier debugging in production apps with 515887a.
from resque.