Thanks for the great library, I've been using it in a local project.
I've encountered an issue where a infrequent race condition is causing Threads to deadlock unexpectedly. Specifically, if a context switch occurs during the course of Scheduler#process_overdue in lib/workers/scheduler.rb, betweens the call to @pool.perform and timer.reset, and we switch to a thread that attempts to launch a timer:
- the Scheduler's mutex will already be locked (in #start_loop which does this before invoking #process_overdue)
- the Schedule will schedule the worker via pool.perform then the Schedule Thread loses context
- the Timer will lock it's own mutex during the call to #fire
- The new Timer will be launched and will block when it tries to lock the Scheduler mutex (already locked)
- When the Scheduler Thread returns to context, it will attempt to invoke Timer#reset, which will block attempting to lock the Timer mutex resulting in a deadlock.
If there are other threads running in the pool, no Deadlock error will be raised (I believe ruby will not be able to detect this)
The critical section in question is here: https://github.com/chadrem/workers/blob/master/lib/workers/scheduler.rb#L82
We can demonstrate it with the following code:
require 'workers'
timer = Workers::Timer.new(0.01) do
puts "T1"
Workers::Timer.new(0.01) do
puts "T2"
end
end
sleep(5)
Note that unless a context switch occurs exactly between the call to @pool.perform and timer.reset, this issue will not be apparent and the example above will execute successfully. To force this context switch simply add a 'sleep' to the #process_overdue function (not meant for production, just for testing / demonstrating this issue):
overdue.each do |timer|
@pool.perform(timer.callback) do
timer.fire
end
sleep 0.01 # <=========== force a "context switch"
timer.reset
@schedule << timer if timer.repeat
end
During normal operation, when the context switch does not occur, both Timers run succesfully and all output is seen. During the buggy edge case, we only see the first timer execute, before the mutexs block until the process exits.