
workers' Issues

"Nested" workers cause parent to quit when join returns

I have code that creates a worker pool, which then calls code that in turn (in an included library) creates a new worker pool to execute concurrent shell commands and then waits on them. However, my nested worker pool causes the "parent" worker to quit when join returns.
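
For reference, a minimal sketch of that shape (hypothetical code, assuming the Pool#perform and Pool#dispose calls from the gem's README; the inner pool stands in for the included library):

require 'workers'

# Outer pool: the application's own workers.
outer = Workers::Pool.new

outer.perform do
  # Inner pool: stands in for the included library that runs shell
  # commands concurrently and then waits on them.
  inner = Workers::Pool.new
  3.times { |i| inner.perform { system("echo command #{i}") } }
  inner.dispose(10) # waits via join; when it returns, the outer worker quits
end

sleep(5)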

Tests?

Do you have any plans for testing this library?

Error when running example with ruby 2.0.0-p648

Hi,

I have just been trying out your gem, but got an error when I ran the following example:

require 'worker'

# Initialize a worker pool.
pool = Workers::Pool.new(:on_exception => proc { |e|
  puts "A worker encountered an exception: #{e.class}: #{e.message}"
})

# Perform some work in the background.
100.times do
  pool.perform do
    sleep(rand(3))
    raise 'sad face' if rand < 0.5
    puts "Hello world from thread #{Thread.current.object_id}"
  end
end

# Wait up to 30 seconds for the workers to cleanly shutdown (or forcefully kill them).
pool.dispose(30) do
  puts "Worker thread #{Thread.current.object_id} is shutting down."
end

The error was:

/Users/njh/.rbenv/versions/2.0.0-p648/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:135:in `require': /Users/njh/.rbenv/versions/2.0.0-p648/lib/ruby/gems/2.0.0/gems/worker-0.6.0/lib/worker.rb:91: syntax error, unexpected keyword_rescue, expecting keyword_end (SyntaxError)
      rescue Exception => ex
            ^
/Users/njh/.rbenv/versions/2.0.0-p648/lib/ruby/gems/2.0.0/gems/worker-0.6.0/lib/worker.rb:99: syntax error, unexpected '.'
    @thread&.kill
             ^
        from /Users/njh/.rbenv/versions/2.0.0-p648/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:135:in `rescue in require'
        from /Users/njh/.rbenv/versions/2.0.0-p648/lib/ruby/2.0.0/rubygems/core_ext/kernel_require.rb:144:in `require'
        from worker-test.rb:28:in `<main>'

Is this a Ruby 2 compatibility bug?
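
(For what it's worth: the safe navigation operator &. was only added in Ruby 2.3, so Ruby 2.0 cannot parse it; a rescue clause directly inside a do...end block likewise requires a newer Ruby, which would explain the first error if that is what line 91 contains. A Ruby 2.0-compatible equivalent of the failing line, as a sketch:

@thread.kill if @thread   # instead of: @thread&.kill (Ruby 2.3+ only)

)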

Workers::TaskGroup::run is not signal safe

It's possible for the TaskGroup#run call to return before all of the tasks in the group have completed.

Repro:

# test.rb
require 'workers'

Signal.trap("USR1") do
  puts "RECEIVED SIGNAL"
end

group = Workers::TaskGroup.new
group.add { sleep(10000000) }
puts group.run
puts group.tasks[0].state

In one shell: ruby test.rb

In another shell, run:

$ ps aux | grep test.rb
jlfwong          42699   0.0  0.1  2511824  14252 s004  S+    2:16PM   0:00.36 ruby test.rb
$ kill -30 

Then switching back to the original shell, I see the following:

RECEIVED SIGNAL
false
running

My expectation in this case is that run would not return until much, much later.

After 10000000 seconds, I'd expect the output to be:

RECEIVED SIGNAL
true
succeeded

From digging around in the Ruby source, it seems to me that ConditionVariable#wait is not signal safe: http://ruby-doc.org/stdlib-2.0.0/libdoc/thread/rdoc/ConditionVariable.html

#wait calls mutex.sleep, which in turn calls rb_mutex_sleep, which eventually ends up doing this: https://github.com/ruby/ruby/blob/55f93cb63f350c7705733f86923561363a297e00/thread.c#L1084

That code exits as soon as the native_sleep call returns, which I believe happens upon receipt of any handled signal.
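
The conventional defense against this kind of spurious wakeup (a general pattern, not something shown here to be in the gem) is to re-check the completion predicate in a loop around ConditionVariable#wait, for example:

require 'thread'

mutex = Mutex.new
cond  = ConditionVariable.new
done  = false

waiter = Thread.new do
  mutex.synchronize do
    # Loop instead of a single wait: a handled signal can wake the
    # sleeping thread even though no signal/broadcast happened, so the
    # predicate must be re-checked before proceeding.
    cond.wait(mutex) until done
  end
end

mutex.synchronize do
  done = true
  cond.signal
end
waiter.join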

I arrived here while trying to debug why running my code under stackprof was causing failures (commentary on that issue: tmm1/stackprof#91 (comment)).

Feature: Add lifecycle hooks for Threads

Similar to on_exception, it would be great to have several lifecycle hooks. For example:

  • after_start - called in the Thread, immediately after a Thread is started
  • before_exit - called in the Thread, immediately before the Thread exits

This is important because it provides a place where databases can be connected/disconnected, logs can be opened/closed, etc. I could create one pool for my web app that handles database connections and log streams -- this would be very nice.

Maybe this is already easily achieved?
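
For illustration, a hypothetical shape for the requested options (these hooks do not exist in the gem today; the :after_start and :before_exit option names are invented for this sketch):

require 'workers'

pool = Workers::Pool.new(
  :on_exception => proc { |e| puts "Exception: #{e.message}" },
  :after_start  => proc { puts "Thread #{Thread.current.object_id} started; open DB connection here." },
  :before_exit  => proc { puts "Thread #{Thread.current.object_id} exiting; close DB connection here." }
)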

README example doesn't work

The method process_event is defined in the parent class and takes no attributes. We can always override perform_handler, but that is of limited use: if we do, we can only handle the perform event. And if we override process_event instead, we have to reimplement the shutdown and exception handling in its body. It would be nice if the default clause of the case statement called perform_handler instead of raising an exception.

I can provide a PR with the behaviour described above.
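
A sketch of the proposed dispatch (method and event names taken from the issue text above, not verified against the gem source):

def process_event(event)
  case event.command
  when :shutdown
    shutdown_handler(event)
  else
    perform_handler(event) # previously: raise on unknown events
  end
end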

Race Condition with Nested Timers Causing Deadlock

Thanks for the great library, I've been using it in a local project.

I've encountered an issue where an infrequent race condition causes threads to deadlock unexpectedly. Specifically, the problem arises if a context switch occurs during Scheduler#process_overdue in lib/workers/scheduler.rb, between the call to @pool.perform and timer.reset, and we switch to a thread that attempts to launch a timer:

  • the Scheduler's mutex will already be locked (in #start_loop, which does this before invoking #process_overdue)
  • the Scheduler will schedule the worker via pool.perform, then the Scheduler thread loses context
  • the Timer will lock its own mutex during the call to #fire
  • the new Timer will be launched and will block when it tries to lock the Scheduler's mutex (already locked)
  • when the Scheduler thread regains context, it will attempt to invoke Timer#reset, which will block attempting to lock the Timer's mutex, resulting in a deadlock.

If there are other threads running in the pool, no deadlock error will be raised (I believe Ruby will not be able to detect this).

The critical section in question is here: https://github.com/chadrem/workers/blob/master/lib/workers/scheduler.rb#L82

We can demonstrate it with the following code:

require 'workers'

timer = Workers::Timer.new(0.01) do
  puts "T1"

  Workers::Timer.new(0.01) do
    puts "T2"
  end
end

sleep(5)

Note that unless a context switch occurs exactly between the call to @pool.perform and timer.reset, this issue will not be apparent and the example above will execute successfully. To force this context switch, simply add a sleep to the #process_overdue function (not meant for production, just for testing/demonstrating this issue):

overdue.each do |timer|
  @pool.perform(timer.callback) do
    timer.fire
  end

  sleep(0.01) # <=========== force a "context switch"

  timer.reset
  @schedule << timer if timer.repeat
end

During normal operation, when the context switch does not occur, both Timers run successfully and all output is seen. During the buggy edge case, we only see the first timer execute before the mutexes block until the process exits.
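
For discussion, an untested sketch of one possible mitigation: reorder the critical section so the Scheduler touches the Timer's mutex (via reset) before handing the fire to the pool, and thus never waits on a Timer's mutex while holding its own. This assumes Timer#reset and Timer#fire both synchronize on the Timer's mutex, as described above:

# Untested sketch (not the gem's current code): reset first, then schedule
# the fire, which breaks the lock cycle described above.
overdue.each do |timer|
  timer.reset                       # briefly takes the Timer's mutex
  @schedule << timer if timer.repeat

  @pool.perform(timer.callback) do
    timer.fire                      # takes the Timer's mutex in a pool thread
  end
end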
