
puma_worker_killer's Issues

Just enable phased restarts

I've had no luck getting rolling restarts to work without the memory killer stuff too. I'm in clustered mode with preload_app! off. Aside from adding this line to an initialiser, is there something else I should do?

PumaWorkerKiller.enable_rolling_restart 12 * 3600

As a workaround I added this to the bottom of my config/puma.rb and it works well. Do you see any problems with this approach?

workers Integer(ENV['WEB_CONCURRENCY'] || 2)

rackup DefaultRackup
environment ENV['RACK_ENV'] || 'development'

Thread.new do
  until false do
    sleep 12 * 3600
    puts "----> Sending signal to #{Process.pid} for phased restart"
    Process.kill 'USR1', Process.pid
  end
end

Web worker not getting killed on Heroku

Following is my configuration for the puma.rb file. I am using a 1GB web dyno on Heroku.

# puma.rb
before_fork do
  require 'puma_worker_killer'

  PumaWorkerKiller.config do |config|
    config.ram = 1024 # mb
    config.frequency = 1.minute.to_i
    config.percent_usage = 0.97
    config.reaper_status_logs = true
  end

  PumaWorkerKiller.start if Rails.env.production?
end

Heroku Logs -

Jul 07 01:46:27.977pm info heroku web.1 Process running mem=1310M(127.9%)
Jul 07 01:46:27.982pm info heroku web.1 Error R14 (Memory quota exceeded)

Can I run both enable_rolling_restart and start

Hi,

Can I run both enable_rolling_restart and start?

Is this safe?

before_fork do
  require 'puma_worker_killer'
  PumaWorkerKiller.config do |config|
    config.ram           = 512 # mb
    config.percent_usage = 0.98
  end
  PumaWorkerKiller.enable_rolling_restart # Default is every 6 hours
  PumaWorkerKiller.start
end

Need more clarity on some of the configurations

I need some clarity on some of the configuration described in the README file:

PumaWorkerKiller.rolling_restart_frequency = 12 * 3600 # 12 hours in seconds, or 12.hours if using Rails

If I am using PumaWorkerKiller.start rather than rolling restarts, why should I set this in the configuration? What is the default, and should I set it to false as mentioned?
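For reference, here is a minimal sketch (not an official answer) of using only the memory-based reaper, with rolling restarts explicitly disabled via the false setting mentioned above. The specific numbers mirror values used elsewhere in this list and are assumptions, not documented defaults:

require 'puma_worker_killer'

# Sketch: memory-based reaping only, no timed rolling restarts.
PumaWorkerKiller.config do |config|
  config.ram           = 1024 # mb (assumed dyno size)
  config.frequency     = 20   # seconds between memory checks (assumed)
  config.percent_usage = 0.98
  config.rolling_restart_frequency = false # disable timed restarts
end
PumaWorkerKiller.start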

Worker killer not starting on Heroku

I've read through the docs and issues on this repo and see two approaches to enabling puma_worker_killer in a Rails application:

  1. Use the before_fork hook in config/puma.rb.
  2. Add a puma_worker_killer.rb initializer.

I've tried both with various tweaks, and none of them actually start doing rolling restarts – or any restarts.

Here's what I currently have:

# config/puma.rb
workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['MAX_THREADS'] || 5)
threads threads_count, threads_count

preload_app!

rackup      DefaultRackup
port        ENV['PORT']     || 3000
environment ENV['RACK_ENV'] || 'development'

on_worker_boot do
  # Worker specific setup for Rails 4.1+
  # See: https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#on-worker-boot
  ActiveRecord::Base.establish_connection
  Rails.logger.info('Worker Connected to ActiveRecord')
end

before_fork do
  require 'puma_worker_killer'

  if Rails.env.production?
    PumaWorkerKiller.enable_rolling_restart(8 * 3600) # 8 hours in seconds
  end
end

And the lib versions are:

    rails (4.2.0)
    puma (3.6.0)

Am I missing something to properly enable puma_worker_killer?

warning: conflicting chdir during another chdir block

After updating Ruby and trying to run bundle update:

Using psych 3.1.0
Using puma 4.2.1 from https://github.com/puma/puma.git (at master@3203159)
/home/wellington/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rubygems/ext/builder.rb:183: warning: conflicting chdir during another chdir block
/home/wellington/.rbenv/versions/2.6.5/lib/ruby/2.6.0/rubygems/ext/builder.rb:191: warning: conflicting chdir during another chdir block
Using puma_worker_killer 0.1.1 from https://github.com/schneems/puma_worker_killer.git (at master@4552de0)
Using rack-cors 1.0.3 from https://github.com/cyu/rack-cors.git (at master@145a5df)
...

Thanks!

Be more explicit about where to put PumaWorkerKiller.start

"Somewhere in your main process run this code:"

Could you provide an example for Rails apps? I'm not sure exactly where would be best. Should it go in production.rb? application.rb? My puma initialiser? The unicorn-worker-killer suggests config.ru.
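For what it's worth, a sketch of one placement that other issues in this list use (a Rails initializer); the puma.rb before_fork hook shown elsewhere here is the other common spot. This is illustrative, not a statement of the officially recommended location:

# config/initializers/puma_worker_killer.rb (sketch)
require 'puma_worker_killer'

PumaWorkerKiller.config do |config|
  config.ram           = 1024 # mb (assumed)
  config.percent_usage = 0.98
end
PumaWorkerKiller.start if Rails.env.production?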

Error tracking

Hey @schneems

What are your thoughts on adding some kind of hook to the PumaWorkerKiller config block to allow for Airbrake integration on worker pruning? It'd be a good idea to have some kind of audit for how often memory is maxed out, without relying on something like NewRelic.
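A rough sketch of what that could look like with the pre_term hook referenced in later issues in this list; whether pre_term fires on every prune is an assumption here, and the Airbrake call is purely illustrative:

require 'puma_worker_killer'

PumaWorkerKiller.config do |config|
  config.ram           = 1024 # mb (assumed)
  config.percent_usage = 0.98
  # Assumed shape: pre_term receives the worker about to be terminated.
  # Assumes the airbrake gem is already configured in this process.
  config.pre_term = ->(worker) do
    Airbrake.notify("PumaWorkerKiller pruned a worker", pid: worker.pid)
  end
end
PumaWorkerKiller.start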

[ruby] <defunct>

Hi,

I like this project, but I've got a problem with defunct processes.
After the gem's kill action, the killed Ruby processes are still present in the process list, with a PID and a defunct state.

Leaving those processes uncleaned can be a problem.
Images:

puma-defunct-5-htop
puma-defunct-4-config
puma-defunct-3-pstree
puma-defunct-2
puma-defunct-1

Thanks!

Where shall I start PumaWorkerKiller if I do not preload_app?

Hi,

I am trying to switch to phased restarts, so I removed preload_app!, and with this configuration it seems PumaWorkerKiller never starts no matter where I put PumaWorkerKiller.start (I tried before_fork, on_worker_boot, and an initializer). Nothing is reported in puma_stdout.log.

Where should I start PumaWorkerKiller under this circumstance?

Thank you very much.

Add puma worker killer config into puma config file, rather than into initializers

Add the puma_worker_killer config to the Puma config file when you start Puma as a daemon. If you add it to an initializer file, it can cause problems.
Sample (in puma.rb):
before_fork do
  ActiveRecord::Base.connection_pool.disconnect!

  PumaWorkerKiller.config do |config|
    config.ram = 2 * 1024 # mb
    config.frequency = 10 # seconds
    config.percent_usage = 0.95
    config.rolling_restart_frequency = 12 * 3600 # 12 hours in seconds
  end
  PumaWorkerKiller.start
end

Worker memory hits over RAM settings

Hi, thanks for providing this Gem.

I am currently using Puma 2.8.0 and Rails 3.2.13, running 4 workers in cluster mode. I added puma_worker_killer to my app and put the config and PumaWorkerKiller.start in my production.rb, but processes went beyond 512 MB, which was the RAM value I set.

I took a look into the code and realized this gem tries to TERM processes first, then KILL them if that fails; I don't know why my processes were so resilient. I guess there's a memory leak in my code, but I can't find it right now because I can't reproduce the problem in my staging or development environments. Any ideas? What info could I add here that would help?

Thanks.

Don't kill PumaWorkerKiller when calling largest_worker.term a second time

Hello, I'm trying to use PumaWorkerKiller together with Rails ActionController::Live streaming.

Puma seems to have a problem restarting a worker if some SSE connections are open (even if the connections are on other workers; see puma/puma#569 (comment)).

In any case, there may be some delay between the SIGTERM being sent to a worker by PumaWorkerKiller and the effective restart of that worker (if only to give the SSE connections time to close).

It seems PumaWorkerKiller gets killed (an exception?) when largest_worker.term is sent a second time in PumaWorkerKiller::PumaMemory. I can work around the issue by checking whether a restart order has already been issued before largest_worker.term is evaluated, and not sending the order again if that's the case.

worker.inspect generates extremely verbose logs

Logs are generated in spots such as this. Example log:

PumaWorkerKiller: Rolling Restart. 2 workers consuming total: 193.29296875 mb out of max: mb. Sending TERM to #<Puma::Cluster::Worker:0x007f9af3016ab8 @index=0, @pid=43203, @phase=0, @stage=:booted, @signal="TERM", @options=#<Puma::LeveledOptions:0x007f9afd54ea08 @cur={}, @set=[{}, {:log_requests=>false, :environment=>"development", :binds=>["tcp://localhost:3000"], :app=>#<Rack::ContentLength:0x007f9afd54f408 @app=#<NewRelic::Agent::Instrumentation::MiddlewareProxy:0x007f9afd52cf20 @target=#<Surkus::Application:0x007f9af3f417e0 @_all_autoload_paths=

It goes on for hundreds of lines from there.

is it proper to call ActiveRecord::Base.establish_connection when on_worker_boot

As you can see below, this is my configuration, and Puma will be rolling-restarted by PumaWorkerKiller.enable_rolling_restart.

I'm just wondering whether it is proper to call ActiveRecord::Base.establish_connection in on_worker_boot. Since the on_worker_boot block is executed on every rolling restart, I'm afraid it may exhaust the database's connection pool.

workers ENV.fetch("WEB_CONCURRENCY") { 2 }
before_fork do
  require 'puma_worker_killer'
  PumaWorkerKiller.enable_rolling_restart  # Default is every 6 hours

  # If you're preloading your application and using ActiveRecord, it's recommended that you close any connections to the database here to prevent connection leakage:
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord)
end


preload_app!

# If you're preloading your application and using ActiveRecord, it's recommended that you setup your connection pool here:
on_worker_boot do
  ActiveSupport.on_load(:active_record) do
    ActiveRecord::Base.establish_connection
  end
end

No PumaWorkerKiller logs when upgrading to Puma 4.1.0

Hi all,

I noticed that I lost the typical PumaWorkerKiller log messages from stdout ("PumaWorkerKiller: Consuming X mb with master and Y workers.") when I updated the version of Puma from 4.0.1 to Puma 4.1.0.

Could it be related to this change introduced in Puma 4.1.0?

Regards.

JRuby fork not implemented issue

  1. In https://travis-ci.org/schneems/puma_worker_killer/jobs/112527114 there is:

Error: test_rolling_restart(PumaWorkerKillerTest): NotImplementedError: fork is not available on this platform org/jruby/RubyKernel.java:1881:in `fork'

which is due to fork not being implemented in JRuby. (Edit: just noticed this is an allowed failure, but I'll leave this open because I'm also curious about #2 below)

  2. Also, should Ruby 2.2 and 2.3 be added to the TravisCI tests? (mostly just a question for the author, I imagine Ruby 2.4.0dev, 2.2, and 2.3 are fairly similar). Thanks.

Determine better ram default

FWIW, you can determine the amount of available RAM in MB in a Heroku dyno (or probably any container that uses cgroups and exposes that cgroup data inside the container) with:

Integer(Integer(IO.read("/sys/fs/cgroup/memory/memory.limit_in_bytes")) / 1024 / 1024)

The current amount of memory being used by all processes in a dyno can be found in /sys/fs/cgroup/memory/memory.usage_in_bytes. This is only currently exposed in private space dynos though.
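A sketch of how that reading could feed the gem's ram setting, with a fallback for environments where the cgroup file is absent; the fallback value is an assumption, and note that an unlimited cgroup reports a very large number:

require 'puma_worker_killer'

# Sketch: derive config.ram from the cgroup v1 memory limit when available.
limit_file = "/sys/fs/cgroup/memory/memory.limit_in_bytes"
default_mb = 512 # assumed fallback when the cgroup file is missing

ram_mb =
  if File.readable?(limit_file)
    Integer(Integer(IO.read(limit_file)) / 1024 / 1024)
  else
    default_mb
  end

PumaWorkerKiller.config do |config|
  config.ram = ram_mb # mb
end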

Restarts at a lower limit after enabling jemalloc

Ruby 3.2.2
puma 6.3.1
puma_worker_killer 0.3.1
Ubuntu 22.04 on AWS

before_fork do
  ActiveRecord::Base.connection_pool.disconnect!

  require 'puma_worker_killer'
  PumaWorkerKiller.config do |config|
    config.ram           = 1024 * 32 # mb
    config.percent_usage = 0.75
    config.frequency     = 60
  end
  PumaWorkerKiller.start
end

Prior to enabling jemalloc, we saw puma_worker_killer restart seemingly correctly for our configuration

[464] PumaWorkerKiller: Rolling Restart. 4 workers consuming total: 22772.06640625 mb. Sending TERM to pid 1027664.

After enabling jemalloc, puma_worker_killer now restarts at a much lower level.

[465] PumaWorkerKiller: Rolling Restart. 4 workers consuming total: 7922.57421875 mb. Sending TERM to pid 2441866.

Our graphs in AWS indicate that after enabling jemalloc, memory usage was significantly lower and never reached the limit given to puma_worker_killer, yet puma_worker_killer seems to be restarting workers just as often.

Is this expected behaviour with jemalloc?

AutoReap thread exits

Guys,

I'm trying to use this gem in production, but it isn't working. I added a puts line to the AutoReap start method:

    def start
      @running = true

      Thread.new do
        while @running
          puts 'IM RUNNING'
          @reaper.reap
          sleep @timeout
        end
      end
    end

It outputs the line only one time. I'm not sure how to investigate why the thread is dying. Do you have any idea where to look? My Puma is 2.13.4. Here's my config file:

      directory '/var/app/current'
      threads ENV["PUMA_THREADS"], ENV["PUMA_THREADS"]
      workers ENV["PUMA_WORKERS"]
      preload_app!
      bind 'unix:///var/run/puma/my_app.sock'
      stdout_redirect '/var/log/puma/puma.log', '/var/log/puma/puma.log', true
      daemonize false
      on_worker_boot do
        # worker specific setup
        ActiveSupport.on_load(:active_record) do
          config = ActiveRecord::Base.configurations[Rails.env] ||
                      Rails.application.config.database_configuration[Rails.env]
          config['pool'] = 32
          ActiveRecord::Base.establish_connection(config)
        end
      end

And my PumaWorkerKiller config is in config/initializers/puma_worker_killer:

PumaWorkerKiller.config do |config|
  config.ram           = (ENV['INSTANCE_TOTAL_RAM'] || 3768)# mb
  config.frequency     = 30    # seconds
  config.percent_usage = 0.93
  config.rolling_restart_frequency = (rand(8.0..9.0) * 3600).to_i # restart every 8-9 hours, random so it doesn't restart all servers at the same time
end

PumaWorkerKiller.start
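One thing worth trying, echoing the "Add puma worker killer config into puma config file" issue earlier in this list: move the configuration out of the Rails initializer and into puma.rb's before_fork block so it runs in the Puma master. Whether that resolves the dying thread here is untested; this is only a sketch of that rearrangement:

# config/puma.rb (sketch, mirroring the before_fork setup shown above)
before_fork do
  require 'puma_worker_killer'

  PumaWorkerKiller.config do |config|
    config.ram           = Integer(ENV['INSTANCE_TOTAL_RAM'] || 3768) # mb
    config.frequency     = 30   # seconds
    config.percent_usage = 0.93
    # random spread so all servers don't restart at the same time
    config.rolling_restart_frequency = (rand(8.0..9.0) * 3600).to_i
  end
  PumaWorkerKiller.start
end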

Don't recommend puma_auto_tune?

Should this gem not recommend puma_auto_tune until its issues calculating memory can be resolved? Or does this gem suffer from the same problem?

"Detected 1 Thread(s) started in app boot" on puma 3.4.0

Ruby 2.2.4, Rails 4.2.6.

Recently upgraded from Puma 2.15.3 to 3.4.0 and puma_worker_killer 0.0.4 to 0.0.6. I'm also using the puma-heroku 1.0.0 plugin, as the config I had was basically the same as that plugin's.

I have: MAX_THREADS=2 and WEB_CONCURRENCY=2

$ cat Procfile
web: bundle exec puma -C config/puma.rb
worker: bundle exec rake jobs:work
$ cat config/puma.rb
plugin "heroku"
$ cat config/initializers/puma_worker_killer.rb
# See https://blog.codeship.com/debugging-a-memory-leak-on-heroku/
# This is obviously not a real solution but buys us time.
if ENV["PUMA_WORKERS_RESTART_FREQUENCY"]
  require "puma_worker_killer"
  # `PUMA_WORKERS_RESTART_FREQUENCY` must be an integer number of hours.
  PumaWorkerKiller.enable_rolling_restart ENV["PUMA_WORKERS_RESTART_FREQUENCY"].to_i * 3600
end

And logs look like this in my staging env:

2016-05-04T20:56:15.967077+00:00 heroku[web.1]: Starting process with command `bundle exec puma -C config/puma.rb`
2016-05-04T20:56:19.876836+00:00 app[web.1]: [3] Puma starting in cluster mode...
2016-05-04T20:56:19.876909+00:00 app[web.1]: [3] * Version 3.4.0 (ruby 2.2.4-p230), codename: Owl Bowl Brawl
2016-05-04T20:56:19.876960+00:00 app[web.1]: [3] * Min threads: 2, max threads: 2
2016-05-04T20:56:19.877010+00:00 app[web.1]: [3] * Environment: production
2016-05-04T20:56:19.877053+00:00 app[web.1]: [3] * Process workers: 2
2016-05-04T20:56:19.877202+00:00 app[web.1]: [3] * Preloading application
2016-05-04T20:56:24.926386+00:00 app[web.1]: [SKYLIGHT] [0.10.3] Skylight agent enabled
2016-05-04T20:56:24.945590+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:24 +0000 web.1 (3)] INFO : Starting the New Relic agent in "production" environment.
2016-05-04T20:56:24.945689+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:24 +0000 web.1 (3)] INFO : To prevent agent startup add a NEWRELIC_AGENT_ENABLED=false environment variable or modify the "production" section of your newrelic.yml.
2016-05-04T20:56:24.945824+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:24 +0000 web.1 (3)] INFO : Reading configuration from config/newrelic.yml (/app)
2016-05-04T20:56:24.957157+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:24 +0000 web.1 (3)] INFO : Environment: production
2016-05-04T20:56:24.957261+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:24 +0000 web.1 (3)] INFO : Dispatcher: puma
2016-05-04T20:56:24.957403+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:24 +0000 web.1 (3)] INFO : Application: quimbee-staging
2016-05-04T20:56:24.957513+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:24 +0000 web.1 (3)] INFO : Deferring startup of agent reporting thread because puma may fork.
2016-05-04T20:56:25.008394+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing Rails Sunspot instrumentation
2016-05-04T20:56:25.023592+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing Memcache instrumentation for dalli gem
2016-05-04T20:56:25.025960+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing Net instrumentation
2016-05-04T20:56:25.030194+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing ActiveRecord 4 instrumentation
2016-05-04T20:56:25.030478+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing deferred Rack instrumentation
2016-05-04T20:56:25.030664+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing Rack::Builder middleware instrumentation
2016-05-04T20:56:25.033517+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing ActiveJob instrumentation
2016-05-04T20:56:25.033980+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing Rails 3+ middleware instrumentation
2016-05-04T20:56:25.034186+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing DelayedJob instrumentation [part 1/2]
2016-05-04T20:56:25.036156+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing middleware-based Excon instrumentation
2016-05-04T20:56:25.038727+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing Rails 4 Controller instrumentation
2016-05-04T20:56:25.038914+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing Rails 4 Error instrumentation
2016-05-04T20:56:25.039093+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Installing Rails 4 view instrumentation
2016-05-04T20:56:25.039204+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:25 +0000 web.1 (3)] INFO : Finished instrumentation
2016-05-04T20:56:29.881814+00:00 app[web.1]: ** [NewRelic][2016-05-04 20:56:29 +0000 web.1 (3)] INFO : Doing deferred dependency-detection before Rack startup
2016-05-04T20:56:29.886947+00:00 app[web.1]: [3] * Listening on tcp://0.0.0.0:7282
2016-05-04T20:56:29.887162+00:00 app[web.1]: [3] ! WARNING: Detected 1 Thread(s) started in app boot:
2016-05-04T20:56:29.887296+00:00 app[web.1]: [3] ! #<Thread:0x007f40c3d331e0@/app/vendor/bundle/ruby/2.2.0/gems/puma_worker_killer-0.0.6/lib/puma_worker_killer/auto_reap.rb:12 sleep> - /app/vendor/bundle/ruby/2.2.0/gems/puma_worker_killer-0.0.6/lib/puma_worker_killer/auto_reap.rb:15:in `sleep'
2016-05-04T20:56:29.887677+00:00 app[web.1]: [3] Use Ctrl-C to stop
2016-05-04T20:56:29.912650+00:00 app[web.1]: [3] - Worker 0 (pid: 18) booted, phase: 0
2016-05-04T20:56:29.912799+00:00 app[web.1]: [3] - Worker 1 (pid: 20) booted, phase: 0
2016-05-04T20:56:30.173716+00:00 heroku[web.1]: State changed from starting to up

Should I be starting puma_worker_killer some other way?

Killing workers with plenty of memory left

I just deployed puma_worker_killer on Heroku. I have two dynos and two workers per dyno. I have log_runtime_metrics enabled, which shows the memory usage per dyno. As you can see, PumaWorkerKiller thinks I am using way more memory (546.97607421875 MB) than Heroku does (291.84 MB). I think Heroku is right. As a result, PumaWorkerKiller is killing my workers every minute.

2015-01-11T19:02:17.461789+00:00 heroku[web.1]: source=web.1 dyno=heroku.11366593.3f1e3a39-ca5a-4f1c-bb5e-f30627a8bb09 sample#memory_total=296.58MB sample#memory_rss=291.84MB sample#memory_cache=4.48MB sample#memory_swap=0.27MB sample#memory_pgpgin=4741439pages sample#memory_pgpgout=4665583pages
2015-01-11T19:02:36.015437+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 2 workers consuming total: 546.97607421875 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f7aefaf5278 @index=0, @pid=2206, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2015-01-11 19:02:32 +0000> consuming 181.73974609375 mb.

Using RuboCop?

Hi @schneems,

Thanks for your awesome work on this! 🔥

However, whilst packaging this for Debian, we faced some issues. Would you be interested in using RuboCop here?
I'd be happy to raise a PR if so! 😄

Significantly incorrect memory usage reported

Was having issues with frequent cycling on a Heroku project and noticed numbers like this in the logs (shortly after I restarted all of the web dynos):

2014-08-15T14:21:17.222622+00:00 app[web.3]: [2] PumaWorkerKiller: Consuming 429.48046875 mb with master and 2 workers
2014-08-15T14:21:18.443592+00:00 heroku[web.3]: source=web.3 dyno=heroku.15698018.ca54adb5-3f63-4006-ae9f-c0f235c53288 sample#load_avg_1m=0.00 sample#load_avg_5m=0.00
2014-08-15T14:21:18.443937+00:00 heroku[web.3]: source=web.3 dyno=heroku.15698018.ca54adb5-3f63-4006-ae9f-c0f235c53288 sample#memory_total=313.75MB sample#memory_rss=313.75MB sample#memory_cache=0.00MB sa

Basically, it seems to be vastly overestimating the amount of memory actually used. This is using the latest code from master, including get_process_mem 0.2.0

No restart of pumas with your instructions.

This is a Ruby on Rails Heroku production website. It has a "Memory Leak" and I am using puma_worker_killer to "fix" it until I can get the memory down.

I followed the provided link and redeployed the application. There was no indication of a restart of the pumas after the allotted time. I added a puts which did show at initialization, but the statement in the before_fork block is never seen again.

I see no indication in the logs. The memory, while low, is still an issue when it is used to make requests to the backend. In fact, it was throwing Heroku H14 errors; when the application goes over the limit it starts swapping and is very slow. It is actually not better.

This is the first day. I do not know what I can do to ensure this is actually working. Again, I followed the Heroku section of the provided link exactly. Then, while perusing the Git issues page, I saw the author mention that he adds a puma_worker_killer.rb file to initializers. I added that with:

require 'puma_worker_killer'
PumaWorkerKiller.enable_rolling_restart(4 * 3600)
puts "running puma killer"

This tells puma_worker_killer to restart the pumas every 4 hours. I could use 12 to do it twice a day; initially I left the default of 6 hours, but currently, for testing purposes, I made it 4 hours.

Again, there is no indication the pumas are restarted, and the memory was constant at about 50% through every 6-hour window that was supposed to restart the pumas.

When I start the server locally I see "Puma starting in single mode...".

How can I ensure this is working?
Why is this not working on Heroku?
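One hedged observation, not a maintainer answer: the "single mode" message suggests Puma is not running clustered, and puma_worker_killer reaps Puma::Cluster workers (see the "Seems not working on Puma 2.11.3" issue later in this list), so locally there would be nothing for it to restart. A sketch of forcing cluster mode in config/puma.rb, mirroring configs shown in other issues here:

# config/puma.rb (sketch) -- workers >= 1 puts Puma in cluster mode,
# which gives puma_worker_killer workers to restart.
workers Integer(ENV['WEB_CONCURRENCY'] || 2)
threads_count = Integer(ENV['MAX_THREADS'] || 5)
threads threads_count, threads_count
preload_app!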

Not seeing 'PumaWorkerKiller: Consuming XXX mb' stdout logging after SIGUSR1 signal to master pid

This is what I have in my Puma config file:

workers Etc.nprocessors
preload_app! false
prune_bundler
wait_for_less_busy_worker 
fork_worker
plugin :tmp_restart

extra_runtime_dependencies ['puma_worker_killer']

on_worker_fork do
  require 'puma_worker_killer'

  PumaWorkerKiller.config do |config|
    config.ram           = 2048 # mb
    config.frequency     = 10 # seconds
  end

  PumaWorkerKiller.start
end

before_fork do
  require 'puma_worker_killer'

  PumaWorkerKiller.config do |config|
    config.ram           = 2048 # mb
    config.frequency     = 10 # seconds
  end

  PumaWorkerKiller.start
end

Basically, after I do a phased restart (SIGUSR1), the worker killer stops logging the consumed memory.

Implications for long-running requests

I would like to try puma_worker_killer in my setup, but I am using Puma to serve very long requests (they sometimes span hours or even a couple of days). I cannot abort a running request mid-flight, so I need the killer to wait until the worker finishes its current requests, and to take into account that this can take a long time.

Is the gem usable for this sort of use case?

Not able to restart Puma Server after using Puma Worker Killer

My system uses Mina for deploy automation, and since Puma seems to be leaking memory even for such a simple application, I have set up Puma Worker Killer to help.

After the first deploy with the newly configured Puma Worker Killer, the system seemed to behave normally. However, when I made some code changes (not related to Puma or the killer) and deployed again, the system started to fail.

It seems like running mina puma:restart/stop/start no longer functions properly. Nginx seems to be pointing to the wrong things, and Nginx and Puma are not resolving to the correct binding. Puma stop keeps the socket in place, while Puma start does not work because of the existing socket.

I had to fix this by reverting the system to not use Puma Worker Killer. Did I configure the killer incorrectly? Is there any known compatibility issue?

Allow pre_term with enable_rolling_restart

Our app is dockerized, so I use the enable_rolling_restart invocation.

I also want to use the pre_term config to send an event to Datadog via its API. Based on my reading of the code (and trying it and having it not work), I don't think this is currently possible.

Seems as if it would be a quick add?

Thanks!

Cannot kill workers

I noticed that PumaWorkerKiller wasn't able to kill the workers after the memory threshold was reached. This is what I saw in the Heroku log, and it kept repeating every 5 seconds.

I, [2019-01-01T09:22:26.184981 #303]  INFO -- : Installing Puma worker loop.
2019-01-01T14:22:26.192087+00:00 app[web.3]: [5] - Worker 0 (pid: 303) booted, phase: 0
2019-01-01T14:22:26.555152+00:00 app[web.2]: [5] PumaWorkerKiller: Out of memory. 2 workers consuming total: 555.26171875 mb out of max: 501.76 mb. Sending TERM to pid 12418 consuming 182.80078125 mb.
2019-01-01T14:22:26.555466+00:00 app[web.2]: Worker #<Puma::Cluster::Worker:0x000055ccb935d578 @index=1, @pid=12418, @phase=0, @stage=:booted, @signal="TERM", @options=#<Puma::UserFileDefaultOptions:0x000055ccaf363d10 @user_options={:binds=>["tcp://0.0.0.0:39732"], :environment=>"production", :logger=>#<Puma::Events:0x000055ccaf36c500 @formatter=#<Puma::Events::PidFormatter:0x000055ccaf491c50>, @stdout=#<IO:<STDOUT>>, @stderr=#<IO:<STDERR>>, @debug=false, @hooks={:on_booted=>[]}>, :before_worker_boot=>[#<Proc:0x000055ccaf5d8eb0@./config/puma.rb:43>, #<Proc:0x000055ccb459b3a8@/app/vendor/bundle/ruby/2.3.0/gems/scout_apm-2.4.13/lib/scout_apm/server_integrations/puma.rb:28>]}, @file_options={:config_files=>["./config/puma.rb"], :min_threads=>16, :max_threads=>16, :binds=>["tcp://0.0.0.0:39732"], :environment=>"production", :workers=>2, :preload_app=>true, :before_worker_boot=>[#<Proc:0x000055ccaf5d8eb0@./config/puma.rb:43>], :before_fork=>[#<Proc:0x000055ccaf5d8e60@./config/puma.rb:75>]}, @default_options={:min_threads=>0, :max_threads=>16, :log_requests=>false, :debug=>false, :binds=>["tcp://0.0.0.0:9292"], :workers=>0, :daemon=>false, :mode=>:http, :worker_timeout=>60, :worker_boot_timeout=>60, :worker_shutdown_timeout=>30, :remote_address=>:socket, :tag=>"app", :environment=>"production", :rackup=>"config.ru", :logger=>#<IO:<STDOUT>>, :persistent_timeout=>20, :first_data_timeout=>30}>, @first_term_sent=nil, @last_checkin=2019-01-01 09:22:21 -0500, @last_status="{}", @dead=false> being killed

And this was the memory consumption metric in the meantime:

screen shot 2019-01-01 at 10 21 35 am

I'm using PumaWorkerKiller version 0.1.0 and Puma 3.12.0.

Restarting main process?

Hello,

I may be missing something totally obvious here, but:

This gem is great for mitigating memory leaks in applications using Puma. However, it only helps restart the workers, not the main process, when the main process is the one whose RAM is getting bloated.

Is it doable to restart the main process (and maybe all the workers at the same time, not sure how it works) when it is the one consuming the most memory?
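Not a maintainer answer, but for reference: Puma itself restarts workers on SIGUSR1 (phased restart) and performs a full restart of the master on SIGUSR2, so one workaround, sketched along the lines of the thread in the first issue above and assuming a full restart is acceptable for your deployment, would be:

# Sketch only (e.g. at the bottom of config/puma.rb): periodically ask the
# master for a full restart (SIGUSR2), which replaces the master and all
# workers. The interval is an assumption; a USR2 restart may need extra
# deployment setup depending on how Puma was started.
Thread.new do
  loop do
    sleep 24 * 3600
    puts "----> Sending SIGUSR2 to #{Process.pid} for full restart"
    Process.kill 'USR2', Process.pid
  end
end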

Logs shows consuming memory with master and workers

I was getting Error R14 (Memory quota exceeded) on a Heroku worker. It was suggested that I use the puma_worker_killer gem; when I use it, I get the following continuous lines in the log:
PumaWorkerKiller: Consuming 457.4599609375 mb with master and 2 workers
2016-02-10T09:27:27.797425+00:00 app[web.1]: [3] PumaWorkerKiller: Consuming 457.4599609375 mb with master and 2 workers
2016-02-10T09:27:32.876839+00:00 app[web.1]: [3] PumaWorkerKiller: Consuming 457.4599609375 mb with master and 2 workers
2016-02-10T09:27:37.918143+00:00 app[web.1]: [3] PumaWorkerKiller: Consuming 457.4638671875 mb with master and 2 workers
2016-02-10T09:27:42.967565+00:00 app[web.1]: [3] PumaWorkerKiller: Consuming 457.4638671875 mb with master and 2 workers
2016-02-10T09:27:48.056237+00:00 app[web.1]: [3] PumaWorkerKiller: Consuming 457.4638671875 mb with master and 2 workers
2016-02-10T09:27:53.137147+00:00 app[web.1]: [3] PumaWorkerKiller: Consuming 457.9677734375 mb with master and 2 workers

workers being constantly booted

Hi,

I'm actually not sure if this is expected behavior or a bug. After installing and deploying to Heroku, I see this in the logs:

2014-10-09T22:16:20.139448+00:00 app[web.1]: [2] PumaWorkerKiller: Consuming 405.138671875 mb with master and 3 workers
2014-10-09T22:16:20.505306+00:00 app[web.1]: [2] - Worker 2 (pid: 7470) booted, phase: 0
2014-10-09T22:16:25.442379+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 543.58837890625 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27d18d98 @index=2, @pid=7470, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:16:20 +0000> consuming 138.44970703125 mb.
2014-10-09T22:16:25.446976+00:00 app[web.1]: ** [NewRelic][10/09/14 22:16:25 +0000 web.1 (7470)] INFO : Starting Agent shutdown
2014-10-09T22:16:30.402719+00:00 app[web.1]: [2] - Worker 2 (pid: 7499) booted, phase: 0
2014-10-09T22:16:30.580416+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 542.96044921875 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27ec1208 @index=2, @pid=7499, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:16:30 +0000> consuming 137.82177734375 mb.
2014-10-09T22:16:30.840538+00:00 app[web.1]: ** [NewRelic][10/09/14 22:16:30 +0000 web.1 (7499)] INFO : Starting Agent shutdown
2014-10-09T22:16:35.558627+00:00 app[web.1]: [2] - Worker 2 (pid: 7528) booted, phase: 0
2014-10-09T22:16:35.844763+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 658.6083984375 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b28103570 @index=2, @pid=7528, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:16:35 +0000> consuming 254.0869140625 mb.
2014-10-09T22:16:35.849442+00:00 app[web.1]: ** [NewRelic][10/09/14 22:16:35 +0000 web.1 (7528)] INFO : Starting Agent shutdown
2014-10-09T22:16:36+00:00 app[heroku-postgres]: source=HEROKU_POSTGRESQL_GREEN sample#current_transaction=21079 sample#db_size=16181432bytes sample#tables=13 sample#active-connections=10 sample#waiting-connections=0 sample#index-cache-hit-rate=0.99897 sample#table-cache-hit-rate=0.99898 sample#load-avg-1m=0.165 sample#load-avg-5m=0.205 sample#load-avg-15m=0.195 sample#read-iops=0 sample#write-iops=21.866 sample#memory-total=15405620kB sample#memory-free=273008kB sample#memory-cached=14363932kB sample#memory-postgres=180596kB
2014-10-09T22:16:40.393950+00:00 app[web.1]: [2] - Worker 2 (pid: 7557) booted, phase: 0
2014-10-09T22:16:40.962368+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 532.9091796875 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b280d7f10 @index=0, @pid=8352, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:16:40 +0000> consuming 131.185546875 mb.
2014-10-09T22:16:40.965767+00:00 app[web.1]: ** [NewRelic][10/09/14 22:16:40 +0000 web.1 (8352)] INFO : Starting Agent shutdown
2014-10-09T22:16:45.663400+00:00 app[web.1]: [2] - Worker 0 (pid: 7586) booted, phase: 0
2014-10-09T22:16:46.129866+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 539.349609375 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b277dfa98 @index=0, @pid=7586, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:16:45 +0000> consuming 137.6259765625 mb.
2014-10-09T22:16:46.135289+00:00 app[web.1]: ** [NewRelic][10/09/14 22:16:46 +0000 web.1 (7586)] INFO : Starting Agent shutdown
2014-10-09T22:16:50.695309+00:00 app[web.1]: [2] - Worker 0 (pid: 7615) booted, phase: 0
2014-10-09T22:16:51.235873+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 539.384765625 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27b023d8 @index=0, @pid=7615, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:16:50 +0000> consuming 137.6376953125 mb.
2014-10-09T22:16:51.240065+00:00 app[web.1]: ** [NewRelic][10/09/14 22:16:51 +0000 web.1 (7615)] INFO : Starting Agent shutdown
2014-10-09T22:16:55.663499+00:00 app[web.1]: [2] - Worker 0 (pid: 7644) booted, phase: 0
2014-10-09T22:16:56.354895+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 539.36865234375 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27ca0988 @index=0, @pid=7644, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:16:55 +0000> consuming 137.62158203125 mb.
2014-10-09T22:16:56.363089+00:00 app[web.1]: ** [NewRelic][10/09/14 22:16:56 +0000 web.1 (7644)] INFO : Starting Agent shutdown
2014-10-09T22:17:00.725144+00:00 app[web.1]: [2] - Worker 0 (pid: 7673) booted, phase: 0
2014-10-09T22:17:01.486460+00:00 app[web.1]: ** [NewRelic][10/09/14 22:17:01 +0000 web.1 (7673)] INFO : Starting Agent shutdown
2014-10-09T22:17:01.482843+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 539.376953125 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27e450b8 @index=0, @pid=7673, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:17:00 +0000> consuming 137.6298828125 mb.
2014-10-09T22:17:05.693166+00:00 app[web.1]: [2] - Worker 0 (pid: 7702) booted, phase: 0
2014-10-09T22:17:06.722113+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 539.423828125 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27fe7a60 @index=0, @pid=7702, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:17:05 +0000> consuming 137.6767578125 mb.
2014-10-09T22:17:06.726774+00:00 app[web.1]: ** [NewRelic][10/09/14 22:17:06 +0000 web.1 (7702)] INFO : Starting Agent shutdown
2014-10-09T22:17:10.747406+00:00 app[web.1]: [2] - Worker 0 (pid: 7735) booted, phase: 0
2014-10-09T22:17:12.192350+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 539.958984375 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b2444f840 @index=0, @pid=7735, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:17:10 +0000> consuming 137.9345703125 mb.
2014-10-09T22:17:12.244185+00:00 app[web.1]: ** [NewRelic][10/09/14 22:17:12 +0000 web.1 (7735)] INFO : Starting Agent shutdown
2014-10-09T22:17:17.374694+00:00 app[web.1]: [2] PumaWorkerKiller: Consuming 402.0244140625 mb with master and 3 workers
2014-10-09T22:17:19.637200+00:00 app[web.1]: [2] - Worker 0 (pid: 7769) booted, phase: 0
2014-10-09T22:17:22.567538+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 540.060546875 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b279f4d38 @index=0, @pid=7769, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:17:19 +0000> consuming 138.0361328125 mb.
2014-10-09T22:17:22.573370+00:00 app[web.1]: ** [NewRelic][10/09/14 22:17:22 +0000 web.1 (7769)] INFO : Starting Agent shutdown
2014-10-09T22:17:24.660573+00:00 app[web.1]: [2] - Worker 0 (pid: 7798) booted, phase: 0
2014-10-09T22:17:27.684724+00:00 app[web.1]: ** [NewRelic][10/09/14 22:17:27 +0000 web.1 (7798)] INFO : Starting Agent shutdown
2014-10-09T22:17:27.681165+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 540.08349609375 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27ba7a68 @index=0, @pid=7798, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:17:24 +0000> consuming 138.05908203125 mb.
2014-10-09T22:17:27+00:00 app[heroku-postgres]: source=HEROKU_POSTGRESQL_GREEN sample#current_transaction=21079 sample#db_size=16181432bytes sample#tables=13 sample#active-connections=8 sample#waiting-connections=0 sample#index-cache-hit-rate=0.99897 sample#table-cache-hit-rate=0.99898 sample#load-avg-1m=0.14 sample#load-avg-5m=0.185 sample#load-avg-15m=0.19 sample#read-iops=0 sample#write-iops=40.972 sample#memory-total=15405620kB sample#memory-free=277604kB sample#memory-cached=14363956kB sample#memory-postgres=174492kB
2014-10-09T22:17:30.665951+00:00 app[web.1]: [2] - Worker 0 (pid: 7827) booted, phase: 0
2014-10-09T22:17:32.803448+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 540.03466796875 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27d460b8 @index=0, @pid=7827, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:17:30 +0000> consuming 138.01025390625 mb.
2014-10-09T22:17:32.806795+00:00 app[web.1]: ** [NewRelic][10/09/14 22:17:32 +0000 web.1 (7827)] INFO : Starting Agent shutdown
2014-10-09T22:17:35.710045+00:00 app[web.1]: [2] - Worker 0 (pid: 7856) booted, phase: 0
2014-10-09T22:17:37.943205+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 540.119140625 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27eebeb8 @index=0, @pid=7856, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:17:35 +0000> consuming 138.0947265625 mb.
2014-10-09T22:17:37.947882+00:00 app[web.1]: ** [NewRelic][10/09/14 22:17:37 +0000 web.1 (7856)] INFO : Starting Agent shutdown
2014-10-09T22:17:40.981570+00:00 app[web.1]: [2] - Worker 0 (pid: 7885) booted, phase: 0
2014-10-09T22:17:43.287548+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 540.416015625 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b28129b08 @index=0, @pid=7885, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:17:40 +0000> consuming 138.0556640625 mb.
2014-10-09T22:17:43.292923+00:00 app[web.1]: ** [NewRelic][10/09/14 22:17:43 +0000 web.1 (7885)] INFO : Starting Agent shutdown
2014-10-09T22:17:48.454556+00:00 app[web.1]: [2] PumaWorkerKiller: Consuming 402.3603515625 mb with master and 3 workers
2014-10-09T22:17:49.543304+00:00 app[web.1]: [2] - Worker 0 (pid: 7915) booted, phase: 0
2014-10-09T22:17:53.595892+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 540.626953125 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b2770df70 @index=0, @pid=7915, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:17:49 +0000> consuming 138.2666015625 mb.
2014-10-09T22:17:53.600860+00:00 app[web.1]: ** [NewRelic][10/09/14 22:17:53 +0000 web.1 (7915)] INFO : Starting Agent shutdown
2014-10-09T22:17:54.648972+00:00 app[web.1]: [2] - Worker 0 (pid: 7944) booted, phase: 0
2014-10-09T22:17:58.723879+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 540.59912109375 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27ac36b0 @index=0, @pid=7944, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:17:54 +0000> consuming 138.23876953125 mb.
2014-10-09T22:17:58.727662+00:00 app[web.1]: ** [NewRelic][10/09/14 22:17:58 +0000 web.1 (7944)] INFO : Starting Agent shutdown
2014-10-09T22:17:59.712031+00:00 app[web.1]: [2] - Worker 0 (pid: 7973) booted, phase: 0
2014-10-09T22:18:03.854347+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 540.61083984375 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27c62228 @index=0, @pid=7973, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:17:59 +0000> consuming 138.25048828125 mb.
2014-10-09T22:18:03.860066+00:00 app[web.1]: ** [NewRelic][10/09/14 22:18:03 +0000 web.1 (7973)] INFO : Starting Agent shutdown
2014-10-09T22:18:09.047095+00:00 app[web.1]: [2] PumaWorkerKiller: Consuming 402.3603515625 mb with master and 3 workers
2014-10-09T22:18:09.581353+00:00 app[web.1]: [2] - Worker 0 (pid: 8003) booted, phase: 0
2014-10-09T22:18:14.495266+00:00 app[web.1]: [2] PumaWorkerKiller: Out of memory. 3 workers consuming total: 540.853515625 mb out of max: 501.76 mb. Sending TERM to #<Puma::Cluster::Worker:0x007f6b27f39398 @index=0, @pid=8003, @phase=0, @stage=:booted, @signal="TERM", @last_checkin=2014-10-09 22:18:09 +0000> consuming 138.2236328125 mb.
2014-10-09T22:18:14.499914+00:00 app[web.1]: ** [NewRelic][10/09/14 22:18:14 +0000 web.1 (8003)] INFO : Starting Agent shutdown
2014-10-09T22:18:20.149909+00:00 app[web.1]: [2] PumaWorkerKiller: Consuming 402.6298828125 mb with master and 3 workers
2014-10-09T22:18:20.696828+00:00 app[web.1]: [2] - Worker 0 (pid: 8041) booted, phase: 0

It looks like workers are being identified as out of memory and sent a TERM, but RAM usage never recedes.

This is on a relatively unused install with two 1X Dynos running puma.

I've tried running this on both ruby 2.1.2 (and 2.1.3) and now 2.0.0 with the same result. Thoughts?

Continued cycling/killing of puma workers (wrong ones?)

Running 5 puma workers on a 2X heroku dyno, we encountered a situation when the memory ran out (after many hours) where puma_worker_killer was continually killing and cycling workers with indexes 1-3. The memory was not dropping to an acceptable level, and puma_worker_killer did not try sending a TERM to the other workers. My best guess is that the reported memory usage for the individual workers was wrong (i.e. largest_worker was wrong) and so it was killing the wrong ones. This has happened multiple times for us.

It seems the potential mitigating solutions are:

  • only run on 1X dynos with fewer workers (but we've also seen this here)
  • add an option, similar to unicorn_worker_killer, allowing terminating a worker after a certain number of requests
  • store information about recently terminated workers and consider trying to have it terminate a different worker than largest_worker, especially if the previous attempt didn't have success, or the previous attempt was for largest_worker

Is there a way to specify a specific dyno that gets killed?

I have 10 or so separate queues, each with its own dynos, so I can allocate resources according to the memory needs of each queue. I want to use puma_worker_killer to kill a dyno in a specific queue when it goes above 98% memory usage. However, there doesn't seem to be a way to say "only apply these worker killer settings to X worker/dyno/queue".

Is there a way to do that?
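There doesn't appear to be a per-queue option in the gem itself; a hedged sketch of gating the setup by process type using Heroku's DYNO environment variable (the process-type name below is hypothetical and would need to match your Procfile):

# config/puma.rb sketch: only start the killer on dynos whose process type
# matches a given prefix.
before_fork do
  dyno = ENV['DYNO'].to_s # e.g. "web.1"; "highmem_queue" below is hypothetical
  if dyno.start_with?('highmem_queue.')
    require 'puma_worker_killer'
    PumaWorkerKiller.config do |config|
      config.ram           = 1024 # mb (assumed)
      config.percent_usage = 0.98
    end
    PumaWorkerKiller.start
  end
end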

Unable to get it working with 3.12 / Rails 5.2

Hi @schneems,

I've spent some time debugging, and I'm sure there is something obvious that I've missed, but I am unable to figure it out.

I don't see the PumaWorkerKiller message on booting the app. Here's what I see:

2018-07-27T05:16:45.673728+00:00 app[web.1]: [4] Puma starting in cluster mode...
2018-07-27T05:16:45.673787+00:00 app[web.1]: [4] * Version 3.12.0 (ruby 2.5.1-p57), codename: Llamas in Pajamas
2018-07-27T05:16:45.673790+00:00 app[web.1]: [4] * Min threads: 5, max threads: 5
2018-07-27T05:16:45.673792+00:00 app[web.1]: [4] * Environment: production
2018-07-27T05:16:45.673794+00:00 app[web.1]: [4] * Process workers: 2
2018-07-27T05:16:45.673795+00:00 app[web.1]: [4] * Preloading application
2018-07-27T05:16:50.800023+00:00 app[web.1]: [4] * Listening on tcp://0.0.0.0:10965
2018-07-27T05:16:50.800204+00:00 app[web.1]: [4] ! WARNING: Detected 1 Thread(s) started in app boot:
2018-07-27T05:16:50.800265+00:00 app[web.1]: [4] ! #<Thread:0x0000000004d34b00@/app/vendor/bundle/ruby/2.5.0/gems/activerecord-5.2.0/lib/active_record/connection_adapters/abstract/connection_pool.rb:297 sleep> - /app/vendor/bundle/ruby/2.5.0/gems/activerecord-5.2.0/lib/active_record/connection_adapters/abstract/connection_pool.rb:299:in `sleep'
2018-07-27T05:16:50.800355+00:00 app[web.1]: [4] Use Ctrl-C to stop
2018-07-27T05:16:50.839300+00:00 app[web.1]: [4] - Worker 0 (pid: 9) booted, phase: 0
2018-07-27T05:16:50.842525+00:00 app[web.1]: [4] - Worker 1 (pid: 13) booted, phase: 0
2018-07-27T05:16:51.024199+00:00 heroku[web.1]: State changed from starting to up

Here is what the relevant section of my puma.rb section looks like:

preload_app!

# If you are preloading your application and using Active Record, it's
# recommended that you close any connections to the database before workers
# are forked to prevent connection leakage.
#
before_fork do
  ActiveRecord::Base.connection_pool.disconnect! if defined?(ActiveRecord)
  
  # Restart Puma workers on a rolling basis
  require 'puma_worker_killer'
  PumaWorkerKiller.enable_rolling_restart # Default is every 6 hours
end

Any thoughts on what might be missing?

Thank you!

Callback for tracking calculated memory usage per reap attempt

In a similar vein to the pre_term lambda callback, would you accept a PR for a callback that runs on every reaper execution? My use case is that I want to record PumaWorkerKiller's reported memory usage to a StatsD server (Datadog specifically), so recording these values would require a callback that executes on every reap attempt rather than only when the reaper terminates a process.

If so, I think I see what's going on with the codebase and testing strategy, but is there anything else you would need before such a PR would be accepted?

Seems not working on Puma 2.11.3

I am running Rails 4.2.1 on Puma 2.11.3 on Ruby 2.0.0-p643. I am using cluster mode, with 8 workers on each server.

But memory usage still increases and eventually eats up all my memory. The ram and percent_usage settings in the configuration both refer to the total RAM on my server, right?

I added some logs and found that the list of workers fetched from Puma::Cluster is empty: https://github.com/schneems/puma_worker_killer/blob/master/lib/puma_worker_killer/puma_memory.rb#L67

But I think I've configured it to run in cluster mode, right?

in puma.rb

workers 8

and

$ ps -ef | grep puma
deployer   923  5685  2 17:14 ?        00:00:10 puma: cluster worker 0: 5685 [maleskine]
deployer   963  5685  2 17:14 ?        00:00:10 puma: cluster worker 1: 5685 [maleskine]
deployer  1064 29862  0 17:21 pts/2    00:00:00 grep --color=auto puma
deployer  5685     1  0 11:01 ?        00:00:02 puma 2.11.3 (unix:///tmp/maleskine.sock) [maleskine]

Is there anything I've done wrong?

puma 5, rails s error

(Excerpt from the macOS crash report's memory map; the full dump lists the Ruby 2.7.2 arm64-darwin20 encoding bundles, the ffi 1.13.1 native extension, and then a long run of anonymous regions.)

105410000-105414000 rw- /Users/xx/.rvm/rubies/ruby-2.7.2/lib/ruby/2.7.0/arm64-darwin20/enc/iso_8859_9.bundle
10541c000-105420000 r-x /Users/xx/.rvm/rubies/ruby-2.7.2/lib/ruby/2.7.0/arm64-darwin20/enc/shift_jis.bundle
10546c000-105484000 r-x /Users/xx/.rvm/gems/ruby-2.7.2/gems/ffi-1.13.1/lib/ffi_c.bundle
[... memory map continues for many more lines ...]
[IMPORTANT]
Don't forget to include the Crash Report log file under
DiagnosticReports directory in bug reports.

Possible issue with leaving open database connections

Ref. Heroku support ticket: https://support.heroku.com/tickets/238155

This app was running with config.frequency = 60, and each of their backends was using about 200 MB of RAM on the Postgres server, leading to a graph that looks like this:

image

The marker is when they installed pgbouncer, which effectively insulates them from this problem by limiting the maximum number of connections coming from any one dyno.

I can't prove it without access to their logs, but I suspect their workers are getting TERMed, then aggressively KILLed—and the KILL leaks the database connection by not explicitly closing it. (I'm working on a POC for this soon.)

If I'm right, this problem also exists with unicorn_worker_killer.

Rolling restarts seem extremely inconsistent

Been noticing this going back to PWK 0.0.4 & Puma 2.x, but currently running PWK 0.0.5 & Puma 3.0.2

config/puma.rb

preload_app!

threads (ENV['THREADS'] || 2), (ENV['THREADS'] || 2)
workers (ENV['WORKERS'] || 2)

port        ENV['PORT'] || 3000
environment ENV['RACK_ENV'] || ENV['RAILS_ENV'] || 'development'

on_worker_boot do
  ActiveRecord::Base.establish_connection
end

config/initializers/puma_worker_killer.rb

PumaWorkerKiller.enable_rolling_restart if Rails.env.production?

Procfile

web: bundle exec puma -C config/puma.rb

App is on Heroku. I'll get a few restarts in a row, but, as I understand, it should be killing some workers every 60s. Logs from the last 6 hours, filtered for PWK entries:
screenshot 2016-03-02 12 47 33

Rolling restart uses pids that were killed instead of new pids

Here is the related puma_worker_killer section from our puma config:

before_fork do
  ActiveRecord::Base.connection_pool.disconnect!
  unless Rails.env.development?
    PumaWorkerKiller.config do |config|
      config.ram           = 512 # mb
      config.frequency     = 10    # seconds
      config.percent_usage = 0.98
      config.rolling_restart_frequency = 6 * 3600 # 6 hours in seconds
    end
    PumaWorkerKiller.start
  end
end

Just for information's sake, I am running Puma with 2 workers, each with min 1 and max 5 threads.

I found that the Rolling Restart feature was sending TERM to a pid that was already killed due to Out of memory, instead of the newly spawned pid.

Here is the related excerpt from our logs:

00:51:49 [15] PumaWorkerKiller: Out of memory. 2 workers consuming total: 502.4765625 mb out of max: 501.76 mb. Sending TERM to pid 3047 consuming 177.96484375 mb.
...........................
03:47:37 [15] PumaWorkerKiller: Rolling Restart. 2 workers consuming total: 493.609375 mb out of max: mb. Sending TERM to pid 3047.

Here is the related kill/boot-up excerpt:

00:51:49 [15] PumaWorkerKiller: Out of memory. 2 workers consuming total: 502.4765625 mb out of max: 501.76 mb. Sending TERM to pid 3047 consuming 177.96484375 mb.

00:51:49 [15] - Worker 1 (pid: 3940) booted, phase: 0

00:51:59 [15] PumaWorkerKiller: Consuming 456.82421875 mb with master and 2 workers.

So from what I see, the Rolling Restart should have actually killed pid 3940 instead of 3047.
