Comments (15)

sethvargo avatar sethvargo commented on September 25, 2024

@dlethin could you please post your solo.rb and solo.json?

from chef-dk.

dlethin avatar dlethin commented on September 25, 2024
root = File.absolute_path(File.dirname(__FILE__))
file_cache_path root
cookbook_path root + '/berks-cookbooks'
file_backup_path root + "backup"
ssl_verify_mode :verify_peer

Here is my solo.json file:

{
   "qbms_refapp" : {
      "app_classpath_dir" : "/Users/dlethin/bld/scratch/chef/myapp/build/config"
   },
   "run_list": [ "recipe[qbms-refapp::basic]" ]
}

One other interesting note: this appears to be network related in some way. Someone suggested disabling the wireless connection; after doing that, the problem went away, and when wireless is turned back on, it hangs again.

And by hangs, I should elaborate that it takes over my mac. I can use my mouse to select other windows, but I can't seem to type anything. I have to wait 4-5 minutes for chef-solo to unhang before control returns to the laptop.

For the person on our team that consistently has this problem, he's working to remove all things chef-related and will try to install chef-dk again and see if the problem goes away or at least becomes intermittent.

Thanks.

sethvargo avatar sethvargo commented on September 25, 2024

@dlethin how big are the folders in question? Like - how big is berks-cookbooks, backup, and file_cache_path?

dlethin avatar dlethin commented on September 25, 2024

backup and file_cache_path are empty. As for how big the berks-cookbooks dir is, it's not very big:

du -sk berks-cookbooks
404 berks-cookbooks

We've been able to isolate this problem specifically to the ohai run. If we run ohai directly with debug, the problem exists there:

/opt/chefdk/embedded/bin/ruby ohai -l debug

Looking at the debug output, here are the last few lines that show where all the time is going -

[2014-06-23T16:16:01-04:00] DEBUG: /opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/ohai-7.0.2/lib/ohai/application.rb:70:in `run'
[2014-06-23T16:16:01-04:00] DEBUG: ohai:51:in `<main>'
[2014-06-23T16:16:01-04:00] DEBUG: No data to collect for plugin Virtualization. Continuing...
[2014-06-23T16:16:01-04:00] DEBUG: No data to collect for plugin Initpackage. Continuing...
[2014-06-23T16:16:01-04:00] DEBUG: ip_scopes: cannot load gem, plugin disabled: cannot load such file -- ipaddr_extensions
[2014-06-23T16:16:01-04:00] DEBUG: No data to collect for plugin Blockdevice. Continuing...
[2014-06-23T16:20:17-04:00] DEBUG: No data to collect for plugin Zpools. Continuing...

Something is happening between the last two lines that is eating up 4 minutes. I would think disabling Zpools plugin might do the trick, but can't quite figure out how to do that when using the ohai command line directly.
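
For longer debug logs, the gap can be located mechanically instead of by eye. The following is a minimal sketch, assuming every relevant line starts with an ISO 8601 timestamp in square brackets as in the output above; `largest_gap` is a hypothetical helper name, not part of ohai.

```ruby
require "time"

# Sample lines copied from the debug output above.
LOG = <<~LOG
  [2014-06-23T16:16:01-04:00] DEBUG: No data to collect for plugin Initpackage. Continuing...
  [2014-06-23T16:16:01-04:00] DEBUG: No data to collect for plugin Blockdevice. Continuing...
  [2014-06-23T16:20:17-04:00] DEBUG: No data to collect for plugin Zpools. Continuing...
LOG

# Returns [gap_in_seconds, line_before, line_after] for the largest gap
# between consecutive timestamped lines. Lines without a leading
# [timestamp] are ignored.
def largest_gap(text)
  stamped = text.each_line.filter_map do |line|
    m = line.match(/\A\[(\S+)\]/)
    [Time.iso8601(m[1]), line.chomp] if m
  end
  stamped.each_cons(2)
         .map { |(t1, l1), (t2, l2)| [t2 - t1, l1, l2] }
         .max_by(&:first)
end

gap, before, after = largest_gap(LOG)
puts "#{gap.to_i}s between:\n  #{before}\n  #{after}"
```

Note that this only shows between which two log lines the time went; it does not show which plugin was running in between, since plugins that do collect data print nothing at this log level.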

dlethin avatar dlethin commented on September 25, 2024

I continue hacking away at this, learning a bit along the way. I backed up and edited the file:

/opt/chefdk/embedded/lib/ruby/gems/2.1.0/gems/ohai-7.0.2/lib/ohai/runner.rb, modifying the run_plugin method to look like this:

# Runs plugins and any un-run dependencies.
# If force is set to true, then this plugin and its dependencies
# will be run even if they have been run before.
def run_plugin(plugin)
  unless plugin.kind_of?(Ohai::DSL::Plugin)
    raise Ohai::Exceptions::InvalidPlugin, "Invalid plugin #{plugin} (must be an Ohai::DSL::Plugin or subclass)"
  end

  # force zpools to be skipped
  if ("#{plugin.name}" == "Zpools")
    Ohai::Log.debug("Skipping disabled plugin #{plugin.name}")
    return false
  end

  if Ohai::Config[:disabled_plugins].include?(plugin.name)
    Ohai::Log.debug("Skipping disabled plugin #{plugin.name}")
    return false
  end

...

Then I ran ohai again, and it seems the Zpools plugin was indeed "skipped", but it made no difference: there was still a significant gap between the execution of the Blockdevice plugin and the skipping of Zpools, as shown in the output --

[2014-06-24T10:55:26-04:00] DEBUG: ohai:51:in `<main>'
[2014-06-24T10:55:26-04:00] DEBUG: No data to collect for plugin Virtualization. Continuing...
[2014-06-24T10:55:26-04:00] DEBUG: No data to collect for plugin Initpackage. Continuing...
[2014-06-24T10:55:26-04:00] DEBUG: ip_scopes: cannot load gem, plugin disabled: cannot load such file -- ipaddr_extensions
[2014-06-24T10:55:26-04:00] DEBUG: No data to collect for plugin Blockdevice. Continuing...
[2014-06-24T10:59:58-04:00] DEBUG: Skipping disabled plugin Zpools

I guess my next strategy is going to be to continue to figure out how ohai works and manually modify the code perhaps to inject more debug logging statements to see if I can pinpoint where this excessive block of time is.

Any pointers or best guesses would be greatly appreciated.

dlethin avatar dlethin commented on September 25, 2024

So, while I'm trying to troubleshoot ohai, there are a few things I don't understand. I can see from my output that the Zpools plugin is run (unless I inject code to skip it), but I see that the zpools.rb class is in a "solaris2" directory.

Why, when I'm running chef/ohai on an OS X Mavericks machine, would it attempt to run plugins presumably specific to Solaris?

UncleTallest avatar UncleTallest commented on September 25, 2024

ZFS is originally a Solaris product and, as such, when ported by the good folks at FreeBSD kept that solaris2 identifier at the kernel level. As far as I can tell, that should be entirely transparent to both Ohai and the user. If you aren't using a zfs filesystem you should be able to safely skip the plugin.

mcquin avatar mcquin commented on September 25, 2024

A few notes for you...

In the future, if you want to disable plugins, you can include the line Ohai::Config[:disabled_plugins] = [:Zpools] in your config file. The plugin will be loaded, but it will not be run. Due to a bug in your version of Ohai, if you want to disable a plugin with a :CamelCase name you'll have to downcase everything after the first letter.
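
As a small illustration of the downcasing workaround, here is a sketch that builds the disabled list from names written in their normal form. The names in `wanted` are only examples, and the final assignment is left as a comment so the snippet runs without ohai loaded.

```ruby
# Plugin names as they would normally be written (examples only).
wanted = [:Zpools, :BlockDevice, :Passwd]

# Workaround for the bug described above: keep the first letter and
# downcase the rest. String#capitalize does exactly that.
disabled = wanted.map { |name| name.to_s.capitalize.to_sym }

p disabled # prints [:Zpools, :Blockdevice, :Passwd]

# In a real config file the result would then be assigned like this:
# Ohai::Config[:disabled_plugins] = disabled
```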

Ohai works in two stages: first, plugin directories are searched recursively and anything that looks like a plugin is loaded into a Ohai::DSL::Plugin class; second, each plugin is run and is skipped if it is listed as a disabled plugin, or there is no collect_data block corresponding to the system Ohai is running on. The message "No data to collect for plugin #{plugin.name}. Continuing..." indicates there is no applicable collect_data block, and Ohai moves on.
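
The run stage can be pictured with a toy model. This is a sketch only: `ToyPlugin` and `run_all` are hypothetical names and do not reflect Ohai's real classes or method signatures. It shows why a Solaris-only plugin still appears in the log on another platform: it is loaded everywhere, but has no collector for the current system.

```ruby
# Toy model only; not Ohai's API.
# collectors maps a platform symbol to a callable that gathers data.
ToyPlugin = Struct.new(:name, :collectors)

def run_all(plugins, platform, disabled = [])
  plugins.each_with_object({}) do |plugin, results|
    if disabled.include?(plugin.name)
      results[plugin.name] = :skipped_disabled
    elsif (collector = plugin.collectors[platform])
      results[plugin.name] = collector.call
    else
      # Loaded, but nothing applies to this platform.
      results[plugin.name] = :no_data_to_collect
    end
  end
end

plugins = [
  ToyPlugin.new("Zpools",   { solaris2: -> { "pool data" } }),
  ToyPlugin.new("Hostname", { darwin:   -> { "example-host" } }),
  ToyPlugin.new("Passwd",   { darwin:   -> { "user data" } }),
]

results = run_all(plugins, :darwin, ["Passwd"])
results.each { |name, outcome| puts "#{name}: #{outcome}" }
```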

Now, it's likely that it's not the Zpools plugin that's hanging, but something else running between it and the BlockDevice plugin. You may want to start by adding Ohai::Log.debug("Running #{self.name}") here. You should be able to visually identify the plugin that is running slowly.
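
The idea behind that logging line can be sketched outside of Ohai: announce each plugin before it runs and time it. The plugin names and bodies below are hypothetical stand-ins (plain lambdas), not Ohai's plugin objects; "Hidden" plays the role of a plugin that collects data and therefore prints nothing in the default debug output.

```ruby
require "benchmark"

# Hypothetical stand-ins for plugins: name => callable. Not Ohai's API.
plugins = {
  "Blockdevice" => -> { :nothing },
  "Hidden"      => -> { sleep 0.2 }, # simulated slow plugin
  "Zpools"      => -> { :nothing },
}

# Print the name before running, then record how long each body took.
timings = plugins.map do |name, body|
  puts "Running #{name}"
  [name, Benchmark.realtime { body.call }]
end

slowest, seconds = timings.max_by { |_, t| t }
puts format("Slowest: %s (%.2fs)", slowest, seconds)
```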

Hope this is helpful. :)

dlethin avatar dlethin commented on September 25, 2024

Thanks. The thing I didn't understand was: when running the ohai command line directly, where can you put the Ohai::Config[:disabled_plugins] = [:Zpools] line?

Also, I arrived at the same conclusion you did, adding a debug log to print each plugin when it's run. Doing that was spot on. It turns out that by default ohai does not print the plugins it's running; it only prints an entry for plugins that have no data to collect. So in my case the problem was not with the Blockdevice plugin or the Zpools plugin. It turns out there was a problem with the Passwd plugin. Disabling that plugin got us over the hump.

I may submit an issue or enhancement request to the ohai project to print a debug statement when a plugin is run -- that would have saved us a tremendous amount of time in narrowing down what the problem was, although it was a good learning experience too, as I'm new to ohai, chef, and ruby.

I guess the next step is to continue troubleshooting the Passwd plugin -- possibly adding some additional logging statements to see what's taking so long with it.

That being said, I'm not sure this is an issue per se with chef-dk, so I'm not sure it makes sense to continue to track this as an issue here...

But if anyone has any thoughts on further troubleshooting the Passwd plugin, it would be much appreciated.

Cheers.

Doug

mcquin avatar mcquin commented on September 25, 2024

If your /etc/passwd file is large, it can take some time to run the Passwd plugin. Perhaps this is what you're encountering?

arionfx avatar arionfx commented on September 25, 2024

I modified the ohai Passwd plugin like below to output more debug info:

unless etc
  etc Mash.new

  etc[:passwd] = Mash.new
  etc[:group] = Mash.new

  Ohai::Log.debug("Begin access Etc.passwd!")
  Etc.passwd do |entry|
    user_passwd_entry = Mash.new(:dir => entry.dir, :gid => entry.gid, :uid => entry.uid, :shell => entry.shell, :gecos => entry.gecos)
    user_passwd_entry.each_value {|v| fix_encoding(v)}
    etc[:passwd][fix_encoding(entry.name)] = user_passwd_entry
  end
  Ohai::Log.debug("End access Etc.passwd!")

  Ohai::Log.debug("Begin access Etc.group!")
  Etc.group do |entry|
    group_entry = Mash.new(:gid => entry.gid,
                           :members => entry.mem.map {|u| fix_encoding(u)})

    etc[:group][fix_encoding(entry.name)] = group_entry
  end
  Ohai::Log.debug("End access Etc.group!")
end

And the output looks like this:

[2014-06-25T17:15:46-04:00] DEBUG: Begin access Etc.passwd!
[2014-06-25T17:15:46-04:00] DEBUG: End access Etc.passwd!
[2014-06-25T17:15:46-04:00] DEBUG: Begin access Etc.group!
[2014-06-25T17:20:23-04:00] DEBUG: End access Etc.group!

It looks like looping Etc.group creates the big time gap. And there are 86 lines in /etc/passwd and 109 lines in /etc/group.
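
The same measurement can be reproduced without touching the plugin. This is a minimal sketch using Ruby's standard `etc` and `benchmark` libraries on a Unix-like system; it only counts entries and does no encoding fixes, so the timings cover the enumeration itself.

```ruby
require "etc"
require "benchmark"

passwd_count = 0
group_count = 0

# Enumerate every entry the system returns, the same way the plugin does,
# but do nothing with each entry except count it.
passwd_secs = Benchmark.realtime { Etc.passwd { |_entry| passwd_count += 1 } }
group_secs  = Benchmark.realtime { Etc.group  { |_entry| group_count += 1 } }

puts format("Etc.passwd: %d entries in %.2fs", passwd_count, passwd_secs)
puts format("Etc.group:  %d entries in %.2fs", group_count, group_secs)
```

If the group count is much larger than the number of lines in /etc/group, the entries are likely coming from somewhere other than the local file.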

One question: how can we avoid this long loop?

Thanks.

dlethin avatar dlethin commented on September 25, 2024

It looks like, as we peel the onion, we'll have to try to find the source for the Etc.group method to see what is happening here. I was having a hard time tracking down the source code for that, however.

One thing to note is that when the network cable is unplugged and wireless settings are turned off, we breeze past this plugin with no problems. So it seems there is a network component to the issue we are running into. One thought is that perhaps there is an LDAP or Active Directory component at play here? Perhaps we are querying some group and its membership, and perhaps it has to process the thousands of employees at our company? We can't know for sure what's going on here until we find where the source code for Etc is and see if we can continue down our path of injecting additional logging to track down what's happening.
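
One rough way to test the directory-service hypothesis is to compare the group names in the local flat file with the names `Etc.group` actually returns. This is a sketch under assumptions: `local_group_names` is a hypothetical helper, the file is assumed to be at /etc/group, and on OS X local groups may also live outside that file, so a difference is a hint rather than proof of LDAP or Active Directory involvement.

```ruby
require "etc"

# Group names from the local flat file, ignoring comments and blank lines.
def local_group_names(path = "/etc/group")
  return [] unless File.readable?(path)

  File.readlines(path)
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
      .map { |line| line.split(":").first }
end

# Group names as resolved by the system's group database lookup.
resolved = []
Etc.group { |group| resolved << group.name }
resolved.uniq!

local = local_group_names
extra = resolved - local

puts "local file: #{local.size}, via Etc.group: #{resolved.size}"
puts "not in local file (first 10): #{extra.first(10).inspect}"
```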

Thanks again for everyone's direction chasing this down.

mcquin avatar mcquin commented on September 25, 2024

Ah, LDAP does cause problems. Check out @stevendanna's mini password plugin, perhaps it will help you. http://stevendanna.github.io/blog/2013/04/13/passwd-min-ohai-plugin/.

It will eventually need to be updated to version 7 syntax, but we're still supporting version 6, so you won't need to make changes right away.

dlethin avatar dlethin commented on September 25, 2024

I would assume this is indeed related to the cause. What's strange is that, theoretically, my colleague and I are supposed to have the same corporate laptop configuration, so it's not clear why the Passwd plugin consistently runs slow on his laptop, whereas it's intermittently slow for me and not slow at all on another colleague's laptop. I would guess there is some hidden difference in the LDAP configuration on our laptops that we are not aware of.

Anyway, for the recipes we will be running locally on our OS X boxes using Chef-DK, we don't need password information from ohai, so I'm pleased that it's as simple as turning off the plugin.

Thanks for everyone's help troubleshooting this. I think at this point, we have enough info to suppose this is not an issue specific to ChefDK so I will close the issue.

Thanks also for providing, developing, and supporting ChefDK. Our team is just starting to come on board with Chef, and ChefDK specifically has the potential to meet one of our requirements for ease of installation for a certain class of users in our environment who need to run recipes locally. We look forward to seeing this running on Windows boxes as well.

Cheers

Doug Lethin.

kevgrig avatar kevgrig commented on September 25, 2024

Thank you for the detailed investigation log. I hit a similar problem. I added the following line to the top of run_plugin in ohai/dsl/plugin/versionvii.rb:

Ohai::Log.warn("Running plugin #{self.name}")

This showed that the delay was in the GCE plugin. This plugin didn't seem important (although the source doesn't even define the acronym or any comment about what GCE does, so I'm not sure). I added the following line to /etc/chef/client.rb:

Ohai::Config[:disabled_plugins] = [:GCE]
