
phosphor-hwmon's Introduction

phosphor-hwmon

Exposes generic hwmon entries as D-Bus objects. More information can be found in the Sensor Architecture document.

To Build

To build this package, do the following steps:

  1. meson setup build
  2. ninja -C build

To clean the repository, run rm -rf build.

D-Bus bus names

To enable the use of Linux features like cgroups prioritization and udev/systemd control, one instance of phosphor-hwmon is intended to be run per hwmon sysfs class instance.

This requires an algorithm for selecting a stable, well-known D-Bus busname.

The bus name has the form <PREFIX>-<ID>.Hwmon<N>, where PREFIX is a meson-configurable prefix (BUSNAME_PREFIX=xyz.openbmc_project by default), ID is either a std::hash of the /sys/devices path backing the hwmon class instance or a suffix value provided on the command line, and N is the implemented phosphor-hwmon D-Bus API version.
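The naming scheme above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name makeBusName and its parameters are hypothetical. Note that the C++ standard only guarantees that std::hash results are consistent within a single program execution, so the hashed form is stable only as long as the toolchain's hash implementation does not change.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical helper for illustration; not the project's actual function.
std::string makeBusName(const std::string& prefix,
                        const std::string& devicePath,
                        const std::optional<std::string>& suffix,
                        unsigned apiVersion)
{
    assert(!prefix.empty());

    // ID: an explicit suffix wins; otherwise hash the /sys/devices path.
    std::string id;
    if (suffix)
    {
        id = *suffix;
    }
    else
    {
        id = std::to_string(std::hash<std::string>{}(devicePath));
    }

    // <PREFIX>-<ID>.Hwmon<N>
    return prefix + "-" + id + ".Hwmon" + std::to_string(apiVersion);
}
```

For example, a prefix of xyz.openbmc_project, a suffix of fan0, and API version 1 would yield xyz.openbmc_project-fan0.Hwmon1.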


phosphor-hwmon's Issues

Investigate poll and cache for sensor values instead of on-demand reads

@nasamuffin commented on Mon Jun 26 2017

With both the aspeed fantach sensors and the 1-wire temperature sensors, we encountered long sensor read times leading to timeouts all the way up in btbridge, causing hard-to-diagnose failures from the host side IPMI handling.

These sensors may not be the only very slow sensors we run across. An arbitrary 5-second timeout in btbridge may eventually prove too short for another sensor. And reads appear very slow to the host when they may not need to be.

Joel and Cyril suggested polling sensors with a background thread and reading the most recent value out over IPMI to reduce latency, and I agree that this is the correct approach if we can ensure that the polling thread will time out appropriately if the sensor is unresponsive.
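As a rough sketch of the poll-and-cache idea, assuming C++17: a polling thread would call store() after each successful read, and request handlers would call latest() instead of touching the sensor. The class name CachedReading and the maxAge parameter are hypothetical. Staleness is checked on the reader side, so a poller stuck on an unresponsive sensor shows up as a missing value rather than a blocked request.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <optional>

using Clock = std::chrono::steady_clock;

// Hypothetical cache shared between a polling thread and request handlers.
class CachedReading
{
  public:
    // Called by the polling thread after each successful sensor read.
    void store(std::int64_t value, Clock::time_point now = Clock::now())
    {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = value;
        stamp_ = now;
    }

    // Called by request handlers. Returns std::nullopt when nothing has
    // been stored yet or when the newest reading is older than maxAge.
    std::optional<std::int64_t>
        latest(std::chrono::milliseconds maxAge,
               Clock::time_point now = Clock::now()) const
    {
        assert(maxAge.count() >= 0);
        std::lock_guard<std::mutex> lock(mutex_);
        if (!value_ || now - stamp_ > maxAge)
        {
            return std::nullopt;
        }
        return value_;
    }

  private:
    mutable std::mutex mutex_;
    std::optional<std::int64_t> value_;
    Clock::time_point stamp_{};
};
```

The timestamps are passed in explicitly so the staleness rule can be exercised without real delays; production code would simply rely on the Clock::now() defaults.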

(host-ipmid maybe isn't the right place for this, but it's the area that's being affected with sneaky failures, so it seemed like as good a place as any.)


@williamspatrick commented on Mon Jun 26 2017

Isn't this an issue with either the hwmon driver itself or phosphor-hwmon? I don't think it is correct to add special code to the IPMI providers, because we'll have the same trouble with REST, Redfish, etc.

[c++] [hwmon] std::ifstream file read takes too long to time out

In hwmonio.cpp of https://github.com/openbmc/phosphor-hwmon, I see that std::ifstream is used to open and read a sensor device.

However, I ran into an issue with it. When the sensor is disabled (for example, the fan is unplugged), the std::ifstream read takes a long time because the timeout is very long. This causes a big delay in my system on every sensor check.

Other observation: when the sensor device is ready, the read time is as expected.

Measuring the std::ifstream read:

Unplugged sensor: 91385 microseconds
Plugged sensor: 507 microseconds

The patch used to measure the std::ifstream read is attached.

Unexpected behavior you saw

The timeout is too long.

Expected behavior

Is there a better solution for this case, one with a shorter timeout?


Implement an alternative to NEGATIVE_ERRNO_ON_FAIL

With NEGATIVE_ERRNO_ON_FAIL, we report failed sensor reads as negative values. This is a problem for a sensor with a valid negative value, because that reading would be treated as an error.

Come up with an alternative way to handle sensor read failures.
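One possible direction, shown only as a sketch and not as the approach the project adopted, is to carry the reading and the error code in separate fields so that a negative reading can never be confused with a failure. The ReadResult type and the success/failure helpers below are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <optional>

// Hypothetical result type: the reading and the error travel separately,
// so a negative reading is never mistaken for a failure.
struct ReadResult
{
    std::optional<std::int64_t> value; // set only on success
    int error = 0;                     // positive errno on failure, else 0

    bool ok() const
    {
        return value.has_value();
    }
};

// Wrap a successful reading; any value, including negative ones, is valid.
ReadResult success(std::int64_t v)
{
    return ReadResult{v, 0};
}

// Wrap a failed read, keeping the errno for logging or later handling.
ReadResult failure(int err)
{
    assert(err > 0);
    return ReadResult{std::nullopt, err};
}
```

With this shape, success(-5000) is a valid reading of -5000, while failure(EIO) has no value and keeps EIO available to the caller.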

Don't exit on read failures, and don't spam error logs

Currently, on a read failure, phosphor-hwmon will create an error log and immediately exit. I've seen systems in a customer setting where systemd would successfully restart the service, and then a few seconds later there would be another failure, and this would continue on and on, creating hundreds or more event logs.

What we need is:

  1. After some number of read failures spread out over a certain amount of time, create a single event log and don't exit the program.
  2. Set OperationalStatus.Functional to false on the affected sensors
  3. Don't create another event log for any failure on any file in the same directory until Chassis power is cycled.
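The three requirements above could be sketched as a small per-directory tracker. This is a minimal illustration under assumptions: the class name FailureTracker, the threshold and window parameters, and the onPowerCycle hook are all hypothetical, and one instance is assumed to cover one hwmon directory.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

using Clock = std::chrono::steady_clock;

// Hypothetical tracker, one instance per hwmon directory.
class FailureTracker
{
  public:
    FailureTracker(std::size_t threshold, std::chrono::seconds window) :
        threshold_(threshold), window_(window)
    {
        assert(threshold_ > 0);
    }

    // Record one read failure. Returns true only when the caller should
    // create the single event log; the program keeps running either way.
    bool recordFailure(Clock::time_point now)
    {
        failures_.push_back(now);
        // Forget failures that fell out of the sliding window.
        while (!failures_.empty() && now - failures_.front() > window_)
        {
            failures_.pop_front();
        }
        if (failures_.size() < threshold_)
        {
            return false;
        }
        functional_ = false; // would drive OperationalStatus.Functional
        if (logged_)
        {
            return false; // already logged since the last power cycle
        }
        logged_ = true;
        return true;
    }

    // Re-arm logging when chassis power is cycled.
    void onPowerCycle()
    {
        failures_.clear();
        functional_ = true;
        logged_ = false;
    }

    bool functional() const
    {
        return functional_;
    }

  private:
    std::size_t threshold_;
    std::chrono::seconds window_;
    std::deque<Clock::time_point> failures_;
    bool functional_ = true;
    bool logged_ = false;
};
```

The caller would create an event log only when recordFailure() returns true and would mirror functional() onto the affected sensors.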

We (IBM) plan on implementing this.

Ensure UPDATE_FUNCTIONAL_ON_FAIL always performs REMOVE RCs check

This issue was brought up from this code review: https://gerrit.openbmc-project.xyz/c/openbmc/phosphor-hwmon/+/22856

UPDATE_FUNCTIONAL_ON_FAIL was introduced as a replacement for NEGATIVE_ERRNO_ON_FAIL. To make sure there was no regression for machines relying on NEGATIVE_ERRNO_ON_FAIL, we had to ensure that:

  1. Even if a sensor read fails, Sensor::addValue continues to create a value interface and returns this value interface's shared_ptr
  2. MainLoop::getObject continues and adds the created shared_ptr to _sensorObjects

Failure to satisfy the 2 conditions meant that if a sensor was temporarily unavailable during the initial boot, when getObject is executed, the sensor would never be added back again (I've heard that this might be a known bug, but we did not see this when we were using NEGATIVE_ERRNO_ON_FAIL on our machines).

As discussed in the code review referenced above, we would like UPDATE_FUNCTIONAL_ON_FAIL to always perform REMOVE RCs checks which are currently in the catch blocks.

The check in MainLoop::read was easy to implement, so it was added without problem as part of the above code review. However, making sure we perform the check AND satisfy the 2 conditions for MainLoop::getObject was tricky, so this issue was created.

As part of this issue, please ensure that configure.ac for phosphor-hwmon is updated for UPDATE_FUNCTIONAL_ON_FAIL as it currently explicitly calls out the fact that REMOVE RCs check is skipped for MainLoop::getObject.

systemctl start max31785-hwmon-helper@ahb-apb-bus\x401e78a000-i2c\x2dbus\x40100-max31785\x4052.service' [147] timed out

It looks like this error has been occurring for a while on witherspoon-based systems without much fallout. Recently, upstream yocto picked up a change that is not compatible with this failure.

We're utilizing a CI image from https://gerrit.openbmc-project.xyz/c/openbmc/openbmc/+/47229/

Sep 27 17:20:43 witherspoon-Y230UF71K03T systemd-udevd[112]: hwmon9: Spawned process '/bin/systemctl start max31785-hwmon-helper@ahb-apb-bus\x401e78a000-i2c\x2dbus\x40100-max31785\x4052.service' [147] is taking longer than 59s to complete
Sep 27 17:20:43 witherspoon-Y230UF71K03T systemd-udevd[101]: hwmon9: Worker [112] processing SEQNUM=858 is taking a long time
Sep 27 17:22:00 witherspoon-Y230UF71K03T systemd[1]: systemd-udev-settle.service: Main process exited, code=exited, status=1/FAILURE
Sep 27 17:22:00 witherspoon-Y230UF71K03T systemd[1]: systemd-udev-settle.service: Failed with result 'exit-code'.
Sep 27 17:22:00 witherspoon-Y230UF71K03T systemd[1]: Failed to start Wait for udev To Complete Device Initialization.

You can see a bit later the hwmon service times out altogether:

Sep 27 17:22:43 witherspoon-Y230UF71K03T systemd-udevd[112]: hwmon9: Spawned process '/bin/systemctl start max31785-hwmon-helper@ahb-apb-bus\x401e78a000-i2c\x2dbus\x40100-max31785\x4052.service' [147] timed out after 2min 59s, killing
Sep 27 17:22:43 witherspoon-Y230UF71K03T systemd-udevd[101]: hwmon9: Worker [112] processing SEQNUM=858 killed
Sep 27 17:22:43 witherspoon-Y230UF71K03T systemd-udevd[101]: Worker [112] terminated by signal 9 (KILL)
Sep 27 17:22:43 witherspoon-Y230UF71K03T systemd-udevd[101]: hwmon9: Worker [112] failed

Config option to require sensors be present in sysfs

A problem has been found on some devices where a sensor may not have its associated sysfs file(s) created when the driver is bound and hwmon is started for the device. Currently there is no check that compares the sensors present in sysfs against those listed in the hwmon config file for the device, which essentially makes every sensor optional for hwmon to create on D-Bus.

A hwmon config option is needed to require that what's given in the config file is also present in sysfs. When the sysfs file for a sensor does not exist, hwmon should either log an error and continue OR exit. This requires further definition of how the config entry is specified.
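The presence check itself might look something like the sketch below. It assumes that each configured sensor is identified by a name such as temp1 and that its reading lives in a <name>_input attribute, following the usual hwmon sysfs naming; the function name missingSensors is hypothetical, and how the config option is spelled is left open, as in the issue.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical check: returns the configured sensor names (e.g. "temp1")
// that have no matching <name>_input attribute in the hwmon directory.
std::vector<std::string>
    missingSensors(const fs::path& hwmonDir,
                   const std::vector<std::string>& configured)
{
    std::vector<std::string> missing;
    for (const auto& name : configured)
    {
        assert(!name.empty());
        // Non-throwing overload: a filesystem error counts as "missing".
        std::error_code ec;
        if (!fs::exists(hwmonDir / (name + "_input"), ec))
        {
            missing.push_back(name);
        }
    }
    return missing;
}
```

Depending on the option's value, the caller could log an error for each returned name and continue, or exit when the list is not empty.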

Avoid usage of device tree paths for config files

A few times now, something in the device tree has changed, forcing us to change the paths of all the hwmon config files, because those paths are based on udev information, which is always DTS-based.

We want to come up with a design that uses a config file path that is device tree independent.
