Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

psusensor crashing due to activating already-active sensors #31

Open
ijsn-20 opened this issue Mar 8, 2024 · 1 comment
Open

psusensor crashing due to activating already-active sensors #31

ijsn-20 opened this issue Mar 8, 2024 · 1 comment

Comments

@ijsn-20
Copy link

ijsn-20 commented Mar 8, 2024

I've been observing that psuserver is crashing and re-starting for our system when the chassis first powers on, and I've narrowed it down to the service trying to open() the sensors' sysfs files when they're already open.
image

Here's how the issue manifests, as I understand it:

  • psusensor service first initializes, PSUSensorMain.cpp calls createSensors() which calls createSensorsCallback(), which will create a sensor object for the sensor map that first open()s the associated sysfs file as part of its constructor
  • Then, when chassis_state_manager.cpp receives signal that power ON is complete (in Chassis::sysStateChange()), it sets xyz.openbmc_project.State.ChassisCurrentPowerStatus to ChassisOn, which then sends a org.freedesktop.DBus.Properties.PropertiesChanged signal to the chassisMatch callback function in dbus-sensors Utils.cpp.
  • The callback checks to see if ChassisCurrentPowerStatus equals ChassisOn (Which I think it always would, if this signal is tied to changing the ChassisCurrentPowerStatus property), and if so, it sets the on variable to true (https://github.com/openbmc/dbus-sensors/blob/master/src/Utils.cpp#L526).
  • After that it calls the powerStateChanged() function in PSUSensorMain.cpp (which was assigned to hostStatusCallback in https://github.com/openbmc/dbus-sensors/blob/master/src/PSUSensorMain.cpp#L1221), with the newState argument passed as true (based on the on variable). Because newState is true, createSensors() is called with activateOnly set to true; and because we have a corresponding sensor object for each sensor in the map (and not a nullptr), it will call sensor->activate (without checking to see if it already isActive()). (https://github.com/openbmc/dbus-sensors/blob/master/src/PSUSensorMain.cpp#L928)
  • This will open() a file that's already open, which will cause PSUSensor to terminate itself. It can then recover and run normally, because the chassis is now on

I was wondering if this had been observed by anyone else, or if it's isolated to our configuration. The only changes we have to entity-manager are adding per-sensor power states to schemas/legacy.json, but even without these changes the issue is still observed. One method we had been using to prevent this was to only call sensor->activate() if sensor->isActive() returns false, but I didn't know if there was a more underlying problem at hand.

@huang8235
Copy link

huang8235 commented Jun 13, 2024

I encountered this problem in hwmontempsensor as well:

  • If the power off signal is sent after the power on timer callback and before open() in createsensors(), the timer.cancel() will fail, and the sysfs file will be closed and then opened.
  • If the chassis off signal is sent after chassis on callback createsensors(), the sysfs file will be opened again, leading to a crash.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants