For IBM OpenPower S822LC systems, if there is an active SOL console to the BMC and it is not stopped gracefully, a 60-second timeout starts in the BMC, and after that timeout the console becomes unresponsive.
conserver is used by xCAT to monitor the console logs through the BMC to catch things like kernel panics. If consoleondemand=no, a persistent connection is created to the BMC to log SOL output to /var/log/consoles. When service conserver restart is issued on the OpenPower systems mentioned above, conserver sends a HUP signal...
    restart)
        $STATUS conserver >& /dev/null
        if [ "$?" != "0" ]; then
            exec $0 start
        else
            echo -n "Restarting conserver: "
            killproc conserver -HUP
        fi
        $SUCCESS
        ;;
When service conserver stop is issued:
    stop)
        $STATUS conserver >& /dev/null
        if [ "$?" != "0" ]; then
            echo -n "conserver not running, not stopping "
            $PASSED
            exit 0
        fi
        echo -n "Shutting down conserver: "
        killproc conserver
        rm -f /var/lock/subsys/conserver
        $STATUS conserver >& /dev/null
        if [ "$?" == "0" ]; then
            $FAILURE
            exit 1
        fi
        $SUCCESS
        ;;
The killproc function (from /etc/rc.d/init.d/functions) is called, which sends kill -TERM.
This is not a graceful exit of ipmitool's connection to the BMC, so the timeout starts. After 60 seconds, console logging is interrupted for every node in the cluster that had a console connection to the BMC logging SOL output, and it does not recover. No more logs are saved to /var/log/messages.
Since it is more difficult to get the BMC firmware changed to correctly handle the HUP and TERM signals, we are looking at other options. xCAT ships a version of ipmitool-xcat in the xcat-deps package that we are already patching, so we could try to catch the TERM and HUP signals and exit gracefully, as ipmitool already does for the SIGINT signal.
This may help reduce the frequency of this problem.
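As a rough illustration of the idea only (the real change would be in ipmitool's C signal handling, not in shell), the sketch below routes TERM and HUP through the same cleanup path as INT. The names graceful_demo and cleanup are hypothetical, and the echo is a stand-in for whatever ipmitool does to close the SOL session on SIGINT.

```shell
#!/bin/sh
# Hypothetical sketch: treat TERM and HUP like INT by sending all three
# through one cleanup handler. Not ipmitool code.
graceful_demo() {
    # Run in a child shell so that $$ is the child's own PID.
    sh -c '
        cleanup() {
            # Stand-in for the graceful SOL session teardown.
            echo "session closed after SIG$1"
            exit 0
        }
        trap "cleanup INT" INT
        trap "cleanup TERM" TERM
        trap "cleanup HUP" HUP

        # Simulate the signal that conserver or killproc would send.
        kill -s "$1" "$$"

        # Only printed if the signal was not handled.
        echo "not reached"
    ' sh "$1"
}
```

Calling graceful_demo TERM or graceful_demo HUP runs the same handler that INT would, which is the behavior proposed above for the patched ipmitool-xcat.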
From my investigation today: even when service conserver stop is issued to stop the conserver daemon, conserver internally still sends SIGHUP to the ipmitool process.
I also found that ipmitool does not reset the termios settings when handling the SIGINT signal, which leaves the shell terminal unusable (I have to open a new tty session).
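Until ipmitool restores the terminal itself, a caller-side wrapper could save and restore the settings around the command. This is only a sketch of that workaround, assuming the command is run from an interactive shell; with_saved_tty is a hypothetical helper name, not part of xCAT or ipmitool.

```shell
#!/bin/sh
# Hypothetical helper: run a command, then put the terminal back the way
# it was, even if the command left it in raw mode.
with_saved_tty() {
    saved=""
    # stty only works when stdin is a terminal; skip the save otherwise.
    if [ -t 0 ]; then
        saved=$(stty -g)
    fi

    "$@"
    status=$?

    # Restore the settings captured before the command ran.
    if [ -n "$saved" ]; then
        stty "$saved"
    fi
    return "$status"
}
```

For example, with_saved_tty followed by the usual ipmitool SOL command line would restore the terminal once ipmitool exits, whatever signal ended it.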
chenglch added a commit to chenglch/xcat-dep that referenced this issue on Aug 10, 2016:
In issue: xcat2/xcat-core#1090