To help improve the console issue on OpenPower systems, patch ipmitool to handle SIGHUP and SIGTERM #9

Closed
whowutwut opened this issue Aug 5, 2016 · 3 comments · Fixed by #10

Comments

@whowutwut

whowutwut commented Aug 5, 2016

In issue: xcat2/xcat-core#1090

For IBM OpenPower S822LC systems, if there is an active SOL console to the BMC and it is not stopped gracefully, the BMC starts a 60-second timeout. Once that timeout expires, the console becomes unresponsive.

xCAT uses conserver to monitor console output through the BMC and catch events such as kernel panics. If consoleondemand=no, a persistent connection is created to the BMC to log SOL output to /var/log/consoles. When service conserver restart is issued on the OpenPower systems mentioned above, a HUP signal is sent to conserver:

  restart)
    $STATUS conserver >& /dev/null
    if [ "$?" != "0" ]; then
        exec $0 start
    else
        echo -n "Restarting conserver: "
        killproc conserver -HUP
    fi
    $SUCCESS
    ;;

When service conserver stop is issued

  stop) 
    $STATUS conserver >& /dev/null
    if [ "$?" != "0" ]; then
        echo -n "conserver not running, not stopping "
        $PASSED
        exit 0
    fi  
    echo -n "Shutting down conserver: "
    killproc conserver     
    rm -f /var/lock/subsys/conserver
    $STATUS conserver >& /dev/null
    if [ "$?" == "0" ]; then
        $FAILURE
        exit 1
    fi  
    $SUCCESS    
    ;;    

The killproc function is called (/etc/rc.d/init.d/functions) which will send kill -TERM.

This is not a graceful exit of ipmitool from the BMC's point of view, so the timeout starts. After 60 seconds, console logging is interrupted for every node in the cluster that had a console connection to its BMC logging SOL output, and it does not recover. No more logs are saved to /var/log/messages.

Since it is more difficult to get the BMC firmware changed to handle this case correctly, we are looking at other options. xCAT ships a version of ipmitool (ipmitool-xcat) in the xcat-deps package that we are already patching, so we could try to catch the TERM and HUP signals there and exit gracefully, as ipmitool already does for SIGINT.

This may help reduce the frequency of this problem.
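
To make the proposal concrete, here is a minimal shell sketch of the pattern: a long-running process that runs the same cleanup step when it receives HUP or TERM instead of dying immediately. The real change would live in ipmitool's C source; this only illustrates the idea. The names `run_console`, `deactivate_session`, and `CONSOLE_PID` are hypothetical, and `deactivate_session` stands in for whatever tells the BMC that the SOL session is over.

```shell
#!/bin/bash
# Sketch only: shared cleanup on HUP and TERM for a long-running process.
# run_console, deactivate_session and CONSOLE_PID are hypothetical names.

run_console() {
    local log="$1"
    (
        # Placeholder for the step that closes the SOL session on the BMC.
        deactivate_session() {
            echo "session closed on $1" >> "$log"
        }
        # Install the handlers before announcing that the session is open.
        trap 'deactivate_session HUP;  exit 0' HUP
        trap 'deactivate_session TERM; exit 0' TERM
        echo "session open" >> "$log"
        # Stand-in for the console loop; traps run between sleeps.
        while :; do sleep 0.1; done
    ) >/dev/null 2>&1 &
    CONSOLE_PID=$!
}
```

A caller would start it with `run_console /tmp/console-demo.log`, send `kill -HUP "$CONSOLE_PID"` or a plain `kill "$CONSOLE_PID"`, and check the log for the "session closed" line.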

@whowutwut whowutwut added this to the 2.12.2 milestone Aug 5, 2016
@daniceexi daniceexi assigned zet809 and chenglch and unassigned zet809 Aug 8, 2016
@daniceexi

@chenglch Could you take a look at this issue?

@chenglch

chenglch commented Aug 9, 2016

From my investigation today: even when service conserver stop is used to stop the conserver daemon, conserver internally still sends SIGHUP to the ipmitool process.

I also found that ipmitool does not reset the termios settings when handling SIGINT, which leaves the shell terminal unusable (I have to open a new tty session).
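
For illustration, the usual remedy is to snapshot the terminal settings before changing them and restore them on every exit path, signals included. The sketch below shows that pattern in shell with `stty`; it is not the ipmitool code, which would do the equivalent in C with termios calls. `with_raw_tty` is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: save terminal settings, switch to raw mode, and restore the
# saved settings on normal exit as well as on INT, HUP and TERM.
# with_raw_tty is a hypothetical name.

with_raw_tty() {
    # With no terminal on stdin there is nothing to save or restore.
    if ! [ -t 0 ]; then
        "$@"
        return
    fi
    (
        saved=$(stty -g)              # snapshot of the current settings
        trap 'stty "$saved"' EXIT     # restore whenever this subshell exits
        trap 'exit 1' INT HUP TERM    # turn signals into an exit, so EXIT runs
        stty raw -echo
        "$@"
    )
}
```

Running a command as `with_raw_tty some_command` should then leave the terminal usable afterwards, whether the command finishes normally or the wrapper is interrupted.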

chenglch added a commit to chenglch/xcat-dep that referenced this issue Aug 10, 2016
- Reset tty setting after receiving INT,HUP and TERM signal.
- Add TERM, HUP handler to close the connection session to the BMC

Close-issue: xcat2#9
@whowutwut

Closing this issue, will start to use the ipmitool-xcat 1.8.15-3 and report new problems if we see them. Thanks @chenglch !
