Please provide systemd service files #9

Open
bdrung opened this issue Dec 17, 2018 · 5 comments
@bdrung

bdrung commented Dec 17, 2018

It would be nice if opensm came with systemd service files. Otherwise each distribution has to create its own service files, and they may diverge.

@hnrose

hnrose commented Dec 18, 2018

I thought that the current OpenSM spec file supports the old SysV daemon management framework (RHEL 6.x).

Which distributions are of interest?

@bdrung

bdrung commented Dec 18, 2018

All recent distributions (Debian, Ubuntu, Fedora, etc.) would benefit from a systemd service file. To quote lintian's reasoning:

> The specified init.d script has no equivalent systemd service.
>
> Whilst systemd has a SysV init.d script compatibility mode, providing native systemd support has many advantages such as being able to specify security hardening features.

@jamespharvey20

I'm the maintainer for Arch Linux's AUR opensm (and other InfiniBand) packages. (To be clear, AUR packages are maintained by any user who adopts the packages - InfiniBand packages are not part of Arch's official repositories.)

When I ran into the problem of no systemd service file, I copied (and gave credit to) the systemd opensm.service file included by Fedora. I would really like to see this, or a version of it, included here as well. Fedora also notes that there is a timing bug that intermittently causes a signal 15 failure on start, so their workaround is to use a separate script. I have no idea if this intermittent timing bug still exists.

Their separate script also allows multiple instances of opensm to run, on multiple ports.

opensm.service

[Unit]
Description=Starts the OpenSM InfiniBand fabric Subnet Manager
Documentation=man:opensm
DefaultDependencies=false
Before=network.target remote-fs-pre.target
Requires=rdma.service
After=rdma.service

[Service]
Type=forking
ExecStart=/usr/libexec/opensm-launch

[Install]
WantedBy=network.target

opensm.launch

#!/bin/bash
#
# Launch the necessary OpenSM daemons for systemd
#
# sysconfig: /etc/sysconfig/opensm
# config: /etc/rdma/opensm.conf
#

shopt -s nullglob

prog=/usr/sbin/opensm
[ -f /etc/sysconfig/opensm ] && . /etc/sysconfig/opensm

[ -n "$PRIORITY" ] && prio="-p $PRIORITY"

if [ -z "$GUIDS" ]; then
   CONFIGS=""
   CONFIG_CNT=0
   for conf in /etc/rdma/opensm.conf.[0-9]*; do
      CONFIGS="$CONFIGS $conf"
      let CONFIG_CNT++
   done
else
   GUID_CNT=0
   for guid in $GUIDS; do
      let GUID_CNT++
   done
fi
# Start opensm
if [ -n "$GUIDS" ]; then
   SUBNET_COUNT=0
   for guid in $GUIDS; do
      SUBNET_PREFIX=`printf "0xfe800000000000%02d" $SUBNET_COUNT`
      (while true; do $prog $prio -g $guid --subnet_prefix $SUBNET_PREFIX; sleep 30; done) &
      let SUBNET_COUNT++
   done
elif [ -n "$CONFIGS" ]; then
   for config in $CONFIGS; do
      (while true; do $prog $prio -F $config; sleep 30; done) &
   done
else
   (while true; do $prog $prio; sleep 30; done) &
fi
exit 0

I just tried running opensm on multiple interfaces for the first time myself, and found that their method of giving each opensm instance a unique --subnet_prefix is broken, because opensm no longer accepts this option. Running two instances of opensm -g <different GUIDs> appears to work, but I assume that at some point in the past opensm might have complained if multiple instances were running on the same subnet prefix.
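As a rough sketch of what the GUID branch of the launch script could look like without --subnet_prefix, the snippet below only prints the command lines it would run. The `-g` and `-p` options and the `/usr/sbin/opensm` path are taken from the script above; `build_opensm_cmds` is a hypothetical helper name, and whether opensm behaves well with several instances started this way is untested here.

```shell
# Sketch: build one opensm command line per port GUID, leaving out the
# --subnet_prefix option. build_opensm_cmds is a hypothetical helper;
# prog, -p and -g come from the opensm.launch script in this thread.

prog=/usr/sbin/opensm

# build_opensm_cmds PRIORITY GUID...
# PRIORITY may be empty, in which case no -p option is emitted.
build_opensm_cmds() (
   priority=$1
   shift
   prio=""
   if [ -n "$priority" ]; then
      prio=" -p $priority"
   fi
   for guid in "$@"; do
      echo "${prog}${prio} -g ${guid}"
   done
)

# Print the commands instead of running them, so they can be checked first.
build_opensm_cmds "" 0x0002c90300048ca1 0x0002c90300048ca2
```

Each printed line could then be wrapped in the same restart loop the original script uses.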

If you do not want multiple interface support, opensm.launch can be simplified to:

#!/bin/bash

(while true; do /usr/bin/opensm; sleep 30; done) &
exit 0
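The restart loop itself can be pulled out into a small function, which makes the pattern easier to test with a harmless command. This is only an illustration: `respawn` and its arguments are hypothetical names, and the run limit exists purely so the demo terminates.

```shell
# respawn DELAY MAX_RUNS CMD...
# Rerun CMD every time it exits, sleeping DELAY seconds in between.
# MAX_RUNS=0 means loop forever, like the launch script above.
respawn() (
   delay=$1
   max_runs=$2
   shift 2
   runs=0
   while [ "$max_runs" -eq 0 ] || [ "$runs" -lt "$max_runs" ]; do
      status=0
      "$@" || status=$?
      runs=$((runs + 1))
      echo "run $runs exited with status $status" >&2
      sleep "$delay"
   done
)

# Demo with a command that always fails: three runs, no delay.
respawn 0 3 false
```

The simplified launch script would correspond to `respawn 30 0 /usr/bin/opensm &`. systemd's own `Restart=` and `RestartSec=` settings are an alternative to a shell loop, assuming opensm is started so that systemd tracks the process directly.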

opensm.sysconfig

# Problem #1: Multiple IB fabrics needing a subnet manager
#
# In the event that a machine has more than one IB subnet attached,
# and that machine is an opensm server, by default, opensm will
# only attach to one port and will not manage the fabric on the
# other port.  There are two ways to solve this problem:
#
# 1) Start opensm on multiple machines and configure it to manage
#    different fabrics on each machine
# 2) Configure opensm to start multiple instances on a single
#    machine
#
# Both solutions to this problem require non-standard configurations.
# In other words, you would normally have to modify /etc/rdma/opensm.conf
# and once you do that, the file will no longer be updated for new
# options when opensm is upgraded.  In an effort to allow people to
# have more than one subnet managed by opensm without having to modify
# the system default opensm.conf file, we have enabled two methods
# for modifying the default opensm config items needed to enable
# multiple fabric management.
#
# Method #1: Create multiple opensm.conf files in non-standard locations
#   Copy /etc/rdma/opensm.conf to /etc/rdma/opensm.conf.<number>
#     (do this once for each instance you want started)
#   Edit each copy of the opensm.conf file to reflect the necessary changes
#     for a multiple instance startup.  If you need to manage more than
#     one fabric, you will have to change the guid option in each file
#     to specify the guid of the specific port you want opensm attached
#     to.
#
# The advantage to method #1 is that, on the off chance you want to do
# really special custom things on different ports, like have different
# QoS settings depending on which port you are attached to, you have the
# freedom to edit any and all settings for each instance without those
# changes affecting other instances or being lost when opensm upgrades.
#
# Method #2: Specify multiple GUIDS variable entries in this file
#   Uncomment the below GUIDS variable and enter each guid you need to attach
#     to into the list.  If using this method you need to enter each
#     guid into the list as we won't attach to any default ports, only
#     those specified in the list.
#
#GUIDS="0x0002c90300048ca1 0x0002c90300048ca2"
#
# The obvious advantage to method #2 is that it's simple and doesn't
# clutter up your file system, but it is far more limited in what you
# can do.  If you enable method #2, then even if you create the files
# referenced in method #1, they will be ignored.
#
# Problem #2: Activating a backup subnet manager
#
# The default priority of opensm is set so that it wants to be the
# primary subnet manager.  This is great when you are only running
# opensm on one server, but if you want to have a non-primary opensm
# instance for failover, then you have to manually edit the opensm.conf
# file like for problem #1.  This carries with it all the problems
# listed above.  If you wish to enable opensm as a non-primary manager,
# then you can uncomment the PRIORITY variable below and set it to
# some number between 0 and 15, where 15 is the highest priority and
# the primary manager, with 0 being the lowest backup server.  This method
# will work with the GUIDS option above, and also with the multiple
# config files in method #1 above.  However, only a single priority is
# supported here.  If you wanted more than one priority (say this machine
# is the primary on the first fabric, and second on the second fabric,
# while the other opensm server is primary on the second fabric and
# second on the primary), then the only way to do that is to use method #1
# above and individually edit the config files.  If you edit the config
# files to set the priority and then also set the priority here, then
# this setting will override the config files and render that particular
# edit useless.
#
#PRIORITY=15
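To make method #1 concrete, here is a small sketch of how the numbered config files are discovered. It works in a scratch directory rather than the real /etc/rdma, and `list_opensm_configs` is a hypothetical helper name; the glob pattern is the one used in the launch script.

```shell
# list_opensm_configs DIR
# Print each DIR/opensm.conf.<number> file, one per line.
list_opensm_configs() (
   dir=$1
   for conf in "$dir"/opensm.conf.[0-9]*; do
      # Without bash's nullglob an unmatched pattern stays literal; skip it.
      [ -e "$conf" ] || continue
      echo "$conf"
   done
)

# Demo: copy a placeholder config once per instance, then list the copies.
demo_dir=$(mktemp -d)
echo "# placeholder config" > "$demo_dir/opensm.conf"
cp "$demo_dir/opensm.conf" "$demo_dir/opensm.conf.1"
cp "$demo_dir/opensm.conf" "$demo_dir/opensm.conf.2"
list_opensm_configs "$demo_dir"
rm -rf "$demo_dir"
```

The plain opensm.conf is not listed, which matches the launch script: only the numbered copies are passed to opensm with -F, and only when GUIDS is unset.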

@ghost

ghost commented Jan 10, 2019

> When I ran into the problem of no systemd service file, I copied (and gave credit to) the systemd opensm.service file included by Fedora. I would really like to see this or a version of it included here, as well. Fedora also notes that there is a timing bug that intermittently causes a signal 15 failure on start, so their workaround is to use a separate script. I have no idea if this intermittent timing bug still exists.

There is no signal 15 failure for Fedora. Please see explanation in this bug page.
https://bugzilla.redhat.com/show_bug.cgi?id=1663785

@jamespharvey20

> > When I ran into the problem of no systemd service file, I copied (and gave credit to) the systemd opensm.service file included by Fedora. I would really like to see this or a version of it included here, as well. Fedora also notes that there is a timing bug that intermittently causes a signal 15 failure on start, so their workaround is to use a separate script. I have no idea if this intermittent timing bug still exists.
>
> There is no signal 15 failure for Fedora. Please see explanation in this bug page.
> https://bugzilla.redhat.com/show_bug.cgi?id=1663785

Yeah, I was given bad info about that. The link from HonggangLI discusses how the script is written that way so opensm stays running, since it (at least in the past) exits in certain situations, such as a cable being unplugged. (The link is well worth a read.) If that's still opensm's native behavior, I think it would be nice if it were changed. I don't think anyone would want it to exit in situations like that. The comparison isn't exact, but it would be like dhcpd exiting whenever a client unplugged.

bdrung added a commit to bdrung/opensm that referenced this issue Jul 31, 2020
To unify the systemd services for opensm from the different
distributions, add `opensm.service` and `opensm@.service`.

The `opensm@.service` starts opensm on a given port. `opensm.service`
starts opensm for all available ports via the `opensm@.service`.

fixes linux-rdma#9
Signed-off-by: Benjamin Drung <[email protected]>
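The commit's unit files are not shown in this thread, so the following is only a sketch of how a templated unit is usually addressed. systemd template instances are named `name@instance.service`; using the port GUID as the instance string is an assumption made here for illustration, and `instance_units` is a hypothetical helper.

```shell
# instance_units GUID...
# Print one templated unit name per port GUID.
# Assumption: the instance string is the port GUID; the actual commit
# may use a different identifier.
instance_units() (
   for guid in "$@"; do
      echo "opensm@${guid}.service"
   done
)

instance_units 0x0002c90300048ca1 0x0002c90300048ca2
```

Each printed name could be passed to `systemctl start` to bring up a single instance.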