
Gsatellite Future features

fscheiner edited this page May 27, 2014 · 22 revisions

Gsatellite - Future features / ideas

Not implemented yet

allow to trigger user defined events from within jobs

In addition to user-defined services for specific events (JOB_START, JOB_TERMINATION), users could provide their own scripts for user-defined events. From within a job, these events could be triggered by calling a specific tool or function.

Example:

#!/bin/bash

doSomething --with data1

trigger userDefinedEvent1

doSomething --with data2

trigger userDefinedEvent2

[...]

When a user-defined event is triggered, either gsatlc or sputnik executes the corresponding user services.
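A minimal sketch of what such a trigger function could look like, assuming user services live in one directory per event (modelled on the per-event service directories used for the existing events). The GSAT_SERVICES_DIR variable is a hypothetical override introduced only for this illustration, and the sketch runs the services directly instead of notifying gsatlc or sputnik:

```shell
#!/bin/bash

# trigger() - run all user services registered for a user-defined event
# @1: name of the event (e.g. userDefinedEvent1)
#
# Assumption: services are executable files in
# "$HOME/.gsatellite/services/on<event>". GSAT_SERVICES_DIR is a
# hypothetical override, not an existing gsatellite variable.
trigger()
{
        local _event="$1"
        local _serviceDir="${GSAT_SERVICES_DIR:-$HOME/.gsatellite/services}/on${_event}"
        local _service=""

        # no services registered for this event
        [[ -d "$_serviceDir" ]] || return 0

        for _service in "$_serviceDir"/*; do
                # skip non-executable entries (and the unexpanded glob)
                [[ -f "$_service" && -x "$_service" ]] || continue

                # run in the background so the job itself is not delayed
                "$_service" "$_event" &
        done

        return 0
}
```

Because the services run in the background, the job continues immediately after each trigger call.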

follow job execution

Allow a user to follow job execution, i.e. watch stdout and stderr in real-time. This could be done by using tail -f on the special job files job.stdout and job.stderr. Another option would be to use watch and tail in combination, e.g.:

watch 'tail job.stdout job.stderr'

For Torque there is a Perl tool named qpeek that allows following job execution. More details can be found e.g. here.

email interface

Allow job submission and manipulation via email. Such emails have to meet at least the following requirements:

  • Emails have to be sent by a user registered on the gsatellite host(s).
  • Emails have to be signed by a user registered on the gsatellite host(s).

Unsigned emails, emails with an invalid signature, and emails from senders not registered on the gsatellite host(s) are discarded. Emails are encrypted by default, using the public key of the respective destination (gsatellite or user), but encryption is not required.

Example:

A user sends an email signed with their X.509 certificate containing the following body:

#!/bin/bash
# job type
#GSAT -T gtransfer

gt -s gsiftp://gridftp.domain1.tld:2811/~/my_files/* -d gsiftp://gridftp.domain2.tld:2811/~/my_destination/

...to the email address corresponding to the gsatellite host(s). After processing (check sender, check signature, decrypt) and submission of the job (the body of the email), the user is sent an email signed by gsatellite containing the job ID.

jobId:12345

Users should be able to use their preferred email client, provided that it can sign an email message with an X.509 certificate.

gqwait

An additional function for gsatctl that allows waiting for a job to finish. Possible implementations could look like the following:

  1. On gqwait invocation gsatctl forks a background child that stops itself (using SIGSTOP) right after it starts. At the same time gsatctl contacts gsatlc and registers a callback that will wake up (using SIGCONT) that specific child when the related job has terminated. After contacting gsatlc, gsatctl just waits for its child to finish (e.g. with wait %1).

  2. On gqwait invocation gsatctl forks a background child that blocks by reading from an empty FIFO (which should have been prepared by gsatctl). At the same time gsatctl contacts gsatlc and registers a callback that will terminate that specific child when the related job has terminated. After contacting gsatlc, gsatctl just waits for its child to finish (e.g. with wait %1).

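#!/bin/bash

# Child: blocks in read until a writer sends a line to the FIFO.
child()
{
	echo "child" 1>&2

	read <FIFO

	return
}

echo "father" 1>&2
echo "own pid: $$" 1>&2

mkfifo FIFO

child &

echo "child pid: $!" 1>&2

# Father: sleeps here until the child terminates (no busy waiting).
wait %1

rm FIFO

exit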

Implementation (1) does not work: as soon as the child pauses itself with SIGSTOP, the call to wait returns and the father exits. Implementation (2) works and blocks execution of the father (gsatctl in the end) without busy waiting.
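To illustrate how the blocked child in implementation (2) could be released, here is a small sketch in which a delayed writer stands in for the callback registered with gsatlc. The function name wait_for_notification and the message format are assumptions made for this example only:

```shell
#!/bin/bash

# wait_for_notification() - block until one line arrives on a FIFO
# @1: path of an existing FIFO (prepared by the caller, as gsatctl would)
wait_for_notification()
{
        local _fifo="$1"
        local _message=""

        # Background child: blocks in open/read until a writer sends a line.
        {
                read -r _message <"$_fifo"
                echo "job notification: $_message"
        } &

        # Father: sleeps in wait until the child has terminated.
        wait "$!"
}

# Demo: a delayed writer stands in for the callback registered with gsatlc.
_demoDir="$(mktemp -d)"
mkfifo "$_demoDir/fifo"
( sleep 1; echo "job 12345 terminated" >"$_demoDir/fifo" ) &
wait_for_notification "$_demoDir/fifo"
rm -r "$_demoDir"
```

Writing a single line to the FIFO is enough to end the read, so the child exits on its own and does not have to be killed.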

Using System V IPC for intra-node IPC

The messaging system currently used by gsatellite was designed with inter-node IPC in mind. It works on shared file systems like NFS (inter- and intra-node IPC possible) and on non-shared file systems (only intra-node IPC possible). But if gsatellite is only used on a single machine, inter-node IPC is not needed. Therefore different (and maybe better performing) mechanisms could be used in such a setup.

System V IPC message queues look like a suitable alternative. The Linux Programmer's Guide provides a good introduction to SysV IPC message queues and also the source code for a small tool that can be used from the shell to access these facilities.

Small file stage in/out

Small files could be staged in/out by gqsub. The file size could be checked and the user notified if it exceeds certain limits. A use case for small file stage in/out would be tgftp batch files for batch tests.
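The size check could be sketched as follows. The function name check_stage_size, its return codes and the message texts are hypothetical, and the limit is passed in by the caller because no concrete limit is defined here:

```shell
#!/bin/bash

# check_stage_size() - decide whether a file is small enough to be staged
# @1: file to stage
# @2: size limit in bytes (chosen by the caller)
# returns 0 if the file fits, 1 if it is too large, 2 if it does not exist
check_stage_size()
{
        local _file="$1"
        local _limit="$2"
        local _size=0

        if [[ ! -f "$_file" ]]; then
                echo "E: $_file: no such file" 1>&2
                return 2
        fi

        # wc -c is portable; the arithmetic expansion drops any padding
        # some wc implementations put in front of the number
        _size=$(( $(wc -c <"$_file") ))

        if (( _size > _limit )); then
                echo "W: $_file ($_size bytes) exceeds the stage in/out limit of $_limit bytes" 1>&2
                return 1
        fi

        return 0
}
```

A caller such as gqsub could then refuse the stage in/out or just warn the user, depending on the return value.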

Implemented

Notifications

Sending notifications, e.g. by email as OpenPBS/Torque does for job termination (-m e), start of execution (-m b) or abort (-m a), could be a nice feature.

Implementation

This could be implemented by specific service hooks in gsatlc. E.g. gsatlc could call an onEvent() function and provide the event and other information (like job id, job exit value, etc.).

Actual onEvent() function:

# onEvent() - run system and user services on specific event
#+ @event: the event as string (e.g. QSUB)
#+ @environment: a sourceable file containing environment variables that are
#+ exported before execution of services, like e.g.:
#+
#+ GSAT_JOBNAME: user specified job name
#+
#+ GSAT_O_WORKDIR: job's work directory
#+
#+ GSAT_O_HOME: home directory of submitting user
#+
#+ GSAT_O_LOGNAME: name of submitting user
#+
#+ GSAT_O_JOBID: job id
#+
#+ GSAT_O_HOST: host on which job is currently executing
#+
#+ GSAT_O_PATH: path variable used to locate executables
#+ during job execution
#+
#+ service - A service is just a script that is executed by gsatlc if the
#+ corresponding event is triggered.
gschedule/onEvent()
{
        local _event="$1"

        # maybe also provide the return/exit value of the corresponding action
        #+ or job
        #local _returnValue="$2"

        # env vars provided by gsatlc
        local _environment="$2"

        # call user or system provided scripts from service dir named after the triggered event
        "$_GSAT_LIBEXECPATH"/run-services "$_event" "$_environment" "${_GSAT_LIBEXECPATH}/services/on${_event}" &

        "$_GSAT_LIBEXECPATH"/run-services "$_event" "$_environment" "${HOME}/.gsatellite/services/on${_event}" &

        return
}

This was made available with commits 898b820e9ddd087778e9877fbcc5049e7ad658f4 and c7e8e032f6ee0838f218caa7dadd21066154c1f3. Currently this is done globally (i.e. for all jobs), but an additional per-job implementation is planned.
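As an illustration of what a notification service could do with the exported environment, the following sketch builds a message text from some of the GSAT_* variables listed above. The function name compose_notification and the message layout are assumptions, as is the way the event name reaches the service; actual delivery (e.g. by email) is left out:

```shell
#!/bin/bash

# compose_notification() - build a notification text from the job environment
# @1: event name (assumption: known to the service, e.g. from the name of
#     the service directory it was started from)
compose_notification()
{
        local _event="$1"

        echo "Subject: gsatellite job ${GSAT_O_JOBID:-unknown}: $_event"
        echo
        echo "Job name: ${GSAT_JOBNAME:-unnamed}"
        echo "Host:     ${GSAT_O_HOST:-unknown}"
        echo "Workdir:  ${GSAT_O_WORKDIR:-unknown}"
}
```

A service placed in the service directory for JOB_TERMINATION could call compose_notification JOB_TERMINATION and hand the output to a mail tool of the user's choice.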

Use case - Notify users if GSI proxy certificates are going to expire soon

A background process could wake up gsatlc before a used GSI proxy certificate expires. Gsatlc could then notify a user to extend the lifetime of the used proxy certificate:

  1. get the lifetime of the proxy certificate in seconds
  2. subtract 1 hour (3600 s) and fork a background process that sleeps for the remaining time in seconds
  3. after the time is up, the background process sends a message to gsatlc and wakes it with SIGCONT
  4. gsatlc sends a message to the user (and hopes for the best ;)
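The timing part of these steps can be sketched as below. The function name schedule_expiry_reminder is hypothetical, and the command to run after the delay is passed in by the caller, so the sketch does not implement the message to gsatlc or the SIGCONT wake-up itself:

```shell
#!/bin/bash

# schedule_expiry_reminder() - run a command shortly before a proxy expires
# @1: remaining lifetime of the proxy certificate in seconds
# @2: lead time in seconds (3600 in the steps above)
# @3..: command (and arguments) to run when the time is up
#
# The PID of the background process is available in $! afterwards.
schedule_expiry_reminder()
{
        local _lifetime="$1"
        local _leadTime="$2"
        shift 2

        local _delay=$(( _lifetime - _leadTime ))

        # proxy already expires within the lead time: remind immediately
        if (( _delay < 0 )); then
                _delay=0
        fi

        ( sleep "$_delay"; "$@" ) &
}

# Usage idea (assumptions: grid-proxy-info from the Globus Toolkit is
# installed and prints the remaining seconds with -timeleft; notify_gsatlc
# is a hypothetical command that messages and wakes gsatlc):
#
#   schedule_expiry_reminder "$(grid-proxy-info -timeleft)" 3600 notify_gsatlc
```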

Export specific environment

Export special environment variables (e.g. a variable containing the job ID) to the shell that executes the job.

This was made available with commit 3b7b5b5b712cc870ae2a9eba6cbe73937f3fe9f8.
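One way to export such variables can be sketched as follows, assuming they are kept in a sourceable file of plain VAR=value assignments. This is an illustration only and not necessarily how the commit above does it; the function name run_job_with_environment is hypothetical:

```shell
#!/bin/bash

# run_job_with_environment() - execute a job with job-specific variables exported
# @1: sourceable file with variable assignments (e.g. GSAT_O_JOBID=12345)
# @2: job script
run_job_with_environment()
{
        local _environment="$1"
        local _job="$2"

        (
                # set -a marks every variable assigned while sourcing for export
                set -a
                . "$_environment"
                set +a

                # replace the subshell with the shell that executes the job
                exec bash "$_job"
        )
}
```

Running the job in a subshell keeps the exported variables out of the calling process.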