Gsatellite Future features
In addition to user-defined services for specific events (JOB_START, JOB_TERMINATION), users could provide their own scripts for user-defined events. From within jobs, these events could be triggered by a call to a specific tool or function.
Example:
#!/bin/bash
doSomething --with data1
trigger userDefinedEvent1
doSomething --with data2
trigger userDefinedEvent2
[...]
When a user-defined event is triggered, either gsatlc or sputnik executes the corresponding user services.
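A minimal sketch of what such a trigger function might look like follows. The function name trigger comes from the example above, but everything else is an assumption: the GSAT_USER_SERVICES variable, the on<event> directory layout (borrowed from the existing service directories), and the fact that the sketch runs the services directly instead of messaging gsatlc or sputnik.

```shell
#!/bin/bash
# Hypothetical sketch: the variable name and the directory layout are
# assumptions for illustration, not the actual gsatellite interface.

# Directory holding one sub directory per event, e.g. onuserDefinedEvent1/
GSAT_USER_SERVICES="${GSAT_USER_SERVICES:-$HOME/.gsatellite/services}"

# trigger() - run all executable user services registered for an event
# @event: name of the user defined event (e.g. userDefinedEvent1)
trigger()
{
        local _event="$1"
        local _serviceDir="${GSAT_USER_SERVICES}/on${_event}"
        local _service=""

        # no services registered for this event: nothing to do
        [[ -d "$_serviceDir" ]] || return 0

        for _service in "$_serviceDir"/*; do
                # skip non-executable files (and an unexpanded glob)
                [[ -f "$_service" && -x "$_service" ]] || continue
                # run in the background so the job itself is not delayed
                "$_service" "$_event" &
        done

        return 0
}
```

In a real setup the function would more likely hand the event over to gsatlc or sputnik, so that services run outside of the job's own process.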
Allow a user to follow job execution, i.e. watch stdout and stderr in real time. This could be done by using tail -f on the special job files job.stdout and job.stderr. Another option would be to use watch and tail in combination, e.g.:
watch 'tail job.stdout job.stderr'
For Torque there is a Perl tool named qpeek, which allows a user to follow job execution. More details can be found e.g. here.
Allow job submission and manipulation by email. The emails used have to meet at least the following requirements:
- Emails have to be sent by a user registered on the gsatellite host(s).
- Emails have to be signed by a user registered on the gsatellite host(s).
Unsigned emails, emails with an invalid signature, and emails from senders not registered on the gsatellite host(s) are discarded. Emails are encrypted by default, using the public key of the respective destination (gsatellite or the user), but encryption is not required.
Example:
A user sends a mail signed with his X.509 certificate containing the following body:
#!/bin/bash
# job type
#GSAT -T gtransfer
gt -s gsiftp://gridftp.domain1.tld:2811/~/my_files/* -d gsiftp://gridftp.domain2.tld:2811/~/my_destination/
...to the email address corresponding to the gsatellite host(s). After processing (check sender, check signature, decrypt) and submission of the job (the body of the email), the user is sent an email signed by gsatellite containing the job ID.
jobId:12345
Users should be able to use their preferred email client, provided that it is capable of signing an email message with an X.509 certificate.
An additional functionality for gsatctl: it should allow a user to wait for a job to finish. Possible implementations could look like the following:
1. On gqwait invocation, gsatctl forks a background child that stops itself (using SIGSTOP) right after the start. At the same time, gsatctl contacts gsatlc and registers a sort of callback that will wake up (using SIGCONT) that specific child when the related job has terminated. After contacting gsatlc, gsatctl will just wait for its child to finish (e.g. with wait %1).
2. On gqwait invocation, gsatctl forks a background child that blocks execution by reading from an empty FIFO (which should have been prepared by gsatctl). At the same time, gsatctl contacts gsatlc and registers a sort of callback that will terminate that specific child when the related job has terminated. After contacting gsatlc, gsatctl will just wait for its child to finish (e.g. with wait %1).
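#!/bin/bash
# child blocks on the FIFO until a writer opens it or the child is terminated
child()
{
        echo "child" 1>&2
        read <FIFO
        #read
        return
}
echo "father" 1>&2
echo "own pid: $$" 1>&2
# the FIFO has to exist before the child is started
mkfifo FIFO
child &
echo "child pid: $!" 1>&2
# father blocks here without busy waiting
wait %1
rm FIFO
exit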
Implementation (1) does not work, because as soon as the child pauses itself with SIGSTOP, the call to wait returns and the father exits. Implementation (2) works and blocks execution of the father (which will be gsatctl in the end) without the need for busy waiting.
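To illustrate how implementation (2) could be released again, the following sketch adds a stand-in for the callback: a background subshell that writes to the FIFO after one second. The function name waitOnFifo and the FIFO path are assumptions; in gsatellite the write (or the termination of the child) would come from gsatlc once the job has terminated.

```shell
#!/bin/bash
# Sketch of implementation (2). The function name and the simulated callback
# are assumptions made for illustration only.

# waitOnFifo() - block without busy waiting until something is written to
#+               the FIFO
# @fifo: path of the FIFO to create and read from
waitOnFifo()
{
        local _fifo="$1"
        local _message=""

        mkfifo "$_fifo" || return 1

        # child blocks on the FIFO until a writer shows up
        ( read -r _message <"$_fifo"; echo "child woke up: $_message" ) &
        local _childPid=$!

        # stand-in for the callback registered with gsatlc
        ( sleep 1; echo "job terminated" >"$_fifo" ) &

        # father blocks here until the child has finished
        wait "$_childPid"

        rm -f "$_fifo"
        return 0
}

waitOnFifo "${TMPDIR:-/tmp}/gqwait.$$.fifo"
```

Waiting on the child's PID instead of the job spec %1 keeps the function independent of other background jobs the calling shell may have started.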
The messaging system currently used by gsatellite was designed with inter-node IPC in mind. It works on shared file systems like NFS (inter- and intra-node IPC possible) and on non-shared file systems (only intra-node IPC possible). But if gsatellite is only used on a single machine, the ability for inter-node IPC is not needed. Therefore different (and maybe better performing) mechanisms could be used in such a setup.
System V IPC message queues look like a perfect alternative. The Linux Programmer's Guide provides a good introduction to SysV IPC message queues and also the source code for a small tool that can be used from the shell to access these facilities.
Small files could be staged in/out by gqsub. The file size could be checked and the user notified if it exceeds certain limits. A use case for small file stage in/out would be tgftp batch files for batch tests.
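The size check could be as simple as the following sketch. The function name checkStageSize, its return codes and the default limit of 1 MiB are assumptions, not existing gqsub behaviour.

```shell
#!/bin/bash
# Hypothetical sketch: function name, return codes and the default limit
# are assumptions, gqsub does not necessarily work this way.

# checkStageSize() - check that a file is small enough to be staged
# @file:  file to stage in/out
# @limit: maximum size in bytes (defaults to 1048576)
# returns 0 if the file may be staged, 1 if it is too large, 2 if the file
#+ is missing
checkStageSize()
{
        local _file="$1"
        local _limit="${2:-1048576}"
        local _size=0

        [[ -f "$_file" ]] || return 2

        # wc -c is portable, stat's options differ between GNU and BSD
        _size=$(wc -c <"$_file")

        if (( _size > _limit )); then
                echo "E: \"$_file\" ($_size bytes) exceeds limit of $_limit bytes" 1>&2
                return 1
        fi

        return 0
}
```

A caller could then decide whether to refuse the submission or just warn the user when the function returns 1.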
Sending notifications by e.g. email, like OpenPBS/Torque does for job termination (-m e), begin of execution (-m b) or abort (-m a), could be a nice feature.
This could be implemented by specific service hooks in gsatlc. E.g. gsatlc could call an onEvent() function and provide the event and other information (like job id, job exit value, etc.).
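A user service for such a notification might look like the sketch below. How gsatlc hands data to a service is an assumption here: the sketch expects the event as first argument and job details in GSAT_* environment variables. The function name formatNotification is hypothetical, and actually sending the text is left to a local mail command.

```shell
#!/bin/bash
# Hypothetical user service for a job termination event. The calling
# convention (event as first argument, job details in GSAT_* environment
# variables) is an assumption.

# formatNotification() - build the notification text for an event
# @event: the event as string (e.g. JOB_TERMINATION)
formatNotification()
{
        local _event="$1"

        printf 'Event: %s\n' "$_event"
        printf 'Job id: %s\n' "${GSAT_O_JOBID:-unknown}"
        printf 'Job name: %s\n' "${GSAT_JOBNAME:-unknown}"
        printf 'Host: %s\n' "${GSAT_O_HOST:-unknown}"
}

# Sending is left to a local mail command, e.g. (if mailx is installed):
#   formatNotification "$1" | mail -s "gsatellite: $1" "$LOGNAME"
formatNotification "${1:-JOB_TERMINATION}"
```

Keeping the formatting separate from the delivery makes it easy to swap email for another notification channel.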
Actual onEvent() function:
# onEvent() - run system and user services on specific event
#+ @event: the event as string (e.g. QSUB)
#+ @environment: a sourceable file containing environment variables that are
#+ exported before execution of services, like e.g.:
#+
#+ GSAT_JOBNAME: user specified job name
#+
#+ GSAT_O_WORKDIR: job's work directory
#+
#+ GSAT_O_HOME: home directory of submitting user
#+
#+ GSAT_O_LOGNAME: name of submitting user
#+
#+ GSAT_O_JOBID: job id
#+
#+ GSAT_O_HOST: host on which job is currently executing
#+
#+ GSAT_O_PATH: path variable used to locate executables
#+ during job execution
#+
#+ service - A service is just a script that is executed by gsatlc if the
#+ corresponding event is triggered.
gschedule/onEvent()
{
        local _event="$1"
        # maybe also provide the return/exit value of the corresponding action
        #+ or job
        #local _returnValue="$2"
        # env vars provided by gsatlc
        local _environment="$2"
        # call user or system provided scripts from service dir named after the triggered event
        "$_GSAT_LIBEXECPATH"/run-services "$_event" "$_environment" "${_GSAT_LIBEXECPATH}/services/on${_event}" &
        "$_GSAT_LIBEXECPATH"/run-services "$_event" "$_environment" "${HOME}/.gsatellite/services/on${_event}" &
        return
}
This was made available with 898b820e9ddd087778e9877fbcc5049e7ad658f4 and c7e8e032f6ee0838f218caa7dadd21066154c1f3. Currently this is done globally (i.e. for all jobs), but an additional per job implementation is planned.
A background process could wake up gsatlc before a used GSI proxy certificate expires. Gsatlc could then notify a user to extend the lifetime of the used proxy certificate:
- get the lifetime of the proxy certificate in seconds
- subtract 1 hour (3600s) and fork a background process that sleeps for the remaining time in seconds
- after the time is up, the background process sends a message to gsatlc and wakes it with SIGCONT
- gsatlc sends a message to the user (and hopes for the best ;)
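The steps above can be sketched as follows. The function names are assumptions, and the sketch only sends the signal; the message to gsatlc is left out. The remaining lifetime is passed in as a number, in practice it could come from a tool like grid-proxy-info -timeleft (assumed here, not verified against a specific Globus version).

```shell
#!/bin/bash
# Sketch of the steps above. Function names are assumptions; the remaining
# lifetime could come from a tool like "grid-proxy-info -timeleft".

# computeWakeupDelay() - seconds to sleep before the warning is due
# @lifetime: remaining lifetime of the proxy certificate in seconds
# @margin:   warn this many seconds before expiry (defaults to 3600)
computeWakeupDelay()
{
        local _lifetime="$1"
        local _margin="${2:-3600}"
        local _delay=$(( _lifetime - _margin ))

        # proxy expires within the margin: warn immediately
        (( _delay < 0 )) && _delay=0

        echo "$_delay"
}

# scheduleWakeup() - fork a background process that wakes up a process
#+                   after the given delay
# @pid:   process to wake up (gsatlc in the end)
# @delay: seconds to sleep before sending SIGCONT
scheduleWakeup()
{
        local _pid="$1"
        local _delay="$2"

        # a message for gsatlc would be written here before the signal
        ( sleep "$_delay"; kill -CONT "$_pid" ) &
}

# a proxy with 12 hours left is due for a warning in 11 hours
computeWakeupDelay 43200   # prints 39600
```

A call like scheduleWakeup "$gsatlcPid" "$(computeWakeupDelay "$timeleft")" would combine both steps, with gsatlcPid and timeleft being placeholders for values gsatellite would have to supply.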
Provide special environment variables (e.g. a variable containing the job id, etc.) to the shell that executes the job.
This was made available with 3b7b5b5b712cc870ae2a9eba6cbe73937f3fe9f8