
CRAB vs HammerCloud

Stefano Belforte edited this page Dec 4, 2020 · 9 revisions

HammerCloud uses CRAB to submit jobs to CMS sites for continuous site monitoring.

This page describes the CRAB features that are meant explicitly for HC use and are not part of the general user documentation.

how HammerCloud submissions are recognized and handled in CRAB.

If you do not have a place to keep this, I guess I can make a twiki page in CRAB, but I do not want to encourage users to play with the activity flag.

The user (i.e. you, i.e. HC) sets the configuration parameter General.activity

If that contains the string "hc" (case insensitive), CRAB flags the task as a HammerCloud task and sets these classAds for reporting to MONIT, so that they become keys in ES/Grafana/Kibana searches

CMS_WMTool = 'HammerCloud'
CMS_TaskType = same string as found in General.activity above
CMS_Type  = 'Test'

Be aware that CMS_Type = 'Test' is also used by WMA.
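
The flagging rule above can be sketched in a few lines of Python. This is only an illustration of the logic as described on this page, not the actual CRABServer code (which is in DagmanCreator.py); the function name `hc_classads` is hypothetical.

```python
def hc_classads(activity):
    """Return the reporting classAds for a task as described above.

    Hypothetical helper: returns an empty dict when the activity
    is not recognized as a HammerCloud one.
    """
    # Case-insensitive substring test on General.activity
    if activity and "hc" in activity.lower():
        return {
            "CMS_WMTool": "HammerCloud",
            "CMS_TaskType": activity,  # same string as in General.activity
            "CMS_Type": "Test",
        }
    return {}


if __name__ == "__main__":
    print(hc_classads("hctest"))
    print(hc_classads("analysis"))
```

Since this is a substring test, an activity such as 'hctestNew' would also be flagged as HammerCloud for reporting purposes.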

Besides what is reported, there is the matter of what is run and where. CRAB uses a parameter in the TaskWorker config [2]

config.TaskWorker.ActivitiesToRunEverywhere = ['hctest', 'hcdev']

to disable blacklists [3] and the stageout check [4] for the listed activities.

So if you want to use e.g. 'hctestNew' and still want it to run at blacklisted sites, you need to tell the CRAB operators in advance so that we can change the config.

Alternatively, you can explicitly put in crabConfig:

config.Site.ignoreGlobalBlacklist = True
config.General.transferOutputs = False
config.General.transferLogs = False

(no transfers, so no need to check [5])
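
As a rough sketch of the two decisions described above (ignore the global blacklist, skip the stageout check), the logic could look like the following. The function and argument names are hypothetical, and the exact-match test on the activity name is an assumption inferred from the 'hctestNew' remark; the real code is at [3], [4] and [5].

```python
# Value taken from the TaskWorker config line quoted above
ACTIVITIES_TO_RUN_EVERYWHERE = ["hctest", "hcdev"]


def blacklist_ignored(activity, ignore_global_blacklist=False,
                      run_everywhere=ACTIVITIES_TO_RUN_EVERYWHERE):
    """True if the global site blacklist is not applied to the task."""
    # Assumption: the activity must match a configured name exactly
    return ignore_global_blacklist or activity in run_everywhere


def stageout_check_skipped(activity, transfer_outputs, transfer_logs,
                           run_everywhere=ACTIVITIES_TO_RUN_EVERYWHERE):
    """True if the stageout check is not performed for the task."""
    no_transfers = not transfer_outputs and not transfer_logs
    return activity in run_everywhere or no_transfers


if __name__ == "__main__":
    # A new activity name that the operators have not configured yet
    print(blacklist_ignored("hctestNew"))
    print(blacklist_ignored("hctestNew", ignore_global_blacklist=True))
    print(stageout_check_skipped("hctestNew", False, False))
```

Under these assumptions, an unlisted activity gets the same treatment as 'hctest' only when the three crabConfig parameters above are set explicitly.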

[1] https://github.com/dmwm/CRABServer/blob/32066a9248142e7851ebf9ebe0dd12f95679bef4/src/python/TaskWorker/Actions/DagmanCreator.py#L424-L453

[2] https://gitlab.cern.ch/ai/it-puppet-hostgroup-vocmsglidein/-/blob/master/code/templates/crab/crabtaskworker/TaskWorkerConfig.py.erb#L94

[3] https://github.com/dmwm/CRABServer/blob/32066a9248142e7851ebf9ebe0dd12f95679bef4/src/python/TaskWorker/Actions/DagmanCreator.py#L797-L805

[4] https://github.com/dmwm/CRABServer/blob/32066a9248142e7851ebf9ebe0dd12f95679bef4/src/python/TaskWorker/Actions/StageoutCheck.py#L14-L21 https://github.com/dmwm/CRABServer/blob/32066a9248142e7851ebf9ebe0dd12f95679bef4/src/python/TaskWorker/Actions/StageoutCheck.py#L96-L100

[5] https://github.com/dmwm/CRABServer/blob/32066a9248142e7851ebf9ebe0dd12f95679bef4/src/python/TaskWorker/Actions/StageoutCheck.py#L102-L105

slow job release

For HammerCloud, CRAB can release the jobs in a task slowly, so that they are hopefully executed in a constant flow at the sites rather than all at the same time in bunches of O(100) jobs.

In a nutshell

  • standard operations: the user submits a 100-job task and the 100 jobs are queued in HTCondor "asap" via a quick succession of condor_submit calls (this is done by DAGMan)
  • slow release: the user adds this line to crabConfig.py: config.Debug.extraJDL=['+CRAB_JobReleaseTimeout=Nsec'], where Nsec is an integer number of seconds
    • Then (still via DAGMan, which inserts a delay in each DAG node):
      • task starts in schedd at time T0
      • job #1 is submitted to HTCondor at T0 + Nsec
      • job #2 is submitted to HTCondor at T0 + 2*Nsec
      • ...
      • job #N is submitted to HTCondor at T0 + N*Nsec
    • there is no guarantee and no way to predict when jobs will start running; new submissions do not wait for previous jobs to complete
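
The submission schedule in the list above is simple arithmetic and can be sketched as follows. The helper `release_times` is hypothetical and only reproduces the T0 + i*Nsec rule stated here; it says nothing about when jobs actually start running.

```python
def release_times(t0, n_jobs, nsec):
    """Return the times at which jobs 1..n_jobs are submitted to HTCondor.

    t0   : time (in seconds) at which the task starts in the schedd
    nsec : value given via +CRAB_JobReleaseTimeout
    """
    # Job number i is submitted at T0 + i * Nsec
    return [t0 + i * nsec for i in range(1, n_jobs + 1)]


if __name__ == "__main__":
    times = release_times(t0=0, n_jobs=100, nsec=30)
    print(times[:3])   # first three submissions
    print(times[-1])   # last submission, in seconds after T0
```

For example, with 100 jobs and Nsec = 30 the last job is submitted 3000 seconds (50 minutes) after T0.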

And here is the code, which is all in all clear enough:

https://github.com/dmwm/CRABServer/blob/32066a9248142e7851ebf9ebe0dd12f95679bef4/src/python/TaskWorker/Actions/DagmanCreator.py#L581-L599

https://github.com/dmwm/CRABServer/blob/32066a9248142e7851ebf9ebe0dd12f95679bef4/src/python/TaskWorker/Actions/PreJob.py#L489-L506