The node ran out of disk space, and Torque switched it to state `down` with an explanatory note. After the error condition disappeared, the node did not return to `up`/`free` automatically. The `pbs_mom` on that node had to be restarted manually to restore its state.
```
$ pbsnodes
node1.localdomain
     state = down
     power_state = Running
     np = 8
     ntype = cluster
     status = opsys=linux,uname=...,sessions=10062 10109 10274,nsessions=3,nusers=1,idletime=1444966,totmem=8010380kb,availmem=7536480kb,physmem=8010380kb,ncpus=8,loadave=0.00,message=ERROR: torque spool filesystem full,gres=,netload=15776691386,state=free,varattr= ,cpuclock=Fixed,version=6.1.1.1,rectime=1515096744,jobs=
     note = ERROR: torque spool filesystem full
     mom_service_port = 15002
     mom_manager_port = 15003
```
Consider a proactive operation that fixes such stale states automatically.
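A minimal detection sketch, assuming the default `pbsnodes` text layout shown above. It treats a node as "stale" when the server reports `state = down` while the MOM's own `status` attribute says `state=free` and its `rectime` is recent. That criterion, the helper names, and the 300-second freshness window are illustrative assumptions, not Torque behavior. Remediation (e.g. restarting `pbs_mom`, as was done manually here) is deliberately left out.

```python
from __future__ import annotations


def parse_pbsnodes(text: str) -> dict[str, dict[str, str]]:
    """Parse default `pbsnodes` text output into {node: {attribute: value}}."""
    nodes: dict[str, dict[str, str]] = {}
    current: str | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "=" not in line:
            # A line without "=" starts a new node record (the node name).
            current = line
            nodes[current] = {}
        elif current is not None:
            # Split on the first "=" only; values such as `status` contain more.
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def parse_status(status: str) -> dict[str, str]:
    """Split the comma-separated `status` attribute into key/value pairs."""
    fields: dict[str, str] = {}
    for item in status.split(","):
        key, _, value = item.partition("=")
        fields[key.strip()] = value
    return fields


def find_stale_nodes(nodes: dict[str, dict[str, str]], now: float,
                     max_report_age: float = 300) -> list[str]:
    """Return nodes marked down by the server although their MOM recently reported state=free."""
    stale = []
    for name, attrs in nodes.items():
        server_states = attrs.get("state", "").split(",")
        status = parse_status(attrs.get("status", ""))
        try:
            last_report = int(status.get("rectime", ""))
        except ValueError:
            continue  # no usable report timestamp: cannot judge freshness
        if ("down" in server_states
                and status.get("state") == "free"
                and now - last_report <= max_report_age):
            stale.append(name)
    return stale


SAMPLE = """node1.localdomain
     state = down
     np = 8
     status = opsys=linux,loadave=0.00,message=ERROR: torque spool filesystem full,state=free,rectime=1515096744,jobs=
     note = ERROR: torque spool filesystem full
"""

print(find_stale_nodes(parse_pbsnodes(SAMPLE), now=1515096800))  # prints ['node1.localdomain']
```

Requiring a fresh `rectime` is meant to avoid "fixing" a node whose MOM is genuinely dead and whose last status report is simply left over.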
Also, long-term down nodes should be monitored as part of #10.
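To monitor nodes that stay down for a long time, one option is to remember when each node was first observed down and report those that exceed a threshold. This is a minimal sketch, assuming node states are polled periodically (for example, from the `state` attribute in `pbsnodes` output) and passed in as a `{node: state}` mapping. `DownTracker` and the threshold are hypothetical and do not come from Torque or from #10.

```python
from __future__ import annotations


class DownTracker:
    """Hypothetical helper: remembers when each node was first seen in a down state."""

    def __init__(self, threshold_seconds: float) -> None:
        self.threshold = threshold_seconds
        self.down_since: dict[str, float] = {}

    def update(self, states: dict[str, str], now: float) -> list[str]:
        """Record the current poll and return nodes down for at least the threshold."""
        for name, state in states.items():
            if "down" in state.split(","):
                # Keep the earliest timestamp; do not reset on repeated polls.
                self.down_since.setdefault(name, now)
            else:
                # The node recovered, so its down period ends.
                self.down_since.pop(name, None)
        # Forget nodes that no longer appear in the listing at all.
        for name in list(self.down_since):
            if name not in states:
                del self.down_since[name]
        return sorted(name for name, since in self.down_since.items()
                      if now - since >= self.threshold)


tracker = DownTracker(threshold_seconds=3600)
tracker.update({"node1": "down", "node2": "free"}, now=0)
print(tracker.update({"node1": "down", "node2": "free"}, now=3600))  # prints ['node1']
```

Keeping the first-seen timestamp in memory is the simplest option. A real monitor would likely persist it so that restarting the monitor does not reset the down-time clock.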