ark19
Junior Member
27 Posts
Posted - Jan 11 2012 : 1:55:54 PM
Hi BIAC,
Is there any way to prevent a daemon job submitted to the cluster from killing itself when it reaches its allotted maximum run time? We're running a job that's taking longer than expected and getting root emails warning that it will soon kill itself, but at this point we'd rather it didn't.
It seems like the answer must be obvious, but we don't know it!
Thanks,
Annchen
petty
BIAC Staff
USA
453 Posts
Posted - Jan 11 2012 : 3:54:14 PM
Assuming you are talking about a job that you submitted with a cutoff time, you can change the request after the fact with qalter. If you previously requested other resources, you'll have to include them all. The general form is:
qalter -l resource_list JOBID
For instance:
qalter -l h_rt=8000,h_vmem=2G 3860963
This changes the runtime (h_rt) to 8000 seconds, but since h_vmem was previously set, I had to include that resource as well. You can't actually change h_vmem, since it's a consumable resource; you just need to feed it the same value the script was submitted with the first time.
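A small aside on that "include them all" caveat: since qalter -l replaces the whole resource list rather than patching one value, it can help to build the command from a dict of every previously requested resource. The helper below is just an illustrative sketch (not part of any BIAC script), using the job ID and values from the example above:

```python
# Sketch: build a qalter command that re-specifies every previously
# requested resource, since "qalter -l" replaces the entire list.
def build_qalter_cmd(job_id, resources):
    """Return the qalter command string for the given job.

    resources: dict of SGE resource names to values, e.g. the h_rt you
    want to change plus the original h_vmem you must repeat verbatim.
    """
    res = ",".join("%s=%s" % (k, v) for k, v in resources.items())
    return "qalter -l %s %s" % (res, job_id)

# New 8000-second runtime, plus the h_vmem the job was submitted with.
cmd = build_qalter_cmd(3860963, {"h_rt": "8000", "h_vmem": "2G"})
print(cmd)  # qalter -l h_rt=8000,h_vmem=2G 3860963
```

You would then pass the resulting string to the shell (or check the job's current requests first with qstat -j JOBID) rather than typing the list by hand.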
ark19
Junior Member
27 Posts
Posted - Jan 11 2012 : 4:08:38 PM
Ooh, thanks Chris, that's good to know.
I'm a little unsure how the distinction between the daemon process (which we start with our Python script) and the individual jobs (bash scripts) fits in here. It seems this solution would save an individual job, but we're wondering if we can save the whole process before it gets killed, which I think happens via this line in the Python script: os.system('kill ' + str(ID))
I apologize for my confusion here, and thanks again!
syam.gadde
BIAC Staff
USA
421 Posts
Posted - Jan 11 2012 : 4:18:05 PM
I haven't tried it myself, but if you want your daemon job to be suspended before it hits the self-destruct button, you can try:
qmod -sj jobid
where jobid is your job number. To resurrect it, try:
qmod -usj jobid
However, make sure to unsuspend the job when things are done, as a suspended job still takes up a cluster slot.
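If you end up scripting this from Python (as the daemon already shells out with os.system), a tiny helper keeps the two qmod forms straight. This is only a sketch, using the same suspend/unsuspend flags described above:

```python
# Sketch: build the qmod command that suspends (or, with resume=True,
# unsuspends) an SGE job, matching "qmod -sj" / "qmod -usj" above.
def qmod_cmd(job_id, resume=False):
    flag = "-usj" if resume else "-sj"
    return "qmod %s %s" % (flag, job_id)

print(qmod_cmd(3860963))               # qmod -sj 3860963
print(qmod_cmd(3860963, resume=True))  # qmod -usj 3860963
```

The returned string would be handed to os.system (or subprocess) on the cluster; job ID 3860963 is just the example number from earlier in the thread.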
petty
BIAC Staff
USA
453 Posts
Posted - Jan 11 2012 : 4:32:00 PM
Oh, I see. Unfortunately, I don't think you can change anything while the process is running, because you started it with a set run time:
max_run_time = 720
I thought you meant a currently running cluster job, but you're talking about a process you started on the head node to submit things. Those Python jobs running on the head node aren't under SGE control.
dvsmith
Advanced Member
USA
218 Posts
Posted - Jan 11 2012 : 4:54:19 PM
Hey Annchen,
If you're using one of the more recent Python submission scripts from our lab, they should have some lines in there that will let you change/update variables on each loop through the script.
# Check for changes in user settings
user_settings = "/home/%s/user_settings.txt" % username
if os.path.isfile(user_settings):
    with open(user_settings) as f:
        settings = f.readlines()
    for line in settings:
        exec(line)
Assuming you have the above lines, you can just create the ~/user_settings.txt file and add a line that reads:
warning_time = 99999 # send out a warning after this many hours informing you that the daemon is still running
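To make the mechanism concrete, here is a self-contained sketch of that reload-on-each-loop pattern. It writes a stand-in for ~/user_settings.txt to a temporary location (so it can run anywhere) and shows the override taking effect without restarting the process; the warning_time name matches the example above:

```python
import os
import tempfile

warning_time = 720  # the daemon's built-in default (illustrative value)

# Stand-in for ~/user_settings.txt, containing one override line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("warning_time = 99999\n")
    user_settings = tmp.name

# The daemon re-reads this file on every pass through its main loop,
# exec'ing each line, so an edit takes effect on the next iteration.
if os.path.isfile(user_settings):
    with open(user_settings) as f:
        for line in f:
            exec(line)

os.unlink(user_settings)
print(warning_time)  # -> 99999
```

The usual caveat applies: exec runs arbitrary Python, so this only makes sense for a settings file that you alone can write to (as with a file in your own home directory).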
Cheers, David
ark19
Junior Member
27 Posts
Posted - Jan 12 2012 : 09:44:52 AM
Great, thanks!!