Duke-UNC Brain Imaging and Analysis Center
 prevent job daemon from killing itself?

ark19
Junior Member

27 Posts

Posted - Jan 11 2012 :  1:55:54 PM
Hi BIAC,

Is there any way to prevent a daemon job submitted to the cluster from killing itself when it reaches the allotted max run time? We're running a job that's taking longer than expected, and we're getting root emails saying it will soon kill itself, but by this point we'd rather it didn't.

Seems like this answer must be obvious but we don't know!

Thanks,

Annchen

petty
BIAC Staff

USA
453 Posts

Posted - Jan 11 2012 :  3:54:14 PM
Assuming you mean a job that you submitted with a cutoff time, you can change the request afterwards with qalter. If you previously requested other resources, you'll have to include them all.

The general form is:
qalter -l resource_list JOBID

For instance:
qalter -l h_rt=8000,h_vmem=2G 3860963

This changes the runtime (h_rt) to 8000 seconds. Since h_vmem was set when the job was submitted, I had to include that resource as well. You can't change h_vmem, since it's a consumable resource, so just give it the same value you used when you first submitted the script.
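Because every previously requested resource has to be restated, it can help to build the resource list from one place. Here is a minimal sketch in Python, using the values from the example above. `build_qalter_args` is a hypothetical helper written for illustration, not part of SGE, and it only builds the command without running it.

```python
def build_qalter_args(job_id, resources):
    """Return the argv list for `qalter -l name=value,... JOBID`.

    `resources` is a list of (name, value) pairs and should hold every
    resource requested at submission time, not only the one being changed.
    """
    if not resources:
        raise ValueError("at least one resource is required")
    resource_list = ",".join("%s=%s" % (name, value) for name, value in resources)
    return ["qalter", "-l", resource_list, str(job_id)]


# Raise h_rt to 8000 seconds and restate h_vmem with its original value.
args = build_qalter_args(3860963, [("h_rt", 8000), ("h_vmem", "2G")])
print(" ".join(args))

# To apply the change on a machine where SGE is installed, one option is:
#   import subprocess
#   subprocess.check_call(args)
```

Keeping the pairs in a list preserves the order in which they are written, so the printed command matches what you would type by hand.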

ark19
Junior Member

27 Posts

Posted - Jan 11 2012 :  4:08:38 PM
Ooh, thanks Chris, that's good to know.

I'm a little unsure how the distinction between the daemon process (which we start with our Python script) and the individual jobs (bash scripts) fits in here. It seems that this solution would save an individual job, but we're wondering if we can save the whole process before it gets killed, which I think happens at this line in the Python script:
os.system('kill '+str(ID))

I apologize for my confusion here, and thanks again!

syam.gadde
BIAC Staff

USA
421 Posts

Posted - Jan 11 2012 :  4:18:05 PM
I haven't tried it myself, but if you want your daemon job to be suspended before it hits the self-destruct button you can try:
qmod -sj jobid
where jobid is your job number. To resurrect it, try:
qmod -usj jobid
However, make sure to unsuspend the job when things are done as the suspended job still takes up a cluster slot.
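If you drive this from Python, one way to make sure the unsuspend always happens is to pair the two qmod calls in a context manager. This is only a sketch: `suspended` is a hypothetical helper, the `run` parameter exists so the commands can be tried without a cluster, and like the commands above it has not been tested against a live job.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def suspended(job_id, run=subprocess.check_call):
    """Suspend a job with `qmod -sj`, then always resume it with `qmod -usj`."""
    run(["qmod", "-sj", str(job_id)])
    try:
        yield
    finally:
        # Runs even if the body raises, so the job is not left suspended
        # while still holding a cluster slot.
        run(["qmod", "-usj", str(job_id)])


# Hypothetical usage (the job number is a placeholder):
#   with suspended(1234567):
#       ...do whatever needs to happen while the job is paused...
```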

petty
BIAC Staff

USA
453 Posts

Posted - Jan 11 2012 :  4:32:00 PM
Oh, I see. Unfortunately, I don't think you can change anything while the process is running, because you started it with a set run time:

max_run_time = 720

I thought you meant a currently running cluster job, but you're talking about a process you started on the head node to submit things. Those Python jobs running on the head node aren't under SGE control.



dvsmith
Advanced Member

USA
218 Posts

Posted - Jan 11 2012 :  4:54:19 PM
Hey Annchen,

If you're using one of the more recent Python submission scripts from our lab, they should have some lines in there that will let you change/update variables on each loop through the script.

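# Check for changes in user settings on each pass through the loop
# (relies on the script's existing "import os" and "username" variable)
user_settings = "/home/%s/user_settings.txt" % username
if os.path.isfile(user_settings):
	# open() replaces the older file() builtin; "with" closes the file for us
	with open(user_settings) as f:
		settings = f.readlines()
	for line in settings:
		# each line is run as Python code, so only put trusted lines in the file
		exec(line)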

Assuming you have the above lines, you could just create the ~/user_settings.txt file and add in a line that reads:
warning_time = 99999 # send out a warning after this many hours informing you that the daemon is still running
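To see how an override file like this takes effect, here is a small self-contained sketch. It is a variation on the lines above rather than the lab script itself: `load_overrides` is a hypothetical helper that collects the settings in a dictionary, a temporary file stands in for ~/user_settings.txt, and the default of 18 hours is made up for the demonstration.

```python
import os
import tempfile


def load_overrides(path, settings):
    """Apply `name = value` lines from `path` on top of `settings`.

    Each line is executed as Python, so the file must be trusted.
    Returns a new dict and leaves `settings` untouched.
    """
    updated = dict(settings)
    if os.path.isfile(path):
        with open(path) as f:
            for line in f:
                # Assignments made by the line land in `updated`.
                exec(line, {}, updated)
    return updated


# A temporary file stands in for ~/user_settings.txt.
defaults = {"warning_time": 18}  # hypothetical default, in hours
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("warning_time = 99999  # effectively never warn\n")
settings = load_overrides(tmp.name, defaults)
os.remove(tmp.name)
print(settings["warning_time"])
```

Calling `load_overrides` at the top of each loop iteration means an edit to the file is picked up on the next pass, without restarting the daemon.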

Cheers,
David


ark19
Junior Member

27 Posts

Posted - Jan 12 2012 :  09:44:52 AM
Great, thanks!!