Duke-UNC Brain Imaging and Analysis Center
 prevent job daemon from killing itself?

T O P I C    R E V I E W
ark19 Posted - Jan 11 2012 : 1:55:54 PM
Hi BIAC,

Is there any way to prevent a daemon job submitted to the cluster from killing itself when it reaches its allotted max run time? We're running a job that's taking longer than expected, and we're getting root emails warning that it will soon kill itself; by this point we'd rather it didn't.

Seems like this answer must be obvious but we don't know!

Thanks,

Annchen
6   L A T E S T    R E P L I E S    (Newest First)
ark19 Posted - Jan 12 2012 : 09:44:52 AM
Great, thanks!!
dvsmith Posted - Jan 11 2012 : 4:54:19 PM
Hey Annchen,

If you're using one of the more recent Python submission scripts from our lab, they should have some lines in there that will let you change/update variables on each loop through the script.

# Check for changes in user settings
# (requires "import os"; username is defined earlier in the submission script)
import os
user_settings = "/home/%s/user_settings.txt" % (username)
if os.path.isfile(user_settings):
    f = open(user_settings)
    settings = f.readlines()
    f.close()
    for line in settings:
        exec(line)

Assuming you have the above lines, you could just create the ~/user_settings.txt file and add in a line that reads:
warning_time = 99999 # send out a warning after this many hours informing you that the daemon is still running

Cheers,
David

petty Posted - Jan 11 2012 : 4:32:00 PM
Oh, I see. Unfortunately, I don't think you can change anything while the process is running, because you started it with a set run time:

max_run_time = 720

I thought you meant a currently running cluster job, but you're talking about some process you started on the head node to submit things. Those python jobs running on the head node aren't under SGE control.


syam.gadde Posted - Jan 11 2012 : 4:18:05 PM
I haven't tried it myself, but if you want your daemon job to be suspended before it hits the self-destruct button, you can try:
qmod -sj jobid
where jobid is your job number. To resurrect it, try:
qmod -usj jobid
However, make sure to unsuspend the job when things are done, as a suspended job still takes up a cluster slot.
ark19 Posted - Jan 11 2012 : 4:08:38 PM
Ooh, thanks Chris, that's good to know.

I'm a little unsure how the distinction between the daemon process (which we start with our python script) and the individual jobs (bash scripts) fits in here. It seems this solution would save an individual job, but we're wondering whether we can save the whole daemon process before it gets killed, which I think happens at this line in the python script:
os.system('kill '+str(ID))

I apologize for my confusion here, and thanks again!
petty Posted - Jan 11 2012 : 3:54:14 PM
Assuming you are talking about a job that you submitted with a cutoff time, you can change the request afterwards with qalter. The general form is:

qalter -l resource_list JOBID

If you previously requested other resources, you'll have to include them all. For instance:

qalter -l h_rt=8000,h_vmem=2G 3860963

This changes the runtime (h_rt) to 8000 seconds, but since h_vmem was previously set, I had to include that resource as well. You can't change h_vmem, since it's a consumable resource, but you just need to feed it the same value from when the script was first submitted.
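As a sketch of the full workflow, assuming a standard SGE installation and reusing the job id from the example above: qstat -j prints the job's current "hard resource_list" line, which tells you exactly which resources to repeat in the qalter call.

```shell
# Look up the job's current resource request (SGE prints a
# "hard resource_list:" line, e.g. h_rt=720,h_vmem=2G):
qstat -j 3860963 | grep 'hard resource_list'

# Re-issue the full list, changing only the runtime:
qalter -l h_rt=8000,h_vmem=2G 3860963
```

These commands only make sense against a live SGE cluster; the grep step just saves you from guessing what was originally requested.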

BIAC Forums © 2000-2010 Brain Imaging and Analysis Center