Duke-UNC Brain Imaging and Analysis Center
Cluster Support: dropped jobs

clithero
Junior Member

37 Posts

Posted - May 04 2010 : 1:10:37 PM
Hey all,
I'm running a bunch of featquery jobs and I'm seeing a lot of jobs dropped at random. They have the following traits:
- no errors in the log files
- the job dies before its log file can be moved out of my home directory
- they run fine when I resubmit (unless they drop again)
I can tell my script to rerun them, but I thought I would mention this.
Is anyone else having this issue today?
Thanks,
John
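A minimal sketch of the "tell my script to rerun them" idea, assuming each job has one output file whose absence means the job was dropped. The job-to-output mapping and the `submit` callable are hypothetical placeholders for your own featquery script and whatever cluster submission command you use; `submit` is assumed to return only after the job has finished or been dropped.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical job description: job name -> file the job should produce.
Jobs = Dict[str, Path]


def find_dropped(jobs: Jobs, exists: Callable[[Path], bool] = Path.exists) -> List[str]:
    """Return the names of jobs whose expected output file is missing."""
    return [name for name, output in jobs.items() if not exists(output)]


def resubmit_dropped(
    jobs: Jobs,
    submit: Callable[[str], None],
    exists: Callable[[Path], bool] = Path.exists,
    max_rounds: int = 3,
) -> List[str]:
    """Resubmit dropped jobs for up to max_rounds rounds.

    `submit` is a placeholder for however you send one job to the cluster;
    it is assumed to block until that job has finished (or been dropped).
    Returns the names of jobs still missing output after the last round.
    """
    missing = find_dropped(jobs, exists)
    for _ in range(max_rounds):
        if not missing:
            break
        for name in missing:
            submit(name)
        # Re-check outputs: anything still missing was dropped again.
        missing = find_dropped(jobs, exists)
    return missing
```

Capping the number of rounds keeps a job that fails for a real reason (rather than a flaky node or network) from being resubmitted forever; whatever comes back from `resubmit_dropped` is worth looking at by hand.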

francis.favorini
Forum Admin

USA
618 Posts

Posted - May 04 2010 : 3:26:58 PM
There seems to have been some network flakiness starting last evening and ending this morning. That might be the cause of your issue.

IT Director, Brain Imaging and Analysis Center

Elizabeth.Selgrade
Starting Member

USA
1 Posts

Posted - May 12 2010 : 12:13:12 PM
Hi everyone,

I'm having the same issue that John had. Some jobs run fine, while some nearly identical jobs get dropped. Any idea what's up?

Thanks,
Liz

ESS

clithero
Junior Member

37 Posts

Posted - May 28 2010 : 2:05:37 PM
Hello all,

I have been submitting FSL jobs, and jobs at all levels (first, second, and third) are randomly failing to complete: no output files are generated on Munin, not even the fsf template. Most jobs go through as normal, but sometimes it takes 2-3 tries to get a job to run. Since the jobs don't finish, the log files stay in my home directory on Einstein. They all contain just the following (I was warned they might be binary when I opened them):

ESC[HESC[J

Any thoughts?

Thanks,
John
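For reference, `ESC[H` and `ESC[J` are ANSI/VT100 escape sequences ("ESC" is how an editor shows byte 0x1B): the first moves the cursor to the top-left corner and the second erases from the cursor to the end of the screen. Together they clear the screen, which is roughly what a `clear` command writes on some terminal types. So these logs most likely hold no real job output at all. A minimal sketch for flagging such logs, assuming you point it at your own list of log files:

```python
import re
from pathlib import Path
from typing import Iterable, List

# ANSI "CSI" escape sequences: ESC '[' then parameter bytes (0x30-0x3F),
# intermediate bytes (0x20-0x2F), and one final byte (0x40-0x7E).
# ESC[H (cursor home) and ESC[J (erase to end of screen) both match.
CSI = re.compile(rb"\x1b\[[0-?]*[ -/]*[@-~]")


def is_escape_only(data: bytes) -> bool:
    """True if data holds nothing but ANSI escape codes and whitespace.

    An empty file also counts, since it contains no real output either.
    """
    return CSI.sub(b"", data).strip() == b""


def escape_only_logs(paths: Iterable[Path]) -> List[Path]:
    """Return the log files whose entire content is terminal escape codes."""
    return [p for p in paths if is_escape_only(p.read_bytes())]
```

Reading the files as bytes avoids the "might be binary" problem: the escape byte is matched directly instead of being decoded as text.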

petty
BIAC Staff

USA
453 Posts

Posted - May 28 2010 : 2:20:11 PM
The same thing was happening to Lars this morning. There were some Windows characters in his script (not sure how they got there), and his output files looked the same as yours.

I ran dos2unix on his script and it re-ran without any issues.
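The thread doesn't say exactly which characters were in Lars's script, but the most common thing dos2unix fixes is Windows-style CRLF line endings. A Unix shell treats the trailing carriage return as part of each line, so commands, arguments, and even the `#!` interpreter path pick up a stray `\r`. A minimal sketch of just that CRLF-to-LF step (the real dos2unix tool has more options and handles more cases than this):

```python
from pathlib import Path


def has_crlf(data: bytes) -> bool:
    """True if the data contains any DOS/Windows CRLF line endings."""
    return b"\r\n" in data


def to_unix(data: bytes) -> bytes:
    """Convert CRLF line endings to LF; lines already ending in LF are unchanged."""
    return data.replace(b"\r\n", b"\n")


def dos2unix_file(path: Path) -> bool:
    """Rewrite path in place with LF endings. Returns True if anything changed."""
    data = path.read_bytes()
    if not has_crlf(data):
        return False
    path.write_bytes(to_unix(data))
    return True
```

Checking with `has_crlf` before submitting is a cheap way to catch a script that was edited or copied on a Windows machine.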

djp16
Starting Member

9 Posts

Posted - Jun 02 2010 : 1:25:59 PM
I had a couple myself. I noticed they occurred on nodes 19 and 24, if it matters. The same scripts ran successfully when I resubmitted them.
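Noticing which nodes the dropped jobs landed on is what pointed to the frozen mount managers. A minimal sketch of doing that tally systematically, assuming you have already collected hypothetical (node, finished) records from your own logs or the scheduler; the node names below are illustrative only. It won't diagnose the node itself, it only suggests where to look:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (node name, whether the job finished).
Record = Tuple[str, bool]


def failure_counts(records: Iterable[Record]) -> Dict[str, Tuple[int, int]]:
    """Map each node to its (failed, total) job counts."""
    failed: Counter = Counter()
    total: Counter = Counter()
    for node, ok in records:
        total[node] += 1
        if not ok:
            failed[node] += 1
    return {node: (failed[node], total[node]) for node in total}


def suspect_nodes(records: Iterable[Record], min_rate: float = 0.5) -> List[str]:
    """Nodes where at least min_rate of their jobs failed, worst first."""
    counts = failure_counts(records)
    flagged = [(f / t, node) for node, (f, t) in counts.items() if f / t >= min_rate]
    return [node for _, node in sorted(flagged, reverse=True)]
```

A rate threshold rather than a raw count keeps a busy but healthy node from being flagged just because it ran more jobs.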

petty
BIAC Staff

USA
453 Posts

Posted - Jun 02 2010 : 1:55:34 PM
The mount manager was frozen on both of those nodes. Thanks.

djp16
Starting Member

9 Posts

Posted - Jun 04 2010 : 5:26:34 PM
Node 4 is having problems (June 4).
BIAC Forums © 2000-2010 Brain Imaging and Analysis Center