Duke-UNC Brain Imaging and Analysis Center
 Path issues

autevsky
Starting Member

7 Posts

Posted - Jan 11 2012 : 4:37:07 PM
Hi,

I tried submitting a job to the cluster but received the error below, even though I know the directory and file exist (the characters at the bottom also caught my attention).

Any thoughts or potential fixes?
Thanks!

----JOB [gICA_test.sh.3860637] START [Wed Jan 11 15:24:32 EST 2012] on HOST [node4]----
/home/avu4
/mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02/Analysis/TaskData/groupICA_AU
/opt/gridengine/hugin/spool/node4/job_scripts/3860637: line 74: cd: /mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02/Analysis/TaskData/groupICA_AU: No such file or directory
grep: /mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02/Analysis/TaskData/groupICA_AU/fsf_templates/gica_template_n4.fsf: No such file or directory
while executing
"exec sh -c "grep -a 'fmri(inmelodic)' $filename | tail -n 1 | awk '{ print \$3 }'" "
(procedure "feat5:load" line 5)
invoked from within
"feat5:load -1 1 ${fsfroot}.fsf"
(file "/usr/local/packages/fsl-4.1.8/bin/feat" line 132)
mkdir: cannot create directory `/mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02/Analysis': Function not implemented
----JOB [gICA_test.sh.3860637] STOP [Wed Jan 11 15:24:32 EST 2012]----
mv: cannot move `/home/avu4/gICA_test.sh.3860637.out' to `/mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02/Analysis/TaskData/groupICA_AU/Logs/gICA_test.sh.3860637.out': No such file or directory


syam.gadde
BIAC Staff

USA
421 Posts

Posted - Jan 11 2012 : 4:41:53 PM
We saw these errors quite often in the past but haven't seen many recently, and they were not consistent. If you re-run the job, does it succeed? In the past, I think the cause was that memory on the node was oversubscribed and the automounters died. Chris slightly reduced the amount of memory available to cluster jobs on each node, and that seems to have reduced the incidence of this.

autevsky
Starting Member

7 Posts

Posted - Jan 11 2012 : 4:52:04 PM
Thanks. I had run it twice before posting originally, and now just tried another four times, and consistently received the same error.

petty
BIAC Staff

USA
453 Posts

Posted - Jan 11 2012 : 4:56:54 PM
Your submission script had some Windows characters in it on line 74, which is what those crazy characters you mentioned represent.

I ran dos2unix on your submission script, then resubmitted it (as you). It's running now. Try not to edit anything in Windows, or make sure to run dos2unix on your scripts before running them on the Linux machines.
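
For anyone who wants to check a script before submitting it, here is a minimal sketch of how Windows line endings can be detected and removed with standard tools. A script saved on Windows ends each line with a carriage return (CR) before the newline, so a command such as cd receives a path with an invisible CR appended and reports "No such file or directory". The function names has_crlf and strip_cr are hypothetical helpers made up for this example, and strip_cr deletes every CR in the file, which for a shell script should be equivalent to fixing its line endings.

```shell
#!/bin/sh
# Sketch: detect and remove Windows carriage returns (CR) from a script.
# has_crlf and strip_cr are hypothetical helper names, not BIAC tools.

# has_crlf FILE: succeeds if FILE contains at least one CR character.
has_crlf() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# strip_cr FILE: delete every CR in FILE. The cleaned text is written
# back into the original file so its permissions are preserved.
strip_cr() {
    tmp=$(mktemp) || return 1
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# Demo on a throwaway file that mimics a script saved on Windows.
demo=$(mktemp)
printf 'cd /some/dir\r\nls\r\n' > "$demo"
if has_crlf "$demo"; then
    echo "CR characters found; converting"
    strip_cr "$demo"
fi
has_crlf "$demo" || echo "file is clean"
rm -f "$demo"
```

Running has_crlf on a submission script right before submitting it is a cheap way to confirm that the conversion actually took effect on the copy being submitted.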



dvsmith
Advanced Member

USA
218 Posts

Posted - Jan 11 2012 : 5:16:55 PM
Weird... We actually tried dos2unix on it before submitting today and yesterday. That did the trick yesterday, but didn't work today. Is that possibly related to the syncing across nodes? We weren't sure why it would work yesterday and not today.

autevsky
Starting Member

7 Posts

Posted - Jan 11 2012 : 7:02:30 PM
Thanks! As posted above, we had tried running dos2unix before the second try, and still had no luck. But thanks for your help!

syam.gadde
BIAC Staff

USA
421 Posts

Posted - Jan 11 2012 : 9:31:01 PM
I am able to replicate this as you (avu4) on several nodes. The first access fails, but it succeeds right after that on the same node. We saw this many times in the past. It's interesting that John and David are also accessing Imagene.02 (successfully, I presume) on all these nodes. Maybe there is a connection. I will be investigating this further. In the meantime, can you try a little trick and see if it helps for now? Add an ls of your experiment directory at the top of your script:
ls /mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02
sleep 5s
The ls may fail, but any subsequent accesses (sleeping 5 seconds to be safe) should succeed. That's the idea at least.
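
If a single ls and a fixed sleep turn out not to be enough, the same idea can be written as a small retry loop. This is only a sketch of the workaround above, not a fix for the underlying automount problem. The function name wait_for_dir and its default of 5 tries with a 5 second pause are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: poke a directory until it becomes accessible.
# wait_for_dir is a hypothetical helper name; the defaults are arbitrary.

# wait_for_dir DIR [TRIES] [DELAY]
#   Lists DIR up to TRIES times, sleeping DELAY seconds after each failure.
#   Returns 0 as soon as one listing succeeds, 1 if every attempt fails.
wait_for_dir() {
    dir=$1
    tries=${2:-5}
    delay=${3:-5}
    i=1
    while [ "$i" -le "$tries" ]; do
        # The first listing may fail; the access itself is what
        # prompts the directory to be mounted.
        if ls "$dir" > /dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Example use at the top of a job script (left commented out here):
# wait_for_dir /mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02 || exit 1
```

Exiting early when the directory never appears keeps the job from running the rest of the script against a missing path and producing the cascade of errors shown in the first post.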