| Author |
Topic  |
|
|
autevsky
Starting Member
7 Posts |
Posted - Jan 11 2012 : 4:37:07 PM
|
Hi,
I've tried submitting a job to the cluster but received this error below, even though I know the directory and file do exist (and the characters at the bottom also caught my attention).
Any thoughts or potential fixes? Thanks!
----JOB [gICA_test.sh.3860637] START [Wed Jan 11 15:24:32 EST 2012] on HOST [node4]----
/home/avu4
/mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02/Analysis/TaskData/groupICA_AU
/opt/gridengine/hugin/spool/node4/job_scripts/3860637: line 74: cd: /mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02/Analysis/TaskData/groupICA_AU: No such file or directory
grep: /mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02/Analysis/TaskData/groupICA_AU/fsf_templates/gica_template_n4.fsf: No such file or directory
while executing
"exec sh -c "grep -a 'fmri(inmelodic)' $filename | tail -n 1 | awk '{ print \$3 }'" "
(procedure "feat5:load" line 5)
invoked from within
"feat5:load -1 1 ${fsfroot}.fsf"
(file "/usr/local/packages/fsl-4.1.8/bin/feat" line 132)
mkdir: cannot create directory `/mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02/Analysis': Function not implemented
----JOB [gICA_test.sh.3860637] STOP [Wed Jan 11 15:24:32 EST 2012]----
mv: cannot move `/home/avu4/gICA_test.sh.3860637.out' to `/mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02/Analysis/TaskData/groupICA_AU/Logs/gICA_test.sh.3860637.out': No such file or directory
[H[J |
|
|
syam.gadde
BIAC Staff
    
USA
421 Posts |
Posted - Jan 11 2012 : 4:41:53 PM
|
| We saw these errors quite often in the past but haven't seen many recently, and they were not consistent. If you re-run the job does it succeed? In the past I think it was because the memory on the node was oversubscribed and the automounters died. Chris slightly reduced the amount of memory available to cluster jobs on each node and that seemed to have reduced the incidence of this. |
 |
|
|
autevsky
Starting Member
7 Posts |
Posted - Jan 11 2012 : 4:52:04 PM
|
| Thanks. I had run it twice before posting originally, and now just tried another four times, and consistently received the same error. |
 |
|
|
petty
BIAC Staff
    
USA
453 Posts |
Posted - Jan 11 2012 : 4:56:54 PM
|
Your submission script had some Windows characters in it on line 74, which is what those crazy characters you mentioned represent.
I ran dos2unix on your submission script, then resubmitted it (as you). It's running now. Try not to edit anything in Windows, or make sure to run dos2unix on the files before running them on the Linux machines.
|
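For anyone who wants to check a script before submitting it, here is a minimal sketch of how Windows line endings (carriage returns) could be detected and stripped. The function names `has_crlf` and `strip_crlf` are hypothetical helpers made up for this illustration; they are not part of dos2unix or of the cluster tools, and dos2unix remains the simpler choice where it is installed.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not from any existing tool).

# Succeeds (exit status 0) if the file contains at least one carriage return.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Prints the file to stdout with every carriage return removed.
# Note: this removes ALL CR characters, not only those at line ends.
strip_crlf() {
    tr -d '\r' < "$1"
}

# Example use (commented out; the script name is a placeholder):
# if has_crlf my_job.sh; then
#     strip_crlf my_job.sh > my_job.unix.sh
# fi
```

Writing the cleaned copy to a new file, rather than over the original, leaves the original untouched in case the conversion needs to be checked.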
 |
|
|
dvsmith
Advanced Member
    
USA
218 Posts |
Posted - Jan 11 2012 : 5:16:55 PM
|
| Weird... We actually tried dos2unix on it before submitting today and yesterday. That did the trick yesterday, but didn't work today. Is that possibly related to the syncing across nodes? We weren't sure why it would work yesterday and not today. |
 |
|
|
autevsky
Starting Member
7 Posts |
Posted - Jan 11 2012 : 7:02:30 PM
|
Thanks! As posted above, we had tried running dos2unix before the second try, and still had no luck. But thanks for your help!
|
 |
|
|
syam.gadde
BIAC Staff
    
USA
421 Posts |
Posted - Jan 11 2012 : 9:31:01 PM
|
I am able to replicate this as you (avu4) on several nodes. The first access fails, but it succeeds right after that on the same node. We saw this many times in the past. It's interesting that John and David are also accessing Imagene.02 (successfully, I presume) on all these nodes too. Maybe there is a connection. I will be investigating this further. In the meantime, can you try a little trick and see if it helps for now? Add an ls command on your experiment directory at the top of your script, followed by a short sleep:
ls /mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02
sleep 5s
The ls may fail, but any subsequent accesses (sleeping 5 seconds to be safe) should succeed. That's the idea at least. |
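If a single ls plus sleep turns out not to be enough, the same idea could be extended into a small retry loop. This is only a sketch under assumptions: the function name `wait_for_dir`, the default of 5 attempts, and the 5-second delay are all made up for illustration, and it has not been tested against the automounter behavior described above.

```shell
#!/bin/sh
# Hypothetical helper: poke a directory until it becomes accessible.
# Usage: wait_for_dir DIR [TRIES] [DELAY_SECONDS]
wait_for_dir() {
    dir=$1
    tries=${2:-5}    # assumed default: 5 attempts
    delay=${3:-5}    # assumed default: 5 seconds between attempts
    i=1
    while [ "$i" -le "$tries" ]; do
        # The first access may fail while the mount comes up; ignore output.
        if ls "$dir" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1    # directory never became accessible
}

# Example use at the top of a job script (commented out here):
# wait_for_dir /mnt/BIAC/munin.dhe.duke.edu/Huettel/Imagene.02 || exit 1
```

Exiting early when the directory never appears keeps the rest of the job from running against a missing path and producing the cascade of errors shown in the original log.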
 |
|