Duke-UNC Brain Imaging and Analysis Center
 New cluster path errors


T O P I C    R E V I E W
ark19 Posted - Oct 18 2011 : 11:04:39 AM
Hi BIAC users,

I'm a bit puzzled by something I've suddenly run into. We recently began using our independent version of SPM (stored on Munin) to process our data, and had everything working just fine (we ran about 250 subjects), but now we immediately get a variety of errors when we try to run the pipeline. These include:

/opt/gridengine/hugin/spool/node10/job_scripts/2453696: line 149: cd: /mnt/BIAC/munin.dhe.duke.edu/Hariri/DNS.01/Analysis/SPM/Processed/20100215_10346: Not a directory
/opt/gridengine/hugin/spool/node10/job_scripts/2453696: line 151: spm_batch1_1.m: No such file or directory


and

/mnt/BIAC/munin.dhe.duke.edu/Hariri/DNS.01/Scripts/Tools/spm8/matlabbatch/private/cfg_mlbatch_defaults.m:
Can't open file.


and

cp: cannot stat
`/mnt/BIAC/munin.dhe.duke.edu/Hariri/DNS.01/Data/Func/11111111_11111/run004_04/V0055.hdr':
Bad address
cp: cannot stat
`/mnt/BIAC/munin.dhe.duke.edu/Hariri/DNS.01/Data/Func/11111111_11111/run004_04/V0115.img':
Bad address


The errors even differ when I run the same subject with the same settings at different times, so I'm having trouble making sense of them. My only guess is that this is somehow due to the new mount paths: I now see /mnt/BIAC/.users/ark19/munin.dhe.duke.edu/Hariri in addition to /mnt/BIAC/munin.dhe.duke.edu/Hariri, and I don't really understand how that distinction works.
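For reference, a quick way to sanity-check both mount paths from an interactive node before submitting anything (a minimal sketch; the probe helper is invented here, not a BIAC tool):

```shell
# report whether each candidate path is currently a usable directory
probe() {
  if [ -d "$1" ] && ( cd "$1" ) 2>/dev/null; then
    echo "ok: $1"
  else
    echo "bad: $1"
  fi
}

probe /mnt/BIAC/munin.dhe.duke.edu/Hariri
probe /mnt/BIAC/.users/ark19/munin.dhe.duke.edu/Hariri
```

If a path flips between ok and bad across runs, that points at the automounter rather than at the pipeline itself.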

Any insight you might have into this issue is very appreciated before I go back to square one!

Thanks,

Annchen
15   L A T E S T    R E P L I E S    (Newest First)
syam.gadde Posted - Nov 16 2011 : 12:19:10 PM
At least on the cluster, you won't be able to remove the directory if you are on the same node as another process that has the directory open. But if you are on a different node, it happily allows you to remove the directory (and the process continues to run until it is done with the directory).
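On a given node, one way to see which local processes are holding a directory as their working directory is to scan /proc (a Linux-only sketch; the who_holds name is invented here):

```shell
# print the PIDs of local processes whose current working directory is $1
who_holds() {
  for p in /proc/[0-9]*; do
    [ "$(readlink "$p/cwd" 2>/dev/null)" = "$1" ] && echo "${p#/proc/}"
  done
}
```

This only sees processes on the same node, so a holder running elsewhere on the cluster would be invisible to it.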
francis.favorini Posted - Nov 15 2011 : 5:07:22 PM
If any process has a directory as its working directory (e.g. command prompt/shell in that dir), you will not be able to delete it.
ark19 Posted - Nov 15 2011 : 1:30:33 PM
Yes, I did find that puzzling and still don't know exactly what was happening, just that after a while the folder was gone and I could proceed. Thanks!
syam.gadde Posted - Nov 15 2011 : 11:44:24 AM
What's really interesting is that you were able to see the directory (and others) with 'ls' but you were not able to remove it. The only thing I can think of is that some other process was somehow holding on to that directory, but I tried to replicate that situation and I didn't get any errors (and it let me remove the directory).
ark19 Posted - Nov 14 2011 : 11:21:10 PM
Yes, the exact same script with the exact same subject worked once and then failed an hour later, no matter how hard I fought with it. I tried it again another 2 hours later (changing nothing) and it worked! Sorry to bother you with this one - I had to post because I suspected the old issue. Thanks as always!
petty Posted - Nov 14 2011 : 9:07:39 PM
the script ran for this same subject previously?

Also, that folder no longer exists.
ark19 Posted - Nov 14 2011 : 6:53:52 PM
Ok, thanks. Any ideas about what could be happening here? This exact script ran perfectly 2 hours ago - now I get the above error messages, and further, an empty folder for this subject (20110915_13694) is created in /DNS.01/Analysis/SPM/Processed
and somehow I cannot delete it:

[ark19@node53 Processed]$ ls -l
...
drwx------ 7 ark19 root 2048 Nov 12 18:23 20110914_13690
drwx------ 0 ark19 root 0 Nov 14 16:55 20110915_13694
drwx------ 7 ark19 root 2048 Nov 14 14:25 20110916_13702
...
[ark19@node53 Processed]$ rm 20110915_13694
rm: cannot remove `20110915_13694': No such file or directory
[ark19@node53 Processed]$ rm -rf 20110915_13694
[ark19@node53 Processed]$ ls -l
...
drwx------ 7 ark19 root 2048 Nov 12 18:23 20110914_13690
drwx------ 0 ark19 root 0 Nov 14 16:55 20110915_13694
drwx------ 7 ark19 root 2048 Nov 14 14:25 20110916_13702
...
petty Posted - Nov 14 2011 : 6:02:38 PM
Nope, that node is behaving normally, and I was able to access your "Processed" folder as you on node3.
ark19 Posted - Nov 14 2011 : 5:08:20 PM
I'm seeing this error again:

mkdir: cannot create directory `/mnt/BIAC/munin.dhe.duke.edu/Hariri/DNS.01/Analysis/SPM/Processed/20110915_13694': No such file or directory
/opt/gridengine/hugin/spool/node3/job_scripts/2807590: line 130: /mnt/BIAC/munin.dhe.duke.edu/Hariri/DNS.01/Analysis/SPM/Processed/20110915_13694/spm_batch1_1.m: No such file or directory

Is it the same issue as before?

Thanks!
petty Posted - Oct 29 2011 : 9:06:53 PM
Everyone's jobs - the nodes are completely booked and all of the resources have been reserved.

They aren't oversubscribed; if you look, there are available slots on most nodes, which means that all of the memory has been allocated.
dvsmith Posted - Oct 29 2011 : 7:44:35 PM
Just 10,000 (one for each permutation I need to do).

Is the load because of everyone's jobs or just mine? My jobs only use 3 GB of RAM and they're generally pretty quick (e.g., 30-45 minutes), except when they break like this. Are they failing because of the specific node they land on (e.g., one with one of Trong-Kha's jobs)?
petty Posted - Oct 29 2011 : 7:31:31 PM
Is this one of those things where you have like a million files in the same directory?

The load on all the nodes is so high - maybe it's getting bogged down with file access.
dvsmith Posted - Oct 29 2011 : 6:50:49 PM
All of the attr files are in the same directory -- but it's unlikely that any two jobs are accessing the same file at the same time. In this analysis, there is one permuted_attr_%s.txt file for each job.

I was only talking about my jobs in that list.
petty Posted - Oct 29 2011 : 6:41:29 PM
Is each one of your jobs accessing the same file/directory/etc. at the same time?

Also, in that long list of jobs, how can you know other people's jobs are doing anything?
dvsmith Posted - Oct 29 2011 : 6:33:14 PM
Yeah (sorry for not being clear about that). I have a print command before the script really does anything intensive and it's never getting that far in these stuck jobs.


###packages###
import os, sys, glob, time
import numpy as N
#import pylab as P #needs display
from mvpa.suite import *
#from matplotlib.font_manager import FontProperties #needs display

###parameters and pathways###
num_sub = 140
exp_dir = 'MAINDIR' #to be replaced by sed
data_dir = os.path.join(exp_dir,'Data') #might need to change this, too


###naming conventions###
test_name = "neglect" #to be replaced by sed

dummyROI = "000_000_000"
###naming conventions###
perm = "SED_PERM_SED" #to be replaced by sed

#S/Attributes/permuted_labels/permuted_attr_00001.txt
###attributes###
#participants with and without Neglect#
attr_dir = os.path.join(exp_dir, 'Attributes','permuted_labels')
attr_file = "%s/permuted_attr_%s.txt" %(attr_dir, perm)
attr = SampleAttributes(attr_file)



mask_type = "old" #to be replaced by sed

#data_types = [ 'raw', 'normed' ]
data_type = "normed"
	
##Load MRI data##
if data_type == 'normed':
	wb_file = "%s/size_LesionData_CN.nii.gz" %(data_dir)
else:
	wb_file = "%s/LesionData_CN.nii.gz" %(data_dir)

msg = "used %s data" %(data_type)
print msg



I am pretty sure it's not systematically failing when it reads in my permuted design
SampleAttributes(attr_file)
because the same exact permutation works for other jobs.
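The SED_PERM_SED placeholder in the script above suggests the per-job scripts are generated with a sed loop; a minimal sketch of that pattern (the template and output names here are invented):

```shell
# stand-in template containing the placeholder; the real one is the
# analysis script above
printf 'perm = "SED_PERM_SED"\n' > template.py

# fill the placeholder once per permutation (3 here; 'seq -w 1 10000'
# would give zero-padded IDs like 00001 for the real run)
for i in $(seq -w 1 3); do
  sed "s/SED_PERM_SED/$i/" template.py > "job_$i.py"
done
```

Each generated job then reads only its own permuted_attr_N.txt, which is consistent with no two jobs touching the same file.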

BIAC Forums © 2000-2010 Brain Imaging and Analysis Center