| Author |
Topic  |
|
ark19
Junior Member
 
27 Posts |
Posted - Oct 18 2011 : 11:04:39 AM
|
Hi BIAC users,
I'm a bit miffed by something I've suddenly run into. We recently began using our independent version of SPM (stored on Munin) to process our data, and had everything working just fine (ran about 250 subjects), until suddenly now we immediately get a variety of errors when we try to run the pipeline. These include:
/opt/gridengine/hugin/spool/node10/job_scripts/2453696: line 149: cd: /mnt/BIAC/munin.dhe.duke.edu/Hariri/DNS.01/Analysis/SPM/Processed/20100215_10346: Not a directory
/opt/gridengine/hugin/spool/node10/job_scripts/2453696: line 151: spm_batch1_1.m: No such file or directory
and
/mnt/BIAC/munin.dhe.duke.edu/Hariri/DNS.01/Scripts/Tools/spm8/matlabbatch/private/cfg_mlbatch_defaults.m: Can't open file.
and
cp: cannot stat `/mnt/BIAC/munin.dhe.duke.edu/Hariri/DNS.01/Data/Func/11111111_11111/run004_04/V0055.hdr': Bad address
cp: cannot stat `/mnt/BIAC/munin.dhe.duke.edu/Hariri/DNS.01/Data/Func/11111111_11111/run004_04/V0115.img': Bad address
The errors even differ if I run the same subject with the same settings at different times, so I'm having trouble making sense of them. My only guess is that this is somehow due to the new mount paths. I now see /mnt/BIAC/.users/ark19/munin.dhe.duke.edu/Hariri in addition to /mnt/BIAC/munin.dhe.duke.edu/Hariri and don't really understand how that distinction works.
Any insight you might have into this issue would be much appreciated before I go back to square one!
Thanks,
Annchen |
|
|
syam.gadde
BIAC Staff
    
USA
421 Posts |
Posted - Oct 18 2011 : 11:06:12 AM
|
| Thanks for reporting this -- we've received other similar reports. We are trying to isolate the issue and working on fixing it. |
 |
|
|
ark19
Junior Member
 
27 Posts |
Posted - Oct 18 2011 : 11:20:38 AM
|
Thanks for your quick reply - it's quite a relief. I'll hold off on analyses for now. JFYI, we first saw these errors (including also 'Directory access failure') late last week, got some more subjects through over the weekend, then they cropped up again yesterday evening. |
 |
|
|
syam.gadde
BIAC Staff
    
USA
421 Posts |
Posted - Oct 18 2011 : 12:28:14 PM
|
I restarted several automounters to get more informative debugging messages and then couldn't replicate the error. I'm thinking this error only happens when the automounters have been running for a while. This would explain why problems didn't occur immediately after restarts of the automounter, which happened Wednesday and Saturday.
I've restarted them all, and will continue to log on selected nodes. If we need to, we'll restart the automounters periodically, and will make sure one of our debugging nodes doesn't get restarted so we can continue trying to fix this problem and hopefully avoid the need for this workaround.
Please continue to report any problems, especially for nodes 1-17 (which have logging enabled).
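Since the failures are intermittent and tied to individual nodes, a job script can record which node it ran on whenever a path is unreachable, which makes reports like the one requested above easier to file. The following is only a minimal sketch: `check_path` is a hypothetical helper name, not a BIAC tool, and the message format is an assumption.

```shell
# Hypothetical helper (not part of the BIAC tools): verify that a directory
# is reachable and, if it is not, log the node name and the failing path.
check_path() {
    dir=$1
    # Listing the directory forces an actual access on an automounted path.
    if [ -d "$dir" ] && ls "$dir" >/dev/null 2>&1; then
        return 0
    fi
    # uname -n prints the node's hostname (POSIX).
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(uname -n): cannot access $dir" >&2
    return 1
}
```

A job script could call `check_path "$EXPERIMENT/Data" || exit 1` before starting work, so the node name ends up in the job's error log.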
|
 |
|
|
ark19
Junior Member
 
27 Posts |
Posted - Oct 18 2011 : 4:43:18 PM
|
Now I'm getting the following error:
proxy location does not exist at /usr/local/bin/biacmount line 25.
/opt/gridengine/hugin/spool/node60/job_scripts/2461755: line 40: EXPERIMENT: Returned NULL Experiment
Shall I continue to hold off?
Thanks as always!
|
 |
|
|
diaz
BIAC Alum
    
USA
212 Posts |
Posted - Oct 18 2011 : 5:24:55 PM
|
We're getting a similar error:
lnexp PicName.01
proxy location does not exist at /user/local/bin/findexp line25.
/usr/local/bin/lnexp: line23: EXPERIMENT: Returned NULL Experiment
|
Michele T. Diaz, Ph.D. Associate Director Brain Imaging and Analysis Center |
 |
|
|
petty
BIAC Staff
    
USA
453 Posts |
Posted - Oct 18 2011 : 5:59:35 PM
|
something has corrupted the proxy filesystem's mount point.
You can still access your experiment with mntshare //server/share for now. |
 |
|
|
petty
BIAC Staff
    
USA
453 Posts |
Posted - Oct 18 2011 : 6:21:47 PM
|
things appear to be normal again ... please let us know if this continues as we're still trying to figure out what is going on.
|
 |
|
|
ark19
Junior Member
 
27 Posts |
Posted - Oct 18 2011 : 7:58:44 PM
|
| Awesome - it appears to be working for me so far. Many thanks! |
 |
|
|
dvsmith
Advanced Member
    
USA
218 Posts |
Posted - Oct 19 2011 : 9:05:41 PM
|
It's happening again...
proxy location does not exist at /usr/local/bin/biacmount line 25.
/opt/gridengine/hugin/spool/node19/job_scripts/2498375: line 48: EXPERIMENT: Returned NULL Experiment
|
 |
|
|
petty
BIAC Staff
    
USA
453 Posts |
Posted - Oct 19 2011 : 9:53:51 PM
|
| restarting it on all nodes. |
 |
|
|
dvsmith
Advanced Member
    
USA
218 Posts |
Posted - Oct 19 2011 : 10:28:59 PM
|
| thanks! it's working again for now... |
 |
|
|
clithero
Junior Member
 
37 Posts |
Posted - Oct 28 2011 : 2:00:26 PM
|
Hi guys,
I am getting a bunch of the same errors:
proxy location does not exist at /usr/local/bin/biacmount line 25.
/opt/gridengine/hugin/spool/node12/job_scripts/2498375: line 48: EXPERIMENT: Returned NULL Experiment |
 |
|
|
rkozink
Junior Member
 
31 Posts |
Posted - Oct 28 2011 : 2:06:21 PM
|
| We recently were receiving that error message too. We changed the EXPERIMENT=`biacmount $EXPERIMENT` line to EXPERIMENT=`findexp $EXPERIMENT` and it's now working fine. |
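Whichever command name a script uses, it can check the result before continuing, so that a transient lookup failure does not cascade into later `cd` and `cp` errors. The sketch below is an illustration under stated assumptions, not the BIAC scripts' actual logic: `resolve_experiment` and the `RESOLVER` variable are hypothetical names, and the retry count and delay are arbitrary defaults. Only `findexp` itself comes from the thread.

```shell
# Hypothetical wrapper: resolve an experiment name to a directory, retrying
# a few times before giving up. RESOLVER defaults to findexp; the retry
# count (3) and delay (5 seconds) are assumed defaults, not BIAC settings.
resolve_experiment() {
    name=$1
    tries=${2:-3}
    delay=${3:-5}
    resolver=${RESOLVER:-findexp}
    attempt=1
    while [ "$attempt" -le "$tries" ]; do
        # Capture the resolver's output, discarding its error messages.
        path=$("$resolver" "$name" 2>/dev/null)
        # Accept the result only if it names an existing directory.
        if [ -n "$path" ] && [ -d "$path" ]; then
            printf '%s\n' "$path"
            return 0
        fi
        # Wait before the next attempt, but not after the last one.
        if [ "$attempt" -lt "$tries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    echo "could not resolve experiment '$name' after $tries attempts" >&2
    return 1
}
```

In a job script this might be used as `EXPERIMENT=$(resolve_experiment "$EXPERIMENT") || exit 1`. A retry only helps with short-lived failures; it does not address a mount point that stays broken until staff restart it.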
 |
|
|
petty
BIAC Staff
    
USA
453 Posts |
Posted - Oct 28 2011 : 2:08:12 PM
|
| biacmount and findexp are actually the exact same function (biacmount is a symbolic link to findexp) ... Syam just restarted the mounter on this particular node. |
Edited by - petty on Oct 28 2011 2:09:11 PM |
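One way to confirm that two command names point at the same file is to resolve each through its symbolic links and compare the results. This is a generic sketch: `resolve_cmd` is a hypothetical helper, and it assumes `readlink -f` is available (it is part of GNU coreutils on Linux).

```shell
# Hypothetical helper: print the real file behind a command name,
# following any symbolic links along the way.
resolve_cmd() {
    # command -v prints the path the shell would execute for this name.
    target=$(command -v "$1") || return 1
    # readlink -f canonicalizes the path, resolving every symlink.
    readlink -f "$target"
}
```

If `resolve_cmd biacmount` and `resolve_cmd findexp` print the same path, both names run the same program, so switching between them in a script would not be expected to change its behavior.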
 |
|
|
syam.gadde
BIAC Staff
    
USA
421 Posts |
Posted - Oct 28 2011 : 2:14:38 PM
|
I looked through all the nodes and node12 seemed to be the only one affected by this problem. The automounter just disappeared. I restarted it.
Two other nodes are just unreachable for some unrelated reason.
Thanks for alerting us to this. |
 |
|