Duke-UNC Brain Imaging and Analysis Center
 RAM per node on the Cluster

dvsmith
Advanced Member


Posted - Mar 07 2008 :  12:40:14 PM
Hi,

How much RAM does each node of the cluster have? Is that RAM shared across all the processors on the node? For example, if a node had 2 GB of RAM, would each of 4 jobs running on it get about 500 MB?

I'm trying to figure out why I've been getting some memory errors running MELODIC on the cluster, so any information along these lines would be great.

Thanks,
David

syam.gadde
BIAC Staff


Posted - Mar 07 2008 :  12:49:26 PM
I believe they each have 8GB of memory, shared among the 4 cores.

francis.favorini
Forum Admin


Posted - Mar 07 2008 :  1:54:39 PM
That is correct. Also, unless you are running 64-bit code, you won't be able to address all 8 GB of memory. A 32-bit process can address at most 4 GB, and in practice usually less, because of how the OS lays out the address space.
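Since part of the question is whether MELODIC is bumping into the 32-bit limit, here is a quick way to check word size, a sketch assuming the nodes run Linux (/bin/ls is a stand-in path; point `file` at the MELODIC binary you actually run):

```shell
# Word size of the current environment: prints 32 or 64
getconf LONG_BIT

# Word size of a specific binary: `file` reports "ELF 32-bit" or "ELF 64-bit".
# /bin/ls is illustrative; substitute the path to your melodic binary.
file /bin/ls
```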

-Francis

IT Director, Brain Imaging and Analysis Center

dvsmith
Advanced Member


Posted - Mar 07 2008 :  3:05:23 PM
So, am I right in saying that 4 jobs running on 4 separate processors/cores would essentially be using 1 GB of RAM each? Sorry for such a basic question, but if I'm running jobs that each want around 2 GB of RAM, it seems like having 4 of them running in parallel on one node could be problematic. Is this completely off base?

Thanks,
David

syam.gadde
BIAC Staff


Posted - Mar 07 2008 :  4:12:30 PM
More likely each process would comfortably get 1.5 GB or more.

Looking through the docs, you might be able to request more memory per process (thereby restricting how many such processes can run on a node at once). I'm not sure exactly how you do this, but there is a reference to something like:

qsub -l mem=4096

to request 4GB of memory.
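For reference, the exact resource name is scheduler- and site-dependent; the flag spellings below and `job.sh` are assumptions, so check with the admins which one this cluster honors:

```shell
qsub -l mem_free=4G job.sh    # SGE: schedule only on a node with >= 4 GB free
qsub -l h_vmem=4G  job.sh     # SGE: hard 4 GB virtual-memory cap per job
qsub -l mem=4096mb job.sh     # PBS/Torque spelling of the same request
```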

dvsmith
Advanced Member


Posted - Mar 07 2008 :  4:48:18 PM
Thanks, Syam.

I'll try this out; it should help with some of the issues, I hope. I'll investigate the MELODIC issues a little further -- I need to make sure it wasn't just always crashing on node4. Christian Beckmann (the MELODIC developer at FMRIB) assured me it was something wrong with our system or a memory problem, but Francis thought it was a MELODIC issue (http://www.biac.duke.edu/forums/topic.asp?TOPIC_ID=1109). I think Francis is right, but I want to definitively rule out the memory explanation before writing FSL again.

Let me throw one more situation out there regarding node4: what if several people had interactive jobs there while regular qsub jobs were still being sent to node4? It seems as though memory issues could arise, since the people with interactive jobs would be consuming a portion of the available RAM. I've noticed lots of jobs crashing on node4, so could this be an explanation? If not, what's an alternative hypothesis for the occasional job failures on node4?

Thanks,
David


mcarter
New Member


Posted - Mar 07 2008 :  6:50:26 PM

This seems like it might be even uglier when we are running some of the more memory-intensive DTI apps. Is there any way we can take a look at memory usage, see if we are maxing out, and compare those times to failed jobs? The failure rate is getting high enough that some jobs are being run 5 or so times (without repeating what was already completed), so it would free up some computational time if we could get this worked out.
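One rough way to take that look by hand, a sketch assuming the nodes run Linux, is to read /proc/meminfo on a node (e.g. over ssh) while jobs are running and watch whether MemFree bottoms out around the failure times:

```shell
# Print total and currently free memory on this node, in GB.
# /proc/meminfo reports values in kB; 1048576 kB = 1 GB.
awk '/^MemTotal|^MemFree/ {printf "%s %.1f GB\n", $1, $2/1048576}' /proc/meminfo
```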

petty
BIAC Staff


Posted - Mar 10 2008 :  12:04:42 PM
Do these failed jobs happen to be writing to Goldman?

Of everything I've run on the cluster (DTI, FreeSurfer, FEAT), the only things that have failed were due to a flaky connection to Goldman and a random node6/FreeSurfer problem.

The DTI stuff is pretty memory intensive and I've had no issues.

dvsmith
Advanced Member


Posted - Mar 10 2008 :  6:49:28 PM
They are writing to Goldman; however, I am pretty confident that it's not a connection issue. I used to see that all the time and I would essentially have to do everything twice (I hardly ever see it anymore now). The errors are completely different.

I think the potential problem McKell was pointing out concerned situations where you might have 3 DTI jobs on a node, leaving one additional processor with presumably a lot less memory available. In that situation, would another job be shorted on memory if it were sent to the 4th processor on that node? I'm guessing not, if you've been able to run 4 DTI jobs on a node without any trouble, but this is something we were thinking about.
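As a back-of-envelope check of that scenario, using the 8 GB/node figure from earlier in the thread and assuming roughly 2 GB per DTI job:

```shell
# Three ~2 GB DTI jobs on an 8 GB node leave about 2 GB for a fourth job.
NODE_MB=8192
PER_JOB_MB=2048
LEFT_MB=$((NODE_MB - 3 * PER_JOB_MB))
echo "${LEFT_MB} MB left for a 4th job"   # prints: 2048 MB left for a 4th job
```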

Thanks,
David
BIAC Forums © 2000-2010 Brain Imaging and Analysis Center