Duke-UNC Brain Imaging and Analysis Center
 RAM +swap limit running tbss_x

morey
BIAC Faculty

USA
25 Posts

Posted - Nov 10 2010 :  2:35:57 PM

We are running a TBSS crossing fibers program (tbss_x) and running out of memory on the cluster, which is probably understandable given the extraordinarily large RAM + swap requirements according to the FSL designers. See the email thread below. Do you have any thoughts on this?

Raj

-----Original Message-----
From: Rajendra Morey
Sent: Wednesday, November 10, 2010 2:32 PM
To: Courtney Haswell
Subject: RE: Tbss_x

Hi Courtney :

I believe the only thing we tried was running it on the grid engine at Yale. This gave the identical error as the BIAC cluster. This was a few days before Liz was leaving and there were other projects that were more pressing, so nothing more was done.

40 GB of memory in RAM + swap is an astounding amount of memory. I am wondering if we can get more than that. If the cluster nodes have a 64-bit addressable space then this may be possible; otherwise I am not sure where to go with this.

Raj
-----Original Message-----
From: Courtney Haswell
Sent: Wednesday, November 10, 2010 2:04 PM
To: Rajendra Morey
Subject: Tbss_x

Raj,

Liz asked the FSL group about the tbss_x error. There was a reply on June 30 that said "You might want to make sure you are not running out of memory. When ran on a group of 90 subjects the swap_subjectwise process would die with less than about 40GB memory available (RAM+swap), and the subsequent steps would fail."

The process that is failing takes a lot of memory and is writing information to tmp. Do you know if you guys looked into how much memory is allotted to a node on the cluster? I didn't want to re-research it if it was already determined not to be a problem.

Thanks,
Courtney

Rajendra Morey MD

petty
BIAC Staff

USA
453 Posts

Posted - Nov 10 2010 :  3:14:58 PM
Well, your home directory has a limit of 1GB, so don't write anything there.

The temporary space is limited to 32GB per user (that's all nodes combined).

Interactive nodes have 8GB of RAM, and compute nodes have 32GB of RAM. The OS is running as a ramdisk, so you can typically subtract 2GB from available RAM.
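
For reference, the arithmetic implied by these figures can be sketched as below. This is only a rough illustration: the function name `usable_ram_gb` is hypothetical, the 40GB figure is the one quoted from the FSL list for a group of about 90 subjects, and the sketch assumes that swap adds nothing on top of the installed RAM.

```python
# Rough headroom estimate using only the numbers quoted in this thread.
# usable_ram_gb is a hypothetical helper, not part of FSL or the cluster tools.

def usable_ram_gb(installed_gb: float, os_ramdisk_gb: float = 2.0) -> float:
    """RAM left for user jobs once the in-RAM operating system is subtracted."""
    return installed_gb - os_ramdisk_gb

# RAM + swap figure reported on the FSL list for a ~90 subject group.
REQUIRED_GB = 40.0

for node_type, installed in [("interactive", 8.0), ("compute", 32.0)]:
    usable = usable_ram_gb(installed)
    # Assumption: no swap is available, so usable RAM is the whole budget.
    shortfall = max(0.0, REQUIRED_GB - usable)
    print(f"{node_type}: ~{usable:.0f}GB usable, short by ~{shortfall:.0f}GB")
```

Under those assumptions even a compute node with nothing else running on it falls roughly 10GB short of the quoted requirement. A smaller group of subjects may need less, so this is a bound to check against rather than a prediction.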

morey
BIAC Faculty

USA
25 Posts

Posted - Nov 11 2010 :  3:58:46 PM
Chris, if I am understanding you correctly, the operating system is using disk space dynamically as if it were RAM, the maximum allocated for "RAM" per user is 32 GB, and in theory this limit could be increased by some system setting in the OS. If this is so, could you increase it for us very temporarily so that we can see whether our analysis moves past the point where it currently fails? This would help a great deal in resolving the problem, given that it is the last bit of analysis we need for this paper.

Rajendra Morey MD

petty
BIAC Staff

USA
453 Posts

Posted - Nov 11 2010 :  4:02:45 PM
The operating system has no actual disk space; it's all in RAM. The temporary space I mentioned is available on an NFS mount, and each user is limited to 32GB collectively when writing files to temp. This appears like any other space on the disk. The nodes have 32GB of actual RAM installed in them, which can't change.

So if you were the only user on 1 node, you would have approximately 30GB of actual RAM available to you and 32GB of space available on the temp disk.

You aren't hitting any limits on temp, because I would've received an error from the cluster.

Edited by - petty on Nov 11 2010 4:07:57 PM
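
For anyone who wants to confirm the RAM + swap total from inside a job, a minimal sketch follows. It assumes a Linux node that exposes `/proc/meminfo`. The function names `parse_meminfo` and `ram_plus_swap_gb` are hypothetical, and the 40GB threshold is simply the figure quoted from the FSL list, not a documented tbss_x requirement.

```python
import os

def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    The memory fields used below (MemTotal, SwapTotal) are reported in kB.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values

def ram_plus_swap_gb(meminfo: dict) -> float:
    """Total installed RAM plus configured swap, converted from kB to GB."""
    total_kb = meminfo.get("MemTotal", 0) + meminfo.get("SwapTotal", 0)
    return total_kb / (1024 ** 2)

if __name__ == "__main__":
    path = "/proc/meminfo"  # Linux only; skipped quietly on other systems
    if os.path.exists(path):
        with open(path) as fh:
            info = parse_meminfo(fh.read())
        total = ram_plus_swap_gb(info)
        required = 40.0  # figure quoted from the FSL list, ~90 subjects
        print(f"RAM + swap on this node: {total:.1f}GB (quoted need: ~{required:.0f}GB)")
```

Running this as the first step of a submitted job would show what the node actually offers before tbss_x starts. It reports installed totals only, so memory already taken by the ramdisk OS or by other users' jobs still has to be subtracted.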