Oak, longer-term HPC storage for research data

Oak is a scalable shared storage platform available for use by all Stanford faculty members and their research teams to support departmental or sponsored research.

Oak is a Lustre parallel filesystem mounted on all compute nodes on Stanford's Sherlock and SCG HPC clusters.

Oak is ideally suited for large shared datasets, curated post-processed results from job campaigns, and final results used for publication.

Oak is intended for low or moderate-risk data only!

Oak is not suitable for data classified as High Risk based on the Stanford Data Risk Classifications.

Therefore it is NOT a storage option for any data that include PHI or PII, or any data that the faculty member sponsoring the Oak account for their research group identifies as high risk.

Please also note that your use of this system falls under the "6.2.1 Computer and Network Usage Policy." In particular, sharing authentication credentials is strictly prohibited. See the Stanford Administrative Guide for details. Violation of this policy will result in termination of access to Oak.

Oak is NOT intended for I/O-heavy jobs!

Oak performs best with large sequential I/O, and does not perform well with many small (inefficient) read/write requests.

Running I/O-heavy jobs on Oak will result in your jobs being limited in I/O and running slowly. You might also impact other users' jobs running on the cluster. For I/O-intensive jobs, please stage files from Oak to a high-performance filesystem at the beginning of a series of jobs, and save the desired results back to Oak at the end of the job campaign. On Sherlock, $L_SCRATCH and $SCRATCH serve that purpose and, unlike Oak, are built on 100% solid-state storage devices.
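The staging pattern described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not an official Sherlock recipe: the helper names `stage_in` and `stage_out` are hypothetical, and the caller is assumed to pass a scratch location such as the value of `$SCRATCH`.

```python
import shutil
from pathlib import Path


def stage_in(oak_dir, scratch_dir):
    """Copy an input directory from Oak to scratch before the jobs start.

    Hypothetical helper: returns the path of the staged copy.
    """
    src = Path(oak_dir)
    dst = Path(scratch_dir) / src.name
    # dirs_exist_ok (Python 3.8+) lets a re-run refresh an existing copy.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst


def stage_out(results_dir, oak_dir):
    """Copy the desired results back to Oak at the end of the job campaign.

    Hypothetical helper: returns the path of the copy on Oak.
    """
    src = Path(results_dir)
    dst = Path(oak_dir) / src.name
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

A job campaign would call `stage_in` once up front, run all of its I/O against the staged copy on scratch, and call `stage_out` once at the end, so that Oak only sees two large sequential transfers rather than many small requests.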

Directory Quotas

Directory quotas are how we measure the amount of data an Oak space is billed for at the end of each billing cycle. As such, quotas are based on the location of files, rather than the file's user or group owner.

As an example, suppose you copied a 10TB file from /oak/stanford/groups/janestan to /oak/stanford/groups/leland. If quotas were based on the files' group owner instead, the janestan group could end up paying for 10TB that actually resides in the leland directory!

With Directory Quotas, janestan would not be obligated to pay for files that exist outside of their Oak group, and their quota would only include files that reside in their Oak group directory. Conversely, the leland group quota would now account for the extra 10TB in their Oak group directory, even though the group owner of those files might still be set as oak_janestan.
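The difference between the two accounting schemes can be shown with a short Python sketch. It is only an illustration of the arithmetic, not how Oak computes quotas internally: the file names, sizes, and function names are hypothetical, and only the group directories and the `oak_janestan` group owner come from the example above (`oak_leland` is assumed by analogy).

```python
from collections import defaultdict

GROUPS_ROOT = "/oak/stanford/groups/"

# Hypothetical file records: (path, size in TB, group owner).
FILES = [
    ("/oak/stanford/groups/janestan/raw.dat", 5, "oak_janestan"),
    # The 10TB file copied into leland, still group-owned by oak_janestan.
    ("/oak/stanford/groups/leland/copy.dat", 10, "oak_janestan"),
    ("/oak/stanford/groups/leland/own.dat", 2, "oak_leland"),
]


def usage_by_directory(files):
    """Sum sizes per Oak group directory, ignoring file ownership."""
    totals = defaultdict(int)
    for path, size_tb, _owner in files:
        if not path.startswith(GROUPS_ROOT):
            continue  # not inside any group directory
        # The first path component under the groups root names the space.
        space = path[len(GROUPS_ROOT):].split("/", 1)[0]
        totals[space] += size_tb
    return dict(totals)


def usage_by_group_owner(files):
    """Sum sizes per group owner, ignoring location (shown for contrast)."""
    totals = defaultdict(int)
    for _path, size_tb, owner in files:
        totals[owner] += size_tb
    return dict(totals)
```

With these records, `usage_by_directory(FILES)` attributes 5 TB to `janestan` and 12 TB to `leland`, whereas `usage_by_group_owner(FILES)` would attribute 15 TB to `oak_janestan` and only 2 TB to `oak_leland`.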