Glossary
Oak Filespace Types#
| Filespace Type | What is it for? | Who's Eligible? | Additional Considerations |
|---|---|---|---|
| groups | Filespace dedicated to a specific PI and their collaborators. | Faculty doing sponsored research. | ⭐ Most popular space ✅ Accessible from Sherlock ✅ Accessible from SCG ✅ Eligible for our optional Backup service |
| projects | Filespace dedicated to a group of collaborators who span multiple PI groups. | Anyone with a PTA doing sponsored research at Stanford. | ✅ Accessible from Sherlock ✅ Accessible from SCG ✅ Eligible for our optional Backup service |
| orgs | Filespace shared among a Stanford Organization or Center, usually spanning multiple PIs or subgroups. | Entities with a corresponding entry in the Stanford Org Code Hierarchy. | ✅ Accessible from Sherlock ✅ Accessible from SCG ✅ Eligible for our optional Backup service |
| schools | Filespace shared among a Stanford School, usually spanning multiple PIs or subgroups. | Please email SRCC-support@stanford.edu for additional information. | ✅ Accessible from Sherlock ✅ Accessible from SCG ✅ Eligible for our optional Backup service |
| datasets | Filespace dedicated to hosting shared datasets. | Please email SRCC-support@stanford.edu for additional information. | ✅ Accessible from Sherlock ✅ Accessible from SCG ❌ Not eligible for our optional Backup service ✅ Accessible without an Oak account (SUNet ID required) |
| scg | Filespace used by the Genetics Bioinformatics Service Center's SCG cluster. | The SCG Informatics Cluster resources are available for use by labs engaged in genetics and bioinformatics research. | May also be referred to as `lab_$PINAME`. ✅ Accessible from SCG ❌ Not accessible from Sherlock ❌ Not eligible for our optional Backup service ℹ️ More information at https://login.scg.stanford.edu/ |
| filesystems (ZFS) | General-purpose NAS replacement, LFS share. | Anyone with a PTA doing sponsored research at Stanford. | ❌ Not accessible from Sherlock or SCG ❌ Not eligible for our optional Backup service ⚠️ SMB or NFS Gateway required (additional fees apply) |
Other common terms#
Gateway#
- Gateways are services that provide access to data on Oak through various applications or protocols. Oak is made up of many pieces of storage equipment. These pieces of equipment communicate using a high-performance fiber network technology called Infiniband.
- Gateways are connected to both our Infiniband network and the broader Stanford network. This lets users benefit from the performance of the Infiniband network while reaching their data through services and protocols that would not otherwise be available to them from a traditional network environment.
Infiniband#
- Infiniband is a networking standard that features high bandwidth and low latency. Current Infiniband devices are capable of transferring data at up to 800 Gb/s with less than a microsecond (μs) of latency. As of this writing, the most recent Infiniband versions we use are NDR (Next Data Rate) at 400 Gb/s and HDR (High Data Rate) at 200 Gb/s.
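To put those link rates in perspective, the sketch below computes the ideal time needed to move one terabyte over a single link. It is a back-of-the-envelope lower bound only: it assumes a decimal terabyte (10^12 bytes) and ignores protocol overhead, storage speed, and contention, so real transfers will be slower. The function name is illustrative.

```python
def ideal_transfer_seconds(size_bytes: float, link_gbits_per_sec: float) -> float:
    """Lower bound on transfer time: payload bits divided by the nominal link rate."""
    if link_gbits_per_sec <= 0:
        raise ValueError("link rate must be positive")
    payload_bits = size_bytes * 8
    return payload_bits / (link_gbits_per_sec * 1e9)


ONE_TB = 1e12  # decimal terabyte in bytes (an assumption of this sketch)

# Nominal rates quoted above for the two Infiniband generations in use.
for name, rate in (("HDR", 200), ("NDR", 400)):
    print(f"{name}: {ideal_transfer_seconds(ONE_TB, rate):.0f} s per TB")
# HDR: 40 s per TB
# NDR: 20 s per TB
```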
Primary Technical Contact#
- A designated individual in a lab (typically the Principal Investigator (PI) or a data manager) who serves as the main point of contact for the Oak team. This person is available to address questions regarding the organization of data within an Oak space or to relay technical information to the group. While communication with the primary technical contact is infrequent, it may occur during events such as maintenance windows or when a new Oak Gateway is set up.
SUNet ID#
- A SUNet ID is a Stanford University Network ID, a short alphanumeric username (e.g. `jsmith`) used to log in to Stanford systems, including Oak. It is tied to a person's Stanford account and is used for authentication across university services such as email, VPN, and research computing resources.
Stanford University ID (SUID)#
- A Stanford University ID (SUID) is an 8-digit number (e.g. `05123456`) that uniquely identifies an individual in Stanford's administrative systems. Unlike the SUNet ID, the SUID is a numeric identifier used in HR, student records, and other administrative contexts. It is not used for system login or authentication.
PTA#
- PTA stands for Project-Task-Award, a combination that represents an account in Stanford's Oracle Financials. PTAs are used to categorize expenditures by funding source. More information is available at Stanford's Fingate page.
ACL (Access Control List)#
- An ACL is a set of permissions attached to a file or directory that extends standard Unix owner/group/other permissions. ACLs allow access to be granted or denied for specific named users and groups. On Oak, ACLs are managed using the `setfacl` and `getfacl` commands and are commonly used to share data within a filespace with collaborators outside the primary workgroup.
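As an illustration of how those two commands fit together, the sketch below builds the argument lists for granting one named user read access and for inspecting the result. It only constructs and prints the commands, it does not run them. The SUNet ID `jsmith`, the directory path, and the helper names are hypothetical placeholders; `-R`, `-m`, and the `u:<user>:<perms>` entry syntax are standard `setfacl` options.

```python
import shlex
from typing import List


def setfacl_grant_cmd(user: str, path: str, perms: str = "rX",
                      recursive: bool = True) -> List[str]:
    """Build a setfacl command that adds an ACL entry for one named user.

    "rX" grants read access, plus traverse on directories (capital X only
    applies execute to directories and already-executable files).
    """
    cmd = ["setfacl"]
    if recursive:
        cmd.append("-R")  # apply to everything below `path`
    cmd += ["-m", f"u:{user}:{perms}", path]  # -m modifies (adds or updates) an entry
    return cmd


def getfacl_cmd(path: str) -> List[str]:
    """Build a getfacl command that lists the ACL entries on `path`."""
    return ["getfacl", path]


# Hypothetical collaborator and directory, for illustration only.
shared_dir = "/oak/stanford/groups/example_pi/shared"
print(shlex.join(setfacl_grant_cmd("jsmith", shared_dir)))
print(shlex.join(getfacl_cmd(shared_dir)))
```

To execute a command built this way, pass the list to `subprocess.run(cmd, check=True)` on a system where the filespace is mounted.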
Globus#
- Globus is a research data management platform widely used in research computing for reliable, high-performance file transfers. Oak provides a Globus endpoint, making it easy to transfer data between Oak and other Globus-connected systems, including other universities and national research facilities. More information is available on the Globus gateway page.
DTN (Data Transfer Node)#
- The DTN is a dedicated gateway server (`dtn.oak.stanford.edu`) that supports non-interactive data transfer protocols including `scp`, `rsync`, `sftp`, and `sshfs`. It is the recommended method for moving large amounts of data between Oak and a local workstation or external system. More information is available on the DTN gateway page.
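A transfer through the DTN might be scripted along the following lines. This is a minimal sketch that assembles an `rsync` command without running it. The hostname comes from the entry above, while the SUNet ID, the local and remote paths, and the helper name are hypothetical. The flags `-a`, `--partial`, and `--progress` are standard `rsync` options.

```python
import shlex
from typing import List

DTN_HOST = "dtn.oak.stanford.edu"


def rsync_to_oak_cmd(local_path: str, sunet_id: str, remote_path: str,
                     host: str = DTN_HOST) -> List[str]:
    """Build an rsync command that copies `local_path` to Oak through the DTN."""
    return [
        "rsync",
        "-a",          # archive mode: recurse and preserve times, modes, symlinks
        "--partial",   # keep partially transferred files so a retry can resume
        "--progress",  # show per-file progress
        local_path,
        f"{sunet_id}@{host}:{remote_path}",
    ]


# Hypothetical user and paths, for illustration only.
cmd = rsync_to_oak_cmd("results/", "jsmith", "/oak/stanford/groups/example_pi/results/")
print(shlex.join(cmd))
```

Because `rsync` skips files that are already up to date at the destination, re-running the same command after an interruption is usually enough to resume the transfer.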
Kerberos / Keytab#
- Kerberos is a network authentication protocol used to verify identity securely over an untrusted network. Stanford uses Kerberos to authenticate access to Oak's NFS gateway. A keytab file is a local credential store containing encrypted keys that allow a system to authenticate automatically without prompting for a password, commonly used for scripted or automated NFS access.
Inode#
- An inode is a data structure used by a filesystem to store metadata about a file or directory, such as its size, ownership, and permissions. Oak enforces both a volume quota (in TB) and an inode quota, since each file and directory consumes one inode regardless of size. Creating large numbers of small files can exhaust an inode quota even when volume space remains available.
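The effect of many small files can be seen with a short script. The sketch below walks a directory tree and counts one inode per file and per directory, alongside the total size in bytes. It is an approximation (hard-linked files are counted once per name) and is not how Oak itself reports quota usage. The demo runs against a temporary directory so that it works anywhere.

```python
import os
import tempfile


def count_inodes_and_bytes(root):
    """Return (inode_count, total_bytes) for the tree rooted at `root`.

    Each file and each directory, including `root` itself, counts as one inode.
    """
    inodes = 1  # the root directory
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        inodes += len(dirnames) + len(filenames)
        for name in filenames:
            # lstat so that a symlink is measured itself, not its target
            total_bytes += os.lstat(os.path.join(dirpath, name)).st_size
    return inodes, total_bytes


# Demo: 100 one-byte files use 100 bytes of volume but 102 inodes
# (the root, one subdirectory, and the files).
with tempfile.TemporaryDirectory() as root:
    small = os.path.join(root, "many_small_files")
    os.mkdir(small)
    for i in range(100):
        with open(os.path.join(small, f"part_{i:03d}.txt"), "w") as fh:
            fh.write("x")
    inodes, size = count_inodes_and_bytes(root)
    print(f"{inodes} inodes, {size} bytes")
# 102 inodes, 100 bytes
```

Packing many small files into a single archive (for example a tar file) is a common way to reduce inode usage for data that is rarely modified.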
NFS (Network File System)#
- NFS is a network protocol that allows a remote filesystem to be mounted and accessed as if it were local storage. Oak's NFS gateway enables Linux workstations and servers on the Stanford network to mount Oak directly. NFS access requires Kerberos authentication.
Native Oak#
- Native Oak refers to Oak file spaces that align with the original design of the Oak filesystem. This includes spaces such as `group`, `project`, `org`, `school`, and `datasets`. In contrast, SCG `lab_*` and `filesystem` (ZFS) spaces are not part of the original system design and are subject to certain limitations. These limitations include restricted access from the Sherlock cluster and ineligibility for the Oak Backup service.
SCG (Genetics Bioinformatics Service Center)#
- SCG is a high-performance computing cluster operated by Stanford's Genetics Bioinformatics Service Center, focused on genetics and bioinformatics research. SCG has its own Oak filespace type (`scg`, also referred to as `lab_$PINAME`) with specific limitations: SCG spaces are not accessible from Sherlock and are not eligible for Oak Backups. More information is available at login.scg.stanford.edu.
Sherlock#
- Sherlock is Stanford's shared high-performance computing (HPC) cluster, operated by the Stanford Research Computing Center (SRCC). Oak is well-suited for long-term storage of research data and results. When running active job campaigns on Sherlock, the recommended workflow is to copy data from Oak to Sherlock's all-flash `$SCRATCH` filesystem for the duration of the work, then move results back to Oak for long-term retention.
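The stage-in and stage-out workflow described above can be sketched as two small helpers. This is an illustration under stated assumptions, not an official SRCC tool: the function names are hypothetical, and the demo uses temporary directories as stand-ins for an Oak filespace and for `$SCRATCH`, so it runs on any machine. It requires Python 3.8 or later for `dirs_exist_ok`.

```python
import shutil
import tempfile
from pathlib import Path


def stage_in(oak_dir: Path, scratch_root: Path) -> Path:
    """Copy an input directory from Oak into scratch and return the working copy."""
    work_dir = scratch_root / oak_dir.name
    shutil.copytree(oak_dir, work_dir, dirs_exist_ok=True)
    return work_dir


def stage_out(results_dir: Path, oak_results_root: Path) -> Path:
    """Copy results back to Oak, then delete the scratch copy once the copy succeeds."""
    dest = oak_results_root / results_dir.name
    shutil.copytree(results_dir, dest, dirs_exist_ok=True)
    shutil.rmtree(results_dir)
    return dest


# Demo with stand-in directories (not real Oak or Sherlock paths).
with tempfile.TemporaryDirectory() as tmp:
    oak_inputs = Path(tmp) / "oak" / "dataset"
    scratch = Path(tmp) / "scratch"
    oak_inputs.mkdir(parents=True)
    scratch.mkdir()
    (oak_inputs / "input.txt").write_text("raw data\n")

    work = stage_in(oak_inputs, scratch)           # 1. copy inputs to scratch
    results = scratch / "results"                  # 2. compute against the scratch copy
    results.mkdir()
    (results / "summary.txt").write_text((work / "input.txt").read_text().upper())

    kept = stage_out(results, oak_inputs.parent)   # 3. move results back for retention
    print(sorted(p.name for p in kept.iterdir()))
```

On Sherlock itself, `scratch_root` would typically be taken from the environment, for example `Path(os.environ["SCRATCH"])`, and the Oak paths would point into the group's filespace.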
SMB (Server Message Block)#
- SMB is a network file sharing protocol that allows Oak storage to be mounted as a network drive on Windows, macOS, and Linux workstations. Oak's SMB gateway is accessible from the Stanford network or via VPN. SMB is also referred to as CIFS (Common Internet File System).
Workgroup / Workgroup Manager#
- A workgroup is a named group in Stanford's Workgroup Manager identity system used to control access to Oak filespaces. Each Oak space is associated with a workgroup (e.g. `oak:groupname`), and members of that workgroup are granted access. PIs and designated administrators can add or remove members using the Workgroup Manager web interface.