
Backups#

Oak is not backed up by default#

Oak does not provide local or remote data backup by default, and should be treated as a single copy of your data.

This page outlines an optional managed backup service that helps you send your Oak data to an approved Stanford Cloud vendor.

Minimum requirements#

Oak Backup Considerations#

Most of the expenses for Oak backups will arise from your selected cloud service provider. This portion of your backup is not billed through Stanford Research Computing (SRC), meaning we are unable to manage your cloud account or estimate your storage costs. We only charge a small monthly fee (<$10/mo) to maintain the necessary physical and virtual infrastructure that runs our backup service. For our latest fee schedule, please visit the SRCC rates page at https://uit.stanford.edu/rates/rcstorage.

Once you have a Stanford cloud account (AWS, Google Cloud, or Wasabi), we can work with you to set up a weekly backup of your Oak space, fully managed by SRCC, using the open-source backup software restic.

We can only back up Oak data to a Stanford cloud account that is NOT approved for high-risk data, due to Stanford’s Minimum Security Standards (MinSec).

If you don’t have any preference for cloud storage, we usually recommend Stanford Wasabi Cloud (us-west1) for Oak backups, as there is no cost for data recovery (egress fees), and Stanford has negotiated a competitive discount with them. Our second choice is Google Cloud COLDLINE, but egress fees apply for restores.

Here's what's required to set up our backup service:

Decide what you want to back up#

To make an informed decision about cloud storage options, consider the following:

  • How much data do you want backed up?
  • How frequently do you expect you'll need to restore data?
  • When restores happen, how much of your total backup will you need to retrieve?
  • How long can you tolerate being without your data?
  • To make sure our backup solution is right for you, check that your worst-case expectations for partial and complete restores match what our service (and your cloud storage target) can provide.

Choose a Cloud Backup Destination#

  • Our backup service requires that you provide us with a cloud storage target.
  • Responsibility: You
  • Cost: Varies by cloud service and by the amount of data stored/retrieved
    • Stanford's Cardinal Cloud service offers discounts with various cloud vendors. For details about Stanford-cloud storage options, see: https://uit.stanford.edu/cloud-vendor/reduce-cost (SUNet ID Required)
    • The cost of cloud storage is difficult to predict and will depend on how frequently you update data on Oak. Restic generally does a good job of data deduplication, and we usually see a reduction of around 15% compared to the space used on Oak for the initial backup. After that, backups happen in weekly increments, so the storage used will vary depending on how much (and how frequently) you modify data on Oak.
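As a rough illustration of the arithmetic above, the following sketch estimates the size of an initial backup from the space used on Oak, assuming the typical reduction of around 15% quoted above. The function name `estimate_backup_tb` and the 100 TB input are hypothetical, actual savings depend on your data, and the result is a storage size, not a cost estimate.

```shell
# Hypothetical helper: estimate the initial backup size (in TB) from the
# space used on Oak (in TB). The default 15% reduction is the typical
# figure quoted above, not a guarantee.
# $1 = TB used on Oak, $2 = expected reduction in percent (optional)
estimate_backup_tb() {
    awk -v used="$1" -v pct="${2:-15}" \
        'BEGIN { printf "%.1f\n", used * (1 - pct / 100) }'
}

# Example: 100 TB used on Oak with the default 15% reduction
estimate_backup_tb 100
```

Multiply the result by your cloud vendor's per-TB storage price to get a first-order monthly figure; weekly increments and any restore (egress) fees come on top of that.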

Work with the Oak team to complete your backup#

Once your preferred cloud storage option has been chosen, get in touch with us so we can send you instructions on how to create and configure your storage bucket and grant us access to perform the backups from Oak.

Default Settings for Oak Backups#

  • New snapshots are created weekly.
    • Larger backup selections (hundreds of TB or tens of millions of files) that take longer than a week to complete will attempt a new snapshot about one week from their last completed snapshot.
  • We do not forget or prune snapshots.
    • If you want to implement a snapshot retention policy, please let us know.

Restoring data#

The credentials needed to access your backups will be shared with you when we set up your service. These files contain secrets, such as the password for the restic backup repository.

There can be only one owner for these files. The PI responsible for the space is typically the person given control of this credential file.

We use standard Linux permissions to control who can access these credential files and, as such, who can directly access backups. You can grant access to other users in the group by using standard Linux permissions (ACLs). Note that users who can access these credential files are able to restore any file from the backups, even other users' files and files they couldn't read on Oak directly. We recommend restricting access to backups to necessary users only (PI, lab and data managers). Oak administrators can change the ownership of these files at any time; just email us at srcc-support@stanford.edu to make a request.
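A minimal sketch of how you might check that a credential file is not accessible to everyone, assuming GNU `stat` (as found on Linux systems). The function name `cred_is_private` and the file path shown in the comments are hypothetical; the actual location of your credential file is provided when the service is set up.

```shell
# Hypothetical helper: succeed only if "other" users have no permissions
# on the given file, i.e. the last octal digit of its mode is 0.
cred_is_private() {
    mode=$(stat -c '%a' "$1") || return 2
    # Strip everything but the last character of the octal mode.
    [ "${mode#"${mode%?}"}" = "0" ]
}

# Example usage (placeholder path, replace with your own):
#   cred_is_private /path/to/credential/file || echo "warning: file is world-accessible"
#
# Granting read access to one additional group member with a POSIX ACL:
#   setfacl -m u:<sunetid>:r /path/to/credential/file
```

This only inspects the basic mode bits; use `getfacl` on the file to see the full list of users who have been granted access through ACLs.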

To restore files yourself, you will need to use Sherlock. We recommend running your commands in an interactive job, because restic tends to use more memory than the login nodes have to offer. This is easy with Sherlock's sdev command.

Start a 1-cpu interactive job:#

$ sdev

More CPU/memory might be needed when restoring large backup directories. If you have performance or memory-related issues when using restic, please contact us and we'll assist.

To access restic from Sherlock, first load the restic module:#

$ module load system restic

Verify the version of restic you have loaded#

All backup repositories use version 2 with added compression. Restic 0.14+ is required to read them. Verify with:

$ restic version
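If you want to script this check, the sketch below parses the output of `restic version` and tests for 0.14 or newer. The function name `restic_version_ok` is hypothetical, and the sketch assumes the output starts with `restic <major>.<minor>...`, as in `restic 0.16.4 compiled with go1.21.6 on linux/amd64`.

```shell
# Hypothetical helper: succeed if the given "restic version" output
# reports restic 0.14 or newer.
restic_version_ok() {
    # The version number is the second whitespace-separated field.
    ver=$(printf '%s\n' "$1" | awk '{print $2}')
    major=${ver%%.*}     # text before the first dot
    rest=${ver#*.}       # text after the first dot
    minor=${rest%%.*}    # text before the next dot
    [ "$major" -gt 0 ] || [ "$minor" -ge 14 ]
}

# Example usage, after loading the restic module:
#   restic_version_ok "$(restic version)" || echo "restic is too old for these repositories"
```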

Restic command examples#

Restic commands might take some time to complete, because restic both accesses remote cloud resources and maintains a local cache on Sherlock in your own $SCRATCH. The first restic commands usually take longer because they populate this cache.

To list all snapshots (i.e. backups), run:#

$ restic snapshots

Every week, you should see a new snapshot showing up. Each snapshot represents an incremental backup of your Oak space.

To list all files backed up in your latest snapshot, use:#

$ restic ls latest

To restore a file from the latest backup, please use restic restore and specify a destination path (here the /tmp directory):#

In restic, the path to the Oak directory (e.g. /oak/stanford/groups/sunetid) is always replaced with /backup.

$ restic restore latest --include /backup/.../file --target /tmp

You will find your restored file in /tmp/backup/.../file
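The path substitution described above can be sketched as a small helper that translates an Oak path into the corresponding path inside the backup and prints the restore command for review. The function name `oak_to_backup_path` and the file `data/results.csv` are hypothetical and used only for illustration; the command is printed, not executed.

```shell
# Hypothetical helper: translate an Oak path into the path stored in the backup.
# $1 = Oak directory that was backed up, $2 = full Oak path of the file
oak_to_backup_path() {
    case "$2" in
        "$1"/*) printf '/backup%s\n' "${2#"$1"}" ;;
        *)      echo "error: $2 is not under $1" >&2; return 1 ;;
    esac
}

# Build the restore command for a made-up file and print it for review.
path=$(oak_to_backup_path /oak/stanford/groups/sunetid \
                          /oak/stanford/groups/sunetid/data/results.csv)
echo "restic restore latest --include $path --target /tmp"
```

Printing the command first lets you confirm the /backup path before making requests against your cloud storage.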

Troubleshooting the local restic cache#

When using the restic module on Sherlock, the local cache is set to $SCRATCH/.cache/restic

The cache is used to minimize the number of requests to the cloud, which can be expensive.

Because $SCRATCH is purged after 90 days, the cache might become inconsistent. If this happens, restic will report an error; delete the cache completely and try again:

$ cd "$SCRATCH/.cache" && rm -rf restic

Additional Information#

If you have any questions about this backup service, please send us an email at srcc-support@stanford.edu.