HPC system: Teton

From arccwiki

Revision as of 19:52, 20 February 2019

The Teton HPC cluster is the successor to Mount Moran. Teton contains several new compute nodes. All Mount Moran nodes have been reprovisioned within the Teton HPC Cluster. The system is available by SSH using hostname teton.arcc.uwyo.edu or teton.uwyo.edu. We ask that everybody who uses ARCC resources cite the resources accordingly. See Citing Teton. Newcomers to research computing should also consider reading the Research Computing Quick Reference.


Teton is an Intel x86_64 cluster interconnected with Mellanox InfiniBand and has a 1.3 PB IBM Spectrum Scale global parallel filesystem available across all nodes. The system requires UWYO two-factor authentication (2FA) for login via SSH. The default shell is Bash, and the Lmod modules system provides dynamic user environments so that users can switch software stacks rapidly and easily. The Slurm workload manager schedules jobs, enforces submission limits, implements fairshare, and provides the Quality of Service (QoS) levels for research groups who have invested in the cluster. Teton has a Digital Object Identifier (DOI) (https://doi.org/10.15786/M2FY47), and we request that all use of Teton appropriately acknowledge the system. Please see Citing Teton for more information.

Available Nodes

|Type || Series || Arch || Count || Sockets || Cores || Threads / Core || Clock (GHz) || RAM (GB) || GPU Type || GPU Count || Local Disk Type || Local Disk Capacity (GB) || IB Network || Operating System
|Teton Regular || Intel Broadwell || x86_64 || 180 || 2 || 32 || 1 || 2.1 || 128 || N/A || N/A || SSD || 240 || EDR || RHEL 7.4
|Teton BigMem GPU || Intel Broadwell || x86_64 || 8 || 2 || 32 || 1 || 2.1 || 512 || NVIDIA P100 16G || 2 || SSD || 240 || EDR || RHEL 7.4
|Teton HugeMem || Intel Broadwell || x86_64 || 10 || 2 || 32 || 1 || 2.1 || 1024 || N/A || N/A || SSD || 240 || EDR || RHEL 7.4
|Teton KNL || Intel Knights Landing || x86_64 || 12 || 1 || 72 || 4 || 1.5 || 384 + 16 || N/A || N/A || SSD || 240 || EDR || RHEL 7.4
|Teton DGX || Intel Broadwell || x86_64 || 1 || 2 || 20 || 2 || 2.2 || 512 || NVIDIA V100 32G || 8 || SSD || 7 TB || EDR || Ubuntu 16.04 LTS
|Moran Regular || Intel Sandy Bridge/Ivy Bridge || x86_64 || 284 || 2 || 16 || 1 || 2.6 || 64 or 128 || k20 on some || 2 || HD || 1T || FDR || RHEL 7.4
|Moran BigMem || Intel Sandy Bridge/Ivy Bridge || x86_64 || 2 || 2 || 16 || 1 || 2.6 || 512 || N/A || 2 || HD || 1T || FDR || RHEL 7.4
|Moran Debug || Intel Sandy Bridge/Ivy Bridge || x86_64 || 2 || 2 || 16 || 1 || 2.6 || 64 || k20m || 2 || HD || 1T || FDR || RHEL 7.4
|Moran HugeMem || Intel Sandy Bridge/Ivy Bridge || x86_64 || 2 || 2 || 16 || 1 || 2.6 || 1024 || k20 || 2 || HD || 1T || FDR || RHEL 7.4
|Moran DGX || Intel Broadwell || x86_64 || 1 || 2 || 20 || 2 || 2.2 || 512 || NVIDIA V100 16G || 8 || SSD || 7 TB || EDR || Ubuntu 16.04 LTS
|TOTAL Nodes || || || 502

See Partitions for information regarding Slurm Partitions on Teton.

Global Filesystems

The Teton global filesystem is configured with a ~160 TB SSD tier for active data and a 1.2 PB HDD capacity tier. The system policy engine moves data between the pools automatically and migrates data to HDD when the SSD tier reaches 70% used capacity. Teton provides several file spaces for users, described in the list and table below.

  • home - /home/username ($HOME)
- Space for configuration files and software installations. This file space is intended to be small and always resides on SSDs. The /home file space is snapshotted to recover from accidental deletions.
  • project - /project/project_name/[username]
- Space to collaborate among project members. Data here is persistent and is exempt from purge policy.
  • gscratch - /gscratch/username ($SCRATCH)
- Space to perform computing for individual users. Data here is subject to the purge policy defined below. Warning emails will be sent before files become subject to deletion. No snapshots.
Global Filesystems
|Filesystem || Quota (GB) || Snapshots || Backups || Purge Policy || Additional Info
|home || 25 || Yes || No || No || Always on SSD
|project || 1024 || No || No || No || Aging data will move to HDD
|gscratch || 5120 || No || No || Yes || Aging data will move to HDD

Purge Policy - File spaces within the Teton cluster filesystem may be subject to a purge policy. The policy has not yet been defined. However, ARCC reserves the right to purge data in such file spaces 30 to 90 days after the last access or after creation. Before an actual purge event, the owner of the affected files will be notified by email several times.
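
To get a rough idea of which files might fall under such a policy, you can list files by last access time. The following is a minimal sketch, not an ARCC tool: the helper name list_stale_files is hypothetical, and the 30-day default is an assumption taken from the low end of the 30 to 90 day window, since the actual policy has not been defined.

```shell
# List regular files under a directory whose last access is older than N days.
# Sketch only: the helper name and the 30-day default are assumptions.
list_stale_files() {
    # -atime +N matches files last accessed more than N days ago
    find "${1:?usage: list_stale_files DIR [DAYS]}" -type f -atime +"${2:-30}" -print
}

# Example: review possible purge candidates in your scratch space
# list_stale_files "$SCRATCH" 30
```

Running the helper with a larger DAYS value (for example 90) shows only the files at the far end of the window.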

Storage Increases

  • Project PIs can purchase additional scratch and/or project space for a one-time $50 setup fee plus $100 / TB / year.
  • Additionally, researchers can request allocation increases at no cost for scratch and/or project space by submitting proposals that must be renewed every 6 months and include the following information:
    • the scientific gain and insights that will be or have been obtained by using the system,
    • how data is organized and accessed in efforts to maximize performance and usage.
  • To request more information, please contact ARCC.

Special Filesystems

Certain filesystems are available only on particular nodes of the cluster to meet specialized requirements. The table below summarizes these specialized filesystems.

Specialty Filesystems
|Filesystem || Mount Location || Notes
|petaLibrary || /petalibrary/homes || Only on login nodes
|petaLibrary || /petalibrary/Commons || Only on login nodes
|Bighorn || /bighorn/home || Only on login nodes, read-only
|Bighorn || /bighorn/project || Only on login nodes, read-only
|Bighorn || /bighorn/gscratch || Only on login nodes, read-only
|node local scratch || /lscratch || Only on compute nodes; Moran is 1 TB HDD; Teton is 240 GB SSD
|memory filesystem || /dev/shm || RAM-based tmpfs that uses part of the node's RAM for very rapid I/O operations; small capacity

The node local scratch or lscratch filesystem is purged at the end of each job.

The memory filesystem can substantially improve the performance of small I/O operations. If you have single-node jobs with very intensive random-access I/O patterns, this filesystem may improve the performance of your compute job.
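
Because lscratch is purged when the job ends, a job that uses it has to copy its results back to global storage before it finishes. The sketch below shows one way to do that inside a job script. The helper name run_in_local_scratch, the LOCAL_ROOT variable, the per-job directory naming, and the example paths are all assumptions made for illustration and are not part of the Teton configuration.

```shell
# Stage input into fast local storage, run a command there, copy results back.
# Assumptions: the per-job subdirectory name and the use of cp for staging.
# LOCAL_ROOT defaults to /lscratch; it could be set to /dev/shm for small,
# RAM-backed workloads.
run_in_local_scratch() {
    local input_dir output_dir work_dir status
    input_dir="$1"; output_dir="$2"; shift 2
    work_dir="${LOCAL_ROOT:-/lscratch}/${USER:-user}.${SLURM_JOB_ID:-$$}"
    mkdir -p "$work_dir" "$output_dir" || return 1
    cp -r "$input_dir"/. "$work_dir"/ || return 1
    # Run the remaining arguments as the compute command inside the work dir
    ( cd "$work_dir" && "$@" )
    status=$?
    # Copy everything back before the job ends, then clean up
    cp -r "$work_dir"/. "$output_dir"/
    rm -rf "$work_dir"
    return "$status"
}

# Example (paths and program name are hypothetical):
# run_in_local_scratch /project/myproject/input /project/myproject/results my_program input.dat
```

The copy-back step runs even when the command fails, so partial output is still available for debugging, and the helper returns the command's exit status.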

The petaLibrary filesystems are only available from the login nodes, not on the compute nodes. A storage space on the Teton global filesystems does not imply storage space on the ARCC petaLibrary or vice versa. For more information about the petaLibrary, please see the petaLibrary page.

The Bighorn filesystems will be provided for a limited amount of time in order for researchers to move data to either the petaLibrary, Teton storage or to some other storage media. The actual date that these mounts will be removed is still TBD.

Project and Account Requests

For research projects, UWYO faculty members (Principal Investigators) can request that a Project be created on Teton. PIs can then grant access to the project to UWYO students, faculty, and external collaborators. User accounts on Teton require a valid UWYO e-mail address and a UWYO-affiliated PI sponsor. UWYO faculty members can sponsor their own accounts, while students, post-doctoral researchers, and research associates must use their PI as their sponsor. Non-UWYO external collaborators must be sponsored by a current UWYO faculty member.

Follow the Account_Policy link for additional information and policy statements on account usage. Use the link under "Account Requests" to request that either a project or user(s) be created or added. From this same page you can request that users be added to an existing project.

Note that for external collaborators, a special UWYO account must be created by the ASO office before access to Teton can be granted. There is a one-time fee for having this account created. Please allow extra time for the ASO office to create the account.

Once the form is submitted, and the information verified, the project and user account(s) will be created. Users will receive email notification once a project has been created and/or when they are added to a project.

To request access for instructional use, send an email to arcc-info@uwyo.edu with the course number, section, and student list. If the PI prefers, generic accounts can be created instead of providing a student list. Instructional accounts are usually valid for a single semester, and access to the project is terminated at the beginning of the next semester.

System Access

SSH Access

Teton has login nodes for users to access the cluster. Login nodes are available publicly using the hostname teton.arcc.uwyo.edu or teton.uwyo.edu. SSH can be used natively on macOS or Linux-based operating systems from a terminal with the ssh command. X11 forwarding is supported, but if you need graphical support, we recommend using FastX if at all possible. Additionally, you may want to configure your OpenSSH client to support connection multiplexing if you require multiple terminal sessions. If your network connectivity is unreliable, you may want to use either tmux or screen once you log in to keep sessions alive during disconnects, so that you can reconnect to them later.

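ssh USERNAME@teton.arcc.uwyo.edu                                # Log in as USERNAME
ssh -l USERNAME teton.arcc.uwyo.edu                             # Equivalent, giving the login name with -l
ssh -X -l USERNAME teton.arcc.uwyo.edu                          # X11 forwarding (untrusted; subject to X11 SECURITY extension restrictions)
ssh -Y -l USERNAME teton.arcc.uwyo.edu                          # Trusted X11 forwarding (not subject to X11 SECURITY extension restrictions)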

OpenSSH Configuration File (BSD,Linux,MacOS)

By default, the OpenSSH user configuration file is $HOME/.ssh/config, which can be edited to enhance workflow. Because Teton uses round-robin DNS to provide access to two login nodes and requires two-factor authentication, it can be advantageous to add SSH multiplexing to your local environment so that subsequent connections reuse the existing authenticated connection to the same login node. This also provides a way to shorten the hostname used with SSH, SCP, SFTP, and rsync. An example entry follows; replace USERNAME with your actual UWYO username:

Host teton
  HostName teton.arcc.uwyo.edu
  User USERNAME
  ControlMaster auto
  ControlPath ~/.ssh/ssh-%r@%h:%p

WARNING: While ARCC allows SSH multiplexing, other research computing sites may not. Do not assume this will always work on systems not administered by ARCC.
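
If you prefer to script this setup, the sketch below appends a multiplexed entry for the teton alias to a config file only when one is not already present. It is an illustration under stated assumptions: the helper name add_teton_host is hypothetical, USERNAME is a placeholder for your UWYO username, and the control socket path under ~/.ssh is one reasonable choice rather than an ARCC requirement.

```shell
# Append a multiplexed "Host teton" entry to an OpenSSH config file if absent.
# The helper name is hypothetical; USERNAME is a placeholder.
add_teton_host() {
    local config="${1:-$HOME/.ssh/config}"
    local user="${2:-USERNAME}"
    mkdir -p "$(dirname "$config")"
    touch "$config"
    chmod 600 "$config"
    # Do nothing if an entry for this alias already exists
    if grep -qE '^[[:space:]]*Host[[:space:]]+teton[[:space:]]*$' "$config"; then
        return 0
    fi
    cat >> "$config" <<EOF

Host teton
  HostName teton.arcc.uwyo.edu
  User $user
  ControlMaster auto
  ControlPath ~/.ssh/ssh-%r@%h:%p
EOF
}

# Example: add_teton_host "$HOME/.ssh/config" USERNAME
```

With such an entry in place, the first "ssh teton" opens the master connection and prompts for 2FA, and later ssh, scp, sftp, or rsync commands that use the teton alias reuse it. "ssh -O check teton" reports whether a master connection is active, and "ssh -O exit teton" closes it.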

Access from Microsoft Windows

ARCC currently recommends that users install MobaXterm to access the Teton cluster. It provides appropriate access to the system with SSH and SFTP capability, allowing X11 if required. The home version of MobaXterm should be sufficient. There is also PuTTY if a more minimal application is desired.

Additional options include a Cygwin installation with SSH installed, the Windows Subsystem for Linux with an OpenSSH client installed, or, on very recent versions of Windows, enabling the built-in OpenSSH client. Finally, a great alternative is to use our FastX capability.

FastX Access

If you are currently on the UW campus, you can also use FastX for a more robust remote graphics capability, either through an installable client for Windows, macOS, or Linux or through a web browser. For browser access, navigate to https://fastx.arcc.uwyo.edu and log in with your 2FA credentials. For more information, including where to download the native clients, see the documentation on using FastX.

Available Shells

Teton has several shells available for use. The default is bash. To change your default shell, please submit the request through standard ARCC request methods.

|Shell || Path || Version || Notes
|bash || /bin/bash || 4.2.46 || Recommended
|zsh || /bin/zsh || 5.0.2 ||
|csh || /bin/csh || 6.18.01 || Implemented by TCSH
|tcsh || /bin/tcsh || 6.18.01 ||

Data Transfer & Access

  1. Teton Cluster Filesystem
    1. SMB / CIFS Access
    2. NFS Access
  2. ARCC Bighorn (Mt Moran) Filesystem
  3. ARCC petaLibrary Filesystem

Job Scheduling Slurm

  1. Required Inputs and Default Values and Limits
    1. Default Values
    2. Default Limits
  2. Partitions

Running jobs on the Teton cluster requires the user to specify a list of partitions a job may run in. The user should use "moran,teton" or "teton,moran" most of the time. The order of the list specifies the order in which Slurm searches for nodes to allocate.

Specialty partitions are used to request particular resources, e.g. GPU nodes, for jobs that require them.

  1. General Partitions
  2. Investor Partitions
  3. Special Partitions
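
As an illustration of the partition list described above, a minimal batch script might look like the following sketch. Only the "teton,moran" partition list comes from this page. The account name myproject, the wall-time value, and the node and task counts are assumptions chosen for the example, so check the Required Inputs and Default Values pages for what Teton actually requires.

```shell
#!/bin/bash
# Partitions the job may run in, listed in the preferred search order
#SBATCH --partition=teton,moran
# Hypothetical project name; replace with your own project
#SBATCH --account=myproject
# Assumed 10-minute wall-time limit
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks=1

# Outside of Slurm the job ID is unset, so fall back to a placeholder
job_id="${SLURM_JOB_ID:-none}"
echo "Job ${job_id} running on $(uname -n)"
```

Saved as, for example, job.sh (a hypothetical file name), the script would be submitted with "sbatch job.sh", and "squeue -u $USER" lists your queued and running jobs.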

Purchasing Investor Nodes for Teton

Quick Links

Here are some quick links to some additional documentation on using the system.

Base Operations

Access to software

Schedule jobs and query system

Workflow Software

  • SSH Connection Multiplexing
  • Software Multiplexers - Keep your sessions alive

Programming on HPC cluster

Extra Help

  • Requesting software builds
  • Requesting project accounts
  • Requesting user accounts
  • Requesting class accounts
  • Requesting increased storage allocation