HPC system: purchasing nodes

Unlike traditional clusters, Teton is a collaborative system wherein the majority of nodes are purchased and shared by the cluster users, known as condo investors.

The model for sustaining Teton is premised on faculty and principal investigators (Investors) purchasing compute nodes (individual servers) from their grants or other available funds; these nodes are then added to the cluster. Investor-owned nodes thereby take advantage of the high-speed InfiniBand interconnect and high-performance GPFS parallel filesystem storage associated with Teton. Operating costs for managing and housing Investor-owned compute nodes are waived in exchange for letting other users make use of any idle compute cycles on those nodes. Investors have priority access to computing resources equivalent to those purchased with their funds, but can access more nodes for their research if needed. This provides the Investor with much greater flexibility than owning a standalone cluster.

We use job pre-emption so that an investor has immediate access to their invested nodes. Any jobs running on an investor's nodes will be stopped and re-queued to run at a later time.

Teton also has a number of community nodes available to all users. Jobs running on these nodes will not be pre-empted unless a job's node allocation includes investor nodes belonging to an investment the user is not part of.
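
To make the pre-emption rule concrete, the following is a minimal conceptual sketch, written in Python purely for illustration. The node ownership table, user membership table, and function name are hypothetical, and this is not ARCC's scheduler configuration; on Teton the decision is made automatically by the cluster's job scheduler.

  # Conceptual sketch of the pre-emption rule described above.
  # The ownership and membership tables are hypothetical examples; the real
  # decision is made by the cluster's job scheduler, not by user code.

  NODE_OWNER = {            # node name -> owning investment (None = community node)
      "inv001": "smith_lab",
      "inv002": "smith_lab",
      "mem01": None,
  }

  USER_INVESTMENTS = {      # user -> investments the user belongs to
      "alice": {"smith_lab"},
      "bob": set(),
  }

  def may_be_preempted(job_user, job_nodes):
      """Return True if the job can be stopped and re-queued: it occupies at
      least one investor-owned node whose investment the user is not part of.
      Jobs confined to community nodes are never pre-empted."""
      memberships = USER_INVESTMENTS.get(job_user, set())
      for node in job_nodes:
          owner = NODE_OWNER.get(node)
          if owner is not None and owner not in memberships:
              return True
      return False

  print(may_be_preempted("bob", ["mem01", "inv001"]))  # True: allocation includes another investment's node
  print(may_be_preempted("alice", ["inv001"]))         # False: alice is part of that investment
  print(may_be_preempted("bob", ["mem01"]))            # False: community nodes only

In other words, under this rule a job is only at risk of pre-emption when part of its allocation lands on nodes owned by an investment the submitting user does not belong to.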

The Details

Compute nodes are purchased and maintained on a 5-year life-cycle. Investors owning the nodes will be notified during year 4 that their investment nodes will expire at the end of the 5th year. Nodes left in the cluster after five years may be removed and disposed of at the discretion of the ARCC director.

Once an Investor has decided to participate, the Investor or their designate works with the ARCC team to procure the desired number of compute nodes. There is a 1-node minimum buy-in for any given compute node type (i.e., Standard, Bigmem, Hugemem, or GPU). Standard nodes are the least expensive, while GPU nodes are the most expensive. Generally, procurement takes about two to three months from start to finish. Once the nodes have been provisioned, an investor partition will be created and the investor will be notified.

An investor may submit jobs to the general partitions on the cluster before the new nodes are provisioned. Such jobs are subject to the general partition limitations, and guaranteed access to the purchased node(s) and cores is not provided until the purchased nodes are provisioned.

Please contact the ARCC at arcc-info@uwyo.edu for information and current pricing.

Node Types

Teton is currently architected with two generations of hardware: the Lenovo DX and NX series. Starting in January 2019, all expansions to Teton will be from the Lenovo ThinkSystem series of hardware.

The ThinkSystem chassis is a 2U enclosure that supports either 4 standard nodes or 2 GPU nodes.

There are three node types: Standard, Bigmem, and Hugemem; a dual GPU tray may be added to any of them. The node types are:

  • Standard Node: 128GB of 8 x 16GB TruDDR4 2666 MHz (2Rx8 1.2V) RDIMM memory
  • Bigmem: 512GB of 16 x 32GB TruDDR4 2666 MHz (2Rx8 1.2V) RDIMM memory
  • Hugemem: 1024GB of 16 x 64GB TruDDR4 2666 MHz (2Rx8 1.2V) RDIMM memory

Following is the base node specification.

  • Lenovo ThinkSystem SD530 dual socket compute node
  • Two Intel Xeon Gold 6130 16-core 2.1GHz processors
  • One 2.5" Intel S4510 240GB 6Gb SATA Hot Swap SSD
  • One Mellanox ConnectX-4 1x100GbE/EDR IB QSFP28 VPI Adapter
  • 5Yr Next Business Day warranty
  • Ground shipping to Laramie

Additional items required for each node:

  • One 1m Mellanox EDR IB Passive Copper QSFP28 Cable
  • IBM Spectrum Scale Standard Edition Client License for 5 years
  • One 1Gig network cable

Any of the above nodes can be configured with dual NVIDIA Tesla V100 16GB GPUs.

Speciality Nodes

There are two speciality nodes available:

  • KNL Nodes
  • Nvidia DGX GPU nodes.

Other special nodes can be architected; please contact the ARCC for help and information on obtaining prices.

These nodes are special order nodes only.

Grant Statement

Following is a statement about the current Teton configuration which can be used as part of a grant request.

"The University of Wyoming hosts excellent computational resources for the proposed genomic analysis work. The Advanced Research Computing Center hosts a large, 146.36 Tflop cluster that serves the University of Wyoming. This cluster currently hosts 284 nodes each with two, 8-core Intel processors (totaling >4500 cores), and 400 TB of raw storage, and is continually growing in both computing and storage capacity.”