Introduction to HPC
What is high performance computing (HPC)?
High performance computing (HPC) uses parallel processing to run advanced applications efficiently, reliably, and quickly. To do this, computing power is aggregated so that it delivers much higher performance than a typical server or workstation can provide. This makes it possible to tackle large problems in science, engineering, and business.
HPC is sometimes referred to as supercomputing.
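As a minimal single-machine sketch of the parallel-processing idea, the Python snippet below splits one large task into chunks and hands them to a pool of worker processes. The function names (`split_range`, `partial_sum`, `parallel_sum`) and the sum-of-squares workload are hypothetical, chosen only for illustration. A real HPC cluster spreads work across many nodes, typically with MPI and a job scheduler, rather than across the cores of one machine.

```python
from concurrent.futures import ProcessPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum(bounds):
    """Work done by one worker: sum of squares over its own chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum(n, workers=4):
    """Combine the partial results computed by a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, split_range(n, workers)))


if __name__ == "__main__":
    print(parallel_sum(1_000_000))
```

The same split, compute, and combine pattern is what a cluster applies at a much larger scale.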
High performance computing architecture is built from three key hardware components:
- Compute Node (CPU, GPU)
- Networking (Switch, Cables)
- Storage (Parallel File System)
Plus, of course, software for management and job scheduling.
1. Supermicro compute node solutions fall into the following categories:
High Density Solution:
- Requires higher wattage and better cooling in the data centre
- Typically 20 kW+ power per rack
- Multi-node configuration with more nodes per rack
- Suitable for FatTwin, TwinPro, 1UTwin, and SuperBlade
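To make the per-rack power figure concrete, here is a rough sizing sketch in Python. The 20 kW budget comes from the list above; the per-node wattage and the reserve percentage are made-up placeholders, not Supermicro specifications, and `nodes_per_rack` is a hypothetical helper.

```python
def nodes_per_rack(rack_budget_w, node_power_w, headroom_pct=10):
    """Number of nodes that fit in a rack power budget.

    headroom_pct is the share of the budget held in reserve (assumed value).
    """
    usable_w = rack_budget_w * (100 - headroom_pct) // 100
    return usable_w // node_power_w


# Hypothetical inputs: 20 kW rack, 450 W per node, 10% reserve.
print(nodes_per_rack(20_000, 450))  # 40
```

In practice, rack space and cooling capacity limit density as well as power.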
1U/2U Rackmount Solution:
- Lower wattage and cooling requirements in the data centre
- Single-node configuration with fewer nodes per rack
- Flexibility in fitting add-on cards
- Suitable for Ultra series and 4-way servers
Hybrid Computing Solution:
- GPU / Intel Xeon Phi combined with system CPU and memory
- Offloads compute-intensive portions of the application to the GPU, freeing the system CPU to run the rest of the code
- A CPU has several cores, whereas a GPU has thousands
- Suitable for GPU/Phi-optimized solutions
- Next-generation Xeon Phi (Knights Landing, KNL) solution
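How much offloading helps depends on how much of the run time the offloaded portion accounts for. The sketch below applies Amdahl's law as a rough estimate. The 80% offload fraction and the 20x accelerator speedup are illustrative assumptions, not measurements, and the model ignores the cost of moving data between host and device.

```python
def offload_speedup(offload_fraction, accel_speedup):
    """Amdahl's law: overall speedup when part of the run time is accelerated.

    offload_fraction: share of the original run time moved to the GPU (0..1).
    accel_speedup: how many times faster that share runs on the GPU.
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if accel_speedup <= 0:
        raise ValueError("accel_speedup must be positive")
    # New run time, as a fraction of the original run time.
    remaining = (1.0 - offload_fraction) + offload_fraction / accel_speedup
    return 1.0 / remaining


# Assumed example: 80% of the run time offloaded, running 20x faster on the GPU.
print(round(offload_speedup(0.8, 20), 2))  # 4.17
```

Even with an accelerator that is 20x faster, the overall gain here is only about 4.2x, because the 20% of the work left on the CPU dominates the run time.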
2. Networking in HPC:
- High bandwidth, low latency
- Involves InfiniBand (QDR / FDR / EDR) and next-generation Omni-Path
- Often involves a director switch / core switch
- Aims to minimize network hops
- Commonly uses a fat-tree network topology
- Usually uses fibre cables
- Typically a single data port and a management port
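As a back-of-the-envelope sketch of how a fat tree keeps hop counts low while scaling out, the Python function below sizes a two-level, non-blocking fat tree built from identical switches. The 36-port radix in the example is an assumption chosen for illustration, and the function name and return format are hypothetical.

```python
def two_level_fat_tree(radix):
    """Size a non-blocking two-level fat tree built from `radix`-port switches.

    Each leaf switch uses half its ports for hosts and half for uplinks,
    with one uplink to each spine switch.
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number of ports, at least 2")
    hosts_per_leaf = radix // 2
    spines = radix // 2  # one spine per leaf uplink
    leaves = radix       # each spine port connects to one leaf
    return {
        "leaf_switches": leaves,
        "spine_switches": spines,
        "max_hosts": leaves * hosts_per_leaf,
        "max_switch_hops": 3,  # leaf -> spine -> leaf
    }


print(two_level_fat_tree(36)["max_hosts"])  # 648
```

Under these assumptions, any two hosts are at most three switch hops apart, and every leaf has as much uplink bandwidth as host-facing bandwidth.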
3. Storage:
- Separate storage servers are used
- Usually uses Lustre, a parallel distributed file system
- Storage servers with 72 or 90 drive bays can be provided
- Lots of drives required
- Scalable, tunable
- SuperStorage, SSG series or JBOD solution
A Software Solution:
Supermicro Server Management Software:
- Supermicro Server Manager (SSM) for large-scale server management at the hardware level
- Health monitoring, Firmware update, Remote Control, etc.
Cluster Management Software:
- Manages the cluster from the software side for scalability
- All-in-one packages such as IBM Tivoli, StackIQ, Bright Cluster Manager, Microsoft Cluster Server, HP Insight Cluster Management Utility
Scheduler:
- Process control for job submission and queuing
- Platform LSF, Grid Engine, MOAB, SLURM
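To illustrate what job submission looks like from the user's side, the Python sketch below builds a minimal SLURM batch script as a string. The `#SBATCH` options shown (`--job-name`, `--nodes`, `--ntasks-per-node`, `--time`) are standard SLURM directives. The helper name `render_sbatch`, the job name, and the `./my_app` executable are hypothetical placeholders, and other schedulers such as LSF or Grid Engine use different syntax.

```python
def render_sbatch(job_name, nodes, tasks_per_node, walltime, command):
    """Return the text of a minimal SLURM batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: 4 nodes, 16 tasks per node, one-hour time limit.
script = render_sbatch("demo", 4, 16, "01:00:00", "./my_app")
print(script)
```

Saved to a file, a script like this would typically be submitted with `sbatch` and then wait in the queue until the requested nodes become free.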
Compilers, debuggers, asset monitoring, etc.:
- Intel Cluster Studio, PGI compilers
- Allinea performance tools
- Zabbix
One of the most crucial factors in creating a successful HPC environment is what we, at EIM, specialize in:
Integration and Installation:
Network design:
- Ethernet
- InfiniBand
- IPMI
Site inspection and cabling design:
- Floor plan
- Cable type/length/label
Onsite installation:
- Server/Switch racking
- Cabling (power/network)
- Power on test
- Benchmark test
For additional information and HPC consultation, just contact us.