HPC Specifications
Nodes
- SXM Nodes: 4
- Total Nodes: 12
- Node Configuration: Each node has 8 GPUs and 2 Processors.
- Node Memory: Each node has 640 GB of RAM.
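
The per-node figures above can be multiplied out to get cluster-wide totals. The sketch below does that arithmetic, assuming all 12 nodes share the stated configuration; the function and constant names are illustrative and do not belong to any site tool.

```python
# Figures taken from the node specification above.
TOTAL_NODES = 12
GPUS_PER_NODE = 8
PROCESSORS_PER_NODE = 2
RAM_GB_PER_NODE = 640


def cluster_totals(nodes: int = TOTAL_NODES) -> dict:
    """Return aggregate resources for `nodes` identically configured nodes."""
    return {
        "gpus": nodes * GPUS_PER_NODE,
        "processors": nodes * PROCESSORS_PER_NODE,
        "ram_gb": nodes * RAM_GB_PER_NODE,
    }


print(cluster_totals())  # {'gpus': 96, 'processors': 24, 'ram_gb': 7680}
```

Passing a smaller node count, for example `cluster_totals(4)`, gives the resources available to a job that spans only part of the cluster.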
Node Interconnect
- Interconnect: InfiniBand
- Speed: 200 Gb/s communication between nodes.
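
Note that 200 Gb/s is measured in gigabits, which is 25 gigabytes per second. The sketch below estimates a best-case transfer time between two nodes. It assumes a single 200 Gb/s link per node and ignores protocol overhead, neither of which is stated in the specification, so real transfers will take longer.

```python
LINK_GBIT_PER_S = 200  # interconnect speed from the specification


def ideal_transfer_seconds(size_gb: float,
                           link_gbit_per_s: float = LINK_GBIT_PER_S) -> float:
    """Lower bound on the time to move `size_gb` gigabytes over one link."""
    link_gbyte_per_s = link_gbit_per_s / 8  # 8 bits per byte
    return size_gb / link_gbyte_per_s


# Moving 100 GB between two nodes at line rate:
print(ideal_transfer_seconds(100))  # 4.0
```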
Software and OS
- Operating System: Rocky Linux 8.6
- Job Scheduling: SLURM (Simple Linux Utility for Resource Management)
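
Since jobs are submitted through SLURM, a batch script is the usual entry point. The following is a minimal sketch of a helper that builds such a script and checks the request against the node limits listed above. The helper name, the job name, and the command are hypothetical; the `--gres=gpu:N` form assumes GPUs are exposed as a generic resource named `gpu`, and partition or account settings are site-specific and not covered here.

```python
MAX_NODES = 12         # total nodes from the specification
MAX_GPUS_PER_NODE = 8  # GPUs per node from the specification


def make_sbatch_script(job_name: str, nodes: int, gpus_per_node: int,
                       command: str, time_limit: str = "01:00:00") -> str:
    """Build the text of a SLURM batch script (hypothetical helper)."""
    if not 1 <= nodes <= MAX_NODES:
        raise ValueError(f"nodes must be between 1 and {MAX_NODES}")
    if not 1 <= gpus_per_node <= MAX_GPUS_PER_NODE:
        raise ValueError(f"gpus_per_node must be between 1 and {MAX_GPUS_PER_NODE}")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --gres=gpu:{gpus_per_node}",  # assumes a GRES named "gpu"
        f"#SBATCH --time={time_limit}",
        "",
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


# Example: request two full nodes for a placeholder training command.
print(make_sbatch_script("demo", 2, 8, "python train.py"))
```

The resulting text can be saved to a file and submitted with `sbatch`.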
Storage
- RAID Configuration: RAID-6 is implemented for reliable data storage. RAID-6 provides fault tolerance by allowing the system to continue functioning even if two disk drives fail simultaneously.
- Filesystem: The system uses the WekaIO parallel filesystem, specifically designed for AI/ML tasks.
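
The cost of RAID-6's two-disk fault tolerance is capacity: two disks' worth of space in each array holds parity. The sketch below shows that arithmetic. The disk count and disk size in the example are hypothetical, since the specification does not give the array layout.

```python
def raid6_usable_tb(num_disks: int, disk_tb: float) -> float:
    """Usable capacity of one RAID-6 array: two disks' worth goes to parity."""
    if num_disks < 4:
        raise ValueError("RAID-6 needs at least four disks")
    return (num_disks - 2) * disk_tb


# Hypothetical array of ten 8 TB disks (not the actual system layout):
print(raid6_usable_tb(10, 8.0))  # 64.0
```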
Storage Performance Metrics
- Read IOPS (Input/Output Operations Per Second): 4 million
- Write IOPS (Input/Output Operations Per Second): 630,000
- Read Bandwidth (BW): 67 GB/s
- Write Bandwidth (BW): 33 GB/s
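
These bandwidth figures give a rough floor on how long large reads and writes take. The sketch below assumes the numbers are aggregate, cluster-wide peaks and that concurrent clients split bandwidth evenly; neither assumption is stated in the specification, and the dataset size is a made-up example.

```python
READ_BW_GB_S = 67   # read bandwidth from the metrics above
WRITE_BW_GB_S = 33  # write bandwidth from the metrics above
TOTAL_NODES = 12    # node count from the specification


def ideal_io_seconds(size_gb: float, bandwidth_gb_s: float) -> float:
    """Best-case time to stream `size_gb` gigabytes at the given bandwidth."""
    return size_gb / bandwidth_gb_s


def per_node_share_gb_s(bandwidth_gb_s: float, nodes: int = TOTAL_NODES) -> float:
    """Bandwidth per node if all nodes do I/O at once and share evenly."""
    return bandwidth_gb_s / nodes


# Hypothetical 670 GB dataset:
print(ideal_io_seconds(670, READ_BW_GB_S))   # 10.0 seconds to read
print(ideal_io_seconds(670, WRITE_BW_GB_S))  # about 20.3 seconds to write
print(per_node_share_gb_s(READ_BW_GB_S))     # about 5.6 GB/s per node
```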
Our HPC System
The system is built for AI, ML, and high-performance computing workloads. With 12 nodes of 8 GPUs each, a 200 Gb/s InfiniBand interconnect, and a parallel filesystem, it is well suited to complex simulations and data-intensive applications.