High Performance Compute on AWS

Ansu K
3 min read · May 29, 2021

HPC has never been easier on you or your wallet. Today you can spin up a large number of resources in no time and destroy them afterwards: no heavy upfront investment, no struggle, and you pay only for what you use.

HPC is used in fields like genomics, finance, machine learning, risk modelling, self-driving cars, augmented reality, AI, VR, weather prediction, and more.

Services on AWS to achieve HPC

Data Transfer

HPC usually works with terabytes or petabytes of data. To move that much data to the cloud you need something faster than an ordinary internet connection, and AWS gives you three ways to do it:

Snowball: If you have petabytes of data to transfer to the cloud, this is the best option. You request a Snowball from AWS, a physical storage device in a rugged, tamper-resistant enclosure built to survive shipping. You connect it to power and your network, copy your data onto it, and ship it back; AWS then transfers the data into the cloud from its own data centre.
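If you would rather script the request than click through the console, a Snowball import job can be created with boto3. This is only a rough sketch under assumed values: the bucket, IAM role, address ID and capacity choice below are placeholders you would replace with your own.

```python
import boto3

snowball = boto3.client("snowball", region_name="us-east-1")

# Request a Snowball device for an import job (placeholders throughout).
response = snowball.create_job(
    JobType="IMPORT",  # move data from the device into S3
    Resources={
        "S3Resources": [
            {"BucketArn": "arn:aws:s3:::my-hpc-dataset"}  # placeholder destination bucket
        ]
    },
    RoleARN="arn:aws:iam::123456789012:role/snowball-import-role",  # placeholder IAM role
    AddressId="ADID00000000-0000-0000-0000-000000000000",  # shipping address registered beforehand
    ShippingOption="SECOND_DAY",
    SnowballType="EDGE",
    SnowballCapacityPreference="T100",
    Description="Import dataset for HPC workloads",
)
print(response["JobId"])
```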

If you have a smaller amount of data that does not justify a Snowball, consider one of the following:

  • AWS DataSync: an AWS-provided agent, a virtual machine you run inside your own data centre, that moves your data to AWS with high throughput and low latency (a minimal sketch of creating and starting a DataSync task follows this list).
  • Direct Connect: a dedicated private network connection from your data centre to AWS. A dedicated link means more consistent bandwidth and higher throughput while transferring data.
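Assuming the DataSync agent is already deployed and the source and destination locations have been registered (the location ARNs here are placeholders), creating and running a transfer task looks roughly like this:

```python
import boto3

datasync = boto3.client("datasync", region_name="us-east-1")

# Create a task that copies from an on-premises location to an S3 location.
# Both location ARNs are placeholders returned by earlier create_location_* calls.
task = datasync.create_task(
    SourceLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-source",
    DestinationLocationArn="arn:aws:datasync:us-east-1:123456789012:location/loc-dest",
    Name="hpc-input-sync",
    Options={"VerifyMode": "ONLY_FILES_TRANSFERRED"},  # checksum-verify what was copied
)

# Kick off a transfer run for the task.
execution = datasync.start_task_execution(TaskArn=task["TaskArn"])
print(execution["TaskExecutionArn"])
```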

Compute & Networking

  • GPU- or CPU-optimized EC2 instances: AWS provides instance families optimized for GPU-heavy or CPU-heavy workloads; pick the one that matches your specific need.
  • Cluster Placement Group: placing your instances in a cluster placement group, i.e. packing them physically close together within a single Availability Zone, improves network performance with higher throughput and lower latency, which is essential when the compute nodes communicate frequently.
  • Elastic Fabric Adapter (EFA): HPC requires extremely efficient networking. The basic ENI is enough for ordinary EC2 workloads, but an EFA adds OS-bypass, which lets your application talk to the network hardware directly instead of going through the operating system's networking stack, cutting latency considerably (a launch sketch follows this list).
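A rough boto3 sketch of tying these together: create a cluster placement group and launch EFA-enabled instances into it. The AMI, subnet, security group and key pair IDs are placeholders, and the instance type is just one example of an EFA-capable family.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Cluster placement group: nodes land close together on low-latency network paths.
ec2.create_placement_group(GroupName="hpc-cluster-pg", Strategy="cluster")

# Launch network-optimized instances into the placement group with an EFA attached.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",        # placeholder HPC-ready AMI
    InstanceType="c5n.18xlarge",            # EFA-capable instance type
    MinCount=2,
    MaxCount=2,
    KeyName="my-keypair",                   # placeholder key pair
    Placement={"GroupName": "hpc-cluster-pg"},
    NetworkInterfaces=[
        {
            "DeviceIndex": 0,
            "SubnetId": "subnet-0123456789abcdef0",  # placeholder subnet
            "InterfaceType": "efa",                  # attach an Elastic Fabric Adapter
            "Groups": ["sg-0123456789abcdef0"],      # placeholder security group
        }
    ],
)
```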

Storage

Instance Attached Storage

  • Amazon Elastic Block Store (EBS): use Provisioned IOPS SSD volumes (io1/io2) for the best performance (see the sketch after this list)
  • Instance Store: Provides millions of IOPS and low latency
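For example, provisioning and attaching an io2 volume with boto3 might look like the following; the size, IOPS figure and instance ID are arbitrary placeholders, not recommendations.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a Provisioned IOPS SSD (io2) volume.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    VolumeType="io2",      # Provisioned IOPS SSD
    Size=500,              # GiB
    Iops=16000,            # provisioned IOPS
)

# Wait until the volume is ready, then attach it to an existing instance.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Device="/dev/sdf",
)
```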

Network Attached Storage

  • Amazon S3: distributed, object-based storage; not a file system
  • Amazon EFS: managed Network File System (NFS); use provisioned throughput for consistent performance
  • FSx for Lustre: an HPC-optimized distributed file system with millions of IOPS, which can be backed by S3 (see the sketch below)
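Here is a sketch of creating an S3-backed FSx for Lustre file system with boto3. The capacity, deployment type, subnet, security group and bucket names are assumptions chosen for illustration.

```python
import boto3

fsx = boto3.client("fsx", region_name="us-east-1")

# Create a scratch Lustre file system linked to an S3 bucket (placeholders throughout).
fs = fsx.create_file_system(
    FileSystemType="LUSTRE",
    StorageCapacity=1200,                        # GiB
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    LustreConfiguration={
        "DeploymentType": "SCRATCH_2",
        "ImportPath": "s3://my-hpc-dataset",     # lazy-load objects from S3
        "ExportPath": "s3://my-hpc-dataset/results",
    },
)
print(fs["FileSystem"]["FileSystemId"])
```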

Automation

  • AWS Batch: runs multi-node parallel jobs that span multiple EC2 instances, handles scheduling for you, and can scale to hundreds of thousands of batch computing jobs (a job-submission sketch follows this list).
  • AWS ParallelCluster: automates creation of the VPC, subnets, cluster type, instance types and so on from a simple text configuration file that is easy to manage and deploy.
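Assuming a job queue and job definition already exist (the names below are placeholders), submitting work to AWS Batch from boto3 looks roughly like this:

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")

# Submit a job to an existing queue; queue, definition and command are placeholders.
response = batch.submit_job(
    jobName="hpc-simulation-001",
    jobQueue="hpc-job-queue",
    jobDefinition="hpc-simulation:1",
    containerOverrides={
        "command": ["python", "run_simulation.py", "--steps", "10000"],
    },
)
print(response["jobId"])
```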
