OCP on AWS - Using Instance Disks for ephemeral storage
This document describes the steps used to evaluate the performance of different disks on EC2 instances in AWS. The disk types include ephemeral (local instance store) disks and the EBS volume types gp2, gp3, io1 and io2.
The tool used is FIO, and the intention is to stress the disks, using the burst IO balance of gp2 to define the total run time of the tests. For example, if a 200 GiB gp2 EBS volume takes 20 minutes to consume all of its burst balance under the stress test, we will run the other disks, which do not have that limitation, for the same amount of time plus 5 minutes.
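For reference, the gp2 burst window can be roughly estimated from the published burst model (5.4 million I/O credits, baseline of 3 IOPS per GiB, burst ceiling of 3,000 IOPS). A small sketch to get a ballpark figure before running FIO; the value observed under real load may differ:
# Approximate time a gp2 volume can sustain the 3,000 IOPS burst
# starting from a full credit bucket (AWS gp2 burst model).
SIZE_GIB=200
BASELINE=$(( SIZE_GIB * 3 ))   # gp2 baseline IOPS is 3 IOPS per GiB
echo "estimated burst window: $(( 5400000 / (3000 - BASELINE) / 60 )) minutes"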
Table Of Contents:
- Create the environment
- Create the MachineConfig
- Create the MachineSet with Instance Type with ephemeral storage
- Create the MachineSet with extra EBS with type gp2
- Create the MachineSet with extra EBS with type gp3
- Create the MachineSet with extra EBS with type io1
- Create the MachineSet with extra EBS with type io2
- Run the Benchmark
- Analyse the results
- Review
Create the environment
Create MachineConfig
Steps to create the MachineConfig to mount the extra device.
TODO
Create MachineSet for ephemeral disk Instance
TODO
export INSTANCE_TYPE="m6id.xlarge"
create_machineset
Create MachineSet for gp2 disk Instance
TODO
export INSTANCE_TYPE="m6i.xlarge"
export EXTRA_BLOCK_DEVICES="
- deviceName: /dev/xvdb
ebs:
volumeType: gp2
volumeSize: 230
"
create_machineset
Create MachineSet for gp3 disk Instance
TODO
export INSTANCE_TYPE="m6i.xlarge"
export EXTRA_BLOCK_DEVICES="
- deviceName: /dev/xvdb
ebs:
volumeType: gp3
volumeSize: 230
"
create_machineset
Create MachineSet for io1 disk Instance
TODO
export INSTANCE_TYPE="m6i.xlarge"
export EXTRA_BLOCK_DEVICES="
- deviceName: /dev/xvdb
ebs:
volumeType: io1
volumeSize: 230
iops: 3000
"
create_machineset
Create MachineSet for io2 disk Instance
TODO
export INSTANCE_TYPE="m6i.xlarge"
export EXTRA_BLOCK_DEVICES="
- deviceName: /dev/xvdb
ebs:
volumeType: io2
volumeSize: 230
iops: 3000
"
create_machineset
Run the benchmark
TODO
Analyse the Results
TODO
Review
TODO
Results
Costs
References
TODO
Create the MachineConfig
The MachineConfig should create the systemd units to:
- create the filesystem on the new device
- mount the device on the path /var/lib/containers
- restore the SELinux context
Steps:
- Export the device name presented to your instance for the ephemeral device (in general nvme1n1, exposed as /dev/nvme1n1):
export DEVICE_NAME=nvme1n1
- Create the MachineConfig manifest
cat <<EOF | envsubst | oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
labels:
machineconfiguration.openshift.io/role: worker
name: 98-var-lib-containers
spec:
config:
ignition:
version: 3.1.0
systemd:
units:
- contents: |
[Unit]
Description=Make File System on /dev/${DEVICE_NAME}
DefaultDependencies=no
BindsTo=dev-${DEVICE_NAME}.device
After=dev-${DEVICE_NAME}.device var.mount
Before=systemd-fsck@dev-${DEVICE_NAME}.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=-/bin/bash -c "/bin/rm -rf /var/lib/containers/*"
ExecStart=/usr/lib/systemd/systemd-makefs xfs /dev/${DEVICE_NAME}
TimeoutSec=0
[Install]
WantedBy=var-lib-containers.mount
enabled: true
name: systemd-mkfs@dev-${DEVICE_NAME}.service
- contents: |
[Unit]
Description=Mount /dev/${DEVICE_NAME} to /var/lib/containers
Before=local-fs.target
Requires=systemd-mkfs@dev-${DEVICE_NAME}.service
After=systemd-mkfs@dev-${DEVICE_NAME}.service
[Mount]
What=/dev/${DEVICE_NAME}
Where=/var/lib/containers
Type=xfs
Options=defaults,prjquota
[Install]
WantedBy=local-fs.target
enabled: true
name: var-lib-containers.mount
- contents: |
[Unit]
Description=Restore recursive SELinux security contexts
DefaultDependencies=no
After=var-lib-containers.mount
Before=crio.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/restorecon -R /var/lib/containers/
TimeoutSec=0
[Install]
WantedBy=multi-user.target graphical.target
enabled: true
name: restorecon-var-lib-containers.service
EOF
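Before moving on, you can confirm the MachineConfig was accepted and watch the worker MachineConfigPool roll it out:
oc get machineconfig 98-var-lib-containers
oc get machineconfigpool worker -w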
Create the MachineSet
The second step is to create the MachineSet to launch the instances with ephemeral disks available. You should choose one from the AWS offering. In general, instance types with ephemeral disks end the family part of the type name with the letter "d"; for example, the Compute optimized family (c), 6th generation, with Intel processors (i) and ephemeral storage is the c6id type.
In my case I will use the instance type and size c6id.xlarge, which provides ephemeral storage of 237 GB NVMe SSD.
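If you want to compare the instance store sizes offered by other instance types, the AWS CLI can list them (assuming the CLI is configured for your account; the query prints only the type and the total instance storage in GB):
aws ec2 describe-instance-types \
  --filters "Name=instance-storage-supported,Values=true" \
  --query "InstanceTypes[].[InstanceType, InstanceStorageInfo.TotalSizeInGB]" \
  --output table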
export INSTANCE_TYPE=c6id.xlarge
Get the CLUSTER_ID:
export CLUSTER_ID="$(oc get infrastructure cluster \
-o jsonpath='{.status.infrastructureName}')"
Create the MachineSet:
create_machineset() {
# Required environment variables:
## DISK_TYPE : Used to create the node label and name suffix of MachineSet
## CLUSTER_ID : Can get from infrastructure object
## INSTANCE_TYPE : InstanceType
# Optional environment variables:
## EXTRA_BLOCK_DEVICES : Extra block device (EBS) definition to be appended (default: '')
## AWS_REGION : AWS Region (default: us-east-1)
## AWS_ZONE : Availability Zone part of AWS_REGION (default: us-east-1a)
cat <<EOF | envsubst | oc create -f -
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
labels:
machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
name: ${CLUSTER_ID}-worker-${DISK_TYPE}
namespace: openshift-machine-api
spec:
replicas: 0
selector:
matchLabels:
machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
machine.openshift.io/cluster-api-machineset: ${CLUSTER_ID}-worker-${DISK_TYPE}
template:
metadata:
labels:
machine.openshift.io/cluster-api-cluster: ${CLUSTER_ID}
machine.openshift.io/cluster-api-machine-role: worker
machine.openshift.io/cluster-api-machine-type: worker
machine.openshift.io/cluster-api-machineset: ${CLUSTER_ID}-worker-${DISK_TYPE}
spec:
metadata:
labels:
disk_type: "${DISK_TYPE}"
providerSpec:
value:
ami:
id: ami-0722eb0819717090f
apiVersion: machine.openshift.io/v1beta1
blockDevices:
- ebs:
encrypted: true
iops: 0
kmsKey:
arn: ""
volumeSize: 120
volumeType: gp3
${EXTRA_BLOCK_DEVICES:-}
credentialsSecret:
name: aws-cloud-credentials
deviceIndex: 0
iamInstanceProfile:
id: ${CLUSTER_ID}-worker-profile
instanceType: ${INSTANCE_TYPE}
kind: AWSMachineProviderConfig
placement:
availabilityZone: ${AWS_ZONE:-us-east-1a}
region: ${AWS_REGION:-us-east-1}
securityGroups:
- filters:
- name: tag:Name
values:
- ${CLUSTER_ID}-worker-sg
subnet:
filters:
- name: tag:Name
values:
- ${CLUSTER_ID}-private-${AWS_ZONE:-us-east-1a}
tags:
- name: kubernetes.io/cluster/${CLUSTER_ID}
value: owned
userDataSecret:
name: worker-user-data
EOF
}
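The MachineSet is created with replicas: 0, so export the required variables, run the function, and then scale it up. For example, for the ephemeral instance type chosen above:
export DISK_TYPE=ephemeral
export INSTANCE_TYPE=c6id.xlarge
create_machineset
oc scale machineset ${CLUSTER_ID}-worker-${DISK_TYPE} \
  --replicas=1 -n openshift-machine-api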
Wait for the node to be created:
oc get node -l disk_type=ephemeral -w
Make sure the device has been mounted correctly on the mount path /var/lib/containers:
oc debug node/$(oc get nodes -l disk_type=ephemeral -o jsonpath='{.items[0].metadata.name}') -- chroot /host /bin/bash -c "df -h /var/lib/containers"
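If the mount is missing, check the block devices and the systemd units created by the MachineConfig directly on the node (the unit names below match the MachineConfig created earlier, with DEVICE_NAME=nvme1n1):
oc debug node/$(oc get nodes -l disk_type=ephemeral -o jsonpath='{.items[0].metadata.name}') -- \
  chroot /host /bin/bash -c "lsblk; systemctl status systemd-mkfs@dev-nvme1n1.service var-lib-containers.mount"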
Review
Running fio-etcd
We will start with the quick FIO test from the container image commonly [used on OpenShift to evaluate disks for etcd](https://access.redhat.com/articles/6271341).
export label_disk=ephemeral
export node_name=$(oc get nodes -l disk_type=${label_disk} -o jsonpath='{.items[0].metadata.name}')
export base_path="/var/lib/containers/_benchmark_fio"
Run quick FIO test (used for etcd):
- Running on ephemeral device
export disk_type=ephemeral
export base_path="/var/lib/containers/_benchmark_fio"
oc debug node/${node_name} -- chroot /host /bin/bash -c \
"mkdir -p ${base_path}; podman run --volume ${base_path}:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf" > ./results-${disk_type}-fio_etcd.txt
- Running on the root volume (EBS):
export disk_type=ebs
export base_path="/var/lib/misc/_benchmark_fio"
oc debug node/${node_name} -- chroot /host /bin/bash -c \
"mkdir -p ${base_path}; podman run --volume ${base_path}:/var/lib/etcd:Z quay.io/openshift-scale/etcd-perf" > ./results-${disk_type}-fio_etcd.txt
Running stress test with FIO
Run the stress FIO test with the parameters recommended in the AWS documentation for General Purpose (gp) volumes. Set log_stdout to a local file to capture the console output, for example:
export log_stdout=./results-fio_stress-${disk_type}-${node_name}.log
oc debug node/${node_name} -- chroot /host /bin/bash -c \
"echo \"[0] <=> \$(hostname) <=> \$(date) <=> \$(uptime) \"; \
lsblk; \
mkdir -p ${base_path}; \
for offset in {1..2} ; do \
echo \"Running [\$offset]\"; \
podman run --rm \
-v ${base_path}:/benchmark:Z \
ljishen/fio \
--ioengine=psync \
--rw=randwrite \
--direct=1 \
--bs=16k \
--size=1G \
--numjobs=5 \
--time_based \
--runtime=60 \
--group_reporting \
--norandommap \
--directory=/benchmark \
--name=data_${disk_type}_\${offset} \
--output-format=json \
--output=/benchmark/result_\$(hostname)-${disk_type}-\${offset}.json ;\
sleep 10; \
rm -f ${base_path}/data_${disk_type}_* ||true ; \
echo \"[\$offset] <=> \$(hostname) <=> \$(date) <=> \$(uptime) \"; \
done; \
tar cfz /tmp/benchmark-${disk_type}.tar.gz ${base_path}*/*.json" \
2>/dev/null | tee -a ${log_stdout}
oc debug node/${node_name} -- chroot /host /bin/bash -c \
"cat /tmp/benchmark-${disk_type}.tar.gz" \
2>/dev/null > ./results-fio_stress-${disk_type}-${node_name}.tar.gz
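The raw numbers for the "Analyse the results" step live in the JSON files inside the tarball. A minimal sketch with jq, assuming the default fio JSON layout (with --group_reporting the five jobs are aggregated into a single entry); it prints the job name, write IOPS and the 99th percentile completion latency in nanoseconds:
# Extract the results and print one TSV row per FIO run
mkdir -p ./results-${disk_type}
tar xzf ./results-fio_stress-${disk_type}-${node_name}.tar.gz -C ./results-${disk_type}
find ./results-${disk_type} -name 'result_*.json' | while read -r f; do
  jq -r '.jobs[] | [.jobname, .write.iops, .write.clat_ns.percentile."99.000000"] | @tsv' "$f"
done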