CC Open Source Blog

NVMe on Debian on AWS

by Timid Robot Zehta on 2020-04-03

Problem

The current Creative Commons infrastructure buildouts use Debian GNU/Linux AWS EC2 instances with EBS volumes. Depending on chance (or race conditions), the mapping of block devices can be different from one host to another or between reboots.

Occasionally, devices can respond to discovery in a different order in subsequent instance starts, which causes the device name to change. (Amazon EBS and NVMe on Linux Instances - Amazon Elastic Compute Cloud)

Our Solution

Modern Amazon Linux AMIs resolve this by providing a udev rule, but Debian GNU/Linux does not yet do this. To ensure our systems are configured correctly, at Creative Commons we use the device specified during provisioning (ex. /dev/xvdf) to identify the correct NVMe device. We then format it with a label that can be used for mounting during subsequent reboots.

Thankfully, AWS documents where the device specified during provisioning (ex. /dev/xvdf) is recorded:

For Nitro-based instances, the block device mappings that are specified in the Amazon EC2 console when you are attaching an EBS volume or during AttachVolume or RunInstances API calls are captured in the vendor-specific data field of the NVMe controller identification. (Amazon EBS and NVMe on Linux Instances - Amazon Elastic Compute Cloud)

We use SaltStack (creativecommons/sre-salt-prime) to:

  1. Install the nvme-cli package
  2. Use the nvme command to detect which /dev/nvme?n? contains the spec (ex. xvdf) in the NVMe vendor-specific data
  3. Create a symlink (ex. /dev/xvdf -> /dev/nvme1n1) so that SaltStack can use /dev/xvdf for the initial setup
  4. Perform the initial setup
  5. Delete the symlink since:
    1. The initial setup formatted the volume with a label that is used to mount the filesystem
    2. There is no guarantee the symlink will be accurate on subsequent reboots and it might cause confusion
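
Step 5.1 works because a filesystem label stays with the volume regardless of which /dev/nvme?n? name it receives. As a rough sketch only, assuming an ext4 filesystem, a hypothetical label data-vol, and a hypothetical mount point /srv/data (none of these names come from our Salt states), an /etc/fstab entry keyed on the label could be generated like this:

```shell
# Hypothetical helper: print an /etc/fstab entry that mounts by label,
# so the entry stays valid even if the /dev/nvme?n? name changes.
fstab_entry_for_label() {
    label="$1"        # filesystem label set at format time (assumed name)
    mount_point="$2"  # where the volume should be mounted (assumed path)
    fs_type="$3"      # filesystem type, e.g. ext4 (assumed)
    printf 'LABEL=%s %s %s defaults,nofail 0 2\n' \
        "${label}" "${mount_point}" "${fs_type}"
}

# Prints: LABEL=data-vol /srv/data ext4 defaults,nofail 0 2
fstab_entry_for_label data-vol /srv/data ext4
```

The label itself would be set when the volume is formatted during the initial setup (for ext4, mkfs.ext4 accepts a label via -L).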

The states/mount/init.sls state includes a complex shell command (with Jinja2 variables) that loops through the NVMe devices and finds the correct one:

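# Check every NVMe namespace for the provisioning spec
for n in /dev/nvme?n?
do
    # The first vendor-specific data line (offset 0000) holds the spec
    if nvme id-ctrl -v "${n}" | grep -q '^0000:.*{{ spec_short }}'
    then
        # Temporary symlink so the initial setup can use the spec path
        ln -s "${n}" {{ spec_long }}
    fi
done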

Example variable values:

Jinja2 Variable      Example Value
{{ spec_short }}     xvdf
{{ spec_long }}      /dev/xvdf
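
To make the match easier to see without an EC2 host, here is a small sketch of the grep test on its own. The matches_spec helper is hypothetical, and the sample line is illustrative (shaped like the hex dump that nvme id-ctrl -v prints for the vendor-specific data), not output captured from a real instance:

```shell
# Hypothetical helper: succeeds when the id-ctrl text on stdin contains
# the spec on the first vendor-specific data line (offset 0000).
matches_spec() {
    grep -q "^0000:.*$1"
}

# Illustrative sample line; the hex bytes spell "/dev/xvdf" padded with spaces
sample='0000: 2f 64 65 76 2f 78 76 64 66 20 20 20 20 20 20 20 "/dev/xvdf......."'

if printf '%s\n' "${sample}" | matches_spec xvdf
then
    echo "match"
fi
```

Because the pattern is anchored to the 0000 offset, only the start of the vendor-specific data is checked and the rest of the identify output is ignored.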

Other Solutions

While doing additional research for this blog post, I found additional solutions to the same problem. They're all good, but I appreciate the simplicity of a temporary symlink for setup versus maintaining custom udev rules (maybe I can help contribute a udev-based solution to Debian or Debian's EC2 image). I can also easily imagine a more complex solution being a better fit if/when our infrastructure provisioning becomes more complex.