12. Best practices

12.1. Other best practices information

This isn’t the last word. Also consider:

12.2. Filesystems

There are two performance gotchas to be aware of for Charliecloud.

12.2.1. Metadata traffic

Directory-format container images and the Charliecloud storage directory often contain, and thus Charliecloud must manipulate, a very large number of files. For example, after running the test suite, the storage directory contains almost 140,000 files. That is, metadata traffic can be quite high.
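To get a feel for the scale involved, a short Python sketch like the one below can count the entries under a directory, such as an unpacked image. The name count_entries is ours for illustration; it is not part of Charliecloud.

```python
import os

def count_entries(root):
    """Count files and directories under root without following symlinks."""
    n_files = 0
    n_dirs = 0
    for _dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Each entry costs at least one metadata operation when the tree
        # is created, copied, or deleted.
        n_dirs += len(dirnames)
        n_files += len(filenames)
    return n_files, n_dirs

# Example (the path is hypothetical):
#   files, dirs = count_entries("/var/tmp/hello")
#   print(f"{files} files, {dirs} directories")
```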

Such images and the storage directory should be stored on a filesystem with reasonable metadata performance. Notably, this excludes Lustre, which is commonly used for scratch filesystems in HPC; i.e., don’t store these things on Lustre. NFS is usually fine, though in general it performs worse than a local filesystem.
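One way to check which filesystem a directory lives on is to look it up in /proc/self/mounts. The sketch below assumes the usual Linux layout of that file (device, mount point, type, options) and that Lustre mounts report the type lustre. fstype_for is a hypothetical helper, not a Charliecloud function.

```python
import os

def fstype_for(path, mounts_text):
    """Return the filesystem type of the mount that contains path.

    mounts_text is the content of /proc/self/mounts. The longest matching
    mount point wins; among equal mount points, the last one listed wins.
    """
    path = os.path.realpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # The kernel escapes spaces in mount points as \040.
        mount_point = fields[1].replace("\\040", " ")
        fs_type = fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type

# Example:
#   with open("/proc/self/mounts") as f:
#       if fstype_for("/var/tmp", f.read()) == "lustre":
#           print("consider a different location or a SquashFS image")
```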

In contrast, SquashFS images, which encapsulate the image into a single file that is mounted using FUSE at runtime, insulate the filesystem from this metadata traffic. Images in this format are suitable for any filesystem, including Lustre.

12.2.2. File copy performance

ch-image does a lot of file copying. The bulk of this is manipulating images in the storage directory. Importantly, this includes large files stored by the build cache outside its Git repository, though this feature is disabled by default.

Copies are costly both in time (to read, transfer, and write the duplicate bytes) and space (to store the bytes). However, significant optimizations are sometimes available. Charliecloud’s internal file copies (but unfortunately not those made by sub-programs like Git) can take advantage of several optimized file-copy paths offered by Linux:

in-kernel copy

Copy data inside the kernel without passing through user-space. Saves time but not space.

server-side copy

Copy data on the server without sending it over the network, relevant only for network filesystems. Saves time but not space.

reflink copy (best)

Copy-on-write via “reflink”. The destination file gets a new inode but shares the data extents of the source file (i.e., no data are copied!), with extents unshared later if and when they are written. Saves both time and space, potentially quite a lot.

To use these optimizations, you need:

  1. Python ≥3.8, for os.copy_file_range(), which wraps copy_file_range(2), which in turn selects the best of the three methods above.

  2. A new-ish Linux kernel (details vary).

  3. The right filesystem.
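As a minimal sketch of what such a copy can look like (this is not Charliecloud’s actual code, and copy_file is a name we made up), the following tries os.copy_file_range() and falls back to shutil.copyfile() when the call is missing or the kernel or filesystem rejects it:

```python
import os
import shutil

def copy_file(src_path, dst_path, chunk=2**30):
    """Copy src_path to dst_path, preferring os.copy_file_range().

    Returns "copy_file_range" or "fallback" to say which path was used.
    """
    try:
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            remaining = os.fstat(src.fileno()).st_size
            while remaining > 0:
                # The kernel may copy fewer bytes than requested, so loop.
                n = os.copy_file_range(src.fileno(), dst.fileno(),
                                       min(remaining, chunk))
                if n == 0:
                    raise OSError("copy_file_range made no progress")
                remaining -= n
        return "copy_file_range"
    except (AttributeError, OSError):
        # Python < 3.8, non-Linux platform, old kernel, or a filesystem
        # combination the kernel refuses: start over with an ordinary copy.
        shutil.copyfile(src_path, dst_path)
        return "fallback"
```

Note that the caller cannot tell from the return value which of the three kernel methods was chosen; that decision is made inside copy_file_range(2).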

The following table summarizes our (possibly incorrect) understanding of filesystem support as of October 2023. For current or historical information, see the Linux source code for in-kernel filesystems or specific filesystem release notes, e.g. ZFS. A checkmark ✅ indicates supported, ❌ unsupported. We recommend using a filesystem that supports reflink and also (if applicable) server-side copy.

                              in-kernel   server-side   reflink (best)

local filesystems
  BTRFS                                   n/a
  OCFS2                                   n/a
  XFS                                     n/a
  ZFS                                     n/a           ✅ [1]

network filesystems
  CIFS/SMB                                              ?
  NFSv3
  NFSv4                                                 ✅ [2]

other situations
  filesystems not listed
  copies between filesystems  ❌ [3]

Notes:

  1. As of ZFS 2.2.0.

  2. If the underlying exported filesystem also supports reflink.

  3. Recent kernels (≥5.18 as well as stable kernels if backported) support in-kernel file copy between filesystems, but for many kernels it is not stable, so Charliecloud does not currently attempt it.
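If you are unsure whether a given directory’s filesystem supports reflink, one option is simply to try it. The sketch below uses the FICLONE ioctl directly, which is our choice for illustration rather than something Charliecloud is documented to do. The constant shown is the value on x86-64 and aarch64 (an assumption worth checking on other architectures), and supports_reflink is a hypothetical helper.

```python
import fcntl
import tempfile

# From <linux/fs.h>: FICLONE is _IOW(0x94, 9, int). This numeric value
# holds on x86-64 and aarch64; other architectures may encode it differently.
FICLONE = 0x40049409

def supports_reflink(directory):
    """Return True if a file in directory can be reflinked to a sibling."""
    with tempfile.NamedTemporaryFile(dir=directory) as src, \
         tempfile.NamedTemporaryFile(dir=directory) as dst:
        src.write(b"x" * 4096)
        src.flush()
        try:
            # Ask the kernel to make dst share the extents of src.
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
        except OSError:
            # Typically "operation not supported" or "invalid argument".
            return False
        return True

# Example (the path is hypothetical):
#   print(supports_reflink("/var/tmp"))
```

A False result only says that the probe failed in that directory; it does not distinguish between an old kernel and an unsupported filesystem.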

12.3. Installing your own software

This section covers four situations for making software available inside a Charliecloud container:

  1. Third-party software installed into the image using a package manager.

  2. Third-party software compiled from source into the image.

  3. Your software installed into the image.

  4. Your software stored on the host but compiled in the container.

Note

Maybe you don’t have to install the software at all. Is there already a trustworthy image on Docker Hub you can use as a base?

12.3.1. Third-party software via package manager

This approach is the simplest and fastest way to install software in your image. The examples/hello Dockerfile does this to install the package openssh-clients:

RUN dnf install -y --setopt=install_weak_deps=false openssh-clients \
 && dnf clean all

COPY . hello

You can use distribution package managers such as dnf, as demonstrated above, or others, such as pip for Python packages. Be aware that the software will be downloaded anew each time you execute the instruction (unless you add an HTTP cache, which is out of scope of this documentation).

Note

RPM and friends (yum, dnf, etc.) have traditionally been rather troublesome in containers, and we suspect there are bugs we haven’t ironed out yet. If you encounter problems, please do file a bug!

12.3.2. Third-party software compiled from source

Under this method, one uses RUN commands to fetch the desired software using curl or wget, compile it, and install it. Our example (examples/Dockerfile.almalinux_8ch) does this with ImageMagick:

FROM almalinux:8

# This image has three purposes: (1) demonstrate we can build an AlmaLinux 8
# image, (2) provide a build environment for Charliecloud EPEL 8 RPMs, and (3)
# provide image packages necessary for Obspy and Paraview.
#
# Quirks:
#
#   1. Install the dnf ovl plugin to work around RPMDB corruption when
#      building images with Docker and the OverlayFS storage driver.
#
#   2. Enable PowerTools repo, because some packages in EPEL depend on it.
#
#   3. Install packages needed to build el8 rpms.
#
#   4. Issue #1103: Install libarchive to resolve cmake bug
#
#   5. AlmaLinux lost their GPG key, so manual intervention is required to
#      install current packages [1].
#
# [1]: https://almalinux.org/blog/2023-12-20-almalinux-8-key-update/
RUN rpm --import https://repo.almalinux.org/almalinux/RPM-GPG-KEY-AlmaLinux
RUN dnf install -y --setopt=install_weak_deps=false \
                epel-release \
                'dnf-command(config-manager)'
RUN dnf config-manager --enable powertools
RUN dnf install -y --setopt=install_weak_deps=false \
                dnf-plugin-ovl \
                autoconf \
                automake \
                gcc \
                git \
                libarchive \
                libpng-devel \
                make \
                python3 \
                python3-devel \
                python3-lark-parser \
                python3-requests \
                python3-sphinx \
                python3-sphinx_rtd_theme \
                rpm-build \
                rpmlint \
                rsync \
                squashfs-tools \
                squashfuse \
                wget \
                which \
 && dnf clean all

# Need wheel to install bundled Lark, and the RPM version doesn’t work.
RUN pip3 install wheel

# AlmaLinux's linker doesn’t search these paths by default; add them because we
# will install stuff later into /usr/local.
RUN echo "/usr/local/lib" > /etc/ld.so.conf.d/usrlocal.conf \
 && echo "/usr/local/lib64" >> /etc/ld.so.conf.d/usrlocal.conf \
 && ldconfig

# Install ImageMagick
# The latest, 7.1.0, fails to install with a cryptic libtool error. ¯\_(ツ)_/¯
ARG MAGICK_VERSION=7.0.11-14
RUN wget -nv -O ImageMagick-${MAGICK_VERSION}.tar.gz \
    "https://github.com/ImageMagick/ImageMagick/archive/refs/tags/${MAGICK_VERSION}.tar.gz" \
 && tar xf ImageMagick-${MAGICK_VERSION}.tar.gz \
 && cd ImageMagick-${MAGICK_VERSION} \
 && ./configure --prefix=/usr/local \
 && make -j $(getconf _NPROCESSORS_ONLN) install \
 && rm -Rf ../ImageMagick-${MAGICK_VERSION}

# Add mount points for files and directories for paraview and obspy comparison
# tests.
RUN mkdir /diff \
 && echo "example bind mount file" > /a.png \
 && echo "example bind mount file" > /b.png

So what is going on here?

  1. Use the latest AlmaLinux 8 as the base image.

  2. Install some packages using dnf, the OS package manager, including a basic development environment.

  3. Install wheel using pip and adjust the shared library configuration. (These are not needed for ImageMagick but rather support derived images.)

  4. For ImageMagick itself:

    1. Download and untar. Note the use of the variable MAGICK_VERSION, which makes changing versions easier.

    2. Build and install. Note the getconf trick to guess at an appropriate level of build parallelism.

    3. Clean up, in order to reduce the size of the build cache as well as the resulting Charliecloud image (rm -Rf).
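The getconf trick reports the processors online on the whole node. Inside a job allocation or cpuset, the build may be allowed fewer CPUs than that. As a rough illustration (this helper is ours and is not used by the example Dockerfile), the following Python sketch takes the smaller of the two counts:

```python
import os

def build_jobs():
    """Suggest a value for make -j."""
    # Same number that "getconf _NPROCESSORS_ONLN" prints.
    online = os.sysconf("SC_NPROCESSORS_ONLN")
    try:
        # CPUs this process is actually allowed to run on (Linux only).
        usable = len(os.sched_getaffinity(0))
    except AttributeError:
        usable = online
    return max(1, min(online, usable))

# Example:
#   print(f"make -j {build_jobs()}")
```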

Note

Because it’s a container image, you can be less tidy than you might normally be. For example, we install ImageMagick directly into /usr/local rather than using something like GNU Stow to organize this directory tree.

12.3.3. Your software stored in the image

This method covers software provided by you that is included in the image. This is recommended when your software is relatively stable or is not easily available to users of your image, for example a library rather than simulation code under active development.

The general approach is the same as installing third-party software from source, but you use the COPY instruction to transfer files from the host filesystem (rather than the network via HTTP) to the image. For example, examples/mpihello/Dockerfile.openmpi uses this approach:

# ch-test-scope: full
FROM openmpi

# This example
COPY . /hello
WORKDIR /hello
RUN make clean && make

These Dockerfile instructions:

  1. Copy the host directory examples/mpihello to the image at path /hello. The host path is relative to the context directory, which is tarred up and sent to the Docker daemon. Docker builds have no access to the host filesystem outside the context directory.

    (Unlike HPC, Docker comes from a world without network filesystems. This tar-based approach lets the Docker daemon run on a different node from the client without needing any shared filesystems.)

    The usual convention, including for Charliecloud tests and examples, is that the context is the directory containing the Dockerfile in question. A common pattern, used here, is to copy in the entire context.

  2. cd to /hello.

  3. Compile our example. We include make clean to remove any leftover build files, since they would be inappropriate inside the container.
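To illustrate why only files under the context directory are available to COPY, here is a small Python sketch of the tar-the-context idea. It is not Docker’s or Charliecloud’s implementation, and the function names are made up for this example.

```python
import io
import os
import tarfile

def pack_context(context_dir):
    """Return an in-memory tar archive of the contents of context_dir."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for entry in sorted(os.listdir(context_dir)):
            # arcname makes member names relative to the context directory.
            tar.add(os.path.join(context_dir, entry), arcname=entry)
    buf.seek(0)
    return buf

def context_members(context_dir):
    """List the paths a build could COPY from, relative to the context."""
    with tarfile.open(fileobj=pack_context(context_dir), mode="r") as tar:
        return sorted(tar.getnames())

# Example (run from the directory containing the Dockerfile):
#   print(context_members("."))
```

Anything that does not appear in the member list, such as a file in the parent directory, never reaches the builder and so cannot be copied into the image.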

Once the image is built, we can see the results. (Install the image into /var/tmp as outlined in the tutorial, if you haven’t already.)

$ ch-run /var/tmp/mpihello-openmpi.sqfs -- ls -lh /hello
total 32K
-rw-rw---- 1 charlie charlie  908 Oct  4 15:52 Dockerfile
-rw-rw---- 1 charlie charlie  157 Aug  5 22:37 Makefile
-rw-rw---- 1 charlie charlie 1.2K Aug  5 22:37 README
-rwxr-x--- 1 charlie charlie 9.5K Oct  4 15:58 hello
-rw-rw---- 1 charlie charlie 1.4K Aug  5 22:37 hello.c
-rwxrwx--- 1 charlie charlie  441 Aug  5 22:37 test.sh

12.3.4. Your software stored on the host

This method leaves your software on the host but compiles it in the image. This is recommended when your software is volatile or each image user needs a different version, for example a simulation code under active development.

The general approach is to bind-mount the appropriate directory and then run the build inside the container. We can re-use the mpihello image to demonstrate this.

$ cd examples/mpihello
$ ls -l
total 20
-rw-rw---- 1 charlie charlie  908 Oct  4 09:52 Dockerfile
-rw-rw---- 1 charlie charlie 1431 Aug  5 16:37 hello.c
-rw-rw---- 1 charlie charlie  157 Aug  5 16:37 Makefile
-rw-rw---- 1 charlie charlie 1172 Aug  5 16:37 README
$ ch-run -b .:/mnt/0 --cd /mnt/0 /var/tmp/mpihello.sqfs -- \
  make
mpicc -std=gnu11 -Wall hello.c -o hello
$ ls -l
total 32
-rw-rw---- 1 charlie charlie  908 Oct  4 09:52 Dockerfile
-rwxrwx--- 1 charlie charlie 9632 Oct  4 10:43 hello
-rw-rw---- 1 charlie charlie 1431 Aug  5 16:37 hello.c
-rw-rw---- 1 charlie charlie  157 Aug  5 16:37 Makefile
-rw-rw---- 1 charlie charlie 1172 Aug  5 16:37 README

A common use case is to leave a container shell open in one terminal for building, and then run using a separate container invoked from a different terminal.
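If you rebuild often, the bind-mount invocation above is easy to script. The following Python sketch only assembles the argument list, using the same ch-run options as the example (-b and --cd). The function name and its defaults are hypothetical.

```python
import shlex

def build_in_container_argv(image, source_dir=".", mount_point="/mnt/0",
                            command=("make",)):
    """Return the ch-run argument list for building source_dir in image."""
    return [
        "ch-run",
        "-b", f"{source_dir}:{mount_point}",  # bind-mount the host source
        "--cd", mount_point,                  # start in the mounted directory
        image,
        "--",
        *command,
    ]

# Example (the image path is the one used in the listing above):
#   argv = build_in_container_argv("/var/tmp/mpihello.sqfs")
#   print(shlex.join(argv))
#   subprocess.run(argv, check=True)   # needs "import subprocess"
```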