3. Charliecloud command reference

This section is a comprehensive description of the usage and arguments of the Charliecloud commands. Its content is identical to the commands’ man pages.

3.1. ch-build

Build an image and place it in the builder’s back-end storage.

3.1.1. Synopsis

$ ch-build [-b BUILDER] [--builder-info] -t TAG [ARGS ...] CONTEXT

3.1.2. Description

Build an image named TAG described by a Dockerfile. Place the result into the builder’s back-end storage.

Using this script is not required for a working Charliecloud image. You can also use any builder that can produce a Linux filesystem tree directly, whether or not it is in the list below. However, this script hides the vagaries of making the supported builders work smoothly with Charliecloud and adds some conveniences (e.g., pass HTTP proxy environment variables to the build environment if the builder doesn’t do this by default).

Supported builders, unprivileged:

  • ch-image: Our internal builder.

Supported builders, privileged:

  • docker: Docker.

Experimental builders (i.e., the code is there but not tested much):

  • buildah: Buildah in “rootless” mode with no setuid helpers, using ch-run (via ch-run-oci) for RUN instructions. This mode is fully unprivileged.

  • buildah-runc: Buildah in “rootless” mode with setuid helpers, using the default runc for RUN instructions.

  • buildah-setuid: Buildah in “rootless” mode with setuid helpers, using ch-run (via ch-run-oci) for RUN instructions.

Specifying the builder, in descending order of priority:

-b, --builder BUILDER

Command line option.

$CH_BUILDER

Environment variable.

Default

docker if Docker is installed; otherwise, ch-image.
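The priority order above can be sketched in a few lines of shell. This is only an illustration of the documented rules, not the code ch-build actually runs; the function name select_builder is hypothetical, and treating an empty $CH_BUILDER the same as an unset one is an assumption.

```shell
#!/bin/sh
# Sketch of the documented priority order; not the actual ch-build code.
# select_builder [CLI_VALUE]: CLI_VALUE stands in for the -b/--builder argument.
select_builder() {
    if [ -n "${1:-}" ]; then
        printf '%s\n' "$1"             # 1. command line option
    elif [ -n "${CH_BUILDER:-}" ]; then
        printf '%s\n' "$CH_BUILDER"    # 2. environment variable
    elif command -v docker >/dev/null 2>&1; then
        printf '%s\n' docker           # 3. default if Docker is installed
    else
        printf '%s\n' ch-image         # 3. default otherwise
    fi
}
```

To see what ch-build itself would choose on a given host, use --builder-info, described below.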

Other arguments:

--builder-info

Print the builder to be used and its version, then exit.

-f, --file DOCKERFILE

Dockerfile to use (default: $CONTEXT/Dockerfile).

-t TAG

Name (tag) of Docker image to build.

--help

Print help and exit.

--version

Print version and exit.

Additional arguments are accepted and passed unchanged to the underlying builder.

3.1.3. Bugs

The tag suffix :latest is somewhat misleading, as by default neither ch-build nor bare builders will notice if the base FROM image has been updated. Use --pull to make sure you have the latest base image.

3.1.4. Examples

Create an image tagged foo and specified by the file Dockerfile located in the context directory. Use /bar as the Docker context directory. Use the default builder.

$ ch-build -t foo /bar

Equivalent to above:

$ ch-build -t foo --file=/bar/Dockerfile /bar

Instead, use /bar/Dockerfile.baz:

$ ch-build -t foo --file=/bar/Dockerfile.baz /bar

Equivalent to the first example, but use ch-image even if Docker is installed:

$ ch-build -b ch-image -t foo /bar

Equivalent to above:

$ export CH_BUILDER=ch-image
$ ch-build -t foo /bar

3.2. ch-build2dir

Build a Charliecloud image from a Dockerfile and unpack it into a directory.

3.2.1. Synopsis

$ ch-build2dir -t TAG [ARGS ...] CONTEXT OUTDIR

3.2.2. Description

Build a Docker image named TAG described by a Dockerfile (default $CONTEXT/Dockerfile) and unpack it into OUTDIR/TAG. This is a wrapper for ch-build, ch-builder2tar, and ch-tar2dir; see also those man pages.

Arguments:

ARGS

additional arguments passed to ch-build

CONTEXT

Docker context directory

OUTDIR

directory in which to place image directory (named TAG) and temporary tarball

-t TAG

name (tag) of Docker image to build

--help

print help and exit

--version

print version and exit

3.2.3. Examples

To build using ./Dockerfile and create image directory /var/tmp/foo:

$ ch-build2dir -t foo . /var/tmp

Same as above, but build with a different Dockerfile:

$ ch-build2dir -t foo -f ./Dockerfile.foo . /var/tmp

3.3. ch-builder2tar

Flatten a builder image into a Charliecloud image tarball.

3.3.1. Synopsis

$ ch-builder2tar [-b BUILDER] [--nocompress] IMAGE OUTDIR

3.3.2. Description

Flatten the builder image tagged IMAGE into a Charliecloud tarball in directory OUTDIR.

The builder-specified environment (e.g., ENV statements) is placed in a file in the tarball at $IMAGE/ch/environment, in a form suitable for ch-run --set-env.

See ch-build(1) for details on specifying the builder.

Additional arguments:

-b, --builder BUILDER

Use specified builder; if not given, use $CH_BUILDER or default.

--nocompress

Do not compress tarball.

--help

Print help and exit.

--version

Print version and exit.

3.3.3. Example

$ ch-builder2tar hello /var/tmp
57M /var/tmp/hello.tar.gz
$ ls -lh /var/tmp
-rw-r-----  1 reidpr reidpr  57M Feb 13 16:14 hello.tar.gz

3.4. ch-checkns

Check ch-run prerequisites, e.g., namespaces and pivot_root(2).

3.4.1. Synopsis

$ ch-checkns

3.4.2. Description

Check ch-run prerequisites, e.g., namespaces and pivot_root(2).

3.4.3. Example

$ ch-checkns
ok

3.5. ch-dir2squash

Create a SquashFS file from an image directory.

3.5.1. Synopsis

$ ch-dir2squash IMGDIR OUTDIR [ARGS ...]

3.5.2. Description

Create a Charliecloud SquashFS file from image directory IMGDIR and place it in directory OUTDIR. The file is named after the last component of IMGDIR, plus the suffix .sqfs.

Optional ARGS will be passed to mksquashfs unchanged.
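When scripting around this command, it can help to compute the output path in advance. The following is a minimal sketch of the naming rule described above; the function name squash_path is hypothetical and the code is not taken from ch-dir2squash itself.

```shell
#!/bin/sh
# squash_path IMGDIR OUTDIR: print the path the SquashFS file is expected
# to have, i.e. OUTDIR plus the last component of IMGDIR plus ".sqfs".
squash_path() {
    printf '%s/%s.sqfs\n' "${2%/}" "$(basename "$1")"
}

squash_path /var/tmp/debian /var/tmp
```

For the arguments used in the example below, this prints /var/tmp/debian.sqfs.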

Additional arguments:

--help

print help and exit

--version

print version and exit

3.5.3. Example

$ ch-dir2squash /var/tmp/debian /var/tmp
Parallel mksquashfs: Using 6 processors
Creating 4.0 filesystem on /var/tmp/debian.sqfs, block size 131072.
[...]
-rw-r--r--  1 charlie charlie 41M Apr 23 14:41 /var/tmp/debian.sqfs

3.6. ch-builder2squash

Flatten a builder image into a Charliecloud SquashFS file.

3.6.1. Synopsis

$ ch-builder2squash [-b BUILDER] IMAGE OUTDIR [ARGS ...]

3.6.2. Description

Flattens the builder image tagged IMAGE into a SquashFS file in OUTDIR.

Wrapper for ch-builder2tar --nocompress and ch-tar2sqfs. Intermediate files and directories are removed.

Sudo privileges are required to run docker export.

Optional ARGS are passed to mksquashfs unchanged.

Additional arguments:

--help

print help and exit

--version

print version and exit

3.6.3. Example

$ docker image list | fgrep debian
REPOSITORY   TAG       IMAGE ID       CREATED      SIZE
debian       stretch   2d337f242f07   3 weeks ago  101MB
$ ch-builder2squash debian /var/tmp
Parallel mksquashfs: Using 6 processors
Creating 4.0 filesystem on /var/tmp/debian.sqfs, block size 131072.
[...]
squashed /var/tmp/debian.sqfs OK
$ ls -lh /var/tmp/debian*
-rw-r--r-- 1 charlie charlie 41M Apr 23 14:37 debian.sqfs

3.7. ch-fromhost

Inject files from the host into an image directory, with various magic.

3.7.1. Synopsis

$ ch-fromhost [OPTION ...] [FILE_OPTION ...] IMGDIR

3.7.2. Description

Note

This command is experimental. Features may be incomplete and/or buggy. Please report any issues you find, so we can fix them!

Inject files from the host into the Charliecloud image directory IMGDIR.

The purpose of this command is to inject files into a container image that are necessary to run the container on a specific host; e.g., GPU libraries that are tied to a specific kernel version. It is not a general copy-to-image tool; see further discussion on use cases below. It should be run after ch-tar2dir and before ch-run. After invocation, the image is no longer portable to other hosts.

Injection is not atomic; if an error occurs partway through injection, the image is left in an undefined state. Injection is currently implemented using a simple file copy, but that may change in the future.

By default, file paths that contain the strings /bin or /sbin are assumed to be executables and placed in /usr/bin within the container. File paths that contain the strings /lib or .so are assumed to be shared libraries and are placed in the first-priority directory reported by ldconfig (see --lib-path below). Other files are placed in the directory specified by --dest.

If any shared libraries are injected, run ldconfig inside the container (using ch-run -w) after injection.
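The default placement rules above amount to simple substring tests on the host path. The sketch below illustrates them; it is not the ch-fromhost implementation. The function name classify_path is hypothetical, and trying the executable rule first when a path matches both rules is an assumption.

```shell
#!/bin/sh
# classify_path PATH: print the kind of destination the documented defaults imply.
# Assumption: the executable rule wins when a path matches both rules.
classify_path() {
    case $1 in
        */bin*|*/sbin*) echo executable ;;  # placed in /usr/bin in the image
        */lib*|*.so*)   echo library ;;     # placed in the first ldconfig directory
        *)              echo other ;;       # placed in the --dest directory
    esac
}

classify_path /bin/bar
classify_path /usr/lib64/libfoo.so
classify_path /etc/quux
```

The three calls print executable, library, and other respectively. The last case is the one where --dest is required.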

3.7.3. Options

3.7.3.1. To specify which files to inject

-c, --cmd CMD

Inject files listed in the standard output of command CMD.

-f, --file FILE

Inject files listed in the file FILE.

-p, --path PATH

Inject the file at PATH.

--cray-mpi

Cray-enable MPICH/OpenMPI installed inside the image. See important details below.

--nvidia

Use nvidia-container-cli list (from libnvidia-container) to find executables and libraries to inject.

These can be repeated, and at least one must be specified.

3.7.3.2. To specify the destination within the image

-d, --dest DST

Place files specified later in directory IMGDIR/DST, overriding the inferred destination, if any. If a file’s destination cannot be inferred and --dest has not been specified, exit with an error. This can be repeated to place files in varying destinations.

3.7.3.3. Additional arguments

--lib-path

Print the guest destination path for shared libraries inferred as described above.

--no-ldconfig

Don’t run ldconfig even if we appear to have injected shared libraries.

-h, --help

Print help and exit.

-v, --verbose

List the injected files.

--version

Print version and exit.

3.7.4. When to use ch-fromhost

This command does a lot of heuristic magic; while it can copy arbitrary files into an image, this usage is discouraged and prone to error. Here are some use cases and the recommended approach:

  1. I have some files on my build host that I want to include in the image. Use the COPY instruction within your Dockerfile. Note that it’s OK to build an image that meets your specific needs but isn’t generally portable, e.g., only runs on specific micro-architectures you’re using.

  2. I have an already built image and want to install a program I compiled separately into the image. Consider whether building a new derived image with a Dockerfile is appropriate. Another good option is to bind-mount the directory containing your program at run time. A less good option is to cp(1) the program into your image, because this permanently alters the image in a non-reproducible way.

  3. I have some shared libraries that I need in the image for functionality or performance, and they aren’t available in a place where I can use COPY. This is the intended use case of ch-fromhost. You can use --cmd, --file, and/or --path to put together a custom solution. But, please consider filing an issue so we can package your functionality with a tidy option like --cray-mpi or --nvidia.

3.7.5. --cray-mpi dependencies and quirks

The implementation of --cray-mpi is messy, foul smelling, and brittle. It replaces or overrides the MPICH or OpenMPI libraries installed in the container. Users should be aware of the following.

  1. Containers must have the following software installed:

    1. Corresponding open source MPI implementation. (MPICH and OpenMPI.)

    2. PatchELF with our patches. Use the shrink-soname branch. (MPICH only.)

    3. libgfortran.so.3, because Cray’s libmpi.so.12 links to it. (MPICH only.)

  2. Applications must be dynamically linked to libmpi.so.12 (not e.g. libmpich.so.12).

    1. How to configure MPICH to accomplish this is not yet clear to us; test/Dockerfile.mpich does it, while the Debian packages do not. (MPICH only.)

  3. An ABI compatible module for the given MPI implementation must be loaded when ch-fromhost is invoked.

    1. Load the cray-mpich-abi module. (MPICH only.)

    2. We recommend loading the module of a version as close to what is installed in the image as possible. This OpenMPI install needs to be built such that libmpi contains all needed plugins (as opposed to them being standalone shared libraries). See OpenMPI’s documentation for how to do this. (OpenMPI only.)

  4. Tested only for C programs compiled with GCC, and it probably won’t work otherwise. If you’d like to use another compiler or another programming language, please get in touch so we can implement the necessary support.

Please file a bug if we missed anything above or if you know how to make the code better.

3.7.6. Notes

Symbolic links are dereferenced, i.e., the files pointed to are injected, not the links themselves.

As a corollary, do not include symlinks to shared libraries. These will be re-created by ldconfig.

There are two alternate approaches for nVidia GPU libraries:

  1. Link libnvidia-container into ch-run and call the library functions directly. However, this would mean that Charliecloud would either (a) need to be compiled differently on machines with and without nVidia GPUs or (b) have libnvidia-container available even on machines without nVidia GPUs. Neither of these is consistent with Charliecloud’s philosophies of simplicity and minimal dependencies.

  2. Use nvidia-container-cli configure to do the injecting. This would require that containers have a half-started state, where the namespaces are active and everything is mounted but pivot_root(2) has not been performed. This is not feasible because Charliecloud has no notion of a half-started container.

Further, while these alternate approaches would simplify or eliminate this script for nVidia GPUs, they would not solve the problem for other situations.

3.7.7. Bugs

File paths may not contain colons or newlines.

3.7.8. Examples

Within the image /var/tmp/baz, place shared library /usr/lib64/libfoo.so at path /usr/lib/libfoo.so (assuming /usr/lib is the first directory searched by the dynamic loader in the image) and executable /bin/bar at path /usr/bin/bar. Then, create appropriate symlinks to libfoo and update the ld.so cache.

$ cat qux.txt
/bin/bar
/usr/lib64/libfoo.so
$ ch-fromhost --file qux.txt /var/tmp/baz

Same as above:

$ ch-fromhost --cmd 'cat qux.txt' /var/tmp/baz

Same as above:

$ ch-fromhost --path /bin/bar --path /usr/lib64/libfoo.so /var/tmp/baz

Same as above, but place the files into /corge instead (and the shared library will not be found by ldconfig):

$ ch-fromhost --dest /corge --file qux.txt /var/tmp/baz

Same as above, and also place file /etc/quux at /etc/quux within the container:

$ ch-fromhost --file qux.txt --dest /etc --path /etc/quux /var/tmp/baz

Inject the executables and libraries recommended by nVidia into the image, and then run ldconfig:

$ ch-fromhost --nvidia /var/tmp/baz

Inject the Cray-enabled MPI libraries into the image, and then run ldconfig:

$ ch-fromhost --cray-mpi /var/tmp/baz

3.7.9. Acknowledgements

This command was inspired by the similar Shifter feature that allows Shifter containers to use the Cray Aries network. We particularly appreciate the help provided by Shane Canon and Doug Jacobsen during our implementation of --cray-mpi.

We appreciate the advice of Ryan Olson at nVidia on implementing --nvidia.

3.8. ch-image

Build and manage images; completely unprivileged.

3.8.1. Synopsis

$ ch-image [...] build [-t TAG] [-f DOCKERFILE] [...] CONTEXT
$ ch-image [...] delete IMAGE_REF
$ ch-image [...] import PATH IMAGE_REF
$ ch-image [...] list [IMAGE_REF]
$ ch-image [...] pull [...] IMAGE_REF [IMAGE_DIR]
$ ch-image [...] push [--image DIR] IMAGE_REF [DEST_REF]
$ ch-image [...] reset
$ ch-image [...] storage-path
$ ch-image { --help | --version | --dependencies }

3.8.2. Description

ch-image is a tool for building and manipulating container images, but not running them (for that you want ch-run). It is completely unprivileged, with no setuid/setgid/setcap helpers. The action to take is specified by a sub-command.

Options that print brief information and then exit:

-h, --help

Print help and exit successfully. If specified before the sub-command, print general help and list of sub-commands; if after the sub-command, print help specific to that sub-command.

--dependencies

Report dependency problems on standard output, if any, and exit. If all is well, there is no output and the exit is successful; in case of problems, the exit is unsuccessful.

--version

Print version number and exit successfully.

Common options placed before the sub-command:

-a, --arch ARCH

Use ARCH for architecture-aware registry operations, currently pull and pulls done within build. ARCH can be: (1) yolo, to bypass architecture-aware code and use the registry’s default architecture; (2) host, to use the host’s architecture, obtained with the equivalent of uname -m (default if --arch not specified); or (3) an architecture name. If the specified architecture is not available, the error message will list which ones are.

Notes:

  1. ch-image is limited to one image per image reference in builder storage at a time, regardless of architecture. For example, if you say ch-image pull --arch=foo baz and then ch-image pull --arch=bar baz, builder storage will contain one image called “baz”, with architecture “bar”.

  2. Images’ default architecture is usually amd64, so this is usually what you get with --arch=yolo. Similarly, if a registry image is architecture-unaware, it will still be pulled with --arch=amd64 and --arch=host on x86-64 hosts (other host architectures must specify --arch=yolo to pull architecture-unaware images).

  3. uname -m and image registries often use different names for the same architecture. For example, what uname -m reports as “x86_64” is known to registries as “amd64”. --arch=host should translate if needed, but it’s useful to know this is happening. Directly specified architecture names are passed to the registry without translation.

  4. Registries treat architecture as a pair of items, architecture and sometimes variant (e.g., “arm” and “v7”). Charliecloud treats architecture as a simple string and converts to/from the registry view transparently.

--no-cache

Download everything needed, ignoring the cache.

--password-many

Re-prompt the user every time a registry password is needed.

-s, --storage DIR

Set the storage directory (see below for important details).

--tls-no-verify

Don’t verify TLS certificates of the repository. (Do not use this option unless you understand the risks.)

-v, --verbose

Print extra chatter; can be repeated.
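The name translation mentioned in note 3 under --arch can be pictured as a small lookup table. The sketch below is illustrative only: the function name registry_arch is hypothetical, only the x86_64 mapping comes from this reference, and the other entries are assumptions based on common naming conventions.

```shell
#!/bin/sh
# registry_arch MACHINE: translate a "uname -m" name to a registry-style name.
registry_arch() {
    case $1 in
        x86_64)  echo amd64 ;;   # mapping given in the --arch notes
        aarch64) echo arm64 ;;   # assumption: common convention, not from this reference
        *)       echo "$1" ;;    # assumption: pass unknown names through unchanged
    esac
}

# Typical use: registry_arch "$(uname -m)"
```

Remember that architecture names given directly to --arch are passed to the registry without translation, so a script that passes an explicit name should use the registry's spelling.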

3.8.3. Authentication

If the remote repository needs authentication, Charliecloud will prompt you for a username and password. Note that some repositories call the secret something other than “password”; e.g., GitLab calls it a “personal access token (PAT)”.

These values are remembered for the life of the process and silently re-offered to the registry if needed. One case when this happens is on push to a private registry: many registries will first offer a read-only token when ch-image checks if something exists, then re-authenticate when upgrading the token to read-write for upload. If your site uses one-time passwords such as provided by a security device, you can specify --password-many to provide a new secret each time.

These values are not saved persistently, e.g. in a file. Note that we do use normal Python variables for this information, without pinning them into physical RAM with mlock(2) or any other special treatment, so we cannot guarantee they will never reach non-volatile storage.

Unlike Docker, there is no separate login subcommand. For non-interactive authentication, you can use the environment variables CH_IMAGE_USERNAME and CH_IMAGE_PASSWORD. Only do this if you fully understand the implications for your specific use case, because it is difficult to securely store secrets in environment variables.

3.8.4. Storage directory

ch-image maintains state using normal files and directories located in its storage directory; contents include temporary images used for building and various caches.

In descending order of priority, this directory is located at:

-s, --storage DIR

Command line option.

$CH_IMAGE_STORAGE

Environment variable.

/var/tmp/$USER/ch-image

Default.
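A wrapper script that needs to know where the storage directory is can mirror this order with nested parameter expansion. This is a sketch under stated assumptions, not ch-image code: the function name storage_dir is hypothetical, empty values are treated as unset, and falling back to id -un when $USER is unset is an assumption.

```shell
#!/bin/sh
# storage_dir [CLI_VALUE]: CLI_VALUE stands in for the -s/--storage argument.
storage_dir() {
    # Assumption: fall back to "id -un" if $USER is not set.
    default=/var/tmp/${USER:-$(id -un)}/ch-image
    # Command line option first, then $CH_IMAGE_STORAGE, then the default.
    printf '%s\n' "${1:-${CH_IMAGE_STORAGE:-$default}}"
}
```

To ask ch-image itself, use the storage-path sub-command listed in the synopsis.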

Unlike many container implementations, there is no notion of storage drivers, graph drivers, etc., to select and/or configure.

The storage directory can reside on any filesystem. However, it contains lots of small files and metadata traffic can be intense. For example, the Charliecloud test suite uses approximately 400,000 files and directories in the storage directory as of this writing. Place it on a filesystem appropriate for this; tmpfs’es such as /var/tmp are a good choice if you have enough RAM (/tmp is not recommended because ch-run bind-mounts it into containers by default).

While you can currently poke around in the storage directory and find unpacked images runnable with ch-run, this is not a supported use case. The supported workflow uses ch-builder2tar or ch-builder2squash to obtain a packed image; see the tutorial for details.

The storage directory format changes on no particular schedule. Often ch-image is able to upgrade the directory; however, downgrading is not supported and sometimes upgrade is not possible. In these cases, ch-image will refuse to run until you delete and re-initialize the directory with ch-image reset.

Warning

Network filesystems, especially Lustre, are typically bad choices for the storage directory. This is a site-specific question and your local support will likely have strong opinions.

3.8.5. build

Build an image from a Dockerfile and put it in the storage directory.

3.8.5.1. Synopsis

$ ch-image [...] build [-t TAG] [-f DOCKERFILE] [...] CONTEXT

3.8.5.2. Description

Uses ch-run -w -u0 -g0 --no-home --no-passwd to execute RUN instructions. Note that FROM implicitly pulls the base image if needed, so you may want to read about the pull subcommand below as well.

Required argument:

CONTEXT

Path to context directory; this is the root of COPY and ADD instructions in the Dockerfile.

Options:

-b, --bind SRC[:DST]

For RUN instructions only, bind-mount SRC at guest DST. The default destination if not specified is to use the same path as the host; i.e., the default is equivalent to --bind=SRC:SRC. If DST does not exist, try to create it as an empty directory, though images do have ten directories /mnt/[0-9] already available as mount points. Can be repeated.

Note: See documentation for ch-run --bind for important caveats and gotchas.

Note: Other instructions that modify the image filesystem, e.g. COPY, can only access host files from the context directory, regardless of this option.

--build-arg KEY[=VALUE]

Set build-time variable KEY defined by ARG instruction to VALUE. If VALUE not specified, use the value of environment variable KEY.

-f, --file DOCKERFILE

Use DOCKERFILE instead of CONTEXT/Dockerfile. Specify a single hyphen (-) to use standard input; note that in this case, the context directory is still provided, which matches docker build -f - behavior.

--force

Inject the unprivileged build workarounds; see discussion later in this section for details on what this does and when you might need it. If a build fails and ch-image thinks --force would help, it will suggest it.

-n, --dry-run

Don’t actually execute any Dockerfile instructions.

--no-force-detect

Don’t try to detect if the workarounds in --force would help.

--parse-only

Stop after parsing the Dockerfile.

-t, --tag TAG

Name of image to create. If not specified, infer the name:

  1. If Dockerfile named Dockerfile with an extension: use the extension with invalid characters stripped, e.g. Dockerfile.@FOO.bar -> foo.bar.

  2. If Dockerfile has extension dockerfile: use the basename with the same transformation, e.g. baz.@QUX.dockerfile -> baz.qux.

  3. If context directory is not /: use its name, i.e. the last component of the absolute path to the context directory, with the same transformation.

  4. Otherwise (context directory is /): use root.

If no colon present in the name, append :latest.
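The name inference rules for --tag can be sketched as follows. This is an approximation for illustration, not ch-image's code. The function names sanitize and image_name are hypothetical, and the definition of "invalid characters" used here (anything other than lowercase letters, digits, dot, underscore, and hyphen, after lowercasing) is an assumption chosen to reproduce the two examples in the list.

```shell
#!/bin/sh
# Sketch of the documented naming rules; not ch-image's actual code.

# Assumption: lowercase the name, then drop every character that is not a
# lowercase letter, digit, '.', '_' or '-'.
sanitize() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9._-'
}

# image_name DOCKERFILE CONTEXT [TAG]
# CONTEXT must be an absolute path; TAG stands in for -t/--tag.
image_name() {
    file=$(basename "$1")
    if [ -n "${3:-}" ]; then
        name=$3                                                       # explicit tag
    else
        case $file in
            Dockerfile.?*) name=$(sanitize "${file#Dockerfile.}") ;;  # rule 1
            ?*.dockerfile) name=$(sanitize "${file%.dockerfile}") ;;  # rule 2
            *)
                if [ "$2" != / ]; then
                    name=$(sanitize "$(basename "$2")")               # rule 3
                else
                    name=root                                         # rule 4
                fi ;;
        esac
    fi
    case $name in
        *:*) ;;                        # a colon is already present
        *)   name=${name}:latest ;;    # otherwise append :latest
    esac
    printf '%s\n' "$name"
}

image_name Dockerfile.@FOO.bar /src
image_name Dockerfile /foo/bar
```

The two calls print foo.bar:latest and bar:latest.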

3.8.5.3. Privilege model

ch-image is a fully unprivileged image builder. It does not use any setuid or setcap helper programs, and it does not use configuration files /etc/subuid or /etc/subgid. This contrasts with the “rootless” or “fakeroot” modes of some competing builders, which do require privileged supporting code or utilities.

This approach does yield some quirks. We provide built-in workarounds that should mostly work (i.e., --force), but it can be helpful to understand what is going on.

ch-image executes all instructions as the normal user who invokes it. For RUN, this is accomplished with ch-run -w --uid=0 --gid=0 (and some other arguments), i.e., your host EUID and EGID both mapped to zero inside the container, and only one UID (zero) and GID (zero) are available inside the container. Under this arrangement, processes running in the container for each RUN appear to be running as root, but many privileged system calls will fail without the workarounds described below. This affects any fully unprivileged container build, not just Charliecloud.

The most common time to see this is installing packages. For example, here is RPM failing to chown(2) a file, which makes the package update fail:

  Updating   : 1:dbus-1.10.24-13.el7_6.x86_64                            2/4
Error unpacking rpm package 1:dbus-1.10.24-13.el7_6.x86_64
error: unpacking of archive failed on file /usr/libexec/dbus-1/dbus-daemon-launch-helper;5cffd726: cpio: chown
  Cleanup    : 1:dbus-libs-1.10.24-12.el7.x86_64                         3/4
error: dbus-1:1.10.24-13.el7_6.x86_64: install failed

This one is (ironically) apt-get failing to drop privileges:

E: setgroups 65534 failed - setgroups (1: Operation not permitted)
E: setegid 65534 failed - setegid (22: Invalid argument)
E: seteuid 100 failed - seteuid (22: Invalid argument)
E: setgroups 0 failed - setgroups (1: Operation not permitted)

By default, nothing is done to avoid these problems, though ch-image does try to detect if the workarounds could help. --force activates the workarounds: ch-image injects extra commands to intercept these system calls and fake a successful result, using fakeroot(1). There are three basic steps:

  1. After FROM, analyze the image to see what distribution it contains, which determines the specific workarounds.

  2. Before the user command in the first RUN instruction where the injection seems needed, install fakeroot(1) in the image, if one is not already installed, as well as any other necessary initialization commands. For example, we turn off the apt sandbox (for Debian Buster) and configure EPEL but leave it disabled (for CentOS/RHEL).

  3. Prepend fakeroot to RUN instructions that seem to need it, e.g. ones that contain apt, apt-get, dpkg for Debian derivatives and dnf, rpm, or yum for RPM-based distributions.

The details are specific to each distribution. ch-image analyzes image content (e.g., grepping /etc/debian_version) to select a configuration; see lib/fakeroot.py for details. ch-image prints exactly what it is doing.
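Step 3 above is essentially a keyword match on the RUN command line. The sketch below shows that idea only; the real logic lives in lib/fakeroot.py and may differ. The function name needs_fakeroot and the family labels debian and rpm are hypothetical, and whole-word matching with grep -w is an assumption.

```shell
#!/bin/sh
# needs_fakeroot FAMILY COMMAND: succeed if COMMAND mentions one of the
# package tools listed above for FAMILY ("debian" or "rpm").
needs_fakeroot() {
    case $1 in
        debian) pattern='apt|apt-get|dpkg' ;;
        rpm)    pattern='dnf|rpm|yum' ;;
        *)      return 1 ;;
    esac
    # -w matches whole words only; -q suppresses output.
    printf '%s\n' "$2" | grep -Eqw "$pattern"
}

if needs_fakeroot debian 'apt-get install -y gcc'; then
    echo 'would run: fakeroot apt-get install -y gcc'
fi
```

A match like this is only a heuristic, which is why the text above says the workarounds are applied to RUN instructions that seem to need them.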

3.8.5.4. Compatibility with other Dockerfile interpreters

ch-image is an independent implementation and shares no code with other Dockerfile interpreters. It uses a formal Dockerfile parsing grammar developed from the Dockerfile reference documentation and miscellaneous other sources, which you can examine in the source code.

We believe this independence is valuable for several reasons. First, it helps the community examine Dockerfile syntax and semantics critically, think rigorously about what is really needed, and build a more robust standard. Second, it yields disjoint sets of bugs (note that Podman, Buildah, and Docker all share the same Dockerfile parser). Third, because it is a much smaller code base, it illustrates how Dockerfiles work more clearly. Finally, it allows straightforward extensions if needed to support scientific computing.

ch-image tries hard to be compatible with Docker and other interpreters, though as an independent implementation, it is not bug-compatible.

The following subsections describe differences from the Dockerfile reference that we expect to be approximately permanent. For not-yet-implemented features and bugs in this area, see related issues on GitHub.

None of these are set in stone. We are very interested in feedback on our assessments and open questions. This helps us prioritize new features and revise our thinking about what is needed for HPC containers.

3.8.5.4.1. Context directory

The context directory is bind-mounted into the build, rather than copied as Docker does. Thus, the size of the context is immaterial, and the build reads directly from storage like any other local process would. However, you still can’t access anything outside the context directory.

3.8.5.4.2. Variable substitution

Variable substitution happens for all instructions, not just the ones listed in the Dockerfile reference.

ARG and ENV cause cache misses upon definition, in contrast with Docker where these variables miss upon use, except for certain cache-excluded variables that never cause misses, listed below.

ch-image passes the following proxy environment variables in to the build. Changes to these variables do not cause a cache miss. They do not require an ARG instruction, as documented in the Dockerfile reference. Unlike Docker, they are available if the same-named environment variable is defined; --build-arg is not required.

HTTP_PROXY
http_proxy
HTTPS_PROXY
https_proxy
FTP_PROXY
ftp_proxy
NO_PROXY
no_proxy

In addition to those listed in the Dockerfile reference, these environment variables are passed through in the same way:

SSH_AUTH_SOCK

Finally, these variables are also pre-defined but are unrelated to the host environment:

PATH=/ch/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
TAR_OPTIONS=--no-same-owner

Note that ARG and ENV have different syntax despite very similar semantics.

3.8.5.4.3. COPY

Especially for people used to UNIX cp(1), the semantics of the Dockerfile COPY instruction can be confusing.

Most notably, when a source of the copy is a directory, the contents of that directory, not the directory itself, are copied. This is documented, but it’s a real gotcha because that’s not what cp(1) does, and it means that many things you can do in one cp(1) command require multiple COPY instructions.
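The difference is easy to reproduce with cp(1) alone. The sketch below uses a scratch directory with hypothetical file names; the second cp only approximates what COPY does with a directory source and ignores the other behaviors discussed in this section.

```shell
#!/bin/sh
# Compare plain cp(1) with an approximation of COPY's directory semantics.
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/like_cp" "$work/like_copy"
touch "$work/src/a.txt" "$work/src/sub/b.txt"

# Plain cp(1): the directory itself lands inside the destination.
cp -R "$work/src" "$work/like_cp"

# Approximation of "COPY src /dest": only the directory's contents are copied.
cp -R "$work/src/." "$work/like_copy"

ls "$work/like_cp"      # lists: src
ls "$work/like_copy"    # lists: a.txt and sub

# Remove the scratch directory with: rm -rf "$work"
```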

Also, the reference documentation is incomplete. In our experience, Docker also behaves as follows; ch-image does the same in an attempt to be bug-compatible.

  1. You can use absolute paths in the source; the root is the context directory.

  2. Destination directories are created if they don’t exist in the following situations:

    1. If the destination path ends in slash. (Documented.)

    2. If the number of sources is greater than 1, either by wildcard or explicitly, regardless of whether the destination ends in slash. (Not documented.)

    3. If there is a single source and it is a directory. (Not documented.)

  3. Symbolic links behave differently depending on how deep in the copied tree they are. (Not documented.)

    1. Symlinks at the top level — i.e., named as the destination or the source, either explicitly or by wildcards — are dereferenced. They are followed, and whatever they point to is used as the destination or source, respectively.

    2. Symlinks at deeper levels are not dereferenced, i.e., the symlink itself is copied.

  4. If a directory appears at the same path in source and destination, and is at the 2nd level or deeper, the source directory’s metadata (e.g., permissions) are copied to the destination directory. (Not documented.)

  5. If an object appears in both the source and destination, and is at the 2nd level or deeper, and is of different types in the source and destination, then the source object will overwrite the destination object. (Not documented.) For example, if /tmp/foo/bar is a regular file, and /tmp is the context directory, then the following Dockerfile snippet will result in a file in the container at /foo/bar (copied from /tmp/foo/bar); the directory and all its contents will be lost.

    RUN mkdir -p /foo/bar && touch /foo/bar/baz
    COPY foo /foo
    

We expect the following differences to be permanent:

  • Wildcards use Python glob semantics, not the Go semantics.

  • COPY --chown is ignored, because it doesn’t make sense in an unprivileged build.

3.8.5.4.4. Features we do not plan to support

  • Parser directives are not supported. We have not identified a need for any of them.

  • EXPOSE: Charliecloud does not use the network namespace, so containerized processes can simply listen on a host port like other unprivileged processes.

  • HEALTHCHECK: This instruction’s main use case is monitoring server processes rather than applications. Also, implementing it requires a container supervisor daemon, which we have no plans to add.

  • MAINTAINER is deprecated.

  • STOPSIGNAL requires a container supervisor daemon process, which we have no plans to add.

  • USER does not make sense for unprivileged builds.

  • VOLUME: This instruction is not currently supported. Charliecloud has good support for bind mounts; we anticipate that it will continue to focus on that and will not introduce the volume management features that Docker has.

3.8.5.5. Examples

Build image bar using ./foo/bar/Dockerfile and context directory ./foo/bar:

$ ch-image build -t bar -f ./foo/bar/Dockerfile ./foo/bar
[...]
grown in 4 instructions: bar

Same, but infer the image name and Dockerfile from the context directory path:

$ ch-image build ./foo/bar
[...]
grown in 4 instructions: bar

Build using humongous vendor compilers you want to bind-mount instead of installing into the image:

$ ch-image build --bind /opt/bigvendor:/opt .
$ cat Dockerfile
FROM centos:7

RUN /opt/bin/cc hello.c
#COPY /opt/lib/*.so /usr/local/lib   # fail: COPY doesn't bind mount
RUN cp /opt/lib/*.so /usr/local/lib  # possible workaround
RUN ldconfig

3.8.6. delete

$ ch-image [...] delete IMAGE_REF

Delete the image described by the image reference IMAGE_REF from the storage directory.

3.8.7. list

Print information about images. If no argument is given, list the images in builder storage.

3.8.7.1. Synopsis

$ ch-image [...] list [IMAGE_REF]

3.8.7.2. Description

Optional argument:

IMAGE_REF

Print details of what’s known about IMAGE_REF, both locally and in the remote registry, if any.

3.8.7.3. Examples

List images in builder storage:

$ ch-image list
alpine:3.9 (amd64)
alpine:latest (amd64)
debian:buster (amd64)

Print details about Debian Buster image:

$ ch-image list debian:buster
details of image:    debian:buster
in local storage:    no
full remote ref:     registry-1.docker.io:443/library/debian:buster
available remotely:  yes
remote arch-aware:   yes
host architecture:   amd64
archs available:     386 amd64 arm/v5 arm/v7 arm64/v8 mips64le ppc64le s390x

3.8.8. import

$ ch-image [...] import PATH IMAGE_REF

Copy the image at PATH into builder storage with name IMAGE_REF. PATH can be:

  • an image directory

  • a tarball with no top-level directory (a.k.a. a “tarbomb”)

  • a standard tarball with one top-level directory

If the imported image contains Charliecloud metadata, that will be imported unchanged, i.e., images exported from ch-image builder storage will be functionally identical when re-imported.

3.8.9. pull

Pull the image described by the image reference IMAGE_REF from a repository to the local filesystem.

3.8.9.1. Synopsis

$ ch-image [...] pull [...] IMAGE_REF [IMAGE_DIR]

See the FAQ for the gory details on specifying image references.

3.8.9.2. Description

Destination:

IMAGE_DIR

If specified, place the unpacked image at this path; it is then ready for use by ch-run or other tools. The storage directory will not contain a copy of the image, i.e., it is only unpacked once.

Options:

--last-layer N

Unpack only N layers, leaving an incomplete image. This option is intended for debugging.

--parse-only

Parse IMAGE_REF, print a parse report, and exit successfully without talking to the internet or touching the storage directory.

This script does a fair amount of validation and fixing of the layer tarballs before flattening in order to support unprivileged use despite image problems we frequently see in the wild. For example, device files are ignored, and directory and file permissions are raised to a minimum of rwx------ and rw-------, respectively. Note, however, that symlinks pointing outside the image are permitted, because they are not resolved until runtime within a container.
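
Raising permissions to a minimum amounts to OR-ing a minimum mode into each member's mode. The following is a minimal sketch, assuming directories are raised to at least rwx------ and regular files to at least rw-------; the constant and function names are hypothetical, and this is not the code ch-image uses.

```python
import stat

DIR_MIN = 0o700   # rwx------ (assumed minimum for directories)
FILE_MIN = 0o600  # rw------- (assumed minimum for regular files)


def raise_to_minimum(mode, is_dir):
    """Return the permission bits of `mode` with the owner bits raised
    to the minimum; bits that are already set are never removed."""
    return stat.S_IMODE(mode) | (DIR_MIN if is_dir else FILE_MIN)


print(oct(raise_to_minimum(0o555, is_dir=True)))   # 0o755
print(oct(raise_to_minimum(0o444, is_dir=False)))  # 0o644
print(oct(raise_to_minimum(0o000, is_dir=False)))  # 0o600
```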

The following metadata in the pulled image is retained; all other metadata is currently ignored. (If you have a need for additional metadata, please let us know!)

  • Current working directory set with WORKDIR is effective in downstream Dockerfiles.

  • Environment variables set with ENV are effective in downstream Dockerfiles and also written to /ch/environment for use in ch-run --set-env.

  • Mount point directories specified with VOLUME are created in the image if they don’t exist, but no other action is taken.

Note that some images (e.g., those with a “version 1 manifest”) do not contain metadata. A warning is printed in this case.

3.8.9.3. Examples

Download the Debian Buster image matching the host’s architecture and place it in the storage directory:

$ uname -m
aarch64
$ ch-image pull debian:buster
pulling image:    debian:buster
requesting arch:  arm64/v8
manifest list: downloading
manifest: downloading
config: downloading
layer 1/1: c54d940: downloading
flattening image
layer 1/1: c54d940: listing
validating tarball members
resolving whiteouts
layer 1/1: c54d940: extracting
image arch: arm64
done

Same, specifying the architecture explicitly:

$ ch-image --arch=arm/v7 pull debian:buster
pulling image:    debian:buster
requesting arch:  arm/v7
manifest list: downloading
manifest: downloading
config: downloading
layer 1/1: 8947560: downloading
flattening image
layer 1/1: 8947560: listing
validating tarball members
resolving whiteouts
layer 1/1: 8947560: extracting
image arch: arm (may not match host arm64/v8)

Download the same image and place it in /tmp/buster:

$ ch-image pull debian:buster /tmp/buster
[...]
$ ls /tmp/buster
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr

3.8.10. push

Push the image described by the image reference IMAGE_REF from the local filesystem to a repository.

3.8.10.1. Synopsis

$ ch-image [...] push [--image DIR] IMAGE_REF [DEST_REF]

See the FAQ for the gory details on specifying image references.

3.8.10.2. Description

Destination:

DEST_REF

If specified, use this as the destination image reference, rather than IMAGE_REF. This lets you push to a repository without permanently adding a tag to the image.

Options:

--image DIR

Use the unpacked image located at DIR rather than an image in the storage directory named IMAGE_REF.

Because Charliecloud is fully unprivileged, the owner and group of files in its images are not meaningful in the broader ecosystem. Thus, when pushed, everything in the image is flattened to user:group root:root. Also, setuid/setgid bits are removed, to avoid surprises if the image is pulled by a privileged container implementation.
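
The flattening described above can be pictured as a per-member transformation on the layer tarball. This is a hedged sketch using Python's standard tarfile module; flatten_member is a hypothetical name, the sample member is invented, and the real push code may work differently.

```python
import copy
import stat
import tarfile


def flatten_member(info):
    """Return a copy of tar member `info` owned by root:root and with the
    setuid and setgid bits cleared."""
    out = copy.copy(info)
    out.uid = out.gid = 0
    out.uname = out.gname = "root"
    out.mode = info.mode & ~(stat.S_ISUID | stat.S_ISGID)
    return out


member = tarfile.TarInfo("usr/bin/example")   # invented member for illustration
member.uid, member.gid = 1000, 1000
member.uname, member.gname = "charlie", "charlie"
member.mode = 0o4755                          # setuid executable
flat = flatten_member(member)
print(flat.uname, flat.gname, oct(flat.mode))  # root root 0o755
```

A function of this shape can be passed as the filter argument of TarFile.add() when building a tarball.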

3.8.10.3. Examples

Push a local image to the registry example.com:5000 at path /foo/bar with tag latest. Note that in this form, the local image must be named to match that remote reference.

$ ch-image push example.com:5000/foo/bar:latest
pushing image:   example.com:5000/foo/bar:latest
layer 1/1: gathering
layer 1/1: preparing
preparing metadata
starting upload
layer 1/1: a1664c4: checking if already in repository
layer 1/1: a1664c4: not present, uploading
config: 89315a2: checking if already in repository
config: 89315a2: not present, uploading
manifest: uploading
cleaning up
done

Same, except use local image alpine:3.9. In this form, the local image name does not have to match the destination reference.

$ ch-image push alpine:3.9 example.com:5000/foo/bar:latest
pushing image:   alpine:3.9
destination:     example.com:5000/foo/bar:latest
layer 1/1: gathering
layer 1/1: preparing
preparing metadata
starting upload
layer 1/1: a1664c4: checking if already in repository
layer 1/1: a1664c4: not present, uploading
config: 89315a2: checking if already in repository
config: 89315a2: not present, uploading
manifest: uploading
cleaning up
done

Same, except use unpacked image located at /var/tmp/image rather than an image in ch-image storage. (Also, the sole layer is already present in the remote registry, so we don’t upload it again.)

$ ch-image push --image /var/tmp/image example.com:5000/foo/bar:latest
pushing image:   example.com:5000/foo/bar:latest
image path:      /var/tmp/image
layer 1/1: gathering
layer 1/1: preparing
preparing metadata
starting upload
layer 1/1: 892e38d: checking if already in repository
layer 1/1: 892e38d: already present
config: 546f447: checking if already in repository
config: 546f447: not present, uploading
manifest: uploading
cleaning up
done

3.8.11. reset

$ ch-image [...] reset

Delete all images and cache from ch-image builder storage.

3.8.12. storage-path

$ ch-image [...] storage-path

Print the storage directory path and exit.

3.8.13. Environment variables

CH_IMAGE_USERNAME, CH_IMAGE_PASSWORD

Username and password for registry authentication. See important caveats in section “Authentication” above.

CH_LOG_FILE

If set, append log chatter to this file, rather than standard error. This is useful for debugging situations where standard error is consumed or lost.

Also sets verbose mode if not already set (equivalent to --verbose).

CH_LOG_FESTOON

If set, prepend PID and timestamp to logged chatter.

3.9. ch-mount

Mount a SquashFS image file using FUSE.

3.9.1. Synopsis

$ ch-mount SQFS PARENTDIR

3.9.2. Description

Create a new empty directory in PARENTDIR, named after SQFS with its suffix (e.g., .sqfs) removed, then mount SQFS on this new directory. The new directory must not already exist.
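
The mount point name can be derived as follows. This is a minimal sketch, assuming the new directory is created under PARENTDIR as the synopsis and the example below suggest; mount_point is a hypothetical helper, not part of ch-mount.

```python
from pathlib import PurePosixPath


def mount_point(sqfs, parentdir):
    """Return PARENTDIR joined with the SquashFS file name minus its
    final suffix."""
    return str(PurePosixPath(parentdir) / PurePosixPath(sqfs).stem)


print(mount_point("/var/tmp/debian.sqfs", "/var/tmp"))  # /var/tmp/debian
```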

Additional arguments:

--help

print help and exit

--version

print version and exit

3.9.3. Example

$ ch-mount /var/tmp/debian.sqfs /var/tmp
$ ls /var/tmp/debian
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr  WEIRD_AL_YANKOVIC

3.10. ch-pull2dir

Pull an image from Docker Hub and unpack it into a directory.

3.10.1. Synopsis

$ ch-pull2dir IMAGE[:TAG] DIR

3.10.2. Description

Pull Docker image named IMAGE[:TAG] from Docker Hub and extract it into a subdirectory of DIR. A temporary tarball is stored in DIR.

Sudo privileges are required to run the docker pull command.

This runs the following command sequence: ch-pull2tar, ch-tar2dir. See warning in the documentation for ch-tar2dir.

Additional arguments:

--help

print help and exit

--version

print version and exit

3.10.3. Examples

$ ch-pull2dir alpine /var/tmp
Using default tag: latest
latest: Pulling from library/alpine
Digest: sha256:621c2f39f8133acb8e64023a94dbdf0d5ca81896102b9e57c0dc184cadaf5528
Status: Image is up to date for alpine:latest
-rw-r--r--. 1 charlie charlie 2.1M Oct  5 19:52 /var/tmp/alpine.tar.gz
creating new image /var/tmp/alpine
/var/tmp/alpine unpacked ok
removed '/var/tmp/alpine.tar.gz'

Same as above, except optional TAG is specified:

$ ch-pull2dir alpine:3.6 /var/tmp
3.6: Pulling from library/alpine
Digest: sha256:cc24af836d1377e092ecb4e8f0a4324c3b1aa2b5295c2239edcc7bbc86a9cbc6
Status: Image is up to date for alpine:3.6
-rw-r--r--. 1 charlie charlie 2.1M Oct  5 19:54 /var/tmp/alpine:3.6.tar.gz
creating new image /var/tmp/alpine:3.6
/var/tmp/alpine:3.6 unpacked ok
removed '/var/tmp/alpine:3.6.tar.gz'

3.11. ch-pull2tar

Pull an image from Docker Hub and flatten it into a tarball.

3.11.1. Synopsis

$ ch-pull2tar IMAGE[:TAG] OUTDIR

3.11.2. Description

Pull a Docker image named IMAGE[:TAG] from Docker Hub and flatten it into a Charliecloud tarball in directory OUTDIR.

This runs the following command sequence: docker pull, ch-builder2tar. It provides less flexibility than running the individual commands.

Sudo privileges are required for docker pull.

Additional arguments:

--help

print help and exit

--version

print version and exit

3.11.3. Examples

$ ch-pull2tar alpine /var/tmp
Using default tag: latest
latest: Pulling from library/alpine
Digest: sha256:621c2f39f8133acb8e64023a94dbdf0d5ca81896102b9e57c0dc184cadaf5528
Status: Image is up to date for alpine:latest
-rw-r--r--. 1 charlie charlie 2.1M Oct  5 19:52 /var/tmp/alpine.tar.gz

Same as above, except optional TAG is specified:

$ ch-pull2tar alpine:3.6 /var/tmp
3.6: Pulling from library/alpine
Digest: sha256:cc24af836d1377e092ecb4e8f0a4324c3b1aa2b5295c2239edcc7bbc86a9cbc6
Status: Image is up to date for alpine:3.6
-rw-r--r--. 1 charlie charlie 2.1M Oct  5 19:54 /var/tmp/alpine:3.6.tar.gz

3.12. ch-run

Run a command in a Charliecloud container.

3.12.1. Synopsis

$ ch-run [OPTION...] NEWROOT CMD [ARG...]

3.12.2. Description

Run command CMD in a fully unprivileged Charliecloud container using the flattened and unpacked image directory located at NEWROOT.

-b, --bind=SRC[:DST]

Bind-mount SRC at guest DST. The default destination if not specified is to use the same path as the host; i.e., the default is --bind=SRC:SRC. Can be repeated.

If --write is given and DST does not exist, it will be created as an empty directory. However, DST must be entirely within the image itself; DST cannot enter a previous bind mount. For example, --bind /foo:/tmp/foo will fail because /tmp is shared with the host via bind-mount (unless --private-tmp is given).

Most images do have ten directories /mnt/[0-9] already available as mount points.

Symlinks in DST are followed, and absolute links can have surprising behavior. Bind-mounting happens after namespace setup but before pivoting into the container image, so absolute links use the host root. For example, suppose the image has a symlink /foo -> /mnt. Then, --bind=/bar:/foo will bind-mount on the host’s /mnt, which is inaccessible on the host because namespaces are already set up and also inaccessible in the container because of the subsequent pivot into the image. Currently, this problem is only detected when DST needs to be created: ch-run will refuse to follow absolute symlinks in this case, to avoid directory creation surprises.

-c, --cd=DIR

Initial working directory in container.

--ch-ssh

Bind ch-ssh(1) into container at /usr/bin/ch-ssh.

--env-no-expand

Don’t expand variables when using --set-env.

-g, --gid=GID

Run as group GID within container.

-j, --join

Use the same container (namespaces) as peer ch-run invocations.

--join-pid=PID

Join the namespaces of an existing process.

--join-ct=N

Number of ch-run peers (implies --join; default: see below).

--join-tag=TAG

Label for ch-run peer group (implies --join; default: see below).

--no-home

By default, your host home directory (i.e., $HOME) is bind-mounted at guest /home/$USER. This is accomplished by mounting a new tmpfs at /home, which hides any image content under that path. If this is specified, neither of these things happens and the image’s /home is exposed unaltered.

--no-passwd

By default, temporary /etc/passwd and /etc/group files are created according to the UID and GID maps for the container and bind-mounted into it. If this is specified, no such temporary files are created and the image’s files are exposed.

-t, --private-tmp

By default, /tmp is shared with the host. If this is specified, a new tmpfs is mounted on the container’s /tmp instead.

--set-env=FILE, --set-env=VAR=VALUE

Set environment variables, either those listed in the file at host path FILE or the single variable VAR to VALUE.

-u, --uid=UID

Run as user UID within container.

--unset-env=GLOB

Unset environment variables whose names match GLOB.

-v, --verbose

Be more verbose (can be repeated).

-w, --write

Mount image read-write (by default, the image is mounted read-only).

-?, --help

Print help and exit.

--usage

Print a short usage message and exit.

-V, --version

Print version and exit.

Note: Because ch-run is fully unprivileged, it is not possible to change UIDs and GIDs within the container (the relevant system calls fail). In particular, setuid, setgid, and setcap executables do not work. As a precaution, ch-run calls prctl(PR_SET_NO_NEW_PRIVS, 1) to disable these executables within the container. This does not reduce functionality but is a “belt and suspenders” precaution to reduce the attack surface should bugs in these system calls or elsewhere arise.
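
The SRC[:DST] form accepted by --bind above, with DST defaulting to SRC, can be sketched as follows. This is an illustration of the documented default only; parse_bind is a hypothetical name, and rejecting an empty SRC or DST is our assumption rather than documented ch-run behavior.

```python
def parse_bind(arg):
    """Split a --bind argument of the form SRC[:DST] into (src, dst).

    If DST is omitted, it defaults to SRC, i.e. --bind=SRC is treated
    like --bind=SRC:SRC.
    """
    src, sep, dst = arg.partition(":")
    if src == "":
        raise ValueError("empty source")        # assumption, see lead-in
    if not sep:
        dst = src                               # documented default
    elif dst == "":
        raise ValueError("empty destination")   # assumption, see lead-in
    return src, dst


print(parse_bind("/opt/bigvendor:/opt"))
print(parse_bind("/scratch"))
```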

3.12.3. Host files and directories available in container via bind mounts

In addition to any directories specified by the user with --bind, ch-run has standard host files and directories that are bind-mounted in as well.

The following host files and directories are bind-mounted at the same location in the container. These give access to the host’s devices and various kernel facilities. (Recall that Charliecloud provides minimal isolation and containerized processes are mostly normal unprivileged processes.) They cannot be disabled and are required; i.e., they must exist both on host and within the image.

  • /dev

  • /proc

  • /sys

Optional; bind-mounted only if path exists on both host and within the image, without error or warning if not.

  • /etc/hosts and /etc/resolv.conf. Because Charliecloud containers share the host network namespace, they need the same hostname resolution configuration.

  • /etc/machine-id. Provides a unique ID for the OS installation; matching the host works for most situations. Needed to support D-Bus, some software licensing situations, and likely other use cases. See also issue #1050.

  • /var/lib/hugetlbfs at guest /var/opt/cray/hugetlbfs, and /var/opt/cray/alps/spool. These support Cray MPI.

  • $PREFIX/bin/ch-ssh at guest /usr/bin/ch-ssh. SSH wrapper that automatically containerizes after connecting.

Additional bind mounts that are done by default but can be disabled; see the options above.

  • $HOME at /home/$USER (and image /home is hidden). Makes user data and init files available.

  • /tmp. Provides a temporary directory that persists between container runs and is shared with non-containerized application components.

  • temporary files at /etc/passwd and /etc/group. Usernames and group names need to be customized for each container run.

3.12.4. Multiple processes in the same container with --join

By default, different ch-run invocations use different user and mount namespaces (i.e., different containers). While this has no impact on sharing most resources between invocations, there are a few important exceptions. These include:

  1. ptrace(2), used by debuggers and related tools. One can attach a debugger to processes in descendant namespaces, but not sibling namespaces. The practical effect of this is that (without --join), you can’t run a command with ch-run and then attach to it with a debugger also run with ch-run.

  2. Cross-memory attach (CMA) is used by cooperating processes to communicate by simply reading and writing one another’s memory. This is also not permitted between sibling namespaces. This affects various MPI implementations that use CMA to pass messages between ranks on the same node, because it’s faster than traditional shared memory.

--join is designed to address this by placing related ch-run commands (the “peer group”) in the same container. This is done by one of the peers creating the namespaces with unshare(2) and the others joining with setns(2).

To do so, we need to know the number of peers and a name for the group. These are specified by additional arguments that can (hopefully) be left at default values in most cases:

  • --join-ct sets the number of peers. The default is the value of the first of the following environment variables that is defined: OMPI_COMM_WORLD_LOCAL_SIZE, SLURM_STEP_TASKS_PER_NODE, SLURM_CPUS_ON_NODE.

  • --join-tag sets the tag that names the peer group. The default is environment variable SLURM_STEP_ID, if defined; otherwise, the PID of ch-run’s parent. Tags can be re-used for peer groups that start at different times, i.e., once all peer ch-run have replaced themselves with the user command, the tag can be re-used.
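
The defaults described in the two bullets above can be sketched like this. The function names are hypothetical, the values are returned as raw strings without further parsing, and this is an illustration of the documented lookup order rather than ch-run's code.

```python
import os

# Checked in this order; the first variable that is defined wins.
JOIN_CT_VARS = (
    "OMPI_COMM_WORLD_LOCAL_SIZE",
    "SLURM_STEP_TASKS_PER_NODE",
    "SLURM_CPUS_ON_NODE",
)


def default_join_ct(env):
    """Return the raw value of the first defined variable, or None."""
    for name in JOIN_CT_VARS:
        if name in env:
            return env[name]
    return None


def default_join_tag(env, parent_pid):
    """Return SLURM_STEP_ID if defined, otherwise the parent PID as a string."""
    return env.get("SLURM_STEP_ID", str(parent_pid))


# Here the "parent" is this script's parent, standing in for ch-run's parent.
print(default_join_ct(os.environ), default_join_tag(os.environ, os.getppid()))
```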

Caveats:

  • One cannot currently add peers after the fact, for example, if one later decides to start a debugger. (This is only required for code with bugs and is thus an unusual use case.)

  • ch-run instances race. The winner of this race sets up the namespaces, and the other peers use the winner to find the namespaces to join. Therefore, if the user command of the winner exits, any remaining peers will not be able to join the namespaces, even if they are still active. There is currently no general way to specify which ch-run should be the winner.

  • If --join-ct is too high, the winning ch-run’s user command exits before all peers join, or ch-run itself crashes, IPC resources such as semaphores and shared memory segments will be leaked. These appear as files in /dev/shm/ and can be removed with rm(1).

  • Many of the arguments given to the race losers, such as the image path and --bind, will be ignored in favor of what was given to the winner.

3.12.5. Environment variables

ch-run leaves environment variables unchanged, i.e. the host environment is passed through unaltered, except:

  • limited tweaks to avoid significant guest breakage;

  • user-set variables via --set-env;

  • user-unset variables via --unset-env; and

  • set CH_RUNNING.

This section describes these features.

The default tweaks happen first, and then --set-env and --unset-env in the order specified on the command line. The latter two can be repeated arbitrarily many times, e.g. to add/remove multiple variable sets or add only some variables in a file.

3.12.5.1. Default behavior

By default, ch-run makes the following environment variable changes:

  • $CH_RUNNING: Set to Weird Al Yankovic. While a process can figure out that it’s in an unprivileged container and what namespaces are active without this hint, the checks can be messy, and there is no way to tell that it’s a Charliecloud container specifically. This variable makes such a test simple and well-defined. (Note: This variable is unaffected by --unset-env.)

  • $HOME: If the path to your home directory is not /home/$USER on the host, then an inherited $HOME will be incorrect inside the guest. This confuses some software, such as Spack.

    Thus, we change $HOME to /home/$USER, unless --no-home is specified, in which case it is left unchanged.

  • $PATH: Newer Linux distributions replace some root-level directories, such as /bin, with symlinks to their counterparts in /usr.

    Some of these distributions (e.g., Fedora 24) have also dropped /bin from the default $PATH. This is a problem when the guest OS does not have a merged /usr (e.g., Debian 8 “Jessie”). Thus, we add /bin to $PATH if it’s not already present.
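
The default changes listed above can be sketched as a function from the host environment to the guest environment. This is an illustration only: default_env_tweaks is a hypothetical name, and appending /bin at the end of $PATH is our assumption, since the text does not say where it is added.

```python
def default_env_tweaks(env, user, no_home=False):
    """Return a copy of `env` with the default changes applied."""
    out = dict(env)
    out["CH_RUNNING"] = "Weird Al Yankovic"
    if not no_home:
        out["HOME"] = "/home/" + user   # left unchanged with --no-home
    path = out.get("PATH", "")
    dirs = path.split(":") if path else []
    if "/bin" not in dirs:
        dirs.append("/bin")             # position is an assumption
    out["PATH"] = ":".join(dirs)
    return out


host = {"HOME": "/users/charlie", "PATH": "/usr/local/bin:/usr/bin"}
print(default_env_tweaks(host, "charlie"))
```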


3.12.5.2. Setting variables with --set-env

The purpose of --set-env is to set environment variables in addition to (or instead of) those inherited from the host shell.

If the argument contains an equals character, then it is interpreted as a variable name and value; otherwise, it is a host path to a file with one variable name/value per line (guest paths can be specified by prepending the image path). Values given replace any already set (i.e., if a variable is repeated, the last value wins). Environment variables in the value are expanded unless --env-no-expand is given, though see below for syntax differences from the shell.

For example, to prepend /opt/bin to the current shell’s path (note protecting expansion of $PATH by the shell, though here the results would be equivalent if we let the shell do it):

$ ch-run --set-env='PATH=/opt/bin:$PATH' ...

To add variables set by Dockerfile ENV instructions to the current environment:

$ ch-run --set-env=$IMG/ch/environment ...

To prepend /opt/bin to the path set by the Dockerfile (here we really can’t let the shell expand $PATH):

$ ch-run --set-env=$IMG/ch/environment --set-env='PATH=/opt/bin:$PATH' ...

The syntax of the argument is a key-value pair separated by the first equals character (=, ASCII 61), with optional single straight quotes (', ASCII 39) around the value, though be aware that quotes are also interpreted by the shell. Newlines (ASCII 10) are not permitted in either key or value. The value may be empty, but not the key.

Environment variables in the value are expanded unless --env-no-expand is given. When expansion is enabled, the value is treated as a sequence of possibly-empty items separated by colons (:, ASCII 58). If an item begins with a dollar sign ($, ASCII 36), then the rest of the item is the name of an environment variable. If this variable is set to a non-empty value, that value is substituted for the item; otherwise (i.e., the variable is unset or set to the empty string), the item is deleted, along with one delimiting colon. The purpose of omitting empty expansions is to avoid surprising behavior such as an empty element in $PATH meaning the current directory. If no expansions happen, the value is left unchanged.

If a file is given instead, it is a sequence of such arguments, one per line. Empty lines are ignored. No comments are interpreted. (This syntax is designed to accept the output of printenv and be easily produced by other simple mechanisms.)

Examples of valid arguments, assuming that environment variable $BAR is set to bar and $UNSET is unset (or set to the empty string):

Line                           Key    Value
FOO=bar                        FOO    bar
FOO=bar=baz                    FOO    bar=baz
FLAGS=-march=foo -mtune=bar    FLAGS  -march=foo -mtune=bar
FLAGS='-march=foo -mtune=bar'  FLAGS  -march=foo -mtune=bar
FOO=$BAR                       FOO    bar
FOO=$BAR:baz                   FOO    bar:baz
FOO=                           FOO    empty string (not unset)
FOO=$UNSET                     FOO    empty string (not unset or $UNSET)
FOO=baz:$UNSET:qux             FOO    baz:qux (not baz::qux)
FOO=:bar:baz::                 FOO    :bar:baz::
FOO=''                         FOO    empty string (not unset)
FOO=''''                       FOO    '' (two single quotes)

Example invalid lines:

Line     Problem
FOO bar  no separator
=bar     key cannot be empty

Example valid lines that are probably not what you want:

Line              Key   Value      Problem
FOO="bar"         FOO   "bar"      double quotes aren’t stripped
FOO=bar # baz     FOO   bar # baz  comments not supported
FOO=bar\tbaz      FOO   bar\tbaz   backslashes are not special
 FOO=bar           FOO  bar        leading space in key
FOO= bar          FOO    bar       leading space in value
$FOO=bar          $FOO  bar        variables not expanded in key
FOO=$BAR baz:qux  FOO   qux        variable BAR baz not set
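
The parsing and expansion rules in this section can be written down compactly. The following Python sketch follows the rules as described in the text and reproduces the example rows above; parse_set_env is a hypothetical name, and this is not ch-run's implementation, so corner cases the text does not cover may behave differently.

```python
def parse_set_env(arg, env, expand=True):
    """Parse one --set-env argument (or one file line) into (key, value).

    env is the mapping consulted for expansion; expand=False corresponds
    to --env-no-expand.
    """
    key, sep, value = arg.partition("=")   # split at the first "="
    if not sep:
        raise ValueError("no separator")
    if key == "":
        raise ValueError("key cannot be empty")
    # Optional single straight quotes around the value are stripped.
    if len(value) >= 2 and value[0] == "'" and value[-1] == "'":
        value = value[1:-1]
    if expand:
        items = []
        for item in value.split(":"):
            if item.startswith("$"):
                item = env.get(item[1:], "")
                if item == "":
                    continue   # unset or empty: drop the item and its colon
            items.append(item)
        value = ":".join(items)
    return key, value


env = {"BAR": "bar"}
for line in ["FOO=$BAR:baz", "FOO=baz:$UNSET:qux", "FOO=''''", "FOO=$BAR baz:qux"]:
    print(repr(line), "->", parse_set_env(line, env))
```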

3.12.5.3. Removing variables with --unset-env

The purpose of --unset-env=GLOB is to remove unwanted environment variables. The argument GLOB is a glob pattern (dialect fnmatch(3) with no flags); all variables with matching names are removed from the environment.

Warning

Because the shell also interprets glob patterns, if any wildcard characters are in GLOB, it is important to put it in single quotes to avoid surprises.

GLOB must be a non-empty string.
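
The effect of --unset-env can be approximated with Python's fnmatch module. This is a sketch only: unset_env is a hypothetical name, Python's fnmatch differs from fnmatch(3) in some details (for example, backslash is not an escape character), and keeping CH_RUNNING reflects the statement elsewhere in this page that it is unaffected by --unset-env.

```python
import fnmatch


def unset_env(env, glob):
    """Return a copy of `env` without the variables whose names match `glob`."""
    if glob == "":
        raise ValueError("GLOB must be a non-empty string")
    return {
        name: value
        for name, value in env.items()
        if name == "CH_RUNNING" or not fnmatch.fnmatchcase(name, glob)
    }


env = {
    "FOO": "bar",
    "SLURM_JOB_ID": "1",
    "SLURM_NNODES": "1",
    "CH_RUNNING": "Weird Al Yankovic",
}
print(sorted(unset_env(env, "SLURM*")))
print(sorted(unset_env(env, "*")))
```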

Example 1: Remove the single environment variable FOO:

$ export FOO=bar
$ env | fgrep FOO
FOO=bar
$ ch-run --unset-env=FOO $CH_TEST_IMGDIR/chtest -- env | fgrep FOO
$

Example 2: Hide from a container the fact that it’s running in a Slurm allocation, by removing all variables beginning with SLURM. You might want to do this to test an MPI program with one rank and no launcher:

$ salloc -N1
$ env | egrep '^SLURM' | wc
   44      44    1092
$ ch-run $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
[... long error message ...]
$ ch-run --unset-env='SLURM*' $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
0: MPI version:
Open MPI v3.1.3, package: Open MPI root@c897a83f6f92 Distribution, ident: 3.1.3, repo rev: v3.1.3, Oct 29, 2018
0: init ok cn001.localdomain, 1 ranks, userns 4026532530
0: send/receive ok
0: finalize ok

Example 3: Clear the environment completely (remove all variables):

$ ch-run --unset-env='*' $CH_TEST_IMGDIR/chtest -- env
$

Note that some programs, such as shells, set some environment variables even if started with no init files:

$ ch-run --unset-env='*' $CH_TEST_IMGDIR/debian9 -- bash --noprofile --norc -c env
SHLVL=1
PWD=/
_=/usr/bin/env
$

3.12.6. Examples

Run the command echo hello inside a Charliecloud container using the unpacked image at /data/foo:

$ ch-run /data/foo -- echo hello
hello

Run an MPI job that can use CMA to communicate:

$ srun ch-run --join /data/foo -- bar

3.13. ch-run-oci

OCI wrapper for ch-run.

3.13.1. Synopsis

$ ch-run-oci OPERATION [ARG ...]

3.13.2. Description

Note

This command is experimental. Features may be incomplete and/or buggy. The quality of code is not yet up to the usual Charliecloud standards, and error handling is poor. Please report any issues you find, so we can fix them!

Open Containers Initiative (OCI) wrapper for ch-run(1). You probably don’t want to run this command directly; it is intended to interface with other software that expects an OCI runtime. The current goal is to support completely unprivileged image building (e.g. buildah --runtime=ch-run-oci) rather than general OCI container running.

Support of the OCI runtime specification is only partial. This is for two reasons. First, it’s an experimental and incomplete feature. More importantly, the philosophy and goals of OCI differ significantly from those of Charliecloud. Key differences include:

  • OCI is designed to run services, while Charliecloud is designed to run scientific applications.

  • OCI containers are persistent things with a complex lifecycle, while Charliecloud containers are simply UNIX processes.

  • OCI expects support for a variety of namespaces, while Charliecloud supports user and mount, no more and no less.

  • OCI expects runtimes to maintain a supervisor process in addition to user processes; Charliecloud has no need for this.

  • OCI expects runtimes to maintain state throughout the container lifecycle in a location independent from the caller.

For these reasons, ch-run-oci is a bit of a kludge, and much of what it does is provide scaffolding to satisfy OCI requirements.

Which OCI features are and are not supported is provided in the rest of this man page, and technical analysis and discussion are in the Contributor’s Guide.

This command supports OCI version 1.0.0 only and fails with an error if other versions are offered.

3.13.3. Operations

All OCI operations are accepted, but some are no-ops or merely scaffolding to satisfy the caller.

3.13.3.1. create

$ ch-run-oci create --bundle DIR --pid-file FILE [--no-new-keyring] CONTAINER_ID

Create a container. Charliecloud does not have separate create and start phases, so this operation only sets up OCI-related scaffolding.

Arguments:

--bundle DIR

Directory containing the OCI bundle. This must be /tmp/buildahYYY, where YYY matches CONTAINER_ID below.

--pid-file FILE

Filename to write the “container” process PID to. Note that for Charliecloud, the process given is fake; see above. This must be DIR/pid, where DIR is given by --bundle.

--no-new-keyring

Ignored. (Charliecloud does not implement session keyrings.)

CONTAINER_ID

String to use as the container ID. This must be buildah-buildahYYY, where YYY matches DIR above.
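
The naming constraints among the three arguments above can be checked as follows. This is a hypothetical validation helper for illustration, not the checks ch-run-oci performs; in particular, allowing any non-empty YYY without a slash is our assumption.

```python
import re


def check_create_args(bundle, pid_file, container_id):
    """Validate the documented relationships and return the shared YYY part."""
    m = re.fullmatch(r"/tmp/buildah([^/]+)", bundle)
    if m is None:
        raise ValueError("bundle must be /tmp/buildahYYY")
    yyy = m.group(1)
    if container_id != "buildah-buildah" + yyy:
        raise ValueError("container ID must be buildah-buildahYYY")
    if pid_file != bundle + "/pid":
        raise ValueError("PID file must be DIR/pid")
    return yyy


print(check_create_args("/tmp/buildah123", "/tmp/buildah123/pid", "buildah-buildah123"))
```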

Unsupported arguments:

--console-socket PATH

UNIX socket to pass pseudoterminal file descriptor. Charliecloud does not support pseudoterminals; fail with an error if this argument is given. For Buildah, redirect its input from /dev/null to prevent it from requesting a pseudoterminal.

3.13.3.2. delete

$ ch-run-oci delete CONTAINER_ID

Clean up the OCI-related scaffolding for specified container.

3.13.3.3. kill

$ ch-run-oci kill CONTAINER_ID

No-op.

3.13.3.4. start

$ ch-run-oci start CONTAINER_ID

Execute the user command specified at create time in a Charliecloud container.

3.13.3.5. state

$ ch-run-oci state CONTAINER_ID

Print the state of the given container on standard output as an OCI-compliant JSON document.

3.13.4. Unsupported OCI features

As noted above, various OCI features are not supported by Charliecloud. We have tried to guess which features would be essential to callers; ch-run-oci fails with an error if these are requested. Otherwise, the request is simply ignored.

We are interested in hearing about scientific-computing use cases for unsupported features, so we can add support for things that are needed.

Our goal is for this man page to be comprehensive: every OCI runtime feature should either work or be listed as unsupported.

Unsupported features that are an error:

  • Pseudoterminals

  • Hooks (prestart, poststart, and prestop)

  • Annotations

  • Joining existing namespaces

  • Intel Resource Director Technology (RDT)

Unsupported features that are ignored:

  • Mounts other than the root filesystem (we do use --no-home)

  • User/group mappings beyond one user mapped to EUID and one group mapped to EGID

  • Disabling prctl(PR_SET_NO_NEW_PRIVS)

  • Root filesystem propagation mode

  • sysctl directives

  • masked and read-only paths (remaining unprivileged protects you)

  • Capabilities

  • rlimits

  • Devices (all devices are inherited from the host)

  • cgroups

  • seccomp

  • SELinux

  • AppArmor

  • Container hostname setting

3.13.5. Environment variables

CH_LOG_FILE

If set, append log chatter to this file, rather than standard error. This is useful for debugging situations where standard error is consumed or lost.

Also sets verbose mode if not already set (equivalent to --verbose).

CH_LOG_FESTOON

If set, prepend PID and timestamp to logged chatter.

CH_RUN_OCI_HANG

If set to the name of a command (e.g., create), sleep indefinitely when that command is invoked. The purpose here is to halt a build so it can be examined and debugged.
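
As a sketch of how these variables might be combined in a debugging session, the settings below log to a file, festoon each line, and hang at create. The log path is a hypothetical choice; any writable file will do.

```shell
# Send log chatter to a file instead of standard error. This also turns
# on verbose mode.
export CH_LOG_FILE=/tmp/ch-run-oci.log   # hypothetical path
# Prepend PID and timestamp to each logged line. The variable only needs
# to be set; the value "yes" is arbitrary.
export CH_LOG_FESTOON=yes
# Sleep indefinitely when the "create" command is invoked.
export CH_RUN_OCI_HANG=create
```

With these exported, start the build as usual; the build should then halt at the first create, leaving the bundle available for inspection.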

3.14. ch-ssh

Run a remote command in a Charliecloud container.

3.14.1. Synopsis

$ CH_RUN_ARGS="NEWROOT [ARG...]"
$ ch-ssh [OPTION...] HOST CMD [ARG...]

3.14.2. Description

Run command CMD in a Charliecloud container on remote host HOST, using the content of environment variable CH_RUN_ARGS as the arguments to ch-run on the remote host.

Note

Words in CH_RUN_ARGS are delimited by spaces only; it is not shell syntax.
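
To illustrate what space-only delimiting implies, the sketch below counts the words obtained by splitting a string on spaces. It imitates the rule stated in the note, not the actual ch-ssh implementation; the helper name count_space_words and the sample paths are hypothetical. Because quotes are not interpreted, a path containing a space cannot be passed as a single word.

```shell
# Hypothetical helper: count the words produced by splitting on spaces
# only, with no quote or escape processing.
count_space_words() {
    # Use a subshell so the IFS and globbing changes do not leak out.
    (
        set -f      # disable globbing so characters like '*' stay literal
        IFS=' '     # split on spaces only
        set -- $1   # unquoted on purpose: field splitting only
        echo "$#"
    )
}

count_space_words '--cd /baz /data/foo'        # prints 3
count_space_words '--cd "/my dir" /data/foo'   # prints 4; quotes do not group
```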

3.14.3. Example

On host bar.example.com, run the command hostname inside a Charliecloud container using the unpacked image at /data/foo with starting directory /baz:

$ hostname
foo
$ export CH_RUN_ARGS='--cd /baz /data/foo'
$ ch-ssh bar.example.com -- hostname
bar

3.15. ch-tar2dir

Unpack an image tarball into a directory.

3.15.1. Synopsis

$ ch-tar2dir TARBALL DIR

3.15.2. Description

Extract the tarball TARBALL into a subdirectory of DIR. TARBALL must contain a Linux filesystem image, e.g. as created by ch-builder2tar, and be compressed with gzip or xz. If TARBALL has no extension, try appending .tar.gz and .tar.xz.

Inside DIR, a subdirectory will be created whose name corresponds to the name of the tarball with .tar.gz or other suffix removed. If such a directory exists already and appears to be a Charliecloud container image, it is removed and replaced. If the existing directory doesn’t appear to be a container image, the script aborts with an error.
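
The mapping from tarball name to image directory can be sketched as follows. This is an illustration rather than the ch-tar2dir source: the function name unpack_dir_for is hypothetical, and only the .tar.gz and .tar.xz suffixes are handled, which is an assumption based on the compression formats named above.

```shell
# Hypothetical helper: print the directory an image tarball would be
# unpacked into.
unpack_dir_for() {
    tarball=$1
    dir=$2
    name=$(basename "$tarball")
    # Strip the compression suffix (assumed set: .tar.gz, .tar.xz).
    case $name in
        *.tar.gz) name=${name%.tar.gz} ;;
        *.tar.xz) name=${name%.tar.xz} ;;
    esac
    printf '%s/%s\n' "$dir" "$name"
}

unpack_dir_for /var/tmp/hello.tar.gz /var/tmp   # prints /var/tmp/hello
```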

Additional arguments:

--help

Print help and exit.

--version

Print version and exit.

Warning

Placing DIR on a shared file system can cause significant metadata load on the file system servers. This can result in poor performance for you and all your colleagues who use the same file system. Please consult your site admin for a suitable location.

3.15.3. Example

$ ls -lh /var/tmp
total 57M
-rw-r-----  1 reidpr reidpr  57M Feb 13 16:14 hello.tar.gz
$ ch-tar2dir /var/tmp/hello.tar.gz /var/tmp
creating new image /var/tmp/hello
/var/tmp/hello unpacked ok
$ ls -lh /var/tmp
total 57M
drwxr-x--- 22 reidpr reidpr 4.0K Feb 13 16:29 hello
-rw-r-----  1 reidpr reidpr  57M Feb 13 16:14 hello.tar.gz

3.16. ch-test

Run some or all of the Charliecloud test suite.

3.16.1. Synopsis

$ ch-test [PHASE] [--scope SCOPE] [ARGS]

3.16.2. Description

Charliecloud comes with a comprehensive test suite that exercises the container workflow itself as well as a few example applications. ch-test coordinates running the test suite.

While the CLI has lots of options, the defaults are reasonable, and bare ch-test will give useful results in a few minutes on single-node, internet-connected systems with a few GB available in /var/tmp.

The test suite requires a few GB (standard scope) or tens of GB (full scope) of storage for test fixtures:

  • Builder storage (e.g., layer cache). This goes wherever the builder puts it.

  • Packed images directory: image tarballs or SquashFS files.

  • Unpacked images directory. Images are unpacked into and then run from here.

  • Filesystem permissions directories. These are used to test that the kernel is enforcing permissions correctly. Note that this exercises the kernel, not Charliecloud, and can be omitted from routine Charliecloud testing.

The first three are created when needed if they don’t exist, while the filesystem permissions fixtures must be created manually, in order to accommodate configurations where sudo is not available via the same login path used for running tests.

The packed and unpacked image directories specified for testing are volatile. The contents of these directories are deleted before the build and run phases, respectively.

In all four cases, when creating directories, only the final path component is created. Parent directories must already exist, i.e., ch-test uses the behavior of mkdir rather than mkdir -p.
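
The difference between the two mkdir behaviors can be seen in a scratch directory. This is a generic demonstration that does not invoke ch-test; the directory names are hypothetical.

```shell
# Work in a throwaway directory.
base=$(mktemp -d)

# Parent exists: creating only the final path component succeeds.
mkdir "$base/img" && echo "created $base/img"

# Parent missing: plain mkdir fails, which is the behavior ch-test uses.
if ! mkdir "$base/missing/img" 2>/dev/null; then
    echo "cannot create $base/missing/img: parent does not exist"
fi

# By contrast, mkdir -p would also create the missing parent.
mkdir -p "$base/other/img" && echo "created $base/other/img"
```

In practice this means a fixture path such as /scratch/$USER/pack requires /scratch/$USER to exist before ch-test is run.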

Some of the tests exercise parallel functionality. If ch-test is run on a single node, multiple cores will be used; if in a Slurm allocation, multiple nodes too.

The subset of tests to run mostly splits along two key dimensions. The phase is which parts of the workflow to run. Different parts of the workflow can be tested on different systems by copying the necessary artifacts between them, e.g. by building images on one system and running them on another. The scope allows trading off thoroughness versus time.

PHASE must be one of the following:

build

Image building and associated functionality, with the selected builder.

run

Running containers and associated functionality. This requires a packed images directory produced by a successful build phase, which can be copied from the build system if it’s not also the run system.

examples

Example applications. Requires an unpacked images directory produced by a successful run phase.

all

Execute phases build, run, and examples, in that order.

mk-perm-dirs

Create the filesystem permissions directories. Requires --perm-dir.

clean

Delete automatically-generated test files, and packed and unpacked image directories.

rm-perm-dirs

Remove the filesystem permissions directories. Requires --perm-dir.

-f, --file FILE[:TEST]

Run only the tests in the given file, which can be an arbitrary .bats file, except for test.bats under examples, where you must specify the corresponding Dockerfile or Build file instead. This is somewhat brittle and typically used for development or debugging. For example, it does not check whether the prerequisites of whatever is in the file are satisfied. Often running build and run first is sufficient, but this varies.

If TEST is also given, then run only the test with that name, skipping the others. The separator is a literal colon. Most test names contain spaces, so you’ll usually need to quote the argument to protect it from the shell.
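
The FILE[:TEST] form can be taken apart with shell parameter expansion, as sketched below. Splitting at the first colon is an assumption, not a statement about how ch-test parses the argument; the function name split_file_arg and the sample file and test names are hypothetical.

```shell
# Hypothetical helper: split FILE[:TEST] at the first colon, setting the
# variables "file" and "test_name".
split_file_arg() {
    arg=$1
    case $arg in
        *:*)
            file=${arg%%:*}        # everything before the first colon
            test_name=${arg#*:}    # everything after the first colon
            ;;
        *)
            file=$arg
            test_name=
            ;;
    esac
}

# Quote the argument so the spaces in the test name survive the shell.
split_file_arg 'foo.bats:some test name'
echo "file=$file test=$test_name"
```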

Scope is specified with:

-s, --scope SCOPE

SCOPE must be one of the following; the default is standard.

  • quick: Most important subset of workflow. Handy for development. Completion time: 1–2 minutes.

  • standard: All tested workflow functionality and a selection of more important examples. Completion time: 5–10 minutes.

  • full: All available tests, including all examples. Completion time, hot cache: 7–15 minutes; cold cache: 1–2 hours.

Additional arguments:

-b, --builder BUILDER

Image builder to use. See ch-build(1) for how the default is selected.

--dry-run

Print summary of what would be tested and then exit.

-h, --help

Print usage and then exit.

--img-dir DIR

Set unpacked images directory to DIR. In a multi-node allocation, this directory must not be shared between nodes. Default: $CH_TEST_IMGDIR if set; otherwise /var/tmp/img.

--pack-dir DIR

Set packed images directory to DIR. Default: $CH_TEST_TARDIR if set; otherwise /var/tmp/pack.
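
The "environment variable if set, otherwise a fixed path" rule used by --img-dir and --pack-dir corresponds to a standard shell idiom. The sketch below shows it for the unpacked images directory. The function name resolve_img_dir is hypothetical, and giving the command-line option priority over the environment variable is an assumption.

```shell
# Hypothetical helper: decide which unpacked images directory to use.
resolve_img_dir() {
    # $1: value given with --img-dir, or empty if the option was absent
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        # ${VAR-default} expands to the default only when VAR is unset.
        printf '%s\n' "${CH_TEST_IMGDIR-/var/tmp/img}"
    fi
}

unset CH_TEST_IMGDIR
resolve_img_dir ''            # prints /var/tmp/img
CH_TEST_IMGDIR=/local/tmp
resolve_img_dir ''            # prints /local/tmp
resolve_img_dir /scratch/img  # prints /scratch/img
```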

--pedantic (yes|no)

Some tests require configurations that are very specific (e.g., being a member of at least two groups) or unusual (e.g., sudo to a non-root group). If yes, then fail if the requirement is not met; if no, then skip. The default is yes for CI environments or people listed in README.md, no otherwise.

If yes and sudo seems to be available, implies --sudo.

--perm-dir DIR

Add DIR to filesystem permission fixture directories; can be specified multiple times. We recommend one such directory per mounted filesystem type whose kernel module you do not trust; e.g., you probably don’t need to test your tmpfses, but out-of-tree filesystems very likely need this.

Implies --sudo. Default: $CH_TEST_PERMDIRS if set; otherwise skip the filesystem permissions tests.

--pack-fmt FMT

Use packed image format FMT (squash or tar).

--sudo

Enable things that require sudo, such as certain privilege escalation tests and creating/removing the filesystem permissions fixtures. Requires generic sudo capabilities. Note that the Docker builder uses sudo docker even without this option.

--lustre DIR

Use DIR for run-phase Lustre tests. Default: $CH_TEST_LUSTREDIR if set; otherwise skip them.

The tests will create, populate, and delete a new subdirectory under DIR, leaving everything else in DIR untouched.

3.16.3. Exit status

Zero if all tests passed; non-zero if any failed. For setup and teardown phases, zero if everything was created or deleted correctly, non-zero otherwise.
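
A wrapper script can branch on this exit status. The following sketch uses the stand-in commands true and false so that it runs without Charliecloud installed; the function name report_status is hypothetical.

```shell
# Hypothetical helper: run a command and report based on its exit status.
report_status() {
    if "$@"; then
        echo "passed"
    else
        echo "failed (exit status $?)"
    fi
}

report_status true    # prints: passed
report_status false   # prints: failed (exit status 1)
```

In real use, the command would be something like ch-test run in place of the stand-ins.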

3.16.4. Bugs

Bats will wait until all descendant processes finish before exiting, so if you get into a failure mode where a test sequence doesn’t clean up all its processes, ch-test will hang.

3.16.5. Examples

Many systems can simply use the defaults. To run the build, run, and examples phases on a single system, without the filesystem permissions tests:

$ ch-test
ch-test version 0.12

ch-run: 0.12 /usr/local/bin/ch-run
bats:   0.4.0 /usr/bin/bats
tests:  /usr/local/libexec/charliecloud/test

phase:                build run examples
scope:                standard (default)
builder:              docker (default)
use generic sudo:     no (default)
unpacked images dir:  /var/tmp/img (default)
packed images dir:    /var/tmp/tar (default)
fs permissions dirs:  skip (default)

checking namespaces ...
ok

checking builder ...
found: /usr/bin/docker 19.03.2

bats build.bats build_auto.bats build_post.bats
 ✓ documentation seems sane
 ✓ version number seems sane
[...]
All tests passed.

The next example is for a more complex setup like you might find in HPC centers:

  • Non-default fixture directories.

  • Non-default scope.

  • Different build and run systems.

  • Run the filesystem permissions tests.

Output has been omitted.

(mybox)$ ssh hpc-admin
(hpc-admin)$ ch-test mk-perm-dirs --perm-dir /scratch/$USER/perms \
                                  --perm-dir /home/$USER/perms
(hpc-admin)$ exit
(mybox)$ ch-test build --scope full
(mybox)$ scp -r /var/tmp/pack hpc:/scratch/$USER/pack
(mybox)$ ssh hpc
(hpc)$ salloc -N2
(cn001)$ export CH_TEST_TARDIR=/scratch/$USER/pack
(cn001)$ export CH_TEST_IMGDIR=/local/tmp
(cn001)$ export CH_TEST_PERMDIRS="/scratch/$USER/perms /home/$USER/perms"
(cn001)$ export CH_TEST_SCOPE=full
(cn001)$ ch-test run
(cn001)$ ch-test examples

3.17. ch-umount

Unmount a FUSE-mounted SquashFS filesystem and remove the mount point.

3.17.1. Synopsis

$ ch-umount MOUNTDIR

3.17.2. Description

Unmount the Charliecloud SquashFS file mounted at directory MOUNTDIR. After a successful unmount, remove the now-empty MOUNTDIR.

Additional arguments:

--help

Print help and exit.

--version

Print version and exit.

3.17.3. Example

$ ls /var/tmp/debian
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr  WEIRD_AL_YANKOVIC
$ ch-umount /var/tmp/debian
unmounted and removed /var/tmp/debian
$ ls /var/tmp/debian
ls: cannot access /var/tmp/debian: No such file or directory