Run a command in a Charliecloud container.
$ ch-run [OPTION...] IMAGE -- CMD [ARG...]
Run command CMD in a fully unprivileged Charliecloud container using the image specified by IMAGE, which can be: (1) a path to a directory, (2) the name of an image in ch-image storage (e.g. example.com:5050/foo) or, if the proper support is enabled, a SquashFS archive.
ch-run does not use any setuid or setcap helpers, even for
mounting SquashFS images with FUSE.
Bind-mount SRC at guest DST. The default destination if not specified is to use the same path as the host; i.e., the default is --bind=SRC:SRC. Can be repeated.
If --write is given and DST does not exist, it will be created as an empty directory. However, DST must be entirely within the image itself; DST cannot enter a previous bind mount. For example, --bind /foo:/tmp/foo will fail because /tmp is shared with the host via bind-mount (unless $TMPDIR is set to something else or the container’s /tmp is a private tmpfs; see below).
Most images do have ten directories /mnt/[0-9] already available as mount points.
Symlinks in DST are followed, and absolute links can have surprising behavior. Bind-mounting happens after namespace setup but before pivoting into the container image, so absolute links use the host root. For example, suppose the image has a symlink /foo -> /mnt. Then, --bind=/bar:/foo will bind-mount on the host’s /mnt, which is inaccessible on the host because namespaces are already set up and also inaccessible in the container because of the subsequent pivot into the image. Currently, this problem is only detected when DST needs to be created: ch-run will refuse to follow absolute symlinks in this case, to avoid directory creation surprises.
Initial working directory in container.
Bind ch-ssh(1) into container at /usr/bin/ch-ssh.
Don’t expand variables when using --set-env.
Run as group
Bind-mount your host home directory (i.e., $HOME) at guest /home/$USER. This is accomplished by over-mounting a new /home, which hides any image content under that path. By default, neither of these things happens and the image’s /home is exposed unaltered.
Use the same container (namespaces) as peer ch-run invocations.
Join the namespaces of an existing process.
Number of ch-run peers (implies --join; default: see below).
Label for ch-run peer group (implies --join; default: see below).
Use DIR for the SquashFS mount point, which must already exist. If not specified, the default is /var/tmp/$USER.ch/mnt, which will be created if needed.
By default, temporary /etc/passwd and /etc/group files are created according to the UID and GID maps for the container and bind-mounted into it. If this is specified, no such temporary files are created and the image’s files are exposed.
Set the storage directory. Equivalent to the same option for ch-image(1).
Using seccomp, intercept some system calls that would fail due to lack of privilege, do nothing, and return fake success to the calling program. This is intended for use by ch-image(1) when building images; see that man page for a detailed discussion.
By default, the host’s /tmp (or $TMPDIR if set) is bind-mounted at container /tmp. If this is specified, a new tmpfs is mounted on the container’s /tmp instead.
Set environment variable(s). With:
no argument: as listed in file /ch/environment within the image. It is an error if the file does not exist or cannot be read. (Note that with SquashFS images, it is not currently possible to use other files within the image.)
FILE (i.e., no equals in argument): as specified in file at host path FILE. Again, it is an error if the file cannot be read.
NAME=VALUE (i.e., equals sign in argument): set variable NAME to VALUE.
See below for details on how environment variables work in ch-run.
Run as user
Enable various unsafe behavior. For internal use only. Seriously, stay away from this option.
Unset environment variables whose names match GLOB.
Be more verbose (can be repeated).
Mount image read-write (by default, the image is mounted read-only).
Print help and exit.
Print a short usage message and exit.
Print version and exit.
Because ch-run is fully unprivileged, it is not possible to change UIDs and GIDs within the container (the relevant system calls fail). In particular, setuid, setgid, and setcap executables do not work. As a precaution, ch-run calls prctl(PR_SET_NO_NEW_PRIVS, 1) to disable these executables within the container. This does not reduce functionality but is a “belt and suspenders” precaution to reduce the attack surface should bugs in these system calls or elsewhere arise.
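The effect of prctl(PR_SET_NO_NEW_PRIVS, 1) can be observed from any unprivileged process on Linux. The following is a minimal sketch, not ch-run's code: the function name is hypothetical, the option numbers are taken from <linux/prctl.h>, and the fork is there only so the calling process keeps its own flag unchanged.

```python
import ctypes
import os

# Option numbers from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39


def no_new_privs_in_child():
    """Set no_new_privs in a forked child and return the flag it reads back.

    Hypothetical helper for illustration; Linux only.
    """
    pid = os.fork()
    if pid == 0:
        status = 2  # reported if the prctl(2) call is unavailable or fails
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            zero = ctypes.c_ulong(0)
            one = ctypes.c_ulong(1)
            if libc.prctl(PR_SET_NO_NEW_PRIVS, one, zero, zero, zero) == 0:
                # Once set, the flag is inherited by children and cannot be
                # cleared, so setuid/setgid/setcap bits stop granting privilege.
                status = libc.prctl(PR_GET_NO_NEW_PRIVS, zero, zero, zero, zero)
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status)
```

A return value of 1 means the child saw the flag set; 2 means the call could not be made on this system.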
7.3. Image format
ch-run supports two different image formats.
The first is a simple directory that contains a Linux filesystem tree. This can be accomplished by:
ch-image or another builder to a directory.
Charliecloud’s tarball workflow: build or pull the image, ch-convert it to a tarball, transfer the tarball to the target system, then ch-convert the tarball to a directory.
Manually mount a SquashFS image, e.g. with squashfuse(1) and then un-mount it after run with
Any other workflow that produces an appropriate directory tree.
The second is a SquashFS image archive mounted internally by ch-run,
available if it’s linked with the optional
ch-run mounts the image filesystem, services all FUSE
requests, and unmounts it, all within
See the mount point option above to set the mount point location.
Like other FUSE implementations, Charliecloud calls the
utility to mount the SquashFS filesystem. However, this executable does not
need to be installed setuid root, and in fact
suppresses its setuid bit if set (using
Prior versions of Charliecloud provided wrappers for the
squashfuse_ll SquashFS mount commands and
unmount command. We removed these because we concluded they had minimal
value-add over the standard, unwrapped commands.
Currently, Charliecloud unmounts the SquashFS filesystem when user command
CMD’s process exits. It does not monitor any of its child
processes. Therefore, if the user command spawns child processes and then
exits before them (e.g., some daemons), those children will have the image
unmounted from underneath them. In this case, the workaround is to
mount/unmount using external tools. We expect to remove this limitation in
a future version.
7.4. Host files and directories available in container via bind mounts
In addition to any directories specified by the user with --bind,
ch-run has standard host files and directories that are bind-mounted
in as well.
The following host files and directories are bind-mounted at the same location in the container. These give access to the host’s devices and various kernel facilities. (Recall that Charliecloud provides minimal isolation and containerized processes are mostly normal unprivileged processes.) They cannot be disabled and are required; i.e., they must exist both on host and within the image.
Optional; bind-mounted only if path exists on both host and within the image, without error or warning if not.
/etc/resolv.conf. Because Charliecloud containers share the host network namespace, they need the same hostname resolution configuration.
/etc/machine-id. Provides a unique ID for the OS installation; matching the host works for most situations. Needed to support D-Bus, some software licensing situations, and likely other use cases. See also issue #1050.
/var/opt/cray/alps/spool. These support Cray MPI.
/usr/bin/ch-ssh. SSH wrapper that automatically containerizes after connecting.
Additional bind mounts are done by default but can be disabled; see the options above.
/home is hidden). Makes user data and init files available.
/tmp (or $TMPDIR if set) at guest /tmp. Provides a temporary directory that persists between container runs and is shared with non-containerized application components.
temporary files at /etc/passwd and /etc/group. Usernames and group names need to be customized for each container run.
7.5. Multiple processes in the same container with --join
By default, different
ch-run invocations use different user and mount
namespaces (i.e., different containers). While this has no impact on sharing
most resources between invocations, there are a few important exceptions.
ptrace(2), used by debuggers and related tools. One can attach a debugger to processes in descendant namespaces, but not sibling namespaces. The practical effect of this is that (without --join), you can’t run a command with ch-run and then attach to it with a debugger also run with ch-run.
Cross-memory attach (CMA) is used by cooperating processes to communicate by simply reading and writing one another’s memory. This is also not permitted between sibling namespaces. This affects various MPI implementations that use CMA to pass messages between ranks on the same node, because it’s faster than traditional shared memory.
--join is designed to address this by placing related
commands (the “peer group”) in the same container. This is done by one of the
peers creating the namespaces with
unshare(2) and the others joining them.
To do so, we need to know the number of peers and a name for the group. These are specified by additional arguments that can (hopefully) be left at default values in most cases:
--join-ct sets the number of peers. The default is the value of the first of the following environment variables that is defined:
--join-tag sets the tag that names the peer group. The default is environment variable SLURM_STEP_ID, if defined; otherwise, the PID of ch-run’s parent. Tags can be re-used for peer groups that start at different times, i.e., once all peer ch-run have replaced themselves with the user command, the tag can be re-used.
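The default tag rule described above can be sketched as follows. This is an illustration only, assuming "defined" means the variable is present in the environment; the function name and its parameters are hypothetical and exist so the rule can be exercised without Slurm.

```python
import os


def default_join_tag(environ=None, parent_pid=None):
    """Return the peer-group tag used when --join-tag is not given.

    Hypothetical helper: SLURM_STEP_ID wins if defined, otherwise the
    PID of the parent process is used.
    """
    environ = os.environ if environ is None else environ
    if "SLURM_STEP_ID" in environ:
        return environ["SLURM_STEP_ID"]
    return str(os.getppid() if parent_pid is None else parent_pid)
```

Because all ranks launched by one srun step share SLURM_STEP_ID, and all commands started from one shell share a parent PID, peers usually agree on the tag without any explicit argument.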
One cannot currently add peers after the fact, for example, if one decides to start a debugger after the fact. (This is only required for code with bugs and is thus an unusual use case.)
ch-run instances race. The winner of this race sets up the namespaces, and the other peers use the winner to find the namespaces to join. Therefore, if the user command of the winner exits, any remaining peers will not be able to join the namespaces, even if they are still active. There is currently no general way to specify which ch-run should be the winner.
If --join-ct is too high, the winning ch-run’s user command exits before all peers join, or ch-run itself crashes, IPC resources such as semaphores and shared memory segments will be leaked. These appear as files in /dev/shm/ and can be removed with rm(1).
Many of the arguments given to the race losers, such as the image path and
--bind, will be ignored in favor of what was given to the winner.
7.6. Environment variables
ch-run leaves environment variables unchanged, i.e. the host
environment is passed through unaltered, except:
limited tweaks to avoid significant guest breakage;
user-set variables via --set-env; and
user-unset variables via --unset-env.
This section describes these features.
The default tweaks happen first, then --set-env and
--unset-env in the order specified on the command line, and then
CH_RUNNING. The two options can be repeated arbitrarily many times,
e.g. to add/remove multiple variable sets or add only some variables in a
file.
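Because the two options are applied in command-line order, the same options in a different order can give a different environment. The sketch below illustrates only that ordering; it is not ch-run's implementation. The function name and the ("set", ...) / ("unset", ...) tuples are hypothetical, the default tweaks are left out, and Python's fnmatch stands in for the glob matching.

```python
import fnmatch


def apply_env_options(host_env, options):
    """Apply set/unset operations in order, then mark the container.

    options is a list such as [("unset", "*"), ("set", "FOO=bar")],
    a hypothetical stand-in for --unset-env and --set-env=NAME=VALUE.
    """
    env = dict(host_env)  # the default tweaks would be applied here first
    for kind, arg in options:
        if kind == "set":
            name, _, value = arg.partition("=")
            env[name] = value
        elif kind == "unset":
            env = {k: v for k, v in env.items()
                   if not fnmatch.fnmatchcase(k, arg)}
        else:
            raise ValueError("unknown operation: " + kind)
    # Set last, so no earlier unset can remove it.
    env["CH_RUNNING"] = "Weird Al Yankovic"
    return env
```

Clearing the environment and then setting FOO keeps FOO, while setting FOO and then clearing removes it again; CH_RUNNING survives in both cases.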
7.6.1. Default behavior
ch-run makes the following environment variable changes:
$CH_RUNNING. Set to Weird Al Yankovic. While a process can figure out that it’s in an unprivileged container and what namespaces are active without this hint, that can be messy, and there is no way to tell that it’s a Charliecloud container specifically. This variable makes such a test simple and well-defined. (Note: This variable is unaffected by --unset-env.)
$HOME. If --home is specified, then your home directory is bind-mounted into the guest at /home/$USER. If you also have a different home directory path on the host, an inherited $HOME will be incorrect inside the guest, which confuses lots of software, notably Spack. Thus, with --home, $HOME is set to /home/$USER (by default, it is unchanged).
$PATH. Newer Linux distributions replace some root-level directories, such as /bin, with symlinks to their counterparts in /usr. Some of these distributions (e.g., Fedora 24) have also dropped /bin from the default $PATH. This is a problem when the guest OS does not have a merged /usr (e.g., Debian 8 “Jessie”). Thus, we add /bin to $PATH if it’s not already present.
$TMPDIR. Unset, because this is almost certainly a host path, and that host path is made available in the guest at /tmp by default.
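The default changes listed above, other than CH_RUNNING, can be sketched as a function over a copy of the host environment. This is an illustration under stated assumptions, not ch-run's code: the function name is hypothetical, /bin is appended (the text does not say where in $PATH it is added), and the variable that gets unset is taken to be TMPDIR.

```python
def default_tweaks(host_env, home=False):
    """Return a copy of host_env with the default guest tweaks applied.

    Hypothetical helper; home=True corresponds to passing --home.
    """
    env = dict(host_env)
    # With --home, the home directory is mounted at /home/$USER.
    if home and "USER" in env:
        env["HOME"] = "/home/" + env["USER"]
    # Add /bin for guests without a merged /usr (assumption: appended).
    path_items = env["PATH"].split(":") if env.get("PATH") else []
    if "/bin" not in path_items:
        path_items.append("/bin")
    env["PATH"] = ":".join(path_items)
    # Assumption: TMPDIR is the host path that is dropped.
    env.pop("TMPDIR", None)
    return env
```

A $PATH that already contains /bin is left exactly as it was, so the tweak is harmless on hosts that never dropped it.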
7.6.2. Setting variables with --set-env
The purpose of
--set-env is to set environment variables within the
container. Values given replace any already in the environment (i.e.,
inherited from the host shell) or set by earlier
--set-env. This flag
takes an optional argument with two possible forms:
If the argument contains an equals sign (=, ASCII 61), that sets an environment variable directly. For example, to set FOO to the string value bar:
$ ch-run --set-env=FOO=bar ...
Single straight quotes around the value (', ASCII 39) are stripped, though be aware that both single and double quotes are also interpreted by the shell. For example, this example is similar to the prior one; the double quotes are removed by the shell and the single quotes are removed by ch-run:
$ ch-run --set-env="BAZ='qux'" ...
If the argument does not contain an equals sign, it is a host path to a file containing zero or more variables using the same syntax as above (except with no prior shell processing). This file contains a sequence of assignments separated by newlines. Empty lines are ignored, and no comments are interpreted. (This syntax is designed to accept the output of printenv and be easily produced by other simple mechanisms.) For example:
$ cat /tmp/env.txt
FOO=bar
BAZ='qux'
$ ch-run --set-env=/tmp/env.txt ...
For directory images only (because the file is read before containerizing), guest paths can be given by prepending the image path.
If there is no argument, the file /ch/environment within the image is used. This file is commonly populated by ENV instructions in the Dockerfile. For example, equivalently to form 2:
$ cat Dockerfile
[...]
ENV FOO=bar
ENV BAZ=qux
[...]
$ ch-image build -t foo .
$ ch-convert foo /var/tmp/foo.sqfs
$ ch-run --set-env /var/tmp/foo.sqfs -- ...
(Note the image path is interpreted correctly, not as the --set-env argument.)
At present, there is no way to use files other than /ch/environment within SquashFS images.
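The file syntax described above is simple enough to sketch in a few lines. The following is an illustration of the stated rules only (split at the first equals sign, skip empty lines, strip one pair of single quotes around the value, no comments, no trimming), not ch-run's parser; the function name is hypothetical.

```python
def parse_env_lines(text):
    """Parse NAME=VALUE lines into a list of (name, value) pairs.

    Hypothetical helper following the rules in the text above.
    """
    assignments = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if line == "":
            continue  # empty lines are ignored
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError("line %d: no equals separator" % lineno)
        if name == "":
            raise ValueError("line %d: name cannot be empty" % lineno)
        # Only single straight quotes around the whole value are stripped;
        # double quotes, backslashes, and '#' are kept literally.
        if len(value) >= 2 and value[0] == "'" and value[-1] == "'":
            value = value[1:-1]
        assignments.append((name, value))
    return assignments
```

Applied to the contents of /tmp/env.txt from the example, this yields FOO=bar and BAZ=qux, while a value written with double quotes keeps them.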
Environment variables are expanded for values that look like search paths,
unless --env-no-expand is given prior to --set-env. In this
case, the value is a sequence of zero or more possibly-empty items separated
by colon (:, ASCII 58). If an item begins with dollar sign ($,
ASCII 36), then the rest of the item is the name of an environment variable.
If this variable is set to a non-empty value, that value is substituted for
the item; otherwise (i.e., the variable is unset or the empty string), the
item is deleted, including a delimiter colon. The purpose of omitting empty
expansions is to avoid surprising behavior such as an empty element in
$PATH meaning the current directory.
For example, to set HOSTPATH to the search path in the current shell (this is expanded by ch-run, though letting the shell do it happens to be equivalent):
$ ch-run --set-env='HOSTPATH=$PATH' ...
To prepend /opt/bin to this current search path:
$ ch-run --set-env='PATH=/opt/bin:$PATH' ...
To prepend /opt/bin to the search path set by the Dockerfile, as retrieved from guest file /ch/environment (here we really cannot let the shell expand $PATH):
$ ch-run --set-env --set-env='PATH=/opt/bin:$PATH' ...
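The expansion rule for search-path values can be sketched as a short function. This is an illustration of the rule as described in this section, not ch-run's code; the function name is hypothetical and the environment is passed in as a plain dictionary.

```python
def expand_path_value(value, env):
    """Expand $NAME items in a colon-separated value.

    Hypothetical helper: an item starting with '$' is replaced by the
    named variable if that is set and non-empty, and dropped otherwise.
    """
    items = []
    for item in value.split(":"):
        if item.startswith("$"):
            replacement = env.get(item[1:], "")
            if replacement:
                items.append(replacement)
            # Unset or empty: the item and its delimiter colon disappear.
        else:
            items.append(item)  # literal items, even empty ones, are kept
    return ":".join(items)
```

Dropping unset items is what keeps /opt/bin:$UNSET from turning into /opt/bin: with a trailing empty element, which many programs would treat as the current directory.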
Examples of valid assignment, assuming that environment variable
is set to
UNSET is unset or set to the empty string:
Example invalid assignments:
no equals separator
name cannot be empty
Example valid assignments that are probably not what you want:
double quotes aren’t stripped
comments not supported
backslashes are not special
leading space in key
leading space in value
variables not expanded in key
7.6.3. Removing variables with --unset-env
The purpose of
--unset-env=GLOB is to remove unwanted environment
variables. The argument
GLOB is a glob pattern (dialect fnmatch(3) with the
FNM_EXTMATCH flag where supported); all variables with
matching names are removed from the environment.
Because the shell also interprets glob patterns, if any wildcard characters
are in GLOB, it is important to put it in single quotes to avoid expansion by
the shell. GLOB must be a non-empty string.
Example 1: Remove the single environment variable FOO:
$ export FOO=bar
$ env | fgrep FOO
FOO=bar
$ ch-run --unset-env=FOO $CH_TEST_IMGDIR/chtest -- env | fgrep FOO
$
Example 2: Hide from a container the fact that it’s running in a Slurm
allocation, by removing all variables beginning with
SLURM. You might
want to do this to test an MPI program with one rank and no launcher:
$ salloc -N1
$ env | egrep '^SLURM' | wc
44 44 1092
$ ch-run $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
[... long error message ...]
$ ch-run --unset-env='SLURM*' $CH_TEST_IMGDIR/mpihello-openmpi -- /hello/hello
0: MPI version:
Open MPI v3.1.3, package: Open MPI root@c897a83f6f92 Distribution, ident: 3.1.3, repo rev: v3.1.3, Oct 29, 2018
0: init ok cn001.localdomain, 1 ranks, userns 4026532530
0: send/receive ok
0: finalize ok
Example 3: Clear the environment completely (remove all variables):
$ ch-run --unset-env='*' $CH_TEST_IMGDIR/chtest -- env
$
Example 4: Remove all environment variables except for those prefixed with either WANTED_ or ALSO_WANTED_:
$ export WANTED_1=yes
$ export ALSO_WANTED_2=yes
$ export NOT_WANTED_1=no
$ ch-run --unset-env='!(WANTED_*|ALSO_WANTED_*)' $CH_TEST_IMGDIR/chtest -- env
WANTED_1=yes
ALSO_WANTED_2=yes
$
Note that some programs, such as shells, set some environment variables even if started with no init files:
$ ch-run --unset-env='*' $CH_TEST_IMGDIR/debian_9ch -- bash --noprofile --norc -c env
SHLVL=1
PWD=/
_=/usr/bin/env
$
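The matching in the examples above can be approximated outside ch-run. This sketch is a simplification, not the real matcher: Python's fnmatch has no equivalent of FNM_EXTMATCH, so the code handles only a pattern that consists entirely of one !(A|B|...) group, as in Example 4, and treats everything else as an ordinary glob. Both function names are hypothetical.

```python
import fnmatch


def name_matches(name, glob):
    """Return True if the variable name matches the glob.

    Hypothetical helper; supports plain globs plus a whole-pattern
    !(A|B|...) negation, which is a small subset of extended globs.
    """
    if glob == "":
        raise ValueError("GLOB must be a non-empty string")
    if glob.startswith("!(") and glob.endswith(")"):
        alternatives = glob[2:-1].split("|")
        return not any(fnmatch.fnmatchcase(name, alt) for alt in alternatives)
    return fnmatch.fnmatchcase(name, glob)


def unset_env(env, glob):
    """Return a copy of env without the variables whose names match glob."""
    return {k: v for k, v in env.items() if not name_matches(k, glob)}
```

Note that the pattern is matched against the whole variable name, so WANTED_* does not match NOT_WANTED_1.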
Run the command echo hello inside a Charliecloud container using the unpacked image at /data/foo:
$ ch-run /data/foo -- echo hello
hello
Run an MPI job that can use CMA to communicate:
$ srun ch-run --join /data/foo -- bar
ch-run logs its command line to syslog. (This can be disabled by configuring
with --disable-syslog.) This includes: (1) the invoking real UID, (2)
the number of command line arguments, and (3) the arguments, separated by
spaces. For example:
Dec 10 18:19:08 mybox ch-run: uid=1000 args=7: ch-run -v /var/tmp/00_tiny -- echo hello "wor l}\$d"
Logging is one of the first things done during program initialization, even before command line parsing. That is, almost all command lines are logged, even if erroneous, and there is no logging of program success or failure.
Arguments are serialized with the following procedure. The purpose is to provide a human-readable reconstruction of the command line while also allowing each argument to be recovered byte-for-byte.
If an argument contains only printable ASCII bytes that are not whitespace, shell metacharacters, double quote (", ASCII 34 decimal), or backslash (\, ASCII 92), then log it unchanged.
Otherwise, (a) enclose the argument in double quotes and (b) backslash-escape double quotes, backslashes, and characters interpreted by Bash (including POSIX shells) within double quotes.
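The two-step procedure above can be sketched as follows. This is an illustration, not ch-run's serializer, and it rests on several assumptions: the metacharacter set is Bash's own definition, the characters escaped inside double quotes are the double quote, backslash, dollar sign, and backtick, and an empty argument is logged as a pair of double quotes so it stays recoverable. All names are hypothetical.

```python
# Assumption: Bash's definition of metacharacters (besides whitespace).
SHELL_METACHARACTERS = set("|&;()<>")
# Assumption: characters Bash still interprets inside double quotes.
ESCAPE_IN_DOUBLE_QUOTES = set('"\\$`')


def needs_quoting(arg):
    """Return True if the argument cannot be logged unchanged (step 1)."""
    for ch in arg:
        printable = 0x21 <= ord(ch) <= 0x7E  # printable ASCII except space
        if not printable or ch in SHELL_METACHARACTERS or ch in '"\\':
            return True
    return False


def serialize_arg(arg):
    """Serialize one argument following the two steps in the text."""
    if arg != "" and not needs_quoting(arg):
        return arg
    escaped = "".join(
        "\\" + ch if ch in ESCAPE_IN_DOUBLE_QUOTES else ch for ch in arg
    )
    return '"' + escaped + '"'


def serialize_command_line(argv):
    """Join the serialized arguments with single spaces."""
    return " ".join(serialize_arg(arg) for arg in argv)
```

With these assumptions, the seven arguments from the log example above serialize to the same text that appears after args=7: in that line.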
The verbatim command line typed in the shell cannot be recovered, because not enough information is provided to UNIX programs. For example, echo  'foo' is given to programs as a sequence of two arguments, echo and foo; the two spaces and single quotes are removed by the shell. The zero byte, ASCII NUL, cannot appear in arguments because it would terminate the string.
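The loss of spacing and quoting can be reproduced without a shell. The sketch below uses Python's shlex module, which splits a string using POSIX shell quoting rules; it approximates what a shell hands to a program and is not what ch-run does internally. The function name is hypothetical.

```python
import shlex


def argv_as_seen_by_program(command_line):
    """Split a typed command line the way a POSIX shell would.

    Hypothetical helper; the result is all a program ever receives.
    """
    return shlex.split(command_line)
```

Both echo  'foo' (two spaces, quoted) and echo foo produce the same two-element list, which is why the log can reconstruct an equivalent command line but not the original keystrokes.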
7.9. Exit status
If there is an error during containerization,
ch-run exits with a non-zero
status. If the user command is started successfully, the exit status is that
of the user command, with one exception: if the image is an internally mounted
SquashFS filesystem and the user command is killed by a signal, the exit
status is 1 regardless of the signal value.