[Docker] Working with Docker Images

Every Linux container is based on an image, which serves as the blueprint for what becomes a running container. Docker or Open Container Initiative (OCI) images form the foundation for everything you deploy and run with Docker. To launch a container, you either download a public image or create your own. An image essentially represents the filesystem for the container, though it is actually composed of linked filesystem layers, one corresponding to each build step.

Because images are built from individual layers, they place unique demands on the Linux kernel, which must provide the drivers that Docker's storage backend needs. Docker relies heavily on this backend for image management, communicating with the underlying Linux filesystem to create and manage the layers that combine into a single usable image. The primary supported storage backends include Overlay2, B-Tree File System (Btrfs), and Device Mapper, each of which provides a fast copy-on-write system.
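
You can check which backend your Docker server is using with docker info; on most modern installations, the following command reports overlay2:

$ docker info --format '{{ .Driver }}'
overlay2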

Understanding Dockerfile

To craft a custom Docker image with the default tooling, you need to be familiar with the Dockerfile. This file describes the steps required to build an image and typically lives in the root directory of your application's source code repository.

A typical Dockerfile might resemble the one below, creating a Node.js-based application container:

FROM node:18.13.0

ARG email="[email protected]"
LABEL "maintainer"=$email
LABEL "rating"="Five Stars" "class"="First Class"

USER root

ENV AP /data/app
ENV SCPATH /etc/supervisor/conf.d

RUN apt-get -y update

# Install daemons
RUN apt-get -y install supervisor
RUN mkdir -p /var/log/supervisor

# Configure Supervisor
COPY ./supervisord/conf.d/* $SCPATH/

# Application Code
COPY *.js* $AP/

WORKDIR $AP

RUN npm install

CMD ["supervisord", "-n"]

This Dockerfile illustrates several instructions for controlling image assembly. Each line in a Dockerfile creates a new image layer, containing the changes resulting from that command. Consequently, when building new images, Docker only needs to build layers that deviate from previous builds.

While you could build a Node instance from a plain Linux base image, Docker Hub offers official Node images that save you the trouble. The ARG instruction sets build variables and their default values, which are available only during the image build process.

Applying labels to images and containers lets you attach metadata as key/value pairs that can later be used to identify and search for them. Here, the "maintainer" label uses the value of the email build argument defined earlier in the Dockerfile.
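
Once the image is built (see "Building an Image" below), you can read these labels back with docker image inspect:

$ docker image inspect example/docker-node-hello:latest \
    --format '{{ json .Config.Labels }}'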

The USER instruction changes the user that the container runs as. Although the image here sticks with root, running production containers as a non-root user is a security best practice.

The ENV instruction sets shell variables that can be used to configure the application both at build time and at runtime. Used well, they keep the Dockerfile simple and help avoid repetition.

The next section of the Dockerfile uses RUN instructions to create the required file structure, install the necessary dependencies, and configure the daemons.

The COPY instruction copies files from the local filesystem into the image, most often application code and support files. Here it takes advantage of the environment variables defined earlier to avoid hardcoding paths.

The WORKDIR instruction changes the working directory in the image, both for the remaining build instructions and for the default process launched in resulting containers.

The order of commands in a Dockerfile has a significant impact on ongoing build times: put the steps that change most frequently toward the end, so that earlier, slower steps can be served from the cache.

Finally, the CMD instruction defines the command that launches the process you want to run within the container. The general recommendation is a single process per container, which keeps the architecture simple.

Building an Image

To initiate image building, clone a Git repo containing an example application, such as docker-node-hello. Ensure Docker server and client communication is operational before building. Use the following command to clone the repo:

$ git clone https://github.com/spkane/docker-node-hello.git \
    --config core.autocrlf=input

This downloads a working Dockerfile and related source code into the docker-node-hello directory. The .dockerignore file, alongside the Dockerfile, defines files and directories excluded from the image build, enhancing efficiency by ignoring the .git directory.
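
For this repository, excluding .git is the important part; a typical Node.js .dockerignore might look like the following, where the entries beyond .git are common, illustrative additions:

.git
node_modules
npm-debug.log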

Inspecting the repo, you'll find relevant files like Dockerfile, .dockerignore, index.js, package.json, and a supervisord directory containing configuration files.

With the Dockerfile and related source code available, build the image using Docker's build command. Each step in the build process maps to a line in the Dockerfile, creating new image layers. Subsequent builds should be quicker after the initial image download.

$ docker image build -t example/docker-node-hello:latest .
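
After the build completes, you can confirm the one-layer-per-instruction relationship by listing the image's layers along with the Dockerfile step that created each one:

$ docker image history example/docker-node-hello:latest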

To speed up builds, Docker uses a local cache of previously built layers, though stale cache entries can occasionally cause unexpected results. Pass --no-cache to disable caching for a build.
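
For example, to force a full rebuild of every layer:

$ docker image build --no-cache -t example/docker-node-hello:latest .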

If you are building images on a system that runs other concurrent processes, you can limit the resources available to the build using many of the same cgroup methods discussed later for containers. Refer to the official Docker documentation for the full list of build arguments.
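
As a sketch, the classic (pre-BuildKit) builder accepts several of the same cgroup-backed flags as docker container run; the limits below are illustrative values:

$ DOCKER_BUILDKIT=0 docker image build \
    --memory 1g --cpu-quota 50000 \
    -t example/docker-node-hello:latest .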

Running Your Image

After successfully building the image, run it on your Docker host with the following command, which creates a running container and maps port 8080:

$ docker container run --rm -d -p 8080:8080 example/docker-node-hello:latest

This command spawns a container in the background from the example/docker-node-hello:latest image, with port 8080 on the Docker host mapped to port 8080 in the container. Verify the running application by accessing port 8080 on the Docker host via a web browser.
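
If you prefer the command line, curl works just as well; you should see a greeting addressed to "World", the application's default:

$ curl http://127.0.0.1:8080/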

To configure the application via environment variables, stop the existing container and start a new one with the desired variables:

$ docker container stop [container_id]
$ docker container run --rm -d -p 8080:8080 \
    -e WHO="Sean and Karl" \
    example/docker-node-hello:latest

Replace [container_id] with the ID of the existing container. Now, the application should greet "Sean and Karl" instead of the default "World."

Storing Images

Public Registries:

Docker Hub

Docker Hub is a public registry provided by Docker, Inc. It hosts a vast collection of Docker images, including official images for popular software packages like Linux distributions and applications such as WordPress.

To push an image to Docker Hub, you need to first log in using your Docker ID:

$ docker login

Then, tag your image with your Docker Hub username and push it:

$ docker tag your-image your-dockerhub-username/your-image
$ docker push your-dockerhub-username/your-image

Quay.io

Quay.io is another public registry, now owned by Red Hat. It offers similar features to Docker Hub, including the ability to host public and private images.

The process to push an image to Quay.io is similar to Docker Hub:

$ docker login quay.io
$ docker tag your-image quay.io/your-quay-username/your-image
$ docker push quay.io/your-quay-username/your-image

Private Registries:

Harbor

Harbor is a private registry solution that provides features like image verification and a graphical web interface. It's often used by companies that require more control over their image hosting.

To use Harbor, you typically need to set it up on your own infrastructure. Once set up, pushing an image to Harbor is similar to pushing to Docker Hub or Quay.io:

$ docker login harbor.example.com
$ docker tag your-image harbor.example.com/your-project/your-image
$ docker push harbor.example.com/your-project/your-image

Red Hat Quay

Red Hat Quay is another private registry solution with advanced features like image scanning and vulnerability detection.

Similar to Harbor, pushing an image to Red Hat Quay involves logging in and tagging the image correctly:

$ docker login quay.example.com
$ docker tag your-image quay.example.com/your-organization/your-image
$ docker push quay.example.com/your-organization/your-image

Authenticating to a Registry

Docker requires authentication to access private registries. You can authenticate using your username and password:

$ docker login registry.example.com
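
A successful login is persisted in ~/.docker/config.json, which the Docker client consults on subsequent pushes and pulls. A stored entry looks roughly like this, with the auth value, a base64-encoded username:password pair, replaced by a placeholder:

{
  "auths": {
    "registry.example.com": {
      "auth": "<base64-encoded-credentials>"
    }
  }
}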

Running a Private Registry

Setting up a private registry involves several steps:

Prepare SSL Certificates: Create SSL certificates for secure communication with the registry.

Set up Authentication: Configure authentication to restrict access to authorized users.

Deploy Registry: Run the Docker registry container with appropriate configurations.

Here's an example using Docker's official registry image:

# Create SSL certificates
$ mkdir -p certs
$ openssl req -newkey rsa:4096 -nodes -sha256 \
    -keyout certs/domain.key -x509 -days 365 -out certs/domain.crt

# Run the registry container
$ docker run -d -p 5000:5000 --restart=always --name registry \
  -v "$(pwd)"/registry:/var/lib/registry \
  -v "$(pwd)"/certs:/certs \
  -e REGISTRY_HTTP_TLS_CERTIFICATE=/certs/domain.crt \
  -e REGISTRY_HTTP_TLS_KEY=/certs/domain.key \
  registry:2

Pushing Images: Tag and push your images to the private registry:

$ docker tag your-image localhost:5000/your-image
$ docker push localhost:5000/your-image

Pulling Images: Pull images from the private registry:

$ docker pull localhost:5000/your-image
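
One caveat: because the certificate above is self-signed, Docker clients on other hosts will reject it by default. You can either place the certificate at /etc/docker/certs.d/<registry-host>:<port>/ca.crt on each client, or, for testing only, mark the registry as insecure in the daemon's /etc/docker/daemon.json and restart Docker:

{
  "insecure-registries": ["registry.example.com:5000"]
}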

These steps should get you started with running your own private Docker registry. Remember to replace registry.example.com or quay.example.com with your actual registry domain.

Optimizing Images

Working efficiently with Docker involves keeping image sizes small and build times fast. This not only enhances deployment speed but also minimizes resource consumption. Let's explore some key considerations and techniques for achieving these objectives.

Keeping Images Small

In contemporary environments, the size of software downloads may seem trivial. However, in large-scale deployments where software is frequently updated across numerous nodes, the impact of large image sizes becomes evident. Network congestion and slower deployment cycles can hamper production environments.

Many Linux containers inherit from a base image containing a minimal Linux distribution, but this is not mandatory. Containers only need files essential for running the application on the host kernel. Let's illustrate this with a minimal container example.

Example: Minimal Go Web Application Container

Go, being a compiled language, generates statically compiled binary files. Consider a small web application written in Go available on GitHub.

To try out the application, run the following command:

$ docker container run --rm -d -p 8080:8080 spkane/scratch-helloworld

If successful, access the application at http://127.0.0.1:8080 in your web browser.

Now, let's examine the files in this container:

$ docker container ls -l
CONTAINER ID   IMAGE                           COMMAND       CREATED          ...
ddc3f61f311b   spkane/scratch-helloworld       "/helloworld"  4 minutes ago    ...

$ docker container export ddc3f61f311b -o web-app.tar
$ tar -tvf web-app.tar

You'll notice that besides the application binary, most files in the container are either zero-length or critical system files. This emphasizes that containers should only contain what's necessary to run on the underlying kernel.

To achieve smaller images, consider using multistage builds. This approach enables building production containers with minimal resources while ensuring repeatability in the build system.

Multistage Builds Example:

# Build container
FROM docker.io/golang:alpine as builder
RUN apk update && \
    apk add git && \
    CGO_ENABLED=0 go install -a -ldflags '-s' \
    github.com/spkane/scratch-helloworld@latest

# Production container
FROM scratch
COPY --from=builder /go/bin/scratch-helloworld /helloworld
EXPOSE 8080
CMD ["/helloworld"]

This Dockerfile demonstrates a multistage build for creating a minimal production container. The builder stage compiles the Go application, while the scratch stage creates the final lightweight image containing only the necessary binary.
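
To try this yourself, save the file as Dockerfile in an empty directory, then build and run it; the tag used below is just an illustrative local name:

$ docker image build -t scratch-helloworld-local .
$ docker container run --rm -d -p 8080:8080 scratch-helloworld-local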

In practice, multistage builds significantly reduce image size and resource overhead, making them ideal for production environments.

Layers Are Additive

Understanding Docker image layers is crucial. Each layer is strictly additive, meaning once created, its contents cannot be removed. Although files can be shadowed in subsequent layers, earlier layers cannot be made smaller by deleting files in later layers.

While you can squash layers using experimental Docker features, it's essential to consider the trade-offs: squashing can reclaim wasted space within an image, but because squashed images no longer share layers with related images, it can increase the total amount of data that clients download.

The additive nature of image layers is evident when examining the filesystem layers and build steps. Modifications made in subsequent layers do not reduce the size of earlier layers; they only mask or overwrite existing files.
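
A quick way to see this in practice is a throwaway Dockerfile that writes a large file in one layer and deletes it in the next; the final image stays large because the first layer still carries the file (a minimal sketch):

FROM docker.io/fedora

# Layer 1: create a 100 MB file
RUN dd if=/dev/zero of=/tmp/big-file bs=1M count=100

# Layer 2: delete it -- the file is masked in the final image,
# but the first layer keeps its full size
RUN rm /tmp/big-file

Running docker image history on the resulting image will show the roughly 100 MB layer still present in the stack.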

Utilizing the Layer Cache:

FROM docker.io/fedora

# Install Apache web server
RUN dnf install -y httpd && \
    dnf clean all

# Create directories for web content
RUN mkdir -p /var/www && \
    mkdir -p /var/www/html

# Add custom index.html file
ADD index.html /var/www/html

# Start Apache server
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]

In this Dockerfile, commands are ordered to maximize cache utilization: stable, time-consuming steps, like installing Apache and creating the directories, come before the frequently edited index.html file, so a change to index.html invalidates as little of the cache as possible.

Directory Caching:

# syntax=docker/dockerfile:1
FROM python:3.9.15-slim-bullseye

# Create app directory
RUN mkdir /app
WORKDIR /app

# Copy application code
COPY . /app

# Install dependencies using pip with directory caching
RUN --mount=type=cache,target=/root/.cache pip install -r requirements.txt

# Set working directory for application
WORKDIR /app/mastermind

# Define command to run the application
CMD ["python", "mastermind.py"]

Here, we utilize BuildKit's directory caching feature by specifying --mount=type=cache in the RUN command for installing Python dependencies. This mounts a caching layer into the container, speeding up subsequent builds by reusing cached dependencies.

These Dockerfile examples demonstrate effective techniques for optimizing Docker builds, ensuring faster build times and smaller image sizes by intelligently leveraging layer and directory caching mechanisms.

Troubleshooting Broken Builds

When encountering issues with Docker image builds, it's essential to diagnose and resolve them efficiently. Let's delve into the steps for troubleshooting broken builds, along with code examples.

Debugging Pre-BuildKit Images

Suppose you encounter a failed build in a pre-BuildKit environment. In such cases, you can use the intermediate images left behind by the build to isolate and address the problem. Let's illustrate this with an example using the docker-node-hello repository.

First, create a failing build by modifying the Dockerfile: change the line

# Dockerfile
...
RUN apt-get -y update

to the invalid:

# Dockerfile
...
RUN apt-get -y update-all

Now, attempt to build the image:

$ DOCKER_BUILDKIT=0 docker image build -t example/docker-node-hello:latest --no-cache .

You'll encounter an error due to the invalid operation update-all. To troubleshoot, start an interactive container from the image ID that the build output shows for the last successful step:

$ docker container run --rm -ti 2a236efc3f06 /bin/bash

Inside the container, investigate the issue:

root@b83048106b0f:/# apt-get -y update-all
E: Invalid operation update-all

root@b83048106b0f:/# apt-get --help
apt 1.4.9 (amd64)
...

Once you identify the root cause, modify the Dockerfile accordingly and rebuild the image:

$ DOCKER_BUILDKIT=0 docker image build -t example/docker-node-hello:latest .

Debugging BuildKit Images

In BuildKit environments, debugging involves a slightly different approach, because BuildKit does not leave behind an intermediate image for each step. Let's simulate a failed build by modifying the Dockerfile: change the line

# Dockerfile
...
RUN npm install

to the invalid:

# Dockerfile
...
RUN npm installer

Now, attempt to build the image:

$ docker image build -t example/docker-node-hello:debug --no-cache .

You'll encounter an error indicating an invalid command. To troubleshoot, turn the Dockerfile into a multistage build and use the --target argument: name the existing first stage by changing its opening line to FROM node:18.13.0 AS deploy, then insert a new FROM line just before the problematic step:

# Dockerfile
FROM node:18.13.0 AS deploy
...
FROM deploy
RUN npm installer

With the stage named, build only the first stage, which contains everything up to the failing step:

$ docker image build -t example/docker-node-hello:debug --target deploy .

Create a container from the debug image and perform necessary tests:

$ docker container run --rm -ti docker.io/example/docker-node-hello:debug /bin/bash

Once the issue is identified, revert the changes and rebuild the image.

Multiarchitecture Builds

Supporting multiple architectures is crucial in modern computing environments. Docker's buildx plugin simplifies this process. Let's demonstrate building an image for multiple architectures:

$ docker buildx build --platform linux/amd64,linux/arm64 --tag wordchain:test .

This command builds the image for both AMD64 and ARM64 platforms. Verify the built images by running:

$ docker image ls
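
If you push the image to a registry, for example by adding --push to the buildx command, you can confirm that the resulting manifest list covers both platforms; the image name below assumes a hypothetical Docker Hub account:

$ docker buildx imagetools inspect your-username/wordchain:test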

Conclusion

Troubleshooting broken builds is a critical aspect of Docker image development. By understanding the tools and techniques outlined above, developers can efficiently diagnose and resolve issues, ensuring smooth and reliable image builds across different environments.


#Docker