
How does 'kubectl exec' work?

· Erkan Erol

Last Friday, one of my colleagues approached me and asked how to exec a command in a pod with client-go. I didn’t know the answer, and I realized I had never thought about the mechanism behind “kubectl exec”. I had some ideas about how it should work, but I wasn’t 100% sure. I noted the topic down to investigate, and after reading some blogs, docs, and source code I learned a lot. In this blog post, I am going to share my understanding and findings.

Please ping me on Twitter if anything is wrong: https://twitter.com/erkan_erol_

Setup

I cloned https://github.com/ecomm-integration-ballerina/kubernetes-cluster in order to create a k8s cluster on my MacBook. I fixed the IP addresses of the nodes in the kubelet configurations, since the default configuration didn’t let me run kubectl exec. You can find the root cause here.

  • Any machine = my MacBook
  • IP of master node = 192.168.205.10
  • IP of worker node = 192.168.205.11
  • API server port = 6443

Components

  • kubectl exec process: When we run “kubectl exec …” on a machine, a process starts. You can run it on any machine that has access to the k8s API server.

  • api server: Component on the master that exposes the Kubernetes API. It is the front-end for the Kubernetes control plane.

  • kubelet: An agent that runs on each node in the cluster. It makes sure that containers are running in a pod.

  • container runtime: The software that is responsible for running containers. Examples: docker, cri-o, containerd…

  • kernel: the OS kernel on the worker node, responsible for managing processes.

  • target container: A container that is part of a pod and runs on one of the worker nodes.
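
Putting the components together, the path of an exec request looks like this:

// the journey of a single "kubectl exec"
kubectl exec ------> api-server            (HTTP POST + SPDY upgrade)
api-server   ------> kubelet (port 10250)  (HTTP + SPDY upgrade)
kubelet      ------> container runtime     (gRPC over a unix socket, CRI)
container runtime -> kernel                (fork/exec of the requested command)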



Findings

1. Activities in Client Side

  • Create a pod in the default namespace
// any machine
$ kubectl run exec-test-nginx --image=nginx
  • Then exec into it and run sleep 5000 so that we have something to observe
// any machine
$ kubectl exec -it exec-test-nginx-6558988d5-fgxgg -- sh
# sleep 5000
  • We can observe the kubectl process (pid=8507 in this case)
// any machine
$ ps -ef |grep kubectl
501  8507  8409   0  7:19PM ttys000    0:00.13 kubectl exec -it exec-test-nginx-6558988d5-fgxgg -- sh
  • When we check the network activity of the process, we can see that it has connections to the api-server (192.168.205.10:6443)
// any machine
$ netstat -atnv |grep 8507
tcp4       0      0  192.168.205.1.51673    192.168.205.10.6443    ESTABLISHED 131072 131768   8507      0 0x0102 0x00000020
tcp4       0      0  192.168.205.1.51672    192.168.205.10.6443    ESTABLISHED 131072 131768   8507      0 0x0102 0x00000028
  • Let’s check the code. kubectl creates a POST request with the exec subresource and sends a REST request:
    // source: https://github.com/kubernetes/kubernetes/blob/a1f1f0b599e961a5c59b02c349c0ed818b1851a5/staging/src/k8s.io/kubectl/pkg/cmd/exec/exec.go#L344-L359
    req := restClient.Post().
      Resource("pods").
      Name(pod.Name).
      Namespace(pod.Namespace).
      SubResource("exec")
    req.VersionedParams(&corev1.PodExecOptions{
      Container: containerName,
      Command:   p.Command,
      Stdin:     p.Stdin,
      Stdout:    p.Out != nil,
      Stderr:    p.ErrOut != nil,
      TTY:       t.Raw,
    }, scheme.ParameterCodec)
    
    return p.Executor.Execute("POST", req.URL(), p.Config, p.In, p.Out, p.ErrOut, t.Raw, sizeQueue)
    

[Figure: REST request from kubectl to the api-server]
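
Incidentally, this is also the answer to my colleague’s question: with client-go you can build exactly the same request and stream over SPDY yourself. Here is a minimal sketch (the pod name and the kubeconfig path are assumptions, and error handling is reduced to panics):

// any machine with a valid kubeconfig
package main

import (
  "os"

  corev1 "k8s.io/api/core/v1"
  "k8s.io/client-go/kubernetes"
  "k8s.io/client-go/kubernetes/scheme"
  "k8s.io/client-go/tools/clientcmd"
  "k8s.io/client-go/tools/remotecommand"
)

func main() {
  // Assumed kubeconfig location.
  config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("HOME")+"/.kube/config")
  if err != nil {
    panic(err)
  }
  clientset, err := kubernetes.NewForConfig(config)
  if err != nil {
    panic(err)
  }

  // The same POST .../pods/{name}/exec request that kubectl builds above.
  req := clientset.CoreV1().RESTClient().Post().
    Resource("pods").
    Name("exec-test-nginx-6558988d5-fgxgg"). // assumed pod name
    Namespace("default").
    SubResource("exec").
    VersionedParams(&corev1.PodExecOptions{
      Command: []string{"date"},
      Stdout:  true,
      Stderr:  true,
    }, scheme.ParameterCodec)

  // NewSPDYExecutor performs the protocol upgrade against the api-server.
  exec, err := remotecommand.NewSPDYExecutor(config, "POST", req.URL())
  if err != nil {
    panic(err)
  }
  if err := exec.Stream(remotecommand.StreamOptions{
    Stdout: os.Stdout,
    Stderr: os.Stderr,
  }); err != nil {
    panic(err)
  }
}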


2. Activities in Master Node

  • We can observe the request in api-server side as well.

    handler.go:143] kube-apiserver: POST "/api/v1/namespaces/default/pods/exec-test-nginx-6558988d5-fgxgg/exec" satisfied by gorestful with webservice /api/v1
    upgradeaware.go:261] Connecting to backend proxy (intercepting redirects) https://192.168.205.11:10250/exec/default/exec-test-nginx-6558988d5-fgxgg/exec-test-nginx?command=sh&input=1&output=1&tty=1
    Headers: map[Connection:[Upgrade] Content-Length:[0] Upgrade:[SPDY/3.1] User-Agent:[kubectl/v1.12.10 (darwin/amd64) kubernetes/e3c1340] X-Forwarded-For:[192.168.205.1] X-Stream-Protocol-Version:[v4.channel.k8s.io v3.channel.k8s.io v2.channel.k8s.io channel.k8s.io]]
    

    Notice that the HTTP request includes a protocol upgrade request. SPDY allows separate stdin/stdout/stderr/error “streams” to be multiplexed over a single TCP connection; each stream is identified by a streamType header.

  • The API server receives the request and binds it into a PodExecOptions object:

    // source: https://github.com/kubernetes/kubernetes/blob/a1f1f0b599e961a5c59b02c349c0ed818b1851a5/pkg/apis/core/types.go#L4124-L4147
    // PodExecOptions is the query options to a Pod's remote exec call
    type PodExecOptions struct {
      metav1.TypeMeta
      // Stdin if true indicates that stdin is to be redirected for the exec call
      Stdin bool
      // Stdout if true indicates that stdout is to be redirected for the exec call
      Stdout bool
      // Stderr if true indicates that stderr is to be redirected for the exec call
      Stderr bool
      // TTY if true indicates that a tty will be allocated for the exec call
      TTY bool
      // Container in which to execute the command.
      Container string
      // Command is the remote command to execute; argv array; not executed within a shell.
      Command []string
    }
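
    Compare these fields with the kubelet URL in the api-server log above: Stdin/Stdout/TTY show up as input=1&output=1&tty=1.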
    
  • To be able to take the necessary actions, the api-server needs to know which location it should contact:

    // source: https://github.com/kubernetes/kubernetes/blob/a1f1f0b599e961a5c59b02c349c0ed818b1851a5/pkg/registry/core/pod/strategy.go#L455-L467
    // ExecLocation returns the exec URL for a pod container. If opts.Container is blank
    // and only one container is present in the pod, that container is used.
    func ExecLocation(
      getter ResourceGetter,
      connInfo client.ConnectionInfoGetter,
      ctx context.Context,
      name string,
      opts *api.PodExecOptions,
    ) (*url.URL, http.RoundTripper, error) {
      return streamLocation(getter, connInfo, ctx, name, opts, opts.Container, "exec")
    }
    

    Of course the endpoint is derived from node info.

    // source: https://github.com/kubernetes/kubernetes/blob/a1f1f0b599e961a5c59b02c349c0ed818b1851a5/pkg/registry/core/pod/strategy.go#L504-L510
    nodeName := types.NodeName(pod.Spec.NodeName)
      if len(nodeName) == 0 {
      	// If pod has not been assigned a host, return an empty location
      	return nil, nil, errors.NewBadRequest(fmt.Sprintf("pod %s does not have a host assigned", name))
      }
      nodeInfo, err := connInfo.GetConnectionInfo(ctx, nodeName)
    

    GOTCHA! KUBELET HAS A PORT (node.Status.DaemonEndpoints.KubeletEndpoint.Port) TO WHICH API-SERVER CAN CONNECT.

    // source: https://github.com/kubernetes/kubernetes/blob/a1f1f0b599e961a5c59b02c349c0ed818b1851a5/pkg/kubelet/client/kubelet_client.go#L180-L206
    // GetConnectionInfo retrieves connection info from the status of a Node API object.
    func (k *NodeConnectionInfoGetter) GetConnectionInfo(ctx context.Context, nodeName types.NodeName) (*ConnectionInfo, error) {
      node, err := k.nodes.Get(ctx, string(nodeName), metav1.GetOptions{})
      if err != nil {
        return nil, err
      }
    
      // Find a kubelet-reported address, using preferred address type
      host, err := nodeutil.GetPreferredNodeAddress(node, k.preferredAddressTypes)
      if err != nil {
        return nil, err
      }
    
      // Use the kubelet-reported port, if present
      port := int(node.Status.DaemonEndpoints.KubeletEndpoint.Port)
      if port <= 0 {
        port = k.defaultPort
      }
    
      return &ConnectionInfo{
        Scheme:    k.scheme,
        Hostname:  host,
        Port:      strconv.Itoa(port),
        Transport: k.transport,
      }, nil
    }
    

    Communication between Nodes and the Control Plane > API server to kubelet: “These connections terminate at the kubelet’s HTTPS endpoint. By default, the apiserver does not verify the kubelet’s serving certificate, which makes the connection subject to man-in-the-middle attacks, and unsafe to run over untrusted and/or public networks."
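
    (The same page notes that you can supply the kube-apiserver --kubelet-certificate-authority flag with a root certificate bundle, so that the api-server verifies the kubelet’s serving certificate.)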

  • Now the api-server knows the endpoint, and it opens a connection.

    // source: https://github.com/kubernetes/kubernetes/blob/a1f1f0b599e961a5c59b02c349c0ed818b1851a5/pkg/registry/core/pod/rest/subresources.go#L132-L144
    // Connect returns a handler for the pod exec proxy
    func (r *ExecREST) Connect(ctx context.Context, name string, opts runtime.Object, responder rest.Responder) (http.Handler, error) {
      execOpts, ok := opts.(*api.PodExecOptions)
      if !ok {
        return nil, fmt.Errorf("invalid options object: %#v", opts)
      }
      location, transport, err := pod.ExecLocation(r.Store, r.KubeletConn, ctx, name, execOpts)
      if err != nil {
        return nil, err
      }
      return newThrottledUpgradeAwareProxyHandler(location, transport, false, true, true, responder), nil
    }
    
  • Let’s check what is happening on the master node.

    First, find the IP of the worker node. It is 192.168.205.11 in this case.

    // any machine
    $ kubectl get nodes k8s-node-1 -o wide
    NAME         STATUS   ROLES    AGE   VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
    k8s-node-1   Ready    <none>   9h    v1.15.3   192.168.205.11   <none>        Ubuntu 16.04.6 LTS   4.4.0-159-generic   docker://17.3.3
    

    Then get the kubelet port. It is 10250 in this case.

    // any machine
    $ kubectl get nodes k8s-node-1 -o jsonpath='{.status.daemonEndpoints.kubeletEndpoint}'
    map[Port:10250]
    

    Then check the network. Is there a connection to the worker node (192.168.205.11)? THE CONNECTION IS THERE. When I kill the exec process it disappears, so I know it was established by the api-server for my exec command.

    // master node
    $ netstat -atn |grep 192.168.205.11
    tcp        0      0 192.168.205.10:37870    192.168.205.11:10250    ESTABLISHED
    ...
    

[Figure: connection from the api-server to the kubelet]

  • At this point, the connection between kubectl and the api-server is still open, and there is another connection between the api-server and the kubelet.



3. Activities in Worker Node

  • Let’s continue by connecting to the worker node and checking what is going on there.

    First, we can observe the connection here as well (see the second line; 192.168.205.10 is the IP of the master node).

    // worker node
    $ netstat -atn |grep 10250
    tcp6       0      0 :::10250                :::*                    LISTEN
    tcp6       0      0 192.168.205.11:10250    192.168.205.10:37870    ESTABLISHED
    

    What about our sleep command? HOORAYYYY!! OUR COMMAND IS THERE!!!!

    // worker node
    $ ps -afx
    ...
    31463 ?        Sl     0:00      \_ docker-containerd-shim 7d974065bbb3107074ce31c51f5ef40aea8dcd535ae11a7b8f2dd180b8ed583a /var/run/docker/libcontainerd/7d974065bbb3107074ce31c51
    31478 pts/0    Ss     0:00          \_ sh
    31485 pts/0    S+     0:00              \_ sleep 5000
    ...
    
  • Wait! How did kubelet do it?

  • kubelet runs a daemon which serves an API over a port to handle api-server requests:

    // source: https://github.com/kubernetes/kubernetes/blob/33081c1f07be128a89441a39c467b7ea2221b23d/pkg/kubelet/server/streaming/server.go#L41-L60
    // Server is the library interface to serve the stream requests.
    type Server interface {
      http.Handler
    
      // Get the serving URL for the requests.
      // Requests must not be nil. Responses may be nil iff an error is returned.
      GetExec(*runtimeapi.ExecRequest) (*runtimeapi.ExecResponse, error)
      GetAttach(req *runtimeapi.AttachRequest) (*runtimeapi.AttachResponse, error)
      GetPortForward(*runtimeapi.PortForwardRequest) (*runtimeapi.PortForwardResponse, error)
    
      // Start the server.
      // addr is the address to serve on (address:port) stayUp indicates whether the server should
      // listen until Stop() is called, or automatically stop after all expected connections are
      // closed. Calling Get{Exec,Attach,PortForward} increments the expected connection count.
      // Function does not return until the server is stopped.
      Start(stayUp bool) error
      // Stop the server, and terminate any open connections.
      Stop() error
    }
    
  • kubelet computes a response endpoint for exec requests.

    //source: https://github.com/kubernetes/kubernetes/blob/33081c1f07be128a89441a39c467b7ea2221b23d/pkg/kubelet/server/streaming/server.go#L178-L190
    func (s *server) GetExec(req *runtimeapi.ExecRequest) (*runtimeapi.ExecResponse, error) {
      if err := validateExecRequest(req); err != nil {
        return nil, err
      }
      token, err := s.cache.Insert(req)
      if err != nil {
        return nil, err
      }
      return &runtimeapi.ExecResponse{
        Url: s.buildURL("exec", token),
      }, nil
    }
    

    Don’t be confused: it doesn’t return the result of the command. It returns an endpoint for the upcoming streaming communication, a URL containing a single-use token. The api-server’s upgrade-aware proxy (remember “intercepting redirects” in the log above) follows this URL to reach the actual streaming server.

    // source: https://github.com/kubernetes/kubernetes/blob/6568325ca2bef519e5c8228cd33887660b5ed7b0/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go#L4534-L4540
    type ExecResponse struct {
      // Fully qualified URL of the exec streaming server.
      Url                  string   `protobuf:"bytes,1,opt,name=url,proto3" json:"url,omitempty"`
      XXX_NoUnkeyedLiteral struct{} `json:"-"`
      XXX_sizecache        int32    `json:"-"`
    }
    

    kubelet talks to the container runtime through the RuntimeServiceClient interface, which is part of the Container Runtime Interface.

    //source: https://github.com/kubernetes/kubernetes/blob/6568325ca2bef519e5c8228cd33887660b5ed7b0/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go#L7158-L7229
    type RuntimeServiceClient interface {
      // Version returns the runtime name, runtime version, and runtime API version.
      Version(ctx context.Context, in *VersionRequest, opts ...grpc.CallOption) (*VersionResponse, error)
      // RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
      // the sandbox is in the ready state on success.
      RunPodSandbox(ctx context.Context, in *RunPodSandboxRequest, opts ...grpc.CallOption) (*RunPodSandboxResponse, error)
      // StopPodSandbox stops any running process that is part of the sandbox and
      // reclaims network resources (e.g., IP addresses) allocated to the sandbox.
      // If there are any running containers in the sandbox, they must be forcibly
      // terminated.
      // This call is idempotent, and must not return an error if all relevant
      // resources have already been reclaimed. kubelet will call StopPodSandbox
      // at least once before calling RemovePodSandbox. It will also attempt to
      // reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
      // multiple StopPodSandbox calls are expected.
      StopPodSandbox(ctx context.Context, in *StopPodSandboxRequest, opts ...grpc.CallOption) (*StopPodSandboxResponse, error)
      // RemovePodSandbox removes the sandbox. If there are any running containers
      // in the sandbox, they must be forcibly terminated and removed.
      // This call is idempotent, and must not return an error if the sandbox has
      // already been removed.
      RemovePodSandbox(ctx context.Context, in *RemovePodSandboxRequest, opts ...grpc.CallOption) (*RemovePodSandboxResponse, error)
      // PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
      // present, returns an error.
      PodSandboxStatus(ctx context.Context, in *PodSandboxStatusRequest, opts ...grpc.CallOption) (*PodSandboxStatusResponse, error)
      // ListPodSandbox returns a list of PodSandboxes.
      ListPodSandbox(ctx context.Context, in *ListPodSandboxRequest, opts ...grpc.CallOption) (*ListPodSandboxResponse, error)
      // CreateContainer creates a new container in specified PodSandbox
      CreateContainer(ctx context.Context, in *CreateContainerRequest, opts ...grpc.CallOption) (*CreateContainerResponse, error)
      // StartContainer starts the container.
      StartContainer(ctx context.Context, in *StartContainerRequest, opts ...grpc.CallOption) (*StartContainerResponse, error)
      // StopContainer stops a running container with a grace period (i.e., timeout).
      // This call is idempotent, and must not return an error if the container has
      // already been stopped.
      // TODO: what must the runtime do after the grace period is reached?
      StopContainer(ctx context.Context, in *StopContainerRequest, opts ...grpc.CallOption) (*StopContainerResponse, error)
      // RemoveContainer removes the container. If the container is running, the
      // container must be forcibly removed.
      // This call is idempotent, and must not return an error if the container has
      // already been removed.
      RemoveContainer(ctx context.Context, in *RemoveContainerRequest, opts ...grpc.CallOption) (*RemoveContainerResponse, error)
      // ListContainers lists all containers by filters.
      ListContainers(ctx context.Context, in *ListContainersRequest, opts ...grpc.CallOption) (*ListContainersResponse, error)
      // ContainerStatus returns status of the container. If the container is not
      // present, returns an error.
      ContainerStatus(ctx context.Context, in *ContainerStatusRequest, opts ...grpc.CallOption) (*ContainerStatusResponse, error)
      // UpdateContainerResources updates ContainerConfig of the container.
      UpdateContainerResources(ctx context.Context, in *UpdateContainerResourcesRequest, opts ...grpc.CallOption) (*UpdateContainerResourcesResponse, error)
      // ReopenContainerLog asks runtime to reopen the stdout/stderr log file
      // for the container. This is often called after the log file has been
      // rotated. If the container is not running, container runtime can choose
      // to either create a new log file and return nil, or return an error.
      // Once it returns error, new container log file MUST NOT be created.
      ReopenContainerLog(ctx context.Context, in *ReopenContainerLogRequest, opts ...grpc.CallOption) (*ReopenContainerLogResponse, error)
      // ExecSync runs a command in a container synchronously.
      ExecSync(ctx context.Context, in *ExecSyncRequest, opts ...grpc.CallOption) (*ExecSyncResponse, error)
      // Exec prepares a streaming endpoint to execute a command in the container.
      Exec(ctx context.Context, in *ExecRequest, opts ...grpc.CallOption) (*ExecResponse, error)
      // Attach prepares a streaming endpoint to attach to a running container.
      Attach(ctx context.Context, in *AttachRequest, opts ...grpc.CallOption) (*AttachResponse, error)
      // PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
      PortForward(ctx context.Context, in *PortForwardRequest, opts ...grpc.CallOption) (*PortForwardResponse, error)
      // ContainerStats returns stats of the container. If the container does not
      // exist, the call returns an error.
      ContainerStats(ctx context.Context, in *ContainerStatsRequest, opts ...grpc.CallOption) (*ContainerStatsResponse, error)
      // ListContainerStats returns stats of all running containers.
      ListContainerStats(ctx context.Context, in *ListContainerStatsRequest, opts ...grpc.CallOption) (*ListContainerStatsResponse, error)
      // UpdateRuntimeConfig updates the runtime configuration based on the given request.
      UpdateRuntimeConfig(ctx context.Context, in *UpdateRuntimeConfigRequest, opts ...grpc.CallOption) (*UpdateRuntimeConfigResponse, error)
      // Status returns the status of the runtime.
      Status(ctx context.Context, in *StatusRequest, opts ...grpc.CallOption) (*StatusResponse, error)
    } 
    

    kubelet just uses gRPC to invoke the method through the Container Runtime Interface.

    // source: https://github.com/kubernetes/kubernetes/blob/6568325ca2bef519e5c8228cd33887660b5ed7b0/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go#L7230-L7233
    type runtimeServiceClient struct {
      cc *grpc.ClientConn
    }
    
    // source: https://github.com/kubernetes/kubernetes/blob/6568325ca2bef519e5c8228cd33887660b5ed7b0/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go#L7374-L7381
    func (c *runtimeServiceClient) Exec(ctx context.Context, in *ExecRequest, opts ...grpc.CallOption) (*ExecResponse, error) {
      out := new(ExecResponse)
      err := c.cc.Invoke(ctx, "/runtime.v1alpha2.RuntimeService/Exec", in, out, opts...)
      if err != nil {
        return nil, err
      }
      return out, nil
    }
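
    For illustration, here is how one could call the same CRI Exec method the way kubelet does, over the runtime’s unix socket. This is only a sketch: the socket path and the container id are assumptions and vary by runtime (dockershim’s CRI socket, for example, is served by kubelet itself).

    // worker node, sketch: ask the runtime for an exec streaming endpoint
    package main

    import (
      "context"
      "fmt"
      "net"
      "time"

      "google.golang.org/grpc"
      runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1alpha2"
    )

    func main() {
      // Assumed socket path (containerd); cri-o uses /var/run/crio/crio.sock.
      const socket = "/run/containerd/containerd.sock"
      conn, err := grpc.Dial(socket,
        grpc.WithInsecure(),
        grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
          return (&net.Dialer{}).DialContext(ctx, "unix", addr)
        }))
      if err != nil {
        panic(err)
      }
      defer conn.Close()

      client := runtimeapi.NewRuntimeServiceClient(conn)
      ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
      defer cancel()

      // Exec doesn't run the command; like kubelet's GetExec above,
      // it returns a streaming URL.
      resp, err := client.Exec(ctx, &runtimeapi.ExecRequest{
        ContainerId: "53a0a08547b2", // assumed container id
        Cmd:         []string{"sh"},
        Tty:         true,
        Stdin:       true,
      })
      if err != nil {
        panic(err)
      }
      fmt.Println(resp.Url)
    }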
    

    The container runtime is responsible for implementing RuntimeServiceServer.

    // source: https://github.com/kubernetes/kubernetes/blob/6568325ca2bef519e5c8228cd33887660b5ed7b0/staging/src/k8s.io/cri-api/pkg/apis/runtime/v1alpha2/api.pb.go#L7436-L7508
    // RuntimeServiceServer is the server API for RuntimeService service.
    type RuntimeServiceServer interface {
      // Version returns the runtime name, runtime version, and runtime API version.
      Version(context.Context, *VersionRequest) (*VersionResponse, error)
      // RunPodSandbox creates and starts a pod-level sandbox. Runtimes must ensure
      // the sandbox is in the ready state on success.
      RunPodSandbox(context.Context, *RunPodSandboxRequest) (*RunPodSandboxResponse, error)
      // StopPodSandbox stops any running process that is part of the sandbox and
      // reclaims network resources (e.g., IP addresses) allocated to the sandbox.
      // If there are any running containers in the sandbox, they must be forcibly
      // terminated.
      // This call is idempotent, and must not return an error if all relevant
      // resources have already been reclaimed. kubelet will call StopPodSandbox
      // at least once before calling RemovePodSandbox. It will also attempt to
      // reclaim resources eagerly, as soon as a sandbox is not needed. Hence,
      // multiple StopPodSandbox calls are expected.
      StopPodSandbox(context.Context, *StopPodSandboxRequest) (*StopPodSandboxResponse, error)
      // RemovePodSandbox removes the sandbox. If there are any running containers
      // in the sandbox, they must be forcibly terminated and removed.
      // This call is idempotent, and must not return an error if the sandbox has
      // already been removed.
      RemovePodSandbox(context.Context, *RemovePodSandboxRequest) (*RemovePodSandboxResponse, error)
      // PodSandboxStatus returns the status of the PodSandbox. If the PodSandbox is not
      // present, returns an error.
      PodSandboxStatus(context.Context, *PodSandboxStatusRequest) (*PodSandboxStatusResponse, error)
      // ListPodSandbox returns a list of PodSandboxes.
      ListPodSandbox(context.Context, *ListPodSandboxRequest) (*ListPodSandboxResponse, error)
      // CreateContainer creates a new container in specified PodSandbox
      CreateContainer(context.Context, *CreateContainerRequest) (*CreateContainerResponse, error)
      // StartContainer starts the container.
      StartContainer(context.Context, *StartContainerRequest) (*StartContainerResponse, error)
      // StopContainer stops a running container with a grace period (i.e., timeout).
      // This call is idempotent, and must not return an error if the container has
      // already been stopped.
      // TODO: what must the runtime do after the grace period is reached?
      StopContainer(context.Context, *StopContainerRequest) (*StopContainerResponse, error)
      // RemoveContainer removes the container. If the container is running, the
      // container must be forcibly removed.
      // This call is idempotent, and must not return an error if the container has
      // already been removed.
      RemoveContainer(context.Context, *RemoveContainerRequest) (*RemoveContainerResponse, error)
      // ListContainers lists all containers by filters.
      ListContainers(context.Context, *ListContainersRequest) (*ListContainersResponse, error)
      // ContainerStatus returns status of the container. If the container is not
      // present, returns an error.
      ContainerStatus(context.Context, *ContainerStatusRequest) (*ContainerStatusResponse, error)
      // UpdateContainerResources updates ContainerConfig of the container.
      UpdateContainerResources(context.Context, *UpdateContainerResourcesRequest) (*UpdateContainerResourcesResponse, error)
      // ReopenContainerLog asks runtime to reopen the stdout/stderr log file
      // for the container. This is often called after the log file has been
      // rotated. If the container is not running, container runtime can choose
      // to either create a new log file and return nil, or return an error.
      // Once it returns error, new container log file MUST NOT be created.
      ReopenContainerLog(context.Context, *ReopenContainerLogRequest) (*ReopenContainerLogResponse, error)
      // ExecSync runs a command in a container synchronously.
      ExecSync(context.Context, *ExecSyncRequest) (*ExecSyncResponse, error)
      // Exec prepares a streaming endpoint to execute a command in the container.
      Exec(context.Context, *ExecRequest) (*ExecResponse, error)
      // Attach prepares a streaming endpoint to attach to a running container.
      Attach(context.Context, *AttachRequest) (*AttachResponse, error)
      // PortForward prepares a streaming endpoint to forward ports from a PodSandbox.
      PortForward(context.Context, *PortForwardRequest) (*PortForwardResponse, error)
      // ContainerStats returns stats of the container. If the container does not
      // exist, the call returns an error.
      ContainerStats(context.Context, *ContainerStatsRequest) (*ContainerStatsResponse, error)
      // ListContainerStats returns stats of all running containers.
      ListContainerStats(context.Context, *ListContainerStatsRequest) (*ListContainerStatsResponse, error)
      // UpdateRuntimeConfig updates the runtime configuration based on the given request.
      UpdateRuntimeConfig(context.Context, *UpdateRuntimeConfigRequest) (*UpdateRuntimeConfigResponse, error)
      // Status returns the status of the runtime.
      Status(context.Context, *StatusRequest) (*StatusResponse, error)
    }
    

[Figure: connection from the kubelet to the container runtime]

  • If that is so, we should be able to observe a connection between the kubelet and the container runtime. Right? Let’s check.

Run this command before and after the exec command and check the diff. This is the diff in my case:

// worker node
$ ss -a -p |grep kubelet
...
u_str  ESTAB      0      0       * 157937                * 157387                users:(("kubelet",pid=5714,fd=33))
...

Hmmm. There is a new connection over a unix socket between kubelet (pid=5714) and something. Who could it be? YES. IT IS DOCKER (pid=1186).

// worker node
$ ss -a -p |grep 157387
...
u_str  ESTAB      0      0       * 157937                * 157387                users:(("kubelet",pid=5714,fd=33))
u_str  ESTAB      0      0      /var/run/docker.sock 157387                * 157937                users:(("dockerd",pid=1186,fd=14))
...

Remember: this is the docker daemon process (pid=1186) which runs our command.

// worker node
$ ps -afx
...
 1186 ?        Ssl    0:55 /usr/bin/dockerd -H fd://
17784 ?        Sl     0:00      \_ docker-containerd-shim 53a0a08547b2f95986402d7f3b3e78702516244df049ba6c5aa012e81264aa3c /var/run/docker/libcontainerd/53a0a08547b2f95986402d7f3
17801 pts/2    Ss     0:00          \_ sh
17827 pts/2    S+     0:00              \_ sleep 5000
...

4. Activities in Container Runtime

  • Let’s check cri-o’s source code to understand how this happens. The logic is similar in docker.

    It has a server which implements RuntimeServiceServer.

    // source: https://github.com/cri-o/cri-o/blob/b3c67952115bee6f44405f547fef0b65a9134c60/server/server.go#L62-L83
    // Server implements the RuntimeService and ImageService
    type Server struct {
      config          libconfig.Config
      seccompProfile  *seccomp.Seccomp
      stream          StreamService
      netPlugin       ocicni.CNIPlugin
      hostportManager hostport.HostPortManager
    
      appArmorProfile string
      hostIP          string
      bindAddress     string
    
      *lib.ContainerServer
      monitorsChan      chan struct{}
      defaultIDMappings *idtools.IDMappings
      systemContext     *types.SystemContext // Never nil
    
      updateLock sync.RWMutex
    
      seccompEnabled  bool
      appArmorEnabled bool
    }
    
    // source: https://github.com/cri-o/cri-o/blob/b3c67952115bee6f44405f547fef0b65a9134c60/server/container_exec.go#L14-L28
    // Exec prepares a streaming endpoint to execute a command in the container.
    func (s *Server) Exec(ctx context.Context, req *pb.ExecRequest) (resp *pb.ExecResponse, err error) {
      const operation = "exec"
      defer func() {
        recordOperation(operation, time.Now())
        recordError(operation, err)
      }()
    
      resp, err = s.getExec(req)
      if err != nil {
        return nil, fmt.Errorf("unable to prepare exec endpoint: %v", err)
      }
    
      return resp, nil
    }
    

    At the end of the chain, the container runtime executes the command on the worker node.

    // source: https://github.com/cri-o/cri-o/blob/b3c67952115bee6f44405f547fef0b65a9134c60/internal/oci/runtime_oci.go#L293-L342
    // ExecContainer prepares a streaming endpoint to execute a command in the container.
    func (r *runtimeOCI) ExecContainer(c *Container, cmd []string, stdin io.Reader, stdout, stderr io.WriteCloser, tty bool, resize <-chan remotecommand.TerminalSize) error {
      processFile, err := prepareProcessExec(c, cmd, tty)
      if err != nil {
        return err
      }
      defer os.RemoveAll(processFile.Name())
    
      args := []string{rootFlag, r.root, "exec"}
      args = append(args, "--process", processFile.Name(), c.ID())
      execCmd := exec.Command(r.path, args...)
      if v, found := os.LookupEnv("XDG_RUNTIME_DIR"); found {
        execCmd.Env = append(execCmd.Env, fmt.Sprintf("XDG_RUNTIME_DIR=%s", v))
      }
      var cmdErr, copyError error
      if tty {
        cmdErr = ttyCmd(execCmd, stdin, stdout, resize)
      } else {
        if stdin != nil {
          // Use an os.Pipe here as it returns true *os.File objects.
          // This way, if you run 'kubectl exec <pod> -i bash' (no tty) and type 'exit',
          // the call below to execCmd.Run() can unblock because its Stdin is the read half
          // of the pipe.
          r, w, err := os.Pipe()
          if err != nil {
            return err
          }
          go func() { _, copyError = pools.Copy(w, stdin) }()
    
          execCmd.Stdin = r
        }
        if stdout != nil {
          execCmd.Stdout = stdout
        }
        if stderr != nil {
          execCmd.Stderr = stderr
        }
    
        cmdErr = execCmd.Run()
      }
    
      if copyError != nil {
        return copyError
      }
      if exitErr, ok := cmdErr.(*exec.ExitError); ok {
        return &utilexec.ExitErrorWrapper{ExitError: exitErr}
      }
      return cmdErr
    }
    

[Figure: from the container runtime to the kernel]

Finally, the kernel executes the command.
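
How does the command end up as a kernel-managed process? Conceptually, the OCI runtime binary (runc in a typical docker/cri-o setup) reads the process spec prepared above (args, env, terminal, cwd), joins the container’s namespaces with setns(2), and then calls execve(2) so the kernel replaces the process image. A very rough, Linux-only sketch of those two steps (not runc’s actual code; runc does the namespace dance in C, via nsexec, before the Go runtime starts, because calling setns from a multi-threaded Go process is unreliable):

// conceptual sketch only; see runc's libcontainer/nsenter for the real mechanism
package main

import (
  "fmt"
  "os"
  "runtime"

  "golang.org/x/sys/unix"
)

func main() {
  runtime.LockOSThread() // setns affects only the calling thread

  const containerInitPid = 17801 // assumed: pid of a process inside the container

  // Join the container's namespaces via setns(2).
  for _, ns := range []string{"ipc", "uts", "net", "pid", "mnt"} {
    fd, err := unix.Open(fmt.Sprintf("/proc/%d/ns/%s", containerInitPid, ns), unix.O_RDONLY, 0)
    if err != nil {
      panic(err)
    }
    if err := unix.Setns(fd, 0); err != nil {
      panic(err)
    }
    unix.Close(fd)
  }

  // execve(2): from here on, `sleep 5000` is just another process the kernel
  // schedules, one that happens to live in the container's namespaces.
  if err := unix.Exec("/bin/sleep", []string{"sleep", "5000"}, os.Environ()); err != nil {
    panic(err)
  }
}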

[Figure: the kernel runs the process]



Reminders

  • The api-server can also initiate connections to the kubelet.
  • These connections persist until the interactive exec ends.
    • Connection between kubectl and api-server
    • Connection between api-server and kubelet
    • Connection between kubelet and container runtime
  • Neither kubectl nor the api-server can run anything on the worker nodes directly. The kubelet could, but it delegates this kind of action to the container runtime.


