2021-12-15

Kubectl Exec Through HTTPS Tunnel and Reverse Proxy: one week of debugging for a one-line fix

Recently we had a project that required exposing a private Kubernetes apiserver through a publicly accessible endpoint. We built the solution on FRP (Fast Reverse Proxy). Most kubectl commands worked fine until we tried kubectl exec, which hung for a moment and then timed out. We had previously worked on a similar project, so we knew there are some tricky parts when it comes to kubectl exec: the Kubernetes apiserver needs to upgrade the HTTP/1.x connection to the SPDY protocol in order to stream commands back and forth between kubectl and the pod/container.

What we didn’t expect was that this could be a week-long debugging experience!
So we thought we’d blog about it :)

Overall architecture

The following diagram shows the overall architecture. The FRP server has a publicly accessible endpoint, and the FRP client runs within the private network. The client initiates and sets up a TLS tunnel to the server. The FRP client uses the https2https plugin to call the apiserver, to make sure the apiserver certificates match the domain name.

When kubectl starts exec, the apiserver tries to upgrade the HTTP/1.x connection to SPDY, a deprecated binary streaming protocol. The upgrade has to be carried across every HTTP/1.x connection along the path.




Issue: kubectl exec/attach/port-forward doesn’t work

What we saw was that kubectl exec either hung without printing anything and then timed out, or errored out immediately with an I/O error. kubectl port-forward and attach showed similar symptoms.

More verbose logging showed that, after making the HTTP POST /exec call to the apiserver, kubectl receives the HTTP response with the correct status, 101 Switching Protocols (to SPDY), together with the correct response headers. After receiving the upgrade response, however, there is no further progress and no more verbose logging: it either hangs and then times out, or errors out immediately.

Example verbose logging with port-forwarding looks like this:

$ kubectl port-forward svc/wd-wordpress 8080:80 -v=8
POST https://kubernetes.default.svc:6443/api/v1/namespaces/default/pods/wd-wordpress-5ccf7d4c7d-r998n/portforward
Request Headers:
    User-Agent: kubectl/v1.18.8 (darwin/amd64) kubernetes/9f2892a
    X-Stream-Protocol-Version: portforward.k8s.io
Response Status: 101 Switching Protocols in 50 milliseconds
Response Headers:
    Content-Length: 0
    Connection: Upgrade
    Date: Fri, 01 Oct 2021 00:05:21 GMT
    Upgrade: SPDY/3.1
    X-Stream-Protocol-Version: portforward.k8s.io
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
E0930 17:05:21.891128   52575 portforward.go:233] lost connection to pod

Question 1: Where is it stuck?

The first question is: between kubectl and the FRP server, where does it actually get stuck? Is the FRP server waiting for kubectl, or is kubectl waiting for the FRP server?

Because the hang happens after the protocol is upgraded to SPDY, the normal HTTP verbose logs don't help. So we built a customized kubectl binary with more debugging logs. We also enabled the underlying SPDY streaming library's debug logging by adding DEBUG=true to the environment variables.

With this, we were able to figure out that after kubectl receives the successful HTTP upgrade response, it tries to create a couple of streams on top of the upgraded connection. After creating a stream, it waits for a response, and that is where it gets stuck: it never gets the response from the FRP server. The source code can be seen here.

// CreateStream creates a new stream with the specified headers and registers
// it with the connection.
func (c *connection) CreateStream(headers http.Header) (httpstream.Stream, error) {
	stream, err := c.conn.CreateStream(headers, nil, false)
	if err != nil {
		return nil, err
	}
	if err = stream.WaitTimeout(createStreamResponseTimeout); err != nil {
		return nil, err
	}
	c.registerStream(stream)
	return stream, nil
}

Question 2: Why is the FRP server not responding to kubectl?

Now we know that kubectl sends out the stream creation request but doesn't get a response. So the next question is: does the FRP server not receive the request, or can the FRP server not get a response from the backend, and therefore can't respond to kubectl?

With more debugging on both the FRP server and client, we focused on the https2https plugin within the FRP client. The plugin is nothing but yet another reverse proxy; it is used to start an HTTPS connection with valid certificates matching the Kubernetes apiserver endpoint's domain name.

The plugin uses the default httputil.ReverseProxy, which has no capability for connection-level debug logging, so we made a customized version of ReverseProxy with more debugging logs. With this, we were able to pinpoint the error: after the reverse proxy finishes upgrading the connection to SPDY, when it tries to set up the pipe between the FRP server and the apiserver, it cannot read from the FRP server in spc.copyToBackend(). The read immediately returns with the error “i/o deadline reached”.

The ReverseProxy source code is here.

func (p *ReverseProxy) handleUpgradeResponse(rw http.ResponseWriter, req *http.Request, res *http.Response) {
	conn, brw, err := hj.Hijack()
	……
	spc := switchProtocolCopier{user: conn, backend: backConn}
	go spc.copyToBackend(errc)
	go spc.copyFromBackend(errc)
	<-errc
	return
}

Question 3: Can we clear the deadline on the connection?

The error message we got is “i/o deadline reached”, and it happens during the HTTP connection upgrade. To upgrade from HTTP/1.x to SPDY, the caller needs to hijack the underlying connection. The http.Hijacker documentation says:

type Hijacker interface {
	// Hijack lets the caller take over the connection.
	// After a call to Hijack the HTTP server library
	// will not do anything else with the connection.
	//
	// The returned net.Conn may have read or write deadlines
	// already set, depending on the configuration of the
	// Server. It is the caller's responsibility to set
	// or clear those deadlines as needed.
	Hijack() (net.Conn, *bufio.ReadWriter, error)
}

Given this specific comment, “net.Conn may have read or write deadlines already set”, together with the error we got, we naturally thought that maybe the connection already had a deadline set somewhere before the hijack, so let's just clear or increase the deadline.

Unfortunately, after changing all the deadline-related code in both the FRP server and client, we still couldn't get past this error. So we had to figure out how this deadline gets set on the connection.

Question 4: Who sets the deadline?

With the goal of finding out where the deadline was set, we discovered that the error “i/o deadline reached” comes from the yamux package, which FRP uses as the underlying streaming library. The read timeout is returned when a deadline is set on the connection. The source code can be seen here.

// Read is used to read from the stream
func (s *Stream) Read(b []byte) (n int, err error) {
	var timeout <-chan time.Time
	var timer *time.Timer
	readDeadline := s.readDeadline.Load().(time.Time)
	if !readDeadline.IsZero() {
		delay := readDeadline.Sub(time.Now())
		timer = time.NewTimer(delay)
		timeout = timer.C
	}
	select {
	case <-timeout:
		return 0, ErrTimeout
	……

One thing we noticed is that the actual deadline set on the stream was a very old timestamp. It was not the normal timeout that applications usually set for read/write operations, so we concluded the deadline had to come from the HTTP stack itself, not from the application (FRP).

With that in mind, we found the following code: during the hijack, in order to let the caller take full control of the connection, http.Server needs to abort all currently ongoing reads. The way to do that is to set aLongTimeAgo as the read deadline, then wait for all current reads to time out.

func (cr *connReader) abortPendingRead() {
	cr.lock()
	defer cr.unlock()
	if !cr.inRead {
		return
	}
	cr.aborted = true
	cr.conn.rwc.SetReadDeadline(aLongTimeAgo)
	for cr.inRead {
		cr.cond.Wait()
	}
	cr.conn.rwc.SetReadDeadline(time.Time{})
}

What should happen is that after the current read operation times out and signals back via the broadcast, this Wait() returns successfully. Then abortPendingRead immediately clears the read deadline by resetting it to the zero-value time.Time{}, so all future reads should continue after the hijack.

But the behavior we saw was that clearing the read deadline had no effect: all subsequent reads still got the “i/o deadline reached” error.

Question 5: Where is the “i/o deadline reached” error returned to?

Now we know it's the hijack that sets the deadline, and that clearing the deadline does not take effect. We need to figure out why resetting the deadline can't clear the error, and who is reading the data and triggering it.

More debugging led us to the net/http server code. Before calling handler.ServeHTTP(), it starts a goroutine to do background reading, which reads from the connection and triggers the error when the aLongTimeAgo deadline is set by the hijack. That still makes sense.

But the confusing part is that it actually ignores the error, with very clear comments for the abort-pending-read case. So why does the error still pop up to the caller?

func (cr *connReader) backgroundRead() {
	n, err := cr.conn.rwc.Read(cr.byteBuf[:])
	……
	if ne, ok := err.(net.Error); ok && cr.aborted && ne.Timeout() {
		// Ignore this error. It's the expected error from
		// another goroutine calling abortPendingRead.
	} else if err != nil {
		cr.handleReadError(err)
	}
	cr.aborted = false
	cr.inRead = false
	cr.unlock()
	cr.cond.Broadcast()
}

The fact that we only get the error after Hijack finishes successfully, not within Hijack, made us think that this error must be persisted somewhere: even though it is ignored during Hijack because cr.aborted is true, cr.aborted is reset to false after the hijack, so later reads trigger the error and it is no longer ignored.

Question 6: Where is the error persisted?

With that in mind, we dug into the cr.conn.rwc.Read() code to figure out where the error is persisted. This led us to the TLS package, since all connections are TLS-encrypted. When the TLS connection gets an error while reading, if it is a net.Error and the error is not Temporary, it is persisted into the TLS halfConn object, so all subsequent reads from the connection immediately return with that error. The source code is here.

Now we know that the only case in which this error persists is when it is a net.Error and not a temporary one. We just need to go back to the original error definition to see if that's the case.

// readRecordOrCCS reads one or more TLS records from the connection and
// updates the record layer state.
func (c *Conn) readRecordOrCCS(expectChangeCipherSpec bool) error {
	if c.in.err != nil {
		return c.in.err
	}

	// Read header, payload.
	if err := c.readFromUntil(c.conn, recordHeaderLen); err != nil {
		if e, ok := err.(net.Error); !ok || !e.Temporary() {
			c.in.setErrorLocked(err)
		}
		return err
	}
}

func (hc *halfConn) setErrorLocked(err error) error {
	if e, ok := err.(net.Error); ok {
		hc.err = &permanentError{err: e}
	} else {
		hc.err = err
	}
	return hc.err
}

Answer: The one-line fix

Here is the error definition in yamux:

// ErrTimeout is used when we reach an IO deadline
ErrTimeout = &NetError{
	err: fmt.Errorf("i/o deadline reached"),
	// Error should meet net.Error interface for timeouts for compatability
	// with standard library expectations, such as http servers.
	timeout: true,
}

Now everything starts to make sense. Because ErrTimeout does not set temporary to true, the TLS connection treats it as a permanent error and persists it into the connection. Even though Hijack succeeds (because the abort flag was set to true), all subsequent reads return this error, persisted deep in the stack!

As you can imagine, the fix is just a one-line change: set temporary to true in the error definition.

// ErrTimeout is used when we reach an IO deadline
ErrTimeout = &NetError{
	…
	temporary: true,
}

With this, kubectl exec/attach/port-forward works like a charm!

Conclusion

It was a… fun experience going through the golang net/http source code to identify and fix this puzzling issue with such a simple change. The questions above look logical in hindsight, but because the net/http connection stack is complicated and there are so many moving parts and components, it was easy to get lost during debugging, not knowing which way to go. We added quite a lot of debug logging to kubectl, FRP, and ReverseProxy, together with an IDE debugger, to pinpoint the issue. Hopefully this gives you a better understanding of how kubectl exec works and of the internals of the golang net/http package. As always, if you have any questions, feel free to reach out to us: we love a good Kubernetes challenge!