Telepresence with Google Kubernetes Engine (GKE)

Published on 15 May 2024

In my current project, Qwiz’n’Buzz, we are actively working on a Discord integration in the form of a Discord Activity. To protect its users, Discord uses a proxy as a middleman for requests to our services. Additionally, the Discord SDK relies on your application being embedded in an iframe provided by Discord. Both make a fast local development loop challenging.

To test the integration locally, Discord suggests using cloudflared to tunnel the local service to a public endpoint. Unless you are using a paid plan, the endpoint URL is ephemeral and changes between restarts, so you have to update the Discord Activity URL Mapping settings every time you restart the tunnel.

Telepresence

This is where I remembered Telepresence. Telepresence allows you to proxy a local development environment into a remote Kubernetes cluster. This enables you to test and debug services within the context of the full system without deploying the service to the cluster. This way, we can provision stable development domains and cluster infrastructure to iterate quickly on the Discord integration locally.

Telepresence offers two ways of redirecting traffic from a Kubernetes service to your local machine. The first replaces the service-backing pod with a Telepresence pod that forwards traffic to your local machine. The second adds a sidecar container (traffic-agent) to the service-backing pod that forwards traffic to your local machine. The second pattern is the default behavior and is the one I will focus on in this post.

Telepresence installs the sidecar in the service-backing pod (e.g., provided by a Deployment) and renames the application container’s original port, while the sidecar exposes the original port in its place.

Google Kubernetes Engine (GKE) and Network Endpoint Groups (NEGs)

While I have used Telepresence in the past, I ran into problems using it with our Google Kubernetes Engine (GKE) cluster, a managed Kubernetes offering. I pinpointed them to the Network Endpoint Groups (NEGs) that Google Cloud offers as a performant, managed load balancing solution built on its network infrastructure. NEGs require health checks to ensure that traffic is only routed to healthy pods. These aren’t optional, and their Kubernetes configuration is limited to HTTP, HTTPS, and HTTP/2. Google Cloud configures the NEG-backed ingress load balancer automatically by scanning the relevant Service and Pod resources in GKE, but it can also be customized manually via the BackendConfig resource.

Without special considerations, this creates a chicken-and-egg problem. Telepresence injects a sidecar container into the service-backing pod that forwards traffic to your local machine, but the load balancer does not route traffic to the sidecar as long as the NEG health check fails. Since the NEG health checks aren’t optional and TCP health checks are not supported, we need to find a way to satisfy the health checks while using Telepresence.
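For context, the load balancer in question is typically requested with an Ingress resource that points at the Service. The following is only a sketch under assumptions: the hostname activity-dev.example.com is a placeholder, and the Service name my-app and port 80 simply match the manifests used later in this post.

```yaml
# Minimal Ingress sketch; the host is a placeholder, not a real domain.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
spec:
  rules:
    - host: activity-dev.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                # Must match the name and port of the NEG-annotated Service.
                name: my-app
                port:
                  number: 80
```

A fixed host like this is what can later be entered once in Discord’s URL Mapping settings.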

Strategy 1: Utilizing a Sidecar for Health Checks

One strategy is to answer the NEG health check from an additional sidecar. This sidecar container runs a simple HTTP server that responds to the health check on its own port.

  1. Implement a Sidecar Container: Deploy a lightweight sidecar container alongside your main application container within the same pod. This sidecar serves a simple HTTP server that responds to the health check requests from the NEG.
yaml
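apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  # A selector matching the pod labels is required for apps/v1 Deployments
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          ports:
            - containerPort: 80
              name: http
        - name: healthz
          image: nginx:latest
          # Assumes nginx is configured to listen on port 8080
          # (the stock image listens on port 80 by default)
          ports:
            - containerPort: 8080
              name: healthz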
  2. Configure Health Checks: Point the NEG’s health check configuration to the port exposed by the sidecar. This ensures that the health check passes as long as the sidecar is running, regardless of whether Telepresence is currently intercepting the main service’s traffic.
yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backend-config
spec:
  healthCheck:
    type: HTTP
    port: 8080
    requestPath: /
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    cloud.google.com/app-protocols: '{"http":"HTTP"}' # Key must match the Service port name below
    cloud.google.com/backend-config: '{"default":"my-backend-config"}' # Reference to the BackendConfig
spec:
  type: ClusterIP
  selector:
    app: my-app
  ports:
    - protocol: TCP
      name: http
      port: 80
      targetPort: http

With the sidecar handling health checks, you can use Telepresence to intercept the main service’s traffic without affecting the pod’s health status in the eyes of the NEG.
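The Deployment above assumes that nginx listens on port 8080, while the stock nginx image listens on port 80. One way to close that gap, sketched here under assumptions, is to mount a small server block from a ConfigMap. The ConfigMap name healthz-nginx-conf and the response body are hypothetical; the mount path relies on the official image loading files from /etc/nginx/conf.d.

```yaml
# Hypothetical ConfigMap; the name and the response body are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: healthz-nginx-conf
data:
  default.conf: |
    server {
      listen 8080;
      location / {
        # Answer every request, including the health check probe, with 200.
        return 200 'ok';
      }
    }
---
# Excerpt of spec.template.spec in the my-app Deployment, not a full resource.
containers:
  - name: healthz
    image: nginx:latest
    ports:
      - containerPort: 8080
        name: healthz
    volumeMounts:
      # Replaces the image's default server block with the one above.
      - name: healthz-nginx-conf
        mountPath: /etc/nginx/conf.d
volumes:
  - name: healthz-nginx-conf
    configMap:
      name: healthz-nginx-conf
```

Because the server block answers on the root path, it lines up with the requestPath of / used in the BackendConfig.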

Strategy 2: Dedicated Health Check Port on the Application

Another approach is to expose a dedicated health check port directly in the application you want to intercept. This method involves changes in the application code and can be set up as follows:

  1. Expose an Additional Port: Modify your service’s deployment to include an additional port that serves HTTP health checks. This port should be separate from the main service port. Minor code changes may be required to support the new health check port.
yaml
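apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  # A selector matching the pod labels is required for apps/v1 Deployments
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:latest
          ports:
            # Main application port, the one Telepresence intercepts
            - containerPort: 8080
              name: http
            # Dedicated port that only serves the HTTP health check
            - containerPort: 8081
              name: healthz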
  2. Update Service and NEG Configuration: Adjust the service and NEG configuration to recognize the new port specifically for health checks.
yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backend-config
spec:
  healthCheck:
    type: HTTP
    port: 8081
    requestPath: /
---
# Service configuration as before

As long as you’re not using the replacement mode, Telepresence will not interfere with the health check port, and the NEG will continue to route traffic to the pod as long as the health check endpoint is healthy.

Benefits and Considerations

Both strategies ensure that the NEG’s requirements for health checks are met while providing flexibility in debugging and developing applications using Telepresence. However, each approach has its considerations:

  • Sidecar Approach: This method increases resource usage slightly due to the additional container but keeps the health check logic separate from the main application code.

  • Dedicated Port Approach: This method is simpler on the manifest side and avoids the additional resources required by an extra sidecar, but it requires modifications to the application code to support an additional HTTP server for health checks.

Conclusion

Now we can use a custom, stable subdomain for our preview Discord Activity in Discord’s URL Mapping settings and intercept traffic at any time without any manual reconfiguration on the Discord side.