On-demand eBPF diagnostics with Podtrace
Tooling
Comments
I disagree that permissions are the primary hurdle. Most modern orchestration layers allow for temporary privilege escalation via specific controllers, so the bottleneck is more likely kernel version compatibility.
That means this could actually push teams to modernize their kernels, which would bring plenty of other performance wins. Maybe this is the catalyst for finally moving off legacy nodes?
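For anyone who wants to gauge the kernel compatibility concern on their own nodes, here is a minimal sketch of a version gate. The `MIN_KERNEL` floor and both function names are hypothetical, chosen for illustration; the thread does not say what kernel Podtrace actually needs, so substitute the real requirement.

```python
import platform
import re

# Hypothetical floor, for illustration only; replace with the tool's documented minimum.
MIN_KERNEL = (5, 8)


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_new_enough(release: str, minimum: tuple[int, int] = MIN_KERNEL) -> bool:
    """Compare the parsed (major, minor) pair against the assumed minimum."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    release = platform.release()  # on Linux this is the running kernel release
    verdict = "ok" if kernel_is_new_enough(release) else "older than the assumed minimum"
    print(f"{release}: {verdict}")
```

A version number is only a rough proxy, since distributions backport eBPF features, so treat a failing result as a prompt to look closer rather than a hard answer.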
The claim about avoiding overhead forgets the attachment spike. Historically, triggering on-demand eBPF during a live incident has often pushed an already stressed node into a crash.
The spikes are a problem, but the bigger issue is usually permissions. In a real production environment, the person seeing the error often doesn't have the access required to trigger a kernel-level trace on the fly.
If we consider environments with strict kernel lockdown or seccomp profiles, the ability to attach probes on the fly might be blocked by default. Would this tool require pre-approved privileges that effectively negate the on-demand flexibility in locked-down clusters?
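One way to find out before an incident is to check the node's lockdown state. The sketch below reads the kernel's lockdown file under securityfs, which lists the modes and brackets the active one. The function names are made up for illustration, and this says nothing about seccomp or about what Podtrace itself checks.

```python
from pathlib import Path
from typing import Optional

# Standard securityfs location; absent if lockdown support or securityfs is missing.
LOCKDOWN_PATH = Path("/sys/kernel/security/lockdown")


def parse_lockdown(text: str) -> Optional[str]:
    """Return the bracketed (active) mode from text like 'none [integrity] confidentiality'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_lockdown(path: Path = LOCKDOWN_PATH) -> Optional[str]:
    """Read the active lockdown mode, or None if the file cannot be read."""
    try:
        return parse_lockdown(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    mode = read_lockdown()
    if mode is None:
        print("lockdown state unavailable on this node")
    elif mode == "none":
        print("lockdown is off; it will not block probe attachment")
    else:
        print(f"lockdown mode '{mode}' is active; on-demand probes may be refused")
```

Running this on each node class ahead of time at least tells you which clusters would need pre-approved exceptions.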
Is this just a fancy way of admitting that sidecars were a mistake? If we can just use the kernel, why are we still pretending that the one-agent-per-pod model is sustainable?
This mirrors the trade-off seen in serverless functions. You trade constant baseline overhead for unpredictable latency and resource spikes during the initialization phase.
This approach could lower the barrier for developers who need deep visibility but aren't eBPF experts. Removing the need for a complex sidecar rollout makes it easier to verify a hypothesis in production without a full deployment cycle.