u/JulietSecurity

CVE-2026-46333 in Kubernetes: unset seccomp let pods reach pidfd_getfd, RuntimeDefault blocked it

CVE-2026-46333 is the Linux __ptrace_may_access() bug Qualys disclosed on May 15. Most of the public discussion I saw centered on ssh-keysign-pwn, but for Kubernetes the more interesting part was the underlying pidfd_getfd path.

Pods share the node kernel, and a pod image can contain its own setuid helper. So I wanted to know whether a normal workload could reach the same fd-duplication primitive without host namespaces, hostPath, or touching node files.

We built a deliberately boring target for this: a root-owned disposable file inside the test image, a non-root process, and a purpose-built setuid helper in the same image. The attacker could not open the file directly. The helper opened it while privileged, dropped back to the non-root UID, and exited. The test was whether the attacker could race pidfd_getfd and duplicate that helper's fd after the vulnerable exit state.

We tested local Docker, local kind, EKS Auto Mode on Bottlerocket, and a private mixed-node lab cluster.

The short version:

  • EKS Auto Mode / Bottlerocket reproduced controlled fd theft on all 4 tested nodes when seccomp was unset.
  • Explicit seccompProfile.type: Unconfined reproduced the same result in the EKS lab.
  • RuntimeDefault blocked pidfd_getfd.
  • PSS Restricted blocked pidfd_getfd and also stopped the setuid helper from opening the file: allowPrivilegeEscalation: false sets NoNewPrivs, so the kernel ignores the helper's setuid bit at exec.
  • PSS Baseline blocked explicit Unconfined and hostPID, but did not fix unset seccomp. The Baseline + unset-seccomp case still reproduced controlled fd theft.
  • Local kind matched the same broad pattern.
  • A Debian arm64 worker in our private lab reproduced the unset/Unconfined result.
  • Our Talos worker blocked normal pods with effective Seccomp: 2; even in a deliberately unconfined lab namespace, it did not reproduce in 500 attempts, which appears consistent with ptrace_scope=2 adding another gate.
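
For anyone who wants to see the same signals on their own nodes, here is a minimal sketch, not the tooling used in these tests. It assumes "Seccomp: 2" above refers to the Seccomp field in /proc/<pid>/status (0 disabled, 1 strict, 2 filter) and that Yama exposes ptrace_scope at /proc/sys/kernel/yama/ptrace_scope. The function names are made up for illustration.

```python
from pathlib import Path

# Meanings of the Seccomp field in /proc/<pid>/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}

# Meanings of kernel.yama.ptrace_scope.
YAMA_SCOPES = {
    0: "classic ptrace permissions",
    1: "restricted to descendants",
    2: "admin-only (CAP_SYS_PTRACE)",
    3: "no attach",
}


def parse_status(text):
    """Extract Seccomp and NoNewPrivs from /proc/<pid>/status content."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in ("Seccomp", "NoNewPrivs"):
            fields[key] = int(value.strip())
    return fields


def describe(status_text, ptrace_scope_text=None):
    """Summarize seccomp mode, NoNewPrivs, and (optionally) Yama scope."""
    fields = parse_status(status_text)
    report = {
        "seccomp": SECCOMP_MODES.get(fields.get("Seccomp"), "unknown"),
        "no_new_privs": fields.get("NoNewPrivs") == 1,
    }
    if ptrace_scope_text is not None:
        scope = int(ptrace_scope_text.strip())
        report["yama"] = YAMA_SCOPES.get(scope, "unknown")
    return report


if __name__ == "__main__":
    status_path = Path("/proc/self/status")
    yama_path = Path("/proc/sys/kernel/yama/ptrace_scope")
    if status_path.exists():
        yama_text = yama_path.read_text() if yama_path.exists() else None
        print(describe(status_path.read_text(), yama_text))
```

Run inside a container, this reports the process's own view, which is what matters here: the pod spec says what was requested, while /proc says what the runtime actually applied.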

The part I would not gloss over is Baseline. It can reject some obvious knobs, like explicit Unconfined and hostPID, while still allowing a pod where seccomp is simply missing. In this test, that missing field was the difference between pidfd_getfd being reachable and blocked.

For clusters, I would check:

  • workloads where effective seccomp is unset or Unconfined
  • containers where allowPrivilegeEscalation is true or unset
  • namespaces that rely on PSS Baseline for untrusted workloads
  • node image/kernel status for CVE-2026-46333
  • whether your runtime's default seccomp profile actually denies pidfd_getfd
  • whether node-level ptrace hardening, especially Yama ptrace_scope, is present
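
The first two checks can be roughed out against the output of kubectl get pods -A -o json. This is a sketch under assumptions, not a complete audit: audit_pod and effective_seccomp are hypothetical names, and "unset" here means unset in the API object. If the kubelet runs with --seccomp-default, an unset field may still end up as RuntimeDefault at runtime, so treat the output as a list of things to verify on the node.

```python
import json
import sys


def effective_seccomp(pod_spec, container):
    """Return the seccomp type; the container-level setting overrides the pod-level one."""
    contexts = (
        container.get("securityContext") or {},
        pod_spec.get("securityContext") or {},
    )
    for ctx in contexts:
        profile = ctx.get("seccompProfile")
        if profile and profile.get("type"):
            return profile["type"]
    return "unset"


def audit_pod(pod):
    """List (pod, container, finding) tuples for the seccomp and privilege checks."""
    spec = pod.get("spec", {})
    meta = pod["metadata"]
    name = f'{meta.get("namespace", "default")}/{meta["name"]}'
    containers = []
    for key in ("initContainers", "containers", "ephemeralContainers"):
        containers.extend(spec.get(key) or [])

    findings = []
    for c in containers:
        ctx = c.get("securityContext") or {}
        seccomp = effective_seccomp(spec, c)
        if seccomp in ("unset", "Unconfined"):
            findings.append((name, c["name"], f"seccomp={seccomp}"))
        # True and unset are both flagged: only an explicit false sets NoNewPrivs.
        if ctx.get("allowPrivilegeEscalation") is not False:
            findings.append((name, c["name"], "allowPrivilegeEscalation not false"))
    return findings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: kubectl get pods -A -o json > pods.json; python audit.py pods.json
    with open(sys.argv[1]) as fh:
        for item in json.load(fh).get("items", []):
            for finding in audit_pod(item):
                print(*finding, sep="\t")
```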

The mitigation stack is the boring one: patch affected node kernels, enforce RuntimeDefault or a tested Localhost seccomp profile for untrusted workloads, use Restricted where workloads tolerate it, set allowPrivilegeEscalation: false, drop capabilities, and keep CI/build/plugin/customer-controlled workloads away from sensitive workloads.
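
For the pod-level part of that stack, the container securityContext fields look roughly like the following. It is written as a small Python helper so the output can be pasted into a manifest as JSON. hardened_security_context is a hypothetical name, and the Localhost profile path is whatever you have actually installed and tested on the node.

```python
import json


def hardened_security_context(seccomp_type="RuntimeDefault", localhost_profile=None):
    """Build a container securityContext with the settings listed above."""
    seccomp = {"type": seccomp_type}
    if seccomp_type == "Localhost":
        if not localhost_profile:
            raise ValueError("Localhost seccomp needs a localhostProfile path")
        seccomp["localhostProfile"] = localhost_profile
    return {
        "allowPrivilegeEscalation": False,   # sets NoNewPrivs for the container
        "runAsNonRoot": True,
        "capabilities": {"drop": ["ALL"]},
        "seccompProfile": seccomp,
    }


print(json.dumps(hardened_security_context(), indent=2))
```

None of this replaces patching the node kernel. It only narrows what an unpatched kernel exposes to the workload.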

This does not prove host root, container escape, Kubernetes Secret access, node persistence, or theft of real host files. It also does not mean every EKS/Bottlerocket node is exploitable or every Talos cluster blocks this path. The point is narrower: in our EKS and kind labs, ordinary unset-seccomp pods could reproduce controlled fd theft; RuntimeDefault and Restricted changed the outcome.

We are not publishing exploit code, lab source, or reproduction commands. The writeup has the full matrix and defensive checks:

https://juliet.sh/blog/cve-2026-46333-kubernetes-eks-bottlerocket-seccomp-pidfd

u/JulietSecurity — 5 days ago

Copy Fail (CVE-2026-31431) is the recent Linux kernel issue involving AF_ALG, the kernel crypto socket interface, and page-cache-backed file data. The short version: it is kernel attack surface reachable through a syscall path, not an application dependency inside an image.

That matters for Kubernetes because pods share the host kernel. If a node kernel is affected, the question is not just "is my container image vulnerable?" It is "can a workload on this node reach the vulnerable kernel interface?"

The specific Kubernetes question I wanted to answer was:

If a pod is running with common hardening like PSS Restricted and RuntimeDefault seccomp, is the relevant kernel interface still reachable from inside the pod?

In our Talos and EKS lab clusters, the answer was yes. RuntimeDefault did not deny socket(AF_ALG, ...).

That does not mean "every pod is an instant host-root shell." It means the default Kubernetes hardening most people reach for does not remove this kernel attack surface. If the node kernel is affected, a non-root pod can still reach AF_ALG unless you patch the kernel or apply a seccomp profile that explicitly blocks it.
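
Reachability itself is easy to check without touching anything exploit-related: try to create an AF_ALG socket and close it immediately. The sketch below is an illustration, not the test harness from the writeup. probe_af_alg is a hypothetical name, and mapping EPERM/EACCES to "denied" assumes the seccomp profile uses a typical errno action.

```python
import errno
import socket


def probe_af_alg():
    """Report whether this process can create an AF_ALG socket."""
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        return "unsupported-by-python"  # non-Linux build of Python
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError as exc:
        if exc.errno in (errno.EPERM, errno.EACCES):
            return "denied"  # consistent with a seccomp or LSM block
        if exc.errno in (errno.EAFNOSUPPORT, errno.EPROTONOSUPPORT):
            return "not-available"  # kernel has no AF_ALG support
        return f"error:{errno.errorcode.get(exc.errno, exc.errno)}"
    sock.close()
    return "reachable"


print(probe_af_alg())
```

A "reachable" result says nothing about whether the kernel is affected. It only shows that the socket family is not filtered for this workload.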

What we found from the Kubernetes side:

  • RuntimeDefault seccomp did not block AF_ALG in our Talos or EKS lab tests
  • PSS Restricted does not require blocking AF_ALG
  • runAsNonRoot does not matter much for this specific question, because creating an AF_ALG socket does not depend on the process's UID or GID
  • image scanning is not the right primary control for this class of issue
  • file-integrity monitoring is also not the right primary control, because the interesting behavior is page-cache mutation rather than a normal modified file on disk

What I would check in a cluster:

  • which nodes are running kernels affected by CVE-2026-31431
  • which pods are scheduled on those nodes
  • whether those pods are using RuntimeDefault, Unconfined, or a Localhost seccomp profile
  • whether any Localhost seccomp profile actually denies socket(AF_ALG, ...)
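
The first three checks amount to a join between kubectl get nodes -o json and kubectl get pods -A -o json. A rough sketch follows, with one loud assumption: is_affected is a caller-supplied predicate, because this post does not list affected kernel versions and the sketch does not guess them. Function names are hypothetical.

```python
def seccomp_types(pod):
    """Seccomp types across a pod's containers (container setting overrides pod setting)."""
    spec = pod.get("spec", {})
    pod_ctx = spec.get("securityContext") or {}
    pod_type = (pod_ctx.get("seccompProfile") or {}).get("type")
    types = set()
    for c in spec.get("containers") or []:
        ctx = c.get("securityContext") or {}
        own = (ctx.get("seccompProfile") or {}).get("type")
        types.add(own or pod_type or "unset")
    return sorted(types)


def pods_on_affected_nodes(nodes, pods, is_affected):
    """Join pods to nodes whose kernelVersion the caller considers affected."""
    affected = {}
    for node in nodes:
        kernel = node["status"]["nodeInfo"]["kernelVersion"]
        if is_affected(kernel):
            affected[node["metadata"]["name"]] = kernel

    rows = []
    for pod in pods:
        node_name = pod.get("spec", {}).get("nodeName")
        if node_name in affected:
            meta = pod["metadata"]
            rows.append({
                "pod": f'{meta.get("namespace", "default")}/{meta["name"]}',
                "node": node_name,
                "kernel": affected[node_name],
                "seccomp": seccomp_types(pod),
            })
    return rows
```

Pass the items arrays from the two kubectl outputs plus a predicate built from your distro's advisory, for example a function that compares the kernel string against the fixed package version.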

Mitigations:

  • patch node kernels when your distro ships the fix
  • if patching is delayed, use a Localhost seccomp profile that explicitly denies AF_ALG
  • do not assume RuntimeDefault blocks this unless you have checked the actual runtime profile on your node OS
  • treat "affected kernel + pod can create AF_ALG sockets" as an exposure signal worth inventorying
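
To check whether a Localhost profile says anything about AF_ALG, you can inspect its JSON. The sketch below assumes the OCI/runc-style profile layout (defaultAction plus a syscalls list with names, action, and args) and AF_ALG = 38 on Linux. It is a heuristic that only looks for an explicit deny rule on socket. It does not evaluate full filter semantics, so confirm the result with a real probe from inside a pod.

```python
AF_ALG = 38  # Linux address family number for AF_ALG

DENY_ACTIONS = {
    "SCMP_ACT_ERRNO",
    "SCMP_ACT_KILL",
    "SCMP_ACT_KILL_PROCESS",
    "SCMP_ACT_KILL_THREAD",
}


def af_alg_verdict(profile):
    """Heuristically classify how a seccomp profile treats socket(AF_ALG, ...)."""
    socket_rules = [
        rule for rule in profile.get("syscalls", [])
        if "socket" in rule.get("names", [])
    ]
    for rule in socket_rules:
        args = rule.get("args") or []
        targets_af_alg = any(
            a.get("index") == 0
            and a.get("value") == AF_ALG
            and a.get("op") == "SCMP_CMP_EQ"
            for a in args
        )
        # A deny rule with no args blocks every socket() call, AF_ALG included.
        if rule.get("action") in DENY_ACTIONS and (targets_af_alg or not args):
            return "explicit-deny"
    if not socket_rules and profile.get("defaultAction") in DENY_ACTIONS:
        return "denied-by-default-action"
    return "no-explicit-deny"
```

One design caution: a profile with defaultAction SCMP_ACT_ALLOW and a single deny rule for AF_ALG is much more permissive than RuntimeDefault for everything else, so it is usually safer to start from a copy of your runtime's default profile than from an allow-all base.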

We are not publishing exploit code or exploit steps. The writeup is focused on the Kubernetes validation and defensive checks:

Full writeup: https://juliet.sh/blog/we-tested-copy-fail-in-kubernetes-pss-restricted-runtime-default-af-alg

Disclosure: I work on Juliet, a Kubernetes security vendor.

u/JulietSecurity — 23 days ago