r/kubernetes

How to wait for manifest to be ready before continuing in a script?

Let's say I'm doing the following in a shell script, using Argo CD as an example:

```
kubectl create namespace argocd
kubectl apply -n argocd --server-side --force-conflicts -f https://raw.githubusercontent.com/argoproj/argo-cd/v3.4.4/manifests/install.yaml
```

Now, before continuing in my script, I want to make sure that everything from Argo CD is running. I've seen `kubectl wait`, but when waiting for ready on the complete manifest, the issue is that some resources, e.g. service accounts, have no ready state, so this fails. What is the best practice to wait for everything from the manifest to be installed and running or created? Or is there something that is logically last, so I can wait just for that? For example, if a Deployment is ready, does that imply that all service accounts were already created?
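A minimal sketch of the usual workaround, assuming you only need to wait on workloads: objects such as ServiceAccounts, ConfigMaps and RBAC rules have no readiness status and already exist once `kubectl apply` returns successfully, so only Deployments, StatefulSets and DaemonSets need a wait. The hypothetical `workloadsToWait` helper below scans a multi-document YAML stream with a deliberately naive line-based parser (a real tool should use a proper YAML decoder) and lists the `kind/name` pairs to wait on.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// waitableKinds have a rollout status worth waiting on. Kinds such as
// ServiceAccount, ConfigMap or Role are skipped: they exist once apply returns.
var waitableKinds = map[string]bool{
	"Deployment":  true,
	"StatefulSet": true,
	"DaemonSet":   true,
}

// workloadsToWait returns "kind/name" for every waitable object in a
// multi-document YAML manifest. Deliberately naive: it assumes two-space
// indentation and reads only top-level "kind:" and "metadata:" -> "  name:".
func workloadsToWait(manifest string) ([]string, error) {
	var out []string
	var kind, name string
	inMetadata := false
	flush := func() {
		if waitableKinds[kind] && name != "" {
			out = append(out, strings.ToLower(kind)+"/"+name)
		}
		kind, name, inMetadata = "", "", false
	}
	sc := bufio.NewScanner(strings.NewReader(manifest))
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // CRD lines can be long
	for sc.Scan() {
		line := sc.Text()
		switch {
		case strings.TrimSpace(line) == "---":
			flush()
		case strings.HasPrefix(line, "kind:"):
			kind = strings.TrimSpace(strings.TrimPrefix(line, "kind:"))
			inMetadata = false
		case strings.HasPrefix(line, "metadata:"):
			inMetadata = true
		case inMetadata && name == "" && strings.HasPrefix(line, "  name:"):
			name = strings.Trim(strings.TrimSpace(strings.TrimPrefix(line, "  name:")), `"'`)
		case line != "" && line[0] != ' ' && line[0] != '#':
			inMetadata = false // any other top-level key closes metadata
		}
	}
	flush()
	return out, sc.Err()
}

func main() {
	manifest := `apiVersion: v1
kind: ServiceAccount
metadata:
  name: argocd-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-server
spec:
  replicas: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
`
	workloads, err := workloadsToWait(manifest)
	if err != nil {
		panic(err)
	}
	for _, w := range workloads {
		// Feed each one to: kubectl rollout status -n argocd <w> --timeout=5m
		fmt.Println(w)
	}
}
```

Each result can then be passed to `kubectl rollout status -n argocd <kind/name> --timeout=5m`, which handles Deployments, StatefulSets and DaemonSets alike; for CRDs, `kubectl wait --for=condition=Established crd --all` is the usual equivalent.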

reddit.com

u/sorry_no_idea — 13 hours ago

▲ 2 r/kubernetes

Pointing 443 and 22 to the same app

Trying to set up Gitea without a bunch of hard-coding or hacky workarounds. The Gitea Helm chart just says:

If you're using ingress and want to use SSH, keep in mind, that ingress is not able to forward SSH Ports. You will need a LoadBalancer like metallb and a setting in your ssh service annotations.

```
service:
  ssh:
    annotations:
      metallb.universe.tf/allow-shared-ip: test
```

I'm not even sure where that is supposed to go. I get that the shared IP is supposed to let port 22 live on the same IP as 443, but I can't get any setup that ends up on the same IP. I even tried asking AI, and it just keeps running in circles suggesting unhinged solutions. I know I have to be missing something simple after going down some rabbit holes.
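In the Gitea chart, that snippet appears to be a fragment of the chart's values (`service.ssh.annotations`), so it goes in your values file, and the Service that exposes 443 (for example, the ingress controller's LoadBalancer Service) needs the same annotation with the same value. As a rough sketch of the conditions MetalLB documents for sharing an IP (same sharing key, no overlapping protocol/port pairs, and either both Services use `externalTrafficPolicy: Cluster` or they select exactly the same pods), the hypothetical `canShareIP` helper below checks two simplified Service descriptions; the `svc` struct is an illustrative stand-in, not a Kubernetes type.

```go
package main

import (
	"fmt"
	"reflect"
)

// port is a (protocol, number) pair exposed by a LoadBalancer Service.
type port struct {
	Protocol string // "TCP" or "UDP"
	Number   int
}

// svc is a hypothetical, simplified view of a LoadBalancer Service.
type svc struct {
	SharingKey            string            // value of metallb.universe.tf/allow-shared-ip
	ExternalTrafficPolicy string            // "Cluster" or "Local"
	Selector              map[string]string // pod selector
	Ports                 []port
}

// canShareIP approximates MetalLB's documented IP-sharing rules.
func canShareIP(a, b svc) (bool, string) {
	if a.SharingKey == "" || a.SharingKey != b.SharingKey {
		return false, "sharing keys missing or different"
	}
	used := map[port]bool{}
	for _, p := range a.Ports {
		used[p] = true
	}
	for _, p := range b.Ports {
		if used[p] {
			return false, fmt.Sprintf("both services expose %s/%d", p.Protocol, p.Number)
		}
	}
	bothCluster := a.ExternalTrafficPolicy == "Cluster" && b.ExternalTrafficPolicy == "Cluster"
	if !bothCluster && !reflect.DeepEqual(a.Selector, b.Selector) {
		return false, "Local traffic policy requires identical pod selectors"
	}
	return true, "ok"
}

func main() {
	// "gitea" is an example sharing key; any value works if both Services use it.
	ingress := svc{"gitea", "Cluster", map[string]string{"app": "ingress"}, []port{{"TCP", 443}}}
	ssh := svc{"gitea", "Cluster", map[string]string{"app": "gitea"}, []port{{"TCP", 22}}}
	ok, why := canShareIP(ingress, ssh)
	fmt.Println(ok, why) // true ok
}
```

With `Local` traffic policy on either side, the two Services would have to select the same pods, which an ingress controller and Gitea do not, so `Cluster` is the simpler choice for this pairing.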

u/permalink_save — 23 hours ago

▲ 0 r/kubernetes

what is Envoy to an Apigee developer?

Hello Folks,

Recently one of our clients requested load balancing for microservices. Our organization is planning to implement Envoy, and I am new to it. How should I start studying it? I have only a modest amount of knowledge about cloud architecture and services, and I know that NGINX can act as a load balancer.

u/Hey_AbhishekHere — 1 day ago

▲ 11 r/kubernetes+1 crossposts

Isola: Secure Sandboxing for Kubernetes

Hi Reddit, I'd like to share a passion project of mine with you all: Isola. It is an open-source, cloud-agnostic way to sandbox untrusted or LLM/agent-generated code inside existing Kubernetes clusters.

It is written in Go, with a REST API plus Python and TypeScript SDKs. It lets you programmatically create sandbox pods (isolated with gVisor's userspace kernel), snapshot and restore the sandbox filesystem (enabling init-once, use-many-times or sandbox state rollback semantics), use advanced networking controls, and more.

Install with Helm anywhere (including easy local setup over something like kind or k3s).

I am very happy to discuss the architecture and implementation details here. I spent a lot of time getting it just right (in my opinion): I upstreamed contributions to gVisor to make some features I wanted work, and iterated a lot until I was able to have the snapshots lazily loaded from bucket storage instead of filling up nodes (thanks to rclone for that).

Hope you like it!

u/benldrmn — 2 days ago

▲ 0 r/kubernetes

Anyone worked at as a Software Engineer?

Hi everyone,

I recently received a job offer for a position titled Software Engineer – Install and Deploy Applications, and I’m trying to better understand what this role is actually like.

From the title, it seems this may be more related to deployment, delivery, DevOps, or support rather than traditional software development, but I’m not sure.

I’d appreciate insight from anyone who has worked in a similar role.

I have a few questions:

What are the actual day-to-day responsibilities?
How much of the job is software development vs installation/configuration/troubleshooting?
Is this role closer to Software Engineering, DevOps, System Administration, or Technical Support?
What technologies/tools are commonly used (Linux, scripting, cloud, Kubernetes, databases, etc.)?

Any honest experiences or advice would be really helpful.

Thanks

u/Acceptable_Look_4870 — 2 days ago

▲ 205 r/kubernetes

Wow... so PodDisruptionBudget (PDB) is exactly what I've been looking for.

Whenever I upgrade my Kubernetes cluster, my goal is to

Keep my application available.

My usual approach was to increase the replica count, wait for the new pods to become Ready, then carefully drain one node at a time while watching the deployment.

It worked...

But it also relied on me getting every step right.

Then I discovered PodDisruptionBudget (PDB).

Instead of relying solely on my upgrade process, I can now declare my availability requirement to Kubernetes itself.

Now Kubernetes knows that at least one pod must remain available during voluntary disruptions like node drains or cluster upgrades.
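The arithmetic behind that guarantee is small. The PDB controller compares how many selected pods are currently healthy with how many must stay healthy (`minAvailable`, or the expected pod count minus `maxUnavailable`) and publishes the difference as `status.disruptionsAllowed`; an eviction from a drain is only admitted while that value is above zero. A sketch with integer values only (percentages are left out); the function is illustrative and only borrows the status field's name.

```go
package main

import "fmt"

// disruptionsAllowed mirrors the PDB status arithmetic for integer values.
// value is minAvailable, or maxUnavailable when useMaxUnavailable is true.
func disruptionsAllowed(currentHealthy, expectedPods, value int, useMaxUnavailable bool) int {
	desiredHealthy := value // minAvailable: this many pods must stay healthy
	if useMaxUnavailable {
		desiredHealthy = expectedPods - value
	}
	if allowed := currentHealthy - desiredHealthy; allowed > 0 {
		return allowed
	}
	return 0 // never negative: no voluntary eviction is admitted
}

func main() {
	fmt.Println(disruptionsAllowed(2, 2, 1, false)) // minAvailable 1, 2 healthy: 1
	fmt.Println(disruptionsAllowed(1, 2, 1, false)) // one pod already gone: 0
	fmt.Println(disruptionsAllowed(3, 3, 1, true))  // maxUnavailable 1 of 3: 1
}
```

With `replicas: 1` and `minAvailable: 1` the result is always 0, so a drain blocks until the pod count goes up, which is why scaling up before an upgrade is still worth doing.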

It's funny how some Kubernetes resources don't really make sense until you're building something people will actually use.

One thing I enjoy about building real client projects is that they constantly challenge the way I solve problems. There's always a better Kubernetes pattern waiting to be discovered.

u/Defiant-Chard-2023 — 3 days ago

▲ 2 r/kubernetes+3 crossposts

Self-hosted K8s operator that proves your AI agents never phoned home (open source)

Been running AI agents on my own cluster and kept hitting the same problem: once a run finishes, how do you actually prove, later, that the agent stayed inside the network boundary you set? Logs can be tampered with, and most agent frameworks just trust you configured things right.

Built a small operator that applies default-deny egress per agent workload, seals the run at the network boundary, and emits a signed, hash-chained attestation artifact you can verify offline, even months later, even air-gapped. Apache 2.0 core, gVisor isolation, kagent-compatible if you already run that.
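For readers new to hash chaining, the core idea fits in a few lines: each record stores the SHA-256 of the previous record's hash concatenated with its own payload, so changing or removing a record breaks the link check at that point unless every later hash is recomputed, which signing the final hash prevents. This is a generic illustration, not the operator's actual artifact format, and it leaves out the signing step (for example with ed25519).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// record is one entry of a hypothetical hash-chained attestation log.
type record struct {
	Payload string
	Hash    string // hex(SHA-256(previous hash || payload))
}

// link computes the chained hash for a payload given the previous hash.
func link(prevHash, payload string) string {
	sum := sha256.Sum256([]byte(prevHash + payload))
	return hex.EncodeToString(sum[:])
}

// appendRecord adds a payload to the chain, linking it to the last hash.
func appendRecord(chain []record, payload string) []record {
	prev := ""
	if len(chain) > 0 {
		prev = chain[len(chain)-1].Hash
	}
	return append(chain, record{Payload: payload, Hash: link(prev, payload)})
}

// verify recomputes every link offline and returns the index of the first
// broken record, or -1 if the chain is intact.
func verify(chain []record) int {
	prev := ""
	for i, r := range chain {
		if link(prev, r.Payload) != r.Hash {
			return i
		}
		prev = r.Hash
	}
	return -1
}

func main() {
	var chain []record
	for _, ev := range []string{"run started", "egress denied: 203.0.113.7:443", "run sealed"} {
		chain = appendRecord(chain, ev)
	}
	fmt.Println(verify(chain)) // -1
	chain[1].Payload = "egress allowed: 203.0.113.7:443" // tamper with the middle record
	fmt.Println(verify(chain)) // 1
}
```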

Repo: github.com/Clawdlinux/agentic-operator-core

Curious if this is a real problem for anyone else running agents at home or on-prem, or if I'm solving something nobody else worries about.

u/Useful_Journalist — 2 days ago

▲ 14 r/kubernetes

Kubernetes Kustomize question regarding coding (Go)

Hi,

I have a question about writing Go code for Kubernetes automation. It's basically about testing with Kubebuilder and Go, and my problem is that I need the `info` object for a resource.

I have the code

```
typedObj, err := scheme.Scheme.ConvertToVersion(info.Object, info.Mapping.GroupVersionKind.GroupVersion())
```

which should give me the typed object of a resource. I have looked around kustomize's krusty package, but I can't find any way to get `info`.

Any ideas or hints?

Where in the Kubernetes ecosystem can I get `info` from?

I have the baseline code

```
kOpts := krusty.MakeDefaultOptions()
kOpts.PluginConfig = kustomize_types.EnabledPluginConfig(kustomize_types.BploUndefined)
kOpts.PluginConfig.HelmConfig.Command = "helm"
k := krusty.MakeKustomizer(kOpts)
m, err := k.Run(filesys.FileSystemOrOnDisk{}, filepath.Join(paths...)) // type ResMap, type error
```

I want to read CRDs from the filesystem, so I need the `m` object returned by `k.Run()`.

Any ideas? Does anyone at least know of a function in the Kubernetes ecosystem that returns `info` for a given key?
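For context, the `info` in that snippet looks like a `*resource.Info` from `k8s.io/cli-runtime/pkg/resource`, which kubectl's resource builder produces; kustomize's `ResMap` does not carry one. One workaround (an assumption here, not the only way) is to serialize each resource in the `ResMap`, for example by ranging over `m.Resources()` and marshalling each entry to JSON (check the method names against your kustomize version), then unmarshal straight into the typed struct you want, such as `apiextensionsv1.CustomResourceDefinition`; `runtime.DefaultUnstructuredConverter.FromUnstructured` is an alternative if you already have a `map[string]interface{}`. The stdlib-only sketch below shows just the decode step with a hypothetical, trimmed-down `crd` struct so it runs without Kubernetes dependencies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// crd is a hypothetical, trimmed-down stand-in for
// apiextensionsv1.CustomResourceDefinition; only a few fields are decoded.
type crd struct {
	Kind     string `json:"kind"`
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Spec struct {
		Group string `json:"group"`
		Names struct {
			Kind string `json:"kind"`
		} `json:"names"`
	} `json:"spec"`
}

// decodeCRDs keeps only CustomResourceDefinitions from a list of JSON
// documents (e.g. one per resource taken from a kustomize ResMap).
func decodeCRDs(docs [][]byte) ([]crd, error) {
	var out []crd
	for _, d := range docs {
		var c crd
		if err := json.Unmarshal(d, &c); err != nil {
			return nil, err
		}
		if c.Kind == "CustomResourceDefinition" {
			out = append(out, c)
		}
	}
	return out, nil
}

func main() {
	docs := [][]byte{
		[]byte(`{"apiVersion":"v1","kind":"ConfigMap","metadata":{"name":"cfg"}}`),
		[]byte(`{"apiVersion":"apiextensions.k8s.io/v1","kind":"CustomResourceDefinition",
			"metadata":{"name":"widgets.example.com"},
			"spec":{"group":"example.com","names":{"kind":"Widget"}}}`),
	}
	crds, err := decodeCRDs(docs)
	if err != nil {
		panic(err)
	}
	for _, c := range crds {
		fmt.Println(c.Metadata.Name, c.Spec.Group, c.Spec.Names.Kind)
	}
}
```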

u/Electronic_Bad_2046 — 3 days ago

▲ 96 r/kubernetes+1 crossposts

How would you predict when a GitOps hub becomes the bottleneck?

I wrote an honest reflection on trying to predict the scaling limits of GitOps fleet management with a Hub-and-Spoke architecture. Warning: it turned into a long blog post, around a 45-minute read, based on 3–4+ months of intense learnings, 31 iterations, scale tests, wrong assumptions, long nights, weekend work, and support from the community and cloud providers.

TL;DR: We tested GitOps fleet management with Argo CD, vCluster, kubara, and Sveltos. In our setup, Argo CD’s application controller started hitting OOM kills around 15k–20k cached objects per hub. Hydrated manifests helped, tuning helped only partially, and Sveltos handled addon-style rollout patterns at a fraction of the memory: ~2 GB vs. ~21 GB for Argo CD. The main lesson: at very large scale, architecture matters more than tuning.

Not a benchmark claim, not “tool X beats tool Y”. Just sharing what we saw and learned, and why combining GitOps engines can be a real multiplier for what you can achieve with Open Source.

Blog post: https://medium.com/itnext/gitops-for-15-000-clusters-what-large-scale-testing-with-vcluster-taught-us-41e4b0d43e0b

u/Odd-Welcome3466 — 3 days ago

▲ 5 r/kubernetes

Why is the ReadWriteOncePod access mode only supported for CSI volumes?

Hi, I understand that CSI stands for Container Storage Interface, which is like an API: I can write my own driver and connect it to Kubernetes. It's similar to a Java interface in OOP, meaning there are rules I must follow to write a CSI driver.

From reading the docs, I also understand that volume plugins used to be built into the Kubernetes core code, and that's where CSI comes in.

I understand that ReadWriteOncePod ensures that only a single pod can read from or write to the volume, but why must it be a CSI volume?

Thank you so much for taking the time to answer my question.

u/William_Myint_01 — 2 days ago

▲ 17 r/kubernetes

MiniPC + K3s - Hosting K8s Labs for friends. Suggestions Appreciated

Hi r/kubernetes,

Hope you are all doing well. I recently set up a mini PC with k3s and wanted to use it for something beyond the usual homelab services. I maintain Yellow Olive, a terminal-based game for learning Kubernetes locally with minikube.

I started experimenting with a hosted variant: a small number of users sign in, each receives an isolated namespace and works through a challenge using kubectl in the browser (for example, debugging a pod that fails to start).

The proof of concept is running on my homelab. I’m less confident about the multi-tenant security model and would appreciate feedback from others who’ve run similar setups.

How it works

User signs in with GitHub → assigned a lab seat (max 7) and a namespace ({login}-{github-id})
Start session - the API (with admin kubeconfig) applies namespace, ResourceQuota, NetworkPolicy, RBAC, and a challenge manifest
A ServiceAccount token is issued; a limited kubeconfig is stored server-side only
The browser terminal runs kubectl via subprocess using that kubeconfig
Check challenge - the platform validates the workload (e.g. pod is Running/Ready)

Admin credentials are used for bootstrap and validation. Players never receive cluster-admin access.

Isolation (three layers)

Cluster: ResourceQuota per namespace (CPU/memory caps, object limits), NetworkPolicy restricting traffic to within the namespace
RBAC: Role scoped to pods only (get/list/watch/create/update/patch/delete); ServiceAccount player bound to that Role
Application: Terminal accepts kubectl only, forces namespace server-side, strips flags like -n, --kubeconfig, --as, and blocks shell metacharacters
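The application-layer rule in the last bullet can be sketched as an argument filter. This is a hypothetical helper, not code from the Yellow Olive repository, and the blocked-flag list is an assumption: it rejects arguments containing shell metacharacters, drops namespace, kubeconfig, impersonation and connection flags (in `--flag value`, `--flag=value` and attached short forms), and appends the session namespace. Passing the result straight to `exec.Command("kubectl", args...)` rather than through a shell means the metacharacter check is a second line of defense, not the only one.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// blockedWithValue lists kubectl flags a player may not set; each takes a value.
var blockedWithValue = map[string]bool{
	"-n": true, "--namespace": true, "--kubeconfig": true,
	"--as": true, "--as-group": true, "--as-uid": true,
	"--token": true, "-s": true, "--server": true,
	"--context": true, "--cluster": true, "--user": true,
}

// blockedBool lists blocked flags that take no value.
var blockedBool = map[string]bool{"-A": true, "--all-namespaces": true}

// sanitizeKubectlArgs (hypothetical) takes everything the player typed after
// "kubectl" and returns argv for exec.Command with the session namespace forced.
func sanitizeKubectlArgs(args []string, namespace string) ([]string, error) {
	var out []string
	for i := 0; i < len(args); i++ {
		a := args[i]
		if strings.ContainsAny(a, ";|&$`<>\\\n") {
			return nil, errors.New("shell metacharacters are not allowed")
		}
		name := a
		if eq := strings.IndexByte(a, '='); eq > 0 && strings.HasPrefix(a, "-") {
			name = a[:eq] // --flag=value form
		}
		switch {
		case blockedBool[name]:
			continue
		case blockedWithValue[name]:
			if name == a {
				i++ // --flag value form: skip the value as well
			}
			continue
		case !strings.HasPrefix(a, "--") && len(a) > 2 &&
			(strings.HasPrefix(a, "-n") || strings.HasPrefix(a, "-s")):
			continue // attached short form such as -nkube-system
		}
		out = append(out, a)
	}
	return append(out, "--namespace", namespace), nil
}

func main() {
	args, err := sanitizeKubectlArgs(strings.Fields("get pods -n kube-system --as=admin -o wide"), "octocat-1")
	fmt.Println(args, err) // [get pods -o wide --namespace octocat-1] <nil>
}
```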

Feedback and suggestions appreciated

Credential model - I’m using ServiceAccount tokens and keeping kubeconfig files on the server rather than issuing them to clients. For sessions of roughly an hour, does that match how you’d approach it, or is there a better pattern?

Namespace lifecycle - I haven’t settled on teardown yet: delete on logout, expire after a TTL, or clean up manually. What has worked in practice?

Capacity - Everything runs on one k3s node today (~7 namespaces, mostly single-pod challenges). Is that a reasonable long-term setup for a homelab, or a bottleneck waiting to happen? If it helps, my homelab PC has 16 GB of memory.

In case you want to check out the code, it's in my repository. I'd really appreciate it if you could star the repo for better reach :)

Project Yellow Olive on Github ( Hosted Labs )

TIA !

u/Content_Ad_4153 — 2 days ago

▲ 5 r/kubernetes

EnvoyProxy config on GatewayClass or Gateway in Envoy-Gateway deployment

I've just started migrating from Ingress to Gateway API and have chosen the Envoy Gateway controller implementation.

Everything is more or less clear and quite simple. The issue is that there are now dozens of objects/resources I have to create, and I want to figure out the most practical placement for each of them "once". It's choice paralysis.

TLDR: Where is it better to apply a custom EnvoyProxy config, on the GatewayClass or on the Gateway?

Simple desire: I want to set better names for the Gateway's Envoy pods and services (in addition to other config).

I do not plan to have multiple Gateways attached to a single GatewayClass. I plan to have several GatewayClasses in a 1:1 setup, each with its own single Gateway (similar to what I had with ingress-nginx).

Since I'm thinking about a 1:1 configuration, attaching EnvoyProxy to a GatewayClass seems to be an OK idea. But then, configuration will happen only on the GatewayClass. And if, for some reason, I want to attach another Gateway to the same GatewayClass, I will have to provide an EnvoyProxy config for that Gateway anyway; otherwise, at least the names would clash with the first Gateway.

By default, an EnvoyProxy config on a Gateway completely overrides the GatewayClass's config (without MergeGateways, and I don't want to use that option, to avoid overcomplicating things).

I think EnvoyProxy on a Gateway is winning.
But then I don't know what to set in a GatewayClass' config, and if I need to worry about it at all.

u/i_Den — 3 days ago

▲ 11 r/kubernetes+1 crossposts

Split-Brain LLM Serving Explained | Prefill/Decode Disaggregation with llm-d

youtu.be

u/mostaptname — 3 days ago

▲ 0 r/kubernetes

I couldn't tell what an AI agent was allowed to do without reading its code, so I built a Dockerfile-shaped way to declare it

Here's the gap that's been bugging me: everyone's shipping AI agents, but I can't answer a basic question about any of them — what model does it use, what network can it reach, what tools can it call? — without reading the implementation. We govern containers with manifests and labels; agents are just… vibes and a Python file. Security can't review them; platforms can't enforce anything.

So I've been building **agentrc** — an open spec + small CLI to make that reviewable. You declare an agent in a Dockerfile-shaped **Agentfile**:

```
# syntax=agentrc.agentfile/v0.1
FROM python:3.11-slim
IDENTITY name=support-bot version=1.0
CAPABILITY text
SOP Answer billing questions. Escalate anything else.
COPY ./tools/lookup /mnt/tools/lookup
POLICY model.name claude-sonnet-4
POLICY network dns:api.stripe.com:443
POLICY agent.tool_timeout 30s
```

Four new keywords over normal Dockerfile syntax: `IDENTITY`, `CAPABILITY`, `SOP`, `POLICY`. Everything under `POLICY` is a **typed request** — not enforcement. The agent *asks*; the platform grants, narrows, or rejects it and enforces deny-by-default (the spec compiles requests to Cedar). The only egress that bot can be granted is `api.stripe.com:443`, and I can see that in one line instead of grepping code.
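To make the "requests, not enforcement" split concrete, here is a toy model of what a platform might do with `POLICY network` requests: anything not on the platform's allowlist is rejected (deny-by-default), and a request for a broader scope is narrowed to what the platform permits. The wildcard syntax (`dns:*:443`) and the `evaluate` function are assumptions for illustration only; the spec itself compiles requests to Cedar, which this sketch does not attempt to reproduce.

```go
package main

import (
	"fmt"
	"strings"
)

// decision is the platform's answer to one agent request.
type decision struct {
	Request string
	Granted []string // empty means rejected
}

// evaluate grants exact matches, narrows a wildcard request to the
// allowlisted entries it covers, and rejects everything else.
func evaluate(requests, allowlist []string) []decision {
	var out []decision
	for _, req := range requests {
		d := decision{Request: req}
		for _, allowed := range allowlist {
			if matches(req, allowed) {
				d.Granted = append(d.Granted, allowed)
			}
		}
		out = append(out, d)
	}
	return out
}

// matches reports whether an allowlisted endpoint falls inside a request.
// Both use the form "dns:<host>:<port>"; a "*" host (hypothetical) matches any host.
func matches(req, allowed string) bool {
	r := strings.Split(req, ":")
	a := strings.Split(allowed, ":")
	if len(r) != 3 || len(a) != 3 || r[0] != a[0] || r[2] != a[2] {
		return false
	}
	return r[1] == "*" || r[1] == a[1]
}

func main() {
	allow := []string{"dns:api.stripe.com:443"}
	reqs := []string{"dns:api.stripe.com:443", "dns:*:443", "dns:evil.example:443"}
	for _, d := range evaluate(reqs, allow) {
		fmt.Printf("%-24s -> %v\n", d.Request, d.Granted)
	}
}
```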

`arc build` compiles it to a normal **OCI image** with `ai.agentrc.*` labels — platforms read the labels, never the Agentfile, so it ships/signs/mirrors like any container. `arc run <ref> --backend local|bedrock|kubernetes --dry-run` translates the same artifact into that platform's deploy config.

**What this is NOT, so nobody's surprised:**

- Working Draft (0.1.0-draft.6) — expect breaking changes.

- Not a runtime, cloud, model provider, or framework. The backend translators are a **proof of concept** that the labels are sufficient — not production infra.

- Secrets are deliberately out of scope for now.

Try it: `curl -fsSL https://agentrc.ai/install.sh | sh` (or `brew` / `go install`). Spec: https://agentrc.ai · Code: https://github.com/adeelahmad/agentrc

Real questions I want critique on: does the four-keyword split hold up? Is "requests, not enforcement" the right boundary? What would make you comfortable running an agent you didn't write?

u/adeelahmadch — 3 days ago

▲ 0 r/kubernetes

Learn K8s with FlashCards

I’m building a free Android flashcard app for learning Kubernetes (concepts, commands, interview prep).

I need 12+ testers for Google Play’s 14‑day closed test.

If you join:

You get early access to a free learning tool.

You can help shape content (tell me which topics/cards are missing).

Join the group

https://groups.google.com/g/skilltesters

Install from Play on your Android device.

https://play.google.com/store/apps/details?id=app.flashcards.kubernetes

Join on the web

https://play.google.com/apps/testing/app.flashcards.kubernetes

r/kubernetes, r/devops, r/learncode, r/AWS

u/LearnSkills5 — 3 days ago

▲ 127 r/kubernetes+5 crossposts

wrtK8s

youtu.be

u/xrothgarx — 5 days ago

▲ 26 r/kubernetes

How to route between clusters

We’re trying to design the following: two clusters in two different regions, each running the same applications. Each cluster would have a “global proxy” service. If a request comes in through the region A cluster but is meant for the region B cluster (imagine the global proxy in region B is down), the region A proxy needs to be able to send the request directly to the intended service in region B. Essentially, arbitrary pod-to-pod (or service-to-service) traffic between two clusters. How is this normally done? Thank you
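Whatever provides the cross-cluster reachability (for example a multi-cluster service mesh or a flat pod network spanning both clusters), the “global proxy” still needs a routing decision like the one sketched below: serve local requests locally, hand remote requests to the target region's proxy, and dial the remote service directly when that proxy is unhealthy. The region names, addresses, `region` struct and `healthy` map are all hypothetical; in practice the health view would come from probes or the mesh's own endpoint health.

```go
package main

import "fmt"

// region holds hypothetical addresses reachable from the local proxy.
type region struct {
	ProxyAddr    string            // that region's global proxy
	ServiceAddrs map[string]string // service name -> directly routable address
}

// route picks the next hop for a request to service svc in region target.
func route(local, target, svc string, regions map[string]region, healthy map[string]bool) (string, error) {
	r, ok := regions[target]
	if !ok {
		return "", fmt.Errorf("unknown region %q", target)
	}
	addr, ok := r.ServiceAddrs[svc]
	if !ok {
		return "", fmt.Errorf("service %q not found in region %q", svc, target)
	}
	switch {
	case target == local:
		return addr, nil // local request: go straight to the service
	case healthy[r.ProxyAddr]:
		return r.ProxyAddr, nil // normal path: hand off to the remote proxy
	default:
		return addr, nil // remote proxy down: dial the remote service directly
	}
}

func main() {
	regions := map[string]region{
		"a": {ProxyAddr: "proxy.a.example:443", ServiceAddrs: map[string]string{"orders": "orders.a.example:8080"}},
		"b": {ProxyAddr: "proxy.b.example:443", ServiceAddrs: map[string]string{"orders": "orders.b.example:8080"}},
	}
	healthy := map[string]bool{"proxy.a.example:443": true} // proxy in region b is down
	hop, err := route("a", "b", "orders", regions, healthy)
	fmt.Println(hop, err) // orders.b.example:8080 <nil>
}
```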

u/Electrical-Room4405 — 5 days ago

▲ 77 r/kubernetes

why does your company use Kubernetes?

Hey, I'm just learning Kubernetes and was wondering why your companies chose to run Kubernetes over other container solutions (like AWS Fargate).

u/dekonta — 6 days ago

▲ 8 r/kubernetes

Weekly: Show off your new tools and projects thread

Share any new Kubernetes tools, UIs, or related projects!

u/AutoModerator — 5 days ago

▲ 43 r/kubernetes

Loaded Crossplane's full doc set into MiniMax M3's 1M context window to speed up our evaluation

Been evaluating Crossplane for 8 weeks. Team lead got sold on it at KubeCon, came back like "we're moving off Terraform by Q1." Nobody on the team has touched Crossplane before and guess who gets to do the evaluation because I "have the most Kubernetes experience." Thanks.

The doc situation is brutal. Official docs, three provider doc sets, composition examples scattered across GitHub, XRD references that contradict the tutorials half the time. Spent an entire Saturday on composition patterns and by 4pm I was LESS confident than when I started. Not a great sign.

Did something kind of lazy. Downloaded everything: all provider docs, plus maybe 50 composition YAMLs from the examples repo. Converted it to plaintext, roughly 650k tokens. Loaded it into MiniMax M3 (handles 1M natively) and started asking the questions I'd been going in circles on.

Nested compositions referencing outputs without a separate Claim. This had been driving me nuts. The answer was spread across three doc pages AND a February GitHub discussion I somehow missed. It nailed it, pulling the right section from each source.

Also needed ammo against my teammate who keeps insisting GCP networking is "production ready." M3 flagged several CRDs as still v1beta1. Confirmed in the provider repo. Sent him the screenshot at 11pm on Sunday. Maybe petty but whatever.

Where it ate it: ArgoCD integration. It mixed up the official GitOps guide with a 2024 community blog post, and the config snippet would've broken our sync flow. Caught it because the namespace didn't match staging. If I hadn't checked, that would've been a fun incident.

Whole "250 pages across 5 repos" problem feels universal for infra tooling evaluation though. Anyone found a less painful workflow?

u/AkiraOEM — 5 days ago