@ppapapetrou76
Contributor

@ppapapetrou76 ppapapetrou76 commented Jan 26, 2026

The issue was that the CompareAppState function (specifically through the gitops-engine GetManagedLiveObjs cluster cache method) would attempt to fetch namespaced resources even when their namespace did not exist in the cluster. This could result in Forbidden errors if the service account has no permission to access the non-existent namespace.

This PR addresses the above by checking if a namespace exists in the cluster's cache before attempting a live fetch.
If the namespace is missing and the resource is namespaced, the function now assumes the resource is also missing and skips the fetch.
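
To illustrate the idea, here is a minimal, self-contained sketch of the guard. It is not the code from this PR: `cachedCluster`, `hasNamespace`, and `shouldFetchLive` are hypothetical names, and the cache is reduced to a set of known namespaces.

```go
package main

import "fmt"

// cachedCluster is a hypothetical, simplified stand-in for the cluster cache.
// It only tracks which namespaces the cache currently knows about.
type cachedCluster struct {
	namespaces map[string]bool
}

// hasNamespace reports whether the namespace is present in the cache.
func (c *cachedCluster) hasNamespace(name string) bool {
	return c.namespaces[name]
}

// shouldFetchLive decides whether a live fetch is worth attempting.
// A namespaced resource whose namespace is absent from the cache is assumed
// to be missing as well, so the fetch (and a possible Forbidden error) is skipped.
func shouldFetchLive(c *cachedCluster, namespace string, namespaced bool) bool {
	if namespaced && namespace != "" && !c.hasNamespace(namespace) {
		return false
	}
	return true
}

func main() {
	cache := &cachedCluster{namespaces: map[string]bool{"default": true}}

	fmt.Println(shouldFetchLive(cache, "default", true)) // namespace is cached
	fmt.Println(shouldFetchLive(cache, "team-a", true))  // namespace is not cached
	fmt.Println(shouldFetchLive(cache, "", false))       // cluster-scoped resource
}
```

Cluster-scoped resources are never skipped by this guard, since they do not depend on a namespace existing.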

I also added a unit test that fails without the introduced code and passes with the changes.

closes #26076

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Title of the PR
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

… sync

Signed-off-by: Patroklos Papapetrou <ppapapetrou76@gmail.com>
@ppapapetrou76 ppapapetrou76 requested a review from a team as a code owner January 26, 2026 13:25
@bunnyshell

bunnyshell bot commented Jan 26, 2026

🔴 Preview Environment stopped on Bunnyshell

See: Environment Details | Pipeline Logs

Available commands (reply to this comment):

  • 🔵 /bns:start to start the environment
  • 🚀 /bns:deploy to redeploy the environment
  • /bns:delete to remove the environment

@rumstead rumstead changed the title from "fix: chcek namespace existence before fetching namespaced resources during sync" to "fix: check namespace existence before fetching namespaced resources during sync" Jan 26, 2026
Member

@blakepettersson blakepettersson left a comment

Seems reasonable to me, but it would be good to get the thoughts of e.g. @leoluz

Member

@reggie-k reggie-k left a comment

My understanding of the issue and why I think this fix resolves it:
A setup in which Argo CD lacks the RBAC to perform K8s operations on a resource and relies on Kyverno to create the needed RBAC is a special case.
In this scenario, Argo CD enters a logical deadlock.
To create a ns during sync, GetManagedLiveObjs must return successfully, but it does not, because the K8s API returns a Forbidden error on any attempt to get resources in the ns (which does not exist at this point).
So the sync does not happen and the ns is not created. But in the user's scenario, the RBAC that lets Argo CD create the other resources in that ns can only be provided AFTER the ns is created.

The core issue, in my understanding, is here:


When Argo CD has RBAC (a regular cluster install with an admin RoleBinding or ClusterRoleBinding), getting a resource in a non-existent ns returns a not found error, which is treated as an OK scenario and allows the sync. When Argo CD has no RBAC, the same request returns a forbidden error, which is treated as an error scenario, so the sync does not happen.
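
The asymmetry described above can be sketched in a few lines. This is an illustration only, not the Argo CD or gitops-engine code: `apiError`, `getFromMissingNamespace`, and `liveObjectMissing` are hypothetical names, and HTTP status codes stand in for Kubernetes API errors (real client code would typically use helpers such as `apierrors.IsNotFound` from `k8s.io/apimachinery/pkg/api/errors`).

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// apiError is a hypothetical stand-in for a Kubernetes API status error.
type apiError struct {
	Code int
}

func (e *apiError) Error() string {
	return fmt.Sprintf("api error: %d %s", e.Code, http.StatusText(e.Code))
}

// getFromMissingNamespace mimics a GET for a resource whose namespace does
// not exist: with RBAC the API answers 404, without RBAC it answers 403.
func getFromMissingNamespace(hasRBAC bool) error {
	if hasRBAC {
		return &apiError{Code: http.StatusNotFound}
	}
	return &apiError{Code: http.StatusForbidden}
}

// liveObjectMissing returns (true, nil) when err means "the object is absent"
// and passes every other error through, which aborts the comparison.
func liveObjectMissing(err error) (bool, error) {
	var apiErr *apiError
	if errors.As(err, &apiErr) && apiErr.Code == http.StatusNotFound {
		return true, nil
	}
	return false, err
}

func main() {
	for _, hasRBAC := range []bool{true, false} {
		missing, err := liveObjectMissing(getFromMissingNamespace(hasRBAC))
		fmt.Printf("hasRBAC=%v missing=%v err=%v\n", hasRBAC, missing, err)
	}
}
```

Only the not found case is swallowed; the forbidden case surfaces as an error, which is what blocks the sync in the restricted-RBAC setup.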

The proposed fix checks the cache for the ns, and if it does not exist, the execution path that performs kubectl.GetResource is skipped. The sync then continues normally and the ns is created in both cases: the user's case (ns-scoped Argo CD installation with partial RBAC) and a cluster-scoped Argo CD installation with full RBAC.

One thing to note: Kyverno is external and creates the RBAC that Argo CD needs for the resources in the new ns asynchronously, so this fix alone is not enough. The user is advised to configure sync retries in addition to using this fix; otherwise the ns would be created successfully but the resources in it might not be until the next explicit sync.

Member

@reggie-k reggie-k left a comment

It looks like my understanding was only partial. After further manual testing, there appear to be additional execution paths that throw the forbidden error and prevent the ns from being created.

Development

Successfully merging this pull request may close these issues.

ArgoCD CompareAppState fails when creating namespace and namespaced resources in a single application with strict RBAC

3 participants