Skip to content

fix: site-level handler overrides were ignored, org-level disable always won#1590

Open
tkotthakota-adobe wants to merge 3 commits intomainfrom
SITES-41916
Open

fix: site-level handler overrides were ignored, org-level disable always won#1590
tkotthakota-adobe wants to merge 3 commits intomainfrom
SITES-41916

Conversation

@tkotthakota-adobe
Copy link
Copy Markdown
Contributor

https://jira.corp.adobe.com/browse/SITES-41916

Problem:
Site-level handler overrides were ignored — org-level disable always won, so upgrading a site from
free-trial to paid couldn't re-enable audits. Also, enabling a handler could crash with a TypeError if
the disabled.sites array didn't exist.

  • isHandlerEnabledForSite now checks site-level before org-level — site enable overrides org disable
  • #updatedHandler cleans up both enabled/disabled lists to keep them consistent
  • Safe array access with || [] fallback to prevent TypeError on missing keys

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 6, 2026

This PR will trigger no release when merged.

if (disabledSites.includes(siteId)) {
return false;
}
if (enabledSites.includes(siteId)) {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment by @solaris007
Important: This flips existing config rows where a site is in enabled.sites and the org is in disabled.orgs from disabled to enabled at deploy time. Run a query against prod to enumerate affected rows before merging.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have prod access @solaris007 What is the best way to do this query?

Copy link
Copy Markdown
Member

@solaris007 solaris007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @tkotthakota-adobe,

Strengths

  • The new precedence ladder in isHandlerEnabledForSite (configuration.model.js:186-207) reads cleanly top-to-bottom: site-disable > site-enable > org-disable > default > org-enable. Much easier to reason about than the old nested-if approach, and the local destructuring makes each check self-documenting.
  • #updatedHandler now keeps enabled and disabled lists mutually exclusive on every transition (configuration.model.js:249-265), closing a stale-state hazard where the same entity could end up in both lists.
  • The (handler.disabled[entityKey] || []).filter(...) pattern (configuration.model.js:250-251) fixes a real correctness issue - the old .filter(...) || [] was dead code since .filter() always returns an array, but the parent could still be undefined.
  • Previously requested test for headline behavior change (Thread 2) has been addressed - configuration.model.test.js:248-256 directly asserts site-override-org-disable.
  • No new dependencies, CI passing, well-scoped diff.

Issues

Important (Should Fix)

1. Write path does not deliver the site-override for enabledByDefault: true handlers (configuration.model.js:253)

#updatedHandler only adds the entity to enabled[entityKey] when !handler.enabledByDefault. For enabledByDefault: true handlers, calling enableHandlerForSite when the org is in disabled.orgs results in: site not added to enabled.sites -> isHandlerEnabledForSite falls through to disabledOrgs check -> returns false.

The read path claims "site enable overrides org disable" universally, but the write path only delivers it for non-default handlers. The test at line 248 passes because it uses addHandler to pre-populate enabled.sites directly, bypassing the public API.

This may not affect the immediate use case (paid-tier audits are typically enabledByDefault: false), but it's the same shape of bug this PR exists to fix - a latent inconsistency between what the config reads and what the API writes.

Fix: Always add to enabled[entityKey] when enabled === true, regardless of enabledByDefault. Remove the !handler.enabledByDefault guard at line 253.

2. Pre-merge gate: enumerate prod config rows affected by the behavior flip

Building on the open thread at line 197 - @solaris007 correctly flagged that any handler where (site in enabled.sites AND org in disabled.orgs) will flip from "audit off" to "audit on" at deploy time. The author replied "I do not have prod access."

Path forward: someone with prod read access fetches the S3 singleton config and enumerates the affected (handler, siteId, orgId) triples. This is a read-only operation on a single JSON blob - should take under 30 minutes. Based on the count:

  • Zero: merge with confidence.
  • Small (<10): list them in the PR description so on-call knows what to expect.
  • Large: stage the rollout or pre-clean stale enabled.sites entries.

This is not a code fix but a deploy-readiness gate. The PR should not merge until this is answered.

3. New cleanup branches in #updatedHandler lack direct tests (configuration.model.test.js)

The PR adds write-side logic that removes entities from the opposite list on every transition (lines 250-251, 261-262 of the model). The existing tests start with handlers where the entity is NOT in the opposite list, so the cleanup is a no-op and never exercised. No test fails if those .filter(...) lines are removed.

Fix: Add round-trip tests, plus a enabledByDefault: false override test (the current test at line 248 uses enabledByDefault: true only). See inline comment for examples.

Minor (Nice to Have)

4. isHandlerEnabledForSite and isHandlerEnabledForOrg now use different precedence models, undocumented

After this PR, isHandlerEnabledForSite applies site-then-org precedence while isHandlerEnabledForOrg still uses "any disabled wins." A config where org O is in disabled.orgs and site S (in O) is in enabled.sites will report isHandlerEnabledForOrg(O) = false while isHandlerEnabledForSite(S) = true. This is likely intentional but undocumented.

Fix: Add a brief JSDoc on both methods noting the divergence.

5. "Site in both enabled.sites and disabled.sites" conflict resolution is implicit

The new code makes disabled.sites win (checked first at line 194). #updatedHandler keeps these disjoint on writes, but legacy data or manual edits could violate this. A one-line comment ("assumes enabled.sites and disabled.sites are disjoint; see #updatedHandler") or a test pinning the conflict resolution would make the invariant explicit.

Recommendations

  • Add a DEBUG-level log line when the site-override fires (enabledSites.includes(siteId) && disabledOrgs.includes(orgId)) to give on-call a grep target for post-deploy validation.
  • Update the PR description to include a "Deployment impact" section noting the behavior change for existing config rows.
  • Consider adding a JSDoc block on isHandlerEnabledForSite enumerating the full precedence rules.

Assessment

Ready to merge? No, with fixes.

Reasoning: The read-path fix is clean and well-tested for the headline case, but the write path leaves the enabledByDefault: true override unreachable via the public API (finding 1), the new cleanup branches lack direct test coverage (finding 3), and the prod-state enumeration from Thread 1 remains unanswered (finding 2). Fixing findings 1 and 3 are small code changes; finding 2 is a read-only prod query that someone with access can run.

Next Steps

  1. Fix the write path in #updatedHandler to always add to enabled[entityKey] when enabling (finding 1).
  2. Add the missing tests: enabledByDefault: false override, round-trip cleanup assertions (finding 3).
  3. Resolve Thread 1 by running the prod config enumeration before merging (finding 2).
  4. Minor items (findings 4-5) are optional but improve future readability.

@tkotthakota-adobe
Copy link
Copy Markdown
Contributor Author

Hi @solaris007 thank you for the review. I addressed review comments.

Copy link
Copy Markdown
Member

@solaris007 solaris007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @tkotthakota-adobe,

Thanks for the thorough iteration. All five prior findings + two of three recommendations cleanly addressed. Two Important test gaps remain and the prod-enumeration thread (Important issue 2 from the prior review) needs a reviewer-side action - that one is on me, not you.

Strengths

Previously flagged issues now addressed:

  • Important issue 1 (write path doesn't deliver site-override for enabledByDefault: true handlers): cleanly fixed at configuration.model.js:280-286. The if (!handler.enabledByDefault) guard is gone; enableHandlerForSite now reliably populates enabled.sites for both default and non-default handlers, making the override actually reachable via the public API. The new comment ("Always add to enabled so isHandlerEnabledForSite can find the site in enabled.sites and return true even when the org is in disabled.orgs (site-level override)") accurately describes the behavior.
  • Important issue 3 (cleanup branches lack direct tests): addressed via three new tests. Test 2 ("removes site from disabled.sites when re-enabling for a non-default handler") genuinely exercises the new write path - handler starts with disabled.sites: [siteId], after updateHandlerSites(..., true) both the cleanup branch AND the unconditional-add fix from Important issue 1 are exercised. Test 3 covers the symmetric case for enabledByDefault: true going through the else if branch.
  • Minor issue 4 (precedence divergence undocumented): JSDoc on both methods is accurate. I traced isHandlerEnabledForSite line-by-line against the 6-step ladder and it matches. JSDoc explicitly calls out the divergence between the two methods.
  • Minor issue 5 (implicit conflict resolution between enabled.sites/disabled.sites): JSDoc on isHandlerEnabledForSite documents both the disjointness assumption AND the invariant maintained by #updatedHandler. The "lists disjoint" assumption is now actively enforced by the writer rather than just assumed.
  • Recommendation 1 (DEBUG log on site-override fire): implemented at configuration.model.js:213-215. The log fires only when the override is actually triggered (site in enabled.sites AND org in disabled.orgs). Includes type, siteId, orgId for correlation.
  • Recommendation 3 (JSDoc precedence enumeration): addressed via the Minor issue 4 fix.

This iteration:

  • Write-path simplification reads cleanly. #updatedHandler no longer branches on enabledByDefault for the enable case; always-add-to-enabled / always-remove-from-disabled is easier to reason about and matches the new read path.
  • Asymmetric precedence model (site-overrides-org for isHandlerEnabledForSite; org-disable-always-wins for isHandlerEnabledForOrg) is now intentional and self-documented. The JSDoc contrasts the two methods explicitly. The asymmetry maps cleanly to how each method is used at its respective fan-out boundary (per-site vs per-org consumer in jobs-dispatcher reports).
  • The 6-step precedence table is now the de-facto contract. Future readers can cite it instead of reverse-engineering branching code.
  • Authorization model integrity is sound. (enabled.sites, disabled.orgs) configuration is written only by trusted admin paths in spacecat-api-service (admin x-api-key controllers, internal onboarding flows, Slack admin commands). End users cannot self-promote a site into enabled.sites. The "site overrides org" semantics are an explicit, policy-driven business rule (paid upgrade), not a privilege-escalation surface.
  • DEBUG log payload { type, siteId, orgId } contains only non-secret identifiers. No PII, tokens, or credentials.
  • Diff is tightly scoped to addressing prior findings; no drive-by changes.

Issues

Important (Should Fix)

  1. Headline production scenario has no end-to-end test. The PR exists to fix the free-trial -> paid upgrade path, but no test exercises the full flow: updateHandlerSites('paid-handler', siteId, true) on a state where the org is in disabled.orgs, then assert isHandlerEnabledForSite('paid-handler', site) is true. The new "paid-handler" test is read-path-only (uses addHandler to pre-populate, bypassing the write path); Test 2 exercises the write path but without org in disabled.orgs. Add one combining test - this is what tells future maintainers what the PR is for. See inline comment for the test shape.

  2. enabledByDefault: false disable branch is not directly tested. #updatedHandler now has three write paths: enable (any handler), disable with enabledByDefault: true, and disable with enabledByDefault: false (the else branch at configuration.model.js:292-294). The first two are covered; the third is not. With this PR's "always add to enabled on enable" change, the asymmetry of the third branch is load-bearing: a paid handler that gets enabled (now in enabled.sites) and then disabled must NOT linger as enabled. See inline comment for the test shape.

Recommendations

  • Prod-state enumeration (prior Important issue 2) - shifting to my side. Author correctly stated they lack prod access; this is a deployment-readiness gate, not a code fix. I will run the query against the prod Configuration singleton: pull the current S3 config, compute intersect(enabled.sites, sitesOf(disabled.orgs)) per handler, and report the count of (handler, siteId, orgId) triples that flip from disabled to enabled at deploy. If zero or small + expected, merge proceeds; if surprising, the team can decide whether to merge-as-is, pre-clean, or notify affected customers. Tracking this on the PR before approval.
  • DEBUG vs INFO for the override log. The author followed the prior review recommendation (DEBUG). Worth re-evaluating: DEBUG is typically filtered out in prod observability, so the post-deploy validation purpose may not work without changing log filters. For the first 30 days post-deploy, INFO would make the canary signal greppable by on-call without filter changes; afterwards demote to DEBUG. Reasonable counter-argument: keep DEBUG and document in the rollout runbook. Author's call - this reverses my prior recommendation, which I now think was suboptimal for the canary purpose.
  • Add "Deployment impact" section to PR description (recommendation 2 from prior review, still not addressed). One paragraph noting: configs where site-in-enabled-sites + org-in-disabled-orgs flip from disabled to enabled at deploy time, the prod-enumeration result once it is run, and the rollback path. The merge commit body should be accurate.
  • Document the rollback recovery path. If a customer reports unexpected audits-on after deploy: Path A (small blast radius) - clean up the prod config singleton (single S3 write, recovers immediately). Path B (large blast radius or unclear scope) - revert this PR in spacecat-shared, publish patch release, bump consumers (slow, multi-repo). Worth noting in the PR description or a follow-up runbook.
  • Future: precedence test matrix derived from JSDoc. The 6-row precedence ladder is now the de-facto contract. A small data-driven test where each row of the table is one assertion would lock the spec and the implementation together and prevent silent drift. Cheap to add, exactly the shape of test that does not regress when refactoring the body. Out of scope for this PR; worth a follow-up issue.

Minor

  1. Disjointness invariant lacks a pinning test. JSDoc says "disabled.sites and enabled.sites are assumed disjoint; disabled.sites wins if violated." A one-line test seeding both lists with the same siteId and asserting false would prevent silent precedence drift on legacy data.

  2. Idempotency of enable not pinned. Array.from(new Set([...])) deduplication is the only thing keeping enabled.sites from growing unbounded if updateHandlerSites(..., true) is called repeatedly. A single test asserting enabled.sites.length === 1 after two consecutive enables would lock this behavior.

  3. updateHandlerOrgs cleanup not directly tested. Same #updatedHandler code path as sites, but the read-side semantics for orgs are now formally asymmetric (no site-level override). One test verifying updateHandlerOrgs(type, orgId, true) followed by isHandlerEnabledForOrg(...) returns true after a prior disable would mirror the site-side assertion.

Assessment

Ready to merge? With fixes.

Reasoning: code-level work is clean. The write-path simplification reads well, the JSDoc captures the contract, and the cleanup branch tests address the prior review properly. Two Important test gaps remain - the headline end-to-end scenario and the enabledByDefault: false disable branch - both small additions. The prod-enumeration deployment-readiness gate (prior Important issue 2) is now reviewer-side; I will run the query and report results on the PR before approval.

Next Steps

  1. Add the two missing tests (Important issues 1 and 2). Both are small additions to the existing test file.
  2. Reviewer (me) runs the prod-state enumeration before approving. I will paste results into the PR.
  3. Decide on the DEBUG vs INFO log level question.
  4. Optional: update PR description with deployment impact and rollback path; add the Minor tests if you have appetite for completeness.

enabledByDefault: false,
enabled: { sites: [site.getId()], orgs: [] },
disabled: { sites: [], orgs: [site.getOrganizationId()] },
});
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important - the headline production scenario lacks an end-to-end test.

The PR exists to fix the free-trial -> paid upgrade path: org has handler X with enabledByDefault: false and is in disabled.orgs; we upgrade site S via updateHandlerSites('X', siteId, true); we expect isHandlerEnabledForSite('X', S) to return true afterwards. The new "paid-handler" test above is a READ-PATH-only assertion - it pre-populates state directly via addHandler and never calls updateHandlerSites. Test 2 below ("removes site from disabled.sites when re-enabling for a non-default handler") exercises the write path but with disabled.orgs: [], so the override-against-disabled-org case is not asserted post-write.

Without a test that combines write + read, a future regression that breaks updateHandlerSites for non-default handlers (e.g., someone re-introducing the if (!enabledByDefault) guard the prior review asked you to remove) would not be caught: the read-path "paid-handler" test would still pass because it bypasses the write path. Add one test:

it('upgrades non-default handler from disabled org via updateHandlerSites', () => {
  instance.addHandler('paid-handler', {
    enabledByDefault: false,
    enabled: { sites: [], orgs: [] },
    disabled: { sites: [], orgs: [site.getOrganizationId()] },
  });
  instance.updateHandlerSites('paid-handler', site.getId(), true);
  expect(instance.isHandlerEnabledForSite('paid-handler', site)).to.be.true;
});

This is the test that tells future maintainers what the PR is for.


instance.updateHandlerSites('default-cleanup-handler', site.getId(), false);

expect(instance.getHandler('default-cleanup-handler').enabled.sites).to.not.include(site.getId());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important - enabledByDefault: false disable branch is not directly tested.

#updatedHandler has three write paths after this PR's simplification:

  1. Enable: remove from disabled + add to enabled (any handler) - covered by Test 2 ("cleanup-handler")
  2. Disable + enabledByDefault: true: add to disabled + remove from enabled - covered by Test 3 ("default-cleanup-handler")
  3. Disable + enabledByDefault: false (the else branch at configuration.model.js:292-294): only remove from enabled, do NOT add to disabled - NOT covered.

With the new "always add to enabled on enable" change in this PR, the asymmetry of branch 3 is now load-bearing: a paid handler that gets enabled (now in enabled.sites) and then disabled must NOT linger in enabled.sites. Add at least:

it('removes site from enabled.sites when disabling for a non-default handler', () => {
  instance.addHandler('paid-disable-handler', {
    enabledByDefault: false,
    enabled: { sites: [site.getId()], orgs: [] },
    disabled: { sites: [], orgs: [] },
  });
  instance.updateHandlerSites('paid-disable-handler', site.getId(), false);
  expect(instance.getHandler('paid-disable-handler').enabled.sites).to.not.include(site.getId());
  expect(instance.getHandler('paid-disable-handler').disabled.sites).to.not.include(site.getId());
});

This pins the contract that disabling a non-default handler is "not in either list" (default-off applies), distinct from disabling a default handler ("in disabled list, default-on overridden").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants