fix: site-level handler overrides were ignored, org-level disable always won#1590
fix: site-level handler overrides were ignored, org-level disable always won#1590tkotthakota-adobe wants to merge 3 commits intomainfrom
Conversation
|
This PR will trigger no release when merged. |
| if (disabledSites.includes(siteId)) { | ||
| return false; | ||
| } | ||
| if (enabledSites.includes(siteId)) { |
There was a problem hiding this comment.
Comment by @solaris007
Important: This flips existing config rows where a site is in enabled.sites and the org is in disabled.orgs from disabled to enabled at deploy time. Run a query against prod to enumerate affected rows before merging.
There was a problem hiding this comment.
I do not have prod access @solaris007 What is the best way to do this query?
solaris007
left a comment
There was a problem hiding this comment.
Hey @tkotthakota-adobe,
Strengths
- The new precedence ladder in
isHandlerEnabledForSite(configuration.model.js:186-207) reads cleanly top-to-bottom: site-disable > site-enable > org-disable > default > org-enable. Much easier to reason about than the old nested-if approach, and the local destructuring makes each check self-documenting. #updatedHandlernow keepsenabledanddisabledlists mutually exclusive on every transition (configuration.model.js:249-265), closing a stale-state hazard where the same entity could end up in both lists.- The
(handler.disabled[entityKey] || []).filter(...)pattern (configuration.model.js:250-251) fixes a real correctness issue - the old.filter(...) || []was dead code since.filter()always returns an array, but the parent could still beundefined. - Previously requested test for headline behavior change (Thread 2) has been addressed - configuration.model.test.js:248-256 directly asserts site-override-org-disable.
- No new dependencies, CI passing, well-scoped diff.
Issues
Important (Should Fix)
1. Write path does not deliver the site-override for enabledByDefault: true handlers (configuration.model.js:253)
#updatedHandler only adds the entity to enabled[entityKey] when !handler.enabledByDefault. For enabledByDefault: true handlers, calling enableHandlerForSite when the org is in disabled.orgs results in: site not added to enabled.sites -> isHandlerEnabledForSite falls through to disabledOrgs check -> returns false.
The read path claims "site enable overrides org disable" universally, but the write path only delivers it for non-default handlers. The test at line 248 passes because it uses addHandler to pre-populate enabled.sites directly, bypassing the public API.
This may not affect the immediate use case (paid-tier audits are typically enabledByDefault: false), but it's the same shape of bug this PR exists to fix - a latent inconsistency between what the config reads and what the API writes.
Fix: Always add to enabled[entityKey] when enabled === true, regardless of enabledByDefault. Remove the !handler.enabledByDefault guard at line 253.
2. Pre-merge gate: enumerate prod config rows affected by the behavior flip
Building on the open thread at line 197 - @solaris007 correctly flagged that any handler where (site in enabled.sites AND org in disabled.orgs) will flip from "audit off" to "audit on" at deploy time. The author replied "I do not have prod access."
Path forward: someone with prod read access fetches the S3 singleton config and enumerates the affected (handler, siteId, orgId) triples. This is a read-only operation on a single JSON blob - should take under 30 minutes. Based on the count:
- Zero: merge with confidence.
- Small (<10): list them in the PR description so on-call knows what to expect.
- Large: stage the rollout or pre-clean stale
enabled.sitesentries.
This is not a code fix but a deploy-readiness gate. The PR should not merge until this is answered.
3. New cleanup branches in #updatedHandler lack direct tests (configuration.model.test.js)
The PR adds write-side logic that removes entities from the opposite list on every transition (lines 250-251, 261-262 of the model). The existing tests start with handlers where the entity is NOT in the opposite list, so the cleanup is a no-op and never exercised. No test fails if those .filter(...) lines are removed.
Fix: Add round-trip tests, plus a enabledByDefault: false override test (the current test at line 248 uses enabledByDefault: true only). See inline comment for examples.
Minor (Nice to Have)
4. isHandlerEnabledForSite and isHandlerEnabledForOrg now use different precedence models, undocumented
After this PR, isHandlerEnabledForSite applies site-then-org precedence while isHandlerEnabledForOrg still uses "any disabled wins." A config where org O is in disabled.orgs and site S (in O) is in enabled.sites will report isHandlerEnabledForOrg(O) = false while isHandlerEnabledForSite(S) = true. This is likely intentional but undocumented.
Fix: Add a brief JSDoc on both methods noting the divergence.
5. "Site in both enabled.sites and disabled.sites" conflict resolution is implicit
The new code makes disabled.sites win (checked first at line 194). #updatedHandler keeps these disjoint on writes, but legacy data or manual edits could violate this. A one-line comment ("assumes enabled.sites and disabled.sites are disjoint; see #updatedHandler") or a test pinning the conflict resolution would make the invariant explicit.
Recommendations
- Add a DEBUG-level log line when the site-override fires (
enabledSites.includes(siteId) && disabledOrgs.includes(orgId)) to give on-call a grep target for post-deploy validation. - Update the PR description to include a "Deployment impact" section noting the behavior change for existing config rows.
- Consider adding a JSDoc block on
isHandlerEnabledForSiteenumerating the full precedence rules.
Assessment
Ready to merge? No, with fixes.
Reasoning: The read-path fix is clean and well-tested for the headline case, but the write path leaves the enabledByDefault: true override unreachable via the public API (finding 1), the new cleanup branches lack direct test coverage (finding 3), and the prod-state enumeration from Thread 1 remains unanswered (finding 2). Fixing findings 1 and 3 are small code changes; finding 2 is a read-only prod query that someone with access can run.
Next Steps
- Fix the write path in
#updatedHandlerto always add toenabled[entityKey]when enabling (finding 1). - Add the missing tests:
enabledByDefault: falseoverride, round-trip cleanup assertions (finding 3). - Resolve Thread 1 by running the prod config enumeration before merging (finding 2).
- Minor items (findings 4-5) are optional but improve future readability.
|
Hi @solaris007 thank you for the review. I addressed review comments. |
solaris007
left a comment
There was a problem hiding this comment.
Hey @tkotthakota-adobe,
Thanks for the thorough iteration. All five prior findings + two of three recommendations cleanly addressed. Two Important test gaps remain and the prod-enumeration thread (Important issue 2 from the prior review) needs a reviewer-side action - that one is on me, not you.
Strengths
Previously flagged issues now addressed:
- Important issue 1 (write path doesn't deliver site-override for
enabledByDefault: truehandlers): cleanly fixed atconfiguration.model.js:280-286. Theif (!handler.enabledByDefault)guard is gone;enableHandlerForSitenow reliably populatesenabled.sitesfor both default and non-default handlers, making the override actually reachable via the public API. The new comment ("Always add to enabled so isHandlerEnabledForSite can find the site in enabled.sites and return true even when the org is in disabled.orgs (site-level override)") accurately describes the behavior. - Important issue 3 (cleanup branches lack direct tests): addressed via three new tests. Test 2 ("removes site from disabled.sites when re-enabling for a non-default handler") genuinely exercises the new write path - handler starts with
disabled.sites: [siteId], afterupdateHandlerSites(..., true)both the cleanup branch AND the unconditional-add fix from Important issue 1 are exercised. Test 3 covers the symmetric case forenabledByDefault: truegoing through theelse ifbranch. - Minor issue 4 (precedence divergence undocumented): JSDoc on both methods is accurate. I traced
isHandlerEnabledForSiteline-by-line against the 6-step ladder and it matches. JSDoc explicitly calls out the divergence between the two methods. - Minor issue 5 (implicit conflict resolution between enabled.sites/disabled.sites): JSDoc on
isHandlerEnabledForSitedocuments both the disjointness assumption AND the invariant maintained by#updatedHandler. The "lists disjoint" assumption is now actively enforced by the writer rather than just assumed. - Recommendation 1 (DEBUG log on site-override fire): implemented at
configuration.model.js:213-215. The log fires only when the override is actually triggered (site inenabled.sitesAND org indisabled.orgs). Includestype,siteId,orgIdfor correlation. - Recommendation 3 (JSDoc precedence enumeration): addressed via the Minor issue 4 fix.
This iteration:
- Write-path simplification reads cleanly.
#updatedHandlerno longer branches onenabledByDefaultfor the enable case; always-add-to-enabled / always-remove-from-disabled is easier to reason about and matches the new read path. - Asymmetric precedence model (site-overrides-org for
isHandlerEnabledForSite; org-disable-always-wins forisHandlerEnabledForOrg) is now intentional and self-documented. The JSDoc contrasts the two methods explicitly. The asymmetry maps cleanly to how each method is used at its respective fan-out boundary (per-site vs per-org consumer in jobs-dispatcher reports). - The 6-step precedence table is now the de-facto contract. Future readers can cite it instead of reverse-engineering branching code.
- Authorization model integrity is sound.
(enabled.sites, disabled.orgs)configuration is written only by trusted admin paths inspacecat-api-service(adminx-api-keycontrollers, internal onboarding flows, Slack admin commands). End users cannot self-promote a site intoenabled.sites. The "site overrides org" semantics are an explicit, policy-driven business rule (paid upgrade), not a privilege-escalation surface. - DEBUG log payload
{ type, siteId, orgId }contains only non-secret identifiers. No PII, tokens, or credentials. - Diff is tightly scoped to addressing prior findings; no drive-by changes.
Issues
Important (Should Fix)
-
Headline production scenario has no end-to-end test. The PR exists to fix the free-trial -> paid upgrade path, but no test exercises the full flow:
updateHandlerSites('paid-handler', siteId, true)on a state where the org is indisabled.orgs, then assertisHandlerEnabledForSite('paid-handler', site)is true. The new "paid-handler" test is read-path-only (usesaddHandlerto pre-populate, bypassing the write path); Test 2 exercises the write path but without org indisabled.orgs. Add one combining test - this is what tells future maintainers what the PR is for. See inline comment for the test shape. -
enabledByDefault: falsedisable branch is not directly tested.#updatedHandlernow has three write paths: enable (any handler), disable withenabledByDefault: true, and disable withenabledByDefault: false(theelsebranch atconfiguration.model.js:292-294). The first two are covered; the third is not. With this PR's "always add to enabled on enable" change, the asymmetry of the third branch is load-bearing: a paid handler that gets enabled (now inenabled.sites) and then disabled must NOT linger as enabled. See inline comment for the test shape.
Recommendations
- Prod-state enumeration (prior Important issue 2) - shifting to my side. Author correctly stated they lack prod access; this is a deployment-readiness gate, not a code fix. I will run the query against the prod Configuration singleton: pull the current S3 config, compute
intersect(enabled.sites, sitesOf(disabled.orgs))per handler, and report the count of (handler, siteId, orgId) triples that flip from disabled to enabled at deploy. If zero or small + expected, merge proceeds; if surprising, the team can decide whether to merge-as-is, pre-clean, or notify affected customers. Tracking this on the PR before approval. - DEBUG vs INFO for the override log. The author followed the prior review recommendation (DEBUG). Worth re-evaluating: DEBUG is typically filtered out in prod observability, so the post-deploy validation purpose may not work without changing log filters. For the first 30 days post-deploy, INFO would make the canary signal greppable by on-call without filter changes; afterwards demote to DEBUG. Reasonable counter-argument: keep DEBUG and document in the rollout runbook. Author's call - this reverses my prior recommendation, which I now think was suboptimal for the canary purpose.
- Add "Deployment impact" section to PR description (recommendation 2 from prior review, still not addressed). One paragraph noting: configs where site-in-enabled-sites + org-in-disabled-orgs flip from disabled to enabled at deploy time, the prod-enumeration result once it is run, and the rollback path. The merge commit body should be accurate.
- Document the rollback recovery path. If a customer reports unexpected audits-on after deploy: Path A (small blast radius) - clean up the prod config singleton (single S3 write, recovers immediately). Path B (large blast radius or unclear scope) - revert this PR in spacecat-shared, publish patch release, bump consumers (slow, multi-repo). Worth noting in the PR description or a follow-up runbook.
- Future: precedence test matrix derived from JSDoc. The 6-row precedence ladder is now the de-facto contract. A small data-driven test where each row of the table is one assertion would lock the spec and the implementation together and prevent silent drift. Cheap to add, exactly the shape of test that does not regress when refactoring the body. Out of scope for this PR; worth a follow-up issue.
Minor
-
Disjointness invariant lacks a pinning test. JSDoc says "disabled.sites and enabled.sites are assumed disjoint; disabled.sites wins if violated." A one-line test seeding both lists with the same
siteIdand assertingfalsewould prevent silent precedence drift on legacy data. -
Idempotency of enable not pinned.
Array.from(new Set([...]))deduplication is the only thing keepingenabled.sitesfrom growing unbounded ifupdateHandlerSites(..., true)is called repeatedly. A single test assertingenabled.sites.length === 1after two consecutive enables would lock this behavior. -
updateHandlerOrgscleanup not directly tested. Same#updatedHandlercode path as sites, but the read-side semantics for orgs are now formally asymmetric (no site-level override). One test verifyingupdateHandlerOrgs(type, orgId, true)followed byisHandlerEnabledForOrg(...)returns true after a prior disable would mirror the site-side assertion.
Assessment
Ready to merge? With fixes.
Reasoning: code-level work is clean. The write-path simplification reads well, the JSDoc captures the contract, and the cleanup branch tests address the prior review properly. Two Important test gaps remain - the headline end-to-end scenario and the enabledByDefault: false disable branch - both small additions. The prod-enumeration deployment-readiness gate (prior Important issue 2) is now reviewer-side; I will run the query and report results on the PR before approval.
Next Steps
- Add the two missing tests (Important issues 1 and 2). Both are small additions to the existing test file.
- Reviewer (me) runs the prod-state enumeration before approving. I will paste results into the PR.
- Decide on the DEBUG vs INFO log level question.
- Optional: update PR description with deployment impact and rollback path; add the Minor tests if you have appetite for completeness.
| enabledByDefault: false, | ||
| enabled: { sites: [site.getId()], orgs: [] }, | ||
| disabled: { sites: [], orgs: [site.getOrganizationId()] }, | ||
| }); |
There was a problem hiding this comment.
Important - the headline production scenario lacks an end-to-end test.
The PR exists to fix the free-trial -> paid upgrade path: org has handler X with enabledByDefault: false and is in disabled.orgs; we upgrade site S via updateHandlerSites('X', siteId, true); we expect isHandlerEnabledForSite('X', S) to return true afterwards. The new "paid-handler" test above is a READ-PATH-only assertion - it pre-populates state directly via addHandler and never calls updateHandlerSites. Test 2 below ("removes site from disabled.sites when re-enabling for a non-default handler") exercises the write path but with disabled.orgs: [], so the override-against-disabled-org case is not asserted post-write.
Without a test that combines write + read, a future regression that breaks updateHandlerSites for non-default handlers (e.g., someone re-introducing the if (!enabledByDefault) guard the prior review asked you to remove) would not be caught: the read-path "paid-handler" test would still pass because it bypasses the write path. Add one test:
it('upgrades non-default handler from disabled org via updateHandlerSites', () => {
instance.addHandler('paid-handler', {
enabledByDefault: false,
enabled: { sites: [], orgs: [] },
disabled: { sites: [], orgs: [site.getOrganizationId()] },
});
instance.updateHandlerSites('paid-handler', site.getId(), true);
expect(instance.isHandlerEnabledForSite('paid-handler', site)).to.be.true;
});This is the test that tells future maintainers what the PR is for.
|
|
||
| instance.updateHandlerSites('default-cleanup-handler', site.getId(), false); | ||
|
|
||
| expect(instance.getHandler('default-cleanup-handler').enabled.sites).to.not.include(site.getId()); |
There was a problem hiding this comment.
Important - enabledByDefault: false disable branch is not directly tested.
#updatedHandler has three write paths after this PR's simplification:
- Enable: remove from disabled + add to enabled (any handler) - covered by Test 2 ("cleanup-handler")
- Disable +
enabledByDefault: true: add to disabled + remove from enabled - covered by Test 3 ("default-cleanup-handler") - Disable +
enabledByDefault: false(theelsebranch atconfiguration.model.js:292-294): only remove from enabled, do NOT add to disabled - NOT covered.
With the new "always add to enabled on enable" change in this PR, the asymmetry of branch 3 is now load-bearing: a paid handler that gets enabled (now in enabled.sites) and then disabled must NOT linger in enabled.sites. Add at least:
it('removes site from enabled.sites when disabling for a non-default handler', () => {
instance.addHandler('paid-disable-handler', {
enabledByDefault: false,
enabled: { sites: [site.getId()], orgs: [] },
disabled: { sites: [], orgs: [] },
});
instance.updateHandlerSites('paid-disable-handler', site.getId(), false);
expect(instance.getHandler('paid-disable-handler').enabled.sites).to.not.include(site.getId());
expect(instance.getHandler('paid-disable-handler').disabled.sites).to.not.include(site.getId());
});This pins the contract that disabling a non-default handler is "not in either list" (default-off applies), distinct from disabling a default handler ("in disabled list, default-on overridden").
https://jira.corp.adobe.com/browse/SITES-41916
Problem:
Site-level handler overrides were ignored — org-level disable always won, so upgrading a site from
free-trial to paid couldn't re-enable audits. Also, enabling a handler could crash with a TypeError if
the disabled.sites array didn't exist.