fix: stop rewriting the wiki every run (hash excludes Last Updated timestamp)#35
Conversation
build_wiki_content prepends a '**Last Updated:** <now>' header and update_wiki_page hashed the whole content, so the hash differed every run, cached_hash == content_hash was never true, and the wiki page was rewritten with a fresh revision on every cycle even when the modlog was unchanged — the wiki_hash_cache skip was fully defeated (needless API writes + revision spam). get_content_hash now strips the volatile Last Updated line before hashing, so the hash reflects only the modlog body. Self-heals after one write (the first post-deploy run rehashes the existing page, then subsequent unchanged runs skip). Verified: timestamp-only diffs now hash equal; body changes still differ.
|
Warning Review limit reached
More reviews will be available in 53 minutes and 33 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
Walkthrough
Changesmodlog_wiki_publisher.py overhaul
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
modlog_wiki_publisher.py (3)
362-384: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick winAdd a real migration for
target_authorbefore selecting it.
get_recent_actions_from_dbunconditionally selectstarget_author, but the v5 migration only addssubreddit. Existing databases that have not gone throughstore_processed_action’s lazy column-add path will hitno such column: target_author, return[], and skip wiki rebuilds. Add an idempotent migration fortarget_authorand bump the schema version so already-migrated DBs get it too.Also applies to: 867-880
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modlog_wiki_publisher.py` around lines 362 - 384, The v5 migration in the database upgrade path only adds subreddit, but get_recent_actions_from_db still selects target_author unconditionally, so older databases can fail with a missing-column error and skip wiki rebuilds. Add an idempotent migration in the same migration block that checks for target_author with PRAGMA table_info(processed_actions), adds it if missing, and updates the schema version so existing installs are upgraded as well; keep the logic aligned with store_processed_action and get_recent_actions_from_db.
1704-1708: 🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy liftDo not persist environment-provided secrets during config auto-update.
original_config.update(env_config)merges env credentials into the runtime config, and the auto-update path then dumps that merged config back toconfig_path. If env vars provide Reddit credentials and a default is added, secrets get written to disk. Keep a separate file-only config for auto-update, or strip env-sourced fields before writing.Also applies to: 1725-1739
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modlog_wiki_publisher.py` around lines 1704 - 1708, The config auto-update path is persisting environment-sourced secrets because `original_config.update(env_config)` merges runtime overrides into the same object that later gets written back to disk. In `load_env_config`/the auto-update flow around `config_path`, keep env overrides separate from the file-backed config or filter out env-provided credential fields before saving. Make sure the write step only serializes the original on-disk config plus non-sensitive defaults, not values injected from environment variables.
1887-1894: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winReset
error_countonly after a successful cycle.
error_countis reset at the start of each retry, so repeated failures never progress beyond attempt1/max_errorsand continuous mode may loop forever. Move the reset to aftercleanup_old_entriescompletes successfully.Also applies to: 1931-1944
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@modlog_wiki_publisher.py` around lines 1887 - 1894, The retry state in modlog_wiki_publisher’s processing cycle is being cleared too early, which prevents repeated failures from advancing correctly. Update the logic around the main run flow in the processing function that calls process_modlog_actions and cleanup_old_entries so error_count is only reset after the full cycle completes successfully, including cleanup_old_entries; keep the retry counter intact across failures so max_errors and continuous mode behave correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modlog_wiki_publisher.py`:
- Around line 2005-2017: The --force-modlog path in modlog_wiki_publisher.py is
still using the incremental process_modlog_actions flow, so it can miss retained
older entries and skip refreshing cached actions. Update the force branch to use
a dedicated full-rebuild fetch path in the force-modlog handling logic, bounded
by retention instead of batch_size/duplicate filtering, and keep the rebuild
step using get_recent_actions_from_db with the force behavior aligned to that
full refresh.
- Around line 927-933: The permalink handling currently preserves and renders
absolute Reddit URLs as-is, which can create malformed double-prefixed links and
allow legacy profile URLs to slip through. Update the permalink normalization
and rendering path in target_permalink handling and format_content_link so only
Reddit content links containing /comments/ are accepted, reject /u/ and /user/
profile links, and normalize any absolute reddit.com URL before rendering
instead of blindly prefixing https://www.reddit.com. Apply the same guard to the
related target_permalink cached/rendered code paths so all link output is
content-only.
- Around line 1547-1550: The 403 diagnostic in the wiki publishing error path is
logging the placeholder literally because the message in the logger.error call
is missing interpolation. Update the error message in the relevant
publish/diagnostic block so the content_size value is formatted at runtime, and
keep the surrounding “Possible causes” logging in place to preserve the context
of the failure.
- Around line 1477-1494: The wiki-size safeguard is being swallowed by the broad
exception handler, so the `ValueError` raised in the existing-size check never
stops the update. In `modlog_wiki_publisher.py`, update the logic around the
existing wiki size lookup and the `existing_size > warning_threshold` block so
that only the wiki lookup failure is ignored, while the local safety error from
the size limit check is re-raised or allowed to propagate. Keep the handling
focused on the size-check path and use the existing symbols `existing_size`,
`warning_threshold`, and the `except Exception as e` block to separate lookup
errors from update-blocking safety errors.
---
Outside diff comments:
In `@modlog_wiki_publisher.py`:
- Around line 362-384: The v5 migration in the database upgrade path only adds
subreddit, but get_recent_actions_from_db still selects target_author
unconditionally, so older databases can fail with a missing-column error and
skip wiki rebuilds. Add an idempotent migration in the same migration block that
checks for target_author with PRAGMA table_info(processed_actions), adds it if
missing, and updates the schema version so existing installs are upgraded as
well; keep the logic aligned with store_processed_action and
get_recent_actions_from_db.
- Around line 1704-1708: The config auto-update path is persisting
environment-sourced secrets because `original_config.update(env_config)` merges
runtime overrides into the same object that later gets written back to disk. In
`load_env_config`/the auto-update flow around `config_path`, keep env overrides
separate from the file-backed config or filter out env-provided credential
fields before saving. Make sure the write step only serializes the original
on-disk config plus non-sensitive defaults, not values injected from environment
variables.
- Around line 1887-1894: The retry state in modlog_wiki_publisher’s processing
cycle is being cleared too early, which prevents repeated failures from
advancing correctly. Update the logic around the main run flow in the processing
function that calls process_modlog_actions and cleanup_old_entries so
error_count is only reset after the full cycle completes successfully, including
cleanup_old_entries; keep the retry counter intact across failures so max_errors
and continuous mode behave correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 885058d5-fc44-4873-a070-3971cd07135e
📒 Files selected for processing (1)
modlog_wiki_publisher.py
| if existing_size > warning_threshold: | ||
| logger.warning(f"Existing wiki page already at {existing_size:,} bytes ({(existing_size/REDDIT_WIKI_LIMIT)*100:.1f}% of limit)") | ||
| logger.warning( | ||
| f"Existing wiki page already at {existing_size:,} bytes ({(existing_size / REDDIT_WIKI_LIMIT) * 100:.1f}% of limit)" | ||
| ) | ||
|
|
||
| # If we're trying to add more content to an already large page | ||
| if content_size >= existing_size: | ||
| logger.error(f"Cannot increase wiki size from {existing_size:,} to {content_size:,} bytes - too close to limit") | ||
| logger.error("Consider reducing retention_days or max_wiki_entries_per_page in config") | ||
| raise ValueError(f"Wiki page too large to update safely") | ||
| logger.error( | ||
| f"Cannot increase wiki size from {existing_size:,} to {content_size:,} bytes - too close to limit" | ||
| ) | ||
| logger.error( | ||
| "Consider reducing retention_days or max_wiki_entries_per_page in config" | ||
| ) | ||
| raise ValueError("Wiki page too large to update safely") | ||
| except Exception as e: | ||
| # Wiki page might not exist yet, that's okay | ||
| if "404" not in str(e) and "not found" not in str(e).lower(): | ||
| logger.debug(f"Could not check existing wiki size: {e}") |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Do not swallow the wiki-size safeguard.
The ValueError raised when the existing wiki is already near the limit is immediately caught by the broad except Exception, so the edit proceeds anyway. Re-raise local safety errors or narrow the exception handler to the wiki lookup failure.
Proposed fix
- except Exception as e:
+ except ValueError:
+ raise
+ except Exception as e:
# Wiki page might not exist yet, that's okay
if "404" not in str(e) and "not found" not in str(e).lower():
logger.debug(f"Could not check existing wiki size: {e}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if existing_size > warning_threshold: | |
| logger.warning(f"Existing wiki page already at {existing_size:,} bytes ({(existing_size/REDDIT_WIKI_LIMIT)*100:.1f}% of limit)") | |
| logger.warning( | |
| f"Existing wiki page already at {existing_size:,} bytes ({(existing_size / REDDIT_WIKI_LIMIT) * 100:.1f}% of limit)" | |
| ) | |
| # If we're trying to add more content to an already large page | |
| if content_size >= existing_size: | |
| logger.error(f"Cannot increase wiki size from {existing_size:,} to {content_size:,} bytes - too close to limit") | |
| logger.error("Consider reducing retention_days or max_wiki_entries_per_page in config") | |
| raise ValueError(f"Wiki page too large to update safely") | |
| logger.error( | |
| f"Cannot increase wiki size from {existing_size:,} to {content_size:,} bytes - too close to limit" | |
| ) | |
| logger.error( | |
| "Consider reducing retention_days or max_wiki_entries_per_page in config" | |
| ) | |
| raise ValueError("Wiki page too large to update safely") | |
| except Exception as e: | |
| # Wiki page might not exist yet, that's okay | |
| if "404" not in str(e) and "not found" not in str(e).lower(): | |
| logger.debug(f"Could not check existing wiki size: {e}") | |
| if existing_size > warning_threshold: | |
| logger.warning( | |
| f"Existing wiki page already at {existing_size:,} bytes ({(existing_size / REDDIT_WIKI_LIMIT) * 100:.1f}% of limit)" | |
| ) | |
| # If we're trying to add more content to an already large page | |
| if content_size >= existing_size: | |
| logger.error( | |
| f"Cannot increase wiki size from {existing_size:,} to {content_size:,} bytes - too close to limit" | |
| ) | |
| logger.error( | |
| "Consider reducing retention_days or max_wiki_entries_per_page in config" | |
| ) | |
| raise ValueError("Wiki page too large to update safely") | |
| except ValueError: | |
| raise | |
| except Exception as e: | |
| # Wiki page might not exist yet, that's okay | |
| if "404" not in str(e) and "not found" not in str(e).lower(): | |
| logger.debug(f"Could not check existing wiki size: {e}") |
🧰 Tools
🪛 Ruff (0.15.18)
[warning] 1479-1479: Logging statement uses f-string
(G004)
[warning] 1485-1485: Logging statement uses f-string
(G004)
[warning] 1490-1490: Abstract raise to an inner function
(TRY301)
[warning] 1490-1490: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 1491-1491: Do not catch blind exception: Exception
(BLE001)
[warning] 1494-1494: Logging statement uses f-string
(G004)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modlog_wiki_publisher.py` around lines 1477 - 1494, The wiki-size safeguard
is being swallowed by the broad exception handler, so the `ValueError` raised in
the existing-size check never stops the update. In `modlog_wiki_publisher.py`,
update the logic around the existing wiki size lookup and the `existing_size >
warning_threshold` block so that only the wiki lookup failure is ignored, while
the local safety error from the size limit check is re-raised or allowed to
propagate. Keep the handling focused on the size-check path and use the existing
symbols `existing_size`, `warning_threshold`, and the `except Exception as e`
block to separate lookup errors from update-blocking safety errors.
| logger.error("Possible causes:") | ||
| logger.error(" 1. Wiki page size limit exceeded (current content: {content_size:,} bytes)") | ||
| logger.error( | ||
| " 1. Wiki page size limit exceeded (current content: {content_size:,} bytes)" | ||
| ) |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Interpolate content_size in the 403 diagnostic.
This log message is missing the f prefix, so it emits {content_size:,} literally and loses the key troubleshooting detail.
Proposed fix
- logger.error(
- " 1. Wiki page size limit exceeded (current content: {content_size:,} bytes)"
- )
+ logger.error(
+ f" 1. Wiki page size limit exceeded (current content: {content_size:,} bytes)"
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| logger.error("Possible causes:") | |
| logger.error(" 1. Wiki page size limit exceeded (current content: {content_size:,} bytes)") | |
| logger.error( | |
| " 1. Wiki page size limit exceeded (current content: {content_size:,} bytes)" | |
| ) | |
| logger.error("Possible causes:") | |
| logger.error( | |
| f" 1. Wiki page size limit exceeded (current content: {content_size:,} bytes)" | |
| ) |
🧰 Tools
🪛 Ruff (0.15.18)
[warning] 1547-1547: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
[warning] 1548-1550: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modlog_wiki_publisher.py` around lines 1547 - 1550, The 403 diagnostic in the
wiki publishing error path is logging the placeholder literally because the
message in the logger.error call is missing interpolation. Update the error
message in the relevant publish/diagnostic block so the content_size value is
formatted at runtime, and keep the surrounding “Possible causes” logging in
place to preserve the context of the failure.
Source: Path instructions
| if args.force_modlog: | ||
| logger.info("Force modlog requested - fetching ALL modlog actions from Reddit and rebuilding wiki...") | ||
| logger.info( | ||
| "Force modlog requested - fetching ALL modlog actions from Reddit and rebuilding wiki..." | ||
| ) | ||
| # First, fetch all recent modlog actions to populate database | ||
| logger.info("Fetching all modlog actions from Reddit...") | ||
| process_modlog_actions(reddit, config) | ||
|
|
||
| # Then rebuild wiki from database (showing only removal actions) | ||
| logger.info("Rebuilding wiki from database...") | ||
| actions = get_recent_actions_from_db(config, force_all_actions=False, show_only_removals=True) | ||
| actions = get_recent_actions_from_db( | ||
| config, force_all_actions=False, show_only_removals=True | ||
| ) |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
Make --force-modlog actually rebuild from all recent modlog entries.
This branch logs “fetching ALL modlog actions,” but it calls process_modlog_actions, which is still capped by batch_size and skips duplicate action IDs. A force rebuild can therefore miss older retained actions or fail to refresh existing cached rows. Use a dedicated force-fetch path bounded by retention, not the normal incremental batch path.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modlog_wiki_publisher.py` around lines 2005 - 2017, The --force-modlog path
in modlog_wiki_publisher.py is still using the incremental
process_modlog_actions flow, so it can miss retained older entries and skip
refreshing cached actions. Update the force branch to use a dedicated
full-rebuild fetch path in the force-modlog handling logic, bounded by retention
instead of batch_size/duplicate filtering, and keep the rebuild step using
get_recent_actions_from_db with the force behavior aligned to that full refresh.
A session formatter (black --line-length=88) reformatted the entire file on the prior commit; the repo uses black --line-length=180. This restores the repo's formatting so the net change is only the get_content_hash fix and pre-commit passes.
Bug (in prod)
update_wiki_pagehashes the full wiki content, which begins with**Last Updated:** <current time>(build_wiki_content). Because the timestamp changes every run,content_hashis always new,cached_hash == content_hashis never true, and the page is rewritten with a fresh revision every ~10 min even when nothing changed. The entirewiki_hash_cachetable is dead weight.Impact: needless Reddit wiki writes + revision-history spam. Not data corruption.
Fix
get_content_hashstrips the volatileLast Updatedline before hashing, so the change-detection hash reflects only the modlog body. Self-heals after one write post-deploy.Verification
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes