feat: Add token-threshold-triggered context compression and an infinite-loop guard #8363
Hey - I've left some high level feedback:
- The `context_limit_type` string and the default `compression_token_threshold` value `4000` are duplicated across several modules (`ContextConfig`, `MainAgentBuildConfig`, `tool_loop_agent_runner.reset`, pipeline stage settings, default config); consider centralizing these into shared constants or config defaults to avoid drift and make future changes safer.
- In `ContextManager.process`, turning off `enforce_max_turns` entirely when `context_limit_type == "token"` may make it harder to cap pathological long-turn histories; it might be worth allowing token-based compression and a hard max-turns cap to coexist, or at least documenting why they must be mutually exclusive.
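One way to read the first point is as a single module that owns both the mode names and the default threshold. The sketch below is only an illustration: the constant names, the `Literal` alias, and the trimmed-down `ContextConfig` are assumptions, not the project's actual definitions. Only the values `"turn"`, `"token"`, and `4000` come from the review.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical shared definitions; other modules would import these
# instead of repeating the string literals and the number 4000.
ContextLimitType = Literal["turn", "token"]

CONTEXT_LIMIT_TURN: ContextLimitType = "turn"
CONTEXT_LIMIT_TOKEN: ContextLimitType = "token"
DEFAULT_COMPRESSION_TOKEN_THRESHOLD = 4000  # default quoted in the review


@dataclass
class ContextConfig:
    """Simplified stand-in holding only the two fields under review."""

    context_limit_type: ContextLimitType = CONTEXT_LIMIT_TURN
    compression_token_threshold: int = DEFAULT_COMPRESSION_TOKEN_THRESHOLD


def uses_token_limit(config: ContextConfig) -> bool:
    """Single place that answers 'is token mode active?'."""
    return config.context_limit_type == CONTEXT_LIMIT_TOKEN
```

With this layout, changing the default threshold or renaming a mode would touch one file, and call sites compare against a named constant rather than a bare string.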
Code Review
This pull request introduces a new "token" context limit mode alongside the default "turn" mode, allowing context compression to be triggered by an absolute token threshold. The reviewer suggested optimizing the compression and truncation checks in the default "turn" mode by ensuring that these operations are only triggered when there are actual compressible messages (user or assistant) present, preventing redundant operations and unnecessary logs when only a long system prompt is in the context.
    if self.config.context_limit_type == "token":
        if (
            self._has_compressible_messages(result)
            and total_tokens >= self.config.compression_token_threshold
        ):
            result = await self._run_compression(result, total_tokens)
    else:
        if self.compressor.should_compress(
            result, total_tokens, self.config.max_context_tokens
        ):
            result = await self._run_compression(result, total_tokens)
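In the default turn-based compression mode (`context_limit_type != "token"`), if the system prompt is very long and exceeds `max_context_tokens` while the context contains no compressible `user` or `assistant` messages, `self.compressor.should_compress` still returns `True`. `_run_compression` is then triggered pointlessly on every processed message, producing unnecessary logs and CPU overhead.
I suggest also adding the `self._has_compressible_messages(result)` check to the `else` branch to rule this case out entirely.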
Suggested change:

    if self.config.context_limit_type == "token":
        if (
            self._has_compressible_messages(result)
            and total_tokens >= self.config.compression_token_threshold
        ):
            result = await self._run_compression(result, total_tokens)
    else:
        if (
            self._has_compressible_messages(result)
            and self.compressor.should_compress(
                result, total_tokens, self.config.max_context_tokens
            )
        ):
            result = await self._run_compression(result, total_tokens)
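The effect of the suggested guard can be checked in isolation with a minimal sketch. Everything here is a stand-in: `Config`, `has_compressible_messages`, and `over_window` are hypothetical simplifications of the real `ContextConfig`, `_has_compressible_messages`, and `compressor.should_compress`. The 0.82 ratio is borrowed from the PR description; the real `should_compress` may compute its trigger differently.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical, trimmed-down config for the sketch."""

    context_limit_type: str = "turn"
    compression_token_threshold: int = 4000
    max_context_tokens: int = 8000


def has_compressible_messages(messages: list[dict]) -> bool:
    """Assumed behaviour: only user/assistant messages can be compressed."""
    return any(m.get("role") in ("user", "assistant") for m in messages)


def over_window(total_tokens: int, max_context_tokens: int, ratio: float = 0.82) -> bool:
    """Stand-in for should_compress: fire at a fraction of the model window."""
    return max_context_tokens > 0 and total_tokens >= max_context_tokens * ratio


def should_run_compression(config: Config, messages: list[dict], total_tokens: int) -> bool:
    # The guard applies to both modes, so a context made only of
    # system prompts can never trigger compression.
    if not has_compressible_messages(messages):
        return False
    if config.context_limit_type == "token":
        return total_tokens >= config.compression_token_threshold
    return over_window(total_tokens, config.max_context_tokens)
```

Hoisting the guard above the mode switch is one possible design; the suggestion above keeps it inside each branch, which behaves the same way.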
    if self.config.context_limit_type == "token":
        if (
            self._has_compressible_messages(messages)
            and tokens_after_summary >= self.config.compression_token_threshold
        ):
            logger.info(
                "Context still exceeds compression threshold after compression, applying halving truncation..."
            )
            messages = self.truncator.truncate_by_halving(messages)
    else:
        if self.compressor.should_compress(
            messages, tokens_after_summary, self.config.max_context_tokens
        ):
            logger.info(
                "Context still exceeds max tokens after compression, applying halving truncation..."
            )
            messages = self.truncator.truncate_by_halving(messages)
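Likewise, in the final check of `_run_compression`, if `context_limit_type != "token"` and there are no compressible messages but the system prompt still exceeds the limit, the code attempts halving truncation (`truncate_by_halving`) and prints a meaningless log line.
I suggest also adding the `self._has_compressible_messages(messages)` check to the `else` branch.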
Suggested change:

    if self.config.context_limit_type == "token":
        if (
            self._has_compressible_messages(messages)
            and tokens_after_summary >= self.config.compression_token_threshold
        ):
            logger.info(
                "Context still exceeds compression threshold after compression, applying halving truncation..."
            )
            messages = self.truncator.truncate_by_halving(messages)
    else:
        if (
            self._has_compressible_messages(messages)
            and self.compressor.should_compress(
                messages, tokens_after_summary, self.config.max_context_tokens
            )
        ):
            logger.info(
                "Context still exceeds max tokens after compression, applying halving truncation..."
            )
            messages = self.truncator.truncate_by_halving(messages)
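The post-compression fallback can be sketched the same way. The `truncate_by_halving` below is a hypothetical implementation written only for this example (keep system messages, drop the older half of everything else); the project's real truncator may behave differently. `post_compression_fallback` and its `limit` parameter are also illustrative names.

```python
def has_compressible_messages(messages: list[dict]) -> bool:
    """Assumed behaviour: only user/assistant messages can be compressed."""
    return any(m.get("role") in ("user", "assistant") for m in messages)


def truncate_by_halving(messages: list[dict]) -> list[dict]:
    """Hypothetical truncator: keep system messages, drop the older half of the rest."""
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    keep_from = (len(rest) + 1) // 2  # always drops at least one message when any exist
    return system + rest[keep_from:]


def post_compression_fallback(
    messages: list[dict], tokens_after_summary: int, limit: int
) -> tuple[list[dict], bool]:
    """Return (messages, truncated). Truncate only when it can actually help."""
    if has_compressible_messages(messages) and tokens_after_summary >= limit:
        return truncate_by_halving(messages), True
    # Nothing compressible (for example, only a long system prompt):
    # leave the context untouched instead of truncating and logging in vain.
    return messages, False
```

In this sketch a context that holds only system prompts is returned unchanged no matter how far over the limit it is, which is the case both review comments are concerned with.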
a4c4a7d to 9bd38ca
Resolves #8348
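Motivation
AstrBot's context compression is currently triggered mainly by the number of conversation turns and by reaching 82% of the model's maximum physical context window. With long-text tasks or memory-injection plugins, tokens pile up easily, the model's replies become incoherent, and in certain cases the compression logic falls into an infinite loop.
In my own usage I prefer to have the LLM voice its thoughts immediately, which produces many turns with few tokens each. When I switch to deep tasks with few turns and high token consumption per turn, the two compression strategies conflict.
To address this, I added a strategy that proactively triggers compression at a token threshold, and I hardened the underlying protection against infinite loops.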
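Changes
- Added three layers of checks to `process()` and `_run_compression()` in `ContextManager`. If the current context contains no compressible `user`/`assistant` messages (for example, it consists entirely of protected system prompts), the pointless compression is refused and the infinite loop is broken immediately.
- `compression_token_threshold` triggers trimming.
- `default.py`: the Token threshold input box is shown only when "trigger by Token" is selected, so that users are not confused.
Open Issues (Help Wanted: Frontend i18n / Build)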
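The backend logic and the configuration layer are fully implemented and tested. I also updated the Chinese and English translations under `dashboard/src/i18n/` (for fields such as `context_limit_type`).
The problem:
At local startup the system automatically downloads the cloud-hosted `dist.zip` and overwrites the frontend, so my locally modified i18n dictionaries never take effect (the config panel shows raw keys instead of the Chinese translations). I am not familiar with the project's standard frontend release pipeline, so I would appreciate it if a maintainer could update the cloud-hosted dist after reviewing the backend code. Thank you very much!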
Screenshots or Test Results
📺 Frontend UI local test results
Checklist
- 😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
- 👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
- 🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in `requirements.txt` and `pyproject.toml`.
- 😮 My changes do not introduce malicious code.
Summary by Sourcery
Add a token-threshold-based context compression mode with safeguards against infinite compression loops, and wire it through configuration, runtime, and UI metadata.
New Features:
Bug Fixes:
Enhancements: