feat: 新增按 Token 阈值触发上下文压缩及防死循环机制 by Newtonian-No · Pull Request #8363 · AstrBotDevs/AstrBot

Newtonian-No · 2026-05-26T17:33:41Z

Resolves #8348

动机与背景 (Motivation)

当前 AstrBot 的上下文压缩机制主要依赖“对话轮数”和“模型最大物理窗口的 82%”来触发。但在使用长文本任务或记忆注入插件时，极易因 Token 堆积导致模型前言不搭后语，甚至在特定情况下引发压缩逻辑的死循环。

在我的使用情况中，我更偏好让LLM将自己的想法马上说出来，使得对话轮次长，每行token占用少，但当进行深层次短轮次高token消耗时，二者的压缩策略出现冲突。

为此，我新增了 按 Token 阈值主动触发压缩 的策略，并完善了底层的防死循环保护。

改动点 (Changes)

防死循环保护 (Safeguard): 在 ContextManager 的 process() 和 _run_compression() 中增加了三层校验。如果当前上下文中没有可压缩的 user/assistant 消息（例如全是 protected system prompts），则拒绝触发无意义的压缩，直接打断死循环。
底层策略分支:
- 兼容旧版逻辑：依然默认支持“按轮次截断”。
- 新增逻辑：若开启“按 Token 触发”，则优先根据用户设定的 compression_token_threshold 触发精简。
UI Schema 条件渲染: 修改了 default.py，只有在选择了“按 Token 触发”时，才会显示 Token 阈值输入框，避免用户产生逻辑混淆。
测试通过: 原有上下文管理相关的 86 个测试用例已全部通过，向后兼容性良好。

待解决问题 (Help Wanted: Frontend i18n / Build)

目前后端逻辑和配置层均已开发完毕并测试通过。我也修改了 dashboard/src/i18n/ 下的中英文翻译（如 context_limit_type 等字段）。

遇到的问题：
由于本地启动时系统会自动下载云端的 dist.zip 覆盖前端，导致我本地修改的 i18n 字典无法实时生效（配置面板会显示 raw key 而非中文翻译）。

因为不太熟悉项目标准的前端发布 Pipeline，想请维护者大佬在 Review 完后端代码后，顺手帮忙更新一下云端的 dist。非常感谢！

Screenshots or Test Results / 运行截图或测试结果

📺 前端 UI 本地测试效果

关于 UI 文本显示的说明：
由于本地开发环境直接拉取了官方云端的 dist.zip 静态包，并未在本地重新编译前端，因此页面上会显示源文件的 raw key（如 context_limit_type）。
我在代码中已经完整补全了 zh-CN/config-metadata.json 和 en-US/config-metadata.json 的多语言配置，待本 PR 合并、前端重新 build 发布后即可正常显示对应的中英文标签。

Checklist / 检查清单

😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能，已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试，并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in requirements.txt and pyproject.toml.
/ 我确保没有引入新依赖库，或者引入了新依赖库的同时将其添加到 requirements.txt 和 pyproject.toml 文件相应位置。
😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。

Summary by Sourcery

Add a token-threshold-based context compression mode with safeguards against infinite compression loops, and wire it through configuration, runtime, and UI metadata.

New Features:

Introduce a configurable context compression trigger mode that can operate either by turn-based percentage of model window or by an absolute token threshold.
Allow users to set a fixed token threshold for triggering context compression when token-based mode is selected in provider settings.

Bug Fixes:

Prevent infinite or meaningless compression loops by skipping compression and truncation when there are no compressible user/assistant messages.

Enhancements:

Adjust compression flow to respect the selected context limit mode, including token counting, post-compression checks, and halving truncation behavior.
Propagate new context compression settings through the main agent build configuration, internal pipeline stages, and tool-loop agent runner.
Update configuration metadata and conditional UI schema so that token-threshold fields are only shown when token-based compression mode is enabled.

sourcery-ai

Hey - I've left some high level feedback:

The context_limit_type string and the default compression_token_threshold value 4000 are duplicated across several modules (ContextConfig, MainAgentBuildConfig, tool_loop_agent_runner.reset, pipeline stage settings, default config); consider centralizing these into shared constants or config defaults to avoid drift and make future changes safer.
In ContextManager.process, turning off enforce_max_turns entirely when context_limit_type == "token" may make it harder to cap pathological long-turn histories; it might be worth allowing token-based compression and a hard max-turns cap to coexist, or at least documenting why they must be mutually exclusive.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- The `context_limit_type` string and the default `compression_token_threshold` value `4000` are duplicated across several modules (`ContextConfig`, `MainAgentBuildConfig`, `tool_loop_agent_runner.reset`, pipeline stage settings, default config); consider centralizing these into shared constants or config defaults to avoid drift and make future changes safer.
- In `ContextManager.process`, turning off `enforce_max_turns` entirely when `context_limit_type == "token"` may make it harder to cap pathological long-turn histories; it might be worth allowing token-based compression and a hard max-turns cap to coexist, or at least documenting why they must be mutually exclusive.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

gemini-code-assist

Code Review

This pull request introduces a new "token" context limit mode alongside the default "turn" mode, allowing context compression to be triggered by an absolute token threshold. The reviewer suggested optimizing the compression and truncation checks in the default "turn" mode by ensuring that these operations are only triggered when there are actual compressible messages (user or assistant) present, preventing redundant operations and unnecessary logs when only a long system prompt is in the context.

gemini-code-assist · 2026-05-26T17:35:16Z

+                if self.config.context_limit_type == "token":
+                    if (
+                        self._has_compressible_messages(result)
+                        and total_tokens >= self.config.compression_token_threshold
+                    ):
+                        result = await self._run_compression(result, total_tokens)
+                else:
+                    if self.compressor.should_compress(
+                        result, total_tokens, self.config.max_context_tokens
+                    ):
+                        result = await self._run_compression(result, total_tokens)


在默认的按轮次压缩模式（context_limit_type != "token"）下，如果系统提示词（System Prompt）非常长且超过了 max_context_tokens，而上下文中又没有可压缩的 user 或 assistant 消息时，self.compressor.should_compress 仍会返回 True。这会导致在每次处理消息时都无意义地触发 _run_compression，产生不必要的日志和 CPU 开销。

建议在 else 分支中也加入 self._has_compressible_messages(result) 的判断，以完全避免这种情况。

Suggested change

if self.config.context_limit_type == "token":

if (

self._has_compressible_messages(result)

and total_tokens >= self.config.compression_token_threshold

):

result = await self._run_compression(result, total_tokens)

else:

if self.compressor.should_compress(

result, total_tokens, self.config.max_context_tokens

):

result = await self._run_compression(result, total_tokens)

if self.config.context_limit_type == "token":

if (

self._has_compressible_messages(result)

and total_tokens >= self.config.compression_token_threshold

):

result = await self._run_compression(result, total_tokens)

else:

if (

self._has_compressible_messages(result)

and self.compressor.should_compress(

result, total_tokens, self.config.max_context_tokens

)

):

result = await self._run_compression(result, total_tokens)

gemini-code-assist · 2026-05-26T17:35:17Z

+        if self.config.context_limit_type == "token":
+            if (
+                self._has_compressible_messages(messages)
+                and tokens_after_summary >= self.config.compression_token_threshold
+            ):
+                logger.info(
+                    "Context still exceeds compression threshold after compression, applying halving truncation..."
+                )
+                messages = self.truncator.truncate_by_halving(messages)
+        else:
+            if self.compressor.should_compress(
+                messages, tokens_after_summary, self.config.max_context_tokens
+            ):
+                logger.info(
+                    "Context still exceeds max tokens after compression, applying halving truncation..."
+                )
+                messages = self.truncator.truncate_by_halving(messages)


同样地，在 _run_compression 的最后检查中，如果 context_limit_type != "token" 且没有可压缩的消息，但系统提示词依然超限，程序会尝试进行减半截断（truncate_by_halving），并打印无意义的日志。

建议在 else 分支中也加入 self._has_compressible_messages(messages) 的判断。

Suggested change

if self.config.context_limit_type == "token":

if (

self._has_compressible_messages(messages)

and tokens_after_summary >= self.config.compression_token_threshold

):

logger.info(

"Context still exceeds compression threshold after compression, applying halving truncation..."

)

messages = self.truncator.truncate_by_halving(messages)

else:

if self.compressor.should_compress(

messages, tokens_after_summary, self.config.max_context_tokens

):

logger.info(

"Context still exceeds max tokens after compression, applying halving truncation..."

)

messages = self.truncator.truncate_by_halving(messages)

if self.config.context_limit_type == "token":

if (

self._has_compressible_messages(messages)

and tokens_after_summary >= self.config.compression_token_threshold

):

logger.info(

"Context still exceeds compression threshold after compression, applying halving truncation..."

)

messages = self.truncator.truncate_by_halving(messages)

else:

if (

self._has_compressible_messages(messages)

and self.compressor.should_compress(

messages, tokens_after_summary, self.config.max_context_tokens

)

):

logger.info(

"Context still exceeds max tokens after compression, applying halving truncation..."

)

messages = self.truncator.truncate_by_halving(messages)

…constants

Backend add token compression feat

20ba952

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. area:core The bug / feature is about astrbot's core, backend area:provider The bug / feature is about AI Provider, Models, LLM Agent, LLM Agent Runner. labels May 26, 2026

sourcery-ai Bot reviewed May 26, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 26, 2026

View reviewed changes

github-actions Bot mentioned this pull request May 27, 2026

🦞 OpenClaw 生态日报 2026-05-27 ivanweng2077/big_model_radar#97

Open

dosubot Bot added size:XS This PR changes 0-9 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels May 27, 2026

chore: centralize context_limit_type and compression_token_threshold …

a900d79

…constants

github-actions Bot mentioned this pull request May 28, 2026

🦞 OpenClaw 生态日报 2026-05-28 ivanweng2077/big_model_radar#102

Open

Soulter force-pushed the master branch 3 times, most recently from a4c4a7d to 9bd38ca Compare May 28, 2026 16:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: 新增按 Token 阈值触发上下文压缩及防死循环机制#8363

feat: 新增按 Token 阈值触发上下文压缩及防死循环机制#8363
Newtonian-No wants to merge 2 commits into
AstrBotDevs:masterfrom
Newtonian-No:feat/token-based-compression

Newtonian-No commented May 26, 2026 •

edited by sourcery-ai Bot

Loading

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 26, 2026

Uh oh!

gemini-code-assist Bot May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Newtonian-No commented May 26, 2026 • edited by sourcery-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

动机与背景 (Motivation)

改动点 (Changes)

待解决问题 (Help Wanted: Frontend i18n / Build)

Screenshots or Test Results / 运行截图或测试结果

📺 前端 UI 本地测试效果

Checklist / 检查清单

Summary by Sourcery

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Newtonian-No commented May 26, 2026 •

edited by sourcery-ai Bot

Loading