
Auto-classify EffectKind #8

Open
xudong963 wants to merge 3 commits into master from auto


@xudong963
Owner

Auto-classify EffectKind: Make It Invisible

Summary

The biggest adoption barrier was having to tag every tool manually with an EffectKind. This PR makes the tag optional: users pass plain functions, and effect-log classifies them automatically.

Before:

  tools = [
      ToolDef("search_db",  EffectKind.ReadOnly,         search_db),
      ToolDef("send_email", EffectKind.IrreversibleWrite, send_email),
      ToolDef("upsert",     EffectKind.IdempotentWrite,   upsert_record),
  ]
  log = EffectLog("task-001", tools=tools)

After:

  log = EffectLog("task-001", tools=[search_db, send_email, upsert_record])
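To make the Before/After change concrete, here is a minimal sketch of how a wrapper could normalize a mixed tools list (explicit ToolDefs, {"tool": ..., "effect": ...} dicts, and raw callables) into tool definitions. It is not effect-log's implementation. EffectKind and ToolDef below are local stand-ins for the real types, normalize_tools and classify_by_prefix are hypothetical helpers, the prefix table covers only the three prefixes named in this PR, and the "tool" dict key is borrowed from the langgraph example quoted further down this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class EffectKind(Enum):
    # Local stand-in for effect_log's EffectKind (variant names from this PR).
    ReadOnly = "ReadOnly"
    IdempotentWrite = "IdempotentWrite"
    Compensatable = "Compensatable"
    IrreversibleWrite = "IrreversibleWrite"


@dataclass
class ToolDef:
    # Local stand-in mirroring the (name, kind, func) order used in "Before".
    name: str
    kind: EffectKind
    func: Callable


# Hypothetical prefix table: only the three prefixes named in this PR.
PREFIXES = {
    "search_": EffectKind.ReadOnly,
    "send_": EffectKind.IrreversibleWrite,
    "upsert_": EffectKind.IdempotentWrite,
}


def classify_by_prefix(func: Callable) -> EffectKind:
    """Name-prefix guess; anything unrecognized gets the safe default."""
    for prefix, kind in PREFIXES.items():
        if func.__name__.startswith(prefix):
            return kind
    return EffectKind.IrreversibleWrite


def normalize_tools(tools: list, overrides: Optional[Dict[str, EffectKind]] = None) -> List[ToolDef]:
    """Hypothetical helper: turn ToolDefs, {"tool", "effect"} dicts and callables into ToolDefs."""
    overrides = overrides or {}
    out: List[ToolDef] = []
    for t in tools:
        if isinstance(t, ToolDef):
            out.append(t)  # explicit kind: kept as-is, never reclassified
        elif isinstance(t, dict):
            func = t["tool"]
            out.append(ToolDef(getattr(func, "__name__", repr(func)), t["effect"], func))
        elif callable(t):
            name = t.__name__
            kind = overrides[name] if name in overrides else classify_by_prefix(t)
            out.append(ToolDef(name, kind, t))
        else:
            raise TypeError(f"unsupported tool spec: {t!r}")
    return out


# Usage: the "After" call, with plain functions only.
def search_db(query): ...
def send_email(to, body): ...
def upsert_record(record): ...

tools = normalize_tools([search_db, send_email, upsert_record])
print([(t.name, t.kind.name) for t in tools])
# [('search_db', 'ReadOnly'), ('send_email', 'IrreversibleWrite'), ('upsert_record', 'IdempotentWrite')]
```

Explicit ToolDefs and dicts pass through without being classified, which is how an explicit tag can always take precedence over the heuristic.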

What's new

  • classify.py, a 4-layer weighted heuristic classifier:
    • Name prefix matching (weight 0.50): search_ → ReadOnly, send_ → IrreversibleWrite, upsert_ → IdempotentWrite, etc.
    • Docstring keyword analysis (weight 0.25)
    • Parameter name signals (weight 0.15): to/recipient → IrreversibleWrite, query → ReadOnly
    • Source AST patterns (weight 0.10): requests.post() → IrreversibleWrite
  • EffectLog Python wrapper: accepts raw callables alongside ToolDef, auto-classifies the callables, and takes overrides={} for corrections.
  • @tool() / @auto_tool: the decorators now work without an explicit EffectKind.
  • All 6 middleware integrations updated: make_tooldefs() / make_tools() accept raw callables. Backward compatible: dicts with an explicit "effect" key still work.
  • classify_tools(): batch inspection API that returns a printable report, with .apply(overrides=) for corrections.
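The PR gives the four layer weights but not the exact scoring rule. A minimal sketch, assuming each layer casts at most one vote and a kind's confidence is the summed weight of the layers that voted for it. The docstring keyword table is an illustrative placeholder (only the name prefixes, the to/recipient/query parameters, and requests.post() come from this PR), kinds are plain strings to keep the sketch self-contained, and classify_effect is a hypothetical name, not classify.py's API.

```python
import ast
import inspect
import textwrap
from collections import defaultdict
from typing import Callable, Dict, Optional, Tuple

# Layer weights from the PR description (they sum to 1.0).
WEIGHTS = {"name": 0.50, "doc": 0.25, "params": 0.15, "ast": 0.10}

# Illustrative signal tables (placeholders, not classify.py's real tables).
NAME_PREFIXES = {"search_": "ReadOnly", "send_": "IrreversibleWrite", "upsert_": "IdempotentWrite"}
DOC_KEYWORDS = {"search": "ReadOnly", "send": "IrreversibleWrite", "upsert": "IdempotentWrite"}
PARAM_SIGNALS = {"to": "IrreversibleWrite", "recipient": "IrreversibleWrite", "query": "ReadOnly"}


def _name_vote(func: Callable) -> Optional[str]:
    name = getattr(func, "__name__", "")
    return next((k for p, k in NAME_PREFIXES.items() if name.startswith(p)), None)


def _doc_vote(func: Callable) -> Optional[str]:
    doc = (inspect.getdoc(func) or "").lower()
    return next((k for w, k in DOC_KEYWORDS.items() if w in doc), None)


def _param_vote(func: Callable) -> Optional[str]:
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        return None
    return next((PARAM_SIGNALS[p] for p in params if p in PARAM_SIGNALS), None)


def _ast_vote(func: Callable) -> Optional[str]:
    try:
        tree = ast.parse(textwrap.dedent(inspect.getsource(func)))
    except (OSError, TypeError, SyntaxError):
        return None  # source unavailable (builtins, REPL, exec'd code)
    for node in ast.walk(tree):
        # Matches calls spelled exactly requests.post(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "post"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "requests"
        ):
            return "IrreversibleWrite"
    return None


def classify_effect(func: Callable) -> Tuple[str, float]:
    """Return (kind, confidence): the kind with the most layer weight behind it."""
    votes = {
        "name": _name_vote(func),
        "doc": _doc_vote(func),
        "params": _param_vote(func),
        "ast": _ast_vote(func),
    }
    scores: Dict[str, float] = defaultdict(float)
    for layer, kind in votes.items():
        if kind is not None:
            scores[kind] += WEIGHTS[layer]
    if not scores:
        return "IrreversibleWrite", 0.0  # no signal at all: safest kind
    best = max(scores, key=scores.get)
    return best, round(scores[best], 2)


def send_email(to, body):
    ...

print(classify_effect(send_email))  # ('IrreversibleWrite', 0.65)
```

Under this scheme a name-only match tops out at 0.50, which is the interaction raised in the review comment on lines +530 to +540.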

Safety

  • Low confidence → defaults to IrreversibleWrite, so an ambiguous tool is never re-executed.
  • Compensatable is auto-downgraded to IrreversibleWrite, since Compensatable requires a compensation function.
  • Explicit always wins: overrides=, ToolDef(kind=), and @tool(EffectKind.X) all bypass classification.
  • Every classification is logged via the effect_log.classify logger: INFO when confidence ≥ 0.6, WARNING when it is below 0.6.
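The safety rules above can be read as a single resolution step. A minimal sketch under stated assumptions: resolve_kind is a hypothetical helper, kinds are plain strings, the classifier is passed in as any callable returning (kind, confidence), and MIN_CONFIDENCE = 0.3 is an invented cut-off, because the PR does not say what counts as "low confidence". The precedence rules, the Compensatable downgrade, the effect_log.classify logger name, and the 0.6 INFO/WARNING split come from the description.

```python
import logging
from typing import Callable, Dict, Optional, Tuple

logger = logging.getLogger("effect_log.classify")  # logger name from the PR

SAFE_DEFAULT = "IrreversibleWrite"
INFO_THRESHOLD = 0.6   # from the PR: INFO at >= 0.6, WARNING below
MIN_CONFIDENCE = 0.3   # hypothetical "low confidence" cut-off

Classifier = Callable[[Callable], Tuple[str, float]]


def resolve_kind(
    func: Callable,
    classify: Classifier,
    explicit: Optional[str] = None,
    overrides: Optional[Dict[str, str]] = None,
) -> str:
    name = getattr(func, "__name__", repr(func))
    # 1. Explicit always wins: an explicit kind or an override skips classification.
    if explicit is not None:
        return explicit
    if overrides and name in overrides:
        return overrides[name]

    kind, confidence = classify(func)
    # 2. Low confidence: fall back to the kind that is never re-executed.
    if confidence < MIN_CONFIDENCE:
        kind = SAFE_DEFAULT
    # 3. Compensatable needs a compensation function that classification cannot supply.
    if kind == "Compensatable":
        kind = SAFE_DEFAULT

    # 4. Log every decision; low-confidence ones at WARNING.
    level = logging.INFO if confidence >= INFO_THRESHOLD else logging.WARNING
    logger.log(level, "%s -> %s (%.2f)", name, kind, confidence)
    return kind
```

Checking explicit tags and overrides before calling the classifier means a misbehaving heuristic can never affect a tool the user has already tagged.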


Copilot AI left a comment


Pull request overview

This PR removes the need to manually tag each tool with an EffectKind by adding auto-classification to the Python bindings and updating middleware/helpers and docs to accept raw callables.

Changes:

  • Added an auto-classification system (effect_log.classify) and a Python EffectLog wrapper that supports AUTO / MANUAL / HYBRID modes.
  • Updated middleware make_tooldefs() / make_tools() helpers (and examples/docs) to accept raw callables with inferred effect kinds.
  • Added/expanded Python tests covering classification behavior, modes, decorators, and middleware acceptance of raw callables.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Per-file summary:

  • examples/crash_recovery.py: Uses EffectLog.manual() to keep the demo explicitly effect-tagged.
  • examples/README.md: Documents callable-based usage and middleware auto-classification.
  • bindings/python/tests/test_middleware.py: Adds middleware tests for raw-callable inputs and minor cleanups.
  • bindings/python/tests/test_classify_mode.py: Adds tests for AUTO/MANUAL/HYBRID validation semantics.
  • bindings/python/tests/test_classify.py: Adds unit tests for heuristic + LLM (mocked) classification APIs.
  • bindings/python/tests/test_basic.py: Adds integration tests for decorators + raw callable registration and recovery.
  • bindings/python/python/effect_log/middleware/pydantic_ai.py: make_tooldefs() now supports raw callables + optional explicit effects.
  • bindings/python/python/effect_log/middleware/openai_agents.py: make_tools() now supports raw callables + optional explicit effects.
  • bindings/python/python/effect_log/middleware/langgraph.py: Adds auto-classify support for raw tool objects in wrappers/tooldef creation.
  • bindings/python/python/effect_log/middleware/crewai.py: Adds auto-classify support for raw tool objects in tooldef creation.
  • bindings/python/python/effect_log/middleware/bub.py: Adds auto-classify support for raw Bub tool classes in tooldef creation.
  • bindings/python/python/effect_log/middleware/anthropic.py: make_tooldefs() now supports raw callables + optional explicit effects.
  • bindings/python/python/effect_log/middleware/__init__.py: Updates middleware package docs to mention auto-classification support.
  • bindings/python/python/effect_log/effect_log_native.pyi: Notes that the Python wrapper is preferred for auto-classification support.
  • bindings/python/python/effect_log/classify.py: Implements heuristic + optional LLM classifier and batch tooling/reporting.
  • bindings/python/python/effect_log/__init__.py: Introduces wrapper EffectLog, ClassifyMode, and decorators for auto-classify.
  • README.md: Updates public docs to highlight AUTO/MANUAL/HYBRID usage and inspection/overrides.
Comments suppressed due to low confidence (1)

bindings/python/python/effect_log/middleware/langgraph.py:17

  • The usage example assigns wrapped twice in a row, which makes the snippet confusing (the first assignment is immediately overwritten). Consider keeping just one variant (raw tools vs explicit spec dicts) or renaming the variables so both examples can coexist clearly.
    # Option 1: Wrap existing LangChain tools (auto-classified or explicit)
    wrapped = effect_logged_tools(log, [search_tool, send_email_tool])
    wrapped = effect_logged_tools(log, [
        {"tool": search_tool, "effect": EffectKind.ReadOnly},
        {"tool": send_email_tool, "effect": EffectKind.IrreversibleWrite},
    ])


Comment on lines +530 to +540
  # Log the classification
  if confidence >= 0.6:
      logger.info("%s -> %s (%.2f, %s)", fname, _kind_name(kind), confidence, reason)
  else:
      logger.warning(
          "%s -> %s (%.2f, %s — consider specifying explicitly)",
          fname,
          _kind_name(kind),
          confidence,
          reason,
      )

Copilot AI Mar 22, 2026


classify_effect_kind() will log a WARNING for common cases where only the name-prefix layer matches: the best possible confidence from name-only is 0.50 (weight), which is always below the 0.6 INFO threshold. This likely produces noisy warnings even for clearly classifiable tools. Consider adjusting the logging threshold (e.g., <=0.5), or redefining confidence so a strong single-layer match can reach the INFO level.
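The arithmetic behind this comment is easy to check, and it also exposes the trade-off in the suggested fix. A minimal sketch, assuming confidence is the summed weight of the layers that agree with the chosen kind (the PR lists the weights but not the formula); normalized_confidence is one hypothetical way to "redefine confidence", dividing by the total weight of the layers that produced any signal.

```python
from typing import List, Optional, Tuple

INFO_THRESHOLD = 0.6
# (layer weight, kind that layer voted for, or None if it found no signal)
Vote = Tuple[float, Optional[str]]


def raw_confidence(votes: List[Vote], kind: str) -> float:
    """Summed weight of the layers that agree with kind."""
    return sum(w for w, k in votes if k == kind)


def normalized_confidence(votes: List[Vote], kind: str) -> float:
    """Agreement among the layers that actually voted (silent layers are ignored)."""
    voted = sum(w for w, k in votes if k is not None)
    return raw_confidence(votes, kind) / voted if voted else 0.0


# Only the name-prefix layer fires, e.g. for a search_db tool with no docstring.
name_only: List[Vote] = [(0.50, "ReadOnly"), (0.25, None), (0.15, None), (0.10, None)]
print(raw_confidence(name_only, "ReadOnly") >= INFO_THRESHOLD)         # False -> WARNING
print(normalized_confidence(name_only, "ReadOnly") >= INFO_THRESHOLD)  # True  -> INFO
```

The catch is that normalization also lifts a single weak signal (say, only the 0.10 AST layer) to 1.0, so this variant would probably need a minimum voted weight; simply adjusting the logging threshold, the comment's other suggestion, avoids that problem.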

Comment on lines +607 to +613
  report = ClassificationReport()
  for func in funcs:
      name = getattr(func, "__name__", str(func))
      result = classify_effect_kind(func, name)
      report.results[name] = result
      report._funcs[name] = func
  return report
Copy link

Copilot AI Mar 22, 2026


classify_tools() stores results in dicts keyed only by func.__name__, so multiple callables with the same name will silently overwrite earlier entries (and report.apply() would drop tools). Consider disambiguating duplicates (e.g., include module + qualname, or detect collisions and raise).
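Both suggested fixes are small, and they combine naturally. A minimal sketch, assuming report entries could be keyed by a qualified name instead of the bare __name__; tool_key and index_tools are hypothetical helpers, not part of classify.py. Keying by __module__ plus __qualname__ separates same-named functions that live in different modules or classes, and any remaining collision between two distinct objects raises instead of silently overwriting.

```python
from typing import Callable, Dict, Iterable


def tool_key(func: Callable) -> str:
    """Module-qualified name, e.g. 'billing.send' vs 'crm.send'."""
    module = getattr(func, "__module__", None) or "<unknown>"
    qualname = getattr(func, "__qualname__", None) or getattr(func, "__name__", None) or repr(func)
    return f"{module}.{qualname}"


def index_tools(funcs: Iterable[Callable]) -> Dict[str, Callable]:
    """Map each tool to a unique key; raise on a genuine collision."""
    index: Dict[str, Callable] = {}
    for func in funcs:
        key = tool_key(func)
        existing = index.get(key)
        if existing is not None and existing is not func:
            raise ValueError(f"two different tools share the key {key!r}")
        index[key] = func
    return index


# Usage: two methods both named "send" no longer overwrite each other.
class Mailer:
    def send(self, to):
        return to


class Slack:
    def send(self, channel):
        return channel


print(sorted(index_tools([Mailer.send, Slack.send])))
```

Two lambdas defined in the same module still collide (both are <lambda>), which is exactly the case where raising is preferable to guessing. Longer keys do make the printed report noisier, so a real implementation might show the short name and fall back to the qualified one only on conflict.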
