Auto-classify EffectKind by xudong963 · Pull Request #8 · xudong963/effect-log

xudong963 · 2026-03-16T14:30:27Z

Auto-classify EffectKind: Make It Invisible

Summary

The biggest adoption barrier was requiring a manual EffectKind tag for every tool. This PR removes it entirely — users just pass functions, effect-log handles classification automatically.

Before:

  tools = [
      ToolDef("search_db",  EffectKind.ReadOnly,         search_db),
      ToolDef("send_email", EffectKind.IrreversibleWrite, send_email),
      ToolDef("upsert",     EffectKind.IdempotentWrite,   upsert_record),
  ]
  log = EffectLog("task-001", tools=tools)

After:

  log = EffectLog("task-001", tools=[search_db, send_email, upsert_record])

What's new

classify.py — 4-layer weighted heuristic classifier:
- Name prefix matching (weight 0.50): search_ → ReadOnly, send_ → IrreversibleWrite, upsert_ → IdempotentWrite, etc.
- Docstring keyword analysis (0.25)
- Parameter name signals (0.15): to/recipient → IrreversibleWrite, query → ReadOnly
- Source AST patterns (0.10): requests.post() → IrreversibleWrite
EffectLog Python wrapper — accepts raw callables alongside ToolDef. Auto-classifies callables, with overrides={} for corrections.
@ tool() / @ auto_tool — decorators now work without explicit EffectKind.
All 6 middleware updated — make_tooldefs() / make_tools() accept raw callables. Backward compatible: dicts with explicit "effect" still work.
classify_tools() — batch inspection API with printable report and .apply(overrides=).

Safety

Low confidence → defaults to IrreversibleWrite (never re-executes ambiguous tools)
Compensatable auto-downgrades to IrreversibleWrite (requires compensation fn)
Explicit always wins: overrides=, ToolDef(kind=), @tool(EffectKind.X) bypass classification
All classifications logged via effect_log.classify logger (INFO ≥ 0.6, WARNING < 0.6)

Copilot

Pull request overview

This PR removes the need to manually tag each tool with an EffectKind by adding auto-classification to the Python bindings and updating middleware/helpers and docs to accept raw callables.

Changes:

Added an auto-classification system (effect_log.classify) and a Python EffectLog wrapper that supports AUTO / MANUAL / HYBRID modes.
Updated middleware make_tooldefs() / make_tools() helpers (and examples/docs) to accept raw callables with inferred effect kinds.
Added/expanded Python tests covering classification behavior, modes, decorators, and middleware acceptance of raw callables.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
examples/crash_recovery.py	Uses `EffectLog.manual()` to keep the demo explicitly effect-tagged.
examples/README.md	Documents callable-based usage and middleware auto-classification.
bindings/python/tests/test_middleware.py	Adds middleware tests for raw-callable inputs and minor cleanups.
bindings/python/tests/test_classify_mode.py	Adds tests for AUTO/MANUAL/HYBRID validation semantics.
bindings/python/tests/test_classify.py	Adds unit tests for heuristic + LLM (mocked) classification APIs.
bindings/python/tests/test_basic.py	Adds integration tests for decorators + raw callable registration and recovery.
bindings/python/python/effect_log/middleware/pydantic_ai.py	`make_tooldefs()` now supports raw callables + optional explicit effects.
bindings/python/python/effect_log/middleware/openai_agents.py	`make_tools()` now supports raw callables + optional explicit effects.
bindings/python/python/effect_log/middleware/langgraph.py	Adds auto-classify support for raw tool objects in wrappers/tooldef creation.
bindings/python/python/effect_log/middleware/crewai.py	Adds auto-classify support for raw tool objects in tooldef creation.
bindings/python/python/effect_log/middleware/bub.py	Adds auto-classify support for raw Bub tool classes in tooldef creation.
bindings/python/python/effect_log/middleware/anthropic.py	`make_tooldefs()` now supports raw callables + optional explicit effects.
bindings/python/python/effect_log/middleware/init.py	Updates middleware package docs to mention auto-classification support.
bindings/python/python/effect_log/effect_log_native.pyi	Notes that the Python wrapper is preferred for auto-classification support.
bindings/python/python/effect_log/classify.py	Implements heuristic + optional LLM classifier and batch tooling/reporting.
bindings/python/python/effect_log/init.py	Introduces wrapper `EffectLog`, `ClassifyMode`, and decorators for auto-classify.
README.md	Updates public docs to highlight AUTO/MANUAL/HYBRID usage and inspection/overrides.

Comments suppressed due to low confidence (1)

bindings/python/python/effect_log/middleware/langgraph.py:17

The usage example assigns wrapped twice in a row, which makes the snippet confusing (the first assignment is immediately overwritten). Consider keeping just one variant (raw tools vs explicit spec dicts) or renaming the variables so both examples can coexist clearly.

    # Option 1: Wrap existing LangChain tools (auto-classified or explicit)
    wrapped = effect_logged_tools(log, [search_tool, send_email_tool])
    wrapped = effect_logged_tools(log, [
        {"tool": search_tool, "effect": EffectKind.ReadOnly},
        {"tool": send_email_tool, "effect": EffectKind.IrreversibleWrite},
    ])

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-22T14:37:52Z

+    # Log the classification
+    if confidence >= 0.6:
+        logger.info("%s -> %s (%.2f, %s)", fname, _kind_name(kind), confidence, reason)
+    else:
+        logger.warning(
+            "%s -> %s (%.2f, %s — consider specifying explicitly)",
+            fname,
+            _kind_name(kind),
+            confidence,
+            reason,
+        )


classify_effect_kind() will log a WARNING for common cases where only the name-prefix layer matches: the best possible confidence from name-only is 0.50 (weight), which is always below the 0.6 INFO threshold. This likely produces noisy warnings even for clearly classifiable tools. Consider adjusting the logging threshold (e.g., <=0.5), or redefining confidence so a strong single-layer match can reach the INFO level.

Copilot · 2026-03-22T14:37:52Z

+    report = ClassificationReport()
+    for func in funcs:
+        name = getattr(func, "__name__", str(func))
+        result = classify_effect_kind(func, name)
+        report.results[name] = result
+        report._funcs[name] = func
+    return report


classify_tools() stores results in dicts keyed only by func.__name__, so multiple callables with the same name will silently overwrite earlier entries (and report.apply() would drop tools). Consider disambiguating duplicates (e.g., include module + qualname, or detect collisions and raise).

xudong963 added 3 commits March 16, 2026 22:27

Auto-classify EffectKind

7753e68

update

cffa48b

add manually

d91882c

xudong963 requested a review from Copilot March 22, 2026 14:34

Copilot started reviewing on behalf of xudong963 March 22, 2026 14:35 View session

Copilot AI reviewed Mar 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Auto-classify EffectKind#8

Auto-classify EffectKind#8
xudong963 wants to merge 3 commits intomasterfrom
auto

xudong963 commented Mar 16, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 22, 2026

Uh oh!

Copilot AI Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

xudong963 commented Mar 16, 2026

Summary

What's new

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants