Conversation
There was a problem hiding this comment.
Pull request overview
This PR removes the need to manually tag each tool with an EffectKind by adding auto-classification to the Python bindings and updating middleware/helpers and docs to accept raw callables.
Changes:
- Added an auto-classification system (
effect_log.classify) and a PythonEffectLogwrapper that supports AUTO / MANUAL / HYBRID modes. - Updated middleware
make_tooldefs()/make_tools()helpers (and examples/docs) to accept raw callables with inferred effect kinds. - Added/expanded Python tests covering classification behavior, modes, decorators, and middleware acceptance of raw callables.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| examples/crash_recovery.py | Uses EffectLog.manual() to keep the demo explicitly effect-tagged. |
| examples/README.md | Documents callable-based usage and middleware auto-classification. |
| bindings/python/tests/test_middleware.py | Adds middleware tests for raw-callable inputs and minor cleanups. |
| bindings/python/tests/test_classify_mode.py | Adds tests for AUTO/MANUAL/HYBRID validation semantics. |
| bindings/python/tests/test_classify.py | Adds unit tests for heuristic + LLM (mocked) classification APIs. |
| bindings/python/tests/test_basic.py | Adds integration tests for decorators + raw callable registration and recovery. |
| bindings/python/python/effect_log/middleware/pydantic_ai.py | make_tooldefs() now supports raw callables + optional explicit effects. |
| bindings/python/python/effect_log/middleware/openai_agents.py | make_tools() now supports raw callables + optional explicit effects. |
| bindings/python/python/effect_log/middleware/langgraph.py | Adds auto-classify support for raw tool objects in wrappers/tooldef creation. |
| bindings/python/python/effect_log/middleware/crewai.py | Adds auto-classify support for raw tool objects in tooldef creation. |
| bindings/python/python/effect_log/middleware/bub.py | Adds auto-classify support for raw Bub tool classes in tooldef creation. |
| bindings/python/python/effect_log/middleware/anthropic.py | make_tooldefs() now supports raw callables + optional explicit effects. |
| bindings/python/python/effect_log/middleware/init.py | Updates middleware package docs to mention auto-classification support. |
| bindings/python/python/effect_log/effect_log_native.pyi | Notes that the Python wrapper is preferred for auto-classification support. |
| bindings/python/python/effect_log/classify.py | Implements heuristic + optional LLM classifier and batch tooling/reporting. |
| bindings/python/python/effect_log/init.py | Introduces wrapper EffectLog, ClassifyMode, and decorators for auto-classify. |
| README.md | Updates public docs to highlight AUTO/MANUAL/HYBRID usage and inspection/overrides. |
Comments suppressed due to low confidence (1)
bindings/python/python/effect_log/middleware/langgraph.py:17
- The usage example assigns
wrappedtwice in a row, which makes the snippet confusing (the first assignment is immediately overwritten). Consider keeping just one variant (raw tools vs explicit spec dicts) or renaming the variables so both examples can coexist clearly.
# Option 1: Wrap existing LangChain tools (auto-classified or explicit)
wrapped = effect_logged_tools(log, [search_tool, send_email_tool])
wrapped = effect_logged_tools(log, [
{"tool": search_tool, "effect": EffectKind.ReadOnly},
{"tool": send_email_tool, "effect": EffectKind.IrreversibleWrite},
])
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Log the classification | ||
| if confidence >= 0.6: | ||
| logger.info("%s -> %s (%.2f, %s)", fname, _kind_name(kind), confidence, reason) | ||
| else: | ||
| logger.warning( | ||
| "%s -> %s (%.2f, %s — consider specifying explicitly)", | ||
| fname, | ||
| _kind_name(kind), | ||
| confidence, | ||
| reason, | ||
| ) |
There was a problem hiding this comment.
classify_effect_kind() will log a WARNING for common cases where only the name-prefix layer matches: the best possible confidence from name-only is 0.50 (weight), which is always below the 0.6 INFO threshold. This likely produces noisy warnings even for clearly classifiable tools. Consider adjusting the logging threshold (e.g., <=0.5), or redefining confidence so a strong single-layer match can reach the INFO level.
| report = ClassificationReport() | ||
| for func in funcs: | ||
| name = getattr(func, "__name__", str(func)) | ||
| result = classify_effect_kind(func, name) | ||
| report.results[name] = result | ||
| report._funcs[name] = func | ||
| return report |
There was a problem hiding this comment.
classify_tools() stores results in dicts keyed only by func.__name__, so multiple callables with the same name will silently overwrite earlier entries (and report.apply() would drop tools). Consider disambiguating duplicates (e.g., include module + qualname, or detect collisions and raise).
Auto-classify EffectKind: Make It Invisible
Summary
The biggest adoption barrier was requiring a manual EffectKind tag for every tool. This PR removes it entirely — users just pass functions, effect-log handles classification automatically.
Before:
After:
What's new
Safety