Merged
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,21 @@

Full release notes with details on each version: [GitHub Releases](https://github.com/safishamsi/graphify/releases)

## 0.9.2 (2026-06-29)

- Feat: type-aware Ruby member-call resolution (#1499, thanks @vamsipavanmahesh). `p.run` is now resolved by the inferred type of the receiver (`p = Processor.new` ⇒ `Processor#run`) instead of by globally-unique method name, so the edge survives name collisions (an unrelated `Worker#run` no longer makes it ambiguous) and never points at the wrong method. Introduces a small resolver-registry framework that the existing Swift (#1356) and Python (#1446) cross-file passes register into. Receiver types are inferred only from unambiguous local `var = ClassName.new` bindings; a call whose receiver type can't be proven resolves to nothing rather than to a guess — a deliberate precision-over-recall change for Ruby member calls.
- Feat: resolve workspace imports through the package's `exports` map (#1308, thanks @guyoron1). A subpath import like `import { x } from "@scope/pkg/browser"` now resolves through the package.json `exports` map (string values, condition objects, nested conditions, and `./*` wildcard patterns) instead of to a bare path string; the existing bare-path/index resolution is used only as a fallback when there is no exports map or no entry matches. `default` is consulted last (Node's catch-all), and an export target that escapes the package directory is rejected.
- Fix: import edges silently dropped on codebases using tsconfig path aliases or workspace packages (#1529), a regression from the 0.9.0 full-repo-relative node-ID change. Relative imports resolve to repo-relative paths and matched fine, but alias (`@/lib/utils`) and workspace imports resolve to absolute paths, so the import-target ID baked in the on-disk prefix and no longer matched the repo-relative definition node — the edge was dropped at build (common on Next.js/SvelteKit). The id-remap post-pass now also registers the absolute-resolved form, so alias/workspace import targets land on the real node again.
- Fix: tsconfig `compilerOptions.paths` fallback targets are now honored (#1531, thanks @oleksii-tumanov). A `paths` value is an ordered list (`"@app/*": ["src/app/*", "lib/app/*"]`) that `tsc` tries in turn; graphify kept only the first entry, so an import whose file lived at a later target was dropped or misresolved. Each target is now tried in order and the first that resolves to a real file wins (no false edge when none exist).
- Fix: the semantic (LLM) extraction cache is now pruned (#1527, thanks @mwolter805). The AST cache was version-swept but the content-hash-keyed semantic cache had no cleanup, so every content change or file deletion left an orphan entry and `graphify-out/cache/semantic/` grew unbounded. Orphan entries are now removed at the end of `extract`, computed against the full live document set (not the incremental changed subset, which would have evicted still-valid entries) and only touching `cache/semantic/`; the cache stays unversioned so releases never re-bill LLM extraction.
- Fix: three Objective-C extractor bugs (#1475, thanks @JabberYQ for the detailed report and test repo). (1) `.h` headers using `NS_ASSUME_NONNULL_BEGIN` before `@interface` produced no class node — tree-sitter-objc can't expand the argument-less macro and fails to emit a `class_interface` node at all, so the macro is now blanked (offset-preserving) before parsing. (2) Quoted `#import "X.h"` edges dangled once a `.h`/`.m` pair existed (the bare-stem target was salted away during id-disambiguation); imports now resolve to the real header file node, fixing the equivalent latent C `#include` bug too. (3) `[[Foo alloc] init]` now emits a `references` edge to the allocated class, resolved only to an unambiguous class (no false edges). Dot-syntax property accesses and `@selector(...)` target-action edges remain follow-ups.
- Fix: Swift type-qualified static calls now resolve as EXTRACTED rather than INFERRED (#1533, thanks @JabberYQ). `SessionType.staticMethod()` / `Singleton.shared.method()` name the receiver type explicitly in source, so the resolved edge is an exact reference, matching the Python qualified-class-method pass; instance calls typed via local inference (`obj.method()`) stay INFERRED.
- Fix: enforce the API timeout in the secondary LLM dispatch path (#1442, thanks @DhruvTilva). `_call_llm` (used by the dedup LLM tiebreaker) built its Anthropic/OpenAI clients without `timeout`, so requests there ignored `GRAPHIFY_API_TIMEOUT` and could hang — it now passes the timeout like the primary extraction paths.
- Fix: `to_graphml` no longer raises `ValueError` on a node/edge with a `None` attribute value — null fields are coerced to `""` before writing (#1502, thanks @antonioscarinci).
- Feat: `graphify save-result` accepts `--answer-file` as an alternative to `--answer`, so a long or multi-line answer can be read from a file instead of an inline shell argument (#1502, thanks @antonioscarinci).
- Fix: generated install/skill guidance is now host-generic (#1530, thanks @ari-mitophane). The wording no longer tells agents to invoke a literal `skill` tool with `skill: "graphify"` (host-specific and invalid in many environments); it now points to the installed graphify skill or instructions.
- Security: bump `msgpack` to 1.2.1 (GHSA-6v7p-g79w-8964) and `pydantic-settings` to 2.14.2 (GHSA-4xgf-cpjx-pc3j), and drop the unused `safety` dev dependency, whose only effect was to pull in `nltk` (which has an unpatched HIGH-severity advisory). All affected packages are transitive dependencies; the two HIGH-severity ones were dev tooling only and never shipped in the published wheel. `pip-audit` (already run in CI) continues to provide dependency-CVE scanning.
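The Ruby rule in the first 0.9.2 entry (type a receiver only from an unambiguous local `var = ClassName.new` binding, otherwise resolve to nothing) can be sketched as follows. This is a minimal line-based illustration under stated assumptions, not graphify's implementation: the regexes, the hypothetical names `infer_receiver_types` / `resolve_member_calls`, and the `Class#method` target format are invented here, and bindings are tracked per file rather than per method scope.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical patterns: `var = ClassName.new` (optionally namespaced), and any
# other plain assignment `var = ...`, which makes the variable's type unprovable.
_NEW_BINDING = re.compile(r"^\s*([a-z_]\w*)\s*=\s*([A-Z]\w*(?:::[A-Z]\w*)*)\.new\b")
_ANY_ASSIGN = re.compile(r"^\s*([a-z_]\w*)\s*=(?!=)")
_MEMBER_CALL = re.compile(r"\b([a-z_]\w*)\.([a-z_]\w*[?!]?)")


def infer_receiver_types(source: str) -> dict[str, str]:
    """Map local variable -> class name, keeping only unambiguous `.new` bindings."""
    seen: dict[str, set[str | None]] = defaultdict(set)
    for line in source.splitlines():
        m = _NEW_BINDING.match(line)
        if m:
            seen[m.group(1)].add(m.group(2))
        elif (a := _ANY_ASSIGN.match(line)):
            seen[a.group(1)].add(None)  # assigned from something other than ClassName.new
    return {
        var: next(iter(types))
        for var, types in seen.items()
        if len(types) == 1 and None not in types
    }


def resolve_member_calls(source: str, defined_methods: set[str]) -> list[str]:
    """Resolve `recv.meth` to `Class#meth` only when the receiver's type is proven."""
    types = infer_receiver_types(source)
    targets: list[str] = []
    for line in source.splitlines():
        for recv, meth in _MEMBER_CALL.findall(line):
            cls = types.get(recv)
            if cls is None:
                continue  # unproven receiver: emit nothing rather than guess
            target = f"{cls}#{meth}"
            if target in defined_methods:
                targets.append(target)
    return targets
```

With both `Processor#run` and `Worker#run` defined, `p = Processor.new; p.run` resolves to `Processor#run` regardless of the name collision, while a variable assigned from two different classes (or from a plain method call) produces no edge at all.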
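The `exports`-map entry (#1308) describes string values, condition objects, nested conditions, `./*` wildcards, `default` tried last, and rejection of targets that escape the package. A minimal sketch of that lookup, assuming a hypothetical active-condition tuple `ACTIVE_CONDITIONS` and hypothetical helper names; it returns a package-relative path that a caller would join onto the package directory, and it is not graphify's actual resolver.

```python
from __future__ import annotations

import posixpath
from typing import Any

# Hypothetical set of active conditions; "default" is always tried last.
ACTIVE_CONDITIONS = ("import", "require", "browser", "node")


def _pick(value: Any, conditions: tuple[str, ...]) -> str | None:
    """Walk a string or (possibly nested) condition object down to a target string."""
    if isinstance(value, str):
        return value
    if isinstance(value, dict):
        for key, sub in value.items():  # object key order, as in Node
            if key != "default" and key in conditions:
                hit = _pick(sub, conditions)
                if hit is not None:
                    return hit
        if "default" in value:
            return _pick(value["default"], conditions)
    return None


def resolve_export(exports: Any, subpath: str,
                   conditions: tuple[str, ...] = ACTIVE_CONDITIONS) -> str | None:
    """Resolve "." or "./sub/path" to a package-relative file path, or None."""
    # Sugar: `"exports": "./x.js"` or a bare condition object means the "." entry.
    if isinstance(exports, str) or (
        isinstance(exports, dict) and not any(k.startswith(".") for k in exports)
    ):
        exports = {".": exports}
    target = None
    if subpath in exports:
        target = _pick(exports[subpath], conditions)
    else:
        # Wildcard patterns such as "./utils/*"; prefer the longest matching prefix.
        best: tuple[int, str] | None = None
        for pattern, value in exports.items():
            prefix, star, suffix = pattern.partition("*")
            if not (star and subpath.startswith(prefix) and subpath.endswith(suffix)
                    and len(subpath) >= len(prefix) + len(suffix)):
                continue
            raw = _pick(value, conditions)
            captured = subpath[len(prefix):len(subpath) - len(suffix)]
            if raw is not None and (best is None or len(prefix) > best[0]):
                best = (len(prefix), raw.replace("*", captured))
        target = best[1] if best else None
    if target is None or not target.startswith("./"):
        return None
    normalized = posixpath.normpath(target)
    if normalized == ".." or normalized.startswith("../"):
        return None  # target escapes the package directory
    return normalized
```

A `None` result is where the entry's fallback to the existing bare-path/index resolution would take over.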
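The tsconfig fix (#1531) hinges on treating a `paths` value as an ordered list of fallbacks. The sketch below tries each target in turn and returns the first file that exists, or `None` so no false edge is emitted. The extension probe order `EXTENSIONS`, the `exists` callback, and the function name are assumptions for illustration; real `tsc` resolution has more rules (for example `baseUrl` handling and `package.json` lookups).

```python
from __future__ import annotations

import os
from typing import Callable

EXTENSIONS = (".ts", ".tsx", ".d.ts", ".js")  # assumed probe order


def resolve_paths_alias(spec: str, paths: dict[str, list[str]], base_url: str,
                        exists: Callable[[str], bool] = os.path.isfile) -> str | None:
    """Try every target of the best-matching `paths` pattern, in order."""
    best: tuple[int, str, list[str]] | None = None
    for pattern, targets in paths.items():
        prefix, star, suffix = pattern.partition("*")
        if star:
            ok = (spec.startswith(prefix) and spec.endswith(suffix)
                  and len(spec) >= len(prefix) + len(suffix))
            captured = spec[len(prefix):len(spec) - len(suffix)] if ok else ""
        else:
            ok, captured = spec == pattern, ""
        # Exact patterns win; otherwise the longest wildcard prefix wins.
        rank = len(pattern) + 1 if not star else len(prefix)
        if ok and (best is None or rank > best[0]):
            best = (rank, captured, targets)
    if best is None:
        return None
    _, captured, targets = best
    for target in targets:  # ordered fallbacks, not just targets[0]
        base = os.path.join(base_url, target.replace("*", captured))
        candidates = [base, *(base + ext for ext in EXTENSIONS),
                      *(os.path.join(base, "index" + ext) for ext in EXTENSIONS)]
        for candidate in candidates:
            if exists(candidate):
                return candidate
    return None  # no target exists: emit no edge rather than a false one
```

With `"@app/*": ["src/app/*", "lib/app/*"]`, an import of `@app/feature` whose file lives only under `lib/app/` now resolves via the second target instead of being dropped.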
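The semantic-cache pruning entry (#1527) has two constraints worth making concrete: the live key set must come from the full document set, and only `cache/semantic/` may be touched. A minimal sketch, assuming a hypothetical on-disk layout of one `<sha256-of-content>.json` file per entry; graphify's actual key derivation and file layout may differ.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Iterable


def content_key(text: str) -> str:
    """Hypothetical cache key: SHA-256 of the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def prune_semantic_cache(cache_root: Path, live_documents: Iterable[str]) -> list[str]:
    """Delete semantic-cache entries whose key matches no live document.

    `live_documents` must be the FULL current corpus: passing only the files
    changed in an incremental run would evict entries that are still valid.
    Only `cache_root / "semantic"` is touched; other caches are left alone.
    """
    live = {content_key(text) for text in live_documents}
    semantic_dir = cache_root / "semantic"
    removed: list[str] = []
    if not semantic_dir.is_dir():
        return removed
    for entry in semantic_dir.glob("*.json"):
        if entry.stem not in live:
            entry.unlink()
            removed.append(entry.name)
    return sorted(removed)
```

Because keys are derived from content rather than from a version string, entries for unchanged documents survive upgrades, which matches the entry's point about never re-billing LLM extraction on release.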

## 0.9.1 (2026-06-28)

- Fix: rate-limited (HTTP 429) extraction chunks are now retried instead of dropped (#1523, thanks @bercedev). The provider SDKs back off and honor `Retry-After`, but the SDK default of 2 retries was too low for strict per-org concurrency/RPM caps (e.g. Moonshot/kimi), so a parallel `extract` 429'd, each chunk logged `chunk N failed`, and was silently lost (incomplete graph + console spam). The OpenAI-compatible, Azure, and Anthropic clients are now built with a higher `max_retries` (default 6, override via `GRAPHIFY_MAX_RETRIES`). For very tight accounts, `--max-concurrency 1` further reduces the concurrency that triggers org-level limits.
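The retry entry above and the 0.9.2 timeout fix (#1442) share a theme: every LLM client needs the same `max_retries` and `timeout` settings, or one dispatch path quietly ignores them. A minimal sketch of a single shared helper, assuming `GRAPHIFY_API_TIMEOUT` is in seconds and that it has no default when unset; the helper name `llm_client_kwargs` is hypothetical.

```python
from __future__ import annotations

import os

DEFAULT_MAX_RETRIES = 6  # the default stated in the 0.9.1 entry


def llm_client_kwargs() -> dict[str, float | int]:
    """Shared retry/timeout settings for every LLM client a run constructs."""
    kwargs: dict[str, float | int] = {
        "max_retries": int(os.environ.get("GRAPHIFY_MAX_RETRIES", DEFAULT_MAX_RETRIES)),
    }
    timeout = os.environ.get("GRAPHIFY_API_TIMEOUT")
    if timeout:  # assumed to be in seconds
        kwargs["timeout"] = float(timeout)
    return kwargs
```

Both `anthropic.Anthropic(...)` and `openai.OpenAI(...)` accept `max_retries` and `timeout` keyword arguments, so building every client as `anthropic.Anthropic(**llm_client_kwargs())` keeps a secondary path such as the dedup tiebreaker from constructing a client without them.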
85 changes: 75 additions & 10 deletions graphify/extract.py
@@ -7883,6 +7883,24 @@ def _disambiguate_colliding_node_ids(
if len(candidates) == 1:
unambiguous_remaps[old_id] = next(iter(candidates))

# A C/ObjC/C++ `#include "foo.h"` / `#import "foo.h"` resolves to the header's
# file node, but `foo.h` and its sibling `foo.c`/`foo.m`/`foo.cpp` collapse to
# the same `foo` file id, so disambiguation salts them apart by path. A
# cross-file import edge from a THIRD file carries neither salt's source_key, so
# the (target, edge_source_key) lookup misses and the edge dangles on the now
# dead `foo` id. Repoint those import edges to the HEADER variant (the include
# always targeted the header), keyed by the original colliding id (#1475).
_HEADER_SUFFIXES = (".h", ".hpp", ".hh", ".hxx")
header_remaps: dict[str, str] = {}
for old_id in ambiguous_ids:
for node in by_id.get(old_id, []):
sk = _node_disambiguation_source_key(node, root)
if sk and Path(sk).suffix.lower() in _HEADER_SUFFIXES:
new_id = remap.get((old_id, sk))
if new_id:
header_remaps[old_id] = new_id
break

for edge in edges:
edge_source_key = _source_key(str(edge.get("source_file", "")), root)
source_key = (edge.get("source", ""), edge_source_key)
@@ -7891,7 +7909,15 @@ def _disambiguate_colliding_node_ids(
edge["source"] = remap[source_key]
elif edge.get("source") in unambiguous_remaps:
edge["source"] = unambiguous_remaps[str(edge["source"])]
if target_key in remap:
# imports/imports_from always target a header file, so they must resolve to
# the header variant BEFORE the same-source-file salt is considered. Keying
# the import target by the importer's own source file mis-points a `.m`
# importing its own `.h` back at itself (self-loop), and is wrong for any
# cross-file import whose importer shares the colliding id (#1475).
if (edge.get("relation") in ("imports", "imports_from")
and edge.get("target") in header_remaps):
edge["target"] = header_remaps[str(edge["target"])]
elif target_key in remap:
edge["target"] = remap[target_key]
elif edge.get("target") in unambiguous_remaps:
edge["target"] = unambiguous_remaps[str(edge["target"])]
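The two hunks above (build `header_remaps`, then let import edges consult it before the same-source-file salt) can be exercised in isolation with plain dicts. This is a simplified sketch: it keys on raw `source_file` strings instead of the `_node_disambiguation_source_key` / `_source_key` helpers, only rewrites edge targets, and the salted ids such as `src_foo_h` are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

HEADER_SUFFIXES = (".h", ".hpp", ".hh", ".hxx")


def build_header_remaps(ambiguous_ids, nodes_by_id, remap):
    """Map each colliding id (e.g. `foo` from foo.h + foo.m) to its header variant."""
    header_remaps: dict[str, str] = {}
    for old_id in ambiguous_ids:
        for node in nodes_by_id.get(old_id, []):
            source_file = node.get("source_file", "")
            if PurePosixPath(source_file).suffix.lower() in HEADER_SUFFIXES:
                new_id = remap.get((old_id, source_file))
                if new_id:
                    header_remaps[old_id] = new_id
                    break
    return header_remaps


def repoint_targets(edges, remap, header_remaps):
    """Import edges go to the header variant first; others use the same-file salt."""
    for edge in edges:
        target = edge.get("target", "")
        key = (target, edge.get("source_file", ""))
        if edge.get("relation") in ("imports", "imports_from") and target in header_remaps:
            edge["target"] = header_remaps[target]
        elif key in remap:
            edge["target"] = remap[key]
    return edges
```

The ordering matters for the self-loop case in the comment: if `foo.m`'s own `#import "foo.h"` edge were checked against the same-file salt first, it would be remapped to the `foo.m` variant and point back at its own file.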
@@ -9466,9 +9492,10 @@ def _resolve_swift_member_calls(
(#543/#1219). Swift extractors record the receiver of each member call and a
per-file ``name -> type`` table (``swift_type_table``); this pass uses them to
type the receiver, then emits an edge ONLY when that type name resolves to
exactly one definition. Everything it adds is INFERRED (type inference, not an
explicit import), and the line-12503 drop stays intact: this is purely
additive and fires only on receiver-typed Swift calls.
exactly one definition. A type-qualified call (``Type.staticMethod()``) is
EXTRACTED (the type is named explicitly in source); an instance call typed via
local inference (``obj.method()``) is INFERRED. The shared-pass member-call drop
stays intact: this is purely additive and fires only on receiver-typed Swift calls.

Must run after id-disambiguation so node ids and caller_nids are final.
"""
@@ -9525,8 +9552,10 @@ def _key(label: str) -> str:
# declaring file's local type table.
if receiver[:1].isupper():
type_name = receiver
type_qualified = True
else:
type_name = type_table_by_file.get(rc.get("source_file", ""), {}).get(receiver)
type_qualified = False
if not type_name:
continue
type_defs = type_def_nids.get(_key(type_name), [])
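The receiver classification in this hunk, together with the confidence choice made when the edge is appended, reduces to a small pure function. The sketch below condenses it; `classify_receiver` is a hypothetical name, and the real pass additionally requires the type to resolve to exactly one definition and skips self-calls and duplicate caller/target pairs.

```python
from __future__ import annotations


def classify_receiver(receiver: str,
                      type_table: dict[str, str]) -> tuple[str, str, float] | None:
    """Return (type_name, confidence, confidence_score) for a Swift member call."""
    if receiver[:1].isupper():
        # `Type.staticMethod()`: the type is written in source, so EXTRACTED.
        return receiver, "EXTRACTED", 1.0
    type_name = type_table.get(receiver)
    if not type_name:
        return None  # receiver type unknown: the pass skips the call
    # `obj.method()`: type came from the file's local type table, so INFERRED.
    return type_name, "INFERRED", 0.8
```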
@@ -9542,13 +9571,17 @@ def _key(label: str) -> str:
if target == caller or (caller, target) in existing_pairs:
continue
existing_pairs.add((caller, target))
# A type-qualified call (`Type.staticMethod()`) names the receiver type
# explicitly in source, so it is an exact reference — EXTRACTED, matching
# the Python qualified-class-method pass (#1533). An instance call whose
# receiver type came from local inference (`obj.method()`) stays INFERRED.
all_edges.append({
"source": caller,
"target": target,
"relation": relation,
"context": "call",
"confidence": "INFERRED",
"confidence_score": 0.8,
"confidence": "EXTRACTED" if type_qualified else "INFERRED",
"confidence_score": 1.0 if type_qualified else 0.8,
"source_file": rc.get("source_file", ""),
"source_location": rc.get("source_location"),
"weight": 1.0,
@@ -9673,6 +9706,13 @@ def extract_objc(path: Path) -> dict:
language = Language(tsobjc.language())
parser = Parser(language)
source = path.read_bytes()
# tree-sitter-objc cannot expand these argument-less annotation macros (no
# trailing ';'), and their presence before @interface makes the parser fail to
# emit a class_interface node (#1475). Blank them to equal-length spaces so byte
# offsets / line numbers are preserved and the interface parses.
_OBJC_BLANK_MACROS = (b"NS_ASSUME_NONNULL_BEGIN", b"NS_ASSUME_NONNULL_END")
for _m in _OBJC_BLANK_MACROS:
source = source.replace(_m, b" " * len(_m))
tree = parser.parse(source)
root = tree.root_node
except Exception as e:
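The macro blanking above uses plain `bytes.replace`, which would also hit the macro name inside a longer identifier. A whole-word variant using `re.sub` with a replacement function is sketched below; it is an alternative for illustration, not what the diff does, and the name `blank_annotation_macros` is hypothetical. Either way, the property that matters is that byte offsets and line numbers are unchanged.

```python
from __future__ import annotations

import re

# Whole-word match only, so e.g. `NS_ASSUME_NONNULL_BEGIN_X` is left untouched.
_MACRO_RE = re.compile(rb"\bNS_ASSUME_NONNULL_(?:BEGIN|END)\b")


def blank_annotation_macros(source: bytes) -> bytes:
    """Replace each macro with equal-length spaces to keep offsets stable."""
    return _MACRO_RE.sub(lambda m: b" " * len(m.group(0)), source)
```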
@@ -9765,10 +9805,18 @@ def walk(node, parent_nid: str | None = None) -> None:
for sub in child.children:
if sub.type == "string_content":
raw = _read(sub)
module = raw.split("/")[-1].replace(".h", "")
if module:
tgt_nid = _make_id(module)
add_edge(file_nid, tgt_nid, "imports", line, context="import")
# Resolve the quoted include to a real file so the target id
# matches the (possibly disambiguated) node id _make_id gives
# that file; the bare-stem id never survives
# _disambiguate_colliding_node_ids when a .h/.m pair exists,
# so the edge dangled and was dropped (#1475).
resolved = _resolve_c_include_path(raw, str_path)
if resolved is not None:
add_edge(file_nid, _make_id(str(resolved)), "imports", line, context="import")
else:
module = raw.split("/")[-1].replace(".h", "")
if module:
add_edge(file_nid, _make_id(module), "imports", line, context="import")
return

if t == "module_import":
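The hunk above calls `_resolve_c_include_path(raw, str_path)`, whose body is not part of this diff. The sketch below shows one plausible shape for such a resolver under stated assumptions: it searches the including file's directory first, then optional include directories (common compiler behavior for quoted includes), and falls back to a bare stem. Both function names and the `include_dirs` / `exists` parameters are hypothetical; unlike the diff's `.replace(".h", "")`, the fallback here strips any extension.

```python
from __future__ import annotations

import os
from typing import Callable, Iterable


def resolve_quoted_include(raw: str, including_file: str,
                           include_dirs: Iterable[str] = (),
                           exists: Callable[[str], bool] = os.path.isfile) -> str | None:
    """Find the file a quoted `#import "raw"` refers to, or None."""
    # The including file's own directory first, then any configured include dirs.
    for directory in (os.path.dirname(including_file), *include_dirs):
        candidate = os.path.normpath(os.path.join(directory, raw))
        if exists(candidate):
            return candidate
    return None


def import_target(raw: str, including_file: str, **kwargs) -> str:
    """Prefer the real header path; fall back to the bare stem when unresolved."""
    resolved = resolve_quoted_include(raw, including_file, **kwargs)
    if resolved is not None:
        return resolved
    return os.path.splitext(os.path.basename(raw))[0]
```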
@@ -9903,6 +9951,23 @@ def walk(node, parent_nid: str | None = None) -> None:
for caller_nid, body_node in method_bodies:
def walk_calls(n) -> None:
if n.type == "message_expression":
# `[[Foo alloc] init]` is a message_expression whose method is the
# identifier `alloc` and whose receiver is the bare class identifier
# `Foo`; resolve that class name and emit a `references` edge so the
# allocating method links to the allocated type. ensure_named_node
# emits a sourceless stub for unknown names, which the corpus rewire
# collapses ONLY when exactly one real class of that name exists, so an
# unknown/ambiguous class produces no false resolved edge (#1475).
meth = n.child_by_field_name("method")
recv = n.child_by_field_name("receiver")
if (meth is not None and meth.type == "identifier" and _read(meth) == "alloc"
and recv is not None and recv.type == "identifier"):
tname = _read(recv)
ref_line = n.start_point[0] + 1
type_nid = ensure_named_node(tname, ref_line)
if type_nid != caller_nid:
edges.append(_semantic_reference_edge(
caller_nid, type_nid, "type", str_path, ref_line))
# [receiver sel] and [receiver kw1:a kw2:b] both parse to a
# message_expression whose selector parts carry the field name
# "method" (one for a simple selector, several for a compound one);
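The `[[Foo alloc] init]` handling above relies on a sourceless stub that the corpus rewire collapses only when exactly one real class of that name exists. The net effect can be sketched in one step with a regex and a class index. This swaps tree-sitter's `message_expression` for a regex and the stub rewire for a direct uniqueness check, so it is an illustration of the rule, not the extractor; `alloc_reference_edges` and the id format are hypothetical.

```python
from __future__ import annotations

import re

# `[[ClassName alloc] ...]`: the receiver of `alloc` is a bare class identifier.
_ALLOC = re.compile(r"\[\s*\[\s*([A-Za-z_]\w*)\s+alloc\s*\]")


def alloc_reference_edges(caller: str, body: str,
                          class_index: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """Emit caller -> class `references` edges only for uniquely defined classes."""
    edges: list[tuple[str, str, str]] = []
    for name in _ALLOC.findall(body):
        defs = class_index.get(name, [])
        if len(defs) != 1:
            continue  # unknown or ambiguous class: no edge rather than a guess
        edge = (caller, defs[0], "references")
        if defs[0] != caller and edge not in edges:
            edges.append(edge)
    return edges
```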
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "graphifyy"
version = "0.9.1"
version = "0.9.2"
description = "AI coding assistant skill (Claude Code, CodeBuddy, Codex, OpenCode, Kilo Code, Cursor, Gemini CLI, Aider, OpenClaw, Factory Droid, Trae, Hermes, Kiro, Pi, Devin CLI, Google Antigravity) - turn any folder of code, docs, papers, images, or videos into a queryable knowledge graph"
readme = "README.md"
license = { file = "LICENSE" }