[pull] v8 from safishamsi:v8#100

Merged: pull[bot] merged 23 commits into miqdigital:v8 from safishamsi:v8 on Jul 1, 2026
pull[bot] commented Jul 1, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

safishamsi and others added 23 commits July 1, 2026 10:12
…1574, #1571)

build_merge backs `graphify --update`. Two incremental-update data bugs:

#1574 — it read only nodes+edges from the existing graph.json, never
hyperedges, and build() only sees the new chunks' hyperedges. So every
--update collapsed the graph's hyperedge set (the highest-signal semantic
groupings) down to just the re-extracted files'. Now existing hyperedges are
carried forward via attach_hyperedges (id-dedup): re-extracted files' prior
hyperedges are replaced by their new version (by source_file), deleted files'
are dropped via the prune set, and unchanged/global ones are preserved. This
mirrors what watch.py already did.
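
A minimal sketch of the carry-forward rule described above, assuming hyperedges are plain dicts with `id` and `source_file` keys. The function name and signature are hypothetical and are not graphify's `attach_hyperedges`:

```python
def carry_forward_hyperedges(existing, new, reextracted_sources, pruned_sources):
    """Merge prior hyperedges with freshly extracted ones (illustrative only)."""
    dropped = set(reextracted_sources) | set(pruned_sources)
    merged = {}
    # Freshly extracted hyperedges are the current version for their files.
    for he in new:
        merged[he["id"]] = he
    # Prior hyperedges survive unless their file was re-extracted or deleted.
    # Global hyperedges (source_file is None) are never in `dropped`.
    for he in existing:
        if he.get("source_file") in dropped:
            continue
        merged.setdefault(he["id"], he)  # id-dedup: first writer wins
    return list(merged.values())
```

Putting the new hyperedges in first means an id that appears in both sets resolves to the re-extracted version.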

#1571: when a caller omits `root`, absolute prune_sources (from
detect_incremental) were never relativized to the stored relative source_file
keys, so deleted files' nodes survived as ghosts and accumulated across runs.
Added _infer_merge_root: fall back to the committed graphify-out/.graphify_root
marker, else the output dir's parent. This root now drives BOTH the prune set
and the replace-per-source normalization, so both work without an explicit
root. The CLI --update path and all shipped runbooks already pass root; this
hardens the library for any other caller.
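
The fallback order can be sketched as follows. This is an assumption-laden illustration: the commit does not say what `.graphify_root` contains, so the sketch assumes it holds a single path as text, and `infer_merge_root` here is a hypothetical stand-in for `_infer_merge_root`:

```python
from pathlib import Path


def infer_merge_root(graph_path, root=None):
    """Pick the root used to relativize prune sources (illustrative only)."""
    if root is not None:
        return Path(root)  # an explicit root always wins
    out_dir = Path(graph_path).parent
    marker = out_dir / ".graphify_root"
    # Assumption: the marker file stores the project root as plain text.
    if marker.is_file():
        text = marker.read_text(encoding="utf-8").strip()
        if text:
            return Path(text)
    # Last resort: the directory that contains the output dir.
    return out_dir.parent
```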

5 tests: hyperedge preservation (unchanged/global kept, re-extracted replaced,
with and without root), deleted-file hyperedge prune, and root-less prune via
both the grandparent and .graphify_root-marker fallbacks. Full suite 2742.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…slice 3)

Reflective dispatch by string literal — getattr(obj, "handler") — resolves the
attribute name to a callable def and emits it under indirect_call (context
"getattr", INFERRED) at both function and module scope, so `graphify affected
handler` now covers getattr call sites.

The name is a STRING, not an identifier: it names an attribute and is never
shadowed by a param/local, so it resolves without the identifier shadow guard —
the inverse of the #1565/#1566 identifier paths. A dynamic name (a variable,
f-string, concatenation, or any expression) is not statically resolvable and emits
nothing; obj.getattr(...) (a method, not the builtin) and the 1-arg form are ignored.

Refactors the shared resolve-and-emit core out of _emit_indirect_ref into
_emit_indirect_by_name so the getattr path reuses it (callable-target-only,
cross-file deferral, dedup) without duplicating the guard; the identifier wrapper
is behavior-preserving. Full suite green.
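
To make the literal-versus-dynamic distinction concrete, here is a small standalone sketch using Python's `ast` module. It is not graphify's extractor (the commit does not show that code); `literal_getattr_names` is a hypothetical helper that only collects the names that would be statically resolvable:

```python
import ast


def literal_getattr_names(source):
    """Return attribute names passed to builtin-style getattr as string literals."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # obj.getattr(...) is an ast.Attribute, so it is skipped here.
        if not (isinstance(func, ast.Name) and func.id == "getattr"):
            continue
        if len(node.args) < 2:
            continue  # the 1-arg form carries no attribute name
        name_arg = node.args[1]
        # Only a plain string constant is statically resolvable; variables,
        # f-strings (JoinedStr) and concatenations (BinOp) are not.
        if isinstance(name_arg, ast.Constant) and isinstance(name_arg.value, str):
            names.append(name_arg.value)
    return names
```

The sketch does not check whether `getattr` itself has been rebound locally; a real implementation might want to.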
…n LLM)

Community labels defaulted to "Community N" whenever no LLM backend was configured,
making the report + suggested questions unreadable ("why does log_action connect
Community 70 to Community 129?"). Add `label_communities_by_hub`: name each community
after its highest-degree member — the structural hub — so the report reads "log_action"
/ "auth" with zero LLM cost. Ties break by node id for run-to-run stability; a community
with no members in the graph keeps the "Community N" placeholder.

Wired as the default base at both label-building sites — the label/standalone path in
__main__ and the watch/detect rebuild in watch.py. A configured LLM naming pass still
runs and overrides these with richer names; its no-backend placeholder fallback is
guarded so it can't clobber the hub labels.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
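
A minimal sketch of hub-based naming with the tie-break described above. It assumes communities are a mapping of community id to member node ids and that degrees are precomputed; the function name is hypothetical and the node id doubles as the label for simplicity:

```python
def label_by_hub(communities, degree):
    """Name each community after its highest-degree member (illustrative only)."""
    labels = {}
    for cid, members in communities.items():
        present = [m for m in members if m in degree]
        if not present:
            # No member exists in the graph: keep the placeholder.
            labels[cid] = f"Community {cid}"
            continue
        # Highest degree wins; ties break on the smaller node id so the
        # result is stable from run to run.
        labels[cid] = min(present, key=lambda m: (-degree[m], m))
    return labels
```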
…to-called

deduplicate_by_label is never wired into build(); the active dedup path is
deduplicate_entities (imported and called in build). Its docstring claimed
"Called in build() automatically," which was never true. Correct it to say the
helper is dormant/unused and to warn that it merges by label alone with no
file_type guard, so it must not be enabled for code nodes — same-label symbols
from different files/packages (e.g. two Account types) would collapse, the
cross-file conflation deduplicate_entities deliberately avoids for code (#1205).

Docstring only; no behavior change. The function is unused and superseded, so it
could reasonably be deleted instead — left in place here, flagged for your call.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Wrap unguarded json.loads() in build_merge(), load_graph(), and
_read_json_file() so corrupted graph.json files produce an actionable
RuntimeError instead of an unhelpful JSONDecodeError traceback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
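
The wrapping pattern looks roughly like this. The helper name and the error wording are hypothetical; only the `JSONDecodeError` to `RuntimeError` conversion comes from the commit message:

```python
import json
from pathlib import Path


def load_graph_json(path):
    """Load a graph file, turning parse failures into an actionable error."""
    path = Path(path)
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        # Chain the original exception so the parse position is not lost.
        raise RuntimeError(
            f"{path} is not valid JSON (line {exc.lineno}, column {exc.colno}). "
            "The file may be truncated or corrupted; regenerate it or restore a backup."
        ) from exc
```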
…ata loss (#1504)

When two LLM extraction chunks each process a file with the same name in
different directories, they independently generate the same node IDs and
deduplicate_entities() silently drops one node (first-writer-wins). The
data loss had no indication in any log, counter, or output.

Adds a stderr WARNING when a duplicate ID comes from a different
source_file, telling the user which files collided and recommending the
per-subfolder extract + merge-graphs workflow to avoid it.
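
As an illustration of first-writer-wins dedup with a collision warning, assuming nodes are dicts with `id` and `source_file` keys. This is not `deduplicate_entities()` itself, and the warning text is invented for the example:

```python
import sys


def dedupe_with_warning(nodes, stream=sys.stderr):
    """Keep the first node per id; warn when a duplicate comes from another file."""
    seen = {}
    collisions = []
    for node in nodes:
        prior = seen.get(node["id"])
        if prior is None:
            seen[node["id"]] = node
            continue
        # Same id, different file: a real node is about to be lost.
        if prior.get("source_file") != node.get("source_file"):
            pair = (node["id"], prior.get("source_file"), node.get("source_file"))
            collisions.append(pair)
            print(
                f"WARNING: node id {pair[0]!r} from {pair[2]} collides with "
                f"{pair[1]}; the later node was dropped",
                file=stream,
            )
    return list(seen.values()), collisions
```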
`class Dog < Animal` exposes the base in the `superclass` field, but the
inheritance handler in `_extract_generic` had branches for
java/kotlin/c#/scala/cpp/php/swift/python and none for Ruby, so every Ruby
`inherits` edge was silently dropped (contains/methods/calls unaffected).

Add a Ruby branch that reads the `superclass` field, handling both a bare
`constant` (`< Animal`) and a `scope_resolution` (`< Foo::Bar` -> Bar).
Adds a subclass to the Ruby fixture and a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tree-sitter-groovy exposes inheritance via the same `superclass` and
`interfaces`/`type_list` fields as tree-sitter-java, but the inheritance-
emitting block in `_extract_generic` was gated on
`ts_module == "tree_sitter_java"`. Groovy was the only class-based JVM
language in the file with no inheritance handler, so every Groovy
`extends`/`implements` was silently dropped (contains/methods/imports/calls
were unaffected).

Widen the gate to include `tree_sitter_groovy`; the existing
`_emit_java_parent_type` path handles the identical node shapes verbatim.
Adds a base class + interface to the Groovy fixture and two regression
tests (extends -> inherits, implements -> implements).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
#1537 shipped with a manual test checklist only. Add automated tests that
corrupt a graph.json and assert the actionable RuntimeError at all three load
paths (build_merge, affected.load_graph, diagnostics._read_json_file) plus a
happy-path guard. Also record the six merged small fixes in the changelog.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…low-up)

Rigorous smoke testing surfaced an edge case the canonical-tmp unit tests
couldn't reach: when the scan root is under a symlink (macOS /var ->
/private/var, a symlinked home or git worktree), the absolute prune path and
the resolved root differ by prefix, so _norm_source_file's lexical
relative_to fails and the prune/replace match silently misses — deleted
files' ghost nodes survive. Latent in the pre-existing #1007 path too, now
that build_merge resolves the root.

Fix: when lexical relative_to fails, retry with both sides fully resolved.
Only the failure path resolves, so the common lexical match stays
filesystem-free (no per-node stat on the hot replace-per-source loop).

Adds a symlinked-root prune regression test (POSIX-only). Full suite 2768,
and the full end-to-end smoke battery (indirect_call all contexts, JS,
Ruby/Groovy inherits, hyperedge preservation, symlinked-root ghost prune,
corrupt-json errors, dedup collision warning) is green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
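
The lexical-first, resolve-on-failure approach can be sketched with `pathlib` alone. `relativize` is a hypothetical stand-in for `_norm_source_file`, whose real signature is not shown in this PR:

```python
from pathlib import Path


def relativize(source, root):
    """Return source relative to root as a POSIX string, or None if unrelated."""
    source, root = Path(source), Path(root)
    try:
        # Fast path: purely lexical, no filesystem access.
        return source.relative_to(root).as_posix()
    except ValueError:
        pass
    try:
        # Slow path, taken only on failure: resolve symlinks on both sides.
        return source.resolve().relative_to(root.resolve()).as_posix()
    except ValueError:
        return None
```

Because only the failure path touches the filesystem, the common case stays cheap on a per-node loop.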
indirect_call dispatch arc (call args + cross-file, dispatch tables,
assignment/return, getattr, JS/TS), two incremental-update data fixes
(hyperedge preservation + ghost-node prune, incl. symlinked-root hardening),
direction-aware skill-version warning, deterministic hub community labels,
Ruby/Groovy inheritance edges, corrupt-graph.json error handling, cross-chunk
collision warning, Windows hook worker limit.

Built wheel validated in a clean venv: CLI reports 0.9.4, import resolves to
the installed package, new-feature smoke battery green, and a real `graphify
extract` produces indirect_call + inherits edges end-to-end.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…9.4 regression)

Local install-testing of 0.9.4 surfaced that `graphify extract .` dropped every
cross-file indirect_call edge — the headline feature, broken on the primary code
path — while the extract() API worked. Root cause: the cross-file callable-target
guard unioned per-file `callable_nids` (pre-remap ids), but extract() rewrites node
ids afterward (id_remap / prefix sym_remap / _disambiguate_colliding_node_ids). When
the scan root relativizes ids (cache_root == project root, which the CLI passes), the
guard set went stale and `tgt not in callable_nids` rejected every remapped target.
In-file indirect edges survived (emitted with consistently-remapped endpoints), which
masked it: only cross-file edges were dropped.

Fix: mark callable defs with a `_callable` attribute on the node dict instead of
exporting an id list. A marker rides through every id remap; callable_nids is rebuilt
from the final (post-remap) nodes right before the pass that uses it, and the marker
is stripped before output (like origin_file). Regression test extracts with
cache_root == project root (the CLI shape) and asserts the cross-file edge survives
and _callable never ships to graph.json. Full suite 2769.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
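
Why a marker on the node survives where an exported id set goes stale, in a small sketch. The three helpers are hypothetical; only the `_callable` attribute name comes from the commit message:

```python
def remap_ids(nodes, id_remap):
    """Rewrite node ids; every other attribute rides along unchanged."""
    return [{**n, "id": id_remap.get(n["id"], n["id"])} for n in nodes]


def callable_ids(nodes):
    """Rebuild the callable-target set from whatever ids the nodes have now."""
    return {n["id"] for n in nodes if n.get("_callable")}


def strip_internal(nodes):
    """Drop the internal marker before the nodes are written out."""
    return [{k: v for k, v in n.items() if k != "_callable"} for n in nodes]
```

A set computed before `remap_ids` holds the old ids and rejects every remapped target; recomputing it from the final nodes avoids that.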
Reported from a real run: after re-scoping a repo (5807->3710 nodes), the
228-community re-cluster kept run-1's 300 saved labels, so cids now covering a
different community wore the wrong (LLM) names — silently. cluster-only reused
.graphify_labels.json wholesale, and the overlap-based cid remap grabs a prior
cid on any overlap, inheriting a stale name.

Fix: write a per-community membership signature (sha256 of sorted member ids)
beside the labels. On reuse, keep a saved label only when the community's
signature is unchanged; a changed community is renamed by its deterministic
hub (correct-by-construction) with a warning to run `graphify label` for fresh
LLM names. For label files predating the signature, fall back to a community-
count check (a differing count means a different clustering -> don't trust cid
labels). Unchanged graphs reuse labels silently — no false warnings.

Verified: stale legacy labels (42) on a 12-community graph -> warned + hub-
renamed all + sig written; rerun on the unchanged graph -> silent reuse, labels
stable. Unit tests for the signature (deterministic, order-independent, changes
on membership change). Full suite 2771.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
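
A minimal sketch of the membership signature and the reuse check. The newline separator and both function names are assumptions; the commit only specifies "sha256 of sorted member ids":

```python
import hashlib


def membership_signature(member_ids):
    """Order-independent fingerprint of a community's members."""
    joined = "\n".join(sorted(member_ids))  # separator is an assumption
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reusable_labels(saved_labels, saved_sigs, communities):
    """Keep a saved label only when the community's membership is unchanged."""
    kept = {}
    for cid, members in communities.items():
        if cid in saved_labels and saved_sigs.get(cid) == membership_signature(members):
            kept[cid] = saved_labels[cid]
    return kept
```

Communities missing from the result are the ones that would be renamed by their hub.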
`alias Foo.{Bar, Baz}` (and the same import/require/use brace form) emitted
NO imports edges. tree-sitter-elixir represents it as a `dot` node holding the
base alias plus a trailing `tuple` of member aliases, but the import handler
only matched a bare `alias` child, so every multi-alias import was silently
dropped.

Add `_get_alias_modules`, which expands the brace form to `Foo.Bar`,
`Foo.Baz`, … while leaving the single form (`alias Foo.Bar`) unchanged. Adds a
fixture line + regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
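
The expansion itself is simple to show at the string level. Note the real `_get_alias_modules` works on tree-sitter nodes (`dot` plus `tuple`), so this regex-based helper is only a hypothetical illustration of the expected output:

```python
import re

# Matches "Base.{A, B}" and captures the base and the brace contents.
_BRACE_FORM = re.compile(r"^\s*([A-Z][\w.]*)\.\{([^}]*)\}\s*$")


def expand_alias(target):
    """Expand 'Foo.{Bar, Baz}' to full module names; pass single forms through."""
    match = _BRACE_FORM.match(target)
    if not match:
        return [target.strip()]
    base, members = match.groups()
    return [f"{base}.{name.strip()}" for name in members.split(",") if name.strip()]
```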
Function calls (`y = f(x)`) were silently dropped — only `subroutine_call`
(`call sub(...)`) was handled in walk_calls. tree-sitter-fortran represents a
function invocation as a `call_expression`, which had no branch, so every
function-to-function call produced no edge.

Handle `call_expression`. Because Fortran uses the same `name(...)` syntax for
array indexing, the callee is resolved against procedures defined in the file
(`target_nid in seen_ids`) before emitting — so array accesses like `arr(i)`
cannot fabricate spurious `calls` edges. Adds a function + caller to the
fixture and a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
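
The guard against array-index false positives reduces to a membership check. In this sketch the names and the edge shape are hypothetical; only the idea of resolving against procedures defined in the file comes from the commit message:

```python
def call_edges(caller, call_like_names, defined_procedures):
    """Emit a calls edge only when the callee is a procedure defined in the file."""
    edges = []
    for name in call_like_names:
        # `arr(i)` parses like a call, but `arr` is not a known procedure,
        # so it produces no edge.
        if name in defined_procedures and name != caller:
            edges.append({"source": caller, "target": name, "relation": "calls"})
    return edges
```

Skipping self-references is an extra choice made for this example, not something the commit states.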
Enum variant payload types were silently dropped — `struct_item` and
`trait_item` had type-reference handlers but `enum_item` had none, so the
variant field types were never traversed.

Add an `enum_item` branch that walks
`enum_variant_list -> enum_variant -> ordered_field_declaration_list`
(tuple variants, `Click(Logger)`) and `field_declaration_list` (struct
variants, `Resize { size: Dim }`), emitting a `references` edge from the enum
to each field type. Reuses the same type collection as the struct path. Adds
an enum to the fixture and a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, #1579)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… forms

Only bare-identifier imports (`using Foo`) emitted edges. tree-sitter-julia
wraps qualified paths in `scoped_identifier` (`using Base.Threads`), relative
paths in `import_path` (`using ..Sibling`), and the package of a
`selected_import` may itself be a `scoped_identifier`
(`import Base.Threads: nthreads`). None of those were matched, so qualified and
relative imports were silently dropped, and scoped selected-imports pointed at
the selected symbol instead of the module.

Resolve the module name from identifier / scoped_identifier / import_path in
all three positions. Adds fixture lines + a regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
extract_rust() only traversed field_declaration_list (named-struct
bodies), so tuple structs -- whose positional fields nest under
ordered_field_declaration_list -- had every field type reference
silently dropped from the graph.

This is the same node shape the enum handler already accounts for
(tuple variants nest their types under ordered_field_declaration_list);
the struct path was simply left behind. Add an additive branch that,
for each type node in a tuple struct's ordered_field_declaration_list,
collects type refs via _rust_collect_type_refs and emits references
edges with the appropriate field / generic_arg context. The
named-struct path is untouched.

For `struct Wrapper(Logger, Config);` with Logger/Config defined
in-file, no field edges were produced before; both are now emitted.

Adds test_rust_tuple_struct_field_references and a tuple struct to the
shared Rust fixture covering plain and generic positional field types.
The SystemVerilog class-body field regex in _augment_systemverilog_semantics
matched only unqualified `<type> <name>;` declarations. Its `^\s*` prefix
consumes leading whitespace but not leading class-property qualifiers, so a
qualified field such as `rand Config m_cfg;` (three tokens) failed the
two-token shape and its type reference was silently dropped from the graph.

Consume optional leading qualifiers (rand/randc/local/protected/static/const/
automatic/var) before the type token. Zero qualifiers preserves the existing
behavior; the type and name capture are unchanged.

Adds test_systemverilog_qualified_field_references plus rand- and
protected-qualified fields (and a Config class) to the shared .sv fixture.
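
A sketch of a field pattern that consumes optional leading qualifiers. The regex in `_augment_systemverilog_semantics` is not shown in this PR, so the pattern below is an assumption built from the qualifier list in the commit message:

```python
import re

_QUALIFIERS = r"(?:rand|randc|local|protected|static|const|automatic|var)"

# Zero or more qualifiers, then the type token, then the field name.
FIELD_RE = re.compile(
    rf"^\s*(?:{_QUALIFIERS}\s+)*(?P<type>[A-Za-z_]\w*)\s+(?P<name>[A-Za-z_]\w*)\s*;"
)


def field_type(line):
    """Return (type, name) for a simple class-body field declaration, else None."""
    match = FIELD_RE.match(line)
    return (match.group("type"), match.group("name")) if match else None
```

With zero qualifiers the pattern degrades to the two-token `<type> <name>;` shape, matching the behavior the commit says is preserved.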
, #1582, #1583)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pull[bot] locked and limited conversation to collaborators Jul 1, 2026
pull[bot] added the ⤵️ pull label Jul 1, 2026
pull[bot] merged commit 532a20e into miqdigital:v8 Jul 1, 2026
8 participants