Skip to content

pred path --all OOMs/hangs: big_o_normal_form exponentially expands composed overhead expressions #1069

@isPANN

Description

@isPANN

Summary

pred path <S> <T> --all --max-paths N (e.g. pred path KSat QUBO --all) can OOM / hang for an unbounded amount of time. It is platform-dependent: on macOS small N is instant and it only blows up at larger N; on CI (Linux) it hung even at N = 3, where the integration test test_path_all_max_paths_truncates ran ~113 s and the job was SIGTERM-killed (exit 143).

Root cause: exponential expansion in big_o_normal_form

Path enumeration is not the problem. The KSat→QUBO subgraph has only ~108 simple paths total; find_paths_up_to(limit=11) returns 8 paths in 89 µs, and enumerating all 108 takes 2.3 ms.

The blowup is in rendering the Big-O of each path's composed overhead:

  1. For each enumerated path, the CLI computes compose_path_overhead(path) — a nested symbolic expression (product/power of sums). This stays compact (the worst is ~2 KB of text).
  2. To display O(...), the CLI calls big_o_normal_form() (src/big_o.rs), via big_o_ofoverhead_to_json / format_path_text in problemreductions-cli/src/commands/graph.rs.
  3. big_o_normal_form expands the nested expression into a normal form (sum of monomials). Expanding nested (sum)² · (sum)² structures is exponential in nesting depth. For paths that pass through quadratic-overhead reductions (e.g. QuadraticAssignment, whose overhead squares its input), the composed expression becomes "squares of squares of sums", and expansion explodes into multi-GB of monomials → OOM / hang.

Concrete reproduction

The exploding path (index 34 of the enumeration) is:

KSatisfiability → Satisfiability → KSatisfiability → DecisionMinimumVertexCover
  → HamiltonianCircuit → QuadraticAssignment → ILP → QUBO

Its composed num_vars overhead (compact nested form, 2113 chars) is of the shape:

(12*(…) + …)² · (12*(…) + …)²  +  (… + 3·(…)²·(…)²) · (…)

big_o_normal_form on this never returns (OOM-killed). The growth is clearly super-linear in the expansion: a sibling path with a 1363-char nested expr expands to 4808 chars in 22 ms; the 2113-char one blows past available memory.

Why it is platform-dependent

find_paths_up_to(limit=N) returns paths in edge-insertion order, which comes from inventory link order and differs between Linux and macOS. The CLI renders Big-O for the first N returned paths. On macOS the first few are shallow/benign, so --max-paths 3 is instant and only --max-paths 10 pulls in an explosive path; on Linux an explosive path lands in the first 3, so even --max-paths 3 OOMs. Removing an unrelated reduction edge in babfd497 perturbed the Linux order enough to surface an explosive path within the first 3 (the prior commit 301da58 happened to order benign paths first and passed CI).

Impact

  • CI: platform-dependent OOM that blocks unrelated PRs; any change perturbing inventory order can move an explosive path into the rendered prefix.
  • Product: pred path --all and the MCP find_path tool can OOM/hang for users on paths that traverse quadratic-overhead reductions.

Affected code

  • src/big_o.rs: big_o_normal_form (the exponential expansion).
  • problemreductions-cli/src/commands/graph.rs: big_o_of, overhead_to_json, format_path_text, path_all (note path_all builds the text rendering — which calls big_o_ofunconditionally, even in --json mode).

Proposed fix

Compute the asymptotic Big-O without fully expanding to monomial normal form — e.g. derive per-variable degrees / the dominant term structurally, or simplify before expansion, and/or add a hard guard (size/time cap) on expansion. Path enumeration does not need changing.


Original (incorrect) analysis — kept for history

The original report attributed the hang to all_simple_paths::<…>().take(limit) in find_paths_up_to_mode_bounded doing unbounded DFS exploration with platform-dependent ordering. Subsequent measurement disproved this: the search space from KSat is only ~108 paths / a few hundred partial paths, and enumeration completes in microseconds on any order. The platform-dependence is real but acts through which paths get their Big-O rendered first, not through enumeration cost.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions