
hydra: refactor proxy exec list #7802

Open
hzhou wants to merge 5 commits into pmodels:main from hzhou:2605_hydra_exec

hzhou commented May 6, 2026

Pull Request Description

In a round-robin rank assignment, the ranks in each proxy's launch list are not consecutive. This resulted in duplicated exec arguments, which can be very long because they include environment strings. Avoid the duplication by using a separate HYD_proxy_exec struct.

[skip warnings]

  • FIXME: Returning strings from MPL_hash to MPIR_proctable is questionable because the strings may get reallocated. An alternative is a two-round approach: first insert all the strings into MPL_hash, then freeze the hash table, and only then set the proctable.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your company's PR approval manager.

hzhou force-pushed the 2605_hydra_exec branch from 611f66d to 54ea39c on May 6, 2026 21:57
hzhou added 5 commits May 6, 2026 17:18
Since MPL_hash is effectively a memory storage device, add a memory
class to track it.

Use bool instead of int or char values of 0 and 1.

Fix MPL_hash_has: it needs to check whether the hash is empty.

The default usage of mpl_hash is as a string-to-string hash, but it can also be used as a string set or a string store.

The previous code assumed a round-robin rank assignment, which may be incorrect now that we use a rank table. Directly use the rank info stored in the exec struct in the proxy to avoid recalculating it.

Also use MPL_hash to simplify the string storage.

Rather than unnecessarily duplicating HYD_exec, which in turn duplicates strings and environments, use a separate struct that only holds a pointer to HYD_exec.

Do not repeat exec info just because the processes on a proxy are not consecutive. Instead, use a separate launch_list for launch groups.

Add the util function HYDU_free_launch_list for freeing a linked list of struct HYD_proxy_exec.

Clean up struct HYD_exec, removing the unused fields start_rank and ref_count.
hzhou force-pushed the 2605_hydra_exec branch from 54ea39c to 5e73518 on May 6, 2026 22:18

hzhou commented May 6, 2026

test:mpich/ch3/most
test:mpich/ch4/most
test:mpich/pmi
