CLI: Rewrite ForEachAsync to use threadpool, add timeout#40675

Draft
dkbennett wants to merge 9 commits into master from user/dkbennett/asyncoptimize

@dkbennett
Member

Summary of the Pull Request

This PR improves the generic ForEachAsync method to use the Windows thread pool and to keep the pool full: a new worker starts as soon as one completes, instead of waiting for an entire batch to finish before starting the next. It also adds a timeout so that workers cannot run indefinitely, and refactors the code for easier debugging.

PR Checklist

  • Closes: Link to issue #xxx
  • Communication: I've discussed this with core contributors already. If work hasn't been agreed, this work might be rejected
  • Tests: Added/updated if needed and all pass
  • Localization: All end user facing strings can be localized
  • Dev docs: Added/updated if needed
  • Documentation updated: If checked, please file a pull request on our docs repo and link it here: #xxx

Detailed Description of the Pull Request / Additional comments

  • Rewrites ForEachAsync, moving the lambdas into named classes and methods in a details namespace for easier debugging and cleaner stack frames.
  • Uses the Windows thread pool and WIL wrappers, consistent with the rest of the codebase.
  • Fills the pool and keeps it full as work completes, so long-running workers don't hold up work on other threads and the pool stays fully utilized for the duration.
  • Handles timeouts, and cancels all workers if any one worker times out.
  • Adds new tests and updates existing ones.
  • Updates the container stats caller to support the cancellation event.
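
To make the "keep the pool full" point concrete, here is a minimal portable sketch of the scheduling idea. It is not the code in this PR: `ForEachBounded` is a hypothetical name, and `std::thread` stands in for the Windows thread pool so the example stays self-contained. It also assumes the work callable is safe to call concurrently, does not throw, and returns a default-constructible type other than `bool`.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

// Hypothetical helper, not the PR's ForEachAsync. Runs work(item) for every item
// on at most maxWorkers threads. Each worker claims the next unprocessed index as
// soon as it finishes its current one, so one slow item never stalls a whole batch.
template <typename TItem, typename TWork>
auto ForEachBounded(const std::vector<TItem>& items, std::size_t maxWorkers, TWork work)
{
    assert(maxWorkers > 0);
    using TResult = std::invoke_result_t<TWork&, const TItem&>;

    std::vector<TResult> results(items.size());
    std::atomic<std::size_t> next{0};

    auto workerLoop = [&] {
        for (;;)
        {
            const std::size_t index = next.fetch_add(1);
            if (index >= items.size())
            {
                return; // every item has been claimed
            }
            results[index] = work(items[index]); // each index is written by exactly one thread
        }
    };

    const std::size_t workerCount = std::min(maxWorkers, items.size());
    std::vector<std::thread> workers;
    workers.reserve(workerCount);
    for (std::size_t i = 0; i < workerCount; ++i)
    {
        workers.emplace_back(workerLoop);
    }
    for (auto& worker : workers)
    {
        worker.join(); // results are safe to read once every worker has joined
    }
    return results;
}
```

With batch execution, five items on two workers would run as three batches, each as slow as its slowest item. Here the two workers simply keep pulling items until none remain.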

Validation Steps Performed

  • Manual running of stats with many containers.
  • Updated unit tests run quickly and cleanly.

Copilot AI review requested due to automatic review settings May 29, 2026 23:06
Contributor

Copilot AI left a comment

Pull request overview

This PR rewrites the WSLC ForEachAsync helper to use the Windows thread pool with bounded concurrency and cooperative cancellation, then updates container stats collection and unit coverage to use the new API.

Changes:

  • Replaces std::async batch execution with a reusable thread-pool worker implementation.
  • Adds timeout/cancel-drain parameters and updates container stats to pass a cancellation handle.
  • Expands unit tests for pool sizing, dispatching beyond the initial pool, timeout, and cancellation behavior.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/windows/wslc/core/AsyncExecution.h Reimplements ForEachAsync with thread-pool workers, cancellation, and timeout handling.
src/windows/wslc/tasks/ContainerTasks.cpp Updates stats collection to the new ForEachAsync signature and timeout parameters.
test/windows/wslc/WSLCCLIExecutionUnitTests.cpp Updates existing tests and adds coverage for new pool and timeout behaviors.

Comment thread src/windows/wslc/core/AsyncExecution.h Outdated
Comment thread src/windows/wslc/core/AsyncExecution.h Outdated
Comment thread src/windows/wslc/core/AsyncExecution.h Outdated
Comment thread src/windows/wslc/tasks/ContainerTasks.cpp Outdated
Copilot AI review requested due to automatic review settings May 30, 2026 00:18
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comment thread src/windows/wslc/core/AsyncExecution.h
Comment thread src/windows/wslc/core/AsyncExecution.h
Comment thread src/windows/wslc/core/AsyncExecution.h
Comment thread src/windows/wslc/tasks/ContainerTasks.cpp Outdated
Copilot AI review requested due to automatic review settings May 30, 2026 00:46
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Comment thread src/windows/wslc/core/AsyncExecution.h Outdated
Comment thread src/windows/wslc/core/AsyncExecution.h Outdated
Comment thread src/windows/wslc/core/AsyncExecution.h
Copilot AI review requested due to automatic review settings May 30, 2026 00:55
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread src/windows/wslc/core/AsyncExecution.h
Comment thread src/windows/wslc/core/AsyncExecution.h Outdated
@dkbennett marked this pull request as ready for review May 30, 2026 08:12
@dkbennett requested a review from a team as a code owner May 30, 2026 08:12
Copilot AI review requested due to automatic review settings May 30, 2026 08:12
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

Comment thread src/windows/wslc/core/AsyncExecution.h Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 1, 2026 16:52
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

Collaborator

@OneBlue left a comment

I think a generic threadpool that we can use across the codebase has a lot of value, thank you for doing this!

I think we need to design it a bit more defensively though (runaway threads will unfortunately cause us a lot of pain, so we should avoid "timing out" on cancellation)

I also recommend checking out src\windows\common\WslCoreMessageQueue.h, which uses a threadpool as well. I think a good target would be to move that class to a generic threadpool, something like:

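// Interface sketch only; needs <cstddef>, <exception> and <type_traits>.
template <typename TResult>
struct WorkItem
{
    TResult result;
    std::exception_ptr error; // null if the work did not throw
    void Wait();
    void Cancel();
};

class ThreadPool
{
public:
    ThreadPool(size_t minThreads, size_t maxThreads);

    template <typename TFunction>
    WorkItem<std::invoke_result_t<TFunction>> SubmitWork(TFunction&& work);
};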

That should cover both use cases.
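
For readers following the thread, the interface proposed above could look roughly like the following. This is only a sketch under stated assumptions: all names are hypothetical, `std::thread` stands in for the Windows thread pool, the pool has a fixed size rather than a min/max range, `Cancel()` is omitted, and `SubmitWork` returns a shared handle instead of a `WorkItem` by value. The destructor drains the queue and joins every thread, so no work can outlive the pool.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <exception>
#include <functional>
#include <memory>
#include <mutex>
#include <optional>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical names; std::thread stands in for the Windows thread pool.
template <typename TResult>
class WorkItem
{
public:
    std::optional<TResult> result; // set when the work returned normally
    std::exception_ptr error;      // set when the work threw

    // Blocks until the work has finished, successfully or not.
    void Wait()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_done.wait(lock, [this] { return m_finished; });
    }

    void MarkFinished()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_finished = true;
        m_done.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_done;
    bool m_finished = false;
};

class ThreadPool
{
public:
    explicit ThreadPool(unsigned threadCount)
    {
        assert(threadCount > 0);
        for (unsigned i = 0; i < threadCount; ++i) { m_threads.emplace_back([this] { Run(); }); }
    }

    // Runs every queued job, then joins every thread: no work outlives the pool.
    ~ThreadPool()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stopping = true;
        }
        m_wake.notify_all();
        for (auto& thread : m_threads) { thread.join(); }
    }

    // Queues work (must return non-void) and returns a handle to its outcome.
    template <typename TFunction>
    auto SubmitWork(TFunction work)
    {
        auto item = std::make_shared<WorkItem<std::invoke_result_t<TFunction&>>>();
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.emplace_back([item, work = std::move(work)]() mutable {
                try { item->result = work(); }
                catch (...) { item->error = std::current_exception(); }
                item->MarkFinished();
            });
        }
        m_wake.notify_one();
        return item;
    }

private:
    void Run()
    {
        for (;;)
        {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wake.wait(lock, [this] { return m_stopping || !m_queue.empty(); });
                if (m_queue.empty()) { return; } // stopping and nothing left to run
                job = std::move(m_queue.front());
                m_queue.pop_front();
            }
            job();
        }
    }
    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::deque<std::function<void()>> m_queue;
    bool m_stopping = false;
    std::vector<std::thread> m_threads;
};
```

A caller would submit work, call `Wait()` on the returned handle, and then inspect `result` or rethrow `error`. Because the destructor joins unconditionally, there is no "cancel timeout" that could leave a thread running after the pool is gone.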

void CancelAndDrainInFlight() noexcept
{
context->cancelEvent.SetEvent();
::WaitForMultipleObjects(static_cast<DWORD>(doneHandles.size()), doneHandles.data(), TRUE, cancelDrainMs);
Collaborator

I don't recommend having a "cancel timeout" here. If this timeout ever hits, this will leak threads, which will most likely lead to undefined behavior or crashes.

DWORD cancelDrainMs{};

WorkerPool(size_t workerCount, TWork onWork, std::chrono::milliseconds timeout_, std::chrono::milliseconds cancelDrainTimeout) :
context(std::make_shared<TSharedContext>(std::move(onWork))),
Collaborator

Assuming that we don't leak threads, context doesn't need to be a pointer here (could just be a regular class field)

std::vector<HANDLE> doneHandles;
std::shared_ptr<TSharedContext> context;
std::chrono::milliseconds timeout;
DWORD timeoutMs{};
Collaborator

timeoutMs is unused

// On any exception from Launch, Drain, or the user callbacks (onSuccess/onError),
// signal cancellation and wait for in-flight workers before rethrowing. This guarantees
// no background thread pool callbacks outlive the ForEachAsync call.
auto cancelOnError = wil::scope_exit([&pool] { pool.CancelAndDrainInFlight(); });
Collaborator

I recommend doing this in the WorkerPool's destructor so there's a strong guarantee of no leaked threads

context(std::make_shared<TSharedContext>(std::move(onWork))),
timeout(timeout_),
timeoutMs(timeout_ == std::chrono::milliseconds::max() ? INFINITE : static_cast<DWORD>(timeout_.count())),
cancelDrainMs(static_cast<DWORD>(cancelDrainTimeout.count()))
Collaborator

Given that we have a Launch() method, I'd recommend only creating the workers there. That way this threadpool can dynamically resize itself as needed (which will allow us to reuse it in other places)

{
onSuccess(*batchResult.result);
worker.workerResult.hasError = true;
worker.workerResult.error = wil::ResultException{wil::ResultFromCaughtException()};
Collaborator

This catch should cover both catch() cases

TItem item;
std::optional<TItem> item;
std::optional<TResult> result;
wil::ResultException error{S_OK};
Collaborator

To simplify things, I would recommend storing the error as a std::exception_ptr (null if no error was thrown).

This will have the benefit of allowing us to rethrow non-wil exceptions easily

Collaborator

(and we can get rid of hasError)
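
As an illustration of this suggestion, the sketch below stores the failure as a `std::exception_ptr` and uses a single `catch (...)` for every exception type. `WorkerResult` and `RunAndCapture` are hypothetical names chosen for the example, not types from the PR.

```cpp
#include <cassert>
#include <exception>
#include <optional>
#include <utility>

// Hypothetical result slot; the names are illustrative, not the PR's types.
template <typename TResult>
struct WorkerResult
{
    std::optional<TResult> result;
    std::exception_ptr error; // null when the work completed without throwing

    // Returns the result, or rethrows whatever the worker threw with its original type.
    TResult& Get()
    {
        if (error)
        {
            std::rethrow_exception(error);
        }
        assert(result.has_value());
        return *result;
    }
};

// Runs work() and captures either its return value or its exception.
template <typename TResult, typename TWork>
WorkerResult<TResult> RunAndCapture(TWork&& work)
{
    WorkerResult<TResult> slot;
    try
    {
        slot.result = std::forward<TWork>(work)();
    }
    catch (...)
    {
        // One handler covers wil::ResultException, std::exception, and anything else.
        slot.error = std::current_exception();
    }
    return slot;
}
```

A null `error` plays the role of `hasError == false`, and `std::rethrow_exception` preserves the dynamic type, so a caller can still catch a specific exception class on the consuming thread.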

// keeping the worker and its SharedContext alive for the full duration of the callback
// regardless of WorkerPool lifetime. Re-create the work item each launch so the
// context pointer is fresh.
auto* ctx = new std::shared_ptr<TSharedWorker>(worker);
Collaborator

Could we merge SharedContext and SharedWorker into one structure that the threadpool owns, and pass a pointer to that structure directly as the context? This would simplify things a lot, and get rid of the pointer to a shared_ptr.

If we store them in a std::list, the pointers never get invalidated
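
The pointer-stability property this relies on can be shown in a few lines. `WorkerState` and `WorkerOwner` are hypothetical names for the merged structure and its owner; in the real code the raw pointer would be passed as the thread pool callback context.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical merged SharedContext + SharedWorker.
struct WorkerState
{
    int id = 0;
    bool done = false;
};

class WorkerOwner
{
public:
    // Returns a raw pointer that can be handed to a callback as its context.
    // std::list never relocates its elements, so the pointer stays valid until
    // the element is erased or the owner is destroyed, however many workers are added.
    WorkerState* AddWorker(int id)
    {
        assert(id > 0); // ids are 1-based in this sketch
        m_workers.push_back(WorkerState{id, false});
        return &m_workers.back();
    }

    std::size_t Count() const
    {
        return m_workers.size();
    }

private:
    std::list<WorkerState> m_workers;
};
```

The same code with `std::vector` would be unsafe, because growth can reallocate the storage and invalidate every pointer handed out earlier.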

NON_MOVABLE(SharedContext);

TWork onWork;
wil::unique_event cancelEvent;
Collaborator

I don't think this cancel event should be in the threadpool structure, since the threadpool logic doesn't actually look at it.

Callers can capture a cancel event in their work callback if they need to
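
A rough sketch of what caller-owned cancellation could look like. It assumes a shared `std::atomic<bool>` as a portable stand-in for the `wil::unique_event` used in the PR; `CancelFlag` and `MakeCancellableWork` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for a manual-reset event; the PR uses wil::unique_event.
using CancelFlag = std::shared_ptr<std::atomic<bool>>;

// Builds a work callback that holds its own reference to the caller's cancel flag.
// The pool that eventually runs it needs no cancellation member of its own.
inline std::function<int(int)> MakeCancellableWork(CancelFlag cancel)
{
    assert(cancel != nullptr);
    return [cancel = std::move(cancel)](int steps) {
        int completed = 0;
        while (completed < steps && !cancel->load())
        {
            ++completed; // one unit of real work would go here
        }
        return completed; // number of steps finished before cancellation was seen
    };
}
```

The caller keeps the flag, sets it when it wants in-flight work to stop, and the pool stays unaware of cancellation altogether.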

using TSharedContext = SharedContext<TWork>;

std::vector<std::shared_ptr<TSharedWorker>> workers;
std::vector<HANDLE> doneHandles;
Collaborator

If we remove the "timeout on cancel" logic, we can get rid of those

@dkbennett marked this pull request as draft June 1, 2026 19:14
@dkbennett
Member Author

Converting back to draft to do some rework per @OneBlue's comments. There's no huge urgency here, so I can take a bit of time and make sure it is the best it can be.
