Skip to content

feat: SpikeAsync#42

Open
Kelvin-Ng wants to merge 5 commits into
mainfrom
feat/spike-async
Open

feat: SpikeAsync#42
Kelvin-Ng wants to merge 5 commits into
mainfrom
feat/spike-async

Conversation

@Kelvin-Ng

@Kelvin-Ng Kelvin-Ng commented Mar 13, 2026

Copy link
Copy Markdown

Overview

See ASYNC_API.md for a detailed description of the API and design.

Now changed to use nrta API instead of thread pool.

The current implementation is tested and ready to merge.

@Kelvin-Ng Kelvin-Ng changed the title SpikeAsync implementation feat: SpikeAsync Mar 13, 2026
@Kelvin-Ng Kelvin-Ng requested a review from vgene March 13, 2026 01:08
@Kelvin-Ng Kelvin-Ng self-assigned this Mar 13, 2026
@Kelvin-Ng Kelvin-Ng added the enhancement New feature or request label Mar 13, 2026
@Kelvin-Ng Kelvin-Ng force-pushed the feat/spike-async branch 2 times, most recently from 3c31730 to 742f369 Compare March 13, 2026 22:50
@Kelvin-Ng Kelvin-Ng marked this pull request as ready for review March 13, 2026 22:52
@Kelvin-Ng Kelvin-Ng requested a review from a team March 13, 2026 22:52
@Kelvin-Ng Kelvin-Ng force-pushed the feat/spike-async branch 2 times, most recently from ee16ae3 to 2549212 Compare March 23, 2026 23:04
@Kelvin-Ng Kelvin-Ng force-pushed the feat/spike-async branch from 2549212 to 3eafa87 Compare May 6, 2026 23:04
@Kelvin-Ng

Copy link
Copy Markdown
Author

Ported to nrta and is ready to merge.

@vgene vgene left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approve with a few questions/requests (can be addressed in future commits)

// (used for trace api)
} nrt_tensor_storage_t;

typedef struct nrt_tensor {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this imported from tensor definition from nrt?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like revealing things from within nrt and could lead to divergence.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True... but nrt does not expose this. This is copied from nrt. The ideal world is to ask NRT expose this in their header.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am talking to the runtime team to see if they can add two APIs to allow me to get the core ID associated with a model and a tensor respectively. With that we no longer need these hacks.

Comment thread spike/ASYNC_API.md

You may wonder whether CUDA-style stream APIs already solve this. They do provide asynchronous dispatch, but two problems make them awkward for this pattern.

#### Problem 1: CPU Work Cannot Overlap Naturally

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the comparison with CUDA stream should be here. The high level description of a stream is enough. It feels awkward to compare this async solution to a common pattern from a different product. We can just list the pros and cons of stream later

Comment thread spike/ASYNC_API.md Outdated
Comment thread spike/src/include/spike.h
#include "tensor.h"
#include "tensor_set.h"

#include <nrt/nrt_async.h>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this import should be in nrt_wrapper.h with other nrt/.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay will change that.

Comment thread spike/src/include/spike.h

#include <nrt/nrt_async.h>

#include <nanobind/nanobind.h>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not comfortable with having nanobind here. The pattern for layer separation was having nanobind related stuff only in python_bindings.cpp and the spike.h/cpp layer provides general helper function that bridge Nrt function to a higher level abstraction. Let's rethink unless it's absolutely necessary to have ndarray here.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually thought about this before, but the conclusion was that there is no way to avoid this.

The fundamental problem is that for async APIs, we need to manipulate the nanobind objects instead of their underlying raw pointers. This is because raw pointers has no lifetime and so when the function returns, there is no guarantee that they still exist. This is okay for synchronous operations because when a sync operation returns, it is done. However, this is not true for async operations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants