feat: SpikeAsync by Kelvin-Ng · Pull Request #42 · aws-neuron/nkipy

Kelvin-Ng · 2026-03-13T01:06:15Z

Overview

See ASYNC_API.md for a detailed description of the API and design.

The implementation now uses the nrta API instead of a thread pool.

The current implementation is tested and ready to merge.

Kelvin-Ng · 2026-06-18T23:00:35Z

Ported to nrta and is ready to merge.

vgene

Approve with a few questions/requests (can be addressed in future commits)

vgene · 2026-06-18T23:53:06Z

+                        // (used for trace api)
+} nrt_tensor_storage_t;
+
+typedef struct nrt_tensor {


Is this copied from the tensor definition in nrt?

It feels like this exposes nrt internals, and the copy could diverge from the original.

True... but nrt does not expose this, so it is copied from nrt. Ideally, we would ask NRT to expose this in their header.

I am talking to the runtime team to see if they can add two APIs to allow me to get the core ID associated with a model and a tensor respectively. With that we no longer need these hacks.

vgene · 2026-06-19T00:02:24Z

+
+You may wonder whether CUDA-style stream APIs already solve this. They do provide asynchronous dispatch, but two problems make them awkward for this pattern.
+
+#### Problem 1: CPU Work Cannot Overlap Naturally


I don't think the comparison with CUDA streams should be here. The high-level description of a stream is enough. It feels awkward to compare this async solution to a common pattern from a different product. We can just list the pros and cons of streams later.

vgene · 2026-06-19T00:05:39Z

 #include "tensor.h"
+#include "tensor_set.h"
+
+#include <nrt/nrt_async.h>


This include should be in nrt_wrapper.h with the other nrt/ includes.

Okay, will change that.

vgene · 2026-06-19T00:10:27Z

+
+#include <nrt/nrt_async.h>
+
+#include <nanobind/nanobind.h>


I'm not comfortable with having nanobind here. The pattern for layer separation was to keep nanobind-related code only in python_bindings.cpp, while the spike.h/cpp layer provides general helper functions that bridge Nrt functions to a higher-level abstraction. Let's rethink this unless it's absolutely necessary to have ndarray here.

I actually thought about this before, but the conclusion was that there is no way to avoid this.

The fundamental problem is that for async APIs, we need to manipulate the nanobind objects instead of their underlying raw pointers. Raw pointers carry no lifetime information, so once the function returns there is no guarantee that the objects they point to still exist. This is fine for synchronous operations, because a sync operation is done by the time it returns. That is not true for async operations.
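To make the lifetime point concrete, here is a minimal sketch in plain C++. It is an analogy only, not nanobind's API and not the code in this PR: `std::shared_ptr` stands in for a reference-counted Python object, and `Buffer`, `sum_async`, and `submit_then_drop` are hypothetical names. `std::launch::deferred` is used on purpose so the work provably runs after the caller's reference has gone away.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a tensor buffer owned by a Python-side object.
using Buffer = std::vector<float>;

// Schedules a read of the buffer that runs later. The lambda captures the
// shared_ptr by value, so the task holds its own reference and the buffer
// stays alive until the task has run, even if the caller drops its reference
// as soon as this function returns. A raw `const Buffer*` captured here
// would dangle in that situation.
std::future<float> sum_async(std::shared_ptr<const Buffer> buf) {
    assert(buf != nullptr);
    return std::async(std::launch::deferred, [buf = std::move(buf)] {
        return std::accumulate(buf->begin(), buf->end(), 0.0f);
    });
}

// Simulates a caller whose reference goes away before the result is consumed.
float submit_then_drop() {
    std::future<float> pending;
    {
        auto buf = std::make_shared<Buffer>(Buffer{1.0f, 2.0f, 3.0f});
        pending = sum_async(buf);
    }  // caller's reference is released here; the task still holds one
    return pending.get();  // deferred work runs now, after the scope ended
}
```

Calling `submit_then_drop()` is safe only because the task owns a reference. In the binding layer, the equivalent would presumably be keeping the nanobind object itself alive for the duration of the operation, which is why the raw pointer alone is not enough.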

Kelvin-Ng changed the title ~~SpikeAsync implementation~~ feat: SpikeAsync Mar 13, 2026

Kelvin-Ng requested a review from vgene March 13, 2026 01:08

Kelvin-Ng self-assigned this Mar 13, 2026

Kelvin-Ng added the enhancement New feature or request label Mar 13, 2026

Kelvin-Ng force-pushed the feat/spike-async branch 2 times, most recently from 3c31730 to 742f369 Compare March 13, 2026 22:50

Kelvin-Ng marked this pull request as ready for review March 13, 2026 22:52

Kelvin-Ng requested a review from a team March 13, 2026 22:52

Kelvin-Ng force-pushed the feat/spike-async branch from 742f369 to caa62ad Compare March 14, 2026 00:31

Kelvin-Ng force-pushed the feat/spike-async branch 2 times, most recently from ee16ae3 to 2549212 Compare March 23, 2026 23:04

Kelvin-Ng force-pushed the feat/spike-async branch from 2549212 to 3eafa87 Compare May 6, 2026 23:04

Kelvin-Ng added 4 commits June 18, 2026 22:58

SpikeAsync implementation

026cab4

Add a section to explain when to use SpikeAsync

5732e19

Add a performance test for SpikeAsync

b5de23b

Port SpikeAsync to use nrta instead of thread pool

8778fa5

Kelvin-Ng force-pushed the feat/spike-async branch from 3eafa87 to 8778fa5 Compare June 18, 2026 22:58

Update ASYNC_API.md to reflect new implementation; remove init_nonblo…

dc7a734

…ck completely as it is no longer necessary

vgene approved these changes Jun 19, 2026

Kelvin-Ng wants to merge 5 commits into main from feat/spike-async


2 participants

