feat: SpikeAsync#42
Ported to nrta and is ready to merge.
…ck completely as it is no longer necessary
vgene left a comment
Approve with a few questions/requests (can be addressed in future commits)
// (used for trace api)
} nrt_tensor_storage_t;

typedef struct nrt_tensor {
Is this imported from the tensor definition in nrt?
It feels like revealing things from within nrt and could lead to divergence.
True, but nrt does not expose this, so it is copied from nrt. Ideally we would ask the NRT team to expose it in their header.
I am talking to the runtime team to see if they can add two APIs to allow me to get the core ID associated with a model and a tensor respectively. With that we no longer need these hacks.
You may wonder whether CUDA-style stream APIs already solve this. They do provide asynchronous dispatch, but two problems make them awkward for this pattern.

#### Problem 1: CPU Work Cannot Overlap Naturally
I don't think the comparison with CUDA streams belongs here; a high-level description of a stream is enough. It feels awkward to compare this async solution to a common pattern from a different product. We can just list the pros and cons of streams later.
#include "tensor.h"
#include "tensor_set.h"

#include <nrt/nrt_async.h>
This include should be in nrt_wrapper.h with the other nrt/ includes.
#include <nrt/nrt_async.h>

#include <nanobind/nanobind.h>
I'm not comfortable with having nanobind here. The layer-separation pattern has been to keep nanobind-related code only in python_bindings.cpp, while the spike.h/cpp layer provides general helper functions that bridge NRT functions to a higher-level abstraction. Let's rethink this unless it's absolutely necessary to have ndarray here.
I actually thought about this before, and concluded that there is no way to avoid it.
The fundamental problem is that the async APIs need to manipulate the nanobind objects themselves rather than their underlying raw pointers. A raw pointer carries no ownership or lifetime information, so once the function returns there is no guarantee that the memory it points to still exists. This is fine for synchronous operations, because the work is finished by the time the call returns. An async operation, however, may still be running after the call returns, so it has to hold a reference that keeps the object alive.
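To make the lifetime argument concrete, here is a minimal sketch in plain C++. It is not the spike or nrta code: `std::shared_ptr` stands in for a reference-counted nanobind object, `std::async` stands in for the asynchronous runtime call, and `Buffer`, `sum_async`, and `submit_and_forget` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Python-owned buffer (e.g. an ndarray).
using Buffer = std::vector<float>;

// Hypothetical async operation: sums the buffer on another thread.
// The lambda captures the shared_ptr by value, so the buffer stays alive
// until the task finishes, even if the caller drops its own reference.
// Passing a raw `const float*` instead would give no such guarantee.
std::future<float> sum_async(std::shared_ptr<const Buffer> buf) {
    return std::async(std::launch::async, [buf = std::move(buf)] {
        return std::accumulate(buf->begin(), buf->end(), 0.0f);
    });
}

// The caller's only reference goes out of scope before the result is read.
std::future<float> submit_and_forget() {
    auto data = std::make_shared<Buffer>(Buffer{1.0f, 2.0f, 3.0f});
    return sum_async(data);  // `data` is released on return; the task's copy remains
}
```

Calling `submit_and_forget().get()` is safe here only because the task owns a reference to the buffer. In the real code the owning handle would presumably be the nanobind object, which is why it cannot be reduced to a raw pointer before the async call is issued.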
Overview
See ASYNC_API.md for a detailed description of the API and design.
The implementation now uses the nrta API instead of a thread pool.
The current implementation is tested and ready to merge.