[docs] Improve API reference documentation #65
CLA Signature Guide: @Artimislyy, thanks for your pull request. The following commit(s) are not associated with a signed Contributor License Agreement (CLA).
To sign the CLA, click here. To check whether your email is configured correctly, refer to the FAQs. Once you've signed the CLA or updated your email, please comment |
|
Add |
| - `["npu"]` for NPU tensors only | ||
| - `["npu", "cpu"]` for NPU and CPU tensors | ||
| - `["cpu"]` for CPU tensors only | ||
| - `None` (default) for both `["npu", "cpu"]` |
The default value is now `["npu", "cpu"]`. Check `ray_ascend/__init__.py`.
| **Notes:** | ||
| - Must be called in both the driver process and each actor's `__init__` |
| | Variable | Default | Description | | ||
| | --- | --- | --- | | ||
| | `YR_DS_INIT_MODE` | `metastore` | Initialization mode (`metastore` or `etcd`) | | ||
| | `YR_DS_WORKER_PORT` | `31501` | YR DS worker port | |
Change "YR DS worker port" to "openYuanrong Datasystem worker port".
| @ray.method(tensor_transport="YR") | ||
| def transfer_cpu_tensor_via_rdma(self): | ||
| return torch.zeros(1024) | ||
| ``` |
Add a real use case, e.g. the driver calls `transfer_cpu_tensor_via_rdma` or `transfer_npu_tensor_via_hccs` and then `ray.get` on the result.
| backend="HCCL", | ||
| group_name="my_group", | ||
| ) | ||
| ``` |
Add a real use case, e.g. call `allreduce` on each actor.
| - Must be called in both the driver process and each actor's `__init__` | ||
| - Environment variables should be set in the driver process before calling | ||
| - YR backend initialization happens once across the cluster via a named actor |
Note that `tensor_transport` is case-insensitive: `YR` and `yr` both work, as do `HCCL` and `hccl`.
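One way the docs could illustrate the case-insensitivity is a short sketch like the one below. It is only an illustration: `normalize_tensor_transport` and `SUPPORTED_TRANSPORTS` are hypothetical names, not ray-ascend APIs, and the set of transports is assumed from the names mentioned in this review.

```python
# Hypothetical helper for illustration; not ray-ascend's actual implementation.
# The supported names are assumed from the transports discussed in this review.
SUPPORTED_TRANSPORTS = {"YR", "HCCL"}


def normalize_tensor_transport(name: str) -> str:
    """Return the canonical upper-case transport name, ignoring case."""
    canonical = name.strip().upper()
    if canonical not in SUPPORTED_TRANSPORTS:
        raise ValueError(f"Unsupported tensor_transport: {name!r}")
    return canonical


if __name__ == "__main__":
    # Both spellings resolve to the same canonical name.
    print(normalize_tensor_transport("yr"), normalize_tensor_transport("YR"))
    print(normalize_tensor_transport("hccl"), normalize_tensor_transport("HCCL"))
```

Under this assumption, `@ray.method(tensor_transport="yr")` and `@ray.method(tensor_transport="YR")` would select the same backend.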
Run pre-commit before committing: https://ascend.github.io/ray-ascend/developer_guide/#coding-standards-and-submission
Please add a signature to the commit message.
| ``` | ||
| Must be called in both the driver process and each actor's `__init__`. | ||
| ## cleanup_yr_resources |
`cleanup_yr_resources` is used to clean up the datasystem workers and the single controller initialized by `register_yr_tensor_transport`. I think the two should be described together.
| ## cleanup_yr_resources | ||
| ### Environment Variables | ||
| Clean up all YR resources. Delegates cleanup to coordinator, then kills the coordinator |
Users don't know what the coordinator is. Perhaps we need a more accessible description.
| ### Environment Variables | ||
| Clean up all YR resources. Delegates cleanup to coordinator, then kills the coordinator | ||
| actor. |
We can tell users something like "`ray stop` can also clean up the YR workers".
What may happen if users forget to clean up YR?
`YrDsCoordinary` is detached. This actor process will continue to exist until Ray is shut down.
| | `YR_DS_INIT_MODE` | `metastore` | Initialization mode (`metastore` or `etcd`) | | ||
| | `YR_DS_WORKER_PORT` | `31501` | openYuanRong Datasystem worker port | | ||
| | `YR_DS_METASTORE_PORT` | `2379` | Metastore service port | | ||
| | `YR_DS_ETCD_ADDRESS` | - | Etcd address (required for etcd mode) | |
If etcd mode is selected but no etcd address is provided, is there an error message? @dpj135
An error is raised when `os.getenv("YR_DS_ETCD_ADDRESS")` returns no value.
ray-ascend/ray_ascend/utils/yr_utils.py
Line 669 in 15784db
ok
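For the docs, the behaviour discussed in this thread could be illustrated with a small sketch. This is not the code in `yr_utils.py`: `resolve_yr_ds_config`, the returned dictionary layout, and the use of `ValueError` are assumptions made for illustration. Only the variable names and defaults come from the table under review.

```python
import os
from typing import Mapping, Optional

# Defaults copied from the environment-variable table under review.
DEFAULTS = {
    "YR_DS_INIT_MODE": "metastore",
    "YR_DS_WORKER_PORT": "31501",
    "YR_DS_METASTORE_PORT": "2379",
}


def resolve_yr_ds_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: read the YR_DS_* variables and apply defaults."""
    env = os.environ if env is None else env
    mode = env.get("YR_DS_INIT_MODE", DEFAULTS["YR_DS_INIT_MODE"])
    if mode not in ("metastore", "etcd"):
        raise ValueError(f"YR_DS_INIT_MODE must be 'metastore' or 'etcd', got {mode!r}")

    config = {
        "init_mode": mode,
        "worker_port": int(env.get("YR_DS_WORKER_PORT", DEFAULTS["YR_DS_WORKER_PORT"])),
        "metastore_port": int(
            env.get("YR_DS_METASTORE_PORT", DEFAULTS["YR_DS_METASTORE_PORT"])
        ),
    }

    if mode == "etcd":
        # Etcd mode has no default address, so a missing value is an error.
        etcd_address = env.get("YR_DS_ETCD_ADDRESS")
        if not etcd_address:
            raise ValueError(
                "YR_DS_ETCD_ADDRESS must be set when YR_DS_INIT_MODE is 'etcd'"
            )
        config["etcd_address"] = etcd_address
    return config
```

Passing a plain dictionary instead of `os.environ` keeps the sketch easy to exercise, for example `resolve_yr_ds_config({"YR_DS_INIT_MODE": "etcd"})` raises because no etcd address is given.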
| - ["npu", "cpu"] for NPU and CPU tensors | ||
| - ["cpu"] for CPU tensors only | ||
| - None (default) for both ["npu", "cpu"] | ||
| - `None` (uses `["npu", "cpu"]` by default) |
`None` is not allowed; it should raise an error.
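A sketch of the behaviour the docs would then need to describe, based on the review comments that the default is `["npu", "cpu"]` and that `None` should be rejected. The function name `validate_devices`, the parameter name `devices`, and the choice of `ValueError` are all hypothetical; the real signature lives in `ray_ascend/__init__.py`.

```python
from typing import List, Sequence

# Assumed from the options listed in the docs under review.
ALLOWED_DEVICES = ("npu", "cpu")
DEFAULT_DEVICES = ("npu", "cpu")


def validate_devices(devices: Sequence[str] = DEFAULT_DEVICES) -> List[str]:
    """Hypothetical check: accept only non-empty lists of 'npu' and/or 'cpu'."""
    if devices is None:
        raise ValueError("devices must not be None; pass a list such as ['npu', 'cpu']")
    result = list(devices)
    if not result:
        raise ValueError("devices must contain at least one of 'npu' or 'cpu'")
    for device in result:
        if device not in ALLOWED_DEVICES:
            raise ValueError(f"Unsupported device: {device!r}")
    return result
```

With this shape, omitting the argument selects both NPU and CPU tensors, while an explicit `None` fails early instead of silently falling back to the default.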
| def transfer_npu_tensor(self): | ||
| return torch.tensor([1, 2, 3]).npu() | ||
| actor = RayActor.remote() |
To use HCCL for tensor transport, you need to create a collective group first. see
aa0a8ae to 24b1605
Signed-off-by: Artimislyy <2249614312@qq.com>
24b1605 to 9681123
CLA Signature Pass: Artimislyy, thanks for your pull request. All authors of the commits have signed the CLA. 👍
Signed-off-by: Artimislyy <2249614312@qq.com>
| return torch.zeros(1024, device="npu") | ||
| @ray.method(tensor_transport="YR") | ||
| def transfer_cpu_tensor_via_rdma(self): |
TODO: Support RDMA for CPU tensor transport.

Description
Fix