C++ interfaces: Write directly to std::string #6425
}else{
    strg.resize(attr_size);
}
}
The proposed changes are technically valid but trade code simplicity for marginal (likely zero) performance gain.
Allocation: This does not actually eliminate heap allocation; it simply moves it from an explicit new[] to the implicit heap allocation triggered by strg.resize().
Complexity & Risk: Bypassing standard string assignment forces us to manually reinvent C-string truncation logic (strg.find('\0')), increasing code complexity and correctness risk.
My recommendation is to leave the original code alone, or modernize it safely using a standard RAII std::vector buffer to cleanly eliminate the manual delete[] calls without adding truncation complexity.
Thanks for providing the benchmark! I acknowledge that the performance improvement is real. I understand that for names at or below the SSO threshold (roughly 15–23 bytes, depending on the standard library), all heap allocations are eliminated, and for longer names the allocation count drops from two to one. However, unless string name operations are demonstrably in the hot path for a given workload, the performance gain does not justify the added complexity and correctness risk in production library code.

That said, I revise my recommendation toward the vector approach. It needs no find('\0') logic and adds no truncation complexity. It uses assign(data, len) or assign(char*), both of which have well-understood, correct semantics.

// For example, for Attribute::p_read_fixed_len

As a last note, we are discussing retiring the HDF5 C++ API, as more modern third-party alternatives are available, such as HighFive, h5cpp, and h5xx.
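For readers following the thread, the vector-based variant recommended in this comment might look roughly like the sketch below. It is not the code from the PR or from the HDF5 sources: `fake_read_fixed_len` is a hypothetical stand-in for the underlying C read call, and `read_fixed_len_vector` is an illustrative name. It assumes the data is fixed-length and possibly NUL-padded.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a C-style read (not an HDF5 function): fills
// `buf` with `size` bytes of fixed-length, NUL-padded data.
static void fake_read_fixed_len(char* buf, std::size_t size) {
    static const char stored[] = "units";          // 5 characters + NUL
    const std::size_t n = sizeof(stored) - 1;
    std::memset(buf, 0, size);
    std::memcpy(buf, stored, size < n ? size : n);
}

// RAII buffer variant: no manual new[]/delete[], and truncation at the
// first NUL is left to std::string::assign(const char*).
std::string read_fixed_len_vector(std::size_t attr_size) {
    std::vector<char> buf(attr_size + 1, '\0');    // +1 guarantees a terminator
    fake_read_fixed_len(buf.data(), attr_size);

    std::string strg;
    strg.assign(buf.data());   // copies up to, not including, the first '\0'
    return strg;
}
```

The trade-off is the one described above: the vector always allocates once, and the string allocates again if the result exceeds the SSO capacity, but no hand-written truncation logic is needed.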

Since C++11, std::string has been guaranteed to use a contiguous internal buffer, so we can write directly into its memory. This eliminates the temporary allocation and, for short strings, avoids all calls to malloc thanks to the short string optimization.
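A minimal sketch of the idea, under stated assumptions: `fake_read` is a hypothetical stand-in for the C read call, `read_fixed_len_direct` is an illustrative name rather than the actual function touched by this PR, and the data is assumed to be fixed-length and possibly NUL-padded.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C-style read (not an HDF5 function): fills
// `buf` with `size` bytes of fixed-length, NUL-padded data.
static void fake_read(char* buf, std::size_t size) {
    static const char stored[] = "temperature";    // 11 characters + NUL
    const std::size_t n = sizeof(stored) - 1;
    std::memset(buf, 0, size);
    std::memcpy(buf, stored, size < n ? size : n);
}

std::string read_fixed_len_direct(std::size_t attr_size) {
    // Size the string up front; short sizes may fit in the SSO buffer,
    // in which case no heap allocation happens at all.
    std::string strg(attr_size, '\0');
    if (attr_size > 0) {
        // &strg[0] points at contiguous, writable storage since C++11.
        fake_read(&strg[0], attr_size);
    }
    // The buffer may be NUL-padded, so trim at the first NUL if there is one.
    const std::size_t nul = strg.find('\0');
    if (nul != std::string::npos) {
        strg.resize(nul);
    }
    return strg;
}
```

With this stand-in, `read_fixed_len_direct(32)` yields `"temperature"` with the padding trimmed, and `read_fixed_len_direct(4)` yields `"temp"` because no NUL is found within the four bytes read.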