66 changes: 50 additions & 16 deletions hl/src/H5TBpublic.h
Expand Up @@ -139,8 +139,7 @@ extern "C" {
* \param[in] nfields The number of fields
* \param[in] nrecords The number of records
* \param[in] type_size The size in bytes of the structure
- * associated with the table;
- * This value is obtained with \c sizeof().
+ * associated with the table
* \param[in] field_names An array containing the names of the fields.
* Names longer than #HLTB_MAX_FIELD_LEN - 1 characters
* are silently truncated when read back by
Expand All @@ -160,6 +159,13 @@ extern "C" {
* \p dset_name attached to the object specified by the
* identifier loc_id.
*
* \p type_size can be obtained with \c sizeof(), if the data is

@fortnern fortnern Jun 4, 2026

Member

You really just need the last offset plus the size of the last type (and any padding bytes desired after the last member). This is assuming the fields are sorted by offset; otherwise, use the highest offset and the size of the type for the field with the highest offset.

Collaborator

True, though I was mostly going for simplicity and, as you already mentioned, if the structure is just an untyped block of bytes the size should generally already be known. I do think the API design and documentation leave a bit to be desired, and telling someone how to calculate the size of their in-memory structure here may just add more potential for confusion than necessary.

@bmribler bmribler Jun 8, 2026

Collaborator Author

How about: "Otherwise, \p type_size should be calculated based on the highest offset in \p field_offset, the size of its corresponding datatype in \p field_types, and any padding bytes desired after that field."
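
A minimal sketch of the rule proposed in this thread, assuming the caller already holds parallel arrays of field offsets and field sizes (in real code the sizes would typically come from calling H5Tget_size() on each entry of \p field_types). The names `record_size_from_fields` and `tail_padding` are hypothetical and are not part of the H5TB API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the HDF5 API): derive the in-memory
 * record size from parallel arrays of field offsets and field sizes.
 * The record ends where the field with the highest offset ends, plus
 * any trailing padding the caller wants after that field.
 */
static size_t
record_size_from_fields(size_t nfields, const size_t *field_offset, const size_t *field_sizes,
                        size_t tail_padding)
{
    size_t last = 0;
    size_t i;

    assert(nfields > 0);

    /* Find the field with the highest offset; the fields need not be sorted */
    for (i = 1; i < nfields; i++)
        if (field_offset[i] > field_offset[last])
            last = i;

    return field_offset[last] + field_sizes[last] + tail_padding;
}
```

With offsets {8, 0, 16} and sizes {8, 4, 2}, the field at offset 16 is the last one, so the helper returns 18 when no trailing padding is requested.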

* stored in a predefined C struct. Otherwise, \p type_size should
* be calculated as the sum of calling H5Tget_size() for each of
* the given field datatypes in \p field_types, plus any padding
* bytes included in the structure, based on the given field offsets
* in \p field_offset.
*
*/
H5HL_DLL herr_t H5TBmake_table(const char *table_title, hid_t loc_id, const char *dset_name, hsize_t nfields,
hsize_t nrecords, size_t type_size, const char *field_names[],
Expand All @@ -182,8 +188,7 @@ H5HL_DLL herr_t H5TBmake_table(const char *table_title, hid_t loc_id, const char
* \fg_loc_id
* \param[in] dset_name The name of the dataset to overwrite
* \param[in] nrecords The number of records to append
- * \param[in] type_size The size of the structure type,
- * as calculated by \c sizeof().
+ * \param[in] type_size The size of the structure type
* \param[in] field_offset An array containing the offsets of
* the fields. These offsets can be
* calculated with the #HOFFSET macro
Expand All @@ -198,6 +203,10 @@ H5HL_DLL herr_t H5TBmake_table(const char *table_title, hid_t loc_id, const char
* identifier \p loc_id. The dataset is extended to hold the
* new records.
*
* \p type_size can be obtained with \c sizeof(), if the data is

@jhendersonHDF jhendersonHDF Jun 2, 2026

Collaborator

Based on my reading, for most of these functions type_size is used to specify the size of the memory datatype, which may not necessarily be the same size as the file type (for example, if writing from a packed memory compound type). From that, H5TBget_field_info() is probably not what applications should call here. I believe the same advice from above about adding the sizes of the fields plus any structure padding still applies here and for several or all of the functions below.

Member

I think the question is, if the data is not stored in a C struct, how is it stored? If the application is manually moving bytes around in memory then it should know these values.

Collaborator Author

My understanding from @jhendersonHDF's comment tells me that I should use the same description of type_size in H5TBmake_table() for all type_size entries. I'll go ahead with that unless @fortnern thinks otherwise, as your comment seemed to suggest.

Member

I think that's fine though I added a comment about the method specified for make_table - you only need to look at the field with the highest offset.
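
To illustrate the distinction raised in this thread between the in-memory size and a packed size, here is a small sketch assuming the data lives in a C struct. `sample_record_t` and the helper names are hypothetical. The sketch uses the standard `offsetof`; the documentation above refers to the #HOFFSET macro for the same purpose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout, used only for illustration */
typedef struct {
    char   tag;
    double value;
    int    count;
} sample_record_t;

/* Offsets and sizes of the fields as they sit in memory */
static const size_t sample_offsets[3] = {offsetof(sample_record_t, tag),
                                         offsetof(sample_record_t, value),
                                         offsetof(sample_record_t, count)};
static const size_t sample_sizes[3]   = {sizeof(char), sizeof(double), sizeof(int)};

/* Size of the same three fields with no padding between or after them */
static size_t
sample_packed_size(void)
{
    return sample_sizes[0] + sample_sizes[1] + sample_sizes[2];
}

/* End of the field with the highest offset (count is declared last) */
static size_t
sample_last_field_end(void)
{
    return sample_offsets[2] + sample_sizes[2];
}
```

Neither helper is guaranteed to equal `sizeof(sample_record_t)`: the compiler may add padding between members and after the last one, which is why `sizeof()` is the simplest source for the size when the struct definition is available.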

* stored in a predefined C struct. Otherwise, it can be obtained
* by calling H5TBget_field_info() if not already known.
*
*/
H5HL_DLL herr_t H5TBappend_records(hid_t loc_id, const char *dset_name, hsize_t nrecords, size_t type_size,
const size_t *field_offset, const size_t *dst_sizes, const void *buf);
Expand All @@ -212,8 +221,7 @@ H5HL_DLL herr_t H5TBappend_records(hid_t loc_id, const char *dset_name, hsize_t
* \param[in] dset_name The name of the dataset to overwrite
* \param[in] start The zero index record to start writing
* \param[in] nrecords The number of records to write
- * \param[in] type_size The size of the structure type, as
- * calculated by \c sizeof().
+ * \param[in] type_size The size of the structure type
* \param[in] field_offset An array containing the offsets of
* the fields. These offsets can be
* calculated with the #HOFFSET macro
Expand All @@ -227,6 +235,10 @@ H5HL_DLL herr_t H5TBappend_records(hid_t loc_id, const char *dset_name, hsize_t
* index position start of the table named \p dset_name attached
* to the object specified by the identifier \p loc_id.
*
* \p type_size can be obtained with \c sizeof(), if the data is
* stored in a predefined C struct. Otherwise, it can be obtained
* by calling H5TBget_field_info() if not already known.
*
*/
H5HL_DLL herr_t H5TBwrite_records(hid_t loc_id, const char *dset_name, hsize_t start, hsize_t nrecords,
size_t type_size, const size_t *field_offset, const size_t *dst_sizes,
Expand All @@ -243,8 +255,7 @@ H5HL_DLL herr_t H5TBwrite_records(hid_t loc_id, const char *dset_name, hsize_t s
* \param[in] field_names The names of the fields to write
* \param[in] start The zero index record to start writing
* \param[in] nrecords The number of records to write
- * \param[in] type_size The size of the structure type, as
- * calculated by \c sizeof().
+ * \param[in] type_size The size of the structure type
* \param[in] field_offset An array containing the offsets of
* the fields. These offsets can be
* calculated with the #HOFFSET macro
Expand All @@ -259,6 +270,10 @@ H5HL_DLL herr_t H5TBwrite_records(hid_t loc_id, const char *dset_name, hsize_t s
* dataset named \p dset_name attached to the object specified
* by the identifier \p loc_id.
*
* \p type_size can be obtained with \c sizeof(), if the data is
* stored in a predefined C struct. Otherwise, it can be obtained
* by calling H5TBget_field_info() if not already known.
*
*/
H5HL_DLL herr_t H5TBwrite_fields_name(hid_t loc_id, const char *dset_name, const char *field_names,
hsize_t start, hsize_t nrecords, size_t type_size,
Expand All @@ -278,8 +293,7 @@ H5HL_DLL herr_t H5TBwrite_fields_name(hid_t loc_id, const char *dset_name, const
* \param[in] field_index The indexes of the fields to write
* \param[in] start The zero based index record to start writing
* \param[in] nrecords The number of records to write
- * \param[in] type_size The size of the structure type, as
- * calculated by \c sizeof().
+ * \param[in] type_size The size of the structure type
* \param[in] field_offset An array containing the offsets of
* the fields. These offsets can be
* calculated with the #HOFFSET macro
Expand All @@ -294,6 +308,10 @@ H5HL_DLL herr_t H5TBwrite_fields_name(hid_t loc_id, const char *dset_name, const
* dataset named \p dset_name attached to the object
* specified by the identifier \p loc_id.
*
* \p type_size can be obtained with \c sizeof(), if the data is
* stored in a predefined C struct. Otherwise, it can be obtained
* by calling H5TBget_field_info() if not already known.
*
*/
H5HL_DLL herr_t H5TBwrite_fields_index(hid_t loc_id, const char *dset_name, hsize_t nfields,
const int *field_index, hsize_t start, hsize_t nrecords,
Expand All @@ -315,8 +333,7 @@ H5HL_DLL herr_t H5TBwrite_fields_index(hid_t loc_id, const char *dset_name, hsiz
*
* \fg_loc_id
* \param[in] dset_name The name of the dataset to read
- * \param[in] dst_size The size of the structure type,
- * as calculated by \c sizeof()
+ * \param[in] dst_size The size of the structure type
* \param[in] dst_offset An array containing the offsets of
* the fields. These offsets can be
* calculated with the #HOFFSET macro
Expand All @@ -331,6 +348,10 @@ H5HL_DLL herr_t H5TBwrite_fields_index(hid_t loc_id, const char *dset_name, hsiz
* \p dset_name attached to the object specified by
* the identifier \p loc_id.
*
* \p dst_size can be obtained with \c sizeof(), if the data is
* stored in a predefined C struct. Otherwise, it can be obtained
* by calling H5TBget_field_info() if not already known.
*
*/
H5HL_DLL herr_t H5TBread_table(hid_t loc_id, const char *dset_name, size_t dst_size, const size_t *dst_offset,
const size_t *dst_sizes, void *dst_buf);
Expand All @@ -349,7 +370,6 @@ H5HL_DLL herr_t H5TBread_table(hid_t loc_id, const char *dset_name, size_t dst_s
* \param[in] nrecords The number of records to read
* \param[in] type_size The size in bytes of the structure associated
* with the table
- * (This value is obtained with \c sizeof().)
* \param[in] field_offset An array containing the offsets of the fields
* \param[in] dst_sizes An array containing the size in bytes of
* the fields
Expand All @@ -361,6 +381,10 @@ H5HL_DLL herr_t H5TBread_table(hid_t loc_id, const char *dset_name, size_t dst_s
* by \p field_names from a dataset named \p dset_name
* attached to the object specified by the identifier \p loc_id.
*
* \p type_size can be obtained with \c sizeof(), if the data is
* stored in a predefined C struct. Otherwise, it can be obtained
* by calling H5TBget_field_info() if not already known.
*
*/
H5HL_DLL herr_t H5TBread_fields_name(hid_t loc_id, const char *dset_name, const char *field_names,
hsize_t start, hsize_t nrecords, size_t type_size,
Expand All @@ -384,7 +408,6 @@ H5HL_DLL herr_t H5TBread_fields_name(hid_t loc_id, const char *dset_name, const
* \param[in] nrecords The number of records to read
* \param[in] type_size The size in bytes of the structure associated
* with the table
- * (This value is obtained with \c sizeof())
* \param[in] field_offset An array containing the offsets of the fields
* \param[in] dst_sizes An array containing the size in bytes of
* the fields
Expand All @@ -396,6 +419,10 @@ H5HL_DLL herr_t H5TBread_fields_name(hid_t loc_id, const char *dset_name, const
* by \p field_index from a dataset named \p dset_name attached
* to the object specified by the identifier \p loc_id.
*
* \p type_size can be obtained with \c sizeof(), if the data is
* stored in a predefined C struct. Otherwise, it can be obtained
* by calling H5TBget_field_info() if not already known.
*
*/
H5HL_DLL herr_t H5TBread_fields_index(hid_t loc_id, const char *dset_name, hsize_t nfields,
const int *field_index, hsize_t start, hsize_t nrecords,
Expand All @@ -413,8 +440,7 @@ H5HL_DLL herr_t H5TBread_fields_index(hid_t loc_id, const char *dset_name, hsize
* \param[in] dset_name The name of the dataset to read
* \param[in] start The start record to read from
* \param[in] nrecords The number of records to read
- * \param[in] type_size The size of the structure type,
- * as calculated by \c sizeof()
+ * \param[in] type_size The size of the structure type
* \param[in] dst_offset An array containing the offsets of the
* fields. These offsets can be calculated
* with the #HOFFSET macro
Expand All @@ -428,6 +454,10 @@ H5HL_DLL herr_t H5TBread_fields_index(hid_t loc_id, const char *dset_name, hsize
* named \p dset_name attached to the object specified by the
* identifier \p loc_id.
*
* \p type_size can be obtained with \c sizeof(), if the data is
* stored in a predefined C struct. Otherwise, it can be obtained
* by calling H5TBget_field_info() if not already known.
*
*/
H5HL_DLL herr_t H5TBread_records(hid_t loc_id, const char *dset_name, hsize_t start, hsize_t nrecords,
size_t type_size, const size_t *dst_offset, const size_t *dst_sizes,
Expand Down Expand Up @@ -549,6 +579,10 @@ H5HL_DLL herr_t H5TBdelete_record(hid_t loc_id, const char *dset_name, hsize_t s
* \details H5TBinsert_record() inserts records into the middle of the table
* ("pushing down" all the records after it)
*
* \p dst_size can be obtained with \c sizeof() if the predefined C
* structure is available, otherwise, use H5Tget_size() on the
* compound datatype.
*
*/
H5HL_DLL herr_t H5TBinsert_record(hid_t loc_id, const char *dset_name, hsize_t start, hsize_t nrecords,
size_t dst_size, const size_t *dst_offset, const size_t *dst_sizes,
2 changes: 1 addition & 1 deletion src/H5FDonion.c
Expand Up @@ -1154,7 +1154,7 @@ H5FD__onion_open(const char *filename, unsigned flags, hid_t fapl_id, haddr_t ma
* We're getting this buffer from a fixed-size array in a struct, which
* will be garbage and not null-terminated if the user isn't careful.
* Be careful of this and do strndup first to ensure strdup gets a
- * null-termianted string (HDF5 doesn't provide a strnlen call if you
+ * null-terminated string (HDF5 doesn't provide a strnlen call if you
* don't have one).
*/
if (NULL ==
54 changes: 54 additions & 0 deletions src/H5Fsuper_cache.c
Expand Up @@ -521,12 +521,36 @@ H5F__cache_superblock_deserialize(const void *_image, size_t len, void *_udata,
udata->btree_k[H5B_CHUNK_ID] = chunk_btree_k;

/* Remainder of "variable-sized" portion of superblock */

/* Check whether the image pointer will be out of bounds */
if (H5_IS_BUFFER_OVERFLOW(image, H5F_sizeof_addr(udata->f) * 4, end))
HGOTO_ERROR(H5E_FILE, H5E_OVERFLOW, NULL, "image pointer is out of bounds");

/* Get and verify base address, delay additional verification */
H5F_addr_decode(udata->f, (const uint8_t **)&image, &sblock->base_addr /*out*/);
if (!H5_addr_defined(sblock->base_addr))
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "base address is undefined");

/* Get extension address, delay verification until stored eof is avail */
H5F_addr_decode(udata->f, (const uint8_t **)&image, &sblock->ext_addr /*out*/);

/* Get and verify stored eof */
H5F_addr_decode(udata->f, (const uint8_t **)&image, &udata->stored_eof /*out*/);
if (!H5_addr_defined(udata->stored_eof))
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "stored EOF address is undefined");
if (udata->stored_eof == 0)
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "stored EOF address cannot be 0");

/* Get driver address */
H5F_addr_decode(udata->f, (const uint8_t **)&image, &sblock->driver_addr /*out*/);
if (H5_addr_defined(sblock->driver_addr) && sblock->driver_addr >= udata->stored_eof)
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "driver info block address exceeds end of file");

/* Validate base and extension addresses against stored_eof */
if (sblock->base_addr > udata->stored_eof)
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "base address exceeds stored EOF");
if (H5_addr_defined(sblock->ext_addr) && sblock->ext_addr >= udata->stored_eof)
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "superblock extension address exceeds stored EOF");

/* Allocate space for the root group symbol table entry */
if (sblock->root_ent)
Expand Down Expand Up @@ -579,10 +603,40 @@ H5F__cache_superblock_deserialize(const void *_image, size_t len, void *_udata,
HGOTO_ERROR(H5E_FILE, H5E_OVERFLOW, NULL, "image pointer is out of bounds");

/* Base, superblock extension, end of file & root group object header addresses */

/* Get and verify base address, delay additional verification */
H5F_addr_decode(udata->f, (const uint8_t **)&image, &sblock->base_addr /*out*/);
if (!H5_addr_defined(sblock->base_addr))
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "base address is undefined");

/* Get extension address, delay verification until stored eof is avail */
H5F_addr_decode(udata->f, (const uint8_t **)&image, &sblock->ext_addr /*out*/);

/* Get and verify stored eof */
H5F_addr_decode(udata->f, (const uint8_t **)&image, &udata->stored_eof /*out*/);
if (!H5_addr_defined(udata->stored_eof))
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "stored EOF address is undefined");
if (udata->stored_eof == 0)
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "stored EOF address cannot be 0");

/* Get and verify root address */
H5F_addr_decode(udata->f, (const uint8_t **)&image, &sblock->root_addr /*out*/);
if (!H5_addr_defined(sblock->root_addr))
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "root address is undefined");
if (sblock->root_addr == 0)
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "root address cannot be 0");

/* Validate addresses against stored_eof.
Skip for VFDs that don't use absolute file offsets */
if (H5F_HAS_FEATURE(udata->f, H5FD_FEAT_DEFAULT_VFD_COMPATIBLE)) {
if (sblock->base_addr > udata->stored_eof)
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "base address exceeds stored EOF");
if (sblock->root_addr >= (udata->stored_eof - sblock->base_addr))
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "root group address beyond stored EOF");
if (H5_addr_defined(sblock->ext_addr) &&
sblock->ext_addr >= (udata->stored_eof - sblock->base_addr))
HGOTO_ERROR(H5E_FILE, H5E_BADVALUE, NULL, "superblock extension address exceeds stored EOF");
}

/* checksum verification already done in verify_chksum cb */
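
The ordering of the checks in this hunk can be sketched in isolation. The following is a simplified, standalone illustration and not HDF5 code: addresses are plain `uint64_t` values, `SKETCH_ADDR_UNDEF` is a hypothetical stand-in for an undefined address, and the VFD feature test that guards the EOF comparisons in the hunk is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel for an "undefined" address; not HDF5's definition */
#define SKETCH_ADDR_UNDEF UINT64_MAX

static bool
sketch_addr_defined(uint64_t addr)
{
    return addr != SKETCH_ADDR_UNDEF;
}

/* Returns true when the decoded addresses are mutually consistent */
static bool
sketch_superblock_addrs_valid(uint64_t base_addr, uint64_t ext_addr, uint64_t stored_eof,
                              uint64_t root_addr)
{
    /* Base, stored EOF, and root addresses must all be defined */
    if (!sketch_addr_defined(base_addr) || !sketch_addr_defined(stored_eof) ||
        !sketch_addr_defined(root_addr))
        return false;

    /* Stored EOF and root address cannot be 0 */
    if (stored_eof == 0 || root_addr == 0)
        return false;

    /* Check the base first so that stored_eof - base_addr below cannot wrap */
    if (base_addr > stored_eof)
        return false;

    /* The root address must fall before the stored EOF, relative to the base */
    if (root_addr >= stored_eof - base_addr)
        return false;

    /* The extension address is optional, but must be in range when defined */
    if (sketch_addr_defined(ext_addr) && ext_addr >= stored_eof - base_addr)
        return false;

    return true;
}
```

Checking `base_addr > stored_eof` before the subtraction matters because the addresses are unsigned: without it, `stored_eof - base_addr` would wrap to a huge value and the later comparisons would pass for out-of-range addresses.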

11 changes: 11 additions & 0 deletions src/H5Ppublic.h
Expand Up @@ -5763,6 +5763,17 @@ H5_DLL herr_t H5Pset_mdc_image_config(hid_t plist_id, H5AC_cache_image_config_t
* larger than the page buffer size, the subsequent call to H5Fcreate()
* using the \p plist_id will fail.
*
* The arguments min_meta_perc and min_raw_perc are to prevent one type
* of data from evicting hot pages of the other type, that is, pushing
* them out of the buffer. Setting a minimum percentage for each type
* reserves a portion of the page buffer for that type, ensuring both
* metadata and raw data can maintain a presence in the buffer.
*
* The following constraints apply to min_meta_perc and min_raw_perc:
* - Each must be between 0 and 100 inclusive
* - Their sum must not exceed 100, that is, together they can't reserve
* more than the entire page buffer.
*
* \note As of HDF5 1.14.4, this property will be ignored when an existing
* file is being opened and the file space strategy stored in the
* file isn't paged. This was previously a failure.
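
The two constraints on min_meta_perc and min_raw_perc added in this hunk can be checked before the property is set. The sketch below is a standalone illustration; `page_buffer_percentages_valid` is a hypothetical helper, not an HDF5 function, and it only mirrors the documented rules rather than the library's own validation.

```c
#include <stdbool.h>

/*
 * Hypothetical pre-check mirroring the documented constraints:
 *  - each percentage must be between 0 and 100 inclusive
 *  - the two together must not reserve more than the whole page buffer
 */
static bool
page_buffer_percentages_valid(unsigned min_meta_perc, unsigned min_raw_perc)
{
    if (min_meta_perc > 100 || min_raw_perc > 100)
        return false;

    return min_meta_perc + min_raw_perc <= 100;
}
```

For example, reserving 50 percent for each type is accepted, while 60 percent for metadata and 50 percent for raw data is rejected because the sum exceeds 100.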
2 changes: 1 addition & 1 deletion src/H5VM.c
Expand Up @@ -1308,7 +1308,7 @@ H5VM_opvv(size_t dst_max_nseq, size_t *dst_curr_seq, size_t dst_len_arr[], hsize
* Function: H5VM_memcpyvv
*
* Purpose: Given source and destination buffers in memory (SRC & DST)
- * copy sequences of from the source buffer into the destination
+ * copy sequences from the source buffer into the destination
* buffer. Each set of sequences has an array of lengths, an
* array of offsets, the maximum number of sequences and the
* current sequence to start at in the sequence.