Draft: Region per file storage strategy by TheAssembler1 · Pull Request #287 · hpc-io/pdc

TheAssembler1 · 2025-08-06T15:18:30Z

Adds the following enum so the storage strategy can be selected:

typedef enum pdc_region_writeout_strategy {
    /**
     * Store data as multiple regions inside a single file.
     * Overlapping writes that are not fully contained append new regions
     * to the end of the file, with metadata tracking region locations.
     * Supports incremental updates without rewriting large parts of the file.
     */
    STORE_REGION_BY_REGION_SINGLE_FILE = 0,

    /**
     * Store the entire object as a single flat file.
     * Reads and writes operate by seeking directly within the file.
     * No region metadata bookkeeping; simpler but less flexible for partial updates.
     */
    STORE_FLATTENED_SINGLE_FILE,

    /**
     * Store each flattened region in its own separate file.
     * Enables independent file management per region.
     */
    STORE_FLATTENED_REGION_PER_FILE
} pdc_region_writeout_strategy;

STORE_REGION_BY_REGION_SINGLE_FILE is the default strategy. STORE_FLATTENED_REGION_PER_FILE is the new strategy, which stores each region of an object in a separate file. The size of the regions the object is sliced into is decided in:

/**
 * Used to decide how to split an object into chunks, each of which will be a file on disk
 */
static perr_t
PDC_shrink_file_dims(uint64_t *temp_file_dims, const uint64_t *obj_dims, uint8_t obj_ndim, size_t unit)

By default it tries to slice the object into regions of at most 4 MB by iteratively halving the largest dimension of the object until a region is <= 4 MB.

The limit is set within the PDC_shrink_file_dims function: uint64_t max_bytes_per_file = 4ULL * 1024 * 1024;
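The halving rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the name sketch_shrink_file_dims, the explicit max_bytes parameter, the int return code (the PR's function returns perr_t), breaking ties toward the lowest dimension index, and rounding up when halving are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PDC_shrink_file_dims (not the PR's code).
 * Starts from the object dimensions and halves the largest dimension
 * until one chunk (product of dims * unit) fits within max_bytes.
 * Returns 0 on success, -1 on invalid input or if a single element
 * is already larger than max_bytes. Overflow of the byte count for
 * extremely large objects is not handled in this sketch. */
static int
sketch_shrink_file_dims(uint64_t *file_dims, const uint64_t *obj_dims, uint8_t ndim, size_t unit,
                        uint64_t max_bytes)
{
    assert(file_dims != NULL && obj_dims != NULL);
    if (ndim == 0 || unit == 0)
        return -1;

    for (uint8_t i = 0; i < ndim; i++) {
        if (obj_dims[i] == 0)
            return -1;
        file_dims[i] = obj_dims[i];
    }

    for (;;) {
        uint64_t bytes   = (uint64_t)unit;
        uint8_t  largest = 0;
        for (uint8_t i = 0; i < ndim; i++) {
            bytes *= file_dims[i];
            if (file_dims[i] > file_dims[largest])
                largest = i; /* ties keep the lowest index (assumption) */
        }
        if (bytes <= max_bytes)
            return 0;
        if (file_dims[largest] == 1)
            return -1; /* every dim is 1 and the chunk still does not fit */
        file_dims[largest] = (file_dims[largest] + 1) / 2; /* halve, rounding up (assumption) */
    }
}
```

For example, a hypothetical 1-D object of 8,388,608 4-byte elements (32 MiB) would be halved three times with the 4 MB limit, giving chunks of 1,048,576 elements each.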

TheAssembler1 · 2025-09-18T19:28:51Z

We might want to compare the performance between the storage strategies before merging.

jeanbez · 2026-01-27T19:09:45Z

@TheAssembler1 do you have any updates on this PR?

TheAssembler1 · 2026-05-19T18:22:30Z

Add an object-level property for setting the storage strategy. Add documentation for setting the storage strategy. Add performance numbers.

TheAssembler1 · 2026-06-16T17:17:43Z

PDC Writeout Strategy Benchmark (cache=ON, Perlmutter)

Note: The job exceeded the wall-clock time limit and was cancelled by Slurm (STEP CANCELLED DUE TO TIME LIMIT). The benchmark completed 17 of 54 planned configurations: all cache=ON server-scaling tests with 16 clients, and the cache=ON client-scaling tests with 1 server up through 4 clients. The cache=OFF block and the client-scaling tests with >= 8 clients were not reached. The results below reflect completed tests only.

Table 1: Client Operation Times (avg across 5 steps)

All times in seconds.

Servers	Clients	Strategy	Obj Create	Xfer Create	Xfer Start	Xfer Wait	Xfer Close	Obj Close	Server Close
1	16	0 RGN/SINGLE	7.50e-04	3.97e-05	0.3303	0.2390	0.0970	6.74e-05	5.678
1	16	1 FLAT/SINGLE	7.22e-04	3.72e-05	0.2960	0.2653	0.0956	5.67e-05	4.699
1	16	2 FLAT/PER_FILE	7.06e-04	3.18e-05	0.7388	0.2115	0.0975	6.72e-05	102.376
2	16	0 RGN/SINGLE	7.26e-04	3.81e-05	0.5107	0.1005	0.0970	6.68e-05	3.326
2	16	1 FLAT/SINGLE	6.71e-04	3.82e-05	0.4808	0.1065	0.0966	5.76e-05	3.220
2	16	2 FLAT/PER_FILE	7.70e-04	4.54e-05	0.3535	0.4466	0.0966	6.12e-05	34.205
4	16	0 RGN/SINGLE	6.79e-03	4.04e-05	0.3905	0.0625	0.0983	6.61e-05	2.081
4	16	1 FLAT/SINGLE	5.88e-03	3.52e-05	0.6741	0.0281	0.0962	5.95e-05	1.858
4	16	2 FLAT/PER_FILE	8.33e-03	3.54e-05	0.4432	0.2086	0.0967	6.68e-05	9.973
1	1	0 RGN/SINGLE	5.09e-04	1.69e-05	0.1950	0.0096	0.0982	3.87e-05	1.149
1	1	1 FLAT/SINGLE	5.22e-04	1.49e-05	0.2113	0.0097	0.0790	3.66e-05	0.949
1	1	2 FLAT/PER_FILE	5.57e-04	1.72e-05	0.1951	0.0097	0.0977	4.36e-05	2.161
1	2	0 RGN/SINGLE	4.75e-04	1.51e-05	0.2039	0.0181	0.0975	3.34e-05	1.077
1	2	1 FLAT/SINGLE	5.22e-04	1.55e-05	0.1726	0.0177	0.0983	3.61e-05	0.888
1	2	2 FLAT/PER_FILE	4.67e-04	1.55e-05	0.1689	0.0227	0.0960	3.20e-05	4.445
1	4	0 RGN/SINGLE	6.59e-04	2.46e-05	0.2009	0.0612	0.0943	3.74e-05	1.883
1	4	1 FLAT/SINGLE	5.54e-04	2.72e-05	0.1785	0.0623	0.0973	5.07e-05	1.325
1	4	2 FLAT/PER_FILE	5.60e-04	3.72e-05	0.1597	0.0034	0.0924	4.57e-05	N/A

Config: 8,388,608 particles/rank, 5 steps, 20s sleep between steps, cache=ON, Lustre 256 OSTs. Incomplete configs: cache=OFF (all), cache=ON clients in {8, 16, 32} not reached before timeout.
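The Strategy column in Table 1 pairs each numeric enum value (0, 1, 2) with a short label. A small helper along the following lines could keep the two in sync in benchmark tooling. The enum is copied from the PR description; the function names sketch_strategy_label and sketch_strategy_from_label are hypothetical and are not part of PDC.

```c
#include <assert.h>
#include <string.h>

/* Enum as given in the PR description (doc comments omitted). */
typedef enum pdc_region_writeout_strategy {
    STORE_REGION_BY_REGION_SINGLE_FILE = 0,
    STORE_FLATTENED_SINGLE_FILE,
    STORE_FLATTENED_REGION_PER_FILE
} pdc_region_writeout_strategy;

/* Hypothetical helper: map a strategy to the label used in Table 1. */
static const char *
sketch_strategy_label(pdc_region_writeout_strategy s)
{
    switch (s) {
        case STORE_REGION_BY_REGION_SINGLE_FILE:
            return "RGN/SINGLE";
        case STORE_FLATTENED_SINGLE_FILE:
            return "FLAT/SINGLE";
        case STORE_FLATTENED_REGION_PER_FILE:
            return "FLAT/PER_FILE";
    }
    return "UNKNOWN";
}

/* Hypothetical helper: parse a Table 1 label back into a strategy.
 * Returns 0 and sets *out on a match, -1 for an unknown label. */
static int
sketch_strategy_from_label(const char *label, pdc_region_writeout_strategy *out)
{
    assert(label != NULL && out != NULL);
    for (int v = STORE_REGION_BY_REGION_SINGLE_FILE; v <= STORE_FLATTENED_REGION_PER_FILE; v++) {
        pdc_region_writeout_strategy s = (pdc_region_writeout_strategy)v;
        if (strcmp(label, sketch_strategy_label(s)) == 0) {
            *out = s;
            return 0;
        }
    }
    return -1;
}
```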

TheAssembler1 · 2026-06-16T17:59:23Z

Writeout optimization: increased region slice size from 4 MB to 128 MB for STORE_FLATTENED_REGION_PER_FILE.

Tested on Perlmutter with 1 server, 16 clients, 8388608 particles/rank, 5 steps, 20s sleep, cache=ON.

The larger slice size reduces the number of individual file flush operations the server has to perform per object, which significantly cuts per-region flush time and total server drain time at shutdown.

Metric	Before (4 MB)	After (128 MB)	Improvement
Avg flush time per region	4.02s	2.01s	2x faster
Total close time	102.4s	35.3s	2.9x faster
Xfer wait (steps 0-3)	~4ms	~4ms	no regression
Xfer wait (step 4)	1.04s	1.23s	~18% slower
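To illustrate why a larger slice size means fewer per-file flushes, the sketch below counts the slices produced for a 1-D object under the halving rule. It is only an estimate under stated assumptions: the function name sketch_slice_count_1d is hypothetical, the object is treated as 1-D, halving rounds up, and the 4-byte element size used in the example is assumed rather than taken from the benchmark.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical estimate (not the PR's code): number of per-region files
 * for a 1-D object of obj_elems elements of unit bytes each, when the
 * slice length is halved (rounding up) until one slice fits in max_bytes. */
static uint64_t
sketch_slice_count_1d(uint64_t obj_elems, uint64_t unit, uint64_t max_bytes)
{
    assert(obj_elems > 0 && unit > 0 && unit <= max_bytes);

    uint64_t slice_elems = obj_elems;
    while (slice_elems * unit > max_bytes)
        slice_elems = (slice_elems + 1) / 2;

    /* Ceiling division: a trailing partial slice still needs its own file. */
    return (obj_elems + slice_elems - 1) / slice_elems;
}
```

Under these assumptions, a 512 MiB object (16 x 8,388,608 elements of 4 bytes) yields 128 slices with a 4 MiB limit and 4 slices with a 128 MiB limit, so the server would flush 32 times fewer files for that object.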

TheAssembler1 force-pushed the region_per_file branch from b2a04b2 to 36cdffb Compare August 6, 2025 15:23

TheAssembler1 force-pushed the region_per_file branch from f98bb9e to 79a45b9 Compare August 9, 2025 20:05

checkpoint

844ae19

TheAssembler1 force-pushed the region_per_file branch from a3ae926 to 844ae19 Compare September 17, 2025 14:00

Committing clang-format changes

a391dc3

TheAssembler1 marked this pull request as ready for review September 18, 2025 19:29

TheAssembler1 requested a review from a team as a code owner September 18, 2025 19:29

TheAssembler1 changed the title ~~Draft: Region Per File~~ Region Per File Sep 18, 2025

TheAssembler1 changed the title ~~Region Per File~~ Region per file storage strategy Sep 18, 2025

jeanbez changed the title ~~Region per file storage strategy~~ Draft: Region per file storage strategy Oct 21, 2025

jeanbez assigned jeanbez and unassigned jeanbez Oct 21, 2025

jeanbez added the "type: enhancement" label Oct 21, 2025

Merge branch 'develop' into region_per_file

2222562

jeanbez requested review from houjun and jeanbez May 19, 2026 18:19

Merge branch 'develop' into region_per_file

01dffc5

TheAssembler1 self-assigned this May 21, 2026

TheAssembler1 added 3 commits May 22, 2026 18:55

add object property for setting object writeout strategy

5104f45

object property to set writeout strategy

ec791b8

scripts for testing

597bc3b

TheAssembler1 force-pushed the region_per_file branch from 8b02ce2 to 597bc3b Compare May 28, 2026 15:51

Committing clang-format changes

2e7b93f

TheAssembler1 wants to merge 8 commits into hpc-io:develop from TheAssembler1:region_per_file
