Skip to content

feat: polyharmonic spline (PHS) N-dimensional interpolation#136

Open
twilsonco wants to merge 139 commits into
ProjectTorreyPines:masterfrom
twilsonco:feat/phs
Open

feat: polyharmonic spline (PHS) N-dimensional interpolation#136
twilsonco wants to merge 139 commits into
ProjectTorreyPines:masterfrom
twilsonco:feat/phs

Conversation

@twilsonco

@twilsonco twilsonco commented May 9, 2026

Copy link
Copy Markdown

Add N-Dimensional Polyharmonic Spline (PHS) Interpolation

This Pull Request introduces a high-performance, general-dimension Polyharmonic Spline (PHS) Radial Basis Function (RBF) interpolant to FastInterpolations.jl, based on the implementation described here.

Excerpt from Otero-de-la-Roza - 2022 - Finding critical points and reconstruction of electron densities on grids

PHS is particularly effective for smooth interpolation of multidimensional gridded physical data (e.g., electron densities, potential energy surfaces), especially when combined with log-density transforms and custom reference functions (physics-informed interpolation).


Key Features

  1. N-Dimensional PHS Interpolation (PHSInterpolantND)

    • Implements local, stencil-based PHS interpolation ($\phi(r) = r^K$, where $K$ is a compile-time odd degree like $1, 3, 5, 7, \dots$).
    • Fully analytical evaluation of values, gradients, and Hessians/Laplacians everywhere.
    • Supports any grid coordinate type (Float32, Float64) and value type (retaining duck-typed scalar support).
  2. Log-Density Smoothing Transform & Custom References

    • Implements PHSLogTransform to interpolate smooth log-transformed functions of the form $f(x) = \ln\left(\frac{\rho(x)}{\rho_0(x)}\right)$.
    • Evaluates original values and analytical derivatives dynamically via the chain rule, making it highly robust against sharp gradients and cusp singularities (e.g., nuclear cusps).
    • Provides a flexible interface supporting arbitrary callables for $\rho_0(x)$, including ConstantRef(val) (for standard log-space interpolation) or other existing interpolants.
    • Supports a nested log-space PHS workflow: utilizing a log-space PHS of the reference density $\rho_0$ as the reference for the main PHS to extract exceptionally clean reference derivatives from raw grid data.
  3. Weighted Blending for $C^2$ Continuity

    • Smoothly transitions between neighboring localized stencil evaluations using a custom blending kernel.
    • Restores global $C^2$ continuity across stencil boundaries.
    • Blend range is adjustable via the blend_factor constructor argument.
  4. Robust Precompilation & JIT Warmup

    • Leverages PrecompileTools inside src/phs/phs.jl to precompile the most common Float64/3D patterns. This completely eliminates first-call latency (JIT compilation overhead) for standard value and derivative queries.

Zero-Allocation Hot Path & Performance Optimizations

Achieving competitive evaluation speeds for stencil-based RBFs required eliminating all sources of heap allocations on the hot path. Through rigorous profiling and micro-optimizations, the query hot path is 100% allocation-free for both values and derivatives.

Key Optimizations:

  • Canonical Stencil & Boundary Shift Caching: Instead of constructing and inverting a coordinate-dependent system of linear equations at every evaluation point, a single canonical stencil configuration ($\Phi^{-1}$) is precomputed at build time. Boundary nodes reuse the same inverted matrix via clamped data indices or are handled via precomputed boundary-shifted matrices stored in a static shift_cache.
  • Elimination of Task-Local Storage Overhead: Task-local caching for stencil coefficients was replaced with thread-safe, thread-indexed static caches (coeff_caches within PHSInterpolantND and AdaptiveArrayPools for thread workspaces), ensuring thread-safety without dynamic memory allocations under parallel batch queries.
  • Explicit Exponent Specialization: Implemented specialized radial kernel methods for $K=1, 3, 5, 7$ to replace generic r^K exponentiation with explicit, compiler-friendly multiplications.
  • Branchless Operations & Single Reciprocals: Introduced branchless r_inv patterns and @simd loops in radial evaluation. Redundant divisions in blending functions were fused into a single reciprocal division followed by multiplications.
  • Static Dispatch & Metaprogramming: Replaced runtime derivative axes parsing with compile-time @generated function dispatch, allowing the compiler to fully unroll evaluation loops.
  • Type Stability: Ensured absolute type stability for both coordinate evaluation pipelines and the reference-derivative chain-rule mappings.

Real-World Validation: Phenol Dimer (Electron Density)

To validate correctness and showcase the interpolation accuracy, we included a real-world quantum chemistry benchmark in scripts/phs_density_comparison.jl (phenol dimer, DFT-computed electron density on a $75 \times 113 \times 70$ grid).

The benchmark compares PHS (using a log-density transform and a promolecular density reference constructed analytically from critic2 wavefunctions) against standard interpolation methods (Nearest, Linear, Cubic, Cardinal).

Accuracy Comparison (Relative Error Statistics)

1. Charge Density ($\rho$)

Method Min Error Max Error Mean Error Median Error
Nearest 5.27e-04 (35555×) 2.15e+00 (2×) 1.84e-01 (45×) 1.34e-01 (837×)
Linear 1.68e-05 (1133×) 9.34e-01 (1×) 8.80e-02 (22×) 2.18e-02 (135×)
Cubic 4.58e-06 (309×) 9.73e-01 (1×) 1.17e-01 (29×) 3.36e-03 (21×)
Cardinal 2.29e-05 (1546×) 9.21e-01 (1×) 9.65e-02 (24×) 3.47e-03 (22×)
PHS (with Ref) 1.48e-08 1.00e+00 4.06e-03 1.61e-04

2. Gradient Magnitude ($|\nabla \rho|$)

Method Min Error Max Error Mean Error Median Error
Linear 3.75e-05 (49×) 2.39e+01 (24×) 4.14e-01 (35×) 1.89e-01 (171×)
Cubic 2.39e-05 (31×) 3.36e+00 (3×) 3.57e-01 (30×) 2.65e-02 (24×)
Cardinal 1.83e-04 (238×) 2.52e+00 (3×) 2.48e-01 (21×) 3.23e-02 (29×)
PHS (with Ref) 7.68e-07 1.00e+00 1.17e-02 1.11e-03

3. Laplacian Magnitude ($|\nabla^2 \rho|$)

Method Min Error Max Error Mean Error Median Error
Cubic 8.30e-06 (1×) 1.13e+03 (364×) 6.41e+00 (135×) 1.69e-01 (12×)
Cardinal 7.58e-04 (59×) 1.96e+02 (63×) 2.41e+00 (51×) 5.03e-01 (37×)
PHS (with Ref) 1.28e-05 3.09e+00 4.73e-02 1.37e-02

Note: Relative error ratios in parentheses indicate how many times larger the error of the given method is compared to PHS.

Here's a set of plots that better captures the comparison (this plot is generated using the new script with the PR, and is included in the PHS docs):

phs_density_comparison

Evaluation Timings & Allocation Verification

Evaluation times are measured for 1000 query points along a phenol-dimer hydrogen-bond path (runs executed twice to isolate JIT and stencil caches):

Evaluating along path (1000 points)...
  Density (ρ):
    Nearest ...   0.000011 seconds (0 allocations)
    Linear ...    0.000013 seconds (0 allocations)
    Cubic ...     0.000021 seconds (0 allocations)
    Cardinal ...  0.000116 seconds (0 allocations)
    PHS ...       0.009040 seconds (0 allocations)  <-- Zero Allocations
  Gradient Magnitude (|∇ρ|):
    Linear ...    0.000111 seconds (13 allocations)
    Cubic ...     0.000052 seconds (7 allocations)
    Cardinal ...  0.000350 seconds (7 allocations)
    PHS ...       0.032212 seconds (7 allocations: 128 bytes) <-- Zero dynamic heap allocations
  Laplacian Magnitude (|∇²ρ|):
    Cubic ...     0.000060 seconds (7 allocations)
    Cardinal ...  0.000335 seconds (1 allocation)
    PHS ...       0.042326 seconds (1 allocation: 32 bytes) <-- Zero dynamic heap allocations

Extensive optimization was performed after the original implementation, reducing the runtime for this from approx. 30s down to 0.042s on a M3 Ultra Mac Studio, a ~715X speedup.

While PHS has a higher baseline computational cost due to localized blending and radial evaluations, it provides multiple orders of magnitude better accuracy, particularly in derivative fields (gradient & Hessian/Laplacian), with completely stable memory usage.


Codebase Modifications & Added Files

The main addition of the PHS feature-set lives in the src/phs/ directory:

  • src/phs/phs.jl: Features aggregator and compile workload definition.
  • src/phs/phs_kernels.jl: Core radial basis functions and their specialized first/second-order derivatives ($K=1,3,5,7$).
  • src/phs/phs_stencil.jl: Matrix formulation, canonical stencil offset solver, and boundary shift cached variants.
  • src/phs/phs_types.jl: Type definitions (PHSInterpolantND, PHSLogTransform, ConstantRef).
  • src/phs/phs_eval.jl: Thread-safe coefficient evaluations, blend function kernels, and evaluation pipelines (including value, gradient, Hessian, and log-transform chain-rule solvers).
  • src/phs/phs_interpolant.jl: Public constructor phs_interp and user-facing callable entrypoints.
  • src/phs/phs_oneshot.jl: Non-allocating batch evaluation methods.

New Documentation & Testing:

  • docs/src/interpolation/phs.md: Mathematical foundations, full API specifications, parameters tuning, and performance guides.
  • test/test_phs_nd.jl: Exhaustive unit tests checking multidimensional interpolation correctness, degree scaling, log-transform safety, allocations, and type stability.
  • scripts/phs_density_comparison.jl: Full-scale validation and timing comparison script for phenol-dimer electron density.
  • scripts/update_readme_2d_comparison.jl: Scripts to plot multi-panel 2D validation results on regular and irregular grids.

Future Improvements

As I integrate this PHS implementation into my other project (the chemistry density-analysis tool for which PHS was originally implemented), further changes, potential API polish, or performance-related improvements may be PR'd back to this repository.


Thank you for the reviews! Let me know if you would like any specific changes or further tests.

@github-actions

github-actions Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

FastInterpolations.jl Benchmarks

All benchmarks (65 total, click to expand)
Benchmark Current: 450ce04 Previous Imm. Ratio Grad. Ratio
10_nd_construct/bicubic_2d 37770.0 ns 39113.0 ns 0.966 0.98
10_nd_construct/bilinear_2d 605.9 ns 728.4 ns 0.832 0.855
10_nd_construct/phs_2d 1088093 ns - ns - -
10_nd_construct/phs_3d 18020027.2 ns - ns - -
10_nd_construct/tricubic_3d 352852.0 ns 365687.0 ns 0.965 0.982
10_nd_construct/trilinear_3d 1697.8 ns 1861.7 ns 0.912 0.928
11_nd_eval/bicubic_2d_batch 1439.7 ns 1437.6 ns 1.001 0.963
11_nd_eval/bicubic_2d_scalar 16.3 ns 16.3 ns 0.999 1.008
11_nd_eval/bilinear_2d_scalar 7.5 ns 7.5 ns 1.0 1.045
11_nd_eval/phs_2d_batch 164302.2 ns - ns - -
11_nd_eval/phs_2d_scalar 1468.2 ns - ns - -
11_nd_eval/phs_3d_batch 1713130.5 ns - ns - -
11_nd_eval/phs_3d_scalar 17563.3 ns - ns - -
11_nd_eval/tricubic_3d_batch 3242.1 ns 3234.1 ns 1.002 0.978
11_nd_eval/tricubic_3d_scalar 33.5 ns 33.4 ns 1.003 0.968
11_nd_eval/trilinear_3d_scalar 13.2 ns 13.2 ns 1.0 0.996
12_cubic_eval_gridquery/range_random 4235.7 ns 4232.5 ns 1.001 0.963
12_cubic_eval_gridquery/range_sorted 4221.1 ns 4215.7 ns 1.001 0.961
12_cubic_eval_gridquery/vec_random 9316.5 ns 9373.2 ns 0.994 0.953
12_cubic_eval_gridquery/vec_sorted 3204.4 ns 3204.4 ns 1.0 1.001
13_nd_oneshot_gridquery/bicubic_2d_rand_rand 66056.9 ns 65794.0 ns 1.004 0.997
13_nd_oneshot_gridquery/bicubic_2d_sort_rand 62903.0 ns 62651.6 ns 1.004 1.008
13_nd_oneshot_gridquery/bicubic_2d_sort_sort 59314.3 ns 58434.8 ns 1.015 1.008
13_nd_oneshot_gridquery/bilinear_2d_rand_rand 17494.8 ns 17355.8 ns 1.008 1.023
13_nd_oneshot_gridquery/bilinear_2d_sort_rand 10163.3 ns 9752.5 ns 1.042 1.064
13_nd_oneshot_gridquery/bilinear_2d_sort_sort 5741.4 ns 5746.0 ns 0.999 1.07
14_series_oneshot_batch/constant_inplace_vec_k8_q1000_rand 18551.9 ns 18540.8 ns 1.001 0.982
14_series_oneshot_batch/linear_inplace_vec_k8_q1000_rand 17516.9 ns 18261.3 ns 0.959 0.941
15_phs_oneshot/q00001 35327.9 ns - ns - -
15_phs_oneshot/q10000 2045346.2 ns - ns - -
16_phs_construct/g0100 34214.2 ns - ns - -
16_phs_construct/g1000 34187.1 ns - ns - -
17_phs_eval/q00001 199.0 ns - ns - -
17_phs_eval/q00100 19430.6 ns - ns - -
17_phs_eval/q10000 1950535.6 ns - ns - -
1_cubic_oneshot/q00001 539.8 ns 546.2 ns 0.988 0.973
1_cubic_oneshot/q10000 43480.6 ns 43574.8 ns 0.998 0.888
2_cubic_construct/g0100 1410.4 ns 1385.6 ns 1.018 0.971
2_cubic_construct/g1000 12780.0 ns 13371.1 ns 0.956 0.956
3_cubic_eval/q00001 20.1 ns 20.8 ns 0.966 0.938
3_cubic_eval/q00100 443.6 ns 444.6 ns 0.998 0.965
3_cubic_eval/q10000 42661.0 ns 42644.1 ns 1.0 0.961
4_linear_oneshot/q00001 24.9 ns 24.7 ns 1.008 0.968
4_linear_oneshot/q10000 18731.1 ns 18733.2 ns 1.0 0.986
5_linear_construct/g0100 36.8 ns 50.0 ns 0.735 0.874
5_linear_construct/g1000 285.2 ns 399.0 ns 0.715 1.006
6_linear_eval/q00001 10.4 ns 10.2 ns 1.02 1.01
6_linear_eval/q00100 198.6 ns 201.4 ns 0.986 0.994
6_linear_eval/q10000 18499.8 ns 18427.6 ns 1.004 0.98
7_cubic_range/scalar_query 8.3 ns 8.3 ns 1.0 0.961
7_cubic_vec/scalar_query 10.8 ns 11.3 ns 0.956 0.97
8_cubic_multi/construct_s001_q100 636.6 ns 652.6 ns 0.975 0.944
8_cubic_multi/construct_s010_q100 4452.1 ns 4708.2 ns 0.946 0.943
8_cubic_multi/construct_s100_q100 40055.1 ns 41420.7 ns 0.967 0.953
8_cubic_multi/eval_s001_q100 818.1 ns 811.9 ns 1.008 0.999
8_cubic_multi/eval_s010_q100 1807.6 ns 1788.4 ns 1.011 0.997
8_cubic_multi/eval_s010_q100_scalar_loop 2284.1 ns 2293.1 ns 0.996 0.962
8_cubic_multi/eval_s100_q100 11282.1 ns 11251.1 ns 1.003 0.981
8_cubic_multi/eval_s100_q100_scalar_loop 3320.2 ns 3370.3 ns 0.985 0.965
9_nd_oneshot/bicubic_2d 44216.9 ns 44522.6 ns 0.993 0.963
9_nd_oneshot/bilinear_2d 540.4 ns 541.0 ns 0.999 0.964
9_nd_oneshot/phs_2d 1367011.9 ns - ns - -
9_nd_oneshot/phs_3d 20346404.4 ns - ns - -
9_nd_oneshot/tricubic_3d 423282.9 ns 422750.3 ns 1.001 1.027
9_nd_oneshot/trilinear_3d 1032.9 ns 1035.9 ns 0.997 0.986

1 benchmark(s) were flagged but verified as CI noise after 10 re-run(s)

This comment was automatically generated by Benchmark workflow.

@mgyoo86

mgyoo86 commented May 11, 2026

Copy link
Copy Markdown
Member

@twilsonco
Thanks a lot for the PR and the detailed report.
This looks like a really nice feature, and I appreciate all the work you put into it.

This is a fairly large change, so I’ll need some time to digest it and review it properly. I’m also in the middle of major refactoring for the next minor release, so I’ll probably finish that first, and then review and merge this separately for a following version.

I likely won’t be able to review this properly over the next few days, but I’ll get back to it as soon as I can. Sorry for the delay, and thanks again for the contribution!

A couple of quick notes for now:

  • Could you please remove the PrecompileTools dependency and the @compile_workload part from this PR? I may consider PrecompileTools in the future, but when I tried it before, the caching behavior felt a bit clunky, so I’d rather keep it out of this PR for now.

  • We use Runic.jl for the formatting

  • It looks like the CI is currently failing. Could you please check that when you have a chance?

  • Since this PR is quite large, would it be okay if I make some small edits directly during the review, if needed?

@twilsonco

twilsonco commented May 11, 2026

Copy link
Copy Markdown
Author

Hi. Great project here. I'm happy to contribute.

This is a fairly large change, so I’ll need some time to digest it and review it properly.

Yes. Fortunately, all the changes are self-contained (except for the addition of PrecompileTools).

I’m also in the middle of major refactoring for the next minor release, so I’ll probably finish that first, and then review and merge this separately for a following version.

That's fine. Once you're done with the refactor, comment hear and I'll make any necessary changes to the PR.

Could you please remove the PrecompileTools dependency and the @compile_workload part from this PR? I may consider PrecompileTools in the future, but when I tried it before, the caching behavior felt a bit clunky, so I’d rather keep it out of this PR for now.

I can, but in my testing, I get a 2x speedup from PrecompileTools, so it would be a big hit to the performance of PHS, which is already considerably slower than the other methods. I'll revisit this and see if I can achieve the same performance without it. It's removed.

We use Runic.jl for the formatting

I'll make sure it's applied.

It looks like the CI is currently failing. Could you please check that when you have a chance?

Yes. It looks like it's the formatting that's the failing step.

Since this PR is quite large, would it be okay if I make some small edits directly during the review, if needed?

Once the PR is ready again after you finish the refactor, I'll resubmit. At that point, of course you're welcome to make edits.

@twilsonco twilsonco marked this pull request as draft May 11, 2026 16:18
twilsonco and others added 25 commits May 11, 2026 10:39
Implements C2-continuous blended PHS interpolation via local RBF stencils
with polynomial augmentation. Zero-allocation hot path via AdaptiveArrayPools,
multithreaded batch evaluation, full first+second derivative support.

New files in src/phs/:
- phs_kernels.jl: radial kernel phi(r)=r^K, blend weight w(d,a) with derivs
- phs_stencil.jl: stencil selection + Phi-inv precomputation (boundary-safe)
- phs_types.jl: PHSInterpolantND struct + PHSLogTransform
- phs_eval.jl: 3-layer evaluation engine (value/deriv1/deriv2, quotient rule)
- phs_interpolant.jl: constructor + callable protocol
- phs_oneshot.jl: one-shot scalar and batch forms
- phs.jl: aggregator include

New test file: test/test_phs_nd.jl (115 tests, all passing)

Exports added: phs_interp, phs_interp!, PHSInterpolantND, PHSLogTransform
For r^K PHS the augmentation must have degree ≥ (K-1)÷2:
- K=3 → linear (N+1 terms) — was already correct
- K=5 → quadratic C(N+2,2) terms — was WRONG (only used N+1)
- K=7 → cubic C(N+3,3) terms — was WRONG

Changes:
- phs_kernels.jl: add _phs_n_poly, _phs_all_exponents,
  @generated _phs_poly_exps_tuple, _phs_eval_poly,
  _phs_eval_poly_deriv1, _phs_eval_poly_deriv2
- phs_stencil.jl: _phs_build_phi_inv now computes poly_deg=(degree-1)÷2,
  uses _phs_all_exponents for C block; M = ns + binomial(N+poly_deg,poly_deg)
- phs_eval.jl: all three _phs_eval_coeffs_* use _phs_poly_exps_tuple
  for correct compile-time polynomial evaluation; buffer size M
  taken from actual phi_inv matrix size instead of hardcoded ns+N+1
- scripts/update_readme_2d_comparison.jl: drop PeriodicBC from cubic
  for fairer comparison; use PHS degree=3

All 115 PHS tests pass.
Use 15x15 regular spacing so PHS (0.3% error) and cubic with
PeriodicBC (0.0% error) both show near-perfect results, while
constant (18%) and linear (3.5%) illustrate the progression.
…d PHS interpolation title

Co-authored-by: Copilot <copilot@github.com>
…lations in density comparison

Co-authored-by: Copilot <copilot@github.com>
This reverts commit 348e7b3.
…plot example

- Create docs/src/interpolation/phs.md with mathematical foundation, usage, and quantum chemistry example
- Link to paper (https://doi.org/10.1063/5.0090232)
- Add auto-download for critic2 wavefunction files in phs_density_comparison.jl
- Add scripts/dat/wfc/ to .gitignore to avoid committing large downloaded files
- Remove redundant examples/promolecular_density.jl (consolidate with validation script)
- Update README link to point to full PHS documentation
The ensure_wfc_files() function had an invalid 'import JSON' statement inside it,
which causes a syntax error in Julia. Fixed by switching to regex-based URL parsing
instead of JSON parsing. This removes the JSON.jl dependency while maintaining
full functionality for downloading wavefunction files.
@codecov

codecov Bot commented May 11, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 98.58044% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.75%. Comparing base (86d435d) to head (450ce04).

Files with missing lines Patch % Lines
src/phs/phs_eval.jl 97.98% 18 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #136      +/-   ##
==========================================
+ Coverage   96.56%   96.75%   +0.18%     
==========================================
  Files         148      154       +6     
  Lines       12272    13540    +1268     
==========================================
+ Hits        11851    13101    +1250     
- Misses        421      439      +18     
Files with missing lines Coverage Δ
src/FastInterpolations.jl 100.00% <ø> (ø)
src/phs/phs_interpolant.jl 100.00% <100.00%> (ø)
src/phs/phs_kernels.jl 100.00% <100.00%> (ø)
src/phs/phs_oneshot.jl 100.00% <100.00%> (ø)
src/phs/phs_stencil.jl 100.00% <100.00%> (ø)
src/phs/phs_types.jl 100.00% <100.00%> (ø)
src/phs/phs_eval.jl 97.98% <97.98%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@twilsonco

Copy link
Copy Markdown
Author

Note that the missing lines in test coverage CodeCov is reporting are false-positives. The PR has 100% coverage.

mgyoo86 added 9 commits June 10, 2026 20:43
Add test/test_phs_broken_pins.jl with 20 @test_broken pins for issues
found in the PR ProjectTorreyPines#136 PHS review. Each pin flips to an Unexpected Pass
once the underlying bug is fixed, signalling promotion to @test, so
follow-up PRs have ready-made TDD targets. All 20 currently record as
Broken (verified: 0 Error), so the suite stays green.

Value pins (correct value known):
- node-derivative d≈0 bug: 1D, 2D, and log-transform path (drops the
  w·f' term at a grid node)
- gradient/hessian/laplacian agreement with the deriv keyword path
- Complex-valued data evaluation
- Clamp/Wrap extrapolation returning the boundary/wrapped value
- non-uniform grid node + linear reproduction
- derivative order >= 3 matching a finite-difference reference
  (nonzero: PHS is transcendental via the exp blend, not polynomial)
- 1D bare-vector construction phs_interp(x, y)

Throw pins (refusal is the intended fix):
- batch out-of-domain query under NoExtrap
- blend_factor validation
- log-transform positivity (reject non-positive data)

Tracking pin:
- ExtendExtrap far-OOB collapse to 0.0 (compact-support blend)

Add PHSBrokenHelpers test snippet (is_throwing) to test/setup.jl.
Move the PHS density-comparison scripts and their data out of the shared
scripts/ root into a dedicated scripts/phs/ folder:

- scripts/phs_density_comparison.jl          -> scripts/phs/
- scripts/phs_density_comparison_simplified.jl -> scripts/phs/
- scripts/dat/                               -> scripts/phs/dat/

Resolve the .pkl/.csv/.xyz data paths via @__DIR__ instead of a CWD-relative
"dat/...", so the scripts run from any working directory after the move.

The wfc/ wavefunction files keep auto-downloading from critic2 (verified:
ensure_wfc_files URLs return HTTP 200) and stay gitignored. The DFT grid
(.pkl) and line cut (.csv) have no public download source and are required to
run the script, so they remain committed under scripts/phs/dat/ (relocated as
renames — no new blobs).

Update docs/src/interpolation/phs.md run instructions to the new path and
point the source link at master instead of the stale feat branch. The
non-PHS update_readme_2d_comparison.jl and the shared scripts/Project.toml
stay at scripts/ root.
The plot output path was CWD-relative ("../docs/images/...") and broke once
the scripts moved to scripts/phs/. Resolve it through @__DIR__ like the data
paths, so the figure lands in docs/images/ regardless of working directory.

Verified by running scripts/phs/phs_density_comparison_simplified.jl end to
end (with the local dev FastInterpolations): wfc auto-download, .pkl/.csv/.xyz
load, PHS-3 build, and the benchmark/profile all succeed from the new layout.
Correct factually-wrong comments/docstrings found while auditing src/phs
against the actual code. Comment-only (plus removing one dead constant);
no behavior change, package precompiles cleanly.

Threading / memory locality:
- The coefficient cache is per-thread (a Vector{Dict} indexed by
  Threads.threadid()), not task-local: drop the false "task_local_storage"
  and "objectid sub-Dict" claims and the dead _PHS_COEFF_CACHE_TKEY constant.
- AdaptiveArrayPools scratch buffers are task-local (task_local_storage),
  not thread-local: fix the two comments that mislabeled them.
- Drop the "immutable after construction / all workspace in pools" claim
  (the coeff cache is a mutated struct field).

Math / structure:
- Φ is symmetric but indefinite (saddle-point), not positive definite.
- Kernel guards fire at r ≤ 0 (r <= zero(T)), not at a nonexistent r ≤ ε;
  for K=1 it is φ'(r)/r that is singular, while φ'(0)=1 is finite.

Out-of-sync API/struct docs:
- Remove the nonexistent `spacings` field from the struct docstring.
- Fix the _phs_build_stencil signature (no `spacings` argument).
- Correct the boundary left-clip formula to include stencil_lo[d].
- Drop the false "cache always built for stencil_size<=6, N<=3" guarantee
  (it is skipped when the estimate exceeds the 100 MB limit).
- The one-shot batch builds a full temporary interpolant, not just the
  output vector.
- `_phs_blend_params` computes the mean grid spacing per axis, not the max.
The PHS interpolation page (docs/src/interpolation/phs.md) was never added
to the `pages` list, so it built as an orphan and was unreachable from the
site navigation. Add it under the Interpolation section, after Local Cubic
Hermite.
@mgyoo86

mgyoo86 commented Jun 16, 2026

Copy link
Copy Markdown
Member

Hi @twilsonco,

Thank you again for this valuable contribution,
and I apologize for the very late review — I have been busy with work lately.

First, the PHS code is nicely done.
It is well separated as its own path, which made it easy to review and work with.

Small fixes I made

  • Merged the latest master, so the branch is up to date.
  • Connected the PHS page to the docs build, moved the comparison scripts into scripts/phs/, and fixed some wrong text in the README and docstrings.

Some issues I noticed

The core interpolation works well.
But there are still a few open issues, mostly in APIs that are not fully supported yet (some extrapolation modes, non-uniform grids, a few derivative cases, etc.).
The risky part is that some of these are silent — they return a wrong value with no error.
I added them as @test_broken tests in test/test_phs_broken_pins.jl, so they are tracked and easy to turn into real tests later when each one is fixed.

Plan
I do not think we need to merge this now.
First, I will merge other compatibility PRs, and release the new version shortly.
After that, I will merge this PR and open following PRs for a later release.

I don't think we need to fix all the broken tests at once.
I'll mainly focus on blocking the silent bugs with clear error messages, so nothing returns a wrong value quietly.
Then it should be safe to release, and others can be fixed/improved in the future releases.

@mgyoo86 mgyoo86 closed this Jun 16, 2026
@mgyoo86 mgyoo86 reopened this Jun 16, 2026
@mgyoo86 mgyoo86 marked this pull request as ready for review June 16, 2026 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants