geotiff: surface VRT skipped sources as attrs['vrt_holes'] (#1734)#1747
Conversation
There was a problem hiding this comment.
Pull request overview
This PR improves xrspatial.geotiff.read_vrt’s default (lenient) behavior by surfacing machine-readable information about skipped/unreadable VRT sources, so downstream pipelines can detect partial mosaics without relying on warnings alone (issue #1734).
Changes:
- Add
VRTDataset.holesto record skipped source metadata (source,band,dst_rect,error) during lenient reads. - Extend the fallback warning to point users to
DataArray.attrs['vrt_holes']andXRSPATIAL_GEOTIFF_STRICT=1. - Propagate non-empty
vrt.holesto the publicread_vrtresult asattrs['vrt_holes'], plus add a new regression test module.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
xrspatial/geotiff/_vrt.py |
Records skipped-source “holes” during lenient VRT reads and enhances the warning message. |
xrspatial/geotiff/__init__.py |
Copies non-empty skipped-source records onto the returned DataArray as attrs['vrt_holes']. |
xrspatial/geotiff/tests/test_vrt_holes_attr_1734.py |
Adds regression tests asserting the new vrt_holes attribute behavior and warning messaging. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ' <VRTRasterBand dataType="Float32" band="1">\n' | ||
| ' <NoDataValue>-9999</NoDataValue>\n' |
There was a problem hiding this comment.
| missing_src = f'{tmp_path}/does_not_exist_1734_strict.tif' | ||
| _write_vrt_with_missing_source(vrt_path, missing_src) | ||
|
|
||
| with pytest.raises(Exception): |
There was a problem hiding this comment.
Tightened to pytest.raises(FileNotFoundError, match=...) against the filename. Confirmed locally that the strict path actually raises FileNotFoundError (an OSError subclass) -- _FileSource opens the file via open(real, 'rb'), so a missing local source is a plain FileNotFoundError that propagates through read_to_array. Fixed in 5915bd9.
The default lenient mode of read_vrt warns once per unreadable source
and continues, returning a mosaic with zero-filled holes that
downstream integer-VRT code cannot distinguish from real data. The
warning is easy to miss in a pipeline that filters UserWarnings.
Add a structured side-channel: VRTDataset gains a .holes list that
the read loop populates with {source, band, dst_rect, error} on each
skipped source. The public read_vrt copies a non-empty list into the
returned DataArray as attrs['vrt_holes'] so callers can detect
partial mosaics by a single attribute lookup.
The fallback warning message now points at attrs['vrt_holes'] or the
XRSPATIAL_GEOTIFF_STRICT env var, so the recovery path is
discoverable from a single captured warning.
Strict mode (XRSPATIAL_GEOTIFF_STRICT=1) is unchanged and still
raises.
Adds regression coverage in test_vrt_holes_attr_1734.py.
…rror Address PR #1747 review comments: * The helper that wrote the VRT used `dataType="Float32"` with a `NoDataValue`, so the missing-source hole came out as NaN, which is not the failure mode #1734 documents. Switch to `Int32` and drop the NoDataValue so the lenient path produces the zero-filled integer hole the attr is meant to surface. Add an explicit `(da.values == 0).all()` assert on the integer dtype so the regression is pinned, not just the attr surface. * The strict-mode test caught `Exception`, which would have happily swallowed any unrelated failure. The missing-source path raises `FileNotFoundError` (an `OSError` subclass) from `_FileSource` via `read_to_array`; assert that concrete class with a `match=` against the file basename so an unrelated read-path bug surfaces as a real test failure.
Summary
read_vrtwarns once per unreadable source and continues, leaving zero-filled holes in integer VRTs that downstream code cannot tell from real data.VRTDataset.holesrecords{source, band, dst_rect, error}per skipped source. Publicread_vrtcopies a non-empty list toattrs['vrt_holes']on the returned DataArray.attrs['vrt_holes']andXRSPATIAL_GEOTIFF_STRICT=1so the recovery path is discoverable from a single captured warning.Closes #1734.
Test plan
test_vrt_holes_attr_1734.pycovers: missing-source records on the attr, successful reads omit the key entirely, strict mode still raises, warning message names the attr or the env var.