Merged
1 change: 0 additions & 1 deletion README.md
@@ -2,7 +2,6 @@

[![build](https://github.com/databrickslabs/geobrix/actions/workflows/build_main.yml/badge.svg)](https://github.com/databrickslabs/geobrix/actions/workflows/build_main.yml)
[![codecov](https://codecov.io/gh/databrickslabs/geobrix/branch/main/graph/badge.svg)](https://codecov.io/gh/databrickslabs/geobrix)
[![docs](https://github.com/databrickslabs/geobrix/actions/workflows/doc-tests.yml/badge.svg)](https://github.com/databrickslabs/geobrix/actions/workflows/doc-tests.yml)
[![documentation](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://databrickslabs.github.io/geobrix/)
[![scala](https://img.shields.io/badge/scala-2.13-red.svg)](https://www.scala-lang.org/)
[![python](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/)
1 change: 1 addition & 0 deletions docs/docs/beta-release-notes.mdx
@@ -21,6 +21,7 @@ Released 2026-05-19. Per-version highlights; full migration tables are in the pe
- **EWKT / EWKB support for `rst_clip`.** `JTS.fromWKT` / `JTS.fromWKB` auto-detect EWKT/EWKB; new `JTS.toEWKT` / `JTS.toEWKB` helpers emit SRID-preserving forms. `rst_clip` reprojects the cutline when its SRID differs from the raster CRS, and falls back to the raster's CRS (Mosaic-compatible) when the SRID is `0` / unresolvable.
- **`rst_transform` rejects invalid SRIDs.** `targetSrid <= 0` and unresolvable EPSG codes now surface a clear error via tile metadata `error_message` instead of returning a raster with an uninitialized CRS.
- **`/vsimem/` path-handling hardening.** `rst_memsize` / `rst_unlink` / GDAL writer in-memory byte fetch now use `startsWith("/vsimem/")` (not `contains`) and null-check `GetMemFileBuffer`, so datasets whose description embeds the substring (e.g. NetCDF subdataset selectors) aren't mis-routed through the in-memory branch.
- **`tile.raster` bytes are always self-contained (no VRT payloads).** Three RasterX operations — `MergeRasters` (`gbx_rst_merge`, `gbx_rst_merge_agg`), `MergeBands` (`gbx_rst_frombands`), and `PixelCombineRasters` (`gbx_rst_derivedband`, `gbx_rst_derivedband_agg`, `gbx_rst_combineavg`, `gbx_rst_combineavg_agg`) — used to return tiles whose `metadata("driver")` claimed `VRT` even though the on-disk file was a materialized GTiff. That mis-tag propagated through `RasterDriver.writeToBytes` (which keys both the tempfile extension AND the `-of` flag in the inner `gdal_translate` call off `metadata.driver`), causing the serialized `tile.raster` payload to be VRT XML referencing a `/vsimem/` tempfile only reachable on the producing executor. Single-node testing passed by accident; multi-executor clusters hit `file not found` when the VRT was opened elsewhere. Fix: `GDALTranslate.executeTranslate` now records the **output** dataset's driver in its returned metadata (not the input's), and `RasterDriver.writeToBytes` defensively coerces VRT to GTiff on serialization + sniffs the result to refuse shipping VRT bytes. Regression coverage in [`RST_NoVrtPayloadTest`](https://github.com/databrickslabs/geobrix/blob/main/src/test/scala/com/databricks/labs/gbx/rasterx/expressions/RST_NoVrtPayloadTest.scala).
- **Scalar args without `f.lit(...)`.** Python wrappers auto-wrap `bool` / `int` / `float` / `bytes`; Scala adds typed overloads. SQL was already natively typed. String literals still need `f.lit(...)`, because pyspark treats a bare string as a column reference. Details and migration examples in [Scalar values vs `lit(...)` wrapping](#scalar-values-vs-lit-wrapping).
- **Example notebooks — EO Series, xView, and enablement diagrams.** New end-to-end walkthroughs under `docs/examples/` covering EO time-series, xView object-detection rasters, and RasterX architecture diagrams.
- **Supply-chain hardening (lockdown).** Jobs pinned to the Databricks-hardened runner group (org-level allowlist, ephemeral VMs, constrained secret access); every Maven dependency, transitive dep, plugin, and plugin dependency is PGP-verified against `.maven-keys.list` before any compile or test execution; pip and Maven routed through JFrog with OIDC; init script + pinned package versions vetted; new [Security](./security.mdx) page in the docs.
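The EWKT form that `JTS.fromWKT` now auto-detects is ordinary WKT with an `SRID=<code>;` prefix. The sketch below is a minimal pure-Python illustration of splitting and emitting that prefix, together with the `rst_clip` cutline rule from the first bullet. The function names are hypothetical, this is not GeoBrix's JTS code, and the "unresolvable SRID" case is not modelled because it needs a real CRS lookup.

```python
import re
from typing import Optional, Tuple

# Hypothetical helpers illustrating EWKT; not the GeoBrix JTS API.
# EWKT = optional "SRID=<int>;" prefix followed by ordinary WKT.
_SRID_PREFIX = re.compile(r"^\s*SRID=(-?\d+)\s*;", re.IGNORECASE)


def split_ewkt(text: str) -> Tuple[Optional[int], str]:
    """Return (srid, wkt); srid is None when `text` is plain WKT."""
    match = _SRID_PREFIX.match(text)
    if match is None:
        return None, text.strip()
    return int(match.group(1)), text[match.end():].strip()


def to_ewkt(srid: int, wkt: str) -> str:
    """Emit the SRID-preserving EWKT form of a WKT string."""
    return f"SRID={srid};{wkt}"


def cutline_needs_reprojection(cutline_srid: Optional[int], raster_srid: int) -> bool:
    """Reproject only when the cutline carries a usable SRID that differs
    from the raster's. A missing or 0 (or negative, an assumption here)
    SRID falls back to the raster CRS, so no reprojection is needed."""
    if cutline_srid is None or cutline_srid <= 0:
        return False
    return cutline_srid != raster_srid


srid, wkt = split_ewkt("SRID=4326;POLYGON((0 0, 1 0, 1 1, 0 0))")
print(srid, wkt)
print(cutline_needs_reprojection(srid, 32633))
```

Keeping the SRID parse separate from the reprojection decision mirrors the bullet's split between format detection (`fromWKT` / `fromWKB`) and the clip-time CRS check.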
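The `/vsimem/` hardening bullet turns on the difference between a substring test and a prefix test. A NetCDF subdataset selector such as `NETCDF:"/vsimem/tile.nc":temperature` contains `/vsimem/` without being an in-memory path itself. The sketch below shows both checks plus the null-checked byte fetch. `get_buffer` is a hypothetical stand-in for GDAL's `GetMemFileBuffer`, and the subdataset string is an illustrative example, not taken from GeoBrix.

```python
from typing import Callable, Optional

VSIMEM_PREFIX = "/vsimem/"


def routed_in_memory_old(description: str) -> bool:
    # Pre-fix behaviour: substring match anywhere in the description.
    return VSIMEM_PREFIX in description


def routed_in_memory(description: str) -> bool:
    # Post-fix behaviour: only real /vsimem/ paths take the in-memory branch.
    return description.startswith(VSIMEM_PREFIX)


def fetch_in_memory_bytes(
    description: str, get_buffer: Callable[[str], Optional[bytes]]
) -> Optional[bytes]:
    """Return the in-memory file's bytes, or None when the description is
    not a /vsimem/ path or the buffer lookup returns nothing (null-check)."""
    if not routed_in_memory(description):
        return None
    buffer = get_buffer(description)  # hypothetical stand-in for GetMemFileBuffer
    if buffer is None:
        return None
    return bytes(buffer)


subdataset = 'NETCDF:"/vsimem/tile.nc":temperature'
print(routed_in_memory_old(subdataset), routed_in_memory(subdataset))
```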
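The scalar-args bullet amounts to a type-dispatch rule in the Python wrappers. Below is a minimal sketch of that rule as described, assuming a hypothetical `coerce_arg` helper that takes the `lit` function as a parameter so it can be exercised without Spark. It is not GeoBrix's actual wrapper code.

```python
from typing import Any, Callable

# Types the release note says the Python wrappers auto-wrap. bool is listed
# explicitly for readability even though it already subclasses int.
AUTO_WRAPPED = (bool, int, float, bytes)


def coerce_arg(value: Any, lit: Callable[[Any], Any]) -> Any:
    """Wrap bare Python scalars with `lit`; pass everything else through.

    Strings are deliberately NOT wrapped: pyspark treats a bare str as a
    column name, so string literals still need an explicit lit(...).
    Existing Column objects also pass through unchanged.
    """
    if isinstance(value, AUTO_WRAPPED):
        return lit(value)
    return value
```

With Spark available you would pass `pyspark.sql.functions.lit` as `lit`, so `coerce_arg(3, F.lit)` yields a literal column while `coerce_arg("tile", F.lit)` stays a column name.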
6 changes: 6 additions & 0 deletions docs/docs/packages/rasterx.mdx
@@ -106,6 +106,12 @@ Functions to access raster properties and metadata:
- `gbx_rst_merge_agg` - Merge rasters with aggregation
- `gbx_rst_derivedband_agg` - Derived band aggregate

## Tile payload

Every RasterX function returns a tile whose `raster` field is a **self-contained, in-memory raster** (GTiff by default) — safe to serialize between Spark stages and executors, persist to Delta, hand off to `rasterio` / `gdal`, or write back out via the `gdal` writer. The bytes are never an XML reference to a per-executor `/vsimem/` tempfile or to a path that only exists on the producing node.

Functions that internally build via an intermediate VRT — `gbx_rst_merge`, `gbx_rst_merge_agg`, `gbx_rst_frombands`, `gbx_rst_combineavg`, `gbx_rst_combineavg_agg`, `gbx_rst_derivedband`, `gbx_rst_derivedband_agg` — materialize the result to GTiff before returning, so downstream stages on different executors see real raster bytes. Inspect a tile's payload format from `tile.metadata.driver`; for any of the functions above, it will read `GTiff` (not `VRT`). See [Beta Release Notes](../beta-release-notes#whats-new-in-v030) for the v0.3.0 correctness fix that introduced this invariant.
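The invariant above can be spot-checked from the metadata and from the bytes themselves. A VRT payload is XML beginning with a `<VRTDataset` element, while a GTiff starts with a TIFF byte-order mark and version (`II*\0` / `MM\0*`, or `II+\0` / `MM\0+` for BigTIFF). The sketch below is a hypothetical client-side check, not GeoBrix's own `writeToBytes` sniffing. It assumes a collected tile can be indexed like a mapping with a `raster` byte field and a `metadata` map, as a PySpark `Row` with a map-typed `metadata` field can.

```python
# TIFF and BigTIFF signatures, little- and big-endian.
_TIFF_MAGIC = (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+")


def payload_format(raster: bytes) -> str:
    """Classify raw tile bytes by their leading signature."""
    if raster[:4] in _TIFF_MAGIC:
        return "GTiff"
    if b"<VRTDataset" in raster[:512]:
        return "VRT"
    return "unknown"


def check_tile(tile) -> None:
    """Raise ValueError if a tile's metadata or bytes indicate a VRT payload.

    `tile` is assumed (hypothetically) to support tile["raster"] (bytes or
    bytearray) and tile["metadata"] (a dict with an optional "driver" key).
    """
    driver = tile["metadata"].get("driver")
    if driver == "VRT":
        raise ValueError("tile.metadata.driver is VRT; expected materialized raster bytes")
    if payload_format(tile["raster"]) == "VRT":
        raise ValueError("tile.raster holds VRT XML, not self-contained raster bytes")
```

Checking the bytes as well as the metadata matters because the v0.3.0 bug was exactly a mismatch between the two.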

## Usage Examples

### Python/PySpark
26 changes: 26 additions & 0 deletions notebooks/examples/eo-series/01. Search STACs.ipynb
@@ -47,6 +47,32 @@
"### Imports + Config"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"application/vnd.databricks.v1+cell": {
"cellMetadata": {
"byteLimit": 2048000,
"rowLimit": 10000
},
"inputWidgets": {},
"nuid": "03fc478a-2feb-457f-a21d-4327994d6304",
"showTitle": false,
"tableResultSettingsMap": {},
"title": ""
}
},
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings(\n",
" \"ignore\",\n",
" message=r\".*make_tokens_by_line.*received a list of lines.*\",\n",
" category=UserWarning,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 0,
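The cell added to `01. Search STACs.ipynb` relies on how `warnings.filterwarnings` treats `message`: it is a regular expression that must match the *start* of the warning text, case-insensitively, which is why the pattern opens with `.*`. The sketch below reuses the cell's pattern to show which warnings it silences. The sample messages are illustrative and not copied from IPython's actual warning text.

```python
import warnings

# Same pattern as the notebook cell.
PATTERN = r".*make_tokens_by_line.*received a list of lines.*"


def is_suppressed(message: str) -> bool:
    """Emit `message` as a UserWarning under the notebook's filter and
    report whether the filter swallowed it."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make every other warning visible
        # filterwarnings inserts at the front, so this rule wins over "always".
        warnings.filterwarnings("ignore", message=PATTERN, category=UserWarning)
        warnings.warn(message, UserWarning)
    return not caught


print(is_suppressed("`make_tokens_by_line` received a list of lines"))
print(is_suppressed("an unrelated UserWarning"))
```

`catch_warnings` restores the global filter list on exit, so the check does not leak the filter into the rest of the session, unlike the notebook cell, which installs it deliberately for the whole notebook.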
36 changes: 31 additions & 5 deletions notebooks/examples/eo-series/02. Download STACs.ipynb

Large diffs are not rendered by default.

257 changes: 148 additions & 109 deletions notebooks/examples/eo-series/03. Gridded EO Data.ipynb

Large diffs are not rendered by default.
