Removes unnecessary pixelpipe cache Invalidation after in-place color space conversion#21251
Open
masterpiga wants to merge 1 commit into
Open
Removes unnecessary pixelpipe cache Invalidation after in-place color space conversion#21251masterpiga wants to merge 1 commit into
masterpiga wants to merge 1 commit into
Conversation
…space conversion.
Member
|
To me this is not a "fix" so not for 5.6. I do prefer some speed penalty than some unwanted behavior on the pixelpipe. |
Collaborator
|
The uncommon |
Collaborator
Author
|
Let's wait until after the release, then. With this and #21243 I get some nice responsiveness improvements. |
Collaborator
|
See #21274 ... also explaining why we had these invalidations. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I went down a bit of a rabbit hole trying to improve the responsiveness of interactive edits, and I stumbled upon these cache invalidations. The analysis is from Claude, and it seems correct to me. @jenshannoschwalm, you certainly know better. Would you please take a look?
Pixelpipe Cache Invalidation After In-Place Colorspace Conversion
Background
Before a module processes its input, darktable checks whether the buffer's current
colorspace (
cst_from = input_format->cst) matches the module's required inputcolorspace (
cst_to = module->input_colorspace(...)). When they differ, the bufferis converted in-place — the same allocation serves as both source and destination —
to avoid a second large allocation and a full buffer copy.
Two commits by Hanno introduced cache invalidation after this conversion:
ba97902e6f_pixelpipe_process_on_CPU()9fbdc4b51e_dev_pixelpipe_process_rec()Both added
dt_dev_pixelpipe_invalidate_cacheline(pipe, input)immediately after thein-place transform, with the comment:
Both also proposed the correct fix: "implement code for the pipe cache that modifies
the data cst."
The Three Processing Paths
The pixelpipe has three distinct code paths for module execution. The cache
invalidation bug only affected two of them.
Path 1 — Pure OpenCL (no bug, never touched)
When a module runs entirely on the GPU, the colorspace conversion is performed on a
GPU buffer (
cl_mem_input), never on the CPU cache buffer. The codebase tracks theevolving GPU-side colorspace in a local shadow variable
input_cst_cl,initialised from
input_format->cstat the start of the OpenCL block and thenupdated independently:
The CPU cache is synchronised only at explicit copy-back points, when the GPU buffer
is downloaded to host memory:
At those points the CPU buffer genuinely holds the converted data and the cache
descriptor is written to match. If the GPU input is never copied back (the common
case when
important_cl_inputis false), the CPU cache retains its pre-conversiondata in
cst_fromwithdsc.cst = cst_from— also self-consistent; a future cachehit for the same hash will trigger a fresh in-place conversion on the next pass, which
is the correct behaviour.
No invalidation was ever added to this path, and none is needed. The FIXME comments
themselves stated this correctly:
Path 2 — CPU processing (bug fixed in
ba97902e6f)_pixelpipe_process_on_CPU()handles both direct CPU processing and CPU tiling.The in-place conversion mutates the cached CPU buffer and writes the resulting
colorspace back through
&input_format->cst. Commitba97902e6fadded thenow-removed invalidation here.
Path 3 — OpenCL-tiling fallback (bug fixed in
9fbdc4b51e)When
module->process_tiling_clis attempted but fails, the code falls back to CPUtiling. Before handing control to the CPU tiler it performs an in-place colorspace
conversion on the CPU buffer and writes back through
&input_format->cst. Commit9fbdc4b51eadded the now-removed invalidation here.The Cache Pointer Mechanism
The claim that "the cacheline cst does not reflect the module input colorspace any
more" rests on a misreading of how
input_formatis wired to the cache.dt_dev_pixelpipe_cache_getis the function that hands out input buffers. In boththe cache-hit path and the cache-miss (new allocation) path it redirects the caller's
pointer to point directly into the live cache descriptor array:
The caller passes
&input_format, so after the call,input_formatis no longer apointer to a local struct — it points directly into
cache->dsc[k]. Every fieldwrite through
input_formatis a direct write into the live cache descriptor.dt_ioppr_transform_image_colorspacetakes&input_format->cstas itsconverted_cstoutput parameter and writes the resulting colorspace back through it:Both inner functions set
*converted_cst = cst_tobefore doing any work. They resetit to
cst_fromonly when the requested conversion is an outright invalid pair(e.g.
IOP_CS_RAW ↔ IOP_CS_RGB), in which case the buffer is left completelyunchanged — a self-consistent no-op. There is no partial-conversion failure mode:
the image loops are pure arithmetic that either complete fully or do not run at all.
Consequence: after
dt_ioppr_transform_image_colorspacereturns,cache->dsc[k].cstalready holds the correct value for the buffer's actual content.The cache descriptor was never stale. Both the data and the metadata were correct
before the invalidation line ran.
This mechanism was already in place when both FIXME commits were authored, confirming
that the invalidations were added under an incorrect assumption.
What
dt_dev_pixelpipe_invalidate_cachelineDoesIt sets the cacheline's hash to a sentinel that can never match any real pipeline
hash. The buffer allocation and the dsc struct (including its now-correct
cstfield) are left intact, but the slot can no longer be found by hash lookup. It
becomes a zombie: occupying a slot in the cache's fixed-size table, counted in
cache->linvalid, and unavailable to any future lookup until LRU eviction eventuallyrecycles it.
Performance Implications of the (Removed) Invalidation
Module N-1 never benefits from caching. On every pipeline pass, module N
performs an in-place conversion on the buffer that contains module N-1's output, then
immediately invalidates it. The next pass finds
DT_INVALID_HASHfor what should bea valid cached result, misses, and forces module N-1 to recompute from scratch. This
repeats indefinitely. Module N-1 gets zero reuse from the cache for the entire
lifetime of the session.
Zombie slot accumulation. Each pass produces a fresh zombie: the old cacheline
is invalidated, module N-1 runs again and is assigned a new cacheline (possibly the
same physical slot after LRU eviction, possibly a different one). The
cache->linvalidcounter grows proportionally to the number of modules that require acolorspace conversion — in a typical pipeline this is several modules. The effective
size of the reusable cache is reduced.
Interaction with the
importantflag. The pipeline marks certain input buffersas important (higher LRU priority) — the focused module's input, the last-changed
module's input, and modules with
IOP_FLAGS_WRITE_PIPECACHE. An important butinvalidated cacheline retains its priority weight but is permanently unreachable. It
both wastes a high-priority slot and fails to serve its purpose.
Cascading recompute cost. If module N-1 is expensive (a denoiser, a complex
tone-mapper), the forced recompute can be the dominant cost of a pipeline pass
triggered only by a downstream parameter change — a change that should have left
module N-1's output entirely untouched.
The Blend Conversion Edge Case
After module N processes its input a second in-place conversion may follow for blend
masking (pixelpipe_hb.c:1577):
This converts the buffer from
cst_to(the module's working space) toblend_cst(the blend operator's working space), again writing back through
&input_format->cst.After this,
cache->dsc[k].cst = blend_cstand the buffer content is inblend_cst.On the next pass (without invalidation), a hash lookup for module N-1's cacheline
hits, returns
blend_cstdata, and module N then observescst_from = blend_cst.This is a different starting point than the first pass, so the analysis splits by
case.
Common case —
blend_cst == cst_from: This is the overwhelmingly typicalconfiguration. Both are
IOP_CS_RGB, the universal default. The blend conversionis
IOP_CS_RGB → IOP_CS_RGB, a no-op. The cached buffer is left incst_to(fromthe module's working-space conversion), not further mutated. On the next pass,
cst_from = cst_to, no conversion is needed, and the buffer is used directly. Fullycorrect and maximally efficient.
Less common case —
blend_cst ≠ cst_from: Considercst_from = IOP_CS_RGB,cst_to = IOP_CS_LAB,blend_cst = IOP_CS_RGB. Pass 1 leaves the cache inIOP_CS_RGB(the blend conversion round-tripped back from LAB to RGB). Pass 2: thecache returns
IOP_CS_RGBdata, module N convertsRGB → LABand then blendsLAB → RGBagain. The buffer seen by module N on pass 2 isLAB(RGB(LAB(original))). With lossless conversions this is identical toLAB(original). With IEEE 754 float32 the error is in the 6th–7th significant digit— sub-pixel, sub-quantisation. Critically, it does not accumulate: after pass 2 the
cache stabilises in
IOP_CS_RGB(post-blend) and every subsequent pass follows thesame
RGB → LAB → RGBcycle, producing numerically identical results.The case
blend_cst ≠ cst_fromandblend_cst ≠ cst_to(three distinctcolorspaces) is theoretically possible but has no known practical instance in the
darktable module set: all blend modes operate in either
IOP_CS_RGBorIOP_CS_LAB,both of which coincide with at least one of the module's endpoint colorspaces in any
real pipeline.
Remaining Legitimate Invalidations
Two other
dt_dev_pixelpipe_invalidate_cacheline(pipe, input)calls remain in thefile and are both correct:
_module_pipe_stop): the pipe is aborting mid-pass; the buffer maybe partially written or in an inconsistent state and must not be reused.
valid_input_on_gpu_only): the input was computed entirely on theGPU and was never downloaded to host memory; the CPU cache slot holds stale data
from a previous pass and must be poisoned.
These were not changed.
Summary
blend_cst == cst_from, common)blend_cst ≠ cst_from, uncommon)The invalidations were introduced under the belief that the cache descriptor was a
stale copy that needed explicit fixup. That belief was incorrect:
input_formatis apointer directly into
cache->dsc[k]in both the hit and miss paths ofdt_dev_pixelpipe_cache_get, so the transform function's write-back via&input_format->cstalready keeps the cache self-consistent. Removing theinvalidations restores correct caching behaviour with no correctness cost.