
Name the contested case in extension naming and pin the current contract#480

Merged
xroche merged 2 commits into master from naming-contract
Jul 4, 2026

xroche (Owner) commented Jul 4, 2026

Refactor plus test coverage for the extension-naming decision, no behavior change. The wire-type-vs-extension choice in htsname.c was a single opaque function with early returns; it becomes an explicit three-way verdict (wire_ext_verdict: extension kept / wire wins / contested), naming the case where a specific declared type disagrees with a specific URL extension. A contested verdict trusts the wire, as today.
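As an illustration of the three-way verdict described above, here is a minimal sketch. It is not the code in htsname.c: every identifier (`sketch_verdict`, `sketch_ext_type`, `sketch_classify`, `sketch_use_wire`), the tiny extension table, and the rule that a generic or missing wire type keeps the extension are assumptions made for the example. Only the three outcomes and the "contested trusts the wire" policy come from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical three-way verdict, mirroring the outcomes named in the PR. */
typedef enum {
  VERDICT_EXT_KEPT,  /* the URL extension is kept */
  VERDICT_WIRE_WINS, /* the declared Content-Type decides the name */
  VERDICT_CONTESTED  /* specific type disagrees with specific extension */
} sketch_verdict;

/* Assumed lookup: MIME type implied by an extension, NULL when unknown. */
const char *sketch_ext_type(const char *ext) {
  static const struct { const char *ext; const char *mime; } table[] = {
    { "jpg", "image/jpeg" }, { "png", "image/png" }, { "html", "text/html" }
  };
  size_t i;
  assert(ext != NULL);
  for (i = 0; i < sizeof table / sizeof table[0]; i++) {
    if (strcmp(ext, table[i].ext) == 0)
      return table[i].mime;
  }
  return NULL;
}

/* Assumed notion of a non-specific declared type. */
int sketch_is_generic(const char *mime) {
  return mime == NULL || mime[0] == '\0' ||
         strcmp(mime, "application/octet-stream") == 0;
}

/* Classify instead of returning early with a bare yes/no. */
sketch_verdict sketch_classify(const char *ext, const char *wire_mime) {
  const char *ext_mime = sketch_ext_type(ext);
  if (sketch_is_generic(wire_mime))
    return VERDICT_EXT_KEPT;   /* nothing specific to argue with */
  if (ext_mime == NULL)
    return VERDICT_WIRE_WINS;  /* extension says nothing specific */
  if (strcmp(ext_mime, wire_mime) == 0)
    return VERDICT_EXT_KEPT;   /* both sides agree */
  return VERDICT_CONTESTED;    /* two specific claims disagree */
}

/* Policy pinned by this PR: a contested verdict trusts the wire. */
int sketch_use_wire(sketch_verdict v) {
  return v != VERDICT_EXT_KEPT;
}
```

Under these assumptions, `sketch_classify("jpg", "image/png")` is contested, and `sketch_use_wire` still resolves it in favour of the wire. Keeping the policy in its own function means a later tie-break only has to change how the contested case is resolved.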

The naming contract was untested and largely implicit; it is now pinned so any future change has to flip a test on purpose. -#test=savename gains body= (leading body bytes) and cached= (a real one-entry cache, reopened read-only, whose stored body is deliberately PNG magic), with rows asserting that naming depends only on headers, never on body content or on the previously recorded save name. New e2e fixtures (wrongtype.jpg served as image/png, a gzip variant, a 16 KiB body, content that changes between crawls) pin the wire-wins outcome across fresh and update passes.

xroche and others added 2 commits July 4, 2026 08:58
Behavior-preserving refactor of wire_patches_ext: the decision becomes
a three-way wire_ext_verdict (ext kept / wire wins / contested), with
the contested case, a specific declared type disagreeing with a
specific URL extension, named explicitly instead of falling through.
Today a contested verdict trusts the wire, unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
…aming

-#test=savename gains body= (leading body bytes via a temp url_sav
file) and cached= (a real one-entry cache, reopened read-only, whose
stored body is PNG magic); new rows and 01_zlib-savename-cached.test
pin that naming never depends on content or on the previously recorded
save name, only on headers. e2e fixtures (wrongtype.jpg served as
image/png, a gzip variant, a 16 KiB body, content that changes between
crawls) pin the wire-wins outcome across fresh and update passes. Any
future content-based tie-break must flip these rows explicitly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
xroche merged commit 11beef5 into master Jul 4, 2026
15 checks passed
xroche added a commit that referenced this pull request Jul 4, 2026
A URL whose extension maps to a specific type but is served with a
disagreeing specific Content-Type was always renamed after the wire
(photo.jpg served as image/png became photo.png). The contested
verdict (#480) is now settled by the leading body bytes: magic proving
the extension's own type keeps it, anything inconclusive trusts the
wire as before, and the #267 soft-404 guard is unchanged.

New htssniff.c covers the magic-sniffable part of the supported MIME
set (images, A/V containers by RIFF subtype and ftyp brand, zip/OLE
document containers, archives, fonts, conservative text prefixes).
hts_wait_delayed waits for a sniffable head (or EOF) only on contested
verdicts; the head is read from the live backing slot (memory,
url_sav, or the compressed-stream tmpfile, inflated in memory). Update
runs never re-read bytes: they reproduce the previous run's verdict
from the recorded X-Save name (cache_read_including_broken grows a
return_save), so names never churn across updates or upgrades.
Non-delayed mode never sniffs; its HEAD probe has no body on the
first run. Also unlock the waiter's slot on the user-cancel abort.

Tests flip the #480 contract pins to the sniffed outcomes (wrongtype/
bigtype/packed/mutant keep their extension, lie.png stays png), add
-#test=sniff table rows, and pin the recorded-verdict proxy in
01_zlib-savename-cached (kept out of the MSan job: uninstrumented
zlib). All discriminate against the pre-sniff binary.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
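
The tie-break described in the commit message above (magic that proves the extension's own type keeps the extension, anything inconclusive trusts the wire) can be sketched as follows. This is a hedged illustration, not the contents of htssniff.c: `sketch_sniff` and `sketch_keep_extension` are hypothetical names, and only three well-known image signatures are covered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sniffer: returns a MIME type proven by the leading bytes,
   or NULL when the head is too short or matches no known signature. */
const char *sketch_sniff(const unsigned char *head, size_t len) {
  static const unsigned char png[] = { 0x89, 'P', 'N', 'G',
                                       0x0D, 0x0A, 0x1A, 0x0A };
  static const unsigned char jpg[] = { 0xFF, 0xD8, 0xFF };
  assert(head != NULL || len == 0);
  if (len >= sizeof png && memcmp(head, png, sizeof png) == 0)
    return "image/png";
  if (len >= sizeof jpg && memcmp(head, jpg, sizeof jpg) == 0)
    return "image/jpeg";
  if (len >= 6 && (memcmp(head, "GIF87a", 6) == 0 ||
                   memcmp(head, "GIF89a", 6) == 0))
    return "image/gif";
  return NULL; /* inconclusive */
}

/* Resolve a contested verdict: returns 1 to keep the URL extension,
   0 to trust the wire. Only positive proof of the extension's own
   type keeps it; everything else falls back to the wire. */
int sketch_keep_extension(const char *ext_mime,
                          const unsigned char *head, size_t len) {
  const char *sniffed = sketch_sniff(head, len);
  return sniffed != NULL && ext_mime != NULL &&
         strcmp(sniffed, ext_mime) == 0;
}
```

In this sketch, a body that starts with JPEG magic under a .jpg URL keeps the extension even if it is served as image/png, while an empty or unrecognized head leaves the wire's answer in place.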