Skip to content

Detect OSV degenerate tag matches via canary query#50

Open
andrew wants to merge 1 commit intomainfrom
osv-canary-detection
Open

Detect OSV degenerate tag matches via canary query#50
andrew wants to merge 1 commit intomainfrom
osv-canary-detection

Conversation

@andrew
Copy link
Copy Markdown
Collaborator

@andrew andrew commented Apr 13, 2026

Investigating #41 turned up a permanent OSV quirk: normalize_tag collapses any string with no decimal digit to "", which then fuzzy-matches any digit-free tag sitting in a CVE's affected-versions list. The icu repo has tags like last-cvs-commit and latest, so querying for "deadbeef" returns seven 2017-era CVEs while "deadbeef0" returns zero. Filed upstream as google/osv.dev#5230.

This PR sends a digit-free canary version alongside each real query in the batch. If real and canary return identical non-empty result sets, OSV did not meaningfully resolve the real tag — skip that formula with a stderr warning instead of reporting whatever the degenerate match produced. Cost is doubling the query count inside the same HTTP request; batch slicing now cuts at 500 packages instead of 1000.

Also adds regression tests for the local affects_version? filter using real-shaped icu CVE records (CVE-2016-6293, OSV-2023-1328), batch index alignment, and slicing at the 500-package boundary.

The reporter's specific #41 case appears to have been transient bad data from google/osv.dev#5103 — every flagged CVE was re-indexed in the days after that PR landed. The canary doesn't defend against bad upstream affected.versions content, but the local-filter regression tests pin what we can verify.

OSV's normalize_tag collapses any string with no decimal digit to "",
which then fuzzy-matches any digit-free tag in a CVE's affected list
(icu has tags like "last-cvs-commit" and "latest"). Send a digit-free
canary alongside each real query; if both return identical non-empty
sets, OSV did not meaningfully resolve the real tag. Skip and warn.

Also adds regression tests pinning affects_version? against real-shaped
icu CVE records, and verifies batch slicing at the 500-package boundary.

See #41 and google/osv.dev#5230.
Copilot AI review requested due to automatic review settings April 13, 2026 15:09
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a defensive “canary” query to OSV batch requests to detect OSV’s degenerate digit-free tag normalization behavior (issue #41 / google/osv.dev#5230) and avoids reporting misleading fallback matches, alongside new regression tests to pin local version-filter behavior and batch alignment.

Changes:

  • Interleave a digit-free canary version query with each real OSV batch query and skip results when OSV appears to have fallen back.
  • Reduce effective batch slicing to 500 packages per request (to keep within OSV’s 1000-query batch limit after doubling).
  • Add regression tests for affects_version? behavior on real-shaped ICU OSV records and for batch interleaving/slicing/index alignment.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
lib/brew/vulns/osv_client.rb Interleaves canary queries into query_batch and applies skip logic when real+canary results match.
test/brew/test_osv_client.rb Adds tests for canary interleaving, skip behavior, set comparison, and 500-package slicing/index alignment.
test/brew/test_vulnerability.rb Adds regression tests for affects_version? using ICU-shaped OSV/CVE records and separator normalization guard.
test/brew/test_cli.rb Adds end-to-end regression tests covering local filtering and batch request payload/index alignment invariants.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +63 to +70
real_ids = (real["vulns"] || []).map { |v| v["id"] }.sort
canary_ids = (canary["vulns"] || []).map { |v| v["id"] }.sort

if !canary_ids.empty? && real_ids == canary_ids
warn "Warning: OSV could not resolve #{pkg[:name] || pkg[:repo_url]} " \
"tag #{pkg[:version].inspect}; skipping (results matched " \
"default-branch fallback)"
results[global_idx] = []
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The canary-skip heuristic currently triggers for any version when the real and canary vuln ID sets match. That can produce false negatives for normal digit-bearing tags (e.g., if a vuln truly affects all versions, or if the tag legitimately has the same vuln set as default-branch/fallback), since it will be skipped even though OSV may have resolved the tag correctly. Consider gating the canary comparison (and ideally the extra canary query) to only run when the real version string contains no decimal digits, which is the specific upstream degeneracy described in the comment above CANARY_VERSION.

Copilot uses AI. Check for mistakes.
Comment on lines 55 to +62
response = post("/querybatch", { queries: queries })
batch_results = response["results"] || []

batch_results.each_with_index do |result, idx|
global_idx = batch_idx * BATCH_SIZE + idx
results[global_idx] = result["vulns"] || []
batch.each_with_index do |pkg, idx|
global_idx = batch_idx * slice_size + idx
real = batch_results[idx * 2] || {}
canary = batch_results[idx * 2 + 1] || {}

Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

query_batch assumes OSV returns a results array aligned 1:1 with the submitted queries. If the API returns fewer/more entries (or nils), the idx*2 indexing will silently drop or misattribute vulnerabilities. Consider validating that response["results"] is an Array of the expected length (queries.length) and raising ApiError (or at least warning) when it isn’t, to avoid producing incorrect scans.

Copilot uses AI. Check for mistakes.
Comment on lines +111 to +116
packages = [{ repo_url: "https://github.com/a/a", version: "v9.9.9", name: "a" }]

err = capture_stderr { @results = @client.query_batch(packages) }

assert_equal [[]], @results
assert_match(/OSV could not resolve a tag "v9.9.9"/, err)
Copy link

Copilot AI Apr 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses a digit-bearing version ("v9.9.9"), but the upstream OSV degeneracy described for the canary is specifically about digit-free tags collapsing to "". Using a digit-free version here (and/or asserting that the canary comparison is only applied to digit-free versions) would better match the intended failure mode and avoid baking overly-broad skipping behavior into the test suite.

Suggested change
packages = [{ repo_url: "https://github.com/a/a", version: "v9.9.9", name: "a" }]
err = capture_stderr { @results = @client.query_batch(packages) }
assert_equal [[]], @results
assert_match(/OSV could not resolve a tag "v9.9.9"/, err)
packages = [{ repo_url: "https://github.com/a/a", version: "v", name: "a" }]
err = capture_stderr { @results = @client.query_batch(packages) }
assert_equal [[]], @results
assert_match(/OSV could not resolve a tag "v"/, err)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants