crowdstrike: remove FDR ingest-time cache processor in favor of LOOKUP JOIN#19434

Draft
kcreddy wants to merge 3 commits into
elastic:mainfrom
kcreddy:crowdstrike-remove-cache

@kcreddy kcreddy commented Jun 8, 2026

Proposed commit message

crowdstrike: remove FDR ingest-time cache processor in favor of LOOKUP JOIN

Remove the Elastic Agent file-backed cache processor that enriched
FDR data events with host (aidmaster) and user (userinfo) metadata
at ingest time. Query-time enrichment via continuous transforms and
ES|QL LOOKUP JOIN, added in #17877 and #18694, replaces this
mechanism.

The cache approach had three operational problems: its agent-local
scope meant metadata was not shared across agents; its ordering
dependency required metadata files to be processed before data files
within an SQS batch; and the default setting (keep_metadata: false)
dropped aidmaster and userinfo events, starving the LOOKUP JOIN
transforms of source data.

Agent stream templates (aws-s3.yml.hbs, stream.yml.hbs):
- Remove the entire {{#if enrich_metadata}} cache block including
  decode_json_fields, cache put/get, conditional drop_event, and
  drop_fields processors.
- Remove metadata file sorting (files.sort) from the FDR queue
  branch of the SQS notification parsing script. The script itself
  is retained because the aws-s3 input bypasses native SQS format
  autodetection when a custom script is defined, and the FDR queue
  format is not natively supported.
- Update inline test() to remove sort-order assertions.
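
For illustration only, the FDR queue branch without sorting might reduce to something like the sketch below. The notification shape (`bucket`, `files[].path`), the sample paths, and the function name `parseFdrNotification` are assumptions rather than the integration's actual script. The real script builds its results through the aws-s3 input's scripting API, which is not reproduced here.

```javascript
// Assumed shape of an FDR SQS notification body; field names and paths
// are illustrative, not taken from the integration.
const sampleBody = JSON.stringify({
  bucket: "example-fdr-bucket",
  files: [
    { path: "data/batch-1/part-00000.gz" },
    { path: "fdrv2/aidmaster/part-00000.gz" },
    { path: "data/batch-1/part-00001.gz" },
  ],
});

// Hypothetical helper: map each listed file to a bucket/key pair,
// preserving the order in which the notification lists the files.
function parseFdrNotification(body) {
  const msg = JSON.parse(body);
  const files = Array.isArray(msg.files) ? msg.files : [];
  // No files.sort(): metadata files no longer need to come first.
  return files.map((f) => ({ bucket: msg.bucket, key: f.path }));
}

const objects = parseFdrNotification(sampleBody);
console.log(objects.map((o) => o.key));
```

Because nothing downstream depends on metadata being cached before data events arrive, the objects can be emitted in notification order.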

FDR data stream manifest (manifest.yml):
- Remove five aws-s3 variables: enrich_metadata, keep_metadata,
  metadata_ttl, metadata_cache_capacity, metadata_cache_write_interval.
- Remove three logfile variables: enrich_host_metadata,
  keep_metadata, metadata_ttl.

Ingest pipeline (default.yml):
- Remove the remove processor for metadata.host.aid and
  metadata.user.UserSid_readable, and the rename of metadata to
  crowdstrike.info.
- Remove all processors that read from crowdstrike.info.host.* and
  crowdstrike.info.user.* (ComputerName to host.name fallback,
  aip to host.ip, UserName/User to user.name/user.domain fallbacks,
  and related.user/related.hosts appends from cached fields). These
  become dead code on data events because crowdstrike.info is no
  longer populated by the agent.

System tests:
- test-default-config.yml: remove skip_transform_validation (transforms
  now have source data) and increase hit_count from 127 to 133
  (aidmaster/userinfo events are now indexed).
- Delete test-keep-metadata-config.yml (redundant without
  keep_metadata variable).

Documentation (docs/README.md, _dev/build/docs/README.md):
- Remove "Ingest-time versus query-time" sections and references to
  removed configuration variables.
- Remove ES 8.19 pre-release workaround notes for LOOKUP JOIN alias
  resolution (the package already requires kibana ^8.19.0 || ^9.0.0).

Package version bumped to 4.0.0 (breaking change).
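
The ordering dependency described in the commit message can be illustrated with a small simulation. This is a simplified sketch, not the integration's code: the event shapes, the `type` field, and both function names are hypothetical, and an in-memory `Map` stands in for both the agent cache and the lookup index.

```javascript
// Hypothetical batch: the aidmaster (host metadata) record arrives
// after the first data event that needs it.
const batch = [
  { type: "data", aid: "a1", event: "ProcessRollup2" },
  { type: "aidmaster", aid: "a1", ComputerName: "host-1" },
  { type: "data", aid: "a1", event: "DnsRequest" },
];

// Ingest-time enrichment: a data event only sees metadata that was
// cached before it was processed, so arrival order matters.
function enrichAtIngest(events) {
  const cache = new Map();
  const out = [];
  for (const e of events) {
    if (e.type === "aidmaster") {
      cache.set(e.aid, e.ComputerName);
    } else {
      out.push({ event: e.event, aid: e.aid, hostName: cache.get(e.aid) ?? null });
    }
  }
  return out;
}

// Query-time enrichment: all events are stored first and joined when
// queried, so arrival order does not matter.
function enrichAtQuery(events) {
  const lookup = new Map(
    events.filter((e) => e.type === "aidmaster").map((e) => [e.aid, e.ComputerName])
  );
  return events
    .filter((e) => e.type === "data")
    .map((e) => ({ event: e.event, aid: e.aid, hostName: lookup.get(e.aid) ?? null }));
}

const ingestResult = enrichAtIngest(batch);
const queryResult = enrichAtQuery(batch);
console.log(ingestResult.map((e) => e.hostName)); // [ null, 'host-1' ]
console.log(queryResult.map((e) => e.hostName)); // [ 'host-1', 'host-1' ]
```

In this sketch the first data event is left unenriched at ingest time, while the query-time join enriches both. The join only works if the metadata events are indexed, which is why dropping them by default starved the transforms.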

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.
  • I have verified that any added dashboard complies with Kibana's Dashboard good practices

How to test this PR locally

Default system tests pass after removing skip_transform_validation: true

--- Test results for package: crowdstrike - START ---
╭─────────────┬─────────────┬───────────┬───────────┬────────┬─────────────────╮
│ PACKAGE     │ DATA STREAM │ TEST TYPE │ TEST NAME │ RESULT │    TIME ELAPSED │
├─────────────┼─────────────┼───────────┼───────────┼────────┼─────────────────┤
│ crowdstrike │ fdr         │ system    │ default   │ PASS   │ 2m47.980883041s │
╰─────────────┴─────────────┴───────────┴───────────┴────────┴─────────────────╯
--- Test results for package: crowdstrike - END   ---
Done

@kcreddy kcreddy self-assigned this Jun 8, 2026
@kcreddy kcreddy added the breaking change, Integration:crowdstrike, and Team:Security-Service Integrations labels Jun 8, 2026

kcreddy commented Jun 8, 2026

/test

kcreddy commented Jun 8, 2026

/test

github-actions Bot commented Jun 8, 2026

✅ Elastic Docs Style Checker (Vale)

No issues found on modified lines!


The Vale linter checks documentation changes against the Elastic Docs style guide. To use Vale locally or report issues, refer to Elastic style guide for Vale.

@elastic-vault-github-plugin-prod

✅ All changelog entries have the correct PR link.

@elasticmachine

💚 Build Succeeded

History

cc @kcreddy

@kcreddy kcreddy marked this pull request as ready for review June 8, 2026 09:48
@kcreddy kcreddy requested review from a team as code owners June 8, 2026 09:48
@infra-vault-gh-plugin-prod

Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)

@kcreddy kcreddy requested a review from efd6 June 8, 2026 09:48
@andrewkroh andrewkroh added the documentation label Jun 8, 2026

@efd6 efd6 left a comment

I'd like to understand why this is being removed rather than being noted as deprecated first.

@kcreddy kcreddy marked this pull request as draft June 9, 2026 08:05
kcreddy commented Jun 9, 2026

> I'd like to understand why this is being removed rather than being noted as deprecated first.

Good point. I went back and re-read both RFCs — they explicitly state the query-time path is additive and that deprecation (not removal) is the next step. I've moved this PR back to draft.
