Wire JSON ingestion schema extension modules by jwils · Pull Request #1215 · block/elasticgraph

jwils · 2026-05-27T15:09:34Z

Why

Introduce the JSON ingestion schema-definition extension modules after the indexing extension points exist, but before making JSON ingestion the default implementation.

What

Add ElasticGraph::JSONIngestion::SchemaDefinition::APIExtension and supporting factory/results/artifact/state/schema-element extension modules
Wire JSON ingestion through the factory so core indexing field references and field types are extended in place
Keep the existing core JSON Schema behavior active in this intermediate layer
Add doctest support for JSON ingestion schema-definition examples without making the extension the default yet

Risk Assessment

Medium - this adds new extension code, but the default behavior remains the existing core JSON Schema implementation in this PR.

References

Stacked on Add JSON ingestion indexing extensions #1204.
bundle exec rspec elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/json_schema_spec.rb elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/json_schema_field_metadata_spec.rb elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/indexing/json_schema_with_metadata_spec.rb elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/factory_spec.rb passed.
script/type_check passed.
script/lint passed.

Stack

Current PR is marked with ->.

myronmarston

I haven't finished reviewing but wanted to submit my feedback so far.

myronmarston

Still not done reviewing but here's my next round of feedback.

myronmarston

Next set of feedback (still not done reviewing!).

myronmarston · 2026-06-01T03:17:38Z

+          #   end
+          def json_schema(**options)
+            super
+            self.runtime_metadata = runtime_metadata.with(grouping_missing_value_placeholder: inferred_grouping_missing_value_placeholder) unless grouping_missing_value_placeholder_overridden


This line is super long.

...but I also think we can implement this functionality more simply if we follow the old pattern we had:

elasticgraph/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/scalar_type.rb

Lines 90 to 92 in 203f849

if (placeholder = inferred_grouping_missing_value_placeholder)

self.runtime_metadata = runtime_metadata.with(grouping_missing_value_placeholder: placeholder)

end

Previously it happened in initialize after the scalar type has been yielded. Can we do a similar thing from new_scalar_type in factory_extension.rb?

In the block, extend the ScalarType with this module

Then yield

Then call something on this module to do the "post-yield" processing:

Validate that the json schema got set

Update grouping_missing_value_placeholder

With that approach you wouldn't need to override json_schema. Thoughts?
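A minimal sketch of the "extend, yield, then post-yield" flow described above. The `ScalarType` class here is a stand-in, not the real core class; `finalize_json_schema_configuration!` is a hypothetical name for the post-yield hook, and the placeholder inference rule is invented purely for illustration.

```ruby
# Stand-in for the core ScalarType; only what this sketch needs.
class ScalarType
  attr_reader :name, :json_schema_options
  attr_accessor :grouping_missing_value_placeholder

  def initialize(name)
    @name = name
    @json_schema_options = {}
    yield self if block_given?
  end
end

# Hypothetical extension: offers `json_schema` and a post-yield hook.
module ScalarTypeExtension
  def json_schema(**options)
    json_schema_options.update(options)
  end

  # Runs after the user's block, so `json_schema` itself needs no override.
  def finalize_json_schema_configuration!
    raise ArgumentError, "`#{name}` lacks `json_schema`" if json_schema_options.empty?

    # Illustrative inference rule only; an explicit value set in the block wins.
    inferred = (json_schema_options[:type] == "string") ? "(missing)" : nil
    self.grouping_missing_value_placeholder ||= inferred
  end
end

# Stand-in for `new_scalar_type` in a factory extension.
def new_scalar_type(name)
  ScalarType.new(name) do |type|
    extended_type = type.extend(ScalarTypeExtension)
    yield extended_type if block_given?
    extended_type.finalize_json_schema_configuration!
  end
end

url_type = new_scalar_type("Url") { |t| t.json_schema(type: "string") }
puts url_type.grouping_missing_value_placeholder.inspect
```

Because the hook runs once after the block, there is a single place that both validates and updates the placeholder.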

myronmarston · 2026-06-01T03:20:50Z

+          super(name) do |type|
+            extended_type = type.extend(SchemaElements::ScalarTypeExtension) # : ::ElasticGraph::SchemaDefinition::SchemaElements::ScalarType & SchemaElements::ScalarTypeExtension
+            yield extended_type if block_given?
+            extended_type.validate_json_schema_configuration! unless state.initially_registered_built_in_types.empty?


The unless state.initially_registered_built_in_types.empty? seems suspect--previously, ScalarType unconditionally validated the json schema after yielding:

elasticgraph/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/schema_elements/scalar_type.rb

Lines 79 to 88 in 203f849

yield self

missing = [

("`mapping`" if mapping_options.empty?),

("`json_schema`" if json_schema_options.empty?)

].compact

if missing.any?

raise Errors::SchemaError, "Scalar types require `mapping` and `json_schema` to be configured, but `#{name}` lacks #{missing.join(" and ")}."

end

Also, when elasticgraph-json_ingestion is used, it's important that every built-in scalar type has its JSON schema configured.

Can we do away with the unless state.initially_registered_built_in_types.empty? check?

Edit: I think I'm realizing why you did it this way--the JSON schema for the built in types gets configured later via the on_built_in_types hook which runs later, after all built-in types get defined. It could lead to subtle differences in behavior: previously, logic executed as part of evaluating the user-defined schema definition could query the json_schema of the built-in scalar types and do computation based on it. Now the json_schema_options won't be set on built-in scalar types while the schema definition is evaluated. Subtle changes in behavior could result.

An alternative to consider: instead of configuring the json_schema of each scalar type via the on_built_in_types hook, configure it here:

def new_scalar_type(name)
  super(name) do |type|
    extended_type = type.extend(SchemaElements::ScalarTypeExtension) # : ::ElasticGraph::SchemaDefinition::SchemaElements::ScalarType & SchemaElements::ScalarTypeExtension
    # if `name` is one of the built in types, configure `extended_type.json_schema` here, before yielding
    yield extended_type if block_given?
    extended_type.validate_json_schema_configuration!
  end
end

Then the validation can be unconditional, and the JSON schema is configured on the built-in types when they are first created like has always been the case.
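For illustration, here is a runnable toy version of that alternative. `ScalarType`, `SchemaError`, and the `BUILT_IN_JSON_SCHEMAS` table are stand-ins assumed for this sketch; the real built-in schemas and class names may differ.

```ruby
class SchemaError < StandardError; end

# Stand-in for the core ScalarType.
class ScalarType
  attr_reader :name, :json_schema_options

  def initialize(name)
    @name = name
    @json_schema_options = {}
    yield self if block_given?
  end
end

module ScalarTypeExtension
  def json_schema(**options)
    json_schema_options.update(options)
  end

  def validate_json_schema_configuration!
    return unless json_schema_options.empty?
    raise SchemaError, "Scalar type `#{name}` lacks `json_schema`."
  end
end

# Hypothetical lookup table of built-in schemas (assumption for this sketch).
BUILT_IN_JSON_SCHEMAS = {
  "String" => {type: "string"},
  "Boolean" => {type: "boolean"}
}.freeze

def new_scalar_type(name)
  ScalarType.new(name) do |type|
    extended_type = type.extend(ScalarTypeExtension)

    # Built-in types get their JSON schema before the block runs,
    # so code inside the block can already read it.
    if (built_in = BUILT_IN_JSON_SCHEMAS[name])
      extended_type.json_schema(**built_in)
    end

    yield extended_type if block_given?
    extended_type.validate_json_schema_configuration! # unconditional
  end
end

new_scalar_type("String") { |t| puts t.json_schema_options.inspect }
```

With this shape, a custom scalar that never calls `json_schema` fails validation immediately, while built-in types pass without relying on a later hook.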

myronmarston · 2026-06-01T03:35:18Z

I think there's a better way to implement this logic. Extension modules are a powerful technique but should ideally only be used when needed. They have some downsides (e.g. modifying the ancestor chain of existing objects you don't own, potential conflicts with multiple extension modules applied on the same object which define conflicting methods with the same names, etc).

Generally speaking, I only reach for an extension module when I need to do one of these things:

Offer an additional API to users as part of an existing object. Example: offering t.json_schema inside a schema.scalar_type block.

Modify the behavior of existing call paths by overriding existing methods. Example: defining IndexExtension#rollover to hook into what happens when rollover is called.
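The two cases can be sketched with plain Ruby objects. `Index`, `DescribableExtension`, and `RolloverExtension` below are hypothetical stand-ins, not ElasticGraph classes; the point is only how `extend` changes one object's singleton ancestor chain.

```ruby
# Stand-in for an object owned by another library.
class Index
  def rollover
    ["core rollover"]
  end
end

# Case 1: offer an additional method on an existing object.
module DescribableExtension
  def describe
    "index with #{rollover.size} rollover step(s)"
  end
end

# Case 2: override an existing method and delegate to the original via `super`.
module RolloverExtension
  def rollover
    super + ["extension rollover"]
  end
end

index = Index.new
index.extend(DescribableExtension, RolloverExtension)

puts index.describe
puts index.singleton_class.ancestors.first(4).inspect
```

Only the extended instance is affected; other `Index` instances keep the original behavior, which is also why two modules defining the same method name on one object can conflict.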

The logic here doesn't fall into either category. It's just internal logic that previously existed on TypeReference for reasons of convenience. There's no reason it still needs to exist on TypeReference, though, particularly since TypeReference isn't part of the EG public API.

Really, we just need a spot for the json_schema_layers logic to live. I believe the computation of json_schema_layers is only needed from FieldExtension#to_indexing_field_reference. Instead of needing a TypeReferenceExtension, we could move this into a JSONSchemaLayers object, e.g. JSONSchemaLayers.for(type) or something.

Thoughts?
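One possible shape for such an object, as a rough sketch. `TypeRef` is a hypothetical stand-in for a type reference, and the `:nullable` / `:array` layer names and the string-based unwrapping are assumptions made for illustration, not the actual implementation.

```ruby
# Hypothetical stand-in for a type reference; holds a GraphQL type string.
TypeRef = Struct.new(:name)

module JSONSchemaLayers
  # Returns the wrapper layers of a type, outermost first.
  # Assumed layer names: :nullable (no trailing "!") and :array ("[...]").
  def self.for(type)
    layers = []
    name = type.name

    loop do
      if name.end_with?("!")
        name = name[0...-1] # non-null wrapper: no :nullable layer
      else
        layers << :nullable
      end

      break unless name.start_with?("[") && name.end_with?("]")

      layers << :array
      name = name[1...-1] # unwrap one list level and continue
    end

    layers
  end
end

puts JSONSchemaLayers.for(TypeRef.new("[String!]")).inspect
```

A caller such as `to_indexing_field_reference` could then ask `JSONSchemaLayers.for(type)` directly, without the type object being extended at all.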

myronmarston · 2026-06-01T03:51:19Z

+
+        # Returns the API's `state` narrowed to include this gem's `StateExtension`. Centralizes
+        # the Steep cast that's needed because Steep can't see the `extend(StateExtension)` applied
+        # at runtime in `extended`.


myronmarston · 2026-06-01T03:52:55Z

+        # @param version [Integer] current version number of the JSON schema artifact
+        # @return [void]
+        # @see #enforce_json_schema_version
+        def json_schema_version(version)


The YARD docs dropped an example that used to be there:

elasticgraph/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/api.rb

Lines 469 to 472 in 203f849

# @example Set the JSON schema version to 1

# ElasticGraph.define_schema do |schema|

# schema.json_schema_version 1

# end

Can you bring that back?

myronmarston · 2026-06-01T03:54:56Z

+        #   accidentally provides it as `parent_id`, ElasticGraph would happily ignore the `parent_id` field entirely, because `parentId`
+        #   is allowed to be omitted and `parent_id` would be treated as an extra field. Therefore, we recommend that you only set one of
+        #   these to `true` (or none).
+        def json_schema_strictness(allow_omitted_fields: false, allow_extra_fields: true)


The YARD docs dropped an example that used to be here:

elasticgraph/elasticgraph-schema_definition/lib/elastic_graph/schema_definition/api.rb

Lines 501 to 504 in 203f849

# @example Allow omitted fields and disallow extra fields

# ElasticGraph.define_schema do |schema|

# schema.json_schema_strictness allow_omitted_fields: true, allow_extra_fields: false

# end

Can you bring it back?

myronmarston · 2026-06-01T03:57:02Z

+        end
+
+        # @private
+        def new_enum_indexing_field_type(...)


Suggested change

def new_enum_indexing_field_type(...)

def new_enum_indexing_field_type(enum_value_names)

Using ... is nice when you have a larger list of arguments, particularly if that list may grow over time...but here it's just one argument and it obscures what's going on.

On new_object_indexing_field_type below I think ... is fine because there's a long list of args.
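A small sketch of the trade-off, using stand-in classes. `CoreFactory` and the keyword list on `new_object_indexing_field_type` are hypothetical; only the forwarding styles matter here.

```ruby
# Stand-in for the core factory; the signatures are assumptions for this sketch.
class CoreFactory
  def new_enum_indexing_field_type(enum_value_names)
    {kind: :enum, values: enum_value_names}
  end

  def new_object_indexing_field_type(type_name:, subfields:, mapping_options:, json_schema_options:)
    {kind: :object, type_name: type_name, subfields: subfields}
  end
end

module FactoryExtension
  # One argument: naming it documents what flows through; bare `super` forwards it.
  def new_enum_indexing_field_type(enum_value_names)
    super.merge(extended: true)
  end

  # Long keyword list that may grow: `...` forwards everything unchanged.
  def new_object_indexing_field_type(...)
    super(...).merge(extended: true)
  end
end

factory = CoreFactory.new.extend(FactoryExtension)
puts factory.new_enum_indexing_field_type(%w[RED GREEN]).inspect
```

Both overrides behave the same at runtime; the explicit signature simply makes the single-argument case easier to read.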

myronmarston · 2026-06-01T04:04:18Z

+        def new_field(**kwargs, &block)
+          super(**kwargs) do |field|
+            extended_field = field.extend(SchemaElements::FieldExtension) # : ::ElasticGraph::SchemaDefinition::SchemaElements::Field & SchemaElements::FieldExtension
+            block&.call(extended_field)
+          end
+        end


Suggested change

def new_field(**kwargs, &block)

super(**kwargs) do |field|

extended_field = field.extend(SchemaElements::FieldExtension) # : ::ElasticGraph::SchemaDefinition::SchemaElements::Field & SchemaElements::FieldExtension

block&.call(extended_field)

end

end

def new_field(**kwargs)

super(**kwargs) do |field|

extended_field = field.extend(SchemaElements::FieldExtension) # : ::ElasticGraph::SchemaDefinition::SchemaElements::Field & SchemaElements::FieldExtension

yield extended_field if block_given?

end

end

IIRC, there's a bit of extra overhead inherent in the &block syntax as it forces Ruby to allocate a block object, which isn't required for yield/block_given?. My rule of thumb is to use &block when I'm just passing it through to another method, like we do here:

elasticgraph/elasticgraph-graphql/lib/elastic_graph/graphql/filtering/filter_interpreter.rb

Lines 345 to 347 in 203f849

def build_bool_hash(&block)

bool_node = Hash.new { |h, k| h[k] = [] } # : stringOrSymbolHash

bool_node.tap(&block)

...but to use yield instead of block.call and yield if block_given? instead of block&.call(...).

Can you also apply this below?
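As a side-by-side sketch of the two styles, with a stand-in `Field` struct and a hypothetical `FieldExtension` (neither is the real ElasticGraph class):

```ruby
# Stand-ins for illustration only.
Field = Struct.new(:name, :note)

module FieldExtension
  def extended_label
    "extended:#{name}"
  end
end

# Style 1: capture the block as a Proc and call it explicitly.
def new_field_with_block_arg(name, &block)
  Field.new(name).tap do |field|
    extended = field.extend(FieldExtension)
    block&.call(extended)
  end
end

# Style 2: no block parameter; yield only when a block was given.
def new_field_with_yield(name)
  Field.new(name).tap do |field|
    extended = field.extend(FieldExtension)
    yield extended if block_given?
  end
end

a = new_field_with_block_arg("id") { |f| f.note = f.extended_label }
b = new_field_with_yield("id") { |f| f.note = f.extended_label }
puts a.note == b.note
```

The two methods are interchangeable for callers, so switching to `yield` is purely an internal change.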

myronmarston · 2026-06-01T04:05:21Z

+        end
+
+        # @private
+        def new_scalar_indexing_field_type(...)


Suggested change

def new_scalar_indexing_field_type(...)

def new_scalar_indexing_field_type(scalar_type:)

myronmarston · 2026-06-01T04:05:59Z

+        end
+
+        # @private
+        def new_union_indexing_field_type(...)


Suggested change

def new_union_indexing_field_type(...)

def new_union_indexing_field_type(subtypes_by_name)

jwils mentioned this pull request May 27, 2026

Use JSON ingestion as schema definition extension #1205

Open

jwils force-pushed the joshuaw/json-ingestion-api-polish branch from 3436470 to ea857eb Compare May 27, 2026 16:22

jwils force-pushed the joshuaw/json-ingestion-extension-modules branch from b80bef4 to bce83e3 Compare May 27, 2026 16:22

jwils force-pushed the joshuaw/json-ingestion-api-polish branch from ea857eb to 7af1788 Compare May 27, 2026 18:43

jwils force-pushed the joshuaw/json-ingestion-extension-modules branch from bce83e3 to 38d7488 Compare May 27, 2026 18:43

myronmarston mentioned this pull request May 28, 2026

Add JSON ingestion indexing extensions #1204

Open

jwils force-pushed the joshuaw/json-ingestion-extension-modules branch from 38d7488 to 65c8468 Compare May 28, 2026 18:35

jwils force-pushed the joshuaw/json-ingestion-api-polish branch from 7af1788 to 20ee5dd Compare May 28, 2026 18:35

jwils force-pushed the joshuaw/json-ingestion-extension-modules branch from 65c8468 to d780944 Compare May 28, 2026 18:44

jwils force-pushed the joshuaw/json-ingestion-api-polish branch 2 times, most recently from 5b67eda to 28d3b58 Compare May 28, 2026 19:01

jwils force-pushed the joshuaw/json-ingestion-extension-modules branch 2 times, most recently from 08fa741 to 66a50ff Compare May 28, 2026 19:13

jwils force-pushed the joshuaw/json-ingestion-api-polish branch 2 times, most recently from 4f85649 to fd4651f Compare May 30, 2026 14:07

jwils force-pushed the joshuaw/json-ingestion-extension-modules branch 2 times, most recently from e904b45 to 7cd0f7d Compare May 30, 2026 14:26

jwils force-pushed the joshuaw/json-ingestion-api-polish branch 3 times, most recently from 2e3770f to 9ed0d47 Compare May 30, 2026 14:38

jwils force-pushed the joshuaw/json-ingestion-extension-modules branch from 3d879aa to 01d66b2 Compare May 30, 2026 20:03

jwils force-pushed the joshuaw/json-ingestion-api-polish branch from 9ed0d47 to a02256c Compare May 30, 2026 20:03

jwils force-pushed the joshuaw/json-ingestion-extension-modules branch from 01d66b2 to 4d549ec Compare May 30, 2026 20:17

jwils force-pushed the joshuaw/json-ingestion-api-polish branch from a02256c to 180c7d3 Compare May 30, 2026 20:17

jwils marked this pull request as ready for review May 31, 2026 03:58

jwils requested review from BrianSigafoos-SQ, ayousufi, bsorbo, jwondrusch and myronmarston as code owners May 31, 2026 03:58

jwils force-pushed the joshuaw/json-ingestion-api-polish branch from 180c7d3 to cd6256d Compare May 31, 2026 04:17

myronmarston requested changes May 31, 2026


jwils force-pushed the joshuaw/json-ingestion-api-polish branch 2 times, most recently from 55175c5 to 16255f2 Compare May 31, 2026 13:28

myronmarston requested changes May 31, 2026


jwils force-pushed the joshuaw/json-ingestion-api-polish branch 3 times, most recently from 738964b to 57c341a Compare May 31, 2026 22:05

myronmarston reviewed Jun 1, 2026


jwils force-pushed the joshuaw/json-ingestion-api-polish branch 3 times, most recently from a1e2bb3 to e28ff9c Compare June 1, 2026 18:39

Wire JSON ingestion schema extension modules

83f6c2a

jwils force-pushed the joshuaw/json-ingestion-api-polish branch from e28ff9c to 83f6c2a Compare June 1, 2026 18:57

