Wire JSON ingestion schema extension modules#1215
Conversation
3436470 to
ea857eb
Compare
b80bef4 to
bce83e3
Compare
ea857eb to
7af1788
Compare
bce83e3 to
38d7488
Compare
38d7488 to
65c8468
Compare
7af1788 to
20ee5dd
Compare
65c8468 to
d780944
Compare
5b67eda to
28d3b58
Compare
08fa741 to
66a50ff
Compare
4f85649 to
fd4651f
Compare
e904b45 to
7cd0f7d
Compare
2e3770f to
9ed0d47
Compare
3d879aa to
01d66b2
Compare
9ed0d47 to
a02256c
Compare
01d66b2 to
4d549ec
Compare
a02256c to
180c7d3
Compare
180c7d3 to
cd6256d
Compare
myronmarston
left a comment
There was a problem hiding this comment.
I haven't finished reviewing but wanted to submit my feedback so far.
55175c5 to
16255f2
Compare
myronmarston
left a comment
There was a problem hiding this comment.
Still not done reviewing but here's my next round of feedback.
738964b to
57c341a
Compare
myronmarston
left a comment
There was a problem hiding this comment.
Next set of feedback (still not done reviewing!).
| # end | ||
| def json_schema(**options) | ||
| super | ||
| self.runtime_metadata = runtime_metadata.with(grouping_missing_value_placeholder: inferred_grouping_missing_value_placeholder) unless grouping_missing_value_placeholder_overridden |
There was a problem hiding this comment.
This line is super long.
...but I also I think we can implement this functionality more simply if we follow the old pattern we had:
Previously it happened in initialize after the scalar type has been yielded. Can we do a similar thing from new_scalar_type in factory_extension.rb?
- In the block, extend the
ScalarTypewith this module - Then yield
- Then call something on this module to do the "post-yield" processing:
- Validate that the json schema got set
- Update
grouping_missing_value_placeholder
With that approach you wouldn't need to override json_schema. Thoughts?
| super(name) do |type| | ||
| extended_type = type.extend(SchemaElements::ScalarTypeExtension) # : ::ElasticGraph::SchemaDefinition::SchemaElements::ScalarType & SchemaElements::ScalarTypeExtension | ||
| yield extended_type if block_given? | ||
| extended_type.validate_json_schema_configuration! unless state.initially_registered_built_in_types.empty? |
There was a problem hiding this comment.
The unless state.initially_registered_built_in_types.empty? seems suspect--previously, ScalarType unconditionally validated the json schema after yielding:
Also, when elasticgraph-json_ingestion is used, it's important that every built-in scalar type has its JSON schema configured.
Can we do away with the unless state.initially_registered_built_in_types.empty? check?
Edit: I think I'm realizing why you did it this way--the JSON schema for the built in types gets configured later via the on_built_in_types hook which runs later, after all built-in types get defined. It could lead to subtle differences in behavior: previously, logic executed as part of evaluating the user-defined schema definition could query the json_schema of the built-in scalar types and do computation based on it. Now the json_schema_options won't be set on built-in scalar types while the schema definition is evaluated. Subtle changes in behavior could result.
An alternative to consider: instead of configuring the json_schema of each scalar type via the on_built_in_types hook, configure it here:
def new_scalar_type(name)
super(name) do |type|
extended_type = type.extend(SchemaElements::ScalarTypeExtension) # : ::ElasticGraph::SchemaDefinition::SchemaElements::ScalarType & SchemaElements::ScalarTypeExtension
# if `name` is one of the built in types, configure `extended_type.json_schema` here, before yielding
yield extended_type if block_given?
extended_type.validate_json_schema_configuration!
end
endThen the validation can be unconditional, and the JSON schema is configured on the built-in types when they are first created like has always been the case.
There was a problem hiding this comment.
I think there's a better way to implement this logic. Extension modules are a powerful technique but should ideally only be used when needed. They have some downsides (e.g. modifying the ancestor chain of existing objects you don't own, potential conflicts with multiple extension modules applied on the same object which define conflicting methods with the same names, etc).
Generally speaking, I only reach for an extension module when I need to do one of these things:
- Offer an additional API to users as part of an existing object. Example: offering
t.json_schemainside aschema.scalar_typeblock. - Modify the behavior of existing call paths by overriding existing methods. Example: defining
IndexExtension#rolloverto hook into what happens whenrolloveris called.
The logic here doesn't fall into either category. It's just internal logic that previously existed on TypeReference for reasons of convenience. There's no reason it still needs to exist on TypeReference, though, particularly since TypeReferene isn't part of the EG public API.
Really, we just need a spot for the json_schema_layers logic to live. I believe the computation of json_schema_layers is only needed from FieldExtension#to_indexing_field_reference. Instead of needing a TypeReferenceExtension, we could move this into a JSONSchemaLayers object, e.g. JSONSchemaLayers.for(type) or something.
Thoughts?
|
|
||
| # Returns the API's `state` narrowed to include this gem's `StateExtension`. Centralizes | ||
| # the Steep cast that's needed because Steep can't see the `extend(StateExtension)` applied | ||
| # at runtime in `extended`. |
| # @param version [Integer] current version number of the JSON schema artifact | ||
| # @return [void] | ||
| # @see #enforce_json_schema_version | ||
| def json_schema_version(version) |
There was a problem hiding this comment.
The YARD docs dropped an example that used to be there:
Can you bring that back?
| # accidentally provides it as `parent_id`, ElasticGraph would happily ignore the `parent_id` field entirely, because `parentId` | ||
| # is allowed to be omitted and `parent_id` would be treated as an extra field. Therefore, we recommend that you only set one of | ||
| # these to `true` (or none). | ||
| def json_schema_strictness(allow_omitted_fields: false, allow_extra_fields: true) |
There was a problem hiding this comment.
The YARD docs dropped an example that used to be here:
Can you bring it back?
| end | ||
|
|
||
| # @private | ||
| def new_enum_indexing_field_type(...) |
There was a problem hiding this comment.
| def new_enum_indexing_field_type(...) | |
| def new_enum_indexing_field_type(enum_value_names) |
Using ... is nice when you have a larger list of arguments, particularly if that list may grow over time...but here it's just one argument and it obscures what's going on.
On new_object_indexing_field_type below I think ... is fine because there's a long list of args.
| def new_field(**kwargs, &block) | ||
| super(**kwargs) do |field| | ||
| extended_field = field.extend(SchemaElements::FieldExtension) # : ::ElasticGraph::SchemaDefinition::SchemaElements::Field & SchemaElements::FieldExtension | ||
| block&.call(extended_field) | ||
| end | ||
| end |
There was a problem hiding this comment.
| def new_field(**kwargs, &block) | |
| super(**kwargs) do |field| | |
| extended_field = field.extend(SchemaElements::FieldExtension) # : ::ElasticGraph::SchemaDefinition::SchemaElements::Field & SchemaElements::FieldExtension | |
| block&.call(extended_field) | |
| end | |
| end | |
| def new_field(**kwargs) | |
| super(**kwargs) do |field| | |
| extended_field = field.extend(SchemaElements::FieldExtension) # : ::ElasticGraph::SchemaDefinition::SchemaElements::Field & SchemaElements::FieldExtension | |
| yield extended_field if block_given? | |
| end | |
| end |
IIRC, there's a bit of extra overhead inherent in the &block syntax as it forces Ruby to allocate a block object, which isn't required for yield/block_given?. My rule of thumb is to use &block when I'm just passing it through to another method, like we do here:
...but to use yield instead of block.call and yield if block_given? instead of block&.call(...).
Can you also apply this below?
| end | ||
|
|
||
| # @private | ||
| def new_scalar_indexing_field_type(...) |
There was a problem hiding this comment.
| def new_scalar_indexing_field_type(...) | |
| def new_scalar_indexing_field_type(scalar_type:) |
| end | ||
|
|
||
| # @private | ||
| def new_union_indexing_field_type(...) |
There was a problem hiding this comment.
| def new_union_indexing_field_type(...) | |
| def new_union_indexing_field_type(subtypes_by_name) |
a1e2bb3 to
e28ff9c
Compare
e28ff9c to
83f6c2a
Compare
Why
Introduce the JSON ingestion schema-definition extension modules after the indexing extension points exist, but before making JSON ingestion the default implementation.
What
ElasticGraph::JSONIngestion::SchemaDefinition::APIExtensionand supporting factory/results/artifact/state/schema-element extension modulesRisk Assessment
Medium - this adds new extension code, but the default behavior remains the existing core JSON Schema implementation in this PR.
References
bundle exec rspec elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/json_schema_spec.rb elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/json_schema_field_metadata_spec.rb elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/indexing/json_schema_with_metadata_spec.rb elasticgraph-schema_definition/spec/unit/elastic_graph/schema_definition/factory_spec.rbpassed.script/type_checkpassed.script/lintpassed.Stack
Current PR is marked with
->.