feat(ingestion): introduce TagRegistry domain layer #27991
Conversation
Adds metadata.domain.tags with TagRegistry (per-Source bookkeeping) and TagCanonicalizer (case-corrected name resolution against OM). Migrates the Snowflake connector to the new architecture; other connectors stay on the legacy context.tags flow (strangler pattern). TagRegistry interns shared TagLabel instances by (classification, tag, label_type, state) and rebinds the per-entity dict on scope clear. On a schema with ~120k tag attachments and 21 unique tags, peak heap drops from ~112 MB to ~24 MB. Public get_*_tag_labels methods on the database service base are unchanged; non-Snowflake DB connectors are not touched.
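For readers new to the interning pattern, here is a minimal self-contained sketch of the idea (names like `intern_label` and the simplified `TagLabel` are illustrative, not the PR's actual internals):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagLabel:
    """Simplified stand-in for the Pydantic TagLabel model."""

    classification: str
    tag: str
    label_type: str = "Automated"
    state: str = "Suggested"


_cache: dict[tuple[str, str, str, str], TagLabel] = {}


def intern_label(
    classification: str,
    tag: str,
    label_type: str = "Automated",
    state: str = "Suggested",
) -> TagLabel:
    """Return one shared TagLabel per key instead of a fresh object per attachment."""
    key = (classification, tag, label_type, state)
    if key not in _cache:
        _cache[key] = TagLabel(classification, tag, label_type, state)
    return _cache[key]


# 120k attachments of the same tag hold references to one object, not 120k copies.
assert intern_label("PII", "Sensitive") is intern_label("PII", "Sensitive")
```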
Pull request overview
This PR introduces a new ingestion “domain” layer for tags/classifications to reduce memory usage by interning TagLabel instances and separating canonicalization (case correction) from bookkeeping. It wires the new TagRegistry/TagCanonicalizer into the database source base and migrates the Snowflake connector to use the new flow, with added unit test coverage and a new tenacity dependency for retry behavior.
Changes:
- Added `metadata.domain.tags` with `TagRegistry` (interning + scope clearing) and `TagCanonicalizer` (system-provider case-corrected resolution with retries).
- Wired registry/canonicalizer into `DatabaseServiceSource` and migrated Snowflake tag ingestion + tag-label lookup to use the registry.
- Added unit tests for the new domain layer and added `tenacity` to ingestion dependencies.
Reviewed changes
Copilot reviewed 9 out of 11 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| ingestion/src/metadata/domain/__init__.py | Defines the new domain-layer package and its intended boundaries. |
| ingestion/src/metadata/domain/tags/__init__.py | Exposes the tags domain API (TagRegistry, TagCanonicalizer, etc.). |
| ingestion/src/metadata/domain/tags/registry.py | Implements tag payload dedup + TagLabel interning + per-scope label storage/clearing. |
| ingestion/src/metadata/domain/tags/canonicalizer.py | Implements ES-backed canonicalization with retry + caching. |
| ingestion/src/metadata/ingestion/source/database/database_service.py | Adds per-source registry/canonicalizer and topology hooks intended to clear per-scope tag state. |
| ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py | Migrates Snowflake tag ingestion and tag-label lookup/inheritance to the new registry/canonicalizer. |
| ingestion/tests/unit/domain/__init__.py | Test package marker for the new unit-test module path. |
| ingestion/tests/unit/domain/tags/__init__.py | Test package marker for tags domain unit tests. |
| ingestion/tests/unit/domain/tags/test_registry.py | Unit tests validating registry semantics, interning, and concurrency behavior. |
| ingestion/tests/unit/domain/tags/test_canonicalizer.py | Unit tests validating canonicalization, caching, and retry behavior. |
| ingestion/setup.py | Adds tenacity runtime dependency for canonicalizer retries. |
```python
def clear_scope(self, scope_fqn: str) -> None:
    """Drop labels under ``scope_fqn`` and mark the scope cleared.

    Subsequent ``attach`` calls for this scope will raise.
    """
    prefix = scope_fqn + fqn.FQN_SEPARATOR

    with self._scope_state_lock:
        kept = {
            k: v
            for k, v in self._labels_by_entity.items()
            if k != scope_fqn and not k.startswith(prefix)
        }
        dropped = len(self._labels_by_entity) - len(kept)
        self._labels_by_entity = kept
        if dropped:
            logger.debug(
                "TagRegistry: cleared scope %s (%d entity labels dropped)",
                scope_fqn,
                dropped,
            )

    with self._run_state_lock:
        self._cleared_scopes.add(scope_fqn)
```
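The comprehension-plus-rebind above is deliberate: CPython dicts only resize on insertion, so popping keys in place would leave the oversized bucket array allocated. A standalone illustration of the difference (CPython behavior, independent of this PR's code):

```python
import sys

labels = {f"db.schema.table_{i}": ["tag"] for i in range(120_000)}
print(sys.getsizeof(labels))  # several MB for the dict's own table (64-bit CPython)

# Pop-in-place: every key is removed, yet the dict keeps its huge bucket array.
for key in list(labels):
    labels.pop(key)
print(sys.getsizeof(labels))  # unchanged: deletion never shrinks the table

# Rebind: the old dict loses its last reference and is freed whole;
# the survivors (here: none) land in a right-sized new dict.
labels = {k: v for k, v in labels.items()}
print(sys.getsizeof(labels))  # back to an empty-dict footprint
```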
```diff
 def yield_database_tag(self, database_entity: str) -> Iterable[Either[OMetaTagAndClassification]]:
     """Yield database-level tags for the topology."""
     if not self.source_config.includeTags:
         return

-    if database_entity in self.database_tags_map:
-        database_fqn = fqn.build(
-            self.metadata,
-            entity_type=Database,
-            service_name=self.context.get().database_service,
-            database_name=database_entity,
-        )
-        for tag_info in self.database_tags_map[database_entity]:
-            yield from get_ometa_tag_and_classification(
-                tag_fqn=FullyQualifiedEntityName(database_fqn),
-                tags=[tag_info["tag_value"]],
-                classification_name=tag_info["tag_name"],
-                tag_description=SNOWFLAKE_TAG_DESCRIPTION,
-                classification_description=SNOWFLAKE_CLASSIFICATION_DESCRIPTION,
-                metadata=self.metadata,
-                system_tags=True,
-            )
+    if database_entity not in self.database_tags_map:
+        return
+
+    database_fqn = fqn.build(
+        self.metadata,
+        entity_type=Database,
+        service_name=self.context.get().database_service,
+        database_name=database_entity,
+    )
+    for tag_info in self.database_tags_map[database_entity]:
+        try:
+            classification = self.tag_canonicalizer.classification(
+                tag_info["tag_name"], SNOWFLAKE_CLASSIFICATION_DESCRIPTION
+            )
+            tag = self.tag_canonicalizer.tag(
+                classification.name, tag_info["tag_value"], SNOWFLAKE_TAG_DESCRIPTION
+            )
+
+            self.tags_registry.attach(
+                scope_fqn=database_fqn,
+                entity_fqn=database_fqn,
+                classification_name=classification.name,
+                tag_name=tag.name,
+                classification_description=classification.description,
+                tag_description=tag.description,
+            )
+        except Exception as exc:
```
```python
from metadata.domain.tags.canonicalizer import Canonical, TagCanonicalizer
from metadata.domain.tags.registry import ScopeAlreadyClearedError, TagRegistry

__all__ = [
```
nit - I don't like this, because then people like to `import *`
This is way handier than having overly deep imports imho; we have static checks to prevent `import *`
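For context, the re-export pattern under discussion looks roughly like this (the actual `__all__` contents are truncated in the hunk above, so this is a guess from the imports):

```python
# metadata/domain/tags/__init__.py
from metadata.domain.tags.canonicalizer import Canonical, TagCanonicalizer
from metadata.domain.tags.registry import ScopeAlreadyClearedError, TagRegistry

__all__ = ["Canonical", "ScopeAlreadyClearedError", "TagCanonicalizer", "TagRegistry"]

# Callers get a shallow import path:
#     from metadata.domain.tags import TagRegistry
# instead of reaching into the module:
#     from metadata.domain.tags.registry import TagRegistry
```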
```python
logger = ingestion_logger()


_es_retry = retry(
```
thinking this could be more commonly reused in other places besides the canonicalizer, e.g., when we search for tables, or owners
I agree, but did not want to add it there for now. If this works properly, I will need to create another PR to migrate the rest of the databases to use this and the other service types, while also doing some cleaning. For sure this should be moved there.
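The hunk above cuts off at `_es_retry = retry(`, so for context here is a plausible shape for such a tenacity policy (the argument choices are assumptions, not the PR's actual configuration):

```python
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

_es_retry = retry(
    retry=retry_if_exception_type(ConnectionError),  # retry only transient failures
    stop=stop_after_attempt(3),                      # give up after three tries
    wait=wait_exponential(multiplier=1, max=10),     # back off between attempts
    reraise=True,                                    # surface the original exception
)
```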
```python
    return canonical


@_es_retry
def _es_search(self, entity_type: Any, search_string: str) -> Iterable[Any]:
```
this could be inside ometa with the decorator and all
I agree, but did not want to add it there for now. If this works properly, I will need to create another PR to migrate the rest of the databases to use this and the other service types, while also doing some cleaning. For sure this should be moved there.
```python
            error=f"Tag canonicalization failed for {tag_info['tag_name']}.{tag_info['tag_value']}: {exc}",
        )
    )
yield from (Either(right=record) for record in self.tags_registry.drain())
```
I don't follow why the `drain()` here helps vs the vanilla yield we had
It is not about the drain itself. The gains are in how we populate and what we yield. Before, we were yielding the following:

```python
for tag_info in self.schema_tags_map[schema_name]:
    yield from get_ometa_tag_and_classification(
        tag_fqn=FullyQualifiedEntityName(schema_fqn),
        tags=[tag_info["tag_value"]],
        classification_name=tag_info["tag_name"],
        tag_description=SNOWFLAKE_TAG_DESCRIPTION,
        classification_description=SNOWFLAKE_CLASSIFICATION_DESCRIPTION,
        metadata=self.metadata,
        system_tags=True,
    )
```

This was yielding the same tag and classification multiple times, since we are basically saving Tag References. If Tag A was used once on a schema, once on a database, and once on a column, we would send three PUTs for that Tag and Classification.

`drain()` returns the pending tags that have not yet been yielded to the sink, based on this logic, which should avoid us sending duplicates (except classification ones, due to the current coupling between both, but I did not want to delve there due to the urgency):

```python
with self._run_state_lock:
    tag_fqn = model_str(tag_label.tagFQN)
    if tag_fqn not in self._known_tag_fqns:
        self._known_tag_fqns.add(tag_fqn)
        self._pending.append(
            self._build_pending_record(
                classification_name=classification_name,
                classification_description=classification_description,
                tag_name=tag_name,
                tag_description=tag_description,
            )
        )
```

This is geared toward time improvement; the memory improvement is handled within the TagRegistry as well, by avoiding the creation of many repeated Pydantic TagLabel objects and using references to them instead. All this while centralizing the tag logic in one place with a clear API.
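A minimal sketch of the swap-and-return semantics described above (assuming `_pending` holds the queued records; not the PR's exact code):

```python
def drain(self) -> list:
    """Hand the pending, deduped records to the caller exactly once."""
    with self._run_state_lock:
        pending, self._pending = self._pending, []
    return pending
```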
```python
        schema_name=schema_name,
    )
    self.tags_registry.clear_scope(schema_fqn)
    yield from ()
```
Due to how the post-process actually works in the Topology Runner:

```python
yield from self._run_node_post_process(node=node)
```

the post_process hook must itself be a generator, even when it has nothing to yield.
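A standalone illustration of why the empty `yield from ()` matters (the function names are invented for the example):

```python
def cleanup(node: str) -> None:
    print(f"clearing tag scope for {node}")


def post_process_plain(node: str) -> None:
    cleanup(node)  # a plain function returns None


def post_process_generator(node: str):
    cleanup(node)
    yield from ()  # a generator that yields nothing


def run_node_post_process(hook, node: str):
    yield from hook(node)  # what the Topology Runner effectively does


list(run_node_post_process(post_process_generator, "schema"))  # OK: cleanup runs
# list(run_node_post_process(post_process_plain, "schema"))
#     -> TypeError: 'NoneType' object is not iterable
```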
- Drop None defaults from `TagRegistry.attach()` description params
(required `str | None`); normalize None -> "" inside
`_build_pending_record` so the OM schema's required-non-null Markdown
contract is owned at the registry boundary, not at every caller.
- Fix `yield_database_tag` missing registry drain (PR review).
- Fix `clear_scope` lock-order race that left labels visible after the
scope was marked cleared (PR review).
- Resolve basedpyright errors:
- `Either` and `TopologyNode` use Form 3 Pydantic v2 fields
(`Annotated[X, Field(...)] = default`) so static checkers see the
defaults.
- `cast("str", fqn.build(...))` at the 13 sites that feed
`fqn.build(...)` results into FQN-typed args.
- Scoped `# pyright: ignore[reportAttributeAccessIssue]` on
`TopologyContext` dynamic-attribute accesses (matches the codebase
pattern of grandfathering 8.4k+ such errors via baseline).
- Populate `stackTrace` on the three snowflake tag-error
`StackTraceError`s so the Status UI shows the trace, not just the
one-line summary.
- Rewrite three snowflake tag-inheritance tests to drive the real
registry attach + inheritance walk after `get_tag_label` was removed:
- `test_schema_tag_inheritance`
- `test_database_tag_inheritance`
- `test_tag_value_precedence` (one attach intentionally passes
`None` descriptions to exercise the registry's normalization)
🔴 Playwright Results — 4 failure(s), 8 flaky

✅ 4016 passed · ❌ 4 failed · 🟡 8 flaky · ⏭️ 86 skipped

Genuine Failures (failed on all attempts) ❌
…cascade

Reverting `Either.left`/`Either.right` to the original `Annotated[Optional[T], Field(default=None, ...)]` form.

The Form 3 shape (`Annotated[T | None, Field(...)] = None`) introduced in the prior commit caused pyright to eagerly bind `T` from the literal-None default at every no-arg construction site like `Either(left=...)`, resolving them to `Either[Unknown]`. Because `Either[T]` is invariant, those failed to satisfy declared generator return types like `Iterable[Either[CreateTableRequest]]` — surfacing 45 latent reportReturnType errors across sample_data, dbt, sas, qliksense, sigma, common_db_source, common_broker_source, amundsen, sink/metadata_rest.

Form 2 wraps the default inside `Annotated` metadata, where pyright treats it as opaque: it sees "this field has a default" (so construction sites pass) but doesn't eagerly bind `T` from None. Context-driven inference works; no cascade.

`TopologyNode` stays in Form 3 — its fields are concrete-typed (`list[str] | None`, `bool`), with no generic `T` to bind.
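To make the two field shapes concrete, a simplified sketch (illustrative only; the repo's `Either` fields carry more metadata):

```python
from typing import Annotated, Generic, Optional, TypeVar

from pydantic import BaseModel, Field

T = TypeVar("T")


class EitherForm2(BaseModel, Generic[T]):
    # Form 2: the default lives inside the Annotated metadata. Pyright treats
    # it as opaque, so T is inferred from context rather than from None.
    left: Annotated[Optional[T], Field(default=None, description="error side")]
    right: Annotated[Optional[T], Field(default=None, description="value side")]


class EitherForm3(BaseModel, Generic[T]):
    # Form 3: a literal `= None` default. Pyright eagerly binds T from the
    # None literal at no-arg construction sites, producing Either[Unknown]
    # and the invariance cascade described above.
    left: Annotated[Optional[T], Field(description="error side")] = None
    right: Annotated[Optional[T], Field(description="value side")] = None
```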
```python
with self._run_state_lock:
    if scope_fqn in self._cleared_scopes:
        raise ScopeAlreadyClearedError(
            f"Tag attach called for cleared scope '{scope_fqn!r}' for entity '{entity_fqn!r}'"
        )

tag_label = self._intern_tag_label(
    classification_name=classification_name,
    tag_name=tag_name,
    label_type=label_type,
    state=state,
)

with self._scope_state_lock:
    self._labels_by_entity.setdefault(entity_fqn, []).append(tag_label)
```
💡 Edge Case: TOCTOU race between _run_state_lock and _scope_state_lock in attach
In TagRegistry.attach, the check for _cleared_scopes (line 122-126 under _run_state_lock) and the mutation of _labels_by_entity (line 135-136 under _scope_state_lock) are not atomic. A concurrent clear_scope call could interleave between these two critical sections: Thread A passes the cleared-scope guard, Thread B calls clear_scope (marks the scope cleared and rebuilds _labels_by_entity), Thread A then appends a label into the freshly rebuilt dict — effectively inserting a label into a scope that was just cleared.
In the current topology model this is unlikely to trigger because clear_scope runs in post_process after all attach calls for that scope have finished. However, the class advertises itself as "safe for concurrent use across the topology's parallel schema workers" (docstring line 21), which makes this a latent correctness gap if threading assumptions change.
Suggested fix:
Either:
1. Use a single lock for both `_cleared_scopes` and `_labels_by_entity`, or
2. Re-check `_cleared_scopes` inside the `_scope_state_lock` block:

```python
with self._scope_state_lock:
    with self._run_state_lock:
        if scope_fqn in self._cleared_scopes:
            raise ScopeAlreadyClearedError(...)
    self._labels_by_entity.setdefault(entity_fqn, []).append(tag_label)
```
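For comparison, a sketch of option 1 under the same attribute names (illustrative, not the PR's code):

```python
import threading


class ScopeAlreadyClearedError(RuntimeError):
    """Raised when attaching into a scope that was already cleared."""


class SingleLockRegistry:
    """Option 1: one lock makes the cleared-check and the append atomic."""

    def __init__(self) -> None:
        self._state_lock = threading.Lock()
        self._cleared_scopes: set[str] = set()
        self._labels_by_entity: dict[str, list] = {}

    def attach(self, scope_fqn: str, entity_fqn: str, tag_label: object) -> None:
        with self._state_lock:
            if scope_fqn in self._cleared_scopes:
                raise ScopeAlreadyClearedError(scope_fqn)
            self._labels_by_entity.setdefault(entity_fqn, []).append(tag_label)

    def clear_scope(self, scope_fqn: str) -> None:
        with self._state_lock:
            prefix = scope_fqn + "."
            self._labels_by_entity = {
                k: v
                for k, v in self._labels_by_entity.items()
                if k != scope_fqn and not k.startswith(prefix)
            }
            self._cleared_scopes.add(scope_fqn)
```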
Code Review 👍 Approved with suggestions (1 resolved / 2 findings)

Introduces the TagRegistry domain layer to optimize ingestion memory usage and resolves missing tag drains, though a TOCTOU race condition in `attach` remains.

💡 Edge Case: TOCTOU race between `_run_state_lock` and `_scope_state_lock` in `attach`
📄 ingestion/src/metadata/domain/tags/registry.py:122-136 (details and suggested fix in the comment above)

✅ Resolved: Bug: `yield_database_tag` never drains registry — tags lost
```python
@cached_property
def tags_registry(self) -> TagRegistry:
    """Per-Source registry tracking tag/classification ingestion state."""
    return TagRegistry(metadata=self.metadata)


@cached_property
def tag_canonicalizer(self) -> TagCanonicalizer:
    """Per-Source canonicalizer for case-corrected tag/classification names."""
    return TagCanonicalizer(metadata=self.metadata)
```
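`cached_property` computes the value once per source instance on first access and reuses it afterwards, which is what makes the registry and canonicalizer per-Source singletons. A quick illustration:

```python
from functools import cached_property


class Source:
    @cached_property
    def tags_registry(self) -> dict:
        print("building registry")
        return {}


s = Source()
assert s.tags_registry is s.tags_registry  # "building registry" prints once
assert Source().tags_registry is not s.tags_registry  # fresh per instance
```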
```python
with self._scope_state_lock:
    self._labels_by_entity.setdefault(entity_fqn, []).append(tag_label)


with self._run_state_lock:
```
| """Skip tenacity's between-retry sleeps so retry-tests run instantly.""" | ||
| monkeypatch.setattr("tenacity.nap.time.sleep", lambda *_args, **_kwargs: None) |
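As a usage sketch, that `setattr` typically lives in an autouse pytest fixture so every retry test picks it up (the fixture name here is invented):

```python
import pytest


@pytest.fixture(autouse=True)
def no_retry_sleep(monkeypatch):
    """Skip tenacity's between-retry sleeps so retry tests run instantly."""
    monkeypatch.setattr("tenacity.nap.time.sleep", lambda *_args, **_kwargs: None)
```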
|


Fixes open-metadata/openmetadata-collate#3999
Describe your changes:
Why
The legacy `context.tags` list overloads two roles (sink-bound queue + per-entity inheritance lookup) and stores one Pydantic `TagLabel` per attachment. On a Snowflake schema with ~120k tag attachments this peaks at ~112 MB — disproportionate to the ~20 unique tags actually in play.

What

Adds `metadata.domain.tags` with two classes:
- `TagRegistry` — per-Source bookkeeping for tag/classification ingestion
- `TagCanonicalizer` — case-corrected name resolution against system-provider entities in OM

Migrates the Snowflake connector to use them. Other DB connectors continue on the legacy `context.tags` flow (strangler pattern; future PRs migrate per-connector).

How

- `TagRegistry` interns shared `TagLabel` instances by `(classification, tag, labelType, state)` — N attachments dereference to the same Pydantic graph.
- `clear_scope` rebinds `_labels_by_entity` to a fresh dict (not pop-in-place) so the hash-table bucket array releases immediately.

Type of change:
High-level design:
Architecture
`metadata.domain.tags` is a new domain layer (in-memory helpers operating on OM-generated types, with no I/O ownership beyond an injected `OpenMetadata` client for read-only queries). Two classes with separated concerns (a compact sketch of the layout follows):

- `TagRegistry` — three storage groups behind two locks. `_pending` queues sink-bound `OMetaTagAndClassification` payloads (deduped by `_known_tag_fqns`). `_labels_by_entity` maps entity FQN → list of shared `TagLabel` references. `_tag_label_cache` interns the underlying `TagLabel` objects.
- `TagCanonicalizer` — a separate concern: case-corrected lookup against system-provider classifications/tags via ES, with retry. Failures raise; callers wrap in `Either`.
- `DatabaseServiceSource` exposes both as `cached_property` and wires the `clear_schema_tag_scope`/`clear_database_tag_scope` topology hooks.
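A compact sketch of that storage layout (field names from the description above; types simplified, not the real class):

```python
import threading


class TagRegistrySketch:
    """Illustrative layout only; the real class lives in registry.py."""

    def __init__(self, metadata) -> None:
        self.metadata = metadata                          # read-only OM client
        self._run_state_lock = threading.Lock()           # guards _pending & _known_tag_fqns
        self._scope_state_lock = threading.Lock()         # guards _labels_by_entity
        self._pending: list = []                          # sink-bound payload queue
        self._known_tag_fqns: set[str] = set()            # dedup for _pending
        self._cleared_scopes: set[str] = set()            # scopes that reject attach()
        self._labels_by_entity: dict[str, list] = {}      # entity FQN -> shared labels
        self._tag_label_cache: dict[tuple, object] = {}   # interned TagLabel pool
```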
Files added

- `ingestion/src/metadata/domain/__init__.py`
- `ingestion/src/metadata/domain/tags/__init__.py`
- `ingestion/src/metadata/domain/tags/registry.py`
- `ingestion/src/metadata/domain/tags/canonicalizer.py`
- `ingestion/tests/unit/domain/tags/test_registry.py` (34 tests)
- `ingestion/tests/unit/domain/tags/test_canonicalizer.py` (13 tests)
Files updated

- `ingestion/src/metadata/ingestion/source/database/database_service.py` — registry/canonicalizer wiring + topology hooks
- `ingestion/src/metadata/ingestion/source/database/snowflake/metadata.py` — migrated `yield_tag` and `get_*_tag_labels` overrides
- `ingestion/setup.py` — added `tenacity>=8.0,<10` for canonicalizer retries

Tests:
Unit tests

- `test_registry.py` (attach/labels_for/drain/clear_scope/ensure_known/stats, thread safety, interning identity by `is`-equality, cache survives `clear_scope`, FQN-level dedup across `label_type` variants)
- `test_canonicalizer.py` (canonicalization, retry behavior, no cache poisoning on persistent failure)

Manual testing performed
UI screen recording / screenshots:
Not applicable.
Checklist:
Summary by Gitar
- Fixed a race in `clear_scope` by moving the `_cleared_scopes` update inside the state lock.
- Simplified `_build_pending_record` to handle empty strings for descriptions instead of optional checks.
- Improved the `Either` model typing using `Annotated` and `|` unions for better type checking.
- Updated `TopologyNode` to use `Annotated` for field definitions and `list[str] | None` for union types.
- Adjusted `DatabaseServiceSource` and `SnowflakeSource` using `cast` to satisfy type checkers.
- Populated `traceback.format_exc()` in `StackTraceError` objects during tag canonicalization.
- Rewrote tag-inheritance tests to drive real `TagRegistry` state rather than complex mocks, improving test reliability and clarity.

This will update automatically on new commits.