feat(assets): add namespaced model_type tags and align tag semantics #14511

Open
synap5e wants to merge 7 commits into master from synap5e/assets-namespaced-tags

@synap5e synap5e commented Jun 17, 2026


Introduce a namespaced model_type: tag and align the asset scanner to set it based on the registered model folder(s) the asset belongs to. Drop deep folder-based tag construction and placement logic. Update asset-upload behaviour to route on key system/known tags and faithfully store the rest without affecting on-disk location.

Note: This PR does not attempt to migrate assets tagged by the previous version of the asset system. This migration is feasible and, if desired, could land in a follow-up PR.

What this does

Reworks asset tags so they are no longer overloaded as model classification, upload routing, and filesystem subdirectory instructions at the same time.

This PR makes three related behavior changes:

  1. Backend model classification becomes explicit and namespaced. Files under registered model folders are tagged as models + model_type:<folder_name> instead of flat labels such as models + checkpoints or models + loras.
  2. Caller-provided tags stay relaxed labels. Callers may store arbitrary tags, including system-looking strings, and tags are no longer lowercased before persistence. A caller label by itself does not prove a file is under a model folder or make it loadable by a model loader.
  3. Multipart upload placement stops using extra tags as path components. New byte uploads use exactly one destination role (input, output, or models), model uploads use exactly one model_type:<folder_name>, and nested placement uses the explicit subfolder field. Extra tags are labels only.

The new backend-generated model classification shape is:

  • models: the asset has a real filesystem path under at least one allowed registered Comfy model folder.
  • model_type:<folder_name>: the asset path is under the registered model folder named <folder_name>, preserving the registered folder name casing.

Examples: models + model_type:checkpoints, models + model_type:loras, models + model_type:LLM.

Backend-generated classification is added only from trusted filesystem facts, such as a scanner-discovered path, an already-written /upload/image file, or the final destination of a multipart byte upload. Tag filters still operate on the single persisted tag set, so once a tag exists it is filterable regardless of whether it originated from a caller or from backend classification.
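
As a rough illustration of the classification shape described above, here is a minimal sketch. It is not the code in this PR: `classify_path`, `EXCLUDED_MODEL_FOLDERS`, and the plain-dict inputs are hypothetical stand-ins for Comfy's registered roots and model folders, and containment is checked with `pathlib` only.

```python
from pathlib import Path

# Hypothetical: registered folder names that must never count as model folders.
EXCLUDED_MODEL_FOLDERS = {"configs", "custom_nodes"}


def classify_path(path, roots, model_folders):
    """Return backend tags derived only from where `path` lives on disk.

    roots:         {"input": "/abs/dir", "output": "/abs/dir", ...}
    model_folders: {"checkpoints": ["/abs/dir", ...], "LLM": [...], ...}
    """
    resolved = Path(path).resolve()
    tags = []

    # Root membership: every root that contains the path contributes a tag.
    for role, root in roots.items():
        if resolved.is_relative_to(Path(root).resolve()):
            tags.append(role)

    # Model-folder membership: one namespaced tag per containing folder name,
    # keeping the registered casing of the folder name.
    model_types = [
        f"model_type:{name}"
        for name, dirs in model_folders.items()
        if name not in EXCLUDED_MODEL_FOLDERS
        and any(resolved.is_relative_to(Path(d).resolve()) for d in dirs)
    ]
    if model_types:
        tags.append("models")
        tags.extend(model_types)
    return tags
```

With a registered `LLM` folder that happens to sit under the output root, a file inside it would get `output`, `models`, and `model_type:LLM`, while a file at `output/foo/bar.png` would get only `output`.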

Motivation

The old asset tag behavior was not just underspecified; it produced wrong or misleading results for real Comfy model configurations.

  1. Extra model paths were flattened into ambiguous path labels. Comfy model folders are defined by folder_paths.folder_names_and_paths, and a single model type can have registered paths outside the stock <models>/<folder> layout. The old path-derived tags treated the registered folder name and parent path fragments as ordinary flat tags such as checkpoints, loras, or arbitrary subfolder names. That meant the API did not have a stable way to say “this asset belongs to the registered checkpoints model type” versus “this asset merely has a caller/path label named checkpoints.” Namespaced tags make the model type explicit: models + model_type:<folder_name>.

  2. Registered non-model folders could be treated like model upload destinations. folder_names_and_paths also contains entries that are not safe model upload targets. In particular, custom_nodes and configs should not become places where asset upload requests can write user bytes just because they are registered folder names. This PR excludes those names from model classification/upload destination resolution.

  3. Overlapping roots made one flat classification insufficient. More than one backend fact can hold for a single path, for example when a registered model folder sits under the output root. The old “choose one root and then emit path fragments” shape could lose part of the truth or encode it as misleading directory labels. This PR lets backend classification keep each trusted fact as tags, e.g. output + models + model_type:* when both root and model-folder membership are true.

  4. Upload routing and tag labels were conflated. Multipart upload tags previously doubled as routing, model category, and subdirectory path components. That made models + checkpoints + foo + bar both a classifier and a filesystem placement instruction. This PR keeps routing narrow: new byte uploads use exactly one destination role (input, models, or output), model uploads use exactly one model_type:<folder_name>, and all other tags remain labels.

This PR also changes the location semantics for new multipart /api/assets byte uploads. Previously, ordered tags described the destination root, model category, and nested filesystem subdirectories: for example, input + foo wrote under input/foo, and models + checkpoints + foo + bar wrote under a checkpoints folder plus foo/bar. In this PR, multipart upload location is selected by explicit placement fields: one destination role (input, output, or models), for model uploads exactly one model_type:<folder_name>, and an optional subfolder form field for nested placement. Extra tags are labels only and do not create subdirectories.
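
The placement rules above could be sketched roughly as follows. This is an illustration under assumptions, not the upload code in this PR: `resolve_upload_placement` and its constants are hypothetical names, and the rejection of absolute or `..` subfolders is an assumed safety check rather than documented behaviour.

```python
from pathlib import PurePosixPath

DESTINATION_ROLES = ("input", "output", "models")
MODEL_TYPE_PREFIX = "model_type:"
REJECTED_MODEL_TYPES = {"configs", "custom_nodes"}


def resolve_upload_placement(tags, subfolder=""):
    """Return (role, model_type, subfolder) for a new multipart byte upload.

    Raises ValueError when the tags do not name exactly one destination.
    Tags that are not used for placement are simply ignored here; they
    would be stored as labels.
    """
    roles = [t for t in tags if t in DESTINATION_ROLES]
    if len(roles) != 1:
        raise ValueError("expected exactly one of: input, output, models")
    role = roles[0]

    model_type = None
    if role == "models":
        model_types = [
            t[len(MODEL_TYPE_PREFIX):] for t in tags if t.startswith(MODEL_TYPE_PREFIX)
        ]
        if len(model_types) != 1:
            raise ValueError("model uploads need exactly one model_type:<folder_name>")
        model_type = model_types[0]
        if model_type in REJECTED_MODEL_TYPES:
            raise ValueError(f"{model_type} is not a model upload destination")

    # Assumed check: nested placement must stay inside the destination.
    sub = PurePosixPath(subfolder)
    if sub.is_absolute() or ".." in sub.parts:
        raise ValueError("subfolder must be relative and must not contain '..'")

    return role, model_type, sub.as_posix() if subfolder else ""
```

Under this sketch, `models + model_type:checkpoints + foo + bar` with no subfolder resolves to the checkpoints folder root, and `foo/bar` only becomes a directory when passed through `subfolder`.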

That location change is specific to multipart /api/assets byte uploads. /upload/image keeps its existing placement semantics: it still uses type and subfolder form fields to choose where the image is written. Known /upload/image semantic subfolders such as pasted, painter, webcam, threed, and 3d are mirrored as asset tags when the already-written file is registered.

The resulting model is intentionally simple: tag filters operate on persisted labels, while backend-generated model classification is added only from trusted filesystem facts from scan/register/upload placement.

Examples

  • File under a registered checkpoint folder:

    • tags: models, model_type:checkpoints
    • matches include_tags=models,model_type:checkpoints
    • is filesystem-loadable by checkpoint loaders if the file is valid and the model-list cache sees it
  • File under a case-preserving registered model folder named LLM:

    • tags: models, model_type:LLM
    • model_type:LLM and model_type:llm are distinct tags
  • Multipart byte upload with tags models + model_type:checkpoints:

    • writes to the first allowed registered checkpoints folder
    • persists uploaded, models, and model_type:checkpoints
  • Multipart byte upload with tags input + model_type:checkpoints:

    • writes to input
    • stores model_type:checkpoints as a label
    • does not become checkpoint-loadable because the file is not written under a registered checkpoints folder
  • Multipart byte upload with tags models + model_type:checkpoints + foo + bar and no subfolder:

    • writes at the selected checkpoints folder root using the digest filename
    • stores foo and bar as labels
    • does not create foo/bar subdirectories
  • Multipart byte upload with tags models + model_type:checkpoints and subfolder=foo/bar:

    • writes under the selected checkpoints folder plus foo/bar/<digest><ext>
    • nested placement comes from the explicit subfolder field, not from tags
  • /api/assets/from-hash with tags models + model_type:checkpoints:

    • creates a reference-only asset label set when a new reference is created
    • does not move bytes or synthesize filesystem classification
  • /upload/image with type=input&subfolder=pasted:

    • still writes to input/pasted/<filename> using the existing image-upload fields
    • asset registration adds path-derived input, uploaded, and the known semantic pasted tag
    • multipart asset upload destination-tag rules do not apply to this endpoint

Changes

  • Preserve tag casing in normalization instead of lowercasing every tag.
  • Add a database migration that removes the lowercase CHECK constraint from tags, with a downgrade path that lowercases/merges mixed-case tags before restoring the old constraint.
  • Add trusted path classification for backend-generated tags:
    • input
    • output
    • temp
    • models
    • model_type:<folder_name>
  • Stop deriving model classification from caller labels alone.
  • Stop generating flat model/path labels such as checkpoints, loras, or arbitrary parent directory names from model paths.
  • Resolve multipart byte-upload destinations from exactly one destination role: input, models, or output.
  • Require exactly one model_type:<folder_name> tag for new multipart model uploads.
  • Add explicit multipart subfolder support for nested placement.
  • Treat extra multipart upload tags as labels, not filesystem subdirectories.
  • Reject configs and custom_nodes as model upload destinations.
  • Add uploaded to successful byte upload registrations, including already-written /upload/image registrations.
  • Mirror known /upload/image subfolders (pasted, painter, webcam, threed, 3d) as tags when registering those files as assets.
  • Keep known-hash and duplicate-byte multipart paths reference-only once the content hash is already known: they do not require destination tags, move bytes, synthesize uploaded, or copy path-derived classification.
  • Leave public file_path / display_name response locator fields to #14005 ("feat(assets) Add asset file_path and display_name response fields"), which can build on these tag and upload semantics.
  • Advertise supports_model_type_tags: true in server feature flags for frontend capability detection.
  • Make tag prefix querying stop lowercasing the requested prefix.
  • Add focused tests for case-preserving tags, namespaced model filters, upload destination resolution, rejected model destinations, reference-only fast paths, and path classification.
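
To make the case-preserving normalization concrete, here is a small sketch. The helper name `normalize_tags_sketch` is hypothetical, and the trimming and exact-duplicate removal are assumptions about what normalization does besides no longer lowercasing.

```python
def normalize_tags_sketch(tags):
    """Trim whitespace, drop empty and exactly duplicated tags, keep casing.

    Order of first appearance is preserved. Tags that differ only by case
    are treated as distinct and are both kept.
    """
    seen = set()
    normalized = []
    for tag in tags:
        cleaned = tag.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            normalized.append(cleaned)
    return normalized
```

The point of the sketch is the missing `.lower()` call: `model_type:LLM` and `model_type:llm` survive as two separate tags.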

Behavior / compatibility notes

  • Tags are still a single public label set. A persisted models tag matches include_tags=models regardless of whether it was supplied by a caller or generated from a trusted path.
  • This PR keeps generic tag mutation permissive, so arbitrary caller labels still do not by themselves prove filesystem placement. The stacked retagging fix in #14561 ("fix(assets): move model asset on model_type: edit to stay loader-coherent") builds on this contract: when the frontend edit-type flow applies a registered model_type:<folder_name> to a filesystem-backed model asset, that follow-up moves/re-registers the file so the label and loader-visible location stay coherent.
  • New scanner/upload classifications no longer add arbitrary path subfolder labels. For example, output/foo/bar.png is classified as output, not output + foo.
  • Multipart asset upload no longer uses extra tags as nested subdirectories. Nested placement is supported through the explicit subfolder form field.
  • /upload/image placement is not changed: it continues to use its type and subfolder form fields rather than asset tags. Known image-upload subfolder names are also stored as tags for filtering.
  • Existing flat tags such as checkpoints are not migrated into model_type:checkpoints; clients that need transition compatibility should handle both shapes.
  • Overlapping registered roots can produce both root tags and model tags, e.g. output + models + model_type:*; persisted tags are not dynamically recomputed at response time.
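
For clients that need to handle both tag shapes during the transition, one possible approach is sketched below. `model_types_from_tags` and `KNOWN_FLAT_MODEL_TAGS` are hypothetical; the set of legacy flat labels a client recognizes is its own choice and is not defined by this PR.

```python
MODEL_TYPE_PREFIX = "model_type:"

# Hypothetical: legacy flat labels this particular client still understands.
KNOWN_FLAT_MODEL_TAGS = {"checkpoints", "loras"}


def model_types_from_tags(tags):
    """Return model folder names implied by an asset's tags.

    Prefers namespaced model_type:<folder_name> tags. Falls back to known
    flat labels only when the asset carries `models` and no namespaced tag.
    """
    namespaced = [
        t[len(MODEL_TYPE_PREFIX):] for t in tags if t.startswith(MODEL_TYPE_PREFIX)
    ]
    if namespaced:
        return namespaced
    if "models" in tags:
        return [t for t in tags if t in KNOWN_FLAT_MODEL_TAGS]
    return []
```

Because caller labels are permissive, a result from this helper is a hint for filtering and display, not proof that the file is loadable from that model folder.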

Verification

  • python3 -m py_compile on touched Python files and the migration.
  • uv run --with pytest --with sqlalchemy --with pydantic --with aiohttp --with requests --with filelock --with pyyaml --with pydantic-settings --with pillow --with blake3 pytest -q tests-unit/assets_test/services/test_ingest.py tests-unit/assets_test/services/test_path_utils.py tests-unit/assets_test/queries/test_asset_info.py::TestListReferencesPage
    • 39 passed
  • uv run --with pytest --with sqlalchemy --with pydantic --with aiohttp --with requests --with filelock --with pyyaml --with pydantic-settings --with pillow --with blake3 pytest -q tests-unit/assets_test/services/test_path_utils.py
    • 18 passed
  • uv run --with-requirements requirements.txt --with pytest pytest -q tests-unit/assets_test/test_uploads.py
    • 31 passed
  • uv run --with-requirements requirements.txt --with-requirements tests-unit/requirements.txt pytest -q tests-unit --tb=short
    • 902 passed, 10 skipped, 9 warnings
  • uv run --with-requirements requirements.txt --with-requirements tests-unit/requirements.txt pytest -q tests-unit/feature_flags_test.py tests-unit/websocket_feature_flags_test.py --tb=short
    • 23 passed
  • Migration smoke test against a temporary SQLite database:
    • upgrade head accepts mixed-case tags such as NewTag and model_type:LLM
    • downgrade to 0004_drop_tag_type restores the lowercase constraint after normalizing mixed-case tags
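
The upgrade half of that smoke test can be approximated with the standard-library `sqlite3` module. This is a standalone sketch, not the Alembic migration: the single-column `tags` schema, the constraint name `ck_tags_lowercase`, and the `name = lower(name)` form of the check are assumptions made for the example.

```python
import sqlite3


def try_insert(conn, name):
    """Return True if the tag name is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO tags(name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")

# Old shape (assumed): tag names must already be lowercase.
conn.execute(
    "CREATE TABLE tags ("
    "name VARCHAR(512) NOT NULL, "
    "CONSTRAINT pk_tags PRIMARY KEY (name), "
    "CONSTRAINT ck_tags_lowercase CHECK (name = lower(name))"
    ")"
)
accepted_before = try_insert(conn, "model_type:LLM")  # rejected by the CHECK
try_insert(conn, "models")

# SQLite cannot drop a CHECK constraint in place, so rebuild the table
# without it and copy the existing rows across.
conn.execute(
    "CREATE TABLE tags_new ("
    "name VARCHAR(512) NOT NULL, "
    "CONSTRAINT pk_tags PRIMARY KEY (name)"
    ")"
)
conn.execute("INSERT INTO tags_new(name) SELECT name FROM tags")
conn.execute("DROP TABLE tags")
conn.execute("ALTER TABLE tags_new RENAME TO tags")

accepted_after = try_insert(conn, "model_type:LLM")  # accepted after the rebuild
accepted_mixed = try_insert(conn, "NewTag")
remaining = sorted(row[0] for row in conn.execute("SELECT name FROM tags"))
```

After the rebuild, mixed-case names are accepted and the previously stored lowercase row is still present.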

@synap5e synap5e changed the title from "Add namespaced model type asset tags" to "feat(assets): Add namespaced model_type: tags" on Jun 19, 2026
@synap5e synap5e changed the title from "feat(assets): Add namespaced model_type: tags" to "feat(assets): Add namespaced model_type: tags and align tags semantics" on Jun 20, 2026
@synap5e synap5e changed the title from "feat(assets): Add namespaced model_type: tags and align tags semantics" to "feat(assets): add namespaced model_type tags and align tag semantics" on Jun 20, 2026
@synap5e synap5e force-pushed the synap5e/assets-namespaced-tags branch from 4a757fc to 4340337 on June 20, 2026 01:28
@synap5e synap5e marked this pull request as ready for review on June 20, 2026 01:31

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4340337c69


Comment thread app/assets/database/queries/tags.py Outdated

coderabbitai Bot commented Jun 20, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 3fce7d26-6efd-4a1a-bf68-5fd480ca69d6

📥 Commits

Reviewing files that changed from the base of the PR and between 4340337 and de997d2.

📒 Files selected for processing (5)
  • alembic_db/versions/0005_allow_case_sensitive_tags.py
  • app/assets/database/queries/tags.py
  • tests-unit/app_test/test_migrations.py
  • tests-unit/assets_test/queries/test_tags.py
  • tests-unit/assets_test/test_prune_orphaned_assets.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests-unit/assets_test/test_prune_orphaned_assets.py

📝 Walkthrough


This PR makes asset tags case-sensitive throughout the stack. A new Alembic migration (0005_allow_case_sensitive_tags) removes the CHECK (name = lower(name)) constraint from the tags table with a safe downgrade path that collapses existing mixed-case data. All Pydantic validators, query helpers, and normalize_tags drop their .lower() calls, preserving tag casing end-to-end. Model category tags migrate from bare names (e.g., checkpoints) to a namespaced model_type:<folder_name> scheme. The upload pipeline gains an explicit subfolder parameter replacing tag-embedded path components, and get_backend_system_tags_from_path derives trusted system tags from filesystem containment rather than relative path segments. A supports_model_type_tags feature flag is added to SERVER_FEATURE_FLAGS and the OpenAPI spec. Tests throughout are updated to the new tag naming and storage layout conventions.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 24.58% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat(assets): add namespaced model_type tags and align tag semantics' clearly and concisely captures the main change (introducing namespaced model_type tags and restructuring tag semantics), which aligns with the primary objective of the PR. |
| Description check | ✅ Passed | The description comprehensively explains the PR's rationale, behavior changes, implementation details, and examples. It directly relates to the changeset and provides meaningful context about what the PR accomplishes and why. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests-unit/assets_test/test_prune_orphaned_assets.py (1)

32-40: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Scope is no longer applied in find_asset, making these assertions cross-test ambiguous.

Line 32 drops scope-based narrowing, but callers still pass scope and use generic names (input.bin / output.bin). That can match unrelated assets and produce false pass/fail outcomes.

Suggested hardening
 def find_asset(http: requests.Session, api_base: str):
     """Query API for assets matching scope and optional name."""
-    def _find(scope: str, name: str | None = None) -> list[dict]:
+    def _find(_scope: str, name: str | None = None) -> list[dict]:
         params = {"limit": "500"}
         if name:
             params["name_contains"] = name
@@
 def test_prune_across_multiple_roots(
@@
     scope = f"multi-{uuid.uuid4().hex[:6]}"
-    input_fp = create_seed_file("input", scope, "input.bin")
-    create_seed_file("output", scope, "output.bin")
+    input_name = f"{scope}-input.bin"
+    output_name = f"{scope}-output.bin"
+    input_fp = create_seed_file("input", scope, input_name)
+    create_seed_file("output", scope, output_name)

     trigger_sync_seed_assets(http, api_base)
-    assert find_asset(scope, input_fp.name)
-    assert find_asset(scope, "output.bin")
+    assert find_asset(scope, input_name)
+    assert find_asset(scope, output_name)
@@
-    assert not find_asset(scope, input_fp.name)
-    assert find_asset(scope, "output.bin")
+    assert not find_asset(scope, input_name)
+    assert find_asset(scope, output_name)

Also applies to: 115-122

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests-unit/assets_test/test_prune_orphaned_assets.py` around lines 32 - 40,
The find_asset function no longer applies scope-based filtering when retrieving
assets, which causes test assertions to be ambiguous when multiple tests use
generic asset names like input.bin or output.bin. Modify the find_asset function
to accept and apply scope as a query parameter in the API request (similar to
how name_contains is currently handled), ensuring that the returned assets are
filtered by both scope and name to prevent false matches across different test
scopes.
🧹 Nitpick comments (1)
alembic_db/versions/0005_allow_case_sensitive_tags.py (1)

23-33: ⚡ Quick win

SQLite foreign key handling may leave FKs disabled on error.

If any statement between PRAGMA foreign_keys=OFF (line 23) and PRAGMA foreign_keys=ON (line 33) raises an exception, foreign keys remain disabled for the connection. Consider wrapping in a try/finally or using a transaction to ensure cleanup.

🛡️ Suggested safer pattern
     if bind.dialect.name == "sqlite":
-        op.execute("PRAGMA foreign_keys=OFF")
-        op.execute(
-            "CREATE TABLE tags_new ("
-            "name VARCHAR(512) NOT NULL, "
-            "CONSTRAINT pk_tags PRIMARY KEY (name)"
-            ")"
-        )
-        op.execute("INSERT INTO tags_new(name) SELECT name FROM tags")
-        op.execute("DROP TABLE tags")
-        op.execute("ALTER TABLE tags_new RENAME TO tags")
-        op.execute("PRAGMA foreign_keys=ON")
-        return
+        try:
+            op.execute("PRAGMA foreign_keys=OFF")
+            op.execute(
+                "CREATE TABLE tags_new ("
+                "name VARCHAR(512) NOT NULL, "
+                "CONSTRAINT pk_tags PRIMARY KEY (name)"
+                ")"
+            )
+            op.execute("INSERT INTO tags_new(name) SELECT name FROM tags")
+            op.execute("DROP TABLE tags")
+            op.execute("ALTER TABLE tags_new RENAME TO tags")
+        finally:
+            op.execute("PRAGMA foreign_keys=ON")
+        return
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@alembic_db/versions/0005_allow_case_sensitive_tags.py` around lines 23 - 33,
The migration code disables foreign keys at the start with `PRAGMA
foreign_keys=OFF` but if any of the subsequent `op.execute()` calls (such as
CREATE TABLE tags_new, INSERT INTO tags_new, DROP TABLE tags, or ALTER TABLE
tags_new RENAME) raise an exception, the final `PRAGMA foreign_keys=ON`
statement will not execute, leaving foreign keys permanently disabled for the
connection. Wrap the middle statements (the table creation, data insertion, and
table swap operations) in a try/finally block to ensure that `PRAGMA
foreign_keys=ON` is always executed regardless of whether an error occurs during
the migration steps.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@alembic_db/versions/0005_allow_case_sensitive_tags.py`:
- Around line 42-51: The downgrade path uses SQLite-specific SQL syntax that
will fail on PostgreSQL and MySQL databases. The operations in the downgrade
function including INSERT OR IGNORE and rowid references are not
database-agnostic. Wrap these SQLite-specific op.execute calls (the INSERT OR
IGNORE statement and the DELETE with rowid usage) in a conditional check for
SQLite dialect similar to the dialect check pattern used elsewhere in the
migration file, or rewrite the operations using portable SQL syntax that handles
both INSERT OR IGNORE (for SQLite), INSERT ... ON CONFLICT DO NOTHING (for
PostgreSQL), and INSERT IGNORE (for MySQL), and replace rowid references with
database-appropriate alternatives like ctid for PostgreSQL or proper primary key
handling.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c1776315-8449-45c8-acf4-3f40bb5ffd5b

📥 Commits

Reviewing files that changed from the base of the PR and between e00b556 and 4340337.

📒 Files selected for processing (26)
  • alembic_db/versions/0005_allow_case_sensitive_tags.py
  • app/assets/api/routes.py
  • app/assets/api/schemas_in.py
  • app/assets/api/upload.py
  • app/assets/database/queries/tags.py
  • app/assets/helpers.py
  • app/assets/services/ingest.py
  • app/assets/services/path_utils.py
  • comfy_api/feature_flags.py
  • openapi.yaml
  • server.py
  • tests-unit/assets_test/conftest.py
  • tests-unit/assets_test/queries/test_asset_info.py
  • tests-unit/assets_test/queries/test_tags.py
  • tests-unit/assets_test/services/test_path_utils.py
  • tests-unit/assets_test/test_assets_missing_sync.py
  • tests-unit/assets_test/test_crud.py
  • tests-unit/assets_test/test_downloads.py
  • tests-unit/assets_test/test_list_cursor.py
  • tests-unit/assets_test/test_list_filter.py
  • tests-unit/assets_test/test_metadata_filters.py
  • tests-unit/assets_test/test_prune_orphaned_assets.py
  • tests-unit/assets_test/test_tags_api.py
  • tests-unit/assets_test/test_uploads.py
  • tests-unit/feature_flags_test.py
  • tests-unit/websocket_feature_flags_test.py

Comment thread alembic_db/versions/0005_allow_case_sensitive_tags.py Outdated