Feature: Add DockerImage classes for docker.io, nvcr.io, and quay.io#697

Open
georgiastuart wants to merge 19 commits into
singularityhub:mainfrom
georgiastuart:feature/add-metadata-fetch-override

Conversation

@georgiastuart
Contributor

@georgiastuart georgiastuart commented May 10, 2026

This is intended to resolve singularityhub/shpc-registry#480 and singularityhub/shpc-registry#481. The driving motivations are rate limiting from Docker Hub and occasional odd output from NGC containers. The updated registry files in singularityhub/shpc-registry#483 use this singularity-hpc PR. This is a moderately sized PR, so I'm anticipating a couple of rounds of iteration after the PR tests run.

Subclass Design

DockerHubImage, QuayDockerImage, and NGCImage subclass DockerImage and override the tags and digest methods. There may be a more elegant way to simplify this, but each API has bespoke needs, so I didn't try to unify further. All three query the registry metadata APIs rather than using crane.

NGC Compatibility

Querying the NGC API requires an API token. These are free to get for this purpose, but I kept ngcsdk as an optional dependency. If ngcsdk is installed and the environment variable SHPC_NGC_API_KEY is set, shpc will use ngcsdk to update. Otherwise, it will fall back to crane and the normal DockerImage class.
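
A minimal sketch of the fallback decision described above, assuming the check is made once when the image class is chosen. The helper name `select_ngc_backend`, its parameters, and the returned labels are hypothetical and are not taken from this PR; only `ngcsdk` and `SHPC_NGC_API_KEY` come from the description.

```python
import importlib.util
import os


def select_ngc_backend(env=None, module_name="ngcsdk"):
    """Return "ngcsdk" when the optional SDK and the API key are both present.

    Hypothetical helper for illustration only. Any other combination falls
    back to "crane", mirroring the behavior described in the PR text.
    """
    env = os.environ if env is None else env
    # find_spec returns None when a top-level module is not installed.
    has_sdk = importlib.util.find_spec(module_name) is not None
    has_key = bool(env.get("SHPC_NGC_API_KEY"))
    return "ngcsdk" if has_sdk and has_key else "crane"
```

Checking for the module without importing it keeps the optional dependency out of the import path for users who never touch NGC containers.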

Added CLI arguments for update

While cleaning up some mangled container.yaml files, I added two CLI arguments to update:

--max-tags=N: overrides the default of 5 tags to add.
--purge: wipes the existing tags in the container.yaml file to start fresh.
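
For illustration, the two flags could be declared with `argparse` roughly as below. This is a standalone sketch, not the parser from this PR: the function names `build_update_parser` and `starting_tags` are hypothetical, and the real `shpc update` command is wired into a larger subcommand parser.

```python
import argparse


def build_update_parser():
    """Build a standalone parser carrying only the two new flags (hypothetical)."""
    parser = argparse.ArgumentParser(prog="shpc update")
    parser.add_argument(
        "--max-tags",
        dest="max_tags",
        type=int,
        default=5,
        help="maximum number of tags to add (default: 5)",
    )
    parser.add_argument(
        "--purge",
        action="store_true",
        default=False,
        help="discard the existing tags in container.yaml before updating",
    )
    return parser


def starting_tags(existing, purge):
    """Return the tag mapping an update would start from (assumed semantics)."""
    return {} if purge else dict(existing)
```

With this sketch, `build_update_parser().parse_args(["--max-tags=10", "--purge"])` yields `max_tags=10` and `purge=True`.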

Signed-off-by: Georgia Stuart <georgia.stuart@gmail.com>
@georgiastuart
Contributor Author

Shockingly, the tests passed the first time. Ready for review @vsoch !

Member

@vsoch vsoch left a comment

This looks great! A few comments, and then we should have a test for the registry classes. We will also want a strategy to clean up the current registry repository.

Comment thread shpc/main/container/update/docker.py Outdated

if len(tags) == 0:
    logger.error(
        f"The tag {tag} you provided is not known. Check that it and the container both exist."
Member

Suggested for here would be to raise the ValueError with the message currently given to logger.error.
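
One way to read this suggestion, sketched with a hypothetical helper `require_tags` (the name and signature are assumptions, not code from the PR): the message moves from `logger.error` into the exception itself, so callers see the reason without a separate log line.

```python
def require_tags(tags, tag):
    """Raise a descriptive ValueError when no tags matched (hypothetical helper)."""
    if not tags:
        raise ValueError(
            f"The tag {tag} you provided is not known. "
            "Check that it and the container both exist."
        )
    return tags
```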

Contributor Author

Refactored error handling a bit, but it still seems messy. Going to take a fresh look at it tomorrow!

    )
    raise ValueError
new_tags = [x for x in tags if x.get("name")]
tags.extend(new_tags)
Member

Sanity check here that we will not have duplicates in this list. And what about order? I think different registries may sort alphabetically vs. by date. If the user asks for some specific number of tags, we likely want to give them the latest.
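
A rough sketch of that check, assuming each tag record is a dict with a `name` and an ISO 8601 `last_updated` timestamp (the field names follow the Docker Hub tag listing, but other registries may differ). The function name `latest_unique_tags` is hypothetical.

```python
from datetime import datetime


def latest_unique_tags(tags, max_tags=5):
    """Drop unnamed and duplicate records, then return the newest max_tags."""
    newest = {}
    for tag in tags:
        name = tag.get("name")
        if not name:
            continue
        # Normalize a trailing "Z" so fromisoformat accepts it on older Pythons.
        updated = datetime.fromisoformat(tag["last_updated"].replace("Z", "+00:00"))
        # Keep only the most recently updated record for each tag name.
        if name not in newest or updated > newest[name][0]:
            newest[name] = (updated, tag)
    ordered = sorted(newest.values(), key=lambda pair: pair[0], reverse=True)
    return [tag for _, tag in ordered[:max_tags]]
```

Sorting explicitly by date means the result does not depend on whether a given registry returns tags alphabetically or by update time.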

Contributor Author

Question about this - I mainly took inspiration from the previous tag_quay method where it pulls all the tags and feeds it into the filtering, value extraction, and sorting functions in the update main function. DockerHub does appear to reliably return tags in order of latest update. Given that this class can handle a rate limit, is it better to continue pulling all the tags, or to cap it at the first X tags? I'm guessing we could comfortably restrict it to 5 pages or so and still be able to get the N most recent tags for even tag-spammy projects.

Member

We should likely be conservative and cap at some X tags, especially if you are hitting a limit with requests.
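
A capped pagination loop along these lines might look like the sketch below. It assumes a caller-supplied `fetch_page(page_number)` that returns `(records, has_next)`; that callable, the function name `collect_tags`, and the default of 5 pages are illustrative assumptions rather than code from this PR.

```python
def collect_tags(fetch_page, max_pages=5, max_tags=None):
    """Collect tag records page by page, stopping at a page or tag cap.

    fetch_page is a hypothetical callable: fetch_page(page) -> (records, has_next).
    """
    collected = []
    for page in range(1, max_pages + 1):
        records, has_next = fetch_page(page)
        collected.extend(records)
        # Stop early once enough tags are in hand to avoid extra requests.
        if max_tags is not None and len(collected) >= max_tags:
            return collected[:max_tags]
        if not has_next:
            break
    return collected
```

Because the cap bounds the number of requests per container, it also bounds how quickly an update run can consume a registry's rate limit.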

Comment thread shpc/main/container/update/docker.py Outdated
break
return tag_responses

def tags(self):
Member

You are going in the right direction by using DockerHubImage for NGC, and it looks like there is still some redundancy in these classes (e.g., the tags filter for the sbom and similar). The main difference tends to be in the query tag API function. Could we put more logic in the DockerHubImage base class that the others inherit from with the shared logic?

Contributor Author

Refactored to minimize redundancy. All class specialization now occurs in _query_tag_api. I marked the manifest and config methods to raise NotImplementedError in the subclasses. It appears that they aren't currently used elsewhere. I intend to implement, but throwing the error in for now just in case!
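
The shape of that refactor can be sketched as a template-method pattern: shared filtering lives in the base class and each registry overrides only `_query_tag_api`. The class names `RemoteImage` and `StaticImage`, the artifact suffixes being filtered, and the record layout are assumptions for illustration; only `_query_tag_api`, `tags`, `manifest`, and `config` are names mentioned in this thread.

```python
class RemoteImage:
    """Hypothetical base class holding the logic shared by all registries."""

    # Assumed suffixes of non-image artifacts (e.g. SBOMs, signatures) to skip.
    ARTIFACT_SUFFIXES = (".sbom", ".sig", ".att")

    def __init__(self, container_name):
        self.container_name = container_name

    def _query_tag_api(self):
        """Return raw tag records; the only method a subclass must provide."""
        raise NotImplementedError

    def tags(self):
        """Return unique tag names in API order, skipping artifact tags."""
        seen = set()
        names = []
        for record in self._query_tag_api():
            name = record.get("name")
            if not name or name in seen or name.endswith(self.ARTIFACT_SUFFIXES):
                continue
            seen.add(name)
            names.append(name)
        return names

    def manifest(self, tag):
        raise NotImplementedError("manifest is not implemented for this image type")

    def config(self, tag):
        raise NotImplementedError("config is not implemented for this image type")


class StaticImage(RemoteImage):
    """Toy subclass that serves canned records instead of calling a registry."""

    def __init__(self, container_name, records):
        super().__init__(container_name)
        self._records = records

    def _query_tag_api(self):
        return list(self._records)
```

A canned subclass like `StaticImage` is also a convenient seam for the tests requested in the review, since it exercises the shared filtering without network access.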

max_length (int) : the max number to return (latest)
"""

filtered_tags = [x for x in tags]
Member

Sanity check - order and repeats?

Comment thread shpc/version.py
__copyright__ = "Copyright 2021-2025, Vanessa Sochat"
__license__ = "MPL 2.0"

__version__ = "0.1.33"
Member

Let's bump up to 0.2.0 for this merge. If you have other changes / PRs to get in we can encompass them with that release too.

Contributor Author

Great, can do. I want to add gitlab.com and self-hosted GitLab support before merge. I'm also looking at a strategy to handle multiple "series" of tags, like "X.X.X-alpine" vs "X.X.X-trixie" or whatever.
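
One possible starting point for the "series" idea, purely as a sketch: split each tag into a numeric version and an optional suffix, then group by suffix so each series can be sorted on its own. The pattern, the function name `group_by_series`, and the decision to skip tags without a numeric version are all assumptions, not a design from this PR.

```python
import re
from collections import defaultdict

# Assumed tag shape: dotted numeric version, optionally followed by "-<series>".
TAG_PATTERN = re.compile(r"^(?P<version>\d+(?:\.\d+)*)(?:-(?P<series>.+))?$")


def group_by_series(tag_names):
    """Group tag names by suffix and sort each group newest version first."""
    groups = defaultdict(list)
    for name in tag_names:
        match = TAG_PATTERN.match(name)
        if match is None:
            continue  # tags such as "latest" carry no version to compare
        version = tuple(int(part) for part in match.group("version").split("."))
        groups[match.group("series") or ""].append((version, name))
    return {
        series: [name for _, name in sorted(items, reverse=True)]
        for series, items in groups.items()
    }
```

Comparing versions as integer tuples keeps "1.10.0" ahead of "1.2.0", which a plain string sort would get wrong.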

Member

Good thinking. 👍

@georgiastuart
Contributor Author

Thanks for the review!! I'll implement tests and the other suggestions (and check out the 2 line digest printing).

Signed-off-by: Georgia Stuart <georgia.stuart@gmail.com>
@georgiastuart
Contributor Author

To do before merge:

  • Add gitlab.com / self hosted support
  • Add tests for remote image types & error handling
  • (If feasible) look at "tag series" refactor
  • Force tag: digest yaml to print on one line

Signed-off-by: Georgia Stuart <georgia.stuart@gmail.com>
@muffato
Contributor

muffato commented May 11, 2026

Hi @georgiastuart. I'd be happy to help with "Add gitlab.com / self hosted support". We started working on it with @vsoch in #585 but never completed it.

@vsoch
Member

vsoch commented May 11, 2026

Thank you @muffato !

Development

Successfully merging this pull request may close these issues.

Address dockerhub pull limit from clobbering files