fix: preserve protocol (e.g. ssh:// or https://) and port in dependency URL#665
fix: preserve protocol (e.g. ssh:// or https://) and port in dependency URL#665edenfunf wants to merge 17 commits intomicrosoft:mainfrom
Conversation
Fixes microsoft#661. When a user specifies an explicit ssh:// URL in apm.yml (e.g. ssh://git@bitbucket.domain.ext:7999/project/repo.git), APM was silently stripping the port during ssh:// → git@ normalisation and then falling back to https:// after the portless SSH clone attempt failed. Store the original ssh:// string in DependencyReference.original_ssh_url before normalisation, and pass it verbatim to git clone in _clone_with_fallback so the port and protocol are preserved.
sergio-sisternes-epam
left a comment
There was a problem hiding this comment.
Clean, focused fix — the "preserve original URL, don't fix normalization" approach is the right call. Minimal risk, no side effects on existing URL handling. The PR description is excellent with clear root cause analysis and before/after examples.
The test coverage is thorough: custom port, standard ssh, https exclusion, git@ exclusion, and clone URL ordering all covered.
One request before merge: please add a CHANGELOG entry under ## [Unreleased]:
### Fixed
- Preserve `ssh://` dependency URLs with custom ports for Bitbucket Datacenter repositories instead of silently falling back to HTTPS (#661)
Thanks for another solid contribution!
There was a problem hiding this comment.
Pull request overview
This PR fixes cloning failures for self-hosted Git servers (notably Bitbucket Datacenter) when dependencies are specified with an explicit ssh://...:PORT/... URL by preserving the original SSH URL and using it for the SSH clone attempt.
Changes:
- Add
DependencyReference.original_ssh_urland populate it when parsingssh://dependency strings. - Update
_clone_with_fallbackto preferoriginal_ssh_urlfor the SSH clone method. - Add regression tests covering parsing and clone URL selection for
ssh://URLs with custom ports.
Show a summary per file
| File | Description |
|---|---|
src/apm_cli/models/dependency/reference.py |
Adds original_ssh_url and captures it during parsing for ssh:// inputs. |
src/apm_cli/deps/github_downloader.py |
Uses original_ssh_url directly for the SSH clone attempt to preserve custom ports. |
tests/unit/test_generic_git_urls.py |
Adds parsing regression tests for Bitbucket Datacenter-style ssh:// URLs with ports. |
tests/unit/test_auth_scoping.py |
Adds downloader regression tests to assert the SSH clone attempt uses the original ssh:// URL verbatim. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 3
Raw dependency strings can include '#ref' or '@alias' suffixes which are not valid git clone URL syntax. Strip these from original_ssh_url at capture time so the port-preserving clone path does not silently fail and re-trigger the https fallback. Also narrows the over-broad except clause in regression tests from (RuntimeError, Exception) to (RuntimeError, GitCommandError), and adds the CHANGELOG entry requested in review.
|
@edenfunf looks like we got a conflict now. We will need that solved before we can merge the PR. Can you review this please? |
|
@sergio-sisternes-epam Conflict has been resolved — merged the latest main into the branch. Ready for review. |
a5917bf to
c75b2e6
Compare
|
@edenfunf the lockfile needs to track provenance URL with the port too in this case. We need to test this also works. |
When a dependency is specified as ssh://host:port/repo, the lockfile was storing only host and repo_url, silently dropping the custom port. Add original_ssh_url to LockedDependency so the verbatim ssh:// URL (including any non-standard port) is written to apm.lock.yaml. from_dependency_ref copies it from DependencyReference; to_dict/from_dict round-trip it correctly. Add three regression tests to TestBitbucketDatacenterSSH verifying that the lockfile captures the port, that the field survives a to_dict/from_dict round-trip, and that HTTPS deps do not set the field.
|
@danielmeppiel Thanks for catching this — you're right, the lockfile was silently dropping the custom port. Added original_ssh_url to LockedDependency so the verbatim ssh:// URL is now written to apm.lock.yaml and survives a to_dict/from_dict round-trip. Also added three regression tests covering the lockfile path. Will keep provenance completeness in mind for future changes involving URL normalization. |
|
I'm not sure the lockfile should track this as a separate field. In essence this is the actual repo url. There is a repo url field already. We may be cluttering the lockfile and overall logic (including download logic) unnecessarily. |
I'm not sure the lockfile should track this as a separate field. In essence this is the actual repo url. There is a repo url field already. We may be cluttering the lockfile and overall logic (including download logic) unnecessarily.
Right now, the problem is that the existing repo_url path drops the port during normalization, so the lockfile ends up not reflecting the exact source that was used. My initial approach was to preserve the original SSH URL separately to avoid losing that information. That said, I agree this may be duplicating what should conceptually be a single source of truth. A cleaner approach would be to fix the existing repo_url / host handling so the port is preserved, instead of introducing a new field. Also worth noting: the download path already re-parses apm.yml, so this doesn't affect install correctness — it's mainly about lockfile fidelity / auditability. I'm happy to switch to fixing repo_url normalization (e.g. preserving port in host like bitbucket.domain.ext:7999) and drop original_ssh_url if that aligns better with the intended design. |
|
@edenfunf thank you - we already have a user reporting a related problem with an https url on a specific port #731 So I'd rather solve this at the root rather than patching just for ssh. The lockfile should stay as simple as possible and the repo_url should reflect the actual url including the port from where the package was retrieved for maximum auditability |
|
@edenfunf I changed the title of the PR to reflect the scope (preserving protocol as well as port in repo url) |
|
@edenfunf im thinking of a scenario here that may need a bit more consideration:
The other way around (https vs ssh) is also true. This is why I originally decided to normalize ssh into https. We probably need to preserve protocol yet fall back to the other protocol (ssh vs https and viceversa). This may need careful consideration / analysis. This is not the case for ports as those are always to be preserved. |
|
Additional thoughts: We may have git@github.com:user/repo.git which is not a standard URL. It’s a Git-specific shorthand (called SCP-like syntax) that:
We may also have true SSH URL format such as ssh://git@github.com/user/repo.git which is is a valid URL, because:
Git accepts both interchangeably. But:
I'm leaning towards respecting the users ssh url and keeping it in lockfile (in true URL format only, not SCP) regardless of SCP or URL notation provided as input - I.e. normalizing SCP to SSH URL. Falling back to HTTPS may be better for UX in developer teams. I need a second thought from @sergio-sisternes-epam |
|
@danielmeppiel additional comments from my side:
The lockfile should record what was used (respecting the installer's choice), but clone time should be forgiving: try the lockfile protocol first, fall back to the alternative if it fails. This way:
The only trade-off is slightly noisier lockfile diffs/conflicts when two devs install the same package with different protocols, but that's minor compared to broken installs. We might want to update the documentation and recommend teams to use the same clone method to avoid this inconvenience. |
|
@danielmeppiel Weighing in on your two questions: 1. Normalise SCP to 2. Protocol fallback for team UX? I think we should fall back. The scenario you described is real -- one dev installs with SSH, another teammate doesn't have SSH keys configured, and they're stuck. The lockfile should record what was used (respecting the installer's choice), but clone time should be forgiving: try the lockfile protocol first, fall back to the alternative if it fails. This way:
The only trade-off is slightly noisier lockfile diffs when two devs install the same package with different protocols, but that's minor compared to broken installs. |
|
Thanks @danielmeppiel and @sergio-sisternes-epam for the thoughtful discussion. Here's my planned approach before I start reworking the PR: Lockfile format
Clone-time protocol fallback
Field changes
Tests
Does this match what you both had in mind? Happy to adjust before I start — otherwise I'll push the rework shortly. |
Maintainer verdict (port preservation design)@edenfunf @sergio-sisternes-epam thanks for the thoughtful back-and-forth. I ran this through our python-architect persona and a security review. Here's where I land — and I'd like to redirect the rework before you push v3. TL;DRReject both v2 ( Why not v2
Why not v3It's over-engineered for what is a port-preservation bug. Specifically:
What to build instead1. Single new field# src/apm_cli/models/dependency/reference.py
port: Optional[int] = None # Non-standard SSH/HTTPS port
# src/apm_cli/deps/lockfile.py — LockedDependency
port: Optional[int] = None2. Stop normalizing
|
| File | Change |
|---|---|
src/apm_cli/models/dependency/reference.py |
Add port, replace _normalize_ssh_protocol_url with a real ssh:// parser via urlparse |
src/apm_cli/utils/github_host.py |
build_ssh_url / build_https_clone_url accept port |
src/apm_cli/deps/lockfile.py |
Add port, serialize when non-default, validate on deserialize |
src/apm_cli/deps/github_downloader.py |
Thread dep_ref.port through _build_repo_url (no fallback chain changes) |
Closes the same root cause for both #661 (SSH) and #731 (HTTPS).
@edenfunf — happy to discuss any specific tradeoff, but this is the direction I'd like the PR to take. Let me know if anything is unclear before you start.
|
@edenfunf here's a concrete checklist to land this. Sing out if any item is unclear and I'll expand. Code
TestsTwo existing test files are the right home — please add to them rather than creating a new file:
DocsThese need updating in the same PR (our convention is code + docs land together):
CHANGELOG
Out of scope (separate follow-up)
Validation before pushinguv run pytest tests/unit/test_generic_git_urls.py tests/unit/test_lockfile.py tests/unit/test_auth_scoping.py -x
uv run pytest tests/unit tests/test_console.py -x # full unit suite, must stay greenOnce all of the above is in, I'll do another review pass and we can ship. Thanks for sticking with this — the original bug report is going to make a lot of self-hosted users happy. |
Fixes #661.
Root Cause
When APM parses a dependency URL that starts with
ssh://,_normalize_ssh_protocol_urlconverts it to thegit@host:pathformat and silently drops any custom port:_clone_with_fallbackthen tries SSH on the reconstructed URL (default port 22, wrong) which fails, and falls back to:This affects Bitbucket Datacenter and any other self-hosted git server that uses a non-standard SSH port.
Fix
Add an
original_ssh_urlfield toDependencyReference. Whenparse_from_strencounters a URL that begins withssh://, the verbatim string is stored in this field before the normalisation step.In
_clone_with_fallback, Method 2 (SSH) now usesdep_ref.original_ssh_urldirectly when it is set, instead of rebuilding the URL from host + repo_ref (which loses the port):Before
After
Changes
src/apm_cli/models/dependency/reference.pyoriginal_ssh_urldataclass field; capture it inparse_from_strbefore normalisation; pass to constructorsrc/apm_cli/deps/github_downloader.pydep_ref.original_ssh_urlverbatim in SSH clone attempt when presenttests/unit/test_generic_git_urls.pyTestBitbucketDatacenterSSHclass with 5 regression testsRegression Protection
Added
TestBitbucketDatacenterSSHcovering:ssh://with custom port stores the original URL inoriginal_ssh_urlrepo_urlfields are still correctly parsedssh://without a port also preserves the original URLoriginal_ssh_urlgit@shorthand URLs do not setoriginal_ssh_urlAll 3 763 existing unit tests continue to pass.