
feat: support GitHub URLs as CLI input #7

Merged
developer0hye merged 6 commits into main from feat/github-url-support on Mar 9, 2026

@developer0hye

Owner

Summary

  • ZIP common prefix stripping: GitHub-style archives nest files under {repo}-{ref}/. Both ZipMemoryFileSystem (analysis) and materialize_zip_workspace (extraction) now auto-detect and strip this prefix, so entry detection and asset resolution work correctly for all ZIP inputs.
  • GitHub URL parsing: New parse_github_url() handles bare repo URLs, /tree/{ref}, /blob/{ref}/{path}, /tree/{ref}/{dir}, .git suffix, and both http/https schemes. Returns None for non-GitHub URLs.
  • GitHub archive download: resolve_github_auth_token() reads the GITHUB_TOKEN/GH_TOKEN env vars. resolve_github_default_branch() queries the GitHub API. download_github_archive() fetches the zipball with a 256 MB limit and reports descriptive errors for 404, 403/rate-limit, timeout, and size-exceeded failures.
  • CLI integration: resolve_input() detects GitHub URLs before filesystem access. analyze_input_path() downloads the archive, saves to a temp file, and feeds it into the existing ZIP pipeline. /blob/ URLs set the implicit --entry. Both convert and validate commands support GitHub URLs.
  • Documentation: Updated help text and README with GitHub URL examples and GITHUB_TOKEN/GH_TOKEN environment variable documentation.
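
For readers wondering how a cap like the 256 MB limit on download_github_archive() can be enforced without buffering an oversized body, here is a minimal sketch. It assumes the response body is available as a `std::io::Read`. The name `read_with_limit` is hypothetical and is not taken from this PR.

```rust
use std::io::{self, Read};

/// Hypothetical helper (not from this PR): reads at most `limit` bytes
/// from `reader` and fails if the stream contains more than that.
fn read_with_limit<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Read one byte past the limit so an oversized body can be detected
    // without pulling the whole stream into memory.
    reader.take(limit.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("archive exceeds the {}-byte limit", limit),
        ));
    }
    Ok(buf)
}
```

With a 256 MB cap the call would look like `read_with_limit(body, 256 * 1024 * 1024)`, where `body` is whatever reader the HTTP client hands back.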

Test plan

  • ZIP prefix stripping: 4 new tests (GitHub-style prefix, no prefix, multiple top dirs, single nested file)
  • URL parsing: 10 new tests (bare URL, tree, blob, tag+dir, .git suffix, http, non-GitHub, malformed, non-URL, trailing slash)
  • Auth token resolution: 1 combined test (GITHUB_TOKEN priority, GH_TOKEN fallback, neither set, empty values)
  • resolve_input integration: 2 new tests (GitHub URL returns GitHubUrl variant, local path returns local variant)
  • All existing tests pass without regression
  • Manual E2E: marknest convert https://github.com/pulldown-cmark/pulldown-cmark -o test.pdf

🤖 Generated with Claude Code

developer0hye and others added 6 commits March 9, 2026 14:59
GitHub-style ZIP archives nest all files under a single directory
(e.g. `repo-main/`). Strip this prefix in both ZipMemoryFileSystem
(analysis) and materialize_zip_workspace (extraction) so entry
detection and asset resolution work correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
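
The prefix detection described in this commit can be sketched roughly as follows. This is an illustration over plain entry names, not the code in ZipMemoryFileSystem or materialize_zip_workspace. The function names are hypothetical, and the choice to strip even when the archive holds a single nested file is an assumption of the sketch.

```rust
/// Hypothetical helper: returns the single top-level directory shared by
/// every entry (with a trailing slash), or `None` if there is no such directory.
fn common_top_level_prefix(paths: &[&str]) -> Option<String> {
    let mut prefix: Option<&str> = None;
    for path in paths {
        // An entry without '/' sits at the archive root, so nothing is shared.
        let (top, _rest) = path.split_once('/')?;
        if top.is_empty() {
            return None;
        }
        match prefix {
            None => prefix = Some(top),
            Some(seen) if seen == top => {}
            Some(_) => return None, // more than one top-level directory
        }
    }
    prefix.map(|top| format!("{}/", top))
}

/// Hypothetical helper: strips the shared prefix when one exists and
/// otherwise returns the entry names unchanged.
fn strip_common_prefix(paths: &[&str]) -> Vec<String> {
    match common_top_level_prefix(paths) {
        Some(prefix) => paths
            .iter()
            .map(|p| p[prefix.len()..].to_string())
            .filter(|p| !p.is_empty()) // drop the bare directory entry itself
            .collect(),
        None => paths.iter().map(|p| p.to_string()).collect(),
    }
}
```

For a GitHub-style listing such as `repo-main/README.md` and `repo-main/docs/a.md`, the sketch yields `README.md` and `docs/a.md`. A listing with root-level files or several top-level directories comes back untouched.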
Parse GitHub URLs into owner, repo, ref, subpath, and blob/tree type.
Supports bare repo URLs, branch/tag refs, blob paths, and tree paths.
Handles .git suffix and http/https schemes. Rejects non-GitHub URLs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
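
As a rough illustration of the URL shapes listed in this commit, the sketch below parses them using only the standard library. It is not the PR's parse_github_url(). The type and function names are hypothetical, and it assumes the ref is a single path segment (so a branch like `feature/x` would be misread), that a blob URL must name a file, and that the host is written in lowercase with no query string or fragment.

```rust
#[derive(Debug, PartialEq)]
enum UrlKind {
    Repo,
    Tree,
    Blob,
}

/// Hypothetical result type for this sketch.
#[derive(Debug, PartialEq)]
struct ParsedGitHubUrl {
    owner: String,
    repo: String,
    git_ref: Option<String>,
    subpath: Option<String>,
    kind: UrlKind,
}

fn parse_github_url_sketch(input: &str) -> Option<ParsedGitHubUrl> {
    // Accept both schemes; anything else is treated as a non-GitHub input.
    let rest = input
        .strip_prefix("https://github.com/")
        .or_else(|| input.strip_prefix("http://github.com/"))?;
    let mut segments = rest.trim_end_matches('/').split('/');

    let owner = segments.next().filter(|s| !s.is_empty())?;
    let repo = segments.next()?;
    let repo = repo.strip_suffix(".git").unwrap_or(repo);
    if repo.is_empty() {
        return None;
    }

    let kind = match segments.next() {
        None => UrlKind::Repo,
        Some("tree") => UrlKind::Tree,
        Some("blob") => UrlKind::Blob,
        Some(_) => return None, // e.g. /issues or /pull are not inputs here
    };

    let (git_ref, subpath) = if kind == UrlKind::Repo {
        (None, None)
    } else {
        // Assumption: the ref is exactly one segment; the rest is the subpath.
        let git_ref = segments.next().filter(|s| !s.is_empty())?;
        let remaining: Vec<&str> = segments.collect();
        let subpath = if remaining.is_empty() {
            None
        } else {
            Some(remaining.join("/"))
        };
        (Some(git_ref.to_string()), subpath)
    };

    // Assumption: a blob URL without a file path is rejected.
    if kind == UrlKind::Blob && subpath.is_none() {
        return None;
    }

    Some(ParsedGitHubUrl {
        owner: owner.to_string(),
        repo: repo.to_string(),
        git_ref,
        subpath,
        kind,
    })
}
```

Returning `Option` mirrors the "Returns None for non-GitHub URLs" behavior described in the summary, which lets the caller fall back to treating the input as a local path.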
Add resolve_github_auth_token (GITHUB_TOKEN/GH_TOKEN env vars),
resolve_github_default_branch (GitHub API), and
download_github_archive (zipball endpoint with 256 MB limit).
Reuses existing ureq HTTP patterns. Includes descriptive error
messages for 404, 403/rate-limit, timeout, and size exceeded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
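
The token lookup order (GITHUB_TOKEN first, GH_TOKEN as fallback, empty values ignored) is easier to test when the environment is injected. The sketch below shows one way to do that. Both function names are hypothetical, and the PR's resolve_github_auth_token() may be structured differently.

```rust
/// Hypothetical helper: resolves a token through an injected lookup so the
/// priority rules can be tested without mutating the process environment.
fn resolve_token_with<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    ["GITHUB_TOKEN", "GH_TOKEN"]
        .iter()
        .filter_map(|name| lookup(*name))
        // An empty value counts as unset, so the next variable gets a chance.
        .find(|value| !value.is_empty())
}

/// Hypothetical wrapper that reads the real process environment.
fn resolve_token_from_env() -> Option<String> {
    resolve_token_with(|name| std::env::var(name).ok())
}
```

Injecting the lookup keeps a test like the "1 combined test" in the test plan from having to set and unset real environment variables, which can interfere with tests running in parallel.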
Add GitHubUrl variant to ResolvedInput. resolve_input() detects
GitHub URLs before filesystem access and returns the parsed URL.
analyze_input_path() downloads the archive, saves to a temp file,
and feeds it into the existing ZIP analysis pipeline. Blob URLs
set the implicit entry. Temp directory kept alive via _temp_dir
field on AnalyzedInput.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
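
The note about keeping the temp directory alive through the `_temp_dir` field is worth illustrating. If the handle that owns the directory is dropped when the analysis function returns, the downloaded archive vanishes before later stages can read it. The sketch below reproduces the pattern with a hand-rolled guard. `TempDirGuard`, `stage_archive`, and the fields other than `_temp_dir` are hypothetical, and the PR does not say which temp-dir type it actually uses.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a temp-dir handle: removes the directory on drop.
struct TempDirGuard {
    path: PathBuf,
}

impl TempDirGuard {
    fn create(name: &str) -> io::Result<Self> {
        // Assumption: one directory per process is unique enough for a sketch.
        let path = std::env::temp_dir().join(format!("{}-{}", name, std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for TempDirGuard {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored on purpose.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Hypothetical shape of the analysis result: the guard is never read,
/// it only has to live as long as `zip_path` is in use.
struct AnalyzedInput {
    zip_path: PathBuf,
    _temp_dir: Option<TempDirGuard>,
}

/// Writes downloaded bytes into a fresh temp directory and hands the guard
/// to the caller together with the archive path.
fn stage_archive(bytes: &[u8]) -> io::Result<AnalyzedInput> {
    let guard = TempDirGuard::create("marknest-sketch")?;
    let zip_path = guard.path().join("archive.zip");
    fs::write(&zip_path, bytes)?;
    Ok(AnalyzedInput {
        zip_path,
        _temp_dir: Some(guard),
    })
}
```

Storing the guard as an `Option` lets local-path and plain ZIP inputs leave the field as `None`, while the GitHub flow keeps its download on disk until the `AnalyzedInput` itself is dropped.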
Update convert, validate, and root help messages to document
GitHub URL input support and GITHUB_TOKEN/GH_TOKEN env vars.
Add GitHub URL examples to README development section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Regular analyze_zip() no longer strips common prefixes, preserving
existing behavior for user-created ZIPs. New analyze_zip_strip_prefix()
applies stripping only when explicitly requested. The GitHub URL flow
uses the strip variant; regular ZIP inputs are unchanged. This fixes
WASM test failures where intentional subdirectory structure was being
incorrectly stripped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
@developer0hye developer0hye merged commit 66323f1 into main Mar 9, 2026
2 checks passed
@developer0hye developer0hye deleted the feat/github-url-support branch March 9, 2026 06:29