feat: support GitHub URLs as CLI input #7
Merged
GitHub-style ZIP archives nest all files under a single directory (e.g., `repo-main/`). Strip this prefix in both `ZipMemoryFileSystem` (analysis) and `materialize_zip_workspace` (extraction) so entry detection and asset resolution work correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
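A minimal sketch of the prefix detection, assuming ZIP entry names are `/`-separated paths and a trailing `/` marks a directory entry. `common_top_level_dir` and `strip_common_prefix` are hypothetical helpers for illustration, not the code this PR adds to `ZipMemoryFileSystem` or `materialize_zip_workspace`.

```rust
/// Hypothetical helper: returns the single top-level directory shared by
/// every entry (e.g. "repo-main/"), or None if there is no such directory.
fn common_top_level_dir(entries: &[&str]) -> Option<String> {
    let mut prefix: Option<&str> = None;
    for entry in entries {
        // A root-level file (no '/') means there is no common directory.
        let slash = entry.find('/')?;
        let top = &entry[..=slash];
        match prefix {
            None => prefix = Some(top),
            Some(p) if p == top => {}
            Some(_) => return None, // two different top-level directories
        }
    }
    prefix.map(str::to_owned)
}

/// Hypothetical helper: strips the shared prefix (if any) from every entry
/// and drops the now-empty entry for the directory itself.
fn strip_common_prefix(entries: &[&str]) -> Vec<String> {
    match common_top_level_dir(entries) {
        Some(prefix) => entries
            .iter()
            .filter_map(|e| e.strip_prefix(prefix.as_str()))
            .filter(|rest| !rest.is_empty())
            .map(str::to_owned)
            .collect(),
        None => entries.iter().map(|e| e.to_string()).collect(),
    }
}
```

Applied unconditionally, this would also flatten user-created ZIPs whose single top-level folder is intentional, which is the problem the commit introducing `analyze_zip_strip_prefix()` addresses by making stripping opt-in.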
Parse GitHub URLs into owner, repo, ref, subpath, and blob/tree type. Supports bare repo URLs, branch/tag refs, blob paths, and tree paths. Handles the `.git` suffix and both `http` and `https` schemes. Rejects non-GitHub URLs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
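The URL shapes listed above can be illustrated with a std-only parser. This is a sketch under simplifying assumptions, not the PR's `parse_github_url()`: the `GitHubUrl`/`UrlKind` field names are hypothetical, query strings and fragments are not handled, and the ref is taken to be a single path segment. Because branch names may contain `/`, a URL like `/tree/feature/x/docs` is ambiguous from the path alone, so a complete implementation needs another way to split the ref from the subpath.

```rust
#[derive(Debug, PartialEq)]
enum UrlKind {
    Repo, // https://github.com/{owner}/{repo}
    Tree, // .../tree/{ref}[/{dir}]
    Blob, // .../blob/{ref}/{path}
}

#[derive(Debug, PartialEq)]
struct GitHubUrl {
    owner: String,
    repo: String,
    git_ref: Option<String>, // None: use the repository's default branch
    subpath: Option<String>,
    kind: UrlKind,
}

fn parse_github_url(input: &str) -> Option<GitHubUrl> {
    // Accept both schemes; any host other than github.com is rejected.
    let rest = input
        .strip_prefix("https://")
        .or_else(|| input.strip_prefix("http://"))?
        .strip_prefix("github.com/")?;
    let mut parts = rest.trim_end_matches('/').split('/');
    let owner = parts.next().filter(|s| !s.is_empty())?.to_string();
    let repo_raw = parts.next().filter(|s| !s.is_empty())?;
    let repo = repo_raw.strip_suffix(".git").unwrap_or(repo_raw).to_string();

    let kind = match parts.next() {
        None => {
            return Some(GitHubUrl { owner, repo, git_ref: None, subpath: None, kind: UrlKind::Repo })
        }
        Some("tree") => UrlKind::Tree,
        Some("blob") => UrlKind::Blob,
        Some(_) => return None, // e.g. /issues or /pulls: not a source URL
    };
    // Simplification: the ref is exactly one path segment.
    let git_ref = parts.next().filter(|s| !s.is_empty())?.to_string();
    let tail: Vec<&str> = parts.collect();
    let subpath = if tail.is_empty() { None } else { Some(tail.join("/")) };
    Some(GitHubUrl { owner, repo, git_ref: Some(git_ref), subpath, kind })
}
```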
Add `resolve_github_auth_token` (reads the `GITHUB_TOKEN`/`GH_TOKEN` env vars), `resolve_github_default_branch` (queries the GitHub API), and `download_github_archive` (zipball endpoint with a 256 MB limit). Reuses existing `ureq` HTTP patterns. Includes descriptive error messages for 404, 403/rate-limit, timeout, and size-limit-exceeded cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
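The token lookup and the size cap can be sketched with the standard library alone. Assumptions: `GITHUB_TOKEN` takes precedence over `GH_TOKEN`, empty values are ignored, and `resolve_token_from`, `zipball_url`, and `read_with_limit` are hypothetical helpers. The HTTP request itself (done with `ureq` in the PR) is left out; the response body reader would be handed to `read_with_limit`. The URL follows GitHub's REST endpoint `GET /repos/{owner}/{repo}/zipball/{ref}`.

```rust
use std::env;
use std::io::{self, Read};

/// 256 MB cap on the downloaded archive, as stated in the commit message.
const MAX_ARCHIVE_BYTES: u64 = 256 * 1024 * 1024;

/// Returns the first non-empty token, checking GITHUB_TOKEN before GH_TOKEN.
/// The lookup is injected so the precedence logic can be tested without
/// mutating the process environment.
fn resolve_token_from<F: Fn(&str) -> Option<String>>(lookup: F) -> Option<String> {
    ["GITHUB_TOKEN", "GH_TOKEN"]
        .iter()
        .filter_map(|name| lookup(*name))
        .find(|value| !value.trim().is_empty())
}

fn resolve_github_auth_token() -> Option<String> {
    resolve_token_from(|name| env::var(name).ok())
}

/// Archive URL for a given ref (branch, tag, or commit SHA).
fn zipball_url(owner: &str, repo: &str, git_ref: &str) -> String {
    format!("https://api.github.com/repos/{owner}/{repo}/zipball/{git_ref}")
}

/// Reads at most `limit` bytes. One extra byte is requested so an oversized
/// body is reported as an error instead of being silently truncated.
fn read_with_limit<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    reader.take(limit + 1).read_to_end(&mut buf)?;
    if buf.len() as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("archive exceeds the {limit}-byte download limit"),
        ));
    }
    Ok(buf)
}
```

A download would then be roughly `read_with_limit(body_reader, MAX_ARCHIVE_BYTES)`, with the token (if any) sent as an `Authorization` header on the request.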
Add a `GitHubUrl` variant to `ResolvedInput`. `resolve_input()` detects GitHub URLs before filesystem access and returns the parsed URL. `analyze_input_path()` downloads the archive, saves it to a temp file, and feeds it into the existing ZIP analysis pipeline. Blob URLs set the implicit entry. The temp directory is kept alive via the `_temp_dir` field on `AnalyzedInput`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
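The dispatch order (URL check before any filesystem access) and the blob-to-entry mapping can be sketched as follows. The `File`/`Directory` variants, the prefix-based `is_github_url` check, and `blob_entry` are hypothetical simplifications; the PR's `GitHubUrl` variant carries the parsed URL rather than the raw string.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum ResolvedInput {
    GitHubUrl(String), // raw URL here; the PR stores the parsed form
    File(PathBuf),
    Directory(PathBuf),
}

fn is_github_url(input: &str) -> bool {
    input.starts_with("https://github.com/") || input.starts_with("http://github.com/")
}

fn resolve_input(input: &str) -> Result<ResolvedInput, String> {
    // Check for a GitHub URL first so a string like "https://github.com/o/r"
    // is never probed as a relative filesystem path.
    if is_github_url(input) {
        return Ok(ResolvedInput::GitHubUrl(input.to_string()));
    }
    let path = Path::new(input);
    if path.is_dir() {
        Ok(ResolvedInput::Directory(path.to_path_buf()))
    } else if path.is_file() {
        Ok(ResolvedInput::File(path.to_path_buf()))
    } else {
        Err(format!("input not found: {input}"))
    }
}

/// For /blob/{ref}/{path} URLs, the path becomes the implicit entry file.
/// Assumes the ref is a single path segment.
fn blob_entry(url: &str) -> Option<&str> {
    let (_, after_blob) = url.split_once("/blob/")?; // "{ref}/{path}"
    let (_git_ref, path) = after_blob.split_once('/')?;
    if path.is_empty() { None } else { Some(path) }
}
```

On the `_temp_dir` field: if it holds an RAII guard such as `tempfile::TempDir`, which deletes its directory when dropped, storing it on `AnalyzedInput` keeps the downloaded archive on disk for as long as the analysis result is in use.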
Update `convert`, `validate`, and root help messages to document GitHub URL input support and `GITHUB_TOKEN`/`GH_TOKEN` env vars. Add GitHub URL examples to README development section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Regular `analyze_zip()` no longer strips common prefixes, preserving existing behavior for user-created ZIPs. New `analyze_zip_strip_prefix()` applies stripping only when explicitly requested. The GitHub URL flow uses the strip variant; regular ZIP inputs are unchanged. This fixes WASM test failures where intentional subdirectory structure was being incorrectly stripped.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Yonghye Kwon <developer.0hye@gmail.com>
Summary
- GitHub archives nest all files under a single `{repo}-{ref}/` directory. Both `ZipMemoryFileSystem` (analysis) and `materialize_zip_workspace` (extraction) now auto-detect and strip this prefix, so entry detection and asset resolution work correctly. Stripping during analysis is opt-in via `analyze_zip_strip_prefix()` and used only by the GitHub URL flow; regular ZIP inputs keep their existing behavior.
- `parse_github_url()` handles bare repo URLs, `/tree/{ref}`, `/blob/{ref}/{path}`, `/tree/{ref}/{dir}`, the `.git` suffix, and both `http` and `https` schemes. It returns `None` for non-GitHub URLs.
- `resolve_github_auth_token()` reads the `GITHUB_TOKEN`/`GH_TOKEN` env vars, `resolve_github_default_branch()` queries the GitHub API, and `download_github_archive()` fetches the zipball with a 256 MB limit and descriptive errors for 404, 403/rate-limit, and timeout.
- `resolve_input()` detects GitHub URLs before filesystem access. `analyze_input_path()` downloads the archive, saves it to a temp file, and feeds it into the existing ZIP pipeline. `/blob/` URLs set the implicit `--entry`. Both the `convert` and `validate` commands support GitHub URLs.
- CLI help for `convert`, `validate`, and the root command now covers GitHub URL input and includes `GITHUB_TOKEN`/`GH_TOKEN` environment variable documentation.

Test plan
- `marknest convert https://github.com/pulldown-cmark/pulldown-cmark -o test.pdf`

🤖 Generated with Claude Code