-E time limit does not stop the mirror under a slow server #482
Merged
-E was only evaluated at per-link boundaries, so a slow or throttling server starved the check for minutes, and the smooth stop it finally requested drained the remaining transfers at server pace with no bound.

back_wait now checks the deadline every cycle and, once a short grace period expires, aborts the in-flight HTTP transfers like the -T timeout path does (FTP slots stay with their owning thread). back_checkmirror's 0 return, previously dead, now carries the hard stop.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
A capped pre-release crawl of a bot-throttled site never stopped: -E300 ran 8m50s and only ended on an external SIGINT (#481). The deadline was checked at per-link boundaries alone, so slow transfers starved it for minutes, and the smooth stop it eventually requested let the remaining transfers drain at server pace with no bound.

back_wait() now checks the deadline on every cycle. Once a grace period expires (maxtime/10, clamped to 5..30s), it aborts the in-flight HTTP transfers the same way the -T timeout path does; FTP slots stay with their owning thread. back_checkmirror()'s previously dead 0 return carries the hard stop, so the existing caller branches engage.

The new 34_local-maxtime.test crawls a trickling local server with -E2: the fixed engine stops in about 8 seconds, the unfixed binary runs 118s, fails both log assertions, and leaves a .delayed file behind.

The test skips local-crawl.sh's .delayed leftover audit: a cancelled crawl can orphan a placeholder through a window in hts_wait_delayed() that predates this PR, tracked as #483.

Closes #481
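
For reviewers, the grace-period arithmetic described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: grace_seconds(), deadline_state() and the mirror_state enum are hypothetical names, the integer division is an assumption, and so is the reading that the grace period starts counting at the -E deadline.

```c
/* Hypothetical sketch of the -E deadline logic; names are not from the httrack source. */

typedef enum {
  MIRROR_RUN,         /* deadline not reached, or no -E limit set */
  MIRROR_SMOOTH_STOP, /* deadline reached, still inside the grace period */
  MIRROR_HARD_STOP    /* grace period expired: abort in-flight HTTP transfers */
} mirror_state;

/* Grace period in seconds: maxtime/10, clamped to the 5..30 range.
   Integer division is assumed here. */
static int grace_seconds(int maxtime) {
  int grace = maxtime / 10;

  if (grace < 5)
    grace = 5;
  if (grace > 30)
    grace = 30;
  return grace;
}

/* State for a given elapsed time, as a per-cycle check might compute it.
   Assumes the grace period is counted from the deadline itself. */
static mirror_state deadline_state(long elapsed, int maxtime) {
  if (maxtime <= 0 || elapsed < maxtime)
    return MIRROR_RUN;
  if (elapsed < (long) maxtime + grace_seconds(maxtime))
    return MIRROR_SMOOTH_STOP;
  return MIRROR_HARD_STOP;
}
```

Under these assumptions, -E2 gets the 5-second minimum grace and reaches the hard stop at 7 seconds, which is in line with the roughly 8 seconds reported for the new test; -E300 gets the 30-second maximum and would reach the hard stop at 330 seconds.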