Prune by either int or interval for all retention policies#8775
Prune by either int or interval for all retention policies#8775Goddesen wants to merge 24 commits into
Conversation
9beb1b7 to
9b828f5
Compare
|
This is loosely related to #8715 as well. I like the idea very much (I didn't and can't review the code, but I'm happy to help with the docs if desired). Borg's current retention policy IMHO is rather hard to understand for beginners (but very safe, because it usually keeps more than what users expect) and an interval-based approach feels more intuitive to me. However, special care is necessary to not cause unexpected behaviour, especially with frequent backups (e.g. daily), combined with overlapping rules (e.g. including all of within, daily, weekly, monthly, and yearly), and some missing backups (and thus the need to sometimes keep the "next best" to later fulfil less frequent rules). As far as I remember there were some issues with this in the early days of Borg that at least caused multiple major docs overhauls (not sure whether the behaviour actually changed substantially, but the docs definitely did because many users understood things wrong). Maybe Thomas can give some insights about the challenges back then? |
82c9e4e to
da77550
Compare
ThomasWaldmann
left a comment
There was a problem hiding this comment.
Thanks for the PR!
Some minor stuff I found...
64c48ed to
38fbd43
Compare
|
It seems the GHA test runner did not like the previous |
I'd vote for removing
IMHO this must be consistent with #8776. Question is what we agree on. I honestly believe differently: If it is |
|
If you want to be able to backport this to 1.4-maint, you must not break compatibility in this PR, but deprecating some options that can be replaced by a better new option can be done here. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8775 +/- ##
==========================================
+ Coverage 83.62% 84.94% +1.31%
==========================================
Files 90 92 +2
Lines 15539 15029 -510
Branches 2337 2238 -99
==========================================
- Hits 12995 12766 -229
+ Misses 1806 1571 -235
+ Partials 738 692 -46 ☔ View full report in Codecov by Harness. |
How do you do separation of functionality like this when a breaking 2.0 change is to be backported with deprecations instead? The code might turn out very different, especially with changes introduced on master. |
I don't think #8776 is relevant, as it matches absolute timestamps based on a pattern (like how the I had a whole comment here written out defending the inclusive check based on comparing timestamps only using seconds-granularity. In that scenario this makes sense. Having taken a closer look I see that |
|
Comments fixed, inclusive timestamp change reverted with tests made slightly more robust. If that's it for implementation comments I'll extend the documentation with an involved example like the retention policy I have described in the PR description and write a test for it. I am still not sure how to go about doing breaking change on |
|
Easiest is to backport a non-breaking change and then do the breaking change in a separate PR afterwards (and not backport that). There are some code structure differences between master and 1.4-maint though, so even backporting the first PR might not be trivial. As 1.4.x is a stable release, we need to be very careful to not break anything there. That sometimes can mean "no backport" if the change is too big or too risky. |
|
Last explicit default removed.
The changiest change introduced here is allowing intervals to be of length Am I correct in assuming I just put entries for |
|
In regards to backporting this to I kinda feel like I accidentally caused some confusion, so: Do we even have breaking changes right now? The only "breaking" change is that |
Ah I forgot there is one more thing: But if you want, it's pretty easy to just not introduce that change now, and do it in the breaking PR instead. |
There's only a difference if someone actually managed to create multiple archives within the same second, right? I'd say that's similar to Anyway, I don't think we really have a problem here. If we decide on dropping Not for me to decide, but I suggest dealing with that stuff in the backport PR.
It does: It can also match all archives within e.g. a full month relative to "now" (and added just recently, also relative to arbitrary other timestamps). However, I agree that there's no practical difference due to the microseconds scale. I didn't consider that. Yet I'd still vote for consistency. I just skimmed through @c-herz's work in #8776 and noticed that matching exclusively causes some problems there as well (minor, yet there are some). IMHO the consistency question still somewhat matters, so how about we simply decide on always matching inclusively? I'm thus also tagging @c-herz, WDYT? |
|
After some consideration I think this new retention interval should also account for the oldest rule that was implemented for the retention count flags. I'll get back to this. |
|
before doing further changes, please rebase on current master branch to get the cython workaround. otherwise, builds will fail. |
|
Guess I'ld like to merge this for next beta, can we finish this until then? |
|
I'll endeavor to get something done soon then. Factorio has been a distraction 😄 |
|
btw, i dissected the |
|
ping? shall I help here a bit by rebasing this on current master branch and resolve the conflicts? |
a87a0d9 to
dc6f878
Compare
Yes, guess that would be good. |
So just outright remove them here and mark them deprecated in a backport? |
|
Deprecation:
|
|
@Goddesen, great work ❤️ Just a quick note that I haven't forgotten about this PR after all the great and productive discussions we've had. Unfortunately, I've been a bit swamped with work lately, so I haven't had the chance to review it from an user's perspective yet. I'm aiming to get to it by the middle of next week. Hold me accountable if I don't 😉 Sorry for the delay! |
For the benefit of reviewers I'll hold off on rebasing until ready for merge. |
|
Review by Claude Opus 4.8: Took a detailed pass over the code and the discussion. Overall this is a really nice improvement to a long-standing Bug — uncaught
|
| if arg in prune_keys and (isinstance(val, timedelta) or val == -1) | ||
| ] | ||
| for (lo_arg, lo_val), (hi_arg, hi_val) in combinations(interval_args, 2): | ||
| if lo_val == -1 or lo_val >= hi_val: |
There was a problem hiding this comment.
Here's a minimal fix for the TypeError described above. -1/all should only ever be handled by the equality branch; the magnitude comparison must be skipped whenever either side is the -1 sentinel:
| if lo_val == -1 or lo_val >= hi_val: | |
| if lo_val == -1 or (hi_val != -1 and lo_val >= hi_val): |
This keeps the intended semantics:
lo == -1(finer rule unlimited) → still flagged, since it covers any coarser rule.- finite finer + unlimited coarser (
hi == -1) → allowed, as it should be (not redundant). - both finite → compared as before.
And a regression test (in src/borg/testsuite/archiver/prune_cmd_test.py) covering the previously-uncovered orientation — these combos should succeed, not error:
@pytest.mark.parametrize(
"keep_args",
[
("--keep-daily=7d", "--keep-yearly=all"),
("--keep-daily=7d", "--keep-yearly=-1"),
("--keep-hourly=14d", "--keep-daily=all"),
],
)
def test_prune_interval_with_unlimited_coarser(archivers, request, backup_files, keep_args):
# A finite finer interval combined with an unlimited ("all"/-1) coarser
# count is a valid, non-redundant combination and must not raise.
# Previously crashed with:
# TypeError: '>=' not supported between instances of 'datetime.timedelta' and 'int'
archiver = request.getfixturevalue(archivers)
cmd(archiver, "repo-create", RK_ENCRYPTION)
dt = datetime(2023, 12, 31, tzinfo=timezone.utc)
_create_archive_dt(archiver, backup_files, "test-1", dt)
output = cmd(archiver, "prune", "--list", "--dry-run", "--since", dt.isoformat(), *keep_args)
assert re.search(r"Keeping archive .*test-1", output)There was a problem hiding this comment.
Aha. This is an oversight after a simplification midway through. Fixing.
|
Int vs. timedelta comparison in granularity ordering check fixed.
Reworded.
With the rework this is only applicable in a very specific case, and only for new behavior: A fine-grained count-based rule is unfulfilled while a coarser-grained interval rule has an interval that doesn't cover the archive that would be kept by the fine-grained oldest rule. Example: I don't think this is worth spending much more time on -- only new behavior is affected here, the existing oldest-rule rolling backup test is green, and I cannot think of any test that could challenge this assertion for any behavior from Borg 1.
Reworded.
Non-issue.
Outdated comment after refactor. Removed that final point.
Fixed.
Not really needed; removed.
I still think Up to you.
All relevant tests are green, I think this is fine. |
Seems like there is some markup issue in the prune help (guess epilog). |
You're right! I assumed workflow hiccups. Fixed. |
|
Claude review: Did another pass on the current revision (
|
Fixed. Added a validation with a message that at least one
Reworded.
Minor. Not worth addressing. |
There was a problem hiding this comment.
Here we go: My initial review from an user's perspective 🎉
As always, I didn't do a code review. I skimmed through the code to verify a few things, but I mostly comment based upon the docs - i.e., how I interpret the docs and how I think the features should behave, all from an user's perspective. I can't verify whether the code actually behaves that way. From what I see there, all except one IMHO major issue (see below) looks just great! 👍
Amazing work! ❤️
I also agree with Claude that narrowing the scope was the right call.
Even though you presented --since (despite the discussion about its name) like a workaround for the test suite, I believe it's a genuine feature that deserves to be widely adopted by users. It cleanly solves the original problem with "exact intervals", varying execution times, and stray microseconds. With --since and daily backups, I simply set 00:00 or 23:59 as fixed reference point, and 1w reliably means exactly one week from that anchor - independent of when prune actually runs. That makes the behaviour much more predictable. Simple, but powerful, great idea! 🚀
That being said, I have a few ideas regarding the prior "absolute time" and "fuzzy time" matchers, especially when combined with --since. But that's intentionally out-of-scope, so a topic for another time…
I haven't tested the code yet. But I'll do some test runs after Thomas' code review is done. However, the test suite look pretty thorough. Again, great work!
I did some work on the new docs/misc/prune-example-interval.txt example, at Goddesen#1 you'll find a PR against your fork to include my suggested changes. My reasoning is explained below.
My only "major" issue with the current implementation is with the interval() helper and how it parses months. IMO a user doesn't expect 1m to mean 31 days, and every higher number to be a multiple of 31 days (with e.g. 18m yielding 558 days, roughly 10 days too many). I was quite baffled by that. Just to be clear, you didn't change that, it behaved like that before, it's just that the scope has changed in a way that makes it a major issue now even though it wasn't one before. You'll find my thoughts on the matter below.
#edit: Looks like as if GitHub mixed up the comments… It might be best to read them via https://github.com/borgbackup/borg/pull/8775/changes instead.
| command. Note the yearly rule keeping _any two_ yearly archives. | ||
|
|
||
| borg prune \ | ||
| --since '2026-06-04 16:00' \ |
There was a problem hiding this comment.
IMO --since shouldn't be equal to "now" in an example, because "now" is the default anyway.
I like the idea of "prune running immediately after create", but we might want to switch to "create starts at 16:00" instead, also implying that prune runs a little later. Since we're looking at prune, "now" is somewhat after 16:00. This also better allows for explaining --since.
This change of the example has some more implications, like the latest archive being kept not due to --keep-daily, but due to --since (because it was created at or after 16:00), and the archive of 2026-05-28 being kept, too.
There are some more suggestions on this document below. I've created a PR against your fork with all my suggestions implemented already. See Goddesen#1
There was a problem hiding this comment.
Follow-up: Is it possible to specify a future date with --since? In my adjusted example 2026-05-28 is kept, because the current day's archive (2026-06-04) isn't counted. However, one could force Borg by always counting the current day's archive and thus excluding 2026-05-28 by using something like --since '2026-06-04 26:59:59' (i.e., the end of the day). Does this work?
| prune run. | ||
|
|
||
| 2026-03-31 was skipped, so 2026-03-30 is the monthly candidate for that month. | ||
| 2025-12-31 16:00 is exactly 5 months (5 * 31 days) from today and so that day's |
There was a problem hiding this comment.
2025-12-31 is exactly 5 months […] from today [2026-06-04]
What? 😅
Okay, to be honest, this hit me totally off-guard. This is very much unexpected.
I checked the code and indeed, that's how helpers.parseformat.interval() is and always was implemented, with 5m yielding a delta of 5 * 31 = 155 days. But IMO that's not what a user would expect when thinking about "5 months before today"… A user thinks about 2026-01-04 instead.
Since this logic was only used by --keep-within before and with presumably rather short time-spans in practice, this wasn't really an issue. But with interval-based retention policies it's quite important that 5m actually means "5 months" and not 155 days. Totally reasonable policies like "1,5 years" translate to 18m, currently yielding 558 days, roughly 10 days too many.
Unfortunately Python's native timedelta doesn't support anything beyond weeks, but there are some 3rd-party libraries out there (e.g. https://dateutil.readthedocs.io/en/latest/index.html). But honestly, a home-grown implementation for months and years is fairly simple… Only issue is that we need the datetime as a starting point; but from a quick look at the code I don't think that this is a major issue.
The current interval() helper seems to be only used by the pruning code. The new int_or_interval() seems to be only used by pruning tests? So it should be possible to replace them without affecting other components.
Here's a working PoC for a relative_interval() helper:
from argparse import ArgumentTypeError
from calendar import monthrange
from datetime import datetime, timedelta
def _interval_subtract_months(dt: datetime, months: int) -> datetime:
total_months = dt.year * 12 + (dt.month - 1) - months
year = total_months // 12
month = total_months % 12 + 1
day = min(dt.day, monthrange(year, month)[1])
return dt.replace(year=year, month=month, day=day)
def _interval_subtract_years(dt: datetime, years: int) -> datetime:
year = dt.year - years
day = min(dt.day, monthrange(year, dt.month)[1])
return dt.replace(year=year, day=day)
def relative_interval(dt: datetime, s: str | timedelta) -> datetime:
if isinstance(s, timedelta):
return dt - s
amounts = {
"S": lambda dt, n: dt - timedelta(seconds=n),
"M": lambda dt, n: dt - timedelta(minutes=n),
"H": lambda dt, n: dt - timedelta(hours=n),
"d": lambda dt, n: dt - timedelta(days=n),
"w": lambda dt, n: dt - timedelta(weeks=n),
"m": _interval_subtract_months,
"y": _interval_subtract_years,
}
if s.endswith(tuple(amounts.keys())):
try:
number = int(s[:-1])
suffix = s[-1]
except ValueError:
raise ArgumentTypeError(f'Invalid number "{s[:-1]}": expected integer')
if number < 0:
raise ArgumentTypeError(f'Invalid number "{number}": expected nonnegative integer')
else:
raise ArgumentTypeError(f'Unexpected time unit "{s[-1]}": choose from {", ".join(amounts)}')
return amounts[suffix](dt, number)| that time window. When ``--since`` is given together with an interval | ||
| retention, the interval is measured backwards from that timestamp | ||
| instead of from the current time. See ``Date and Time`` docs for exact | ||
| INTERVAL format. |
There was a problem hiding this comment.
Absolutely: "from" is less connected to time - what is precisely its advantage IMO.
Unless a different context is given, I will always assume that "since" is time-related and follows the default "from past to present" mental model. borg prune --keep-daily 1w --since X only works because --keep-daily changes the context to "from present to past" first. borg prune --since X --keep-daily 1w reads like "Prune since X", because nothing has changed the context yet - and "Prune since X" is just wrong.
--keep-since solves that. However, I agree that --keep-since could be misread like "keep everything since".
I don't really see that risk with --keep-from though, again precisely because "from" isn't strongly connected to time. Or maybe just --from?
The one option that would essentially remove most room for interpretation is something like --base-time. But it's also the least intuitive. It at least forces users to read the docs… 😄
| exactly one week before `--since` and so would be excluded and pruned in this | ||
| prune run. | ||
|
|
||
| 2026-03-31 was skipped, so 2026-03-30 is the monthly candidate for that month. |
There was a problem hiding this comment.
You aren't mentioning 2026-05-31 here, even though it's kinda special, isn't it? 2026-05-31 isn't just a candidate for --keep-daily, but also for --keep-monthly. Naturally archives aren't matched by multiple rules, but it's a great example of how interval-based policies behave a little different.
In my PR I went a step further to even include --keep-weekly. WDYT?
| # Keep the last 14 archives using `--keep` (same as `--keep-last 14`): | ||
| $ borg prune -v --list --dry-run --keep 14 | ||
|
|
||
| # Keep all archives from the last 30 days using `--keep` (same as `--keep-within 30d`): |
There was a problem hiding this comment.
--keep-within was removed
| @@ -44,8 +51,24 @@ Do not forget to run ``borg compact -v`` after prune to actually free disk space | |||
| # and an end of month archive for every month: | |||
| $ borg prune -v --list --keep-within=10d --keep-weekly=4 --keep-monthly=-1 | |||
There was a problem hiding this comment.
--keep-within was removed
| $ borg prune -v --list --dry-run --keep 14 | ||
|
|
||
| # Keep all archives from the last 30 days using `--keep` (same as `--keep-within 30d`): | ||
| $ borg prune -v --list --dry-run --keep 30d |
There was a problem hiding this comment.
--keep 30d made me thinking: Isn't that identical to --since "$(date --date='-30 days' +'%FT%T%:z')"? Since we still want --keep 14 (i.e., with a count), --since can't ever replace --keep, but for my personal understanding: Is it identical?
I have a few ideas then 🤔 But that's nothing for today or this PR, but maybe for a follow-up…
| --since '2026-06-04 16:00' \ | ||
| --keep-daily 1w \ | ||
| --keep-monthly 5m \ | ||
| --keep-yearly 2 |
There was a problem hiding this comment.
Mixing count-based and interval-based rules is a great idea for an example! However, --keep-yearly 2y wouldn't have made a difference, right? WDYT about adding some weekly archives that show how --keep-weekly 4 behaves different from --keep-weekly 4w?
| assert "test%02d" % i not in output | ||
|
|
||
|
|
||
| # This test must match docs/misc/prune-example-interval.txt |
There was a problem hiding this comment.
When changing the example as suggested, this also needs to be updated.
This PR adds optional interval handling for all retention filter flags of the
prunecommand, previously only available on--keep-within. E.g.prune --keep-hourly=7dwill keep hourly archives for the last 7 days regardless of count. Adds--keepwhich acts as a combination of--keep-lastand--keep-within.Implements #8637.
I opted to make the existing flags handle both ints and intervals instead of adding new flags as there are already so many flags for this command. Simplified some prune filtering: With the default filter function now also handling intervals,
prune_withinis no longer needed as a special case.I added a library to freeze time in testing, let me know if that's not wanted and I'll figure out something else. It's such a hassle to deal with timestamps relative to
now()in test. The tests should be fairly comprehensive in checking both their timely filter (hourly/yearly, etc.) and the new inclusive timestamp check. I did not add new helper tests forprune_splitas this function is not used anywhere other thanprune_cmdand isn't really a helper.TODOs:
TODOs from comments:
Notes:
--keep: Merges functionality of--keep-withinand--keep-lastwith new int-or-interval handling.--keep-lastis no longer an alias of--keep-secondlyand thus keeps archives made on the same second. It now fits together with--keep-withinand new flag--keep.int_or_intervaland creatingtimedeltaobjects it seemed weird to restrict input like this. Not extremely useful unless you want to prune on archives in the same second as they were created, but it seemed logical when setting up tests to verify new--keep-lastbehavior.. Technically breaking, but will likely not meaningfully affect any real scenarios.xx:xx:10and there's an archive atxx:xx:05then--keep-within 5Sshould cover that archive. Easy change to make when already altering the filtering logic. Technically breaking, but will likely not meaningfully affect any real scenarios.This has been a pet peeve of mine in the pruning command for a long time. In my mind the most clean backup regime keeps all backups for a short time (allows catching small errors quickly), then hourly backups for a reasonable time (say, 7 days), then daily backups for a little longer and then finally weekly/monthly as storage permits. This was not previously possible, requiring for example
--keep-within 7d --keep-hourly 168 --keep-daily 14 --keep-weekly -1for an approximation. But keeping around 168 archives for a machine that's only running a few hours a day seems mighty excessive. So here we are :)With this implemented my ideal retention for my primary laptop with archives every 15m looks something like
--keep 3d --keep-hourly 7d --keep-daily 30d --keep-weekly -1.