fix: %j (day of year) format directive overwrote parsed month/day #1345
Merged
…h_formats
When a user-supplied date format contained %j (day of year), parse_with_formats
incorrectly treated the parsed date as having no month and no day. This caused
it to overwrite the correctly-parsed month and day with the current-date defaults
instead of preserving what strptime extracted.
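The harm comes from the overwrite step. Below is a minimal sketch of that step. It assumes a hypothetical helper, apply_defaults_sketch, that copies today's value into every part flagged as missing. The real parse_with_formats logic is more involved. A fixed "today" keeps the output reproducible.

```python
from datetime import datetime


def apply_defaults_sketch(parsed, missing, today):
    """Hypothetical helper: overwrite each part named in `missing` with today's value.

    datetime.replace() raises ValueError for impossible combinations
    (e.g. day 31 in a 30-day month); this sketch does not guard against that.
    """
    return parsed.replace(**{part: getattr(today, part) for part in missing})


today = datetime(2026, 6, 25)                      # fixed "current date" for reproducibility
parsed = datetime.strptime("2023-100", "%Y-%j")    # month and day are already known: 2023-04-10

# Buggy detection: month and day are wrongly flagged as missing, so both get overwritten.
print(apply_defaults_sketch(parsed, ["month", "day"], today))  # 2023-06-25 00:00:00

# Correct detection: nothing is missing, so the parsed value survives unchanged.
print(apply_defaults_sketch(parsed, [], today))  # 2023-04-10 00:00:00
```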
Root cause: missing_day was computed as '"%d" not in date_format', which overlooks %j
(and %-j) as day-of-year directives. A successful strptime with %j always
populates both the day and month fields (e.g. day 100 of 2023 = April 10).
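Python's standard library behaves as described. datetime.strptime resolves %j into a month and day using the parsed year. The ad-hoc check below reproduces the missing_day expression quoted above. Nothing else here is dateparser code.

```python
from datetime import datetime

# strptime converts the day-of-year into a full date using the parsed year
# (2023 is not a leap year: 31 + 28 + 31 = 90 days precede April, so day 100 is April 10).
parsed = datetime.strptime("2023-100", "%Y-%j")
print(parsed.month, parsed.day)  # 4 10

# The pre-fix check only looks for "%d", so it reports the day as missing
# even though strptime has already filled it in.
date_format = "%Y-%j"
missing_day_adhoc = "%d" not in date_format
print(missing_day_adhoc)  # True
```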
The pre-existing _get_missing_parts helper already listed %j in the day
directive mapping but not in month.
Fix: add %j and %-j to the month directive mapping in _get_missing_parts, and
replace the ad-hoc missing_month/missing_day checks in parse_with_formats with
_get_missing_parts, which is the authoritative source for this logic.
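The fix relies on a mapping from date parts to the directives that provide them. The sketch below is hypothetical. missing_parts_sketch and its DIRECTIVES table are illustrative names. Apart from %d, %j and %-j, the directive lists are assumptions, not a copy of dateparser's _get_missing_parts. The key point is that %j and %-j appear under both "day" and "month".

```python
# Hypothetical directive table; only the %d / %j / %-j entries are taken from the PR text.
DIRECTIVES = {
    "day": ["%d", "%-d", "%j", "%-j"],
    "month": ["%m", "%-m", "%b", "%B", "%j", "%-j"],  # %j now also counts as providing the month
    "year": ["%y", "%Y"],
}


def missing_parts_sketch(fmt):
    """Return the parts ('day', 'month', 'year') that no directive in `fmt` provides."""
    return [
        part
        for part, directives in DIRECTIVES.items()
        if not any(directive in fmt for directive in directives)
    ]


print(missing_parts_sketch("%Y-%j"))   # []
print(missing_parts_sketch("%Y-%-j"))  # []
print(missing_parts_sketch("%Y-%m"))   # ['day']
print(missing_parts_sketch("%d %B"))   # ['year']
```

Keeping this mapping in one helper means parse_with_formats cannot drift out of sync with it, which is the motivation the commit message gives for removing the ad-hoc checks.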
Before:
dateparser.parse("2023-100", date_formats=["%Y-%j"])
-> datetime(2023, 6, 25) # wrong: month and day taken from today's date (June 25)
After:
dateparser.parse("2023-100", date_formats=["%Y-%j"])
-> datetime(2023, 4, 10) # correct: day 100 of 2023
AdrianAtZyte approved these changes on Jun 25, 2026
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Coverage diff | master | #1345 | +/- |
|---|---:|---:|---:|
| Coverage | 97.11% | 97.11% | |
| Files | 235 | 235 | |
| Lines | 2909 | 2910 | +1 |
| Hits | 2825 | 2826 | +1 |
| Misses | 84 | 84 | |
Pull request overview
Fixes incorrect behavior in parse_with_formats where using the %j (day-of-year) directive could cause the parsed month/day to be overwritten by “today’s” month/day, by centralizing “missing parts” detection in the existing _get_missing_parts utility.
Changes:
- Treat %j/%-j as providing month information in _get_missing_parts (since day-of-year implies month/day).
- Replace ad-hoc “missing month/day” checks in parse_with_formats with _get_missing_parts.
- Add regression tests covering %j parsing for day-of-year inputs (including leap-year behavior).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| dateparser/date.py | Uses _get_missing_parts to correctly detect which parts are missing for custom strptime formats, avoiding incorrect overwrites when %j is present. |
| dateparser/utils/__init__.py | Updates directive mapping so %j / %-j count as providing month information. |
| tests/test_date.py | Adds regression coverage for %j day-of-year parsing behavior. |
Comment on lines +377 to +395
```python
@parameterized.expand(
    [
        param(
            date_string="2023-100",
            date_formats=["%Y-%j"],
            expected_result=datetime(2023, 4, 10),
        ),
        param(
            date_string="2024-060",
            date_formats=["%Y-%j"],
            expected_result=datetime(2024, 2, 29),
        ),
        param(
            date_string="2023 060",
            date_formats=["%Y %j"],
            expected_result=datetime(2023, 3, 1),
        ),
    ]
)
```
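The expected values in these three cases follow from the calendar alone. They can be cross-checked against Python's standard datetime.strptime without involving dateparser. The following is a small sketch. It only exercises the standard library, not the code under test.

```python
from datetime import datetime

# The three regression cases above: (input string, format, expected datetime).
cases = [
    ("2023-100", "%Y-%j", datetime(2023, 4, 10)),  # 31 + 28 + 31 = 90, so day 100 is April 10
    ("2024-060", "%Y-%j", datetime(2024, 2, 29)),  # 2024 is a leap year: 31 + 29 = 60, so day 60 is Feb 29
    ("2023 060", "%Y %j", datetime(2023, 3, 1)),   # 2023 is not: 31 + 28 = 59, so day 60 is March 1
]

for text, fmt, expected in cases:
    result = datetime.strptime(text, fmt)
    assert result == expected, (text, fmt, result)
    print(f"{text!r} with {fmt!r} -> {result.date()}")
```

The leap-year pair shows why %j must be read together with the year. The same day number, 060, maps to different months in 2023 and 2024.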
serhii73 (Collaborator) approved these changes on Jun 26, 2026
Thanks!
Bug
When a format string passed to parse_with_formats (via dateparser.parse(..., date_formats=[...]) or DateDataParser.get_date_data(..., date_formats=[...])) contained %j (day of year), the correctly-parsed month and day were silently replaced with today's month and day. This also affects %-j (the no-padding variant).
Root cause
parse_with_formats computed missing_month and missing_day with ad-hoc checks on the format string (for example, missing_day was '"%d" not in date_format'). Neither check recognises %j. A successful strptime call with %j always populates both the month and day fields in the returned datetime object, so the parsed values should be kept as-is.
The existing _get_missing_parts helper in dateparser/utils/__init__.py already listed %j and %-j under the "day" directive, but not under "month". This was a secondary inconsistency: since %j encodes a day-of-year that, together with the year, uniquely determines the month, it implicitly provides month information.
Fix
- Add %j and %-j to the "month" directive list in _get_missing_parts.
- Replace the ad-hoc missing_month/missing_day expressions in parse_with_formats with a single _get_missing_parts call, which is the authoritative and already-tested source for this logic.

All 22,703 existing tests continue to pass. Three new regression tests are included.
This pull request was prepared with the assistance of AI, under my direction and review.