fix: %j (day of year) format directive overwrote parsed month/day #1345

Merged
serhii73 merged 1 commit into scrapinghub:master from gaoflow:fix/day-of-year-format-j
Jun 26, 2026

@gaoflow gaoflow commented Jun 24, 2026

Contributor

Bug

When a format string passed to parse_with_formats (via dateparser.parse(..., date_formats=[...]) or DateDataParser.get_date_data(..., date_formats=[...])) contained %j (day of year), the correctly parsed month and day were silently replaced with today's month and day:

import dateparser

# Day 100 of 2023 is April 10
dateparser.parse("2023-100", date_formats=["%Y-%j"])
# Before fix: datetime(2023, 6, 25)  ← today, wrong
# After fix:  datetime(2023, 4, 10)  ← correct

This also affects %-j (no-padding variant).

Root cause

parse_with_formats computed:

missing_month = not any(m in date_format for m in ["%m", "%b", "%B"])
missing_day = "%d" not in date_format

Neither check recognises %j. A successful strptime call with %j always populates both the month and day fields in the returned datetime object, so the parsed values should be kept as-is.
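
The strptime behaviour described above can be checked with the standard library alone. A minimal sketch, independent of dateparser (the helper name parse_day_of_year is hypothetical and exists only for this illustration):

```python
from datetime import datetime


def parse_day_of_year(text: str) -> datetime:
    # Hypothetical helper: parse "<year>-<day of year>" with plain strptime.
    return datetime.strptime(text, "%Y-%j")


parsed = parse_day_of_year("2023-100")
# %j alone is enough for strptime to derive both the month and the day.
print(parsed.month, parsed.day)  # 4 10
```

Because the returned datetime already carries a real month and day, any later step that treats them as missing will discard correct data.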

The existing _get_missing_parts helper in dateparser/utils/__init__.py already listed %j and %-j under the "day" directive, but not under "month". This was a secondary inconsistency: since %j encodes a day-of-year that uniquely determines the month, it implicitly provides month information.

Fix

  1. Add %j and %-j to the "month" directive list in _get_missing_parts.
  2. Replace the ad-hoc missing_month/missing_day expressions in parse_with_formats with a single _get_missing_parts call, which is the authoritative and already-tested source for this logic.
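
The idea behind the two steps can be sketched as follows. This is an illustrative approximation, not the code in dateparser/utils/__init__.py: the DIRECTIVES table, its exact contents, and the get_missing_parts function below are assumptions made for the example.

```python
# Assumed directive table: which strptime directives supply which date part.
# %j and %-j appear under both "day" and "month", mirroring the fix.
DIRECTIVES = {
    "day": ["%d", "%-d", "%j", "%-j"],
    "month": ["%b", "%B", "%m", "%-m", "%j", "%-j"],
    "year": ["%y", "%Y"],
}


def get_missing_parts(date_format: str) -> list[str]:
    # A part is missing when none of its directives occurs in the format.
    return [
        part
        for part, directives in DIRECTIVES.items()
        if not any(directive in date_format for directive in directives)
    ]


print(get_missing_parts("%Y-%j"))  # []
print(get_missing_parts("%Y-%m"))  # ['day']
print(get_missing_parts("%Y"))     # ['day', 'month']
```

With a single table, a caller only needs to test whether "month" or "day" is in the returned list, so a directive added to the table is recognised everywhere at once.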

All 22 703 existing tests continue to pass. Three new regression tests are included.


This pull request was prepared with the assistance of AI, under my direction and review.

…h_formats

When a user-supplied date format contained %j (day of year), parse_with_formats
incorrectly treated the parsed date as having no month and no day. This caused
it to overwrite the correctly-parsed month and day with the current-date defaults
instead of preserving what strptime extracted.

Root cause: missing_day was computed as '"%d" not in date_format', which overlooks
%j (and %-j) as day-of-year directives. A successful strptime with %j always
populates both the day and month fields (e.g. day 100 of 2023 = April 10).
The pre-existing _get_missing_parts helper already listed %j in the day
directive mapping but not in month.

Fix: add %j and %-j to the month directive mapping in _get_missing_parts, and
replace the ad-hoc missing_month/missing_day checks in parse_with_formats with
_get_missing_parts, which is the authoritative source for this logic.

Before:
  dateparser.parse("2023-100", date_formats=["%Y-%j"])
  -> datetime(2023, 6, 25)  # wrong: June 25 = today

After:
  dateparser.parse("2023-100", date_formats=["%Y-%j"])
  -> datetime(2023, 4, 10)  # correct: day 100 of 2023

codecov Bot commented Jun 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.11%. Comparing base (33e913c) to head (d85c420).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1345   +/-   ##
=======================================
  Coverage   97.11%   97.11%           
=======================================
  Files         235      235           
  Lines        2909     2910    +1     
=======================================
+ Hits         2825     2826    +1     
  Misses         84       84           

Copilot AI left a comment

Contributor

Pull request overview

Fixes incorrect behavior in parse_with_formats where using the %j (day-of-year) directive could cause the parsed month/day to be overwritten by “today’s” month/day, by centralizing “missing parts” detection in the existing _get_missing_parts utility.

Changes:

  • Treat %j / %-j as providing month information in _get_missing_parts (since day-of-year implies month/day).
  • Replace ad-hoc “missing month/day” checks in parse_with_formats with _get_missing_parts.
  • Add regression tests covering %j parsing for day-of-year inputs (including leap-year behavior).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| dateparser/date.py | Uses _get_missing_parts to correctly detect which parts are missing for custom strptime formats, avoiding incorrect overwrites when %j is present. |
| dateparser/utils/__init__.py | Updates directive mapping so %j / %-j count as providing month information. |
| tests/test_date.py | Adds regression coverage for %j day-of-year parsing behavior. |

Comment thread tests/test_date.py
Comment on lines +377 to +395
@parameterized.expand(
    [
        param(
            date_string="2023-100",
            date_formats=["%Y-%j"],
            expected_result=datetime(2023, 4, 10),
        ),
        param(
            date_string="2024-060",
            date_formats=["%Y-%j"],
            expected_result=datetime(2024, 2, 29),
        ),
        param(
            date_string="2023 060",
            date_formats=["%Y %j"],
            expected_result=datetime(2023, 3, 1),
        ),
    ]
)
@serhii73

Collaborator

Thanks!

@serhii73 serhii73 merged commit 273eb23 into scrapinghub:master Jun 26, 2026
15 checks passed
4 participants