italian: support 'un ora fa', 'un'ora fa', 'oggi alle 11:00'. #1049

Merged
serhii73 merged 3 commits into scrapinghub:master from realtimeprojects:extend-italian
Jun 23, 2026

Conversation

@realtimeprojects

Contributor

No description provided.

@ghost

ghost commented Sep 16, 2022


Is this ready to merge?
Is there something more I can do?

@ghost ghost mentioned this pull request Sep 16, 2022
@serhii73

Collaborator

Hi @realtimeprojects
Thank you for your PR.
Could you please resolve the conflicts? Thanks in advance.

@realtimeprojects

Contributor Author

@serhii73: Yes, I'll try to do it ASAP.

serhii73 and others added 3 commits June 22, 2026 11:16
- Add month locative/dative forms (lednu, únoru, březnu, dubnu, květnu,
  červnu, červenci, srpnu, říjnu, prosinci) for dates like "v lednu 2023"
- Add July abbreviation "črv"
- Fix "za měsíc/týden/rok" not parsing: add accusative simplification
  patterns (previous patterns only matched instrumental -em forms)
- Add tests covering the new expressions

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Finishes PR scrapinghub#1049 and resolves conflicts with master:

- Keep master's improved "(\d+[.,]?\d*)\s+ora" simplification and add a new
  one mapping "un ora"/"un'ora" to "1 ore".
- Anchor the new simplification with word boundaries (\bun[' ]ora\b) so it no
  longer corrupts words such as "un orario" / "ciascun orario".
- Add "alle" to the skip tokens so "oggi alle 11:00" parses as "oggi 11:00".
- Regenerate it.py from it.yaml.
- Drop an accidental duplicate test line from the original PR and add a
  regression test that locks in the word-boundary fix.

Co-Authored-By: Serhii A <serhii@zyte.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
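
The word-boundary fix described in the commit message above can be illustrated with a minimal sketch using only Python's `re` module. The anchored pattern is the one quoted in the commit message; the unanchored variant and the `simplify` helper are assumptions added here for contrast and are not taken from the PR or from dateparser's internals.

```python
import re

# Anchored pattern quoted in the commit message above.
ANCHORED = re.compile(r"\bun[' ]ora\b")
# Unanchored variant, assumed here only for contrast; not taken from the PR.
UNANCHORED = re.compile(r"un[' ]ora")


def simplify(text: str, pattern: re.Pattern = ANCHORED) -> str:
    """Replace "un ora" / "un'ora" with the numeric form "1 ore" (hypothetical helper)."""
    return pattern.sub("1 ore", text)


if __name__ == "__main__":
    for sample in ("un ora fa", "un'ora fa", "un orario", "ciascun orario"):
        print(
            f"{sample!r}: anchored={simplify(sample)!r}, "
            f"unanchored={simplify(sample, UNANCHORED)!r}"
        )
```

With the anchors in place, "un orario" and "ciascun orario" pass through unchanged, because `\b` requires a non-word character (or the string edge) on both sides of the match. Without them, the same inputs are rewritten in the middle of a word.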
@codecov

codecov Bot commented Jun 23, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.11%. Comparing base (eb111c2) to head (3c328ab).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1049   +/-   ##
=======================================
  Coverage   97.11%   97.11%           
=======================================
  Files         235      235           
  Lines        2909     2909           
=======================================
  Hits         2825     2825           
  Misses         84       84           


Copilot AI left a comment

Contributor


Pull request overview

This PR extends Italian (it) date-string translation support to handle common colloquial/connector forms so they translate cleanly into the normalized English representation used by the parser pipeline.

Changes:

  • Add Italian translation test cases for "un ora fa", "un'ora fa", and "oggi (alle) 11:00", plus a regression guard to avoid corrupting words like "orario".
  • Update Italian translation data to treat "alle" as a skip word and to simplify un ora / un'ora into a numeric-hour form (1 ore) compatible with existing relative regex handling.
  • Mirror the same updates in supplementary YAML so generated translation data remains consistent.
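
The effect of treating "alle" as a skip word can be sketched as follows. This is a simplified illustration, not dateparser's actual tokenizer: the `SKIP_TOKENS` set and the `drop_skip_tokens` helper are hypothetical names, and the real skip list for Italian contains more entries than the single word shown here.

```python
# Hypothetical one-word subset of the Italian skip list; the real list is longer.
SKIP_TOKENS = frozenset({"alle"})


def drop_skip_tokens(text: str, skip: frozenset = SKIP_TOKENS) -> str:
    """Remove whitespace-separated tokens that carry no date information."""
    kept = [token for token in text.split() if token.lower() not in skip]
    return " ".join(kept)


if __name__ == "__main__":
    print(drop_skip_tokens("oggi alle 11:00"))
```

Once the connector is dropped, the remaining string "oggi 11:00" has the same shape as an input the translation step is already expected to handle.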

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tests/test_languages.py | Adds regression tests covering the new Italian forms and a non-regression case for "un orario". |
| dateparser/data/date_translation_data/it.py | Updates bundled Italian translation rules: add "alle" to skip and add a simplification for un ora / un'ora. |
| dateparser_data/supplementary_language_data/date_translation_data/it.yaml | Updates supplementary Italian translation source data to match the new skip/simplification behavior. |


@serhii73 serhii73 merged commit 33e913c into scrapinghub:master Jun 23, 2026
15 checks passed

4 participants