italian: support 'un ora fa', 'un'ora fa', 'oggi alle 11:00' #1049
Is this ready to merge?

Hi @realtimeprojects

@serhii73: yes, I'll try to do it ASAP.
- Add month locative/dative forms (lednu, únoru, březnu, dubnu, květnu, červnu, červenci, srpnu, říjnu, prosinci) for dates like "v lednu 2023".
- Add July abbreviation "črv".
- Fix "za měsíc/týden/rok" not parsing: add accusative simplification patterns (previous patterns only matched instrumental -em forms).
- Add tests covering the new expressions.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
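To illustrate the two parsing fixes in this commit, here is a minimal Python sketch. It is not dateparser's implementation: `LOCATIVE_MONTHS`, `ACCUSATIVE_UNIT`, `month_year` and `simplify_accusative` are hypothetical names, and rewriting "za <unit>" to "za 1 <unit>" is an assumed target. It shows how a pattern anchored on the bare accusative form matches "za měsíc" while leaving the instrumental "měsícem" untouched.

```python
import re

# Hypothetical lookup: the locative/dative month forms listed in the commit
# message, mapped to month numbers. září (September) and listopad (November)
# are not in that list, so they are not covered here.
LOCATIVE_MONTHS = {
    "lednu": 1, "únoru": 2, "březnu": 3, "dubnu": 4, "květnu": 5,
    "červnu": 6, "červenci": 7, "srpnu": 8, "říjnu": 10, "prosinci": 12,
}

# Hypothetical accusative pattern: "za měsíc" / "za týden" / "za rok"
# ("in a month/week/year") with no explicit number. The trailing \b stops it
# from matching instrumental forms such as "měsícem".
ACCUSATIVE_UNIT = re.compile(r"\bza (měsíc|týden|rok)\b")


def month_year(text: str):
    """Return (month, year) for phrases like 'v lednu 2023', else None."""
    m = re.fullmatch(r"v (\w+) (\d{4})", text.strip())
    if m and m.group(1) in LOCATIVE_MONTHS:
        return LOCATIVE_MONTHS[m.group(1)], int(m.group(2))
    return None


def simplify_accusative(text: str) -> str:
    """Insert an explicit count: 'za měsíc' -> 'za 1 měsíc' (assumed target)."""
    return ACCUSATIVE_UNIT.sub(r"za 1 \1", text)
```

For example, `month_year("v lednu 2023")` returns `(1, 2023)` and `simplify_accusative("za týden")` returns `"za 1 týden"`, while `"za měsícem"` passes through unchanged.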
Finishes PR scrapinghub#1049 and resolves conflicts with master:

- Keep master's improved `(\d+[.,]?\d*)\s+ora` simplification and add a new one mapping "un ora"/"un'ora" to "1 ore".
- Anchor the new simplification with word boundaries (`\bun[' ]ora\b`) so it no longer corrupts words such as "un orario" / "ciascun orario".
- Add "alle" to the skip tokens so "oggi alle 11:00" parses as "oggi 11:00".
- Regenerate it.py from it.yaml.
- Drop an accidental duplicate test line from the original PR and add a regression test that locks in the word-boundary fix.

Co-Authored-By: Serhii A <serhii@zyte.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
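A minimal sketch of why the word boundaries matter, assuming plain `re.sub` semantics. The pattern `\bun[' ]ora\b` and the "alle" skip word come from the commit message; `BOUNDED`, `NAIVE`, `SKIP_TOKENS`, `simplify` and `drop_skip_tokens` are hypothetical names, and dropping skip tokens as whole whitespace-separated words is an assumption about how dateparser applies them.

```python
import re

# BOUNDED is the word-boundary pattern quoted in the commit message.
# NAIVE is the same pattern without anchors, kept only to show what the
# boundaries prevent.
BOUNDED = re.compile(r"\bun[' ]ora\b")
NAIVE = re.compile(r"un[' ]ora")

# Assumption: skip tokens are removed as whole whitespace-separated words.
SKIP_TOKENS = {"alle"}


def simplify(text: str, pattern: re.Pattern = BOUNDED) -> str:
    """Rewrite 'un ora' / "un'ora" to '1 ore' using the given pattern."""
    return pattern.sub("1 ore", text)


def drop_skip_tokens(text: str) -> str:
    """Remove skip words, e.g. 'oggi alle 11:00' -> 'oggi 11:00'."""
    return " ".join(tok for tok in text.split() if tok not in SKIP_TOKENS)
```

With the anchored pattern, "un orario" and "ciascun orario" pass through unchanged, whereas the unanchored `NAIVE` pattern turns them into "1 orerio" and "ciasc1 orerio". That is the kind of corruption the regression test guards against.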
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | master | #1049 |
|---|---|---|
| Coverage | 97.11% | 97.11% |
| Files | 235 | 235 |
| Lines | 2909 | 2909 |
| Hits | 2825 | 2825 |
| Misses | 84 | 84 |
Pull request overview
This PR extends Italian (it) date-string translation support to handle common colloquial/connector forms so they translate cleanly into the normalized English representation used by the parser pipeline.
Changes:
- Add Italian translation test cases for "un ora fa", "un'ora fa", and "oggi (alle) 11:00", plus a regression guard to avoid corrupting words like "orario".
- Update Italian translation data to treat "alle" as a skip word and to simplify "un ora"/"un'ora" into a numeric-hour form ("1 ore") compatible with existing relative regex handling.
- Mirror the same updates in supplementary YAML so generated translation data remains consistent.
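As a rough picture of how these pieces combine, the sketch below chains the "un ora" simplification, the "alle" skip word and a word-by-word lookup into English. The three-entry `WORDS` map, the `translate` helper and the order of the steps are assumptions for illustration only; dateparser's real translation data and pipeline are considerably more involved.

```python
import re

# Hypothetical, heavily reduced Italian -> English word map. The real data
# in it.py / it.yaml is far larger; these entries are assumptions chosen
# only for the examples below.
WORDS = {"oggi": "today", "ore": "hour", "fa": "ago"}
SKIP = {"alle"}
UN_ORA = re.compile(r"\bun[' ]ora\b")


def translate(text: str) -> str:
    """Simplify, drop skip words, then map known words to English."""
    text = UN_ORA.sub("1 ore", text.lower())
    tokens = [t for t in text.split() if t not in SKIP]
    return " ".join(WORDS.get(t, t) for t in tokens)
```

Under these assumptions, `translate("un'ora fa")` gives "1 hour ago" and `translate("oggi alle 11:00")` gives "today 11:00". Both are forms an English relative-date parser can handle directly.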
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/test_languages.py | Adds regression tests covering the new Italian forms and a non-regression case for "un orario". |
| dateparser/data/date_translation_data/it.py | Updates bundled Italian translation rules: add "alle" to skip and add a simplification for un ora / un'ora. |
| dateparser_data/supplementary_language_data/date_translation_data/it.yaml | Updates supplementary Italian translation source data to match the new skip/simplification behavior. |