Fix inconsistent behavior with ALL-CAPS strings containing separators #1583

Open
ManojLingala wants to merge 4 commits into Humanizr:main from ManojLingala:fix-uppercase-inconsistent-behavior

@ManojLingala

Fixes #1557: Humanize(), ApplyCase(), and Transform() methods now handle ALL-CAPS strings with underscores or hyphens more consistently.

Changes:

  • Modified StringHumanizeExtensions.Humanize() to process ALL-CAPS strings that contain separators (underscore/hyphen) instead of preserving them
  • Updated ToTitleCase transformer to handle words with separators
  • Enhanced ToSentenceCase transformer to convert separator-containing ALL-CAPS strings to sentence case
  • Added comprehensive unit tests for the new behavior

Examples of improvements:

  • "LONGER_WORD".Humanize() now returns "LONGER WORD" instead of "LONGER_WORD"
  • "HYPEN-SEPARATOR".Transform(To.SentenceCase) now returns "Hypen-separator"
  • "LONGER_WORD".Transform(To.TitleCase) now returns "Longer_word"

Regular ALL-CAPS words without separators are still preserved as potential acronyms to maintain backward compatibility.
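
The rule described above (process ALL-CAPS input that contains a separator, preserve ALL-CAPS input that does not) could be sketched roughly as follows. This is only an illustration: `AcronymRuleSketch` and its methods are hypothetical names and are not taken from the Humanizer source or from this PR's diff.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only; not part of Humanizer.
static class AcronymRuleSketch
{
    // True when the input has at least one letter and every letter is uppercase.
    public static bool IsAllCaps(string input) =>
        input.Any(char.IsLetter) && input.Where(char.IsLetter).All(char.IsUpper);

    // True when the input contains an underscore or a hyphen.
    public static bool HasSeparator(string input) =>
        input.Contains('_') || input.Contains('-');

    // ALL-CAPS input is kept as a potential acronym only if it has no separator.
    public static bool PreserveAsAcronym(string input) =>
        IsAllCaps(input) && !HasSeparator(input);
}
```

Under these assumptions, `PreserveAsAcronym("HELLO")` is true, while `PreserveAsAcronym("LONGER_WORD")` and `PreserveAsAcronym("HYPEN-SEPARATOR")` are false, so only the latter two would go on to be humanized.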

Here is a checklist you should tick through before submitting a pull request:

  • [√] Implementation is clean
  • [√] Code adheres to the existing coding standards; e.g. no curlies for one-line blocks, no redundant empty lines between methods or code blocks, spaces rather than tabs, etc.
  • [√] No Code Analysis warnings
  • [√] There is proper unit test coverage
  • [√] If the code is copied from StackOverflow (or a blog or OSS) full disclosure is included. That includes required license files and/or file headers explaining where the code came from with proper attribution
  • [√] There are very few or no comments (because comments shouldn't be needed if you write clean code)
  • [√] Xml documentation is added/updated for the addition/change
  • [√] Your PR is (re)based on top of the latest commits from the main branch (more info below)
  • [√] Link to the issue(s) you're fixing from your PR description. Use fixes #<the issue number>
  • [√] Readme is updated if you change an existing feature or add a new one
  • [√] Run either build.cmd or build.ps1 and ensure there are no test failures

The LONGER_WORD test cases should expect 'LONGER WORD' output after
humanization, as the ALL-CAPS words are preserved when converted to
multi-word format.
@ManojLingala

Author

@dotnet-policy-service agree

ManojLingala and others added 2 commits July 26, 2025 16:15
This commit resolves issue Humanizr#1557 where ALL-CAPS strings with separators
(underscores/hyphens) were inconsistently handled by Humanize() and
Transform() methods.

## Changes Made:

### Enhanced ToTitleCase Transformer:
- Added context-aware acronym detection via `ContainsMultipleAllCapsWords()`
- Distinguishes between single words (preserve more acronyms) vs multi-word
  results from separator-based input (transform more aggressively)
- Preserves genuine acronyms like "HELLO", "HTML", "ALLCAPS" while transforming
  words from separator-based input like "LONGER WORD" → "Longer Word"

### Improved ToSentenceCase Transformer:
- Added logic to handle both direct transformer usage and integration with Humanize()
- Preserves acronyms in mixed-case contexts but transforms ALL-CAPS words
  when all words are caps (indicating separator-based origin)
- Maintains backward compatibility for existing sentence case behavior

### Updated Test Cases:
- Added test case for "LONGER_WORD" → "Longer word" (sentence case)
- Updated test expectation for "LONGER_WORD" → "Longer Word" (title case)
- All existing tests continue to pass

## Behavior Fixed:
- **"HELLO" → "HELLO"** (preserves standalone acronyms)
- **"LONGER_WORD" → "Longer Word"** (title case for separator-based input)
- **"LONGER_WORD" → "Longer word"** (sentence case for separator-based input)
- **"honors UPPER case" → "Honors UPPER case"** (preserves acronyms in mixed contexts)

## Test Results:
- StringHumanizeTests: 57/57 passing
- TransformersTests: 23/23 passing
- All target frameworks (.NET 10.0, 8.0, Framework 4.8) passing

Fixes Humanizr#1557
@clairernovotny requested a review from Copilot October 9, 2025 04:04

Copilot AI left a comment

Contributor

Pull Request Overview

This PR fixes inconsistent behavior with ALL-CAPS strings containing separators (underscores or hyphens) by ensuring they are properly humanized and transformed instead of being preserved as potential acronyms.

  • Updated the Humanize() method to process ALL-CAPS strings with separators instead of preserving them unchanged
  • Enhanced ToTitleCase and ToSentenceCase transformers to handle separator-containing strings appropriately
  • Added comprehensive test coverage for the new behavior while maintaining backward compatibility for regular ALL-CAPS words

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| StringHumanizeExtensions.cs | Modified Humanize() method to exclude strings with separators from acronym preservation |
| ToTitleCase.cs | Added logic to detect and transform multi-word strings from separator-based input while preserving short acronyms |
| ToSentenceCase.cs | Enhanced to handle separator-containing ALL-CAPS strings and multi-word transformations |
| TransformersTests.cs | Added test cases for the new transformer behavior |
| StringHumanizeTests.cs | Added test cases covering the updated Humanize() and casing methods |

Comment thread src/Humanizer/Transformer/ToTitleCase.cs
[InlineData("", "")]
[InlineData("JeNeParlePasFrançais", "Je ne parle pas français")]
[InlineData("LONGER_WORD", "LONGER WORD")] // Issue #1557: ALL-CAPS with separators should be humanized to separated words
[InlineData("HELLO", "HELLO")] // ALL-CAPS words without separators should be preserved as potential acronyms

Copilot AI Oct 9, 2025

Remove trailing whitespace at the end of the comment line.

Comment on lines +71 to +81
    foreach (var word in words)
    {
        if (word.Any(char.IsLetter))
        {
            totalWordCount++;
            if (word.All(char.IsUpper))
            {
                allCapsCount++;
            }
        }
    }

Check notice

Code scanning / CodeQL

Missed opportunity to use Where (Note)

This foreach loop implicitly filters its target sequence; consider filtering the sequence explicitly using '.Where(...)'.
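
One way the explicit filtering suggested by this notice might look is sketched below. It assumes `words` is a sequence of strings, as the loop above implies; `AllCapsCountSketch` and `CountWords` are hypothetical names used only for this illustration and do not appear in the PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only; not taken from the PR's diff.
static class AllCapsCountSketch
{
    public static (int TotalWordCount, int AllCapsCount) CountWords(IEnumerable<string> words)
    {
        // Filter explicitly: only words containing at least one letter are counted.
        var lettered = words.Where(word => word.Any(char.IsLetter)).ToList();

        // Same check as the loop: every character of the word must be uppercase.
        var allCapsCount = lettered.Count(word => word.All(char.IsUpper));

        return (lettered.Count, allCapsCount);
    }
}
```

The counts should match what the original loop produces; the only intended difference is that the filter is stated up front rather than buried in an `if`.
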
@clairernovotny

Member

Considering this as it could change too much. Perhaps an analyzer could detect this case and warn/suggest making it lower case first? Just needs some thought.

Development

Successfully merging this pull request may close these issues.

Humanize(), ApplyCase(), and Transform() all produce incorrect result when the input is in UPPERCASE

4 participants