fix(softbreak): Incorrect extra spaces in CJK paragraph soft line breaks#548
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughA new ChangesSoftBreak Inline Node and Locale-Aware Rendering
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
4b70f47 to
d82f6d5
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
quarkdown-html/src/main/kotlin/com/quarkdown/rendering/html/node/BaseHtmlNodeRenderer.kt (1)
277-277: ⚖️ Poor tradeoffConsider extracting duplicate SoftBreak rendering logic. Both renderers implement identical locale-aware whitespace logic:
if (context.documentInfo.locale.isCJK()) "" else " ". While this is a simple one-liner, the coding guidelines emphasize avoiding repetitive code "at all costs."
quarkdown-html/src/main/kotlin/com/quarkdown/rendering/html/node/BaseHtmlNodeRenderer.kt#L277-L277: Extract the SoftBreak rendering decision to a shared utility function in the core module.quarkdown-plaintext/src/main/kotlin/com/quarkdown/rendering/plaintext/node/PlainTextNodeRenderer.kt#L177-L177: Use the same shared utility function.Given the cross-module nature and simplicity of the code, this refactor can be deferred if the added abstraction outweighs the benefit.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@quarkdown-html/src/main/kotlin/com/quarkdown/rendering/html/node/BaseHtmlNodeRenderer.kt` at line 277, Extract the duplicated SoftBreak whitespace logic into a shared utility in the core module (e.g., add function softBreakFor(locale: DocumentLocale): String in a new util object SoftBreakUtils under package com.quarkdown.util that returns "" when locale.isCJK() else " "), then call that utility from both renderers: in quarkdown-html/src/main/kotlin/com/quarkdown/rendering/html/node/BaseHtmlNodeRenderer.kt (277-277) replace the inline if with SoftBreakUtils.softBreakFor(context.documentInfo.locale), and in quarkdown-plaintext/src/main/kotlin/com/quarkdown/rendering/plaintext/node/PlainTextNodeRenderer.kt (177-177) do the same replacement so both visit(node: SoftBreak) implementations delegate to the shared function.Source: Coding guidelines
quarkdown-core/src/main/kotlin/com/quarkdown/core/localization/Locale.kt (1)
61-61: SimplifyLocale?.isCJK()—codeandshortTagare redundant with current implementations.
quarkdown-core/src/main/kotlin/com/quarkdown/core/localization/Locale.ktline 61 checks bothcode.lowercase()andshortTag.lowercase()againstCJK_LANGUAGE_CODES, but the soleLocaleimplementation (JVMLocale) sets bothcodeandshortTagtojvmLocale.language, so they are always identical. Consider checking onlycode(or document why both are needed for potential futureLocaleimplementations).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@quarkdown-core/src/main/kotlin/com/quarkdown/core/localization/Locale.kt` at line 61, Simplify the nullable extension Locale?.isCJK(): replace the redundant dual checks with a single check against code (i.e. implement as this != null && code.lowercase() in CJK_LANGUAGE_CODES) and remove shortTag from the condition; if you prefer to preserve future-proofing, instead add a brief comment on Locale/ JVMLocale noting that code and shortTag are identical in the current JVMLocale implementation so only code is checked.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@quarkdown-core/src/main/kotlin/com/quarkdown/core/parser/BlockTokenParser.kt`:
- Around line 91-111: The replaceTextSoftBreaks implementation creates empty
Text("") nodes when splitting on '\n'; update
InlineContent.replaceTextSoftBreaks so you skip empty split parts and only emit
Text(part) for non-empty parts while still emitting a SoftBreak between parts
(but not after the final part). Concretely, in
InlineContent.replaceTextSoftBreaks (the function handling Text and SoftBreak
nodes), replace the current split(...).flatMap { ... }.dropLast(1) logic with a
loop or flatMapIndexed approach that: splits the text on '\n', iterates parts
with their index, emits Text(part) only if part.isNotEmpty(), and after each
part except the last emits a SoftBreak — this preserves SoftBreak counts for
consecutive/newline-only inputs while avoiding empty Text nodes.
---
Nitpick comments:
In `@quarkdown-core/src/main/kotlin/com/quarkdown/core/localization/Locale.kt`:
- Line 61: Simplify the nullable extension Locale?.isCJK(): replace the
redundant dual checks with a single check against code (i.e. implement as this
!= null && code.lowercase() in CJK_LANGUAGE_CODES) and remove shortTag from the
condition; if you prefer to preserve future-proofing, instead add a brief
comment on Locale/ JVMLocale noting that code and shortTag are identical in the
current JVMLocale implementation so only code is checked.
In
`@quarkdown-html/src/main/kotlin/com/quarkdown/rendering/html/node/BaseHtmlNodeRenderer.kt`:
- Line 277: Extract the duplicated SoftBreak whitespace logic into a shared
utility in the core module (e.g., add function softBreakFor(locale:
DocumentLocale): String in a new util object SoftBreakUtils under package
com.quarkdown.util that returns "" when locale.isCJK() else " "), then call that
utility from both renderers: in
quarkdown-html/src/main/kotlin/com/quarkdown/rendering/html/node/BaseHtmlNodeRenderer.kt
(277-277) replace the inline if with
SoftBreakUtils.softBreakFor(context.documentInfo.locale), and in
quarkdown-plaintext/src/main/kotlin/com/quarkdown/rendering/plaintext/node/PlainTextNodeRenderer.kt
(177-177) do the same replacement so both visit(node: SoftBreak) implementations
delegate to the shared function.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 19923fd7-0acd-4389-885d-d876bfc71595
📒 Files selected for processing (17)
quarkdown-core/src/main/kotlin/com/quarkdown/core/ast/base/inline/SoftBreak.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/ast/dsl/InlineAstBuilder.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/lexer/tokens/InlineTokens.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/localization/Locale.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/parser/BlockTokenParser.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/util/node/NodeUtils.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/visitor/node/NodeVisitor.ktquarkdown-core/src/test/kotlin/com/quarkdown/core/BlockParserTest.ktquarkdown-html/src/main/kotlin/com/quarkdown/rendering/html/node/BaseHtmlNodeRenderer.ktquarkdown-plaintext/src/main/kotlin/com/quarkdown/rendering/plaintext/node/PlainTextNodeRenderer.ktquarkdown-test/src/test/kotlin/com/quarkdown/test/FunctionDefinitionTest.ktquarkdown-test/src/test/kotlin/com/quarkdown/test/LayoutTest.ktquarkdown-test/src/test/kotlin/com/quarkdown/test/MathFunctionsTest.ktquarkdown-test/src/test/kotlin/com/quarkdown/test/TextTest.ktquarkdown-test/src/test/resources/data/search-index/search-index-no-headings-no-metadata.jsonquarkdown-test/src/test/resources/data/search-index/search-index-no-headings-with-metadata.jsonquarkdown-test/src/test/resources/data/search-index/search-index-with-headings.json
sorry about that! Quarkdown is meant to be globally accurate 🌍 but there are many typesetting rules I'm not familiar with. Feedback is crucial! |
d512714 to
a7f74be
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (3)
quarkdown-core/src/main/kotlin/com/quarkdown/core/parser/InlineTokenParser.kt (1)
161-164: ⚡ Quick winAdd documentation for the hard/soft break distinction logic.
The method now performs a non-trivial semantic distinction between hard and soft line breaks based on the first character. Per the coding guidelines, medium-sized documentation should be added for "non-public [methods] when the logic is not straightforward."
📝 Suggested documentation
+ /** + * Converts a [LineBreakToken] to either a [LineBreak] (hard break) or [SoftBreak] (soft break). + * A line break is hard if preceded by two or more spaces or a backslash; otherwise it is soft. + */ override fun visit(token: LineBreakToken): Node {🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@quarkdown-core/src/main/kotlin/com/quarkdown/core/parser/InlineTokenParser.kt` around lines 161 - 164, Add KDoc documentation to the visit method that accepts a LineBreakToken parameter in the InlineTokenParser class. The documentation should explain the non-trivial semantic distinction being performed: that the method checks if the first character of the token data is a space or backslash to determine whether to return a hard LineBreak or a soft SoftBreak, and should clarify the significance of this distinction for consumers of this method.Source: Coding guidelines
quarkdown-core/src/main/kotlin/com/quarkdown/core/lexer/patterns/BaseMarkdownInlineTokenRegexPatterns.kt (2)
90-90: 💤 Low valueThe capturing group in the regex appears unused.
The regex contains a capturing group
( {2,}|\\\\)inside the optional non-capturing group, but the parser only checkstoken.data.text.first()rather than using captured groups. Consider making it non-capturing for clarity:(?:(?: {2,}|\\\\))?\\R(?!\\s*$).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@quarkdown-core/src/main/kotlin/com/quarkdown/core/lexer/patterns/BaseMarkdownInlineTokenRegexPatterns.kt` at line 90, The regex pattern in BaseMarkdownInlineTokenRegexPatterns.kt contains a capturing group `( {2,}|\\\\)` that is not being used by the parser (which only checks token.data.text.first()). Convert this capturing group to a non-capturing group by changing the opening parenthesis from a standard group `(` to a non-capturing group `(?:` to improve code clarity and match the actual parsing behavior.
85-92: ⚡ Quick winConsider renaming to clarify that this pattern matches all line breaks.
The property name
hardLineBreaksuggests it matches only hard line breaks, but the regex(?:( {2,}|\\\\))?\\R(?!\\s*$)with the optional prefix matches BOTH hard and soft line breaks. The parser downstream (InlineTokenParser.kt lines 161-164) distinguishes between them by inspecting the first character.Consider renaming to
lineBreakor adding an explicit comment explaining that this pattern matches all line breaks and the parser later distinguishes hard vs soft. This would prevent future confusion and align the name with the actual behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@quarkdown-core/src/main/kotlin/com/quarkdown/core/lexer/patterns/BaseMarkdownInlineTokenRegexPatterns.kt` around lines 85 - 92, The property name hardLineBreak is misleading because its regex pattern with an optional prefix group (?:( {2,}|\\\\))? actually matches both hard line breaks (with spacing or backslash prefix) and soft line breaks (without the prefix). Rename the property from hardLineBreak to lineBreak or add an explicit comment explaining that this pattern matches all line breaks and the downstream parser (referenced in InlineTokenParser.kt) later distinguishes between hard and soft types based on the captured prefix. This will align the property name with its actual behavior and prevent future confusion.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In
`@quarkdown-core/src/main/kotlin/com/quarkdown/core/lexer/patterns/BaseMarkdownInlineTokenRegexPatterns.kt`:
- Line 90: The regex pattern in BaseMarkdownInlineTokenRegexPatterns.kt contains
a capturing group `( {2,}|\\\\)` that is not being used by the parser (which
only checks token.data.text.first()). Convert this capturing group to a
non-capturing group by changing the opening parenthesis from a standard group
`(` to a non-capturing group `(?:` to improve code clarity and match the actual
parsing behavior.
- Around line 85-92: The property name hardLineBreak is misleading because its
regex pattern with an optional prefix group (?:( {2,}|\\\\))? actually matches
both hard line breaks (with spacing or backslash prefix) and soft line breaks
(without the prefix). Rename the property from hardLineBreak to lineBreak or add
an explicit comment explaining that this pattern matches all line breaks and the
downstream parser (referenced in InlineTokenParser.kt) later distinguishes
between hard and soft types based on the captured prefix. This will align the
property name with its actual behavior and prevent future confusion.
In
`@quarkdown-core/src/main/kotlin/com/quarkdown/core/parser/InlineTokenParser.kt`:
- Around line 161-164: Add KDoc documentation to the visit method that accepts a
LineBreakToken parameter in the InlineTokenParser class. The documentation
should explain the non-trivial semantic distinction being performed: that the
method checks if the first character of the token data is a space or backslash
to determine whether to return a hard LineBreak or a soft SoftBreak, and should
clarify the significance of this distinction for consumers of this method.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: d35ea9b6-6876-46ec-9e5a-b11e37d6646f
📒 Files selected for processing (19)
quarkdown-core/src/main/kotlin/com/quarkdown/core/ast/base/inline/SoftBreak.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/ast/dsl/InlineAstBuilder.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/flavor/base/BaseMarkdownLexerFactory.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/lexer/patterns/BaseMarkdownInlineTokenRegexPatterns.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/lexer/tokens/InlineTokens.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/localization/Locale.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/parser/InlineTokenParser.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/util/node/NodeUtils.ktquarkdown-core/src/main/kotlin/com/quarkdown/core/visitor/node/NodeVisitor.ktquarkdown-core/src/test/kotlin/com/quarkdown/core/BlockParserTest.ktquarkdown-html/src/main/kotlin/com/quarkdown/rendering/html/node/BaseHtmlNodeRenderer.ktquarkdown-plaintext/src/main/kotlin/com/quarkdown/rendering/plaintext/node/PlainTextNodeRenderer.ktquarkdown-test/src/test/kotlin/com/quarkdown/test/FunctionDefinitionTest.ktquarkdown-test/src/test/kotlin/com/quarkdown/test/LayoutTest.ktquarkdown-test/src/test/kotlin/com/quarkdown/test/MathFunctionsTest.ktquarkdown-test/src/test/kotlin/com/quarkdown/test/TextTest.ktquarkdown-test/src/test/resources/data/search-index/search-index-no-headings-no-metadata.jsonquarkdown-test/src/test/resources/data/search-index/search-index-no-headings-with-metadata.jsonquarkdown-test/src/test/resources/data/search-index/search-index-with-headings.json
✅ Files skipped from review due to trivial changes (6)
- quarkdown-test/src/test/resources/data/search-index/search-index-no-headings-no-metadata.json
- quarkdown-core/src/main/kotlin/com/quarkdown/core/ast/base/inline/SoftBreak.kt
- quarkdown-test/src/test/resources/data/search-index/search-index-with-headings.json
- quarkdown-core/src/main/kotlin/com/quarkdown/core/lexer/tokens/InlineTokens.kt
- quarkdown-core/src/main/kotlin/com/quarkdown/core/util/node/NodeUtils.kt
- quarkdown-test/src/test/kotlin/com/quarkdown/test/TextTest.kt
🚧 Files skipped from review as they are similar to previous changes (11)
- quarkdown-core/src/main/kotlin/com/quarkdown/core/visitor/node/NodeVisitor.kt
- quarkdown-test/src/test/resources/data/search-index/search-index-no-headings-with-metadata.json
- quarkdown-core/src/main/kotlin/com/quarkdown/core/flavor/base/BaseMarkdownLexerFactory.kt
- quarkdown-test/src/test/kotlin/com/quarkdown/test/LayoutTest.kt
- quarkdown-core/src/main/kotlin/com/quarkdown/core/localization/Locale.kt
- quarkdown-core/src/main/kotlin/com/quarkdown/core/ast/dsl/InlineAstBuilder.kt
- quarkdown-test/src/test/kotlin/com/quarkdown/test/FunctionDefinitionTest.kt
- quarkdown-core/src/test/kotlin/com/quarkdown/core/BlockParserTest.kt
- quarkdown-plaintext/src/main/kotlin/com/quarkdown/rendering/plaintext/node/PlainTextNodeRenderer.kt
- quarkdown-test/src/test/kotlin/com/quarkdown/test/MathFunctionsTest.kt
- quarkdown-html/src/main/kotlin/com/quarkdown/rendering/html/node/BaseHtmlNodeRenderer.kt
c3b6d01 to
b85a37f
Compare
iamgio
left a comment
There was a problem hiding this comment.
lgtm, just a nitpick about style and one comment to unblock the failing lint
b85a37f to
503f846
Compare
…eaks with "SoftBreak"
503f846 to
e22fcb5
Compare
4d87fbc to
0a9892e
Compare
|
Looks great, ty. I'm going to make some minor changes |
Closes #ISSUE_NUMBERat the end of this PR description.docsandCHANGELOG.mdSummary
When users split long paragraphs across multiple lines in
.qdfiles (a common practice for readability), every soft line break is unconditionally rendered as a space. This produces visually incorrect output for CJK text, where adjacent characters should be joined without spaces.Before (broken):
Renders as:
第一行内容,第二句话。 第二行内容, 第三行内容!(extra spaces inserted)After (fixed):
Same source renders as:
第一行内容,第二句话。第二行内容,第三行内容!(no unwanted spaces)Rationale
This issue affects all Quarkdown users writing CJK content. The current behavior is a side effect of HTML whitespace collapsing: each
\nremains as aText("\n")node in the AST, and the browser renders it as a space. The fix applies typographically correct rules at the parsing stage, where the character context is available, rather than at render time where the distinction between intentional and structural spaces is lost.Hard line breaks (Markdown's
two-space + newlineorbackslash + newlinesyntax) are unaffected, as the regex explicitly excludes them via negative lookbehind.Summary by CodeRabbit
New Features
Bug Fixes