feat(csv): make parseLine the synchronous primitive (refs #3765) by MukundaKatta · Pull Request #7118 · denoland/std

MukundaKatta · 2026-04-28T04:19:54Z

Summary

Make parseLine the actual internal CSV primitive that both parse() and CsvParseStream build on, addressing the design feedback on #7114 (feat(csv): add parseLine() convenience for single-line CSV records, now closed) and aligning with the intent of #3765 (suggestion: investigate simpler CSV-parsing APIs).
Drop the duplicate field-parsing loop that previously lived inside Parser.#parseRecord in parse.ts — both parse() (sync) and the streaming path now share one set of field/quote rules.
Public parseLine(line, options) -> string[] is the simple shape that #3765 asked for, with BOM stripping and trailing CR/LF/CRLF normalization.

What changed

csv/_io.ts: new sync parseLine carries the whole field-parsing state machine (separator, quotes, escapes, lazyQuotes, comment, trim). The existing async parseRecord becomes a small wrapper that pulls more lines from the LineReader and re-calls parseLine until a record completes. Error column tracking maps absolute positions in the joined input back to (line, column) so multi-line quoted records still report the right line.
csv/parse.ts: drop Parser.#parseRecord's duplicate field loop; Parser now defers to parseLine from _io.ts. Add the public parseLine export with a clean (line, options) signature.
csv/parse_test.ts: 12 new tests pin parseLine behavior (happy path, custom separator, escaped quotes, BOM, trailing newline, multi-line quoted body, lazyQuotes, comment, unclosed-field error).

Test plan

All 133 existing parse + parse_stream steps still pass (145 total with the new parseLine tests).
Existing error-message assertions (StartLine1, StartLine2, ParseErrorLine, OddQuotes, etc.) preserved with no changes to test expectations.
Reviewer to confirm that parseLine's public surface matches the spirit of #3765 and that the (line, options) shape is what was wanted.

cc @bartlomieju — this replaces #7114 with the design you sketched in the review there.

Refactor the CSV parser so a single synchronous parseLine handles all field-level rules, with parse() (sync) and CsvParseStream (async) becoming thin line-iteration shells on top of it.

- _io.ts: introduce sync parseLine; rewrite the existing async parseRecord as a thin reader.readLine accumulator that delegates to parseLine. Error column tracking now resolves through embedded newlines so error messages stay correct for multi-line quoted records.
- parse.ts: drop the duplicate field-parsing loop that lived inside Parser.#parseRecord; both Parser and the new public parseLine share the same primitive. Public parseLine has the simple (line, options) -> string[] signature requested in denoland#3765, including BOM strip and trailing CR/LF/CRLF normalization.
- parse_test.ts: add 12 parseLine-specific tests covering happy path, custom separator, escapes, BOM, trailing newlines, multi-line quoted body, lazyQuotes, comment lines, and unclosed-field error.

All 133 existing parse + parse_stream tests still pass; new tests bring the total to 145.

codecov · 2026-04-28T04:25:37Z

Codecov Report

❌ Patch coverage is 91.96429% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.61%. Comparing base (cd03740) to head (ccb627d).

Files with missing lines	Patch %	Lines
csv/parse.ts	82.60%	6 Missing and 2 partials ⚠️
csv/_io.ts	98.48%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #7118   +/-   ##
=======================================
  Coverage   94.61%   94.61%           
=======================================
  Files         634      634           
  Lines       51799    51769   -30     
  Branches     9329     9327    -2     
=======================================
- Hits        49009    48982   -27     
+ Misses       2216     2211    -5     
- Partials      574      576    +2


The deno_lint no-unused-vars check flagged the parameter on parseLine (and the matching one on parseRecord and Parser.#parseRecord) — it was threaded through but never read inside the function bodies because the locate() helper computes line offsets from embedded newlines in the joined fullLine instead. Removing the param simplifies the call sites without changing behavior: all 145 parse + parse_stream + parseLine tests still pass.

fibibot

@std/csv is v1.0.6 and csv/mod.ts:184 re-exports everything from ./parse.ts, so the new parseLine lands on the stable surface. Per .github/CONTRIBUTING.md, new public APIs in v1.0.0+ packages need to live in csv/unstable_parse_line.ts, must not be re-exported from mod.ts, and must carry the JSDoc tag "@experimental **UNSTABLE**: New API, yet to be vetted." The title would shift to feat(csv/unstable): add parseLine.

The underlying refactor — unifying Parser.#parseRecord and the streaming parseRecord onto one field-state machine in _io.ts — is fine; the locate() helper correctly recomputes (line, col) for embedded \n and the EOF branches match the old line.length === 0 / line.length > 0 split.

nit: parse.ts:14 was previously a \uXXXX-escaped BOM constant; this PR replaces it with the raw U+FEFF character, which renders invisibly in most editors and breaks grep. The new BOM test in parse_test.ts does the same. Please keep the escape form — parse_test.ts already defines a BYTE_ORDER_MARK constant using it at the top of the file.

bartlomieju

Thanks for the rework — collapsing the three copies of the field state machine into one shared sync primitive is the right shape, and the (line, options) public surface matches what #3765 sketched. Error-location preservation across multi-line quoted records via locate() is a nice touch, and the new tests cover the cases that matter (BOM, custom separator, escaped quotes, embedded \n, lazyQuotes, comment, unclosed-field error).

Two things to fix before merge, plus a few smaller ones inline:

Quadratic re-parsing for multi-line quoted records. Both wrappers (parseRecord in _io.ts and Parser.#parseRecord in parse.ts) re-run parseLine over the entire accumulated buffer every time another line is appended. For an N-line quoted field that's O(N²) field-scan work plus O(N²) string-concat allocation, where the old code was linear. Multi-megabyte quoted fields (embedded JSON/HTML in a cell — a real pattern in data exports) would regress noticeably. Worth a benchmark; if confirmed, parseLine needs a resume-position mode or the wrappers need to keep incremental state.
BOM literal replaces "\ufeff". The U+FEFF character is invisible in most editors and diff viewers — a future reader will see the constant as "". Keep the escape.
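To make the quadratic claim concrete, here is a small cost model. It is a sketch, not code from the PR: the function names are hypothetical, and it only counts how many characters get scanned when a record spanning several physical lines is assembled, assuming the buffer is joined with a one-character "\n".

```typescript
// Hypothetical cost model, not code from the PR.

/** Re-scan strategy: after each appended line, scan the whole buffer again. */
function scannedCharsWithRescan(lineLengths: number[]): number {
  let bufferLength = 0;
  let scanned = 0;
  let first = true;
  for (const length of lineLengths) {
    // Lines after the first are joined with a one-character "\n".
    bufferLength += length + (first ? 0 : 1);
    first = false;
    // The whole accumulated buffer is parsed again from offset 0.
    scanned += bufferLength;
  }
  return scanned;
}

/** Incremental strategy: every character (and joiner) is scanned once. */
function scannedCharsIncremental(lineLengths: number[]): number {
  let scanned = 0;
  let first = true;
  for (const length of lineLengths) {
    scanned += length + (first ? 0 : 1);
    first = false;
  }
  return scanned;
}
```

For three lines of 10 characters the re-scan strategy touches 10 + 21 + 32 = 63 characters against 32 for the incremental one, and the gap grows with the square of the line count.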

The public surface looks right; one missing piece is that parse() validates separator against ["\r", "\n", '"'] but the new public parseLine doesn't (inline). Also, issue #3765 had a second bullet about requiring TextLineStream upstream of CsvParseStream — this PR only does the first bullet, which is fine, but worth a follow-up note so it doesn't get lost.

Not blocking: the PR description still has an unchecked reviewer-confirmation item on the public shape — confirming that (line, options) → string[] matches #3765's spirit.

bartlomieju · 2026-05-26T09:41:25Z

 export type { ParseResult, RecordWithColumn };

-const BYTE_ORDER_MARK = "\ufeff";
+const BYTE_ORDER_MARK = "";


Please keep this as "\ufeff". The literal U+FEFF is invisible in most editors and diff viewers — a maintainer scanning this line will read it as an empty string. Same issue in the new test at csv/parse_test.ts:1071.
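For reference, a minimal sketch of the escaped form. The stripByteOrderMark helper is hypothetical and only illustrates that the escape is a one-code-unit string that stays readable in diffs.

```typescript
// Escaped form: visible and greppable in editors and diff viewers.
const BYTE_ORDER_MARK = "\ufeff";

// Hypothetical helper, mirroring the startsWith/slice(1) logic in the PR.
function stripByteOrderMark(text: string): string {
  return text.startsWith(BYTE_ORDER_MARK) ? text.slice(1) : text;
}
```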

bartlomieju · 2026-05-26T09:41:25Z

+export function parseLine(
+  line: string,
+  options: Omit<ParseOptions, "skipFirstRow" | "columns" | "fieldsPerRecord"> =
+    {},
+): string[] {
+  const { separator = ",", trimLeadingSpace = false, comment, lazyQuotes } =
+    options;
+  const stripped = line.startsWith(BYTE_ORDER_MARK) ? line.slice(1) : line;
+  // Treat a single trailing CR/LF/CRLF as a record terminator (callers that
+  // forgot to trim should not see a phantom empty trailing field).
+  const normalized = stripped.endsWith("\r\n")
+    ? stripped.slice(0, -2)
+    : stripped.endsWith("\n") || stripped.endsWith("\r")
+    ? stripped.slice(0, -1)
+    : stripped;
+  const readOptions: ReadOptions = {
+    separator,
+    trimLeadingSpace,
+    ...(comment !== undefined ? { comment } : {}),
+    ...(lazyQuotes !== undefined ? { lazyQuotes } : {}),
+  };
+  const result = parseLineInternal(normalized, readOptions, 0, true);
+  return result ?? [];
+}


Two gaps on the public parseLine:

Separator not validated. parse() rejects separators in INVALID_RUNE = ["\r", "\n", '"'] (see the existing check in Parser.parse). parseLine("a\"b", { separator: '"' }) would behave unpredictably here. Apply the same guard.

comment behavior is undocumented and silently lossy. parseLine("# x", { comment: "#" }) returns [], indistinguishable from an empty line — see the new test at parse_test.ts:1113. Either drop comment from the public single-line surface (it's a record-stream concept, not a per-line one) or document explicitly in the JSDoc that comment lines return [].
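A rough sketch of the separator guard, assuming the invalid set quoted above. The function name, error type, and message are placeholders; the existing check in Parser.parse may word things differently.

```typescript
// Values taken from the review comment; everything else here is hypothetical.
const INVALID_RUNE = ["\r", "\n", '"'];

function assertValidSeparator(separator: string): void {
  if (INVALID_RUNE.includes(separator)) {
    // Placeholder error; the real parse() may use a different type or message.
    throw new TypeError(`Invalid separator: ${JSON.stringify(separator)}`);
  }
}
```

Calling this at the top of the public parseLine, before any parsing, would reject { separator: '"' } up front instead of letting the state machine run with an ambiguous delimiter.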

bartlomieju · 2026-05-26T09:41:25Z

+    let accumulated = first;
+    while (true) {
+      const result = parseLineInternal(
+        accumulated,
+        this.#options,
+        zeroBasedStartLine,
+        this.#isEOF(),
+      );
+      if (result !== null) return result;
+      const next = this.#readLine();
+      if (next === null) {
+        // Force the EOF decision (will throw unless lazyQuotes is set).
+        return parseLineInternal(
+          accumulated,
+          this.#options,
+          zeroBasedStartLine,
+          true,
+        ) ?? [];
      }
+      accumulated += "\n" + next;
    }


Quadratic re-parse. Every additional line, parseLineInternal re-scans accumulated from byte 0. For an N-line quoted field this is O(N²) parse work plus the O(N²) string concatenation on line 161. The old #parseRecord parsed each pulled line incrementally. Real CSV exports do put multi-MB quoted blobs (HTML, JSON, base64) in a single cell — this is a real regression risk. Worth benchmarking against a synthetic input like one record with 10k newlines inside a quoted field.

If the benchmark confirms it, options are: (a) have parseLineInternal accept a resume position so the wrapper only feeds it the new tail, (b) make it return progress state, or (c) keep a separate incremental path for the streaming case and only use the shared primitive for true single-line callers.
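As an illustration of the incremental direction (roughly option (b) or (c)), here is a deliberately simplified sketch. IncrementalRecordParser is a hypothetical name, and the class handles only a separator, double quotes, and "" escapes; lazyQuotes, comment, trimLeadingSpace, and error reporting are all left out. The point is that state persists between lines, so each character is visited once.

```typescript
// Hypothetical sketch, not the PR's implementation.
class IncrementalRecordParser {
  private separator: string;
  private fields: string[] = [];
  private current = "";
  private inQuotes = false;

  constructor(separator = ",") {
    this.separator = separator;
  }

  /**
   * Feed one physical line (without its terminator).
   * Returns the finished record, or null if a quoted field is still open.
   */
  feed(line: string): string[] | null {
    let i = 0;
    while (i < line.length) {
      const ch = line.charAt(i);
      if (this.inQuotes) {
        if (ch === '"' && line.charAt(i + 1) === '"') {
          this.current += '"'; // "" inside quotes is an escaped quote
          i += 2;
          continue;
        }
        if (ch === '"') this.inQuotes = false;
        else this.current += ch;
      } else if (ch === '"') {
        this.inQuotes = true;
      } else if (ch === this.separator) {
        this.fields.push(this.current);
        this.current = "";
      } else {
        this.current += ch;
      }
      i++;
    }
    if (this.inQuotes) {
      this.current += "\n"; // the quoted field continues on the next line
      return null;
    }
    this.fields.push(this.current);
    const record = this.fields;
    this.fields = [];
    this.current = "";
    return record;
  }
}
```

A wrapper would call feed() once per pulled line and stop at the first non-null result, so only the new tail is ever scanned.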

bartlomieju · 2026-05-26T09:41:25Z

+      // narrowing for the type system.
+      return eofResult ?? [];
+    }
+    accumulated += "\n" + next;


Same quadratic-reparse concern as the Parser.#parseRecord wrapper in parse.ts:142-161. Per call to parseLine here, the entire accumulated is re-scanned from start. For streaming, this matters more than for parse() because CsvParseStream is the documented path for memory-bounded ingest of large CSVs — a quoted field spanning many chunks would balloon to O(N²).

bartlomieju · 2026-05-26T09:41:25Z

+      const eofResult = parseLine(
+        accumulated,
+        options,
+        zeroBasedRecordStartLine,
+        true,
+      );
+      // parseLine with atEof=true cannot return null; this is a defensive
+      // narrowing for the type system.
+      return eofResult ?? [];


The comment admits this branch is unreachable. Rather than carry a ?? [] fallback the type system can't prove away, express the invariant: overload parseLine so atEof: true returns string[] and atEof: false returns string[] | null. Then this block can just be return parseLine(accumulated, options, zeroBasedRecordStartLine, true) and the dead-code comment goes away.
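The overload idea could look roughly like this. parseLineSketch is a hypothetical stand-in with a toy body (it splits on "," and ignores real quoting rules, and it always throws on an open quote at EOF, whereas the real code can tolerate that under lazyQuotes); only the signatures matter here. The third overload is an assumption: call sites such as the one passing this.#isEOF() supply a plain boolean and would otherwise fail to match.

```typescript
// Hypothetical stand-in for the internal parseLine in _io.ts.
function parseLineSketch(fullLine: string, atEof: true): string[];
function parseLineSketch(fullLine: string, atEof: false): string[] | null;
function parseLineSketch(fullLine: string, atEof: boolean): string[] | null;
function parseLineSketch(fullLine: string, atEof: boolean): string[] | null {
  // Toy rule: an odd number of quotes means a quoted field is still open.
  const quoteCount = fullLine.split('"').length - 1;
  const insideQuotedField = quoteCount % 2 === 1;
  if (insideQuotedField) {
    if (!atEof) return null; // caller should append another line
    throw new SyntaxError("unterminated quoted field");
  }
  return fullLine.split(",");
}

// With atEof: true the result type is string[], so no `?? []` is needed.
const atEofResult: string[] = parseLineSketch("a,b", true);
const midStreamResult: string[] | null = parseLineSketch('a,"b', false);
```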

bartlomieju · 2026-05-26T09:41:25Z

+ */
+export function parseLine(
  fullLine: string,
-  reader: LineReader,
  options: ReadOptions,
-  zeroBasedRecordStartLine: number,
-  zeroBasedLine: number = zeroBasedRecordStartLine,
-): Promise<Array<string>> {
+  zeroBasedRecordStartLine: number = 0,
+  atEof: boolean = true,
+): string[] | null {
  // line starting with comment character is ignored


Nit on the docstring: it says parse builds on this function, but parse lives in parse.ts and goes through the Parser class — parseRecord (just below) and Parser.#parseRecord are the actual callers. Worth rewording to avoid implying a direct dependency from parse().

bartlomieju · 2026-05-26T09:41:25Z

+  const locate = (absPos: number): { line: number; col: number } => {
+    let line = zeroBasedRecordStartLine;
+    let lastNewline = -1;
+    for (let i = 0; i < absPos; i++) {
+      if (fullLine[i] === "\n") {
+        line++;
+        lastNewline = i;
+      }
+    }
+    const col = codePointLength(fullLine.slice(lastNewline + 1, absPos));
+    return { line, col };
+  };
+


Nit: locate rescans fullLine[0..absPos] on each call. It's only hit on error paths so this isn't a real perf concern, but if you're touching the file anyway, a one-pass precomputed lineStarts: number[] indexed via findLastIndex would be cleaner and let you reuse it across both error sites.
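One possible shape for the precomputed variant, as a sketch. computeLineStarts and the extra parameters on locate are hypothetical names; the lookup below walks the sorted lineStarts array, and Array.prototype.findLastIndex or a binary search could do the same job where available. Array.from(...).length stands in for the codePointLength helper used in the PR.

```typescript
/** One pass over the text: offsets at which each line begins. */
function computeLineStarts(text: string): number[] {
  const starts = [0];
  for (let i = 0; i < text.length; i++) {
    if (text.charAt(i) === "\n") starts.push(i + 1);
  }
  return starts;
}

/** Map an absolute offset to (line, col) without rescanning the text. */
function locate(
  text: string,
  lineStarts: number[],
  absPos: number,
  zeroBasedRecordStartLine = 0,
): { line: number; col: number } {
  let lineIndex = -1;
  let lineStart = 0;
  for (const start of lineStarts) {
    if (start > absPos) break; // lineStarts is sorted ascending
    lineStart = start;
    lineIndex++;
  }
  // Column is counted in code points, as in the original locate().
  const col = Array.from(text.slice(lineStart, absPos)).length;
  return { line: zeroBasedRecordStartLine + lineIndex, col };
}
```

Both error sites could then share one lineStarts array computed lazily on the first error.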

MukundaKatta · 2026-05-27T05:47:58Z

Moved to draft while I rework — thanks for the careful review. Plan:

1. Unstable surface (per @fibibot): move parseLine to csv/unstable_parse_line.ts, drop the mod.ts re-export, add @experimental **UNSTABLE** tag, retitle to feat(csv/unstable): add parseLine.
2. Quadratic re-parse (per @bartlomieju): convert both wrappers to a stateful incremental parser: feed each pulled line into a persistent field-state machine instead of re-scanning accumulated from offset 0. Will measure on a multi-MB single quoted field before re-requesting review.
3. parseLine overloads: split into atEof: true → string[] and atEof: false → string[] | null so the unreachable ?? [] fallback in _io.ts:280 goes away cleanly.
4. Doc + locate cleanups: reword the parseRecord docstring (parse goes through the Parser class, not this fn directly), and precompute lineStarts if I'm touching the error paths anyway.

Will re-request once (2) is benchmarked and clean.

github-actions Bot added the csv label Apr 28, 2026

fibibot requested changes May 13, 2026


bartlomieju requested changes May 26, 2026


MukundaKatta marked this pull request as draft May 27, 2026 05:47

MukundaKatta wants to merge 2 commits into denoland:main from MukundaKatta:feat/csv-parse-line-primitive

MukundaKatta commented Apr 28, 2026

Uh oh!

codecov Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

fibibot left a comment •

edited

Loading

Uh oh!

bartlomieju left a comment

Uh oh!

bartlomieju May 26, 2026

Uh oh!

bartlomieju May 26, 2026

Uh oh!

bartlomieju May 26, 2026

Uh oh!

bartlomieju May 26, 2026

Uh oh!

bartlomieju May 26, 2026

Uh oh!

bartlomieju May 26, 2026

Uh oh!

bartlomieju May 26, 2026

Uh oh!

MukundaKatta commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

MukundaKatta commented Apr 28, 2026

Summary

What changed

Test plan

Uh oh!

codecov Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

fibibot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

bartlomieju left a comment

Choose a reason for hiding this comment

Uh oh!

bartlomieju May 26, 2026

Choose a reason for hiding this comment

Uh oh!

bartlomieju May 26, 2026

Choose a reason for hiding this comment

Uh oh!

bartlomieju May 26, 2026

Choose a reason for hiding this comment

Uh oh!

bartlomieju May 26, 2026

Choose a reason for hiding this comment

Uh oh!

bartlomieju May 26, 2026

Choose a reason for hiding this comment

Uh oh!

bartlomieju May 26, 2026

Choose a reason for hiding this comment

Uh oh!

bartlomieju May 26, 2026

Choose a reason for hiding this comment

Uh oh!

MukundaKatta commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented Apr 28, 2026 •

edited

Loading

fibibot left a comment •

edited

Loading