Skip to content

THRIFT-5802: Validate set uniqueness in Node.js library#3475

Open
KantConnect wants to merge 1 commit into
apache:masterfrom
KantConnect:master
Open

THRIFT-5802: Validate set uniqueness in Node.js library#3475
KantConnect wants to merge 1 commit into
apache:masterfrom
KantConnect:master

Conversation

@KantConnect
Copy link
Copy Markdown

A Thrift set is defined to contain unique elements, but currently different language implementations have handled duplicate elements inconsistently: Go throws, Python silently deduplicates, while Node.js passed duplicates through unchanged, breaking cross-language interop.

Add duplicate-element validation in the JavaScript code generator and runtime libraries:

  • lib/nodejs/lib/thrift/thrift.js, lib/js/src/thrift.js: new Thrift.checkSetUniqueness(arr) helper. Duplicate detection via a native JS Set; throws TProtocolException(INVALID_DATA) on the first duplicate. Both the Node.js and the browser/ES6/TS runtime libraries are updated, since the JS code generator emits the same Thrift.checkSetUniqueness call regardless of target.

  • compiler/cpp/src/thrift/generate/t_js_generator.cc: in generate_serialize_container() emit a Thrift.checkSetUniqueness call immediately before output.writeSetBegin(), so the validation runs before any bytes hit the transport. In generate_deserialize_container() for sets, allocate one parallel new Set() before the per-element loop; in generate_deserialize_set_element() guard the push with seen.has(elem) / seen.add(elem).

  • lib/nodejs/test/check_set_uniqueness.test.js, lib/nodejs/test/testAll.sh: new tape unit tests covering unique sets, duplicate sets across primitive types, empty/single-element edge cases, and the thrown exception type.

@mergeable mergeable Bot added javascript Pull requests that update Javascript code nodejs compiler labels May 14, 2026
Copy link
Copy Markdown
Member

@Jens-G Jens-G left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM otherwise

void t_js_generator::generate_deserialize_container(ostream& out, t_type* ttype, string prefix) {
string size = tmp("_size");
string rtmp3 = tmp("_rtmp3");
string seen; // populated only for sets, used for O(n) duplicate detection on read
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment seems off: O(n) or O(1) ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, fixed the comment.

@KantConnect KantConnect force-pushed the master branch 2 times, most recently from a50f949 to 075c3e3 Compare May 15, 2026 16:16
A Thrift set is defined to contain unique elements, but currently
different language implementations have handled duplicate elements
inconsistently: Go throws, Python silently deduplicates, while Node.js
passed duplicates through unchanged, breaking cross-language interop.

Add duplicate-element validation in the JavaScript code generator and
runtime libraries:

  * lib/nodejs/lib/thrift/thrift.js, lib/js/src/thrift.js: new
    Thrift.checkSetUniqueness(arr) helper. Duplicate detection via a
    native JS Set; throws TProtocolException(INVALID_DATA) on the first
    duplicate. Both the Node.js and the browser/ES6/TS runtime libraries
    are updated, since the JS code generator emits the same
    Thrift.checkSetUniqueness call regardless of target.

  * compiler/cpp/src/thrift/generate/t_js_generator.cc: in
    generate_serialize_container() emit a Thrift.checkSetUniqueness call
    immediately before output.writeSetBegin(), so the validation runs
    before any bytes hit the transport. In
    generate_deserialize_container() for sets, allocate one parallel
    `new Set()` before the per-element loop; in
    generate_deserialize_set_element() guard the push with
    seen.has(elem) / seen.add(elem).

  * lib/nodejs/test/check_set_uniqueness.test.js,
    lib/nodejs/test/testAll.sh: new tape unit tests covering unique
    sets, duplicate sets across primitive types, empty/single-element
    edge cases, and the thrown exception type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

compiler javascript Pull requests that update Javascript code nodejs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants