
Commit 5de4cc5

khasinski authored and nobu committed
Fix regexp performance regression for patterns starting with s/k
Commit 981ee02 ("Fix performance problem with /k/i and /s/i") was merged for Ruby 4.0 to enable partial Boyer-Moore optimization for patterns containing 's' or 'k' by using the prefix before those characters.

However, when 's' or 'k' appears at the start of a pattern (no usable prefix), set_bm_skip() returns 0 and the code returned early without setting any optimization mode, leaving reg->optimize at ONIG_OPTIMIZE_NONE. This caused up to 30x slowdown for patterns like /slackware/i when matched against strings with non-ASCII characters.

This patch keeps the improvement from 981ee02 for patterns with 3+ char prefix, while fixing the regression by falling back to ONIG_OPTIMIZE_EXACT_IC with the full pattern when the usable prefix is less than 3 characters.

Before: /\bslackware\b/i with non-ASCII string: 2.24 us/op
After: /\bslackware\b/i with non-ASCII string: 0.70 us/op (3.2x faster)

[Bug #21824]
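The mode selection described above can be sketched as a small standalone model. This is not the Onigmo code: `opt_mode`, `usable_prefix_len`, `choose_mode_before` and `choose_mode_after` are hypothetical names, the prefix rule (stop at the first 's' or 'k', in either case) is a simplifying assumption about what set_bm_skip() does, and the pattern is assumed to be at least 3 bytes long so that the outer length check in the real function has already passed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ONIG_OPTIMIZE_* modes named in the commit. */
enum opt_mode { OPT_NONE, OPT_EXACT_IC, OPT_EXACT_BM_IC };

/* Toy model of the prefix a case-insensitive skip table can cover:
   stop at the first 's' or 'k' in either case. Assumption: the real
   set_bm_skip() is more general than this. */
static int usable_prefix_len(const char *pat, int len)
{
  int i;
  assert(pat != NULL && len >= 0);
  for (i = 0; i < len; i++) {
    char c = pat[i];
    if (c == 's' || c == 'S' || c == 'k' || c == 'K') break;
  }
  return i;
}

/* Behavior the commit message attributes to 981ee02: a prefix of
   length 0 returns early and leaves no optimization mode set. */
static enum opt_mode choose_mode_before(const char *pat, int len, int *used_len)
{
  int n = usable_prefix_len(pat, len);
  *used_len = n;                     /* search string truncated to the prefix */
  if (n >= 3) return OPT_EXACT_BM_IC;
  if (n > 0)  return OPT_EXACT_IC;
  return OPT_NONE;
}

/* Behavior after this patch: any prefix shorter than 3 falls back to
   EXACT_IC using the full original pattern. */
static enum opt_mode choose_mode_after(const char *pat, int len, int *used_len)
{
  int n = usable_prefix_len(pat, len);
  if (n >= 3) {
    *used_len = n;                   /* Boyer-Moore on the usable prefix */
    return OPT_EXACT_BM_IC;
  }
  *used_len = len;                   /* restore the original length */
  return OPT_EXACT_IC;
}
```

In this model, "slackware" yields OPT_NONE before the patch and OPT_EXACT_IC over all 9 bytes after it, while a pattern such as "ruby on rails" keeps the Boyer-Moore mode on its 12-byte prefix in both versions.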
1 parent 09cd131 commit 5de4cc5

1 file changed

Lines changed: 10 additions & 4 deletions


regcomp.c

@@ -5264,18 +5264,24 @@ set_optimize_exact_info(regex_t* reg, OptExactInfo* e)
 
   if (e->ignore_case > 0) {
     if (e->len >= 3 || (e->len >= 2 && allow_reverse)) {
+      int orig_len = e->len;
       e->len = set_bm_skip(reg->exact, reg->exact_end, reg,
                            reg->map, 1);
-      reg->exact_end = reg->exact + e->len;
       if (e->len >= 3) {
+        reg->exact_end = reg->exact + e->len;
         reg->optimize = (allow_reverse != 0
                          ? ONIG_OPTIMIZE_EXACT_BM_IC : ONIG_OPTIMIZE_EXACT_BM_NOT_REV_IC);
       }
-      else if (e->len > 0) {
+      else {
+        /* Even if BM skip table can't be built (e.g., pattern starts with
+           's' or 'k' which have multi-byte case fold variants), we should
+           still use EXACT_IC optimization with the original pattern.
+           Without this fallback, patterns like /slackware/i have no
+           optimization at all, causing severe performance regression
+           especially with non-ASCII strings. See [Bug #21824] */
+        e->len = orig_len; /* Restore original length for EXACT_IC */
         reg->optimize = ONIG_OPTIMIZE_EXACT_IC;
       }
-      else
-        return 0;
     }
     else {
       reg->optimize = ONIG_OPTIMIZE_EXACT_IC;
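The "multi-byte case fold variants" mentioned in the new comment can be made concrete. Under Unicode case folding, 's' also matches U+017F (LATIN SMALL LETTER LONG S) and 'k' also matches U+212A (KELVIN SIGN), and neither fits in one byte of UTF-8. The sketch below only shows the encoded lengths; `utf8_encode` is a hypothetical helper written for this illustration, not part of regcomp.c, and how set_bm_skip() actually handles these positions is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a code point in the range U+0000..U+FFFF as UTF-8 and return
   the number of bytes written (1 to 3). Surrogates are not rejected;
   this helper exists only for the illustration. */
static size_t utf8_encode(uint32_t cp, unsigned char out[3])
{
  assert(cp <= 0xFFFF);
  if (cp < 0x80) {
    out[0] = (unsigned char)cp;
    return 1;
  }
  if (cp < 0x800) {
    out[0] = (unsigned char)(0xC0 | (cp >> 6));
    out[1] = (unsigned char)(0x80 | (cp & 0x3F));
    return 2;
  }
  out[0] = (unsigned char)(0xE0 | (cp >> 12));
  out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
  out[2] = (unsigned char)(0x80 | (cp & 0x3F));
  return 3;
}
```

With this helper, 's' and 'k' each encode to one byte, U+017F to the two bytes C5 BF, and U+212A to the three bytes E2 84 AA, so a single pattern position can correspond to one, two, or three bytes of input text.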
