perf: add literal-pattern fast path to split() #19708

Open
mattn wants to merge 4 commits into vim:master from mattn:perf/split-literal-fastpath

mattn (Member) commented Mar 16, 2026

split() with a literal separator (e.g. ",", ":", "abc") is an extremely common pattern in Vim script, yet it currently goes through the full regexp compile-and-match path every time. This patch adds a fast path that detects patterns containing no regexp metacharacters and uses strstr() to scan instead, skipping vim_regcomp() / vim_regexec() entirely. Multi-byte characters are handled safely via mb_ptr2len().

Regexp patterns and the default whitespace pattern are unaffected and still take the existing code path.
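As a rough illustration of the idea described above, here is a minimal standalone sketch of a literal-separator split built on strstr(). It is not the patch itself: `split_literal` and the `piece_fn` callback are hypothetical names, and the empty-item handling follows my reading of the documented split() rule (an empty first or last item is dropped unless keepempty is set, interior empty items are kept).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives each piece as (start, length). */
typedef void (*piece_fn)(const char *start, size_t len, void *ctx);

/*
 * Split `str` on the literal separator `sep` using strstr().
 * An empty first or last item is dropped unless `keepempty` is non-zero;
 * empty items in the middle are always reported.
 * Returns the number of pieces reported.  `emit` may be NULL to only count.
 */
static size_t
split_literal(const char *str, const char *sep, int keepempty,
              piece_fn emit, void *ctx)
{
    size_t seplen;
    size_t count = 0;
    const char *start = str;

    assert(str != NULL && sep != NULL && *sep != '\0');
    seplen = strlen(sep);

    for (;;)
    {
        const char *hit = strstr(start, sep);
        const char *end = hit != NULL ? hit : start + strlen(start);
        int is_first = (start == str);
        int is_last = (hit == NULL);
        int is_empty = (end == start);

        if (keepempty || !is_empty || (!is_first && !is_last))
        {
            if (emit != NULL)
                emit(start, (size_t)(end - start), ctx);
            ++count;
        }
        if (hit == NULL)
            break;
        start = hit + seplen;   /* continue after the separator */
    }
    return count;
}
```

Under these assumptions, splitting ",a,,b," on "," reports three pieces without keepempty and five with it. No regexp program is compiled at any point, which is where the fast path is expected to save time.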

Benchmark: 200,000 iterations per case

| Pattern | Before | After | Speedup |
|---|---|---|---|
| `','` (literal 1-char) | 11.284 s | 2.966 s | 3.8× |
| `'abc'` (literal multi-char) | 5.919 s | 3.350 s | 1.8× |
| default (whitespace) | 12.204 s | 11.920 s | 1.0× |
| `',\+'` (regexp) | 9.675 s | 9.633 s | 1.0× |

mattn added 3 commits March 16, 2026 13:03
When the pattern passed to split() is a single plain byte (not a regexp
metacharacter), bypass vim_regcomp/vim_regexec entirely and scan with
vim_strchr() instead.  This avoids regex compilation and matching
overhead for the very common case of splitting on a literal character
such as "," or ":".

Generalize the fast path from single-byte literals to any pattern that
contains no regexp metacharacters.  Use mb_ptr2len() to safely skip
multi-byte characters when scanning for metacharacters, and strstr()
for the actual splitting.
mattn changed the title from "perf: fast path for split() with a single-byte literal separator" to "perf: add literal-pattern fast path to split()" Mar 16, 2026
mattn changed the title from "perf: add literal-pattern fast path to split()" to "perf/do_string_sub-literal-copy" Mar 16, 2026
mattn changed the title from "perf/do_string_sub-literal-copy" to "perf: add literal-pattern fast path to split()" Mar 16, 2026
char101 (Contributor) commented Mar 16, 2026

How about adding a condition that the previous char is not \ in

for (p = pat; *p != NUL; p += mb_ptr2len(p))
	if (*p < 0x80
		&& vim_strchr((char_u *)".^$~[]\\*?+|{}()", *p) != NULL)
	    return FALSE;

that will make \.\. literal.

EDIT: I guess that will require \.\. to be compiled first by the regex engine to make it literal so this can't work.
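One conceivable way around the concern in the EDIT, shown only as a sketch and not something proposed in this PR, is to unescape the pattern into a separate plain needle before searching, and to fall back to the regexp path whenever that fails. The function name `unescape_literal_pat` is hypothetical. The set of characters accepted after a backslash is my own assumption, based on the default 'magic' setting, and is kept deliberately small. Input is assumed to be UTF-8 or single-byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Try to rewrite `pat` as a plain needle in `out` (capacity `outsize`).
 * Returns 1 on success, 0 if the pattern should go to the regexp engine.
 * Assumptions: default 'magic' semantics, UTF-8 or single-byte input, and
 * a small, conservative set of escapes that are treated as literals.
 */
static int
unescape_literal_pat(const char *pat, char *out, size_t outsize)
{
    static const char meta[] = ".^$~[]\\*?+|{}()";  /* list quoted in the PR */
    static const char escapable[] = ".*[~$^\\";     /* assumed literal after a backslash */
    size_t n = 0;

    assert(pat != NULL && out != NULL);
    if (*pat == '\0')
        return 0;

    for (; *pat != '\0'; ++pat)
    {
        char c = *pat;

        if (c == '\\')
        {
            c = *++pat;
            /* Trailing backslash or an escape with regexp meaning: give up. */
            if (c == '\0' || strchr(escapable, c) == NULL)
                return 0;
        }
        else if (strchr(meta, c) != NULL)
            return 0;

        if (n + 1 >= outsize)   /* no room for this byte plus the NUL */
            return 0;
        out[n++] = c;
    }
    out[n] = '\0';
    return 1;
}
```

With this sketch `\.\.` becomes the needle `..`, while `,\+` is rejected because `\+` is a multi in the default pattern syntax. Whether the extra copy is worth it compared to simply using the regexp path is a separate question.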

while (*str != NUL || keepempty)
{
    p = (char_u *)strstr((char *)str, (char *)pat);
    end = p == NULL ? str + STRLEN(str) : p;
Member

can we avoid the strlen() inside the loop?
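For illustration, one way to keep the length computation out of the loop is to record the end of the string once and reuse it. In the quoted line, STRLEN() only runs when strstr() returns NULL, so the saving may be small. The sketch below is standalone: `lit_iter` and its functions are hypothetical names, and it reports every piece, including empty ones.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical iterator over the pieces of a literal-separator split. */
typedef struct {
    const char *cur;      /* start of the next piece, NULL when exhausted */
    const char *str_end;  /* terminating NUL, computed once up front */
    const char *sep;
    size_t      seplen;
} lit_iter;

static void
lit_iter_init(lit_iter *it, const char *str, const char *sep)
{
    assert(it != NULL && str != NULL && sep != NULL && *sep != '\0');
    it->cur = str;
    it->str_end = str + strlen(str);   /* the only strlen() on the input */
    it->sep = sep;
    it->seplen = strlen(sep);
}

/* Returns 1 and sets *piece / *len, or 0 when there are no pieces left. */
static int
lit_iter_next(lit_iter *it, const char **piece, size_t *len)
{
    const char *hit;

    if (it->cur == NULL)
        return 0;

    hit = strstr(it->cur, it->sep);
    *piece = it->cur;
    if (hit != NULL)
    {
        *len = (size_t)(hit - it->cur);
        it->cur = hit + it->seplen;
    }
    else
    {
        /* Last piece: use the precomputed end instead of calling strlen(). */
        *len = (size_t)(it->str_end - it->cur);
        it->cur = NULL;
    }
    return 1;
}
```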

patlen = (int)STRLEN(pat);
while (*str != NUL || keepempty)
{
    p = (char_u *)strstr((char *)str, (char *)pat);
Member

Hm, does strstr() handle non utf-8 multibyte chars correctly?
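To make the concern concrete: in double-byte encodings such as Shift_JIS, the trail byte of a two-byte character can fall in the ASCII range, so a byte-wise strstr() may report a match in the middle of a character. The sketch below searches only at character boundaries. `dbcs_char_len` is a hypothetical, simplified stand-in for an encoding-aware length function such as mb_ptr2len(), and the lead-byte ranges are the Shift_JIS/cp932 ones. It is not what Vim does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for an encoding-aware length function.
 * Treats 0x81-0x9F and 0xE0-0xFC as lead bytes of a two-byte character,
 * as in Shift_JIS/cp932 (simplified, no validation of the trail byte).
 */
static size_t
dbcs_char_len(const unsigned char *p)
{
    if (((p[0] >= 0x81 && p[0] <= 0x9F) || (p[0] >= 0xE0 && p[0] <= 0xFC))
            && p[1] != '\0')
        return 2;
    return 1;
}

/* Like strstr(), but a match may only start at a character boundary. */
static const char *
dbcs_strstr(const char *hay, const char *needle)
{
    const unsigned char *p = (const unsigned char *)hay;
    size_t nlen;

    assert(hay != NULL && needle != NULL);
    nlen = strlen(needle);
    if (nlen == 0)
        return hay;

    for (; *p != '\0'; p += dbcs_char_len(p))
        if (strncmp((const char *)p, needle, nlen) == 0)
            return (const char *)p;
    return NULL;
}
```

For the byte sequence 0x83 0x41 ',' 'A', plain strstr() finds "A" at offset 1, inside the two-byte character, while the boundary-aware search finds it at offset 3. A separator like "," (0x2C) lies below the trail-byte range, so the problem only shows up for separators containing bytes from 0x40 upward.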

Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes the Vimscript split() builtin by adding a fast path for purely-literal separator patterns, avoiding regex compilation/execution for common cases while leaving regexp and default-whitespace behavior on the existing code path.

Changes:

  • Add is_literal_pat() helper to detect patterns with no regexp metacharacters (with multibyte-safe scanning).
  • Implement a literal-separator split loop using strstr() and byte-length advancement instead of vim_regcomp()/vim_regexec().

Comment on lines +12166 to +12167
&& *str != NUL && p != NULL
&& end < p + patlen))
Comment on lines +182 to +200
    static int
is_literal_pat(char_u *pat)
{
    char_u *p;

    if (pat == NULL || *pat == NUL)
        return FALSE;

    // Check that no character in the pattern has regexp meaning.
    // Use mb_ptr2len() to skip over multi-byte characters safely so that
    // trail bytes are never mistaken for ASCII metacharacters.
    for (p = pat; *p != NUL; p += mb_ptr2len(p))
        if (*p < 0x80
                && vim_strchr((char_u *)".^$~[]\\*?+|{}()", *p) != NULL)
            return FALSE;

    return TRUE;
}
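
The helper above depends on Vim internals (char_u, NUL, mb_ptr2len(), vim_strchr()). For experimenting outside the Vim tree, a standalone approximation could look like the following. `utf8_len` is a hypothetical stand-in for mb_ptr2len() that only understands UTF-8, and `is_literal_pat_sketch` is named differently on purpose to avoid suggesting it is the PR's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for mb_ptr2len(): byte length of a UTF-8 sequence
 * judged from its lead byte, never stepping past the terminating NUL.
 */
static size_t
utf8_len(const unsigned char *p)
{
    size_t want = 1;
    size_t i;

    assert(p != NULL);
    if (p[0] >= 0xF0)
        want = 4;
    else if (p[0] >= 0xE0)
        want = 3;
    else if (p[0] >= 0xC0)
        want = 2;
    for (i = 1; i < want; ++i)
        if (p[i] == '\0')
            return i;       /* truncated sequence: stop at the NUL */
    return want;
}

/* Returns 1 when `pat` contains none of the metacharacters listed in the PR. */
static int
is_literal_pat_sketch(const char *pat)
{
    const unsigned char *p;

    if (pat == NULL || *pat == '\0')
        return 0;

    for (p = (const unsigned char *)pat; *p != '\0'; p += utf8_len(p))
        if (*p < 0x80 && strchr(".^$~[]\\*?+|{}()", *p) != NULL)
            return 0;

    return 1;
}
```

With this approximation "," and "abc" qualify for the fast path, while ",\+" and the empty pattern do not.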

4 participants