Regular expression functions#
All of the regular expression functions use the Java pattern syntax, with a few notable exceptions:
When using multi-line mode (enabled via the
(?m)flag), only\nis recognized as a line terminator. Additionally, the(?d)flag is not supported and must not be used.Case-insensitive matching (enabled via the
(?i)flag) is always performed in a Unicode-aware manner. However, context-sensitive and local-sensitive matching is not supported. Additionally, the(?u)flag is not supported and must not be used.Surrogate pairs are not supported. For example,
\uD800\uDC00is not treated asU+10000and must be specified as\x{10000}.Boundaries (
\b) are incorrectly handled for a non-spacing mark without a base character.\Qand\Eare not supported in character classes (such as[A-Z123]) and are instead treated as literals.Unicode character classes (
\p{prop}) are supported with the following differences:All underscores in names must be removed. For example, use
OldItalicinstead ofOld_Italic.Scripts must be specified directly, without the
Is,script=orsc=prefixes. Example:\p{Hiragana}Blocks must be specified with the
Inprefix. Theblock=andblk=prefixes are not supported. Example:\p{Mongolian}Categories must be specified directly, without the
Is,general_category=orgc=prefixes. Example:\p{L}Binary properties must be specified directly, without the
Is. Example:\p{NoncharacterCodePoint}
- regexp_count(string, pattern) bigint#
Returns the number of occurrence of
patterninstring:SELECT regexp_count('1a 2b 14m', '\s*[a-z]+\s*'); -- 3
- regexp_extract_all(string, pattern)#
Returns the substring(s) matched by the regular expression
patterninstring:SELECT regexp_extract_all('1a 2b 14m', '\d+'); -- [1, 2, 14]
- regexp_extract_all(string, pattern, group)
Finds all occurrences of the regular expression
patterninstringand returns the capturing group numbergroup:SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2); -- ['a', 'b', 'm']
- regexp_extract(string, pattern) varchar#
Returns the first substring matched by the regular expression
patterninstring:SELECT regexp_extract('1a 2b 14m', '\d+'); -- 1
- regexp_extract(string, pattern, group) varchar
Finds the first occurrence of the regular expression
patterninstringand returns the capturing group numbergroup:SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2); -- 'a'
- regexp_like(string, pattern) boolean#
Evaluates the regular expression
patternand determines if it is contained withinstring.The
patternonly needs to be contained withinstring, rather than needing to match all ofstring. In other words, this performs a contains operation rather than a match operation. You can match the entire string by anchoring the pattern using^and$:SELECT regexp_like('1a 2b 14m', '\d+b'); -- true
- regexp_position(string, pattern) integer#
Returns the index of the first occurrence (counting from 1) of
patterninstring. Returns -1 if not found:SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b'); -- 8
- regexp_position(string, pattern, start) integer
Returns the index of the first occurrence of
patterninstring, starting fromstart(includestart). Returns -1 if not found:SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 5); -- 8 SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12); -- 19
- regexp_position(string, pattern, start, occurrence) integer
Returns the index of the nth
occurrenceofpatterninstring, starting fromstart(includestart). Returns -1 if not found:SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 1); -- 19 SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 2); -- 31 SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 3); -- -1
- regexp_replace(string, pattern) varchar#
Removes every instance of the substring matched by the regular expression
patternfromstring:SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m'
- regexp_replace(string, pattern, replacement) varchar
Replaces every instance of the substring matched by the regular expression
patterninstringwithreplacement. Capturing groups can be referenced inreplacementusing$gfor a numbered group or${name}for a named group. A dollar sign ($) may be included in the replacement by escaping it with a backslash (\$):SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 '); -- '3ca 3cb 14m'
- regexp_replace(string, pattern, function) varchar
Replaces every instance of the substring matched by the regular expression
patterninstringusingfunction. The lambda expressionfunctionis invoked for each match with the capturing groups passed as an array. Capturing group numbers start at one; there is no group for the entire match (if you need this, surround the entire expression with parenthesis).SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2])); --'New York'
- regexp_split(string, pattern)#
Splits
stringusing the regular expressionpatternand returns an array. Trailing empty strings are preserved:SELECT regexp_split('1a 2b 14m', '\s*[a-z]+\s*'); -- [1, 2, 14, ]