Presto 329 Documentation

7.10. Regular Expression Functions

All of the regular expression functions use the Java pattern syntax, with a few notable exceptions:

regexp_count(string, pattern) → bigint

Returns the number of occurrences of pattern in string:

SELECT regexp_count('1a 2b 14m', '\s*[a-z]+\s*'); -- 3
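As a rough illustration of what is being counted, the following Python sketch tallies non-overlapping matches while scanning left to right. Python's re module stands in for Presto's regex engine here, which is an assumption that holds for simple patterns like this one, and count_matches is a hypothetical helper, not a Presto function.

```python
import re

def count_matches(string, pattern):
    # Hypothetical helper: count non-overlapping matches, left to right.
    return sum(1 for _ in re.finditer(pattern, string))

# The three matches are 'a ', 'b ' and 'm'.
print(count_matches('1a 2b 14m', r'\s*[a-z]+\s*'))  # 3
```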
regexp_extract_all(string, pattern) → array(varchar)

Returns the substring(s) matched by the regular expression pattern in string:

SELECT regexp_extract_all('1a 2b 14m', '\d+'); -- ['1', '2', '14']
regexp_extract_all(string, pattern, group) → array(varchar)

Finds all occurrences of the regular expression pattern in string and returns the capturing group number group:

SELECT regexp_extract_all('1a 2b 14m', '(\d+)([a-z]+)', 2); -- ['a', 'b', 'm']
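The difference between the two forms of regexp_extract_all can be sketched in Python as follows. This is only an approximation that uses Python's re module in place of Presto's engine; extract_all is a hypothetical helper, and its group=0 default models the two-argument form, which returns the whole match.

```python
import re

def extract_all(string, pattern, group=0):
    # Hypothetical helper: group=0 selects the whole match,
    # group=n selects the n-th capturing group of every match.
    return [m.group(group) for m in re.finditer(pattern, string)]

print(extract_all('1a 2b 14m', r'\d+'))               # ['1', '2', '14']
print(extract_all('1a 2b 14m', r'(\d+)([a-z]+)', 2))  # ['a', 'b', 'm']
```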
regexp_extract(string, pattern) → varchar

Returns the first substring matched by the regular expression pattern in string:

SELECT regexp_extract('1a 2b 14m', '\d+'); -- '1'
regexp_extract(string, pattern, group) → varchar

Finds the first occurrence of the regular expression pattern in string and returns the capturing group number group:

SELECT regexp_extract('1a 2b 14m', '(\d+)([a-z]+)', 2); -- 'a'
regexp_like(string, pattern) → boolean

Evaluates the regular expression pattern and determines if it is contained within string.

This function is similar to the LIKE operator, except that the pattern only needs to be contained within string, rather than needing to match all of string. In other words, this performs a contains operation rather than a match operation. You can match the entire string by anchoring the pattern using ^ and $:

SELECT regexp_like('1a 2b 14m', '\d+b'); -- true
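The contains-versus-match distinction described above can be illustrated with a small Python sketch. It assumes Python's re.search behaves like regexp_like for these simple patterns; the helper name like is hypothetical.

```python
import re

def like(string, pattern):
    # Hypothetical helper: true if the pattern matches anywhere in the string.
    return re.search(pattern, string) is not None

print(like('1a 2b 14m', r'\d+b'))    # True: '2b' occurs inside the string
print(like('1a 2b 14m', r'^\d+b$'))  # False: anchors require the whole string to match
print(like('2b', r'^\d+b$'))         # True
```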
regexp_position(string, pattern) → integer

Returns the index of the first occurrence (counting from 1) of pattern in string. Returns -1 if not found:

SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b'); -- 8
regexp_position(string, pattern, start) → integer

Returns the index of the first occurrence of pattern in string, searching from position start (inclusive, counting from 1). Returns -1 if not found:

SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 5); -- 8
SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12); -- 19
regexp_position(string, pattern, start, occurrence) → integer

Returns the index of the nth occurrence of pattern in string, where n is given by occurrence, searching from position start (inclusive, counting from 1). Returns -1 if not found:

SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 1); -- 19
SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 2); -- 31
SELECT regexp_position('I have 23 apples, 5 pears and 13 oranges', '\b\d+\b', 12, 3); -- -1
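The interaction between start, occurrence and the 1-based result can be traced with a short Python sketch. It is an approximation under the assumption that Python's re module matches these patterns the same way; position is a hypothetical helper whose defaults (start=1, occurrence=1) model the shorter forms of the function.

```python
import re

def position(string, pattern, start=1, occurrence=1):
    # start and the returned index are 1-based, as in the SQL examples;
    # Python indexes from 0, hence the -1 / +1 adjustments.
    matches = re.compile(pattern).finditer(string, start - 1)
    for n, m in enumerate(matches, 1):
        if n == occurrence:
            return m.start() + 1
    return -1  # fewer than `occurrence` matches at or after `start`

text = 'I have 23 apples, 5 pears and 13 oranges'
print(position(text, r'\b\d+\b'))         # 8
print(position(text, r'\b\d+\b', 12))     # 19
print(position(text, r'\b\d+\b', 12, 2))  # 31
print(position(text, r'\b\d+\b', 12, 3))  # -1
```

Occurrences are counted only among matches found at or after start, which is why the third call returns 31 rather than 19.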
regexp_replace(string, pattern) → varchar

Removes every instance of the substring matched by the regular expression pattern from string:

SELECT regexp_replace('1a 2b 14m', '\d+[ab] '); -- '14m'
regexp_replace(string, pattern, replacement) → varchar

Replaces every instance of the substring matched by the regular expression pattern in string with replacement. Capturing groups can be referenced in replacement using $g for a numbered group or ${name} for a named group. A dollar sign ($) may be included in the replacement by escaping it with a backslash (\$):

SELECT regexp_replace('1a 2b 14m', '(\d+)([ab]) ', '3c$2 '); -- '3ca 3cb 14m'
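For comparison, the same two replacements can be reproduced in Python. This is only a sketch under the assumption that Python's re.sub behaves like regexp_replace for these patterns. Note that Python writes a group reference as \2 where the SQL replacement string uses $2.

```python
import re

# Two-argument form: removing matches is replacing them with the empty string.
removed = re.sub(r'\d+[ab] ', '', '1a 2b 14m')
print(removed)   # 14m

# Three-argument form: \2 refers to the second capturing group ($2 in the SQL example).
replaced = re.sub(r'(\d+)([ab]) ', r'3c\2 ', '1a 2b 14m')
print(replaced)  # 3ca 3cb 14m
```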
regexp_replace(string, pattern, function) → varchar

Replaces every instance of the substring matched by the regular expression pattern in string using function. The lambda expression function is invoked for each match with the capturing groups passed as an array. Capturing group numbers start at one; there is no group for the entire match (if you need this, surround the entire expression with parentheses).

SELECT regexp_replace('new york', '(\w)(\w*)', x -> upper(x[1]) || lower(x[2])); -- 'New York'
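A rough Python analogue of the lambda form is sketched below, assuming Python's re module as a stand-in for Presto's engine. The helper replace_with is hypothetical. It passes only the capturing groups to the callback, mirroring the description above, and because Python lists are 0-based, the SQL x[1] corresponds to g[0] here.

```python
import re

def replace_with(string, pattern, function):
    # Hypothetical helper: the callback receives the capturing groups only,
    # not the whole match.
    return re.sub(pattern, lambda m: function(list(m.groups())), string)

print(replace_with('new york', r'(\w)(\w*)',
                   lambda g: g[0].upper() + g[1].lower()))  # New York

# To get the entire match, wrap the whole pattern in parentheses:
# the outer group then arrives as the first array element.
print(replace_with('new york', r'((\w)(\w*))',
                   lambda g: g[0].upper()))                 # NEW YORK
```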
regexp_split(string, pattern) → array(varchar)

Splits string using the regular expression pattern and returns an array. Trailing empty strings are preserved:

SELECT regexp_split('1a 2b 14m', '\s*[a-z]+\s*'); -- ['1', '2', '14', '']
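The trailing empty string is easy to miss in the output. Python's re.split happens to keep it as well, so it can serve as a quick illustration, again assuming Python's re module as a stand-in for Presto's engine.

```python
import re

parts = re.split(r'\s*[a-z]+\s*', '1a 2b 14m')
# The final 'm' is a delimiter with nothing after it,
# so the last element is an empty string.
print(parts)  # ['1', '2', '14', '']
```

This differs from Java's String.split with its default limit, which drops trailing empty strings.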