Title: | Friendly Regular Expressions |
---|---|
Description: | A friendly interface for the construction of regular expressions. |
Authors: | Kevin Ushey [aut, cre], Jim Hester [aut], Robert Krzyzanowski [aut] |
Maintainer: | Kevin Ushey <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.2.1.9000 |
Built: | 2024-11-07 04:58:17 UTC |
Source: | https://github.com/r-lib/rex |
The special binary function %or% can be used to specify a set of optional matches.
x %or% y
or(...)
x | A string. |
y | A string. |
... | |
Other rex: capture(), character_class(), counts, group(), lookarounds, not(), rex(), shortcuts, wildcards
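As a rough illustration of the alternation described above, the following sketch builds patterns that accept any one of several words. The input vector is invented for the example, and perl = TRUE is passed to grepl() as a precaution, on the assumption that the generated pattern uses a non-capturing group.

```r
library(rex)

# Infix form: either alternative may appear between the anchors.
re_infix <- rex(start, "cat" %or% "dog", end)

# Function form: or() accepts any number of alternatives.
re_fun <- rex(start, or("cat", "dog", "bird"), end)

words <- c("cat", "dog", "bird", "catdog")
infix_hits <- grepl(re_infix, words, perl = TRUE)
fun_hits <- grepl(re_fun, words, perl = TRUE)

infix_hits # TRUE TRUE FALSE FALSE
fun_hits   # TRUE TRUE TRUE FALSE
```

The anchors are there only so that "catdog" does not count as a match; without them either pattern would match any string containing one of the alternatives.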
Specify an explicit regular expression. This expression must already be escaped.
## S3 method for class 'regex'
as.character(x, ...)
## S3 method for class 'regex'
print(x, ...)
regex(x, ...)
x | Object |
... | further arguments |
as.character: coerce regex object to a character
print: Print regex object
as.regex to coerce to a regex object.
Coerce objects to a regex.
as.regex(x, ...)
## Default S3 method:
as.regex(x, ...)
x | Object to coerce to regex. |
... | further arguments passed to methods. |
default: Simply escape the Object.
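The difference between a plain string and an explicit regex() can be sketched as follows. This is a minimal example with made-up inputs: it assumes that plain strings passed to rex() are escaped while regex() objects pass through untouched, as the descriptions above state.

```r
library(rex)

# A plain string is escaped, so "." only matches a literal dot.
literal <- rex(start, "a.c", end)

# regex() marks the string as already escaped, so "." stays a wildcard.
wildcard <- rex(start, regex("a.c"), end)

inputs <- c("a.c", "abc")
literal_hits <- grepl(literal, inputs, perl = TRUE)   # TRUE FALSE
wildcard_hits <- grepl(wildcard, inputs, perl = TRUE) # TRUE TRUE

# The default as.regex() method escapes its input.
escaped <- as.character(as.regex("a.c"))
```

In other words, regex() is the escape hatch for hand-written patterns, and anything else is treated as literal text.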
Used to save the matched value within the group for use later in the regular
expression or to extract the values captured. Both named and unnamed groups
can later be referenced using capture_group.
capture(..., name = NULL)
capture_group(name)
... | |
name | of the group. Unnamed capture groups are numbered starting at 1 in the order they appear in the regular expression. If two groups have the same name, the leftmost group is the one used in any reference. |
group for grouping without capturing. Perl 5 Capture Groups https://perldoc.perl.org/perlre#Capture-groups
Other rex: %or%(), character_class(), counts, group(), lookarounds, not(), rex(), shortcuts, wildcards
# Match paired quotation marks
re <- rex(
  # first quotation mark
  capture(quotes),
  # match all non-matching quotation marks
  zero_or_more(except(capture_group(1))),
  # end quotation mark (matches first)
  capture_group(1)
)

# named capture - don't match apples to oranges
re <- rex(
  capture(name = "fruit", or("apple", "orange")),
  "=",
  capture_group("fruit")
)
There are multiple ways you can define a character class.
character_class(x)
one_of(...)
any_of(..., type = c("greedy", "lazy", "possessive"))
some_of(..., type = c("greedy", "lazy", "possessive"))
none_of(...)
except_any_of(..., type = c("greedy", "lazy", "possessive"))
except_some_of(..., type = c("greedy", "lazy", "possessive"))
range(start, end)
`:`(start, end)
exclude_range(start, end)
x | text to include in the character class (must be escaped manually) |
... | |
type | the type of match to perform. There are three match types: "greedy" (the default), "lazy", and "possessive". |
start | beginning of character class |
end | end of character class |
character_class: explicitly define a character class
one_of: matches one of the specified characters.
any_of: matches zero or more of the specified characters.
some_of: matches one or more of the specified characters.
none_of: matches anything but one of the specified characters.
except_any_of: matches zero or more of anything but the specified characters.
except_some_of: matches one or more of anything but the specified characters.
range: matches one of any of the characters in the range.
`:`: matches one of any of the characters in the range.
exclude_range: matches one of any of the characters except those in the range.
Other rex: %or%(), capture(), counts, group(), lookarounds, not(), rex(), shortcuts, wildcards
# grey = gray
re <- rex("gr", one_of("a", "e"), "y")
grepl(re, c("grey", "gray")) # TRUE TRUE

# Match non-vowels
re <- rex(none_of("a", "e", "i", "o", "u"))
# They can also be in the same string
re <- rex(none_of("aeiou"))
grepl(re, c("k", "l", "e")) # TRUE TRUE FALSE

# Match range
re <- rex(range("a", "e"))
grepl(re, c("b", "d", "f")) # TRUE TRUE FALSE

# Explicit creation
re <- rex(character_class("abcd\\["))
grepl(re, c("a", "d", "[", "]")) # TRUE TRUE TRUE FALSE
Character class escapes
character_class_escape(x)
## S3 method for class 'regex'
character_class_escape(x)
## S3 method for class 'character_class'
character_class_escape(x)
## S3 method for class 'character'
character_class_escape(x)
## S3 method for class 'list'
character_class_escape(x)
## Default S3 method:
character_class_escape(x)
x | Object to escape. |
regex: objects are passed through unchanged.
character_class: objects are passed through unchanged.
character: objects properly escaped for character classes.
list: call character_class_escape on all elements of the list.
default: coerce to character and character_class_escape.
Functions to restrict a regex to a specific number of repetitions.
n_times(x, n, type = c("greedy", "lazy", "possessive"))
between(x, low, high, type = c("greedy", "lazy", "possessive"))
at_least(x, n, type = c("greedy", "lazy", "possessive"))
at_most(x, n, type = c("greedy", "lazy", "possessive"))
x | A regex pattern. |
n | An integer number |
type | the type of match to perform. There are three match types: "greedy" (the default), "lazy", and "possessive". |
low | An integer number for the lower limit. |
high | An integer number for the upper limit. |
n_times: x must occur exactly n times.
between: x must occur between low and high times.
at_least: x must occur at least n times.
at_most: x must occur at most n times.
Other rex: %or%(), capture(), character_class(), group(), lookarounds, not(), rex(), shortcuts, wildcards
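A short sketch of the counting functions in use. The identifier format (three digits, a dash, then two to four digits) is invented for the example, and digit is assumed to be the single-digit shortcut provided by rex.

```r
library(rex)

# Exactly three digits, a dash, then between two and four digits.
re <- rex(start, n_times(digit, 3), "-", between(digit, 2, 4), end)
ids <- c("123-45", "123-4", "12-3456", "123-45678")
count_hits <- grepl(re, ids, perl = TRUE)
count_hits # TRUE FALSE FALSE FALSE

# At least two consecutive "a" characters and nothing else.
re_min <- rex(start, at_least("a", 2), end)
min_hits <- grepl(re_min, c("a", "aa", "aaaa"), perl = TRUE)
min_hits # FALSE TRUE TRUE
```

Note that each counting function takes a single pattern x, so several parts have to be combined first (for example with group()) before they can be repeated together.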
Escape characters for a regex
escape(x)
## S3 method for class 'regex'
escape(x)
## S3 method for class 'character_class'
escape(x)
## S3 method for class 'character'
escape(x)
## Default S3 method:
escape(x)
## S3 method for class 'list'
escape(x)
x | Object to escape. |
regex: Objects are simply passed through unchanged.
character_class: Objects are surrounded by square brackets.
character: Objects are properly escaped for regular expressions.
default: default escape coerces to character and escapes.
list: simply call escape on all elements of the list.
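The character and regex methods described above can be sketched like this. The input strings are made up, and the example assumes escape() is called directly on a character vector of length one.

```r
library(rex)

# Characters with special meaning in a regex are backslash-escaped.
escaped <- escape("1.5*(x)")
escaped_text <- as.character(escaped)

# The escaped pattern only matches the literal text.
hits <- grepl(escaped, c("1.5*(x)", "1x5*(x)"), perl = TRUE)
hits # TRUE FALSE

# A regex object is passed through unchanged.
untouched <- as.character(escape(regex("a.c")))
```

Inside rex() this escaping is applied to plain strings automatically, so calling escape() by hand is mostly useful when a pattern is assembled outside of rex().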
This is similar to capture except that it does not store the
value of the group. Best used when you want to combine several parts
together and do not reference or extract the grouped value later.
group(...)
... | |
capture for grouping with capturing. Perl 5 Extended Patterns https://perldoc.perl.org/perlre#Extended-Patterns
Other rex: %or%(), capture(), character_class(), counts, lookarounds, not(), rex(), shortcuts, wildcards
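One place where group() helps is combining several parts into a single pattern before handing it to a function that takes only one, such as n_times(). The sketch below uses invented inputs and also shows that a group, unlike a capture, should not add a column to the result of re_matches().

```r
library(rex)

# Repeat the combined unit "ab" followed by a digit exactly twice.
re <- rex(start, n_times(group("ab", digit), 2), end)
group_hits <- grepl(re, c("ab1ab2", "ab1", "ab1ab"), perl = TRUE)
group_hits # TRUE FALSE FALSE

# Only the capture is extracted; the grouped prefix is matched but not stored.
m <- re_matches(
  "key=value",
  rex(group("key", "="), capture(name = "val", alphas))
)
captured_names <- names(m)
captured_value <- as.character(m$val)
```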
Lookarounds
x %if_next_is% y
x %if_next_isnt% y
x %if_prev_is% y
x %if_prev_isnt% y
x | A regex pattern. |
y | A regex pattern. |
These functions provide an interface to perl lookarounds.
Special binary functions are used to infer an ordering, since often you might wish to match a word / set of characters conditional on the start and end of that word.
%if_next_is%: TRUE if y follows x
%if_next_isnt%: TRUE if y does not follow x
%if_prev_is%: TRUE if y comes before x
%if_prev_isnt%: TRUE if y does not come before x
Perl 5 Documentation https://perldoc.perl.org/perlre#Extended-Patterns
Other rex: %or%(), capture(), character_class(), counts, group(), not(), rex(), shortcuts, wildcards
stopifnot(grepl(rex("crab" %if_next_is% "apple"), "crabapple", perl = TRUE))
stopifnot(grepl(rex("crab" %if_prev_is% "apple"), "applecrab", perl = TRUE))
stopifnot(grepl(rex(range("a", "e") %if_next_isnt% range("f", "g")), "ah", perl = TRUE))
stopifnot(grepl(rex(range("a", "e") %if_next_is% range("f", "i")), "ah", perl = TRUE))
Do not match
not(..., type = c("greedy", "lazy", "possessive"))
... | |
type | the type of match to perform. There are three match types: "greedy" (the default), "lazy", and "possessive". |
Other rex: %or%(), capture(), character_class(), counts, group(), lookarounds, rex(), shortcuts, wildcards
Match function
re_matches(
  data,
  pattern,
  global = FALSE,
  options = NULL,
  locations = FALSE,
  ...
)
data | character vector to match against |
pattern | regular expression to use for matching |
global | use global matching |
options | regular expression options |
locations | rather than returning the values of the matched (or captured) string, return a |
... | options passed to regexpr or gregexpr |
If no captures, returns a logical vector the same length as the
input character vector specifying if the relevant value matched or not. If
there are captures in the regular expression, returns a data.frame with a
column for each capture group. If global is TRUE, returns a
list of data.frames.
regexp, section "Perl-like Regular Expressions", for a discussion of the supported options
string <- c("this is a", "test string")
re_matches(string, rex("test")) # FALSE TRUE

# named capture
re_matches(string, rex(capture(alphas, name = "first_word"), space,
  capture(alphas, name = "second_word")))
#   first_word second_word
# 1       this          is
# 2       test      string

# capture returns NA when it fails to match
re_matches(string, rex(capture("test")))
#      1
# 1 <NA>
# 2 test
Substitute regular expressions in a string with another string.
re_substitutes(data, pattern, replacement, global = FALSE, options = NULL, ...)
data | character vector to substitute |
pattern | regular expression to match |
replacement | replacement text to use |
global | substitute all occurrences |
options | option flags |
... | options passed to sub or gsub |
regexp, section "Perl-like Regular Expressions", for a discussion of the supported options
string <- c("this is a Test", "string")
re_substitutes(string, "test", "not a test", options = "insensitive")
re_substitutes(string, "i", "x", global = TRUE)
re_substitutes(string, "(test)", "not a \\1", options = "insensitive")
If you are using rex in another package, you need to call this function to register all of the rex shortcuts so that spurious NOTEs about global variables are not generated during R CMD check.
register_shortcuts(pkg_name)
pkg_name | the package to register the shortcuts in |
Generate a regular expression.
rex(..., env = parent.frame())
... | |
env | environment to evaluate the rex expression in. |
Other rex: %or%(), capture(), character_class(), counts, group(), lookarounds, not(), shortcuts, wildcards
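A minimal sketch of a rex() call that mixes shortcuts, literal text, and a variable from the calling environment. The variable name domain and the addresses are made up for the example; the sketch assumes that a character variable is looked up in env and escaped like any other plain string.

```r
library(rex)

# A character variable defined by the caller; the "." in it is escaped.
domain <- "example.com"

re <- rex(start, one_or_more(alnum), "@", domain, end)

emails <- c("user@example.com", "user@exampleXcom", "@example.com")
email_hits <- grepl(re, emails, perl = TRUE)
email_hits # TRUE FALSE FALSE
```

This is deliberately not a realistic e-mail validator; it only shows how the pieces compose.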
While within rex mode, functions used within the rex function
are attached, so one can get e.g. auto-completion within editors.
rex_mode()
Commonly used character classes and regular expressions. These shortcuts
are substituted inside rex calls.
shortcuts
An object of class shortcut of length 116.
names(shortcuts) will give you the full list of available shortcuts.
Other rex: %or%(), capture(), character_class(), counts, group(), lookarounds, not(), rex(), wildcards
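As a small sketch of how shortcuts are used inside rex(), the example below relies on the plural form digits and the inverse plural form non_digits, which are assumed to exist following the naming scheme for plural (-s) and inverse (non_) forms. The input values are invented.

```r
library(rex)

# Peek at a few of the available shortcut names.
head(names(shortcuts))

# One or more digits and nothing else.
re_digits <- rex(start, digits, end)
# One or more non-digit characters and nothing else.
re_non <- rex(start, non_digits, end)

values <- c("2024", "20x4", "abcd")
digit_hits <- grepl(re_digits, values, perl = TRUE)
non_digit_hits <- grepl(re_non, values, perl = TRUE)

digit_hits     # TRUE FALSE FALSE
non_digit_hits # FALSE FALSE TRUE
```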
Each of these shortcuts has both a plural (-s) and inverse (non_) form.
single_shortcuts
An object of class shortcut of length 18.
Wildcards
zero_or_more(..., type = c("greedy", "lazy", "possessive"))
one_or_more(..., type = c("greedy", "lazy", "possessive"))
maybe(..., type = c("greedy", "lazy", "possessive"))
... | |
type | the type of match to perform. There are three match types: "greedy" (the default), "lazy", and "possessive". |
zero_or_more: match ... zero or more times.
one_or_more: match ... one or more times.
maybe: match ... zero or one times.
Other rex: %or%(), capture(), character_class(), counts, group(), lookarounds, not(), rex(), shortcuts
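The wildcard functions and the type argument can be sketched as follows. The inputs are made up, and the example assumes that any is the rex shortcut matching a single arbitrary character.

```r
library(rex)

# maybe(): the "u" may appear zero or one times.
re <- rex(start, "colo", maybe("u"), "r", end)
spelling_hits <- grepl(re, c("color", "colour", "colouur"), perl = TRUE)
spelling_hits # TRUE TRUE FALSE

# Greedy versus lazy matching when removing the first tag-like chunk.
tags <- "<a><b>"
greedy <- re_substitutes(tags, rex("<", zero_or_more(any), ">"), "")
lazy <- re_substitutes(tags, rex("<", zero_or_more(any, type = "lazy"), ">"), "")

greedy # ""
lazy   # "<b>"
```

The greedy pattern consumes as much as it can up to the last ">", while the lazy one stops at the first ">" it reaches.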