Package 'rematch2' reference manual

Title:	Tidy Output from Regular Expression Matching
Description:	Wrappers on 'regexpr' and 'gregexpr' to return the match results in tidy data frames.
Authors:	Gábor Csárdi [aut, cre], Matthew Lincoln [ctb], Posit Software, PBC [cph, fnd]
Maintainer:	Gábor Csárdi <[email protected]>
License:	MIT + file LICENSE
Version:	2.1.2.9000
Built:	2025-02-07 05:27:36 UTC
Source:	https://github.com/r-lib/rematch2

Match results from a data frame column and attach results

Description

Taking a data frame and a column name as input, this function runs re_match() on that column and binds the results as new columns to the original table, returning a tibble::tibble(). This makes it friendly for pipe-oriented programming with magrittr.
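As a rough illustration of the pipe-friendly usage described above, the sketch below sends a table through R's native pipe (`|>`, available from R 4.1; magrittr's `%>%` should behave the same way here). The variable names `cars` and `res` are only illustrative.

```r
library(rematch2)

# Move the row names of mtcars into a regular column called `rowname`
cars <- tibble::rownames_to_column(mtcars)

# Split each row name into a make and an optional model.
# The data frame is supplied by the pipe; the column name is unquoted.
res <- cars |>
  bind_re_match(rowname, "^(?<make>\\w+) ?(?<model>.+)?$")

# The original columns are kept and `make` and `model` are appended
names(res)
```

Because `keep_match` is left at its default of `FALSE`, no `.match` column is added, so a second `bind_re_match()` call could follow in the same pipeline.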

Usage

bind_re_match(df, from, ..., keep_match = FALSE)

bind_re_match_(df, from, ..., keep_match = FALSE)

Arguments

`df`	A data frame.
`from`	Name of column to use as input for `re_match()`. `bind_re_match()` takes unquoted names, while `bind_re_match_()` takes quoted names.
`...`	Arguments (including `pattern`) to pass to `re_match()`.
`keep_match`	Should the column `.match` be included in the results? Defaults to `FALSE`, to avoid column name collisions in the case that `bind_re_match()` is called multiple times in succession.

Functions

bind_re_match_(): Standard-evaluation version that takes a quoted column name.

Note

If named capture groups would result in multiple columns with the same name, tibble::repair_names() is called on the resulting table.

Examples

match_cars <- tibble::rownames_to_column(mtcars)
bind_re_match(match_cars, rowname, "^(?<make>\\w+) ?(?<model>.+)?$")

Extract Data From First Regular Expression Match Into a Data Frame

Description

Match a regular expression to a string, and return matches, match positions, and capture groups. This function is like its re_match() counterpart, except that it returns the start and end positions of the match and of each capture group in addition to the matched values.

Usage

re_exec(text, pattern, perl = TRUE, ...)

## S3 method for class 'rematch_records'
x$name

## S3 method for class 'rematch_allrecords'
x$name

Arguments

`text`	Character vector.
`pattern`	A regular expression. See `base::regex()` for more about regular expressions.
`perl`	Logical; should Perl-compatible regular expressions be used? Defaults to `TRUE`; setting it to `FALSE` disables capture groups.
`...`	Additional arguments to pass to `base::regexpr()`.
`x`	Object returned by `re_exec` or `re_exec_all`.
`name`	`match`, `start` or `end`.

Value

A tidy data frame (see Section “Tidy Data”). Match record entries are length-one vectors that are set to NA if there is no match.

Tidy Data

The return value is a tidy data frame where each row corresponds to an element of the input character vector text. The values from text appear for reference in the .text character column. All other columns are list columns containing the match data. The .match column contains the match information for the full regular expression match, while the other columns correspond to capture groups, if there are any and PCRE matching is enabled with perl = TRUE (the default). If capture groups are named, the corresponding columns bear those names.

Each match data list column contains match records, one for each element in text. A match record is a named list with entries match, start and end, which are respectively the matching (sub)string, its start position, and its end position (using one-based indexing).

Extracting Match Data

To make it easier to extract matching substrings or positions, a special $ operator is defined on match columns, both for the .match column and the columns corresponding to the capture groups. See examples below.
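To make the shortcut concrete, the sketch below compares the `$` operator with pulling the `match` entry out of each record by hand. It assumes each match record is a plain named list, as described under Tidy Data; the names `via_dollar` and `by_hand` are illustrative only.

```r
library(rematch2)

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c(
  "  Ben Franklin and Jefferson Davis",
  "\tMillard Fillmore"
)
pos <- re_exec(notables, name_rex)

# Shortcut provided by the package
via_dollar <- pos$first$match

# Manual equivalent: take the `match` entry of every record.
# unclass() drops the column's class so that plain list access is used.
by_hand <- vapply(
  unclass(pos$first),
  function(rec) rec[["match"]],
  character(1)
)
```

Both approaches should give one value per input string; the `$` form is simply shorter to write.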

Examples

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c(
  "  Ben Franklin and Jefferson Davis",
  "\tMillard Fillmore"
)
# Match first occurrence
pos <- re_exec(notables, name_rex)
pos

# Custom $ to extract matches and positions
pos$first$match
pos$first$start
pos$first$end

Extract Data From All Regular Expression Matches Into a Data Frame

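Description

Match a regular expression to a string, and return all matches, match positions, and capture groups. This function is like its re_match_all() counterpart, except that it returns the start and end positions of each match and capture group in addition to the matched values.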

Usage

re_exec_all(text, pattern, perl = TRUE, ...)

Arguments

`text`	Character vector.
`pattern`	A regular expression. See `base::regex()` for more about regular expressions.
`perl`	Logical; should Perl-compatible regular expressions be used? Defaults to `TRUE`; setting it to `FALSE` disables capture groups.
`...`	Additional arguments to pass to `base::gregexpr()` (or `base::regexpr()` if `text` is of length zero).

Value

A tidy data frame (see Section “Tidy Data”). The entries of the match records in the list columns are vectors with one element per match found in the corresponding text element.

Tidy Data

Extracting Match Data

Examples

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c(
  "  Ben Franklin and Jefferson Davis",
  "\tMillard Fillmore"
)
# All occurrences
allpos <- re_exec_all(notables, name_rex)
allpos

# Custom $ to extract matches and positions
allpos$first$match
allpos$first$start
allpos$first$end

Extract Regular Expression Matches Into a Data Frame

Description

re_match wraps base::regexpr() and returns the match results in a convenient data frame. The data frame has one column for each capture group if perl = TRUE, and one final column called .match for the matching (sub)string. The columns of the capture groups are named if the groups themselves are named.

Usage

re_match(text, pattern, perl = TRUE, ...)

Arguments

`text`	Character vector.
`pattern`	A regular expression. See `base::regex()` for more about regular expressions.
`perl`	Logical; should Perl-compatible regular expressions be used? Defaults to `TRUE`; setting it to `FALSE` disables capture groups.
`...`	Additional arguments to pass to `base::regexpr()`.

Value

A data frame of character vectors: one column per capture group, named if the group was named, and additional columns for the input text and the first matching (sub)string. Each row corresponds to an element in the text vector.
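Since every column is a character vector, a common follow-up is to drop the rows that did not match and convert the captured pieces to another type. The sketch below shows one way to do that; the names `matched` and `years` are illustrative, and it relies on non-matching elements being reported as NA.

```r
library(rematch2)

dates <- c("2016-04-20", "1977-08-08", "not a date", "2016",
  "76-03-02", "2012-06-30", "2015-01-21 19:58")
isodaten <- "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"
res <- re_match(text = dates, pattern = isodaten)

# Elements without a match have NA in .match and in the group columns
matched <- res[!is.na(res$.match), ]

# The group columns are character, so convert before doing arithmetic
years <- as.integer(matched$year)
```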

Note

re_match uses PCRE compatible regular expressions by default (i.e. perl = TRUE in base::regexpr()). You can switch this off but if you do so capture groups will no longer be reported as they are only supported by PCRE.
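The effect of switching PCRE off can be seen by comparing the column names of the two results. This is a small sketch with unnamed groups, since the named-group syntax is itself a PCRE feature; `with_pcre` and `without_pcre` are illustrative names.

```r
library(rematch2)

dates <- c("2016-04-20", "not a date")
isodate <- "([0-9]{4})-([0-1][0-9])-([0-3][0-9])"

with_pcre <- re_match(dates, isodate)                   # perl = TRUE by default
without_pcre <- re_match(dates, isodate, perl = FALSE)

names(with_pcre)     # three group columns plus .text and .match
names(without_pcre)  # only .text and .match
```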

Examples

dates <- c("2016-04-20", "1977-08-08", "not a date", "2016",
  "76-03-02", "2012-06-30", "2015-01-21 19:58")
isodate <- "([0-9]{4})-([0-1][0-9])-([0-3][0-9])"
re_match(text = dates, pattern = isodate)

# The same with named groups
isodaten <- "(?<year>[0-9]{4})-(?<month>[0-1][0-9])-(?<day>[0-3][0-9])"
re_match(text = dates, pattern = isodaten)

Extract All Regular Expression Matches Into a Data Frame

Description

This function is a thin wrapper on the base::gregexpr() base R function, to extract the matching (sub)strings as a data frame. It extracts all matches, and potentially their capture groups as well.

Usage

re_match_all(text, pattern, perl = TRUE, ...)

Arguments

`text`	Character vector.
`pattern`	A regular expression. See `base::regex()` for more about regular expressions.
`perl`	Logical; should Perl-compatible regular expressions be used? Defaults to `TRUE`; setting it to `FALSE` disables capture groups.
`...`	Additional arguments to pass to `base::gregexpr()` (or `base::regexpr()` if `text` is of length zero).

Value

A tidy data frame (see Section “Tidy Data”). The list columns contain character vectors with as many entries as there are matches for each input element.
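Because the columns are list columns, the usual list tools apply to them. The following sketch counts the matches per input string and pulls out the first names found in the first string; `n_matches` and `first_names` are illustrative names.

```r
library(rematch2)

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c(
  "  Ben Franklin and Jefferson Davis",
  "\tMillard Fillmore"
)
res <- re_match_all(notables, name_rex)

# Number of matches found in each input string
n_matches <- lengths(res$.match)

# All first names found in the first input string
first_names <- res$first[[1]]
```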

Tidy Data

Note

If the input text character vector has length zero, base::regexpr() is called instead of base::gregexpr(), because the latter cannot extract the number and names of the capture groups in this case.
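A quick way to see this special case in action is to pass an empty character vector. In the sketch below the result is expected to have no rows while still carrying the capture group columns; `empty` is an illustrative name.

```r
library(rematch2)

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)

# Zero-length input: there is nothing to match
empty <- re_match_all(character(), name_rex)

nrow(empty)
names(empty)
```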

Examples

name_rex <- paste0(
  "(?<first>[[:upper:]][[:lower:]]+) ",
  "(?<last>[[:upper:]][[:lower:]]+)"
)
notables <- c(
  "  Ben Franklin and Jefferson Davis",
  "\tMillard Fillmore"
)
re_match_all(notables, name_rex)
