Regular Expression Reference
This is a quick reference for general regular expression syntax.
Introduction
Regular expressions (regex) provide pattern matching in many languages and tools, including:
- .NET v1.0+
- VBScript v5.5+
- Java v1.4+
- XML v1.0+
- Beyond Compare
- Notepad++
Flavors vary. For example, while a regex match is confined to a single line in Beyond Compare, regex can span multiple lines in Notepad++.
Syntax
- Special characters
[\^$.|?*+()
- Quantifier tags
{...}
- Escape modifier
\
Note: Escape sequences modify special characters to be literals, or modify literals for extended functions.
- Beginning-of-line anchor
^
- End-of-line anchor
$
- Wildcard (any character)
.
- Alternation (or) operator
|
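The basic syntax elements above can be tried out with a short sketch. It assumes Python's built-in `re` module as the flavor, and the sample strings are made up for illustration; other flavors may treat line anchors differently.

```python
import re

# Anchors: with re.MULTILINE, ^ matches at the start of every line.
text = "cat\ncar\nscat"
line_starts = re.findall(r"^ca.", text, flags=re.MULTILINE)

# Alternation: either branch may match.
pets = re.findall(r"cat|dog", "cat, dog, cow")

# Escape modifier: \. is a literal period, a bare . is the wildcard.
literal_dot = re.findall(r"a\.b", "a.b axb")
wildcard = re.findall(r"a.b", "a.b axb")

# Quantifier tag: {3} requires exactly three repetitions.
exactly_three = re.fullmatch(r"a{3}", "aaa") is not None

print(line_starts, pets, literal_dot, wildcard, exactly_three)
```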
Character Classes
Note: Most regular expression "special characters" are treated as literals in character class definitions.
- Special characters
^-]\
- Range operator
-
- Escape modifier
\
- Inclusive character class
[...]
- Exclusive/negated class
[^...]
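A minimal sketch of the character class rules above, assuming Python's `re` module; the input strings are invented for the example.

```python
import re

# Inclusive class: any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regex")

# Negated class: any character NOT listed.
non_digits = re.findall(r"[^0-9]", "a1b2")

# Range operator: 0-9, a-f and A-F together cover hexadecimal digits.
hex_runs = re.findall(r"[0-9a-fA-F]+", "ff 1G")

# Most special characters are literals inside a class.
literals = re.findall(r"[.+*]", "1+1=2.0")

# The class-special characters ] ^ - \ can be escaped to match literally.
escaped = re.findall(r"[\]\^\-\\]", r"a]b^c-d\e")

print(vowels, non_digits, hex_runs, literals, escaped)
```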
Character Shorthand
Character Shorthand | Inclusive | Exclusive |
---|---|---|
Tab character | \t | |
Whitespace (including tabs) | \s | \S |
Numeric characters | \d | \D |
Word characters | \w | \W |
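The shorthand classes in the table can be exercised as follows. This is a sketch assuming Python's `re` module, where `\d`, `\w` and `\s` also match Unicode digits, letters and whitespace by default; the sample string is hypothetical.

```python
import re

sample = "id_7\t42 ok"

digits = re.findall(r"\d+", sample)          # runs of numeric characters
words = re.findall(r"\w+", sample)           # runs of word characters
fields = re.split(r"\s+", sample)            # split on whitespace, tabs included
tabs = re.findall(r"\t", sample)             # the tab character itself
non_digits = re.findall(r"\D+", "a1b22c")    # exclusive form of \d

print(digits, words, fields, tabs, non_digits)
```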
Boundaries
Word boundaries ensure whole-word matches by skipping results adjacent to other word characters (alphanumerics and underscores).
- Word
\b
- Non-word
\B
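To illustrate the difference between the two boundary tokens, here is a small sketch assuming Python's `re` module and an invented sample sentence.

```python
import re

text = "cat concatenate cat_1 cat"

# \b on both sides: only standalone "cat" matches. "cat_1" is skipped
# because the underscore counts as a word character.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]

# \B on both sides: only "cat" embedded inside a longer word matches.
embedded = [m.start() for m in re.finditer(r"\Bcat\B", text)]

print(whole_words, embedded)
```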
Subexpressions
- Non-capturing subexpression
(?:...)
- Capturing subexpression
(...)
- Named subexpression
(?<name>...)
- Backreference Match
\n
- Backreference Insert
$n
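The following sketch shows the subexpression forms in use. It assumes Python's `re` module, which differs from the notation above in two ways: named groups are written `(?P<name>...)`, and replacement strings use `\n` or `\g<n>` rather than `$n`. The sample strings are hypothetical.

```python
import re

# Capturing subexpressions are numbered left to right.
date = re.match(r"(\d{4})-(\d{2})", "2024-05")
year, month = date.group(1), date.group(2)

# A non-capturing subexpression groups without consuming a number.
title = re.match(r"(?:Mr|Ms)\. (\w+)", "Ms. Lovelace")
captured = title.groups()

# Named subexpression (Python spelling).
named = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})", "2024-05")
named_year = named.group("year")

# Backreference match: \1 must repeat whatever group 1 captured.
doubled = re.findall(r"\b(\w+) \1\b", "it is is fine")

# Backreference insert: reuse captures in the replacement text.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

print(year, month, captured, named_year, doubled, swapped)
```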
Quantifiers
Number of Matches | Greedy | Lazy | Possessive |
---|---|---|---|
0 or 1 (optional) | ? | ?? | ?+ |
0 or more (optional) | * | *? | *+ |
1 or more (required) | + | +? | ++ |
Exactly n matches | {n} | {n}? | |
At least n matches | {n,} | {n,}? | |
Between n and m matches | {n,m} | {n,m}? | |
Note: Possessive quantifiers discard backtracking positions, so a failing match can stop early instead of trying every permutation. They are mainly used for performance tuning.
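Greedy and lazy quantifiers are easiest to tell apart side by side. The sketch below assumes Python's `re` module and a made-up HTML fragment; possessive quantifiers are left out because Python only accepts them from version 3.11 onward.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ takes as much as possible, then backs off to the last ">".
greedy = re.findall(r"<.+>", html)

# Lazy: .+? takes as little as possible, stopping at the first ">".
lazy = re.findall(r"<.+?>", html)

# Optional (0 or 1): the "u" may be present or absent.
optional = [bool(re.fullmatch(r"colou?r", w)) for w in ("color", "colour")]

# Min/max: between two and three digits per match.
bounded = re.findall(r"\d{2,3}", "1 22 4444")

print(greedy, lazy, optional, bounded)
```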
Mode Modifiers
Mode | Enable | Disable |
---|---|---|
Case Insensitivity | (?i) | (?-i) |
Free-Spacing | (?x) | (?-x) |
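A short sketch of both modes, assuming Python's `re` module. Python expects a bare `(?i)` or `(?x)` at the very start of the pattern and does not accept a standalone `(?-i)`; the scoped form `(?i:...)` is used here instead. The phone-number pattern is hypothetical.

```python
import re

# Case insensitivity for the whole pattern.
insensitive = re.findall(r"(?i)regex", "Regex REGEX regex")

# Scoped form: only "reg" ignores case, "ex" must be lower case.
scoped = re.findall(r"(?i:reg)ex", "REGex REGEX")

# Free-spacing: whitespace and # comments in the pattern are ignored.
pattern = r"""(?x)
    (\d{3})    # first block of digits
    [-.\s]?    # optional separator
    (\d{4})    # second block of digits
"""
phone = re.match(pattern, "555-0199")

print(insensitive, scoped, phone.groups())
```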
Lookarounds
Lookaround | Positive | Negative |
---|---|---|
Lookahead | (?=...) | (?!...) |
Lookbehind | (?<=...) | (?<!...) |
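Lookarounds assert what surrounds a match without including it in the result. The sketch below assumes Python's `re` module, which requires lookbehind patterns to have a fixed width; the sample strings are invented.

```python
import re

sizes = "10px 20em 30px"
prices = "USD 100, EUR 200, USD 300"

# Positive lookahead: digits that are followed by "px".
px_values = re.findall(r"\d+(?=px)", sizes)

# Negative lookahead: whole numbers NOT followed by "px".
# The \d alternative stops the engine from matching a shortened "1" of "10".
other_values = re.findall(r"\b\d+(?!\d|px)", sizes)

# Positive lookbehind: digits preceded by "USD ".
usd = re.findall(r"(?<=USD )\d+", prices)

# Negative lookbehind: whole numbers NOT preceded by "USD ".
non_usd = re.findall(r"(?<!USD )\b\d+", prices)

print(px_values, other_values, usd, non_usd)
```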
- If/Then/Else
(?(if)then|else)
- with lookaround
(?(?=...)then|else)
- with alternation
(?(if)(then|then)|(else|else))
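As a rough illustration of the if/then/else form, the sketch below assumes Python's `re` module, where the condition can only be a group number or name (lookaround conditions are not accepted). The pattern requires a closing ">" only when an opening "<" was captured; the address is a placeholder.

```python
import re

# Group 1 is the optional "<". If it matched, then ">" is required;
# otherwise the match must end at the end of the string.
pattern = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

candidates = [
    "<user@host.com>",   # both brackets: then-branch succeeds
    "user@host.com",     # no brackets: else-branch succeeds
    "<user@host.com",    # opening bracket only: then-branch fails
    "user@host.com>",    # closing bracket only: else-branch fails
]
results = [pattern.match(c) is not None for c in candidates]

print(results)
```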
Atomic Expressions
Hint: A branch reset subexpression can capture an alternation match into a single backreference:
- Branch reset groups
(?|...)
- Alternation example
(?|(...)|(...)|(...))
- Atomic Subexpressions
(?>...)
Note: Atomic subexpressions discard backtracking positions, so a failing match can stop early instead of trying every permutation. They are mainly used for performance tuning.
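The effect of discarding backtracking positions can be seen in a small sketch. It assumes Python's `re` module; because `(?>...)` is only accepted from Python 3.11 onward, the example emulates an atomic subexpression with a lookahead plus backreference, `(?=(...))\1`, which is a stand-in technique rather than the atomic syntax itself.

```python
import re

subject = "aaab"

# Ordinary group: a+ first takes "aaa", then gives one "a" back so that
# the trailing "ab" can match.
with_backtracking = re.match(r"a+ab", subject)

# Atomic emulation: the lookahead captures "aaa" and, once it has
# succeeded, is never re-entered. \1 consumes all three, so "ab" cannot match.
atomic_like = re.match(r"(?=(a+))\1ab", subject)

print(with_backtracking is not None, atomic_like is not None)
```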
Recursion
Note: The main purpose of recursion is to match balanced or nested constructs. An optional recursive expression is repeatedly applied until it fails, then the remaining expression is applied until all open levels of recursion have been closed. (?0) and \g<0> are synonyms for (?R).
- Recursion operator
(?R)
- Simple balanced match
...(?R)?...
- Begin/middle/end match
...(?:...|(?R))*...
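Python's built-in `re` module has no recursion operator, so the sketch below does not use `(?R)`. Instead it hand-writes what a begin/middle/end pattern such as `\((?:[^()]|(?R))*\)` would do for balanced parentheses; the function name `match_balanced` is hypothetical.

```python
def match_balanced(text: str, start: int = 0) -> int:
    """Return the index just past a balanced "(...)" group beginning at
    `start`, or -1 if no balanced group begins there."""
    if start >= len(text) or text[start] != "(":
        return -1                      # the "begin" part failed
    pos = start + 1
    while pos < len(text):
        ch = text[pos]
        if ch == ")":
            return pos + 1             # the "end" part closes this level
        if ch == "(":
            # This call plays the role of (?R): match a nested group.
            end = match_balanced(text, pos)
            if end == -1:
                return -1
            pos = end
        else:
            pos += 1                   # the [^()] alternative
    return -1                          # ran out of input with a level open


print(match_balanced("(a(b)c)"), match_balanced("(a(b)c"), match_balanced("((()))x"))
```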