Regex Cheat Sheet

Basic matching

Each symbol matches a single character:

. anything (except line breaks)
\d digit (0123456789)
\D non-digit
\w “word” characters (i.e. letters, digits, and _)
\W non-word
␣ space (a literal space character, shown here as ␣)
\s whitespace (␣,\t,\r,\n)
\S non-whitespace
\t tab
\r return
\n new line
\r\n line break (return followed by new line)
Depending on the system, a line break may be encoded as \r, \n, or \r\n.
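
The shorthand symbols above can be tried out with a short sketch in Python's re module. The sample strings are made up for illustration. Note that in Python 3, \d and \w also match non-ASCII digits and letters unless the re.ASCII flag is passed, and \s also matches form feed and vertical tab.

```python
import re

# Hypothetical sample text containing a space, a tab, and a \r\n line break.
sample = "Room 42,\tfloor_3\r\n"

digits = re.findall(r"\d", sample)        # ['4', '2', '3']
words = re.findall(r"\w+", sample)        # ['Room', '42', 'floor_3']
whitespace = re.findall(r"\s", sample)    # [' ', '\t', '\r', '\n']

# . matches anything except a line break, so \n is skipped here.
no_newline = re.findall(r".", "a\nb")     # ['a', 'b']

# Split on any of the three line break encodings; \r\n is listed first
# so that it is treated as one break rather than two.
lines = re.split(r"\r\n|\r|\n", "a\r\nb\rc\nd")   # ['a', 'b', 'c', 'd']
```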

 

Character classes

[…] match any of the characters in the class: [aeiou] matches vowels
[^…] specifies complement set: [^aeiou] matches non-vowels (including non-letters!)
[…-…] specifies range: [a-e] matches abcde, [0-9a-f] matches 0123456789abcdef
POSIX classes (not supported by every regex engine)
[[:alpha:]] letters A-Z and a-z
[[:alnum:]] digits and letters A-Z and a-z
[[:punct:]] punctuation marks, e.g. ?!.,:;
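
A minimal sketch of character classes in Python, with made-up sample strings. Python's re module does not support the POSIX [[:…:]] classes, so the last line spells out [[:alpha:]] as explicit ranges instead.

```python
import re

text = "regex 101!"

vowels = re.findall(r"[aeiou]", text)
# ['e', 'e']

# The complement set also matches the space, the digits, and the "!".
non_vowels = re.findall(r"[^aeiou]", text)
# ['r', 'g', 'x', ' ', '1', '0', '1', '!']

# Ranges can be combined inside one class.
hex_runs = re.findall(r"[0-9a-f]+", "ff00 zz 7e")
# ['ff00', '7e']

# Stand-in for [[:alpha:]] written as explicit ranges.
alpha_runs = re.findall(r"[A-Za-z]+", "abc123 DEF")
# ['abc', 'DEF']
```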

Boundaries

Boundary characters anchor a pattern to some edge but do not select any characters themselves.

\b word boundaries (any edge between \w and \W)
\B non-word boundaries (any position that is not a word boundary)
^ beginning of line/string
$ end of line/string
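
The anchors can be illustrated with a small Python sketch (the sample text is invented). In Python, ^ and $ anchor to the whole string by default and only match at every line when the re.MULTILINE flag is passed. Other engines may default differently.

```python
import re

text = "cat concat cats\ncat"

# \b on both sides: only the standalone word "cat" (first and last).
whole_words = re.findall(r"\bcat\b", text)                    # ['cat', 'cat']

# \B before "cat": only where "cat" continues a word ("concat").
inside_word = re.findall(r"\Bcat", text)                      # ['cat']

# ^ matches only at the start of the string by default...
start_default = re.findall(r"^cat", text)                     # ['cat']
# ...and at the start of every line with re.MULTILINE.
start_multiline = re.findall(r"^cat", text, flags=re.MULTILINE)   # ['cat', 'cat']

# $ behaves the same way for line ends: the string ends in "t", line 1 in "s".
end_default = re.findall(r"s$", text)                         # []
end_multiline = re.findall(r"s$", text, flags=re.MULTILINE)   # ['s']
```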

Disjunction

(X|Y) X or Y: \b(cat|dog)s\b matches cats and dogs
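
A quick check of the cats-and-dogs pattern in Python, using an invented sentence. finditer is used instead of findall because findall would return only the contents of the group (cat, dog) rather than the whole match.

```python
import re

pattern = re.compile(r"\b(cat|dog)s\b")
text = "cats and dogs, but not cat, dogma or bobcats"

# Whole matches: "cat", "dogma", and "bobcats" are all rejected.
matches = [m.group(0) for m in pattern.finditer(text)]   # ['cats', 'dogs']

# The parenthesised alternative that was chosen in each match.
chosen = [m.group(1) for m in pattern.finditer(text)]    # ['cat', 'dog']
```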

Quantifiers

X* 0 or more repetitions of X
X+ 1 or more repetitions of X
X? 0 or 1 instances of X
X{m} exactly m instances of X
X{m,} at least m instances of X
X{m,n} between m and n (inclusive) instances of X

By default, a quantifier applies only to the single character (or character class) immediately before it. Use (…) to specify the quantifier's scope: ab+ matches ab, abb, abbb, abbbb, …; (ab)+ matches ab, abab, ababab, …
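
The quantifier rules can be verified with a small Python sketch. fully_matches is a hypothetical helper introduced here for illustration. It uses re.fullmatch so that the pattern has to cover the entire string.

```python
import re

def fully_matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the whole of text (illustrative helper)."""
    return re.fullmatch(pattern, text) is not None

results = {
    "ab* on a":          fully_matches(r"ab*", "a"),          # True: zero b's allowed
    "ab+ on a":          fully_matches(r"ab+", "a"),          # False: needs at least one b
    "ab+ on abbb":       fully_matches(r"ab+", "abbb"),       # True: + applies to b only
    "ab+ on abab":       fully_matches(r"ab+", "abab"),       # False
    "(ab)+ on abab":     fully_matches(r"(ab)+", "abab"),     # True: + applies to the group
    "colou?r on color":  fully_matches(r"colou?r", "color"),  # True: u is optional
    "\\d{3} on 1234":    fully_matches(r"\d{3}", "1234"),     # False: exactly 3 digits
    "\\d{2,} on 12345":  fully_matches(r"\d{2,}", "12345"),   # True: at least 2 digits
    "\\d{2,4} on 12345": fully_matches(r"\d{2,4}", "12345"),  # False: at most 4 digits
}
```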

Quantifiers are greedy by default. Add ? after a quantifier to make it lazy:
Greedy: ^.*b applied to aabaaba matches aabaab (as much as possible)
Lazy: ^.*?b applied to aabaaba matches aab (as little as possible)
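
The greedy and lazy patterns can be run against the same input in Python to see the difference. This is a minimal sketch using re.match.

```python
import re

text = "aabaaba"

# Greedy: .* grabs as much as it can, then backs off to the last b.
greedy = re.match(r"^.*b", text).group(0)    # 'aabaab'

# Lazy: .*? grabs as little as it can, stopping at the first b.
lazy = re.match(r"^.*?b", text).group(0)     # 'aab'
```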

Special characters

The characters {}[]()^$.|*+?\ (and - inside […]) have special meaning and must be escaped with \ to match them literally, e.g.:
\. matches period .
\\ matches the backslash \
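
A short Python sketch of escaping, with invented sample strings. Raw strings (r"...") are used so that Python does not process the backslashes before the regex engine sees them. re.escape is a standard-library function that escapes a whole string at once. Its exact output varies between Python versions, so only its effect is checked here.

```python
import re

# Unescaped . matches any character; escaped \. matches only a period.
any_char = re.findall(r".", "a.b")          # ['a', '.', 'b']
periods = re.findall(r"\.", "a.b.c")        # ['.', '.']

# \\ in the pattern matches one literal backslash.
backslashes = re.findall(r"\\", r"C:\temp\x")   # two single-backslash matches

# Inside […], an escaped - is a literal hyphen rather than a range.
hyphen_class = re.findall(r"[a\-z]", "a-z b")   # ['a', '-', 'z']

# re.escape builds a pattern that matches the given text literally.
literal = "1+1=2?"
escaped_ok = re.fullmatch(re.escape(literal), literal) is not None   # True
unescaped_ok = re.fullmatch(literal, literal) is not None            # False: + and ? are quantifiers
```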