Regex Cheat Sheet
Basic matching
Each symbol matches a single character:
| . | anything (except line breaks) |
| \d | digit (0123456789) |
| \D | non-digit |
| \w | “word”-characters (i.e. letters and digits and _ ) |
| \W | non-word |
| ␣ | space |
| \s | whitespace (␣,\t,\r,\n) |
| \S | non-whitespace |
| \t | tab |
| \r | return |
| \n | new line |
| \r\n | line break |
Line break encoding might be any of these, depending on the system.
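As a rough illustration of the single-character symbols above, here is a minimal sketch using Python's built-in re module. The sample string and variable names are made up for the example, and the + quantifier (covered under Quantifiers) is used only to group word characters into runs.

```python
import re

# Hypothetical sample text; "\t" here is a real tab character.
text = "Room 42, floor 7\tEast_wing"

digits = re.findall(r"\d", text)   # every single digit is its own match
words = re.findall(r"\w+", text)   # runs of letters, digits and _
spaces = re.findall(r"\s", text)   # the three spaces and the tab

print(digits)  # ['4', '2', '7']
print(words)   # ['Room', '42', 'floor', '7', 'East_wing']
print(spaces)  # [' ', ' ', ' ', '\t']
```

Note that the comma is matched by neither \w nor \s, so it appears in none of the three results.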
Character classes
| […] | match any of the characters in the class: [aeiou] matches vowels |
| [^…] | specifies complement set: [^aeiou] matches non-vowels (including non-letters!) |
| […-…] | specifies range: [a-e] matches abcde, [0-9a-f] matches 0123456789abcdef |
| POSIX Classes | |
| [[:alpha:]] | A-Z and a-z |
| [[:alnum:]] | digits and letters A-Z and a-z |
| [[:punct:]] | punctuation marks, e.g. ?!.,:; |
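The character classes above might be tried out as follows. This is a sketch in Python with an invented sample string. Python's re module does not support POSIX classes such as [[:alpha:]], so only the bracket forms are shown.

```python
import re

text = "cafe 42!"  # hypothetical sample string

vowels = re.findall(r"[aeiou]", text)       # any one of the listed characters
non_vowels = re.findall(r"[^aeiou]", text)  # complement: everything else
hex_digits = re.findall(r"[0-9a-f]", text)  # two ranges in one class

print(vowels)      # ['a', 'e']
print(non_vowels)  # ['c', 'f', ' ', '4', '2', '!']
print(hex_digits)  # ['c', 'a', 'f', 'e', '4', '2']
```

The complement class also picks up the space, the digits and the exclamation mark, which is the "including non-letters" caveat in practice.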
Boundaries
Boundary characters anchor a pattern to some edge, but do not match any characters themselves.
| \b | word boundaries (any edge between \w and \W) |
| \B | non-word boundaries (any position that is not a word boundary) |
| ^ | beginning of line/string |
| $ | end of line/string |
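A small Python sketch of the boundary symbols, on a made-up two-line string. One Python-specific assumption: in the re module, ^ and $ anchor to the whole string by default and only match at every line when the re.MULTILINE flag is passed.

```python
import re

text = "cat concat\ncatalog cat"  # hypothetical two-line sample

# \b...\b: "cat" only as a whole word (start offsets of each match)
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]
# \B: "cat" that begins in the middle of a word
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

first_only = re.findall(r"^\w+", text)                      # start of string
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)  # start of each line
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)    # end of each line

print(whole_words)  # [0, 19]
print(inside_word)  # [7]
print(first_only)   # ['cat']
print(line_starts)  # ['cat', 'catalog']
print(line_ends)    # ['concat', 'cat']
```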
Disjunction
| (X|Y) | X or Y: \b(cat|dog)s\b matches cats and dogs |
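The \b(cat|dog)s\b example can be checked with a short Python sketch; the sentence is invented for illustration. One thing to watch for in Python: re.findall returns the contents of the group rather than the whole match when the pattern contains a group.

```python
import re

text = "cats, dogs, cat, hotdogs and birds"  # hypothetical sample
pattern = r"\b(cat|dog)s\b"

# Whole matches: only the standalone plurals qualify.
full_matches = [m.group(0) for m in re.finditer(pattern, text)]
# findall reports what the (...) group captured, not the full match.
group_only = re.findall(pattern, text)

print(full_matches)  # ['cats', 'dogs']
print(group_only)    # ['cat', 'dog']
```

Neither cat (no trailing s) nor hotdogs (no word boundary before dog) matches.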
Quantifiers
| X* | 0 or more repetitions of X |
| X+ | 1 or more repetitions of X |
| X? | 0 or 1 instances of X |
| X{m} | exactly m instances of X |
| X{m,} | at least m instances of X |
| X{m,n} | between m and n (inclusive) instances of X |
A quantifier applies only to the single character (or character class) directly before it. Use (…) to specify quantifier scope: ab+ matches ab, abb, abbb, abbbb, …; (ab)+ matches ab, abab, ababab, …
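Quantifier scope can be verified with a brief Python sketch. re.fullmatch is used so that the whole test string has to fit the pattern; the helper name fits is made up for this example.

```python
import re

def fits(pattern: str, s: str) -> bool:
    """Return True if the entire string s matches pattern (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

# Without a group, + repeats only the b.
print(fits(r"ab+", "abbb"))    # True
print(fits(r"ab+", "abab"))    # False

# With a group, + repeats the whole "ab".
print(fits(r"(ab)+", "abab"))  # True
print(fits(r"(ab)+", "abbb"))  # False

# {m,n}: between 2 and 4 digits, inclusive.
print(fits(r"\d{2,4}", "123"))    # True
print(fits(r"\d{2,4}", "12345"))  # False
```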
Quantifiers are by default greedy. Add ? after quantifier to make it lazy:
Greedy: ^.*b matches aabaab in aabaaba (as much as possible)
Lazy: ^.*?b matches aab in aabaaba (as little as possible)
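The difference between the two patterns can be reproduced in Python on the same aabaaba string; this is a minimal sketch, and the variable names are illustrative only.

```python
import re

text = "aabaaba"

# Greedy: .* grabs as much as it can, then backs off to the last b.
greedy = re.match(r"^.*b", text).group(0)
# Lazy: .*? grabs as little as it can, stopping at the first b.
lazy = re.match(r"^.*?b", text).group(0)

print(greedy)  # aabaab
print(lazy)    # aab
```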
Special characters
The characters {}[]()^$.|*+?\ (and - inside […]) have special meaning and must be ‘escaped’ using \ to match them literally, e.g.:
\. matches period .
\\ matches the backslash \
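Escaping might look like this in Python; the sample strings are invented. Raw strings (r"…") are used so that Python itself does not interpret the backslashes. re.escape is a standard-library helper that escapes the special characters in an arbitrary string for you.

```python
import re

text = "cost: $3.50 or 3x50"  # hypothetical sample

# Unescaped . matches any character, so "3x50" is also found.
loose = re.findall(r"3.50", text)
# Escaped \$ and \. match the literal dollar sign and period only.
strict = re.findall(r"\$3\.50", text)

# \\ in the pattern matches one literal backslash.
backslashes = re.findall(r"\\", r"C:\temp\file")

# re.escape builds a pattern that matches the given text literally.
literal = re.escape("a.b*c")
matches_itself = re.fullmatch(literal, "a.b*c") is not None
matches_other = re.fullmatch(literal, "aXbbc") is not None

print(loose)           # ['3.50', '3x50']
print(strict)          # ['$3.50']
print(len(backslashes))  # 2
print(matches_itself, matches_other)  # True False
```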