Regular Expressions Cheat Sheet
What Are Regular Expressions?
Regular expressions (regex or regexp) are patterns used to match, search, and manipulate text. They are one of the most powerful tools in a developer’s toolkit, supported in virtually every programming language, text editor, and command-line tool. A well-crafted regex can replace dozens of lines of string-parsing code with a single expression.
Despite their power, regular expressions have a reputation for being difficult to read and write. This guide breaks down the syntax into manageable pieces and provides practical patterns you can use immediately.
Basic Character Matching
The simplest regex patterns match literal characters:
- hello matches the exact string “hello”
- . matches any single character except a newline
- \. matches a literal period (the backslash escapes the special meaning)
Character classes let you match one character from a set:
- [abc] matches a, b, or c
- [a-z] matches any lowercase letter
- [A-Za-z0-9] matches any alphanumeric character
- [^abc] matches any character except a, b, or c
Predefined character classes provide shortcuts:
- \d matches any digit (equivalent to [0-9])
- \w matches any word character (equivalent to [A-Za-z0-9_])
- \s matches any whitespace character (space, tab, newline)
- \D, \W, \S match the opposite of their lowercase versions
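The shorthand classes above can be tried out with a short Python sketch. The sample string is made up for illustration, and Python's re module is only one of many engines that support these classes.

```python
import re

# Hypothetical sample text, chosen to contain digits, an underscore, and punctuation.
text = "Order 66 shipped to bay_7 at 09:30"

# \d+ matches runs of digits.
digits = re.findall(r"\d+", text)

# \w+ matches runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", text)

# [^\w\s] matches one character that is neither a word character nor whitespace.
punctuation = re.findall(r"[^\w\s]", text)

print(digits)       # ['66', '7', '09', '30']
print(words)        # ['Order', '66', 'shipped', 'to', 'bay_7', 'at', '09', '30']
print(punctuation)  # [':']
```

Note that the underscore in bay_7 counts as a word character, which is why \w+ keeps it in one piece.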
Quantifiers
Quantifiers specify how many times a pattern should repeat:
- * matches zero or more times
- + matches one or more times
- ? matches zero or one time (makes the preceding element optional)
- {3} matches exactly 3 times
- {2,5} matches between 2 and 5 times
- {3,} matches 3 or more times
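As a quick way to see each quantifier in action, here is a minimal Python sketch. The helper name matches is hypothetical, and it uses re.fullmatch so that the whole string has to fit the pattern.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of text fits the pattern (illustrative helper)."""
    return re.fullmatch(pattern, text) is not None

print(matches(r"ab*", "a"))          # True:  * allows zero b's
print(matches(r"ab+", "a"))          # False: + needs at least one b
print(matches(r"colou?r", "color"))  # True:  ? makes the u optional
print(matches(r"colou?r", "colour")) # True
print(matches(r"\d{3}", "1234"))     # False: exactly three digits required
print(matches(r"\d{2,5}", "1234"))   # True:  between two and five digits
print(matches(r"\d{3,}", "12"))      # False: at least three digits required
```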
By default, quantifiers are greedy, meaning they match as much text as possible. Adding ? after a quantifier makes it lazy, matching as little as possible:
- .* greedily matches everything
- .*? lazily matches as little as possible
This distinction matters when extracting content between delimiters. For example, given the text <b>one</b><b>two</b>, the pattern <b>.*</b> greedily matches the entire string, while <b>.*?</b> stops at the first closing tag, so its first match is just <b>one</b>.
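The greedy and lazy behaviour on that sample text can be checked with a small Python sketch. It assumes Python's re module; other engines should behave the same way for these two patterns.

```python
import re

text = "<b>one</b><b>two</b>"

# Greedy: .* runs to the end, then backs off to the LAST </b>.
greedy = re.findall(r"<b>.*</b>", text)

# Lazy: .*? expands one character at a time and stops at the FIRST </b>.
lazy = re.findall(r"<b>.*?</b>", text)

# re.search returns only the first match, which is <b>one</b> for the lazy form.
first_lazy = re.search(r"<b>.*?</b>", text).group()

print(greedy)      # ['<b>one</b><b>two</b>']
print(lazy)        # ['<b>one</b>', '<b>two</b>']
print(first_lazy)  # <b>one</b>
```

Because findall keeps scanning after each match, the lazy pattern returns both tags separately, while the greedy pattern swallows them in a single match.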
Anchors and Boundaries
Anchors match positions rather than characters:
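- ^ matches the start of the string (or the start of each line in multiline mode)
- $ matches the end of the string (or the end of each line in multiline mode)
- \b matches a word boundary (between a word character and a non-word character)
- \B matches a non-word boundary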
These are essential for precise matching. \bcat\b matches “cat” as a whole word but not “category” or “concatenate.” Without word boundaries, you would get false matches in longer words.
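A short Python sketch makes the difference visible. The sample sentence is invented for illustration.

```python
import re

# Hypothetical sample sentence containing "cat" both alone and inside longer words.
text = "The cat sat; category and concatenate do not count, but cat. does"

# With word boundaries: only the standalone word matches.
whole_words = re.findall(r"\bcat\b", text)

# Without boundaries: "cat" is also found inside "category" and "concatenate".
anywhere = re.findall(r"cat", text)

print(len(whole_words))  # 2
print(len(anywhere))     # 4
```

The trailing period in "cat." does not block the match, because a period is a non-word character and therefore forms a boundary.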
Groups and Alternation
Groups use parentheses to create sub-patterns:
- (abc) creates a capturing group that matches “abc” and remembers it
- (?:abc) creates a non-capturing group (matches but does not remember)
- (a|b) alternation matches “a” or “b”
Captured groups can be referenced later:
- In search-and-replace, $1 or \1 refers to the first captured group
- Backreferences like \1 within the pattern itself match the same text that was captured
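Both uses of captured groups can be sketched in Python. Python's re module writes replacement references as \1 (or \g<1>), whereas the $1 form belongs to other engines such as JavaScript. The sample strings are invented for illustration.

```python
import re

# Replacement reference: swap "last, first" into "first last".
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", "Lovelace, Ada")

# Backreference inside the pattern: \1 must repeat exactly what group 1 captured,
# which makes this a simple detector for accidentally doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test case")

print(swapped)  # Ada Lovelace
print(doubled)  # ['is', 'test']
```

Because the pattern contains exactly one capturing group, findall returns the captured word rather than the whole doubled phrase.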
Common Practical Patterns
Here are regex patterns for everyday tasks:
- Email (simple): [\w.+-]+@[\w-]+\.[\w.-]+
- URL: https?://[\w.-]+(?:/[\w./-]*)?
- Phone (US): \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
- Date (YYYY-MM-DD): \d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
- IPv4 address: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
- Hex color: #(?:[0-9a-fA-F]{3}){1,2}
- HTML tag: <([a-z]+)(?:\s[^>]*)?>.*?</\1>
- Leading/trailing whitespace: ^\s+|\s+$
Note that simple regex patterns for emails and URLs work for common cases but do not cover every edge case defined in the respective RFCs. For production validation, consider using dedicated libraries.
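As a rough illustration of how a few of these patterns might be used, the Python sketch below wraps three of them in small helpers. The names DATE, HEX_COLOR, EDGE_WHITESPACE, is_date, and trim are hypothetical. The date pattern checks only the shape of the string, so an impossible date such as 2024-02-31 still passes.

```python
import re

# Patterns copied from the list above; the constant names are illustrative.
DATE = re.compile(r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}){1,2}")
EDGE_WHITESPACE = re.compile(r"^\s+|\s+$")

def is_date(value: str) -> bool:
    """Shape check only: does not verify that the day exists in that month."""
    return DATE.fullmatch(value) is not None

def trim(value: str) -> str:
    """Remove leading and trailing whitespace, similar to str.strip()."""
    return EDGE_WHITESPACE.sub("", value)

print(is_date("2024-02-29"))                     # True
print(is_date("2024-13-01"))                     # False: month 13 is rejected
print(is_date("2024-02-31"))                     # True: the shape is valid even though the date is not
print(HEX_COLOR.fullmatch("#1a2B3c") is not None)  # True
print(HEX_COLOR.fullmatch("#1a2b") is not None)    # False: four hex digits
print(repr(trim("  hello world \n")))            # 'hello world'
```

Using fullmatch rather than search matters for validation: search would accept any longer string that merely contains a valid-looking date.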
Lookaheads and Lookbehinds
Lookahead and lookbehind assertions match a position based on what comes before or after, without including that text in the match:
- (?=abc) positive lookahead: matches a position followed by “abc”
- (?!abc) negative lookahead: matches a position not followed by “abc”
- (?<=abc) positive lookbehind: matches a position preceded by “abc”
- (?<!abc) negative lookbehind: matches a position not preceded by “abc”
These are useful for extracting text adjacent to specific patterns without capturing the delimiter.
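A minimal Python sketch of that idea follows, using invented sample strings. The lookbehind requires a dollar sign before the number and the lookahead requires a px suffix after it, yet neither delimiter ends up in the result.

```python
import re

# Hypothetical sample strings.
receipt = "Total: $42.50, discount: $5, code: 1234"
css = "10px 20em 30px"

# Positive lookbehind: digits preceded by "$"; the "$" itself is not part of the match.
prices = re.findall(r"(?<=\$)\d+(?:\.\d+)?", receipt)

# Positive lookahead: digits followed by "px"; the "px" itself is not part of the match.
pixel_values = re.findall(r"\d+(?=px)", css)

print(prices)        # ['42.50', '5']
print(pixel_values)  # ['10', '30']
```

The bare number 1234 is skipped because no dollar sign precedes it, and 20 is skipped because it is followed by em rather than px.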
Regex Flags
Flags modify how the pattern is applied:
- g (global): match all occurrences, not just the first
- i (case-insensitive): ignore case differences
- m (multiline): ^ and $ match line starts and ends, not just string starts and ends
- s (dotall): . matches newline characters as well
- u (unicode): enable full Unicode support
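The single-letter names above appear to follow the JavaScript-style convention, and other engines spell the same options differently. As a hedged sketch, here is how the same behaviours look in Python's re module, where there is no g flag because findall and finditer already return every match. The sample text is invented.

```python
import re

# Hypothetical three-line sample text.
text = "Error: disk full\nerror: retry\nOK"

# i -> re.IGNORECASE; findall plays the role of the g flag.
all_errors = re.findall(r"error", text, flags=re.IGNORECASE)

# m -> re.MULTILINE: ^ matches at the start of every line, not just the string.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
string_start = re.findall(r"^\w+", text)

# s -> re.DOTALL: . also matches the newline between "full" and "error".
without_dotall = re.search(r"full.error", text)
with_dotall = re.search(r"full.error", text, flags=re.DOTALL)

# u: Python 3 str patterns are Unicode-aware by default; re.ASCII narrows \w, \d, \s.

print(all_errors)              # ['Error', 'error']
print(line_starts)             # ['Error', 'error', 'OK']
print(string_start)            # ['Error']
print(without_dotall is None)  # True
print(with_dotall is not None) # True
```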
Tips for Writing Better Regex
Follow these practices to write maintainable patterns:
- Start simple and iterate: Build your pattern piece by piece, testing each addition.
- Use a testing tool: A regex tester with live highlighting shows exactly what your pattern matches as you type.
- Be specific: Prefer \d over . when you expect digits. Overly permissive patterns lead to false matches.
- Comment complex patterns: Many languages support verbose regex mode where you can add comments and whitespace.
- Avoid catastrophic backtracking: Nested quantifiers like (a+)+ can cause exponential matching time on certain inputs. Keep your patterns efficient.
- Know when not to use regex: Parsing HTML, JSON, or other structured formats is better handled by dedicated parsers.
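To illustrate the tip about commenting complex patterns, here is a sketch of the US phone pattern from earlier written in Python's verbose mode (re.VERBOSE). The constant name PHONE and the labels in the comments are illustrative; other languages offer similar modes under different names.

```python
import re

# In verbose mode, whitespace in the pattern is ignored and # starts a comment,
# except inside a character class or when escaped.
PHONE = re.compile(
    r"""
    \(?\d{3}\)?   # area code, optionally wrapped in parentheses
    [-.\s]?       # optional separator: dash, dot, or whitespace
    \d{3}         # next three digits
    [-.\s]?       # optional separator
    \d{4}         # last four digits
    """,
    re.VERBOSE,
)

print(PHONE.fullmatch("(555) 123-4567") is not None)  # True
print(PHONE.fullmatch("555.123.4567") is not None)    # True
print(PHONE.fullmatch("5551234567") is not None)      # True
print(PHONE.fullmatch("555-12-34567") is not None)    # False
```

The compiled pattern behaves the same as the one-line version; only the readability changes.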