Regular Expressions Cheat Sheet

2026-04-06

What Are Regular Expressions?

Regular expressions (regex or regexp) are patterns used to match, search, and manipulate text. They are one of the most powerful tools in a developer’s toolkit, supported in virtually every programming language, text editor, and command-line tool. A well-crafted regex can replace dozens of lines of string-parsing code with a single expression.

Despite their power, regular expressions have a reputation for being difficult to read and write. This guide breaks down the syntax into manageable pieces and provides practical patterns you can use immediately.

Basic Character Matching

The simplest regex patterns match literal characters:

hello matches the exact string “hello”
. matches any single character except a newline
\. matches a literal period (the backslash escapes the special meaning)

Character classes let you match one character from a set:

[abc] matches a, b, or c
[a-z] matches any lowercase letter
[A-Za-z0-9] matches any alphanumeric character
[^abc] matches any character except a, b, or c

Predefined character classes provide shortcuts:

\d matches any digit (equivalent to [0-9] for ASCII text)
\w matches any word character (equivalent to [A-Za-z0-9_] for ASCII text; Unicode-aware engines also accept other letters and digits)
\s matches any whitespace character (space, tab, newline)
\D, \W, \S match the opposite of their lowercase versions
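
As a rough illustration of the classes above, here is a minimal sketch using Python's built-in re module. The sample string and variable names are made up for the example.

```python
import re

text = "cab 42."  # hypothetical sample string

from_set = re.findall(r"[abc]", text)      # one character from the set
not_in_set = re.findall(r"[^abc]", text)   # anything outside the set
digits = re.findall(r"\d", text)           # predefined digit class
spaces = re.findall(r"\s", text)           # predefined whitespace class
any_char = re.findall(r".", text)          # every character (no newline here)
literal_dot = re.findall(r"\.", text)      # escaped period only

print(from_set)     # ['c', 'a', 'b']
print(not_in_set)   # [' ', '4', '2', '.']
print(digits)       # ['4', '2']
print(spaces)       # [' ']
print(len(any_char))  # 7
print(literal_dot)  # ['.']
```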

Quantifiers

Quantifiers specify how many times a pattern should repeat:

* matches zero or more times
+ matches one or more times
? matches zero or one time (makes the preceding element optional)
{3} matches exactly 3 times
{2,5} matches between 2 and 5 times
{3,} matches 3 or more times
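
The quantifiers above can be tried out with a short Python sketch. The sample strings are invented for illustration, and the \b word boundaries are added only so that each number is matched as a whole.

```python
import re

star = re.findall(r"ab*", "a ab abbb")           # zero or more b
plus = re.findall(r"ab+", "a ab abbb")           # one or more b
optional = re.findall(r"colou?r", "color colour colouur")  # u is optional
exact = re.findall(r"\b\d{3}\b", "12 345 6789")  # exactly 3 digits
ranged = re.findall(r"\b\d{2,5}\b", "1 12 12345 1234567")  # 2 to 5 digits
at_least = re.findall(r"\b\d{3,}\b", "12 345 6789")        # 3 or more digits

print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['color', 'colour']
print(exact)     # ['345']
print(ranged)    # ['12', '12345']
print(at_least)  # ['345', '6789']
```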

By default, quantifiers are greedy, meaning they match as much text as possible. Adding ? after a quantifier makes it lazy, matching as little as possible:

.* greedily matches everything
.*? lazily matches as little as possible

This distinction matters when extracting content between delimiters. For example, given the text <b>one</b><b>two</b>, the pattern <b>.*</b> greedily matches the entire string, while <b>.*?</b> matches only the first element, <b>one</b>.
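
A small Python sketch can make the greedy and lazy behavior concrete. The tagged sample text is an assumption chosen for the demonstration; a capturing group is used to pull out the content between the delimiters.

```python
import re

text = "<b>one</b><b>two</b>"  # assumed sample input

# Greedy: .* runs to the end, then backs up to the LAST closing tag.
greedy = re.search(r"<b>(.*)</b>", text).group(1)

# Lazy: .*? stops at the FIRST closing tag it can reach.
lazy = re.search(r"<b>(.*?)</b>", text).group(1)

# With findall, the lazy form yields each element separately.
each = re.findall(r"<b>(.*?)</b>", text)

print(greedy)  # one</b><b>two
print(lazy)    # one
print(each)    # ['one', 'two']
```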

Anchors and Boundaries

Anchors match positions rather than characters:

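^ matches the start of the string (or of each line in multiline mode)
$ matches the end of the string (or of each line in multiline mode)
\b matches a word boundary (between a word character and a non-word character, or at the start or end of the text next to a word character)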
\B matches a non-word boundary

These are essential for precise matching. \bcat\b matches “cat” as a whole word but not “category” or “concatenate.” Without word boundaries, you would get false matches in longer words.
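
The effect of word boundaries can be checked with a brief Python sketch. The sentence is a made-up example.

```python
import re

text = "The cat sat near a category of concatenated cats."

# Without boundaries, "cat" is found inside longer words too.
loose = re.findall(r"cat", text)

# With \b on both sides, only the standalone word matches.
whole_word = re.findall(r"\bcat\b", text)

print(len(loose))   # 4
print(whole_word)   # ['cat']
```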

Groups and Alternation

Groups use parentheses to create sub-patterns:

(abc) creates a capturing group that matches “abc” and remembers it
(?:abc) creates a non-capturing group (matches but does not remember)
(a|b) alternation matches “a” or “b”

Captured groups can be referenced later:

In search-and-replace, $1 or \1 (depending on the tool) refers to the first captured group
Backreferences like \1 within the pattern itself match the same text that was captured
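
The following Python sketch shows one way groups, backreferences, and replacement references fit together. Python's re module uses the \1 form in replacement strings; the sample inputs are invented.

```python
import re

# Backreference inside the pattern: find a word repeated twice in a row.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")

# Group references in the replacement: reorder YYYY-MM-DD to DD/MM/YYYY.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2026-04-06")

# Capturing groups change what findall returns; non-capturing groups do not.
captured = re.findall(r"(gr)(a|e)y", "gray grey")
uncaptured = re.findall(r"gr(?:a|e)y", "gray grey")

print(doubled.group())   # is is
print(doubled.group(1))  # is
print(reordered)         # 06/04/2026
print(captured)          # [('gr', 'a'), ('gr', 'e')]
print(uncaptured)        # ['gray', 'grey']
```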

Common Practical Patterns

Here are regex patterns for everyday tasks:

Email (simple): [\w.+-]+@[\w-]+\.[\w.-]+
URL: https?://[\w.-]+(?:/[\w./-]*)?
Phone (US): \(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}
Date (YYYY-MM-DD): \d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])
IPv4 address: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Hex color: #(?:[0-9a-fA-F]{3}){1,2}
HTML tag: <([a-z]+)(?:\s[^>]*)?>.*?</\1>
Leading/trailing whitespace: ^\s+|\s+$

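Note that simple regex patterns for emails and URLs work for common cases but do not cover every edge case defined in the respective RFCs. Likewise, the IPv4 pattern only checks the shape of the address, so it also accepts out-of-range values such as 999.999.999.999. For production validation, consider using dedicated libraries.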
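
As a sketch of how a few of these patterns might be used for validation in Python, the example below applies them with fullmatch so the whole input must match. The helper name is_valid_ipv4 is hypothetical, and the standard-library ipaddress module stands in for the "dedicated library" idea.

```python
import ipaddress
import re

DATE = re.compile(r"\d{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12]\d|3[01])")
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3}){1,2}")
IPV4 = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def is_valid_ipv4(value: str) -> bool:
    """Hypothetical helper: regex for the shape, ipaddress for the range."""
    if not IPV4.fullmatch(value):
        return False
    try:
        ipaddress.IPv4Address(value)
    except ValueError:
        return False
    return True


print(bool(DATE.fullmatch("2026-04-06")))       # True
print(bool(DATE.fullmatch("2026-13-01")))       # False (month 13)
print(bool(HEX_COLOR.fullmatch("#1a2B3c")))     # True
print(bool(HEX_COLOR.fullmatch("#ffff")))       # False (4 hex digits)
print(bool(IPV4.fullmatch("999.999.999.999")))  # True (shape only)
print(is_valid_ipv4("999.999.999.999"))         # False
print(is_valid_ipv4("192.168.0.1"))             # True
```

Using fullmatch rather than search avoids accepting a valid fragment embedded in otherwise invalid input.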

Lookaheads and Lookbehinds

Lookahead and lookbehind assertions match a position based on what comes before or after, without including that text in the match:

(?=abc) positive lookahead: matches a position followed by “abc”
(?!abc) negative lookahead: matches a position not followed by “abc”
(?<=abc) positive lookbehind: matches a position preceded by “abc”
(?<!abc) negative lookbehind: matches a position not preceded by “abc”

These are useful for extracting text adjacent to specific patterns without capturing the delimiter.
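
The four assertions can be sketched in Python as follows. The sample strings are invented, and note that Python's re module requires lookbehind patterns to have a fixed width.

```python
import re

files = "notes.txt report.pdf todo.txt"         # hypothetical sample data
prices = "Total: $45, discount: $5, items: 3"   # hypothetical sample data

# Positive lookahead: names followed by ".txt", without the extension.
txt_stems = re.findall(r"\w+(?=\.txt)", files)

# Negative lookahead: file names whose extension is not "txt".
non_txt = re.findall(r"\b\w+\.(?!txt\b)\w+", files)

# Positive lookbehind: digits preceded by "$", without the "$" itself.
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)

# Negative lookbehind: whole numbers not preceded by "$".
plain_numbers = re.findall(r"(?<!\$)\b\d+", prices)

print(txt_stems)       # ['notes', 'todo']
print(non_txt)         # ['report.pdf']
print(dollar_amounts)  # ['45', '5']
print(plain_numbers)   # ['3']
```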

Regex Flags

Flags modify how the pattern is applied. Names and availability vary by engine; the letters below are the common shorthand:

g (global): match all occurrences, not just the first
i (case-insensitive): ignore case differences
m (multiline): ^ and $ match line starts and ends, not just string starts and ends
s (dotall): . matches newline characters as well
u (unicode): enable full Unicode support
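
In Python, these options are spelled as constants on the re module rather than as single letters. There is no global flag: re.search returns the first match, while re.findall and re.finditer return all of them. Unicode matching is already the default for str patterns. A minimal sketch with made-up log text:

```python
import re

text = "Error: disk full\nerror: retry\nOK"  # hypothetical sample text

first = re.search(r"error", text, re.IGNORECASE).group()  # first match only
every = re.findall(r"error", text, re.IGNORECASE)         # all matches

string_start = re.findall(r"^\w+", text)                # ^ = start of string
line_starts = re.findall(r"^\w+", text, re.MULTILINE)   # ^ = start of each line

dot_default = re.search(r"full.*retry", text)           # . stops at newline
dot_all = re.search(r"full.*retry", text, re.DOTALL)    # . crosses newline

print(first)         # Error
print(every)         # ['Error', 'error']
print(string_start)  # ['Error']
print(line_starts)   # ['Error', 'error', 'OK']
print(dot_default)   # None
print(dot_all is not None)  # True
```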

Tips for Writing Better Regex

Follow these practices to write maintainable patterns:

Start simple and iterate: Build your pattern piece by piece, testing each addition.
Use a testing tool: A regex tester with live highlighting shows exactly what your pattern matches as you type.
Be specific: Prefer \d over . when you expect digits. Overly permissive patterns lead to false matches.
Comment complex patterns: Many languages support verbose regex mode where you can add comments and whitespace.
Avoid catastrophic backtracking: Nested quantifiers like (a+)+ can cause exponential matching time on certain inputs, typically long strings that almost match. Prefer a single quantifier (a+ instead of (a+)+) wherever it matches the same text.
Know when not to use regex: Parsing HTML, JSON, or other structured formats is better handled by dedicated parsers.
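
As one possible illustration of the commenting tip, Python's re.VERBOSE flag ignores whitespace in the pattern and allows # comments. The sketch below rewrites a YYYY-MM-DD pattern that way; the group names are chosen for the example.

```python
import re

DATE = re.compile(
    r"""
    (?P<year>\d{4})                 # four-digit year
    -
    (?P<month>0[1-9]|1[0-2])        # month 01 to 12
    -
    (?P<day>0[1-9]|[12]\d|3[01])    # day 01 to 31
    """,
    re.VERBOSE,
)

match = DATE.fullmatch("2026-04-06")
print(match.group("year"), match.group("month"), match.group("day"))
# 2026 04 06
```

In verbose mode, a literal space or # in the pattern has to be escaped or placed inside a character class.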
