Why Every Developer Needs Regex
Regular expressions (regex) are one of the most powerful tools in a developer's toolkit. Whether you are validating form input, parsing log files, extracting data from HTML, or performing complex search-and-replace operations, regex lets you describe text patterns in a concise, expressive syntax that virtually every programming language supports, with small differences between flavors. The examples in this guide use JavaScript syntax.
Despite their reputation for being cryptic, regular expressions follow a logical set of rules. Once you understand the building blocks — metacharacters, character classes, quantifiers, anchors, groups, and lookarounds — you can read and write even complex patterns with confidence. This regex cheat sheet covers every essential concept with practical examples you can copy and use immediately.
Regex Basics: Metacharacters and Literals
At its simplest, a regex pattern is a string of literal characters. The pattern hello matches the exact text "hello". But the real power of regex comes from metacharacters — special characters that have a meaning beyond their literal value.
| Metacharacter | Meaning | Example | Matches |
|---|---|---|---|
| . | Any character except newline | h.t | hat, hot, hit, h9t |
| ^ | Start of string (or line with m flag) | ^Hello | "Hello world" but not "Say Hello" |
| $ | End of string (or line with m flag) | world$ | "Hello world" but not "world peace" |
| * | Zero or more of preceding element | ab*c | ac, abc, abbc, abbbc |
| + | One or more of preceding element | ab+c | abc, abbc, abbbc (not ac) |
| ? | Zero or one of preceding element | colou?r | color, colour |
| {n,m} | Between n and m of preceding element | a{2,4} | aa, aaa, aaaa |
| [ ] | Character class: match any one character inside | [aeiou] | Any single vowel |
| \ | Escape a metacharacter to match it literally | \. | A literal dot |
| \| | Alternation (OR) | cat\|dog | cat or dog |
| ( ) | Grouping and capturing | (ab)+ | ab, abab, ababab |
To match a metacharacter literally, escape it with a backslash. For example, \. matches a literal period, \* matches a literal asterisk, and \\ matches a literal backslash.
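When the text to match comes from user input, escaping by hand is error-prone. The sketch below shows one common way to escape every metacharacter before building a pattern; `escapeRegExp` is a hypothetical helper name, not a built-in function.

```javascript
// Hypothetical helper: escape every regex metacharacter in a plain string
// so the string can be embedded in a pattern and matched literally.
function escapeRegExp(text) {
  // In the replacement, "$&" stands for the matched character,
  // so "\\$&" writes a backslash in front of it.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "3.14 (approx)";
const literalPattern = new RegExp(escapeRegExp(userInput));

console.log(escapeRegExp(userInput));                    // 3\.14 \(approx\)
console.log(literalPattern.test("pi is 3.14 (approx)")); // true
console.log(literalPattern.test("pi is 3x14 (approx)")); // false
```

Without the escaping step, the dot in the input would match any character and the parentheses would form a group, so the second string would also match.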
Character Classes and Shorthand
Character classes let you match any one character from a defined set. You create a custom class by placing characters inside square brackets, or you can use built-in shorthand classes that represent common character groups.
| Shorthand | Equivalent | Meaning | Example Match |
|---|---|---|---|
| \d | [0-9] | Any digit | 0, 5, 9 |
| \D | [^0-9] | Any non-digit | a, #, space |
| \w | [a-zA-Z0-9_] | Any word character | a, Z, 3, _ |
| \W | [^a-zA-Z0-9_] | Any non-word character | !, @, space |
| \s | [ \t\n\r\f\v] (plus Unicode spaces in JavaScript) | Any whitespace | space, tab, newline |
| \S | [^ \t\n\r\f\v] | Any non-whitespace | a, 1, ! |
| [a-z] | Custom range | Any lowercase letter | a, m, z |
| [A-Z] | Custom range | Any uppercase letter | A, M, Z |
| [0-9] | Same as \d | Any digit | 0, 5, 9 |
| [^abc] | Negated class | Any character except a, b, or c | d, 1, ! |
You can combine ranges and individual characters in a single class. For example, [a-zA-Z0-9._%+-] matches any letter, digit, period, underscore, percent, plus, or hyphen — the common characters allowed in the local part of an email address.
// Match a hex color code: # followed by 3 or 6 hex digits
/#([0-9a-fA-F]{3}){1,2}\b/
// Match a US ZIP code: 5 digits, optionally followed by -4 digits
/^\d{5}(-\d{4})?$/
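As a quick check, the two patterns above can be run against sample input. This is a minimal sketch; the sample strings and variable names are made up for illustration.

```javascript
// The hex color pattern from above, with the g flag to collect every match.
const hexColor = /#([0-9a-fA-F]{3}){1,2}\b/g;
const css = "color: #FF8C00; background: #f0f; border-color: #12345;";

// "#12345" is rejected: after "#123" the next character is another
// word character, so the \b word boundary cannot match there.
const colors = css.match(hexColor);
console.log(colors); // [ '#FF8C00', '#f0f' ]

// The ZIP code pattern is anchored, so it validates the whole string.
const zipCode = /^\d{5}(-\d{4})?$/;
const zipResults = ["12345", "12345-6789", "1234", "12345-67"].map(
  (candidate) => zipCode.test(candidate)
);
console.log(zipResults); // [ true, true, false, false ]
```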
Quantifiers: How Many to Match
Quantifiers control how many times the preceding element must occur for a match to succeed. By default, quantifiers are greedy — they match as many characters as possible. Add a ? after the quantifier to make it lazy (match as few as possible).
| Quantifier | Meaning | Greedy Example | Lazy Version |
|---|---|---|---|
| * | 0 or more | a.*b matches all of "aXYZbXb" | a.*?b matches only "aXYZb" in "aXYZbXb" |
| + | 1 or more | \d+ matches "123" in "abc123def" | \d+? matches "1" |
| ? | 0 or 1 | https? matches "http" or "https" | ?? (rarely used) |
| {n} | Exactly n | \d{4} matches "2025" | N/A (exact) |
| {n,} | n or more | \w{3,} matches runs of 3+ word characters | \w{3,}? stops after 3 unless the rest of the pattern forces more |
| {n,m} | Between n and m | [a-z]{2,5} matches 2 to 5 lowercase letters | [a-z]{2,5}? stops after 2 unless the rest of the pattern forces more |
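Understanding greedy vs. lazy matching is essential for writing correct patterns. When scraping HTML, for example, <div>.*</div> (greedy) matches from the first <div> to the last </div> it can reach. Because . does not match newlines by default, that is the last </div> on the same line, or the last one on the whole page if the s flag is set. Using <div>.*?</div> (lazy) stops at the first </div> instead, which matches each individual block as long as the divs are not nested.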
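The difference is easy to see on a small input. The following sketch uses a made-up one-line HTML string and assumes the divs are not nested.

```javascript
const html = "<div>first</div><div>second</div>";

// Greedy: .* runs to the end of the line, then backtracks to the LAST </div>.
const greedyMatches = html.match(/<div>.*<\/div>/g);
console.log(greedyMatches); // [ '<div>first</div><div>second</div>' ]

// Lazy: .*? expands one character at a time and stops at the FIRST </div>.
const lazyMatches = html.match(/<div>.*?<\/div>/g);
console.log(lazyMatches); // [ '<div>first</div>', '<div>second</div>' ]

// The same effect on the a...b example.
const greedyAb = "aXYZbXb".match(/a.*b/)[0];
const lazyAb = "aXYZbXb".match(/a.*?b/)[0];
console.log(greedyAb); // aXYZbXb
console.log(lazyAb);   // aXYZb
```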
Anchors and Boundaries
Anchors do not match characters — they match positions in the string. They are zero-width assertions that constrain where a match can occur.
- ^: Matches the start of the string. With the m flag, matches the start of each line.
- $: Matches the end of the string. With the m flag, matches the end of each line.
- \b: Word boundary. Matches the position between a word character (\w) and a non-word character, or between a word character and the start or end of the string.
- \B: Non-word boundary. Matches any position that is not a word boundary.
// \b prevents partial matches
/\bcat\b/ matches "cat" but NOT "concatenate" or "category"
// ^ and $ together ensure the ENTIRE string matches
/^\d{3}-\d{3}-\d{4}$/ validates "555-123-4567" as a complete phone format
// \B matches inside a word
/\Bcat\B/ matches "cat" in "concatenate" but NOT in "cat" or "category"
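The anchor behavior described above can be verified directly. This is a small sketch with made-up sample strings.

```javascript
// \b: match "cat" only as a whole word.
const wholeWord = /\bcat\b/;
const wholeWordResults = ["the cat sat", "concatenate", "category"].map(
  (text) => wholeWord.test(text)
);
console.log(wholeWordResults); // [ true, false, false ]

// \B: match "cat" only when it has word characters on both sides.
const insideWord = /\Bcat\B/;
const insideWordResults = ["concatenate", "cat", "category"].map(
  (text) => insideWord.test(text)
);
console.log(insideWordResults); // [ true, false, false ]

// ^ and $ together: the entire string must have the phone format.
const phone = /^\d{3}-\d{3}-\d{4}$/;
const phoneResults = ["555-123-4567", "call 555-123-4567"].map(
  (text) => phone.test(text)
);
console.log(phoneResults); // [ true, false ]

// With the m flag, ^ and $ also match at the start and end of each line.
const lines = "first\nsecond";
const withoutM = lines.match(/^\w+$/g);
const withM = lines.match(/^\w+$/gm);
console.log(withoutM); // null
console.log(withM);    // [ 'first', 'second' ]
```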
Groups and Capturing
Groups bundle part of a pattern together. Parentheses create a capturing group that saves the matched text for later use — in backreferences, replacements, or programmatic extraction.
- (abc): Capturing group. Matches "abc" and stores it as group 1.
- (?:abc): Non-capturing group. Groups the pattern but does not store the match.
- (?<name>abc): Named capturing group. Stores the match under the label "name".
- \1, \2: Backreferences. Match the same text that was captured by group 1, group 2, etc.
// Capturing group: extract area code from phone number
/\((\d{3})\)\s\d{3}-\d{4}/
// Input: "(555) 123-4567" -> Group 1 captures "555"
// Named group: extract date parts
/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
// Input: "2025-08-15" -> year="2025", month="08", day="15"
// Backreference: find repeated words
/\b(\w+)\s+\1\b/
// Matches "the the", "is is", "hello hello"
// Non-capturing group: match file extensions without capturing
/\.(?:jpg|png|gif|webp)$/i
// Matches ".jpg", ".PNG", ".gif" — but does not capture the extension
Groups are especially useful in find-and-replace operations. You can reference captured groups in the replacement string using $1, $2, or $<name>. Our Regex Tester supports a replace mode where you can test patterns with backreferences like $1 and $2 in the replacement string.
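Here is a minimal sketch of captured groups used from JavaScript, both for extraction and inside replacement strings. The sample strings are made up for illustration.

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Extraction: named groups are exposed on match.groups.
const dateMatch = "Released on 2025-08-15".match(isoDate);
const { year, month, day } = dateMatch.groups;
console.log(year, month, day); // 2025 08 15

// Replacement with numbered groups: reorder the date parts.
const usDate = "2025-08-15".replace(/(\d{4})-(\d{2})-(\d{2})/, "$2/$3/$1");
console.log(usDate); // 08/15/2025

// Replacement with named groups.
const euDate = "2025-08-15".replace(isoDate, "$<day>.$<month>.$<year>");
console.log(euDate); // 15.08.2025

// Backreference \1 in the pattern, $1 in the replacement:
// collapse each doubled word into a single copy.
const deduped = "the the cat is is here".replace(/\b(\w+)\s+\1\b/g, "$1");
console.log(deduped); // the cat is here
```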
Lookahead and Lookbehind
Lookahead and lookbehind are zero-width assertions that check whether a pattern exists ahead of or behind the current position without including those characters in the match. They are sometimes called "lookarounds."
- (?=...): Positive lookahead. Asserts that what follows matches the pattern.
- (?!...): Negative lookahead. Asserts that what follows does not match the pattern.
- (?<=...): Positive lookbehind. Asserts that what precedes matches the pattern.
- (?<!...): Negative lookbehind. Asserts that what precedes does not match the pattern.
// Positive lookahead: match a number followed by "px"
/\d+(?=px)/ matches "16" in "16px" but not "16em"
// Negative lookahead: match a number NOT followed by "px"
/\d+(?!\d|px)/ matches "16" in "16em" but nothing in "16px"
// Without the \d| alternative, /\d+(?!px)/ backtracks and matches the "1" in "16px"
// Positive lookbehind: match a number preceded by "$"
/(?<=\$)\d+/ matches "50" in "$50" but not "50" in "50 items"
// Negative lookbehind: match a number NOT preceded by "$"
/(?<![$\d])\d+/ matches "50" in "50 items" but nothing in "$50"
// Without \d in the class, /(?<!\$)\d+/ matches the "0" in "$50"
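Negative lookarounds are easy to get subtly wrong, because the engine will shorten or shift the match until the assertion passes. The sketch below runs all four lookaround kinds on made-up sample strings and shows that pitfall. It assumes a JavaScript engine with lookbehind support (ES2018 or later).

```javascript
const widths = "16px 24em 32px";

// Positive lookahead: numbers followed by "px".
const pxValues = widths.match(/\d+(?=px)/g);
console.log(pxValues); // [ '16', '32' ]

// Negative lookahead: numbers NOT followed by "px". The \d alternative
// stops the engine from giving back a digit to satisfy the assertion.
const nonPxValues = widths.match(/\d+(?!\d|px)/g);
console.log(nonPxValues); // [ '24' ]

// Pitfall: without \d, "16" is rejected but the shorter "1" passes,
// because "1" is followed by "6p", not "px".
const naiveLookahead = "16px".match(/\d+(?!px)/)[0];
console.log(naiveLookahead); // 1

const text = "$50 for 20 items";

// Positive lookbehind: numbers preceded by "$".
const prices = text.match(/(?<=\$)\d+/g);
console.log(prices); // [ '50' ]

// Negative lookbehind: numbers NOT preceded by "$" or by another digit.
const counts = text.match(/(?<![$\d])\d+/g);
console.log(counts); // [ '20' ]

// Pitfall: without \d, the match simply starts one digit later.
const naiveLookbehind = "$50".match(/(?<!\$)\d+/)[0];
console.log(naiveLookbehind); // 0
```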
Lookarounds are commonly used for password validation. For example, to require at least one uppercase letter, one lowercase letter, one digit, and one special character:
// Password: 8+ chars, uppercase, lowercase, digit, special char
/^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/
Each (?=...) checks that the required character type exists somewhere in the string, without consuming any characters. The final .{8,} then matches the entire password of 8 or more characters. If you need to generate passwords that pass this kind of validation, our Password Generator creates cryptographically secure passwords with customizable character sets and length.
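A single combined pattern only answers yes or no. If you also want to tell the user which requirement failed, one option is to test each rule separately. The sketch below assumes the same policy as the pattern above; `passwordRules` and `missingPasswordRules` are hypothetical names chosen for illustration.

```javascript
// The combined pattern from above, for comparison.
const strongPassword = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;

// Hypothetical rule list: one small pattern per requirement.
const passwordRules = [
  { label: "at least 8 characters", pattern: /^.{8,}$/ },
  { label: "an uppercase letter", pattern: /[A-Z]/ },
  { label: "a lowercase letter", pattern: /[a-z]/ },
  { label: "a digit", pattern: /\d/ },
  { label: "a special character", pattern: /[!@#$%^&*]/ },
];

// Returns the labels of every rule the password does not satisfy.
function missingPasswordRules(password) {
  return passwordRules
    .filter((rule) => !rule.pattern.test(password))
    .map((rule) => rule.label);
}

console.log(strongPassword.test("P@ssw0rd!"));  // true
console.log(missingPasswordRules("P@ssw0rd!")); // []
console.log(strongPassword.test("password"));   // false
console.log(missingPasswordRules("password"));
// [ 'an uppercase letter', 'a digit', 'a special character' ]
```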
Common Regex Patterns
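Here are widely used regex patterns for the most common validation tasks. They are deliberately simple and check the shape of the input rather than its full validity: the IPv4 pattern accepts 999.999.999.999, and the date pattern accepts 2025-02-31, so add range checks in code where correctness matters. You can paste any of these directly into our Regex Tester to see them in action.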
| Pattern Name | Regex | Matches |
|---|---|---|
| Email address | ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ | user@example.com |
| URL (HTTP/HTTPS) | ^https?:\/\/[^\s/$.?#].[^\s]*$ | https://example.com/path |
| US phone number | ^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$ | (555) 123-4567, 555.123.4567 |
| IPv4 address | ^(\d{1,3}\.){3}\d{1,3}$ | 192.168.1.1 |
| Date (YYYY-MM-DD) | ^\d{4}-(0[1-9]\|1[0-2])-(0[1-9]\|[12]\d\|3[01])$ | 2025-08-15 |
| Hex color code | ^#([0-9a-fA-F]{3}){1,2}$ | #FF8C00, #f0f |
| Strong password | ^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[\W_]).{8,}$ | P@ssw0rd!, Str0ng#Key |
When working with URLs that contain special characters, remember that you may need to URL-encode values before applying regex validation. Similarly, when parsing structured data formats like JSON, our JSON Formatter can validate and pretty-print the data before you apply regex extraction.
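The IPv4 and date patterns in the table only check the shape of the input, so out-of-range values such as 999.1.1.1 or 2025-02-31 pass. One way to close that gap is to let the regex capture the parts and check the ranges in code. This is a sketch, not a complete validator; `isValidIPv4` and `isValidIsoDate` are hypothetical helper names.

```javascript
// Shape check with one capturing group per octet.
const ipv4Shape = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;

function isValidIPv4(text) {
  const match = ipv4Shape.exec(text);
  if (match === null) return false;
  // match[0] is the full match; the four octets are match[1]..match[4].
  // Note: octets with leading zeros such as "01" are accepted here.
  return match.slice(1).every((octet) => Number(octet) <= 255);
}

// Same shape as the date pattern in the table, with the year captured too.
const isoDateShape = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

function isValidIsoDate(text) {
  const match = isoDateShape.exec(text);
  if (match === null) return false;
  const [year, month, day] = match.slice(1).map(Number);
  // Date.UTC rolls impossible dates over (Feb 31 becomes Mar 3),
  // so a real date must come back with the same month and day.
  const date = new Date(Date.UTC(year, month - 1, day));
  return date.getUTCMonth() === month - 1 && date.getUTCDate() === day;
}

console.log(isValidIPv4("192.168.1.1"));   // true
console.log(isValidIPv4("999.1.1.1"));     // false
console.log(isValidIsoDate("2024-02-29")); // true
console.log(isValidIsoDate("2025-02-31")); // false
```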
Regex Flags
Flags (also called modifiers) change how the regex engine processes the pattern. In JavaScript, flags appear after the closing delimiter: /pattern/flags.
- g (global): Find all matches instead of stopping at the first one.
- i (case-insensitive): Treat uppercase and lowercase letters as equivalent. /hello/i matches "Hello", "HELLO", "hElLo".
- m (multiline): Make ^ and $ match the start and end of each line, not just the entire string.
- s (dotAll): Make . match newline characters (\n) as well. Without this flag, . matches everything except newlines.
- u (unicode): Treat the pattern and the input as Unicode code points instead of UTF-16 code units. Needed for emoji and other characters outside the Basic Multilingual Plane to count as single characters, and for \u{...} and \p{...} escapes.
- y (sticky): Match only from the position indicated by lastIndex. Useful for building tokenizers and parsers.
// Global + case-insensitive: find all "the" regardless of case
"The cat and the dog".match(/the/gi)
// Result: ["The", "the"]
// Multiline: match the start of each line
"line 1\nline 2\nline 3".match(/^line/gm)
// Result: ["line", "line", "line"]
// dotAll: match across newlines
"hello\nworld".match(/hello.world/s)
// Result: ["hello\nworld"]
When working with regex in different programming languages, be aware that flag syntax varies. Python uses re.IGNORECASE, PHP uses /pattern/i like JavaScript, and Java uses Pattern.CASE_INSENSITIVE. The concepts are the same, but the API differs. If you need to compare output from different regex implementations, our Text Diff Checker can highlight differences between results.
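Back in JavaScript, the u and y flags are the least intuitive of the set, so here is a small sketch of each. The `tokenize` function is a hypothetical example of a sticky-flag tokenizer for simple arithmetic, not a production parser.

```javascript
// u flag: without it, "." matches one UTF-16 code unit, and this emoji
// is stored as two code units (a surrogate pair).
const emoji = "😀";
const dotWithoutU = /^.$/.test(emoji);
const dotWithU = /^.$/u.test(emoji);
console.log(dotWithoutU); // false
console.log(dotWithU);    // true

// y flag: each exec() must match exactly at lastIndex, so the tokenizer
// can never silently skip over characters it does not understand.
function tokenize(source) {
  const tokenPattern = /\s*(\d+|[-+*\/()])\s*/y;
  const tokens = [];
  let position = 0;
  while (position < source.length) {
    tokenPattern.lastIndex = position;
    const match = tokenPattern.exec(source);
    if (match === null) {
      throw new SyntaxError(`Unexpected input at index ${position}`);
    }
    tokens.push(match[1]);
    position = tokenPattern.lastIndex;
  }
  return tokens;
}

const tokens = tokenize("12 + (3*4)");
console.log(tokens); // [ '12', '+', '(', '3', '*', '4', ')' ]
```

The position is tracked in a separate variable because a failed sticky match resets lastIndex to 0, which would otherwise lose the location of the error.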
Frequently Asked Questions
What is a regular expression (regex)?
A regular expression (regex) is a sequence of characters that defines a search pattern. It is used for string matching, validation, search-and-replace operations, and text extraction in virtually every programming language including JavaScript, Python, PHP, Java, and Go. Regex patterns can match simple literal text or complex patterns using metacharacters, quantifiers, and groups.
What is the difference between .* and .*? in regex?
.* is a greedy quantifier that matches as many characters as possible, while .*? is a lazy (non-greedy) quantifier that matches as few characters as possible. For example, given the string "<b>hello</b><b>world</b>", the pattern <b>.*</b> matches the entire string from the first <b> to the last </b>, while <b>.*?</b> matches only "<b>hello</b>" — stopping at the first closing tag.
How do I validate an email address with regex?
A practical regex for email validation is ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$. This pattern checks for one or more valid characters before the @ symbol, a domain name with dots, and a top-level domain of at least two letters. Note that the full RFC 5322 email specification is extremely complex, so most applications use a simplified pattern for basic validation and then confirm via a verification email.
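As a minimal sketch, the pattern can be wrapped in a helper that acts as a first-pass filter before the verification email is sent. `isPlausibleEmail` is a hypothetical name, and the sample addresses are made up.

```javascript
const emailShape = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Returns true when the text has the rough shape of an email address.
// It cannot tell whether the mailbox actually exists.
function isPlausibleEmail(text) {
  return emailShape.test(text.trim());
}

console.log(isPlausibleEmail("user@example.com"));              // true
console.log(isPlausibleEmail("first.last+tag@sub.example.co")); // true
console.log(isPlausibleEmail("user@localhost"));                // false
console.log(isPlausibleEmail("user@@example.com"));             // false

// The simplified pattern is permissive: consecutive dots in the
// domain still pass, which is one reason to confirm by email.
console.log(isPlausibleEmail("user@example..com"));             // true
```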
What are lookahead and lookbehind in regex?
Lookahead and lookbehind are zero-width assertions that check whether a pattern exists ahead of or behind the current position without consuming characters. Positive lookahead (?=...) asserts that what follows matches the pattern. Negative lookahead (?!...) asserts that what follows does not match. Positive lookbehind (?<=...) and negative lookbehind (?<!...) work the same way but look backward. They are commonly used for password validation, conditional matching, and extracting text between delimiters.
What do the regex flags g, i, m, s, u, and y mean?
In JavaScript regex: g (global) finds all matches instead of stopping at the first; i (case-insensitive) ignores uppercase vs lowercase; m (multiline) makes ^ and $ match the start and end of each line instead of the entire string; s (dotAll) makes the dot (.) match newline characters; u (unicode) enables full Unicode matching; y (sticky) matches only at the exact position indicated by the lastIndex property. Combine flags like /pattern/gi for a global, case-insensitive search.
Conclusion
Regular expressions are a universal tool that every developer encounters sooner or later. The fundamentals are straightforward: metacharacters give special meaning to characters, character classes define sets of characters to match, quantifiers control repetition, anchors pin matches to specific positions, groups capture and organize sub-patterns, and lookarounds enable conditional matching without consuming text.
The patterns in this cheat sheet cover the vast majority of real-world use cases — from validating email addresses and phone numbers to extracting data from structured text. Bookmark this page as a reference, and when you need to build and test a regex pattern interactively, use our free Regex Tester. It highlights matches in real time, shows capture group details, supports all JavaScript flags, and includes a library of preset patterns to get you started.