Regex Patterns for Common Use Cases: Emails, URLs, Dates & More
A practical reference guide for regular expressions. Copy-paste ready patterns for validating emails, URLs, phone numbers, dates, and other common data formats.
Regular expressions are one of those tools that inspire equal parts admiration and dread. A well-crafted regex can validate, extract, and transform text in a single line. A poorly-crafted one can be incomprehensible, fragile, and in extreme cases, crash your application. This guide provides battle-tested patterns for common tasks, with enough explanation that you'll understand why each pattern works — not just how to copy-paste it.
How Regex Engines Think
Before diving into patterns, it helps to understand the basic mechanics. A regex engine reads your pattern left to right and tries to match it against the input string, character by character. When the engine encounters a quantifier like + or *, it applies one of two strategies: greedy (match as much as possible, then backtrack if needed) or lazy (match as little as possible, then expand if needed). Quantifiers are greedy by default and become lazy only when followed by ?, which makes greediness the most common source of unexpected behavior.
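A minimal sketch of the greedy versus lazy difference, using Python's built-in re module. The sample string is made up for illustration.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ consumes as much as it can, then backtracks to the LAST ">".
greedy = re.search(r"<.+>", text).group()

# Lazy: .+? consumes as little as it can, stopping at the FIRST ">".
lazy = re.search(r"<.+?>", text).group()

print(greedy)  # <b>bold</b> and <i>italic</i>
print(lazy)    # <b>
```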
The engine also treats certain characters as special. A dot (.) matches any character, a caret (^) anchors to the start of the string, and a dollar sign ($) anchors to the end. When you need to match these characters literally — which happens constantly with URLs, IP addresses, and file paths — you escape them with a backslash.
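As a small illustration of why escaping matters, the sketch below builds a pattern from a hostname with and without escaping. The hostname and the test string are hypothetical; re.escape is Python's standard helper for backslash-escaping metacharacters.

```python
import re

host = "example.com"

# Unescaped, the dot means "any character", so a wrong host slips through.
loose = re.compile("^" + host + "$")

# re.escape turns "." into "\." so only a literal dot matches.
strict = re.compile("^" + re.escape(host) + "$")

loose_hit = loose.match("exampleXcom") is not None
strict_hit = strict.match("exampleXcom") is not None
strict_ok = strict.match("example.com") is not None
```

Here loose_hit is true while strict_hit is false, which is the kind of silent over-matching an unescaped dot causes.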
Quick Reference
| Symbol | Meaning | Example |
|---|---|---|
| . | Any character except newline | a.c matches "abc", "a1c" |
| ^ / $ | Start / end of string | ^Hello$ matches only "Hello" |
| \d / \w / \s | Digit / word char / whitespace | \d{3} matches "123" |
| [abc] | Character class | [aeiou] matches any vowel |
| [^abc] | Negated class | [^0-9] matches non-digits |
| {n,m} | Between n and m times | a{2,4} matches "aa", "aaa", "aaaa" |
| (?=...) | Lookahead | foo(?=bar) matches "foo" only before "bar" |
Email Validation
Email validation is the classic regex use case, and also the classic example of where regex can mislead you. The address grammar in RFC 5322 is so complex that the best-known attempt at a compliant regex (written for its predecessor, RFC 822) runs to over 6,000 characters. In practice, you want a pattern that catches obvious typos without rejecting valid but unusual addresses.
The Practical Pattern
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
This pattern works by dividing the email into three parts. The ^[a-zA-Z0-9._%+-]+ portion matches the local part (before the @), allowing letters, digits, dots, underscores, percent signs, plus signs, and hyphens — these cover virtually all real-world email addresses. The @ is matched literally. Then [a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ handles the domain, requiring at least one dot followed by a TLD of two or more letters.
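What this pattern deliberately doesn't handle: quoted local parts (like "john doe"@example.com), IP address domains (like user@[192.168.1.1]), and internationalized addresses that contain non-ASCII characters. Quoted local parts and IP address domains are technically valid but vanishingly rare; a pattern that accepted them would be far more complex and would rarely, if ever, admit a legitimate address that the simpler pattern rejects. If your users have internationalized addresses, widen the character classes rather than trying to enumerate every valid form.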
The deeper truth about email validation is that regex can only verify format, not deliverability. The only way to truly validate an email address is to send a confirmation message to it.
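A short sketch of the practical pattern in use, assuming Python's re module. The function name looks_like_email and the sample addresses are illustrative only. fullmatch is used so that a trailing newline cannot sneak past the $ anchor.

```python
import re

EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def looks_like_email(value: str) -> bool:
    """Format check only: says nothing about whether the mailbox exists."""
    return EMAIL_RE.fullmatch(value) is not None

samples = [
    "user@example.com",                  # ordinary address
    "first.last+tag@sub.example.co.uk",  # plus tag and subdomains
    "user@example",                      # no dot + TLD in the domain
    "user@@example.com",                 # doubled @
    '"john doe"@example.com',            # quoted local part, deliberately unsupported
]
results = {s: looks_like_email(s) for s in samples}
```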
URL Validation
Standard URL Pattern
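^(https?:\/\/)?([\da-z.-]+)\.([a-z.]{2,6})([\/\w .-]*)\/?$
This pattern starts with an optional protocol (https?:\/\/ makes the "s" optional, matching both HTTP and HTTPS). The domain portion ([\da-z.-]+)\.([a-z.]{2,6}) matches the hostname and TLD; both classes are lowercase-only, so apply a case-insensitive flag if mixed-case hosts should pass. The path section ([\/\w .-]*)\/?$ matches an optional path. This pattern is often circulated with a second * after the path group, as ([\/\w .-]*)*, but that nested quantifier matches nothing extra and invites catastrophic backtracking on long non-matching input.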
Note the TLD length restriction of 2-6 characters ({2,6}). This was a reasonable assumption when TLDs were limited to .com, .org, .museum, and country codes, but newer TLDs like .technology and .photography exceed that limit. You may want to increase it to {2,63} (63 is the maximum length of a DNS label) or simply use {2,} if you don't want to impose a maximum.
URL with Query Parameters
^(https?:\/\/)?([\w.-]+)\.([a-z]{2,})(:\d+)?(\/[\w.-]*)*(\/)?([?].*)?$
This extended pattern adds support for port numbers ((:\d+)?), which you'll encounter with development servers (dev.example.com:3000) and non-standard deployments. A bare localhost:3000 still fails, because the pattern requires a dot and a TLD in the hostname. The ([?].*)? at the end matches the entire query string, which is intentionally permissive, since trying to validate query parameter format is rarely worth the complexity.
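The sketch below runs the query-parameter pattern through Python's re module and pulls out the captured port and query string. The helper name url_parts and the sample URLs are hypothetical. It also shows that a dotless host such as localhost is rejected by this pattern.

```python
import re

URL_RE = re.compile(
    r"^(https?:\/\/)?([\w.-]+)\.([a-z]{2,})(:\d+)?(\/[\w.-]*)*(\/)?([?].*)?$"
)

def url_parts(url: str):
    """Return (host, tld, port, query) or None when the format check fails."""
    m = URL_RE.fullmatch(url)
    if m is None:
        return None
    # Groups 2, 3, 4 and 7 are host, TLD, optional port, optional query string.
    return m.group(2), m.group(3), m.group(4), m.group(7)

full = url_parts("https://dev.example.com:3000/api/v1?x=1&y=2")
plain = url_parts("https://example.com")
local = url_parts("http://localhost:3000")  # no dot + TLD, so no match
```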
Phone Numbers
Phone number validation is inherently regional, which makes a single universal pattern impractical. The E.164 international standard helps, but most applications need to accept the variety of formats users naturally type.
US Phone Number
^(\+1[-\s]?)?(\([0-9]{3}\)|[0-9]{3})[-\s]?[0-9]{3}[-\s]?[0-9]{4}$
This pattern handles the many ways Americans write phone numbers. The (\+1[-\s]?)? optionally matches the country code with an optional separator. The area code portion (\([0-9]{3}\)|[0-9]{3}) accepts both parenthesized and bare formats. The [-\s]? separators between groups allow dashes, spaces, or nothing at all.
The pattern successfully matches (123) 456-7890, 123-456-7890, +1 123 456 7890, and 1234567890. However, it won't catch impossible area codes (like 000) or invalid exchanges. For production use, consider validating against the North American Numbering Plan if accuracy matters.
International E.164
^\+[1-9]\d{1,14}$
The E.164 format is intentionally simple: a plus sign, a non-zero digit, then 1 to 14 additional digits. This format was designed for machine consumption, which is why it's the standard for storing phone numbers in databases and APIs.
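One way the two phone patterns can work together is to accept the loose US formats at the edge and store E.164. The following is a sketch under that assumption; us_to_e164 is a hypothetical helper, and it assumes that any number passing the US pattern belongs to country code 1.

```python
import re

US_PHONE_RE = re.compile(
    r"^(\+1[-\s]?)?(\([0-9]{3}\)|[0-9]{3})[-\s]?[0-9]{3}[-\s]?[0-9]{4}$"
)
E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")

def us_to_e164(raw: str):
    """Normalize a US-formatted number to E.164, or return None if it doesn't match."""
    if US_PHONE_RE.fullmatch(raw) is None:
        return None
    digits = re.sub(r"\D", "", raw)  # strip parentheses, spaces, dashes, plus sign
    if len(digits) == 10:            # country code was not typed
        digits = "1" + digits
    candidate = "+" + digits
    return candidate if E164_RE.fullmatch(candidate) else None

normalized = [us_to_e164(s) for s in
              ["(123) 456-7890", "123-456-7890", "+1 123 456 7890", "1234567890"]]
```

All four input styles from the US section collapse to the same stored value, which is the practical benefit of E.164.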
Date Validation
ISO 8601 (YYYY-MM-DD)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
The month portion (0[1-9]|1[0-2]) uses alternation to allow 01-09 or 10-12. The day portion (0[1-9]|[12]\d|3[01]) similarly restricts to 01-31. This catches obviously wrong months (13, 00) but doesn't validate that the specific day exists in the given month — February 30th would match. For calendar-accurate validation, you need programmatic logic, not regex.
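The programmatic check mentioned above can be sketched by letting the regex confirm the shape and the standard library confirm the calendar. The function name is_real_iso_date is illustrative; datetime.date raises ValueError for dates that do not exist.

```python
import re
from datetime import date

ISO_DATE_RE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

def is_real_iso_date(value: str) -> bool:
    """Shape check with regex, then calendar check with datetime.date."""
    if ISO_DATE_RE.fullmatch(value) is None:
        return False
    year, month, day = (int(part) for part in value.split("-"))
    try:
        date(year, month, day)  # rejects Feb 30, Apr 31, non-leap Feb 29, year 0
    except ValueError:
        return False
    return True

shape_only = ISO_DATE_RE.fullmatch("2024-02-30") is not None  # regex alone accepts it
checked = is_real_iso_date("2024-02-30")                      # calendar check rejects it
```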
US Format (MM/DD/YYYY) and European Format (DD/MM/YYYY)
^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}$
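Same validation logic, different field order: the pattern above is the US form, and swapping the first two groups gives the European form, ^(0[1-9]|[12]\d|3[01])\/(0[1-9]|1[0-2])\/\d{4}$. When accepting dates from international users, the ambiguity between MM/DD and DD/MM formats is a real problem that regex can't solve, because a value like 03/04/2024 matches both patterns and you need to know which format the user intended. ISO 8601 avoids this ambiguity entirely, which is why it's the preferred format for data interchange.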
Password Strength Validation
Lookahead-Based Password Rules
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
This pattern is a masterclass in lookaheads — assertions that check for a condition without consuming characters. Each (?=...) group peeks ahead through the entire string to confirm a condition is met, then returns to the current position.
The (?=.*[a-z]) confirms at least one lowercase letter exists anywhere. The (?=.*[A-Z]) does the same for uppercase. The (?=.*\d) requires at least one digit, and (?=.*[@$!%*?&]) requires a special character. Only after all four lookaheads pass does the main pattern [A-Za-z\d@$!%*?&]{8,} consume the actual characters, enforcing the minimum length. Because that final class is also an allowlist, a password containing any other character (a space or #, for example) is rejected even when every lookahead passes.
Note that rigid password composition rules (requiring uppercase, numbers, special characters) are increasingly being questioned by security researchers. NIST's current guidelines recommend minimum length (at least 8, ideally 15+) over composition rules, since users respond to composition rules with predictable patterns like "Password1!".
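For comparison, here is a sketch that applies the lookahead pattern and then spells out the same rules as separate checks, which makes it possible to tell the user which rule failed. The names RULES and unmet_rules and the rule labels are invented for this example.

```python
import re

PASSWORD_RE = re.compile(
    r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$"
)

# One small pattern per rule, mirroring the four lookaheads.
RULES = {
    "lowercase letter": r"[a-z]",
    "uppercase letter": r"[A-Z]",
    "digit": r"\d",
    "special character": r"[@$!%*?&]",
}

def unmet_rules(password: str) -> list:
    """List the rules a password breaks, instead of a bare yes/no."""
    unmet = [name for name, pat in RULES.items() if not re.search(pat, password)]
    if len(password) < 8:
        unmet.append("at least 8 characters")
    # The main character class is an allowlist, so report anything outside it.
    if re.search(r"[^A-Za-z\d@$!%*?&]", password):
        unmet.append("unsupported character")
    return unmet

report = {pw: unmet_rules(pw) for pw in ["Password1!", "password1!", "Passw0rd!#"]}
```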
Identifiers and Codes
UUID v4
^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$
UUID v4 has a specific structure that makes it identifiable: the third group always starts with 4 (indicating version 4), and the fourth group always starts with 8, 9, a, or b (indicating the variant). The rest is random hexadecimal. This pattern validates both the format and these structural constraints. It accepts lowercase hex only, so add a case-insensitive flag if uppercase UUIDs should pass.
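A quick sketch that checks the pattern against UUIDs generated by Python's uuid module. The re.IGNORECASE flag is an addition made here so that uppercase input passes; the pattern as printed above is lowercase-only.

```python
import re
import uuid

UUID4_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$",
    re.IGNORECASE,  # assumption: accept uppercase hex as well
)

def is_uuid4(value: str) -> bool:
    return UUID4_RE.fullmatch(value) is not None

generated_v4 = str(uuid.uuid4())  # version nibble is 4, variant is 8/9/a/b
generated_v1 = str(uuid.uuid1())  # version nibble is 1, so the pattern rejects it

v4_ok = is_uuid4(generated_v4)
v1_ok = is_uuid4(generated_v1)
```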
Hex Color Code
^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
This matches both 6-digit (#FF5733) and 3-digit shorthand (#F00) hex colors, with an optional hash prefix. Listing the 6-digit alternative first matters mainly when the pattern is used without the $ anchor, where a 3-digit-first alternation would stop after the first 3 characters of a 6-digit color. With both anchors in place, the engine backtracks to the other alternative either way.
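Once a color passes the pattern, the captured group can be converted directly. The sketch below expands 3-digit shorthand by doubling each digit and returns an RGB tuple; the helper name to_rgb is illustrative.

```python
import re

HEX_COLOR_RE = re.compile(r"^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$")

def to_rgb(value: str):
    """Return (r, g, b) for a valid hex color, or None."""
    m = HEX_COLOR_RE.fullmatch(value)
    if m is None:
        return None
    digits = m.group(1)
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)  # F00 -> FF0000
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(to_rgb("#FF5733"))  # (255, 87, 51)
print(to_rgb("#F00"))     # (255, 0, 0)
```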
IP Addresses
IPv4
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
IPv4 validation is more complex than it appears because each octet must be between 0 and 255, and regex doesn't understand numeric ranges natively. The pattern handles this by breaking the range into cases: 25[0-5] matches 250-255, 2[0-4][0-9] matches 200-249, and [01]?[0-9][0-9]? matches 0-199. This group repeats three times (with dots) plus once more for the final octet.
A naive pattern like \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} would match "999.999.999.999", which is why the explicit range checking matters.
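To see the range logic from the other side, this sketch compares the regex with a plain split-and-parse check. The helper is_ipv4_by_parsing is hypothetical, and it is written to mirror the regex, including its acceptance of leading zeros such as 192.168.01.1.

```python
import re

IPV4_RE = re.compile(
    r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)

def is_ipv4_by_regex(value: str) -> bool:
    return IPV4_RE.fullmatch(value) is not None

def is_ipv4_by_parsing(value: str) -> bool:
    """Same rule without regex: four parts, each 1-3 ASCII digits, each <= 255."""
    parts = value.split(".")
    return len(parts) == 4 and all(
        p.isascii() and p.isdigit() and len(p) <= 3 and int(p) <= 255
        for p in parts
    )

samples = ["192.168.1.1", "255.255.255.255", "999.999.999.999",
           "256.1.1.1", "1.2.3", "192.168.01.1"]
agree = all(is_ipv4_by_regex(s) == is_ipv4_by_parsing(s) for s in samples)
```

If leading zeros should be rejected, the parsing version is the easier one to tighten.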
Performance: Avoiding ReDoS
Regular Expression Denial of Service (ReDoS) is a real vulnerability that occurs when a pattern causes catastrophic backtracking, effectively hanging the regex engine. The classic trigger is nested quantifiers with overlapping character classes.
Dangerous Patterns
(a+)+$ ← Catastrophic on "aaaaaaaaaaaaaaaaX"
(a|a)+$ ← Same problem
([a-zA-Z]+)*$ ← Dangerous with long non-matching strings
The pattern (a+)+$ looks harmless, but when it fails to match (because of the trailing "X"), the engine backtracks through every possible way to divide the "a" characters between the inner + and the outer +. For n characters, this creates roughly 2^n possible groupings, so the work doubles with each added character. With just 25 characters, that's over 33 million combinations.
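The growth can be counted without running the dangerous pattern at all. The sketch below is a simplified model, not a measurement of any real engine: count_groupings counts the ways to cut a run of n characters into consecutive non-empty chunks, and total_attempts assumes an unanchored search that retries from every start offset. Real engines may prune some of this work.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_groupings(n: int) -> int:
    """Ways to split n characters into consecutive non-empty chunks."""
    if n == 0:
        return 1  # nothing left to split: exactly one (empty) arrangement
    # The first chunk takes k characters; the remainder is split recursively.
    return sum(count_groupings(n - k) for k in range(1, n + 1))

def total_attempts(n: int) -> int:
    """Model assumption: the search restarts at each of the n start offsets."""
    return sum(count_groupings(k) for k in range(1, n + 1))

growth = [count_groupings(n) for n in range(1, 7)]  # doubles at every step
at_25 = total_attempts(25)
```

Under this model the per-start count is 2^(n-1) and the total across all start offsets is 2^n - 1, which for 25 characters is 33,554,431.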
Prevention Strategies
Use atomic groups or possessive quantifiers when your regex engine supports them. These prevent backtracking once a quantifier has consumed characters. In engines without these features (like JavaScript), replace nested quantifiers with more specific character classes. Instead of ([a-zA-Z]+)*, use [a-zA-Z]* if you don't actually need the grouping.
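Before swapping a nested quantifier for a flat one, it is worth confirming that the two accept the same strings. This sketch brute-forces every short string over a tiny alphabet; the alphabet, the length cap, and the function name same_verdict are arbitrary choices for illustration, and inputs are kept short so the nested pattern cannot blow up.

```python
import re
from itertools import product

NESTED = re.compile(r"([a-zA-Z]+)*")  # nested quantifier, backtracking-prone
FLAT = re.compile(r"[a-zA-Z]*")       # same language, single quantifier

def same_verdict(alphabet: str = "aZ1", max_len: int = 4) -> bool:
    """Check that both patterns agree on every string up to max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            s = "".join(chars)
            if bool(NESTED.fullmatch(s)) != bool(FLAT.fullmatch(s)):
                return False
    return True

equivalent = same_verdict()
```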
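For user-supplied regex patterns (like search features), set a timeout. Some engines support this directly (.NET's Regex class accepts a match timeout); where the engine offers none, run the match in a worker or subprocess that you can terminate, or use a non-backtracking engine such as RE2. A hard time limit is the most effective defense against ReDoS from untrusted patterns.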
When Regex Isn't the Answer
Regex is a text pattern language, not a parser or a calculator. Some tasks look like regex problems but are better solved with other tools.
HTML/XML parsing is the most frequently cited example. The regex <[^>]+> matches HTML tags, but it breaks on attributes containing >, nested tags, comments, CDATA sections, and self-closing tags. Use a proper DOM parser instead — the regex approach will fail in production eventually.
Numeric range validation is another poor fit. Validating that a number falls between 1 and 365 requires an unwieldy pattern like ^([1-9]|[1-9]\d|[12]\d{2}|3[0-5]\d|36[0-5])$. Parsing the number and comparing it with if (n >= 1 && n <= 365) is clearer, faster, and less error-prone.
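The two approaches can be compared side by side. In this sketch both functions are hypothetical helpers. Note that int() is more lenient than the regex about leading zeros, signs, and surrounding whitespace, so the comparison below uses canonical decimal strings only.

```python
import re

RANGE_RE = re.compile(r"^([1-9]|[1-9]\d|[12]\d{2}|3[0-5]\d|36[0-5])$")

def in_range_regex(text: str) -> bool:
    return RANGE_RE.fullmatch(text) is not None

def in_range_parsed(text: str) -> bool:
    """Parse first, then compare: the intent is visible in one line."""
    try:
        n = int(text)
    except ValueError:
        return False
    return 1 <= n <= 365

# Both agree on every canonical integer string from 0 to 999.
agree = all(in_range_regex(str(n)) == in_range_parsed(str(n)) for n in range(1000))
```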
Complex grammar parsing — like mathematical expressions, programming languages, or nested structures (matching balanced parentheses) — exceeds what regular expressions can theoretically handle. These require context-free grammars and real parsers.
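Balanced parentheses make a good example of how little code the non-regex alternative needs. A minimal sketch with a depth counter (the function name is_balanced is illustrative, and only round brackets are considered):

```python
def is_balanced(text: str) -> bool:
    """Track nesting depth; a regular expression has no counter to do this."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a closer appeared before its opener
                return False
    return depth == 0      # every opener must have been closed

checks = {s: is_balanced(s) for s in ["(a(b)c)", "(()", ")(", ""]}
```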
The rule of thumb: if your regex requires more than a few minutes to understand, and you find yourself writing complex lookaheads or backreferences to handle edge cases, a different approach will almost certainly be more maintainable.
Debugging Regex Patterns
When a pattern doesn't match what you expect, the most productive debugging method is to break it into pieces. Test each segment independently against your input, starting from the left. This quickly reveals which part of the pattern is failing.
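The piece-by-piece approach can be automated in a few lines. In this sketch the pattern is supplied as a list of segments, and first_failing_segment (a hypothetical helper) reports the index of the first segment at which the growing prefix stops matching. It assumes each prefix is itself a valid pattern, so groups should not be split across segments.

```python
import re

def first_failing_segment(segments, text):
    """Return the index of the first segment that breaks the match, or None."""
    prefix = ""
    for index, segment in enumerate(segments):
        prefix += segment
        if re.match(prefix, text) is None:  # re.match anchors at the start
            return index
    return None

# The ISO date pattern from earlier, cut into testable pieces.
iso_segments = [r"\d{4}", "-", r"(0[1-9]|1[0-2])", "-",
                r"(0[1-9]|[12]\d|3[01])", "$"]

bad_month = first_failing_segment(iso_segments, "2024-13-01")  # month group fails
trailing = first_failing_segment(iso_segments, "2024-02-301")  # the $ anchor fails
clean = first_failing_segment(iso_segments, "2024-02-28")      # nothing fails
```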
Online tools that visualize the match process step-by-step (like regex101 or Loopaloo's Regex Tester) are invaluable. They show exactly where the engine backtracks, which helps you understand both why a match fails and whether your pattern has performance issues.
Another effective technique is to start with an overly-permissive pattern and progressively tighten it. Begin with .* matching everything, then replace pieces with specific character classes and quantifiers until your pattern is precise. This is less error-prone than trying to write the perfect pattern in one shot.
Conclusion
The patterns in this guide handle the most common validation tasks you'll encounter in web development. The key insight is that regex excels at format validation — confirming that a string looks like an email, URL, or phone number — but can't validate meaning. An email that passes regex validation might not exist. A date in the right format might not be a real calendar date.
Use these patterns as a first-pass filter to catch typos and obvious errors. Combine them with application logic for deeper validation. And when you find yourself building a regex that's longer than the code that calls it, step back and consider whether a different tool might be a better fit.
Try our Regex Tester to experiment with these patterns, test edge cases, and build your own — all processed locally in your browser.