Regex syntax reference

This reference covers the core regular expression syntax supported by Python's re module. Use it as a lookup resource when building patterns.

For the official documentation, see the Python Regular Expression HOWTO and the re module documentation.

Literal characters

Most characters match themselves literally. For example, the pattern hello matches the text hello.

The following characters have special meanings and must be escaped with a backslash (\) to match them literally:

. ^ $ * + ? { } [ ] \ | ( )
import re

re.search(r'3\.14', '3.14')     # Matches literal dot
re.search(r'\$100', '$100')     # Matches literal dollar sign
re.search(r'file\(1\)', 'file(1)')  # Matches literal parentheses
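When the text to match comes from a variable, escaping each metacharacter by hand is error-prone. The following is a minimal sketch using the standard library's re.escape; the file name and the surrounding sentence are hypothetical values chosen for illustration.

```python
import re

# Hypothetical text containing metacharacters: ( ) and .
literal = 'file(1).txt'
haystack = 'open file(1).txt now'

# re.escape backslash-escapes the characters that would otherwise be special
escaped_match = re.search(re.escape(literal), haystack)
print(escaped_match.group())   # file(1).txt

# Unescaped, the parentheses form a group and the dot matches any character,
# so the pattern looks for "file1" + any character + "txt" and finds nothing
unescaped_match = re.search(literal, haystack)
print(unescaped_match)         # None
```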

Metacharacters

The dot (.)

Pattern Matches
. Any single character except a newline (unless re.DOTALL is set)
import re

re.findall(r'h.t', 'hat hot hut hit')  # ['hat', 'hot', 'hut', 'hit']
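The effect of re.DOTALL on the dot can be checked with a short sketch. The two-line input string is a hypothetical sample.

```python
import re

text = 'a\nb'

# By default the dot does not match a newline
default_match = re.search(r'a.b', text)
print(default_match)                 # None

# With re.DOTALL the dot matches any character, including a newline
dotall_match = re.search(r'a.b', text, re.DOTALL)
print(repr(dotall_match.group()))    # 'a\nb'
```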

Anchors

Anchors match positions, not characters.

Pattern Matches
^ Start of the string (or start of each line with re.MULTILINE)
$ End of the string, or just before a newline at the very end of the string (or end of each line with re.MULTILINE)
\b Word boundary (between \w and \W, or between \w and the start/end of the string)
\B Non-word boundary
\A Start of the string (not affected by re.MULTILINE)
\Z End of the string only (not affected by re.MULTILINE; unlike $, never matches before a trailing newline)
import re

re.search(r'^Hello', 'Hello world')     # Matches at start
re.search(r'world$', 'Hello world')     # Matches at end
re.findall(r'\bcat\b', 'cat concatenate')  # ['cat']
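The interaction between the anchors and re.MULTILINE is easier to see side by side. This is a small sketch with hypothetical sample strings, not an exhaustive test of anchor behaviour.

```python
import re

text = 'one\ntwo\nthree'

# Without re.MULTILINE, ^ matches only at the start of the string
first_only = re.findall(r'^\w+', text)
print(first_only)    # ['one']

# With re.MULTILINE, ^ matches at the start of every line
every_line = re.findall(r'^\w+', text, re.MULTILINE)
print(every_line)    # ['one', 'two', 'three']

# \A ignores re.MULTILINE and still matches only at the very start
start_only = re.findall(r'\A\w+', text, re.MULTILINE)
print(start_only)    # ['one']

# $ also matches just before a trailing newline; \Z does not
dollar = re.search(r'three$', 'three\n')
strict = re.search(r'three\Z', 'three\n')
print(dollar is not None, strict)   # True None
```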

Character classes

Character classes match a single character from a defined set.

Custom character classes

Pattern Matches
[abc] Any one of a, b, or c
[a-z] Any lowercase ASCII letter
[A-Z] Any uppercase ASCII letter
[0-9] Any ASCII digit
[a-zA-Z0-9] Any ASCII letter or digit
[^abc] Any character except a, b, or c
[^0-9] Any non-digit character

Special rules inside character classes:

  • Most metacharacters lose their special meaning inside [...]
  • The caret ^ has special meaning only at the start: [^abc]
  • The hyphen - indicates a range, except at the start or end: [-abc] or [abc-]
  • The closing bracket ] is literal when placed first ([]abc]) or escaped with a backslash ([abc\]])
  • The backslash \ still works as an escape character
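The rules above can be checked one at a time. The sketch below uses short hypothetical inputs, with one findall call per rule.

```python
import re

# Metacharacters such as . and + are literal inside a class
literal_meta = re.findall(r'[.+]', '1.5+2')
print(literal_meta)     # ['.', '+']

# A caret that is not first is literal
literal_caret = re.findall(r'[a^]', 'a^b')
print(literal_caret)    # ['a', '^']

# A hyphen at the start (or end) is literal
literal_hyphen = re.findall(r'[-ab]', 'a-b-c')
print(literal_hyphen)   # ['a', '-', 'b', '-']

# A closing bracket placed first is literal
literal_bracket = re.findall(r'[]x]', 'x]y')
print(literal_bracket)  # ['x', ']']

# Backslash escapes still work: this class matches ] or a backslash
escaped = re.findall(r'[\]\\]', r'a]b\c')
print(escaped)          # [']', '\\']
```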

Shorthand character classes

Pattern Equivalent Matches
\d [0-9] Any digit
\D [^0-9] Any non-digit
\w [a-zA-Z0-9_] Any word character (letter, digit, or underscore)
\W [^a-zA-Z0-9_] Any non-word character
\s [ \t\n\r\f\v] Any whitespace character
\S [^ \t\n\r\f\v] Any non-whitespace character

Note

For str patterns in Python 3, Unicode matching is the default (the re.UNICODE flag is redundant), so \d, \w, and \s also match their Unicode equivalents. The ASCII sets in the Equivalent column apply exactly only when the re.ASCII flag is set.
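A brief sketch of the difference between the default Unicode matching and re.ASCII follows. The sample strings are hypothetical, and the non-ASCII characters are written as escapes so the code does not depend on the file's encoding.

```python
import re

# ASCII digits followed by Arabic-Indic digits (U+0664, U+0662)
text = '42 \u0664\u0662'

unicode_digits = re.findall(r'\d+', text)
ascii_digits = re.findall(r'\d+', text, re.ASCII)
print(len(unicode_digits))   # 2 (both digit runs match)
print(ascii_digits)          # ['42']

# 'cafe' with a precomposed e-acute (U+00E9) as the last letter
word = 'caf\u00e9'

unicode_word = re.findall(r'\w+', word)
ascii_word = re.findall(r'\w+', word, re.ASCII)
print(unicode_word == [word])   # True
print(ascii_word)               # ['caf']
```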

Quantifiers

Quantifiers control how many times the preceding element is matched.

Greedy quantifiers

Greedy quantifiers match as much text as possible.

Pattern Matches
* Zero or more times
+ One or more times
? Zero or one time
{n} Exactly n times
{n,} At least n times
{n,m} Between n and m times (inclusive)
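Each quantifier in the table can be tried on a small input. The sketch below uses hypothetical sample strings. The word boundaries (\b) in the counted examples are there so that a long digit run is not matched in part.

```python
import re

optional = re.findall(r'colou?r', 'color colour')
print(optional)       # ['color', 'colour']

zero_or_more = re.findall(r'ab*', 'a ab abbb')
print(zero_or_more)   # ['a', 'ab', 'abbb']

one_or_more = re.findall(r'ab+', 'a ab abbb')
print(one_or_more)    # ['ab', 'abbb']

numbers = '12 2024 123456'

exactly = re.findall(r'\b\d{4}\b', numbers)
print(exactly)        # ['2024']

at_least = re.findall(r'\b\d{3,}\b', numbers)
print(at_least)       # ['2024', '123456']

bounded = re.findall(r'\b\d{2,4}\b', numbers)
print(bounded)        # ['12', '2024']
```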

Lazy quantifiers

Lazy quantifiers match as little text as possible. They are created by appending ? to a greedy quantifier.

Pattern Matches
*? Zero or more times (lazy)
+? One or more times (lazy)
?? Zero or one time (lazy)
{n,}? At least n times (lazy)
{n,m}? Between n and m times (lazy)
import re

text = '<b>bold</b>'

re.search(r'<.+>', text).group()     # '<b>bold</b>' (greedy)
re.search(r'<.+?>', text).group()    # '<b>' (lazy)

Groups

Capturing groups

Pattern Description
(...) Create a capturing group. The matched text is accessible through .group(n).
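As a quick illustration of .group(n), the sketch below captures the parts of a date. The date string is a hypothetical example.

```python
import re

date_match = re.search(r'(\d{4})-(\d{2})-(\d{2})', 'Released 2024-05-17')

whole = date_match.group(0)    # the entire match
year = date_match.group(1)     # first group
day = date_match.group(3)      # third group
parts = date_match.groups()    # all groups as a tuple

print(whole)   # 2024-05-17
print(year)    # 2024
print(day)     # 17
print(parts)   # ('2024', '05', '17')
```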

Named groups

Pattern Description
(?P<name>...) Create a named capturing group. Accessible through .group('name') or .groupdict().
(?P=name) Backreference to a named group within the same pattern.
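Named groups can be read back by name or by number. This is a minimal sketch; the group names key and value and the input string are hypothetical.

```python
import re

named_match = re.search(r'(?P<key>\w+)=(?P<value>\w+)', 'mode=fast')

key = named_match.group('key')
by_number = named_match.group(2)     # named groups are still numbered
as_dict = named_match.groupdict()

print(key)         # mode
print(by_number)   # fast
print(as_dict)     # {'key': 'mode', 'value': 'fast'}
```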

Non-capturing groups

Pattern Description
(?:...) Group without capturing. Useful for applying quantifiers to a group.
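The difference between capturing and non-capturing groups is most visible with re.findall, which returns group contents when a pattern has a capturing group. A small sketch with a hypothetical input:

```python
import re

text = 'ababab ab'

# With one capturing group, findall returns the group's (last) captured text
capturing = re.findall(r'(ab)+', text)
print(capturing)        # ['ab', 'ab']

# With a non-capturing group, findall returns the whole match
non_capturing = re.findall(r'(?:ab)+', text)
print(non_capturing)    # ['ababab', 'ab']
```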

Backreferences

Pattern Description
\1, \2, and so on Match the same text as the corresponding numbered group.
(?P=name) Match the same text as the named group name.
import re

# Backreference: match repeated words
re.search(r'\b(\w+)\s+\1\b', 'the the cat').group()
# 'the the'

# Named backreference
re.search(r'(?P<word>\w+)\s+(?P=word)', 'the the cat').group()
# 'the the'

Alternation

Pattern Description
a|b Match either a or b. Alternation has the lowest precedence of all operators.
import re

re.findall(r'cat|dog', 'I have a cat and a dog')
# ['cat', 'dog']

# Use groups to limit the scope of alternation
re.findall(r'col(?:ou|o)r', 'colour and color')
# ['colour', 'color']
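Because alternation has the lowest precedence, anchors bind to a single alternative unless the alternation is grouped. The sketch below shows the difference on a hypothetical input.

```python
import re

# Without a group this reads as: "^cat" OR "dog$"
loose = re.search(r'^cat|dog$', 'catalog')
print(loose.group())    # cat

# Grouping applies both anchors to the whole alternation
strict = re.search(r'^(?:cat|dog)$', 'catalog')
print(strict)           # None
```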

Lookahead and lookbehind

Lookahead and lookbehind assertions match a position without consuming characters. They are sometimes called zero-width assertions.

Lookahead

Pattern Description
(?=...) Positive lookahead: matches if ... matches next, without consuming.
(?!...) Negative lookahead: matches if ... does not match next.
import re

# Positive lookahead: find words followed by a colon
re.findall(r'\w+(?=:)', 'name: Alice age: 30')
# ['name', 'age']

# Negative lookahead: find whole words NOT followed by a colon
# (the trailing \b prevents partial matches such as 'nam')
re.findall(r'\w+(?!:)\b', 'name: Alice age: 30')
# ['Alice', '30']

Lookbehind

Pattern Description
(?<=...) Positive lookbehind: matches if ... matches immediately before the current position.
(?<!...) Negative lookbehind: matches if ... does not match immediately before.

Warning

Lookbehind patterns must be fixed-length in Python. You cannot use variable-length quantifiers (*, +, ?, {n,}, or {n,m} where n and m differ) inside a lookbehind.

import re

# Positive lookbehind: find numbers preceded by £
re.findall(r'(?<=£)\d+\.?\d*', 'Prices: £5.99 and £12')
# ['5.99', '12']

# Negative lookbehind: find digit runs NOT immediately preceded by £
re.findall(r'(?<!£)\b\d+\.?\d*', 'Prices: £5.99 and 12 items')
# ['99', '12']
# '99' is the fractional part of £5.99: the character before it is '.', not '£'
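The fixed-length restriction on lookbehind is enforced when the pattern is compiled. The sketch below shows one way to observe it; the exact error message is not guaranteed and may differ between Python versions.

```python
import re

# A fixed-length lookbehind compiles and works
fixed = re.findall(r'(?<=ab)\d', 'ab1 b2')
print(fixed)    # ['1']

# A variable-length lookbehind (a+) is rejected at compile time
try:
    re.compile(r'(?<=a+)\d')
    compiled = True
except re.error as exc:
    compiled = False
    print('rejected:', exc)
```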

Conditional patterns

Pattern Description
(?(id)yes|no) Match yes pattern if group id matched, otherwise match no pattern. The no part is optional.
import re

# Match an optionally quoted word
pattern = re.compile(r'(")?(\w+)(?(1)")')
print(pattern.search('"hello"').group())   # "hello"
print(pattern.search('hello').group())     # hello
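With search, an unbalanced quote is simply skipped over. Using fullmatch on the same pattern shows the conditional rejecting unbalanced input. This is a sketch on hypothetical inputs.

```python
import re

pattern = re.compile(r'(")?(\w+)(?(1)")')

# fullmatch requires the entire string to match the pattern
balanced = pattern.fullmatch('"hello"')
bare = pattern.fullmatch('hello')
missing_close = pattern.fullmatch('"hello')
missing_open = pattern.fullmatch('hello"')

print(balanced is not None)   # True
print(bare is not None)       # True
print(missing_close)          # None
print(missing_open)           # None
```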

Special sequences summary

Sequence Description
\d Digit
\D Non-digit
\w Word character
\W Non-word character
\s Whitespace
\S Non-whitespace
\b Word boundary
\B Non-word boundary
\A Start of string
\Z End of string
\1 ... \9 Backreference to groups 1–9
\n, \t, \r Newline, tab, carriage return (the re module interprets these escapes itself, so they also work in raw strings, for example r'\n')
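The sketch below illustrates why raw strings are the usual choice for patterns. Escapes such as \n work either way, but \b means a backspace character in an ordinary Python string. The inputs are hypothetical samples.

```python
import re

# In a raw string the backslash is kept, so the regex engine receives \n
raw_len = len(r'\n')      # 2: a backslash and the letter n
plain_len = len('\n')     # 1: an actual newline character

# Both forms match a newline, because re interprets the escape itself
raw_match = re.search(r'a\nb', 'a\nb')
plain_match = re.search('a\nb', 'a\nb')
print(raw_match is not None, plain_match is not None)   # True True

# \b differs: in an ordinary string it is a backspace, not a word boundary
raw_boundary = re.findall(r'\bcat\b', 'cat concatenate')
plain_boundary = re.findall('\bcat\b', 'cat concatenate')
print(raw_boundary)     # ['cat']
print(plain_boundary)   # []
```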