Groups and capturing¶
In this tutorial, you will learn how to use parentheses to group parts of a pattern and extract specific portions of matched text. Groups are one of the most useful features in regular expressions, allowing you to organise patterns and pull out the exact data you need.
Time commitment: 15–20 minutes
Prerequisites:
- Completion of Character classes and quantifiers
- Basic Python knowledge (strings, variables, and functions)
Learning objectives¶
By the end of this tutorial, you will be able to:
- Use parentheses to create capturing groups
- Extract matched text with .group(), .groups(), and .groupdict()
- Use named groups with (?P<name>...)
- Apply non-capturing groups with (?:...)
- Use the alternation operator | with groups
import re
Capturing groups¶
Capturing groups are created by enclosing part of a pattern in parentheses (...). They serve two purposes:
- They group elements together so that quantifiers apply to the whole group
- They capture the matched text so you can extract it later
Let us start with a simple example: extracting the day, month, and year from a date.
text = 'The event is on 25/12/2026'
match = re.search(r'(\d{2})/(\d{2})/(\d{4})', text)
if match:
    print(f'Full match: {match.group()}')
    print(f'Day: {match.group(1)}')
    print(f'Month: {match.group(2)}')
    print(f'Year: {match.group(3)}')
Each pair of parentheses creates a numbered group, starting from 1. Calling .group(0) (or simply .group()) returns the entire match, whilst .group(1), .group(2), and so on return the text captured by each group.
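When groups are nested, the numbering follows the position of each opening parenthesis, reading left to right. The short sketch below wraps the date pattern in an extra outer group to illustrate this; the variable names are only illustrative.

```python
import re

# Groups are numbered by the position of their opening parenthesis,
# so the outer group is 1 and the groups nested inside it are 2, 3, and 4.
text = 'The event is on 25/12/2026'
match = re.search(r'((\d{2})/(\d{2})/(\d{4}))', text)

whole_date = match.group(1)        # outer group: the whole date
day = match.group(2)               # first nested group: the day
day_and_year = match.group(2, 4)   # several groups at once, returned as a tuple

print(whole_date)     # 25/12/2026
print(day)            # 25
print(day_and_year)   # ('25', '2026')
```

Passing several numbers to .group() is a convenient way to pick out just the groups you need.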
You can also use .groups() to get all captured groups as a tuple.
text = 'The event is on 25/12/2026'
match = re.search(r'(\d{2})/(\d{2})/(\d{4})', text)
if match:
    day, month, year = match.groups()
    print(f'Date: {day}/{month}/{year}')
    print(f'All groups: {match.groups()}')
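A group that does not take part in the match appears as None in the tuple returned by .groups(). The sketch below makes the year optional to show this, and uses the default argument of .groups() to substitute a placeholder. The sample text and the 'unknown' placeholder are assumptions made for illustration.

```python
import re

# The year is optional here, so group 3 may not take part in the match.
pattern = re.compile(r'(\d{2})/(\d{2})(?:/(\d{4}))?')

match = pattern.search('Due on 25/12')
without_default = match.groups()                  # missing group is None
with_default = match.groups(default='unknown')    # missing group is replaced

print(without_default)   # ('25', '12', None)
print(with_default)      # ('25', '12', 'unknown')
```

Checking for None (or supplying a default) avoids surprises when you unpack the tuple into variables.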
Named groups¶
Numbered groups can be hard to read, especially with many groups. Named groups solve this by letting you assign a name to each group using the syntax (?P<name>...).
text = 'The event is on 25/12/2026'
match = re.search(r'(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})', text)
if match:
    print(f'Day: {match.group("day")}')
    print(f'Month: {match.group("month")}')
    print(f'Year: {match.group("year")}')
Named groups are especially useful when working with complex patterns. You can also access them through .groupdict(), which returns a dictionary of all named groups.
text = 'Contact: Alice Smith, alice@example.com'
pattern = re.compile(
    r'(?P<name>[A-Z][a-z]+ [A-Z][a-z]+), (?P<email>[\w.]+@[\w.]+)'
)
match = pattern.search(text)
if match:
    details = match.groupdict()
    print(f'Group dictionary: {details}')
    print(f'Name: {details["name"]}')
    print(f'Email: {details["email"]}')
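Named groups are still numbered like ordinary capturing groups, so the same text can be reached by name, by number, or by subscripting the match object (subscripting needs Python 3.6 or later). A minimal sketch, reusing the date pattern from earlier:

```python
import re

text = 'The event is on 25/12/2026'
match = re.search(r'(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})', text)

by_name = match.group('year')    # access by group name
by_number = match.group(3)       # named groups still receive a number
by_subscript = match['year']     # subscript access on the match object

print(by_name, by_number, by_subscript)   # 2026 2026 2026
```

Access by name is usually the most readable choice, because it keeps working if you later add or reorder groups.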
Non-capturing groups¶
Sometimes you need parentheses for grouping (for example, to apply a quantifier to a group of characters) but you do not need to capture the matched text. Use (?:...) to create a non-capturing group.
# Capturing group: the prefix is captured
text = 'http://example.com'
match = re.search(r'(https?)://(.+)', text)
if match:
    print(f'Groups with capturing: {match.groups()}')
# Non-capturing group: the prefix is not captured
match = re.search(r'(?:https?)://(.+)', text)
if match:
    print(f'Groups with non-capturing: {match.groups()}')
In the first example, match.groups() returns two items (the protocol and the domain). In the second example, only the domain is captured because the protocol group uses (?:...). Non-capturing groups are useful for keeping your group numbering clean.
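The effect on numbering is easier to see when a group exists only to carry a quantifier. In the sketch below, an optional 'www.' prefix is wrapped in a group so that ? applies to all four characters. The URL and the variable names are assumptions made for illustration.

```python
import re

text = 'Visit https://www.example.com today'

# Every pair of parentheses captures, so the host ends up in group 3.
capturing = re.search(r'(https?)://(www\.)?([\w.]+)', text)

# Only the host is captured, so it is simply group 1.
non_capturing = re.search(r'(?:https?)://(?:www\.)?([\w.]+)', text)

print(capturing.groups())        # ('https', 'www.', 'example.com')
print(non_capturing.groups())    # ('example.com',)
```

With the non-capturing version, adding or removing an optional prefix does not change the number of the group you actually care about.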
Alternation with groups¶
The alternation operator | works like a logical OR. When combined with groups, it lets you match one of several alternatives.
# Match different file extensions
filenames = ['report.pdf', 'image.png', 'data.csv', 'script.py', 'notes.txt']
pattern = re.compile(r'\w+\.(?:pdf|png|csv)')
for name in filenames:
    if pattern.fullmatch(name):
        print(f'"{name}" → matched')
    else:
        print(f'"{name}" → not matched')
# Use a capturing group to also extract the extension
pattern = re.compile(r'(\w+)\.(pdf|png|csv)')
for name in filenames:
    match = pattern.fullmatch(name)
    if match:
        print(f'"{name}" → base: "{match.group(1)}", extension: "{match.group(2)}"')
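Grouping matters with alternation because | has the lowest precedence of any regex operator: without parentheses it splits the whole pattern in two. The sketch below contrasts an ungrouped and a grouped pattern on a small, made-up word list.

```python
import re

words = ['gray', 'grey', 'gra', 'ey']

ungrouped = re.compile(r'gra|ey')       # means 'gra' OR 'ey'
grouped = re.compile(r'gr(?:a|e)y')     # means 'gr', then 'a' or 'e', then 'y'

ungrouped_hits = [w for w in words if ungrouped.fullmatch(w)]
grouped_hits = [w for w in words if grouped.fullmatch(w)]

print(ungrouped_hits)   # ['gra', 'ey']
print(grouped_hits)     # ['gray', 'grey']
```

If an alternation matches more or less than you expect, checking where the group boundaries sit is a good first step.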
Groups with quantifiers¶
You can apply quantifiers to groups to repeat the entire group pattern.
# Match an IP address (simplified)
text = 'Server IP: 192.168.1.100'
match = re.search(r'(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})', text)
if match:
    print(f'Full IP: {match.group()}')
    print(f'Octets: {match.groups()}')
# Use a non-capturing group with a quantifier to match repeated patterns
# Match a reference made of three-character segments, such as "ABC-123-XYZ"
text = 'Reference: ABC-123-XYZ'
match = re.search(r'[A-Z]{3}(?:-[A-Z0-9]{3}){2}', text)
if match:
    print(f'Reference found: {match.group()}')
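One detail worth knowing before you put a quantifier on a capturing group: the group keeps only the text from its last repetition. The sketch below swaps the non-capturing group in the reference pattern for a capturing one to show this. Splitting the full match on '-' is just one possible way to recover every segment.

```python
import re

text = 'Reference: ABC-123-XYZ'
match = re.search(r'[A-Z]{3}(-[A-Z0-9]{3}){2}', text)

last_repeat = match.group(1)          # only the final repetition is kept
segments = match.group().split('-')   # recover every segment from the full match

print(last_repeat)   # -XYZ
print(segments)      # ['ABC', '123', 'XYZ']
```

If you need each repetition separately, capture them with individual groups (as in the IP address example) or post-process the full match.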
Groups with re.findall()¶
When you use re.findall() with a pattern that contains capturing groups, it returns the captured groups rather than the full matches. This is a common source of confusion, but it is also very useful for extracting data.
text = 'Dates: 25/12/2026, 01/01/2027, 14/02/2027'
# Without groups: returns full matches
print('Without groups:', re.findall(r'\d{2}/\d{2}/\d{4}', text))
# With one group: returns list of captured strings
print('One group (year):', re.findall(r'\d{2}/\d{2}/(\d{4})', text))
# With multiple groups: returns list of tuples
print('Multiple groups:', re.findall(r'(\d{2})/(\d{2})/(\d{4})', text))
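If a group is there only for alternation or a quantifier, a non-capturing group keeps re.findall() returning full matches. When you want both the full match and the captured text, re.finditer() yields match objects instead. A minimal sketch with made-up file names:

```python
import re

text = 'Files: report.pdf, image.png, notes.txt'

# Capturing group: findall returns only the captured extension
captured_only = re.findall(r'\w+\.(pdf|png)', text)

# Non-capturing group: findall returns the full matches again
full_matches = re.findall(r'\w+\.(?:pdf|png)', text)

# finditer yields match objects, so both are available
both = [(m.group(), m.group(1)) for m in re.finditer(r'\w+\.(pdf|png)', text)]

print(captured_only)   # ['pdf', 'png']
print(full_matches)    # ['report.pdf', 'image.png']
print(both)            # [('report.pdf', 'pdf'), ('image.png', 'png')]
```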
A practical example: parsing log entries¶
Let us combine everything to parse a realistic log entry.
log_entry = '2026-02-09 14:30:45 [ERROR] Connection timeout after 30s'
pattern = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2})\s'
    r'(?P<time>\d{2}:\d{2}:\d{2})\s'
    r'\[(?P<level>\w+)\]\s'
    r'(?P<message>.+)'
)
match = pattern.search(log_entry)
if match:
    info = match.groupdict()
    for key, value in info.items():
        print(f'{key:>10}: {value}')
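The same pattern can be reused across many lines. The sketch below wraps it in a small helper and filters the results by level. The helper name parse_log_line and the extra sample lines are assumptions made for illustration, not part of any real log format.

```python
import re

LOG_PATTERN = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2})\s'
    r'(?P<time>\d{2}:\d{2}:\d{2})\s'
    r'\[(?P<level>\w+)\]\s'
    r'(?P<message>.+)'
)


def parse_log_line(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical sample data: two well-formed entries and one malformed line
sample_lines = [
    '2026-02-09 14:30:45 [ERROR] Connection timeout after 30s',
    '2026-02-09 14:31:02 [INFO] Retrying connection',
    'not a log line',
]

parsed = [parse_log_line(line) for line in sample_lines]
errors = [entry['message'] for entry in parsed
          if entry and entry['level'] == 'ERROR']

print(errors)   # ['Connection timeout after 30s']
```

Returning None for lines that do not match lets the caller decide whether to skip, count, or report malformed entries.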
Exercise 1¶
Use named groups to extract the hours and minutes from the following text.
text = 'Meeting at 14:30'
# Your code here
Exercise 2¶
Use re.findall() with capturing groups to extract all the names and ages from the following text.
text = 'Alice is 30, Bob is 25, and Charlie is 35'
# Your code here
Exercise 3¶
Write a pattern using alternation to match both 'colour' and 'color'. Use a non-capturing group to avoid capturing the optional 'u'.
texts = ['I like this colour', 'I like this color', 'colourful display']
# Your code here
Solutions¶
# Exercise 1
text = 'Meeting at 14:30'
match = re.search(r'(?P<hours>\d{2}):(?P<minutes>\d{2})', text)
if match:
    print(f'Hours: {match.group("hours")}')
    print(f'Minutes: {match.group("minutes")}')
# Exercise 2
text = 'Alice is 30, Bob is 25, and Charlie is 35'
results = re.findall(r'(\w+) is (\d+)', text)
for name, age in results:
    print(f'{name} is {age} years old')
# Exercise 3
texts = ['I like this colour', 'I like this color', 'colourful display']
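# Alternation inside a non-capturing group, as the exercise asks
# (the shorter pattern r'colou?r' matches the same text)
pattern = re.compile(r'col(?:our|or)')
for text in texts:
    match = pattern.search(text)
    if match:
        print(f'"{text}" → found "{match.group()}"')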
Summary¶
In this tutorial, you learned:
- Capturing groups: Use (...) to capture matched text and access it with .group(n) or .groups()
- Named groups: Use (?P<name>...) for readable patterns and access groups with .group("name") or .groupdict()
- Non-capturing groups: Use (?:...) when you need grouping but do not need to capture the text
- Alternation: Use | inside groups to match one of several alternatives
- Groups with re.findall(): When groups are present, re.findall() returns captured groups rather than full matches
Next steps¶
In the next tutorial, Find and replace, you will learn how to use re.sub(), re.findall(), re.finditer(), and backreferences to search, extract, and transform text.