Find and replace¶
In this tutorial, you will learn how to find, extract, and transform text with regular expressions. You will use re.findall(), re.finditer(), re.sub() (with backreferences and replacement functions), re.split(), and re.subn() to manipulate text.
Time commitment: 15–20 minutes
Prerequisites:
- Completion of Groups and capturing
- Basic Python knowledge (strings, variables, and functions)
Learning objectives¶
By the end of this tutorial, you will be able to:
- Use re.findall() to extract all matches from a string
- Use re.finditer() to iterate over matches with full match object details
- Use re.sub() to replace matched text
- Use backreferences in replacement strings
- Use re.split() to split strings on patterns
- Use functions as replacement arguments in re.sub()
import re
Finding all matches with re.findall()¶
You have already seen re.search(), which finds the first match. The re.findall() function finds all non-overlapping matches and returns them as a list of strings.
text = 'The prices are £5.99, £12.50, and £100.00'
# Find all prices
prices = re.findall(r'£\d+\.\d{2}', text)
print(f'Prices found: {prices}')
# Find all words that start with a capital letter
text = 'Alice and Bob visited London and Paris last Summer'
capitalised_words = re.findall(r'\b[A-Z][a-z]+\b', text)
print(f'Capitalised words: {capitalised_words}')
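To make "non-overlapping" concrete, here is a minimal sketch using illustrative strings that are not part of the tutorial's data. Once characters have been consumed by one match, they cannot start another, and a pattern that matches nothing gives an empty list rather than None.

```python
import re

# Each match consumes its characters, so 'aaaa' yields two matches of 'aa',
# not three overlapping ones.
doubled = re.findall(r'aa', 'aaaa')
print(doubled)  # ['aa', 'aa']

# When nothing matches, the result is an empty list.
nothing = re.findall(r'\d+', 'no digits here')
print(nothing)  # []
```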
Remember from the previous tutorial: when the pattern contains capturing groups, re.findall() returns the captured groups rather than the full matches.
text = 'Contact: alice@example.com, bob@test.co.uk'
# Without groups: returns full matches
print('Full matches:', re.findall(r'[\w.]+@[\w.]+', text))
# With groups: returns only the captured parts
print('Domains only:', re.findall(r'[\w.]+@([\w.]+)', text))
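The same rule extends to patterns with more than one capturing group: re.findall() then returns a list of tuples, one tuple per match. The sketch below reuses the email text from above; the variable names are illustrative only. It also shows that a non-capturing group, (?:...), does not change what re.findall() returns.

```python
import re

text = 'Contact: alice@example.com, bob@test.co.uk'

# Two capturing groups: each result is a (user, domain) tuple
user_domain = re.findall(r'([\w.]+)@([\w.]+)', text)
print(user_domain)
# [('alice', 'example.com'), ('bob', 'test.co.uk')]

# Non-capturing groups (?:...) group without capturing,
# so findall() returns the full matches again
full = re.findall(r'(?:[\w.]+)@(?:[\w.]+)', text)
print(full)
# ['alice@example.com', 'bob@test.co.uk']
```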
Iterating over matches with re.finditer()¶
The re.finditer() function returns an iterator of match objects, giving you access to the full match details (position, groups, and so on) for each match. This is more powerful than re.findall() when you need more than just the matched text.
text = 'Order #101 placed on 15/01/2026, Order #102 placed on 20/01/2026'
for match in re.finditer(r'Order #(\d+)', text):
    print(f'Found "{match.group()}" at position {match.start()}-{match.end()}')
    print(f' Order number: {match.group(1)}')
# Using finditer() with named groups for structured extraction
log_text = """2026-02-09 14:30:00 [INFO] Server started
2026-02-09 14:31:15 [WARNING] High memory usage
2026-02-09 14:32:00 [ERROR] Connection refused"""
pattern = re.compile(
    r'(?P<date>[\d-]+) (?P<time>[\d:]+) \[(?P<level>\w+)\] (?P<message>.+)'
)
for match in pattern.finditer(log_text):
    info = match.groupdict()
    print(f'[{info["level"]:>7}] {info["time"]} - {info["message"]}')
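Because each item from re.finditer() is a match object, you can combine the captured groups with positional details in one pass. The following is a sketch under simplifying assumptions: it uses a shortened pattern (level and message only) and hypothetical names such as records and errors.

```python
import re

log_text = """2026-02-09 14:30:00 [INFO] Server started
2026-02-09 14:31:15 [WARNING] High memory usage
2026-02-09 14:32:00 [ERROR] Connection refused"""

# Simplified pattern for illustration: capture only the level and message
pattern = re.compile(r'\[(?P<level>\w+)\] (?P<message>.+)')

# Build one dictionary per match, adding the match position via span()
records = [
    {**match.groupdict(), 'span': match.span()}
    for match in pattern.finditer(log_text)
]

# Filter the structured records like any other list of dictionaries
errors = [r['message'] for r in records if r['level'] == 'ERROR']
print(errors)  # ['Connection refused']
```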
Replacing text with re.sub()¶
The re.sub() function replaces all occurrences of a pattern with a replacement string. Its basic syntax is:
re.sub(pattern, replacement, string)
text = 'The colour of the colour wheel is colourful'
# Replace 'colour' with 'color'
result = re.sub(r'colour', 'color', text)
print(result)
# Remove all digits from a string
text = 'Room 42, Floor 3, Building 7'
result = re.sub(r'\d+', '', text)
print(f'Without digits: "{result}"')
# Replace multiple whitespace characters with a single space
text = 'Too   many    spaces     here'
result = re.sub(r'\s+', ' ', text)
print(f'Cleaned: "{result}"')
Limiting replacements with count¶
You can limit the number of replacements using the count parameter.
text = 'one two three four five'
# Replace only the first two words with 'X'
result = re.sub(r'\w+', 'X', text, count=2)
print(result)
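As a small sketch of how count behaves at its edges: the default value, count=0, means "replace every match", and count=1 touches only the leftmost match. Passing count by keyword, as here, also avoids accidentally supplying a flag such as re.IGNORECASE in the count position.

```python
import re

text = 'one two three four five'

# count=0 is the default and means "replace every match"
all_replaced = re.sub(r'\w+', 'X', text, count=0)
print(all_replaced)  # X X X X X

# count=1 replaces only the leftmost match
first_only = re.sub(r'\w+', 'X', text, count=1)
print(first_only)  # X two three four five
```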
Backreferences in replacements¶
Backreferences allow you to refer to captured groups in the replacement string. Use \1, \2, and so on for numbered groups, or \g<name> for named groups.
# Swap first name and last name
text = 'Smith, Alice'
result = re.sub(r'(\w+), (\w+)', r'\2 \1', text)
print(result)
# Convert dates from DD/MM/YYYY to YYYY-MM-DD using named groups
text = 'Dates: 25/12/2026, 01/01/2027'
result = re.sub(
    r'(?P<day>\d{2})/(?P<month>\d{2})/(?P<year>\d{4})',
    r'\g<year>-\g<month>-\g<day>',
    text,
)
print(result)
# Wrap all email addresses in angle brackets
text = 'Contact alice@example.com or bob@test.co.uk'
result = re.sub(r'([\w.]+@[\w.]+)', r'<\1>', text)
print(result)
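The \g<...> form also accepts a group number, which matters when a backreference is followed directly by a digit. In the sketch below (the sample text is made up for illustration), r'\10' would be read as a reference to group 10, whereas r'\g<1>0' means "group 1, then a literal 0".

```python
import re

text = 'Item 7 and item 9'

# Goal: append a literal 0 to every digit.
# \g<1>0 is unambiguous: group 1 followed by the character '0'.
result = re.sub(r'(\d)', r'\g<1>0', text)
print(result)  # Item 70 and item 90

# r'\10' refers to group 10, which this pattern does not have,
# so the call raises re.error instead of doing the replacement.
try:
    re.sub(r'(\d)', r'\10', text)
    raised = False
except re.error:
    raised = True
print(raised)
```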
Using functions as replacements¶
For more complex replacements, you can pass a function as the replacement argument. The function receives a match object and must return the replacement string.
def double_number(match: re.Match) -> str:
    """Double the matched number."""
    number = int(match.group())
    return str(number * 2)
text = 'I have 3 cats and 5 dogs'
result = re.sub(r'\d+', double_number, text)
print(result)
# Convert temperatures from Fahrenheit to Celsius
def fahrenheit_to_celsius(match: re.Match) -> str:
    """Convert a Fahrenheit temperature to Celsius."""
    fahrenheit = float(match.group(1))
    celsius = (fahrenheit - 32) * 5 / 9
    return f'{celsius:.1f}°C'
text = 'Today: 68°F, Tomorrow: 77°F, Next week: 50°F'
result = re.sub(r'(\d+)°F', fahrenheit_to_celsius, text)
print(result)
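A replacement function is also a convenient place for a dictionary lookup. The sketch below assumes a small, hypothetical abbreviation table (expansions) and a hypothetical helper (expand); abbreviations that are not in the table are returned unchanged.

```python
import re

# Hypothetical abbreviation table, made up for this example
expansions = {'UK': 'United Kingdom', 'EU': 'European Union'}

def expand(match: re.Match) -> str:
    """Look up the matched abbreviation, leaving unknown ones unchanged."""
    word = match.group()
    return expansions.get(word, word)

text = 'The UK left the EU, says the BBC'
# Match whole words made of two or more capital letters
result = re.sub(r'\b[A-Z]{2,}\b', expand, text)
print(result)
# The United Kingdom left the European Union, says the BBC
```

Returning the original match text for unknown keys keeps the function safe to apply to arbitrary input.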
Splitting strings with re.split()¶
The re.split() function splits a string at each occurrence of the pattern. This is more powerful than the built-in str.split() because you can split on patterns.
# Split on any whitespace (similar to str.split())
text = 'one two\tthree\nfour'
parts = re.split(r'\s+', text)
print(f'Split on whitespace: {parts}')
# Split on multiple delimiters (comma, semicolon, or pipe)
text = 'apple,banana;cherry|date'
parts = re.split(r'[,;|]', text)
print(f'Split on delimiters: {parts}')
# Split sentences on punctuation
text = 'First sentence. Second sentence! Third sentence? Fourth.'
sentences = re.split(r'[.!?]\s*', text)
# Filter out empty strings
sentences = [s for s in sentences if s]
print(f'Sentences: {sentences}')
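Two details of re.split() are worth seeing in isolation. This sketch uses illustrative strings that are not from the tutorial: the maxsplit parameter limits how many splits are made, and a delimiter at the very end of the string produces a trailing empty string, which is why the example above filters the result.

```python
import re

line = 'ERROR: disk full: /dev/sda1'

# maxsplit=1 stops after the first split, so later colons stay in the message
level, message = re.split(r':\s*', line, maxsplit=1)
print(level)    # ERROR
print(message)  # disk full: /dev/sda1

# A delimiter at the end of the string leaves an empty final element
parts = re.split(r'[.!?]\s*', 'Done. Finished!')
print(parts)  # ['Done', 'Finished', '']
```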
Keeping the delimiters¶
If you wrap the pattern in a capturing group, re.split() includes the delimiters in the result.
text = '3+5-2*4'
# Without capturing group: delimiters are removed
print('Without delimiters:', re.split(r'[+\-*]', text))
# With capturing group: delimiters are kept
print('With delimiters: ', re.split(r'([+\-*])', text))
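When the delimiters are kept, they alternate with the pieces between them, so slicing can separate the two. This is a minimal sketch of that idea; the names tokens, operands, and operators are illustrative.

```python
import re

tokens = re.split(r'([+\-*])', '3+5-2*4')
print(tokens)  # ['3', '+', '5', '-', '2', '*', '4']

# Captured delimiters sit at the odd indices,
# and the pieces between them at the even indices
operands = [int(t) for t in tokens[0::2]]
operators = tokens[1::2]
print(operands)   # [3, 5, 2, 4]
print(operators)  # ['+', '-', '*']
```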
Using re.subn() to count replacements¶
The re.subn() function works just like re.sub(), but it returns a tuple containing the new string and the number of replacements made.
text = 'foo bar foo baz foo'
result, count = re.subn(r'foo', 'qux', text)
print(f'Result: "{result}"')
print(f'Replacements made: {count}')
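One possible use of the count is to check whether a substitution changed anything at all. The sketch below reuses the text from above; the variable names (changed, unchanged_result, unchanged_count) are illustrative.

```python
import re

text = 'foo bar foo baz foo'

# A count above zero means at least one replacement happened
result, count = re.subn(r'foo', 'qux', text)
changed = count > 0
print(changed)  # True

# When the pattern never matches, the string comes back unchanged with a count of 0
unchanged_result, unchanged_count = re.subn(r'xyz', 'qux', text)
print(unchanged_count)           # 0
print(unchanged_result == text)  # True
```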
A practical example: cleaning messy data¶
Let us combine the techniques from this tutorial to clean up messy input data.
messy_data = """
Name:    Alice   Smith
Email:  alice@example.com
Phone:   01234   567890
Date: 25/12/2026
"""
# Step 1: Extract key-value pairs
pairs = re.findall(r'(\w+):\s*(.+?)\s*$', messy_data, re.MULTILINE)
print('Extracted pairs:')
for key, value in pairs:
    # Step 2: Clean up extra whitespace within values
    clean_value = re.sub(r'\s+', ' ', value.strip())
    print(f' {key}: {clean_value}')
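As one possible extension of the steps above, the extracted pairs can be collected into a dictionary and the date normalised with a backreference. This is a sketch, not the tutorial's own solution: the sample data is a shortened, illustrative variant, and lowercasing the keys and converting to YYYY-MM-DD are assumptions made for the example.

```python
import re

# Illustrative input with irregular spacing (shortened from the example above)
messy_data = """
Name:    Alice   Smith
Email:  alice@example.com
Date: 25/12/2026
"""

# Extract the pairs and collapse internal whitespace in one comprehension
record = {
    key.lower(): re.sub(r'\s+', ' ', value)
    for key, value in re.findall(r'(\w+):\s*(.+?)\s*$', messy_data, re.MULTILINE)
}

# Reorder DD/MM/YYYY into YYYY-MM-DD using numbered backreferences
record['date'] = re.sub(r'(\d{2})/(\d{2})/(\d{4})', r'\3-\2-\1', record['date'])
print(record)
# {'name': 'Alice Smith', 'email': 'alice@example.com', 'date': '2026-12-25'}
```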
Exercises¶
Exercise 1¶
Use re.findall() to extract all the hashtags from the text.
text = 'Learning #Python and #regex is great! #coding #programming'
# Your code here
Exercise 2¶
Use re.sub() with a backreference to convert the names from "Last, First" format to "First Last" format.
names = 'Smith, Alice\nJones, Bob\nBrown, Charlie'
# Your code here
Exercise 3¶
Write a replacement function for re.sub() that censors any number in the text by replacing each digit with *.
text = 'Card number: 1234 5678 9012, PIN: 4321'
# Your code here
Solutions¶
# Exercise 1
text = 'Learning #Python and #regex is great! #coding #programming'
hashtags = re.findall(r'#\w+', text)
print(f'Hashtags: {hashtags}')
# Exercise 2
names = 'Smith, Alice\nJones, Bob\nBrown, Charlie'
result = re.sub(r'(\w+), (\w+)', r'\2 \1', names)
print(result)
# Exercise 3
def censor_number(match: re.Match) -> str:
    """Replace each digit in the matched text with an asterisk."""
    return '*' * len(match.group())
text = 'Card number: 1234 5678 9012, PIN: 4321'
result = re.sub(r'\d+', censor_number, text)
print(result)
Summary¶
In this tutorial, you learned:
- re.findall(): Find all non-overlapping matches and return them as a list
- re.finditer(): Iterate over matches with full match object details
- re.sub(): Replace all occurrences of a pattern with a replacement string
- Backreferences: Use \1, \2, or \g<name> in replacement strings to refer to captured groups
- Function replacements: Pass a function to re.sub() for complex replacement logic
- re.split(): Split strings on regex patterns, with optional delimiter retention
- re.subn(): Replace and count the number of replacements made
Next steps¶
Congratulations — you have completed all four tutorials! You now have a solid foundation in Python regular expressions. From here, you can: