String searching
In this tutorial, you will learn how to search within strings and test for the presence of substrings. These skills are essential for tasks such as validating input, extracting information, and filtering text.
Time commitment: 15–20 minutes
Prerequisites:
- Completion of String formatting
- Understanding of string methods and indexing
Learning objectives
By the end of this tutorial, you will be able to:
- Use the in operator to test for substring presence
- Find the position of substrings with find() and index()
- Search from the right with rfind() and rindex()
- Test string beginnings and endings with startswith() and endswith()
- Count occurrences with count()
- Combine searching methods to solve practical problems
The in operator
The simplest way to check whether a substring exists within a string is the in operator. It returns True if the substring is found and False otherwise.
```python
sentence = "The quick brown fox jumps over the lazy dog"
print("fox" in sentence)  # True
print("cat" in sentence)  # False
```
The in operator is case-sensitive. If you need a case-insensitive search, convert both strings to the same case first.
```python
text = "Hello, World!"
print("hello" in text)          # False – case mismatch
print("hello" in text.lower())  # True – both lowercase
```
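The str.lower() approach works for most English text. Python also provides str.casefold(), which performs more aggressive case folding intended for caseless matching. For example, it converts the German "ß" to "ss". The sketch below wraps this in a small helper. The name contains_ignore_case() is hypothetical and is not a built-in.

```python
def contains_ignore_case(text: str, substring: str) -> bool:
    """Return True if substring occurs in text, ignoring case.

    Hypothetical helper: casefold both strings, then use the in operator.
    """
    return substring.casefold() in text.casefold()


print(contains_ignore_case("Hello, World!", "hello"))  # True
print("strasse" in "Straße".lower())                    # False – lower() keeps "ß"
print(contains_ignore_case("Straße", "STRASSE"))        # True – casefold() maps "ß" to "ss"
```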
You can also use not in to check for the absence of a substring.
```python
sentence = "The quick brown fox jumps over the lazy dog"
print("cat" not in sentence)  # True
```
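Because in and not in produce booleans, they fit naturally into list comprehensions for filtering text. The sketch below keeps only the lines that mention a keyword. The log lines and the keyword "ERROR" are made-up sample data.

```python
log_lines = [
    "INFO Server started",
    "ERROR Disk full",
    "INFO Request handled",
    "ERROR Connection lost",
]

# Keep the lines that contain "ERROR"
errors = [line for line in log_lines if "ERROR" in line]
print(errors)  # ['ERROR Disk full', 'ERROR Connection lost']

# Keep the lines that do not contain "ERROR"
others = [line for line in log_lines if "ERROR" not in line]
print(others)  # ['INFO Server started', 'INFO Request handled']
```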
Finding substrings
While in tells you whether a substring exists, str.find() tells you where it is. It returns the index of the first occurrence of the substring, or -1 if the substring is not found.
```python
sentence = "Python is a powerful programming language"
print(sentence.find("powerful"))  # 12
print(sentence.find("Ruby"))      # -1 – not found
```
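Because str.find() returns -1 rather than False when the substring is missing, avoid using its result directly as a condition. A match at index 0 is falsy and -1 is truthy, so the check is wrong in both directions. Compare the result against -1 instead, or use in if you only need a yes/no answer.

```python
sentence = "Python is a powerful programming language"

# Pitfall: "Python" is found at index 0, which is falsy
if sentence.find("Python"):
    print("found (wrong check)")
else:
    print("not found (wrong check)")  # this branch runs

# Correct: compare the result against -1
position = sentence.find("Python")
if position != -1:
    print(f"found at index {position}")  # found at index 0
```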
str.index() -- an alternative to str.find()
The str.index() method works the same as str.find(), but it raises a ValueError if the substring is not found. Use str.index() when you expect the substring to be present and want an error to alert you if it is not.
```python
sentence = "Python is a powerful programming language"
print(sentence.index("powerful"))  # 12
# Uncommenting the following line would raise a ValueError:
# sentence.index("Ruby")
```
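When you use str.index(), you can handle the missing-substring case explicitly with try and except. The sketch below uses a hypothetical helper, position_or_none(), that turns the ValueError into None.

```python
from typing import Optional


def position_or_none(text: str, substring: str) -> Optional[int]:
    """Return the index of substring in text, or None if it is absent."""
    try:
        return text.index(substring)
    except ValueError:
        # str.index() raises ValueError when the substring is missing
        return None


sentence = "Python is a powerful programming language"
print(position_or_none(sentence, "powerful"))  # 12
print(position_or_none(sentence, "Ruby"))      # None
```

In practice, str.find() with a -1 check achieves the same result. str.index() is most useful when a missing substring indicates a bug that should not be silently ignored.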
str.rfind() and str.rindex() -- searching from the right
The methods str.rfind() and str.rindex() work the same way as str.find() and str.index(), but they search from the right side of the string and return the index of the last occurrence of the substring. The returned index is still counted from the start of the string.
```python
path = "/home/user/documents/report.final.pdf"
print(path.find("."))   # 27 – first dot
print(path.rfind("."))  # 33 – last dot
```
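A common use of str.rfind() is splitting a path at its last separator. The sketch below assumes forward-slash separators. For real file system paths, the os.path and pathlib modules are more robust. The function name split_last() is hypothetical.

```python
from typing import Tuple


def split_last(path: str, separator: str = "/") -> Tuple[str, str]:
    """Split path into (head, tail) at the last occurrence of separator.

    If the separator does not occur, head is empty and tail is the whole path.
    """
    position = path.rfind(separator)
    if position == -1:
        return "", path
    return path[:position], path[position + len(separator):]


path = "/home/user/documents/report.final.pdf"
print(split_last(path))         # ('/home/user/documents', 'report.final.pdf')
print(split_last("notes.txt"))  # ('', 'notes.txt')
```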
Specifying start and end positions
All four methods -- str.find(), str.index(), str.rfind(), and str.rindex() -- accept optional start and end arguments to limit the search to a specific portion of the string.
```python
text = "banana"
# Find the first "a" starting from index 2
print(text.find("a", 2))     # 3
# Find "a" between index 2 and index 4
print(text.find("a", 2, 4))  # 3
# Find "a" between index 4 and index 5
print(text.find("a", 4, 5))  # -1 – not found in that range
```
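The start argument lets you find every occurrence of a substring, not just the first. After each match, search again from just past it. The sketch below uses a hypothetical helper, find_all(). Advancing by len(substring) gives non-overlapping matches, which is consistent with str.count(). Advancing by 1 also includes overlapping matches.

```python
from typing import List


def find_all(text: str, substring: str, overlapping: bool = False) -> List[int]:
    """Return the start index of every occurrence of substring in text."""
    if not substring:
        # An empty substring matches at every position, and the step would
        # be 0 in non-overlapping mode, so reject it explicitly.
        raise ValueError("substring must not be empty")
    step = 1 if overlapping else len(substring)
    positions = []
    start = text.find(substring)
    while start != -1:
        positions.append(start)
        start = text.find(substring, start + step)
    return positions


print(find_all("banana", "a"))                   # [1, 3, 5]
print(find_all("aaaa", "aa"))                    # [0, 2]
print(find_all("aaaa", "aa", overlapping=True))  # [0, 1, 2]
```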
Testing beginnings and endings
Python provides str.startswith() and str.endswith() for checking whether a string begins or ends with a particular substring.
```python
url = "https://www.python.org"
print(url.startswith("https"))    # True
print(url.startswith("http://"))  # False

filename = "report.pdf"
print(filename.endswith(".pdf"))   # True
print(filename.endswith(".docx"))  # False
```
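Like str.find(), both methods also accept optional start and end arguments, so you can test for a prefix at a position other than the beginning. The sketch below checks what follows the scheme in a URL. Computing the offset with len() is just one way to choose the start position.

```python
url = "https://www.python.org"
scheme = "https://"

if url.startswith(scheme):
    # Test for "www." immediately after the scheme
    print(url.startswith("www.", len(scheme)))  # True

# The end argument limits the test to url[:5], which is "https"
print(url.endswith("https", 0, 5))  # True
```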
Using tuples for multiple options
Both str.startswith() and str.endswith() accept a tuple of strings to test against. The method returns True if the string matches any of the options in the tuple.
```python
filename = "photo.jpg"
# Check for common image file extensions
is_image = filename.endswith((".jpg", ".jpeg", ".png", ".gif"))
print(is_image)  # True

# Check for common web protocols
url = "ftp://files.example.com"
is_web = url.startswith(("http://", "https://"))
print(is_web)  # False – this is an FTP URL
```
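Combining a tuple with str.lower() gives a case-insensitive extension check, which is handy for filtering a list of file names. The argument must be a tuple: passing a list raises TypeError. The file names and the IMAGE_EXTENSIONS constant below are illustrative.

```python
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")

filenames = ["photo.JPG", "notes.txt", "diagram.png", "archive.zip"]

# Lowercase each name so that "photo.JPG" matches ".jpg"
images = [name for name in filenames if name.lower().endswith(IMAGE_EXTENSIONS)]
print(images)  # ['photo.JPG', 'diagram.png']

try:
    "photo.jpg".endswith([".jpg", ".png"])  # a list, not a tuple
except TypeError as error:
    print("TypeError:", error)
```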
Counting occurrences
The str.count() method returns the number of non-overlapping occurrences of a substring within a string.
```python
text = "she sells sea shells by the sea shore"
print(text.count("sea"))  # 2
print(text.count("s"))    # 8
print(text.count("xyz"))  # 0

# Non-overlapping matches
text = "aaaa"
# Counting "aa" -- finds 2, not 3, because matches do not overlap
print(text.count("aa"))  # 2
```
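Keep in mind that str.count() counts substrings, not words, so "sea" also matches inside a longer word such as "seaside". To count whole words separated by whitespace, one option is to split the text first and then use the list's own count() method, as sketched below. This simple approach does not strip punctuation.

```python
text = "the sea and the seaside"

print(text.count("sea"))          # 2 – also matches inside "seaside"
print(text.split().count("sea"))  # 1 – counts whole words only
```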
Character testing methods
Python strings include several methods that test whether all characters in the string satisfy a particular condition. These methods return True or False and are especially useful for input validation.
```python
print("hello".isalpha())     # True – all alphabetic characters
print("hello123".isalpha())  # False – contains digits

print("12345".isdigit())  # True – all digit characters
print("12.34".isdigit())  # False – contains a dot

print("hello123".isalnum())   # True – all alphanumeric characters
print("hello 123".isalnum())  # False – contains a space

print("HELLO".isupper())  # True
print("Hello".isupper())  # False
print("hello".islower())  # True
print("Hello".islower())  # False

print("Hello World".istitle())  # True
print("Hello world".istitle())  # False
```
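Two details matter when you use these methods for validation. First, they all return False for an empty string. Second, str.isdigit() rejects signs and decimal points, so "-5" fails even though it is a valid integer. The hypothetical helper looks_like_integer() below strips an optional leading sign before testing. It is only a rough sketch. str.isdigit() also accepts some characters, such as superscript digits, that int() does not, so calling int() inside try/except is a more reliable way to confirm that a string can be converted.

```python
print("".isalpha(), "".isdigit(), "".isalnum())  # False False False
print("-5".isdigit())                            # False – "-" is not a digit


def looks_like_integer(text: str) -> bool:
    """Rough check: an optional leading + or -, followed by one or more digits."""
    if text.startswith(("+", "-")):
        text = text[1:]
    return text.isdigit()


print(looks_like_integer("-5"))   # True
print(looks_like_integer("42"))   # True
print(looks_like_integer("-"))    # False – no digits after the sign
print(looks_like_integer("4.2"))  # False
```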
Practical use: input validation
```python
def is_valid_username(username: str) -> bool:
    """Check whether a username is valid.

    A valid username contains only alphanumeric characters
    and is between 3 and 20 characters long.
    """
    return username.isalnum() and 3 <= len(username) <= 20


print(is_valid_username("alice42"))      # True
print(is_valid_username("ab"))           # False – too short
print(is_valid_username("hello world"))  # False – contains a space
print(is_valid_username("user@name"))    # False – contains @
```
Searching methods also help with extracting information. For example, str.rfind() can locate the last dot in a file name to find its extension:
```python
def get_extension(filename: str) -> str:
    """Return the file extension, including the leading dot.

    Returns an empty string if no extension is found.
    """
    dot_position = filename.rfind(".")
    if dot_position == -1:
        return ""
    return filename[dot_position:]


print(get_extension("report.pdf"))      # .pdf
print(get_extension("archive.tar.gz"))  # .gz
print(get_extension("README"))          # (empty string)
```
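For real file names, the standard library's os.path.splitext() performs the same last-dot split and also handles some edge cases. For example, it treats the leading dot in a hidden file name like ".bashrc" as part of the name rather than as an extension. The comparison below repeats get_extension() so that the block runs on its own.

```python
import os.path


def get_extension(filename: str) -> str:
    """Return the file extension, including the leading dot (rfind version)."""
    dot_position = filename.rfind(".")
    if dot_position == -1:
        return ""
    return filename[dot_position:]


print(get_extension("archive.tar.gz"))        # .gz
print(os.path.splitext("archive.tar.gz")[1])  # .gz

# A hidden file: the leading dot is not treated as an extension separator
print(repr(get_extension(".bashrc")))         # '.bashrc'
print(repr(os.path.splitext(".bashrc")[1]))   # ''
```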
Basic email format validation
```python
def is_valid_email(email: str) -> bool:
    """Perform a basic check on the format of an email address.

    Checks that the email contains exactly one @ symbol,
    has text before and after it, and that the domain contains
    a dot that is not at its start or end.
    """
    if email.count("@") != 1:
        return False
    local_part, domain = email.split("@")
    if not local_part or not domain:
        return False
    if "." not in domain:
        return False
    if domain.startswith(".") or domain.endswith("."):
        return False
    return True


print(is_valid_email("user@example.com"))   # True
print(is_valid_email("user@.com"))          # False
print(is_valid_email("user@@example.com"))  # False
print(is_valid_email("@example.com"))       # False
```
Counting words in text
```python
def count_words(text: str) -> int:
    """Count the number of words in a string.

    Words are separated by whitespace. Leading and trailing
    whitespace is ignored.
    """
    return len(text.split())


print(count_words("The quick brown fox"))  # 4
print(count_words(" lots of spaces "))     # 3
print(count_words(""))                     # 0
```
Exercises
Now it is time to practise what you have learned. Try to complete each exercise before looking at the solution.
Exercise 1: Extracting a domain name
Write a function called extract_domain() that takes a URL string and returns the domain name. For example, extract_domain("https://www.python.org/docs") should return "www.python.org". You can assume the URL always starts with "http://" or "https://".
```python
# Exercise 1: Write your solution here
```
Exercise 2: Counting vowels
Write a function called count_vowels() that takes a string and returns the number of vowels (a, e, i, o, and u) it contains. The function should be case-insensitive.
```python
# Exercise 2: Write your solution here
```
Exercise 3: Identifying a pangram
A pangram is a sentence that contains every letter of the alphabet at least once. Write a function called is_pangram() that takes a string and returns True if it is a pangram and False otherwise.
For example, is_pangram("The quick brown fox jumps over the lazy dog") should return True.
```python
# Exercise 3: Write your solution here
```
Solutions
```python
# Solution 1: Extracting a domain name
def extract_domain(url: str) -> str:
    """Extract the domain name from a URL."""
    start = url.find("//") + 2
    end = url.find("/", start)
    if end == -1:
        return url[start:]
    return url[start:end]


print(extract_domain("https://www.python.org/docs"))  # www.python.org
print(extract_domain("http://example.com"))           # example.com
```
```python
# Solution 2: Counting vowels
def count_vowels(text: str) -> int:
    """Count the number of vowels in a string (case-insensitive)."""
    vowels = "aeiou"
    return sum(1 for char in text.lower() if char in vowels)


print(count_vowels("Hello World"))  # 3
print(count_vowels("AEIOU"))        # 5
print(count_vowels("rhythm"))       # 0
```
```python
# Solution 3: Identifying a pangram
def is_pangram(text: str) -> bool:
    """Check whether the given text is a pangram."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    lower_text = text.lower()
    return all(letter in lower_text for letter in alphabet)


print(is_pangram("The quick brown fox jumps over the lazy dog"))  # True
print(is_pangram("Hello World"))                                  # False
```
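An alternative solution to Exercise 3 uses sets instead of repeated in checks. It builds the set of letters in the text and tests whether that set contains the whole alphabet. This sketch uses string.ascii_lowercase from the standard library in place of the hand-written alphabet string. It is one design choice among several, and the name is_pangram_set() is hypothetical.

```python
import string


def is_pangram_set(text: str) -> bool:
    """Return True if text contains every ASCII letter at least once."""
    # set_a <= set_b tests whether set_a is a subset of set_b
    return set(string.ascii_lowercase) <= set(text.lower())


print(is_pangram_set("The quick brown fox jumps over the lazy dog"))  # True
print(is_pangram_set("Hello World"))                                  # False
```

Both versions give the same answers. The set version scans the text once, whereas the original checks each of the 26 letters against the text separately.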
Summary
In this tutorial, you learned how to search within strings and test for the presence of substrings using a variety of built-in methods and operators.
Here is a summary of the key points:
| Method / Operator | Purpose | Returns |
|---|---|---|
| in / not in | Test for substring presence or absence | True or False |
| str.find() | Find the first occurrence of a substring | Index, or -1 if not found |
| str.index() | Find the first occurrence of a substring | Index, or raises ValueError if not found |
| str.rfind() | Find the last occurrence of a substring | Index, or -1 if not found |
| str.rindex() | Find the last occurrence of a substring | Index, or raises ValueError if not found |
| str.startswith() | Test whether a string starts with a prefix | True or False |
| str.endswith() | Test whether a string ends with a suffix | True or False |
| str.count() | Count non-overlapping occurrences of a substring | Integer count |
What you have covered in this tutorial series
This is the final tutorial in the introductory series on string processing with Python. Across the four tutorials, you have learned:
- String basics -- creating strings, indexing, slicing, and immutability
- String methods -- transforming and manipulating text with built-in methods
- String formatting -- producing well-formatted output with f-strings and str.format()
- String searching -- finding substrings, testing beginnings and endings, and validating input
Next steps
Now that you have a solid foundation in string processing, you can explore further:
- Browse the recipes for practical, task-oriented guides that solve specific string processing problems
- Consult the reference documentation for detailed information on string methods and the string module
- Read the explanations for a deeper understanding of topics such as string immutability, Unicode, and encoding
Well done on completing the tutorial series – you are well equipped to handle a wide variety of string processing tasks in Python!