How do I parse a structured string into fields?
You've got a string with a known shape — a key=value config line, a log entry, a comma-separated row, a query string — and you want to pull the pieces out as separate values you can work with.
This recipe covers the standard-library tools for the job: str.split(), str.partition(), and string slicing for fixed-width fields, with a worked example that combines split and partition to parse a query string into a dictionary. For anything more complex than flat, delimiter-separated text, reach for the regex guide or a proper parser library.
def parse_query_string(query: str) -> dict[str, str]:
    """Parse a URL query string like 'name=alice&city=London' into a dict."""
    result = {}
    for pair in query.split("&"):
        if not pair:
            continue
        # partition() always returns three values, even if "=" is missing,
        # so we handle "flag" (no value) the same way as "key=value".
        key, sep, value = pair.partition("=")
        result[key] = value if sep else ""
    return result
parsed = parse_query_string("name=alice&city=London&debug")
print(parsed)
# {'name': 'alice', 'city': 'London', 'debug': ''}
Three smaller patterns follow: the two the worked example draws on, plus slicing for fixed-width fields.
# split() — when you need every occurrence of a delimiter
row = "Alice,28,London,Engineer"
fields = row.split(",")
print(fields) # ['Alice', '28', 'London', 'Engineer']
name, age, city, role = fields
print(f"{name} ({age}) — {role} in {city}")
# partition() — safer than split() when the separator might be missing
def parse_setting(line: str) -> tuple[str, str | None]:
    key, sep, value = line.partition("=")
    # Parentheses make the precedence explicit: only the value is conditional.
    return key.strip(), (value.strip() if sep else None)
print(parse_setting("timeout=30")) # ('timeout', '30')
print(parse_setting("debug_mode")) # ('debug_mode', None) — no ValueError
# Parsing fixed-width fields — slice the string by column position
# Layout: id is 3 columns wide; name, role, and city are 10 columns each (space-padded)
record = "001Alice     Engineer  London    "
record_id = record[0:3].strip()
name = record[3:13].strip()
role = record[13:23].strip()
city = record[23:33].strip()
print({"id": record_id, "name": name, "role": role, "city": city})
# {'id': '001', 'name': 'Alice', 'role': 'Engineer', 'city': 'London'}
Why it works
The standard library splits the parsing problem into two methods that look similar but solve different problems.
str.split(sep) divides the string at every occurrence of sep and returns a list. It's the right tool when you have a list of values: CSV rows, command-line arguments, multi-value query parameters. Splitting "a,b,c" on "," gives you ["a", "b", "c"]. Splitting "a" on "," gives you ["a"]: a missing separator never raises an exception, you just get a one-element list.
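A few edge cases are worth seeing next to the examples above. This is a small illustrative sketch; the `results` dictionary is only there to collect the outputs side by side.

```python
# How split(",") behaves on ordinary and edge-case inputs.
results = {
    "a,b,c": "a,b,c".split(","),   # every delimiter produces a break
    "a": "a".split(","),           # delimiter absent: one-element list
    "": "".split(","),             # empty input: a list holding one empty string
    "a,,c": "a,,c".split(","),     # adjacent delimiters: an empty field in between
}

for text, fields in results.items():
    print(f"{text!r:10} -> {fields}")
# 'a,b,c'    -> ['a', 'b', 'c']
# 'a'        -> ['a']
# ''         -> ['']
# 'a,,c'     -> ['a', '', 'c']
```

The empty-input case is why the worked example skips empty pairs before partitioning them.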
str.partition(sep) splits on the first occurrence and always returns a three-tuple (before, sep, after). The sep slot tells you whether the separator was actually found: it's the empty string when it wasn't, and so is after. The worked example's value if sep else "" makes that case explicit, so "flag" and "flag=" both end up with an empty value while "flag=value" keeps its value. If you need to tell "flag" apart from "flag=", check sep, as parse_setting does when it returns None for a missing separator. None of this raises an exception.
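To make the three-tuple concrete, here is what partition("=") returns for each shape a pair can take. The `shapes` dictionary is just an illustrative container.

```python
# partition("=") on the shapes a key/value pair can take.
shapes = {
    pair: pair.partition("=")
    for pair in ["flag", "flag=", "flag=value", "a=b=c"]
}

for pair, parts in shapes.items():
    print(f"{pair!r:14} -> {parts}")
# 'flag'         -> ('flag', '', '')
# 'flag='        -> ('flag', '=', '')
# 'flag=value'   -> ('flag', '=', 'value')
# 'a=b=c'        -> ('a', '=', 'b=c')
```

Only the middle slot separates "flag" from "flag=". The last case shows that everything after the first "=" stays in the value.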
The pattern of "split on the outer delimiter, then partition on the inner one" is the bread-and-butter shape for query strings, header lines, simple config files, and any flat key-value format. The outer split("&") gives you a list of pairs; the inner partition("=") gives you a key and (maybe) a value for each one.
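The same two-level shape carries over to a simple config file, with newlines as the outer delimiter. The sketch below is one way it might look. `parse_config` is a hypothetical helper, and treating lines that start with "#" as comments is an assumption, not something every config format does.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse 'key = value' lines into a dict (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():          # outer split: one entry per line
        line = line.strip()
        if not line or line.startswith("#"):  # assumed comment convention
            continue
        key, sep, value = line.partition("=")  # inner split: key and maybe value
        settings[key.strip()] = value.strip() if sep else ""
    return settings


config = parse_config("# server\nhost = example.org\nport=8080\n\nverbose\n")
print(config)
# {'host': 'example.org', 'port': '8080', 'verbose': ''}
```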
Trade-offs
These tools work for flat structures with predictable delimiters. The moment your data needs to handle quoted strings, escaped delimiters, or nested structure, switch to a real parser.
For CSV: use the csv module, not split(","). Real CSV files contain commas inside quoted fields, and split will silently corrupt the row. For JSON: use json.loads. For URL query strings in production: use urllib.parse.parse_qs, which handles percent-encoding correctly.
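A short comparison shows the difference in practice. The sample row and query string are made up for illustration. Note that parse_qs returns a list of values per key and drops valueless keys unless keep_blank_values=True is passed.

```python
import csv
from urllib.parse import parse_qs

# A CSV row with a comma inside a quoted field.
line = 'Alice,28,"London, UK",Engineer'
naive = line.split(",")
proper = next(csv.reader([line]))
print(naive)   # ['Alice', '28', '"London', ' UK"', 'Engineer']
print(proper)  # ['Alice', '28', 'London, UK', 'Engineer']

# A query string with percent-encoding, a repeated key, and a bare flag.
query = "name=alice%20smith&tag=a&tag=b&debug"
decoded = parse_qs(query, keep_blank_values=True)
print(decoded)
# {'name': ['alice smith'], 'tag': ['a', 'b'], 'debug': ['']}
```

The naive split returns five fields instead of four and leaves stray quote characters in two of them, without raising any error.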
The fixed-width approach is fragile by design — it assumes every record has the same column layout. That's fine for legacy mainframe exports where the spec is locked in concrete; it's a bug factory for anything else.
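One way to make the fragility visible is to keep the column layout in a single table and reject records whose length doesn't match it. This is a sketch under assumptions: `LAYOUT`, `RECORD_WIDTH`, and `parse_fixed_width` are hypothetical names, and the column positions mirror the example record earlier in this recipe.

```python
# Hypothetical layout: (field name, start column, end column), end-exclusive.
LAYOUT = [("id", 0, 3), ("name", 3, 13), ("role", 13, 23), ("city", 23, 33)]
RECORD_WIDTH = 33


def parse_fixed_width(record: str) -> dict[str, str]:
    """Slice a record by column position, failing loudly on a wrong length."""
    if len(record) != RECORD_WIDTH:
        raise ValueError(f"expected {RECORD_WIDTH} characters, got {len(record)}")
    return {name: record[start:end].strip() for name, start, end in LAYOUT}


# Build the sample with ljust() so the padding is explicit.
sample = "001" + "Alice".ljust(10) + "Engineer".ljust(10) + "London".ljust(10)
parsed_record = parse_fixed_width(sample)
print(parsed_record)
# {'id': '001', 'name': 'Alice', 'role': 'Engineer', 'city': 'London'}
```

A length check won't catch a column that shifted while the total width stayed the same, so it reduces the risk rather than removing it.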
If your input only sometimes contains the separator and you'd rather raise than guess, use split(sep, maxsplit=1) and unpack into two variables — that gives you a ValueError you can catch, which is sometimes what you want.
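A minimal sketch of that stricter variant, where `parse_required_setting` is a hypothetical name:

```python
def parse_required_setting(line: str) -> tuple[str, str]:
    """Split on the first "="; raise ValueError if it is missing."""
    # With maxsplit=1 the list has at most two items. Unpacking into two
    # names fails with ValueError when there is only one.
    key, value = line.split("=", maxsplit=1)
    return key.strip(), value.strip()


print(parse_required_setting("timeout=30"))        # ('timeout', '30')
print(parse_required_setting("url=http://x?a=b"))  # ('url', 'http://x?a=b')

try:
    parse_required_setting("debug_mode")
except ValueError as exc:
    print(f"rejected: {exc}")
```

maxsplit=1 matters here: without it, a value that itself contains "=" would produce more than two pieces and raise as well.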
For irregular text — log lines, free-form addresses, anything where the structure is "mostly there" — regex earns its keep. See the related links.
Related
- How to clean and normalise text: run this before parsing so stray whitespace doesn't end up in your field values.
- How to extract data from text with regex: when split and partition aren't expressive enough.
- How to avoid common string mistakes: including when to reach for partition over split.
- String methods reference: the full menu, including rsplit, splitlines, and rpartition.