{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How do I parse a structured string into fields?\n",
    "\n",
    "You've got a string with a known shape — a `key=value` config line, a log entry, a comma-separated row, a query string — and you want to pull the pieces out as separate values you can work with.\n",
    "\n",
    "This recipe covers the standard-library tools for the job: `str.split()`, `str.partition()`, and a worked example that combines them to parse a query string into a dictionary. For anything more complex than a single delimiter, [reach for the regex guide](https://agilearn.co.uk/guides/regex/recipes/extract-data-from-text) or a proper parser library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_query_string(query: str) -> dict[str, str]:\n",
    "    \"\"\"Parse a URL query string like 'name=alice&city=London' into a dict.\"\"\"\n",
    "    result = {}\n",
    "    for pair in query.split(\"&\"):\n",
    "        if not pair:\n",
    "            continue\n",
    "        # partition() always returns three values, even if \"=\" is missing,\n",
    "        # so we handle \"flag\" (no value) the same way as \"key=value\".\n",
    "        key, sep, value = pair.partition(\"=\")\n",
    "        result[key] = value if sep else \"\"\n",
    "    return result\n",
    "\n",
    "\n",
    "parsed = parse_query_string(\"name=alice&city=London&debug\")\n",
    "print(parsed)\n",
    "# {'name': 'alice', 'city': 'London', 'debug': ''}\n"
   ]
  },
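  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Three smaller patterns follow: `split()` and `partition()`, which the worked example draws on, and fixed-width slicing for records that have no delimiter at all."
   ]
  },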
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# split() — when you need every occurrence of a delimiter\n",
    "row = \"Alice,28,London,Engineer\"\n",
    "fields = row.split(\",\")\n",
    "print(fields)  # ['Alice', '28', 'London', 'Engineer']\n",
    "\n",
    "name, age, city, role = fields\n",
    "print(f\"{name} ({age}) — {role} in {city}\")\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# partition() — safer than split() when the separator might be missing\n",
    "def parse_setting(line: str) -> tuple[str, str | None]:\n",
    "    key, sep, value = line.partition(\"=\")\n",
    "    return key.strip(), value.strip() if sep else None\n",
    "\n",
    "\n",
    "print(parse_setting(\"timeout=30\"))   # ('timeout', '30')\n",
    "print(parse_setting(\"debug_mode\"))   # ('debug_mode', None)  — no ValueError\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parsing fixed-width fields — slice the string by column position\n",
    "record = \"001Alice     Engineer  London    \"\n",
    "record_id = record[0:3].strip()\n",
    "name      = record[3:13].strip()\n",
    "role      = record[13:23].strip()\n",
    "city      = record[23:33].strip()\n",
    "\n",
    "print({\"id\": record_id, \"name\": name, \"role\": role, \"city\": city})\n",
    "# {'id': '001', 'name': 'Alice', 'role': 'Engineer', 'city': 'London'}\n"
   ]
  },
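  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The slicing above repeats the column positions inline. A minimal sketch of a table-driven variant follows, assuming the same 33-character layout as `record`. The names `LAYOUT`, `RECORD_WIDTH` and `parse_fixed_width` are hypothetical helpers, not part of any library. The length check is an added assumption too: it makes a short or long line fail loudly instead of yielding silently truncated fields."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical layout table: field name -> (start, end) column offsets.\n",
    "LAYOUT = {\n",
    "    \"id\": (0, 3),\n",
    "    \"name\": (3, 13),\n",
    "    \"role\": (13, 23),\n",
    "    \"city\": (23, 33),\n",
    "}\n",
    "RECORD_WIDTH = 33  # assumed total width of one record\n",
    "\n",
    "\n",
    "def parse_fixed_width(record: str) -> dict[str, str]:\n",
    "    \"\"\"Slice a fixed-width record using LAYOUT, rejecting wrong-length lines.\"\"\"\n",
    "    if len(record) != RECORD_WIDTH:\n",
    "        raise ValueError(f\"expected {RECORD_WIDTH} characters, got {len(record)}\")\n",
    "    return {name: record[start:end].strip() for name, (start, end) in LAYOUT.items()}\n",
    "\n",
    "\n",
    "# Build the sample with ljust() so the padding is explicit.\n",
    "sample = \"001\" + \"Alice\".ljust(10) + \"Engineer\".ljust(10) + \"London\".ljust(10)\n",
    "print(parse_fixed_width(sample))\n",
    "# {'id': '001', 'name': 'Alice', 'role': 'Engineer', 'city': 'London'}\n"
   ]
  },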
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Why it works\n",
    "\n",
    "The standard library splits the parsing problem into two methods that look similar but solve different problems.\n",
    "\n",
    "**`str.split(sep)`** divides the string at every occurrence of `sep` and returns a list. It's the right tool when you have a list of values: simple comma-separated rows, command-line arguments, multi-value query parameters. Splitting `\"a,b,c\"` on `\",\"` gives you `[\"a\", \"b\", \"c\"]`. Splitting `\"a\"` on `\",\"` gives you `[\"a\"]`: a missing separator never raises, you just get a one-item list. The empty string follows the same rule, so `\"\".split(\",\")` is `[\"\"]`, not `[]`.\n",
    "\n",
    "**`str.partition(sep)`** splits on the *first* occurrence and always returns a three-tuple `(before, sep, after)`. The `sep` slot tells you whether the separator was actually found: it's the empty string when it wasn't, and so is `after`. That lets you tell `\"flag\"` (no separator) apart from `\"flag=\"` (separator present, empty value), all without an exception. The worked example maps both cases to `\"\"`, so its `if sep else \"\"` check only makes that choice explicit; `parse_setting` uses the same check to return `None` when the separator is missing.\n",
    "\n",
    "The pattern of \"split on the outer delimiter, then partition on the inner one\" is the bread-and-butter shape for query strings, header lines, simple config files, and any flat key-value format. The outer `split(\"&\")` gives you a list of pairs; the inner `partition(\"=\")` gives you a key and (maybe) a value for each one."
   ]
  },
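  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A short sketch of the behaviours described above, using only `str.split()` and `str.partition()`. The sample strings are made up for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# split() returns every piece; a missing separator gives a one-item list.\n",
    "print(\"a,b,c\".split(\",\"))  # ['a', 'b', 'c']\n",
    "print(\"a\".split(\",\"))      # ['a']\n",
    "print(\"\".split(\",\"))       # ['']\n",
    "\n",
    "# partition() always returns (before, sep, after); sep is \"\" when not found.\n",
    "print(\"flag=value\".partition(\"=\"))  # ('flag', '=', 'value')\n",
    "print(\"flag=\".partition(\"=\"))       # ('flag', '=', '')\n",
    "print(\"flag\".partition(\"=\"))        # ('flag', '', '')\n",
    "\n",
    "# Only the first separator counts, so \"=\" inside the value survives.\n",
    "print(\"expr=a=b\".partition(\"=\"))    # ('expr', '=', 'a=b')\n"
   ]
  },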
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Trade-offs\n",
    "\n",
    "These tools work for *flat* structures with predictable delimiters. The moment your data needs to handle quoted strings, escaped delimiters, or nested structure, switch to a real parser.\n",
    "\n",
    "For CSV: use the `csv` module, not `split(\",\")`. Real CSV files contain commas inside quoted fields, and `split` will silently corrupt the row. For JSON: use `json.loads`. For URL query strings in production: use `urllib.parse.parse_qs`, which handles percent-encoding correctly.\n",
    "\n",
    "The fixed-width approach is fragile by design — it assumes every record has the same column layout. That's fine for legacy mainframe exports where the spec is locked in concrete; it's a bug factory for anything else.\n",
    "\n",
    "If your input only *sometimes* contains the separator and you'd rather raise than guess, use `split(sep, maxsplit=1)` and unpack into two variables — that gives you a `ValueError` you can catch, which is sometimes what you want.\n",
    "\n",
    "For irregular text — log lines, free-form addresses, anything where the structure is \"mostly there\" — regex earns its keep. See the related links."
   ]
  },
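  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The trade-offs above can be sketched with the standard-library tools they name: `csv.reader`, `urllib.parse.parse_qs`, and the stricter `split(sep, maxsplit=1)` unpacking. The sample inputs are invented for illustration, and `parse_required_pair` is a hypothetical helper name, not an existing API."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "from urllib.parse import parse_qs\n",
    "\n",
    "# csv.reader respects quoted fields; a plain split(\",\") cuts through them.\n",
    "line = 'Alice,\"London, UK\",Engineer'\n",
    "print(line.split(\",\"))           # ['Alice', '\"London', ' UK\"', 'Engineer']\n",
    "print(next(csv.reader([line])))  # ['Alice', 'London, UK', 'Engineer']\n",
    "\n",
    "# parse_qs decodes percent-encoding and collects repeated keys into lists.\n",
    "print(parse_qs(\"city=New%20York&tag=a&tag=b\"))\n",
    "# {'city': ['New York'], 'tag': ['a', 'b']}\n",
    "\n",
    "\n",
    "def parse_required_pair(line: str, sep: str = \"=\") -> tuple[str, str]:\n",
    "    \"\"\"Split on the first sep; raise ValueError if sep is missing.\"\"\"\n",
    "    key, value = line.split(sep, maxsplit=1)  # unpacking fails without sep\n",
    "    return key, value\n",
    "\n",
    "\n",
    "print(parse_required_pair(\"timeout=30\"))  # ('timeout', '30')\n",
    "try:\n",
    "    parse_required_pair(\"debug_mode\")\n",
    "except ValueError as exc:\n",
    "    print(f\"rejected: {exc}\")\n"
   ]
  },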
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Related\n",
    "\n",
    "- [How to clean and normalise text](https://agilearn.co.uk/guides/string-processing/recipes/clean-and-normalise-text) — run this *before* parsing so stray whitespace doesn't end up in your field values.\n",
    "- [How to extract data from text with regex](https://agilearn.co.uk/guides/regex/recipes/extract-data-from-text) — when `split` and `partition` aren't expressive enough.\n",
    "- [How to avoid common string mistakes](https://agilearn.co.uk/guides/string-processing/recipes/avoid-common-string-mistakes) — including when to reach for `partition` over `split`.\n",
    "- [String methods reference](https://agilearn.co.uk/guides/string-processing/reference/string-methods-reference) — the full menu, including `rsplit`, `splitlines`, and `rpartition`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.12.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}