{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "ea1317a0",
   "metadata": {},
   "source": "# Parsing and formatting\n\nDatetimes spend most of their lives as strings — in JSON payloads, log files, CSV columns, filenames. This notebook covers turning strings into datetimes and back again. The two main tools are `fromisoformat`/`isoformat` for the canonical ISO 8601 format, and `strptime`/`strftime` for everything else."
  },
  {
   "cell_type": "markdown",
   "id": "8b59d36c",
   "metadata": {},
   "source": "## ISO 8601 — use this where you can\n\nThe basic shape is `YYYY-MM-DDTHH:MM:SS`, with optional microseconds and an optional UTC offset. It is unambiguous, it sorts chronologically as a plain string (provided every value uses the same offset), and it is the de facto format for most APIs. Python parses and emits it natively."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "523efddc",
   "metadata": {},
   "outputs": [],
   "source": "from datetime import date, datetime\n\nd = date.fromisoformat(\"2026-04-21\")\ndt = datetime.fromisoformat(\"2026-04-21T14:30:00\")\nprint(d)\nprint(dt)"
  },
  {
   "cell_type": "markdown",
   "id": "5d5e49c6",
   "metadata": {},
   "source": "And the round-trip:"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7f18bda7",
   "metadata": {},
   "outputs": [],
   "source": "print(d.isoformat())\nprint(dt.isoformat())\nprint(dt.isoformat(sep=\" \", timespec=\"minutes\"))"
  },
  {
   "cell_type": "markdown",
   "id": "eea5c5d7",
   "metadata": {},
   "source": "Two notes:\n\n- **Python 3.11+** accepts most ISO 8601 strings, including UTC offsets like `2026-04-21T14:30:00+01:00` and the `Z` (UTC) suffix. Versions 3.7 to 3.10 are stricter: they only parse the forms that `isoformat()` itself emits, so `+01:00` works but `Z` and compact offsets such as `+0100` raise `ValueError`. If you're targeting older Python, use `strptime` (or normalise the string first) for anything outside that subset.\n- In CPython, `fromisoformat` is typically much faster than `strptime`, because it doesn't have to interpret a format string. If your data is ISO 8601, use it."
  },
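  {
   "cell_type": "markdown",
   "id": "c4e7a1b2",
   "metadata": {},
   "source": "As a rough sketch of working around the version difference, the helper below rewrites a trailing `Z` as `+00:00` before calling `fromisoformat`, so the same input should parse on older Python as well as 3.11+. The name `parse_iso_utc` is made up for this example, and the helper only handles the `Z` case, not other offset spellings."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d9f02c6e",
   "metadata": {},
   "outputs": [],
   "source": "from datetime import datetime\n\ndef parse_iso_utc(s):\n    # Hypothetical helper: older fromisoformat rejects a trailing \"Z\",\n    # so spell it as an explicit +00:00 offset first.\n    if s.endswith(\"Z\"):\n        s = s[:-1] + \"+00:00\"\n    return datetime.fromisoformat(s)\n\nprint(parse_iso_utc(\"2026-04-21T14:30:00Z\"))       # 2026-04-21 14:30:00+00:00\nprint(parse_iso_utc(\"2026-04-21T14:30:00+01:00\"))  # 2026-04-21 14:30:00+01:00"
  },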
  {
   "cell_type": "markdown",
   "id": "5a08b499",
   "metadata": {},
   "source": "## `strptime` — parsing arbitrary formats\n\nWhen the string isn't ISO 8601, use `strptime(string, format)`. The format string uses the same directives as `strftime` — `%Y`, `%m`, `%d`, and so on — see the [format codes reference](https://agilearn.co.uk/guides/dates-and-times/reference/strftime-format-codes) for the full table."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2f0b0f1d",
   "metadata": {},
   "outputs": [],
   "source": "from datetime import datetime\n\ndt = datetime.strptime(\"21/04/2026 14:30\", \"%d/%m/%Y %H:%M\")\nprint(dt)  # 2026-04-21 14:30:00\n\ndt = datetime.strptime(\"April 21, 2026\", \"%B %d, %Y\")\nprint(dt)  # 2026-04-21 00:00:00 (missing time fields default to midnight)\n\ndt = datetime.strptime(\"2026-111\", \"%Y-%j\")    # day of year: 21 April is day 111 of 2026\nprint(dt)  # 2026-04-21 00:00:00"
  },
  {
   "cell_type": "markdown",
   "id": "9c303c08",
   "metadata": {},
   "source": "If the format doesn't match the string, `strptime` raises `ValueError`. Catch that if you're processing untrusted data."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4289c2bb",
   "metadata": {},
   "outputs": [],
   "source": "try:\n    datetime.strptime(\"21-04-2026\", \"%d/%m/%Y\")\nexcept ValueError as e:\n    print(f\"{type(e).__name__}: {e}\")"
  },
  {
   "cell_type": "markdown",
   "id": "65919215",
   "metadata": {},
   "source": "## `strftime` — formatting datetimes as strings\n\n`strftime(format)` is the inverse — turn a datetime into a string using format directives."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4897189",
   "metadata": {},
   "outputs": [],
   "source": "dt = datetime(2026, 4, 21, 14, 30)\n\nprint(dt.strftime(\"%d/%m/%Y\"))                # 21/04/2026\nprint(dt.strftime(\"%A, %d %B %Y\"))            # Tuesday, 21 April 2026\nprint(dt.strftime(\"%Y-%m-%d %H:%M\"))          # 2026-04-21 14:30\nprint(dt.strftime(\"%G-W%V-%u\"))               # 2026-W17-2 (ISO year, week, weekday; pair %V with %G, not %Y)"
  },
  {
   "cell_type": "markdown",
   "id": "fec6b910",
   "metadata": {},
   "source": "f-strings work too — `f\"{dt:%d/%m/%Y}\"` is equivalent to `dt.strftime(\"%d/%m/%Y\")` and shorter in the common case:"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b4c4be8",
   "metadata": {},
   "outputs": [],
   "source": "print(f\"{dt:%A %d %B %Y at %H:%M}\")"
  },
  {
   "cell_type": "markdown",
   "id": "7f08f29e",
   "metadata": {},
   "source": "### The directives you actually use\n\nThe full table is in the [reference page](https://agilearn.co.uk/guides/dates-and-times/reference/strftime-format-codes). The handful you'll use most, with examples for 21 April 2026 at 14:30:00:\n\n| Directive | Meaning | Example |\n| --- | --- | --- |\n| `%Y` | 4-digit year | `2026` |\n| `%m` | Zero-padded month | `04` |\n| `%d` | Zero-padded day of month | `21` |\n| `%H` | Zero-padded hour (24-hour clock) | `14` |\n| `%M` | Zero-padded minute | `30` |\n| `%S` | Zero-padded second | `00` |\n| `%A` | Full weekday name | `Tuesday` |\n| `%B` | Full month name | `April` |\n| `%j` | Zero-padded day of year | `111` |"
  },
  {
   "cell_type": "markdown",
   "id": "989cc4a2",
   "metadata": {},
   "source": "## Locale and weekday names\n\n`%A` and `%B` give *English* names by default. Locale-specific names (\"mardi\", \"avril\") require calling `locale.setlocale`, which changes process-wide state and depends on which locales are installed on the platform, so it is usually more trouble than it's worth. For internationalised output, prefer the `babel` library, or take the day of the week as an integer and look up the localised string yourself."
  },
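  {
   "cell_type": "markdown",
   "id": "a7b3e5f1",
   "metadata": {},
   "source": "A minimal sketch of the integer lookup approach, using hand-written French names as illustrative data. `JOURS`, `MOIS` and `format_fr` are names made up for this example; the only library behaviour relied on is that `weekday()` returns 0 for Monday."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e2c8d4a6",
   "metadata": {},
   "outputs": [],
   "source": "from datetime import date\n\n# Hypothetical lookup tables, indexed by weekday() (Monday = 0) and month - 1.\nJOURS = (\"lundi\", \"mardi\", \"mercredi\", \"jeudi\", \"vendredi\", \"samedi\", \"dimanche\")\nMOIS = (\"janvier\", \"février\", \"mars\", \"avril\", \"mai\", \"juin\",\n        \"juillet\", \"août\", \"septembre\", \"octobre\", \"novembre\", \"décembre\")\n\ndef format_fr(d):\n    # No locale involved: just integers from the date and two tuple lookups.\n    return f\"{JOURS[d.weekday()]} {d.day} {MOIS[d.month - 1]} {d.year}\"\n\nprint(format_fr(date(2026, 4, 21)))  # mardi 21 avril 2026"
  },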
  {
   "cell_type": "markdown",
   "id": "349eb974",
   "metadata": {},
   "source": "## Parsing data with mixed formats\n\nReal-world date columns often contain several formats. The common pattern is to try each format in turn and use the first one that works:"
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd87eb98",
   "metadata": {},
   "outputs": [],
   "source": "FORMATS = (\"%Y-%m-%d\", \"%d/%m/%Y\", \"%d-%b-%Y\")\n\ndef parse_date(s):\n    for fmt in FORMATS:\n        try:\n            return datetime.strptime(s, fmt).date()\n        except ValueError:\n            continue\n    raise ValueError(f\"no format matched {s!r}\")\n\nprint(parse_date(\"2026-04-21\"))\nprint(parse_date(\"21/04/2026\"))\nprint(parse_date(\"21-Apr-2026\"))"
  },
  {
   "cell_type": "markdown",
   "id": "2578c5d0",
   "metadata": {},
   "source": "The [parse-a-messy-date-column recipe](https://agilearn.co.uk/guides/dates-and-times/recipes/parse-a-messy-date-column) takes this idea further — handling missing values, ambiguous dates, and the dataframe case."
  },
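  {
   "cell_type": "markdown",
   "id": "f1a9b7c3",
   "metadata": {},
   "source": "To see why ambiguous dates need care with a first-match-wins loop, here is a small illustration: the same string parses cleanly under both a day-first and a month-first format, so whichever format is listed first decides the result. The sample string is made up for the example."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b6d2e8f4",
   "metadata": {},
   "outputs": [],
   "source": "from datetime import datetime\n\ns = \"03/04/2026\"  # illustrative input: 3 April or 4 March?\n\nday_first = datetime.strptime(s, \"%d/%m/%Y\").date()\nmonth_first = datetime.strptime(s, \"%m/%d/%Y\").date()\n\nprint(day_first)    # 2026-04-03\nprint(month_first)  # 2026-03-04"
  },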
  {
   "cell_type": "markdown",
   "id": "6bb6047d",
   "metadata": {},
   "source": "## Exercise\n\nYou receive log lines that look like:\n\n```\n2026-04-21T14:30:42 [INFO] User logged in\n2026-04-21T14:31:05 [ERROR] Database timeout\n```\n\nWrite a function `parse_log_line(line)` that returns a tuple of `(timestamp, level, message)`, where `timestamp` is a `datetime` object, `level` is the level string, and `message` is the rest. Test it on the two lines above."
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3298e48",
   "metadata": {},
   "outputs": [],
   "source": "# Your code here\n"
  },
  {
   "cell_type": "markdown",
   "id": "0641cb07",
   "metadata": {},
   "source": "<details>\n<summary>Solution</summary>\n\n```python\nfrom datetime import datetime\n\ndef parse_log_line(line):\n    ts_str, rest = line.split(\" \", 1)\n    timestamp = datetime.fromisoformat(ts_str)\n    level_part, message = rest.split(\"] \", 1)\n    level = level_part.lstrip(\"[\")\n    return timestamp, level, message\n\nprint(parse_log_line(\"2026-04-21T14:30:42 [INFO] User logged in\"))\nprint(parse_log_line(\"2026-04-21T14:31:05 [ERROR] Database timeout\"))\n```\n</details>"
  },
  {
   "cell_type": "markdown",
   "id": "70a0520e",
   "metadata": {},
   "source": "## Recap\n\n- `fromisoformat`/`isoformat` for ISO 8601 — the canonical format. Use this whenever possible.\n- `strptime(string, format)` to parse arbitrary formats.\n- `strftime(format)` (or the f-string `{dt:format}`) to produce arbitrary formats.\n- `%Y`/`%m`/`%d`/`%H`/`%M`/`%S` cover most of what you need; the [reference](https://agilearn.co.uk/guides/dates-and-times/reference/strftime-format-codes) has the rest.\n- For mixed-format inputs, try each format until one works.\n\nNext: [Time zones with `zoneinfo`](https://agilearn.co.uk/guides/dates-and-times/learn/03-time-zones-with-zoneinfo), where datetimes get a lot more interesting."
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}