When to use this
Date operations are at the heart of most analytics — calculating how many days since a customer’s last purchase, extracting the month from a timestamp for grouping, or formatting dates for a report. Each SQL engine handles dates slightly differently, so this recipe covers the most common operations side by side.Sample input
Anorders table in the Raw layer:
| order_id | customer_id | order_date | shipped_date |
|---|---|---|---|
| 1001 | 201 | 2024-01-15 10:30:00 | 2024-01-18 14:00:00 |
| 1002 | 202 | 2024-02-20 09:00:00 | 2024-02-22 11:30:00 |
| 1003 | 201 | 2024-03-05 16:45:00 | NULL |
- Extract the year and month from
order_date - Calculate the days between order and shipment
- Determine days since the order (relative to today)
Implementation
- Nekt Express / BigQuery
- Athena SQL
- Python (Nekt SDK)
BigQuery uses
EXTRACT, DATE_DIFF, and CURRENT_DATE().Expected output
Assuming today is 2024-12-01:| order_id | customer_id | order_date | shipped_date | order_year | order_month | days_to_ship | days_since_order |
|---|---|---|---|---|---|---|---|
| 1001 | 201 | 2024-01-15 10:30:00 | 2024-01-18 14:00:00 | 2024 | 1 | 3 | 321 |
| 1002 | 202 | 2024-02-20 09:00:00 | 2024-02-22 11:30:00 | 2024 | 2 | 2 | 285 |
| 1003 | 201 | 2024-03-05 16:45:00 | NULL | 2024 | 3 | NULL | 271 |
Tips and gotchas
When
shipped_date is NULL (e.g., order 1003 hasn’t shipped yet), all date difference calculations involving that column return NULL. This is the correct behavior — use Handle NULL values if you want to replace it with a default.Timestamps include both date and time. When calculating day differences, most engines ignore the time component and count calendar days. If you need hour-level or minute-level precision, use the appropriate unit (
'hour', 'minute') in SQL or F.unix_timestamp differences in PySpark.