Skip to main content

When to use this

Many API sources return columns that contain arrays — for example, a CRM that stores deal labels as ["hot", "enterprise", "renewal"] in a single column, or an e-commerce platform that nests line items inside each order. To filter, aggregate, or join on these values you first need to unnest (or explode) the array so that each element becomes its own row.

Sample input

Imagine a crm_deals table ingested from your CRM into the Raw layer:
deal_iddeal_nameamountlabels
1Acme Corp50000[“hot”, “enterprise”]
2Globex Inc12000[“renewal”]
3Initech8500[“hot”, “smb”, “trial”]
Each row has a labels column that is an array of strings. We want to produce one row per deal-label pair while keeping the other columns intact.

Implementation

BigQuery uses CROSS JOIN UNNEST directly on the array column. The alias becomes the new column name.
SELECT
  d.deal_id,
  d.deal_name,
  d.amount,
  label
FROM `raw.crm_deals` AS d
CROSS JOIN UNNEST(d.labels) AS label
If the column is a JSON string instead of a native ARRAY, parse it first:
CROSS JOIN UNNEST(JSON_EXTRACT_STRING_ARRAY(d.labels)) AS label

Expected output

After unnesting, the table contains one row for every deal-label combination:
deal_iddeal_nameamountlabel
1Acme Corp50000hot
1Acme Corp50000enterprise
2Globex Inc12000renewal
3Initech8500hot
3Initech8500smb
3Initech8500trial

Tips and gotchas

Unnesting removes rows where the array is empty ([]). If you need to keep those rows, use a LEFT JOIN variant instead:
  • Athena SQL: not directly supported — use a LEFT JOIN UNNEST (available since Trino 360+) or a UNION with a WHERE cardinality(labels) = 0 clause.
  • BigQuery: use LEFT JOIN UNNEST(d.labels) AS label instead of CROSS JOIN.
  • PySpark: use F.explode_outer("labels") instead of F.explode("labels") — it produces a NULL row for empty arrays.
Unnesting multiplies the row count — if a deal has 3 labels, it becomes 3 rows. Keep this in mind when calculating aggregates like SUM(amount), as you’ll need to deduplicate or aggregate at the right grain to avoid inflated numbers.