Unnest arrays

When to use this

Many API sources return columns that contain arrays. For example, a CRM may store deal labels as ["hot", "enterprise", "renewal"] in a single column, or an e-commerce platform may nest line items inside each order. To filter, aggregate, or join on these values, you first need to unnest (or explode) the array so that each element becomes its own row.

Sample input

Imagine a crm_deals table ingested from your CRM into the Raw layer:

deal_id	deal_name	amount	labels
1	Acme Corp	50000	["hot", "enterprise"]
2	Globex Inc	12000	["renewal"]
3	Initech	8500	["hot", "smb", "trial"]

Each row has a labels column that is an array of strings. We want to produce one row per deal-label pair while keeping the other columns intact.

Implementation

Nekt Express / BigQuery
Athena SQL
Python (Nekt SDK)

BigQuery uses CROSS JOIN UNNEST directly on the array column. The alias becomes the new column name.

SELECT
  d.deal_id,
  d.deal_name,
  d.amount,
  label
FROM `raw.crm_deals` AS d
CROSS JOIN UNNEST(d.labels) AS label

If the column is a JSON string instead of a native ARRAY, parse it first:

CROSS JOIN UNNEST(JSON_EXTRACT_STRING_ARRAY(d.labels)) AS label

Athena (Trino / Presto) uses CROSS JOIN UNNEST to expand an array column. You alias both the virtual table and the new column.

SELECT
  d.deal_id,
  d.deal_name,
  d.amount,
  label
FROM raw.crm_deals AS d
CROSS JOIN UNNEST(d.labels) AS t(label)

If the labels column is stored as a JSON string rather than a native array, parse it with json_parse and cast the result to an array first:

CROSS JOIN UNNEST(CAST(json_parse(d.labels) AS ARRAY(VARCHAR))) AS t(label)

In PySpark, use the explode function to turn each array element into a separate row.

import nekt
from pyspark.sql import functions as F

deals_df = nekt.load_table(layer_name="Raw", table_name="crm_deals")

unnested_df = deals_df.select(
    "deal_id",
    "deal_name",
    "amount",
    F.explode("labels").alias("label")
)

nekt.save_table(
    df=unnested_df,
    layer_name="Trusted",
    table_name="crm_deals_labels"
)

If the column contains a JSON string instead of a native array, parse it first with from_json:

from pyspark.sql.types import ArrayType, StringType

deals_df = deals_df.withColumn(
    "labels",
    F.from_json(F.col("labels"), ArrayType(StringType()))
)
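To make the parsing step concrete outside Spark, here is a minimal plain-Python sketch of the same idea: normalize a value that may be either a native list or a JSON-encoded string into a list before unnesting. The helper name parse_labels is hypothetical, and treating NULL or an empty string as an empty list is an assumption made for illustration, not documented Nekt or Spark behavior.

```python
import json


def parse_labels(value):
    """Return a list of labels from a native list or a JSON-encoded string.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, list):
        return value  # already a native array
    if value is None or value == "":
        return []  # assumption: missing values count as "no labels"
    parsed = json.loads(value)
    if not isinstance(parsed, list):
        raise ValueError(f"expected a JSON array, got {type(parsed).__name__}")
    return parsed


print(parse_labels('["hot", "enterprise"]'))
print(parse_labels(["renewal"]))
```

Rejecting non-array JSON explicitly is a design choice: it surfaces malformed source data early instead of letting it disappear silently during the unnest.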

Expected output

After unnesting, the table contains one row for every deal-label combination:

deal_id	deal_name	amount	label
1	Acme Corp	50000	hot
1	Acme Corp	50000	enterprise
2	Globex Inc	12000	renewal
3	Initech	8500	hot
3	Initech	8500	smb
3	Initech	8500	trial
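
As a quick sanity check on the table above, independent of any engine, the following plain-Python sketch models what unnesting does to the sample rows. The unnest function is a hypothetical helper written for this illustration; it is not part of the Nekt SDK.

```python
# Rows mirror the sample crm_deals table.
deals = [
    {"deal_id": 1, "deal_name": "Acme Corp", "amount": 50000, "labels": ["hot", "enterprise"]},
    {"deal_id": 2, "deal_name": "Globex Inc", "amount": 12000, "labels": ["renewal"]},
    {"deal_id": 3, "deal_name": "Initech", "amount": 8500, "labels": ["hot", "smb", "trial"]},
]


def unnest(rows, array_col, new_col):
    """Yield one output row per array element, copying the other columns."""
    for row in rows:
        base = {k: v for k, v in row.items() if k != array_col}
        for element in row[array_col]:
            yield {**base, new_col: element}


unnested = list(unnest(deals, "labels", "label"))
for r in unnested:
    print(r["deal_id"], r["deal_name"], r["amount"], r["label"], sep="\t")
```

The 3 input rows carry 2 + 1 + 3 labels, so the result has 6 rows, matching the table.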

Tips and gotchas

Unnesting removes rows where the array is empty ([]) or NULL. If you need to keep those rows, use an outer variant instead:

Athena SQL: not supported on every engine version. Use LEFT JOIN UNNEST(d.labels) AS t(label) ON TRUE where available (Trino 360 and later), or fall back to a UNION ALL that adds back the rows matching WHERE cardinality(labels) = 0.
BigQuery: use LEFT JOIN UNNEST(d.labels) AS label instead of CROSS JOIN.
PySpark: use F.explode_outer("labels") instead of F.explode("labels"). It emits one row with a NULL label for each empty or NULL array.
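
The difference between the inner and outer behavior can be sketched in plain Python. This is a simplified model written for illustration, not the Spark implementation; the explode helper and the unlabeled deal with deal_id 4 are hypothetical.

```python
def explode(rows, array_col, new_col, outer=False):
    """Model explode (outer=False) and explode_outer (outer=True)."""
    out = []
    for row in rows:
        base = {k: v for k, v in row.items() if k != array_col}
        elements = row[array_col] or []  # treat a NULL array like an empty one
        if not elements and outer:
            # Outer variant: keep the row, with a NULL in the new column.
            out.append({**base, new_col: None})
        for element in elements:
            out.append({**base, new_col: element})
    return out


deals = [
    {"deal_id": 1, "labels": ["hot", "enterprise"]},
    {"deal_id": 4, "labels": []},  # hypothetical deal with no labels
]

inner = explode(deals, "labels", "label")
outer = explode(deals, "labels", "label", outer=True)
print([r["deal_id"] for r in inner])  # [1, 1]
print([r["deal_id"] for r in outer])  # [1, 1, 4]
```

Deal 4 vanishes from the inner result but survives in the outer one with label set to None, which is the behavior to choose when unlabeled deals must still appear in downstream counts.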

Unnesting multiplies the row count: a deal with 3 labels becomes 3 rows. Keep this in mind when calculating aggregates such as SUM(amount). Deduplicate or aggregate at the deal grain first; otherwise each amount is counted once per label and the totals are inflated.
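
To see the inflation concretely, this plain-Python sketch sums amount over the unnested sample rows and compares it with the total at the deal grain. It is an illustration using the sample data only; the variable names are hypothetical.

```python
# Unnested sample rows (deal_name omitted for brevity).
unnested = [
    {"deal_id": 1, "amount": 50000, "label": "hot"},
    {"deal_id": 1, "amount": 50000, "label": "enterprise"},
    {"deal_id": 2, "amount": 12000, "label": "renewal"},
    {"deal_id": 3, "amount": 8500, "label": "hot"},
    {"deal_id": 3, "amount": 8500, "label": "smb"},
    {"deal_id": 3, "amount": 8500, "label": "trial"},
]

# Naive SUM(amount): every deal is counted once per label.
naive_total = sum(r["amount"] for r in unnested)

# Deduplicate to one amount per deal_id before summing.
amount_by_deal = {r["deal_id"]: r["amount"] for r in unnested}
true_total = sum(amount_by_deal.values())

# Summing per label is valid at the label grain.
amount_by_label = {}
for r in unnested:
    amount_by_label[r["label"]] = amount_by_label.get(r["label"], 0) + r["amount"]

print(naive_total)             # 137500
print(true_total)              # 70500
print(amount_by_label["hot"])  # 58500
```

Per-label totals are meaningful on their own, but they overlap: adding them up reproduces the inflated 137500 rather than the true 70500.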
