When to use this
Sometimes your data is in the wrong shape for the analysis you need. Pivoting turns distinct row values into columns (e.g., turning monthly rows into one column per month). Unpivoting does the reverse — turning columns into rows (e.g., turning q1_revenue, q2_revenue, q3_revenue columns into quarter and revenue rows).
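To make the two reshapes concrete before the SQL and SDK implementations below, here is a minimal pure-Python sketch on a list of dicts. This is only an illustration of what pivot and unpivot do to row-shaped data — the sample rows are invented, and none of this is Nekt SDK code.

```python
# Sample long-format rows (invented for illustration).
rows = [
    {"product": "Widget A", "month": "January", "revenue": 5000},
    {"product": "Widget A", "month": "February", "revenue": 7200},
    {"product": "Widget B", "month": "January", "revenue": 3200},
]

# Pivot: one row per product, one column per month value.
by_product = {}
for r in rows:
    by_product.setdefault(r["product"], {"product": r["product"]})[r["month"]] = r["revenue"]
pivoted = list(by_product.values())
# -> [{"product": "Widget A", "January": 5000, "February": 7200},
#     {"product": "Widget B", "January": 3200}]

# Unpivot: turn the month columns back into (month, revenue) rows.
unpivoted = [
    {"product": p["product"], "month": m, "revenue": v}
    for p in pivoted
    for m, v in p.items()
    if m != "product"
]
```

Note that pivoting can lose row order and introduce gaps (Widget B has no February column here), which is why the SQL versions below aggregate with `SUM` and default missing values explicitly.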
Suppose you have a monthly_sales table in the Raw layer, with one row per product per month:
| product | month | revenue |
|---|---|---|
| Widget A | January | 5000 |
| Widget A | February | 7200 |
| Widget A | March | 6100 |
| Widget B | January | 3200 |
| Widget B | February | 4100 |
| Widget B | March | 3800 |
We want one row per product with a column for each month.
For the unpivot case, suppose a quarterly_revenue table where each quarter is a separate column:
| product | q1_revenue | q2_revenue | q3_revenue |
|---|---|---|---|
| Widget A | 18300 | 22100 | 19500 |
| Widget B | 11100 | 13200 | 12800 |
We want to normalize this into product, quarter, and revenue columns.
Implementation — Pivot
Nekt Express / BigQuery
Athena SQL
Python (Nekt SDK)
BigQuery supports a native `PIVOT` clause for cleaner syntax:

```sql
SELECT *
FROM (
  SELECT product, month, revenue
  FROM `raw.monthly_sales`
)
PIVOT (
  SUM(revenue)
  FOR month IN ('January', 'February', 'March')
)
```
The values in the `IN` clause must be known at query time. If your months are dynamic, use the conditional aggregation approach shown in the Athena tab.
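When the pivot values are dynamic, one common pattern is to generate the conditional-aggregation SQL programmatically from whatever distinct values you find in the data. The helper below is a hypothetical sketch (the function name and parameters are not part of any Nekt API), and it assumes the values are trusted internal strings — don't interpolate untrusted input into SQL like this.

```python
# Hypothetical helper: build the conditional-aggregation query from a
# runtime list of pivot values. Assumes values are trusted (no escaping).
def build_pivot_sql(table, key_col, pivot_col, value_col, values):
    cases = ",\n  ".join(
        f"SUM(CASE WHEN {pivot_col} = '{v}' THEN {value_col} ELSE 0 END) AS {v.lower()}"
        for v in values
    )
    return (
        f"SELECT\n  {key_col},\n  {cases}\n"
        f"FROM {table}\nGROUP BY {key_col}"
    )

sql = build_pivot_sql(
    "raw.monthly_sales", "product", "month", "revenue",
    ["January", "February", "March"],
)
print(sql)
```

The generated string matches the Athena query shown in the next tab, so the same approach works whenever new months appear in the source data.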
Athena doesn’t have a native `PIVOT` keyword. Use conditional aggregation instead:

```sql
SELECT
  product,
  SUM(CASE WHEN month = 'January' THEN revenue ELSE 0 END) AS january,
  SUM(CASE WHEN month = 'February' THEN revenue ELSE 0 END) AS february,
  SUM(CASE WHEN month = 'March' THEN revenue ELSE 0 END) AS march
FROM raw.monthly_sales
GROUP BY product
```
In PySpark, `groupBy` + `pivot` handles this natively:

```python
import nekt

df = nekt.load_table(layer_name="Raw", table_name="monthly_sales")

pivoted_df = (
    df
    .groupBy("product")
    .pivot("month", ["January", "February", "March"])
    .sum("revenue")
)

nekt.save_table(
    df=pivoted_df,
    layer_name="Trusted",
    table_name="sales_by_month",
)
```
Passing the list of values to pivot() is optional but strongly recommended for performance — without it, Spark must scan the data to discover all distinct values first.
Pivot — Expected output
| product | January | February | March |
|---|---|---|---|
| Widget A | 5000 | 7200 | 6100 |
| Widget B | 3200 | 4100 | 3800 |
Implementation — Unpivot
Nekt Express / BigQuery
Athena SQL
Python (Nekt SDK)
BigQuery supports a native `UNPIVOT` clause:

```sql
SELECT *
FROM `raw.quarterly_revenue`
UNPIVOT (
  revenue FOR quarter IN (
    q1_revenue AS 'Q1',
    q2_revenue AS 'Q2',
    q3_revenue AS 'Q3'
  )
)
```
Athena doesn’t have a native `UNPIVOT`. Use `CROSS JOIN UNNEST` with parallel arrays of labels and values:

```sql
SELECT
  product,
  quarter,
  revenue
FROM raw.quarterly_revenue
CROSS JOIN UNNEST(
  ARRAY['Q1', 'Q2', 'Q3'],
  ARRAY[q1_revenue, q2_revenue, q3_revenue]
) AS t(quarter, revenue)
```
In PySpark, use `stack` to unpivot columns into rows:

```python
import nekt
from pyspark.sql import functions as F

df = nekt.load_table(layer_name="Raw", table_name="quarterly_revenue")

unpivoted_df = df.select(
    "product",
    F.expr("""
        stack(3,
            'Q1', q1_revenue,
            'Q2', q2_revenue,
            'Q3', q3_revenue
        ) AS (quarter, revenue)
    """),
)

nekt.save_table(
    df=unpivoted_df,
    layer_name="Trusted",
    table_name="revenue_by_quarter",
)
```
Unpivot — Expected output
| product | quarter | revenue |
|---|---|---|
| Widget A | Q1 | 18300 |
| Widget A | Q2 | 22100 |
| Widget A | Q3 | 19500 |
| Widget B | Q1 | 11100 |
| Widget B | Q2 | 13200 |
| Widget B | Q3 | 12800 |
Tips and gotchas
Pivoting requires knowing the distinct values upfront (months, quarters, etc.). If the values change over time, you’ll need to update the query. For fully dynamic pivots, the Python approach is the most flexible — you can read distinct values from the data and build the pivot programmatically.
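The "read distinct values, then pivot" idea can be sketched in plain Python; with the SDK's PySpark DataFrames the same pattern would be collecting `df.select("month").distinct()` and passing the result to `.pivot()`. The rows below are invented sample data, not SDK calls.

```python
# Fully dynamic pivot sketch: discover the distinct pivot values at run
# time instead of hard-coding them, then build one column per value.
rows = [
    {"product": "Widget A", "month": "January", "revenue": 5000},
    {"product": "Widget A", "month": "February", "revenue": 7200},
    {"product": "Widget B", "month": "January", "revenue": 3200},
]

months = sorted({r["month"] for r in rows})  # discovered, not hard-coded

wide = {}
for r in rows:
    # Default every discovered month to 0 so each output row is complete.
    row = wide.setdefault(
        r["product"], {"product": r["product"], **{m: 0 for m in months}}
    )
    row[r["month"]] += r["revenue"]

result = list(wide.values())
```

Because the column set is derived from the data, a new month appearing upstream produces a new column automatically instead of silently dropping rows.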
After pivoting, column names come from data values and may contain spaces, special characters, or start with numbers. Clean them up with Rename columns if needed.
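If you want to clean the names in code rather than with a rename step, a small sanitizer like the hypothetical one below covers the common cases (this function is an illustration, not part of the Nekt SDK):

```python
import re

# Hypothetical column-name cleanup: collapse anything that is not a
# letter, digit, or underscore, lowercase the result, and prefix names
# that would otherwise start with a digit.
def sanitize_column(name: str) -> str:
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_").lower()
    if cleaned and cleaned[0].isdigit():
        cleaned = "c_" + cleaned
    return cleaned

sanitize_column("2024 Revenue ($)")  # -> "c_2024_revenue"
sanitize_column("Widget A")          # -> "widget_a"
```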