When to use this
Many APIs store complex data as JSON strings inside a single column — for example, anaddress field that contains {"city": "São Paulo", "state": "SP", "zip": "01310-100"}, or a metadata column with nested properties. To query individual values, you need to parse the JSON and extract the fields you care about.
Sample input
Acustomers table in the Raw layer where the address column is a JSON string:
| customer_id | name | address |
|---|---|---|
| 1 | Alice Johnson | {“city”: “São Paulo”, “state”: “SP”, “zip”: “01310-100”} |
| 2 | Bob Smith | {“city”: “New York”, “state”: “NY”, “zip”: “10001”} |
| 3 | Carol Lee | {“city”: “London”, “state”: “England”, “zip”: “EC1A 1BB”} |
city, state, and zip into their own columns.
Implementation
- Nekt Express / BigQuery
- Athena SQL
- Python (Nekt SDK)
BigQuery uses
JSON_EXTRACT_SCALAR (or the shorthand JSON_VALUE in Standard SQL) to pull values from a JSON string.Expected output
| customer_id | name | city | state | zip |
|---|---|---|---|---|
| 1 | Alice Johnson | São Paulo | SP | 01310-100 |
| 2 | Bob Smith | New York | NY | 10001 |
| 3 | Carol Lee | London | England | EC1A 1BB |
Tips and gotchas
If your JSON contains arrays (e.g.,
"tags": ["hot", "enterprise"]), you’ll need to combine JSON parsing with Unnest arrays. First extract the array with JSON functions, then unnest the result.