Documentation Index
Fetch the complete documentation index at: https://docs.nekt.com/llms.txt
Use this file to discover all available pages before exploring further.
When to use this
Many APIs store complex data as JSON strings inside a single column — for example, an address field that contains {"city": "São Paulo", "state": "SP", "zip": "01310-100"}, or a metadata column with nested properties. To query individual values, you need to parse the JSON and extract the fields you care about.
A customers table in the Raw layer where the address column is a JSON string:
| customer_id | name | address |
|---|
| 1 | Alice Johnson | {“city”: “São Paulo”, “state”: “SP”, “zip”: “01310-100”} |
| 2 | Bob Smith | {“city”: “New York”, “state”: “NY”, “zip”: “10001”} |
| 3 | Carol Lee | {“city”: “London”, “state”: “England”, “zip”: “EC1A 1BB”} |
We want to extract city, state, and zip into their own columns.
Implementation
Nekt Express / BigQuery
Athena SQL
Python (Nekt SDK)
BigQuery uses JSON_EXTRACT_SCALAR (or the shorthand JSON_VALUE in Standard SQL) to pull values from a JSON string.SELECT
customer_id,
name,
JSON_VALUE(address, '$.city') AS city,
JSON_VALUE(address, '$.state') AS state,
JSON_VALUE(address, '$.zip') AS zip
FROM `raw.customers`
JSON_VALUE returns a STRING. To extract typed values, use JSON_EXTRACT with a CAST:CAST(JSON_EXTRACT(data, '$.age') AS INT64) AS age
For arrays nested inside JSON, combine with JSON_EXTRACT_ARRAY and UNNEST. Athena (Trino) provides json_extract_scalar to pull individual values from a JSON string.SELECT
customer_id,
name,
json_extract_scalar(address, '$.city') AS city,
json_extract_scalar(address, '$.state') AS state,
json_extract_scalar(address, '$.zip') AS zip
FROM raw.customers
For nested JSON, chain the path: json_extract_scalar(data, '$.address.city'). For non-string values (numbers, booleans), use json_extract and then CAST:CAST(json_extract(data, '$.age') AS INTEGER) AS age
In PySpark, use from_json to parse the JSON string into a struct, then access individual fields with dot notation.import nekt
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType
df = nekt.load_table(layer_name="Raw", table_name="customers")
address_schema = StructType([
StructField("city", StringType()),
StructField("state", StringType()),
StructField("zip", StringType())
])
parsed_df = (
df
.withColumn("address_parsed", F.from_json(F.col("address"), address_schema))
.select(
"customer_id",
"name",
F.col("address_parsed.city").alias("city"),
F.col("address_parsed.state").alias("state"),
F.col("address_parsed.zip").alias("zip")
)
)
nekt.save_table(
df=parsed_df,
layer_name="Trusted",
table_name="customers_with_address"
)
If you don’t know the JSON schema upfront, you can use get_json_object for quick extraction without defining a struct:df.withColumn("city", F.get_json_object("address", "$.city"))
Expected output
| customer_id | name | city | state | zip |
|---|
| 1 | Alice Johnson | São Paulo | SP | 01310-100 |
| 2 | Bob Smith | New York | NY | 10001 |
| 3 | Carol Lee | London | England | EC1A 1BB |
Tips and gotchas
JSON extraction returns NULL when a key doesn’t exist in the object. If some rows have different JSON shapes (e.g., some include country and others don’t), the missing keys will silently become NULL — combine this with the Handle NULL values recipe as needed.
If your JSON contains arrays (e.g., "tags": ["hot", "enterprise"]), you’ll need to combine JSON parsing with Unnest arrays. First extract the array with JSON functions, then unnest the result.