When to use this
CRMs like Pipedrive, HubSpot, and Salesforce allow users to define custom fields on entities like deals, contacts, or companies. In the raw data, these typically appear as a separate key-value table (e.g., deal_fields) where each row maps a field key to a label, rather than as columns on the main entity table.
To make this data usable for analysis, you need to join the custom fields definition table with the main entity and pivot the key-value pairs into proper columns.
Two tables in the Raw layer:
deals — the main entity table, with custom fields stored as a JSON object mapping field keys to values:
| deal_id | deal_name | amount | custom_fields |
|---|---|---|---|
| 1 | Acme Corp | 50000 | {"cf_1": "Enterprise", "cf_2": "Alice", "cf_3": "2024-06-01"} |
| 2 | Globex Inc | 12000 | {"cf_1": "SMB", "cf_2": "Bob", "cf_3": "2024-07-15"} |
| 3 | Initech | 8500 | {"cf_1": "Mid-Market", "cf_2": "Carol", "cf_3": null} |
deal_fields — the field definitions table, mapping each key to a human-readable label:
| field_key | field_label |
|---|---|
| cf_1 | Segment |
| cf_2 | Account Owner |
| cf_3 | Expected Close |
We want a final table where each custom field becomes a properly named column.
Implementation
Nekt Express / BigQuery
Use JSON_VALUE to extract each custom field, with column aliases derived from the field definitions.

```sql
SELECT
  d.deal_id,
  d.deal_name,
  d.amount,
  JSON_VALUE(d.custom_fields, '$.cf_1') AS segment,
  JSON_VALUE(d.custom_fields, '$.cf_2') AS account_owner,
  JSON_VALUE(d.custom_fields, '$.cf_3') AS expected_close
FROM `raw.deals` AS d
```
For a fully dynamic approach where custom field definitions change frequently, consider using a scripting block in BigQuery:

```sql
DECLARE field_keys ARRAY<STRING>;
SET field_keys = (SELECT ARRAY_AGG(field_key) FROM `raw.deal_fields`);
-- Build dynamic SQL based on field_keys, then run it with EXECUTE IMMEDIATE
```
Athena SQL

Use json_extract_scalar to pull each custom field value and rename it using the known mapping.

```sql
SELECT
  d.deal_id,
  d.deal_name,
  d.amount,
  json_extract_scalar(d.custom_fields, '$.cf_1') AS segment,
  json_extract_scalar(d.custom_fields, '$.cf_2') AS account_owner,
  json_extract_scalar(d.custom_fields, '$.cf_3') AS expected_close
FROM raw.deals AS d
```
If you want to build this dynamically from the deal_fields table (e.g., when field keys change often), you’ll need to generate the SQL programmatically. A common approach is to query deal_fields first, build the SELECT list, and then execute the final query.
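As a minimal sketch of that pre-step in Python (`build_select_sql` and its list-of-tuples input shape are hypothetical, not part of any SDK; in practice the pairs would come from querying `deal_fields` first):

```python
def build_select_sql(field_rows, table="raw.deals"):
    """Build the Athena SELECT from (field_key, field_label) pairs.

    field_rows: list of (field_key, field_label) tuples, assumed to have
    been fetched from the deal_fields table beforehand.
    """
    select_items = ["d.deal_id", "d.deal_name", "d.amount"]
    for key, label in field_rows:
        # Derive a snake_case column alias from the human-readable label
        alias = label.lower().replace(" ", "_")
        select_items.append(
            f"json_extract_scalar(d.custom_fields, '$.{key}') AS {alias}"
        )
    select_list = ",\n  ".join(select_items)
    return f"SELECT\n  {select_list}\nFROM {table} AS d"

sql = build_select_sql([("cf_1", "Segment"), ("cf_2", "Account Owner")])
```

The generated string can then be submitted to Athena as an ordinary query.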
Python (Nekt SDK)

In PySpark, parse the JSON into a map, then use the field definitions table to create named columns dynamically.

```python
import nekt
from pyspark.sql import functions as F
from pyspark.sql.types import MapType, StringType

deals_df = nekt.load_table(layer_name="Raw", table_name="deals")
fields_df = nekt.load_table(layer_name="Raw", table_name="deal_fields")

# Map each field key to its human-readable label
field_mapping = {
    row["field_key"]: row["field_label"]
    for row in fields_df.collect()
}

# Parse the JSON string into a map column
deals_df = deals_df.withColumn(
    "custom_fields_map",
    F.from_json(F.col("custom_fields"), MapType(StringType(), StringType())),
)

# Create one properly named column per custom field definition
for key, label in field_mapping.items():
    col_name = label.lower().replace(" ", "_")
    deals_df = deals_df.withColumn(
        col_name,
        F.col("custom_fields_map").getItem(key),
    )

# Drop the raw JSON and the intermediate map column
result_df = deals_df.drop("custom_fields", "custom_fields_map")

nekt.save_table(
    df=result_df,
    layer_name="Trusted",
    table_name="deals_expanded",
)
```
The PySpark approach is fully dynamic — whenever a new custom field is added to the CRM, it will automatically appear as a new column in the output without changing the code. This is one of the main advantages of using Python over static SQL for this transformation.
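One caveat in the loop above: `label.lower().replace(" ", "_")` only handles spaces, so a label like "Close % (est.)" would produce an invalid column name. A more defensive sanitizer, as a sketch (`to_column_name` is a hypothetical helper, not part of the Nekt SDK):

```python
import re

def to_column_name(label: str) -> str:
    """Turn a human-readable field label into a safe snake_case column name."""
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", label.strip()).strip("_").lower()
    # Column names must not start with a digit
    return f"f_{name}" if name and name[0].isdigit() else name
```

Swapping this in for the inline `lower().replace()` keeps the loop otherwise unchanged.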
Expected output
| deal_id | deal_name | amount | segment | account_owner | expected_close |
|---|---|---|---|---|---|
| 1 | Acme Corp | 50000 | Enterprise | Alice | 2024-06-01 |
| 2 | Globex Inc | 12000 | SMB | Bob | 2024-07-15 |
| 3 | Initech | 8500 | Mid-Market | Carol | NULL |
Tips and gotchas
Custom field keys (like cf_1, cf_2) are internal identifiers and can be cryptic. Always map them to human-readable labels using the field definitions table before exposing the data to end users.
In the SQL examples, the field-to-column mapping is hardcoded. If your CRM has dozens of custom fields that change frequently, the Python approach is more maintainable — it reads the definitions table and builds columns dynamically. For SQL-based pipelines, consider generating the SQL as a pre-step using a script or Nekt’s AI assistant.
Some CRMs store custom fields differently — Salesforce uses actual columns on the object, while Pipedrive and HubSpot use key-value mappings. Check how your specific source stores custom fields in the Raw layer before applying this pattern.