When to use this
CRMs like Pipedrive, HubSpot, and Salesforce allow users to define custom fields on entities like deals, contacts, or companies. In the raw data, these typically appear as a separate key-value table (e.g., deal_fields) where each row maps a field key to a label, rather than as columns on the main entity table.
To make this data usable for analysis, you need to join the custom fields definition table with the main entity and pivot the key-value pairs into proper columns.
Two tables in the Raw layer:
deals — the main entity table, with custom fields stored as a JSON object mapping field keys to values:
| deal_id | deal_name | amount | custom_fields |
|---|---|---|---|
| 1 | Acme Corp | 50000 | {"cf_1": "Enterprise", "cf_2": "Alice", "cf_3": "2024-06-01"} |
| 2 | Globex Inc | 12000 | {"cf_1": "SMB", "cf_2": "Bob", "cf_3": "2024-07-15"} |
| 3 | Initech | 8500 | {"cf_1": "Mid-Market", "cf_2": "Carol", "cf_3": null} |
deal_fields — the field definitions table, mapping each key to a human-readable label:
| field_key | field_label |
|---|---|
| cf_1 | Segment |
| cf_2 | Account Owner |
| cf_3 | Expected Close |
We want a final table where each custom field becomes a properly named column.
Implementation
Nekt Express / BigQuery
Use JSON_VALUE to extract each custom field, with column aliases derived from the field definitions.

```sql
SELECT
  d.deal_id,
  d.deal_name,
  d.amount,
  JSON_VALUE(d.custom_fields, '$.cf_1') AS segment,
  JSON_VALUE(d.custom_fields, '$.cf_2') AS account_owner,
  JSON_VALUE(d.custom_fields, '$.cf_3') AS expected_close
FROM `raw.deals` AS d
```
For a fully dynamic approach where custom field definitions change frequently, consider using a scripting block in BigQuery:

```sql
DECLARE field_keys ARRAY<STRING>;
SET field_keys = (SELECT ARRAY_AGG(field_key) FROM `raw.deal_fields`);
-- Build dynamic SQL based on field_keys, then run it with EXECUTE IMMEDIATE
```
Athena SQL

Use json_extract_scalar to pull each custom field value and rename it using the known mapping.

```sql
SELECT
  d.deal_id,
  d.deal_name,
  d.amount,
  json_extract_scalar(d.custom_fields, '$.cf_1') AS segment,
  json_extract_scalar(d.custom_fields, '$.cf_2') AS account_owner,
  json_extract_scalar(d.custom_fields, '$.cf_3') AS expected_close
FROM raw.deals AS d
```
If you want to build this dynamically from the deal_fields table (e.g., when field keys change often), you’ll need to generate the SQL programmatically. A common approach is to query deal_fields first, build the SELECT list, and then execute the final query.
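As a minimal sketch of that pre-step in Python (`build_select_sql` and its list-of-tuples input shape are hypothetical, not part of any SDK; in practice the pairs would come from querying `deal_fields` first):

```python
def build_select_sql(field_rows, table="raw.deals"):
    """Build the Athena SELECT from (field_key, field_label) pairs.

    field_rows: list of (field_key, field_label) tuples, assumed to have
    been fetched from the deal_fields table beforehand.
    """
    select_items = ["d.deal_id", "d.deal_name", "d.amount"]
    for key, label in field_rows:
        # Derive a snake_case column alias from the human-readable label
        alias = label.lower().replace(" ", "_")
        select_items.append(
            f"json_extract_scalar(d.custom_fields, '$.{key}') AS {alias}"
        )
    select_list = ",\n  ".join(select_items)
    return f"SELECT\n  {select_list}\nFROM {table} AS d"

sql = build_select_sql([("cf_1", "Segment"), ("cf_2", "Account Owner")])
```

The generated string can then be submitted to Athena as an ordinary query.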
Python (Nekt SDK)

In PySpark, parse the JSON into a map, then use the field definitions table to create named columns dynamically.

```python
import nekt
from pyspark.sql import functions as F
from pyspark.sql.types import MapType, StringType

deals_df = nekt.load_table(layer_name="Raw", table_name="deals")
fields_df = nekt.load_table(layer_name="Raw", table_name="deal_fields")

# Map each field key to its human-readable label
field_mapping = {
    row["field_key"]: row["field_label"]
    for row in fields_df.collect()
}

# Parse the JSON string into a map column
deals_df = deals_df.withColumn(
    "custom_fields_map",
    F.from_json(F.col("custom_fields"), MapType(StringType(), StringType())),
)

# Create one properly named column per custom field definition
for key, label in field_mapping.items():
    col_name = label.lower().replace(" ", "_")
    deals_df = deals_df.withColumn(
        col_name,
        F.col("custom_fields_map").getItem(key),
    )

# Drop the raw JSON and the intermediate map column
result_df = deals_df.drop("custom_fields", "custom_fields_map")

nekt.save_table(
    df=result_df,
    layer_name="Trusted",
    table_name="deals_expanded",
)
```
The PySpark approach is fully dynamic — whenever a new custom field is added to the CRM, it will automatically appear as a new column in the output without changing the code. This is one of the main advantages of using Python over static SQL for this transformation.
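One caveat in the loop above: `label.lower().replace(" ", "_")` only handles spaces, so a label like "Close % (est.)" would produce an invalid column name. A more defensive sanitizer, as a sketch (`to_column_name` is a hypothetical helper, not part of the Nekt SDK):

```python
import re

def to_column_name(label: str) -> str:
    """Turn a human-readable field label into a safe snake_case column name."""
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", label.strip()).strip("_").lower()
    # Column names must not start with a digit
    return f"f_{name}" if name and name[0].isdigit() else name
```

Swapping this in for the inline `lower().replace()` keeps the loop otherwise unchanged.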
Expected output
| deal_id | deal_name | amount | segment | account_owner | expected_close |
|---|---|---|---|---|---|
| 1 | Acme Corp | 50000 | Enterprise | Alice | 2024-06-01 |
| 2 | Globex Inc | 12000 | SMB | Bob | 2024-07-15 |
| 3 | Initech | 8500 | Mid-Market | Carol | NULL |
Tips and gotchas
Custom field keys (like cf_1, cf_2) are internal identifiers and can be cryptic. Always map them to human-readable labels using the field definitions table before exposing the data to end users.
In the SQL examples, the field-to-column mapping is hardcoded. If your CRM has dozens of custom fields that change frequently, the Python approach is more maintainable — it reads the definitions table and builds columns dynamically. For SQL-based pipelines, consider generating the SQL as a pre-step using a script or Nekt’s AI assistant.
Some CRMs store custom fields differently — Salesforce uses actual columns on the object, while Pipedrive and HubSpot use key-value mappings. Check how your specific source stores custom fields in the Raw layer before applying this pattern.