Skip to main content

When to use this

CRMs like Pipedrive, HubSpot, and Salesforce allow users to define custom fields on entities like deals, contacts, or companies. In the raw data, these typically appear as a separate key-value table (e.g., deal_fields) where each row maps a field key to a label, rather than as columns on the main entity table. To make this data usable for analysis, you need to join the custom fields definition table with the main entity and pivot the key-value pairs into proper columns.

Sample input

Two tables in the Raw layer: deals — the main entity table, with custom fields stored as a JSON object mapping field keys to values:
deal_iddeal_nameamountcustom_fields
1Acme Corp50000{“cf_1”: “Enterprise”, “cf_2”: “Alice”, “cf_3”: “2024-06-01”}
2Globex Inc12000{“cf_1”: “SMB”, “cf_2”: “Bob”, “cf_3”: “2024-07-15”}
3Initech8500{“cf_1”: “Mid-Market”, “cf_2”: “Carol”, “cf_3”: null}
deal_fields — the field definitions table, mapping each key to a human-readable label:
field_keyfield_label
cf_1Segment
cf_2Account Owner
cf_3Expected Close
We want a final table where each custom field becomes a properly named column.

Implementation

Use JSON_VALUE to extract each custom field, with column aliases derived from the field definitions.
SELECT
  d.deal_id,
  d.deal_name,
  d.amount,
  JSON_VALUE(d.custom_fields, '$.cf_1') AS segment,
  JSON_VALUE(d.custom_fields, '$.cf_2') AS account_owner,
  JSON_VALUE(d.custom_fields, '$.cf_3') AS expected_close
FROM `raw.deals` AS d
For a fully dynamic approach where custom field definitions change frequently, consider using a scripting block in BigQuery:
DECLARE field_keys ARRAY<STRING>;
SET field_keys = (SELECT ARRAY_AGG(field_key) FROM `raw.deal_fields`);
-- Build dynamic SQL based on field_keys

Expected output

deal_iddeal_nameamountsegmentaccount_ownerexpected_close
1Acme Corp50000EnterpriseAlice2024-06-01
2Globex Inc12000SMBBob2024-07-15
3Initech8500Mid-MarketCarolNULL

Tips and gotchas

Custom field keys (like cf_1, cf_2) are internal identifiers and can be cryptic. Always map them to human-readable labels using the field definitions table before exposing the data to end users.
In the SQL examples, the field-to-column mapping is hardcoded. If your CRM has dozens of custom fields that change frequently, the Python approach is more maintainable — it reads the definitions table and builds columns dynamically. For SQL-based pipelines, consider generating the SQL as a pre-step using a script or Nekt’s AI assistant.
Some CRMs store custom fields differently — Salesforce uses actual columns on the object, while Pipedrive and HubSpot use key-value mappings. Check how your specific source stores custom fields in the Raw layer before applying this pattern.