When to use this
Different data sources follow different naming conventions. A CRM might return ContactFirstName, an advertising platform might use campaign-id, and a database export might use TOTAL AMT. Before joining or analyzing data from multiple sources, renaming columns to a consistent convention (typically snake_case) makes everything easier to work with downstream.
A hubspot_contacts table in the Raw layer with mixed naming conventions:
| ContactId | FirstName | LastName | emailAddress | Phone Number |
|---|
| 101 | Alice | Johnson | alice@acme.com | +1-555-0101 |
| 102 | Bob | Smith | bob@globex.com | +1-555-0102 |
| 103 | Carol | Lee | carol@initech.com | +1-555-0103 |
We want to rename all columns to snake_case:
Implementation
Nekt Express / BigQuery
Athena SQL
Python (Nekt SDK)
BigQuery also uses column aliases. Use backticks for columns with spaces or special characters.SELECT
ContactId AS contact_id,
FirstName AS first_name,
LastName AS last_name,
emailAddress AS email_address,
`Phone Number` AS phone_number
FROM `raw.hubspot_contacts`
BigQuery is case-insensitive for column names by default, but using explicit aliases ensures consistent naming for BI tools and downstream consumers.
In Athena, use column aliases in a SELECT statement to rename each column explicitly.SELECT
"ContactId" AS contact_id,
"FirstName" AS first_name,
"LastName" AS last_name,
"emailAddress" AS email_address,
"Phone Number" AS phone_number
FROM raw.hubspot_contacts
Athena requires double quotes around column names that contain spaces, special characters, or are case-sensitive. Once aliased, downstream queries can use the clean snake_case names.
In PySpark, use withColumnRenamed for individual columns or toDF to rename all columns at once.import nekt
df = nekt.load_table(layer_name="Raw", table_name="hubspot_contacts")
renamed_df = (
df
.withColumnRenamed("ContactId", "contact_id")
.withColumnRenamed("FirstName", "first_name")
.withColumnRenamed("LastName", "last_name")
.withColumnRenamed("emailAddress", "email_address")
.withColumnRenamed("Phone Number", "phone_number")
)
nekt.save_table(
df=renamed_df,
layer_name="Trusted",
table_name="hubspot_contacts_clean"
)
For bulk renaming to snake_case, you can automate it with a helper that converts all column names at once:import re
def to_snake_case(name):
s = re.sub(r'([A-Z])', r'_\1', name).lower()
s = re.sub(r'[\s\-]+', '_', s)
return re.sub(r'_+', '_', s).strip('_')
renamed_df = df.toDF(*[to_snake_case(c) for c in df.columns])
Expected output
| contact_id | first_name | last_name | email_address | phone_number |
|---|
| 101 | Alice | Johnson | alice@acme.com | +1-555-0101 |
| 102 | Bob | Smith | bob@globex.com | +1-555-0102 |
| 103 | Carol | Lee | carol@initech.com | +1-555-0103 |
Tips and gotchas
Renaming columns doesn’t change the underlying data — it only affects how the column is referenced in the output. Always save the renamed table to a new layer (e.g., Trusted) so the raw data remains untouched.
Some BI tools are case-sensitive when referencing column names. Sticking to snake_case with all lowercase avoids surprises when connecting Looker Studio, Power BI, or Metabase to your data.