When to use this
Raw data almost always has gaps — optional fields left blank, API responses with missing keys, or records that were partially synced. NULLs can break aggregations, cause unexpected JOIN behavior, and confuse BI tools. Handling them explicitly ensures your trusted layer is clean and predictable.Sample input
Acontacts table in the Raw layer with some missing values:
| contact_id | name | company | phone | |
|---|---|---|---|---|
| 1 | Alice Johnson | alice@acme.com | Acme | +1-555-0101 |
| 2 | Bob Smith | NULL | NULL | +1-555-0102 |
| 3 | NULL | carol@initech.com | Initech | NULL |
"Unknown" for text fields and "N/A" for phone.
Implementation
- Nekt Express / BigQuery
- Athena SQL
- Python (Nekt SDK)
BigQuery supports the same
COALESCE syntax. You can also use IFNULL as a shorthand when there are only two arguments.Expected output
| contact_id | name | company | phone | |
|---|---|---|---|---|
| 1 | Alice Johnson | alice@acme.com | Acme | +1-555-0101 |
| 2 | Bob Smith | Unknown | Unknown | +1-555-0102 |
| 3 | Unknown | carol@initech.com | Initech | N/A |
Tips and gotchas
Choose your default values carefully. Using
"Unknown" or "N/A" is common, but in some cases 0 for numbers or a specific sentinel date (e.g., 1970-01-01) may be more appropriate. Document your conventions so downstream consumers know what to expect.