Data transformations are like recipes: they take raw ingredients (your source data) and turn them into a finished dish (clean, organized data ready for analysis). A transformation can be as simple as cleaning up your data or as complex as joining multiple tables, applying business logic, and even using AI to analyze the data for you. The possibilities are endless. In Nekt, you can build transformations using SQL or Python notebooks. In the sections below you’ll find examples of how to build your first transformation with each approach.

Use case

Using the data from the source configured in the previous section, let’s create a transformation that builds a lead scoring and segmentation engine. Here’s a sample of the source data:
| Name | Creation Date | Email | Source | Age Range | Main Interest |
| --- | --- | --- | --- | --- | --- |
| Bonnie Jones | 2023-10-22 17:24:17 | erikakramer@example.org | Email Campaign | 65+ | Healthcare |
| Victoria Schaefer | 2023-10-23 17:24:17 | fisherjohn@example.net | Trade Show | 25-34 | Technology |
| Jonathon Lucas | 2023-10-23 17:24:17 | arichardson@example.com | LinkedIn | 25-34 | Travel |
| Rebecca Buchanan | 2023-10-26 17:24:17 | kblankenship@example.net | Referral | 65+ | Food & Beverage |
| Jennifer Rodriguez | 2023-10-27 17:24:17 | garycampos@example.org | Google Ads | 18-24 | Travel |
With our data transformation, we will segment the leads, ordering them by highest likelihood to convert. Let’s put this into practice 🚀

SQL

We can use the AI assistant to help us build the SQL query:
  1. Go to the Explorer page and click on the AI Assistant button on the right side of the screen.
  2. Select the table you want to use for the query (in our case outbound_leads).
  3. Enter the following prompt:
    Create a SQL transformation that builds a lead scoring and segmentation engine.
    
     1. Calculate a `lead_score` (0-100) based on weighted factors:
       - Source quality: Referral/Direct (30pts), LinkedIn/Trade Show (25pts), Email Campaign (20pts), Google/Facebook Ads (15pts), Social Media (10pts), Organic Search (5pts)
       - Recency: Leads from last 30 days (30pts), 31-90 days (20pts), 91-180 days (10pts), older (5pts)
       - Interest value: Technology/Healthcare/Finance (25pts), Real Estate/Consulting (20pts), others (15pts)
       - Age range fit: 25-44 (15pts), 35-54 (12pts), others (8pts)
    
     2. Assign a `lead_tier` based on score:
       - 'Hot' for scores 80-100
       - 'Warm' for scores 50-79
       - 'Cold' for scores 0-49
     
     Output all original columns plus the new calculated fields.
    
  4. The AI assistant will generate the SQL query.
  5. You’ll see the results in the section below the query editor.
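For reference, the generated query will look something like the sketch below. This is only an illustration of the scoring logic from the prompt, not the assistant's exact output; it assumes the table is named outbound_leads with snake_case column names and Spark-style date functions:

```sql
-- Score each lead, then assign a tier based on the total score
WITH scored AS (
  SELECT
    *,
    CASE
      WHEN source IN ('Referral', 'Direct') THEN 30
      WHEN source IN ('LinkedIn', 'Trade Show') THEN 25
      WHEN source = 'Email Campaign' THEN 20
      WHEN source IN ('Google Ads', 'Facebook Ads') THEN 15
      WHEN source = 'Social Media' THEN 10
      WHEN source = 'Organic Search' THEN 5
      ELSE 0
    END
    + CASE
      WHEN to_date(substring(creation_date, 1, 10)) >= date_sub(current_date(), 30) THEN 30
      WHEN to_date(substring(creation_date, 1, 10)) >= date_sub(current_date(), 90) THEN 20
      WHEN to_date(substring(creation_date, 1, 10)) >= date_sub(current_date(), 180) THEN 10
      ELSE 5
    END
    + CASE
      WHEN main_interest IN ('Technology', 'Healthcare', 'Finance') THEN 25
      WHEN main_interest IN ('Real Estate', 'Consulting') THEN 20
      ELSE 15
    END
    + CASE
      WHEN age_range = '25-44' THEN 15
      WHEN age_range = '35-54' THEN 12
      ELSE 8
    END AS lead_score
  FROM outbound_leads
)
SELECT
  *,
  CASE
    WHEN lead_score >= 80 THEN 'Hot'
    WHEN lead_score >= 50 THEN 'Warm'
    ELSE 'Cold'
  END AS lead_tier
FROM scored
ORDER BY lead_score DESC;
```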
Once the results are available, we can create a transformation. This ensures the results are saved in your Lakehouse and can be accessed later via Explorer, Destinations or Visualizations (dataviz tools):
  1. In the action bar below the query editor, click on Create transformation.
  2. Select the layer and define a name for the table that will be saved based on the query results.
  3. Click Next.
  4. Create a description and define the trigger.
  5. Click Done.
Now that you’ve created the transformation, you can run it manually or wait for the automated trigger to see the output table available in your Catalog.
The difference between a query on Explorer and a transformation is that a transformation can be orchestrated and generates an output table that is saved in your Catalog, which can be used downstream for further processing and activation.
Video Tutorial: For a visual walkthrough of how to build your first transformation using SQL, watch this video:

Python Notebooks

We have a set of resources available to help you build your Python transformations, including:
  • Ready-to-use notebook templates
  • Data access tokens to securely access your data
  • Nekt SDK to easily access tables from your Lakehouse
Notebook templates are pre-configured with the necessary imports and setup to access data from your Lakehouse. They work like a playground where you can explore the data and validate your transformation logic before running it on Nekt.

Working with notebook templates

Here’s the step by step guide for using a notebook template:
  1. Go to the Transformations module and click on the Add Transformation button.
  2. Choose PySpark as the transformation type.
  3. Click on Tokens to create a data access token.
  4. In the modal that pops up, click on Create token.
  5. Select the tables you want to use in your transformation.
  6. Click Create.
  7. Close the modal; we’ll come back to it later to copy the token and input tables.
  8. Click on Notebooks and select Google Colab (feel free to use any other notebook provider you prefer).
  9. This will open the Nekt template on Google Colab.
  10. Click on File > Save a copy in Drive to save a copy of this template on your own Google account.
  11. Run the cell right after the Default installations section to install the necessary dependencies - this will ensure you have the latest version of the Nekt SDK and the necessary libraries.
  12. Copy the token you created earlier and replace the ADD_YOUR_TOKEN_HERE placeholder in the cell below.
  13. In the Example section you have an example of transformation - you can keep the imports and remove the rest of the code.
  14. Copy the input tables and paste them right below the imports.
Now you have everything you need to work on your transformation - the setup is done and the data is already loaded into the notebook. Below you’ll find the PySpark equivalent of the previous use case, so you can validate your transformation logic before running it on Nekt.
# Custom imports
import nekt
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

outbound_leads_df = nekt.load_table(layer_name="Raw", table_name="google_sheets_outbound_leads")

creation_date_col = F.to_date(F.substring(F.col("creation_date"), 1, 10))

# Calculate scores
outbound_leads_df = outbound_leads_df \
  .withColumn("source_score",
      F.when(F.col("source").isin("Referral", "Direct"), 30)
          .when(F.col("source").isin("LinkedIn", "Trade Show"), 25)
          .when(F.col("source") == "Email Campaign", 20)
          .when(F.col("source").isin("Google Ads", "Facebook Ads"), 15)
          .when(F.col("source") == "Social Media", 10)
          .when(F.col("source") == "Organic Search", 5)
          .otherwise(0)
      ) \
  .withColumn("recency_score",
    F.when(creation_date_col >= F.date_sub(F.current_date(), 30), 30)
    .when(creation_date_col >= F.date_sub(F.current_date(), 90), 20)
    .when(creation_date_col >= F.date_sub(F.current_date(), 180), 10)
    .otherwise(5)
  ) \
  .withColumn("interest_score",
    F.when(F.col("main_interest").isin("Technology", "Healthcare", "Finance"), 25)
    .when(F.col("main_interest").isin("Real Estate", "Consulting"), 20)
    .otherwise(15)
  ) \
  .withColumn("age_score",
    F.when(F.col("age_range") == "25-44", 15)
    .when(F.col("age_range") == "35-54", 12)
    .otherwise(8)
)

# Calculate total lead score
outbound_leads_with_score_df = outbound_leads_df \
  .withColumn("lead_score", F.col("source_score") + F.col("recency_score") + F.col("interest_score") + F.col("age_score"))

outbound_leads_with_score_df = outbound_leads_with_score_df \
  .withColumn("lead_tier", F.when(F.col("lead_score").between(80, 100), "Hot")
    .when(F.col("lead_score").between(50, 79), "Warm")
    .otherwise("Cold")
)

# Create the final dataframe with all columns
final_df = outbound_leads_with_score_df.select(
    "name",
    "creation_date",
    "email",
    "source",
    "age_range",
    "main_interest",
    "lead_score",
    "lead_tier",
    "_nekt_sync_at"
)
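Before running the PySpark job, it can help to sanity-check the scoring rules on a couple of sample rows. Below is a hypothetical pure-Python re-implementation of the same rules; the function names and the fixed reference date are ours for illustration and are not part of the Nekt SDK:

```python
from datetime import date, datetime

# Weights mirror the prompt in the SQL section above
SOURCE_SCORES = {
    "Referral": 30, "Direct": 30,
    "LinkedIn": 25, "Trade Show": 25,
    "Email Campaign": 20,
    "Google Ads": 15, "Facebook Ads": 15,
    "Social Media": 10,
    "Organic Search": 5,
}
INTEREST_SCORES = {
    "Technology": 25, "Healthcare": 25, "Finance": 25,
    "Real Estate": 20, "Consulting": 20,
}

def recency_score(created: date, today: date) -> int:
    days = (today - created).days
    if days <= 30:
        return 30
    if days <= 90:
        return 20
    if days <= 180:
        return 10
    return 5

def lead_score(lead: dict, today: date) -> int:
    created = datetime.strptime(lead["creation_date"][:10], "%Y-%m-%d").date()
    return (
        SOURCE_SCORES.get(lead["source"], 0)
        + recency_score(created, today)
        + INTEREST_SCORES.get(lead["main_interest"], 15)
        + {"25-44": 15, "35-54": 12}.get(lead["age_range"], 8)
    )

def lead_tier(score: int) -> str:
    if score >= 80:
        return "Hot"
    if score >= 50:
        return "Warm"
    return "Cold"

# Fixed reference date so the recency bucket is reproducible
today = date(2023, 11, 1)
bonnie = {
    "creation_date": "2023-10-22 17:24:17",
    "source": "Email Campaign",
    "age_range": "65+",
    "main_interest": "Healthcare",
}
score = lead_score(bonnie, today)
print(score, lead_tier(score))  # 20 + 30 + 25 + 8 = 83 -> "Hot"
```

If the hand-computed numbers match what the PySpark code produces for the same rows, you can be reasonably confident the `when` chains encode the rules correctly.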
When working with notebooks, you can split the code into multiple cells to make it easier to debug and test, since cells can be run separately. Make extensive use of the printSchema() function to inspect a dataframe’s schema and the show() function to print its first few rows.

Creating a PySpark transformation

Now that you have validated your transformation logic, you can create a transformation at Nekt.
  1. Go to the Transformations module and click on the Add Transformation button.
  2. Choose PySpark as the transformation type.
  3. Copy and paste the code from your notebook template into the transformation editor.
  4. Add the nekt.save_table() call to ensure the final dataframe is saved to your Lakehouse as a new table.
  5. Click Next.
  6. Create a description and define the trigger.
  7. Click Done.
Here’s the full code to create a transformation at Nekt.
# Custom imports
import nekt
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

outbound_leads_df = nekt.load_table(layer_name="Raw", table_name="google_sheets_outbound_leads")

creation_date_col = F.to_date(F.substring(F.col("creation_date"), 1, 10))

# Calculate scores
outbound_leads_df = outbound_leads_df \
  .withColumn("source_score",
      F.when(F.col("source").isin("Referral", "Direct"), 30)
          .when(F.col("source").isin("LinkedIn", "Trade Show"), 25)
          .when(F.col("source") == "Email Campaign", 20)
          .when(F.col("source").isin("Google Ads", "Facebook Ads"), 15)
          .when(F.col("source") == "Social Media", 10)
          .when(F.col("source") == "Organic Search", 5)
          .otherwise(0)
      ) \
  .withColumn("recency_score",
    F.when(creation_date_col >= F.date_sub(F.current_date(), 30), 30)
    .when(creation_date_col >= F.date_sub(F.current_date(), 90), 20)
    .when(creation_date_col >= F.date_sub(F.current_date(), 180), 10)
    .otherwise(5)
  ) \
  .withColumn("interest_score",
    F.when(F.col("main_interest").isin("Technology", "Healthcare", "Finance"), 25)
    .when(F.col("main_interest").isin("Real Estate", "Consulting"), 20)
    .otherwise(15)
  ) \
  .withColumn("age_score",
    F.when(F.col("age_range") == "25-44", 15)
    .when(F.col("age_range") == "35-54", 12)
    .otherwise(8)
)

# Calculate total lead score
outbound_leads_with_score_df = outbound_leads_df \
  .withColumn("lead_score", F.col("source_score") + F.col("recency_score") + F.col("interest_score") + F.col("age_score"))

outbound_leads_with_score_df = outbound_leads_with_score_df \
  .withColumn("lead_tier", F.when(F.col("lead_score").between(80, 100), "Hot")
    .when(F.col("lead_score").between(50, 79), "Warm")
    .otherwise("Cold")
)

# Create the final dataframe with all columns
final_df = outbound_leads_with_score_df.select(
    "name",
    "creation_date",
    "email",
    "source",
    "age_range",
    "main_interest",
    "lead_score",
    "lead_tier",
    "_nekt_sync_at"
)

# Save the final dataframe to your Lakehouse as a new table - you're free to choose the layer and table name you prefer
nekt.save_table(dataframe=final_df, layer_name="Trusted", table_name="outbound_leads_with_score_pyspark")
The main difference is the addition of the nekt.save_table() call at the end, which ensures the final dataframe is saved to your Lakehouse as a new table. Now that you’ve created the transformation, you can run it manually or wait for the automated trigger to see the output table available in your Catalog.
Video Tutorial: For a visual walkthrough of how to build your first transformation using Python notebooks, watch this video:
It’s been an exciting journey so far! But transforming the data is just the beginning. ➡️ Let’s move on to the next page to understand how we can put the transformed data in the hands of the right people.
Need Help? If you encounter any issues during onboarding or need assistance, feel free to reach out to our support team. We’re here to help you get started.