SQL
SQL is a good choice for simple transformations that don’t require complex logic. You can use the AI assistant to generate the SQL query for you, or you can write the SQL query yourself.
Python Notebooks
Python is a good choice for transformations that require complex logic, data enrichment, or AI. You can also use popular libraries like pandas, numpy, scikit-learn, and more.
Use case
Using the data from the source configured in the previous section, let's create a transformation that builds a lead scoring and segmentation engine. Here's a sample of the source data:

| Name | Creation Date | Email | Source | Age Range | Main Interest |
|---|---|---|---|---|---|
| Bonnie Jones | 2023-10-22 17:24:17 | erikakramer@example.org | Email Campaign | 65+ | Healthcare |
| Victoria Schaefer | 2023-10-23 17:24:17 | fisherjohn@example.net | Trade Show | 25-34 | Technology |
| Jonathon Lucas | 2023-10-23 17:24:17 | arichardson@example.com | | 25-34 | Travel |
| Rebecca Buchanan | 2023-10-26 17:24:17 | kblankenship@example.net | Referral | 65+ | Food & Beverage |
| Jennifer Rodriguez | 2023-10-27 17:24:17 | garycampos@example.org | Google Ads | 18-24 | Travel |
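Before wiring anything into Nekt, it can help to prototype the scoring logic itself. Below is a minimal pandas sketch of one possible lead scoring and segmentation scheme over data shaped like the sample above; the point weights, segment thresholds, and column names are illustrative assumptions, not part of the source:

```python
import pandas as pd

# A few rows shaped like the sample source data above.
leads = pd.DataFrame(
    {
        "name": ["Bonnie Jones", "Victoria Schaefer", "Jennifer Rodriguez"],
        "source": ["Email Campaign", "Trade Show", "Google Ads"],
        "main_interest": ["Healthcare", "Technology", "Travel"],
    }
)

# Hypothetical scoring rules: higher-intent sources and interests earn more points.
SOURCE_POINTS = {"Trade Show": 40, "Referral": 35, "Google Ads": 25, "Email Campaign": 10}
INTEREST_POINTS = {"Technology": 30, "Healthcare": 20, "Travel": 10}

leads["score"] = (
    leads["source"].map(SOURCE_POINTS).fillna(0)
    + leads["main_interest"].map(INTEREST_POINTS).fillna(0)
)

# Bucket scores into segments (thresholds are assumptions for the sketch).
leads["segment"] = pd.cut(
    leads["score"], bins=[-1, 29, 59, 100], labels=["cold", "warm", "hot"]
)
print(leads[["name", "score", "segment"]])
```

Once logic like this behaves as expected on a sample, the same rules can be expressed as a SQL query or a PySpark transformation in Nekt.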
SQL
We can use the AI assistant to help us build the SQL query:

- Go to the Explorer page and click on the AI Assistant button on the right side of the screen.
- Select the table you want to use for the query (in our case `outbound_leads`).
- Enter the following prompt:
- The AI assistant will generate the SQL query.
- You’ll see the results in the section below the query editor.
- In the action bar below the query editor, click on Create transformation.
- Select the layer and define a name for the table that will be saved based on the query results.
- Click Next.
- Create a description and define the trigger.
- Click Done.
The difference between a query on Explorer and a transformation is that a transformation can be orchestrated and generates an output table that is saved in your Catalog, which can be used downstream for further processing and activation.
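The exact query the AI assistant generates depends on your prompt, but for a lead scoring use case it will typically combine CASE expressions into a score. As a hedged illustration (the `outbound_leads` schema, weights, and ordering here are assumptions), this kind of query can be sanity-checked locally with Python's built-in sqlite3 before turning it into a transformation:

```python
import sqlite3

# Build a tiny in-memory table shaped like the sample source data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outbound_leads (name TEXT, source TEXT, main_interest TEXT)")
conn.executemany(
    "INSERT INTO outbound_leads VALUES (?, ?, ?)",
    [
        ("Bonnie Jones", "Email Campaign", "Healthcare"),
        ("Victoria Schaefer", "Trade Show", "Technology"),
        ("Jennifer Rodriguez", "Google Ads", "Travel"),
    ],
)

# Illustrative CASE-based scoring query; weights are assumptions.
SCORING_SQL = """
SELECT
    name,
    CASE source
        WHEN 'Trade Show' THEN 40
        WHEN 'Referral' THEN 35
        WHEN 'Google Ads' THEN 25
        ELSE 10
    END
    + CASE main_interest
        WHEN 'Technology' THEN 30
        WHEN 'Healthcare' THEN 20
        ELSE 10
    END AS score
FROM outbound_leads
ORDER BY score DESC
"""

rows = conn.execute(SCORING_SQL).fetchall()
print(rows)
```

Saving a query like this as a transformation is what makes its result an orchestrated, catalogued output table rather than a one-off Explorer result.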
Python Notebooks
We have a set of resources available to help you build your Python transformations, including:

- Ready-to-use notebook templates
- Data access tokens to securely access your data
- Nekt SDK to easily access tables from your Lakehouse
Notebook templates are pre-configured with the necessary imports and setup to access data from your Lakehouse. They work like a playground where you can explore the data and validate your transformation logic before running it on Nekt.
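That playground loop usually alternates between checking the schema and eyeballing a few rows. As a minimal sketch, here it is with pandas standing in for the Spark dataframe the template actually provides (`df.dtypes` and `df.head()` play the role of PySpark's `printSchema()` and `show()`; the columns are assumptions based on the sample data):

```python
import pandas as pd

# Stand-in for a table loaded from your Lakehouse inside the notebook.
df = pd.DataFrame(
    {
        "name": ["Bonnie Jones", "Victoria Schaefer"],
        "creation_date": pd.to_datetime(["2023-10-22 17:24:17", "2023-10-23 17:24:17"]),
        "age_range": ["65+", "25-34"],
    }
)

# In a Spark notebook you would call df.printSchema(); with pandas,
# df.dtypes serves the same purpose of verifying column types.
print(df.dtypes)

# And df.show() becomes df.head() for a quick look at the first rows.
print(df.head())
```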
Working with notebook templates
Here's a step-by-step guide for using a notebook template:

- Go to the Transformations module and click on the Add Transformation button.
- Choose PySpark as the transformation type.
- Click on Tokens to create a data access token.
- In the modal that pops up, click on Create token.
- Select the tables you want to use in your transformation.
- Click Create.
- Close the modal; we'll come back to it later to copy the token and input tables.
- Click on Notebooks and select Google Colab (feel free to use any other notebook provider you prefer).
- This will open the Nekt template in Google Colab.
- Click on File > Save a copy in Drive to save a copy of this template to your own Google account.
- Run the cell right after the Default installations section to install the necessary dependencies - this will ensure you have the latest version of the Nekt SDK and the necessary libraries.
- Copy the token you created earlier and replace the `ADD_YOUR_TOKEN_HERE` placeholder in the cell below.
- In the Example section you'll find an example transformation; you can keep the imports and remove the rest of the code.
- Copy the input tables and paste them right below the imports.
PySpark Code
When working with notebooks, you can split the code into multiple cells to make it easier to debug and test, since cells can be run separately. Make extensive use of the `printSchema()` function to check the schema of a dataframe and the `show()` function to print its first few rows.

Creating a PySpark transformation
Now that you have validated your transformation logic, you can create a transformation at Nekt.

- Go to the Transformations module and click on the Add Transformation button.
- Choose PySpark as the transformation type.
- Copy and paste the code from your notebook template into the transformation editor.
- Add the `nekt.save_table()` call to ensure the final dataframe is saved to your Lakehouse as a new table.
- Click Next.
- Create a description and define the trigger.
- Click Done.
PySpark Code to create a transformation at Nekt

Don't forget the `nekt.save_table()` call at the end to ensure the final dataframe is saved to your Lakehouse as a new table.
Now that you've created the transformation, you can run it manually or wait for the automated trigger, and the output table will become available in your Catalog.
Video Tutorial
For a visual walkthrough of how to build your first transformation using SQL, watch this video:
It’s been an exciting journey so far! But transforming the data is just the beginning.
➡️ Let’s move on to the next page to understand how we can put the transformed data in the hands of the right people.
Need Help? If you encounter any issues during onboarding or need assistance, feel free to reach out to our support team. We’re here to help you get started.