Find the insights you need with the power of SQL and Python
Processing your data is like cooking — you take raw ingredients (your source data) and turn them into a finished dish (clean, organized data ready for analysis).

In Nekt, there are three modules for processing your data:
Queries — Write SQL to process and aggregate your data
Notebooks — Run Python or PySpark for complex logic, API enrichment, or web scraping
History — Track changes in your data over time with a no-code SCD Type 2 template
This guide walks you through your first Query (SQL) and your first Notebook (Python/PySpark).
Using the data from the source configured in the previous section, let’s build a query that creates a lead scoring and segmentation engine.

Here’s a sample of the source data:
We can use the AI assistant to help us build the SQL query:
Go to the Explorer page and click on the AI Assistant button on the right side of the screen.
Select the table you want to use for the query (in our case outbound_leads).
Enter the following prompt:
Create a SQL query that builds a lead scoring and segmentation engine.

1. Calculate a `lead_score` (0-100) based on weighted factors:
   - Source quality: Referral/Direct (30pts), LinkedIn/Trade Show (25pts), Email Campaign (20pts), Google/Facebook Ads (15pts), Social Media (10pts), Organic Search (5pts)
   - Recency: Leads from last 30 days (30pts), 31-90 days (20pts), 91-180 days (10pts), older (5pts)
   - Interest value: Technology/Healthcare/Finance (25pts), Real Estate/Consulting (20pts), others (15pts)
   - Age range fit: 25-44 (15pts), 35-54 (12pts), others (8pts)
2. Assign a `lead_tier` based on score:
   - 'Hot' for scores 80-100
   - 'Warm' for scores 50-79
   - 'Cold' for scores 0-49

Output all original columns plus the new calculated fields.
The AI assistant will generate the SQL query.
You’ll see the results in the section below the query editor.
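Before saving, it's worth sanity-checking the generated SQL against the scoring rules you asked for. One way is to recompute a few leads by hand with a small plain-Python function; the sketch below is purely illustrative (the function name and sample values are hypothetical, not part of Nekt), but it mirrors the weights from the prompt:

```python
from datetime import date

def score_lead(source, creation_date, main_interest, age_range, today=None):
    """Recompute lead_score and lead_tier using the prompt's weights (illustrative only)."""
    today = today or date.today()
    # Source quality: Referral/Direct 30, LinkedIn/Trade Show 25, Email Campaign 20,
    # Google/Facebook Ads 15, Social Media 10, Organic Search 5
    source_points = {"Referral": 30, "Direct": 30, "LinkedIn": 25, "Trade Show": 25,
                     "Email Campaign": 20, "Google Ads": 15, "Facebook Ads": 15,
                     "Social Media": 10, "Organic Search": 5}
    # Recency buckets: <=30 days, 31-90, 91-180, older
    days_old = (today - creation_date).days
    recency = 30 if days_old <= 30 else 20 if days_old <= 90 else 10 if days_old <= 180 else 5
    interest = (25 if main_interest in ("Technology", "Healthcare", "Finance")
                else 20 if main_interest in ("Real Estate", "Consulting") else 15)
    age = 15 if age_range == "25-44" else 12 if age_range == "35-54" else 8
    score = source_points.get(source, 0) + recency + interest + age
    tier = "Hot" if score >= 80 else "Warm" if score >= 50 else "Cold"
    return score, tier

# A brand-new Referral lead in Technology, aged 25-44, should hit the maximum of 100
print(score_lead("Referral", date.today(), "Technology", "25-44"))  # (100, 'Hot')
```

If a spot-checked row in the query results disagrees with this function, adjust the prompt and regenerate.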
Once the results are available, we can save it as a Query. This ensures the results are saved in your Lakehouse and can be accessed later via Explorer, Destinations, or Integrations (BI tools, AI agents, APIs):
In the action bar below the query editor, click on Save as Query.
Select the layer and define a name for the table that will be saved based on the query results.
Click Next.
Create a description and define the trigger.
Click Done.
Now that you’ve created the query, you can run it manually or wait for the automated trigger to see the output table available in your Catalog.
The difference between running a query in Explorer and saving it as a Query is that a saved Query can be orchestrated and generates an output table in your Catalog, which can be used downstream for further processing and activation.
Video Tutorial
For a visual walkthrough of how to build your first SQL query, watch this video:
We have a set of resources available to help you build Python notebooks, including:
Ready-to-use notebook templates
Data access tokens to securely access your data
Nekt SDK to easily access tables from your Lakehouse
Notebook templates are pre-configured with the necessary imports and setup to access data from your Lakehouse. They work like a playground where you can explore the data and validate your transformation logic before running it on Nekt.
Here’s the step-by-step guide for using a notebook template:
Go to the Notebooks module and click on the Add Notebook button.
Choose PySpark as the transformation type.
Click on Tokens to create a data access token.
In the modal that pops up, click on Create token.
Select the tables you want to use in your transformation.
Click Create.
Close the modal; we’ll come back to it later to copy the token and input tables.
Click on Notebooks and select Google Colab (feel free to use any other notebook provider you prefer).
This will open the Nekt template on Google Colab.
Click on File > Save a copy in Drive to save a copy of this template on your own Google account.
Run the cell right after the Default installations section to install the necessary dependencies - this will ensure you have the latest version of the Nekt SDK and the necessary libraries.
Copy the token you created earlier and replace the ADD_YOUR_TOKEN_HERE placeholder in the cell below.
The Example section contains an example transformation - you can keep the imports and remove the rest of the code.
Copy the input tables and paste them right below the imports.
Now you have everything you need to work on your transformation - the setup is done and the data is already loaded into the notebook.

Below you’ll find the PySpark equivalent of the previous use case, so you can validate your transformation logic before running it on Nekt.
When working with notebooks, split your code into multiple cells to make it easier to debug and test, since cells can be run independently. Make liberal use of printSchema() to check a dataframe’s schema and show() to print the first few rows of a dataframe.
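The same inspect-as-you-go habit applies to any step of the transformation: compute one piece, print the intermediate result, and only then move on. A minimal plain-Python illustration of the pattern, using a hypothetical lead record (not real Nekt data):

```python
# Hypothetical lead record, mirroring a few of the outbound_leads columns
lead = {"source": "LinkedIn", "main_interest": "Consulting", "age_range": "35-54"}

# Cell 1: compute the source-quality component and verify it before moving on
source_score = {"Referral": 30, "Direct": 30, "LinkedIn": 25, "Trade Show": 25}.get(lead["source"], 0)
print("source_score:", source_score)  # source_score: 25

# Cell 2: compute the interest component and check the intermediate value immediately
interest_score = 20 if lead["main_interest"] in ("Real Estate", "Consulting") else 15
print("interest_score:", interest_score)  # interest_score: 20
```

Catching a wrong intermediate value in a small cell is far easier than debugging the full pipeline at once.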
Now that you have validated your logic, you can create a notebook in Nekt.
Go to the Notebooks module and click on the Add Notebook button.
Choose PySpark as the notebook type.
Copy and paste the code from your local notebook into the editor.
Add the nekt.save_table() call to ensure the final dataframe is saved to your Lakehouse as a new table.
Click Next.
Create a description and define the trigger.
Click Done.
Here’s the full code to create a PySpark notebook in Nekt.
PySpark Code to create a notebook in Nekt
```python
# Custom imports
import nekt
from pyspark.sql import DataFrame
from pyspark.sql import functions as F

outbound_leads_df = nekt.load_table(layer_name="Raw", table_name="google_sheets_outbound_leads")

creation_date_col = F.to_date(F.substring(F.col("creation_date"), 1, 10))

# Calculate scores
outbound_leads_df = outbound_leads_df \
    .withColumn("source_score",
        F.when(F.col("source").isin("Referral", "Direct"), 30)
        .when(F.col("source").isin("LinkedIn", "Trade Show"), 25)
        .when(F.col("source") == "Email Campaign", 20)
        .when(F.col("source").isin("Google Ads", "Facebook Ads"), 15)
        .when(F.col("source") == "Social Media", 10)
        .when(F.col("source") == "Organic Search", 5)
        .otherwise(0)
    ) \
    .withColumn("recency_score",
        F.when(creation_date_col >= F.date_sub(F.current_date(), 30), 30)
        .when(creation_date_col >= F.date_sub(F.current_date(), 90), 20)
        .when(creation_date_col >= F.date_sub(F.current_date(), 180), 10)
        .otherwise(5)
    ) \
    .withColumn("interest_score",
        F.when(F.col("main_interest").isin("Technology", "Healthcare", "Finance"), 25)
        .when(F.col("main_interest").isin("Real Estate", "Consulting"), 20)
        .otherwise(15)
    ) \
    .withColumn("age_score",
        F.when(F.col("age_range") == "25-44", 15)
        .when(F.col("age_range") == "35-54", 12)
        .otherwise(8)
    )

# Calculate total lead score
outbound_leads_with_score_df = outbound_leads_df \
    .withColumn("lead_score",
        F.col("source_score") + F.col("recency_score") + F.col("interest_score") + F.col("age_score")
    )

outbound_leads_with_score_df = outbound_leads_with_score_df \
    .withColumn("lead_tier",
        F.when(F.col("lead_score").between(80, 100), "Hot")
        .when(F.col("lead_score").between(50, 79), "Warm")
        .otherwise("Cold")
    )

# Create the final dataframe with all columns
final_df = outbound_leads_with_score_df.select(
    "name", "creation_date", "email", "source", "age_range",
    "main_interest", "lead_score", "lead_tier", "_nekt_sync_at"
)

# Save the final dataframe to your Lakehouse as a new table - you're free to
# choose the layer and table name you prefer
nekt.save_table(dataframe=final_df, layer_name="Trusted", table_name="outbound_leads_with_score_pyspark")
```
The main difference is the addition of the nekt.save_table() call at the end, which ensures the final dataframe is saved to your Lakehouse as a new table.

Now that you’ve created the notebook, you can run it manually or wait for the automated trigger to see the output table available in your Catalog.

Video Tutorial
For a visual walkthrough of how to build your first PySpark notebook, watch this video:
It’s been an exciting journey so far! But processing the data is just the beginning.

➡️ Let’s move on to the next page to understand how we can put the transformed data in the hands of the right people.

Need Help?
If you encounter any issues during onboarding or need assistance, feel free to reach out to our support team. We’re here to help you get started.