Nekt is a modern data platform that provides comprehensive data engineering solutions for companies. It enables organizations to extract, transform, and load data from various external sources through custom connectors, manage data pipelines, and create powerful analytics workflows.

Configuring Nekt as a Source

In the Sources tab, click the “Add source” button at the top right of your screen. Then select the Nekt option from the list of connectors. Click Next and you’ll be prompted to add your access credentials.

1. Add account access

You’ll need the following credentials from your Nekt account:
  • API Key: Your personal API key for accessing the Nekt API. You can generate one in your Nekt workspace settings under the “API Keys” section. Make sure the key has the necessary permissions to read the data you want to sync.
  • Start Date: The earliest record date to sync.
Once you have the required credentials, add the account access and click Next.

2. Select streams

Choose which data streams you want to sync - you can select all streams or pick specific ones that matter most to you.
Tip: You can find a stream more easily by typing its name.
Select the streams and click Next.

3. Configure data streams

Customize how you want your data to appear in your catalog. Select a name for each table (which will contain the fetched data) and the type of sync.
  • Table name: we suggest a name, but feel free to customize it. You have the option to add a prefix and make this process faster!
  • Sync Type: you can choose between INCREMENTAL and FULL_TABLE.
    • Incremental: each extraction fetches only the new data since the last run - useful if, for example, you want to keep every record ever fetched (see the deduplication example after this step).
    • Full table: each extraction fetches the current state of the data - useful if, for example, you don’t want deleted records lingering in your catalog.
Once you are done configuring, click Next.
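
If a stream is set to Incremental and records can change at the source, the same id may appear more than once in the resulting table. Below is a minimal sketch of how you could keep only the latest version of each record in Explorer; it assumes the Sources stream landed as "nekt_raw"."nekt_sources" (the naming used in the use cases further down) and that updated_at marks the most recent version, so adjust both to your own table names.

-- keep only the most recent version of each record in an incrementally synced table
SELECT *
FROM (
	SELECT
		s.*,
		ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY s.updated_at DESC) AS row_rank
	FROM "nekt_raw"."nekt_sources" s
) latest
WHERE row_rank = 1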

4. Configure data source

Describe your data source for easy identification within your organization (up to 140 characters). To define your Trigger, consider how often you want data to be extracted from this source. This usually depends on how frequently you need the table data refreshed (every day, once a week, or only at specific times). Optionally, you can schedule a periodic full sync to complement the incremental extractions, ensuring your data is fully resynchronized with the source from time to time. Once you are ready, click Next to finalize the setup.
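
The trigger and full-sync schedule you define here also become visible in the Sources stream described below (settings_full_sync_cron and settings_full_sync_cron_timezone). As a sketch, assuming the table name "nekt_raw"."nekt_sources" used in the use-case queries, you could later review the schedules of your active sources with:

-- review full-sync schedules configured for active sources
SELECT
	slug,
	settings_full_sync_cron,
	settings_full_sync_cron_timezone
FROM "nekt_raw"."nekt_sources"
WHERE active = true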

5. Check your new source

You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.
Note that data appears in your Catalog only after at least one successful source run.
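
As a quick sanity check after the first successful run, you can count the rows that landed in each table from Explorer. A minimal sketch, assuming the default nekt_raw schema and the table names used in the use cases below:

-- confirm that data landed after the first successful run
SELECT 'nekt_sources' AS stream, count(*) AS rows_synced FROM "nekt_raw"."nekt_sources"
UNION ALL
SELECT 'nekt_runs', count(*) FROM "nekt_raw"."nekt_runs"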

Streams and Fields

Below you’ll find all available data streams from Nekt and their corresponding fields:

Sources
Stream containing all data sources configured in your Nekt workspace.
Key Fields:
  • id - Unique identifier for the source
  • slug - URL-friendly identifier for the source
  • active - Whether the source is active
  • status - Current status of the source
  • description - Description of the source
  • table_prefix - Prefix applied to tables created by this source
  • created_at - When the source was created
  • updated_at - When the source was last updated
  • connector_config - Configuration settings for the connector (JSON string)
  • connector_version - Version of the connector being used
  • last_run - Timestamp of the last execution
Resource Configuration:
  • python_execution_cpu - CPU allocation for Python execution
  • python_execution_memory - Memory allocation for Python execution
  • spark_driver_cores - Number of cores for Spark driver
  • spark_driver_memory - Memory allocation for Spark driver
  • spark_executor_cores - Number of cores per Spark executor
  • spark_executor_memory - Memory allocation per Spark executor
  • spark_executor_instances - Number of Spark executor instances
  • spark_executor_disk - Disk space allocation per Spark executor
  • spark_execution_timeout_minutes - Timeout for Spark execution
Settings:
  • settings_number_of_retries - Number of retry attempts on failure
  • settings_retry_delay_seconds - Delay between retry attempts
  • settings_max_consecutive_failures - Maximum consecutive failures before pausing
  • settings_full_sync_cron - Cron expression for full sync schedule
  • settings_full_sync_cron_timezone - Timezone for full sync schedule
Nested Objects:
  • output_layer - Target layer information (id, name, slug, description, database_name)
  • created_by - User who created the source (id, name, email, role, permissions)
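
For example, to list active sources together with their connector version and last execution (a sketch assuming the stream lands as "nekt_raw"."nekt_sources", the table name used in the use cases below):

-- active sources and when they last ran
SELECT
	slug,
	status,
	connector_version,
	last_run
FROM "nekt_raw"."nekt_sources"
WHERE active = true
ORDER BY last_run DESC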

Runs
Stream containing all pipeline execution runs in your Nekt workspace.
Key Fields:
  • id - Unique identifier for the run
  • number - Sequential run number
  • created_at - When the run was created
  • updated_at - When the run was last updated
  • started_at - When the run execution started
  • ended_at - When the run execution ended
  • duration_seconds - Total execution time in seconds
  • status - Current status of the run (e.g., running, completed, failed)
  • full_sync - Whether this was a full synchronization run
  • credit_charge_status - Status of credit charging for this run
  • triggered_by_token - Token used to trigger the run (if applicable)
Nested Objects:
  • trigger - Trigger configuration (type, cron_expression, cron_timezone, event_rule)
  • triggered_by - User who triggered the run
  • source - Source information (id, slug, active, status, description)
  • destination - Destination information (id, slug, active, status, description)
  • transformation - Transformation information (id, slug, type, active, status, description)
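
For example, to surface the most recent failed runs and the pipeline they belong to (a sketch assuming the stream lands as "nekt_raw"."nekt_runs" and that failed runs carry the status value 'failed', as in use case 2):

-- latest failed runs and the pipeline (source, transformation, or destination) they belong to
SELECT
	r.id,
	r.started_at,
	r.duration_seconds,
	r.full_sync,
	coalesce(r.source.slug, r.transformation.slug, r.destination.slug) AS pipeline
FROM "nekt_raw"."nekt_runs" r
WHERE r.status = 'failed'
ORDER BY r.started_at DESC
LIMIT 50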

Destinations
Stream containing all data destinations configured in your Nekt workspace.
Key Fields:
  • id - Unique identifier for the destination
  • slug - URL-friendly identifier for the destination
  • active - Whether the destination is active
  • status - Current status of the destination
  • description - Description of the destination
  • is_ml_ai_connector - Whether this is an ML/AI connector
  • connector_config - Configuration settings for the connector (JSON string)
  • connector_version - Version of the connector being used
  • created_at - When the destination was created
  • updated_at - When the destination was last updated
  • last_run - Timestamp of the last execution
Resource Configuration:
  • python_execution_cpu - CPU allocation for Python execution
  • python_execution_memory - Memory allocation for Python execution
Settings:
  • settings_number_of_retries - Number of retry attempts on failure
  • settings_retry_delay_seconds - Delay between retry attempts
  • settings_max_consecutive_failures - Maximum consecutive failures before pausing
Nested Objects:
  • input_tables - Array of input table configurations (table, primary_keys, fields, name, layer)
  • created_by - User who created the destination
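
For example, to review the retry behavior of your active destinations (a sketch assuming the stream lands as "nekt_raw"."nekt_destinations", as in use case 1):

-- retry settings and last execution of active destinations
SELECT
	slug,
	status,
	is_ml_ai_connector,
	settings_number_of_retries,
	settings_max_consecutive_failures,
	last_run
FROM "nekt_raw"."nekt_destinations"
WHERE active = true
ORDER BY last_run DESC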

Transformations
Stream containing all data transformations configured in your Nekt workspace.
Key Fields:
  • id - Unique identifier for the transformation
  • slug - URL-friendly identifier for the transformation
  • type - Type of transformation (e.g., pyspark, sql)
  • active - Whether the transformation is active
  • status - Current status of the transformation
  • description - Description of the transformation
  • code - Transformation code/script
  • created_at - When the transformation was created
  • updated_at - When the transformation was last updated
  • last_run - Timestamp of the last execution
  • dependencies - Array of Python dependencies
  • add_apache_sedona - Whether Apache Sedona is enabled
Resource Configuration:
  • spark_driver_cores - Number of cores for Spark driver
  • spark_driver_memory - Memory allocation for Spark driver
  • spark_executor_cores - Number of cores per Spark executor
  • spark_executor_memory - Memory allocation per Spark executor
  • spark_executor_instances - Number of Spark executor instances
  • spark_executor_disk - Disk space allocation per Spark executor
  • spark_execution_timeout_minutes - Timeout for Spark execution
Settings:
  • settings_number_of_retries - Number of retry attempts on failure
  • settings_retry_delay_seconds - Delay between retry attempts
  • settings_max_consecutive_failures - Maximum consecutive failures before pausing
  • settings_timezone - Timezone for transformation execution
  • delta_log_retention_duration - Delta log retention period
Nested Objects:
  • input_tables - Array of input tables (id, name_reference, table, timestamps)
  • output_tables - Array of output tables (id, name_reference, table, timestamps)
  • created_by - User who created the transformation
  • input_volumes - Array of input volume references
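
For example, to take stock of transformations by type and their average Spark sizing (a sketch assuming the stream lands as "nekt_raw"."nekt_transformations", as in use case 1):

-- transformation counts and average executor allocation per type
SELECT
	type,
	count(*) AS total_transformations,
	sum(CASE WHEN active THEN 1 ELSE 0 END) AS active_transformations,
	avg(spark_executor_instances) AS avg_executor_instances
FROM "nekt_raw"."nekt_transformations"
GROUP BY type
ORDER BY total_transformations DESC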

Activities
Stream containing all audit activities and changes in your Nekt workspace.
Key Fields:
  • id - Unique identifier for the activity
  • activity_type - Type of activity performed
  • changed_fields - Fields that were changed (JSON string)
  • created_at - When the activity occurred
  • created_by_system - Whether the activity was performed by the system
  • pipeline_automatically_paused_after_x_failures - Auto-pause threshold
Entity References:
  • source - Related source ID (if applicable)
  • destination - Related destination ID (if applicable)
  • transformation - Related transformation ID (if applicable)
  • table - Related table ID (if applicable)
  • volume - Related volume ID (if applicable)
  • run - Related run ID (if applicable)
  • visualization - Related visualization ID (if applicable)
Nested Objects:
  • created_by - User who performed the activity
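
For example, to see which kinds of changes happened most often over the last week, and how many people made them (a sketch assuming the stream lands as "nekt_raw"."nekt_activities", as in use case 3):

-- most frequent human activity types over the last 7 days
SELECT
	a.activity_type,
	count(*) AS occurrences,
	count(DISTINCT a.created_by.email) AS distinct_users
FROM "nekt_raw"."nekt_activities" a
WHERE a.created_at >= CURRENT_DATE - INTERVAL '7' DAY
	AND a.created_by_system = false
GROUP BY a.activity_type
ORDER BY occurrences DESC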

Source Triggers
Stream containing trigger configurations for data sources.
Key Fields:
  • id - Unique identifier for the trigger
  • type - Type of trigger (e.g., cron, event)
  • created_at - When the trigger was created
  • updated_at - When the trigger was last updated
  • cron_expression - Cron expression for scheduled triggers
  • cron_timezone - Timezone for cron execution
  • events - Array of events that trigger execution
  • event_rule - Rule for event-based triggers

Destination Triggers
Stream containing trigger configurations for data destinations.
Key Fields:
  • id - Unique identifier for the trigger
  • type - Type of trigger (e.g., cron, event)
  • created_at - When the trigger was created
  • updated_at - When the trigger was last updated
  • cron_expression - Cron expression for scheduled triggers
  • cron_timezone - Timezone for cron execution
  • events - Array of events that trigger execution
  • event_rule - Rule for event-based triggers
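
The two trigger streams make it easy to audit schedules in one place. A sketch of such a query; note that the table name nekt_raw.nekt_source_triggers below is hypothetical, so replace it with the table name you chose when configuring the stream:

-- cron-based source triggers, most recently updated first
-- (table name is hypothetical; use the name configured during stream setup)
SELECT
	id,
	type,
	cron_expression,
	cron_timezone,
	updated_at
FROM "nekt_raw"."nekt_source_triggers"
WHERE type = 'cron'
ORDER BY updated_at DESC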

Data Model

The following diagram illustrates the relationships between the core data streams in Nekt. The arrows indicate the join keys that link the different entities, providing a clear overview of the data platform structure.

Use Cases for Data Analysis

Here are some valuable business intelligence use cases when consolidating Nekt platform data, along with ready-to-use SQL queries that you can run on Explorer.

1. Daily Credits Consumption per Pipeline

Monitor credit consumption for each pipeline on a daily basis.
Business Value:
  • Track pipeline credit consumption
  • Analyze resource utilization and make future projections
  • Identify pipelines that can be optimized

SQL code

-- Runs per pipeline per day, with total duration in minutes (a proxy for credit usage).
-- COALESCE resolves whichever entity the run belongs to: source, transformation, or destination.
SELECT
	coalesce(s.slug, t.slug, d.slug) AS identifier,
	DATE(r.ended_at) AS date,
	count(r.id) AS total_runs,
	sum(r.duration_seconds) / 60 AS duration_minutes
FROM
	"nekt_raw"."nekt_runs" r
	LEFT JOIN "nekt_raw"."nekt_sources" s ON r.source.id = s.id
	LEFT JOIN "nekt_raw"."nekt_destinations" d ON r.destination.id = d.id
	LEFT JOIN "nekt_raw"."nekt_transformations" t ON r.transformation.id = t.id
WHERE
	r.credit_charge_status = 'charged'
GROUP BY
	coalesce(s.slug, t.slug, d.slug),
	DATE(r.ended_at)
ORDER BY
	DATE(r.ended_at) DESC

2. Data Source Performance and Reliability Analysis

Monitor data pipeline performance, execution times, and failure rates across your data sources.
Business Value:
  • Track pipeline execution trends and identify performance bottlenecks
  • Monitor data pipeline reliability and uptime
  • Analyze resource utilization and optimize infrastructure costs
  • Identify sources that need attention

SQL code

-- 30-day reliability and performance per source: run counts, success rate, and durations.
WITH
	-- one row per run in the last 30 days, flagged as success or failure
	run_metrics AS (
		SELECT
			r.source.slug AS source_name,
			r.status,
			r.full_sync,
			r.duration_seconds,
			r.started_at,
			r.ended_at,
			r.credit_charge_status,
			DATE_TRUNC ('day', r.started_at) AS run_date,
			CASE
				WHEN r.status = 'success' THEN 1
				ELSE 0
			END AS success_flag,
			CASE
				WHEN r.status = 'failed' THEN 1
				ELSE 0
			END AS failure_flag
		FROM
			nekt_raw.nekt_runs r
		WHERE
			r.started_at >= CURRENT_DATE - INTERVAL '30' DAY
			AND r.started_at IS NOT NULL
	),
	-- aggregate runs per source per day
	daily_summary AS (
		SELECT
			run_date,
			source_name,
			COUNT(*) AS total_runs,
			SUM(success_flag) AS successful_runs,
			SUM(failure_flag) AS failed_runs,
			AVG(duration_seconds) AS avg_duration_seconds,
			APPROX_PERCENTILE (duration_seconds, 0.5) AS median_duration_seconds,
			MAX(duration_seconds) AS max_duration_seconds,
			COUNT(
				CASE
					WHEN full_sync = TRUE THEN 1
				END
			) AS full_sync_runs
		FROM
			run_metrics
		WHERE
			source_name IS NOT NULL
		GROUP BY
			run_date,
			source_name
	),
	-- roll the daily summaries up to a single 30-day row per source
	source_reliability AS (
		SELECT
			source_name,
			SUM(total_runs) AS total_runs_30d,
			SUM(successful_runs) AS total_successful_runs,
			SUM(failed_runs) AS total_failed_runs,
			ROUND(
				CAST(SUM(successful_runs) AS DOUBLE) / NULLIF(SUM(total_runs), 0) * 100,
				2
			) AS success_rate_percentage,
			ROUND(AVG(avg_duration_seconds) / 60.0, 2) AS avg_duration_minutes,
			ROUND(AVG(median_duration_seconds) / 60.0, 2) AS median_duration_minutes,
			COUNT(DISTINCT run_date) AS active_days
		FROM
			daily_summary
		GROUP BY
			source_name
	)
SELECT
	source_name,
	total_runs_30d,
	total_successful_runs,
	total_failed_runs,
	success_rate_percentage,
	avg_duration_minutes,
	median_duration_minutes,
	active_days,
	CASE
		WHEN success_rate_percentage >= 95 THEN 'Excellent'
		WHEN success_rate_percentage >= 90 THEN 'Good'
		WHEN success_rate_percentage >= 80 THEN 'Needs Attention'
		ELSE 'Critical'
	END AS reliability_status,
	CASE
		WHEN avg_duration_minutes <= 5 THEN 'Fast'
		WHEN avg_duration_minutes <= 15 THEN 'Normal'
		WHEN avg_duration_minutes <= 60 THEN 'Slow'
		ELSE 'Very Slow'
	END AS performance_status
FROM
	source_reliability
WHERE
	total_runs_30d > 0
ORDER BY
	total_runs_30d DESC,
	success_rate_percentage DESC

3. User Activity and Platform Usage Analysis

Track user engagement, platform adoption, and operational activities across your Nekt workspace.
Business Value:
  • Monitor platform adoption and user engagement
  • Identify power users and training needs
  • Track operational patterns and peak usage times
  • Optimize team collaboration and workflow efficiency

SQL code

-- 30-day user activity summary: engagement, peak hours, and team-level patterns.
WITH
  -- human (non-system) activities from the last 30 days, tagged with the entity they touched
  user_activities AS (
    SELECT
      a.created_by.email AS user_email,
      a.created_by.first_name || ' ' || a.created_by.last_name AS user_name,
      a.created_by.role AS user_role,
      a.created_by.functional_area AS functional_area,
      a.activity_type,
      a.created_at,
      a.created_by_system,
      DATE_TRUNC('day', a.created_at) AS activity_date,
      DATE_TRUNC('hour', a.created_at) AS activity_hour,
      CASE
        WHEN a.source IS NOT NULL THEN 'source'
        WHEN a.destination IS NOT NULL THEN 'destination'
        WHEN a.transformation IS NOT NULL THEN 'transformation'
        WHEN a.visualization IS NOT NULL THEN 'visualization'
        WHEN a.run IS NOT NULL THEN 'run'
        ELSE 'other'
      END AS entity_type
    FROM nekt_raw.nekt_activities a
    WHERE a.created_at >= CURRENT_DATE - INTERVAL '30' DAY
      AND a.created_by_system = false
      AND a.created_by.email IS NOT NULL
  ),
  -- per-user totals over the 30-day window
  user_summary AS (
    SELECT
      user_email,
      user_name,
      user_role,
      functional_area,
      COUNT(*) AS total_activities,
      COUNT(DISTINCT activity_date) AS active_days,
      COUNT(DISTINCT activity_type) AS unique_activity_types,
      COUNT(DISTINCT entity_type) AS entity_types_used,
      MIN(created_at) AS first_activity,
      MAX(created_at) AS last_activity,
      -- ARBITRARY returns one value per group, not a true mode;
      -- swap in a frequency-based calculation if you need the genuinely most common values
      ARBITRARY(activity_type) AS most_common_activity,
      ARBITRARY(entity_type) AS most_used_entity_type
    FROM user_activities
    GROUP BY user_email, user_name, user_role, functional_area
  ),
  hourly_patterns AS (
    SELECT
      user_email,
      EXTRACT(hour FROM activity_hour) AS hour_of_day,
      COUNT(*) AS activities_count
    FROM user_activities
    GROUP BY user_email, EXTRACT(hour FROM activity_hour)
  ),
  peak_hours AS (
    SELECT
      user_email,
      hour_of_day,
      activities_count,
      ROW_NUMBER() OVER (PARTITION BY user_email ORDER BY activities_count DESC) AS hour_rank
    FROM hourly_patterns
  ),
  -- aggregate user summaries per functional area
  team_collaboration AS (
    SELECT
      functional_area,
      COUNT(DISTINCT user_email) AS team_size,
      SUM(total_activities) AS team_total_activities,
      AVG(total_activities) AS avg_activities_per_user,
      AVG(active_days) AS avg_active_days_per_user,
      SUM(CASE WHEN active_days >= 20 THEN 1 ELSE 0 END) AS highly_active_users
    FROM user_summary
    WHERE functional_area IS NOT NULL
    GROUP BY functional_area
  )
SELECT
  us.user_name,
  us.user_email,
  us.user_role,
  us.functional_area,
  us.total_activities,
  us.active_days,
  ROUND(
    CAST(us.active_days AS DOUBLE) / 30.0 * 100,
    1
  ) AS engagement_percentage,
  us.unique_activity_types,
  us.entity_types_used,
  us.most_common_activity,
  us.most_used_entity_type,
  ph.hour_of_day AS peak_activity_hour,
  us.first_activity,
  us.last_activity,
  tc.team_size,
  ROUND(tc.avg_activities_per_user, 1) AS team_avg_activities,
  CASE
    WHEN us.active_days >= 25 THEN 'Highly Active'
    WHEN us.active_days >= 15 THEN 'Active'
    WHEN us.active_days >= 5 THEN 'Moderate'
    ELSE 'Low Activity'
  END AS activity_level,
  CASE
    WHEN us.entity_types_used >= 4 THEN 'Power User'
    WHEN us.entity_types_used >= 2 THEN 'Regular User'
    ELSE 'Basic User'
  END AS user_type
FROM user_summary us
LEFT JOIN peak_hours ph ON us.user_email = ph.user_email AND ph.hour_rank = 1
LEFT JOIN team_collaboration tc ON us.functional_area = tc.functional_area
ORDER BY 
  us.total_activities DESC,
  us.active_days DESC