Vault is a secrets management tool that helps organizations store, access, and manage sensitive data such as API keys, passwords, certificates, and other secrets. It provides a centralized platform for managing secrets across applications and infrastructure.

Configuring Vault as a Source

In the Sources tab, click the “Add source” button at the top right of your screen. Then, select the Vault option from the list of connectors. Click Next, and you’ll be prompted to provide your access details.

1. Add account access

You’ll need to provide your Vault instance details and authentication token to access your secrets data. The following configurations are available:
  • Vault Address: The URL of your Vault instance (e.g., https://vault.<your-company>.com.br/). The connector will automatically extract the base URL and use the /v1 API endpoint.
  • Auth Token: Your Vault authentication token (Bearer token) used for API access. For KV v2 streams, the token needs list and read on the secret paths you use. For Identity streams, it needs list and read on /identity/entity/name, /identity/entity-alias/id, and/or /identity/group/name as needed.
Once you’re done, click Next.
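The address handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the connector's actual code: the helper names `vault_api_base` and `auth_headers` are hypothetical, and the sketch shows only one plausible way to reduce a pasted URL to its base and append the /v1 prefix.

```python
from urllib.parse import urlsplit


def vault_api_base(address: str) -> str:
    """Reduce a Vault address to scheme://host[:port] and append /v1.

    Hypothetical helper: it illustrates the documented behaviour,
    not the connector's real implementation.
    """
    parts = urlsplit(address.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"Not an absolute URL: {address!r}")
    return f"{parts.scheme}://{parts.netloc}/v1"


def auth_headers(token: str) -> dict:
    # The token is sent as a standard Bearer token on every request.
    return {"Authorization": f"Bearer {token}"}
```

Any path or trailing slash you paste along with the address is dropped, so only the scheme, host, and optional port matter.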

2. Select streams

Choose which data streams you want to sync. The connector provides two families of streams:
Identity (Vault Identity secrets engine):
  • kv_identity_list_entities: Lists all entity names; parent for entity details
  • kv_identity_entity: Full entity data (aliases, policies, groups) for each entity name
  • kv_identity_list_entity_aliases: Lists all entity-alias IDs; parent for alias details
  • kv_identity_entity_alias: Full alias data (mount, canonical entity) for each alias ID
  • kv_identity_list_groups: Lists all group names; parent for group details
  • kv_identity_group: Full group data (members, policies, type) for each group name
KV v2 (secrets engine):
  • kv_list_secrets: Lists all secret paths and keys recursively from your Vault instance
  • kv_secrets: Retrieves the actual secret data (key-value pairs) for each secret
  • kv_subkeys: Retrieves the subkey structure for each secret (keys only, values are null)
Tip: Type a stream’s name to find it more quickly.
Select the streams and click Next.

3. Configure data streams

Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will effectively contain the fetched data) and the type of sync.
  • Layer: choose one of the existing layers in your catalog. This is where you will find your newly extracted tables once the extraction runs successfully.
  • Folder: a folder can be created inside the selected layer to group all tables being created from this new data source.
  • Table name: we suggest a name, but feel free to customize it. You have the option to add a prefix to all tables at once and make this process faster!
  • Sync Type: you can choose between INCREMENTAL and FULL_TABLE.
    • Incremental: every time the extraction happens, we’ll get only the new data - which is good if, for example, you want to keep every record ever fetched.
    • Full table: every time the extraction happens, we’ll get the current state of the data - which is good if, for example, you don’t want to have deleted data in your catalog.
Once you are done configuring, click Next.
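To make the difference between the two sync types concrete, here is a simplified model. It is only an illustration: the function names and sample rows are made up, and it does not show how the platform actually stores or merges data.

```python
def apply_incremental(table: list[dict], new_rows: list[dict]) -> list[dict]:
    # Incremental: append only the newly fetched rows; earlier rows stay.
    return table + new_rows


def apply_full_table(table: list[dict], current_rows: list[dict]) -> list[dict]:
    # Full table: replace the table with the source's current state.
    return list(current_rows)


table = [{"path": "a"}, {"path": "b"}]
# Suppose secret "b" was deleted at the source and "c" was added.
incremental = apply_incremental(table, [{"path": "c"}])
full = apply_full_table(table, [{"path": "a"}, {"path": "c"}])
```

In this model the incremental table still contains the deleted secret "b", while the full table reflects only what currently exists at the source.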

4. Configure data source

Describe your data source so it is easy to identify within your organization (140 characters maximum). To define your Trigger, consider how often you want data to be extracted from this source. This usually depends on how frequently you need the table data refreshed (every day, once a week, or only at specific times). Optionally, you can define some additional settings:
  • Configure Delta Log Retention and determine for how long we should store old states of this table as it gets updated. Read more about this resource here.
  • Determine when to execute an Additional Full Sync. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.
Once you are ready, click Next to finalize the setup.

5. Check your new source

You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.
For you to be able to see it on your Catalog, you need at least one successful source run.

Streams and Fields

Below you’ll find all available data streams from Vault and their corresponding fields. Streams are grouped into Identity (entities, aliases, groups) and KV v2 (secrets).
kv_identity_list_entities
Parent stream that lists all entity names via the Identity API (/identity/entity/name?list=true). The child stream kv_identity_entity uses each name to fetch full entity data.
Key Fields:
  • name - Entity name (identifier used to read entity by name)
How it works:
  • Calls the list endpoint to get all entity names
  • Child stream kv_identity_entity receives each name as context and fetches detailed entity data
kv_identity_entity
Child stream that retrieves full entity data for each entity name from kv_identity_list_entities. Uses the Identity API endpoint /identity/entity/name/:name.
Key Fields:
  • id - Entity identifier (UUID)
  • name - Entity name
  • entity_name - Entity name from parent context
  • aliases - Entity aliases as a JSON array string (to avoid SDK flattening into dotted keys)
  • creation_time, last_update_time - Timestamps
  • direct_group_ids, group_ids, inherited_group_ids - Group membership
  • merged_entity_ids - Merged entity IDs
  • disabled - Whether the entity is disabled
  • metadata - Metadata key-value map (JSON string)
  • policies - Policies tied to the entity
Notes:
  • One record per entity; supports incremental sync via last_update_time
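The parent/child pattern used by the Identity streams can be sketched like this. It is a rough sketch, not the connector's code: `fetch` is a hypothetical callable standing in for an authenticated HTTP request that returns the parsed JSON body (or None when the item is missing), and the function names are assumptions. The `data.keys` list and the `data` object are the standard shapes of Vault list and read responses.

```python
import json
from typing import Callable, Iterator, Optional

# fetch(path) -> parsed JSON body, or None when the item does not exist.
Fetch = Callable[[str], Optional[dict]]


def list_entity_names(fetch: Fetch) -> list[str]:
    # Parent stream: the list endpoint returns names under data.keys.
    body = fetch("/identity/entity/name?list=true") or {}
    return body.get("data", {}).get("keys", [])


def entity_records(fetch: Fetch) -> Iterator[dict]:
    # Child stream: one read request per name received from the parent.
    for name in list_entity_names(fetch):
        body = fetch(f"/identity/entity/name/{name}")
        if body is None:
            continue
        record = dict(body["data"])
        record["entity_name"] = name  # name from parent context
        # Nested values are stored as JSON strings rather than flattened.
        record["aliases"] = json.dumps(record.get("aliases") or [])
        record["metadata"] = json.dumps(record.get("metadata") or {})
        yield record
```

The alias and group streams follow the same shape, with the list and read paths swapped for their own endpoints.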
kv_identity_list_entity_aliases
Parent stream that lists all entity-alias IDs via the Identity API (/identity/entity-alias/id?list=true). The child stream kv_identity_entity_alias uses each ID to fetch full alias data.
Key Fields:
  • id - Entity alias identifier (UUID)
How it works:
  • Calls the list endpoint to get all alias IDs
  • Child stream kv_identity_entity_alias receives each ID as context and fetches detailed alias data
kv_identity_entity_alias
Child stream that retrieves full alias data for each alias ID from kv_identity_list_entity_aliases. Uses the Identity API endpoint /identity/entity-alias/id/:id.
Key Fields:
  • id - Entity alias identifier (UUID)
  • canonical_id - Entity ID this alias belongs to
  • creation_time, last_update_time - Timestamps
  • local - Whether the alias is local to the cluster
  • mount_accessor, mount_path, mount_type - Mount information
  • name - Alias name (e.g. username in auth source)
  • custom_metadata, metadata - JSON strings
Notes:
  • One record per alias; supports incremental sync via last_update_time
kv_identity_list_groups
Parent stream that lists all group names via the Identity API (/identity/group/name?list=true). The child stream kv_identity_group uses each name to fetch full group data.
Key Fields:
  • name - Group name (identifier used to read group by name)
How it works:
  • Calls the list endpoint to get all group names
  • Child stream kv_identity_group receives each name as context and fetches detailed group data
kv_identity_group
Child stream that retrieves full group data for each group name from kv_identity_list_groups. Uses the Identity API endpoint /identity/group/name/:name.
Key Fields:
  • id - Group identifier (UUID)
  • name - Group name
  • alias - Group alias object
  • creation_time, last_update_time - Timestamps
  • member_entity_ids, member_group_ids - Members
  • parent_group_ids - Parent group IDs
  • modify_index - Modify index
  • type - Group type (internal or external)
  • metadata - Metadata key-value map (JSON string)
  • policies - Policies tied to the group
Notes:
  • One record per group; supports incremental sync via last_update_time
kv_list_secrets
Parent stream that recursively lists all secret paths and keys from your Vault KV v2 secrets engine. This stream traverses folders and subfolders to discover all available secrets.
Key Fields:
  • path - The folder path where the secret or key was found
  • key - The name of the key or secret (folders end with /)
  • is_folder - Boolean indicating whether the key is a folder (true) or an actual secret (false)
How it works:
  • Starts from the root of the KV v2 secrets engine (secret/)
  • Recursively traverses all folders and subfolders
  • Lists both folders and individual secrets
  • Child streams (kv_secrets and kv_subkeys) use this stream’s output to fetch detailed data for each secret
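The traversal steps above can be sketched as follows. This is a minimal sketch under assumptions, not the connector's implementation: `list_keys` is a hypothetical callable that returns the keys directly under a folder (in a real call it would wrap the KV v2 list request), and the root is represented here as an empty string.

```python
from typing import Callable, Iterator

# list_keys(folder) -> keys directly under that folder; folders end with "/".
ListKeys = Callable[[str], list[str]]


def walk_secrets(list_keys: ListKeys, root: str = "") -> Iterator[dict]:
    visited: set[str] = set()  # guards against listing a folder twice
    stack = [root]
    while stack:
        folder = stack.pop()
        if folder in visited:
            continue
        visited.add(folder)
        for key in sorted(list_keys(folder)):
            is_folder = key.endswith("/")
            yield {"path": folder.rstrip("/"), "key": key, "is_folder": is_folder}
            if is_folder:
                stack.append(folder + key)  # descend into the subfolder later
```

An explicit stack is used instead of recursion so that very deep folder trees do not hit Python's recursion limit.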
kv_secrets
Child stream that retrieves the actual secret data (key-value pairs) for each secret discovered by kv_list_secrets. This stream fetches the complete secret data including all key-value pairs stored in each secret.
Key Fields:
  • path - The full path to the secret (e.g., Data Ops/airbnb-contato)
  • version - The version number of the secret
  • data - JSON field containing all key-value pairs stored in the secret. Values can be strings, numbers, booleans, or other JSON types.
  • metadata - Object containing version-specific metadata:
    • created_time - ISO 8601 timestamp when the secret version was created
    • custom_metadata - JSON field with custom metadata key-value pairs (string values)
    • deletion_time - ISO 8601 timestamp when the secret was deleted (empty string if not deleted)
    • destroyed - Boolean indicating if the secret version has been destroyed
    • version - The version number
Notes:
  • Returns one record per secret version
  • The data field can contain any JSON-serializable values (not just strings)
  • Secrets that don’t exist or return 404 are gracefully skipped
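As an illustration of the notes above, the sketch below turns KV v2 read responses into records and skips missing secrets. It assumes a hypothetical `read_secret` callable that returns the parsed response body, or None when Vault answers with a 404. The function name is made up; the nesting of `data` and `metadata` under a top-level `data` key is the standard KV v2 read response shape.

```python
from typing import Callable, Iterator, Optional

# read_secret(path) -> parsed JSON body of a KV v2 read, or None on HTTP 404.
ReadSecret = Callable[[str], Optional[dict]]


def secret_records(paths: list[str], read_secret: ReadSecret) -> Iterator[dict]:
    for path in paths:
        body = read_secret(path)
        if body is None:
            # Missing secret: skip it instead of failing the whole sync.
            continue
        payload = body["data"]  # KV v2 nests data and metadata under "data"
        metadata = payload["metadata"]
        yield {
            "path": path,
            "version": metadata["version"],
            "data": payload["data"],  # values may be any JSON type
            "metadata": metadata,
        }
```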
kv_subkeys
Child stream that retrieves the subkey structure for each secret discovered by kv_list_secrets. This stream provides the hierarchical structure of keys within each secret, with values set to null at leaf nodes.
Key Fields:
  • path - The full path to the secret (e.g., Data Ops/airbnb-contato)
  • version - The version number of the secret
  • subkeys - JSON field representing the hierarchical structure of keys within the secret. Leaf nodes have null values, while nested objects represent subdirectories with the same structure.
  • metadata - Object containing version-specific metadata (same structure as kv_secrets):
    • created_time - ISO 8601 timestamp when the secret version was created
    • custom_metadata - JSON field with custom metadata key-value pairs (string values)
    • deletion_time - ISO 8601 timestamp when the secret was deleted (empty string if not deleted)
    • destroyed - Boolean indicating if the secret version has been destroyed
    • version - The version number
Notes:
  • Returns the structure of keys without their values (values are null)
  • Useful for understanding the schema and organization of secrets
  • Secrets that don’t exist or return 404 are gracefully skipped
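To show what the subkeys field looks like, the snippet below derives the same kind of structure locally from a sample secret. The stream itself retrieves this structure from Vault. The function `to_subkeys` and the sample values are hypothetical and only illustrate the shape (null is represented as None in Python).

```python
from typing import Any


def to_subkeys(data: Any) -> Any:
    """Mirror the key structure of a secret, replacing leaf values with None."""
    if isinstance(data, dict):
        return {key: to_subkeys(value) for key, value in data.items()}
    return None


# Made-up secret used only to illustrate the resulting shape.
example = {
    "DB_HOST": "db.example.com",
    "DB_PORT": 5432,
    "tls": {"cert": "...", "key": "..."},
}
structure = to_subkeys(example)
```

Here `structure` keeps the keys DB_HOST, DB_PORT, and the nested tls object, but every leaf value is None, so no secret value is exposed.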

Data Model

The following diagram illustrates the relationships between the data streams in Vault. The arrows indicate how child streams depend on the parent stream for context.
Relationship Details:
Identity:
  • kv_identity_list_entities lists entity names; kv_identity_entity uses each entity_name to fetch full entity data (aliases, policies, groups).
  • kv_identity_list_entity_aliases lists entity-alias IDs; kv_identity_entity_alias uses each entity_alias_id to fetch full alias data (canonical entity, mount, metadata).
  • kv_identity_list_groups lists group names; kv_identity_group uses each group_name to fetch full group data (members, policies, type).
KV v2:
  • kv_list_secrets is the parent stream that discovers all secrets recursively.
  • Both kv_secrets and kv_subkeys are child streams that depend on kv_list_secrets; each record provides a secret_path context.
  • Child streams automatically skip folder records (where is_folder is true).
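The KV v2 relationship can be sketched as a small function that turns kv_list_secrets records into the secret_path context passed to the child streams. It is an illustrative sketch: the function name is hypothetical, and it assumes root-level records carry an empty path.

```python
from typing import Iterable, Iterator


def secret_path_contexts(list_records: Iterable[dict]) -> Iterator[dict]:
    for record in list_records:
        if record["is_folder"]:
            continue  # folders have no secret data to fetch
        folder = record["path"].strip("/")
        # Join folder and key; root-level secrets use the key alone.
        full_path = f"{folder}/{record['key']}" if folder else record["key"]
        yield {"secret_path": full_path}
```

Each yielded context corresponds to one read by kv_secrets and one by kv_subkeys.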

Use Cases for Data Analysis

This guide outlines valuable business intelligence use cases when consolidating Vault data, along with ready-to-use SQL queries that you can run on Explorer.

Identity: Entity and Group Inventory

Sync kv_identity_entity and kv_identity_group to maintain a full list of Vault entities and groups, with policies and membership. Use kv_identity_entity_alias to map auth method aliases (e.g. usernames) to canonical entity IDs for access and audit analysis.
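If you export these tables for analysis outside Explorer, the alias-to-entity mapping could look roughly like this. The function name and the sample rows are made up for illustration. Only the field names (id, name, canonical_id, mount_type) come from the stream definitions above.

```python
def aliases_by_entity(entities: list[dict], aliases: list[dict]) -> dict[str, list[str]]:
    """Group alias names under the name of the entity they belong to."""
    entity_names = {e["id"]: e["name"] for e in entities}
    grouped: dict[str, list[str]] = {name: [] for name in entity_names.values()}
    for alias in aliases:
        owner = entity_names.get(alias["canonical_id"])
        if owner is not None:  # ignore aliases whose entity was not synced
            grouped[owner].append(f"{alias['mount_type']}:{alias['name']}")
    return grouped


# Made-up sample rows shaped like kv_identity_entity / kv_identity_entity_alias.
entities = [{"id": "e-1", "name": "alice"}, {"id": "e-2", "name": "bob"}]
aliases = [
    {"canonical_id": "e-1", "name": "alice@corp", "mount_type": "oidc"},
    {"canonical_id": "e-1", "name": "alice", "mount_type": "userpass"},
    {"canonical_id": "e-9", "name": "orphan", "mount_type": "token"},
]
mapping = aliases_by_entity(entities, aliases)
```

Entities with an empty list have no alias in any auth method, which can be a useful starting point for an access review.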

1. Secret Inventory and Organization

Get a comprehensive overview of all secrets stored in your Vault instance, organized by path and type.
Business Value:
  • Maintain an inventory of all secrets across your organization
  • Identify secrets by their location and path structure
  • Track which secrets are folders vs. actual secret data
  • Audit secret organization and naming conventions
SELECT
   path,
   key,
   is_folder,
   CASE
      WHEN is_folder THEN 'Folder'
      ELSE 'Secret'
   END AS type
FROM
   nekt_raw.vault_kv_list_secrets
ORDER BY
   path,
   key
| path | key | is_folder | type |
| --- | --- | --- | --- |
| (root) | airbnb-contato | false | Secret |
| (root) | default | false | Secret |
| Data Ops | airbnb-contato | false | Secret |
| Data Ops | default | false | Secret |
| Data Solutions | DataResourcesPostgreSQL | false | Secret |
| Administrativo Serviços | 360Imprimir | false | Secret |
| Administrativo Serviços | ACATE | false | Secret |

2. Secret Data Analysis

Analyze the actual secret data stored in your Vault, including all key-value pairs and their metadata.
Business Value:
  • Understand what data is stored in each secret
  • Track secret versions and creation times
  • Identify secrets that have been deleted or destroyed
  • Monitor custom metadata associated with secrets
SELECT
   path,
   version,
   data,
   metadata.created_time,
   metadata.destroyed,
   metadata.deletion_time,
   metadata.custom_metadata
FROM
   nekt_raw.vault_kv_secrets
WHERE
   metadata.destroyed = false
   AND metadata.deletion_time = ''
ORDER BY
   path,
   version DESC
| path | version | data | created_time | destroyed | deletion_time | custom_metadata |
| --- | --- | --- | --- | --- | --- | --- |
| Data Ops/airbnb-contato | 1 | {"Login": "[email protected]", "Password": "***"} | 2024-09-23T14:27:07Z | false | | null |
| Data Solutions/DataResourcesPostgreSQL | 1 | {"DB_HOST": "db.example.com", "DB_PORT": 5432, "DB_NAME": "mydb"} | 2024-04-12T13:12:10Z | false | | null |

3. Secret Structure Analysis

Analyze the hierarchical structure of keys within secrets to understand their organization and schema.
Business Value:
  • Understand the schema and structure of secrets
  • Identify common key patterns across secrets
  • Track changes in secret structure over time
  • Plan migrations or reorganizations based on structure
SELECT
   path,
   version,
   subkeys,
   metadata.created_time
FROM
   nekt_raw.vault_kv_subkeys
WHERE
   metadata.destroyed = false
ORDER BY
   path,
   version DESC
| path | version | subkeys | created_time |
| --- | --- | --- | --- |
| Data Solutions/DataResourcesPostgreSQL | 1 | {"DB_HOST": null, "DB_PORT": null, "DB_NAME": null} | 2024-04-12T13:12:10Z |
| Administrativo Serviços/360Imprimir | 1 | {"Senha": null, "Usuário": null} | 2024-04-12T13:12:10Z |

4. Secret Version Tracking

Track secret versions and their lifecycle to understand when secrets were created, modified, or deleted.
Business Value:
  • Monitor secret version history
  • Identify recently created or modified secrets
  • Track secret lifecycle and changes over time
  • Audit secret management practices
SELECT
   path,
   MAX(version) AS latest_version,
   COUNT(*) AS total_versions,
   MIN(metadata.created_time) AS first_created,
   MAX(metadata.created_time) AS last_modified,
   SUM(CASE WHEN metadata.destroyed THEN 1 ELSE 0 END) AS destroyed_versions,
   SUM(CASE WHEN metadata.deletion_time != '' THEN 1 ELSE 0 END) AS deleted_versions
FROM
   nekt_raw.vault_kv_secrets
GROUP BY
   path
ORDER BY
   last_modified DESC
| path | latest_version | total_versions | first_created | last_modified | destroyed_versions | deleted_versions |
| --- | --- | --- | --- | --- | --- | --- |
| Administrativo Serviços/Amazon | 3 | 3 | 2023-08-22T19:56:51Z | 2024-01-15T10:30:22Z | 0 | 0 |
| Data Ops/airbnb-contato | 1 | 1 | 2024-09-23T14:27:07Z | 2024-09-23T14:27:07Z | 0 | 0 |
| Data Solutions/DataResourcesPostgreSQL | 1 | 1 | 2024-04-12T13:12:10Z | 2024-04-12T13:12:10Z | 0 | 0 |

Implementation Notes

Data Quality Considerations

  • The connector uses both the Vault Identity API (/identity/entity/name, /identity/entity-alias/id, /identity/group/name) and the KV v2 secrets engine (secret/ mount path).
  • Identity: Parent streams call list endpoints; child streams request each item by name or ID. Ensure the token has list and read on the Identity paths you use.
  • KV v2: Paths with spaces and special characters are automatically URL-encoded. Secrets that return 404 are gracefully skipped (not treated as errors). Folder records are automatically filtered out from child streams. The connector recursively traverses all folders and subfolders.
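The URL-encoding of paths mentioned above can be illustrated with Python's standard library. This is a sketch, not the connector's code: the helper name `kv2_data_url`, the example host, and the default `secret` mount are assumptions.

```python
from urllib.parse import quote


def kv2_data_url(api_base: str, secret_path: str, mount: str = "secret") -> str:
    # Percent-encode spaces and special characters, keeping "/" as separator.
    encoded = quote(secret_path.strip("/"), safe="/")
    return f"{api_base.rstrip('/')}/{mount}/data/{encoded}"


url = kv2_data_url("https://vault.example.com/v1", "Data Ops/airbnb-contato")
```

With this input, the space in "Data Ops" becomes %20 while the slash between folder and secret name is preserved.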

API Limits & Performance

  • A 1-second delay is applied between requests to avoid rate limiting
  • For large Vault instances with many secrets, extraction may take some time
  • The connector handles recursive folder traversal efficiently by tracking visited folders
  • Child streams only process individual secrets (folders are automatically skipped)
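A fixed delay between requests can be modeled as below. This is a generic sketch of the technique rather than the connector's implementation: the `throttled` helper is hypothetical, and the `sleep` parameter exists only so the pause can be replaced in tests.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def throttled(
    items: Iterable[T],
    call: Callable[[T], R],
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Apply `call` to each item, pausing `delay` seconds between calls."""
    first = True
    for item in items:
        if not first:
            sleep(delay)  # wait before every request except the first
        first = False
        yield call(item)
```

With a 1-second delay, syncing N secrets takes at least N - 1 seconds per child stream, which is why large Vault instances can take a while to extract.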

Security Considerations

  • Authentication tokens are stored securely and marked as secret fields
  • The connector only reads secrets (does not modify or delete them)
  • Ensure your authentication token has appropriate read permissions for the KV v2 secrets engine
  • Consider using Vault policies to restrict access to only necessary paths