Configuring Vault as a Source
In the Sources tab, click on the “Add source” button located on the top right of your screen. Then, select the Vault option from the list of connectors. Click Next and you’ll be prompted to add your access.
1. Add account access
You’ll need to provide your Vault instance details and authentication token to access your secrets data. The following configurations are available:
- Vault Address: The URL of your Vault instance (e.g., https://vault.<your-company>.com.br/). The connector will automatically extract the base URL and use the /v1 API endpoint.
- Auth Token: Your Vault authentication token (Bearer token) used for API access. For KV v2 streams, the token needs list and read on the secret paths you use. For Identity streams, it needs list and read on /identity/entity/name, /identity/entity-alias/id, and/or /identity/group/name as needed.
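As a rough illustration of what extracting the base URL and using the /v1 API endpoint could look like, here is a minimal Python sketch. The helper names (vault_api_base, auth_headers) are hypothetical, and the connector's actual logic may differ. The Authorization: Bearer header shown is one of the token headers accepted by Vault's HTTP API.

```python
from urllib.parse import urlsplit


def vault_api_base(vault_address: str) -> str:
    """Reduce a Vault address to scheme + host and append the /v1 API prefix."""
    parts = urlsplit(vault_address.strip())
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"Not an absolute URL: {vault_address!r}")
    # Any path, query, or fragment (for example a trailing /ui/) is dropped.
    return f"{parts.scheme}://{parts.netloc}/v1"


def auth_headers(token: str) -> dict:
    """Build request headers carrying the token as a Bearer token."""
    return {"Authorization": f"Bearer {token}"}


# Hypothetical address, used only for illustration.
print(vault_api_base("https://vault.example.com/ui/"))
```

Rejecting addresses without a scheme early makes a mistyped Vault Address easier to diagnose than a failed request later in the sync.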
2. Select streams
Choose which data streams you want to sync. The connector provides two families of streams:
Identity (Vault Identity secrets engine):
- kv_identity_list_entities: Lists all entity names; parent for entity details
- kv_identity_entity: Full entity data (aliases, policies, groups) for each entity name
- kv_identity_list_entity_aliases: Lists all entity-alias IDs; parent for alias details
- kv_identity_entity_alias: Full alias data (mount, canonical entity) for each alias ID
- kv_identity_list_groups: Lists all group names; parent for group details
- kv_identity_group: Full group data (members, policies, type) for each group name
KV v2 (Vault KV v2 secrets engine):
- kv_list_secrets: Lists all secret paths and keys recursively from your Vault instance
- kv_secrets: Retrieves the actual secret data (key-value pairs) for each secret
- kv_subkeys: Retrieves the subkey structure for each secret (keys only, values are null)
Tip: You can find a stream more quickly by typing its name.
Select the streams and click Next.
3. Configure data streams
Customize how you want your data to appear in your catalog. Select the layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will contain the fetched data), and the type of sync.
- Layer: choose between the existing layers in your catalog. This is where you will find your new extracted tables once the extraction runs successfully.
- Folder: a folder can be created inside the selected layer to group all tables being created from this new data source.
- Table name: we suggest a name, but feel free to customize it. You have the option to add a prefix to all tables at once and make this process faster!
- Sync Type: you can choose between INCREMENTAL and FULL_TABLE.
  - Incremental: every time the extraction happens, we’ll get only the new data, which is good if, for example, you want to keep every record ever fetched.
  - Full table: every time the extraction happens, we’ll get the current state of the data, which is good if, for example, you don’t want to have deleted data in your catalog.
4. Configure data source
Describe your data source for easy identification within your organization, not exceeding 140 characters.
To define your Trigger, consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times).
Optionally, you can define some additional settings:
- Configure Delta Log Retention and determine for how long we should store old states of this table as it gets updated. Read more about this resource here.
- Determine when to execute an Additional Full Sync. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.
5. Check your new source
You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.
Streams and Fields
Below you’ll find all available data streams from Vault and their corresponding fields. Streams are grouped into Identity (entities, aliases, groups) and KV v2 (secrets).
kv_identity_list_entities
Parent stream that lists all entity names via the Identity API (/identity/entity/name?list=true). The child stream kv_identity_entity uses each name to fetch full entity data.
Key Fields:
- name: Entity name (identifier used to read the entity by name)
Notes:
- Calls the list endpoint to get all entity names
- Child stream kv_identity_entity receives each name as context and fetches detailed entity data
kv_identity_entity
Child stream that retrieves full entity data for each entity name from kv_identity_list_entities. Uses the Identity API endpoint /identity/entity/name/:name.
Key Fields:
- id: Entity identifier (UUID)
- name: Entity name
- entity_name: Entity name from parent context
- aliases: Entity aliases as a JSON array string (to avoid SDK flattening into dotted keys)
- creation_time, last_update_time: Timestamps
- direct_group_ids, group_ids, inherited_group_ids: Group membership
- merged_entity_ids: Merged entity IDs
- disabled: Whether the entity is disabled
- metadata: Metadata key-value map (JSON string)
- policies: Policies tied to the entity
Notes:
- One record per entity; supports incremental sync via last_update_time
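To make the idea of incremental sync via last_update_time concrete, here is a minimal Python sketch of bookmark-based filtering. It is an illustration under assumptions, not the connector's implementation: the function names are hypothetical, and timestamps are assumed to have second precision (real Vault values may carry fractional seconds, which this parser does not handle).

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a UTC timestamp such as 2024-09-23T14:27:07Z (second precision assumed)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def incremental_batch(records, bookmark):
    """Return the records updated after `bookmark`, plus the advanced bookmark."""
    fresh = [r for r in records if parse_ts(r["last_update_time"]) > bookmark]
    # If nothing is new, the bookmark stays where it was.
    new_bookmark = max(
        (parse_ts(r["last_update_time"]) for r in fresh), default=bookmark
    )
    return fresh, new_bookmark


# Hypothetical entity records, used only for illustration.
entities = [
    {"name": "alice", "last_update_time": "2024-04-12T13:12:10Z"},
    {"name": "bob", "last_update_time": "2024-09-23T14:27:07Z"},
]
fresh, bookmark = incremental_batch(entities, parse_ts("2024-06-01T00:00:00Z"))
```

Running the same call again with the returned bookmark yields no records until an entity is updated.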
kv_identity_list_entity_aliases
Parent stream that lists all entity-alias IDs via the Identity API (/identity/entity-alias/id?list=true). The child stream kv_identity_entity_alias uses each ID to fetch full alias data.
Key Fields:
- id: Entity alias identifier (UUID)
Notes:
- Calls the list endpoint to get all alias IDs
- Child stream kv_identity_entity_alias receives each ID as context and fetches detailed alias data
kv_identity_entity_alias
Child stream that retrieves full alias data for each alias ID from kv_identity_list_entity_aliases. Uses the Identity API endpoint /identity/entity-alias/id/:id.
Key Fields:
- id: Entity alias identifier (UUID)
- canonical_id: Entity ID this alias belongs to
- creation_time, last_update_time: Timestamps
- local: Whether the alias is local to the cluster
- mount_accessor, mount_path, mount_type: Mount information
- name: Alias name (e.g. username in auth source)
- custom_metadata, metadata: JSON strings
Notes:
- One record per alias; supports incremental sync via last_update_time
kv_identity_list_groups
Parent stream that lists all group names via the Identity API (/identity/group/name?list=true). The child stream kv_identity_group uses each name to fetch full group data.
Key Fields:
- name: Group name (identifier used to read the group by name)
Notes:
- Calls the list endpoint to get all group names
- Child stream kv_identity_group receives each name as context and fetches detailed group data
kv_identity_group
Child stream that retrieves full group data for each group name from kv_identity_list_groups. Uses the Identity API endpoint /identity/group/name/:name.
Key Fields:
- id: Group identifier (UUID)
- name: Group name
- alias: Group alias object
- creation_time, last_update_time: Timestamps
- member_entity_ids, member_group_ids: Members
- parent_group_ids: Parent group IDs
- modify_index: Modify index
- type: Group type (internal or external)
- metadata: Metadata key-value map (JSON string)
- policies: Policies tied to the group
Notes:
- One record per group; supports incremental sync via last_update_time
kv_list_secrets
Parent stream that recursively lists all secret paths and keys from your Vault KV v2 secrets engine. This stream traverses folders and subfolders to discover all available secrets.
Key Fields:
- path: The folder path where the secret or key was found
- key: The name of the key or secret (folders end with /)
- is_folder: Boolean indicating whether the key is a folder (true) or an actual secret (false)
Notes:
- Starts from the root of the KV v2 secrets engine (secret/)
- Recursively traverses all folders and subfolders
- Lists both folders and individual secrets
- Child streams (kv_secrets and kv_subkeys) use this stream’s output to fetch detailed data for each secret
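The recursive traversal described above can be sketched in a few lines of Python. This is an illustration, not the connector's code: walk_secrets and the list_keys callback are hypothetical names, the in-memory tree stands in for Vault's list responses, and paths here keep a trailing slash, which may differ from how the connector formats the path field.

```python
def walk_secrets(list_keys, root=""):
    """Yield {path, key, is_folder} records for every key under `root`.

    `list_keys(path)` is a caller-supplied function returning the key names
    directly under `path`; folder names end with "/".
    """
    visited = set()
    stack = [root]
    while stack:
        folder = stack.pop()
        if folder in visited:  # guard against listing the same folder twice
            continue
        visited.add(folder)
        for key in list_keys(folder):
            is_folder = key.endswith("/")
            yield {"path": folder, "key": key, "is_folder": is_folder}
            if is_folder:
                stack.append(f"{folder}{key}")


# In-memory stand-in for Vault list responses (hypothetical data).
FAKE_TREE = {
    "": ["default", "Data Ops/"],
    "Data Ops/": ["airbnb-contato", "nested/"],
    "Data Ops/nested/": ["deep-secret"],
}

records = list(walk_secrets(lambda path: FAKE_TREE.get(path, [])))
```

Using an explicit stack instead of recursion keeps the traversal safe for deeply nested folder structures.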
kv_secrets
Child stream that retrieves the actual secret data (key-value pairs) for each secret discovered by kv_list_secrets. This stream fetches the complete secret data including all key-value pairs stored in each secret.
Key Fields:
- path: The full path to the secret (e.g., Data Ops/airbnb-contato)
- version: The version number of the secret
- data: JSON field containing all key-value pairs stored in the secret. Values can be strings, numbers, booleans, or other JSON types.
- metadata: Object containing version-specific metadata, with the following fields:
  - created_time: ISO 8601 timestamp when the secret version was created
  - custom_metadata: JSON field with custom metadata key-value pairs (string values)
  - deletion_time: ISO 8601 timestamp when the secret was deleted (empty string if not deleted)
  - destroyed: Boolean indicating if the secret version has been destroyed
  - version: The version number
Notes:
- Returns one record per secret version
- The data field can contain any JSON-serializable values (not just strings)
- Secrets that don’t exist or return 404 are gracefully skipped
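The "gracefully skipped" behavior for missing secrets could look roughly like the Python sketch below. The read_secrets function and its fetch callback are hypothetical, and the response body shape is assumed for illustration. The sketch only shows the idea that a 404 drops the record without failing the sync.

```python
def read_secrets(paths, fetch):
    """Collect secret payloads, skipping paths whose lookup returns 404.

    `fetch(path)` is a caller-supplied function returning (status_code, body).
    """
    results = []
    for path in paths:
        status, body = fetch(path)
        if status == 404:
            continue  # secret no longer exists: skip it, do not fail the sync
        if status != 200:
            raise RuntimeError(f"Unexpected status {status} for {path!r}")
        results.append({"path": path, **body})
    return results


# Hypothetical responses keyed by secret path.
FAKE_RESPONSES = {
    "team/db": (200, {"version": 1, "data": {"DB_PORT": 5432}}),
}


def fake_fetch(path):
    return FAKE_RESPONSES.get(path, (404, {}))


rows = read_secrets(["team/db", "team/gone"], fake_fetch)
```

Other error statuses still raise, so a permissions problem is not silently mistaken for a deleted secret.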
kv_subkeys
Child stream that retrieves the subkey structure for each secret discovered by kv_list_secrets. This stream provides the hierarchical structure of keys within each secret, with values set to null at leaf nodes.
Key Fields:
- path: The full path to the secret (e.g., Data Ops/airbnb-contato)
- version: The version number of the secret
- subkeys: JSON field representing the hierarchical structure of keys within the secret. Leaf nodes have null values, while nested objects represent subdirectories with the same structure.
- metadata: Object containing version-specific metadata (same structure as kv_secrets), with the following fields:
  - created_time: ISO 8601 timestamp when the secret version was created
  - custom_metadata: JSON field with custom metadata key-value pairs (string values)
  - deletion_time: ISO 8601 timestamp when the secret was deleted (empty string if not deleted)
  - destroyed: Boolean indicating if the secret version has been destroyed
  - version: The version number
Notes:
- Returns the structure of keys without their values (values are null)
- Useful for understanding the schema and organization of secrets
- Secrets that don’t exist or return 404 are gracefully skipped
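To picture how the subkeys field relates to a secret's data, the following Python sketch derives the same kind of structure from a plain dictionary. It only illustrates the shape of the output; the stream gets this structure from Vault itself, and to_subkeys is a hypothetical helper.

```python
def to_subkeys(data):
    """Replace every leaf value with None while keeping nested dict structure."""
    if isinstance(data, dict):
        return {key: to_subkeys(value) for key, value in data.items()}
    return None  # any non-dict value is a leaf


# Hypothetical secret data, used only for illustration.
example = {"DB_HOST": "db.example.com", "DB_PORT": 5432, "nested": {"user": "x"}}
structure = to_subkeys(example)
```

When serialized to JSON, the None values become null, matching the leaf nodes described above.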
Data Model
The following diagram illustrates the relationships between the data streams in Vault. The arrows indicate how child streams depend on the parent stream for context.
Relationship Details:
Identity:
- kv_identity_list_entities lists entity names; kv_identity_entity uses each entity_name to fetch full entity data (aliases, policies, groups).
- kv_identity_list_entity_aliases lists entity-alias IDs; kv_identity_entity_alias uses each entity_alias_id to fetch full alias data (canonical entity, mount, metadata).
- kv_identity_list_groups lists group names; kv_identity_group uses each group_name to fetch full group data (members, policies, type).
KV v2:
- kv_list_secrets is the parent stream that discovers all secrets recursively.
- Both kv_secrets and kv_subkeys are child streams that depend on kv_list_secrets; each record provides a secret_path context.
- Child streams automatically skip folder records (where is_folder is true).
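The parent-to-child handoff for the KV v2 streams can be sketched as follows: folder records are dropped, and each remaining record becomes a URL-encoded secret_path. This is a minimal Python illustration with a hypothetical function name. How the connector actually joins path and key, and how it represents the root folder, are assumptions here (the root is taken to be an empty string).

```python
from urllib.parse import quote


def secret_paths(records):
    """Yield a URL-encoded secret_path for each non-folder record."""
    for record in records:
        if record["is_folder"]:
            continue  # child streams only read actual secrets
        folder = record["path"].strip("/")
        full_path = f"{folder}/{record['key']}" if folder else record["key"]
        # quote() keeps "/" by default and percent-encodes spaces and non-ASCII.
        yield quote(full_path)


parent_records = [
    {"path": "", "key": "Data Ops/", "is_folder": True},
    {"path": "", "key": "default", "is_folder": False},
    {"path": "Data Ops", "key": "airbnb-contato", "is_folder": False},
    {"path": "Administrativo Serviços", "key": "360Imprimir", "is_folder": False},
]
paths = list(secret_paths(parent_records))
```

Encoding at this step means paths with spaces or accented characters, such as those in the sample results, can be used directly in request URLs.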
Use Cases for Data Analysis
This guide outlines valuable business intelligence use cases when consolidating Vault data, along with ready-to-use SQL queries that you can run on Explorer.
Identity: Entity and Group Inventory
Sync kv_identity_entity and kv_identity_group to maintain a full list of Vault entities and groups, with policies and membership. Use kv_identity_entity_alias to map auth method aliases (e.g. usernames) to canonical entity IDs for access and audit analysis.
1. Secret Inventory and Organization
Get a comprehensive overview of all secrets stored in your Vault instance, organized by path and type.
Business Value:
- Maintain an inventory of all secrets across your organization
- Identify secrets by their location and path structure
- Track which secrets are folders vs. actual secret data
- Audit secret organization and naming conventions
Sample Result
| path | key | is_folder | type |
|---|---|---|---|
| (root) | airbnb-contato | false | Secret |
| (root) | default | false | Secret |
| Data Ops | airbnb-contato | false | Secret |
| Data Ops | default | false | Secret |
| Data Solutions | DataResourcesPostgreSQL | false | Secret |
| Administrativo Serviços | 360Imprimir | false | Secret |
| Administrativo Serviços | ACATE | false | Secret |
2. Secret Data Analysis
Analyze the actual secret data stored in your Vault, including all key-value pairs and their metadata.
Business Value:
- Understand what data is stored in each secret
- Track secret versions and creation times
- Identify secrets that have been deleted or destroyed
- Monitor custom metadata associated with secrets
Sample Result
| path | version | data | created_time | destroyed | deletion_time | custom_metadata |
|---|---|---|---|---|---|---|
| Data Ops/airbnb-contato | 1 | {"Login": "[email protected]", "Password": "***"} | 2024-09-23T14:27:07Z | false | null | |
| Data Solutions/DataResourcesPostgreSQL | 1 | {"DB_HOST": "db.example.com", "DB_PORT": 5432, "DB_NAME": "mydb"} | 2024-04-12T13:12:10Z | false | null | |
3. Secret Structure Analysis
Analyze the hierarchical structure of keys within secrets to understand their organization and schema.
Business Value:
- Understand the schema and structure of secrets
- Identify common key patterns across secrets
- Track changes in secret structure over time
- Plan migrations or reorganizations based on structure
Sample Result
| path | version | subkeys | created_time |
|---|---|---|---|
| Data Solutions/DataResourcesPostgreSQL | 1 | {"DB_HOST": null, "DB_PORT": null, "DB_NAME": null} | 2024-04-12T13:12:10Z |
| Administrativo Serviços/360Imprimir | 1 | {"Senha": null, "Usuário": null} | 2024-04-12T13:12:10Z |
4. Secret Version Tracking
Track secret versions and their lifecycle to understand when secrets were created, modified, or deleted.
Business Value:
- Monitor secret version history
- Identify recently created or modified secrets
- Track secret lifecycle and changes over time
- Audit secret management practices
Sample Result
| path | latest_version | total_versions | first_created | last_modified | destroyed_versions | deleted_versions |
|---|---|---|---|---|---|---|
| Administrativo Serviços/Amazon | 3 | 3 | 2023-08-22T19:56:51Z | 2024-01-15T10:30:22Z | 0 | 0 |
| Data Ops/airbnb-contato | 1 | 1 | 2024-09-23T14:27:07Z | 2024-09-23T14:27:07Z | 0 | 0 |
| Data Solutions/DataResourcesPostgreSQL | 1 | 1 | 2024-04-12T13:12:10Z | 2024-04-12T13:12:10Z | 0 | 0 |
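As a rough illustration of the aggregation behind columns such as latest_version and total_versions, here is a Python sketch over kv_secrets-shaped records. The column definitions are assumptions inferred from the column names in the sample result (for instance, last_modified is taken to be the newest created_time), and summarize_versions is a hypothetical helper, not part of the connector.

```python
from collections import defaultdict


def summarize_versions(records):
    """Aggregate kv_secrets-shaped records into one summary row per path."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["path"]].append(record)

    summary = []
    for path, versions in sorted(grouped.items()):
        created = [v["metadata"]["created_time"] for v in versions]
        summary.append({
            "path": path,
            "latest_version": max(v["version"] for v in versions),
            "total_versions": len(versions),
            # UTC timestamps in the same ISO 8601 format sort chronologically.
            "first_created": min(created),
            "last_modified": max(created),
            "destroyed_versions": sum(
                1 for v in versions if v["metadata"]["destroyed"]
            ),
            # An empty or missing deletion_time means "not deleted".
            "deleted_versions": sum(
                1 for v in versions if v["metadata"]["deletion_time"]
            ),
        })
    return summary


def _version(path, number, created, destroyed=False, deleted=""):
    """Build a hypothetical kv_secrets record for the example."""
    return {
        "path": path,
        "version": number,
        "metadata": {
            "created_time": created,
            "destroyed": destroyed,
            "deletion_time": deleted,
        },
    }


rows = summarize_versions([
    _version("team/app", 1, "2023-08-22T19:56:51Z", deleted="2023-09-01T00:00:00Z"),
    _version("team/app", 2, "2024-01-15T10:30:22Z"),
    _version("team/db", 1, "2024-04-12T13:12:10Z"),
])
```

The truthiness check on deletion_time treats both an empty string and null as "not deleted", which covers either representation of that field.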
Implementation Notes
Data Quality Considerations
- The connector uses both the Vault Identity API (/identity/entity/name, /identity/entity-alias/id, /identity/group/name) and the KV v2 secrets engine (secret/ mount path).
- Identity: Parent streams call list endpoints; child streams request each item by name or ID. Ensure the token has list and read on the Identity paths you use.
- KV v2: Paths with spaces and special characters are automatically URL-encoded. Secrets that return 404 are gracefully skipped (not treated as errors). Folder records are automatically filtered out from child streams. The connector recursively traverses all folders and subfolders.
API Limits & Performance
- A 1-second delay is applied between requests to avoid rate limiting
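- For large Vault instances with many secrets, extraction may take some time: with the 1-second delay, every request adds at least one second, so runtime grows roughly linearly with the number of folders and secrets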
- The connector handles recursive folder traversal efficiently by tracking visited folders
- Child streams only process individual secrets (folders are automatically skipped)
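The fixed delay between requests can be pictured with the short Python sketch below. It is an illustration of the pacing idea only: throttled is a hypothetical helper, and the sleep function is injectable so the behavior can be checked without actually waiting.

```python
import time


def throttled(calls, delay_seconds=1.0, sleep=time.sleep):
    """Run zero-argument callables in order, pausing between consecutive calls."""
    results = []
    for index, call in enumerate(calls):
        if index > 0:
            sleep(delay_seconds)  # pause before every call except the first
        results.append(call())
    return results


# Record the pauses instead of sleeping, to show the pacing.
pauses = []
values = throttled([lambda: "a", lambda: "b", lambda: "c"], sleep=pauses.append)
```

With three calls there are two pauses, so a run of N requests spends at least N - 1 seconds waiting at the default delay.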
Security Considerations
- Authentication tokens are stored securely and marked as secret fields
- The connector only reads secrets (does not modify or delete them)
- Ensure your authentication token has appropriate read permissions for the KV v2 secrets engine
- Consider using Vault policies to restrict access to only necessary paths