XML (eXtensible Markup Language) is a markup language designed to store and transport data in a structured format. It uses tags to define elements and attributes, making it self-descriptive and widely used for data exchange between different systems and applications.

Configuring XML as a Source

In the Sources tab, click the “Add source” button at the top right of your screen. Then, select the XML option from the list of connectors. Click Next and you’ll be prompted to add your access details.

1. Add account access

You’ll need to configure your XML data source with the following parameters:

Site URL: The full URL for the XML file you’d like to extract (e.g., https://www.example.com/file.xml).
Target tag name: The name of the XML tag that should be mapped to each individual record generated by the stream (e.g., Listing, Comments, Item).
Sample size (optional): The number of target tags to process when generating the schema. A smaller sample speeds up schema generation for large XML files. However, if the elements in the target XML don’t share a consistent schema, set the sample size to a higher number to avoid generating an incomplete schema. Default is 5000.
Authentication (optional): If your XML file requires authentication, you can configure one of the following types:
- Basic: Username and password authentication
- API Key: API key authentication (can be sent in header or query parameters)
- Bearer: Bearer token authentication
- Opensan: Custom authentication for Opensan integration
Append nocache parameter (optional): (Default: false) When enabled, appends a _nocache query parameter with the current timestamp to the URL. This is useful for bypassing server-side caching.
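As an illustration of the nocache option, the sketch below appends a _nocache query parameter to a URL using Python's standard library. It is not the connector's implementation: the function name append_nocache and the use of an integer Unix timestamp in seconds are assumptions, since the exact timestamp format is not documented here.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_nocache(url, now=None):
    """Return url with a _nocache=<timestamp> query parameter appended.

    Hypothetical helper: the timestamp format (whole seconds) is an assumption.
    """
    timestamp = int(time.time() if now is None else now)
    parts = urlsplit(url)
    # Keep any query parameters that are already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_nocache", str(timestamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# A fixed timestamp is passed here only to make the output reproducible.
print(append_nocache("https://www.example.com/file.xml", now=1700000000))
```

Because the parameter value changes on every request, a cache keyed on the full URL treats each request as new.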

Once you’re done, click Next.

2. Select streams

Choose which data streams you want to sync. The XML connector creates a single stream based on the target tag name you specified. This stream will contain all the data extracted from the XML file, with each record representing one instance of the target tag.

Tip: The stream can be found more easily by typing its name.

Select the streams and click Next.

3. Configure data streams

Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will effectively contain the fetched data) and the type of sync.

Layer: choose one of the existing layers in your catalog. This is where you will find your newly extracted tables once the extraction runs successfully.
Folder: a folder can be created inside the selected layer to group all tables being created from this new data source.
Table name: we suggest a name, but feel free to customize it. You have the option to add a prefix to all tables at once and make this process faster!
Sync Type: you can choose between INCREMENTAL and FULL_TABLE.
- Incremental: every time the extraction happens, we’ll get only the new data - which is good if, for example, you want to keep every record ever fetched.
- Full table: every time the extraction happens, we’ll get the current state of the data - which is good if, for example, you don’t want to have deleted data in your catalog.

Once you are done configuring, click Next.

4. Configure data source

Describe your data source for easy identification within your organization, not exceeding 140 characters. To define your Trigger, consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times). Optionally, you can define some additional settings:

Configure Delta Log Retention and determine for how long we should store old states of this table as it gets updated. Read more about this resource here.
Determine when to execute an Additional Full Sync. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.

Once you are ready, click Next to finalize the setup.

5. Check your new source

You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.

Your data appears in the Catalog only after at least one successful source run.

Streams and Fields

The XML connector creates a single data stream based on the target tag name you specified during configuration. The stream structure depends on the XML schema of your target elements.

Dynamic Stream (Based on Target Tag)

The XML connector analyzes your XML structure and creates a stream with fields corresponding to the elements and attributes within your target tag. The exact fields will vary based on your XML schema.

Key Characteristics:

Stream name matches your configured target tag name
Each record represents one instance of the target XML element
Field names correspond to child elements and attributes
Field types are automatically inferred during schema generation
Nested elements are flattened or represented as JSON objects
Sample size configuration determines how many elements are analyzed for schema generation

Common Field Patterns:

Attributes become fields with their attribute names
Child elements become fields with their tag names
Text content of elements becomes field values
Nested structures may be represented as JSON strings or flattened fields

Example: If your target tag is Product and your XML contains:

<Product id="123" category="electronics">
  <Name>Laptop</Name>
  <Price currency="USD">999.99</Price>
  <Description>High-performance laptop</Description>
</Product>

The resulting stream might have fields like:

id (String) - Product ID from attribute
category (String) - Category from attribute
Name (String) - Product name
Price (Number) - Product price
Price_currency (String) - Currency attribute
Description (String) - Product description
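The mapping in this example can be sketched with Python's standard library. This is a minimal illustration, not the connector's actual code: the function name flatten is hypothetical, and the rule for naming a child element's attribute (child tag, underscore, attribute name) is an assumption inferred from the Price_currency field above. Values stay as strings, since type inference is a separate step, and only one level of nesting is handled.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<Product id="123" category="electronics">
  <Name>Laptop</Name>
  <Price currency="USD">999.99</Price>
  <Description>High-performance laptop</Description>
</Product>"""


def flatten(element):
    """Turn one target element into a flat record (hypothetical helper)."""
    # Attributes of the target tag become fields named after the attribute.
    record = dict(element.attrib)
    for child in element:
        # The text content of each child element becomes the field value.
        record[child.tag] = (child.text or "").strip()
        # Assumed naming rule for child attributes: <tag>_<attribute>.
        for attr, value in child.attrib.items():
            record[f"{child.tag}_{attr}"] = value
    return record


record = flatten(ET.fromstring(SAMPLE))
print(record)
```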

Implementation Notes

Data Processing

The connector downloads and caches XML files locally for processing efficiency
Large XML files are processed in chunks to manage memory usage
Schema generation samples the specified number of target elements (default: 5000)
Field types are automatically inferred from the sample data
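The points above can be sketched as follows: stream the file element by element instead of loading it whole, and infer field types from only the first sample_size target elements. This assumes a simple two-type model (Number or String) and hypothetical function names; the connector's real chunking and inference rules are not documented here.

```python
import io
import itertools
import xml.etree.ElementTree as ET


def iter_target(source, tag):
    """Yield each <tag> element from a file-like object as it is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            elem.clear()  # drop the element's children once the caller has used it


def is_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_schema(source, tag, sample_size=5000):
    """Infer a field -> type mapping from the first sample_size target elements."""
    seen = {}
    for elem in itertools.islice(iter_target(source, tag), sample_size):
        for child in elem:
            seen.setdefault(child.tag, []).append((child.text or "").strip())
    return {
        name: "Number" if all(is_number(v) for v in values) else "String"
        for name, values in seen.items()
    }


DATA = b"""<Items>
  <Item><Name>A</Name><Price>1.5</Price></Item>
  <Item><Name>B</Name><Price>2</Price></Item>
  <Item><Name>C</Name><Price>n/a</Price></Item>
</Items>"""

print(infer_schema(io.BytesIO(DATA), "Item", sample_size=2))
print(infer_schema(io.BytesIO(DATA), "Item", sample_size=3))
```

With a sample of 2, Price looks numeric; with a sample of 3, the third element shows that it is not. This is the kind of inconsistency a larger sample size is meant to catch.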

Network & Reliability

To maximize compatibility and bypass restrictive firewalls or anti-bot protections, the connector mimics standard browser access by including a recognized User-Agent.
If the server drops a connection, the connector automatically retries using TLS impersonation as a fallback.
To bypass server-side caching, the connector can optionally append a dynamic _nocache timestamp parameter to the URL.
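The general shape of these behaviors can be sketched as below: a request that carries a browser-like User-Agent, and a loop that tries fetch strategies in order until one succeeds. Everything here is illustrative. The User-Agent string, the function names, and the stand-in fetchers are assumptions, and the second strategy only stands in for TLS impersonation, which the standard library does not provide.

```python
import urllib.request

# Hypothetical User-Agent value; the connector's actual string is not documented.
USER_AGENT = "Mozilla/5.0 (compatible; example-xml-client)"


def build_request(url):
    """Build a request that identifies itself with a browser-like User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_with_fallback(url, fetchers):
    """Try each fetcher in order and return the first successful result."""
    last_error = None
    for fetch in fetchers:
        try:
            return fetch(url)
        except OSError as exc:  # URLError and ConnectionError are OSError subclasses
            last_error = exc
    raise RuntimeError(f"all fetchers failed for {url}") from last_error


# Stand-in fetchers so the example runs without network access.
def flaky(url):
    raise ConnectionResetError("dropped by server")


def stable(url):
    return b"<ok/>"


print(fetch_with_fallback("https://www.example.com/file.xml", [flaky, stable]))
```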

Performance Considerations

For large XML files, consider adjusting the sample size based on schema consistency
The connector caches downloaded files to improve processing speed on subsequent schema analysis
Processing time depends on XML file size and complexity of the target elements

Authentication Support

The connector supports various authentication methods for accessing protected XML files, including Basic Auth, API Key authentication, Bearer tokens, and custom authentication schemes.
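For reference, the standard HTTP forms of the Basic, Bearer, and API Key methods can be sketched as below. The configuration keys (type, username, token, name, location, and so on) are hypothetical and do not reflect the connector's internal settings; the API key parameter name in particular depends on the server. The Opensan method is omitted because its scheme is not described here.

```python
import base64


def auth_parts(auth):
    """Return (headers, query_params) for a hypothetical auth config dict."""
    kind = auth["type"]
    if kind == "basic":
        # Basic auth: base64 of "username:password" in the Authorization header.
        raw = f"{auth['username']}:{auth['password']}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        return {"Authorization": f"Basic {token}"}, {}
    if kind == "bearer":
        return {"Authorization": f"Bearer {auth['token']}"}, {}
    if kind == "api_key":
        # The key can travel in a header or as a query parameter.
        pair = {auth["name"]: auth["key"]}
        if auth.get("location", "header") == "header":
            return pair, {}
        return {}, pair
    raise ValueError(f"unsupported auth type: {kind}")


print(auth_parts({"type": "basic", "username": "user", "password": "pass"}))
print(auth_parts({"type": "api_key", "name": "api_key", "key": "abc", "location": "query"}))
```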

Skills for agents

Download XML skills file

XML connector documentation as plain markdown, for use in AI agent contexts.
