
# XML as a data source

> Bring data from XML to Nekt.

XML (eXtensible Markup Language) is a markup language designed to store and transport data in a structured format. It uses tags to define elements and attributes, making it self-descriptive and widely used for data exchange between different systems and applications.

<img height="50" src="https://mintcdn.com/nekt/43FsQ37QF_gxIqKI/assets/logo/logo--xml.png?fit=max&auto=format&n=43FsQ37QF_gxIqKI&q=85&s=711c0478212a8b6826d65cbdaca10155" data-path="assets/logo/logo--xml.png" />

## Configuring XML as a Source

In the [Sources](https://app.nekt.ai/sources) tab, click the **Add source** button in the top right of your screen. Then select the XML option from the list of connectors.

Click **Next** and you'll be prompted to add your access.

### 1. Add account access

You'll need to configure your XML data source with the following parameters:

* **Site URL**: The full URL for the XML file you'd like to extract (e.g., `https://www.example.com/file.xml`).

* **Target tag name**: The name of the XML tag that should be mapped to each individual record generated by the stream (e.g., `Listing`, `Comments`, `Item`).

* **Sample size (optional)**: The number of target tags to sample when generating the schema. A smaller sample speeds up processing of large XML files. If the elements in the target XML don't share a consistent schema, set a higher sample size to avoid generating an incomplete schema. Default: `5000`.

* **Authentication (optional)**: If your XML file requires authentication, you can configure one of the following types:
  * **Basic**: Username and password authentication
  * **API Key**: API key authentication (can be sent in header or query parameters)
  * **Bearer**: Bearer token authentication
  * **Opensan**: Custom authentication for Opensan integration

* **Append nocache parameter (optional)**: (Default: `false`) When enabled, appends a `_nocache` query parameter with the current timestamp to the URL. This is useful for bypassing server-side caching.
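
To illustrate the nocache option, here is a minimal Python sketch of appending a `_nocache` timestamp to a URL. It is not the connector's actual code: the helper name `append_nocache` is hypothetical, and using whole seconds for the timestamp is an assumption.

```python
import time
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def append_nocache(url: str, now: Optional[float] = None) -> str:
    """Return `url` with a `_nocache=<timestamp>` query parameter appended.

    Hypothetical helper: the timestamp unit (seconds) is an assumption.
    """
    timestamp = int(now if now is not None else time.time())
    parts = urlparse(url)
    # Keep any query parameters that are already present on the URL.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_nocache", str(timestamp)))
    return urlunparse(parts._replace(query=urlencode(query)))


print(append_nocache("https://www.example.com/file.xml"))
```

Because the value changes on every run, each request uses a distinct URL, so a cache keyed on the full URL is unlikely to return a stale copy.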

Once you're done, click **Next**.

### 2. Select streams

Choose which data streams you want to sync. The XML connector creates a single stream based on the target tag name you specified. This stream will contain all the data extracted from the XML file, with each record representing one instance of the target tag.

> Tip: The stream can be found more easily by typing its name.

Select the stream and click **Next**.

### 3. Configure data streams

Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will effectively contain the fetched data) and the type of sync.

* **Layer**: choose one of the existing layers in your catalog. This is where you will find your newly extracted tables once the extraction runs successfully.
* **Folder**: a folder can be created inside the selected layer to group all tables being created from this new data source.
* **Table name**: we suggest a name, but feel free to customize it. You have the option to add a **prefix** to all tables at once and make this process faster!
* **Sync Type**: you can choose between INCREMENTAL and FULL\_TABLE.
  * Incremental: every time the extraction happens, we'll get only the new data. This is a good fit if, for example, you want to keep every record ever fetched.
  * Full table: every time the extraction happens, we'll get the current state of the data. This is a good fit if, for example, you don't want deleted data to remain in your catalog.

Once you are done configuring, click **Next**.

### 4. Configure data source

Describe your data source for easy identification within your organization. The description must not exceed 140 characters.

To define your [Trigger](https://docs.nekt.com/get-started/core-concepts/triggers), consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times).

Optionally, you can define some additional settings:

* Configure Delta Log Retention to determine how long we should store old states of this table as it gets updated. Read more about this resource [here](https://docs.nekt.com/get-started/core-concepts/resource-control).
* Determine when to execute an **Additional [Full Sync](https://docs.nekt.com/get-started/core-concepts/types-of-sync#additional-full-sync)**. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.

Once you are ready, click **Next** to finalize the setup.

### 5. Check your new source

You can view your new source on the [Sources](https://app.nekt.ai/sources) page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.

<Warning>Your data appears in your [Catalog](https://app.nekt.ai/catalog) only after at least one successful source run.</Warning>

# Streams and Fields

The XML connector creates a single data stream based on the target tag name you specified during configuration. The stream structure depends on the XML schema of your target elements.

<AccordionGroup>
  <Accordion title="Dynamic Stream (Based on Target Tag)">
    The XML connector analyzes your XML structure and creates a stream with fields corresponding to the elements and attributes within your target tag. The exact fields will vary based on your XML schema.

    **Key Characteristics:**

    * Stream name matches your configured target tag name
    * Each record represents one instance of the target XML element
    * Field names correspond to child elements and attributes
    * Field types are automatically inferred during schema generation
    * Nested elements are either flattened into separate fields or serialized as JSON
    * Sample size configuration determines how many elements are analyzed for schema generation

    **Common Field Patterns:**

    * Attributes become fields with their attribute names
    * Child elements become fields with their tag names
    * Text content of elements becomes field values
    * Nested structures may be represented as JSON strings or flattened fields

    **Example:** If your target tag is `Product` and your XML contains:

    ```xml theme={null}
    <Product id="123" category="electronics">
      <Name>Laptop</Name>
      <Price currency="USD">999.99</Price>
      <Description>High-performance laptop</Description>
    </Product>
    ```

    The resulting stream might have fields like:

    * `id` (String) - Product ID from attribute
    * `category` (String) - Category from attribute
    * `Name` (String) - Product name
    * `Price` (Number) - Product price
    * `Price_currency` (String) - Currency attribute
    * `Description` (String) - Product description
  </Accordion>
</AccordionGroup>
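
The mapping in the `Product` example can be sketched in Python with the standard library. This is a simplified illustration under stated assumptions, not the connector's implementation: the `flatten` helper and the wrapping `Catalog` root element are hypothetical, the `<child>_<attribute>` naming simply mirrors the `Price_currency` field shown in the example, and all values are kept as strings (type inference is not modeled here).

```python
import xml.etree.ElementTree as ET

XML_TEXT = """
<Catalog>
  <Product id="123" category="electronics">
    <Name>Laptop</Name>
    <Price currency="USD">999.99</Price>
    <Description>High-performance laptop</Description>
  </Product>
</Catalog>
"""


def flatten(element: ET.Element) -> dict:
    """Turn one target element into a flat record (hypothetical helper)."""
    # Attributes of the target tag become fields named after the attribute.
    record = dict(element.attrib)
    for child in element:
        # Text content of each child element becomes the field value.
        record[child.tag] = (child.text or "").strip()
        # Attributes of a child are stored as <child>_<attribute>.
        for name, value in child.attrib.items():
            record[f"{child.tag}_{name}"] = value
    return record


root = ET.fromstring(XML_TEXT)
# One record per instance of the target tag.
records = [flatten(product) for product in root.iter("Product")]
print(records[0])
```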

# Implementation Notes

### Data Processing

* The connector downloads and caches XML files locally for processing efficiency
* Large XML files are processed in chunks to manage memory usage
* Schema generation samples the specified number of target elements (default: 5000)
* Field types are automatically inferred from the sample data
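
The interaction between streaming, sampling, and type inference can be sketched as follows. This is an illustrative model only: the function names `iter_target`, `infer_type`, and `infer_schema` are hypothetical, the two-type system (`Number`/`String`) is a simplification, and the connector's real inference rules are not documented here.

```python
import io
import xml.etree.ElementTree as ET
from itertools import islice


def iter_target(source, tag):
    """Stream elements named `tag`, clearing each one after use to limit memory."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            elem.clear()


def infer_type(value: str) -> str:
    """Very rough inference: anything float() accepts is a Number."""
    try:
        float(value)
        return "Number"
    except ValueError:
        return "String"


def infer_schema(source, tag, sample_size=5000):
    """Infer field types from at most `sample_size` target elements."""
    schema = {}
    for elem in islice(iter_target(source, tag), sample_size):
        for child in elem:
            seen = infer_type((child.text or "").strip())
            # If the sample disagrees about a field, widen it to String.
            if schema.get(child.tag, seen) != seen:
                seen = "String"
            schema[child.tag] = seen
    return schema


DOC = (
    b"<Items>"
    b"<Item><Name>A</Name><Price>1.5</Price></Item>"
    b"<Item><Name>B</Name><Price>n/a</Price><Stock>3</Stock></Item>"
    b"</Items>"
)

small = infer_schema(io.BytesIO(DOC), "Item", sample_size=1)
full = infer_schema(io.BytesIO(DOC), "Item", sample_size=2)
print(small)
print(full)
```

With a sample size of 1, the sketch never sees the `Stock` field and treats `Price` as purely numeric. Sampling both elements produces a more complete schema, which is why a larger sample helps when elements are inconsistent.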

### Network & Reliability

* To maximize compatibility and bypass restrictive firewalls or anti-bot protections, the connector mimics standard browser access by including a recognized `User-Agent`.
* If the server drops a connection, the connector automatically retries using a TLS impersonation fallback.
* To bypass server-side caching, the connector supports optionally appending a dynamic `_nocache` timestamp parameter to the URL.

### Performance Considerations

* For large XML files, consider adjusting the sample size based on schema consistency
* The connector caches downloaded files to speed up subsequent schema analysis
* Processing time depends on XML file size and complexity of the target elements

### Authentication Support

The connector supports various authentication methods for accessing protected XML files, including Basic Auth, API Key authentication, Bearer tokens, and custom authentication schemes.
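
For reference, the sketch below shows how the three standard authentication types are conventionally expressed in an HTTP request. It is a generic illustration, not the connector's configuration format: the function `auth_parts` and its keyword names (`username`, `password`, `token`, `name`, `value`, `location`) are hypothetical, and the Opensan scheme is omitted because its details are not described here.

```python
import base64


def auth_parts(kind: str, **cfg):
    """Return (headers, query_params) for one auth type (hypothetical helper)."""
    headers, params = {}, {}
    if kind == "basic":
        # Basic auth: base64("username:password") in the Authorization header.
        raw = f"{cfg['username']}:{cfg['password']}".encode()
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode()
    elif kind == "bearer":
        headers["Authorization"] = f"Bearer {cfg['token']}"
    elif kind == "api_key":
        # An API key can travel in a header (default here) or in the query string.
        target = params if cfg.get("location") == "query" else headers
        target[cfg["name"]] = cfg["value"]
    else:
        raise ValueError(f"unsupported auth type: {kind}")
    return headers, params


print(auth_parts("basic", username="user", password="pass"))
print(auth_parts("api_key", name="api_key", value="secret", location="query"))
```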

## Skills for agents

<Snippet file="agent-skills-intro.mdx" />

<Card title="Download XML skills file" icon="wand-magic-sparkles" href="/sources/xml.md">
  XML connector documentation as plain markdown, for use in AI agent contexts.
</Card>
