
Configuring XML as a Source
In the Sources tab, click on the “Add source” button located at the top right of your screen. Then, select the XML option from the list of connectors. Click Next and you’ll be prompted to add your access.
1. Add account access
You’ll need to configure your XML data source with the following parameters:
- Site URL: The full URL of the XML file you’d like to extract (e.g., https://www.example.com/file.xml).
- Target tag name: The name of the XML tag that should be mapped to each individual record generated by the stream (e.g., Listing, Comments, Item).
- Sample size (optional): The number of tags to process when generating the schema. A smaller sample speeds up processing of large XML files. However, if the elements in the target XML don’t have a consistent schema, set the sample size to a higher number to avoid generating an incomplete schema. Default is 5000.
- Authentication (optional): If your XML file requires authentication, you can configure one of the following types:
- Basic: Username and password authentication
- API Key: API key authentication (can be sent in header or query parameters)
- Bearer: Bearer token authentication
- Opensan: Custom authentication for Opensan integration
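To make the Basic, API Key, and Bearer options more concrete, here is a minimal sketch of how each one is typically attached to an HTTP request for the XML file. The function name `apply_auth` and the shape of the `auth` dictionary (`type`, `username`, `token`, `name`, `value`, `location`) are hypothetical and only illustrate the standard mechanisms; they are not the connector's configuration format. The custom Opensan scheme is not covered because its details are not documented here.

```python
import base64
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_auth(url, auth):
    """Return (url, headers) with credentials applied.

    `auth` is a hypothetical dict such as {"type": "basic", "username": ..., "password": ...};
    it is an illustration, not the connector's real configuration schema.
    """
    headers = {}
    kind = auth.get("type") if auth else None
    if kind == "basic":
        # Basic: base64("username:password") in the Authorization header
        raw = f"{auth['username']}:{auth['password']}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    elif kind == "bearer":
        # Bearer: the token is sent as-is after the "Bearer" scheme name
        headers["Authorization"] = f"Bearer {auth['token']}"
    elif kind == "api_key":
        if auth.get("location", "header") == "header":
            # API key sent as a request header
            headers[auth["name"]] = auth["value"]
        else:
            # API key appended to the URL's query string
            parts = urlsplit(url)
            query = parse_qsl(parts.query) + [(auth["name"], auth["value"])]
            url = urlunsplit(parts._replace(query=urlencode(query)))
    return url, headers
```

For example, `apply_auth("https://www.example.com/file.xml", {"type": "api_key", "name": "api_key", "value": "abc", "location": "query"})` returns the URL with `?api_key=abc` appended and no extra headers.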
2. Select streams
Choose which data streams you want to sync. The XML connector creates a single stream based on the target tag name you specified. This stream will contain all the data extracted from the XML file, with each record representing one instance of the target tag.
Tip: The stream can be found more easily by typing its name.
Select the streams and click Next.
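The idea that each record is one instance of the target tag can be illustrated with a short streaming parser. This is a sketch using Python's standard `xml.etree.ElementTree.iterparse`, not the connector's actual implementation; the function name `iter_records` and the simple mapping (attributes and direct child text become fields) are assumptions for illustration.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, target_tag):
    """Yield one dict per <target_tag> element, streaming through the file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # On an "end" event the element and all of its children are fully parsed.
        if elem.tag != target_tag:
            continue
        record = dict(elem.attrib)  # attributes -> fields
        for child in elem:
            record[child.tag] = (child.text or "").strip()  # child element text -> field
        yield record
        elem.clear()  # discard the parsed subtree to limit memory use


if __name__ == "__main__":
    xml = b'<Catalog><Item id="1"><Name>Pen</Name></Item><Item id="2"><Name>Ink</Name></Item></Catalog>'
    for record in iter_records(io.BytesIO(xml), "Item"):
        print(record)
```

With the target tag `Item`, this yields two records, `{'id': '1', 'Name': 'Pen'}` and `{'id': '2', 'Name': 'Ink'}`. Note that ElementTree reports namespaced tags as `{namespace-uri}Item`, so a document with namespaces would need the full name.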
3. Configure data streams
Customize how you want your data to appear in your catalog. Select the desired layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will contain the fetched data) and the type of sync.
- Layer: choose among the existing layers in your catalog. This is where you will find your new extracted tables once the extraction runs successfully.
- Folder: a folder can be created inside the selected layer to group all tables being created from this new data source.
- Table name: we suggest a name, but feel free to customize it. You have the option to add a prefix to all tables at once and make this process faster!
- Sync Type: you can choose between INCREMENTAL and FULL_TABLE.
- Incremental: every time the extraction happens, we’ll get only the new data - which is good if, for example, you want to keep every record ever fetched.
- Full table: every time the extraction happens, we’ll get the current state of the data - which is good if, for example, you don’t want to have deleted data in your catalog.
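The difference between the two sync types can be sketched as two ways of combining the existing table with a fresh extraction. How the connector actually decides which XML records are "new" is not described here; this sketch assumes a hypothetical unique `key` field on each record, and the function names are illustrative only.

```python
def full_table_sync(extracted):
    """FULL_TABLE: the table becomes exactly the current state of the source."""
    return list(extracted)


def incremental_sync(table, extracted, key):
    """INCREMENTAL: keep every existing row and append rows with unseen keys.

    `key` is a hypothetical unique identifier field; it is an assumption for this sketch.
    """
    seen = {row[key] for row in table}
    result = list(table)
    for row in extracted:
        if row[key] not in seen:
            seen.add(row[key])
            result.append(row)
    return result
```

If the table holds records `1` and `2` and the source now contains `2` and `3`, the full-table sync leaves only `2` and `3` (record `1`, deleted at the source, disappears), while the incremental sync keeps `1`, `2`, and `3`.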
4. Configure data source
Describe your data source for easy identification within your organization, in 140 characters or fewer. To define your Trigger, consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times).
Optionally, you can define some additional settings:
- Configure Delta Log Retention to determine how long we should store old states of this table as it gets updated. Read more about this resource here.
- Determine when to execute an Additional Full Sync. This will complement the incremental data extractions, ensuring that your data is completely synchronized with your source every once in a while.
5. Check your new source
You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking on the arrow button. Once executed, your data will appear in your Catalog.
Streams and Fields
The XML connector creates a single data stream based on the target tag name you specified during configuration. The stream structure depends on the XML schema of your target elements.
Dynamic Stream (Based on Target Tag)
The XML connector analyzes your XML structure and creates a stream with fields corresponding to the elements and attributes within your target tag. The exact fields will vary based on your XML schema.
Key Characteristics:
- Stream name matches your configured target tag name
- Each record represents one instance of the target XML element
- Field names correspond to child elements and attributes
- Field types are automatically inferred during schema generation
- Nested elements are flattened or represented as JSON objects
- Sample size configuration determines how many elements are analyzed for schema generation
Field mapping:
- Attributes become fields with their attribute names
- Child elements become fields with their tag names
- Text content of elements becomes field values
- Nested structures may be represented as JSON strings or flattened fields
For example, if your target tag is Product and each Product element has id and category attributes plus Name, Price (with a currency attribute) and Description child elements, the resulting stream might have fields like:
- id (String) - Product ID from attribute
- category (String) - Category from attribute
- Name (String) - Product name
- Price (Number) - Product price
- Price_currency (String) - Currency attribute
- Description (String) - Product description
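The Product example can be reproduced with a small flattening function that applies the mapping rules in this section: target attributes become fields, child elements become fields holding their text, and a child's attribute becomes a `<ChildTag>_<attribute>` field (as with `Price_currency`). This is a hypothetical sketch, not the connector's code; it keeps every value as text (type inference is a separate step) and ignores elements nested deeper than one level.

```python
import xml.etree.ElementTree as ET


def flatten_element(elem):
    """Map one target element to a flat record (illustrative rules, see lead-in)."""
    record = dict(elem.attrib)  # target attributes -> fields (e.g. id, category)
    for child in elem:
        record[child.tag] = (child.text or "").strip()  # child text -> field (e.g. Name)
        for attr, value in child.attrib.items():
            record[f"{child.tag}_{attr}"] = value  # child attribute -> e.g. Price_currency
    return record


if __name__ == "__main__":
    # Hypothetical sample element; the values are made up for illustration.
    product = ET.fromstring(
        '<Product id="42" category="books">'
        "<Name>Guide</Name><Price currency=\"USD\">19.90</Price>"
        "<Description>Handbook</Description></Product>"
    )
    print(flatten_element(product))
```

For the sample element, the record contains the fields `id`, `category`, `Name`, `Price`, `Price_currency`, and `Description`, matching the field list above.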
Implementation Notes
Data Processing
- The connector downloads and caches XML files locally for processing efficiency
- Large XML files are processed in chunks to manage memory usage
- Schema generation samples the specified number of target elements (default: 5000)
- Field types are automatically inferred from the sample data
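To show why the sample size matters, here is a minimal sketch of sample-based schema inference: it reads at most `sample_size` target elements and classifies each field as Number or String. The two-type rule, the widening of mixed fields to String, and the function names are simplifying assumptions; the connector's real inference rules are not documented here.

```python
import io
import xml.etree.ElementTree as ET
from itertools import islice


def infer_type(value):
    """Classify a text value as "Number" or "String" (a simplified rule)."""
    try:
        float(value)
    except ValueError:
        return "String"
    return "Number"


def iter_targets(source, target_tag):
    """Yield target elements in document order without loading the whole file."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == target_tag:
            yield elem


def infer_schema(source, target_tag, sample_size=5000):
    """Infer field types from at most `sample_size` target elements."""
    schema = {}
    for elem in islice(iter_targets(source, target_tag), sample_size):
        fields = dict(elem.attrib)
        for child in elem:
            fields[child.tag] = (child.text or "").strip()
        for name, value in fields.items():
            observed = infer_type(value)
            # A field seen as Number in one sample and String in another is widened to String.
            schema[name] = observed if schema.get(name, observed) == observed else "String"
    return schema


if __name__ == "__main__":
    xml = b"<Items><Item><Price>10</Price></Item><Item><Price>n/a</Price><Note>late</Note></Item></Items>"
    print(infer_schema(io.BytesIO(xml), "Item", sample_size=1))
    print(infer_schema(io.BytesIO(xml), "Item", sample_size=2))
```

With `sample_size=1`, the schema is `{'Price': 'Number'}`: the `Note` field is missed and `Price` gets a type that the second element contradicts. With `sample_size=2`, both fields are found and `Price` is widened to String. This is the incomplete-schema risk described for inconsistent XML.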
Network & Reliability
- To maximize compatibility and get past restrictive firewalls or anti-bot protections, the connector mimics standard browser access by sending a recognized User-Agent header.
- The connector includes automatic fallback mechanisms that use TLS impersonation to retry and maintain connections when the server drops them.
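As a rough illustration of the browser-like User-Agent, the sketch below builds requests with Python's `urllib.request` and wraps the download in a plain retry loop. The User-Agent string is a hypothetical example, and the retry loop is a simpler substitute for the connector's TLS impersonation fallback, which the standard library does not provide.

```python
import time
import urllib.request

# Hypothetical browser-style User-Agent; the connector's actual header value is not documented.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)


def build_request(url, headers=None):
    """Create a request that identifies itself with a browser-like User-Agent."""
    merged = {"User-Agent": BROWSER_USER_AGENT, **(headers or {})}
    return urllib.request.Request(url, headers=merged)


def fetch_with_retry(url, headers=None, attempts=3, backoff_seconds=1.0):
    """Download `url`, retrying with linear backoff on network errors.

    This is a plain retry loop, not TLS impersonation.
    """
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(build_request(url, headers), timeout=30) as resp:
                return resp.read()
        except OSError:  # URLError, HTTPError, and ConnectionResetError all subclass OSError
            if attempt == attempts:
                raise
            time.sleep(backoff_seconds * attempt)
```

`urllib.request.Request` stores header names via `str.capitalize()`, so the header is read back with `build_request(url).get_header("User-agent")`.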
Performance Considerations
- For large XML files, consider adjusting the sample size based on schema consistency
- The connector caches downloaded files to improve processing speed on subsequent schema analysis
- Processing time depends on XML file size and complexity of the target elements
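The local caching mentioned above can be sketched as a download keyed by a hash of the URL, so that repeated schema analysis reads the file from disk instead of downloading it again. The cache layout, the SHA-256 naming, and the default cache directory are assumptions for illustration, not the connector's actual behavior.

```python
import hashlib
import shutil
import tempfile
import urllib.request
from pathlib import Path


def cache_path_for(url, cache_dir):
    """Map a URL to a stable file name inside `cache_dir`."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / f"{digest}.xml"


def cached_download(url, cache_dir=None):
    """Return a local path for `url`, downloading only when it is not cached yet."""
    cache_dir = cache_dir or Path(tempfile.gettempdir()) / "xml_cache"  # hypothetical default
    path = cache_path_for(url, cache_dir)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(".part")
        # Stream to a temporary file first so a failed download never leaves a partial cache entry.
        with urllib.request.urlopen(url, timeout=30) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
        tmp.replace(path)
    return path
```

Writing to a `.part` file and renaming it afterwards means an interrupted download is retried on the next call instead of being mistaken for a complete cached copy.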
Authentication Support
The connector supports various authentication methods for accessing protected XML files, including Basic Auth, API Key authentication, Bearer tokens, and custom authentication schemes.