1. Introduction

The REST API connector allows you to integrate with any REST API service by configuring the necessary parameters such as base URL, authentication, pagination, and stream mapping. This connector is highly flexible and can be customized to work with most REST APIs that follow standard conventions.

With this connector, you can:

  • Connect to any REST API endpoint
  • Configure different authentication methods (Basic Auth, Bearer Token, API Key)
  • Set up pagination to handle large datasets
  • Define custom headers and query parameters
  • Map API responses to your desired schema

This guide will walk you through the process of setting up and configuring your REST API connector to start ingesting data from your chosen API service.

2. Add your REST API configuration

  1. In the Sources tab, click on the “Add source” button located at the top right of your screen. Then, select the REST API option from the list of connectors.

  2. Click Next and you’ll be prompted to add the connector configuration. There are two distinct types of configuration for this source: top-level connector configuration and stream-level configuration.

Top-level connector configuration

The top-level connector configuration defines parameters such as the base URL, authentication, pagination, headers, and query params. Some of these may differ from stream to stream; in that case, you can override them in the stream configuration if needed.

This is the list of top-level configurations available:

  • Base API URL: The base URL of the REST API endpoint. This is the root URL that all endpoints will be appended to. Example: https://api.example.com/v1.

  • Authentication type: The type of authentication required by the API. You should choose the method that matches your API’s authentication requirements. Depending on the auth mode selected, additional fields will become available to capture the required information.

    • None: No authentication required.
    • Basic Auth: Username/password authentication.
    • Bearer Token: Token-based authentication.
    • API Key: API key passed in a request header.

    For this method, you can pass a list of key-value pairs for API key header authentication: the key is the header name and the value is the header value.

  • Pagination mode: The type of pagination required by the API. Choose the method that matches your API’s pagination style.

    • Cursor Paginator: Uses a cursor token to navigate through pages.
    • Offset Paginator: Uses offset/limit parameters.
    • Page Number Paginator: Uses page numbers.
    • None: No pagination required.

  • Next page token path: For cursor pagination, the JSONPath expression indicating where to find the cursor in the response payload.

  • Pagination page size: Number of records to fetch per page. This helps control the size of each request.

  • Pagination limit per page API param: The name of the API parameter that specifies the number of records to fetch per page. For example: limit.

  • Pagination next page/offset API param: The name of the API parameter that specifies the next page/offset. For example: page.

  • Pagination initial page/offset: The initial offset to start pagination from. Usually it’s 0 for offset based pagination and 1 for page number based pagination, but it can vary from API to API.

  • Query params: A list of key-value pairs representing the query params for a GET method. Pagination parameters don’t need to be provided as they are automatically added by the connector.

  • Headers: A list of key-value pairs representing headers to pass into the API calls. Stream-level headers are merged with top-level headers, with stream-level values overwriting top-level values that share a key.
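To make the pagination settings above concrete, here is a minimal Python sketch of how an offset-based paginator could combine the configured query params with the pagination params. The field names ("limit", "offset") and the config structure are illustrative assumptions, not the connector's actual internals; use whatever parameter names your API expects.

```python
# Hypothetical sketch: building the query params for one page of an
# offset-paginated GET request from a top-level configuration.

def build_page_params(config: dict, offset: int) -> dict:
    """Merge the configured query params with the pagination params."""
    params = dict(config.get("query_params", {}))
    params[config["limit_param"]] = config["page_size"]
    params[config["offset_param"]] = offset
    return params

config = {
    "base_url": "https://api.example.com/v1",
    "page_size": 100,           # "Pagination page size"
    "limit_param": "limit",     # "Pagination limit per page API param"
    "offset_param": "offset",   # "Pagination next page/offset API param"
    "initial_offset": 0,        # "Pagination initial page/offset"
    "query_params": {"status": "active"},
}

# The first request starts at the configured initial offset; each
# subsequent request advances the offset by the page size.
print(build_page_params(config, config["initial_offset"]))
# {'status': 'active', 'limit': 100, 'offset': 0}
print(build_page_params(config, config["initial_offset"] + config["page_size"]))
# {'status': 'active', 'limit': 100, 'offset': 100}
```

Note how you never supply `limit` or `offset` in Query params yourself: the connector adds them on each request, which is why the documentation says pagination parameters don't need to be provided.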

Stream-level configuration

Stream configuration is specific to each data stream. Imagine you have multiple endpoints you want to fetch data from (e.g., in a Trello connector, you could have users, projects, boards, and tasks). Each of these entities corresponds to a data stream with its own configuration.

This is the list of stream configurations available:

  • Data stream name: The name of the data stream. It shouldn’t contain spaces or special characters. Example: users

  • Endpoint path: The relative endpoint path to the base URL. Example: /users

  • Primary keys: The primary keys of the data stream. They should correspond to existing field names in the payload. For composite primary keys, add more than one item. Make sure each primary key name matches the field name in the payload exactly (it’s case-sensitive).

  • Records JSON Path: A JSONPath string representing the path in the request response that contains the records to process. Default is $[*] for responses that return an array of items directly.

  • Number of records to infer the schema: The number of records used to infer the schema. A higher number yields a more accurate schema but makes the whole run take longer. Default is 100.

  • Replication key: The field to use for incremental replication. It must be either a date or a number. You need to choose a field that is updated whenever a record is created or updated to ensure consistency. Example: updated_at

    • Query param replication field: The name of the query parameter in the source API used to filter records by the replication key value. This field exists because in some cases the query parameter name differs from the replication key field name. Example: last-updated
    • Source search query: The query to use for incremental replication. If the API requires more complex queries, use an expression with $last_run_date as the variable representing the last time the execution happened; it will be replaced at runtime. Example: > 2021-01-01
    • Start date: The date to start the extraction from. It must be a date in the format DD-MM-YYYY.

  • Request params: A list of key-value pairs providing the params for a GET method. Pagination parameters don’t need to be provided as they are automatically added by the connector. Stream-level params are merged with top-level params, with stream-level values overwriting top-level values that share a key.

  • Request headers: A list of key-value pairs representing the headers to pass into the API calls. Stream-level headers are merged with top-level headers, with stream-level values overwriting top-level values that share a key.
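Three of the stream-level behaviors above can be sketched in a few lines of Python. Everything here is an illustrative assumption about how such a connector typically behaves, not its actual implementation: the merge rule, a response shaped like `{"data": [...]}`, and the `$last_run_date` substitution.

```python
# 1) Header/param merging: stream-level entries overwrite top-level
# entries that share a key.
def merge(top_level: dict, stream_level: dict) -> dict:
    """Stream-level values win on key conflicts."""
    return {**top_level, **stream_level}

top_headers = {"Accept": "application/json", "X-Env": "prod"}
stream_headers = {"X-Env": "staging", "X-Stream": "users"}
print(merge(top_headers, stream_headers))
# {'Accept': 'application/json', 'X-Env': 'staging', 'X-Stream': 'users'}

# 2) Records JSON Path: for a response like the one below, a path of
# "$.data[*]" selects the record list; the default "$[*]" assumes the
# response body is the array itself.
response = {"data": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]}
records = response["data"]  # what "$.data[*]" would select

# 3) Source search query: $last_run_date is substituted at runtime with
# the timestamp of the last successful execution.
def resolve_search_query(template: str, last_run_date: str) -> str:
    return template.replace("$last_run_date", last_run_date)

print(resolve_search_query("updated_at > $last_run_date", "2021-01-01"))
# updated_at > 2021-01-01
```

The merge rule is why you can set shared headers (for example, `Accept`) once at the top level and override only the ones that differ per stream.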

Once this is done, you’re good to move on to the next step.

  3. Click Next.

3. Select your REST API streams

  1. In this step, you will be able to see the discovered streams catalog and the available fields for each stream. You have the option to filter both streams and fields during this step.

  2. Select the desired streams and fields - by default, all are selected.

  3. Click Next.

4. Configure your REST API data streams

  1. Customize how you want your data to appear in your catalog. Select the layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will contain the fetched data), and the type of sync.

    • Layer: choose between the existing layers in your catalog. This is where your newly extracted tables will appear once the extraction runs successfully.
    • Folder: a folder can be created inside the selected layer to group all tables being created from this new data source.
    • Table name: we suggest a name, but feel free to customize it. You have the option to add a prefix to all tables at once and make this process faster!
    • Sync Type: this connector allows INCREMENTAL sync based on the date the documents were last modified. Read more about Sync Types here.
  2. Click Next.

5. Configure your REST API data source

  1. Describe your data source for easy identification within your organization. You can note things like what data it brings, which team it belongs to, etc.

  2. To define your Trigger, consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times).

Check your new source!

  1. Click Next to finalize the setup. Once completed, you’ll receive confirmation that your new source is set up!

  2. You can view your new source on the Sources page. To see it in your Catalog, you have to wait for the pipeline to run; you can monitor its execution and completion on the Sources page. If needed, trigger the pipeline manually by clicking the refresh icon. Once it has run, your new table will appear in the Catalog section.

If you encounter any issues, reach out to us, and we’ll gladly assist you!