1. Introduction

The REST API connector allows you to integrate with any REST API service by configuring the necessary parameters such as base URL, authentication, pagination, and stream mapping. This connector is highly flexible and can be customized to work with most REST APIs that follow standard conventions.

With this connector, you can:

  • Connect to any REST API endpoint
  • Configure different authentication methods (Basic Auth, Bearer Token, API Key)
  • Set up pagination to handle large datasets
  • Define custom headers and query parameters
  • Map API responses to your desired schema

This guide will walk you through the process of setting up and configuring your REST API connector to start ingesting data from your chosen API service.

2. Add your REST API configuration

  1. In the Sources tab, click on the “Add source” button located at the top right of your screen. Then, select the REST API option from the list of connectors.

  2. Click Next and you’ll be prompted to add the connector configuration. There are two distinct types of configuration for this source: top-level connector configuration and stream-level configuration.

Top-level connector configuration

The top-level connector configuration defines parameters such as the base URL, authentication, pagination, headers, and query params. Some of these may differ from stream to stream; in that case, you can override them in the stream configuration if needed.

This is the list of top-level configurations available:

  • Base API URL: The base URL of the REST API endpoint. This is the root URL that all endpoints will be appended to. Example: https://api.example.com/v1.

  • Authentication type: The type of authentication required by the API. You should choose the method that matches your API’s authentication requirements. Depending on the auth mode selected, additional fields will become available to capture the required information.

    • None: No authentication required.
    • Basic Auth: Username/password authentication.
    • Bearer Token: Token-based authentication.
    • API Key: API key passed in a request header.

    For this method, you can pass a list of key-value pairs for API key header authentication: the key is the header name and the value is the header value.

  • Pagination mode: The type of pagination required by the API. Choose the method that matches your API’s pagination style.

    • Cursor Paginator: Uses a cursor token to navigate through pages.
    • Offset Paginator: Uses offset/limit parameters.
    • Page Number Paginator: Uses page numbers.
    • None: No pagination required.

  • Next page token path: For cursor pagination, the JSONPath expression indicating where to find the cursor in the response payload.

  • Pagination page size: Number of records to fetch per page. This helps control the size of each request.

  • Pagination limit per page API param: The name of the API parameter that specifies the number of records to fetch per page. For example: limit.

  • Pagination next page/offset API param: The name of the API parameter that specifies the next page/offset. For example: page.

  • Pagination initial page/offset: The initial offset to start pagination from. Usually it’s 0 for offset based pagination and 1 for page number based pagination, but it can vary from API to API.

  • Query params: A list of key-value pairs representing the query params for a GET method. Pagination parameters don’t need to be provided as they are automatically added by the connector.

  • Headers: A list of key-value pairs representing headers to pass into the API calls. Stream-level headers are merged with top-level headers, with stream-level values overwriting top-level values that share a key.
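To make the pagination settings above concrete, here is a minimal Python sketch of how an offset-based paginator could combine the configured query params with the pagination params. The field names ("limit", "offset") and the config structure are illustrative assumptions, not the connector's actual internals; use whatever parameter names your API expects.

```python
# Hypothetical sketch: building the query params for one page of an
# offset-paginated GET request from a top-level configuration.

def build_page_params(config: dict, offset: int) -> dict:
    """Merge the configured query params with the pagination params."""
    params = dict(config.get("query_params", {}))
    params[config["limit_param"]] = config["page_size"]
    params[config["offset_param"]] = offset
    return params

config = {
    "base_url": "https://api.example.com/v1",
    "page_size": 100,           # "Pagination page size"
    "limit_param": "limit",     # "Pagination limit per page API param"
    "offset_param": "offset",   # "Pagination next page/offset API param"
    "initial_offset": 0,        # "Pagination initial page/offset"
    "query_params": {"status": "active"},
}

# The first request starts at the configured initial offset; each
# subsequent request advances the offset by the page size.
print(build_page_params(config, config["initial_offset"]))
# {'status': 'active', 'limit': 100, 'offset': 0}
print(build_page_params(config, config["initial_offset"] + config["page_size"]))
# {'status': 'active', 'limit': 100, 'offset': 100}
```

Note how you never supply `limit` or `offset` in Query params yourself: the connector adds them on each request, which is why the documentation says pagination parameters don't need to be provided.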

Stream-level configuration

Stream configuration is specific to each data stream. Imagine you have multiple endpoints you want to fetch data from (e.g., in a Trello connector, you could have users, projects, boards, and tasks). Each of these entities corresponds to a data stream with its own configuration.

This is the list of stream configurations available:

  • Data stream name: The name of the data stream. It shouldn’t contain spaces or special characters. Example: users

  • Endpoint path: The relative endpoint path to the base URL. Example: /users

  • Primary keys: The primary keys of the data stream. They should correspond to existing field names in the payload. For composite primary keys, add more than one item. Make sure each primary key name matches the field name in the payload exactly (it’s case-sensitive).

  • Records JSON Path: A JSONPath string representing the path in the request response that contains the records to process. Default is $[*] for responses that return an array of items directly.

  • Number of records to infer the schema: The number of records used to infer the schema. A higher number yields a more accurate schema but makes the whole run take longer. Default is 100.

  • Replication key: The field to use for incremental replication. It must be either a date or a number. You need to choose a field that is updated whenever a record is created or updated to ensure consistency. Example: updated_at

    • Query param replication field: The name of the query parameter in the source API used to filter records by the replication key value. This field exists because in some cases the query parameter name differs from the replication key field name. Example: last-updated
    • Source search query: The query to use for incremental replication. If the API requires more complex queries, use an expression with $last_run_date as the variable representing the last time the execution happened; it will be replaced at runtime. Example: > 2021-01-01
    • Start date: The date to start the extraction from. It must be a date in the format DD-MM-YYYY.

  • Request params: A list of key-value pairs providing the params for a GET method. Pagination parameters don’t need to be provided as they are automatically added by the connector. Stream-level params are merged with top-level params, with stream-level values overwriting top-level values that share a key.

  • Request headers: A list of key-value pairs representing the headers to pass into the API calls. Stream-level headers are merged with top-level headers, with stream-level values overwriting top-level values that share a key.
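Three of the stream-level behaviors above can be sketched in a few lines of Python. Everything here is an illustrative assumption about how such a connector typically behaves, not its actual implementation: the merge rule, a response shaped like `{"data": [...]}`, and the `$last_run_date` substitution.

```python
# 1) Header/param merging: stream-level entries overwrite top-level
# entries that share a key.
def merge(top_level: dict, stream_level: dict) -> dict:
    """Stream-level values win on key conflicts."""
    return {**top_level, **stream_level}

top_headers = {"Accept": "application/json", "X-Env": "prod"}
stream_headers = {"X-Env": "staging", "X-Stream": "users"}
print(merge(top_headers, stream_headers))
# {'Accept': 'application/json', 'X-Env': 'staging', 'X-Stream': 'users'}

# 2) Records JSON Path: for a response like the one below, a path of
# "$.data[*]" selects the record list; the default "$[*]" assumes the
# response body is the array itself.
response = {"data": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]}
records = response["data"]  # what "$.data[*]" would select

# 3) Source search query: $last_run_date is substituted at runtime with
# the timestamp of the last successful execution.
def resolve_search_query(template: str, last_run_date: str) -> str:
    return template.replace("$last_run_date", last_run_date)

print(resolve_search_query("updated_at > $last_run_date", "2021-01-01"))
# updated_at > 2021-01-01
```

The merge rule is why you can set shared headers (for example, `Accept`) once at the top level and override only the ones that differ per stream.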

Once this is done, you’re good to move on to the next step.

  3. Click Next.

3. Select your REST API streams

  1. In this step, you will be able to see the discovered streams catalog and the available fields for each stream. You have the option to filter both streams and fields during this step.

  2. Select the desired streams and fields - by default, all are selected.

  3. Click Next.

4. Configure your REST API data streams

  1. Customize how you want your data to appear in your catalog. Select the layer where the data will be placed, a folder to organize it inside the layer, a name for each table (which will contain the fetched data), and the type of sync.

    • Layer: choose between the existing layers in your catalog. This is where your newly extracted tables will appear once the extraction runs successfully.
    • Folder: a folder can be created inside the selected layer to group all tables being created from this new data source.
    • Table name: we suggest a name, but feel free to customize it. You have the option to add a prefix to all tables at once and make this process faster!
    • Sync Type: this connector allows INCREMENTAL sync based on the date the documents were last modified. Read more about Sync Types here.
  2. Click Next.

5. Configure your REST API data source

  1. Describe your data source for easy identification within your organization. You can note things like what data it brings, which team it belongs to, etc.

  2. To define your Trigger, consider how often you want data to be extracted from this source. This decision usually depends on how frequently you need the new table data updated (every day, once a week, or only at specific times).

Check your new source!

  1. Click Next to finalize the setup. Once completed, you’ll receive confirmation that your new source is set up!

  2. You can view your new source on the Sources page. To see it in your Catalog, you have to wait for the pipeline to run; you can monitor its execution and completion on the Sources page. If needed, trigger the pipeline manually by clicking the refresh icon. Once it has run, your new table will appear in the Catalog section.

If you encounter any issues, reach out to us, and we’ll gladly assist you!