
Configuring ClickHouse as a Source
In the Sources tab, click the “Add source” button in the top right of your screen. Then select the ClickHouse option from the list of connectors. Click Next, and you’ll be prompted to enter your database access details.

1. Add database access
You’ll need the following connection details to connect to your ClickHouse database:
- Host: The host address of your ClickHouse database. Do not include the protocol (http:// or https://).
  - For ClickHouse Cloud: your-instance.region.provider.clickhouse.cloud
  - For self-hosted: Your server hostname or IP address
- Port: The port used for connecting to your ClickHouse database.
  - Default for HTTPS: 8443
  - Default for HTTP: 8123
  - Default for Native protocol: 9440 (secure) or 9000 (insecure)
- Database: The name of the database you want to extract data from. Default is default.
- Username: The username for accessing your ClickHouse database. Default is default.
- Password: The password for the specified user.
- Batch Size: The number of rows to fetch per batch during extraction. Default is 50000. Adjust based on your table row sizes and memory constraints.
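To sanity-check these values before entering them, the sketch below collects them in a Python dataclass that uses the defaults listed above. It rejects a host that still carries a protocol prefix. The class name, field names, and validation rules are hypothetical. They mirror this page's descriptions, not Nekt's internal configuration schema.

```python
from dataclasses import dataclass

# Default ports from the list above, keyed by a hypothetical protocol label.
DEFAULT_PORTS = {
    "https": 8443,
    "http": 8123,
    "native_secure": 9440,
    "native": 9000,
}


@dataclass
class ClickHouseSourceConfig:
    """Hypothetical container for the connection fields described above."""

    host: str
    password: str
    port: int = DEFAULT_PORTS["https"]
    database: str = "default"
    username: str = "default"
    batch_size: int = 50000

    def __post_init__(self) -> None:
        if not self.host:
            raise ValueError("host is required")
        # The Host field must not include a protocol prefix.
        if self.host.startswith(("http://", "https://")):
            raise ValueError("host must not include http:// or https://")
        if not 1 <= self.port <= 65535:
            raise ValueError("port must be between 1 and 65535")
        if self.batch_size <= 0:
            raise ValueError("batch_size must be a positive integer")
```

For example, `ClickHouseSourceConfig(host="abc123.us-east1.gcp.clickhouse.cloud", password="...")` uses the defaults: port 8443, database default, user default, and a batch size of 50000.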
Finding Your ClickHouse Cloud Connection Details
If you’re using ClickHouse Cloud:
- Log in to your ClickHouse Cloud Console
- Select your service from the dashboard
- Click on Connect in the left sidebar
- Choose HTTPS as the connection method
- Copy the following details:
  - Host: The hostname shown (e.g., abc123.us-east1.gcp.clickhouse.cloud)
  - Port: Usually 8443 for HTTPS
  - Username: Your configured username (default is default)
  - Password: The password you set when creating the service
Make sure your ClickHouse Cloud service allows connections from Nekt’s IP addresses. You may need to configure the IP Access List in your service settings.
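You may want to confirm that the host, port, and credentials work before adding them in Nekt. One option is to run SELECT 1 over ClickHouse's HTTP interface. The sketch below only builds that request with Python's standard library. It is one way to test the details yourself and is not necessarily how the Nekt connector connects. It assumes HTTPS and authenticates with the X-ClickHouse-User and X-ClickHouse-Key headers of ClickHouse's HTTP interface. The function name is hypothetical.

```python
import urllib.parse
import urllib.request


def build_connection_test(
    host: str,
    port: int,
    username: str,
    password: str,
    database: str = "default",
) -> urllib.request.Request:
    """Build, but do not send, a SELECT 1 request for ClickHouse's HTTP interface."""
    # Host is entered without a protocol, so the scheme is added here (HTTPS assumed).
    query_string = urllib.parse.urlencode({"database": database})
    url = f"https://{host}:{port}/?{query_string}"
    return urllib.request.Request(
        url,
        data=b"SELECT 1",  # the query travels in the POST body
        headers={
            # Credential headers accepted by ClickHouse's HTTP interface.
            "X-ClickHouse-User": username,
            "X-ClickHouse-Key": password,
        },
        method="POST",
    )
```

If the details are correct, sending the request with `urllib.request.urlopen(request, timeout=10)` should return a single row containing 1. If the request hangs or is refused, check the service's IP Access List first.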
2. Select streams
Choose which data streams (tables/views) you want to sync. You can select entire databases or pick specific tables. The connector automatically discovers all available tables and views in the specified database. System tables are excluded by default.

Tip: You can find a stream more easily by typing its name.

Select the streams and click Next.
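One way to implement this kind of discovery is to query ClickHouse's system.tables table. The following is a minimal sketch. It assumes discovery is scoped to the configured database and that views are the entries whose engine is View or MaterializedView. The function names are hypothetical. The {database:String} placeholder is a ClickHouse query parameter, which the HTTP interface binds from a param_database URL parameter.

```python
from typing import Dict, Iterable, List, Tuple

# Discovery query; {database:String} is bound via param_database over HTTP.
DISCOVERY_SQL = (
    "SELECT name, engine FROM system.tables "
    "WHERE database = {database:String} AND NOT is_temporary "
    "ORDER BY name"
)

# Assumption: these engines identify views rather than tables.
VIEW_ENGINES = frozenset({"View", "MaterializedView"})


def discovery_query(database: str) -> Tuple[str, Dict[str, str]]:
    """Return the discovery SQL and the URL parameters that bind it."""
    return DISCOVERY_SQL, {"param_database": database}


def split_tables_and_views(
    rows: Iterable[Tuple[str, str]],
) -> Tuple[List[str], List[str]]:
    """Split (name, engine) rows from system.tables into tables and views."""
    tables: List[str] = []
    views: List[str] = []
    for name, engine in rows:
        (views if engine in VIEW_ENGINES else tables).append(name)
    return tables, views


def search_streams(names: Iterable[str], text: str) -> List[str]:
    """Hypothetical case-insensitive substring filter, like typing a stream name."""
    needle = text.lower()
    return [name for name in names if needle in name.lower()]
```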
3. Configure data streams
Customize how you want your data to appear in your catalog. Select a name for each table (which will contain the fetched data) and the type of sync.
- Table name: We suggest a name, but feel free to customize it. You can also add a prefix to speed up this process.
- Sync Type: You can choose between INCREMENTAL and FULL_TABLE.
  - Incremental: Every time the extraction happens, we’ll get only the new data since the last sync. This is efficient for large tables with a reliable timestamp or incrementing ID column.
  - Full table: Every time the extraction happens, we’ll get the current state of the data. This is useful for dimension tables or when you need complete accuracy.
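The following minimal sketch makes the difference between the two sync types concrete by showing the queries each one might translate to. The function names, the identifier-quoting rule, and the use of >= against a stored bookmark are assumptions for illustration. They do not describe the connector's actual implementation.

```python
import re
from typing import Optional

# Assumption: only plain identifiers are accepted, then wrapped in backticks.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"`{name}`"


def quote_string(value: str) -> str:
    # Escape backslashes first, then single quotes, for a ClickHouse string literal.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_extraction_query(
    table: str,
    sync_type: str,
    replication_key: Optional[str] = None,
    bookmark: Optional[str] = None,
) -> str:
    """Build a hypothetical extraction query for FULL_TABLE or INCREMENTAL syncs."""
    source = quote_identifier(table)
    if sync_type == "FULL_TABLE":
        # Read the current state of the whole table on every run.
        return f"SELECT * FROM {source}"
    if sync_type != "INCREMENTAL":
        raise ValueError(f"unknown sync type: {sync_type!r}")
    if replication_key is None:
        raise ValueError("INCREMENTAL syncs need a replication key")
    key = quote_identifier(replication_key)
    query = f"SELECT * FROM {source}"
    if bookmark is not None:
        # >= re-reads rows equal to the bookmark instead of risking skipping them.
        query += f" WHERE {key} >= {quote_string(bookmark)}"
    # Ordering by the key lets the last row of a run become the next bookmark.
    return query + f" ORDER BY {key}"
```

On the first incremental run there is no bookmark yet, so the query reads the whole table. Later runs pass in the largest replication key value seen so far.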
For incremental syncs, you’ll need to select a Replication Key: a column that indicates when a row was created or modified (e.g., created_at, updated_at, or an auto-incrementing id).

4. Configure data source
Describe your data source for easy identification within your organization, using no more than 140 characters.

To define your Trigger, consider how often you want data to be extracted from this source. This usually depends on how frequently your ClickHouse data is updated and how current your analytics need to be.

Optionally, you can choose when to run a full sync. Full syncs complement the incremental extractions and periodically bring your data fully in line with the source.

Once you are ready, click Next to finalize the setup.

5. Check your new source
You can view your new source on the Sources page. If needed, manually trigger the source extraction by clicking the arrow button. Once executed, your data will appear in your Catalog.

Implementation Notes
Connection Security
By default, the connector uses secure HTTPS connections to your ClickHouse database, so your data is encrypted in transit.
- ClickHouse Cloud: Always uses HTTPS on port 8443.
- Self-hosted: Make sure your ClickHouse server is configured with SSL/TLS certificates for secure connections.
Performance Optimization
ClickHouse is optimized for reading large amounts of data quickly. To get the best performance:
- Batch Size: The default batch size of 50,000 rows works well for most use cases. If you have very wide tables (many columns), consider reducing this value.
- Incremental Syncs: Use incremental syncs whenever possible. ClickHouse is especially efficient at filtering on columns in a table's sorting key, which are typically date-based.
- Replication Key Selection: Choose a column that ClickHouse can filter on efficiently, such as updated_at or another column that indicates when a record was modified.
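The batch size and the replication key work together during extraction. The sketch below uses keyset pagination: each batch requests the rows that sort after the last row of the previous batch, so rows are not re-read and discarded the way an OFFSET-based scan would. The fetch_page callback is hypothetical. In practice it would run a query with a WHERE clause on the sort columns, ORDER BY, and LIMIT. The sort key pairs a timestamp with a unique id, which is an assumption. Without the id, rows that share the same updated_at value could be skipped at a batch boundary.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, Any]
Cursor = Optional[Tuple[Any, ...]]


def iter_batches(
    fetch_page: Callable[[Cursor, int], List[Row]],
    sort_key: Callable[[Row], Tuple[Any, ...]],
    batch_size: int = 50_000,
) -> Iterator[List[Row]]:
    """Yield batches of at most batch_size rows using keyset pagination.

    fetch_page(cursor, limit) is a hypothetical callback that must return up to
    `limit` rows ordered by sort_key, all strictly greater than `cursor`
    (or starting from the beginning when cursor is None).
    """
    cursor: Cursor = None
    while True:
        batch = fetch_page(cursor, batch_size)
        if not batch:
            return
        yield batch
        if len(batch) < batch_size:
            return  # a short page means nothing is left to read
        # Resume after the last row of this batch on the next request.
        cursor = sort_key(batch[-1])
```

A smaller batch_size means more round trips but less memory per batch, which is the trade-off behind the advice for wide tables.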
DateTime Handling
ClickHouse DateTime columns don’t accept ISO format strings with timezone suffixes in comparisons. The connector handles this automatically by converting bookmark values to a ClickHouse-compatible format (YYYY-MM-DD HH:MM:SS), so you don’t need to worry about timezone handling.
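The following minimal sketch shows this kind of conversion. It assumes the bookmark is an ISO 8601 string, that timezone-aware values should be normalized to UTC, and that the column's timezone is also UTC. The function name is hypothetical, and the connector's actual rules may differ.

```python
from datetime import datetime, timezone


def to_clickhouse_datetime(value: str) -> str:
    """Convert an ISO 8601 bookmark into 'YYYY-MM-DD HH:MM:SS'.

    Assumptions: timezone-aware values are normalized to UTC, naive values are
    treated as already being UTC, and fractional seconds are dropped.
    """
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is not None:
        parsed = parsed.astimezone(timezone.utc)
    return parsed.strftime("%Y-%m-%d %H:%M:%S")
```

Dropping fractional seconds rounds the bookmark down. Paired with a >= comparison, this re-reads a few rows instead of skipping any.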
Excluded Tables
The following system databases are automatically excluded from discovery:
- system
- INFORMATION_SCHEMA
- information_schema
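The following sketch applies this exclusion to discovered (database, table) pairs. The comparison is case-sensitive, which is why both spellings of information_schema are listed. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple

# System databases listed above; the match is case-sensitive.
EXCLUDED_DATABASES = frozenset({"system", "INFORMATION_SCHEMA", "information_schema"})


def discoverable(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only (database, table) pairs outside the excluded system databases."""
    return [(db, table) for db, table in pairs if db not in EXCLUDED_DATABASES]
```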
Data Types
ClickHouse data types are mapped to standard types for compatibility:
| ClickHouse Type | Output Type |
|---|---|
| Int8, Int16, Int32, Int64 | Integer |
| UInt8, UInt16, UInt32, UInt64 | Integer |
| Float32, Float64 | Number |
| Decimal | Number |
| String, FixedString | String |
| Date, Date32 | Date |
| DateTime, DateTime64 | DateTime |
| Bool | Boolean |
| UUID | String |
| Array | Array |
| Nullable(T) | Output type of T, marked nullable |
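The mapping above can be sketched as a small parser over ClickHouse type strings. Parameterized types such as Decimal(18, 4), FixedString(16), or DateTime64(3, 'UTC') are matched by their base name. Nullable(T) is unwrapped into the output type of T plus a nullable flag. The function name is hypothetical. Raising an error for types outside this table is an assumption, and the connector may handle unlisted types differently.

```python
import re
from typing import Tuple

# Base ClickHouse type name -> output type, following the table above.
_TYPE_MAP = {
    **dict.fromkeys(
        ["Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"],
        "Integer",
    ),
    **dict.fromkeys(["Float32", "Float64", "Decimal"], "Number"),
    **dict.fromkeys(["String", "FixedString", "UUID"], "String"),
    **dict.fromkeys(["Date", "Date32"], "Date"),
    **dict.fromkeys(["DateTime", "DateTime64"], "DateTime"),
    "Bool": "Boolean",
    "Array": "Array",
}

# A base name, optionally followed by parenthesized arguments.
_TYPE_PATTERN = re.compile(r"([A-Za-z][A-Za-z0-9]*)(?:\((.*)\))?")


def map_clickhouse_type(type_name: str) -> Tuple[str, bool]:
    """Return (output_type, nullable) for a ClickHouse column type string."""
    match = _TYPE_PATTERN.fullmatch(type_name.strip())
    if match is None:
        raise ValueError(f"cannot parse type: {type_name!r}")
    base, inner = match.group(1), match.group(2)
    if base == "Nullable":
        if inner is None:
            raise ValueError("Nullable needs an inner type")
        # Nullable(T) keeps T's output type and only adds the nullable flag.
        output, _ = map_clickhouse_type(inner)
        return output, True
    if base not in _TYPE_MAP:
        raise ValueError(f"unmapped ClickHouse type: {type_name!r}")
    return _TYPE_MAP[base], False
```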